Preprints

1. Multi-sample non-negative spatial factorization (2024)

Wang, Y., Woyshner, K., Sriworarat, C., Stein-O'Brien, G., Goff LA, Hansen, K. D., bioRxiv

Analyzing multi-sample spatial transcriptomics data requires accounting for biological variation. We present multi-sample non-negative spatial factorization (mNSF), an alignment-free framework extending single-sample spatial factorization (NSF) to multi-sample datasets. mNSF incorporates sample-specific spatial correlation modeling and extracts low-dimensional data representations. Through simulations and real data analysis, we demonstrate mNSF's efficacy in identifying true factors, shared anatomical regions, and region-specific biological functions. mNSF's performance is comparable to alignment-based methods when alignment is feasible, while enabling analysis in scenarios where spatial alignment is infeasible. mNSF shows promise as a robust method for analyzing spatially resolved transcriptomics data across multiple samples.

10.1101/2024.07.01.599554

2. A consensus definition for deep layer 6 excitatory neurons in mouse neocortex (2024)

Kim, S.-J., Babola, T. A., Lee, K., Matney, C. J., Spiegel, A. C., Liew, M. H., Schulteis, E. M., Coye, A. E., Proskurin, M., Kang, H., Kim, J. A., Chevee, M., Lee, K., Kanold, P. O., Goff LA, Kim, J., Brown, S. P., bioRxiv

To understand neocortical function, we must first define its cell types. Recent studies indicate that neurons in the deepest cortical layer play roles in mediating thalamocortical interactions and modulating brain state and are implicated in neuropsychiatric disease. However, understanding the functions of deep layer 6 (L6b) neurons has been hampered by the lack of agreed-upon definitions for these cell types. We compared commonly used methods for defining L6b neurons, including molecular, transcriptional and morphological approaches as well as transgenic mouse lines, and identified a core population of L6b neurons. This population does not innervate sensory thalamus, unlike layer 6 corticothalamic neurons (L6CThNs) in more superficial layer 6. Rather, single L6b neurons project ipsilaterally between cortical areas. Although L6b neurons undergo early developmental changes, we found that their intrinsic electrophysiological properties were stable after the first postnatal week. Our results provide a consensus definition for L6b neurons, enabling comparisons across studies.

10.1101/2024.11.04.621933

3. Transcriptional profiles of murine oligodendrocyte precursor cells across the lifespan (2024)

Heo, D., Kim, A. A., Neumann, B., Doze, V. N., Xu, Y. K. T., Mironova, Y. A., Slosberg, J., Goff LA, Franklin, R. J. M., Bergles, D. E., bioRxiv

Oligodendrocyte progenitor cells (OPCs) are highly dynamic, widely distributed glial cells of the central nervous system (CNS) that are responsible for generating myelinating oligodendrocytes during development. By also generating new oligodendrocytes in the adult CNS, OPCs allow formation of new myelin sheaths in response to environmental and behavioral changes and play a crucial role in regenerating myelin following demyelination (remyelination). However, the rates of OPC proliferation and differentiation decline dramatically with aging, which may impair homeostasis, remyelination, and adaptive myelination during learning. To determine how aging influences OPCs, we generated a novel transgenic mouse line that expresses membrane-anchored EGFP under the endogenous promoter/enhancer of Matrilin-4 (Matn4-mEGFP) and performed high-throughput single-cell RNA sequencing, providing enhanced resolution of transcriptional changes during key transitions from quiescence to proliferation and differentiation across the lifespan. Comparative analysis of OPCs isolated from mice aged 30 to 720 days revealed that aging induces distinct inflammatory transcriptomic changes in OPCs in different states, including enhanced activation of HIF-1 and Wnt pathways. Inhibition of these pathways in acutely isolated OPCs from aged animals restored their ability to differentiate, suggesting that this enhanced signaling may contribute to the decreased regenerative potential of OPCs with aging. This Matn4-mEGFP mouse line and single-cell mRNA datasets of cortical OPCs across ages help to define the molecular changes guiding their behavior in various physiological and pathological contexts.

10.1101/2024.10.27.620502

4. Foveolar cone subtype patterning in human retinal organoids (2023)

Hussey, K., Eldred, K., Reh, T. A., Johnston, R. J., bioRxiv

The mechanisms that generate patterns of cell types unique to humans are poorly understood. In the central region of the human retina, the high-acuity foveola is notable, in part, for its dense packing of green (M) and red (L) cones and absence of blue (S) cones. To identify mechanisms that promote M/L and suppress S cone patterning in the foveola, we examined human fetal retinas and differentiated human retinal organoids. During development, sparse S-opsin-expressing cones are initially observed in the foveola. Later in fetal development, the foveola contains a mix of cones that either co-express S- and M/L-opsins or exclusively express M/L-opsin. In adults, only M/L cones are present. Two signaling pathway regulators are highly and continuously expressed in the central retina: Cytochrome P450 26 subfamily A member 1 (CYP26A1), which degrades retinoic acid (RA), and Deiodinase 2 (DIO2), which promotes thyroid hormone (TH) signaling. Both CYP26A1 null mutant organoids and high RA conditions increased the number of S cones and reduced the number of M/L cones in human retinal organoids. In contrast, sustained TH signaling promoted the generation of M/L-opsin-expressing cones and induced M/L-opsin expression in S-opsin-expressing cones, showing that cone fate is plastic. Our data suggest that CYP26A1 degrades RA to specify M/L cones and limit S cones and that continuous DIO2 expression sustains high levels of TH to convert S cones into M/L cones, resulting in the foveola containing only M/L cones. Since the foveola is highly susceptible to impairment in diseases such as macular degeneration, a leading cause of vision loss, our findings inform organoid design for potential therapeutic applications.

10.1101/2023.01.28.526051

6. Transcriptional Control of Neocortical Size and Microcephaly (2023)

Barao, S., Xu, Y., Vistein, R., Goff LA, Nielsen, K., Bae, B.-I., Smith, R. S., Walsh, C. A., Stein-O'Brien, G., Muller, U., bioRxiv

The mammalian neocortex differs vastly in size and complexity between mammalian species, yet the mechanisms that lead to an increase in brain size during evolution are not known. We show here that two transcription factors coordinate gene expression programs in progenitor cells of the neocortex to regulate their proliferative capacity and neuronal output in order to determine brain size. Comparative studies in mice, ferrets and macaques demonstrate an evolutionarily conserved function for these transcription factors to regulate progenitor behaviors across the mammalian clade. Strikingly, the two transcriptional regulators control the expression of large numbers of genes linked to microcephaly, suggesting that transcriptional deregulation is an important determinant of the molecular pathogenesis of microcephaly, which is consistent with the finding that genetic manipulation of the two transcription factors leads to severe microcephaly.

10.1101/2023.11.02.565322

7. Semaphorin 6A in Retinal Ganglion Cells Regulates Functional Specialization of the Inner Retina (2023)

James, R. E., Hamilton, N. R., Huffman, L. N., Pasterkamp, J., Goff LA, Kolodkin, A. L., bioRxiv

To form functional circuits, neurons must settle in their appropriate cellular locations and then project and elaborate neurites to contact their target synaptic neuropils. Laminar organization within the vertebrate retinal inner plexiform layer (IPL) facilitates pre- and postsynaptic neurite targeting, yet the precise mechanisms underlying establishment of functional IPL subdomains are not well understood. Here we explore mechanisms defining the compartmentalization of OFF and ON neurites generally, and OFF and ON direction-selective neurites specifically, within the developing IPL. We show that semaphorin 6A (Sema6A), a repulsive axon guidance cue, is required for delineation of OFF versus ON circuits within the IPL: in the Sema6a null IPL, the boundary between OFF and ON domains is blurred. Furthermore, Sema6A expressed by retinal ganglion cells (RGCs) directs laminar segregation of OFF and ON starburst amacrine cell (SAC) dendritic scaffolds, which themselves serve as a substrate upon which other retinal neurites elaborate. These results demonstrate for the first time that RGCs, the first neuron type born within the retina, play an active role in functional specialization of the IPL. Retinal ganglion cell-dependent regulation of OFF and ON starburst amacrine cell dendritic scaffold segregation prevents blurring of OFF versus ON functional domains in the murine inner plexiform layer.

10.1101/2023.11.18.567662

8. Inferring cellular and molecular processes in single-cell data with non-negative matrix factorization using Python, R, and GenePattern Notebook implementations of CoGAPS (2022)

Johnson, J. A. I., Tsang, A., Mitchell, J. T., Davis-Marcisak, E. F., Sherman, T., Liefeld, T., Loth, M., Goff LA, Zimmerman, J., Kinny-Köster, B., Jaffee, E., Tamayo, P., Mesirov, J., Reich, M., Fertig, E. J., Stein-O'Brien, G. L., bioRxiv

Non-negative matrix factorization (NMF) is an unsupervised learning method well suited to high-throughput biology. Still, inferring biological processes requires additional post hoc statistics and annotation for interpretation of features learned from software packages developed for NMF implementation. Here, we aim to introduce a suite of computational tools that implement NMF and provide methods for accurate, clear biological interpretation and analysis. A generalized discussion of NMF covering its benefits, limitations, and open questions in the field is followed by three vignettes for the Bayesian NMF algorithm CoGAPS (Coordinated Gene Activity across Pattern Subsets). Each vignette will demonstrate NMF analysis to quantify cell state transitions in public domain single-cell RNA-sequencing (scRNA-seq) data of malignant epithelial cells in 25 pancreatic ductal adenocarcinoma (PDAC) tumors and 11 control samples. The first uses PyCoGAPS, our new Python interface for CoGAPS that we developed to enhance runtime of Bayesian NMF for large datasets. The second vignette steps through the same analysis using our R CoGAPS interface, and the third introduces two new cloud-based, plug-and-play options for running CoGAPS using GenePattern Notebook and Docker. By providing Python support, cloud-based computing options, and relevant example workflows, we facilitate user-friendly interpretation and implementation of NMF for single-cell analyses.

10.1101/2022.07.09.499398

9. Pumping the brakes on RNA velocity -- understanding and interpreting RNA velocity estimates (2022)

Zheng, S. C., Stein-O'Brien, G., Boukas, L., Goff LA, Hansen, K. D., bioRxiv

RNA velocity analysis of single cells promises to predict temporal dynamics from gene expression. Indeed, in many systems, it has been observed that RNA velocity produces a vector field that qualitatively reflects known features of the system. Despite this observation, the limitations of RNA velocity estimates are poorly understood. Using real data and simulations, we dissect the impact of different steps in the RNA velocity workflow on the estimated vector field. We find that the process of mapping RNA velocity estimates into a low-dimensional representation, such as those produced by UMAP, has a large impact on the result. The RNA velocity vector field strongly depends on the k-NN graph of the data. This dependence leads to significant estimator errors when the k-NN graph is not a faithful representation of the true data structure, a feature that cannot be known for most real datasets. Finally, we establish that RNA velocity estimates expression speed neither at the gene nor cellular level. We propose that RNA velocity is best considered a smoothed interpolation of the observed k-NN structure, as opposed to an extrapolation of future cellular states, and that the use of RNA velocity as a validation of latent space embedding structures is circular.

10.1101/2022.06.19.494717

10. Pantr2, a trans-acting lncRNA, modulates the differentiation potential of neural progenitors in vivo (2022)

Augustin, J. J., Takayangi, S., Hoang, T., Winer, B., Blackshaw, S., Goff LA, bioRxiv

Ablation of the long non-coding RNA (lncRNA) Pantr2 results in microcephaly in a knockout murine model of corticogenesis; however, the precise mechanisms involved are unknown. We present evidence that Pantr2 is a trans-acting lncRNA that regulates gene expression and chromatin accessibility both in vivo and in vitro. We demonstrate that ectopic expression of Pantr2 in a neuroblastoma cell line alters gene expression under differentiating conditions, and that both loss and gain of function of Pantr2 result in changes to cell-cycle dynamics. We show that expression of both the transcription factor Nfix and the cell cycle regulator Rgcc is negatively regulated by Pantr2. Using RNA binding protein motif analysis and existing CLIP-seq data, we annotate potential HuR and QKI binding sites on Pantr2, and demonstrate that HuR does not directly bind Pantr2 using an RNA immunoprecipitation assay. Finally, using Gene Ontology enrichment analysis, we identify disruption of both Notch and Wnt signaling following loss of Pantr2 expression, indicating potential Pantr2-dependent regulation of these pathways.

10.1101/2022.10.07.511381

11. Ret loss-of-function decreases enteric neural crest progenitor proliferation and restricts developmental fate potential during enteric nervous system development (2021)

Vincent, E., Chatterjee, S., Cannon, G. H., Auer, D., Ross, H., Chakravarti, A., Goff LA, bioRxiv

The receptor tyrosine kinase gene RET plays a critical role in the fate specification of enteric neural crest-derived cells (ENCDCs) during enteric nervous system (ENS) development. Pathogenic RET loss of function (LoF) alleles are associated with Hirschsprung disease (HSCR), which is marked by aganglionosis of the gastrointestinal (GI) tract. ENCDCs invade the developing GI tract, proliferate, migrate caudally, and differentiate into all of the major ENS cell types. Although the major phenotypic consequences and the underlying transcriptional changes from Ret LoF in the developing ENS have been described, its cell type and state-specific effects are unknown. Consequently, we performed single-cell RNA sequencing (scRNA-seq) on an enriched population of ENCDCs isolated from the developing GI tract of Ret null heterozygous and homozygous mouse embryos at embryonic day (E)12.5 and E14.5. We demonstrate four significant findings: (1) Ret-expressing ENCDCs are a heterogeneous population composed of ENS progenitors as well as glial and neuronal committed cells; (2) neurons committed to a predominantly inhibitory motor neuron developmental trajectory are not produced under Ret LoF, leaving behind a mostly excitatory motor neuron developmental program; (3) HSCR-associated and Ret gene regulatory network genes exhibit distinct expression patterns across Ret-expressing ENCDCs, with their expression impacted by Ret LoF; and (4) Ret deficiency leads to precocious differentiation and reduction in the number of proliferating ENS precursors. Our results support a model in which Ret contributes to multiple distinct cellular phenotypes associated with the proper development of the ENS, including the specification of inhibitory neuron subtypes, cell cycle dynamics of ENS progenitors, and the developmental timing of neuronal and glial commitment.
Summary Statement: Ret LoF affects proper development of the mouse ENS through multiple distinct cellular phenotypes including restriction of neuronal fate potential, disruption of ENCDC migration, and modulation of progenitor proliferation rate.

10.1101/2021.12.28.474390

12. Prenatal immune stress induces a prolonged blunting of microglia activation that impacts striatal connectivity (2021)

Hayes, L. N., An, K., Carloni, E., Li, F., Vincent, E., Paranjpe, M., Dolen, G., Goff LA, Ramos, A., Kano, S.-i., Sawa, A., bioRxiv

Recent studies suggested that microglia, the primary brain immune cells, can affect circuit connectivity and neuronal function [1-3]. Microglia infiltrate the neuroepithelium early in embryonic development and are maintained in the brain throughout adulthood [4,5]. Several maternal environmental factors, such as aberrant microbiome, immune activation, and poor nutrition, can influence prenatal brain development [6-8]. Nevertheless, it is unknown how changes in the prenatal environment instruct the developmental trajectory of infiltrating microglia, which in turn affect brain development and function. Here we show that after maternal immune activation (MIA) microglia from the offspring have a long-lived decrease in immune reactivity (blunting) across the developmental trajectory. The blunted immune response was concomitant with changes in the chromatin accessibility and reduced transcription factor occupancy of the open chromatin. Single cell RNA sequencing revealed that MIA does not induce a distinct subpopulation but rather decreases the contribution to inflammatory microglia states. Prenatal replacement of MIA microglia with physiological infiltration of naive microglia ameliorated the immune blunting and restored a decrease in presynaptic vesicle release probability onto dopamine receptor type-two medium spiny neurons, indicating that aberrantly formed microglia due to an adverse prenatal environment impact the long-term microglia reactivity and proper striatal circuit development.

10.1101/2021.12.27.473694

13. Identifying Gene-wise Differences in Latent Space Projections Across Cell Types and Species in Single Cell Data using scProject (2021)

Baraban, A., Clark, B. S., Slosberg, J., Fertig, E. J., Goff LA, Stein-O'Brien, G., bioRxiv

Latent space techniques have emerged as powerful tools to identify genes and gene sets responsible for cell-type and species-specific differences in single-cell data. Transfer learning methods can compare learned latent spaces across biological systems. However, the robustness that comes from leveraging information across multiple genes in transfer learning is often attained at the sacrifice of gene-wise precision. Thus, methods are needed to identify genes, defined as important within a particular latent space, that significantly differ between contexts. To address this challenge, we have developed a new framework, scProject, and a new metric, projectionDrivers, to quantitatively examine latent space usage across single-cell experimental systems while concurrently extracting the genes driving the differential usage of the latent space between defined contrasts. Here, we demonstrate the efficacy, utility, and scalability of scProject with projectionDrivers and provide experimental validation for predicted species-specific differences between the developing mouse and human retina.

10.1101/2021.08.25.457650

14. Universal prediction of cell cycle position using transfer learning (2021)

Zheng, S. C., Stein-O'Brien, G., Augustin, J. J., Slosberg, J., Carosso, G. A., Winer, B., Shin, G., Bjornsson, H. T., Goff LA, Hansen, K. D., bioRxiv

Background: The cell cycle is a highly conserved, continuous process which controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle with high resolution from single-cell RNA-seq data. Results: Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the use of transfer learning. We estimate a cell cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls which are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species and even sequencing assays. Conclusions: Tricycle generalizes across datasets, is highly scalable and applicable to atlas-level single-cell RNA-seq data.

10.1101/2021.04.06.438463

15. Differential expression levels of Sox9 in early neocortical radial glial cells regulate the decision between stem cell maintenance and differentiation (2020)

Fabra-Beser, J., Alves Medeiros de Araujo, J., Coelho, D. M., Goff LA, Mueller, U., Gil-Sanz, C., bioRxiv

Radial glial progenitor cells (RGCs) in the dorsal forebrain directly or indirectly produce excitatory projection neurons and macroglia of the neocortex. Recent evidence shows that the pool of RGCs is more heterogeneous than originally thought and that progenitor subpopulations can generate particular neuronal cell types. Using single cell RNA sequencing, we have studied gene expression patterns of two subtypes of RGCs that differ in their neurogenic behavior. One progenitor type rapidly produces postmitotic neurons, whereas the second progenitor remains relatively quiescent before generating neurons. We have identified candidate genes that are differentially expressed between these RGC progenitor subtypes, including the transcription factor Sox9. Using in utero electroporation, we demonstrate that elevated Sox9 expression in progenitors prevents RGC division and leads to the generation of upper-layer cortical neurons from these progenitors at later ages. Our data thus reveal molecular differences between cortical progenitors with different neurogenic behavior and indicate that Sox9 is critical for the maintenance of RGCs to regulate the generation of upper layer neurons. SIGNIFICANCE STATEMENT: The existence of heterogeneity in the pool of RGCs and its relationship with the generation of cellular diversity in the cerebral cortex has been an interesting topic of debate for many years. Here we describe the existence of a subpopulation of RGCs with reduced neurogenic behavior at early embryonic ages presenting a particular molecular signature. This molecular signature consists of differential expression of some genes including the transcription factor Sox9, found to be a specific master regulator of this subpopulation of progenitor cells. Functional experiments perturbing Sox9 expression levels reveal its instructive role in the regulation of the neurogenic behavior of RGCs and its relationship with the generation of upper layer projection neurons at later ages.

10.1101/2020.12.09.417931

16. Neural crest-derived neurons are replaced by a newly identified mesodermal lineage in the post-natal and aging enteric nervous system (2020)

Kulkarni, S., Saha, M., Becker, L. S., Wang, Z., Liu, G., Leser, J., Kumar, M., Bakhshi, S., Anderson, M. J., Lewandoski, M., Slosberg, J., Nagaraj, S., Vincent, E., Goff LA, Pasricha, P. J., bioRxiv

The enteric nervous system (ENS), a collection of neural cells contained in the wall of the gut, is of fundamental importance to gastrointestinal and systemic health. According to the prevailing paradigm, the ENS arises from progenitor cells migrating from the neural crest and remains largely unchanged thereafter. Here, we show that the lineage composition of maturing ENS changes with time, with a decline in the canonical lineage of neural crest-derived neurons and their replacement by a newly identified lineage of mesoderm-derived neurons. Single cell transcriptomics and immunochemical approaches establish a distinct expression profile of mesoderm-derived neurons. The dynamic balance between the proportions of neurons from these two different lineages in the post-natal gut is dependent on the availability of their respective trophic signals, GDNF-RET and HGF-MET. With increasing age, the mesoderm-derived neurons become the dominant form of neurons in the ENS, a change associated with significant functional effects on intestinal motility which can be reversed by GDNF supplementation. Transcriptomic analyses of human gut tissues show reduced GDNF-RET signaling in patients with intestinal dysmotility, which is associated with reduction in neural crest-derived neuronal markers and concomitant increase in transcriptional patterns specific to mesoderm-derived neurons. Normal intestinal function in the adult gastrointestinal tract therefore appears to require an optimal balance between these two distinct lineages within the ENS.

10.1101/2020.08.25.262832

17. A Feedback Mechanism Regulates Odorant Receptor Expression in the Malaria Mosquito, Anopheles gambiae (2020)

Maguire, S. E., Afify, A., Goff LA, Potter, C. J., bioRxiv

Mosquitoes locate and approach humans (host-seek) when specific Olfactory Neurons (ORNs) in the olfactory periphery activate a specific combination of glomeruli in the mosquito Antennal Lobe (AL). We hypothesize that dysregulating proper glomerular activation in the presence of human odor will prevent host-seeking behavior. In experiments aimed at ectopically activating most ORNs in the presence of human odor, we made a surprising finding: ectopic expression of an AgOr (AgOr2) in Anopheles gambiae ORNs dampens the activity of the expressing neuron. This contrasts with studies in Drosophila melanogaster, the typical insect model of olfaction, in which ectopic expression of non-native ORs in ORNs confers ectopic neuronal responses without interfering with native olfactory physiology. To gain insight into this dysfunction in mosquitoes, RNA-seq analyses were performed comparing wild-type antennae to those ectopically expressing AgOr2 in ORNs. Remarkably, almost all Or transcripts were significantly downregulated (except for AgOr2), and additional experiments suggest that it is AgOR2 protein rather than mRNA that mediates this downregulation. Our study shows that ORNs of Anopheles mosquitoes (in contrast to Drosophila) employ a currently unexplored regulatory mechanism of OR expression, which may be adaptable as a vector-control strategy. SIGNIFICANCE STATEMENT: Studies in Drosophila melanogaster suggest that insect Olfactory Receptor Neurons (ORNs) do not contain mechanisms by which Odorant Receptors (ORs) regulate OR expression. This has proved useful in studies where ectopic expression of an OR in Drosophila ORNs confers responses to the odorants that activate the newly expressed OR. In experiments in Anopheles gambiae mosquitoes, we found that ectopic expression of an OR in most Anopheles ORNs dampened the activity of the expressing neurons. RNA-seq analyses demonstrated that ectopic OR expression in Anopheles ORNs leads to downregulation of endogenous Or transcripts. Additional experiments suggest that this downregulation required ectopic expression of a functional OR protein. These findings reveal that Anopheles mosquitoes, in contrast to Drosophila, contain a feedback mechanism to regulate OR expression. Mosquito ORNs might employ regulatory mechanisms of OR expression previously thought to occur only in non-insect olfactory systems.

10.1101/2020.07.23.218586

18. Parallel social information processing circuits are differentially impacted in autism (2020)

Lewis, E. M., Stein-O'Brien, G., Patino, A., Nardou, R., Grossman, C. D., Brown, M., Bangamwabo, B., Ndiaye, N., Giovinazzo, D., Dardani, I., Jiang, C., Goff LA, Dolen, G., bioRxiv

Parallel processing circuits are thought to dramatically expand the network capabilities of the nervous system. Magnocellular and parvocellular oxytocin neurons have been proposed to subserve two parallel streams of social information processing, which allow a single molecule to encode a diverse array of ethologically distinct behaviors, although to date direct evidence to support this hypothesis is lacking. Here we provide the first comprehensive characterization of magnocellular and parvocellular oxytocin neurons, validated across anatomical, projection target, electrophysiological, and transcriptional criteria. We next used novel multiple feature selection tools in Fmr1 KO mice to provide direct evidence that normal functioning of the parvocellular but not magnocellular oxytocin pathway is required for autism-relevant social reward behavior. Finally, we demonstrate that autism risk genes are uniquely enriched in parvocellular oxytocin neurons. Taken together, these results provide the first evidence that oxytocin pathway-specific pathogenic mechanisms account for social impairments across a broad range of autism etiologies. One Sentence Summary: Pathoclisis of parvocellular oxytocin neurons plays an important role in the pathogenesis of social impairments in autism.

10.1101/2020.03.13.990549

19. Single-cell analysis of human retina identifies evolutionarily conserved and species-specific mechanisms controlling development (2019)

Lu, Y., Shiau, F., Yi, W., Lu, S., Wu, Q., Pearson, J., Kallman, A., Hoang, T., Zhong, S., Zuo, Z., Zhao, F., Zhang, M., Tsai, N., Zhou, Y., He, S., Zhang, J., Stein-O'Brien, G., Sherman, T. D., Duan, X., Fertig, E. J., Goff LA, Zack, D., Handa, J. T., Xu, T., Bremner, R., Blackshaw, S., Wang, X., Clark, B. S., bioRxiv

The development of single-cell RNA-Sequencing (scRNA-Seq) has allowed high-resolution analysis of cell type diversity and transcriptional networks controlling cell fate specification. To identify the transcriptional networks governing human retinal development, we performed scRNA-Seq over retinal organoid and in vivo retinal development, across 20 timepoints. Using both pseudotemporal and cross-species analyses, we examined the conservation of gene expression across retinal progenitor maturation and specification of all seven major retinal cell types. Furthermore, we examined gene expression differences between developing macula and periphery and between two distinct populations of horizontal cells. We also identify both shared and species-specific patterns of gene expression during human and mouse retinal development. Finally, we identify an unexpected role for ATOH7 expression in regulation of photoreceptor specification during late retinogenesis. These results provide a roadmap to future studies of human retinal development, and may help guide the design of cell-based therapies for treating retinal dystrophies.

10.1101/779694

20. projectR: An R/Bioconductor package for transfer learning via PCA, NMF, correlation, and clustering (2019)

Sharma, G., Colantuoni, C., Goff LA, Stein-O'Brien, G. L., Fertig, E., bioRxiv

Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important to large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically-driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact: gsteinobrien@jhmi.edu; ejfertig@jhmi.edu

10.1101/726547

21. Transcriptional suppression from KMT2D loss disrupts cell cycle and hypoxic responses in neurodevelopmental models of Kabuki syndrome (2018)

Carosso, G. A., Boukas, L., Augustin, J. J., Nguyen, H. N., Winer, B. L., Cannon, G. H., Robertson, J. D., Zhang, L., Hansen, K. D., Goff LA, Bjornsson, H. T., bioRxiv

Chromatin modifiers act to coordinate gene expression changes critical to neuronal differentiation from neural stem/progenitor cells (NSPCs). Lysine-specific methyltransferase 2D (KMT2D) encodes a histone methyltransferase that promotes transcriptional activation, and is frequently mutated in cancers and in the majority (>70%) of patients diagnosed with the congenital, multisystem intellectual disability (ID) disorder Kabuki syndrome 1 (KS1). Critical roles for KMT2D are established in various non-neural tissues, but the effects of KMT2D loss in brain cell development have not been described. We conducted parallel studies of proliferation, differentiation, transcription, and chromatin profiling in KMT2D-deficient human and mouse models to define KMT2D-regulated functions in neurodevelopmental contexts, including adult-born hippocampal NSPCs in vivo and in vitro. We report cell-autonomous defects in proliferation, cell cycle, and survival, accompanied by early NSPC maturation in several KMT2D-deficient model systems. Transcriptional suppression in KMT2D-deficient cells indicated strong perturbation of hypoxia-responsive metabolism pathways. Functional experiments confirmed abnormalities of cellular hypoxia responses in KMT2D-deficient neural cells, and accelerated NSPC maturation in vivo. Together, our findings support a model in which loss of KMT2D function suppresses expression of oxygen-responsive gene programs important to neural progenitor maintenance, resulting in precocious neuronal differentiation in a mouse model of KS1.

10.1101/484410

22. Testing the Regulatory Consequences of 1,049 Schizophrenia Associated Variants With a Massively Parallel Reporter Assay (2018)

Myint, L., Wang, R., Boukas, L., Hansen, K. D., Goff LA, Avramopoulos, D., bioRxiv

Recent genome-wide association studies (GWAS) identified numerous schizophrenia (SZ) and Alzheimer's disease (AD) associated loci, most outside protein-coding regions and hypothesized to affect gene transcription. We used a massively parallel reporter assay (MPRA) to screen 1,049 SZ and 30 AD variants in 64 and 9 loci respectively for allele differences in driving reporter gene expression. A library of synthetic oligonucleotides assaying each allele 5 times was transfected into K562 chronic myelogenous leukemia lymphoblasts and SK-SY5Y human neuroblastoma cells. 148 variants showed allelic differences in K562 and 53 in SK-SY5Y cells, on average 2.6 variants per locus. Nine showed significant differences in both lines, a modest overlap reflecting different regulatory landscapes of these lines that also differ significantly in chromatin marks. Eight of nine were in the same direction. We observe no preference for risk alleles to increase or decrease expression. We find a positive correlation between the number of SNPs in Linkage Disequilibrium (LD) and the proportion of functional SNPs, supporting combinatorial effects that may lead to haplotype selection. Our results prioritize future functional follow up of disease associated SNPs to determine the driver GWAS variant(s) at each locus and enhance our understanding of gene regulation dynamics.

10.1101/447557

23. Expression variation analysis for tumor heterogeneity in single-cell RNA-sequencing data (2018)

Davis-Marcisak, E. F., Orugunta, P., Stein-O'Brien, G., Puram, S. V., Roussos Torres, E., Hopkins, A., Jaffee, E. M., Favorov, A. V., Afsari, B., Goff LA, Fertig, E. J., bioRxiv

Tumor heterogeneity provides a complex challenge to cancer treatment and is a critical component of therapeutic response, disease recurrence, and patient survival. Single-cell RNA-sequencing (scRNA-seq) technologies reveal the prevalence of intra- and inter-tumor heterogeneity. Computational techniques are essential to quantify the differences in variation of these profiles between distinct cell types, tumor subtypes, and patients to fully characterize intra- and inter-tumor molecular heterogeneity. We devised a new algorithm, Expression Variation Analysis in Single Cells (EVAsc), to perform multivariate statistical analyses of differential variation of expression in gene sets for scRNA-seq. EVAsc has high sensitivity and specificity to detect pathways with true differential heterogeneity in simulated data. We then apply EVAsc to several public domain scRNA-seq tumor datasets to quantify the landscape of tumor heterogeneity in several key applications in cancer genomics, i.e. immunogenicity, cancer subtypes, and metastasis. Immune pathway heterogeneity in hematopoietic cell populations in breast tumors corresponded to the amount of diversity present in the T-cell repertoire of each individual. In head and neck squamous cell carcinoma (HNSCC) patients, we found dramatic differences in pathway dysregulation across basal primary tumors. Within the basal primary tumors we also identified increased immune dysregulation in individuals with a high proportion of fibroblasts present in the tumor microenvironment. Moreover, cells in HNSCC primary tumors had significantly more heterogeneity across pathways than cells in metastases, consistent with a model of clonal outgrowth. These results demonstrate the broad utility of EVAsc to quantify inter- and intra-tumor heterogeneity from scRNA-seq data without reliance on low dimensional visualization.

10.1101/479287

24. Decomposing cell identity for transfer learning across cellular measurements, platforms, tissues, and species. (2018)

Stein-O'Brien, G. L., Clark, B. S., Sherman, T., Zibetti, C., Hu, Q., Sealfon, R., Liu, S., Qian, J., Colantuoni, C., Blackshaw, S., Goff LA, Fertig, E. J., bioRxiv

New approaches are urgently needed to glean biological insights from the vast amounts of single cell RNA sequencing (scRNA-Seq) data now being generated. To this end, we propose that cell identity should map to a reduced set of factors which will describe both exclusive and shared biology of individual cells, and that the dimensions which contain these factors reflect biologically meaningful relationships across different platforms, tissues and species. To find a robust set of dependent factors in large-scale scRNA-Seq data, we developed a Bayesian non-negative matrix factorization (NMF) algorithm, scCoGAPS. Application of scCoGAPS to scRNA-Seq data obtained over the course of mouse retinal development identified gene expression signatures for factors associated with specific cell types and continuous biological processes. To test whether these signatures are shared across diverse cellular contexts, we developed projectR to map biologically disparate datasets into the factors learned by scCoGAPS. Because projecting these dimensions preserves relative distances between samples, biologically meaningful relationships/factors will stratify new data consistent with their underlying processes, allowing labels or information from one dataset to be used for annotation of the other, a machine learning concept called transfer learning. Using projectR, data from multiple datasets were used to annotate latent spaces and reveal novel parallels between developmental programs in other tissues, species and cellular assays. Using this approach we are able to transfer cell type and state designations across datasets to rapidly annotate cellular features in a new dataset without a priori knowledge of their type, identify a species-specific signature of microglial cells, and identify a previously undescribed subpopulation of neurosecretory cells within the lung.
Together, these algorithms define biologically meaningful dimensions of cellular identity, state, and trajectories that persist across technologies, molecular features, and species.

10.1101/395004

25. Comprehensive analysis of retinal development at single cell resolution identifies NFI factors as essential for mitotic exit and specification of late-born cells. (2018)

Clark, B., Stein-O'Brien, G., Shiau, F., Cannon, G., Davis, E., Sherman, T., Rajaii, F., James-Esposito, R., Gronostajski, R., Fertig, E., Goff LA, Blackshaw, S., bioRxiv

Precise temporal control of gene expression in neuronal progenitors is necessary for correct regulation of neurogenesis and cell fate specification. However, the extensive cellular heterogeneity of the developing CNS has posed a major obstacle to identifying the gene regulatory networks that control these processes. To address this, we used single cell RNA-sequencing to profile ten developmental stages encompassing the full course of retinal neurogenesis. This allowed us to comprehensively characterize changes in gene expression that occur during initiation of neurogenesis, changes in developmental competence, and specification and differentiation of each of the major retinal cell types. These data identify transitions in gene expression between early and late-stage retinal progenitors, as well as a classification of neurogenic progenitors. We identify here the NFI family of transcription factors (Nfia, Nfib, and Nfix) as genes with enriched expression within late retinal progenitor cells (RPCs), and show they are regulators of bipolar interneuron and Müller glia specification and the control of proliferative quiescence.

10.1101/378950

26. Enter the matrix: Interpreting unsupervised feature learning with matrix decomposition to discover hidden knowledge in high-throughput omics data (2017)

Stein-O'Brien, G. L., Arora, R., Culhane, A. C., Favorov, A., Greene, C., Goff LA, Li, Y., Ngom, A., Ochs, M. F., Xu, Y., Fertig, E., bioRxiv

Omics data contains signal from the molecular, physical, and kinetic inter- and intra-cellular interactions that control biological systems. Matrix factorization techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in topics ranging from pathway discovery to time course analysis. We review exemplary applications of matrix factorization for systems-level analyses. We discuss appropriate application of these methods, their limitations, and focus on analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with matrix factorization enables discovery from high-throughput data beyond the limits of current biological knowledge--answering questions from high-dimensional data that we have not yet thought to ask.

10.1101/196915

27. Linear models enable powerful differential activity analysis in massively parallel reporter assays (2017)

Myint, L., Avramopoulos, D. G., Goff LA, Hansen, K., bioRxiv

Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad-hoc methods. There has been little attention to comparing performance of methods across datasets. We present the mpralm method which we show is calibrated and powerful, by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has smaller impact; we recommend always using at least 4 replicates. Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve design and analysis of MPRA experiments. An R package is available from the Bioconductor project at https://bioconductor.org/packages/mpra.

10.1101/196394

28. Temporal and spatial variation among single dopaminergic neuron transcriptomes informs cellular phenotype diversity and Parkinson’s Disease gene prioritization (2017)

Hook, P., McClymont, S. A., Cannon, G. H., Law, W. D., Goff LA, McCallion, A. S., bioRxiv

Parkinson's disease (PD) is caused by the collapse of substantia nigra (SN) dopaminergic (DA) neurons of the midbrain (MB), while other DA populations remain relatively intact. Common variation influencing susceptibility to sporadic PD has been primarily identified through genome wide association studies (GWAS). However, like many other common genetic diseases, the genes impacted by common PD-associated variation remain to be elucidated. Here, we used single-cell RNA-seq to characterize DA neuron populations in the mouse brain at embryonic and early postnatal timepoints. These data allow for the unbiased identification of DA neuron subpopulations, including a novel postnatal neuroblast population and SN DA neurons. Comparison of SN DA neurons with other DA neuron populations in the brain reveals a unique transcriptional profile, novel marker genes, and specific gene regulatory networks. By integrating these cell population specific data with published GWAS, we develop a scoring system for prioritizing candidate genes in PD-associated loci. With this, we prioritize candidate genes in all 32 GWAS intervals implicated in sporadic PD risk, the first such systematically generated list. From this we confirm that the prioritized candidate gene CPLX1 disrupts the nigrostriatal pathway when knocked out in mice. Ultimately, this systematic rationale leads to the identification of biologically pertinent candidates and testable hypotheses for sporadic PD that will inform a new era of PD genetic research.

10.1101/148049

29. Variation in neuronal activity state, axonal projection target, and position principally define the transcriptional identity of individual neocortical projection neurons. (2017)

Chevee, M. A., Robertson, J. D., Cannon, G. H., Brown, S. P., Goff LA, bioRxiv

Single-cell RNA sequencing technologies have generated the first catalogs of transcriptionally defined neuronal subtypes of the brain. However, the biologically informative cellular processes that contribute to neuronal subtype specification and transcriptional heterogeneity remain unclear. By comparing the gene expression profiles of single layer 6 corticothalamic neurons in somatosensory cortex, we show that transcriptional subtypes primarily reflect axonal projection pattern, laminar position within the cortex, and neuronal activity state. Pseudotemporal ordering of 1023 cellular responses to manipulation of sensory input demonstrates that changes in expression of activity-induced genes both reinforced cell-type identity and contributed to increased transcriptional heterogeneity within each cell type. This is due to cell-type specific biases in the choice of transcriptional states following manipulation of neuronal activity. These results reveal that axonal projection pattern, laminar position, and activity state define significant axes of variation that contribute both to the transcriptional identity of individual neurons and to the transcriptional heterogeneity within each neuronal subtype.

10.1101/157149