ORCID: 0000-0003-2875-451X

The diversity of visual input processed by the mammalian visual system requires the generation of many distinct retinal ganglion cell (RGC) types, each tuned to a particular feature. The molecular code needed to generate this cell-type diversity is poorly understood. Here, we focus on the molecules needed to specify one type of retinal cell: the upward-preferring ON direction-selective ganglion cell (up-oDSGC) of the mouse visual system. Single-cell transcriptomic profiling of up- and down-oDSGCs shows that the transcription factor Tbx5 is selectively expressed in up-oDSGCs. The loss of Tbx5 in up-oDSGCs results in a selective defect in the formation of up-oDSGCs and a corresponding inability to detect vertical motion. A downstream effector of Tbx5, Sfrp1, is also critical for vertical motion detection but not up-oDSGC formation. These results advance our understanding of the molecular mechanisms that specify a rare retinal cell type and show how disrupting this specification leads to a corresponding defect in neural circuitry and behavior.


Heterozygous loss-of-function mutations in positive regulators of the Transforming Growth Factor-beta (TGF-beta) pathway cause hereditary forms of thoracic aortic aneurysm. It is unclear whether and how the initial signaling deficiency triggers secondary signaling upregulation in the remaining functional branches of the pathway, and whether this contributes to maladaptive vascular remodeling. To examine this process in a mouse model in which time-controlled, partial interference with postnatal TGF-beta signaling in vascular smooth muscle cells (VSMCs) could be assessed, we used a VSMC-specific tamoxifen-inducible system and a conditional allele to inactivate Smad3 at 6 weeks of age, after completion of perinatal aortic development. This intervention induced dilation and histological abnormalities in the aortic root, with minor involvement of the ascending aorta. To analyze early and late events associated with disease progression, we performed a comparative single-cell transcriptomic analysis at 10 and 18 weeks post-deletion, when aortic dilation is undetectable and moderate, respectively. At the early time point, Smad3 inactivation resulted in a broad reduction in the expression of extracellular matrix components and critical components of focal adhesions, including integrins and anchoring proteins, which was reflected histologically by loss of connections between VSMCs and elastic lamellae. At the later time point, however, expression of several transcripts belonging to the same functional categories was normalized or even upregulated; this occurred in association with upregulation of transcripts coding for TGF-beta ligands and persistent downregulation of negative regulators of the pathway. To interrogate how VSMC heterogeneity may influence this transition, we examined transcriptional changes in each of the four VSMC subclusters identified regardless of genotype, which, based on in situ RNA hybridization, partly reflect proximal-to-distal anatomic location.
The response to Smad3 deficiency varied depending on subset, and VSMC subsets over-represented in the aortic root, the site most vulnerable to dilation, most prominently upregulated TGF-beta ligands and pro-pathogenic factors such as thrombospondin-1, angiotensin converting enzyme, and pro-inflammatory mediators. These data suggest that Smad3 is required for maintenance of focal adhesions, and that loss of contacts with the extracellular matrix has consequences specific to each VSMC subset, possibly contributing to the regional susceptibility to dilation in the aorta.


Mosquitoes locate and approach humans based on the activity of odorant receptors (ORs) expressed on olfactory receptor neurons (ORNs). Olfactogenetic experiments in Anopheles gambiae mosquitoes revealed that the ectopic expression of an AgOR (AgOR2) in ORNs dampened the activity of the expressing neuron. This contrasts with studies in Drosophila melanogaster in which the ectopic expression of non-native ORs in ORNs confers ectopic neuronal responses without interfering with native olfactory physiology. RNA-seq analyses comparing wild-type antennae to those ectopically expressing AgOR2 in ORNs indicated that nearly all AgOR transcripts were significantly downregulated (except for AgOR2). Additional experiments suggest that AgOR2 protein rather than mRNA mediates this downregulation. Using in situ hybridization, we find that AgOR gene choice is active into adulthood and that AgOR2 expression inhibits AgORs from turning on at this late stage. Our study shows that the ORNs of Anopheles mosquitoes (in contrast to Drosophila) are sensitive to a previously unexplored mechanism of AgOR regulation.


Hair cell (HC) loss within the inner ear cochlea is a leading cause of deafness in humans. Before the onset of hearing, immature supporting cells (SCs) in neonatal mice have some limited capacity for HC regeneration. Here, we show that in organoid culture, transient activation of the progenitor-specific RNA binding protein LIN28B and the Activin antagonist follistatin (FST) enhances the regenerative competence of maturing/mature cochlear SCs by reprogramming them into progenitor-like cells. Transcriptome profiling and mechanistic studies reveal that LIN28B drives SC reprogramming, while FST is required to counterbalance hyperactivation of transforming growth factor-beta-type signaling by LIN28B. Last, we show that LIN28B and FST coactivation enhances spontaneous cochlear HC regeneration in neonatal mice and that LIN28B may be part of an endogenous repair mechanism that primes SCs for HC regeneration. These findings indicate that SC dedifferentiation is critical for HC regeneration and identify LIN28B and FST as main regulators.


BACKGROUND: The cell cycle is a highly conserved, continuous process that controls faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle at high resolution from single-cell RNA-seq data. RESULTS: Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the use of transfer learning. We estimate a cell-cycle embedding using a fixed reference dataset and project new data into this reference embedding, an approach that overcomes key limitations of learning a dataset-dependent embedding. Tricycle then predicts a cell-specific position in the cell cycle based on the data projection. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Using internal controls that are available for any dataset, we show that tricycle predictions generalize to datasets with multiple cell types, across tissues, species, and even sequencing assays. CONCLUSIONS: Tricycle generalizes across datasets and is highly scalable and applicable to atlas-level single-cell RNA-seq data.
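A minimal sketch of the projection idea described above follows. It assumes the reference embedding is supplied as two per-gene loading vectors, one for each of the first two principal components, listed in the same gene order as the query data. It also assumes, for simplicity, that query cells are centered by their own per-gene means. The helper names `center_by_gene`, `project` and `cycle_position` are hypothetical, and this is not tricycle's code. It only shows how a cell's position can be read off as a polar angle in the reference PC1-PC2 plane.

```python
import math

def center_by_gene(cells):
    """Subtract each gene's mean across the query cells (cells: list of per-gene lists)."""
    n = len(cells)
    means = [sum(col) / n for col in zip(*cells)]
    return [[x - m for x, m in zip(cell, means)] for cell in cells]

def project(cell, loadings):
    """Dot product of one centered cell with each fixed reference loading vector."""
    return [sum(x * w for x, w in zip(cell, comp)) for comp in loadings]

def cycle_position(pc1, pc2):
    """Polar angle of a cell in the PC1-PC2 plane, wrapped to [0, 2*pi)."""
    return math.atan2(pc2, pc1) % (2 * math.pi)

# Toy example with hypothetical reference loadings over three genes.
reference_loadings = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
query = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 2.0, 2.0]]
for cell in center_by_gene(query):
    pc1, pc2 = project(cell, reference_loadings)
    print(round(cycle_position(pc1, pc2), 3))
```

Because the reference loadings stay fixed, every new dataset is placed in the same coordinate system. This reflects the dataset-independent embedding that the abstract contrasts with learning a new embedding for each dataset.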


Radial glial progenitor cells (RGCs) in the dorsal telencephalon directly or indirectly produce excitatory projection neurons and macroglia of the neocortex. Recent evidence shows that the pool of RGCs is more heterogeneous than originally thought and that progenitor subpopulations can generate particular neuronal cell types. Using single-cell RNA sequencing, we have studied gene expression patterns of RGCs with different neurogenic behavior at early stages of cortical development. At this early age, some RGCs rapidly produce postmitotic neurons, whereas others self-renew and undergo neurogenic divisions at a later age. We have identified candidate genes that are differentially expressed among these early RGC subpopulations, including the transcription factor Sox9. Using in utero electroporation in embryonic mice of either sex, we demonstrate that elevated Sox9 expression in progenitors affects RGC cell cycle duration and leads to the generation of upper layer cortical neurons. Our data thus reveal molecular differences between progenitor cells with different neurogenic behavior at early stages of corticogenesis and indicate that Sox9 is critical for the maintenance of RGCs to regulate the generation of upper layer neurons.
SIGNIFICANCE STATEMENT: The existence of heterogeneity in the pool of RGCs and its relationship with the generation of cellular diversity in the cerebral cortex has been an interesting topic of debate for many years. Here we describe the existence of RGCs with reduced neurogenic behavior at early embryonic ages presenting a particular molecular signature. This molecular signature consists of differential expression of some genes, including the transcription factor Sox9, which has been found to be a specific regulator of this subpopulation of progenitor cells.
Functional experiments perturbing expression levels of Sox9 reveal its instructive role in the regulation of the neurogenic behavior of RGCs and its relationship with the generation of upper layer projection neurons at later ages.


Hematopoiesis relies on tightly controlled gene expression patterns as development proceeds through a series of progenitors. While the regulation of hematopoietic development has been well studied, the role of noncoding elements in this critical process is a developing field. In particular, the discovery of new regulators of lymphopoiesis could have important implications for our understanding of the adaptive immune system and disease. Here we elucidate how a noncoding element is capable of regulating a broadly expressed transcription factor, Ikaros, in a lymphoid lineage-specific manner, such that it imbues Ikaros with the ability to specify the lymphoid lineage over alternate fates. Deletion of the Daedalus locus, which is proximal to Ikaros, led to a severe reduction in early lymphoid progenitors, indicating that Daedalus exerts control over the earliest fate decisions during lymphoid lineage commitment. Daedalus locus deletion led to alterations in Ikaros isoform expression and a significant reduction in Ikaros protein. The Daedalus locus may function through direct DNA interaction, as Hi-C analysis demonstrated an interaction between the two loci. Finally, we identify an Ikaros-regulated erythroid-lymphoid checkpoint that is governed by Daedalus in a lymphoid-lineage-specific manner. Daedalus appears to act as a gatekeeper of Ikaros's broad lineage-specifying functions, selectively stabilizing Ikaros activity in the lymphoid lineage and permitting diversion to the erythroid fate in its absence. These findings illustrate how a transcription factor with broad lineage expression must work in concert with noncoding elements to orchestrate hematopoietic lineage commitment.


Somatic LINE-1 (L1) retrotransposition has been detected in early embryos, adult brains, the gastrointestinal (GI) tract, and many cancers, including epithelial GI tumors. We previously found numerous somatic L1 insertions in paired normal and GI cancerous tissues. Here, using a modified method of single-cell analysis for somatic L1 insertions, we studied adenocarcinomas of the colon, pancreas, and stomach, and found that the number of somatic L1 insertions in tumors of the same type varied from patient to patient. We detected no somatic L1 insertions in single cells of 5 of the 10 tumors studied. In three tumors, aneuploid cells were detected by FACS. In one pancreatic tumor, there were many more L1 insertions in aneuploid than in euploid tumor cells. In one gastric cancer, both aneuploid and euploid cells contained large numbers of likely clonal insertions. However, in a second gastric cancer with aneuploid cells, no somatic L1 insertions were found. We suggest that when the cellular environment is favorable to retrotransposition, aneuploidy predisposes tumor cells to L1 insertions, and retrotransposition may occur at the transition from euploidy to aneuploidy. Seventeen percent of insertions were also present in normal cells, similar to findings in genomic DNA from normal tissues of GI tumor patients. We provide evidence that: 1) the number of L1 insertions in tumors of the same type is highly variable, 2) most somatic L1 insertions in GI cancer tissues are absent from normal tissues, and 3) under certain conditions, somatic L1 retrotransposition exhibits a propensity for occurring in aneuploid cells.


Parallel processing circuits are thought to dramatically expand the network capabilities of the nervous system. Magnocellular and parvocellular oxytocin neurons have been proposed to subserve two parallel streams of social information processing, which allow a single molecule to encode a diverse array of ethologically distinct behaviors. Here we provide the first comprehensive characterization of magnocellular and parvocellular oxytocin neurons in male mice, validated across anatomical, projection target, electrophysiological, and transcriptional criteria. We next use novel multiple feature selection tools in Fmr1-KO mice to provide direct evidence that normal functioning of the parvocellular but not magnocellular oxytocin pathway is required for autism-relevant social reward behavior. Finally, we demonstrate that autism risk genes are enriched in parvocellular compared with magnocellular oxytocin neurons. Taken together, these results provide the first evidence that oxytocin-pathway-specific pathogenic mechanisms account for social impairments across a broad range of autism etiologies.


In the hippocampus, a widely accepted model posits that the dentate gyrus improves learning and memory by enhancing discrimination between inputs. To test this model, we studied conditional knockout mice in which the vast majority of dentate granule cells (DGCs) fail to develop (including nearly all DGCs in the dorsal hippocampus) secondary to eliminating Wntless (Wls) in a subset of cortical progenitors with Gfap-Cre. Other cells in the Wls(fl/-);Gfap-Cre hippocampus were minimally affected, as determined by single-nucleus RNA sequencing. CA3 pyramidal cells, the targets of DGC-derived mossy fibers, exhibited normal morphologies with a small reduction in the numbers of synaptic spines. Wls(fl/-);Gfap-Cre mice have a modest performance decrement in several complex spatial tasks, including active place avoidance. They were also modestly impaired in one simpler spatial task, finding a visible platform in the Morris water maze. These experiments support a role for DGCs in enhancing spatial learning and memory.


INTRODUCTION: The microtubule-associated protein tau (MAPT) gene is considered a strong genetic risk factor for Parkinson's disease (PD) in Caucasians. MAPT is located within an inversion region of high linkage disequilibrium (LD) that defines the H1 and H2 haplotypes; this region contains eight other genes that have been implicated in neurodegeneration. The aim of the current study was to identify common coding variants in strong LD within the associated locus on chr17q21 harboring MAPT. METHODS: Sanger sequencing of coding exons was performed in 90 Caucasian late-onset PD (LOPD) patients. Gene-specific sequencing of LRRC37A, LRRC37A2, ARL17A, and ARL17B was not possible given the high homology, pseudogenes, and copy number variants in the region; therefore, four genes (NSF, KANSL1, SPPL2C, and CRHR1) were included in the analysis. Coding variants from these four genes that did not perfectly tag (r² = 1) the MAPT H1/H2 haplotype were genotyped in an independent replication series of Caucasian PD cases (N = 851) and controls (N = 730). RESULTS: In the 90 LOPD cases, we identified 30 coding variants. Eleven non-synonymous variants tagged the MAPT H1/H2 haplotype, including two SPPL2C variants (rs12185233 and rs12373123) with high combined annotation dependent depletion (CADD) pathogenicity scores (>20). In the replication series, the non-synonymous KANSL1 variant rs17585974 was in very strong LD with MAPT H1/H2 and had a high CADD score of 24.7. CONCLUSION: We have identified several non-synonymous variants in genes neighboring MAPT that may warrant further genetic and functional investigation into the biological etiology of PD.
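The replication filter above relies on r², the squared correlation between alleles at two sites. A variant "perfectly tags" a haplotype when r² = 1. As a hedged illustration of that quantity, the sketch below computes r² for two biallelic sites from phased haplotypes coded 0/1, using the standard definitions D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The function name `ld_r2` and the toy haplotypes are hypothetical. Real studies would estimate LD from genotype data with dedicated tools.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site1, allele_at_site2) tuples, alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom

# A perfect tag (r^2 = 1) versus a variant only partly in LD with the haplotype.
print(ld_r2([(1, 1), (0, 0), (1, 1), (0, 0)]))
print(ld_r2([(1, 1), (1, 1), (1, 0), (0, 0)]))
```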


The development of single-cell RNA sequencing (scRNA-seq) has allowed high-resolution analysis of cell-type diversity and transcriptional networks controlling cell-fate specification. To identify the transcriptional networks governing human retinal development, we performed scRNA-seq analysis on 16 time points from the developing retina as well as four early stages of retinal organoid differentiation. We identified evolutionarily conserved patterns of gene expression during retinal progenitor maturation and specification of all seven major retinal cell types. Furthermore, we identified gene-expression differences between developing macula and periphery and between distinct populations of horizontal cells. We also identified species-specific patterns of gene expression during human and mouse retinal development. Finally, we identified an unexpected role for ATOH7 expression in regulation of photoreceptor specification during late retinogenesis. These results provide a roadmap for future studies of human retinal development and may help guide the design of cell-based therapies for treating retinal dystrophies.


We previously discovered in mouse adipocytes an lncRNA (the homolog of human LINC00116) that regulates adipogenesis and contains a highly conserved coding region. Here, we show human protein expression of a peptide encoded within LINC00116 and demonstrate that this peptide modulates triglyceride clearance in human adipocytes by regulating lipolysis and mitochondrial beta-oxidation. This gene has previously been identified as mitoregulin (MTLN). We conclude that MTLN has a regulatory role in adipocyte metabolism, as demonstrated by systemic lipid phenotypes in knockout mice. We also demonstrate adipocyte-autonomous phenotypes in both isolated murine adipocytes and human stem cell-derived adipocytes. MTLN directly interacts with the beta subunit of the mitochondrial trifunctional protein, an enzyme critical in the beta-oxidation of long-chain fatty acids. Our human and murine models suggest that MTLN could be an avenue for further therapeutic research, albeit not without caveats, for example, by promoting white adipocyte triglyceride clearance in obese subjects.


MOTIVATION: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where the lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in, or association with, additional sample annotations in that independent target dataset. RESULTS: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example from spatial single-cell analysis. AVAILABILITY AND IMPLEMENTATION: projectR is available on Bioconductor and at CONTACT: or SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
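As a rough illustration of transfer learning by projection, the sketch below estimates how strongly each learned feature is used in a new target sample. It does this with ordinary least squares against a fixed loading matrix, solving the normal equations (AᵀA)p = Aᵀx. This is a plain-Python stand-in under stated assumptions, not projectR's implementation or API. The helpers `solve` and `project_sample` are hypothetical. The loadings are assumed to be a genes-by-features matrix learned from the source dataset, with genes already matched between source and target.

```python
def solve(mat, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def project_sample(loadings, sample):
    """Least-squares feature weights p minimizing ||sample - loadings @ p||.

    loadings: genes x k list of rows (hypothetical source-learned features).
    sample: expression values for the same genes in one target sample.
    """
    k = len(loadings[0])
    ata = [[sum(row[i] * row[j] for row in loadings) for j in range(k)] for i in range(k)]
    atx = [sum(row[i] * x for row, x in zip(loadings, sample)) for i in range(k)]
    return solve(ata, atx)

# A target sample built as 2 * feature1 + 3 * feature2 is recovered as [2, 3].
print(project_sample([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 3.0, 5.0]))
```

The projected weights can then be tested for association with the target dataset's own annotations. This mirrors the biologically driven validation described above.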


Autoimmune uveoretinitis is a significant cause of visual loss, and mouse models offer unique opportunities to study its disease mechanisms. Aire(-/-) mice fail to express self-antigens in the thymus, exhibit reduced central tolerance, and develop a spontaneous, chronic, and progressive uveoretinitis. Using single-cell RNA sequencing (scRNA-seq), we characterized wild-type and Aire(-/-) retinas to define, in a comprehensive and unbiased manner, the cell populations and gene expression patterns associated with disease. Based on scRNA-seq, immunostaining, and in situ hybridization, we infer that 1) the dominant effector response in Aire(-/-) retinas is Th1-driven, 2) a subset of monocytes convert to either a macrophage/microglia state or a dendritic cell state, 3) the development of tertiary lymphoid structures constitutes part of the Aire(-/-) retinal phenotype, 4) all major resident retinal cell types respond to interferon gamma (IFNG) by changing their patterns of gene expression, and 5) Muller glia up-regulate specific genes in response to IFNG and may act as antigen-presenting cells.


Diabetes is a common complication of cystic fibrosis (CF) that affects approximately 20% of adolescents and 40%-50% of adults with CF. The age at onset of CF-related diabetes (CFRD) (marked by clinical diagnosis and treatment initiation) is an important measure of the disease process. DNA variants associated with age at onset of CFRD reside in and near SLC26A9. Deep sequencing of the SLC26A9 gene in 762 individuals with CF revealed that 2 common DNA haplotypes formed by the risk variants account for the association with diabetes. Single-cell RNA sequencing (scRNA-Seq) indicated that SLC26A9 is predominantly expressed in pancreatic ductal cells and frequently coexpressed with CF transmembrane conductance regulator (CFTR) along with transcription factors that have binding sites 5' of SLC26A9. These findings were replicated upon reanalysis of scRNA-Seq data from 4 independent studies. DNA fragments from the 5' region of SLC26A9 bearing variants from the low-risk haplotype generated 12%-20% higher levels of expression in PANC-1 and CFPAC-1 cells than fragments bearing the high-risk haplotype. Taken together, our findings indicate that an increase in SLC26A9 expression in ductal cells of the pancreas delays the age at onset of diabetes, suggesting a CFTR-agnostic treatment for a major complication of CF.


Recent genome-wide association studies (GWAS) identified numerous schizophrenia (SZ) and Alzheimer's disease (AD) associated loci, most outside protein-coding regions and hypothesized to affect gene transcription. We used a massively parallel reporter assay to screen 1,049 SZ and 30 AD variants in 64 and nine loci, respectively, for allelic differences in driving reporter gene expression. A library of synthetic oligonucleotides assaying each allele five times was transfected into K562 chronic myelogenous leukemia lymphoblasts and SH-SY5Y human neuroblastoma cells. One hundred forty-eight variants showed allelic differences in K562 cells and 53 in SH-SY5Y cells, on average 2.6 variants per locus. Nine showed significant differences in both lines, a modest overlap reflecting the different regulatory landscapes of these lines, which also differ significantly in chromatin marks. Eight of these nine were in the same direction. We observe no preference for risk alleles to increase or decrease expression. We find a positive correlation between the number of SNPs in linkage disequilibrium and the proportion of functional SNPs, supporting combinatorial effects that may lead to haplotype selection. Our results prioritize disease-associated SNPs for future functional follow-up to determine the driver GWAS variant(s) at each locus, and enhance our understanding of gene regulation dynamics.


Chromatin modifiers act to coordinate gene expression changes critical to neuronal differentiation from neural stem/progenitor cells (NSPCs). Lysine-specific methyltransferase 2D (KMT2D) encodes a histone methyltransferase that promotes transcriptional activation and is frequently mutated in cancers and in the majority (>70%) of patients diagnosed with the congenital, multisystem intellectual disability disorder Kabuki syndrome 1 (KS1). Critical roles for KMT2D are established in various non-neural tissues, but the effects of KMT2D loss in brain cell development have not been described. We conducted parallel studies of proliferation, differentiation, transcription, and chromatin profiling in KMT2D-deficient human and mouse models to define KMT2D-regulated functions in neurodevelopmental contexts, including adult-born hippocampal NSPCs in vivo and in vitro. We report cell-autonomous defects in proliferation, cell cycle, and survival, accompanied by early NSPC maturation in several KMT2D-deficient model systems. Transcriptional suppression in KMT2D-deficient cells indicated strong perturbation of hypoxia-responsive metabolism pathways. Functional experiments confirmed abnormalities of cellular hypoxia responses in KMT2D-deficient neural cells and accelerated NSPC maturation in vivo. Together, our findings support a model in which loss of KMT2D function suppresses expression of oxygen-responsive gene programs important to neural progenitor maintenance, resulting in precocious neuronal differentiation in a mouse model of KS1.


Tumor heterogeneity provides a complex challenge to cancer treatment and is a critical component of therapeutic response, disease recurrence, and patient survival. Single-cell RNA-sequencing (scRNA-seq) technologies have revealed the prevalence of intratumor and intertumor heterogeneity. Computational techniques are essential to quantify the differences in variation of these profiles between distinct cell types, tumor subtypes, and patients to fully characterize intratumor and intertumor molecular heterogeneity. In this study, we adapted our algorithm for pathway dysregulation, Expression Variation Analysis (EVA), to perform multivariate statistical analyses of differential variation of expression in gene sets for scRNA-seq. EVA has high sensitivity and specificity to detect pathways with true differential heterogeneity in simulated data. EVA was applied to several public domain scRNA-seq tumor datasets to quantify the landscape of tumor heterogeneity in several key applications in cancer genomics such as immunogenicity, metastasis, and cancer subtypes. Immune pathway heterogeneity of hematopoietic cell populations in breast tumors corresponded to the amount of diversity present in the T-cell repertoire of each individual. Cells from head and neck squamous cell carcinoma (HNSCC) primary tumors had significantly more heterogeneity across pathways than cells from metastases, consistent with a model of clonal outgrowth. Moreover, there were dramatic differences in pathway dysregulation across HNSCC basal primary tumors. Within the basal primary tumors, there was increased immune dysregulation in individuals with a high proportion of fibroblasts present in the tumor microenvironment. These results demonstrate the broad utility of EVA to quantify intertumor and intratumor heterogeneity from scRNA-seq data without reliance on low-dimensional visualization. 
SIGNIFICANCE: This study presents a robust statistical algorithm for evaluating gene expression heterogeneity within pathways or gene sets in single-cell RNA-seq data.
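To convey the core intuition of differential heterogeneity, the sketch below scores each group of cells by the average pairwise dissimilarity of expression within one gene set. It uses a normalized Kendall distance, the fraction of gene pairs ranked in opposite order in two cells, averaged over all cell pairs in a U-statistic style. This is a simplified stand-in written in the spirit of EVA and is not EVA's actual statistic or hypothesis test. The names `kendall_distance` and `mean_pairwise_dissimilarity` and the toy data are hypothetical.

```python
from itertools import combinations

def kendall_distance(x, y):
    """Fraction of gene pairs ranked in opposite order in two cells (ties count as agreement).

    Requires at least two genes in the gene set.
    """
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)

def mean_pairwise_dissimilarity(cells):
    """Average Kendall distance over all unordered pairs of cells (at least two cells)."""
    pairs = list(combinations(cells, 2))
    return sum(kendall_distance(a, b) for a, b in pairs) / len(pairs)

# Toy gene-set expression (three genes) for two hypothetical groups of cells.
homogeneous = [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0], [2.0, 3.0, 4.0]]
heterogeneous = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 2.0, 3.0]]
difference = mean_pairwise_dissimilarity(heterogeneous) - mean_pairwise_dissimilarity(homogeneous)
print(difference)
```

A positive difference indicates that the gene set is ranked less consistently across cells in the second group. In a real analysis this difference would need a proper test of significance.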


Precise temporal control of gene expression in neuronal progenitors is necessary for correct regulation of neurogenesis and cell fate specification. However, the cellular heterogeneity of the developing CNS has posed a major obstacle to identifying the gene regulatory networks that control these processes. To address this, we used single-cell RNA sequencing to profile ten developmental stages encompassing the full course of retinal neurogenesis. This allowed us to comprehensively characterize changes in gene expression that occur during initiation of neurogenesis, changes in developmental competence, and specification and differentiation of each major retinal cell type. We identify the NFI transcription factors (Nfia, Nfib, and Nfix) as selectively expressed in late retinal progenitor cells and show that they control bipolar interneuron and Muller glia cell fate specification and promote proliferative quiescence.


Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remain a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Application of these tools to scRNA-seq data from the developing mouse retina reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine cell-type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity.


The mammalian CNS is capable of tolerating chronic hypoxia, but cell type-specific responses to this stress have not been systematically characterized. In the Norrin KO (Ndp(KO)) mouse, a model of familial exudative vitreoretinopathy (FEVR), developmental hypovascularization of the retina produces chronic hypoxia of inner nuclear layer (INL) neurons and Muller glia. We used single-cell RNA sequencing, untargeted metabolomics, and metabolite labeling from ¹³C-glucose to compare WT and Ndp(KO) retinas. In Ndp(KO) retinas, we observe gene expression responses consistent with hypoxia in Muller glia and retinal neurons, and we find a metabolic shift that combines reduced flux through the TCA cycle with increased synthesis of serine, glycine, and glutathione. We also used single-cell RNA sequencing to compare the responses of individual cell types in Ndp(KO) retinas with those in the hypoxic cerebral cortex of mice that were housed for 1 week in a reduced oxygen environment (7.5% oxygen). In the hypoxic cerebral cortex, glial transcriptome responses most closely resemble the response of Muller glia in the Ndp(KO) retina. In both retina and brain, vascular endothelial cells activate a previously dormant tip cell gene expression program, which likely underlies the adaptive neoangiogenic response to chronic hypoxia. These analyses of retina and brain transcriptomes at single-cell resolution reveal both shared and cell type-specific changes in gene expression in response to chronic hypoxia, implying both shared and distinct cell type-specific physiologic responses.


BACKGROUND: Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad hoc methods, and there has been little attention to comparing the performance of methods across datasets. RESULTS: We present mpralm, a method that we show to be calibrated and powerful by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show a previously unappreciated impact of the choice of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. An R package is available from the Bioconductor project. CONCLUSIONS: Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve design and analysis of MPRA experiments.
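The barcode-summarization point can be made concrete. The sketch below contrasts two common ways of turning per-barcode RNA and DNA counts for one element into a single activity value. One sums counts across barcodes before taking the log2 ratio; the other averages the per-barcode log2 ratios. The pseudocount of 1 and the absence of library-size normalization are simplifying assumptions. The helpers `log_ratio`, `summarize_sum` and `summarize_mean` are hypothetical and are not the mpralm code. The toy counts simply show that the two choices can give different answers for the same element.

```python
import math

def log_ratio(rna, dna, pseudo=1.0):
    """log2 ratio of RNA to DNA counts with a pseudocount (assumed value 1)."""
    return math.log2((rna + pseudo) / (dna + pseudo))

def summarize_sum(rna_counts, dna_counts):
    """Aggregate barcodes first: log2 ratio of summed RNA and DNA counts."""
    return log_ratio(sum(rna_counts), sum(dna_counts))

def summarize_mean(rna_counts, dna_counts):
    """Summarize after the ratio: mean of per-barcode log2 ratios."""
    ratios = [log_ratio(r, d) for r, d in zip(rna_counts, dna_counts)]
    return sum(ratios) / len(ratios)

# One element measured with two barcodes; one barcode has a much higher RNA count.
rna, dna = [15, 1], [1, 1]
print(summarize_sum(rna, dna))   # log2(17 / 3)
print(summarize_mean(rna, dna))  # (log2(16 / 2) + log2(2 / 2)) / 2
```

Summing counts lets high-count barcodes dominate. Averaging log ratios weights every barcode equally. Which behavior is preferable can depend on the dataset, consistent with the dataset-specific impact reported above.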


Vascular endothelial cell (EC) function depends on appropriate organ-specific molecular and cellular specializations. To explore genomic mechanisms that control this specialization, we have analyzed and compared the transcriptome, accessible chromatin, and DNA methylome landscapes from mouse brain, liver, lung, and kidney ECs. Analysis of transcription factor (TF) gene expression and TF motifs at candidate cis-regulatory elements reveals both shared and organ-specific EC regulatory networks. In the embryo, only those ECs that are adjacent to or within the central nervous system (CNS) exhibit canonical Wnt signaling, which correlates precisely with blood-brain barrier (BBB) differentiation and Zic3 expression. In the early postnatal brain, single-cell RNA-seq of purified ECs reveals (1) close relationships between veins and mitotic cells and between arteries and tip cells, (2) a division of capillary ECs into vein-like and artery-like classes, and (3) new endothelial subtype markers, including new validated tip cell markers.


Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods and their limitations, and we focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
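As one concrete instance of MF, the sketch below implements non-negative matrix factorization (NMF) with the classic Lee-Seung multiplicative updates in Python/NumPy, factoring a genes-by-samples matrix X into W (gene loadings) and H (sample-level patterns). The choice of NMF, the update rule, and the function and parameter names are illustrative assumptions; the review covers MF techniques more broadly.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (genes x samples) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective
    ||X - W H||_F^2; each update keeps W and H non-negative.
    """
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Random positive initialization; eps avoids exact zeros that would stay zero.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update the sample-level patterns, then the gene loadings.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical data: 50 genes x 20 samples generated from 2 latent patterns.
rng = np.random.default_rng(1)
X = rng.random((50, 2)) @ rng.random((2, 20))
W, H = nmf(X, k=2, n_iter=2000)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The rank k is a modeling choice. The rows of H can be read as candidate biological patterns (for example, across a timecourse) and the columns of W as the genes contributing to each pattern, which is where the interpretation step the abstract emphasizes begins.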


Genetic variation modulating risk of sporadic Parkinson disease (PD) has been primarily explored through genome-wide association studies (GWASs). However, like many other common genetic diseases, the impacted genes remain largely unknown. Here, we used single-cell RNA-seq to characterize dopaminergic (DA) neuron populations in the mouse brain at embryonic and early postnatal time points. These data facilitated unbiased identification of DA neuron subpopulations through their unique transcriptional profiles, including a postnatal neuroblast population and substantia nigra (SN) DA neurons. We use these population-specific data to develop a scoring system to prioritize candidate genes in all 49 GWAS intervals implicated in PD risk, including genes with known PD associations and many with extensive supporting literature. As proof of principle, we confirm that the nigrostriatal pathway is compromised in Cplx1-null mice. Ultimately, this systematic approach establishes biologically pertinent candidates and testable hypotheses for sporadic PD, informing a new era of PD genetic research.


Single-cell RNA sequencing has generated catalogs of transcriptionally defined neuronal subtypes of the brain. However, the cellular processes that contribute to neuronal subtype specification and transcriptional heterogeneity remain unclear. By comparing the gene expression profiles of single layer 6 corticothalamic neurons in somatosensory cortex, we show that transcriptional subtypes primarily reflect axonal projection pattern, laminar position within the cortex, and neuronal activity state. Pseudotemporal ordering of 1,023 cellular responses to sensory manipulation demonstrates that changes in expression of activity-induced genes both reinforced cell-type identity and contributed to increased transcriptional heterogeneity within each cell type. This heterogeneity arises from cell-type-biased choices of transcriptional states following manipulation of neuronal activity. These results reveal that axonal projection pattern, laminar position, and activity state define significant axes of variation that contribute both to the transcriptional identity of individual neurons and to the transcriptional heterogeneity within each neuronal subtype.


Commitment to the innate lymphoid cell (ILC) lineage is determined by Id2, a transcriptional regulator that antagonizes T and B cell-specific gene expression programs. Yet how Id2 expression is regulated in each ILC subset remains poorly understood. We identified a cis-regulatory element demarcated by a long non-coding RNA (lncRNA) that controls the function and lineage identity of group 1 ILCs, while being dispensable for early ILC development and homeostasis of ILC2s and ILC3s. The locus encoding this lncRNA, which we termed Rroid, directly interacted with the promoter of its neighboring gene, Id2, in group 1 ILCs. Moreover, the Rroid locus, but not the lncRNA itself, controlled the identity and function of ILC1s by promoting chromatin accessibility and deposition of STAT5 at the promoter of Id2 in response to interleukin (IL)-15. Thus, non-coding elements responsive to extracellular cues unique to each ILC subset represent a key regulatory layer for controlling the identity and function of ILCs.


Cell type-specific changes in neuronal excitability have been proposed to contribute to the selective degeneration of corticospinal neurons in amyotrophic lateral sclerosis (ALS) and to neocortical hyperexcitability, a prominent feature of both inherited and sporadic variants of the disease, but the mechanisms underlying selective loss of specific cell types in ALS are not known. We analyzed the physiological properties of distinct classes of cortical neurons in the motor cortex of hSOD1(G93A) mice of both sexes and found that they all exhibit increases in intrinsic excitability that depend on disease stage. Targeted recordings and in vivo calcium imaging further revealed that neurons adapt their functional properties to normalize cortical excitability as the disease progresses. Although different neuron classes all exhibited increases in intrinsic excitability, transcriptional profiling indicated that the molecular mechanisms underlying these changes are cell type specific. The increases in excitability in both excitatory and inhibitory cortical neurons show that selective dysfunction of neuronal cell types cannot account for the specific vulnerability of corticospinal motor neurons in ALS. Furthermore, the stage-dependent alterations in neuronal function highlight the ability of cortical circuits to adapt as disease progresses. These findings show that both disease stage and cell type must be considered when developing therapeutic strategies for treating ALS. SIGNIFICANCE STATEMENT: It is not known why certain classes of neurons preferentially die in different neurodegenerative diseases. It has been proposed that the enhanced excitability of affected neurons is a major contributor to their selective loss.
We show using a mouse model of amyotrophic lateral sclerosis (ALS), a disease in which corticospinal neurons exhibit selective vulnerability, that changes in excitability are not restricted to this neuronal class and that excitability does not increase monotonically with disease progression. Moreover, although all neuronal cell types tested exhibited abnormal functional properties, analysis of their gene expression demonstrated cell type-specific responses to the ALS-causing mutation. These findings suggest that therapies for ALS may need to be tailored for different cell types and stages of disease.


Kabuki syndrome is a Mendelian intellectual disability syndrome caused by mutations in either of two genes (KMT2D and KDM6A) involved in chromatin accessibility. We previously showed that an agent that promotes chromatin opening, the histone deacetylase inhibitor (HDACi) AR-42, ameliorates the deficiency of adult neurogenesis in the granule cell layer of the dentate gyrus and rescues hippocampal memory defects in a mouse model of Kabuki syndrome (Kmt2d(+/betaGeo)). Unlike a drug, a dietary intervention could be quickly transitioned to the clinic. Therefore, we have explored whether treatment with a ketogenic diet could lead to a similar rescue through increased amounts of beta-hydroxybutyrate, an endogenous HDACi. Here, we report that a ketogenic diet in Kmt2d(+/betaGeo) mice modulates H3ac and H3K4me3 in the granule cell layer, with concomitant rescue of both the neurogenesis defect and hippocampal memory abnormalities seen in Kmt2d(+/betaGeo) mice; similar effects on neurogenesis were observed on exogenous administration of beta-hydroxybutyrate. These data suggest that dietary modulation of epigenetic modifications through elevation of beta-hydroxybutyrate may provide a feasible strategy to treat the intellectual disability seen in Kabuki syndrome and related disorders.


We previously reported a schizophrenia-associated polymorphic CT dinucleotide repeat (DNR) in the 5'-untranslated region (UTR) of DPYSL2, which responds to mammalian target of rapamycin (mTOR) signaling with allelic differences in reporter assays. Now using microarray analysis, we show that the DNR alleles interact differentially with specific proteins, including the mTOR-related protein HuD/ELAVL4. We confirm the differential binding to HuD and other known mTOR effectors by electrophoretic mobility shift assays. We edit HEK293 cells by CRISPR/Cas9 to carry the schizophrenia risk variant (13DNR) and observe a significant reduction of the corresponding CRMP2 isoform. These edited cells confirm the response to mTOR inhibitors and show a twofold shortening of the cellular projections. Transcriptome analysis of these modified cells by RNA-seq shows changes in 12.7% of expressed transcripts at a false discovery rate of 0.05. These transcripts are enriched in immunity-related genes, overlap significantly with those modified by the schizophrenia-associated gene ZNF804A, and have a reverse expression signature from that seen with antipsychotic drugs. Our results support the functional importance of the DPYSL2 DNR and a role for mTOR signaling in schizophrenia.
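The abstract reports differentially expressed transcripts at a false discovery rate of 0.05 but does not name the correction procedure. Assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not stated in the source), a minimal Python sketch of turning per-transcript p-values into a call set and a fraction of transcripts called is:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR alpha.

    Step-up rule: sort p-values ascending, find the largest rank k (1-based)
    with p_(k) <= k / m * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep scanning: a later rank may still pass
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten transcripts (toy data, not from the study).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(pvals, alpha=0.05)
fraction_called = sum(calls) / len(calls)  # two of ten pass, so 0.2
```

Because the rule is step-up, p-values such as [0.02, 0.03, 0.04, 0.045] are all rejected at alpha = 0.05 even though the smallest one exceeds the Bonferroni-style threshold 0.05/4.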


Neutrophils, eosinophils and 'classical' monocytes collectively account for about 70% of human blood leukocytes and are among the shortest-lived cells in the body. Precise regulation of the lifespan of these myeloid cells is critical to maintain protective immune responses and minimize the deleterious consequences of prolonged inflammation. However, how the lifespan of these cells is strictly controlled remains largely unknown. Here we identify a long non-coding RNA that we termed Morrbid, which tightly controls the survival of neutrophils, eosinophils and classical monocytes in response to pro-survival cytokines in mice. To control the lifespan of these cells, Morrbid regulates the transcription of the neighbouring pro-apoptotic gene, Bcl2l11 (also known as Bim), by promoting the enrichment of the PRC2 complex at the Bcl2l11 promoter to maintain this gene in a poised state. Notably, Morrbid regulates this process in cis, enabling allele-specific control of Bcl2l11 transcription. Thus, in these highly inflammatory cells, changes in Morrbid levels provide a locus-specific regulatory mechanism that allows rapid control of apoptosis in response to extracellular pro-survival signals. As MORRBID is present in humans and dysregulated in individuals with hypereosinophilic syndrome, this long non-coding RNA may represent a potential therapeutic target for inflammatory disorders characterized by an aberrant lifespan of short-lived myeloid cells.


The number of long noncoding RNAs (lncRNAs) has grown rapidly; however, our understanding of their function remains limited. Although cultured cells have facilitated investigations of lncRNA function at the molecular level, the use of animal models provides a rich context in which to investigate the phenotypic impact of these molecules. Promising initial studies using animal models demonstrated that lncRNAs influence a diverse range of phenotypes, from subtle dysmorphia to viability. Here, we highlight the diversity of animal models and their unique advantages, discuss the use of animal models to profile lncRNA expression, evaluate experimental strategies to manipulate lncRNA function in vivo, and review the phenotypes attributable to lncRNAs. Despite a limited number of studies leveraging animal models, lncRNAs are already recognized as a notable class of molecules with important implications for health and disease.


The development of the central nervous system (CNS) is a complex orchestration of stem cells, transcription factors, growth/differentiation factors, and epigenetic control. Noncoding RNAs have been identified, classified, and studied for their functional roles in many systems including the CNS. In particular, the class of long noncoding RNAs (lncRNAs) has generated both enthusiasm and skepticism because of their unexpected discovery, their diversity of mechanisms, and their lower expression relative to protein-coding RNAs. Here we describe evidence supporting the role of lncRNAs in driving CNS-specific differentiation. It is clear that lncRNAs exhibit a functional diversity that makes their study and compartmentalization more challenging than other classes of noncoding RNAs. We predict, however, that lncRNAs will be essential for the characterization of discrete neuronal cell types in the age of single-cell transcriptomics and that these regulatory RNAs contribute a multitude of functional mechanisms during CNS differentiation that will rival the diversity of protein-based mechanisms.


BACKGROUND: Analysis of the functional consequences and treatment response of rare CFTR variants is challenging due to the limited availability of primary airway cells. METHODS: A Flp recombination target (FRT) site for stable expression of CFTR was incorporated into an immortalized CF bronchial epithelial cell line (CFBE41o-). CFTR cDNA was integrated into the FRT site. Expression was evaluated by western blotting and confocal microscopy, and function was measured by short circuit current. RNA sequencing was used to compare the transcriptional profile of the resulting CF8Flp cell line to primary cells and tissues. RESULTS: Functional CFTR was expressed from integrated cDNA at the FRT site of the CF8Flp cell line at levels comparable to those seen in native airway cells. CF8Flp cells expressing WT-CFTR have a stable transcriptome comparable to that of primary cultured airway epithelial cells, including genes that play key roles in CFTR pathways. CONCLUSION: CF8Flp cells provide a viable substitute for primary CF airway cells for the analysis of CFTR variants in a native context.


The regulatory potential of RNA has never ceased to amaze: from RNA catalysis, to RNA-mediated splicing, to RNA-based silencing of an entire chromosome during dosage compensation. More recently, thousands of long noncoding RNA (lncRNA) transcripts have been identified, the majority with unknown function. Thus, it is tempting to think that these lncRNAs represent a cadre of new factors that function through ribonucleic mechanisms. Some evidence points to several lncRNAs with tantalizing physiological contributions and thought-provoking molecular modalities. However, dissecting the RNA biology of lncRNAs has been difficult, and distinguishing the independent contributions of functional RNAs from underlying DNA elements, or the local act of transcription, is challenging. Here, we aim to survey the existing literature and highlight future approaches that will be needed to link the RNA-based biology and mechanisms of lncRNAs in vitro and in vivo.


Long noncoding RNAs (lncRNAs) have been implicated in numerous cellular processes including brain development. However, the in vivo expression dynamics and molecular pathways regulated by these loci are not well understood. Here, we leveraged a cohort of 13 lncRNA-null mutant mouse models to investigate the spatiotemporal expression of lncRNAs in the developing and adult brain and the transcriptome alterations resulting from the loss of these lncRNA loci. We show that several lncRNAs are differentially expressed both in time and space, with some presenting highly restricted expression in only selected brain regions. We further demonstrate altered regulation of genes for a large variety of cellular pathways and processes upon deletion of the lncRNA loci. Finally, we found that 4 of the 13 lncRNAs significantly affect the expression of several neighboring protein-coding genes in a cis-like manner. By providing insight into the endogenous expression patterns and the transcriptional perturbations caused by deletion of the lncRNA locus in the developing and postnatal mammalian brain, these data provide a resource to facilitate future examination of the specific functional relevance of these genes in neural development, brain function, and disease.


Neuronal development requires a complex choreography of transcriptional decisions to obtain specific cellular identities. Realizing the ultimate goal of identifying genome-wide signatures that define and drive specific neuronal fates has been hampered by enormous complexity in both time and space during development. Here, we have paired high-throughput purification of pyramidal neuron subclasses with deep profiling of spatiotemporal transcriptional dynamics during corticogenesis to resolve lineage choice decisions. We identified numerous features ranging from spatial and temporal usage of alternative mRNA isoforms and promoters to a host of mRNA genes modulated during fate specification. Notably, we uncovered numerous long noncoding RNAs with restricted temporal and cell-type-specific expression. To facilitate future exploration, we provide an interactive online database to enable multidimensional data mining and dissemination. This multifaceted study generates a powerful resource and informs understanding of the transcriptional regulation underlying pyramidal neuron diversity in the neocortex.


The neocortex contains an unparalleled diversity of neuronal subtypes, each defined by distinct traits that are developmentally acquired under the control of subtype-specific and pan-neuronal genes. The regulatory logic that orchestrates the expression of these unique combinations of genes is unknown for any class of cortical neuron. Here, we report that Fezf2 is a selector gene able to regulate the expression of gene sets that collectively define mouse corticospinal motor neurons (CSMN). We find that Fezf2 directly induces the glutamatergic identity of CSMN via activation of Vglut1 (Slc17a7) and inhibits a GABAergic fate by repressing transcription of Gad1. In addition, we identify the axon guidance receptor EphB1 as a target of Fezf2 necessary to execute the ipsilateral extension of the corticospinal tract. Our data indicate that co-regulated expression of neuron subtype-specific and pan-neuronal gene batteries by a single transcription factor is one component of the regulatory logic responsible for the establishment of CSMN identity.


MiR-9, a neuron-specific miRNA, is an important regulator of neurogenesis. In this study we identify how miR-9 is regulated during early differentiation from a neural stem-like cell. We utilized two immortalized rat precursor clones, one committed to neurogenesis (L2.2) and another capable of producing both neurons and non-neuronal cells (L2.3), to reproducibly study early neurogenesis. Exogenous miR-9 is capable of increasing neurogenesis from L2.3 cells. Only one of three genomic loci capable of encoding miR-9 was regulated during neurogenesis and the promoter region of this locus contains sufficient functional elements to drive expression of a luciferase reporter in a developmentally regulated pattern. Furthermore, among a large number of potential regulatory sites encoded in this sequence, Mef2 stood out because of its known pro-neuronal role. Of four Mef2 paralogs, we found only Mef2C mRNA was regulated during neurogenesis. Removal of predicted Mef2 binding sites or knockdown of Mef2C expression reduced miR-9-2 promoter activity. Finally, the mRNA encoding the Mef2C binding partner HDAC4 was shown to be targeted by miR-9. Since HDAC4 protein could be co-immunoprecipitated with Mef2C protein or with genomic Mef2 binding sequences, we conclude that miR-9 regulation is mediated, at least in part, by Mef2C binding but that expressed miR-9 has the capacity to reduce inhibitory HDAC4, stabilizing its own expression in a positive feedback mechanism.


RNA, including long noncoding RNA (lncRNA), is known to be an abundant and important structural component of the nuclear matrix. However, the molecular identities, functional roles and localization dynamics of lncRNAs that influence nuclear architecture remain poorly understood. Here, we describe one lncRNA, Firre, that interacts with the nuclear-matrix factor hnRNPU through a 156-bp repeating sequence and localizes across an ~5-Mb domain on the X chromosome. We further observed Firre localization across five distinct trans-chromosomal loci, which reside in spatial proximity to the Firre genomic locus on the X chromosome. Both genetic deletion of the Firre locus and knockdown of hnRNPU resulted in loss of colocalization of these trans-chromosomal interacting loci. Thus, our data suggest a model in which lncRNAs such as Firre can interface with and modulate nuclear architecture across chromosomes.


Although numerous approaches have been developed to map RNA-binding sites of individual RNA-binding proteins (RBPs), few methods exist that allow assessment of global RBP-RNA interactions. Here, we describe PIP-seq, a universal, high-throughput, ribonuclease-mediated protein footprint sequencing approach that reveals RNA-protein interaction sites throughout a transcriptome of interest. We apply PIP-seq to the HeLa transcriptome and compare binding sites found using different cross-linkers and ribonucleases. From this analysis, we identify numerous putative RBP-binding motifs, reveal novel insights into co-binding by RBPs, and uncover a significant enrichment for disease-associated polymorphisms within RBP interaction sites.


Many studies are uncovering functional roles for long noncoding RNAs (lncRNAs), yet few have been tested for in vivo relevance through genetic ablation in animal models. To investigate the functional relevance of lncRNAs in various physiological conditions, we have developed a collection of 18 lncRNA knockout strains in which the locus is maintained transcriptionally active. Initial characterization revealed peri- and postnatal lethal phenotypes in three mutant strains (Fendrr, Peril, and Mdgt), the latter two exhibiting incomplete penetrance and growth defects in survivors. We also report growth defects for two additional mutant strains (linc-Brn1b and linc-Pint). Further analysis revealed defects in lung, gastrointestinal tract, and heart in Fendrr(-/-) neonates, whereas linc-Brn1b(-/-) mutants displayed distinct abnormalities in the generation of upper layer II-IV neurons in the neocortex. This study demonstrates that lncRNAs play critical roles in vivo and provides a framework and impetus for future larger-scale functional investigation into the roles of lncRNA molecules.


An unresolved question in mammalian epigenetic regulation is how ubiquitously expressed chromatin-modifying complexes such as Polycomb group complex 2 (PRC2) find their specific target sites across an intricate choreography of localization events in time and space. Two recent studies now provide critical new insights into an intriguing genome-wide role for RNA in PRC2 regulation.


DNA methylation was first described almost a century ago; however, the rules governing its establishment and maintenance remain elusive. Here we present data demonstrating that active transcription regulates levels of genomic methylation. We identify a novel RNA arising from the CEBPA gene locus that is critical in regulating the local DNA methylation profile. This RNA binds to DNMT1 and prevents CEBPA gene locus methylation. Deep sequencing of transcripts associated with DNMT1 combined with genome-scale methylation and expression profiling extend the generality of this finding to numerous gene loci. Collectively, these results delineate the nature of DNMT1-RNA interactions and suggest strategies for gene-selective demethylation of therapeutic targets in human diseases.


Regulation of metabolic pathways in the immune system provides a mechanism to actively control cellular function, growth, proliferation, and survival. Here, we report that miR-181 is a nonredundant determinant of cellular metabolism and is essential for supporting the biosynthetic demands of early NKT cell development. As a result, miR-181-deficient mice showed a complete absence of mature NKT cells in the thymus and periphery. Mechanistically, miR-181 modulated expression of the phosphatase PTEN to control PI3K signaling, which was a primary stimulus for anabolic metabolism in immune cells. Thus miR-181-deficient mice also showed severe defects in lymphoid development and T cell homeostasis associated with impaired PI3K signaling. These results uncover miR-181 as essential for NKT cell development and establish this family of miRNAs as central regulators of PI3K signaling and global metabolic fitness during development and homeostasis.


The prevalence of obesity has led to a surge of interest in understanding the detailed mechanisms underlying adipocyte development. Many protein-coding genes, mRNAs, and microRNAs have been implicated in adipocyte development, but the global expression patterns and functional contributions of long noncoding RNA (lncRNA) during adipogenesis have not been explored. Here we profiled the transcriptome of primary brown and white adipocytes, preadipocytes, and cultured adipocytes and identified 175 lncRNAs that are specifically regulated during adipogenesis. Many lncRNAs are adipose-enriched, strongly induced during adipogenesis, and bound at their promoters by key transcription factors such as peroxisome proliferator-activated receptor gamma (PPARgamma) and CCAAT/enhancer-binding protein alpha (CEBPalpha). RNAi-mediated loss of function screens identified functional lncRNAs with varying impact on adipogenesis. Collectively, we have identified numerous lncRNAs that are functionally required for proper adipogenesis.


Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes. We demonstrate the accuracy of our approach through differential analysis of lung fibroblasts in response to loss of the developmental transcription factor HOXA1, which we show is required for lung fibroblast and HeLa cell cycle progression. Loss of HOXA1 results in significant expression level changes in thousands of individual transcripts, along with isoform switching events in key regulators of the cell cycle. Cuffdiff 2 performs robust differential analysis in RNA-seq experiments at transcript resolution, revealing a layer of regulation not readily observable with other high-throughput technologies.


Noncoding RNAs have emerged as key players in the cell. Understanding their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources.


Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and approximately 1 h of hands-on time.


In recent years, microRNAs or miRNAs have been proposed to target neuronal mRNAs localized near the synapse, exerting a pivotal role in modulating local protein synthesis, and presumably affecting adaptive mechanisms such as synaptic plasticity. In the present study we have characterized the distribution of miRNAs in five regions of the adult mammalian brain and compared the relative abundance between total fractions and purified synaptoneurosomes (SN), using three different methodologies. The results show selective enrichment or depletion of some miRNAs when comparing total versus SN fractions. These miRNAs were different for each brain region explored. Changes in distribution could not be attributed to simple diffusion or to a targeting sequence inside the miRNAs. In silico analysis suggests that the differences in distribution may be related to the preferential concentration of synaptically localized mRNAs targeted by the miRNAs. These results favor a model of co-transport of the miRNA-mRNA complex to the synapse, although further studies are required to validate this hypothesis. Experiments using an in vivo model of increased excitatory activity in the cortex and the hippocampus indicate that the distribution of some miRNAs can be modulated by enhanced neuronal (epileptogenic) activity. All these results demonstrate the dynamic modulation in the local distribution of miRNAs from the adult brain, which may play key roles in controlling localized protein synthesis at the synapse.


Large intergenic noncoding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-seq) and computational methods allow for an unprecedented analysis of such transcripts. Here, we present an integrative approach to define a reference catalog of >8000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from RNA-seq data collected from approximately 4 billion RNA-seq reads across 24 tissues and cell types. We characterize each lincRNA by a panorama of >30 properties, including sequence, structural, transcriptional, and orthology features. We found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts that have high evolutionary conservation but may include short ORFs and may serve as either lincRNAs or small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.
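The abstract describes lincRNA expression as strikingly tissue-specific. One common way to quantify this, assumed here rather than taken from the source, scores a transcript by how close its expression profile across tissues is to a profile confined to a single tissue, using the Jensen-Shannon divergence (JSD): specificity = max over tissues t of 1 - sqrt(JSD(p, e_t)), where p is the normalized expression vector and e_t is the indicator vector for tissue t. A minimal Python sketch, with hypothetical function names:

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in bits (bounded by 0 and 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, entropy(m) - (entropy(p) + entropy(q)) / 2)


def tissue_specificity(expr):
    """Score in [0, 1]; 1 means expression is confined to a single tissue."""
    total = sum(expr)
    if total <= 0:
        raise ValueError("expression vector must have positive total")
    p = [x / total for x in expr]
    best = 0.0
    for t in range(len(p)):
        e_t = [1.0 if i == t else 0.0 for i in range(len(p))]
        best = max(best, 1 - math.sqrt(jsd(p, e_t)))
    return best


# Hypothetical expression values across four tissues.
restricted = tissue_specificity([5, 0, 0, 0])
uniform = tissue_specificity([1, 1, 1, 1])
```

A transcript expressed in only one tissue scores 1, while uniform expression across four tissues scores about 0.26; the uniform score drops further as more tissues are added.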


MicroRNAs (miRNAs) are endogenous single-stranded RNA molecules of about 21 nucleotides that are fundamental post-transcriptional regulators of gene expression. The transcriptional and processing events involved in the generation of miRNAs have been extensively studied, but very little is known about the components that regulate the stability of individual miRNAs. All RNAs have distinct inherent half-lives that dictate their level of accumulation, and miRNAs would be expected to follow a similar principle. Here we demonstrate that although most miRNAs appear to be stable, miRNAs, like mRNAs, possess differential stability in human cells. In particular, we found that miR-382, a miRNA that contributes to HIV-1 provirus latency, is unstable in cells. To determine the region of miR-382 responsible for its rapid decay, we developed a cell-free system that recapitulates the regulated miR-382 turnover observed in cells. The system uses mature miRNA processed in vitro from pre-miRNA and follows the decay of the processed miRNA. Using this system, we demonstrate that the instability of miR-382 is driven by sequences outside its seed region and requires its seven 3'-terminal nucleotides: mutations in this region increased the stability of the RNA. Moreover, the exosome 3'-5' exoribonuclease complex was identified as the primary nuclease involved in miR-382 decay. Xrn1 made a more modest contribution, and Xrn2 made no detectable contribution. These studies provide evidence for a miRNA element essential for rapid miRNA decay and implicate the exosome in this process. The development of a biochemically amenable system to analyze the mechanism of differential miRNA stability is an important step in efforts to regulate gene expression by modulating miRNA stability.
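A decay assay like the one above yields the amount of processed miRNA remaining over time. The sketch below turns such a time course into a half-life. It assumes simple first-order decay, since the abstract does not state which kinetic model was used. The function name and the example numbers are hypothetical.

```python
import math

def half_life(times, amounts):
    """Estimate an RNA half-life, assuming first-order (exponential) decay.

    Fits ln(amount) = ln(A0) - k * t by ordinary least squares and returns ln(2) / k,
    in the same time unit as `times`.
    """
    if len(times) != len(amounts) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(a) for a in amounts]  # amounts must be positive
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay rate constant; positive when the RNA is being degraded
    if k <= 0:
        raise ValueError("no net decay detected; half-life is undefined")
    return math.log(2) / k
```

For example, `half_life([0, 30, 60, 90], [1.0, 0.5, 0.25, 0.125])` returns approximately 30. That is a 30-minute half-life if the time points are in minutes; these values are illustrative, not data from the study. Fitting in log space keeps the model linear, so no nonlinear optimizer is needed.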


BACKGROUND: MicroRNAs are required for maintenance of pluripotency as well as for differentiation. Because more microRNAs have been computationally predicted in the genome than have been found, there are likely to be undiscovered microRNAs expressed early in stem cell differentiation. METHODOLOGY/PRINCIPAL FINDINGS: SOLiD ultra-deep sequencing identified >10^7 unique small RNAs from human embryonic stem cells (hESC) and neural-restricted precursors. These were fit to a model of microRNA biogenesis to computationally predict 818 new microRNA genes. These predicted genomic loci are associated with chromatin patterns of modified histones that are predictive of regulated gene expression. Of the predicted microRNAs, 146 were enriched in Ago2-containing complexes along with 609 known microRNAs, demonstrating association with a functional RISC. This Ago2 IP-selected subset was consistently expressed in four independent hESC lines. It also exhibited complex patterns of regulation over development similar to those of previously known microRNAs, including pluripotency-specific expression in both hESC and iPS cells. More than 30% of the Ago2 IP-enriched predicted microRNAs are new members of existing families, since they share seed sequences with known microRNAs. CONCLUSIONS/SIGNIFICANCE: Extending the classic definition of microRNAs, these new microRNA genes likely represent evolutionarily recent regulators of early differentiation; the majority are less conserved than their canonical counterparts. Three lines of evidence support the identification of 146 new microRNAs active during early hESC differentiation: their enrichment in Ago2-containing complexes, the presence of chromatin marks indicative of regulated gene expression, and their differential expression over development.
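Assigning predicted microRNAs to existing families relies on shared seed sequences. The sketch below shows that grouping. It assumes the seed is nucleotides 2-8 of the mature miRNA, a common convention; some analyses use 2-7. The names and sequences are hypothetical, not entries from the study.

```python
from collections import defaultdict

def seed(mature_seq, start=2, end=8):
    """Return the seed (1-based positions start..end, inclusive) as an RNA string."""
    return mature_seq[start - 1:end].upper().replace("T", "U")

def assign_to_families(known, predicted):
    """Map each predicted miRNA to the known miRNAs sharing its seed.

    known, predicted: dicts of name -> mature sequence.
    A predicted miRNA with no seed match maps to an empty list.
    """
    by_seed = defaultdict(list)
    for name, seq in known.items():
        by_seed[seed(seq)].append(name)
    # .get() avoids inserting empty entries into the defaultdict
    return {name: sorted(by_seed.get(seed(seq), [])) for name, seq in predicted.items()}

# Hypothetical example sequences (illustrative only)
known = {"known-1": "UGAGGUAGUAGGUUGUAUAGUU", "known-2": "UAGCUUAUCAGACUGAUGUUGA"}
predicted = {"pred-1": "CGAGGUAGAAAAGCUCCGUUAA", "pred-2": "AACCGGUUAACCGGUUAACCGG"}
```

Here `assign_to_families(known, predicted)` places `pred-1` with `known-1`, because both have the seed `GAGGUAG`, and leaves `pred-2` unassigned. A predicted miRNA that maps to an empty list would be a candidate for a new family rather than a new member of an existing one.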


Cell-based therapy has been widely evaluated in animal models of spinal cord injury (SCI) and shown to improve functional recovery. However, the host response to cell transplants at the gene expression level is rarely discussed. We previously reported that acute transplantation of RG3.6 radial glial cells following SCI promoted early locomotor improvement within 1 week post-injury. To identify rapid molecular changes induced by RG3.6 transplantation in the host tissue, we subjected distal spinal cord segments to microarray analysis. RG3.6 transplantation reduced macrophage activity as early as 1-2 weeks post-injury. Nevertheless, at 6-12 h post-injury, RG3.6 treatment did not decrease the expression of inflammatory genes (e.g., IL-6, MIP-2, MCP-1) compared with medium or other cell controls. However, RG3.6 transplants significantly up-regulated genes associated with tissue protection (Hsp70 and Hsp32) and neural cell development (Foxg1, Top2a, Sox11, Nkx2.2, Vimentin). Foxg1 was the most highly induced gene in RG3.6-treated spinal cords, and immunocytochemistry confirmed its expression in the host tissue. Moreover, RG3.6 treatment increased the number of Nkx2.2-positive cells in the spinal cord, and these cells frequently co-expressed NG2, a marker of progenitor cells. Taken together, these results demonstrate that radial glial transplants induce rapid and specific changes in gene expression in the injured host tissue. They also suggest that these early responses are associated with tissue protection and activation of endogenous neural progenitor cells.


We have generated clones (L2.3 and RG3.6) of neural progenitors with radial glial properties from rat E14.5 cortex that differentiate into astrocytes, neurons, and oligodendrocytes. Here, we describe a different clone (L2.2) that gives rise to neurons but not to glia. As in cortical interneuron progenitors, neuronal differentiation of L2.2 cells was inhibited by bone morphogenetic protein 2 (BMP2) and enhanced by Sonic Hedgehog (SHH). Compared with L2.3, differentiating L2.2 cells expressed significantly higher levels of mRNAs for glutamate decarboxylases (GADs), DLX transcription factors, calretinin, calbindin, neuropeptide Y (NPY), and somatostatin. Increased levels of DLX-2, GAD, and calretinin proteins were confirmed upon differentiation. L2.2 cells differentiated into neurons that fired action potentials in vitro. Their electrophysiological differentiation was accelerated and more complete when they were cocultured with developing astroglial cells, but not when they were exposed to conditioned medium from these cells. Together, these results suggest that clone L2.2 resembles GABAergic interneuron progenitors in the developing forebrain.


OBJECTIVE: Human multipotent mesenchymal stromal cells (MSC) have the potential to differentiate into multiple cell types, although little is known about the factors that control their fate. Differentiation-specific microRNAs may play a key role in stem cell self-renewal and differentiation. We propose that specific intracellular signaling pathways modulate gene expression during differentiation by regulating microRNA expression. MATERIALS AND METHODS: Illumina mRNA and NCode microRNA expression analyses were performed on MSC and their differentiated progeny. A combination of bioinformatic prediction and pathway inhibition was used to identify microRNAs associated with platelet-derived growth factor (PDGF) signaling. RESULTS: The pattern of microRNA expression in MSC is distinct from that in pluripotent stem cells such as human embryonic stem cells. Specific populations of microRNAs are regulated in MSC during differentiation directed toward specific cell types. Complementary mRNA expression analysis expands the pool of markers characteristic of MSC or their differentiated progeny. To identify microRNA expression patterns affected by signaling pathways, we examined the PDGF pathway, which microarray studies showed to be regulated during osteogenesis. A set of microRNAs bioinformatically predicted to respond to PDGF signaling was experimentally confirmed by direct PDGF inhibition. CONCLUSION: Our results demonstrate that a subset of microRNAs regulated during osteogenic differentiation of MSC is responsive to perturbation of the PDGF pathway. This approach not only identifies characteristic classes of differentiation-specific mRNAs and microRNAs but also begins to link regulated molecules with specific cellular pathways.


Many of the currently established human embryonic stem (hES) cell lines have been characterized extensively in terms of their gene expression profiles and genetic stability in culture. Recent studies have indicated that microRNAs (miRNAs) may play a key role in stem cell self-renewal and differentiation; miRNAs are a class of noncoding small RNAs that participate in the regulation of gene expression. Using both microarrays and quantitative PCR, we report here the differences in miRNA expression between undifferentiated hES cells and the corresponding cells differentiated in vitro over a period of 2 weeks. Our results confirm a signature miRNA profile in pluripotent cells, comprising a small subset of miRNAs differentially expressed in hES cells. By examining both mRNA and miRNA profiles under multiple conditions using cross-correlation, we find clusters of miRNAs grouped with specific, biologically interpretable mRNAs. We identify patterns of expression in the progression from hES cells to differentiated cells that suggest a role for selected miRNAs in maintaining the undifferentiated, pluripotent state. Profiling of the hES cell "miRNA-ome" provides insight into the molecules that control cellular differentiation and maintenance of the pluripotent state. These findings have broad implications for development, homeostasis, and human disease.


Regulated mRNAs during differentiation of rat neural stem cells were analyzed using the ABI1700 microarray platform. Although technically advanced, this platform makes it difficult to integrate hybridization results into public databases for systems-level analysis. This is particularly true for the rat array, since many of its probes were designed against transcripts predicted from human and mouse homologs. Using several strategies, we increased the public annotation of the 27,531 probes from 43% to over 65%. To increase the dynamic range of annotation, probes were mapped to numerous public keys from several data sources. A consensus annotation from multiple sources was determined for well-scoring alignments, and a confidence-based ranking system was established for probes with less agreement across data sources. Previous attempts at genomic interpretation using the Celera annotation model resulted in poor overlap with expected genomic sequences. Because the public keys are more precisely mapped to the genome, we could now analyze the relationships between predicted transcription-factor binding sites (TFBS) and expression clusters. Results collected from a differentiation time course of two neural stem cell clones were clustered using a model-based algorithm. Transcription-factor binding sites were predicted from upstream regions of mapped transcripts using position weight matrices from either JASPAR or TRANSFAC, and the resulting scores were used to discriminate between the observed expression clusters. A classification and regression tree (CART) analysis was conducted using cluster numbers as gene class labels and TFBS scores as predictors. The tree was pruned back to the size with the lowest gene-class prediction error rate. The results identify several transcription factors whose presence or absence is sufficient to describe three kinds of clusters: mRNAs that change over time, mRNAs that are static, and mRNAs that reflect differences between the cell lines. Public annotation of the ABI1700 rat genome array will be valuable for integrating results into future systems-level analyses.
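Upstream regions were scored with position weight matrices from JASPAR or TRANSFAC. The sketch below shows that kind of scoring, under several assumptions:

- a uniform background base composition;
- a pseudocount of 0.5;
- a toy count matrix that is not a real JASPAR or TRANSFAC entry.

The best score per upstream region is the kind of value that could then serve as a CART predictor; the tree itself is not shown. The function names are hypothetical.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform base composition
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_matrix(counts, pseudocount=0.5):
    """Convert a position count matrix (one {base: count} dict per position) to log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / BACKGROUND[b])
            for b in "ACGT"
        })
    return pwm

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def best_site_score(pwm, sequence):
    """Highest PWM score over every window on both strands; -inf if no valid window exists."""
    width = len(pwm)
    seq = sequence.upper()
    best = float("-inf")
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            if any(base not in "ACGT" for base in window):
                continue  # skip windows containing N or other ambiguity codes
            best = max(best, sum(pwm[j][base] for j, base in enumerate(window)))
    return best

# Toy TATA-like motif (hypothetical counts, for illustration only)
tata = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
```

The sequence `"GGGTATAGGG"` contains a perfect match and scores `4 * log2(3.5)` with this toy matrix. A G/C-only sequence scores well below zero. Scanning both strands matters because binding sites can lie in either orientation relative to the transcript.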


MicroRNAs (miRNAs) are post-transcriptional regulators participating in biological processes ranging from differentiation to carcinogenesis. We developed a rational probe design algorithm and a sensitive labeling scheme for optimizing miRNA microarrays. Our microarray contains probes for all validated miRNAs from five species, with the potential to draw on species conservation to identify novel miRNAs with homologous probes. These methods are useful for high-throughput analysis of microRNAs from various sources and allow analysis of limiting quantities of RNA. The system design can also be extended for use on Luminex beads or on 96-well plates in an ELISA-style assay. We optimized hybridization temperatures using sequence variants of 20 of the probes. All probes distinguished wild-type sequences from 2-nt mutants, and most distinguished 1-nt mutants, giving good selectivity between closely related small RNA sequences. Results of tissue comparisons on our microarrays reveal hybridization patterns that agree with results from Northern blots and other methods.


BACKGROUND: RNA amplification is required for incorporating laser-capture microdissection techniques into microarray assays. However, standard oligonucleotide microarrays contain sense-strand probes. Traditional T7 amplification schemes produce antisense RNA, so they are not appropriate for hybridization when combined with conventional reverse-transcription labeling methods. We wished to assess the accuracy of a new sense-strand RNA amplification method by comparing ratios between two samples using quantitative real-time PCR (qPCR), mimicking a two-color microarray assay. RESULTS: Three samples of rat brain RNA and three samples of rat liver RNA were amplified using several kits (Ambion MessageAmp, NuGen Ovation, and several versions of Genisphere SenseAmp). Results were assessed by comparing the liver/brain ratio for 192 mRNAs before and after amplification. In general, ratios from all kits correlated strongly with those from unamplified RNA. The SenseAmp kit produced the highest correlation and also amplified a partially degraded sample accurately. CONCLUSION: We have validated an optimized sense-strand RNA amplification method for use in comparative studies such as two-color microarrays.
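The validation compares each mRNA's liver/brain ratio before and after amplification. The sketch below shows that comparison from qPCR Ct values, under two assumptions. First, it assumes ~100% PCR efficiency and no reference-gene normalization; the abstract does not say how ratios were normalized. Second, it uses Pearson correlation as the agreement measure, because the abstract reports correlations without naming the coefficient. Function names are hypothetical.

```python
import math

def log2_ratios(ct_liver, ct_brain):
    """Per-gene log2(liver/brain) ratios from qPCR threshold cycles (Ct).

    Assumes ~100% PCR efficiency (one cycle = twofold) and equal input per sample, so a
    transcript reaching threshold one cycle earlier in liver is twice as abundant there:
    log2(liver/brain) = Ct_brain - Ct_liver.
    """
    if len(ct_liver) != len(ct_brain):
        raise ValueError("Ct lists must be paired gene by gene")
    return [brain - liver for liver, brain in zip(ct_liver, ct_brain)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ratio_fidelity(unamp_liver, unamp_brain, amp_liver, amp_brain):
    """Correlation of liver/brain ratios before vs. after amplification (1.0 = perfectly preserved)."""
    return pearson(log2_ratios(unamp_liver, unamp_brain),
                   log2_ratios(amp_liver, amp_brain))
```

Each ratio is a within-sample difference of Ct values. A kit that shifts every Ct by the same number of cycles in both tissues therefore still yields a correlation of 1.0. Only transcript-specific amplification bias lowers the correlation, which is the distortion this kind of validation is meant to detect.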