Results 1 - 20 of 74
1.
Cell ; 182(1): 145-161.e23, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32553272

ABSTRACT

Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.


Subject(s)
Crops, Agricultural/genetics , Gene Expression Regulation, Plant , Genomic Structural Variation , Solanum lycopersicum/genetics , Alleles , Cytochrome P-450 Enzyme System/genetics , Ecotype , Epistasis, Genetic , Fruit/genetics , Gene Duplication , Genome, Plant , Genotype , Inbreeding , Molecular Sequence Annotation , Phenotype , Plant Breeding , Quantitative Trait Loci/genetics
2.
Cell ; 179(3): 772-786.e19, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31626774

ABSTRACT

Understanding neural circuits requires deciphering interactions among myriad cell types defined by spatial organization, connectivity, gene expression, and other properties. Resolving these cell types requires both single-neuron resolution and high throughput, a challenging combination with conventional methods. Here, we introduce barcoded anatomy resolved by sequencing (BARseq), a multiplexed method based on RNA barcoding for mapping projections of thousands of spatially resolved neurons in a single brain and relating those projections to other properties such as gene or Cre expression. Mapping the projections to 11 areas of 3,579 neurons in mouse auditory cortex using BARseq confirmed the laminar organization of the three top classes (intratelencephalic [IT], pyramidal tract-like [PT-like], and corticothalamic [CT]) of projection neurons. In depth analysis uncovered a projection type restricted almost exclusively to transcriptionally defined subtypes of IT neurons. By bridging anatomical and transcriptomic approaches at cellular resolution with high throughput, BARseq can potentially uncover the organizing principles underlying the structure and formation of neural circuits.


Subject(s)
Auditory Cortex/metabolism , Nerve Net/metabolism , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Animals , Brain Mapping , Humans , Integrases/genetics , Mice , Neurites/metabolism , Pyramidal Cells/metabolism , Pyramidal Tracts/metabolism
3.
Cell ; 171(3): 522-539.e20, 2017 Oct 19.
Article in English | MEDLINE | ID: mdl-28942923

ABSTRACT

Understanding the organizational logic of neural circuits requires deciphering the biological basis of neuronal diversity and identity, but there is no consensus on how neuron types should be defined. We analyzed single-cell transcriptomes of a set of anatomically and physiologically characterized cortical GABAergic neurons and conducted a computational genomic screen for transcriptional profiles that distinguish them from one another. We discovered that cardinal GABAergic neuron types are delineated by a transcriptional architecture that encodes their synaptic communication patterns. This architecture comprises 6 categories of ∼40 gene families, including cell-adhesion molecules, transmitter-modulator receptors, ion channels, signaling proteins, neuropeptides and vesicular release components, and transcription factors. Combinatorial expression of select members across families shapes a multi-layered molecular scaffold along the cell membrane that may customize synaptic connectivity patterns and input-output signaling properties. This molecular genetic framework of neuronal identity integrates cell phenotypes along multiple axes and provides a foundation for discovering and classifying neuron types.


Subject(s)
GABAergic Neurons/cytology , Gene Expression Profiling , Single-Cell Analysis , Animals , Cell Adhesion Molecules, Neuronal/metabolism , Extracellular Matrix/metabolism , GABAergic Neurons/metabolism , Mice , Receptors, GABA/metabolism , Receptors, Ionotropic Glutamate/metabolism , Signal Transduction , Synapses , Transcription, Genetic , Zinc/metabolism , gamma-Aminobutyric Acid/metabolism
4.
Nature ; 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38658747

ABSTRACT

The cerebral cortex is composed of neuronal types with diverse gene expression that are organized into specialized cortical areas. These areas, each with characteristic cytoarchitecture [1,2], connectivity [3,4] and neuronal activity [5,6], are wired into modular networks [3,4,7]. However, it remains unclear whether these spatial organizations are reflected in neuronal transcriptomic signatures and how such signatures are established in development. Here we used BARseq, a high-throughput in situ sequencing technique, to interrogate the expression of 104 cell-type marker genes in 10.3 million cells, including 4,194,658 cortical neurons over nine mouse forebrain hemispheres, at cellular resolution. De novo clustering of gene expression in single neurons revealed transcriptomic types consistent with previous single-cell RNA sequencing studies [8,9]. The composition of transcriptomic types is highly predictive of cortical area identity. Moreover, areas with similar compositions of transcriptomic types, which we defined as cortical modules, overlap with areas that are highly connected, suggesting that the same modular organization is reflected in both transcriptomic signatures and connectivity. To explore how the transcriptomic profiles of cortical neurons depend on development, we assessed cell-type distributions after neonatal binocular enucleation. Notably, binocular enucleation caused the shifting of the cell-type compositional profiles of visual areas towards neighbouring cortical areas within the same module, suggesting that peripheral inputs sharpen the distinct transcriptomic identities of areas within cortical modules. Enabled by the high throughput, low cost and reproducibility of BARseq, our study provides a proof of principle for the use of large-scale in situ sequencing to both reveal brain-wide molecular architecture and understand its development.

5.
Nature ; 617(7962): 785-791, 2023 May.
Article in English | MEDLINE | ID: mdl-37165193

ABSTRACT

Different plant species within the grasses were parallel targets of domestication, giving rise to crops with distinct evolutionary histories and traits [1]. Key traits that distinguish these species are mediated by specialized cell types [2]. Here we compare the transcriptomes of root cells in three grass species: Zea mays, Sorghum bicolor and Setaria viridis. We show that single-cell and single-nucleus RNA sequencing provide complementary readouts of cell identity in dicots and monocots, warranting a combined analysis. Cell types were mapped across species to identify robust, orthologous marker genes. The comparative cellular analysis shows that the transcriptomes of some cell types diverged more rapidly than those of others, driven in part by recruitment of gene modules from other cell types. The data also show that a recent whole-genome duplication provides a rich source of new, highly localized gene expression domains that favour fast-evolving cell types. Together, the cell-by-cell comparative analysis shows how fine-scale cellular profiling can extract conserved modules from a pan transcriptome and provide insight on the evolution of cells that mediate key functions in crops.


Subject(s)
Crops, Agricultural , Setaria Plant , Sorghum , Transcriptome , Zea mays , Base Sequence , Gene Expression Regulation, Plant/genetics , Sorghum/cytology , Sorghum/genetics , Transcriptome/genetics , Zea mays/cytology , Zea mays/genetics , Setaria Plant/cytology , Setaria Plant/genetics , Plant Roots/cytology , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA , Crops, Agricultural/cytology , Crops, Agricultural/genetics , Evolution, Molecular
6.
Nature ; 598(7879): 159-166, 2021 10.
Article in English | MEDLINE | ID: mdl-34616071

ABSTRACT

An essential step toward understanding brain function is to establish a structural framework with cellular resolution on which multi-scale datasets spanning molecules, cells, circuits and systems can be integrated and interpreted [1]. Here, as part of the collaborative BRAIN Initiative Cell Census Network (BICCN), we derive a comprehensive cell type-based anatomical description of one exemplar brain structure, the mouse primary motor cortex, upper limb area (MOp-ul). Using genetic and viral labelling, barcoded anatomy resolved by sequencing, single-neuron reconstruction, whole-brain imaging and cloud-based neuroinformatics tools, we delineated the MOp-ul in 3D and refined its sublaminar organization. We defined around two dozen projection neuron types in the MOp-ul and derived an input-output wiring diagram, which will facilitate future analyses of motor control circuitry across molecular, cellular and system levels. This work provides a roadmap towards a comprehensive cellular-resolution description of mammalian brain architecture.


Subject(s)
Motor Cortex/anatomy & histology , Motor Cortex/cytology , Neurons/classification , Animals , Atlases as Topic , Female , GABAergic Neurons/cytology , GABAergic Neurons/metabolism , Glutamates/metabolism , Male , Mice , Mice, Inbred C57BL , Neuroimaging , Neurons/cytology , Neurons/metabolism , Organ Specificity , Sequence Analysis, RNA , Single-Cell Analysis
7.
PLoS Biol ; 21(6): e3002133, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37390046

ABSTRACT

Characterizing cellular diversity at different levels of biological organization and across data modalities is a prerequisite to understanding the function of cell types in the brain. Classification of neurons is also essential to manipulate cell types in controlled ways and to understand their variation and vulnerability in brain disorders. The BRAIN Initiative Cell Census Network (BICCN) is an integrated network of data-generating centers, data archives, and data standards developers, with the goal of systematic multimodal brain cell type profiling and characterization. Emphasis of the BICCN is on the whole mouse brain with demonstration of prototype feasibility for human and nonhuman primate (NHP) brains. Here, we provide a guide to the cellular and spatial approaches employed by the BICCN, and to accessing and using these data and extensive resources, including the BRAIN Cell Data Center (BCDC), which serves to manage and integrate data across the ecosystem. We illustrate the power of the BICCN data ecosystem through vignettes highlighting several BICCN analysis and visualization tools. Finally, we present emerging standards that have been developed or adopted toward Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience. The combined BICCN ecosystem provides a comprehensive resource for the exploration and analysis of cell types in the brain.


Subject(s)
Brain , Neurosciences , Animals , Humans , Mice , Ecosystem , Neurons
8.
Genome Res ; 32(4): 738-749, 2022 04.
Article in English | MEDLINE | ID: mdl-35256454

ABSTRACT

The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating a more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions.


Subject(s)
Genome, Human , Genomics , Consensus , Genomics/methods , Humans , RNA-Seq , Exome Sequencing
9.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36549922

ABSTRACT

MOTIVATION: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for a sparse scATAC-seq annotation across the data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable to or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.


Subject(s)
Chromatin , Epigenesis, Genetic , Animals , Mice , Chromatin/genetics , Regulatory Sequences, Nucleic Acid , Neural Networks, Computer
10.
BMC Bioinformatics ; 25(1): 198, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789920

ABSTRACT

BACKGROUND: Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on existing clustering literature to develop tools specific to single-cell. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset as well as their replicability across datasets. While many recommendations exist, in general, there is little assurance that any given set of parameters will represent an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. RESULTS: Here, we propose Dune, a new method for optimizing the trade-off between the resolution of the clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize their concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging for reducing the number of clusters, in terms of replicability of the resultant merged clusters as well as concordance with ground truth. Dune is available as an R package on Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/Dune.html. CONCLUSIONS: Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces the reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.


Subject(s)
RNA-Seq , Single-Cell Analysis , Software , Single-Cell Analysis/methods , RNA-Seq/methods , Cluster Analysis , Algorithms , Sequence Analysis, RNA/methods , Humans , Transcriptome/genetics , Reproducibility of Results , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis
11.
PLoS Biol ; 19(7): e3001341, 2021 07.
Article in English | MEDLINE | ID: mdl-34280183

ABSTRACT

High-throughput, spatially resolved gene expression techniques are poised to be transformative across biology by overcoming a central limitation in single-cell biology: the lack of information on relationships that organize the cells into the functional groupings characteristic of tissues in complex multicellular organisms. Spatial expression is particularly interesting in the mammalian brain, which has a highly defined structure, strong spatial constraint in its organization, and detailed multimodal phenotypes for cells and ensembles of cells that can be linked to mesoscale properties such as projection patterns, and from there, to circuits generating behavior. However, as with any type of expression data, cross-dataset benchmarking of spatial data is a crucial first step. Here, we assess the replicability, with reference to canonical brain subdivisions, between the Allen Institute's in situ hybridization data from the adult mouse brain (Allen Brain Atlas (ABA)) and a similar dataset collected using spatial transcriptomics (ST). With the advent of tractable spatial techniques, for the first time, we are able to benchmark the Allen Institute's whole-brain, whole-transcriptome spatial expression dataset with a second independent dataset that similarly spans the whole brain and transcriptome. We use regularized linear regression (LASSO), linear regression, and correlation-based feature selection in a supervised learning framework to classify expression samples relative to their assayed location. We show that Allen Reference Atlas labels are classifiable using transcription in both data sets, but that performance is higher in the ABA than in ST. Furthermore, models trained in one dataset and tested in the opposite dataset do not reproduce classification performance bidirectionally. While an identifying expression profile can be found for a given brain area, it does not generalize to the opposite dataset. 
In general, we found that canonical brain area labels are classifiable in gene expression space within dataset and that our observed performance is not merely reflecting physical distance in the brain. However, we also show that cross-platform classification is not robust. Emerging spatial datasets from the mouse brain will allow further characterization of cross-dataset replicability ultimately providing a valuable reference set for understanding the cell biology of the brain.


Subject(s)
Brain/metabolism , Gene Expression Profiling , Animals , Atlases as Topic , Brain/anatomy & histology , Datasets as Topic , Mice , Reproducibility of Results
12.
Nucleic Acids Res ; 50(8): 4302-4314, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35451481

ABSTRACT

What makes a mouse a mouse, and not a hamster? Differences in gene regulation between the two organisms play a critical role. Comparative analysis of gene coexpression networks provides a general framework for investigating the evolution of gene regulation across species. Here, we compare coexpression networks from 37 species and quantify the conservation of gene activity 1) as a function of evolutionary time, 2) across orthology prediction algorithms, and 3) with reference to cell- and tissue-specificity. We find that ancient genes are expressed in multiple cell types and have well conserved coexpression patterns; however, they are expressed at different levels across cell types. Thus, differential regulation of ancient gene programs contributes to transcriptional cell identity. We propose that this differential regulation may play a role in cell diversification in both the animal and plant kingdoms.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Mice , Animals , Gene Regulatory Networks/genetics , Organ Specificity/genetics , Gene Expression Regulation/genetics , Gene Expression Profiling
13.
Genome Res ; 30(7): 1047-1059, 2020 07.
Article in English | MEDLINE | ID: mdl-32759341

ABSTRACT

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.


Subject(s)
Transcription, Genetic , Cell Line , Endothelial Cells/metabolism , Epithelial Cells/metabolism , Female , Gene Expression Profiling , Gynecomastia/genetics , Gynecomastia/metabolism , Humans , Male , Mesoderm/cytology , Mesoderm/metabolism , Neoplasms/genetics , Organ Specificity , Sequence Analysis, RNA
14.
Genome Res ; 30(1): 49-61, 2020 01.
Article in English | MEDLINE | ID: mdl-31727682

ABSTRACT

We show the use of 5'-Acrydite oligonucleotides to copolymerize single-cell DNA or RNA into balls of acrylamide gel (BAGs). Combining this step with split-and-pool techniques for creating barcodes yields a method with advantages in cost and scalability, depth of coverage, ease of operation, minimal cross-contamination, and efficient use of samples. We perform DNA copy number profiling on mixtures of cell lines, nuclei from frozen prostate tumors, and biopsy washes. As applied to RNA, the method has high capture efficiency of transcripts and sufficient consistency to clearly distinguish the expression patterns of cell lines and individual nuclei from neurons dissected from the mouse brain. By using varietal tags (UMIs) to achieve sequence error correction, we show extremely low levels of cross-contamination by tracking source-specific SNVs. The method is readily modifiable, and we will discuss its adaptability and diverse applications.


Subject(s)
Acrylamide , Nucleic Acids , Single-Cell Analysis/methods , Acrylamide/chemistry , DNA , DNA Contamination , DNA Copy Number Variations , Gene Dosage , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Gene Library , Humans , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/pathology , Nucleic Acids/chemistry , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/standards , Polymerization , RNA , Single-Cell Analysis/standards
15.
Bioinformatics ; 38(24): 5390-5397, 2022 12 13.
Article in English | MEDLINE | ID: mdl-36271855

ABSTRACT

MOTIVATION: Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect 'ground truth' information about a small subset of potential interactions in a specific biological context, which can then be extended to the whole genome across different contexts, such as conditions, tissues or species, through machine learning methods. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves. RESULTS: We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction predictions. FECs are widespread across data types and methods; they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10-50 genes), and tissue-specific secondary markers (100-500 genes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in the definition of functional gene sets. AVAILABILITY AND IMPLEMENTATION: Code for analyses and figures is available at https://github.com/yexilein/pyroc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Machine Learning , Gene Ontology , Phenotype , Proteins
16.
Mol Cell Proteomics ; 19(11): 1876-1895, 2020 11.
Article in English | MEDLINE | ID: mdl-32817346

ABSTRACT

Co-fractionation MS (CF-MS) is a technique with potential to characterize endogenous and unmanipulated protein complexes on an unprecedented scale. However, this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS data sets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, and the Extending 'Guilt-by-Association' by Degree (EGAD) R package are used for these evaluations, which are verified with concurrent analyses of published human data. By evaluating data collection designs, which involve fractionation of cell lysates, it is found that near-maximum recall of complexes can be achieved with fewer samples than published studies. Distributing sample collection across orthogonal fractionation methods, rather than a single high resolution data set, leads to particularly efficient recall. By evaluating 17 different similarity scoring metrics, which are central to CF-MS data analysis, it is found that two metrics rarely used in past CF-MS studies (Spearman and Kendall correlations) and the recently introduced Co-apex metric frequently maximize recall, whereas a popular metric (Euclidean distance) delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that this practice may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. If studying nonmodel organisms using orthologous genomic data, it is found that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimize false discovery.
These assessments are summarized in a series of universally applicable guidelines for precise, sensitive and efficient CF-MS studies of known complexes, and effective predictions of novel complexes for orthogonal experimental validation.


Subject(s)
Chemical Fractionation/methods , Mass Spectrometry/methods , Proteome/metabolism , Proteomics/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Chromatography, Gel , Chromatography, Liquid/methods , Gene Ontology , Humans , Reference Standards
17.
Nucleic Acids Res ; 48(W1): W566-W571, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32392296

ABSTRACT

Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold standard networks and sophisticated tools for on-the-fly comparative analyses across 14 species. We show how CoCoCoNet can be used in two use cases. In the first, we demonstrate deep conservation of a nucleolus gene module across very divergent organisms, and in the second, we show how the heterogeneity of autism mechanisms in humans can be broken down by functional groups and translated to model organisms. CoCoCoNet is free to use and available to all at https://milton.cshl.edu/CoCoCoNet, with data and R scripts available at ftp://milton.cshl.edu/data.


Subject(s)
Gene Regulatory Networks , Software , Animals , Autism Spectrum Disorder/genetics , Gene Expression , Humans , RNA-Seq , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
18.
Proc Natl Acad Sci U S A ; 116(13): 6491-6500, 2019 03 26.
Article in English | MEDLINE | ID: mdl-30846554

ABSTRACT

Differential expression (DE) is commonly used to explore molecular mechanisms of biological conditions. While many studies report significant results between their groups of interest, the degree to which results are specific to the question at hand is not generally assessed, potentially leading to inaccurate interpretation. This could be particularly problematic for metaanalysis where replicability across datasets is taken as strong evidence for the existence of a specific, biologically relevant signal, but which instead may arise from recurrence of generic processes. To address this, we developed an approach to predict DE based on an analysis of over 600 studies. A predictor based on empirical prior probability of DE performs very well at this task (mean area under the receiver operating characteristic curve, ∼0.8), indicating that a large fraction of DE hit lists are nonspecific. In contrast, predictors based on attributes such as gene function, mutation rates, or network features perform poorly. Genes associated with sex, the extracellular matrix, the immune system, and stress responses are prominent within the "DE prior." In a series of control studies, we show that these patterns reflect shared biology rather than technical artifacts or ascertainment biases. Finally, we demonstrate the application of the DE prior to data interpretation in three use cases: (i) breast cancer subtyping, (ii) single-cell genomics of pancreatic islet cells, and (iii) metaanalysis of lung adenocarcinoma and renal transplant rejection transcriptomics. In all cases, we find hallmarks of generic DE, highlighting the need for nuanced interpretation of gene phenotypic associations.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Human Genetics , Probability , Adenocarcinoma/genetics , Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Electronic Data Processing , Female , Gene Regulatory Networks , Genes, Essential , Genomics , Graft Rejection , Humans , Kidney Transplantation , Lung Neoplasms , ROC Curve , Recurrence , Sensitivity and Specificity , Transcriptome
19.
Trends Genet ; 34(11): 823-831, 2018 11.
Article in English | MEDLINE | ID: mdl-30146183

ABSTRACT

As a fundamental unit of life, the cell has rightfully been the subject of intense investigation throughout the history of biology. Technical innovations now make it possible to assay cellular features at genomic scale, yielding breakthroughs in our understanding of the molecular organization of tissues, and even whole organisms. As these data accumulate we will soon be faced with a new challenge: making sense of the plethora of results. Early investigations into the replicability of cell type profiles inferred from single-cell RNA sequencing data have indicated that this is likely to be surprisingly straightforward due to consistent gene co-expression. In this opinion article we discuss the evidence for this claim and its implications for interpreting cell type-specific gene expression.


Subject(s)
Genome/genetics , Sequence Analysis, RNA/trends , Single-Cell Analysis/trends , Transcriptome/genetics , Animals , Computational Biology , Gene Expression Profiling/trends , Humans
20.
Mol Cell ; 50(5): 736-48, 2013 Jun 06.
Article in English | MEDLINE | ID: mdl-23665228

ABSTRACT

A large fraction of our genome consists of mobile genetic elements. Governing transposons in germ cells is critically important, and failure to do so compromises genome integrity, leading to sterility. In animals, the piRNA pathway is the key to transposon constraint, yet the precise molecular details of how piRNAs are formed and how the pathway represses mobile elements remain poorly understood. In an effort to identify general requirements for transposon control and components of the piRNA pathway, we carried out a genome-wide RNAi screen in Drosophila ovarian somatic sheet cells. We identified and validated 87 genes necessary for transposon silencing. Among these were several piRNA biogenesis factors. We also found CG3893 (asterix) to be essential for transposon silencing, most likely by contributing to the effector step of transcriptional repression. Asterix loss leads to decreases in H3K9me3 marks on certain transposons but has no effect on piRNA levels.


Subject(s)
DNA Transposable Elements , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , RNA, Small Interfering/metabolism , Animals , Drosophila Proteins/genetics , Drosophila melanogaster/metabolism , Female , Gene Knockdown Techniques , Gene Silencing , Genome, Insect , Ovary/physiology , RNA Interference , RNA, Small Interfering/genetics , Reproducibility of Results , SUMO-1 Protein/genetics , SUMO-1 Protein/metabolism