ABSTRACT
Innovative analytical frameworks are required to capture complex gene-environment interactions. We investigate the insufficiency of commonly used models for disease genome analysis and suggest considering genetic interactions in complex diseases. For non-genetic factors, we study the emerging wearable technologies that have enabled quantification of physiological and environmental factors at an unprecedented breadth and depth. We propose a Bayesian framework to hierarchically model personalized gene-environment interactions to enable precision health and medicine.
Subject(s)
Precision Medicine/methods , Precision Medicine/trends , Wearable Electronic Devices/trends , Bayes Theorem , Epistasis, Genetic/genetics , Gene-Environment Interaction , Genome-Wide Association Study/methods , Humans , Multifactorial Inheritance/genetics
ABSTRACT
Recent studies of the tumor genome seek to identify cancer pathways as groups of genes in which mutations are epistatic with one another or, specifically, "mutually exclusive." Here, we show that most mutations are mutually exclusive not due to pathway structure but to interactions with disease subtype and tumor mutation load. In particular, many cancer driver genes are mutated preferentially in tumors with few mutations overall, causing mutations in these cancer genes to appear mutually exclusive with numerous others. Researchers should view current epistasis maps with caution until we better understand the multiple cause-and-effect relationships among factors such as tumor subtype, positive selection for mutations, and gross tumor characteristics including mutational signatures and load.
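The confounding described in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes a one-sided Fisher exact test as the exclusivity test and uses invented tumor counts for two hypothetical genes, A and B. Within each mutation-load stratum the genes are mutated independently, yet they look strongly exclusive once the strata are pooled.

```python
from math import comb

def exclusivity_p(both, only_a, only_b, neither):
    """One-sided Fisher exact test: P(co-mutated tumors <= observed)."""
    n = both + only_a + only_b + neither
    a_total = both + only_a          # tumors with gene A mutated
    b_total = both + only_b          # tumors with gene B mutated
    lowest = max(0, a_total + b_total - n)
    tail = sum(comb(a_total, k) * comb(n - a_total, b_total - k)
               for k in range(lowest, both + 1))
    return tail / comb(n, b_total)

# Hypothetical counts (both, only A, only B, neither); not real data.
low_load = (3, 57, 2, 38)    # A mutated in 60% of tumors, B in 5%
high_load = (3, 2, 57, 38)   # A mutated in 5% of tumors, B in 60%
pooled = tuple(x + y for x, y in zip(low_load, high_load))

p_low = exclusivity_p(*low_load)
p_high = exclusivity_p(*high_load)
p_pooled = exclusivity_p(*pooled)
print(f"low-load p={p_low:.3f}, high-load p={p_high:.3f}, pooled p={p_pooled:.2e}")
```

In this invented example neither stratum shows evidence of exclusivity, while the pooled table does, so stratifying by mutation load (or by subtype) is one simple way to check whether an apparent exclusivity signal survives.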
Subject(s)
Epistasis, Genetic/genetics , Genes, Neoplasm/genetics , Neoplasms/genetics , Algorithms , Computational Biology/methods , Epistasis, Genetic/physiology , Genes, Neoplasm/physiology , Humans , Models, Genetic , Mutation/genetics , Oncogenes/genetics
ABSTRACT
Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed "dropout," which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial to mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.
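The idea of sharing information across similar cells by data diffusion can be sketched in a few lines. This is a simplified illustration, not the published MAGIC implementation: it assumes a fixed-bandwidth Gaussian kernel and omits the preprocessing, adaptive kernel, and rescaling steps a real pipeline would need. The function names and the toy count matrix are hypothetical.

```python
import math

def markov_matrix(cells, sigma):
    """Row-normalize Gaussian affinities between cells into transition probabilities."""
    matrix = []
    for a in cells:
        weights = [math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * sigma ** 2))
                   for b in cells]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix

def matmul(left, right):
    """Plain matrix product for lists of lists."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)] for row in left]

def diffuse(cells, t=3, sigma=2.0):
    """Apply the Markov operator t times; each cell becomes a weighted average of similar cells."""
    m = markov_matrix(cells, sigma)
    smoothed = cells
    for _ in range(t):
        smoothed = matmul(m, smoothed)
    return smoothed

# Hypothetical counts for 4 cells x 2 genes; cell 1 has a dropout (0.0) in gene 1.
counts = [[5.0, 4.0], [5.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
imputed = diffuse(counts)
print([round(v, 2) for v in imputed[1]])
```

Because every row of the Markov matrix sums to one, each imputed value is a weighted average of observed values, so the dropout entry is pulled toward the expression seen in neighboring cells. The diffusion time `t` controls how far information spreads.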
Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cell Line , Epistasis, Genetic/genetics , Gene Regulatory Networks/genetics , Humans , Markov Chains , MicroRNAs/genetics , RNA, Messenger/genetics , Software
ABSTRACT
In eukaryotes, DNA compacts into chromatin through nucleosomes [1,2]. Replication of the eukaryotic genome must be coupled to the transmission of the epigenome encoded in the chromatin [3,4]. Here we report cryo-electron microscopy structures of yeast (Saccharomyces cerevisiae) replisomes associated with the FACT (facilitates chromatin transactions) complex (comprising Spt16 and Pob3) and an evicted histone hexamer. In these structures, FACT is positioned at the front end of the replisome by engaging with the parental DNA duplex to capture the histones through the middle domain and the acidic carboxyl-terminal domain of Spt16. The H2A-H2B dimer chaperoned by the carboxyl-terminal domain of Spt16 is stably tethered to the H3-H4 tetramer, while the vacant H2A-H2B site is occupied by the histone-binding domain of Mcm2. The Mcm2 histone-binding domain wraps around the DNA-binding surface of one H3-H4 dimer and extends across the tetramerization interface of the H3-H4 tetramer to the binding site of Spt16 middle domain before becoming disordered. This arrangement leaves the remaining DNA-binding surface of the other H3-H4 dimer exposed to additional interactions for further processing. The Mcm2 histone-binding domain and its downstream linker region are nested on top of Tof1, relocating the parental histones to the replisome front for transfer to the newly synthesized lagging-strand DNA. Our findings offer crucial structural insights into the mechanism of replication-coupled histone recycling for maintaining epigenetic inheritance.
Subject(s)
Chromatin , DNA Replication , Epistasis, Genetic , Histones , Saccharomyces cerevisiae , Binding Sites , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Chromatin/ultrastructure , Cryoelectron Microscopy , DNA Replication/genetics , DNA, Fungal/biosynthesis , DNA, Fungal/chemistry , DNA, Fungal/metabolism , DNA, Fungal/ultrastructure , Epistasis, Genetic/genetics , Histones/chemistry , Histones/metabolism , Histones/ultrastructure , Multienzyme Complexes/chemistry , Multienzyme Complexes/metabolism , Multienzyme Complexes/ultrastructure , Nucleosomes/chemistry , Nucleosomes/metabolism , Nucleosomes/ultrastructure , Protein Binding , Protein Domains , Protein Multimerization , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/ultrastructure , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/ultrastructure
ABSTRACT
Uncovering the genes, variants, and interactions underlying crop diversity is a frontier in plant genetics. Phenotypic variation often does not reflect the cumulative effect of individual gene mutations. This deviation is due to epistasis, in which interactions between alleles are often unpredictable and quantitative in effect. Recent advances in genomics and genome-editing technologies are elevating the study of epistasis in crops. Using the traits and developmental pathways that were major targets in domestication and breeding, we highlight how epistasis is central in guiding the behavior of the genetic variation that shapes quantitative trait variation. We outline new strategies that illuminate how quantitative epistasis from modified gene dosage defines background dependencies. Advancing our understanding of epistasis in crops can reveal new principles and approaches to engineering targeted improvements in agriculture.
Subject(s)
Crops, Agricultural/genetics , Epistasis, Genetic/genetics , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Animals , Domestication , Gene Editing/methods , Genome, Plant/genetics , Genomics/methods , Humans , Plant Breeding/methods
ABSTRACT
Accurate models describing the relationship between genotype and phenotype are necessary in order to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task, and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) remains an open question. The Walsh-Hadamard transform represents a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings. However, one of its main limitations is that it can only accommodate two alleles (amino acid or nucleotide states) per sequence position. In this paper we provide an extension of the Walsh-Hadamard transform that allows the calculation and modeling of background-averaged epistasis (also known as ensemble epistasis) in genetic landscapes with an arbitrary number of states per position (20 for amino acids, 4 for nucleotides, etc.). We also provide a recursive formula for the inverse matrix and then derive formulae to directly extract any element of either matrix without having to rely on the computationally intensive task of constructing or inverting large matrices. Finally, we demonstrate the utility of our theory by using it to model epistasis within both simulated and empirical multiallelic fitness landscapes, revealing that both pairwise and higher-order genetic interactions are enriched between physically interacting positions.
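For orientation, the two-allele baseline that this abstract sets out to generalize can be sketched as follows. This covers only the biallelic case, not the multiallelic extension the paper introduces. It assumes one common scaling convention, in which a coefficient of interaction order k among n sites is the Hadamard-transformed value multiplied by (-1)^k / 2^(n-k); the function names are hypothetical.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering)."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for j in range(start, start + h):
                x, y = out[j], out[j + h]
                out[j], out[j + h] = x + y, x - y
        h *= 2
    return out

def background_averaged_epistasis(phenotypes):
    """Coefficients for a biallelic landscape; the set bits of each index mark mutated sites."""
    n_sites = (len(phenotypes) - 1).bit_length()
    if len(phenotypes) != 2 ** n_sites:
        raise ValueError("need 2**n phenotypes")
    coeffs = []
    for index, value in enumerate(fwht(phenotypes)):
        order = bin(index).count("1")   # number of sites involved in this term
        coeffs.append((-1) ** order * value / 2 ** (n_sites - order))
    return coeffs

# Phenotypes for genotypes 00, 01, 10, 11 (illustrative values only).
print(background_averaged_epistasis([0, 1, 2, 3]))  # [1.5, 1.0, 2.0, 0.0]
print(background_averaged_epistasis([0, 1, 2, 5]))  # [2.0, 2.0, 3.0, 2.0]
```

In the first landscape the effects are purely additive, so the pairwise term is zero. In the second, the double mutant exceeds the additive expectation by 2, which appears as the last coefficient, and the single-site terms become averages over both backgrounds.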
Subject(s)
Epistasis, Genetic , Models, Genetic , Epistasis, Genetic/genetics , Computational Biology/methods , Algorithms , Mutation/genetics , Genotype
ABSTRACT
Epistasis among driver mutations is pervasive and explains relevant features of cancer, such as differential therapy response and convergence towards well-characterized molecular subtypes. Furthermore, a growing body of evidence suggests that tumor development could be hampered by the accumulation of slightly deleterious passenger mutations. In this work, we combined empirical epistasis networks, computer simulations, and mathematical models to explore how synergistic interactions among driver mutations affect cancer progression under the burden of slightly deleterious passengers. We found that epistasis plays a crucial role in tumor development by promoting the transformation of precancerous clones into rapidly growing tumors through a process that is analogous to evolutionary rescue. The triggering of epistasis-driven rescue is strongly dependent on the intensity of epistasis and could be a key rate-limiting step in many tumors, contributing to their unpredictability. As a result, central genes in cancer epistasis networks appear as key intervention targets for cancer therapy.
Subject(s)
Computer Simulation , Epistasis, Genetic , Models, Genetic , Mutation , Neoplasms , Epistasis, Genetic/genetics , Humans , Neoplasms/genetics , Computational Biology/methods , Gene Regulatory Networks/genetics
ABSTRACT
Cellular development is orchestrated by evolutionarily conserved signaling pathways, which are often pleiotropic and involve intra- and interpathway epistatic interactions that form intricate, complex regulatory networks. Cryptococcus species are a group of closely related human fungal pathogens that grow as yeasts yet transition to hyphae during sexual reproduction. Additionally, during infection they can form large, polyploid titan cells that evade immunity and develop drug resistance. Multiple known signaling pathways regulate cellular development, yet how these are coordinated and interact with genetic variation is less well understood. Here, we conducted quantitative trait locus (QTL) analyses of a mapping population generated by sexual reproduction of two parents, only one of which is unisexually fertile. We observed transgressive segregation of the unisexual phenotype among progeny, as well as a large-cell phenotype under mating-inducing conditions. These large-cell progeny were found to produce titan cells both in vitro and in infected animals. Two major QTLs and corresponding quantitative trait genes (QTGs) were identified: RIC8 (encoding a guanine-exchange factor) and CNC06490 (encoding a putative Rho-GTPase activator), both involved in G protein signaling. The two QTGs interact epistatically with each other and with the mating-type locus in phenotypic determination. These findings provide insights into the complex genetics of morphogenesis during unisexual reproduction and pathogenic titan cell formation and illustrate how QTL analysis can be applied to identify epistasis between genes. This study shows that phenotypic outcomes are influenced by the genetic background upon which mutations arise, implicating dynamic, complex genotype-to-phenotype landscapes in fungal pathogens and beyond.
Subject(s)
Cryptococcosis/genetics , Cryptococcus/genetics , Epistasis, Genetic/genetics , Biological Evolution , Cryptococcus/metabolism , Cryptococcus/pathogenicity , Fungal Proteins/genetics , Genes, Mating Type, Fungal/genetics , Hyphae/growth & development , Morphogenesis , Phenotype , Quantitative Trait Loci/genetics , Reproduction/genetics , Reproduction, Asexual
ABSTRACT
BACKGROUND: Researchers have long studied the regulatory processes of genes to uncover their functions. Gene regulatory network analysis is one of the popular approaches for understanding these processes, requiring accurate identification of interactions among the genes to establish the gene regulatory network. Advances in genome-wide association studies and expression quantitative trait loci studies have led to a wealth of genomic data, facilitating more accurate inference of gene-gene interactions. However, unknown confounding factors may influence these interactions, making their interpretation complicated. Mendelian randomization (MR) has emerged as a valuable tool for causal inference in genetics, addressing confounding effects by estimating causal relationships using instrumental variables. In this paper, we propose a new statistical method, MR-GGI, for accurately inferring gene-gene interactions using Mendelian randomization. RESULTS: MR-GGI applies one gene as the exposure and another as the outcome, using causal cis-single-nucleotide polymorphisms as instrumental variables in the inverse-variance weighted MR model. Through simulations, we have demonstrated MR-GGI's ability to control type 1 error and maintain statistical power despite confounding effects. MR-GGI performed the best when compared to other methods using the F1 score on the DREAM5 dataset. Additionally, when applied to yeast genomic data, MR-GGI successfully identified six clusters. Through gene ontology analysis, we have confirmed that each cluster in our study performs distinct functional roles by gathering genes with specific functions. CONCLUSION: These findings demonstrate that MR-GGI accurately infers gene-gene interactions despite the confounding effects in real biological environments.
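The inverse-variance weighted (IVW) model named in this abstract has a simple closed form, sketched below. This shows the standard fixed-effect IVW estimator applied to one gene pair, not the MR-GGI software itself; the function name, variable names, and summary statistics are all hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical summary statistics for three cis-SNPs (not real data).
snp_on_gene_a = [0.40, 0.25, 0.60]    # SNP effects on the exposure gene
snp_on_gene_b = [0.20, 0.125, 0.30]   # SNP effects on the outcome gene
se_gene_b = [0.05, 0.04, 0.06]        # standard errors of the outcome effects

effect, se = ivw_estimate(snp_on_gene_a, snp_on_gene_b, se_gene_b)
print(f"effect of gene A on gene B: {effect:.3f} (SE {se:.3f})")
```

Each SNP contributes a ratio estimate (outcome effect divided by exposure effect), and the IVW estimate is their precision-weighted average. In the toy data every ratio equals 0.5, so the combined estimate is 0.5 as well.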
Subject(s)
Mendelian Randomization Analysis , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Gene Regulatory Networks/genetics , Epistasis, Genetic/genetics , Quantitative Trait Loci , Humans , Saccharomyces cerevisiae/genetics
ABSTRACT
The fungus Parastagonospora nodorum uses proteinaceous necrotrophic effectors (NEs) to induce tissue necrosis on wheat leaves during infection, leading to the symptoms of septoria nodorum blotch (SNB). The NEs Tox1 and Tox3 induce necrosis on wheat possessing the dominant susceptibility genes Snn1 and Snn3B1/Snn3D1, respectively. We previously observed that Tox1 is epistatic to the expression of Tox3 and a quantitative trait locus (QTL) on chromosome 2A that contributes to SNB resistance/susceptibility. The expression of Tox1 is significantly higher in the Australian strain SN15 compared to the American strain SN4. Inspection of the Tox1 promoter region revealed a 401 bp promoter genetic element in SN4 positioned 267 bp upstream of the start codon that is absent in SN15, called PE401. Analysis of the world-wide P. nodorum population revealed that a high proportion of Northern Hemisphere isolates possess PE401 whereas the opposite was observed in representative P. nodorum isolates from Australia and South Africa. The presence of PE401 removed the epistatic effect of Tox1 on the contribution of the SNB 2A QTL but not Tox3. PE401 was introduced into the Tox1 promoter regulatory region in SN15 to test for direct regulatory roles. Tox1 expression was markedly reduced in the presence of PE401. This suggests a repressor molecule(s) binds PE401 and inhibits Tox1 transcription. Infection assays also demonstrated that P. nodorum which lacks PE401 is more pathogenic on Snn1 wheat varieties than P. nodorum carrying PE401. An infection competition assay between P. nodorum isogenic strains with and without PE401 indicated that the higher Tox1-expressing strain rescued the reduced virulence of the lower Tox1-expressing strain on Snn1 wheat. Our study demonstrated that Tox1 exhibits both 'selfish' and 'altruistic' characteristics. This offers an insight into a complex NE-NE interaction that is occurring within the P. nodorum population. The importance of PE401 in breeding for SNB resistance in wheat is discussed.
Subject(s)
Ascomycota/genetics , Ascomycota/pathogenicity , Mycoses/genetics , Plant Diseases/genetics , Triticum/microbiology , Disease Resistance/genetics , Disease Susceptibility , Epistasis, Genetic/genetics , Host-Pathogen Interactions/genetics , Promoter Regions, Genetic , Quantitative Trait Loci , Virulence/genetics
ABSTRACT
Identifying drivers of viral diversity is key to understanding the evolutionary as well as epidemiological dynamics of the COVID-19 pandemic. Using rich viral genomic data sets, we show that periods of steadily rising diversity have been punctuated by sudden, enormous increases followed by similarly abrupt collapses of diversity. We introduce a mechanistic model of saltational evolution with epistasis and demonstrate that these features parsimoniously account for the observed temporal dynamics of inter-genomic diversity. Our results provide support for recent proposals that saltational evolution may be a signature feature of SARS-CoV-2, allowing the pathogen to more readily evolve highly transmissible variants. These findings lend theoretical support to a heightened awareness of biological contexts where increased diversification may occur. They also underline the power of pathogen genomics and other surveillance streams in clarifying the phylodynamics of emerging and endemic infections. In public health terms, our results further underline the importance of equitable distribution of up-to-date vaccines.
Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Pandemics , Epistasis, Genetic/genetics , Genomics
ABSTRACT
Epistasis and cooperativity of folding both result from networks of energetic interactions in proteins. Epistasis results from energetic interactions among mutants, whereas cooperativity results from energetic interactions during folding that reduce the presence of intermediate states. The two concepts seem intuitively related, but it is unknown how they are related, particularly in terms of selection. To investigate their relationship, we simulated protein evolution under selection for cooperativity and separately under selection for epistasis. Strong selection for cooperativity created strong epistasis between contacts in the native structure but weakened epistasis between nonnative contacts. In contrast, selection for epistasis increased epistasis in both native and nonnative contacts and reduced cooperativity. Because epistasis can be used to predict protein structure only if it preferentially occurs in native contacts, this result indicates that selection for cooperativity may be key for predicting structure using epistasis. To evaluate this inference, we simulated the evolution of guanine nucleotide-binding protein (GB1) with and without cooperativity. With cooperativity, strong epistatic interactions clearly map out the native GB1 structure, while allowing the presence of intermediate states (low cooperativity) obscured the structure. This indicates that using epistasis measurements to reconstruct protein structure may be inappropriate for proteins with stable intermediates.
Subject(s)
Epistasis, Genetic/genetics , Forecasting/methods , Protein Folding , Epistasis, Genetic/physiology , Evolution, Molecular , Kinetics , Models, Molecular , Protein Conformation , Proteins/chemistry , Thermodynamics
ABSTRACT
Cryptococcal disease is estimated to affect nearly a quarter of a million people annually. Environmental isolates of Cryptococcus deneoformans, which make up 15 to 30% of clinical infections in temperate climates such as Europe, vary in their pathogenicity, ranging from benign to hyper-virulent. Key traits that contribute to virulence, such as the production of the pigment melanin, an extracellular polysaccharide capsule, and the ability to grow at human body temperature have been identified, yet little is known about the genetic basis of variation in such traits. Here we investigate the genetic basis of melanization, capsule size, thermal tolerance, oxidative stress resistance, and antifungal drug sensitivity using quantitative trait locus (QTL) mapping in progeny derived from a cross between two divergent C. deneoformans strains. Using a "function-valued" QTL analysis framework that exploits both time-series information and growth differences across multiple environments, we identified QTL for each of these virulence traits and drug susceptibility. For three QTL we identified the underlying genes and nucleotide differences that govern variation in virulence traits. One of these genes, RIC8, which encodes a regulator of cAMP-PKA signaling, contributes to variation in four virulence traits: melanization, capsule size, thermal tolerance, and resistance to oxidative stress. Two major effect QTL for amphotericin B resistance map to the genes SSK1 and SSK2, which encode key components of the HOG pathway, a fungal-specific signal transduction network that orchestrates cellular responses to osmotic and other stresses. We also discovered complex epistatic interactions within and between genes in the HOG and cAMP-PKA pathways that regulate antifungal drug resistance and resistance to oxidative stress. Our findings advance the understanding of virulence traits among diverse lineages of Cryptococcus, and highlight the role of genetic variation in key stress-responsive signaling pathways as a major contributor to phenotypic variation.
Subject(s)
Cryptococcosis/genetics , Cryptococcus neoformans/genetics , Epistasis, Genetic/genetics , Genetic Pleiotropy/genetics , Chromosome Mapping , Cryptococcosis/microbiology , Cryptococcus neoformans/pathogenicity , Drug Resistance, Fungal/genetics , Genotype , Humans , Quantitative Trait Loci/genetics , Signal Transduction/genetics , Virulence/genetics
ABSTRACT
The RarA protein, homologous to human WRNIP1 and yeast MgsA, is a AAA+ ATPase and one of the most highly conserved DNA repair proteins. With an apparent role in the repair of stalled or collapsed replication forks, the molecular function of this protein family remains obscure. Here, we demonstrate that RarA acts in late stages of recombinational DNA repair of post-replication gaps. A deletion of most of the rarA gene, when paired with a deletion of ruvB or ruvC, produces a growth defect, a strong synergistic increase in sensitivity to DNA damaging agents, cell elongation, and an increase in SOS induction. Except for SOS induction, these effects are all suppressed by inactivating recF, recO, or recJ, indicating that RarA, along with RuvB, acts downstream of RecA. SOS induction increases dramatically in a rarA ruvB recF/O triple mutant, suggesting the generation of large amounts of unrepaired ssDNA. The rarA ruvB defects are not suppressed (and in fact slightly increased) by recB inactivation, suggesting RarA acts primarily downstream of RecA in post-replication gaps rather than in double strand break repair. Inactivating rarA, ruvB and recG together is synthetically lethal, an outcome again suppressed by inactivation of recF, recO, or recJ. A rarA ruvB recQ triple deletion mutant is also inviable. Together, the results suggest the existence of multiple pathways, perhaps overlapping, for the resolution or reversal of recombination intermediates created by RecA protein in post-replication gaps within the broader RecF pathway. One of these paths involves RarA.
Subject(s)
Adenosine Triphosphatases/genetics , Bacterial Proteins/genetics , DNA-Binding Proteins/genetics , Epistasis, Genetic/genetics , Escherichia coli Proteins/genetics , RecQ Helicases/genetics , DNA Damage/genetics , DNA Repair/genetics , DNA Replication/genetics , DNA, Single-Stranded , Escherichia coli/genetics , Exodeoxyribonucleases , Homologous Recombination/genetics , Recombination, Genetic/genetics , Synthetic Lethal Mutations/genetics
ABSTRACT
We previously identified a deletion on chromosome 16p12.1 that is mostly inherited and associated with multiple neurodevelopmental outcomes, where severely affected probands carried an excess of rare pathogenic variants compared to mildly affected carrier parents. We hypothesized that the 16p12.1 deletion sensitizes the genome for disease, while "second-hits" in the genetic background modulate the phenotypic trajectory. To test this model, we examined how neurodevelopmental defects conferred by knockdown of individual 16p12.1 homologs are modulated by simultaneous knockdown of homologs of "second-hit" genes in Drosophila melanogaster and Xenopus laevis. We observed that knockdown of 16p12.1 homologs affects multiple phenotypic domains, leading to delayed developmental timing, seizure susceptibility, brain alterations, abnormal dendrite and axonal morphology, and cellular proliferation defects. Compared to genes within the 16p11.2 deletion, which has higher de novo occurrence, 16p12.1 homologs were less likely to interact with each other in Drosophila models or a human brain-specific interaction network, suggesting that interactions with "second-hit" genes may confer higher impact towards neurodevelopmental phenotypes. Assessment of 212 pairwise interactions in Drosophila between 16p12.1 homologs and 76 homologs of patient-specific "second-hit" genes (such as ARID1B and CACNA1A), genes within neurodevelopmental pathways (such as PTEN and UBE3A), and transcriptomic targets (such as DSCAM and TRRAP) identified genetic interactions in 63% of the tested pairs. In 11 out of 15 families, patient-specific "second-hits" enhanced or suppressed the phenotypic effects of one or many 16p12.1 homologs in 32/96 pairwise combinations tested. In fact, homologs of SETD5 synergistically interacted with homologs of MOSMO in both Drosophila and X. laevis, leading to modified cellular and brain phenotypes, as well as axon outgrowth defects that were not observed with knockdown of either individual homolog. Our results suggest that several 16p12.1 genes sensitize the genome towards neurodevelopmental defects, and complex interactions with "second-hit" genes determine the ultimate phenotypic manifestation.
Subject(s)
Brain/metabolism , Chromosome Deletion , Chromosomes, Human, Pair 16/genetics , Neurodevelopmental Disorders/genetics , Adaptor Proteins, Signal Transducing/genetics , Animals , Brain/pathology , Calcium Channels/genetics , Cell Adhesion Molecules/genetics , DNA-Binding Proteins/genetics , Disease Models, Animal , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Epistasis, Genetic/genetics , Gene Expression Regulation, Developmental , Humans , Methyltransferases/genetics , Neurodevelopmental Disorders/pathology , Nuclear Proteins/genetics , PTEN Phosphohydrolase/genetics , Transcription Factors/genetics , Ubiquitin-Protein Ligases/genetics , Xenopus Proteins/genetics , Xenopus laevis/genetics
ABSTRACT
Instinctive behaviors are genetically programmed behaviors that occur independently of experience. How the genetic programs that give rise to such behaviors evolve remains an unresolved question. I propose that evolution of species-specific innate behaviors is accomplished through progressive modifications of pre-existing genetic networks composed of allelic variants. I hypothesize that changes in the frequencies of one or more constituent allelic variants within the network lead to changes in gene network connectivity and to the emergence of a reorganized network that can support a novel behavioral phenotype and that becomes stabilized when key allelic variants are driven to fixation.
Subject(s)
Biological Evolution , Epistasis, Genetic/genetics , Evolution, Molecular , Instinct , Alleles , Animals , Gene Regulatory Networks/genetics , Genetic Variation/genetics , Mutation/genetics , Phenotype
ABSTRACT
Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be nearby each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
Subject(s)
Epistasis, Genetic/genetics , RNA-Seq , Single-Cell Analysis/methods , Software , Cluster Analysis , Gene Expression Profiling , Humans , Exome Sequencing
ABSTRACT
The composition of the cell nucleus is highly heterogeneous, with different constituents forming complex interactomes. However, the global patterns of these interwoven heterogeneous interactomes remain poorly understood. Here we focus on two different interactomes, chromatin interaction network and gene regulatory network, as a proof of principle to identify heterogeneous interactome modules (HIMs), each of which represents a cluster of gene loci that is in spatial contact more frequently than expected and that is regulated by the same group of transcription factors. HIM integrates transcription factor binding and 3D genome structure to reflect "transcriptional niche" in the nucleus. We develop a new algorithm, MOCHI, to facilitate the discovery of HIMs based on network motif clustering in heterogeneous interactomes. By applying MOCHI to five different cell types, we found that HIMs have strong spatial preference within the nucleus and show distinct functional properties. Through integrative analysis, this work shows the utility of MOCHI to identify HIMs, which may provide new perspectives on the interplay between transcriptional regulation and 3D genome organization.
Subject(s)
Chromatin/genetics , Epistasis, Genetic/genetics , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Algorithms , Cluster Analysis , Genome, Human/genetics , Humans , Protein Binding/genetics , Transcription Factors/genetics
ABSTRACT
The mapping from genotype to phenotype to fitness typically involves multiple nonlinearities that can transform the effects of mutations. For example, mutations may contribute additively to a phenotype, but their effects on fitness may combine non-additively because selection favors a low or intermediate value of that phenotype. This can cause incongruence between the topographical properties of a fitness landscape and its underlying genotype-phenotype landscape. Yet, genotype-phenotype landscapes are often used as a proxy for fitness landscapes to study the dynamics and predictability of evolution. Here, we use theoretical models and empirical data on transcription factor-DNA interactions to systematically study the incongruence of genotype-phenotype and fitness landscapes when selection favors a low or intermediate phenotypic value. Using the theoretical models, we prove a number of fundamental results. For example, selection for low or intermediate phenotypic values does not change simple sign epistasis into reciprocal sign epistasis, implying that genotype-phenotype landscapes with only simple sign epistasis motifs will always give rise to single-peaked fitness landscapes under such selection. More broadly, we show that such selection tends to create fitness landscapes that are more rugged than the underlying genotype-phenotype landscape, but this increased ruggedness typically does not frustrate adaptive evolution because the local adaptive peaks in the fitness landscape tend to be nearly as tall as the global peak. Many of these results carry forward to the empirical genotype-phenotype landscapes, which may help to explain why low- and intermediate-affinity transcription factor-DNA interactions are so prevalent in eukaryotic gene regulation.
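The incongruence between the two kinds of landscape can be made concrete with a two-locus example. The sketch below is an illustration under stated assumptions, not the authors' model: fitness is taken to be a quadratic function of distance from an intermediate optimum, and the classification function and its names are hypothetical.

```python
def classify_epistasis(f00, f01, f10, f11):
    """Classify a two-locus landscape; f01 = only B mutated, f10 = only A mutated."""
    a_flips = (f10 - f00) * (f11 - f01) < 0   # effect of A changes sign with B
    b_flips = (f01 - f00) * (f11 - f10) < 0   # effect of B changes sign with A
    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "simple sign"
    if f11 - f10 - f01 + f00 != 0:
        return "magnitude"
    return "none"

def fitness(phenotype, optimum=1.0):
    """Assumed stabilizing selection: fitness falls with squared distance from the optimum."""
    return -(phenotype - optimum) ** 2

# Additive phenotype: each mutation adds 1 (illustrative values only).
phenotypes = (0.0, 1.0, 1.0, 2.0)
print(classify_epistasis(*phenotypes))                        # none
print(classify_epistasis(*(fitness(p) for p in phenotypes)))  # reciprocal sign
```

Each single mutant sits exactly at the optimum while the double mutant overshoots it, so two mutations that combine additively at the phenotype level show reciprocal sign epistasis at the fitness level.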
Subject(s)
Epistasis, Genetic , Models, Genetic , Epistasis, Genetic/genetics , Genetic Fitness/genetics , Genotype , Mutation/genetics , Phenotype , Transcription Factors
ABSTRACT
Chromosomes are likely to have assembled from unlinked genes in early evolution. Genetic linkage reduces the assortment load and intragenomic conflict in reproducing protocell models to the extent that chromosomes can go to fixation even if chromosomes suffer from a replicative disadvantage, relative to unlinked genes, proportional to their length. Here we numerically show that chromosomes spread within protocells even if recurrent deleterious mutations affecting replicating genes (such as ribozymes) are considered. Dosage effect selects for optimal genomic composition within protocells that carries over to the genic composition of emerging chromosomes. Lacking an accurate segregation mechanism, protocells continue to benefit from the stochastic corrector principle (group selection of early replicators), but now at the chromosome level. A remarkable feature of this process is the appearance of multigene families (in optimal genic proportions) on chromosomes. An added benefit of chromosome formation is an increase in the selectively maintainable genome size (number of different genes), primarily due to the marked reduction of the assortment load. The establishment of chromosomes is under strong positive selection in protocells harboring unlinked genes. The error threshold of replication is raised to a higher genome size by linkage because deleterious mutations affecting protocell metabolism (hence fitness) show antagonistic (diminishing return) epistasis. This result strengthens the established benefit conferred by chromosomes on protocells allowing for the fixation of highly specific and efficient enzymes.