ABSTRACT
In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data.
Subjects
Data Compression/methods , Genomics/methods , Information Storage and Retrieval/methods , Algorithms , Computational Biology/methods , Computer Security/standards , Genetic Privacy , Human Genome , Humans
ABSTRACT
The low costs of array-synthesized oligonucleotide libraries are empowering rapid advances in quantitative and synthetic biology. However, high synthesis error rates, uneven representation, and lack of access to individual oligonucleotides limit the true potential of these libraries. We have developed a cost-effective method called Recombinase Directed Indexing (REDI), which involves integration of a complex library into yeast, site-specific recombination to index library DNA, and next-generation sequencing to identify desired clones. We used REDI to generate a library of ~3,300 DNA probes that exhibited > 96% purity and remarkable uniformity (> 95% of probes within twofold of the median abundance). Additionally, we created a collection of ~9,000 individually accessible CRISPR interference yeast strains for > 99% of genes required for either fermentative or respiratory growth, demonstrating the utility of REDI for rapid and cost-effective creation of strain collections from oligonucleotide pools. Our approach is adaptable to any complex DNA library, and fundamentally changes how these libraries can be parsed, maintained, propagated, and characterized.
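The uniformity figure quoted above (the share of probes within twofold of the median abundance) can be illustrated with a minimal sketch. The function name and the read counts below are hypothetical and chosen only for illustration; this is not the authors' analysis code, and it assumes "within twofold" means between half and twice the median.

```python
from statistics import median


def fraction_within_twofold(counts):
    """Return the fraction of entries whose abundance lies between
    half and twice the median abundance (assumed definition)."""
    if not counts:
        raise ValueError("counts must not be empty")
    m = median(counts)
    lower, upper = m / 2, m * 2
    within = sum(1 for c in counts if lower <= c <= upper)
    return within / len(counts)


# Hypothetical per-probe read counts, for illustration only.
example_counts = [100, 120, 80, 95, 210, 40, 105, 110]
print(fraction_within_twofold(example_counts))
```

A library as uniform as the one described would return a value above 0.95 under this definition.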
Subjects
DNA Sequence Analysis/methods , Yeasts/genetics , CRISPR-Cas Systems , Computational Biology/methods , Fungal DNA/genetics , Gene Library
ABSTRACT
Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, "activity-guided" approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent "genome mining" approaches first identify candidate BGCs, express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally-validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias toward the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. 
Our catalog was incorporated into the recently launched Minimum Information about Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and synthetic biology efforts toward discovering novel fungal enzymes and metabolites.
Subjects
Biological Products , Biosynthetic Pathways/genetics , Fungal Genes , Fungal Genome , Multigene Family , Alkaloids , Amino Acid Sequence , Computational Biology , Data Curation , Fungi/genetics , Phylogeny , Polyketides , Terpenes
ABSTRACT
Unraveling the molecular processes that lead from genotype to phenotype is crucial for the understanding and effective treatment of genetic diseases. Knowledge of the causative genetic defect most often does not enable treatment; therefore, causal intermediates between genotype and phenotype constitute valuable candidates for molecular intervention points that can be therapeutically targeted. Mapping genetic determinants of gene expression levels (also known as expression quantitative trait loci or eQTL studies) is frequently used for this purpose, yet distinguishing causation from correlation remains a significant challenge. Here, we address this challenge using extensive, multi-environment gene expression and fitness profiling of hundreds of genetically diverse yeast strains, in order to identify truly causal intermediate genes that condition fitness in a given environment. Using functional genomics assays, we show that the predictive power of eQTL studies for inferring causal intermediate genes is poor unless performed across multiple environments. Surprisingly, although the effects of genotype on fitness depended strongly on environment, causal intermediates could be most reliably predicted from genetic effects on expression present in all environments. Our results indicate a mechanism explaining this apparent paradox, whereby immediate molecular consequences of genetic variation are shared across environments, and environment-dependent phenotypic effects result from downstream integration of environmental signals. We developed a statistical model to predict causal intermediates that leverages this insight, yielding over 400 transcripts, for the majority of which we experimentally validated their role in conditioning fitness. 
Our findings have implications for the design and analysis of clinical omics studies aimed at discovering personalized targets for molecular intervention, suggesting that inferring causation in a single cellular context can benefit from molecular profiling in multiple contexts.
Subjects
Gene Expression , Gene-Environment Interaction , Metabolic Networks and Pathways/genetics , Quantitative Trait Loci/genetics , Bayes Theorem , Environment , Genotype , Humans , Statistical Models , Phenotype , Saccharomyces cerevisiae/genetics
ABSTRACT
Recent research has uncovered extensive variability in the boundaries of transcript isoforms, yet the functional consequences of this variation remain largely unexplored. Here, we systematically discriminate between the molecular phenotypes of overlapping coding and non-coding transcriptional events from each genic locus using a novel genome-wide, nucleotide-resolution technique to quantify the half-lives of 3' transcript isoforms in yeast. Our results reveal widespread differences in stability among isoforms for hundreds of genes in a single condition, and that variation of even a single nucleotide in the 3' untranslated region (UTR) can affect transcript stability. While previous instances of negative associations between 3' UTR length and transcript stability have been reported, here, we find that shorter isoforms are not necessarily more stable. We demonstrate the role of RNA-protein interactions in conditioning isoform-specific stability, showing that PUF3 binds and destabilizes specific polyadenylation isoforms. Our findings indicate that although the functional elements of a gene are encoded in DNA sequence, the selective incorporation of these elements into RNA through transcript boundary variation allows a single gene to have diverse functional consequences.
Subjects
Post-Transcriptional RNA Processing/genetics , RNA-Binding Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , 3' Untranslated Regions/genetics , Polyadenylation , RNA Stability/genetics , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
The mitochondrial genome retained by every eukaryotic organism holds only a few genes, which carry out essential functions and are implicated in severe diseases. Experimentally relocating these few genes to the nucleus therefore has both therapeutic and evolutionary implications. Numerous unproductive attempts have been made to do so, with a total of only 5 successes across all organisms. We have taken a novel approach to relocating mitochondrial genes that utilizes naturally nuclear versions from other organisms. We demonstrate this approach on subunit 9/c of ATP synthase, successfully relocating this gene for the first time in any organism by expressing the ATP9 genes from Podospora anserina in Saccharomyces cerevisiae. This study substantiates the role of protein structure in mitochondrial gene transfer: expression of chimeric constructs reveals that the P. anserina proteins can be correctly imported into mitochondria due to reduced hydrophobicity of the first transmembrane segment. Nuclear expression of ATP9, while permitting almost fully functional oxidative phosphorylation, perturbs many cellular properties, including cellular morphology, and activates the heat shock response. Altogether, our study establishes a novel strategy for allotopic expression of mitochondrial genes, demonstrates the complex adaptations required to relocate ATP9, and indicates a reason that this gene was only transferred to the nucleus during the evolution of multicellular organisms.
Subjects
Cell Nucleus/genetics , Fungal Proteins/genetics , Mitochondria/genetics , Mitochondrial Proton-Translocating ATPases/genetics , Podospora/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Biological Evolution , Cell Nucleus/enzymology , Fungal Proteins/metabolism , Gene Deletion , Mitochondrial Genes , Mitochondrial Genome , Mitochondria/enzymology , Mitochondrial Proton-Translocating ATPases/metabolism , Oxidative Phosphorylation , Podospora/enzymology , Protein Subunits/genetics , Protein Subunits/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae Proteins/metabolism , Transgenes
ABSTRACT
Due to the lack of relevant animal models, development of effective treatments for human mitochondrial diseases has been limited. Here we establish a rapid, yeast-based assay to screen for drugs active against human inherited mitochondrial diseases affecting ATP synthase, in particular NARP (neuropathy, ataxia, and retinitis pigmentosa) syndrome. This method is based on the conservation of mitochondrial function from yeast to human, on the unique ability of yeast to survive without production of ATP by oxidative phosphorylation, and on the amenability of the yeast mitochondrial genome to site-directed mutagenesis. Our method identifies chlorhexidine by screening a chemical library and oleate through a candidate approach. We show that these molecules rescue a number of phenotypes resulting from mutations affecting ATP synthase in yeast. These compounds are also active on human cybrid cells derived from NARP patients. These results validate our method as an effective high-throughput screening approach to identify drugs active in the treatment of human ATP synthase disorders and suggest that this type of method could be applied to other mitochondrial diseases.
Subjects
Chlorhexidine/pharmacology , Drug Discovery/methods , Preclinical Drug Evaluation/methods , Mitochondrial Myopathies/drug therapy , Mitochondrial Proton-Translocating ATPases/genetics , Oleic Acid/pharmacology , Retinitis Pigmentosa/drug therapy , Cell Line , Chlorhexidine/therapeutic use , Gene Expression Profiling , Humans , Site-Directed Mutagenesis , Mutation/genetics , Oleic Acid/therapeutic use , Saccharomycetales
ABSTRACT
Drug discovery for diseases such as Parkinson's disease is impeded by the lack of screenable cellular phenotypes. We present an unbiased phenotypic profiling platform that combines automated cell culture, high-content imaging, Cell Painting, and deep learning. We applied this platform to primary fibroblasts from 91 Parkinson's disease patients and matched healthy controls, creating the largest publicly available Cell Painting image dataset to date at 48 terabytes. We use fixed weights from a convolutional deep neural network trained on ImageNet to generate deep embeddings from each image and train machine learning models to detect morphological disease phenotypes. Our platform's robustness and sensitivity allow the detection of individual-specific variation with high fidelity across batches and plate layouts. Lastly, our models confidently separate LRRK2 and sporadic Parkinson's disease lines from healthy controls (receiver operating characteristic area under curve 0.79 (0.08 standard deviation)), supporting the capacity of this platform for complex disease modeling and drug screening applications.
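For readers unfamiliar with the metric reported above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen disease line scores higher than a randomly chosen control. The following minimal sketch computes it from that pairwise definition. The function name, labels, and scores are hypothetical, and this is not the authors' evaluation pipeline.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of (positive, negative) pairs in
    which the positive example scores higher; ties count as half."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs: 1 = disease line, 0 = healthy control.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(example_labels, example_scores))
```

A value of 0.5 corresponds to chance-level separation and 1.0 to perfect separation.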
Subjects
Deep Learning , Parkinson Disease , Fibroblasts , Humans , Machine Learning , Computer Neural Networks
ABSTRACT
One in four myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) patients are estimated to be severely affected by the disease, and these housebound or bedbound patients are currently understudied. Here, we report a comprehensive examination of the symptoms and clinical laboratory tests of a cohort of severely ill patients and healthy controls. The greatly reduced quality of life of the patients was negatively correlated with clinical depression. The most troublesome symptoms included fatigue (85%), pain (65%), cognitive impairment (50%), orthostatic intolerance (45%), sleep disturbance (35%), post-exertional malaise (30%), and neurosensory disturbance (30%). Sleep profiles and cognitive tests revealed distinctive impairments. Lower morning cortisol levels and alterations in its diurnal rhythm were observed in the patients, and antibody and antigen measurements showed no evidence of acute infection by common viral or bacterial pathogens. These results highlight the urgent need to develop molecular diagnostic tests for ME/CFS. In addition, there was a striking similarity in symptoms between long COVID and ME/CFS, suggesting that studies on the mechanism and treatment of ME/CFS may help prevent and treat long COVID and vice versa.
ABSTRACT
Determining the genetic factors in a disease is crucial to elucidating its molecular basis. This task is challenging due to a lack of information on gene function. The integration of large-scale functional genomics data has proven to be an effective strategy to prioritize candidate disease genes. Mitochondrial disorders are a prevalent and heterogeneous class of diseases that are particularly amenable to this approach. Here we explain the application of integrative approaches to the identification of mitochondrial disease genes. We first examine various datasets that can be used to evaluate the involvement of each gene in mitochondrial function. The data integration methodology is then described, accompanied by examples of common implementations. Finally, we discuss how gene networks are constructed using integrative techniques and applied to candidate gene prioritization. Relevant public data resources are indicated. This report highlights the success and potential of data integration as well as its applicability to the search for mitochondrial disease genes.
Subjects
Computational Biology/methods , Genetic Databases , Mitochondrial Genes , Mitochondrial Diseases/genetics , Alphaproteobacteria/genetics , Animals , Gene Expression Profiling , Gene Regulatory Networks , Humans , Mitochondria/metabolism , Protein Interaction Mapping , Proteomics/methods
ABSTRACT
For decades, fungi have been a source of U.S. Food and Drug Administration-approved natural products such as penicillin, cyclosporine, and the statins. Recent breakthroughs in DNA sequencing suggest that millions of fungal species exist on Earth, with each genome encoding pathways capable of generating as many as dozens of natural products. However, the majority of encoded molecules are difficult or impossible to access because the organisms are uncultivable or the genes are transcriptionally silent. To overcome this bottleneck in natural product discovery, we developed the HEx (Heterologous EXpression) synthetic biology platform for rapid, scalable expression of fungal biosynthetic genes and their encoded metabolites in Saccharomyces cerevisiae. We applied this platform to 41 fungal biosynthetic gene clusters from diverse fungal species from around the world, 22 of which produced detectable compounds. These included novel compounds with unexpected biosynthetic origins, particularly from poorly studied species. This result establishes the HEx platform for rapid discovery of natural products from any fungal species, even those that are uncultivable, and opens the door to discovery of the next generation of natural products.
Subjects
Biological Products/metabolism , Fungi/genetics , Fungi/metabolism , Gene Expression , Genetic Engineering , Biosynthetic Pathways , Fermentation , Genetic Engineering/methods , High-Throughput Screening Assays , Genetic Promoter Regions , Workflow
ABSTRACT
Our understanding of how genotype controls phenotype is limited by the scale at which we can precisely alter the genome and assess the phenotypic consequences of each perturbation. Here we describe a CRISPR-Cas9-based method for multiplexed accurate genome editing with short, trackable, integrated cellular barcodes (MAGESTIC) in Saccharomyces cerevisiae. MAGESTIC uses array-synthesized guide-donor oligos for plasmid-based high-throughput editing and features genomic barcode integration to prevent plasmid barcode loss and to enable robust phenotyping. We demonstrate that editing efficiency can be increased more than fivefold by recruiting donor DNA to the site of breaks using the LexA-Fkh1p fusion protein. We performed saturation editing of the essential gene SEC14 and identified amino acids critical for chemical inhibition of lipid signaling. We also constructed thousands of natural genetic variants, characterized guide mismatch tolerance at the genome scale, and ascertained that cryptic Pol III termination elements substantially reduce guide efficacy. MAGESTIC will be broadly useful to uncover the genetic basis of phenotypes in yeast.
Subjects
Taxonomic DNA Barcoding/methods , Gene Editing/methods , Saccharomyces cerevisiae/genetics , Amino Acid Substitution , Biotechnology , CRISPR-Cas Systems , Fungal DNA/genetics , Fungal Genome , Homologous Recombination , Phospholipid Transfer Proteins/genetics , Plasmids/genetics , Fungal RNA/genetics , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
Mitochondrial diseases are severe and largely untreatable. Owing to the many essential processes carried out by mitochondria and the complex cellular systems that support these processes, these diseases are diverse, pleiotropic, and challenging to study. Much of our current understanding of mitochondrial function and dysfunction comes from studies in the baker's yeast Saccharomyces cerevisiae. Because of its good fermenting capacity, S. cerevisiae can survive mutations that inactivate oxidative phosphorylation, has the ability to tolerate the complete loss of mitochondrial DNA (a property referred to as 'petite-positivity'), and is amenable to mitochondrial and nuclear genome manipulation. These attributes make it an excellent model system for studying and resolving the molecular basis of numerous mitochondrial diseases. Here, we review the invaluable insights this model organism has yielded about diseases caused by mitochondrial dysfunction, which ranges from primary defects in oxidative phosphorylation to metabolic disorders, as well as dysfunctions in maintaining the genome or in the dynamics of mitochondria. Owing to the high level of functional conservation between yeast and human mitochondrial genes, several yeast species have been instrumental in revealing the molecular mechanisms of pathogenic human mitochondrial gene mutations. Importantly, such insights have pointed to potential therapeutic targets, as have genetic and chemical screens using yeast.
Subjects
Mitochondrial Diseases/metabolism , Mitochondrial Diseases/therapy , Saccharomyces cerevisiae/metabolism , Animals , Fungal DNA/metabolism , Humans , Mitochondria/metabolism , Biological Models , Translational Biomedical Research
ABSTRACT
Dissecting the molecular basis of quantitative traits is a significant challenge and is essential for understanding complex diseases. Even in model organisms, precisely determining causative genes and their interactions has remained elusive, due in part to difficulty in narrowing intervals to single genes and in detecting epistasis or linked quantitative trait loci. These difficulties are exacerbated by limitations in experimental design, such as low numbers of analyzed individuals or of polymorphisms between parental genomes. We address these challenges by applying three independent high-throughput approaches for QTL mapping to map the genetic variants underlying 11 phenotypes in two genetically distant Saccharomyces cerevisiae strains, namely (1) individual analysis of >700 meiotic segregants, (2) bulk segregant analysis, and (3) reciprocal hemizygosity scanning, a new genome-wide method that we developed. We reveal differences in the performance of each approach and, by combining them, identify eight polymorphic genes that affect eight different phenotypes: colony shape, flocculation, growth on two nonfermentable carbon sources, and resistance to two drugs, salt, and high temperature. Our results demonstrate the power of individual segregant analysis to dissect QTL and address the underestimated contribution of interactions between variants. We also reveal confounding factors like mutations and aneuploidy in pooled approaches, providing valuable lessons for future designs of complex trait mapping studies.
Subjects
Genomics/methods , Quantitative Trait Loci , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Aneuploidy , Chromosome Mapping , Genetic Variation , Fungal Genome , Mutation , Phenotype
ABSTRACT
Mitochondrial diseases are systemic, prevalent and often fatal; yet treatments remain scarce. Identifying molecular intervention points that can be therapeutically targeted remains a major challenge, which we confronted via a screening assay we developed. Using yeast models of mitochondrial ATP synthase disorders, we screened a drug repurposing library, and applied genomic and biochemical techniques to identify pathways of interest. Here we demonstrate that modulating the sorting of nuclear-encoded proteins into mitochondria, mediated by the TIM23 complex, proves therapeutic in both yeast and patient-derived cells exhibiting ATP synthase deficiency. Targeting TIM23-dependent protein sorting improves an array of phenotypes associated with ATP synthase disorders, including biogenesis and activity of the oxidative phosphorylation machinery. Our study establishes mitochondrial protein sorting as an intervention point for ATP synthase disorders, and because of the central role of this pathway in mitochondrial biogenesis, it holds broad value for the treatment of mitochondrial diseases.
Subjects
Membrane Transport Proteins/metabolism , Mitochondrial Diseases/metabolism , Mitochondrial Membrane Transport Proteins/metabolism , Mitochondrial Proton-Translocating ATPases/genetics , Nuclear Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Antifungal Agents/pharmacology , Cell Nucleus/metabolism , Pharmaceutical Databases , Drug Repositioning , Gene Expression Regulation , Humans , Membrane Transport Proteins/genetics , Mitochondria/drug effects , Mitochondria/metabolism , Mitochondria/pathology , Mitochondrial Diseases/drug therapy , Mitochondrial Diseases/genetics , Mitochondrial Diseases/pathology , Mitochondrial Membrane Transport Proteins/genetics , Mitochondrial Precursor Protein Import Complex Proteins , Mitochondrial Proton-Translocating ATPases/deficiency , Molecular Targeted Therapy , Mutation , Nuclear Proteins/genetics , Oxidative Phosphorylation/drug effects , Protein Transport/drug effects , Pyridines/pharmacology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Signal Transduction , Thiones/pharmacology
ABSTRACT
HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.