ABSTRACT
Understanding the basis for cellular growth, proliferation, and function requires determining the roles of essential genes in diverse cellular processes, including visualizing their contributions to cellular organization and morphology. Here, we combined pooled CRISPR-Cas9-based functional screening of 5,072 fitness-conferring genes in human HeLa cells with microscopy-based imaging of DNA, the DNA damage response, actin, and microtubules. Analysis of >31 million individual cells identified measurable phenotypes for >90% of gene knockouts, implicating gene targets in specific cellular processes. Clustering of phenotypic similarities based on hundreds of quantitative parameters further revealed co-functional genes across diverse cellular activities, providing predictions for gene functions and associations. By conducting pooled live-cell screening of ~450,000 cell division events for 239 genes, we additionally identified diverse genes with functional contributions to chromosome segregation. Our work establishes a resource detailing the consequences of disrupting core cellular processes that represents the functional landscape of essential human genes.
Subject(s)
CRISPR-Cas Systems, Essential Genes, Humans, HeLa Cells, Gene Knockout Techniques, Phenotype
ABSTRACT
Antibacterial agents target the products of essential genes but rarely achieve complete target inhibition. Thus, the all-or-none definition of essentiality afforded by traditional genetic approaches fails to discern the most attractive bacterial targets: those whose incomplete inhibition results in major fitness costs. In contrast, gene "vulnerability" is a continuous, quantifiable trait that relates the magnitude of gene inhibition to the effect on bacterial fitness. We developed a CRISPR interference-based functional genomics method to systematically titrate gene expression in Mycobacterium tuberculosis (Mtb) and monitor fitness outcomes. We identified highly vulnerable genes in various processes, including novel targets unexplored for drug discovery. Equally important, we identified invulnerable essential genes, potentially explaining failed drug discovery efforts. Comparison of vulnerability between the reference and a hypervirulent Mtb isolate revealed incomplete conservation of vulnerability and that differential vulnerability can predict differential antibacterial susceptibility. Our results quantitatively redefine essential bacterial processes and identify high-value targets for drug development.
Subject(s)
Bacterial Gene Expression Regulation, Bacterial Genome, Mycobacterium tuberculosis/genetics, Amino Acyl-tRNA Synthetases/metabolism, Antitubercular Agents/pharmacology, Bayes Theorem, Biological Evolution, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, Bacterial Gene Expression Regulation/drug effects, Gene Silencing/drug effects, Microbial Sensitivity Tests, Mycobacterium tuberculosis/drug effects, Kinetoplastida Guide RNA/genetics
ABSTRACT
Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.
Subject(s)
Genetic Epistasis, Genome-Wide Association Study, Linkage Disequilibrium/genetics, Genotype, Biological Specimen Banks, United Kingdom, Single Nucleotide Polymorphism/genetics
ABSTRACT
Cluster analysis is one of the most widely used exploratory methods for visualization and grouping of gene expression patterns across multiple samples or treatment groups. Although several existing online tools can annotate clusters with functional terms, there is no all-in-one webserver to effectively prioritize genes/clusters using gene essentiality as well as congruency of mRNA-protein expression. Hence, we developed CAP-RNAseq that makes possible (1) upload and clustering of bulk RNA-seq data followed by identification, annotation and network visualization of all or selected clusters; and (2) prioritization using DepMap gene essentiality and/or dependency scores as well as the degree of correlation between mRNA and protein levels of genes within an expression cluster. In addition, CAP-RNAseq has an integrated primer design tool for the prioritized genes. Herein, we showed using comparisons with the existing tools and multiple case studies that CAP-RNAseq can uniquely aid in the discovery of co-expression clusters enriched with essential genes and prioritization of novel biomarker genes that exhibit high correlations between their mRNA and protein expression levels. CAP-RNAseq is applicable to RNA-seq data from different contexts including cancer and available at http://konulabapps.bilkent.edu.tr:3838/CAPRNAseq/ and the docker image is downloadable from https://hub.docker.com/r/konulab/caprnaseq.
Subject(s)
Proteomics, RNA Sequence Analysis/methods, RNA-Seq, Messenger RNA/genetics
ABSTRACT
Switching genes on and off on cue is a cornerstone for understanding gene functions. One contemporary approach for loss-of-function studies of essential genes involves CRISPR-mediated knockout of the endogenous locus in conjunction with the expression of a rescue construct, which can subsequently be turned off to produce a gene inactivation effect in mammalian cell lines. A broadening of this approach would involve simultaneously switching on a second construct to interrogate the functions of a gene in the pathway. In this study, we developed a pair of switches that were independently controlled by both inducible promoters and degrons, enabling the toggling between two constructs with comparable kinetics and tightness. The gene-OFF switch was based on TRE transcriptional control coupled with auxin-induced degron-mediated proteolysis. A second independently controlled gene-ON switch was based on a modified ecdysone promoter and mutated FKBP12-derived destabilization domain degron, allowing acute and tuneable gene activation. This platform facilitates efficient generation of knockout cell lines containing a two-gene switch that is regulated tightly and can be flipped within a fraction of the time of a cell cycle.
Subject(s)
Gene Expression Regulation, Indoleacetic Acids, Animals, Cell Line, Indoleacetic Acids/pharmacology, Proteolysis, Genetic Promoter Regions/genetics, Mammals/metabolism
ABSTRACT
Gene essentiality is defined as the extent to which a gene is required for the survival and reproductive success of a living system. It can vary between genetic backgrounds and environments. Essential protein-coding genes have been well studied. However, the essentiality of non-coding regions is rarely reported. Most regions of the human genome do not encode proteins, so methods for determining the essentiality of non-coding genes are needed. We developed iEssLnc models, which assign essentiality scores to lncRNA genes. As far as we know, this is the first direct quantitative estimation of the essentiality of lncRNA genes. By taking advantage of a graph neural network with meta-path-guided random walks on the lncRNA-protein interaction network, iEssLnc models can perform genome-wide screening for essential lncRNA genes in a quantitative manner. We carried out validation and whole-genome screening in the context of human cancer cell lines and the mouse genome. In comparison with other methods, which were transferred from protein-coding genes, iEssLnc achieved better performance. Enrichment analysis indicated that essential lncRNA genes clustered at high ranks of the iEssLnc essentiality scores. Using the screening results of the iEssLnc models, we estimated the number of essential lncRNA genes in human and mouse. Functional analysis showed that essential lncRNA genes interact significantly with microRNAs and cytoskeletal proteins, which may be of interest in experimental life sciences. All datasets and code of the iEssLnc models have been deposited in GitHub (https://github.com/yyZhang14/iEssLnc).
Subject(s)
MicroRNAs, Neoplasms, Long Noncoding RNA, Humans, Animals, Mice, Protein Interaction Maps, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, MicroRNAs/metabolism, Computer Neural Networks
ABSTRACT
Predicting therapeutic responses in cancer patients is a major challenge in the field of precision medicine due to high inter- and intra-tumor heterogeneity. Most drug response models need to be improved in terms of accuracy, and there is limited research to assess therapeutic responses of particular tumor types. Here, we developed a novel method DROEG (Drug Response based on Omics and Essential Genes) for prediction of drug response in tumor cell lines by integrating genomic, transcriptomic and methylomic data along with CRISPR essential genes, and revealed that the incorporation of tumor proliferation essential genes can improve drug sensitivity prediction. Concisely, DROEG integrates literature-based and statistics-based methods to select features and uses Support Vector Regression for model construction. We demonstrate that DROEG outperforms most state-of-the-art algorithms by both qualitative (prediction accuracy for drug-sensitive/resistant) and quantitative (Pearson correlation coefficient between the predicted and actual IC50) evaluation in Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia datasets. In addition, DROEG is further applied to the pan-gastrointestinal tumor with high prevalence and mortality as a case study at both cell line and clinical levels to evaluate the model efficacy and discover potential prognostic biomarkers in Cisplatin and Epirubicin treatment. Interestingly, the CRISPR essential gene information is found to be the most important contributor to enhance the accuracy of the DROEG model. To our knowledge, this is the first study to integrate essential genes with multi-omics data to improve cancer drug response prediction and provide insights into personalized precision treatment.
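The two evaluation views named in this abstract (a qualitative sensitive/resistant call and a quantitative Pearson correlation between predicted and actual IC50) can be illustrated with a minimal sketch. This is not the DROEG code: the function names, the toy values, and the `cutoff` used to call a cell line "sensitive" are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def sensitivity_accuracy(actual, predicted, cutoff):
    """Fraction of samples whose sensitive/resistant call agrees.

    A sample is called 'sensitive' when its IC50 is below `cutoff`
    (hypothetical rule; the abstract does not state the one used).
    """
    agree = [(a < cutoff) == (p < cutoff) for a, p in zip(actual, predicted)]
    return sum(agree) / len(agree)

# Toy IC50 values, invented for illustration only.
actual = [0.5, 1.0, 2.0, 4.0, 8.0]
predicted = [0.7, 0.9, 2.5, 3.5, 7.0]

r = pearson(actual, predicted)                                # quantitative view
acc = sensitivity_accuracy(actual, predicted, cutoff=2.2)     # qualitative view
print(f"Pearson r = {r:.3f}, call accuracy = {acc:.2f}")
```

Reporting both numbers is useful because a model can rank drugs well (high correlation) while still misplacing samples that sit close to the sensitivity cutoff.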
Subject(s)
Antineoplastic Agents, Neoplasms, Humans, Essential Genes, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Neoplasms/drug therapy, Neoplasms/genetics, Genomics/methods, Precision Medicine/methods
ABSTRACT
BACKGROUND: Vibrio cholerae O1 El Tor, the etiological agent responsible for the last cholera pandemic, has become a well-established model organism for which some genetic tools are available. While CRISPRi technology has been applied to V. cholerae, improvements were necessary to upscale it and enable pooled screening by high-throughput sequencing in this bacterium. RESULTS: In this study, we present a genome-wide CRISPR-dCas9 screen specifically optimized for the N16961 El Tor model strain of V. cholerae. This approach is characterized by a tight control of dCas9 expression and activity, as well as a streamlined experimental setup. Our library allows the depletion of 3,674 (98.9%) annotated genes from the V. cholerae genome. To confirm its effectiveness, we screened for genes that are essential during exponential growth in rich medium and identified 369 genes for which guides were significantly depleted from the library (log2FC < -2). Remarkably, 82% of these genes had previously been described as hypothetical essential genes in V. cholerae or in a closely related bacterium, V. natriegens. CONCLUSION: We thus validated the robustness and accuracy of our CRISPRi-based approach for assessing gene fitness in a given condition. Our findings highlight the efficacy of the developed CRISPRi platform as a powerful tool for high-throughput functional genomics studies of V. cholerae.
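The depletion criterion quoted above (log2FC < -2) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the pseudocount, the normalization by library size, the use of the median over guides to summarize a gene, and all guide and gene names are assumptions.

```python
import math
from statistics import median

def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-guide log2 fold change of library-size-normalized read counts."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for guide, n_before in before.items():
        freq_before = (n_before + pseudocount) / total_before
        freq_after = (after.get(guide, 0) + pseudocount) / total_after
        lfc[guide] = math.log2(freq_after / freq_before)
    return lfc

def depleted_genes(lfc, guide_to_gene, threshold=-2.0):
    """Genes whose median guide log2FC falls below the threshold."""
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append(value)
    return sorted(g for g, vals in per_gene.items() if median(vals) < threshold)

# Toy read counts (hypothetical guides g1..g4 targeting geneA and geneB).
before = {"g1": 1000, "g2": 1000, "g3": 1000, "g4": 1000}
after = {"g1": 50, "g2": 60, "g3": 1900, "g4": 1990}
guide_to_gene = {"g1": "geneA", "g2": "geneA", "g3": "geneB", "g4": "geneB"}

hits = depleted_genes(log2_fold_changes(before, after), guide_to_gene)
print(hits)
```

A real screen would also need replicate handling and a significance test, which the abstract mentions ("significantly depleted") but which this sketch leaves out.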
Subject(s)
CRISPR-Cas Systems, Vibrio cholerae, Vibrio cholerae/genetics, Cholera/microbiology, Cholera/epidemiology, Bacterial Genome, Pandemics, Humans, High-Throughput Nucleotide Sequencing, Gene Library
ABSTRACT
BACKGROUND: Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes. RESULTS: In this study, we introduce GCNN-SFM, a computational model for identifying essential genes in organisms, based on graph convolutional neural networks (GCNN). GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to model and extract features from gene sequences of essential genes. Initially, the gene sequence is transformed into a feature map using coding techniques. Subsequently, a multi-layer GCN is employed to perform graph convolution operations, effectively capturing both local and global features of the gene sequence. Further feature extraction is performed, followed by integrating convolution and fully-connected layers to generate prediction results for essential genes. Gradient descent is used to iteratively update the model parameters so as to minimize the cross-entropy loss, thereby enhancing the accuracy of the prediction results. Meanwhile, model parameters are tuned to determine the optimal parameter combination that yields the best prediction performance during training. CONCLUSIONS: Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.
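To make the phrase "graph convolution operations" concrete, here is a minimal sketch of one graph-convolution step in plain Python. It assumes the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); the abstract does not say which variant GCNN-SFM uses, and the toy graph, features, and weights are invented.

```python
import math

def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weights):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

# Toy 3-node path graph with 2 input features and 1 output channel.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0],
           [-1.0]]

hidden = gcn_layer(adj, features, weights)
print(hidden)
```

Stacking several such layers lets each node's representation depend on progressively more distant neighbours, which is one way to read the abstract's "local and global features".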
Subject(s)
Essential Genes, Computer Neural Networks, Algorithms, Entropy, Genomics
ABSTRACT
BACKGROUND: Patients with triple-positive breast cancer (TPBC) have a higher risk of recurrence and lower survival rates than patients with other luminal breast cancers. However, there are few studies on the predictive biomarkers of prognosis and treatment responses in TPBC. METHODS: Proliferation essential genes (PEGs) were acquired from clustered regularly interspaced short palindromic repeats-associated protein 9 (CRISPR-Cas9) technology, and cohorts of patients with TPBC were obtained from public databases and our cohort. To develop a TPBC-PEG signature, Cox regression and least absolute shrinkage and selection operator regression analyses were applied. Functional analyses were performed with gene set enrichment analysis. The relationship between candidate genes and neoadjuvant chemotherapy (NACT) sensitivity was explored via real-time quantitative polymerase chain reaction (RT-qPCR) and immunohistochemistry (IHC) on the basis of clinical samples. RESULTS: Among 900 TPBC-PEGs, 437 showed significant differential expression between TPBC and normal tissues. Three prognostic PEGs (actin-like 6A [ACTL6A], chaperonin containing TCP1 subunit 2 [CCT2], and threonyl-tRNA synthetase [TARS]) were identified and used to construct the PEG signature. Patients with high PEG signature scores exhibited a worse overall survival and lower sensitivity to NACT than patients with low PEG signature scores. RT-qPCR results indicated that expression of ACTL6A and CCT2 was significantly upregulated in patients who lacked sensitivity to NACT. IHC results showed that the ACTL6A protein was highly expressed in patients with NACT resistance and nonpathological complete responses. CONCLUSIONS: This efficient PEG signature prognostic model can predict the outcomes of TPBC. Furthermore, ACTL6A expression level was associated with the response to NACT, and could serve as an important factor in predicting prognosis and drug sensitivity of patients with TPBC.
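Signatures built with Cox and LASSO regression are commonly applied as a weighted sum of gene expression values, with patients split into high and low groups at a cutoff. The sketch below assumes that form for illustration only: the abstract reports neither the coefficients nor the cutoff rule, so the `COEFFICIENTS` values, the median split, and the patient data are all hypothetical. Only the three gene symbols come from the text.

```python
from statistics import median

# Hypothetical regression coefficients, invented for illustration.
COEFFICIENTS = {"ACTL6A": 0.4, "CCT2": 0.3, "TARS": 0.2}

def signature_score(expression):
    """Weighted sum of expression values over the signature genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)

def stratify(patients):
    """Label each patient 'high' or 'low' relative to the median score."""
    scores = {pid: signature_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Toy expression values (arbitrary units).
patients = {
    "p1": {"ACTL6A": 2.0, "CCT2": 2.0, "TARS": 2.0},
    "p2": {"ACTL6A": 1.0, "CCT2": 1.0, "TARS": 1.0},
    "p3": {"ACTL6A": 3.0, "CCT2": 1.0, "TARS": 0.0},
    "p4": {"ACTL6A": 0.0, "CCT2": 1.0, "TARS": 1.0},
}

groups = stratify(patients)
print(groups)
```

The resulting groups would then be compared for survival or treatment response, as the abstract describes for high versus low PEG signature scores.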
Subject(s)
Breast Neoplasms, Humans, Female, Breast Neoplasms/drug therapy, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Actins/genetics, Essential Genes, Neoadjuvant Therapy/methods, Prognosis, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Cell Proliferation, Non-Histone Chromosomal Proteins/genetics, Non-Histone Chromosomal Proteins/therapeutic use, DNA-Binding Proteins/genetics
ABSTRACT
Codon usage bias (CUB), the uneven usage of synonymous codons encoding the same amino acid, differs among genes within and across bacterial genomes. CUB is known to be influenced by gene expression and, accordingly, differs between high-expression and low-expression genes in several bacteria. In this article, we extend the study of codon usage by considering gene essentiality as a feature. Using machine learning (ML)-based approaches, we analysed Relative Synonymous Codon Usage (RSCU) values of essential and non-essential genes in Escherichia coli and thirty-four other bacterial genomes whose gene essentiality features were available in public databases. We observed significant differences in codon usage patterns between essential and non-essential genes for the majority of the bacterial genomes and, accordingly, ML-based classifiers achieved high area under the curve (AUC) scores, with a minimum score of 70.0 across twenty-eight organisms. Further, the importance of individual codons for classifying genes was found to differ among the codons in each genome. Arg codon CGT and Gly codon GGT were observed to be the most preferred codons among essential genes in Escherichia coli. Interestingly, some codons, such as CGT, ATA, GGT and GGG, were observed to contribute consistently towards classifying essential genes across the thirty-five bacterial genomes studied. On the other hand, codons TGY and CAY, encoding the amino acids Cys and His respectively, were among the least contributing codons towards classification in all these bacteria. This study demonstrates gene essentiality-based differences in synonymous codon usage in bacterial genomes and presents a common codon usage pattern across bacteria.
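RSCU is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch of that calculation follows; it covers only the Arg and Gly codon families from the standard genetic code, and the toy sequence is invented, so it illustrates the feature rather than reproducing the study's pipeline.

```python
from collections import Counter

# Partial synonymous-codon table (standard genetic code), for illustration.
SYNONYMOUS = {
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(cds):
    """RSCU per codon: observed count * family size / total family count."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue                          # amino acid absent from sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values

# Toy coding sequence: 3x CGT, 1x AGA, 2x GGT, 1x GGC, 1x GGG.
toy_cds = "CGT" * 3 + "AGA" + "GGT" * 2 + "GGC" + "GGG"
values = rscu(toy_cds)
print(values["CGT"], values["GGT"])
```

An RSCU above 1 means the codon is used more often than expected under uniform synonymous usage. Per-gene vectors of such values are the kind of feature an ML classifier could take as input.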
Subject(s)
Codon Usage, Escherichia coli, Essential Genes, Machine Learning, Essential Genes/genetics, Escherichia coli/genetics, Bacterial Genome/genetics, Bacterial Genes, Codon/genetics, Bacteria/genetics, Bacteria/classification
ABSTRACT
Protein evolution rate is negatively correlated with several effectors, such as expression level, expression distribution, protein-protein interactions (PPIs), and essentiality for survival. These effectors can characterize the signaling pathways mediated by ligand-receptor binding. However, it is unclear whether these effectors are constraining factors on the pathway-specific evolution of ligands and receptors. To clarify the relation between the effectors and protein evolution (dN/dS ratio) in ligands and their receptors considering each signaling pathway, we investigated 377 proteins in 20 peptide/protein ligand groups and their receptor groups using 15 primate sequences. The dN/dS ratios between peptide/protein ligand groups and their receptor groups were positively correlated, suggesting that protein evolution is influenced by the signaling pathway to which the proteins belong. Comparing each signaling pathway, ligands and receptors mainly related to development and growth (FGF/Hedgehog/Notch/WNT groups) showed lower dN/dS ratios, higher PPI numbers, and higher essentiality, whereas those mainly related to immune processes (CSF/IFN/IL/TNF groups) showed higher dN/dS ratios, lower PPI numbers, and lower essentiality. Most ligands and receptors were poorly expressed, and expression level was not a constraining factor on the protein evolution. These findings indicate that PPI and essentiality are constraining factors that characterize the pathway-specific evolution of ligands and receptors.
Subject(s)
Molecular Evolution, Primates, Animals, Ligands, Proteins/genetics, Signal Transduction
ABSTRACT
PURPOSE: Existing resources that characterize the essentiality status of genes are based on either proliferation assessment in human cell lines, viability evaluation in mouse knockouts, or constraint metrics derived from human population sequencing studies. Several repositories document phenotypic annotations for rare disorders; however, there is a lack of comprehensive reporting on lethal phenotypes. METHODS: We queried Online Mendelian Inheritance in Man for terms related to lethality and classified all Mendelian genes according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. We characterized the genes across these lethality categories, examined the evidence on viability from mouse models and explored how this information could be used for novel gene discovery. RESULTS: We developed the Lethal Phenotypes Portal to showcase this curated catalog of human essential genes. Differences in the mode of inheritance, physiological systems affected, and disease class were found for genes in different lethality categories, as well as discrepancies between the lethal phenotypes observed in mouse and human. CONCLUSION: We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.
Subject(s)
Lethal Genes, Inborn Genetic Diseases, Phenotype, Humans, Animals, Mice, Inborn Genetic Diseases/genetics, Genetic Databases, Animal Disease Models, Essential Genes/genetics
ABSTRACT
Mycoplasma bovis is an important emerging pathogen of cattle and bison, but our understanding of the genetic basis of its interactions with its host is limited. The aim of this study was to identify genes of M. bovis required for interaction and survival in association with host cells. One hundred transposon-induced mutants of the type strain PG45 were assessed for their capacity to survive and proliferate in Madin-Darby bovine kidney cell cultures. The growth of 19 mutants was completely abrogated, and 47 mutants had a prolonged doubling time compared to the parent strain. All these mutants had a similar growth pattern to the parent strain PG45 in the axenic media. Thirteen genes previously classified as dispensable for the axenic growth of M. bovis were found to be essential for the growth of M. bovis in association with host cells. In most of the mutants with a growth-deficient phenotype, the transposon was inserted into a gene involved in transportation or metabolism. This included genes coding for ABC transporters, proteins related to carbohydrate, nucleotide and protein metabolism, and membrane proteins essential for attachment. It is likely that these genes are essential not only in vitro but also for the survival of M. bovis in infected animals. IMPORTANCE: Mycoplasma bovis causes chronic bronchopneumonia, mastitis, arthritis, keratoconjunctivitis, and reproductive tract disease in cattle around the globe and is an emerging pathogen in bison. Control of mycoplasma infections is difficult in the absence of appropriate antimicrobial treatment or effective vaccines. A comprehensive understanding of host-pathogen interactions and virulence factors is important to implement more effective control methods against M. bovis. Recent studies of other mycoplasmas with in vitro cell culture models have identified essential virulence genes of mycoplasmas. Our study has identified genes of M. bovis required for survival in association with host cells, which will pave the way to a better understanding of host-pathogen interactions and the role of specific genes in the pathogenesis of disease caused by M. bovis.
Subject(s)
Mycoplasma bovis, Mycoplasma bovis/genetics, Animals, Cattle, Mycoplasma Infections/microbiology, Mycoplasma Infections/veterinary, Cell Line, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Cattle Diseases/microbiology, Bacterial Genes/genetics, DNA Transposable Elements, Host-Pathogen Interactions, Bison/microbiology, Microbial Viability
ABSTRACT
BACKGROUND: Studying genomic variation in rapidly evolving pathogens potentially enables identification of genes supporting their "core biology", being present, functional and expressed by all strains, or their "flexible biology", varying between strains. Genes supporting flexible biology may be considered to be "accessory", whilst the "core" gene set is likely to be important for common features of a pathogen species' biology, including virulence on all host genotypes. The wheat-pathogenic fungus Zymoseptoria tritici represents one of the most rapidly evolving threats to global food security and was the focus of this study. RESULTS: We constructed a pangenome of 18 European field isolates, with 12 also subjected to RNAseq transcription profiling during infection. Combining these data, we predicted a "core" gene set comprising 9807 sequences which were (1) present in all isolates, (2) lacking inactivating polymorphisms and (3) expressed by all isolates. A large accessory genome, consisting of 45% of the total genes, was also defined. We classified genetic and genomic polymorphism at both chromosomal and individual gene scales. Proteins required for essential functions including virulence had lower-than-average sequence variability amongst core genes. Both core and accessory genomes encoded many small, secreted candidate effector proteins that likely interact with plant immunity. Viral vector-mediated transient in planta overexpression of 88 candidates failed to identify any which induced leaf necrosis characteristic of disease. However, functional complementation of a non-pathogenic deletion mutant lacking five core genes demonstrated that full virulence was restored by re-introduction of the single gene exhibiting least sequence polymorphism and highest expression. CONCLUSIONS: These data support the combined use of pangenomics and transcriptomics for defining genes which represent core, and potentially exploitable, weaknesses in rapidly evolving pathogens.
Subject(s)
Gene Expression Profiling, Transcriptome, Virulence/genetics, Fungal Genome, Fungal Genes, Plant Diseases/microbiology
ABSTRACT
Essential genes are crucial for microbial viability, playing key roles in both the primary and secondary metabolism. Since mutations in these genes can threaten organism viability, identifying them is challenging. Conditionally essential genes are required only under specific conditions and are important for functions such as virulence, immunity, stress survival, and antibiotic resistance. Transposon-directed sequencing (Tn-Seq) has emerged as a powerful method for identifying both essential and conditionally essential genes. In this review, we explored Tn-Seq workflows, focusing on eubacterial species and some yeast species. A comparison of 14 eubacteria species revealed 133 conserved essential genes, including those involved in cell division (e.g., ftsA, ftsZ), DNA replication (e.g., dnaA, dnaE), ribosomal function, cell wall synthesis (e.g., murB, murC), and amino acid synthesis (e.g., alaS, argS). Many other essential genes lack clear orthologues across different microorganisms, making them specific to each organism studied. Conditionally essential genes were identified in 18 bacterial species grown under various conditions, but their conservation was low, reflecting dependence on specific environments and microorganisms. Advances in Tn-Seq are expected to reveal more essential genes in the near future, deepening our understanding of microbial biology and enhancing our ability to manipulate microbial growth, as well as both the primary and secondary metabolism.
Subject(s)
DNA Transposable Elements, Essential Genes, DNA Transposable Elements/genetics, Bacteria/genetics, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Over the years, comprehensive explorations of the model organisms Caenorhabditis elegans (elegant worm) and Drosophila melanogaster (vinegar fly) have contributed substantially to our understanding of complex biological processes and pathways in multicellular organisms generally. Extensive functional genomic-phenomic, genomic, transcriptomic, and proteomic data sets have enabled the discovery and characterisation of genes that are crucial for life, called 'essential genes'. Recently, we investigated the feasibility of inferring essential genes from such data sets using advanced bioinformatics and showed that a machine learning (ML)-based workflow could be used to extract or engineer features from DNA, RNA, protein, and/or cellular data/information to underpin the reliable prediction of essential genes both within and between C. elegans and D. melanogaster. As these are two distantly related species within the Ecdysozoa, we proposed that this ML approach would be particularly well suited for species that are within the same phylum or evolutionary clade. In the present study, we cross-predicted essential genes within the phylum Nematoda (evolutionary clade V), between C. elegans and the pathogenic parasitic nematode Haemonchus contortus, and then ranked and prioritised H. contortus proteins encoded by these genes as intervention (e.g., drug) target candidates. Using strong, validated predictors, we inferred essential genes of H. contortus that are involved predominantly in crucial biological processes/pathways including ribosome biogenesis, translation, RNA binding/processing, and signalling and which are highly transcribed in the germline, somatic gonad precursors, sex myoblasts, vulva cell precursors, various nerve cells, glia, or hypodermis. The findings indicate that this in silico workflow provides a promising avenue to identify and prioritise panels/groups of drug target candidates in parasitic nematodes for experimental validation in vitro and/or in vivo.
Subject(s)
Caenorhabditis elegans, Essential Genes, Haemonchus, Machine Learning, Animals, Haemonchus/genetics, Caenorhabditis elegans/genetics, Helminth Proteins/genetics, Helminth Proteins/metabolism, Computational Biology/methods, Drosophila melanogaster/genetics
ABSTRACT
BACKGROUND: The ability to accurately predict essential genes intolerant to loss-of-function (LOF) mutations can dramatically improve the identification of disease-associated genes. Numerous computational methods have recently been developed to predict human essential genes from population genomic data. While the existing methods are highly predictive of long essential genes, they have limited power to pinpoint short essential genes because polymorphisms are sparse in the human genome. RESULTS: Motivated by the premise that population and functional genomic data may provide complementary evidence for gene essentiality, here we present an evolution-based deep learning model, DeepLOF, to predict essential genes in an unsupervised manner. Unlike previous population genetic methods, DeepLOF utilizes a novel deep learning framework to integrate both population and functional genomic data, allowing us to pinpoint short essential genes that can hardly be predicted from population genomic data alone. Compared with previous methods, DeepLOF shows unmatched performance in predicting ClinGen haploinsufficient genes, mouse essential genes, and essential genes in human cell lines. Notably, at a false positive rate of 5%, DeepLOF detects 50% more ClinGen haploinsufficient genes than previous methods. Furthermore, DeepLOF discovers 109 novel essential genes that are too short to be identified by previous methods. CONCLUSION: The predictive power of DeepLOF shows that it is a compelling computational method to aid in the discovery of essential genes.
Subject(s)
Deep Learning , Essential Genes , Humans , Animals , Mice , Genomics , Metagenomics , Cell Line
ABSTRACT
Essential genes are critical for the growth and survival of any organism. Machine learning complements experimental methods by minimizing the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach lies in predicting conditionally essential genes, because the essentiality status of a gene can change under a specific condition of the organism. This review examines various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology, and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories, mainly because of its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting the use of machine learning to predict conditional gene essentiality is the unavailability of labeled data for the conditions of interest with which to train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions.
SHORT ABSTRACT: Identification of essential genes is imperative because it provides an understanding of an organism's core structure and function and accelerates the discovery of drug targets, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review presents the standard procedure and resources available for predicting essential genes in organisms and highlights the factors responsible for the current limitations in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor in predicting essential genes effectively.
Subject(s)
Algorithms , Computational Biology/methods , Essential Genes/genetics , Machine Learning , Support Vector Machine , Animals , Caenorhabditis elegans/genetics , Gene Ontology , Gene Regulatory Networks , Humans
ABSTRACT
Inducible gene expression systems are important for studying bacterial gene function, yet most exhibit leakage. In this study, we engineered a leakage-free hybrid system for precise control of gene expression in Fusobacterium nucleatum by integrating a xylose-inducible expression system with a theophylline-responsive riboswitch. This method enables concurrent control of target gene expression at both the transcription and translation initiation levels. Using luciferase and the indole-producing enzyme tryptophanase (TnaA) as reporters, we demonstrated that the hybrid system displays virtually no observable signal in the absence of inducers. We employed this system to express FtsX, a protein related to fusobacterial cytokinesis, in an ftsX mutant strain, revealing dose-dependent FtsX production. Without inducers, cells formed long filaments, while raising FtsX levels with increasing inducer concentrations led to a gradual reduction in cell length until normal morphology was restored. Crucially, this system facilitated the investigation of essential genes, identifying the signal peptidase gene lepB as vital for F. nucleatum. LepB is essential because its depletion affects outer membrane biogenesis and cell division. This novel hybrid system holds potential for advancing research on essential genes and accurate gene regulation in F. nucleatum. IMPORTANCE Fusobacterium nucleatum, an anaerobic bacterium prevalent in the human oral cavity, is strongly linked to periodontitis and can colonize areas beyond the oral cavity, such as the placenta and gastrointestinal tract, causing adverse pregnancy outcomes and promoting colorectal cancer growth. Given F. nucleatum's clinical significance, research is underway to develop targeted therapies that inhibit its growth or specifically eradicate the bacterium. Essential genes, crucial for bacterial survival, growth, and reproduction, are promising drug targets.
A leak-free inducible gene expression system is needed for studying these genes, as it enables conditional gene knockouts and helps elucidate the importance of essential genes. Our study identified lepB as an essential gene by first generating a conditional gene mutation in F. nucleatum. Combining a xylose-inducible system with a riboswitch facilitated the analysis of essential genes in F. nucleatum, paving the way for potential drug development targeting this bacterium for various clinical applications.