Results 1 - 20 of 240
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605639

ABSTRACT

The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating its effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, whose results promise to provide valuable references for further wet-lab experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.


Subjects
Biological Science Disciplines , Automated Pattern Recognition , Algorithms , Machine Learning , Semantics
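
The abstract above describes disease gene prediction as knowledge graph completion with tensor decomposition. As a rough illustration only, the sketch below ranks candidate genes for a disease with a generic DistMult-style trilinear score. It does not reproduce KDGene's interaction module, and the embedding values and the names `GENE_A`, `GENE_B`, `GENE_C` and `associated_with` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def trilinear_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """DistMult-style score: sum over i of head[i] * relation[i] * tail[i]."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_genes(
    disease: Vector, relation: Vector, genes: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score every candidate gene for (disease, relation, ?) and sort descending."""
    scored = [
        (name, trilinear_score(disease, relation, vec))
        for name, vec in genes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-picked embeddings (hypothetical values, not learned from data).
disease_vec = [1.0, 0.0, 1.0]
associated_with = [1.0, 1.0, 0.5]
gene_vecs = {
    "GENE_A": [0.9, 0.1, 0.8],
    "GENE_B": [0.1, 0.9, 0.0],
    "GENE_C": [0.5, 0.5, 0.5],
}

ranking = rank_genes(disease_vec, associated_with, gene_vecs)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a trained model the embeddings would be learned from the triples of the knowledge graph; here they are fixed by hand so that the ranking step can be read in isolation.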
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38152979

ABSTRACT

The identification and characterization of essential genes are central to our understanding of the core biological functions in eukaryotic organisms, and have important implications for the treatment of diseases caused by, for example, cancers and pathogens. Given the major constraints in testing the functions of genes of many organisms in the laboratory, due to the absence of in vitro cultures and/or gene perturbation assays for most metazoan species, there has been a need to develop in silico tools for the accurate prediction or inference of essential genes to underpin systems biological investigations. Major advances in machine learning approaches provide unprecedented opportunities to overcome these limitations and accelerate the discovery of essential genes on a genome-wide scale. Here, we developed and evaluated a large language model- and graph neural network (LLM-GNN)-based approach, called 'Bingo', to predict essential protein-coding genes in the metazoan model organisms Caenorhabditis elegans and Drosophila melanogaster as well as in Mus musculus and Homo sapiens (a HepG2 cell line) by integrating LLM and GNNs with adversarial training. Bingo predicts essential genes under two 'zero-shot' scenarios with transfer learning, showing promise to compensate for a lack of high-quality genomic and proteomic data for non-model organisms. In addition, the attention mechanisms and GNNExplainer were employed to reveal the functional sites and structural domains that contribute most to essentiality. In conclusion, Bingo provides the prospect of being able to accurately infer the essential genes of little- or under-studied organisms of interest, and provides a biological explanation for gene essentiality.


Subjects
Drosophila Proteins , Essential Genes , Mice , Animals , Proteomics , Drosophila melanogaster/genetics , Workflow , Computer Neural Networks , Proteins/genetics , Microfilament Proteins/genetics , Drosophila Proteins/genetics
3.
BMC Genomics ; 25(1): 619, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898442

ABSTRACT

Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.


Subjects
Agricultural Crops , Plant Genome , Molecular Sequence Annotation , Proteomics , Agricultural Crops/genetics , Proteomics/methods , Genomics/methods , Plant Proteins/genetics , Plant Proteins/metabolism
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35880623

ABSTRACT

Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising as it provides a more intuitive representation and is suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modelling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.


Subjects
Machine Learning , Automated Pattern Recognition , Knowledge
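
The abstract above reports that densely connected entities tend to be ranked highly regardless of context. One simple way to probe for this, sketched here under our own assumptions rather than taken from the paper, is a degree-only baseline that ranks candidates purely by how often they appear in the triples. If a trained model's ranking closely matches this baseline, entity frequency may be driving the predictions. All entity and relation names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def entity_degrees(triples: Iterable[Triple]) -> Counter:
    """Count how many triples each entity takes part in, as head or tail."""
    degrees: Counter = Counter()
    for head, _relation, tail in triples:
        degrees[head] += 1
        degrees[tail] += 1
    return degrees


def degree_only_ranking(triples: Iterable[Triple], candidates: Iterable[str]) -> List[str]:
    """Rank candidates by degree alone (ties broken alphabetically)."""
    degrees = entity_degrees(triples)
    return sorted(candidates, key=lambda entity: (-degrees[entity], entity))


# Toy knowledge graph (hypothetical entities and relations).
toy_triples = [
    ("GENE_HUB", "interacts_with", "GENE_X"),
    ("GENE_HUB", "interacts_with", "GENE_Y"),
    ("GENE_HUB", "expressed_in", "TISSUE_1"),
    ("GENE_X", "associated_with", "DISEASE_1"),
    ("GENE_Y", "expressed_in", "TISSUE_1"),
]

baseline = degree_only_ranking(toy_triples, ["GENE_X", "GENE_Y", "GENE_HUB"])
print(baseline)
```

Note that `GENE_HUB` tops the baseline even though only `GENE_X` has a disease association in the toy graph, which is the kind of topological imbalance the abstract warns about.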
5.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36151740

ABSTRACT

Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.


Subjects
Machine Learning , Automated Pattern Recognition , Drug Discovery , Knowledge , Information Storage and Retrieval
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34953465

ABSTRACT

Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and Baltimore Longitudinal Study of Aging studies, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.


Subjects
Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Brain/metabolism , Gene Regulatory Networks , Genetic Predisposition to Disease , Aging/genetics , Gene Expression Profiling , Humans , Longitudinal Studies , Memory , Proteomics , RNA-Seq , Transcriptome
7.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35551347

ABSTRACT

Understanding the biological functions of molecules in specific human tissues or cell types is crucial for gaining insights into human physiology and disease. To address this issue, it is essential to systematically uncover associations among multilevel elements consisting of disease phenotypes, tissues, cell types and molecules, which could pose a challenge because of their heterogeneity and incompleteness. To address this challenge, we describe a new methodological framework, called Graph Local InfoMax (GLIM), based on a human multilevel network (HMLN) that we established by introducing multiple tissues and cell types on top of molecular networks. GLIM can systematically mine the potential relationships between multilevel elements by embedding the features of the HMLN through contrastive learning. Our simulation results demonstrated that GLIM consistently outperforms other state-of-the-art algorithms in disease gene prediction. Moreover, GLIM was also successfully used to infer cell markers and rewire intercellular and molecular interactions in the context of specific tissues or diseases. As a typical case, the tissue-cell-molecule network underlying gastritis and gastric cancer was first uncovered by GLIM, providing systematic insights into the mechanism underlying the occurrence and development of gastric cancer. Overall, our constructed methodological framework has the potential to systematically uncover complex disease mechanisms and mine high-quality relationships among phenotypical, tissue, cellular and molecular elements.


Subjects
Computational Biology , Stomach Neoplasms , Algorithms , Computational Biology/methods , Computer Simulation , Humans
8.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35275996

ABSTRACT

MOTIVATION: Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, the mining and effective utilization of the module structure is still challenging in tasks such as disease gene prediction. RESULTS: We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods and its further performance improvement derived from the parameter estimation. CONCLUSIONS: The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in tasks such as disease-gene prediction.


Subjects
Algorithms , Computational Biology , Computational Biology/methods , Statistical Models , Proteins
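
The abstract above states that HyMM estimates the disease relatedness of genes from the abundance of disease-related genes within their module. The exact statistic is not given in the abstract, so the sketch below assumes the simplest possible choice: each gene is scored by the fraction of known disease genes in its module. The function name and gene identifiers are hypothetical, and the multiscale partitioning and rank-integration steps are not shown.

```python
from typing import Dict, Iterable, Set


def module_abundance_scores(
    modules: Iterable[Set[str]], known_disease_genes: Set[str]
) -> Dict[str, float]:
    """Give each gene the fraction of known disease genes found in its module."""
    scores: Dict[str, float] = {}
    for members in modules:
        if not members:
            continue  # skip empty modules to avoid dividing by zero
        fraction = len(members & known_disease_genes) / len(members)
        for gene in members:
            scores[gene] = fraction
    return scores


# One partition of a toy network into two modules (hypothetical gene names).
partition = [{"G1", "G2", "G3", "G4"}, {"G5", "G6"}]
known = {"G1", "G2", "G3"}

scores = module_abundance_scores(partition, known)
print(sorted(scores.items()))
```

Here the unlabeled gene `G4` inherits a high score because three of the four genes in its module are known disease genes, while `G5` and `G6` score zero. Repeating this over partitions at several scales would give one ranking per scale to be integrated.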
9.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34727570

ABSTRACT

Brain disease gene identification is critical for revealing the biological mechanism and developing drugs for brain diseases. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted for narrowing down the searching space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For the consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are utilized to predict brain disease genes based on a support vector machine-based model. We evaluate brainMI on four brain diseases, including Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes compared to the existing three state-of-the-art methods.


Subjects
Alzheimer Disease , Connectome , Major Depressive Disorder , Brain/diagnostic imaging , Connectome/methods , Humans , Magnetic Resonance Imaging/methods
10.
J Eukaryot Microbiol ; : e13038, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38934348

ABSTRACT

Since the advent of sequencing techniques and due to their continuous evolution, it has become easier and less expensive to obtain the complete genome sequence of any organism. Nevertheless, to elucidate all biological processes governing organism development, quality annotation is essential. In genome annotation, predicting gene structure is one of the most important and captivating challenges for computational biology. This aspect of annotation requires continual optimization, particularly for genomes as unusual as those of microsporidia. Indeed, this group of fungal-related parasites exhibits specific features (highly reduced gene sizes, sequences with high rate of evolution) linked to their evolution as intracellular parasites, requiring the implementation of specific annotation approaches to consider all these features. This review aimed to outline these characteristics and to assess the increasingly efficient approaches and tools that have enhanced the accuracy of gene prediction for microsporidia, both in terms of sensitivity and specificity. Subsequently, a final part will be dedicated to postgenomic approaches aimed at reinforcing the annotation data generated by prediction software. These approaches include the characterization of other understudied genes, such as those encoding regulatory noncoding RNAs or very small proteins, which also play crucial roles in the life cycle of these microorganisms.

11.
Environ Res ; 255: 119187, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-38777295

ABSTRACT

The issue of combined pollution in oligotrophic water has garnered increasing attention in recent years. To enhance the pollutant removal efficiency in oligotrophic water, a system containing Zoogloea sp. FY6 was constructed using polyester fiber wrapped sugarcane biochar and construction waste iron (PWSI), and denitrification tests of simulated water and actual oligotrophic water were carried out for 35 days. The experimental findings from the systems indicated that the removal efficiencies of nitrate (NO3⁻-N), total nitrogen (TN), chemical oxygen demand (COD), and total phosphorus (TP) in simulated water were 88.61%, 85.23%, 94.28%, and 98.90%, respectively. The corresponding removal efficiencies in actual oligotrophic water were 83.06%, 81.39%, 81.66%, and 97.82%, respectively. Furthermore, the high-throughput sequencing data demonstrated that strain FY6 was successfully loaded onto the biological carrier. According to functional gene predictions derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, the introduction of PWSI enhanced intracellular iron cycling and nitrogen metabolism.


Subjects
Charcoal , Iron , Nitrogen , Phosphorus , Chemical Water Pollutants , Phosphorus/analysis , Nitrogen/analysis , Nitrogen/metabolism , Charcoal/chemistry , Iron/chemistry , Chemical Water Pollutants/analysis , Fluid Waste Disposal/methods
12.
Genomics ; 115(2): 110594, 2023 03.
Article in English | MEDLINE | ID: mdl-36863417

ABSTRACT

Astrocytes activate and crosstalk with neurons, influencing inflammatory responses following ischemic stroke. The distribution, abundance, and activity of microRNAs in astrocyte-derived exosomes after ischemic stroke remain largely unknown. In this study, exosomes were extracted from primary cultured mouse astrocytes via ultracentrifugation, and exposed to oxygen glucose deprivation/reoxygenation injury to mimic experimental ischemic stroke. Small RNAs from astrocyte-derived exosomes were sequenced, and differentially expressed microRNAs were randomly selected and verified by stem-loop real-time quantitative polymerase chain reaction. We found that 176 microRNAs, including 148 known and 28 novel microRNAs, were differentially expressed in astrocyte-derived exosomes following oxygen glucose deprivation/reoxygenation injury. In gene ontology enrichment, Kyoto Encyclopedia of Genes and Genomes pathway analyses, and microRNA target gene prediction analyses, these alterations in microRNAs were associated with a broad spectrum of physiological functions including signaling transduction, neuroprotection and stress responses. Our findings warrant further investigation of these differentially expressed microRNAs in human diseases, particularly ischemic stroke.


Subjects
Exosomes , Ischemic Stroke , MicroRNAs , Mice , Animals , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Astrocytes/metabolism , Exosomes/genetics , Exosomes/metabolism , Ischemic Stroke/metabolism , Glucose/metabolism , Oxygen/metabolism
13.
Int J Mol Sci ; 25(13)2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39000335

ABSTRACT

In various domains, including everyday activities, agricultural practices, and medical treatments, the escalating challenge of antibiotic resistance poses a significant concern. Traditional approaches to studying antibiotic resistance genes (ARGs) often require substantial time and effort and are limited in accuracy. Moreover, the decentralized nature of existing data repositories complicates comprehensive analysis of antibiotic resistance gene sequences. In this study, we introduce a novel computational framework named TGC-ARG designed to predict potential ARGs. This framework takes protein sequences as input, utilizes SCRATCH-1D for protein secondary structure prediction, and employs feature extraction techniques to derive distinctive features from both sequence and structural data. Subsequently, a Siamese network is employed to foster a contrastive learning environment, enhancing the model's ability to effectively represent the data. Finally, a multi-layer perceptron (MLP) integrates and processes sequence embeddings alongside predicted secondary structure embeddings to forecast ARG presence. To evaluate our approach, we curated a pioneering open dataset termed ARSS (Antibiotic Resistance Sequence Statistics). Comprehensive comparative experiments demonstrate that our method surpasses current state-of-the-art methodologies. Additionally, through detailed case studies, we illustrate the efficacy of our approach in predicting potential ARGs.


Subjects
Microbial Drug Resistance , Microbial Drug Resistance/genetics , Computational Biology/methods , Secondary Protein Structure , Machine Learning , Anti-Bacterial Agents/pharmacology , Bacterial Drug Resistance/genetics , Computer Neural Networks
14.
BMC Bioinformatics ; 24(1): 327, 2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37653395

ABSTRACT

BACKGROUND: The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. RESULTS: Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments. CONCLUSIONS: Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.


Subjects
Eukaryota , Eukaryotic Cells , Animals , Molecular Sequence Annotation , Transcriptome
15.
BMC Bioinformatics ; 24(1): 379, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37803253

ABSTRACT

PURPOSE: Autism spectrum disorder (ASD) is a disease associated with the neurodevelopment of the brain. The autism spectrum can be observed in early childhood, where the symptoms of the disease usually appear in children within the first year of their life. Currently, ASD can only be diagnosed based on the apparent symptoms due to the lack of information on genes related to the disease. Therefore, in this paper, we aim to predict as many disease-causing genes as possible to support better diagnosis. METHODS: A hybrid stacking ensemble model with Synthetic Minority Oversampling TEchnique (Stacking-SMOTE) is proposed to predict the genes associated with ASD. The proposed model uses the gene ontology database to measure the similarities between the genes using a hybrid gene similarity function (HGS). HGS is effective in measuring the similarity as it combines the features of information gain-based methods and graph-based methods. The proposed model solves the imbalanced ASD dataset problem using the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic data rather than duplicating existing data, to reduce overfitting. Subsequently, a gradient boosting-based random forest classifier (GBBRF) is introduced as a new combination technique to enhance the prediction of ASD genes. Moreover, the GBBRF classifier is combined with random forest (RF), k-nearest neighbor, support vector machine (SVM), and logistic regression (LR) to form the proposed Stacking-SMOTE model to optimize the prediction of ASD genes. RESULTS: The proposed Stacking-SMOTE model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database and a set of candidate ASD genes. The proposed SMOTE-based model outperforms other reported undersampling and oversampling techniques. In addition, GBBRF achieves higher accuracy than the basic classifiers. Moreover, the experimental results show that the proposed Stacking-SMOTE model outperforms the existing ASD prediction models with approximately 95.5% accuracy. CONCLUSION: The proposed Stacking-SMOTE model demonstrates that SMOTE is effective in handling the imbalanced autism data. Furthermore, the integration of gradient boosting and the random forest classifier (GBBRF) supports building a robust stacking ensemble model (Stacking-SMOTE).


Subjects
Autism Spectrum Disorder , Autistic Disorder , Preschool Child , Child , Humans , Autistic Disorder/genetics , Autism Spectrum Disorder/genetics , Random Forest Algorithm , Support Vector Machine , Phenotype
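
The abstract above relies on SMOTE, which creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours instead of duplicating existing rows. The following is a minimal standard-library sketch of that interpolation step only. It is not the authors' pipeline, and the function name, parameters and toy data are our own assumptions.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points on segments between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding the base itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (hypothetical values).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority_samples, n_new=5)
print(new_points)
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained implementation such as the one in the imbalanced-learn package would normally be preferred over a hand-written version.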
16.
BMC Bioinformatics ; 24(1): 162, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37085750

ABSTRACT

BACKGROUND: The identification of disease-related genes is of great significance for the diagnosis and treatment of human disease. Most studies have focused on developing efficient and accurate computational methods to predict disease-causing genes. Due to the sparsity and complexity of biomedical data, it is still a challenge to develop an effective multi-feature fusion model to identify disease genes. RESULTS: This paper proposes an approach to predict pathogenic genes based on multi-head attention fusion (MHAGP). Firstly, the heterogeneous biological information networks of disease genes are constructed by integrating multiple biomedical knowledge databases. Secondly, two graph representation learning algorithms are used to capture the feature vectors of gene-disease pairs from the network, and the features are fused by introducing multi-head attention. Finally, a multi-layer perceptron model is used to predict the gene-disease association. CONCLUSIONS: The MHAGP model outperforms all other methods in comparative experiments. Case studies also show that MHAGP is able to predict genes potentially associated with diseases. In the future, more biological entity association data, such as gene-drug, disease phenotype-gene ontology and so on, can be added to expand the information in heterogeneous biological networks and achieve more accurate predictions. In addition, MHAGP is readily extensible and can be used for potential tasks such as gene-drug association and drug-disease association prediction.


Subjects
Computational Biology , Computer Neural Networks , Humans , Computational Biology/methods , Algorithms , Knowledge
17.
Plant J ; 110(6): 1592-1602, 2022 06.
Article in English | MEDLINE | ID: mdl-35365907

ABSTRACT

The activation of plant immunity is mediated by resistance (R)-gene receptors, also known as nucleotide-binding leucine-rich repeat (NB-LRR) genes, which in turn trigger the authentic defense response. R-gene identification is a crucial goal for both classic and modern plant breeding strategies for disease resistance. The conventional method identifies NB-LRR genes using a protein motif/domain-based search (PDS) within an automatically predicted gene set of the respective genome assembly. PDS proved to be imprecise since repeat masking prior to automatic genome annotation unwittingly prevented comprehensive NB-LRR gene detection. Furthermore, R-genes have diversified in a species-specific manner, so that NB-LRR gene identification cannot be universally standardized. Here, we present the full-length Homology-based R-gene Prediction (HRP) method for the comprehensive identification and annotation of a genome's R-gene repertoire. Our method has substantially addressed the complex genomic organization of tomato (Solanum lycopersicum) NB-LRR gene loci, proving to be more performant than the well-established RenSeq approach. HRP efficiency was also tested on three differently assembled and annotated Beta sp. genomes. Indeed, HRP identified up to 45% more full-length NB-LRR genes compared to previous approaches. HRP also turned out to be a more refined strategy for R-gene allele mining, testified by the identification of hitherto undiscovered Fom-2 homologs in five Cucurbita sp. genomes. In summary, our high-performance method for full-length NB-LRR gene discovery will propel the identification of novel R-genes towards development of improved cultivars.


Subjects
Plant Genes , Solanum lycopersicum , Disease Resistance/genetics , Plant Genes/genetics , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism , Plant Breeding , Plant Diseases/genetics , Plant Proteins/genetics , Plant Proteins/metabolism , Sequence Homology
18.
BMC Med ; 21(1): 267, 2023 07 24.
Article in English | MEDLINE | ID: mdl-37488529

ABSTRACT

BACKGROUND: Comorbidities are expected to impact the pathophysiology of heart failure (HF) with preserved ejection fraction (HFpEF). However, comorbidity profiles are usually reduced to a few comorbid disorders. Systems medicine approaches can model phenome-wide comorbidity profiles to improve our understanding of HFpEF and infer associated genetic profiles. METHODS: We retrospectively explored 569 comorbidities in 29,047 HF patients, including 8062 HFpEF and 6585 HF with reduced ejection fraction (HFrEF) patients from a German university hospital. We assessed differences in comorbidity profiles between HF subtypes via multiple correspondence analysis. Then, we used machine learning classifiers to identify distinctive comorbidity profiles of HFpEF and HFrEF patients. Moreover, we built a comorbidity network (HFnet) to identify the main disease clusters that summarized the phenome-wide comorbidity. Lastly, we predicted novel gene candidates for HFpEF by linking the HFnet to a multilayer gene network, integrating multiple databases. To corroborate HFpEF candidate genes, we collected transcriptomic data in a murine HFpEF model. We compared predicted genes with the murine disease signature as well as with the literature. RESULTS: We found a high degree of variance between the comorbidity profiles of HFpEF and HFrEF, while each was more similar to HFmrEF. The comorbidities present in HFpEF patients were more diverse than those in HFrEF and included neoplastic, osteologic and rheumatoid disorders. Disease communities in the HFnet captured important comorbidity concepts of HF patients, which could be assigned to HF subtypes, age groups, and sex. Based on the HFpEF comorbidity profile, we predicted and recovered gene candidates, including genes involved in fibrosis (COL3A1, LOX, SMAD9, PTHL), hypertrophy (GATA5, MYH7), oxidative stress (NOS1, GSST1, XDH), and endoplasmic reticulum stress (ATF6). Finally, predicted genes were significantly overrepresented in the murine transcriptomic disease signature, providing additional plausibility for their relevance. CONCLUSIONS: We applied systems medicine concepts to analyze comorbidity profiles in an HF patient cohort. We were able to identify disease clusters that helped to characterize HF patients. We derived a distinct comorbidity profile for HFpEF, which was leveraged to suggest novel candidate genes via network propagation. The identification of distinctive comorbidity profiles and candidate genes from routine clinical data provides insights that may be leveraged to improve diagnosis and identify treatment targets for HFpEF patients.
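
The abstract names network propagation as the step that turns a comorbidity profile into candidate genes but gives no algorithmic detail. As an illustration only, the following is a minimal sketch of one common propagation scheme, random walk with restart, on a toy undirected network. The node names (`SEED`, `GENE_A`, `GENE_B`, `GENE_C`), the restart probability, and the function name are assumptions made for this example and are not taken from the study.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visit probability.

    adjacency: dict mapping node -> list of neighbouring nodes (undirected toy graph).
    seeds: set of starting nodes (hypothetical genes already linked to a disease).
    restart: probability of jumping back to the seed set at each step (assumed value).
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each step starts with the restart mass placed on the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            # Otherwise split the walking mass evenly among the neighbours.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph with hypothetical node names: SEED - GENE_A - GENE_B - GENE_C
toy_network = {
    "SEED": ["GENE_A"],
    "GENE_A": ["SEED", "GENE_B"],
    "GENE_B": ["GENE_A", "GENE_C"],
    "GENE_C": ["GENE_B"],
}
scores = random_walk_with_restart(toy_network, seeds={"SEED"})
ranked_candidates = sorted(
    (n for n in scores if n != "SEED"), key=lambda n: scores[n], reverse=True
)
```

In this toy setting, nodes closer to the seed receive higher scores, which is the intuition behind ranking candidate genes by their proximity to genes already linked to a disease profile.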


Subjects
Heart Failure ; Medicine ; Humans ; Animals ; Mice ; Retrospective Studies ; Stroke Volume ; Comorbidity
19.
Brief Bioinform ; 22(4)2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33367541

ABSTRACT

In disease research, the study of gene-disease correlations has long been an important topic. With the emergence of large-scale connected data sets in biology, we use known correlations between entities, which may come from different data sets, to build a heterogeneous biological network. We propose a new network embedding representation algorithm that calculates a correlation score between diseases and genes and uses this score to predict pathogenic genes. We then conduct several experiments to compare our method with other state-of-the-art methods. The results show that our method achieves better performance than the traditional methods.


Subjects
Algorithms ; Computational Biology ; Gene Regulatory Networks ; Humans
20.
Brief Bioinform ; 22(6)2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34015806

ABSTRACT

Bacterial strains that show phenotypic resistance to antibiotics without any known underlying genetic components are being observed with increasing frequency. Several such strains lack known resistance genes yet demonstrate a resistance phenotype to drugs of the corresponding family. Although such strains are few relative to the overall population, they pose a grave emerging threat to the already heavily challenged area of antimicrobial resistance (AMR), where the death toll has reached ~700 000 per year and is projected to reach ~10 million per year by 2050. Because the development of novel antibiotics is not keeping pace with the emergence and dissemination of resistance, there is a pressing need to decipher as-yet-unknown genetic mechanisms of resistance, which will enable strategies for the best use of available interventions and guide the development of new drugs. In this study, we present a machine learning framework to predict novel AMR factors that are potentially responsible for resistance to specific antimicrobial drugs. The framework uses whole-genome sequencing AMR genetic data and antimicrobial susceptibility testing phenotypic data to predict resistance phenotypes and rank AMR genes by their importance in discriminating resistant from susceptible phenotypes. In summary, we present a bioinformatics framework for training machine learning models, evaluating their performance, selecting the best-performing model(s) and finally predicting the most important AMR loci for the resistance involved.
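
The abstract describes ranking genes by how well they discriminate resistant from susceptible isolates, without naming the models used. As a rough illustration of the idea only, the sketch below ranks genes on a toy presence/absence matrix by univariate information gain. This is a simple stand-in for model-based feature importance and is not the authors' framework; the gene names, the data, and the function names are all hypothetical.

```python
from math import log2


def entropy(labels):
    """Binary Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(presence, labels):
    """Reduction in label entropy after splitting isolates by gene presence."""
    with_gene = [y for x, y in zip(presence, labels) if x == 1]
    without_gene = [y for x, y in zip(presence, labels) if x == 0]
    n = len(labels)
    conditional = (len(with_gene) / n) * entropy(with_gene) + (
        len(without_gene) / n
    ) * entropy(without_gene)
    return entropy(labels) - conditional


def rank_genes(matrix, gene_names, labels):
    """Return (gene, score) pairs sorted from most to least informative."""
    scored = [
        (gene, information_gain([row[j] for row in matrix], labels))
        for j, gene in enumerate(gene_names)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: rows are isolates, columns are gene presence (1) or absence (0).
gene_names = ["geneA", "geneB", "geneC"]
matrix = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = resistant, 0 = susceptible

ranking = rank_genes(matrix, gene_names, labels)
```

Here `geneA` separates the two phenotypes perfectly and ranks first, while `geneC`, present in every isolate, carries no information and ranks last. A real pipeline would instead derive importances from trained and validated models, as the abstract outlines.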


Subjects
Anti-Bacterial Agents ; Bacteria/drug effects ; Computational Biology/methods ; Drug Resistance, Bacterial/drug effects ; Machine Learning ; Algorithms ; Anti-Bacterial Agents/pharmacology ; Bacteria/genetics ; Computational Biology/standards ; Genotype ; Phenotype ; Reproducibility of Results