Results 1 - 20 of 3,692
1.
Development ; 151(3)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38230566

ABSTRACT

Research in model organisms is central to the characterization of signaling pathways in multicellular organisms. Here, we present the comprehensive and systematic curation of 17 Drosophila signaling pathways using the Gene Ontology framework to establish a dynamic resource that has been incorporated into FlyBase, providing visualization and data integration tools to aid research projects. By restricting to experimental evidence reported in the research literature and quantifying the amount of such evidence for each gene in a pathway, we captured the landscape of empirical knowledge of signaling pathways in Drosophila.


Subject(s)
Genetic Databases , Drosophila , Animals , Drosophila/genetics , Gene Ontology , Signal Transduction , Drosophila melanogaster/genetics
2.
Proc Natl Acad Sci U S A ; 121(37): e2319804121, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39226356

ABSTRACT

The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.


Subject(s)
Brain , Animals , Mice , Brain/metabolism , Gene Expression Profiling/methods , Transcriptome , Algorithms , Unsupervised Machine Learning , Gene Ontology , Atlases as Topic , Gene Regulatory Networks , Principal Component Analysis
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557678

ABSTRACT

Disease ontologies facilitate the semantic organization and representation of domain-specific knowledge. In the case of prostate cancer (PCa), large volumes of research results and clinical data have accumulated and need to be standardized for sharing and translational research. A formal representation of PCa-associated knowledge will be essential to standardizing and sharing diverse data and to future knowledge graph extraction, deep phenotyping and explainable artificial intelligence development. In this study, we constructed an updated PCa ontology (PCAO2) based on the ontology development life cycle. An online information retrieval system was designed to ensure the usability of the ontology. The PCAO2, with a subclass-based taxonomic hierarchy, covers the major biomedical concepts for PCa-associated genotypic, phenotypic and lifestyle data. The current version of the PCAO2 contains 633 concepts organized under three biomedical viewpoints, namely, epidemiology, diagnosis and treatment. These concepts are enriched by the addition of definitions, synonyms, relationships and references. To support precision diagnosis and treatment, PCa-associated genes and lifestyles are integrated under the epidemiology viewpoint. PCAO2 provides a standardized and systematized semantic framework for studying large amounts of heterogeneous PCa data and knowledge, which can be further edited and enriched by the scientific community. The PCAO2 is freely available at https://bioportal.bioontology.org/ontologies/PCAO, http://pcaontology.net/ and http://pcaontology.net/mobile/.


Subject(s)
Biological Ontologies , Prostatic Neoplasms , Humans , Male , Artificial Intelligence , Semantics , Prostatic Neoplasms/genetics
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446740

ABSTRACT

Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic meaning for each functional label, also known as embedding. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO-directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, to utilize the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method PO2GO, which demonstrates superior performance measured in multiple metrics and annotation specificity as well as few-shot prediction capability in the benchmarks. These results suggest that the high-quality representation of GO structure is critical for diverse biological tasks including computational protein annotation.


Subject(s)
Benchmarking , Computational Biology , Gene Ontology , Learning , Molecular Sequence Annotation
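
The partial order that the abstract above refers to can be illustrated with a small sketch. It is not PO2Vec itself: it only shows, on a hypothetical toy DAG with placeholder labels, how the ancestor relation makes some pairs of terms comparable and leaves others unordered. The function names `ancestors` and `is_comparable` are assumptions made for this example.

```python
def ancestors(term, parents):
    """Return all strict ancestors of `term` in a DAG given as child -> parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_comparable(a, b, parents):
    """True if a and b are equal or one is an ancestor of the other."""
    return a == b or a in ancestors(b, parents) or b in ancestors(a, parents)


# Hypothetical toy DAG; the labels are placeholders, not real GO identifiers.
toy_parents = {
    "child": ["left", "right"],
    "left": ["root"],
    "right": ["root"],
    "other": ["left"],
}
```

In this toy graph, `child` and `root` are comparable, while `left` and `right` are not, even though they share a parent. Co-occurrence of ancestors alone would not distinguish these cases.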
5.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39038936

ABSTRACT

Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND (one of the most popular tools for function prediction), under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO prediction from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.


Subject(s)
Protein Databases , Proteins , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Computational Biology/methods , Gene Ontology , Algorithms , Protein Sequence Analysis/methods , Software , Machine Learning
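
Homology-based function transfer, as described in the abstract above, can be sketched in a few lines. The example scores each GO term by the bit-score-weighted share of hits that carry it. This is a simple baseline weighting chosen for illustration and is not the new scoring function the paper proposes. The function name `score_go_terms`, the hit list and the annotation table are all hypothetical.

```python
from collections import defaultdict


def score_go_terms(hits, annotations):
    """Score GO terms for a query from its homologous hits.

    hits: list of (subject_id, bitscore) pairs from a sequence search.
    annotations: dict mapping subject_id -> set of GO term IDs.
    Returns a dict mapping GO term -> score in [0, 1].
    """
    total = sum(bitscore for _, bitscore in hits)
    if total <= 0:
        return {}
    scores = defaultdict(float)
    for subject_id, bitscore in hits:
        # Each hit contributes its share of the total bit score
        # to every term it is annotated with.
        for term in annotations.get(subject_id, set()):
            scores[term] += bitscore / total
    return dict(scores)


# Hypothetical toy data for illustration only.
toy_hits = [("P1", 200.0), ("P2", 100.0), ("P3", 100.0)]
toy_annotations = {
    "P1": {"GO:0003677", "GO:0005634"},
    "P2": {"GO:0003677"},
    "P3": {"GO:0016301"},
}
toy_scores = score_go_terms(toy_hits, toy_annotations)
```

A term supported by several strong hits ends up near 1, and a term carried by a single weak hit stays low. Whichever search tool and parameters produce `hits`, the scoring step stays the same, so the two choices the study examines can be varied independently.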
6.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39126426

ABSTRACT

Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. The integration of biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and generalizable. However, the wider availability of machine learning tools capable of incorporating such biological knowledge remains limited. Addressing this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 uniquely leverages biological information to effectively stratify and aggregate high-dimensional biological data in the context of machine learning. Demonstrating its utility with genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids in the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the intricate systems-level biology underpinning the predictive outcome. We have proposed a biologically informed multistage machine learning framework termed BioM2 for phenotype prediction based on omics data. BioM2 has been incorporated into the BioM2 CRAN package (https://cran.r-project.org/web/packages/BioM2/index.html).


Subject(s)
Machine Learning , Phenotype , Humans , DNA Methylation , Algorithms , Computational Biology/methods , Software , Transcriptome , Genomics/methods
7.
Proc Natl Acad Sci U S A ; 120(34): e2221473120, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37579152

ABSTRACT

Collective intelligence has emerged as a powerful mechanism to boost decision accuracy across many domains, such as geopolitical forecasting, investment, and medical diagnostics. However, collective intelligence has been mostly applied to relatively simple decision tasks (e.g., binary classifications). Applications in more open-ended tasks with a much larger problem space, such as emergency management or general medical diagnostics, are largely lacking, due to the challenge of integrating unstandardized inputs from different crowd members. Here, we present a fully automated approach for harnessing collective intelligence in the domain of general medical diagnostics. Our approach leverages semantic knowledge graphs, natural language processing, and the SNOMED CT medical ontology to overcome a major hurdle to collective intelligence in open-ended medical diagnostics, namely to identify the intended diagnosis from unstructured text. We tested our method on 1,333 medical cases diagnosed on a medical crowdsourcing platform: The Human Diagnosis Project. Each case was independently rated by ten diagnosticians. Comparing the diagnostic accuracy of single diagnosticians with the collective diagnosis of differently sized groups, we find that our method substantially increases diagnostic accuracy: While single diagnosticians achieved 46% accuracy, pooling the decisions of ten diagnosticians increased this to 76%. Improvements occurred across medical specialties, chief complaints, and diagnosticians' tenure levels. Our results show the life-saving potential of tapping into the collective intelligence of the global medical community to reduce diagnostic errors and increase patient safety.


Subject(s)
Crowdsourcing , Intelligence , Humans , Diagnostic Errors
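
Once free-text diagnoses have been mapped to standard concepts, as the abstract above describes, pooling them is a voting problem. The abstract does not state the aggregation rule, so the sketch below assumes reciprocal-rank weighted voting. The function name `pool_diagnoses` and the concept labels are hypothetical placeholders, not SNOMED CT codes.

```python
from collections import Counter


def pool_diagnoses(ranked_lists, top_k=1):
    """Pool ranked diagnosis lists by reciprocal-rank weighted voting.

    ranked_lists: one list per diagnostician, most likely diagnosis first,
    with every entry already normalized to a concept identifier.
    Returns the top_k concepts by total vote.
    """
    votes = Counter()
    for ranked in ranked_lists:
        for rank, concept in enumerate(ranked, start=1):
            votes[concept] += 1.0 / rank
    # Sort by descending vote, then by concept label for a deterministic order.
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ordered[:top_k]]


# Hypothetical ratings from four diagnosticians for one case.
toy_raters = [
    ["concept_A", "concept_B"],
    ["concept_B", "concept_A"],
    ["concept_B"],
    ["concept_C", "concept_B"],
]
toy_collective = pool_diagnoses(toy_raters, top_k=1)
```

Voting only works because the inputs share a vocabulary. Two raters who write the same diagnosis in different words would otherwise split their votes, which is the integration hurdle the ontology mapping step is meant to remove.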
8.
Plant J ; 118(2): 304-323, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38265362

ABSTRACT

The model moss species Physcomitrium patens has long been used for studying divergence of land plants spanning from bryophytes to angiosperms. In addition to its phylogenetic relationships, the limited number of differential tissues, and comparable morphology to the earliest embryophytes provide a system to represent basic plant architecture. Based on plant-fungal interactions today, it is hypothesized these kingdoms have a long-standing relationship, predating plant terrestrialization. Mortierellaceae have origins diverging from other land fungi paralleling bryophyte divergence, are related to arbuscular mycorrhizal fungi but are free-living, observed to interact with plants, and can be found in moss microbiomes globally. Due to their parallel origins, we assess here how two Mortierellaceae species, Linnemannia elongata and Benniella erionia, interact with P. patens in coculture. We also assess how Mollicute-related or Burkholderia-related endobacterial symbionts (MRE or BRE) of these fungi impact plant response. Coculture interactions are investigated through high-throughput phenomics, microscopy, RNA-sequencing, differential expression profiling, gene ontology enrichment, and comparisons among 99 other P. patens transcriptomic studies. Here we present new high-throughput approaches for measuring P. patens growth, identify novel expression of over 800 genes that are not expressed on traditional agar media, identify subtle interactions between P. patens and Mortierellaceae, and observe changes to plant-fungal interactions dependent on whether MRE or BRE are present. Our study provides insights into how plants and fungal partners may have interacted based on their communications observed today as well as identifying L. elongata and B. erionia as modern fungal endophytes with P. patens.


Subject(s)
Bryophyta , Bryopsida , Mycorrhizae , Phylogeny , Endophytes/metabolism , Multilevel Analysis , Plant Proteins/metabolism , Bryopsida/genetics , Bryopsida/metabolism , Bryophyta/genetics , Bryophyta/metabolism , Mycorrhizae/metabolism
9.
Am J Hum Genet ; 109(9): 1591-1604, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35998640

ABSTRACT

Diagnosis for rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated with additional annotations found in published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotation for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with that annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed true associations via manual review of a list of sampled pairs. Compared to the manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.


Subject(s)
Natural Language Processing , Rare Diseases , Electronic Health Records , Humans , Phenotype , Rare Diseases/genetics
10.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37861172

ABSTRACT

Protein function annotation is one of the most important research topics for revealing the essence of life at the molecular level in the post-genome era. Current research shows that integrating multisource data can effectively improve the performance of protein function prediction models. However, the heavy reliance on complex feature engineering and model integration methods limits the development of existing methods. Besides, models based on deep learning only use labeled data in a certain dataset to extract sequence features, thus ignoring a large amount of existing unlabeled sequence data. Here, we propose an end-to-end protein function annotation model named HNetGO, which innovatively uses a heterogeneous network to integrate protein sequence similarity and protein-protein interaction network information and combines a pretraining model to extract the semantic features of the protein sequence. In addition, we design an attention-based graph neural network model, which can effectively extract node-level features from heterogeneous networks and predict protein function by measuring the similarity between protein nodes and gene ontology term nodes. Comparative experiments on the human dataset show that HNetGO achieves state-of-the-art performance on the cellular component and molecular function branches.


Subject(s)
Computer Neural Networks , Protein Interaction Maps , Humans , Amino Acid Sequence , Gene Ontology , Molecular Sequence Annotation
11.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37756593

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. These data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where we got about 92% versus 97% of the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.


Subject(s)
Deep Learning , Gene Ontology , Single-Cell Gene Expression Analysis , Algorithms , Genomics
12.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37248747

ABSTRACT

Human Phenotype Ontology (HPO)-based approaches have gained popularity in recent times as a tool for genomic diagnostics of rare diseases. However, these approaches do not make full use of the available information on disease and patient phenotypes. We present a new method called Phen2Disease, which utilizes the bidirectional maximum matching semantic similarity between two phenotype sets of patients and diseases to prioritize diseases and genes. Our comprehensive experiments have been conducted on six real data cohorts with 2051 cases (Cohort 1, n = 384; Cohort 2, n = 281; Cohort 3, n = 185; Cohort 4, n = 784; Cohort 5, n = 208; and Cohort 6, n = 209) and two simulated data cohorts with 1000 cases. The results of the experiments showed that Phen2Disease outperforms the three state-of-the-art methods when only phenotype information and HPO knowledge base are used, particularly in cohorts with fewer average numbers of HPO terms. We also observed that patients with higher information content scores have more specific information, leading to more accurate predictions. Moreover, Phen2Disease provides high interpretability with ranked diseases and patient HPO terms presented. Our method provides a novel approach to utilizing phenotype data for genomic diagnostics of rare diseases, with potential for clinical impact. Phen2Disease is freely available on GitHub at https://github.com/ZhuLab-Fudan/Phen2Disease.


Subject(s)
Biological Ontologies , Rare Diseases , Humans , Semantics , Genomics , Phenotype
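
The bidirectional maximum matching idea named in the abstract above can be sketched as follows. Each term in one set is matched to its most similar term in the other set, the best matches are averaged, and the two directions are combined. This is an unweighted illustration under stated assumptions, not the Phen2Disease scoring itself: term similarity is taken here to be the Jaccard overlap of ancestor sets, and the toy ontology, labels and function names are hypothetical.

```python
def term_similarity(a, b, ancestors):
    """Jaccard overlap of ancestor sets (each set includes the term itself)."""
    set_a, set_b = ancestors[a], ancestors[b]
    return len(set_a & set_b) / len(set_a | set_b)


def best_match_average(source, target, ancestors):
    """Average, over source terms, of the best similarity to any target term."""
    best = [max(term_similarity(s, t, ancestors) for t in target) for s in source]
    return sum(best) / len(best)


def bidirectional_similarity(patient, disease, ancestors):
    """Mean of the patient-to-disease and disease-to-patient best-match averages."""
    forward = best_match_average(patient, disease, ancestors)
    backward = best_match_average(disease, patient, ancestors)
    return 0.5 * (forward + backward)


# Hypothetical toy ontology; labels are placeholders, not real HPO identifiers.
toy_ancestors = {
    "t1": {"root", "t1"},
    "t2": {"root", "t2"},
    "t3": {"root", "t1", "t3"},
}
toy_score = bidirectional_similarity(["t3"], ["t1", "t2"], toy_ancestors)
```

Scoring in both directions penalizes a disease that explains the patient's terms but also has many features the patient lacks, which a one-directional match would miss.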
13.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-37995133

ABSTRACT

Interpreting the function of genes and gene sets identified from omics experiments remains a challenge, as current pathway analysis tools often fail to consider the critical biological context, such as tissue or cell-type specificity. To address this limitation, we introduced CellGO. CellGO tackles this challenge by leveraging the visible neural network (VNN) and single-cell gene expressions to mimic cell-type-specific signaling propagation along the Gene Ontology tree within a cell. This design enables a novel scoring system to calculate the cell-type-specific gene-pathway paired active scores, based on which, CellGO is able to identify cell-type-specific active pathways associated with single genes. In addition, by aggregating the activities of single genes, CellGO extends its capability to identify cell-type-specific active pathways for a given gene set. To enhance biological interpretation, CellGO offers additional features, including the identification of significantly active cell types and driver genes and community analysis of pathways. To validate its performance, CellGO was assessed using a gene set comprising mixed cell-type markers, confirming its ability to discern active pathways across distinct cell types. Subsequent benchmarking analyses demonstrated CellGO's superiority in effectively identifying cell types and their corresponding cell-type-specific pathways affected by gene knockouts, using either single genes or sets of genes differentially expressed between knockout and control samples. Moreover, CellGO demonstrated its ability to infer cell-type-specific pathogenesis for disease risk genes. Accessible as a Python package, CellGO also provides a user-friendly web interface, making it a versatile and accessible tool for researchers in the field.


Subject(s)
Deep Learning , Software , Humans , Disease Susceptibility
14.
J Med Genet ; 61(5): 443-451, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38458754

ABSTRACT

BACKGROUND: Dystonia is one of the most common movement disorders. To date, the genetic causes of dystonia in populations of European descent have been extensively studied. However, other populations, particularly those from the Middle East, have not been adequately studied. The purpose of this study is to discover the genetic basis of dystonia in a clinically and genetically well-characterised dystonia cohort from Turkey, which harbours poorly studied populations. METHODS: Exome sequencing analysis was performed in 42 Turkish dystonia families. Using co-expression network (CEN) analysis, identified candidate genes were interrogated for the networks including known dystonia-associated genes and genes further associated with the protein-protein interaction, animal model-based characteristics and clinical findings. RESULTS: We identified potentially disease-causing variants in the established dystonia genes (PRKRA, SGCE, KMT2B, SLC2A1, GCH1, THAP1, HPCA, TSPOAP1, AOPEP; n=11 families (26%)), in the uncommon forms of dystonia-associated genes (PCCB, CACNA1A, ALDH5A1, PRKN; n=4 families (10%)) and in the candidate genes prioritised based on the pathogenicity of the variants and CEN-based analyses (n=11 families (21%)). The diagnostic yield was found to be 36%. Several pathways and gene ontologies implicated in immune system, transcription, metabolic pathways, endosomal-lysosomal and neurodevelopmental mechanisms were over-represented in our CEN analysis. CONCLUSIONS: Here, using a structured approach, we have characterised a clinically and genetically well-defined dystonia cohort from Turkey, where dystonia has not been widely studied, and provided an uncovered genetic basis, which will facilitate diagnostic dystonia research.


Subject(s)
Dystonia , Dystonic Disorders , Animals , Humans , Dystonia/genetics , Dystonia/diagnosis , Dystonic Disorders/genetics , Dystonic Disorders/diagnosis , Genetic Testing , Turkey , Molecular Biology , Mutation , DNA-Binding Proteins/genetics , Apoptosis Regulatory Proteins/genetics
15.
J Allergy Clin Immunol ; 153(3): 615-628.e4, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38185417

ABSTRACT

Autoimmunity in inborn errors of immunity (IEIs) has a multifactorial pathogenesis and develops subsequent to a genetic predisposition in conjunction with gene regulation, environmental modifiers, and infectious triggers. On the basis of incremental data availability owing to upfront application of omics technologies, a more granular and dynamic view of mechanisms and manifestations is warranted. Here, we present a comprehensive novel concept of autoimmunity in IEIs that considers multiple layers of interdependent elements and connects 101 causative genes or deletions according to the quality of the allelic variants with 47 molecular pathways and 22 immune effector mechanisms. Furthermore, we list 50 resulting manifestations together with the corresponding Human Phenotype Ontology terms and review the types and frequencies of the most relevant clinical presentations. When all of its elements are taken together, this concept (1) extends the historical anatomic view of central versus peripheral tolerance toward multiple interdependent mechanisms of immune tolerance, (2) delineates the mechanisms underlying the protean clinical manifestations, and thereby, (3) points toward the most suitable precision therapy for autoimmunity in IEIs. The multilayer concept of autoimmune mechanisms and manifestations in IEIs will facilitate research design and provide clinical guidance on the use of precision medicine irrespective of the data depth available in each health care scenario.


Subject(s)
Autoimmunity , Precision Medicine , Humans , Alleles , Genetic Predisposition to Disease , Immune Tolerance
16.
Proteomics ; : e2300471, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38996351

ABSTRACT

Predicting protein function from protein sequence, structure, interaction, and other relevant information is important for generating hypotheses for biological experiments and studying biological systems, and therefore has been a major challenge in protein bioinformatics. Numerous computational methods have been developed to gradually advance protein function prediction over the last two decades. In particular, in recent years, leveraging the revolutionary advances in artificial intelligence (AI), more and more deep learning methods have been developed to improve protein function prediction at a faster pace. Here, we provide an in-depth review of the recent developments of deep learning methods for protein function prediction. We summarize the significant advances in the field, identify several remaining major challenges to be tackled, and suggest some potential directions to explore. The data sources and evaluation metrics widely used in protein function prediction are also discussed to assist the machine learning, AI, and bioinformatics communities to develop more cutting-edge methods to advance protein function prediction.

17.
BMC Bioinformatics ; 25(1): 174, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698340

ABSTRACT

BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction.


Subject(s)
Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Protein Databases , Algorithms
18.
BMC Bioinformatics ; 25(1): 127, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38528499

ABSTRACT

BACKGROUND: N6-methyladenosine (m6A) is the most prevalent post-transcriptional modification in eukaryotic cells. It plays a crucial role in regulating various biological processes, and dysregulation of m6A status is involved in multiple human diseases, including cancer. A number of prediction frameworks have been proposed for high-accuracy identification of putative m6A sites; however, none has targeted the direct discrimination of tissue-conserved m6A-modified residues from non-conserved ones at base resolution. RESULTS: We report here m6A-TCPred, a computational tool for predicting tissue-conserved m6A residues using m6A profiling data from 23 human tissues. By taking advantage of the traditional sequence-based characteristics and additional genome-derived information, m6A-TCPred successfully captured distinct patterns between potentially tissue-conserved m6A modifications and non-conserved ones, with an average AUROC of 0.871 and 0.879 tested on cross-validation and independent datasets, respectively. CONCLUSION: Our results have been integrated into an online platform: a database holding 268,115 high confidence m6A sites with their conserved information across 23 human tissues; and a web server to predict the conserved status of user-provided m6A collections. The web interface of m6A-TCPred is freely accessible at: www.rnamd.org/m6ATCPred.


Subject(s)
Adenosine , Computers , Humans , Machine Learning , Post-Transcriptional RNA Processing
19.
J Proteome Res ; 23(5): 1593-1602, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38626392

ABSTRACT

With the rapid expansion of genome sequencing, the functional annotation of proteins has become a bottleneck in understanding proteomes. The Chromosome-centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome and find functional annotations for them. However, to date, 1137 identified human proteins remain without functional annotation; these are called uPE1 proteins. Sequence alignment was insufficient to predict their functions, and the crystal structures of most proteins were unavailable. In this study, we demonstrated a new functional annotation strategy, AlphaFun, based on structural alignment using deep-learning-predicted protein structures. Using this strategy, we functionally annotated 99% of the human proteome, including the uPE1 proteins and missing proteins, which have not been identified yet. The accuracy of the functional annotations was validated using the known-function proteins. The uPE1 proteins shared similar functions with the known-function PE1 proteins and tend to be expressed only in very limited tissues. They are evolutionarily young genes and are thus likely to function only in specific tissues and conditions, limiting their occurrence in commonly studied biological models. Such functional annotations provide hints for functional investigations on the uPE1 proteins. This proteome-wide-scale functional annotation strategy is also applicable to any other species.


Subject(s)
Molecular Sequence Annotation , Proteome , Humans , Proteome/genetics , Proteome/metabolism , Proteome/analysis , Proteome/chemistry , Deep Learning , Sequence Alignment , Human Genome , Proteomics/methods , Protein Databases
20.
Proteins ; 92(1): 60-75, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37638618

ABSTRACT

Proteins play key roles in many functions of daily life. The functional roles of a protein are enhanced when it interacts with other proteins rather than acting individually. Identifying the essential proteins of an organism in the wet lab is a time-consuming and costly task, although wet-lab observations ensure high reliability and accuracy on biological grounds. Essential protein prediction using computational approaches is an alternative choice in research. It has rapidly proven its significance and effectively reduces the experimental cost of the wet lab. Existing computational methods were implemented using protein-protein interaction networks (PPIN), sequence, gene expression datasets (GED), Gene Ontology (GO), orthologous groups, and subcellular localization datasets. Machine learning offers diverse categories of features that make it possible to model and predict essential macromolecules of understudied organisms. A novel methodology, MEM-FET (membership feature), is proposed that predicts essential proteins based on the following features: edge clustering coefficient, average clustering coefficient, subcellular localization, and Gene Ontology within a compartment of common neighbors. The accuracy (ACC) values of the predicted true positive (TP) essential proteins are 0.79, 0.74, 0.78, and 0.71 for the YHQ, YMIPS, YDIP, and YMBD datasets. An enriched set of essential proteins is also predicted using the MEM-FET algorithm. Ensemble ML also validated the proposed model with an accuracy of 60%. The MEM-FET algorithm is found to outperform other existing algorithms, with an ACC value of 80% for the yeast dataset.


Subject(s)
Computational Biology , Proteins , Humans , Reproducibility of Results , Computational Biology/methods , Proteins/genetics , Proteins/metabolism , Algorithms , Machine Learning , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
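
One of the features named in the abstract above, the edge clustering coefficient, has a compact definition that a short sketch can make concrete. The version below uses one common formulation, the number of neighbors shared by the two endpoints divided by min(deg(u) - 1, deg(v) - 1). Whether MEM-FET uses exactly this form is an assumption, and the toy network and function names are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Shared neighbors of u and v divided by min(deg(u) - 1, deg(v) - 1).

    Returns 0.0 when the edge cannot be part of any triangle.
    """
    shared = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return shared / denom if denom > 0 else 0.0


# Hypothetical toy interaction network: a triangle A-B-C plus a pendant node D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_adj = build_adjacency(toy_edges)
```

Edges inside the triangle score 1.0 and the pendant edge C-D scores 0.0, which is why the measure is used as a signal that an interaction sits inside a densely connected module.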