Search | VHL Regional Portal

A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology.

He, Yongqun; Yu, Hong; Huffman, Anthony; Lin, Asiyah Yu; Natale, Darren A; Beverley, John; Zheng, Ling; Perl, Yehoshua; Wang, Zhigang; Liu, Yingtong; Ong, Edison; Wang, Yang; Huang, Philip; Tran, Long; Du, Jinyang; Shah, Zalan; Shah, Easheta; Desai, Roshan; Huang, Hsin-Hui; Tian, Yujia; Merrell, Eric; Duncan, William D; Arabandi, Sivaram; Schriml, Lynn M; Zheng, Jie; Masci, Anna Maria; Wang, Liwei; Liu, Hongfang; Smaili, Fatima Zohra; Hoehndorf, Robert; Pendlington, Zoë May; Roncaglia, Paola; Ye, Xianwei; Xie, Jiangan; Tang, Yi-Wei; Yang, Xiaolin; Peng, Suyuan; Zhang, Luxia; Chen, Luonan; Hur, Junguk; Omenn, Gilbert S; Athey, Brian; Smith, Barry.

J Biomed Semantics ; 13(1): 25, 2022 10 21.

Article in English | MEDLINE | ID: mdl-36271389

ABSTRACT

BACKGROUND: The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that, in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms, it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020. RESULTS: As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19 related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative host-coronavirus PPIs and drug-target knowledge have also been used to support drug repurposing for COVID-19 treatment. CONCLUSION: CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.

Subject(s)

COVID-19 , Communicable Diseases , Coronavirus , Vaccines , Humans , SARS-CoV-2 , Pandemics , Amino Acids , COVID-19 Drug Treatment

QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs.

Smaili, Fatima Zohra; Tian, Shuye; Roy, Ambrish; Alazmi, Meshari; Arold, Stefan T; Mukherjee, Srayanta; Hefty, P Scott; Chen, Wei; Gao, Xin.

Genomics Proteomics Bioinformatics ; 19(6): 998-1011, 2021 12.

Article in English | MEDLINE | ID: mdl-33631427

ABSTRACT

The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein-protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
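The consensus-averaging step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the unweighted mean used here, the treatment of a term missing from a source as a score of 0, the function name `consensus_average`, and all score values are assumptions made for illustration only.

```python
from collections import defaultdict


def consensus_average(score_sources):
    """Average per-term scores over all sources.

    Assumption: a term that a source does not report contributes 0
    for that source, and all sources carry equal weight.
    """
    totals = defaultdict(float)
    for scores in score_sources:
        for term, score in scores.items():
            totals[term] += score
    n_sources = len(score_sources)
    return {term: total / n_sources for term, total in totals.items()}


# Hypothetical confidence scores from the three information sources.
structure_scores = {"GO:0005515": 0.9, "GO:0003824": 0.4}
interaction_scores = {"GO:0005515": 0.6}
motif_scores = {"GO:0005515": 0.3, "GO:0016740": 0.6}

combined = consensus_average(
    [structure_scores, interaction_scores, motif_scores]
)

# Rank candidate annotations by their combined score, highest first.
ranked_terms = sorted(combined, key=combined.get, reverse=True)
```

A term supported by all three sources ends up ranked above terms supported by only one, which is the intuition behind combining independent lines of evidence.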

Subject(s)

Proteins , Software , Computational Biology/methods , Databases, Protein , Humans , Proteins/chemistry , Proteins/genetics

Semantic similarity and machine learning with ontologies.

Kulmanov, Maxat; Smaili, Fatima Zohra; Gao, Xin; Hoehndorf, Robert.

Brief Bioinform ; 22(4), 2021 07 20.

Article in English | MEDLINE | ID: mdl-33049044

ABSTRACT

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
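As a rough illustration of how a semantic similarity measure can exploit the class hierarchy of an ontology, the sketch below computes a Jaccard index over the ancestor sets of two classes. This is only one simple measure and is not taken from the article; the toy hierarchy, the class names, and the function names are hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its superclasses.

    `parents` maps each class to the list of its direct superclasses.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def jaccard_similarity(term_a, term_b, parents):
    """Shared ancestors divided by all ancestors of either class."""
    anc_a = ancestors(term_a, parents)
    anc_b = ancestors(term_b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Hypothetical toy hierarchy (child -> direct superclasses).
toy_parents = {
    "kinase_activity": ["catalytic_activity"],
    "phosphatase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
    "binding": ["molecular_function"],
}

sim_siblings = jaccard_similarity(
    "kinase_activity", "phosphatase_activity", toy_parents
)
sim_distant = jaccard_similarity("kinase_activity", "binding", toy_parents)
```

Classes that share a specific common superclass score higher than classes that only meet at the root, which is the behaviour most hierarchy-based measures aim for.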

Subject(s)

Biological Ontologies , Machine Learning , Models, Biological , Semantics

Formal axioms in biomedical ontologies improve analysis and interpretation of associated data.

Smaili, Fatima Zohra; Gao, Xin; Hoehndorf, Robert.

Bioinformatics ; 36(7): 2229-2236, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31821406

ABSTRACT

MOTIVATION: Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling. RESULTS: We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein-protein interactions and gene-disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications for the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies. AVAILABILITY AND IMPLEMENTATION: https://github.com/bio-ontology-research-group/tsoe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Biological Ontologies , Gene Ontology , Knowledge Bases , Machine Learning

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction.

Smaili, Fatima Zohra; Gao, Xin; Hoehndorf, Robert.

Bioinformatics ; 35(12): 2133-2140, 2019 06 01.

Article in English | MEDLINE | ID: mdl-30407490

ABSTRACT

MOTIVATION: Ontologies are widely used in biology for data annotation, integration and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such as semantic similarity measures. RESULTS: We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on a corpus of either abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interaction on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology. AVAILABILITY AND IMPLEMENTATION: https://github.com/bio-ontology-research-group/opa2vec. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
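The idea of combining formal axioms with annotation axioms into one body of text can be sketched as follows. This is not the OPA2Vec implementation: the triple format, the whitespace tokenization, the identifiers under the `EX:` prefix, and the function name `build_corpus` are all hypothetical, and the embedding step itself (Word2Vec in the abstract) is deliberately left out.

```python
def build_corpus(formal_axioms, annotation_axioms):
    """Turn both kinds of axioms into token lists ("sentences").

    Assumption: formal axioms arrive as (subject, relation, object)
    triples and annotation axioms as (class, property, text) triples.
    """
    corpus = []
    for subject, relation, obj in formal_axioms:
        # A formal axiom becomes a short three-token sentence.
        corpus.append([subject, relation, obj])
    for cls, prop, text in annotation_axioms:
        # Free text is lower-cased and split on whitespace; the class
        # identifier is kept so the text is tied to its class.
        corpus.append([cls, prop] + text.lower().split())
    return corpus


# Hypothetical example data.
formal = [
    ("EX:0002", "SubClassOf", "EX:0001"),
    ("EX:0003", "SubClassOf", "EX:0001"),
]
annotations = [
    ("EX:0002", "label", "Abnormal heart morphology"),
    ("EX:0002", "synonym", "Cardiac malformation"),
]

corpus = build_corpus(formal, annotations)
```

Each token list could then be passed to a word-embedding model as one sentence, so that class identifiers are embedded in the same space as the words that describe them.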

Subject(s)

Biological Ontologies , Animals , Mice , Phenotype , Semantics

Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations.

Smaili, Fatima Zohra; Gao, Xin; Hoehndorf, Robert.

Bioinformatics ; 34(13): i52-i60, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29949999

ABSTRACT

Motivation: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications. Results: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the Gene Ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein-protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state-of-the-art in several predictive applications in which ontologies are involved. Availability and implementation: https://github.com/bio-ontology-research-group/onto2vec. Supplementary information: Supplementary data are available at Bioinformatics online.
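Similarity-based prediction from feature vectors can be sketched as ranking candidate partners by vector similarity. The abstract does not name the similarity function, so cosine similarity is an assumption here, and the three-dimensional vectors and protein names are hypothetical stand-ins for learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_partners(query, vectors):
    """Rank all other entities by similarity to the query, highest first."""
    query_vec = vectors[query]
    scored = [
        (name, cosine(query_vec, vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors standing in for learned representations.
toy_vectors = {
    "protein_A": [1.0, 0.0, 1.0],
    "protein_B": [0.9, 0.1, 1.1],
    "protein_C": [0.0, 1.0, 0.0],
}

ranking = rank_partners("protein_A", toy_vectors)
```

In a prediction setting, the top-ranked entries would be treated as the most likely interaction candidates; a trainable measure, as mentioned in the abstract, would replace the fixed cosine function with a learned one.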

Subject(s)

Computational Biology/methods , Gene Ontology , Protein Interaction Maps , Software , Humans , Machine Learning , Saccharomyces cerevisiae/genetics , Semantics

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA