1.
PLoS Biol ; 20(2): e3001470, 2022 02.
Article in English | MEDLINE | ID: mdl-35104289

ABSTRACT

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare linguistic features of bioRxiv preprints with those of published biomedical text as a whole, as this is an opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published article landscape.
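
The embedding-based journal matching described in this abstract can be illustrated with a minimal sketch. It assumes a document embedding is the mean of its word vectors and that journals are ranked by cosine similarity to a per-journal centroid; the abstract does not specify either choice, and the toy vectors, journal names, and function names below are hypothetical.

```python
import math

def doc_embedding(tokens, word_vectors):
    """Average the vectors of all in-vocabulary tokens (assumed aggregation)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("no in-vocabulary tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_journals(preprint_vec, journal_centroids):
    """Return journal names ordered from most to least similar."""
    return sorted(
        journal_centroids,
        key=lambda name: cosine(preprint_vec, journal_centroids[name]),
        reverse=True,
    )

# Hypothetical two-dimensional word vectors and journal centroids.
TOY_VECTORS = {"gene": [1.0, 0.0], "expression": [0.8, 0.2], "neuron": [0.0, 1.0]}
TOY_JOURNALS = {"Journal A": [1.0, 0.1], "Journal B": [0.1, 1.0]}

preprint = doc_embedding(["gene", "expression", "unknown"], TOY_VECTORS)
ranking = rank_journals(preprint, TOY_JOURNALS)
```

In practice the word vectors would come from a trained word2vec model rather than a hand-written dictionary.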


Subjects
Language , Peer Review, Research , Preprints as Topic , Biomedical Research , Publications/standards , Terminology as Topic
2.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling in the cross-validation, however, is not always ideal to handle large, diverse and copious datasets as it could potentially introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance on a variety of use-cases (e.g. permutations of different levels of connectiveness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets and drug repurposing for specific diseases), a total of seven Tests (consisting of 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based and network-based) were tested across all the developed Tasks. The best-worst performing cases have been analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods for running over the benchmark tasks. The results highlight BETA as a benchmark in the selection of computational strategies for drug repurposing and target discovery.


Subjects
Benchmarking , Drug Development , Algorithms , Drug Evaluation, Preclinical , Drug Repositioning/methods , Proteins/genetics
3.
J Biomed Inform ; 143: 104405, 2023 07.
Article in English | MEDLINE | ID: mdl-37270143

ABSTRACT

BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition. RESULTS: We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). 
The researchers could look to the field of neuroscience for potential answers to the ignorance statements. CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.


Subjects
Knowledge Bases , Knowledge , Natural Language Processing , Female , Humans , Infant, Newborn , Premature Birth , Publications , Vitamin D
4.
BMC Bioinformatics ; 22(Suppl 1): 598, 2021 Dec 17.
Article in English | MEDLINE | ID: mdl-34920707

ABSTRACT

BACKGROUND: Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches. METHODS: We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance. RESULTS: Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches.
CONCLUSIONS: Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation .

5.
Dev Biol ; 426(1): 97-114, 2017 06 01.
Article in English | MEDLINE | ID: mdl-28363736

ABSTRACT

The rapid increase in gene-centric biological knowledge coupled with analytic approaches for genomewide data integration provides an opportunity to develop systems-level understanding of facial development. Experimental analyses have demonstrated the importance of signaling between the surface ectoderm and the underlying mesenchyme in coordinating facial patterning. However, current transcriptome data from the developing vertebrate face is dominated by the mesenchymal component, and the contributions of the ectoderm are not easily identified. We have generated transcriptome datasets from critical periods of mouse face formation that enable gene expression to be analyzed with respect to time, prominence, and tissue layer. Notably, by separating the ectoderm and mesenchyme we considerably improved the sensitivity compared to data obtained from whole prominences, with more genes detected over a wider dynamic range. From these data we generated a detailed description of ectoderm-specific developmental programs, including pan-ectodermal programs, prominence-specific programs and their temporal dynamics. The genes and pathways represented in these programs provide mechanistic insights into several aspects of ectodermal development. We also used these data to identify co-expression modules specific to facial development. We then used 14 co-expression modules enriched for genes involved in orofacial clefts to make specific mechanistic predictions about genes involved in tongue specification, in nasal process patterning and in jaw development. Our multidimensional gene expression dataset is a unique resource for systems analysis of the developing face; our co-expression modules are a resource for predicting functions of poorly annotated genes, or for predicting roles for genes that have yet to be studied in the context of facial development; and our analytic approaches provide a paradigm for analysis of other complex developmental programs.


Subjects
Ectoderm/embryology , Face/embryology , Gene Expression Regulation, Developmental/genetics , Maxillofacial Development/physiology , Mesoderm/embryology , Systems Biology , Animals , Jaw/embryology , Mice , Mice, Inbred C57BL , Nose/embryology , Tongue/embryology
6.
BMC Bioinformatics ; 18(1): 372, 2017 Aug 17.
Article in English | MEDLINE | ID: mdl-28818042

ABSTRACT

BACKGROUND: Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. RESULTS: The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached an F-measure of 0.46. Following the IDENTITY chains in the data would add 106,263 named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. CONCLUSIONS: The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. 
The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not generic in the biomedical domain due to their referents to specific classes in domain-specific ontologies. The comparison of the performance of a publicly available and well-understood coreference resolution system with a domain-adapted system produced results that are consistent with the notion that the requirements for successful coreference resolution in this genre are quite different from those of the general domain, and also suggest that the baseline performance difference is quite large.


Subjects
Data Mining/methods , Periodicals as Topic , Semantics
7.
J Physiol ; 595(17): 5965-5986, 2017 09 01.
Article in English | MEDLINE | ID: mdl-28640508

ABSTRACT

KEY POINTS: Despite sparse connectivity, population-level interactions between mitral cells (MCs) and granule cells (GCs) can generate synchronized oscillations in the rodent olfactory bulb. Intraglomerular gap junctions between MCs at the same glomerulus can greatly enhance synchronized activity of MCs at different glomeruli. The facilitating effect of intraglomerular gap junctions on interglomerular synchrony is through triggering of mutually synchronizing interactions between MCs and GCs. Divergent connections between MCs and GCs make minimal direct contribution to synchronous activity. ABSTRACT: A dominant feature of the olfactory bulb response to odour is fast synchronized oscillations at beta (15-40 Hz) or gamma (40-90 Hz) frequencies, thought to be involved in integration of olfactory signals. Mechanistically, the bulb presents an interesting case study for understanding how beta/gamma oscillations arise. Fast oscillatory synchrony in the activity of output mitral cells (MCs) appears to result from interactions with GABAergic granule cells (GCs), yet the incidence of MC-GC connections is very low, around 4%. Here, we combined computational and experimental approaches to examine how oscillatory synchrony can nevertheless arise, focusing mainly on activity between 'non-sister' MCs affiliated with different glomeruli (interglomerular synchrony). In a sparsely connected model of MCs and GCs, we found first that interglomerular synchrony was generally quite low, but could be increased by a factor of 4 by physiological levels of gap junctional coupling between sister MCs at the same glomerulus. This effect was due to enhanced mutually synchronizing interactions between MC and GC populations. The potent role of gap junctions was confirmed in patch-clamp recordings in bulb slices from wild-type and connexin 36-knockout (KO) mice. KO reduced both beta and gamma local field potential oscillations as well as synchrony of inhibitory signals in pairs of non-sister MCs. 
These effects were independent of potential KO actions on network excitation. Divergent synaptic connections did not contribute directly to the vast majority of synchronized signals. Thus, in a sparsely connected network, gap junctions between a small subset of cells can, through population effects, greatly amplify oscillatory synchrony amongst unconnected cells.


Subjects
Gap Junctions/physiology , Olfactory Bulb/physiology , Animals , Connexins/genetics , Female , In Vitro Techniques , Inhibitory Postsynaptic Potentials , Male , Mice, Knockout , Models, Biological , Rats, Sprague-Dawley , Gap Junction delta-2 Protein
8.
Hum Genomics ; 9: 28, 2015 Oct 28.
Article in English | MEDLINE | ID: mdl-26510531

ABSTRACT

Proteomics is an expanding area of research into biological systems with significance for biomedical and therapeutic applications ranging from understanding the molecular basis of diseases to testing new treatments, studying the toxicity of drugs, or biotechnological improvements in agriculture. Progress in proteomic technologies and growing interest has resulted in rapid accumulation of proteomic data, and consequently, a great number of tools have become available. In this paper, we review the well-known and ready-to-use tools for classification, clustering and validation, interpretation, and generation of biological information from experimental data. We suggest some rules of thumb for the reader on choosing the best suitable learning method for a particular dataset and conclude with pathway and functional analysis and then provide information about submitting final results to a repository.


Subjects
Computational Biology/methods , Proteomics , Software , Cluster Analysis , Databases, Genetic , Humans
9.
BMC Bioinformatics ; 16: 126, 2015 Apr 23.
Article in English | MEDLINE | ID: mdl-25903923

ABSTRACT

BACKGROUND: The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. RESULTS: We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. CONCLUSIONS: KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. 
KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
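
One of the processes named in this abstract, aggregating sets of identifiers that denote the same biomedical concept across data sources, can be sketched with a union-find structure. This is only an illustration under stated assumptions: the abstract does not say how KaBOB performs the aggregation, and the identifiers and function name below are hypothetical.

```python
def aggregate_identifiers(equivalences):
    """Group identifiers into sets that are transitively declared equivalent.

    `equivalences` is an iterable of (id_a, id_b) pairs, each asserting that
    two identifiers from different sources denote the same concept.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's group, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for ident in list(parent):
        groups.setdefault(find(ident), set()).add(ident)
    # Sorted lists make the output deterministic.
    return sorted(sorted(group) for group in groups.values())

# Hypothetical cross-references between three made-up sources.
pairs = [("dbA:1", "dbB:7"), ("dbB:7", "dbC:x"), ("dbA:2", "dbB:9")]
concept_sets = aggregate_identifiers(pairs)
```

Each resulting group could then be attached to a single concept node, keeping database records distinct from the concept they describe.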


Subjects
Biomedical Research , Computational Biology/methods , Databases, Factual , Information Storage and Retrieval/methods , Semantics , Biological Ontologies , Data Collection , Humans , Internet , Knowledge Bases , PubMed
10.
BMC Bioinformatics ; 16: 135, 2015 Apr 29.
Article in English | MEDLINE | ID: mdl-25925016

ABSTRACT

BACKGROUND: The interpretation of the results from genome-scale experiments is a challenging and important problem in contemporary biomedical research. Biological networks that integrate experimental results with existing knowledge from biomedical databases and published literature can provide a rich resource and powerful basis for hypothesizing about mechanistic explanations for observed gene-phenotype relationships. However, the size and density of such networks often impede their efficient exploration and understanding. RESULTS: We introduce a visual analytics approach that integrates interactive filtering of dense networks based on degree-of-interest functions with attribute-based layouts of the resulting subnetworks. The comparison of multiple subnetworks representing different analysis facets is facilitated through an interactive super-network that integrates brushing-and-linking techniques for highlighting components across networks. An implementation is freely available as a Cytoscape app. CONCLUSIONS: We demonstrate the utility of our approach through two case studies using a dataset that combines clinical data with high-throughput data for studying the effect of β-blocker treatment on heart failure patients. Furthermore, we discuss our team-based iterative design and development process as well as the limitations and generalizability of our approach.


Subjects
Adrenergic beta-Antagonists/pharmacology , Cholesterol Ester Transfer Proteins/metabolism , Cholesterol/metabolism , Computer Graphics , Databases, Factual , Gene Regulatory Networks , Heart Failure/genetics , Software , Cholesterol Ester Transfer Proteins/genetics , Data Mining , Gene Expression Profiling , Heart Failure/drug therapy , Humans
11.
BMC Bioinformatics ; 15: 59, 2014 Feb 26.
Article in English | MEDLINE | ID: mdl-24571547

ABSTRACT

BACKGROUND: Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text due to a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach for concept recognition is an open problem. RESULTS: Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented. CONCLUSIONS: Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14 to 0.83). Out of the three systems we tested, ConceptMapper is generally the best-performing system; it produces the highest F-measure on seven out of eight ontologies. Default parameters are not ideal for most systems on most ontologies; by changing parameters, F-measure can be increased by up to 0.4. Best-performing parameters are presented, along with suggestions for choosing parameters based on ontology characteristics.
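
The F-measure figures quoted in this abstract combine precision and recall. A minimal sketch of the balanced F-measure (F1) over sets of annotations follows; it assumes exact matching on (start, end, concept ID) triples, which may differ from the matching criteria used in the study, and the annotations shown are hypothetical.

```python
def f_measure(predicted, gold):
    """Balanced F-measure (F1) between predicted and gold annotation sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: (start offset, end offset, concept identifier).
predicted = {(0, 4, "X:1"), (10, 15, "X:2"), (20, 25, "X:3")}
gold = {(0, 4, "X:1"), (10, 15, "X:2"), (30, 35, "X:4"), (40, 45, "X:5")}

# Two of three predictions are correct (precision 2/3) and two of four
# gold annotations are found (recall 1/2), so F1 is 4/7.
score = f_measure(predicted, gold)
```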


Subjects
Biological Ontologies , Data Mining/methods , Databases, Factual , Reproducibility of Results
12.
PLoS Comput Biol ; 9(4): e1003044, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23633944

ABSTRACT

Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.


Subjects
Computational Biology/methods , Data Mining/methods , Algorithms , Animals , Artificial Intelligence , Computer Simulation , Humans , Phenotype , Programming Languages , Software , Translational Research, Biomedical
13.
J Biomed Semantics ; 15(1): 2, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38650032

ABSTRACT

The more science advances, the more questions are asked. This compounding growth can make it difficult to keep up with current research directions. Furthermore, this difficulty is exacerbated for junior researchers who enter fields with already large bases of potentially fruitful research avenues. In this paper, we propose a novel task and a recommender system for research directions, RecSOI, that draws from statements of ignorance (SOIs) found in the research literature. By building researchers' profiles based on textual elements, RecSOI generates personalized recommendations of potential research directions tailored to their interests. In addition, RecSOI provides context for the recommended SOIs, so that users can quickly evaluate how relevant the research direction is for them. In this paper, we provide an overview of RecSOI's functioning, implementation, and evaluation, demonstrating its effectiveness in guiding researchers through the vast landscape of potential research directions.


Subjects
Biomedical Research , Research , Humans
14.
Front Microbiol ; 15: 1351678, 2024.
Article in English | MEDLINE | ID: mdl-38638909

ABSTRACT

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

15.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subjects
Biological Science Disciplines , Knowledge Bases , Pattern Recognition, Automated , Algorithms , Translational Research, Biomedical
16.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38076987

ABSTRACT

Motivation: Knowledge graphs have found broad biomedical applications, providing useful representations of complex knowledge. Although plentiful evidence exists linking the gut microbiome to disease, mechanistic understanding of those relationships remains generally elusive. A structured analysis of existing resources is necessary to characterize the resources and methodologies needed to facilitate mechanistic inference. Results: Here we demonstrate the potential of knowledge graphs to hypothesize plausible mechanistic accounts of host-microbe interactions in disease and define the need for semantic constraint in doing so. We constructed a knowledge graph of linked microbes, genes and metabolites called MGMLink, and one of microbial traits, environments, and human phenotypes called KG-microbe-phenio. Using a shortest path search and a pattern-based, semantically constrained path search through the graphs, we highlight the need for a microbiome-disease resource and semantically informed search methods to enable mechanistic inference. Availability: The software to create MGMLink is openly available at https://github.com/bsantan/MGMLink , KG-microbe is available at https://github.com/Knowledge-Graph-Hub/kg-microbe , and KG-phenio is available at https://github.com/Knowledge-Graph-Hub/kg-phenio . Contact: brook.santangelo@cuanschutz.edu.

17.
bioRxiv ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38106100

ABSTRACT

Knowledge graphs have found broad biomedical applications, providing useful representations of complex knowledge. Although plentiful evidence exists linking the gut microbiome to disease, mechanistic understanding of those relationships remains generally elusive. Here we demonstrate the potential of knowledge graphs to hypothesize plausible mechanistic accounts of host-microbe interactions in disease. To do so, we constructed a knowledge graph of linked microbes, genes and metabolites called MGMLink. Using a semantically constrained shortest path search through the graph and a novel path prioritization methodology based on cosine similarity, we show that this knowledge supports inference of mechanistic hypotheses that explain observed relationships between microbes and disease phenotypes. We discuss specific applications of this methodology in inflammatory bowel disease and Parkinson's disease. This approach enables mechanistic hypotheses surrounding the complex interactions between gut microbes and disease to be generated in a scalable and comprehensive manner.
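
A semantically constrained shortest path search of the kind mentioned in this abstract can be sketched as a breadth-first search that only follows edges whose predicate is on an allow-list. This is a simplified illustration, not the MGMLink implementation: the node names, predicate names, and function name are hypothetical, and the cosine-similarity path prioritization step is omitted.

```python
from collections import deque

def constrained_shortest_path(edges, source, target, allowed_predicates):
    """Breadth-first search over directed (subject, predicate, object) triples,
    following only edges whose predicate is in `allowed_predicates`."""
    adjacency = {}
    for subj, pred, obj in edges:
        if pred in allowed_predicates:
            adjacency.setdefault(subj, []).append(obj)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path satisfies the constraint

# Hypothetical toy graph: one mechanistic route and one uninformative shortcut.
TOY_EDGES = [
    ("microbe_X", "produces", "metabolite_Y"),
    ("metabolite_Y", "interacts_with", "gene_Z"),
    ("gene_Z", "associated_with", "disease_D"),
    ("microbe_X", "co_mentioned_with", "disease_D"),
]
MECHANISTIC = {"produces", "interacts_with", "associated_with"}

constrained = constrained_shortest_path(TOY_EDGES, "microbe_X", "disease_D", MECHANISTIC)
unconstrained = constrained_shortest_path(
    TOY_EDGES, "microbe_X", "disease_D", MECHANISTIC | {"co_mentioned_with"}
)
```

Without the constraint, the search returns the direct co-mention edge; with it, the path is forced through the metabolite and gene, which is the kind of route that can be read as a mechanistic hypothesis.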

18.
Pac Symp Biocomput ; 28: 371-382, 2023.
Article in English | MEDLINE | ID: mdl-36540992

ABSTRACT

Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. The set of DEGs that are associated with a disease experimentally but have no known association with the disease in the literature is known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined list in the meta-analysis-derived list was identified as the set of uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
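The relative-complement step in this abstract amounts to a set difference. The Python sketch below shows that operation only; the function name and the gene labels are hypothetical placeholders, and the authors' actual pipeline (text mining, meta-analysis, identifier normalization) is not reproduced here.

```python
def compute_ignorome(meta_analysis_degs, text_mined_genes):
    """Return DEGs found experimentally that have no literature association,
    i.e. the relative complement of the text-mined list in the DEG list."""
    return set(meta_analysis_degs) - set(text_mined_genes)


# Hypothetical placeholder identifiers, not real gene symbols.
meta_analysis_degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
text_mined_genes = ["GENE_A", "GENE_C", "GENE_E"]

ignorome = compute_ignorome(meta_analysis_degs, text_mined_genes)
```

A gene that appears only in the text-mined list, such as the placeholder GENE_E, does not enter the result, because the difference is taken relative to the DEG list.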


Subjects
Pre-Eclampsia , Pregnancy , Female , Humans , Pre-Eclampsia/genetics , Computational Biology/methods , Placenta , Fetus

19.
Sci Rep ; 13(1): 12195, 2023 07 27.
Article in English | MEDLINE | ID: mdl-37500700

ABSTRACT

Early detection of cancer is vital for the best chance of successful treatment, but half of all cancers are diagnosed at an advanced stage. A simple and reliable blood screening test applied routinely would therefore address a major unmet medical need. To gain insight into the value of protein biomarkers in early detection and stratification of cancer, we determined the time course of changes in the plasma proteome of mice carrying transplanted human lung, breast, colon, or ovarian tumors. For protein measurements we used an aptamer-based assay that simultaneously measures ~5000 proteins. Along with tumor lineage-specific biomarkers, we also found 15 markers shared among all cancer types, including the energy metabolism enzymes glyceraldehyde-3-phosphate dehydrogenase, glucose-6-phosphate isomerase and dihydrolipoyl dehydrogenase, as well as several important biomarkers for maintaining protein, lipid, nucleotide, or carbohydrate balance, such as tryptophanyl-tRNA synthetase and nucleoside diphosphate kinase. Using significantly altered proteins in the tumor-bearing mice, we developed models to stratify tumor types and to estimate the minimum detectable tumor volume. Finally, we identified significantly enriched common and unique biological pathways among the eight tumor cell lines tested.


Subjects
Ovarian Neoplasms , Proteome , Female , Humans , Mice , Animals , Proteome/metabolism , Biomarkers, Tumor/metabolism , Energy Metabolism , Cell Line, Tumor

20.
NPJ Digit Med ; 6(1): 89, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208468

ABSTRACT

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8,611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies, our algorithm presents new opportunities to advance EHR-based deep phenotyping.
