Search | PAHO/WHO Uruguay

1.

Examining linguistic shifts between preprints and publications.

Nicholson, David N; Rubinetti, Vincent; Hu, Dongbo; Thielk, Marvin; Hunter, Lawrence E; Greene, Casey S.

PLoS Biol ; 20(2): e3001470, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35104289

ABSTRACT

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint as well as observe where the preprint would be positioned within a published article landscape.

Subject(s)

Language; Peer Review, Research; Preprints as Topic; Biomedical Research; Publications/standards; Terminology as Topic
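
The abstract above describes document embeddings derived from a word2vec model and their use for finding linguistically similar journals. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that a document embedding is the plain average of its word vectors (one common choice; the abstract does not say how the embeddings were pooled) and that similarity is measured by cosine. The toy vectors, the journal names, and all function names are hypothetical.

```python
import math


def document_embedding(tokens, word_vectors):
    """Average the vectors of the tokens found in the vocabulary (assumed pooling)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("none of the tokens are in the vocabulary")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


# Hypothetical two-dimensional word vectors; a real model would be trained
# on preprint text and have hundreds of dimensions.
TOY_VECTORS = {
    "gene": [1.0, 0.0],
    "protein": [0.8, 0.2],
    "neuron": [0.0, 1.0],
    "synapse": [0.2, 0.8],
}

# Out-of-vocabulary tokens are simply skipped when averaging.
preprint = document_embedding(["gene", "protein", "unknownword"], TOY_VECTORS)

# Hypothetical journal centroids built the same way from their articles.
journals = {
    "Hypothetical Genomics Journal": document_embedding(
        ["gene", "gene", "protein"], TOY_VECTORS
    ),
    "Hypothetical Neuroscience Journal": document_embedding(
        ["neuron", "synapse"], TOY_VECTORS
    ),
}

ranking = rank_by_similarity(preprint, journals)
```

In this sketch each journal is represented by a single averaged vector, which keeps the ranking step to one cosine comparison per journal. Other pooling or nearest-neighbour schemes would fit the same description.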

2.

Creating an ignorance-base: Exploring known unknowns in the scientific literature.

Boguslav, Mayla R; Salem, Nourah M; White, Elizabeth K; Sullivan, Katherine J; Bada, Michael; Hernandez, Teri L; Leach, Sonia M; Hunter, Lawrence E.

J Biomed Inform ; 143: 104405, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37270143

ABSTRACT

BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition. RESULTS: We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements. CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.

Subject(s)

Knowledge Bases; Knowledge; Natural Language Processing; Female; Humans; Infant, Newborn; Premature Birth; Publications; Vitamin D

3.

Concept recognition as a machine translation problem.

Boguslav, Mayla R; Hailu, Negacy D; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

BMC Bioinformatics ; 22(Suppl 1): 598, 2021 Dec 17.

Article in English | MEDLINE | ID: mdl-34920707

ABSTRACT

BACKGROUND: Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches. METHODS: We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance. RESULTS: Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches. CONCLUSIONS: Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation .

4.

Systems biology of facial development: contributions of ectoderm and mesenchyme.

Hooper, Joan E; Feng, Weiguo; Li, Hong; Leach, Sonia M; Phang, Tzulip; Siska, Charlotte; Jones, Kenneth L; Spritz, Richard A; Hunter, Lawrence E; Williams, Trevor.

Dev Biol ; 426(1): 97-114, 2017 Jun 01.

Article in English | MEDLINE | ID: mdl-28363736

ABSTRACT

The rapid increase in gene-centric biological knowledge coupled with analytic approaches for genomewide data integration provides an opportunity to develop systems-level understanding of facial development. Experimental analyses have demonstrated the importance of signaling between the surface ectoderm and the underlying mesenchyme in coordinating facial patterning. However, current transcriptome data from the developing vertebrate face is dominated by the mesenchymal component, and the contributions of the ectoderm are not easily identified. We have generated transcriptome datasets from critical periods of mouse face formation that enable gene expression to be analyzed with respect to time, prominence, and tissue layer. Notably, by separating the ectoderm and mesenchyme we considerably improved the sensitivity compared to data obtained from whole prominences, with more genes detected over a wider dynamic range. From these data we generated a detailed description of ectoderm-specific developmental programs, including pan-ectodermal programs, prominence-specific programs and their temporal dynamics. The genes and pathways represented in these programs provide mechanistic insights into several aspects of ectodermal development. We also used these data to identify co-expression modules specific to facial development. We then used 14 co-expression modules enriched for genes involved in orofacial clefts to make specific mechanistic predictions about genes involved in tongue specification, in nasal process patterning and in jaw development. Our multidimensional gene expression dataset is a unique resource for systems analysis of the developing face; our co-expression modules are a resource for predicting functions of poorly annotated genes, or for predicting roles for genes that have yet to be studied in the context of facial development; and our analytic approaches provide a paradigm for analysis of other complex developmental programs.

Subject(s)

Ectoderm/embryology; Face/embryology; Gene Expression Regulation, Developmental/genetics; Maxillofacial Development/physiology; Mesoderm/embryology; Systems Biology; Animals; Jaw/embryology; Mice; Mice, Inbred C57BL; Nose/embryology; Tongue/embryology

5.

Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E.

BMC Bioinformatics ; 18(1): 372, 2017 Aug 17.

Article in English | MEDLINE | ID: mdl-28818042

ABSTRACT

BACKGROUND: Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. RESULTS: The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached an F-measure of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. CONCLUSIONS: The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not generic in the biomedical domain due to their referents to specific classes in domain-specific ontologies. The comparison of the performance of a publicly available and well-understood coreference resolution system with a domain-adapted system produced results that are consistent with the notion that the requirements for successful coreference resolution in this genre are quite different from those of the general domain, and also suggest that the baseline performance difference is quite large.

Subject(s)

Data Mining/methods; Periodicals as Topic; Semantics

6.

Intraglomerular gap junctions enhance interglomerular synchrony in a sparsely connected olfactory bulb network.

Pouille, Frederic; McTavish, Thomas S; Hunter, Lawrence E; Restrepo, Diego; Schoppa, Nathan E.

J Physiol ; 595(17): 5965-5986, 2017 Sep 01.

Article in English | MEDLINE | ID: mdl-28640508

ABSTRACT

KEY POINTS: Despite sparse connectivity, population-level interactions between mitral cells (MCs) and granule cells (GCs) can generate synchronized oscillations in the rodent olfactory bulb. Intraglomerular gap junctions between MCs at the same glomerulus can greatly enhance synchronized activity of MCs at different glomeruli. The facilitating effect of intraglomerular gap junctions on interglomerular synchrony is through triggering of mutually synchronizing interactions between MCs and GCs. Divergent connections between MCs and GCs make minimal direct contribution to synchronous activity. ABSTRACT: A dominant feature of the olfactory bulb response to odour is fast synchronized oscillations at beta (15-40 Hz) or gamma (40-90 Hz) frequencies, thought to be involved in integration of olfactory signals. Mechanistically, the bulb presents an interesting case study for understanding how beta/gamma oscillations arise. Fast oscillatory synchrony in the activity of output mitral cells (MCs) appears to result from interactions with GABAergic granule cells (GCs), yet the incidence of MC-GC connections is very low, around 4%. Here, we combined computational and experimental approaches to examine how oscillatory synchrony can nevertheless arise, focusing mainly on activity between 'non-sister' MCs affiliated with different glomeruli (interglomerular synchrony). In a sparsely connected model of MCs and GCs, we found first that interglomerular synchrony was generally quite low, but could be increased by a factor of 4 by physiological levels of gap junctional coupling between sister MCs at the same glomerulus. This effect was due to enhanced mutually synchronizing interactions between MC and GC populations. The potent role of gap junctions was confirmed in patch-clamp recordings in bulb slices from wild-type and connexin 36-knockout (KO) mice. KO reduced both beta and gamma local field potential oscillations as well as synchrony of inhibitory signals in pairs of non-sister MCs. These effects were independent of potential KO actions on network excitation. Divergent synaptic connections did not contribute directly to the vast majority of synchronized signals. Thus, in a sparsely connected network, gap junctions between a small subset of cells can, through population effects, greatly amplify oscillatory synchrony amongst unconnected cells.

Subject(s)

Gap Junctions/physiology; Olfactory Bulb/physiology; Animals; Connexins/genetics; Female; In Vitro Techniques; Inhibitory Postsynaptic Potentials; Male; Mice, Knockout; Models, Biological; Rats, Sprague-Dawley; Gap Junction delta-6 Protein

7.

A survey of computational tools for downstream analysis of proteomic and other omic datasets.

Karimpour-Fard, Anis; Epperson, L Elaine; Hunter, Lawrence E.

Hum Genomics ; 9: 28, 2015 Oct 28.

Article in English | MEDLINE | ID: mdl-26510531

ABSTRACT

Proteomics is an expanding area of research into biological systems with significance for biomedical and therapeutic applications ranging from understanding the molecular basis of diseases to testing new treatments, studying the toxicity of drugs, or biotechnological improvements in agriculture. Progress in proteomic technologies and growing interest have resulted in rapid accumulation of proteomic data, and consequently, a great number of tools have become available. In this paper, we review the well-known and ready-to-use tools for classification, clustering and validation, interpretation, and generation of biological information from experimental data. We suggest some rules of thumb for the reader on choosing the most suitable learning method for a particular dataset and conclude with pathway and functional analysis and then provide information about submitting final results to a repository.

Subject(s)

Computational Biology/methods; Proteomics; Software; Cluster Analysis; Databases, Genetic; Humans

8.

KaBOB: ontology-based semantic integration of biomedical databases.

Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

BMC Bioinformatics ; 16: 126, 2015 Apr 23.

Article in English | MEDLINE | ID: mdl-25903923

ABSTRACT

BACKGROUND: The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. RESULTS: We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. CONCLUSIONS: KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Subject(s)

Biomedical Research; Computational Biology/methods; Databases, Factual; Information Storage and Retrieval/methods; Semantics; Biological Ontologies; Data Collection; Humans; Internet; Knowledge Bases; PubMed

9.

Visual analysis of biological data-knowledge networks.

Vehlow, Corinna; Kao, David P; Bristow, Michael R; Hunter, Lawrence E; Weiskopf, Daniel; Görg, Carsten.

BMC Bioinformatics ; 16: 135, 2015 Apr 29.

Article in English | MEDLINE | ID: mdl-25925016

ABSTRACT

BACKGROUND: The interpretation of the results from genome-scale experiments is a challenging and important problem in contemporary biomedical research. Biological networks that integrate experimental results with existing knowledge from biomedical databases and published literature can provide a rich resource and powerful basis for hypothesizing about mechanistic explanations for observed gene-phenotype relationships. However, the size and density of such networks often impede their efficient exploration and understanding. RESULTS: We introduce a visual analytics approach that integrates interactive filtering of dense networks based on degree-of-interest functions with attribute-based layouts of the resulting subnetworks. The comparison of multiple subnetworks representing different analysis facets is facilitated through an interactive super-network that integrates brushing-and-linking techniques for highlighting components across networks. An implementation is freely available as a Cytoscape app. CONCLUSIONS: We demonstrate the utility of our approach through two case studies using a dataset that combines clinical data with high-throughput data for studying the effect of β-blocker treatment on heart failure patients. Furthermore, we discuss our team-based iterative design and development process as well as the limitations and generalizability of our approach.

Subject(s)

Adrenergic beta-Antagonists/pharmacology; Cholesterol Ester Transfer Proteins/metabolism; Cholesterol/metabolism; Computer Graphics; Databases, Factual; Gene Regulatory Networks; Heart Failure/genetics; Software; Cholesterol Ester Transfer Proteins/genetics; Data Mining; Gene Expression Profiling; Heart Failure/drug therapy; Humans

10.

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters.

Funk, Christopher; Baumgartner, William; Garcia, Benjamin; Roeder, Christophe; Bada, Michael; Cohen, K Bretonnel; Hunter, Lawrence E; Verspoor, Karin.

BMC Bioinformatics ; 15: 59, 2014 Feb 26.

Article in English | MEDLINE | ID: mdl-24571547

ABSTRACT

BACKGROUND: Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text due to a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach for concept recognition is an open problem. RESULTS: Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented. CONCLUSIONS: Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14 to 0.83). Out of the three systems we tested, ConceptMapper is generally the best-performing system; it produces the highest F-measure on seven out of eight ontologies. Default parameters are not ideal for most systems on most ontologies; by changing parameters, F-measure can be increased by up to 0.4. Not only are best-performing parameters presented, but suggestions for choosing the best parameters based on ontology characteristics are presented.

Subject(s)

Biological Ontologies; Data Mining/methods; Databases, Factual; Reproducibility of Results
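
The results above are reported as F-measures. As a reminder of how that number is formed, here is a minimal sketch that computes precision, recall, and the balanced F-measure F = 2PR / (P + R) from a set of predicted annotations and a set of gold-standard annotations. It assumes exact matching on a (document, start, end, concept) tuple, which is only one possible matching criterion and is not taken from the paper. The annotations, identifiers, and function names are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall under exact-match comparison (assumed criterion)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical annotations: (document id, start offset, end offset, concept id).
gold = {
    ("doc1", 0, 4, "CONCEPT:1"),
    ("doc1", 10, 18, "CONCEPT:2"),
    ("doc1", 25, 31, "CONCEPT:3"),
    ("doc2", 3, 9, "CONCEPT:1"),
    ("doc2", 40, 47, "CONCEPT:4"),
}
predicted = {
    ("doc1", 0, 4, "CONCEPT:1"),    # correct
    ("doc1", 10, 18, "CONCEPT:2"),  # correct
    ("doc2", 3, 9, "CONCEPT:1"),    # correct
    ("doc2", 50, 55, "CONCEPT:9"),  # spurious
}

precision, recall = precision_recall(predicted, gold)
score = f_measure(precision, recall)
```

Here three of the four predictions are correct and three of the five gold annotations are found, so precision is 0.75, recall is 0.6, and the F-measure is 2/3.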

11.

Chapter 16: text mining for translational bioinformatics.

Cohen, K Bretonnel; Hunter, Lawrence E.

PLoS Comput Biol ; 9(4): e1003044, 2013 Apr.

Article in English | MEDLINE | ID: mdl-23633944

ABSTRACT

Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall into both the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

Subject(s)

Computational Biology/methods; Data Mining/methods; Algorithms; Animals; Artificial Intelligence; Computer Simulation; Humans; Phenotype; Programming Languages; Software; Translational Research, Biomedical

12.

Integrating biological knowledge for mechanistic inference in the host-associated microbiome.

Santangelo, Brook E; Apgar, Madison; Colorado, Angela Sofia Burkhart; Martin, Casey G; Sterrett, John; Wall, Elena; Joachimiak, Marcin P; Hunter, Lawrence E; Lozupone, Catherine A.

Front Microbiol ; 15: 1351678, 2024.

Article in English | MEDLINE | ID: mdl-38638909

ABSTRACT

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

13.

RecSOI: recommending research directions using statements of ignorance.

Bibal, Adrien; Salem, Nourah M; Cardon, Rémi; White, Elizabeth K; Acuna, Daniel E; Burke, Robin; Hunter, Lawrence E.

J Biomed Semantics ; 15(1): 2, 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38650032

ABSTRACT

The more science advances, the more questions are asked. This compounding growth can make it difficult to keep up with current research directions. Furthermore, this difficulty is exacerbated for junior researchers who enter fields with already large bases of potentially fruitful research avenues. In this paper, we propose a novel task and a recommender system for research directions, RecSOI, that draws from statements of ignorance (SOIs) found in the research literature. By building researchers' profiles based on textual elements, RecSOI generates personalized recommendations of potential research directions tailored to their interests. In addition, RecSOI provides context for the recommended SOIs, so that users can quickly evaluate how relevant the research direction is for them. In this paper, we provide an overview of RecSOI's functioning, implementation, and evaluation, demonstrating its effectiveness in guiding researchers through the vast landscape of potential research directions.

Subject(s)

Biomedical Research; Research; Humans

14.

An open source knowledge graph ecosystem for the life sciences.

Callahan, Tiffany J; Tripodi, Ignacio J; Stefanski, Adrianne L; Cappelletti, Luca; Taneja, Sanya B; Wyrwa, Jordan M; Casiraghi, Elena; Matentzoglu, Nicolas A; Reese, Justin; Silverstein, Jonathan C; Hoyt, Charles Tapley; Boyce, Richard D; Malec, Scott A; Unni, Deepak R; Joachimiak, Marcin P; Robinson, Peter N; Mungall, Christopher J; Cavalleri, Emanuele; Fontana, Tommaso; Valentini, Giorgio; Mesiti, Marco; Gillenwater, Lucas A; Santangelo, Brook; Vasilevsky, Nicole A; Hoehndorf, Robert; Bennett, Tellen D; Ryan, Patrick B; Hripcsak, George; Kahn, Michael G; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

Sci Data ; 11(1): 363, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Subject(s)

Biological Science Disciplines; Knowledge Bases; Pattern Recognition, Automated; Algorithms; Translational Research, Biomedical

15.

Knowledge-Driven Mechanistic Enrichment of the Preeclampsia Ignorome.

Callahan, Tiffany J; Stefanski, Adrianne L; Kim, Jin-Dong; Baumgartner, William A; Wyrwa, Jordan M; Hunter, Lawrence E.

Pac Symp Biocomput ; 28: 371-382, 2023.

Article in English | MEDLINE | ID: mdl-36540992

ABSTRACT

Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. A set of DEGs that are associated with a disease experimentally but have no known association with the disease in the literature is known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined and meta-analysis-derived lists was identified as the uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
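The "relative complement" step described in this abstract can be read as a plain set difference: genes found by the meta-analysis minus genes already linked to the disease in the literature. The following is a minimal sketch of that idea only, not the authors' pipeline; the gene identifiers and the function name `ignorome` are hypothetical placeholders, and the abstract does not say which list formats or tools were used.

```python
# Illustrative sketch only: placeholder identifiers, not data from the paper.

def ignorome(degs: set[str], investigated: set[str]) -> set[str]:
    """Return DEGs that have no known literature association.

    This is the relative complement of `investigated` in `degs`,
    i.e. every element of `degs` that is absent from `investigated`.
    """
    return degs - investigated


# Hypothetical DEGs from a transcriptional meta-analysis.
meta_analysis_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}

# Hypothetical genes found by text-mining literature abstracts.
text_mined_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_X"}

uninvestigated = ignorome(meta_analysis_degs, text_mined_genes)
print(sorted(uninvestigated))
```

Note that a gene appearing only in the text-mined list (here `GENE_X`) does not enter the result, because the difference is taken relative to the DEG list.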

Subject(s)

Pre-Eclampsia, Pregnancy, Female, Humans, Pre-Eclampsia/genetics, Computational Biology/methods, Placenta, Fetus

16.

Plasma proteome of growing tumors.

Gupta, Shashi; Westacott, Matthew J; Ayers, Deborah G; Weiss, Sophie J; Whitley, Penn; Mueller, Christopher; Weaver, Daniel C; Schneider, Daniel J; Karimpour-Fard, Anis; Hunter, Lawrence E; Drolet, Daniel W; Janjic, Nebojsa.

Sci Rep ; 13(1): 12195, 2023 Jul 27.

Article in English | MEDLINE | ID: mdl-37500700

ABSTRACT

Early detection of cancer is vital for the best chance of successful treatment, but half of all cancers are diagnosed at an advanced stage. A simple and reliable blood screening test applied routinely would therefore address a major unmet medical need. To gain insight into the value of protein biomarkers in early detection and stratification of cancer we determined the time course of changes in the plasma proteome of mice carrying transplanted human lung, breast, colon, or ovarian tumors. For protein measurements we used an aptamer-based assay which simultaneously measures ~5000 proteins. Along with tumor lineage-specific biomarkers, we also found 15 markers shared among all cancer types that included the energy metabolism enzymes glyceraldehyde-3-phosphate dehydrogenase, glucose-6-phosphate isomerase and dihydrolipoyl dehydrogenase as well as several important biomarkers for maintaining protein, lipid, nucleotide, or carbohydrate balance such as tryptophanyl-tRNA synthetase and nucleoside diphosphate kinase. Using significantly altered proteins in the tumor-bearing mice, we developed models to stratify tumor types and to estimate the minimum detectable tumor volume. Finally, we identified significantly enriched common and unique biological pathways among the eight tumor cell lines tested.

Subject(s)

Ovarian Neoplasms, Proteome, Female, Humans, Mice, Animals, Proteome/metabolism, Tumor Biomarkers/metabolism, Energy Metabolism, Tumor Cell Line

17.

Ontologizing health systems data at scale: making translational discovery a reality.

Callahan, Tiffany J; Stefanski, Adrianne L; Wyrwa, Jordan M; Zeng, Chenjie; Ostropolets, Anna; Banda, Juan M; Baumgartner, William A; Boyce, Richard D; Casiraghi, Elena; Coleman, Ben D; Collins, Janine H; Deakyne Davies, Sara J; Feinstein, James A; Lin, Asiyah Y; Martin, Blake; Matentzoglu, Nicolas A; Meeker, Daniella; Reese, Justin; Sinclair, Jessica; Taneja, Sanya B; Trinkley, Katy E; Vasilevsky, Nicole A; Williams, Andrew E; Zhang, Xingmin A; Denny, Joshua C; Ryan, Patrick B; Hripcsak, George; Bennett, Tellen D; Haendel, Melissa A; Robinson, Peter N; Hunter, Lawrence E; Kahn, Michael G.

NPJ Digit Med ; 6(1): 89, 2023 May 19.

Article in English | MEDLINE | ID: mdl-37208468

ABSTRACT

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8,611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.

18.

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools.

Verspoor, Karin; Cohen, Kevin Bretonnel; Lanfranchi, Arrick; Warner, Colin; Johnson, Helen L; Roeder, Christophe; Choi, Jinho D; Funk, Christopher; Malenkiy, Yuriy; Eckert, Miriam; Xue, Nianwen; Baumgartner, William A; Bada, Michael; Palmer, Martha; Hunter, Lawrence E.

BMC Bioinformatics ; 13: 207, 2012 Aug 17.

Article in English | MEDLINE | ID: mdl-22901054

ABSTRACT

BACKGROUND: We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. RESULTS: Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. CONCLUSIONS: The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications.

Subject(s)

Data Mining/methods, Natural Language Processing, Software

19.

Concept annotation in the CRAFT corpus.

Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E.

BMC Bioinformatics ; 13: 161, 2012 Jul 09.

Article in English | MEDLINE | ID: mdl-22776079

ABSTRACT

BACKGROUND: Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. RESULTS: This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. CONCLUSIONS: As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

Subject(s)

Data Mining, Natural Language Processing, Controlled Vocabulary, Computational Biology/methods, Factual Databases, Information Storage and Retrieval/methods, Semantics

20.

Kidney proteome changes provide evidence for a dynamic metabolism and regional redistribution of plasma proteins during torpor-arousal cycles of hibernation.

Jani, Alkesh; Orlicky, David J; Karimpour-Fard, Anis; Epperson, L Elaine; Russell, Rae L; Hunter, Lawrence E; Martin, Sandra L.

Physiol Genomics ; 44(14): 717-27, 2012 Jul 15.

Article in English | MEDLINE | ID: mdl-22643061

ABSTRACT

Hibernating ground squirrels maintain homeostasis despite extreme physiological challenges. In winter, these circannual hibernators fast for months while cycling between prolonged periods of low blood flow and body temperature, known as torpor, and short interbout arousals (IBA), where more typical mammalian parameters are rapidly restored. Here we examined the kidney proteome for changes that support the dramatically different physiological demands of the hibernator's year. We identified proteins in 150 two-dimensional gel spots that altered by at least 1.5-fold using liquid chromatography and tandem mass spectrometry. These data successfully classified individuals by physiological state and revealed three dynamic patterns of relative protein abundance that dominated the hibernating kidney: 1) a large group of proteins generally involved with capturing and storing energy were most abundant in summer; 2) a select subset of these also increased during each arousal from torpor; and 3) 14 spots increased in torpor and early arousal were enriched for plasma proteins that enter cells via the endocytic pathway. Immunohistochemistry identified α(2)-macroglobulin and albumin in kidney blood vessels during late torpor and early arousal; both exhibited regional heterogeneity consistent with highly localized control of blood flow in the glomeruli. Furthermore, albumin, but not α(2)-macroglobulin, was detected in the proximal tubules during torpor and early arousal but not in IBA or summer animals. Taken together, our findings indicate that normal glomerular filtration barriers remain intact throughout torpor-arousal cycles but endocytosis, and hence renal function, is compromised at low body temperature during torpor and then recovers with rewarming during arousal.

Subject(s)

Arousal/physiology, Blood Proteins/metabolism, Gene Expression Regulation/physiology, Hibernation/physiology, Kidney/metabolism, Sciuridae/physiology, Animals, Western Blotting, Body Temperature, Liquid Chromatography, DNA Primers/genetics, Immunohistochemistry, Proteomics/methods, Reverse Transcriptase Polymerase Chain Reaction, Sciuridae/metabolism, Seasons, Serum Albumin/metabolism, Tandem Mass Spectrometry, alpha-Macroglobulins/metabolism
