Search | BVS - Ministry of Health

1.

An open source knowledge graph ecosystem for the life sciences.

Callahan, Tiffany J; Tripodi, Ignacio J; Stefanski, Adrianne L; Cappelletti, Luca; Taneja, Sanya B; Wyrwa, Jordan M; Casiraghi, Elena; Matentzoglu, Nicolas A; Reese, Justin; Silverstein, Jonathan C; Hoyt, Charles Tapley; Boyce, Richard D; Malec, Scott A; Unni, Deepak R; Joachimiak, Marcin P; Robinson, Peter N; Mungall, Christopher J; Cavalleri, Emanuele; Fontana, Tommaso; Valentini, Giorgio; Mesiti, Marco; Gillenwater, Lucas A; Santangelo, Brook; Vasilevsky, Nicole A; Hoehndorf, Robert; Bennett, Tellen D; Ryan, Patrick B; Hripcsak, George; Kahn, Michael G; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

Sci Data ; 11(1): 363, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Subjects

Biological Science Disciplines, Knowledge Bases, Automated Pattern Recognition, Algorithms, Biomedical Translational Research

2.

Integrating biological knowledge for mechanistic inference in the host-associated microbiome.

Santangelo, Brook E; Apgar, Madison; Colorado, Angela Sofia Burkhart; Martin, Casey G; Sterrett, John; Wall, Elena; Joachimiak, Marcin P; Hunter, Lawrence E; Lozupone, Catherine A.

Front Microbiol ; 15: 1351678, 2024.

Article in English | MEDLINE | ID: mdl-38638909

ABSTRACT

Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.

3.

RecSOI: recommending research directions using statements of ignorance.

Bibal, Adrien; Salem, Nourah M; Cardon, Rémi; White, Elizabeth K; Acuna, Daniel E; Burke, Robin; Hunter, Lawrence E.

J Biomed Semantics ; 15(1): 2, 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38650032

ABSTRACT

The more science advances, the more questions are asked. This compounding growth can make it difficult to keep up with current research directions. Furthermore, this difficulty is exacerbated for junior researchers who enter fields with already large bases of potentially fruitful research avenues. In this paper, we propose a novel task and a recommender system for research directions, RecSOI, that draws from statements of ignorance (SOIs) found in the research literature. By building researchers' profiles based on textual elements, RecSOI generates personalized recommendations of potential research directions tailored to their interests. In addition, RecSOI provides context for the recommended SOIs, so that users can quickly evaluate how relevant the research direction is for them. In this paper, we provide an overview of RecSOI's functioning, implementation, and evaluation, demonstrating its effectiveness in guiding researchers through the vast landscape of potential research directions.

Subjects

Biomedical Research, Research, Humans

4.

Characterization of methods for mechanistic inference of the gut microbiome in disease.

Santangelo, Brook; Hunter, Lawrence; Lozupone, Catherine.

bioRxiv ; 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38076987

ABSTRACT

Motivation: Knowledge graphs have found broad biomedical applications, providing useful representations of complex knowledge. Although plentiful evidence exists linking the gut microbiome to disease, mechanistic understanding of those relationships remains generally elusive. A structured analysis of existing resources is necessary to characterize the resources and methodologies needed to facilitate mechanistic inference. Results: Here we demonstrate the potential of knowledge graphs to hypothesize plausible mechanistic accounts of host-microbe interactions in disease and define the need for semantic constraint in doing so. We constructed a knowledge graph of linked microbes, genes and metabolites called MGMLink, and one of microbial traits, environments, and human phenotypes called KG-microbe-phenio. Using a shortest path search and a pattern-based, semantically constrained path search through the graphs, we highlight the need for a microbiome-disease resource and semantically informed search methods to enable mechanistic inference. Availability: The software to create MGMLink is openly available at https://github.com/bsantan/MGMLink, KG-microbe is available at https://github.com/Knowledge-Graph-Hub/kg-microbe, and KG-phenio is available at https://github.com/Knowledge-Graph-Hub/kg-phenio. Contact: brook.santangelo@cuanschutz.edu.

5.

Hypothesizing mechanistic links between microbes and disease using knowledge graphs.

Santangelo, Brook; Bada, Michael; Hunter, Lawrence; Lozupone, Catherine.

bioRxiv ; 2023 Dec 04.

Article in English | MEDLINE | ID: mdl-38106100

ABSTRACT

Knowledge graphs have found broad biomedical applications, providing useful representations of complex knowledge. Although plentiful evidence exists linking the gut microbiome to disease, mechanistic understanding of those relationships remains generally elusive. Here we demonstrate the potential of knowledge graphs to hypothesize plausible mechanistic accounts of host-microbe interactions in disease. To do so, we constructed a knowledge graph of linked microbes, genes and metabolites called MGMLink. Using a semantically constrained shortest path search through the graph and a novel path prioritization methodology based on cosine similarity, we show that this knowledge supports inference of mechanistic hypotheses that explain observed relationships between microbes and disease phenotypes. We discuss specific applications of this methodology in inflammatory bowel disease and Parkinson's disease. This approach enables mechanistic hypotheses surrounding the complex interactions between gut microbes and disease to be generated in a scalable and comprehensive manner.
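The abstract names two steps, a semantically constrained shortest path search and a cosine-similarity path prioritization, without giving their details. The following is only a minimal sketch of how such steps might fit together: the toy graph, the predicate names, the two-dimensional embeddings, and the scoring rule (mean cosine similarity of each node to the disease node) are all assumptions made for illustration, not the MGMLink implementation.

```python
from collections import deque
import math

# Toy graph: node -> list of (predicate, neighbor). Every name here is hypothetical.
EDGES = {
    "microbe:A": [("produces", "metabolite:butyrate"),
                  ("mentioned_with", "disease:IBD")],
    "metabolite:butyrate": [("interacts_with", "gene:HDAC3")],
    "gene:HDAC3": [("associated_with", "disease:IBD")],
    "disease:IBD": [],
}

# Made-up 2-d vectors standing in for learned node embeddings.
EMBEDDINGS = {
    "microbe:A": [1.0, 0.0],
    "metabolite:butyrate": [0.8, 0.6],
    "gene:HDAC3": [0.6, 0.8],
    "disease:IBD": [0.0, 1.0],
}


def constrained_shortest_paths(start, goal, allowed):
    """Breadth-first search: all shortest paths that use only allowed predicates."""
    queue = deque([[start]])
    best_len = None
    found = []
    while queue:
        path = queue.popleft()
        if best_len is not None and len(path) > best_len:
            break  # BFS yields paths in order of length, so nothing shorter remains
        node = path[-1]
        if node == goal:
            best_len = len(path)
            found.append(path)
            continue
        for predicate, neighbor in EDGES.get(node, []):
            if predicate in allowed and neighbor not in path:
                queue.append(path + [neighbor])
    return found


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def path_score(path):
    """Assumed scoring rule: mean cosine similarity of non-goal nodes to the goal."""
    goal_vec = EMBEDDINGS[path[-1]]
    sims = [cosine(EMBEDDINGS[n], goal_vec) for n in path[:-1]]
    return sum(sims) / len(sims)


mechanistic = {"produces", "interacts_with", "associated_with"}
paths = constrained_shortest_paths("microbe:A", "disease:IBD", mechanistic)
ranked = sorted(paths, key=path_score, reverse=True)
print(ranked)
```

In this toy graph, allowing the uninformative `mentioned_with` predicate would make the direct microbe-to-disease edge the shortest path; restricting the search to mechanistic predicates forces the path through the metabolite and the gene, which is the kind of effect a semantic constraint is meant to have.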

6.

Plasma proteome of growing tumors.

Gupta, Shashi; Westacott, Matthew J; Ayers, Deborah G; Weiss, Sophie J; Whitley, Penn; Mueller, Christopher; Weaver, Daniel C; Schneider, Daniel J; Karimpour-Fard, Anis; Hunter, Lawrence E; Drolet, Daniel W; Janjic, Nebojsa.

Sci Rep ; 13(1): 12195, 2023 Jul 27.

Article in English | MEDLINE | ID: mdl-37500700

ABSTRACT

Early detection of cancer is vital for the best chance of successful treatment, but half of all cancers are diagnosed at an advanced stage. A simple and reliable blood screening test applied routinely would therefore address a major unmet medical need. To gain insight into the value of protein biomarkers in early detection and stratification of cancer, we determined the time course of changes in the plasma proteome of mice carrying transplanted human lung, breast, colon, or ovarian tumors. For protein measurements we used an aptamer-based assay which simultaneously measures ~5000 proteins. Along with tumor lineage-specific biomarkers, we also found 15 markers shared among all cancer types that included the energy metabolism enzymes glyceraldehyde-3-phosphate dehydrogenase, glucose-6-phosphate isomerase and dihydrolipoyl dehydrogenase as well as several important biomarkers for maintaining protein, lipid, nucleotide, or carbohydrate balance such as tryptophanyl-tRNA synthetase and nucleoside diphosphate kinase. Using significantly altered proteins in the tumor-bearing mice, we developed models to stratify tumor types and to estimate the minimum detectable tumor volume. Finally, we identified significantly enriched common and unique biological pathways among the eight tumor cell lines tested.

Subjects

Ovarian Neoplasms, Proteome, Female, Humans, Mice, Animals, Proteome/metabolism, Tumor Biomarkers/metabolism, Energy Metabolism, Tumor Cell Line

7.

Creating an ignorance-base: Exploring known unknowns in the scientific literature.

Boguslav, Mayla R; Salem, Nourah M; White, Elizabeth K; Sullivan, Katherine J; Bada, Michael; Hernandez, Teri L; Leach, Sonia M; Hunter, Lawrence E.

J Biomed Inform ; 143: 104405, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37270143

ABSTRACT

BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition. RESULTS: We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). 
The researchers could look to the field of neuroscience for potential answers to the ignorance statements. CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.

Subjects

Knowledge Bases, Knowledge, Natural Language Processing, Female, Humans, Newborn Infant, Premature Birth, Publications, Vitamin D

8.

Ontologizing health systems data at scale: making translational discovery a reality.

Callahan, Tiffany J; Stefanski, Adrianne L; Wyrwa, Jordan M; Zeng, Chenjie; Ostropolets, Anna; Banda, Juan M; Baumgartner, William A; Boyce, Richard D; Casiraghi, Elena; Coleman, Ben D; Collins, Janine H; Deakyne Davies, Sara J; Feinstein, James A; Lin, Asiyah Y; Martin, Blake; Matentzoglu, Nicolas A; Meeker, Daniella; Reese, Justin; Sinclair, Jessica; Taneja, Sanya B; Trinkley, Katy E; Vasilevsky, Nicole A; Williams, Andrew E; Zhang, Xingmin A; Denny, Joshua C; Ryan, Patrick B; Hripcsak, George; Bennett, Tellen D; Haendel, Melissa A; Robinson, Peter N; Hunter, Lawrence E; Kahn, Michael G.

NPJ Digit Med ; 6(1): 89, 2023 May 19.

Article in English | MEDLINE | ID: mdl-37208468

ABSTRACT

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.

9.

Knowledge-Driven Mechanistic Enrichment of the Preeclampsia Ignorome.

Callahan, Tiffany J; Stefanski, Adrianne L; Kim, Jin-Dong; Baumgartner, William A; Wyrwa, Jordan M; Hunter, Lawrence E.

Pac Symp Biocomput ; 28: 371-382, 2023.

Article in English | MEDLINE | ID: mdl-36540992

ABSTRACT

Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. The set of DEGs that are associated with a disease experimentally but have no known association with the disease in the literature is known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined and meta-analysis-derived lists was identified as the uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
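The relative complement described in this abstract is a plain set difference: genes found in the meta-analysis-derived DEG list that do not appear in the text-mined list. A minimal sketch follows; the gene identifiers are hypothetical placeholders, not genes from the study.

```python
# Placeholder identifiers only; these are not results from the paper.
meta_analysis_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # DEGs from expression data
text_mined_genes = {"GENE_B", "GENE_D", "GENE_E"}              # genes already discussed in abstracts

# Relative complement: DEGs with no literature association, i.e. the ignorome.
ignorome = meta_analysis_degs - text_mined_genes

print(sorted(ignorome))
```

Note that the difference is not symmetric: a gene that appears only in the text-mined list (here `GENE_E`) is not part of the ignorome, because it was never a DEG in the first place.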

Subjects

Pre-Eclampsia, Pregnancy, Female, Humans, Pre-Eclampsia/genetics, Computational Biology/methods, Placenta, Fetus

10.

Molecular cartooning with knowledge graphs.

Santangelo, Brook E; Gillenwater, Lucas A; Salem, Nourah M; Hunter, Lawrence E.

Front Bioinform ; 2: 1054578, 2022.

Article in English | MEDLINE | ID: mdl-36568701

ABSTRACT

Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.

11.

BETA: a comprehensive benchmark for computational drug-target prediction.

Zong, Nansu; Li, Ning; Wen, Andrew; Ngo, Victoria; Yu, Yue; Huang, Ming; Chowdhury, Shaika; Jiang, Chao; Fu, Sunyang; Weinshilboum, Richard; Jiang, Guoqian; Hunter, Lawrence; Liu, Hongfang.

Brief Bioinform ; 23(4), 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling in the cross-validation, however, is not always ideal to handle large, diverse and copious datasets as it could potentially introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance on a variety of use-cases (e.g. permutations of different levels of connectiveness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets and drug repurposing for specific diseases), a total of seven Tests (consisting of 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based and network-based) were tested across all the developed Tasks. The best-worst performing cases have been analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods for running over the benchmark tasks. The results highlight BETA as a benchmark in the selection of computational strategies for drug repurposing and target discovery.

Subjects

Benchmarking, Drug Development, Algorithms, Preclinical Drug Evaluation, Drug Repositioning/methods, Proteins/genetics

12.

Examining linguistic shifts between preprints and publications.

Nicholson, David N; Rubinetti, Vincent; Hu, Dongbo; Thielk, Marvin; Hunter, Lawrence E; Greene, Casey S.

PLoS Biol ; 20(2): e3001470, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35104289

ABSTRACT

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint as well as observe where the preprint would be positioned within a published article landscape.

Subjects

Language, Peer Review of Research, Preprints as Topic, Biomedical Research, Publications/standards, Terminology as Topic

13.

Characterizing Patient Representations for Computational Phenotyping.

Callahan, Tiffany J; Stefanski, Adrianne L; Ostendorf, Danielle M; Wyrwa, Jordan M; Davies, Sara J Deakyne; Hripcsak, George; Hunter, Lawrence E; Kahn, Michael G.

AMIA Annu Symp Proc ; 2022: 319-328, 2022.

Article in English | MEDLINE | ID: mdl-37128436

ABSTRACT

Patient representation learning methods create rich representations of complex data and have potential to further advance the development of computational phenotypes (CP). Currently, these methods are either applied to small predefined concept sets or all available patient data, limiting the potential for novel discovery and reducing the explainability of the resulting representations. We report on an extensive, data-driven characterization of the utility of patient representation learning methods for the purpose of CP development or automatization. We conducted ablation studies to examine the impact of patient representations, built using data from different combinations of data types and sampling windows on rare disease classification. We demonstrated that the data type and sampling window directly impact classification and clustering performance, and these results differ by rare disease group. Our results, although preliminary, exemplify the importance of and need for data-driven characterization in patient representation-based CP development pipelines.

Subjects

Machine Learning, Rare Diseases, Humans, Phenotype

14.

Concept recognition as a machine translation problem.

Boguslav, Mayla R; Hailu, Negacy D; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E.

BMC Bioinformatics ; 22(Suppl 1): 598, 2021 Dec 17.

Article in English | MEDLINE | ID: mdl-34920707

ABSTRACT

BACKGROUND: Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches. METHODS: We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance. RESULTS: Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches.
CONCLUSIONS: Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation.

15.

Identifying and classifying goals for scientific knowledge.

Boguslav, Mayla R; Salem, Nourah M; White, Elizabeth K; Leach, Sonia M; Hunter, Lawrence E.

Bioinform Adv ; 1(1): vbab012, 2021.

Article in English | MEDLINE | ID: mdl-34661112

ABSTRACT

MOTIVATION: Science progresses by posing good questions, yet work in biomedical text mining has not focused on them much. We propose a novel idea for biomedical natural language processing: identifying and characterizing the questions stated in the biomedical literature. Formally, the task is to identify and characterize statements of ignorance, statements where scientific knowledge is missing or incomplete. The creation of such technology could have many significant impacts, from the training of PhD students to ranking publications and prioritizing funding based on particular questions of interest. The work presented here is intended as the first step towards these goals. RESULTS: We present a novel ignorance taxonomy driven by the role statements of ignorance play in research, identifying specific goals for future scientific knowledge. Using this taxonomy and reliable annotation guidelines (inter-annotator agreement above 80%), we created a gold standard ignorance corpus of 60 full-text documents from the prenatal nutrition literature with over 10 000 annotations and used it to train classifiers that achieved over 0.80 F1 scores. AVAILABILITY AND IMPLEMENTATION: Corpus and source code freely available for download at https://github.com/UCDenver-ccp/Ignorance-Question-Work. The source code is implemented in Python.

16.

Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data.

Sullivan, Katherine J; Burden, Marisha; Keniston, Angela; Banda, Juan M; Hunter, Lawrence E.

Pac Symp Biocomput ; 26: 95-106, 2021.

Article in English | MEDLINE | ID: mdl-33691008

ABSTRACT

Physicians' beliefs and attitudes about COVID-19 are important to ascertain because of their central role in providing care to patients during the pandemic. Identifying topics and sentiments discussed by physicians and other healthcare workers can lead to identification of gaps relating to the COVID-19 pandemic response within the healthcare system. To better understand physicians' perspectives on the COVID-19 response, we extracted Twitter data from a specific user group that allows physicians to stay anonymous while expressing their perspectives about the COVID-19 pandemic. All tweets were in English. We measured the most frequent bigrams and trigrams, compared sentiment analysis methods, and compared our findings to a larger Twitter dataset containing general COVID-19-related discourse. We found significant differences between the two datasets for specific topical phrases. No statistically significant difference was found in sentiments between the two datasets, and both trended slightly more positive than negative. Upon comparison to manual sentiment analysis, it was determined that these sentiment analysis methods should be improved to accurately capture sentiments of anonymous physician data. Anonymous physician social media data is a unique source of information that provides important insights into COVID-19 perspectives.
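Counting the most frequent bigrams and trigrams, as mentioned in this abstract, can be sketched in a few lines. The example posts below are invented for illustration, and the whitespace tokenization is an assumption; the study's actual preprocessing is not described here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-token tuples of a token list, in order of appearance."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Invented example posts, not data from the study.
posts = [
    "we need more ppe now",
    "we need more staff",
]

bigram_counts = Counter()
trigram_counts = Counter()
for post in posts:
    tokens = post.lower().split()  # naive whitespace tokenization (assumption)
    bigram_counts.update(ngrams(tokens, 2))
    trigram_counts.update(ngrams(tokens, 3))

print(bigram_counts.most_common(2))
print(trigram_counts.most_common(1))
```

Comparing two corpora, as the abstract describes, would then amount to building one pair of counters per dataset and comparing the relative frequencies of the same phrases.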

Subjects

COVID-19, Physicians, Social Media, Computational Biology, Humans, Pandemics, SARS-CoV-2

17.

Applying knowledge-driven mechanistic inference to toxicogenomics.

Tripodi, Ignacio J; Callahan, Tiffany J; Westfall, Jessica T; Meitzer, Nayland S; Dowell, Robin D; Hunter, Lawrence E.

Toxicol In Vitro ; 66: 104877, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32387679

ABSTRACT

When considering toxic chemicals in the environment, a mechanistic, causal explanation of toxicity may be preferred over a statistical or machine learning-based prediction by itself. Elucidating a mechanism of toxicity is, however, a costly and time-consuming process that requires the participation of specialists from a variety of fields, often relying on animal models. We present an innovative mechanistic inference framework (MechSpy), which can be used as a hypothesis generation aid to narrow the scope of mechanistic toxicology analysis. MechSpy generates hypotheses of the most likely mechanisms of toxicity, by combining a semantically-interconnected knowledge representation of human biology, toxicology and biochemistry with gene expression time series on human tissue. Using vector representations of biological entities, MechSpy seeks enrichment in a manually curated list of high-level mechanisms of toxicity, represented as biochemically- and causally-linked ontology concepts. Besides predicting the canonical mechanism of toxicity for many well-studied compounds, we experimentally validated some of our predictions for other chemicals without an established mechanism of toxicity. This mechanistic inference framework is an advantageous tool for predictive toxicology, and the first of its kind to produce a mechanistic explanation for each prediction. MechSpy can be modified to include additional mechanisms of toxicity, and is generalizable to other types of mechanisms of human biology.

Subjects

Toxicogenetics/methods , Cell Line , Computational Biology/methods , Gene Expression , Genomics , Humans , Software

18.

Knowledge-Based Biomedical Data Science.

Callahan, Tiffany J; Tripodi, Ignacio J; Pielke-Lombardo, Harrison; Hunter, Lawrence E.

Annu Rev Biomed Data Sci ; 3: 23-41, 2020 Jul.

Article in English | MEDLINE | ID: mdl-33954284

ABSTRACT

Knowledge-based biomedical data science involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey recent progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as progress on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing to construct knowledge graphs, and the expansion of novel knowledge-based approaches to clinical and biological domains.

19.

The use or generation of biomedical data and existing medicines to discover and establish new treatments for patients with rare diseases - recommendations of the IRDiRC Data Mining and Repurposing Task Force.

Southall, Noel T; Natarajan, Madhusudan; Lau, Lilian Pek Lian; Jonker, Anneliene Hechtelt; Deprez, Benoît; Guilliams, Tim; Hunter, Lawrence; Rademaker, Carin Ma; Hivert, Virginie; Ardigò, Diego.

Orphanet J Rare Dis ; 14(1): 225, 2019 Oct 15.

Article in English | MEDLINE | ID: mdl-31615551

ABSTRACT

The number of available therapies for rare diseases remains low, as fewer than 6% of rare diseases have an approved treatment option. The International Rare Diseases Research Consortium (IRDiRC) set up the multi-stakeholder Data Mining and Repurposing (DMR) Task Force to examine the potential of applying biomedical data mining strategies to identify new opportunities to use existing pharmaceutical compounds in new ways and to accelerate the pace of drug development for rare disease patients. In reviewing past successes of data mining for drug repurposing, and planning for future biomedical research capacity, the DMR Task Force identified four strategic infrastructure investment areas to focus on in order to accelerate rare disease research productivity and drug development: (1) improving the capture and sharing of self-reported patient data, (2) better integration of existing research data, (3) increasing experimental testing capacity, and (4) sharing of rare disease research and development expertise. The DMR Task Force also recommended a number of strategies to increase data mining and repurposing opportunities for rare disease research, as well as the development of individualized and precision medicine strategies.

Subjects

Biomedical Research , Data Mining , Drug Repositioning , Rare Diseases/drug therapy , Big Data , Databases, Factual , Humans

20.

P-Hacking Lexical Richness Through Definitions of "Type" and "Token".

Cohen, K Bretonnel; Hunter, Lawrence E; Pressman, Peter S.

Stud Health Technol Inform ; 264: 1433-1434, 2019 Aug 21.

Article in English | MEDLINE | ID: mdl-31438167

ABSTRACT

"P-hacking" is the repeated analysis of data until a statistically significant result is achieved. We show that p-hacking can also occur during data generation, sometimes unintentionally. We use the type-token ratio to demonstrate that differences in the definitions of "type" and "token" can produce significantly different results. Since these terms are rarely defined in the biomedical literature, the result is an inability to meaningfully interpret the body of literature that makes use of this measure.
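
As a rough illustration of the point about definitions, the sketch below computes the type-token ratio (number of distinct types divided by number of tokens) for one sentence under two tokenization rules. Both rules, the example sentence, and the helper name `type_token_ratio` are assumptions chosen for illustration; they are not the definitions examined in the paper.

```python
import re


def type_token_ratio(tokens):
    # TTR = distinct types / total tokens.
    return len(set(tokens)) / len(tokens)


# Made-up example sentence, not data from the paper.
text = "The patient said the Patient's pain was worse. The pain was not worse before."

# Definition A (assumed): a token is any whitespace-delimited string;
# case and attached punctuation are preserved, so "The" != "the" and
# "worse." != "worse".
tokens_a = text.split()

# Definition B (assumed): a token is a lowercased alphabetic string with
# the possessive 's removed, so "Patient's" and "patient" are one type.
tokens_b = re.findall(r"[a-z]+", text.lower().replace("'s", ""))

ttr_a = type_token_ratio(tokens_a)
ttr_b = type_token_ratio(tokens_b)

print(f"Definition A: {len(set(tokens_a))} types / {len(tokens_a)} tokens = {ttr_a:.3f}")
print(f"Definition B: {len(set(tokens_b))} types / {len(tokens_b)} tokens = {ttr_b:.3f}")
```

Both definitions yield the same number of tokens for this sentence, yet the number of types differs, so the ratio changes even though the underlying text is identical. A reader comparing reported ratios across studies cannot tell which of these (or other) rules was applied unless the definitions are stated.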

Subjects

Computer Security , Vocabulary
