ABSTRACT
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Subjects
Factual Databases , Disease , Genes , Phenotype , Humans , Internet , Factual Databases/standards , Software , Genes/genetics , Disease/genetics
ABSTRACT
The BRCA Challenge is a long-term data-sharing project initiated within the Global Alliance for Genomics and Health (GA4GH) to aggregate BRCA1 and BRCA2 data to support highly collaborative research activities. Its goal is to generate an informed and current understanding of the impact of genetic variation on cancer risk across the iconic cancer predisposition genes, BRCA1 and BRCA2. Initially, reported variants in BRCA1 and BRCA2 available from public databases were integrated into a single, newly created site, www.brcaexchange.org. The purpose of the BRCA Exchange is to provide the community with a reliable and easily accessible record of variants interpreted for a high-penetrance phenotype. More than 20,000 variants have been aggregated, three times the number found in the next-largest public database at the project's outset; approximately 7,250 of them have expert classifications. The data set is based on shared information from existing clinical databases (Breast Cancer Information Core (BIC), ClinVar, and the Leiden Open Variation Database (LOVD)) as well as population databases, all linked to a single point of access. The BRCA Challenge has brought together the existing international Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium expert panel, along with expert clinicians, diagnosticians, researchers, and database providers, all with a common goal of advancing our understanding of BRCA1 and BRCA2 variation. Ongoing work includes direct contact with national centers with access to BRCA1 and BRCA2 diagnostic data to encourage data sharing, development of methods suitable for extraction of genetic variation at the level of individual laboratory reports, and engagement with participant communities to enable a more comprehensive understanding of the clinical significance of genetic variation in BRCA1 and BRCA2.
Subjects
Genetic Databases , BRCA1 Genes , BRCA2 Genes , Genetic Variation , Alleles , Breast Neoplasms/genetics , Genetic Databases/ethics , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Information Dissemination/ethics , Information Dissemination/legislation & jurisprudence , Male , Mutation , Ovarian Neoplasms/genetics , Penetrance , Phenotype , Risk Factors
ABSTRACT
While we often think of words as having a fixed meaning that we use to describe a changing world, words are also dynamic and changing. Scientific research can also be remarkably fast-moving, with new concepts or approaches rapidly gaining mind share. We examined scientific writing, both preprint and pre-publication peer-reviewed text, to identify terms that have changed and examine their use. One particular challenge that we faced was that the shift from closed to open access publishing meant that the size of available corpora changed by over an order of magnitude in the last two decades. We developed an approach to evaluate semantic shift by accounting for both intra- and inter-year variability using multiple integrated models. This analysis revealed thousands of change points in both corpora, including for terms such as 'cas9', 'pandemic', and 'sars'. We found that the consistent change points between pre-publication peer-reviewed and preprinted text are largely related to the COVID-19 pandemic. We also created a web app for exploration that allows users to investigate individual terms (https://greenelab.github.io/word-lapse/). To our knowledge, our research is the first to examine semantic shift in biomedical preprints and pre-publication peer-reviewed text, and provides a foundation for future work to understand how terms acquire new meanings and how peer review affects this process.
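The idea of weighing inter-year change against intra-year variability can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: each year is taken to have several embedding models already aligned to a common vector space, and the hypothetical `shift_score` is simply the mean between-year cosine distance for a term divided by its mean within-year distance.

```python
from itertools import combinations, product
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def shift_score(vectors_year_a, vectors_year_b):
    """Ratio of between-year to within-year distance for one term.

    Each argument is a list of at least two vectors for the same term,
    one per model trained on that year (hypothetical setup).
    """
    inter = mean(
        cosine_distance(u, v) for u, v in product(vectors_year_a, vectors_year_b)
    )
    intra = mean(
        cosine_distance(u, v)
        for group in (vectors_year_a, vectors_year_b)
        for u, v in combinations(group, 2)
    )
    if intra == 0:
        raise ValueError("within-year models are identical; ratio undefined")
    return inter / intra


# Toy vectors: the first term barely moves, the second rotates between years.
stable = shift_score([[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.05], [0.95, 0.1]])
shifted = shift_score([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]])
print(stable, shifted)
```

A score near or below 1 suggests the year-to-year difference is no larger than the disagreement among models of the same year; a much larger score flags a candidate change point.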
ABSTRACT
Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.
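The notion of a path type occurring "more frequently than would be expected by chance (based on node degree alone)" can be sketched on a toy network. This is a minimal illustration under stated assumptions, not the hetmatpy API or the authors' scoring: the node names, the two edge types, and the permutation-based null (degree-preserving edge swaps) are all hypothetical choices made for the example.

```python
import random


def path_count(binds, associates, source, target):
    """Count source-gene-target paths: genes linked to both endpoints."""
    genes_of_source = {g for c, g in binds if c == source}
    genes_of_target = {g for g, d in associates if d == target}
    return len(genes_of_source & genes_of_target)


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire a bipartite edge list by swapping endpoints of edge pairs.

    An accepted swap turns (a, b), (c, d) into (a, d), (c, b), so every
    node keeps its degree. Swaps that would duplicate an edge are skipped.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if i == j or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def enrichment(binds, associates, source, target, n_perm=200, seed=0):
    """Compare the observed path count with counts in shuffled networks."""
    rng = random.Random(seed)
    observed = path_count(binds, associates, source, target)
    null_counts = []
    for _ in range(n_perm):
        b = degree_preserving_shuffle(binds, 10 * len(binds), rng)
        a = degree_preserving_shuffle(associates, 10 * len(associates), rng)
        null_counts.append(path_count(b, a, source, target))
    expected = sum(null_counts) / n_perm
    frac_ge = sum(c >= observed for c in null_counts) / n_perm
    return observed, expected, frac_ge


# Hypothetical toy hetnet: compound-binds-gene and gene-associates-disease.
binds = [("c1", "g1"), ("c1", "g2"), ("c2", "g3"), ("c3", "g1"), ("c2", "g2")]
associates = [("g1", "d1"), ("g2", "d1"), ("g3", "d2"), ("g1", "d2")]
print(enrichment(binds, associates, "c1", "d1"))
```

The returned triple gives the observed count, the mean count across shuffled networks, and the fraction of shuffles with at least as many paths. On a network of Hetionet's size, repeated shuffling of this kind would be slow, which is consistent with the abstract's remark that several optimizations were needed.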
ABSTRACT
BACKGROUND: Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS: We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION: We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
Subjects
Algorithms , Probability
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
BACKGROUND AND OBJECTIVES: We describe and illustrate use of DISTING, a novel web application for computing alternative structurally identifiable linear compartmental models that are input-output indistinguishable from a postulated linear compartmental model. Several computer packages are available for analysing the structural identifiability of such models, but DISTING is the first to be made available for assessing indistinguishability. METHODS: The computational algorithms embedded in DISTING are based on advanced versions of established geometric and algebraic properties of linear compartmental models, embedded in a user-friendly graphic model user interface. Novel computational tools greatly speed up the overall procedure. These include algorithms for Jacobian matrix reduction, submatrix rank reduction, and parallelization of candidate rank computations in symbolic matrix analysis. RESULTS: DISTING is applied to three postulated models with two, three, and four compartments, respectively. The two-compartment example is used to illustrate the indistinguishability problem; the original (unidentifiable) model is found to have two structurally identifiable models that are indistinguishable from it. The three-compartment example has three structurally identifiable indistinguishable models. DISTING finds that the four-compartment example has five structurally identifiable models indistinguishable from the original postulated model. This example shows that care is needed when dealing with models that have two or more compartments which are neither perturbed nor observed, because the numbering of these compartments may be arbitrary. CONCLUSIONS: DISTING is universally and freely available via the Internet. It is easy to use and circumvents tedious and complicated algebraic analysis previously done by hand.
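For readers unfamiliar with the terms, the two properties named in the abstract can be stated in the usual transfer-function form for linear compartmental models. This is a sketch of the standard textbook formulation, not DISTING's internal representation; the symbols A, B, C, p, and the tilde notation are assumptions introduced here.

```latex
% Postulated model with parameter vector p, input u, output y:
\dot{x}(t) = A(p)\,x(t) + B\,u(t), \qquad y(t) = C\,x(t),
\qquad
G(s; p) = C\,\bigl(sI - A(p)\bigr)^{-1} B .

% Structural (global) identifiability: for generic p,
G(s; p) = G(s; p') \ \text{for all } s \;\Longrightarrow\; p = p' .

% Indistinguishability of a candidate model (\tilde{A}, \tilde{B}, \tilde{C}):
% for generic p there exists \tilde{p} such that
\tilde{C}\,\bigl(sI - \tilde{A}(\tilde{p})\bigr)^{-1} \tilde{B} = G(s; p)
\quad \text{for all } s .
```

In these terms, the task described above is to enumerate candidate structures that satisfy the third condition and, among them, keep those that also satisfy the second.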
Subjects
Algorithms , Computer Graphics , Internet , Software , Computer Simulation , Humans , Linear Models , Liver/drug effects , Pharmaceutical Preparations , Sulfobromophthalein/chemistry , Systems Biology
ABSTRACT
PURPOSE: This study aimed to develop and examine the acceptability, feasibility, and usability of a text messaging, or Short Message Service (SMS), system for improving the receipt of survivorship care for adolescent and young adult (AYA) survivors of childhood cancer. METHODS: Researchers developed and refined the text messaging system based on qualitative data from AYA survivors in an iterative three-stage process. In stage 1, a focus group (n = 4) addressed acceptability; in stage 2, key informant interviews (n = 10) following a 6-week trial addressed feasibility; and in stage 3, key informant interviews (n = 23) following a 6-week trial addressed usability. Qualitative data were analyzed using a constant comparative analytic approach exploring in-depth themes. RESULTS: The final system includes programmed reminders to schedule and attend late effect screening appointments, tailored suggestions for community resources for cancer survivors, and messages prompting participant feedback regarding the appointments and resources. Participants found the text messaging system an acceptable form of communication, the screening reminders and feedback prompts feasible for improving the receipt of survivorship care, and the tailored suggestions for community resources usable for connecting survivors to relevant services. Participants suggested supplementing survivorship care visits and forming AYA survivor social networks as future implementations for the text messaging system. CONCLUSIONS: The text messaging system may assist AYA survivors by coordinating late effect screening appointments, facilitating a partnership with the survivorship care team, and connecting survivors with relevant community resources. IMPLICATIONS FOR CANCER SURVIVORS: The text messaging system has the potential to improve the receipt of survivorship care.
Subjects
Neoplasms/therapy , Survivors/psychology , Text Messaging/statistics & numerical data , Adolescent , Adult , Child , Humans , Male , Neoplasms/mortality , Survival Rate , Young Adult
ABSTRACT
Active and passive mobile sensing has garnered much attention in recent years. In this paper, we focus on chronic pain measurement and management as a case application to exemplify the state of the art. We present a consolidated discussion of how various sensing modalities can be leveraged, along with the modular server-side and on-device architectures required for this task. The modalities covered are activity monitoring from accelerometry and location sensing, audio analysis of speech, image processing for facial expressions, and modern methods for effective patient self-reporting. We review examples that deliver actionable information to clinicians and patients while addressing privacy, usability, and computational constraints. We also discuss open challenges in higher-level inference of patient state and in effective feedback, with potential directions to address them. The methods and challenges presented here are also generalizable and relevant to a broad range of other applications in mobile sensing.