Results 1 - 20 of 394
1.
Cell ; 148(6): 1293-307, 2012 Mar 16.
Article in English | MEDLINE | ID: mdl-22424236

ABSTRACT

Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.


Subject(s)
Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification
2.
Am J Hum Genet ; 110(9): 1522-1533, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37607538

ABSTRACT

Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.


Subject(s)
Biological Specimen Banks , Data Science , Humans , Phenomics , Phenotype , Genotype
3.
Proc Natl Acad Sci U S A ; 119(46): e2210247119, 2022 Nov 16.
Article in English | MEDLINE | ID: mdl-36343260

ABSTRACT

Genetic variants in SLC22A5, encoding the membrane carnitine transporter OCTN2, cause the rare metabolic disorder Carnitine Transporter Deficiency (CTD). CTD is potentially lethal but actionable if detected early, with confirmatory diagnosis involving sequencing of SLC22A5. Interpretation of missense variants of uncertain significance (VUSs) is a major challenge. In this study, we sought to characterize the largest set to date (n = 150) of OCTN2 variants identified in diverse ancestral populations, with the goals of furthering our understanding of the mechanisms leading to OCTN2 loss-of-function (LOF) and creating a protein-specific variant effect prediction model for OCTN2 function. Uptake assays with 14C-carnitine revealed that 105 variants (70%) significantly reduced transport of carnitine compared to wild-type OCTN2, and 37 variants (25%) severely reduced function to less than 20%. All ancestral populations harbored LOF variants; 62% of green fluorescent protein (GFP)-tagged variants impaired OCTN2 localization to the plasma membrane of human embryonic kidney (HEK293T) cells, and subcellular localization significantly associated with function, revealing a major LOF mechanism of interest for CTD. With these data, we trained a model to classify variants as functional (>20% function) or LOF (<20% function). Our model outperformed existing state-of-the-art methods as evaluated by multiple performance metrics, with mean area under the receiver operating characteristic curve (AUROC) of 0.895 ± 0.025. In summary, in this study we generated a rich dataset of OCTN2 variant function and localization, revealed important disease-causing mechanisms, and improved upon machine learning-based prediction of OCTN2 variant function to aid in variant interpretation in the diagnosis and treatment of CTD.
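The proportions and the 20% cutoff reported in this abstract can be checked with a few lines of Python. This is only an illustrative sketch: the names `percent` and `label_variant` are hypothetical, and the treatment of a variant at exactly 20% is an assumption, since the abstract defines only the >20% and <20% classes.

```python
# Counts reported in the abstract.
TOTAL_VARIANTS = 150
REDUCED_TRANSPORT = 105   # variants with significantly reduced carnitine transport
SEVERELY_REDUCED = 37     # variants with function below 20% of wild-type


def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


def label_variant(function_pct: float, cutoff: float = 20.0) -> str:
    """Label a variant by its measured function (% of wild-type OCTN2).

    Assumption: a value exactly at the cutoff is labeled LOF; the abstract
    does not say how that boundary case is handled.
    """
    return "functional" if function_pct > cutoff else "LOF"


print(percent(REDUCED_TRANSPORT, TOTAL_VARIANTS))
print(percent(SEVERELY_REDUCED, TOTAL_VARIANTS))
print(label_variant(55.0), label_variant(5.0))
```

The rounded values agree with the 70% and 25% quoted in the abstract.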


Subject(s)
Carnitine , Organic Cation Transport Proteins , Humans , Solute Carrier Family 22 Member 5/genetics , Solute Carrier Family 22 Member 5/metabolism , Organic Cation Transport Proteins/genetics , Organic Cation Transport Proteins/metabolism , HEK293 Cells , Carnitine/genetics , Carnitine/metabolism , Genomics
4.
Am J Hum Genet ; 108(4): 535-548, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33798442

ABSTRACT

Genome sequencing is enabling precision medicine: tailoring treatment to the unique constellation of variants in an individual's genome. The impact of recurrent pathogenic variants is often understood; however, there is a long tail of rare genetic variants that are uncharacterized. The problem of uncharacterized rare variation is especially acute when it occurs in genes of known clinical importance with functionally consequential variants and associated mechanisms. Variants of uncertain significance (VUSs) in these genes are discovered at a rate that outpaces current ability to classify them with databases of previous cases, experimental evaluation, and computational predictors. Clinicians are thus left without guidance about the significance of variants that may have actionable consequences. Computational prediction of the impact of rare genetic variation is increasingly becoming an important capability. In this paper, we review the technical and ethical challenges of interpreting the function of rare variants in two settings: inborn errors of metabolism in newborns and pharmacogenomics. We propose a framework for a genomic learning healthcare system with an initial focus on early-onset treatable disease in newborns and actionable pharmacogenomics. We argue that (1) a genomic learning healthcare system must allow for continuous collection and assessment of rare variants, (2) emerging machine learning methods will enable algorithms to predict the clinical impact of rare variants on protein function, and (3) ethical considerations must inform the construction and deployment of all rare-variation triage strategies, particularly with respect to health disparities arising from unbalanced ancestry representation.


Subject(s)
Genetic Variation/genetics , Genetics, Medical , Genomics , Machine Learning , Metabolism, Inborn Errors/genetics , Pharmacogenetics , Precision Medicine , Genome, Human/genetics , Humans , Infant, Newborn
5.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35817308

ABSTRACT

The cost of drug development continues to rise and may be prohibitive in cases of unmet clinical need, particularly for rare diseases. Artificial intelligence-based methods are promising in their potential to discover new treatment options. The task of drug repurposing hypothesis generation is well-posed as a link prediction problem in a knowledge graph (KG) of interacting drugs, proteins, genes and disease phenotypes. KGs derived from biomedical literature are semantically rich and up-to-date representations of scientific knowledge. Inference methods on scientific KGs can be confounded by unspecified contexts and contradictions. Extracting context enables incorporation of relevant pharmacokinetic and pharmacodynamic detail, such as tissue specificity of interactions. Contradictions in biomedical KGs may arise when contexts are omitted or due to contradicting research claims. In this review, we describe challenges to creating literature-scale representations of pharmacological knowledge and survey current approaches toward incorporating context and resolving contradictions.


Subject(s)
Artificial Intelligence , Drug Repositioning , Knowledge , Proteins , Publications
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849568

ABSTRACT

Network biology is useful for modeling complex biological phenomena; it has attracted attention with the advent of novel graph-based machine learning methods. However, biological applications of network methods often suffer from inadequate follow-up. In this perspective, we discuss obstacles for contemporary network approaches (particularly focusing on challenges representing biological concepts, applying machine learning methods, and interpreting and validating computational findings about biology) in an effort to catalyze actionable biological discovery.


Subject(s)
Machine Learning
7.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36394254

ABSTRACT

MOTIVATION: Gene set analysis methods rely on knowledge-based representations of genetic interactions in the form of both gene set collections and protein-protein interaction (PPI) networks. However, explicit representations of genetic interactions often fail to capture complex interdependencies among genes, limiting the analytic power of such methods. RESULTS: We propose an extension of gene set enrichment analysis to a latent embedding space reflecting PPI network topology, called gene set proximity analysis (GSPA). Compared with existing methods, GSPA provides improved ability to identify disease-associated pathways in disease-matched gene expression datasets, while improving reproducibility of enrichment statistics for similar gene sets. GSPA is statistically straightforward, reducing to a version of traditional gene set enrichment analysis through a single user-defined parameter. We apply our method to identify novel drug associations with SARS-CoV-2 viral entry. Finally, we validate our drug association predictions through retrospective clinical analysis of claims data from 8 million patients, supporting a role for gabapentin as a risk factor and metformin as a protective factor for severe COVID-19. AVAILABILITY AND IMPLEMENTATION: GSPA is available for download as a command-line Python package at https://github.com/henrycousins/gspa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , Humans , Drug Repositioning , Reproducibility of Results , Retrospective Studies , SARS-CoV-2
8.
Proc Natl Acad Sci U S A ; 118(23)2021 06 08.
Article in English | MEDLINE | ID: mdl-34016708

ABSTRACT

The SARS-CoV-2 pandemic has caused a surge in research exploring all aspects of the virus and its effects on human health. The overwhelming publication rate means that researchers are unable to keep abreast of the literature. To ameliorate this, we present the CoronaCentral resource that uses machine learning to process the research literature on SARS-CoV-2 together with SARS-CoV and MERS-CoV. We categorize the literature into useful topics and article types and enable analysis of the contents, pace, and emphasis of research during the crisis with integration of Altmetric data. These topics include therapeutics, disease forecasting, as well as growing areas such as "long COVID" and studies of inequality. This resource, available at https://coronacentral.ai, is updated daily.


Subject(s)
COVID-19 , Machine Learning , Middle East Respiratory Syndrome Coronavirus/metabolism , Pandemics , SARS-CoV-2/metabolism , Severe Acute Respiratory Syndrome , Animals , COVID-19/epidemiology , COVID-19/metabolism , COVID-19/therapy , COVID-19/transmission , Humans , Middle East Respiratory Syndrome Coronavirus/pathogenicity , SARS-CoV-2/pathogenicity , Severe Acute Respiratory Syndrome/epidemiology , Severe Acute Respiratory Syndrome/metabolism , Severe Acute Respiratory Syndrome/therapy , Severe Acute Respiratory Syndrome/transmission
9.
Nat Methods ; 17(12): 1200-1206, 2020 12.
Article in English | MEDLINE | ID: mdl-33077966

ABSTRACT

Although tremendous effort has been put into cell-type annotation, identification of previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating known as well as new cell types. MARS overcomes the heterogeneity of cell types by transferring latent cell representations across multiple datasets. MARS uses deep learning to learn a cell embedding function as well as a set of landmarks in the cell embedding space. The method has a unique ability to discover cell types that have never been seen before and annotate experiments that are as yet unannotated. We apply MARS to a large mouse cell atlas and show its ability to accurately identify cell types, even when it has never seen them before. Further, MARS automatically generates interpretable names for new cell types by probabilistically defining a cell type in the embedding space.


Subject(s)
Cells/classification , Single-Cell Analysis/methods , Transcriptome/genetics , Algorithms , Animals , Databases, Factual , Gene Expression Profiling , Mice , RNA/genetics , Sequence Analysis, RNA , Software
10.
PLoS Comput Biol ; 18(4): e1009497, 2022 04.
Article in English | MEDLINE | ID: mdl-35404985

ABSTRACT

The pathogenesis of many inflammatory diseases is a coordinated process involving metabolic dysfunctions and immune response, usually modulated by the production of cytokines and associated inflammatory molecules. In this work, we seek to understand how genes involved in pathogenesis, which are often not associated with the immune system in an obvious way, communicate with the immune system. We have embedded a network of human protein-protein interactions (PPI) from the STRING database with 14,707 human genes using feature learning that captures high confidence edges. We have found that our predicted Association Scores derived from the features extracted from STRING's high confidence edges are useful for predicting novel connections between genes, thus enabling the construction of a full map of predicted associations for all possible pairs between 14,707 human genes. In particular, we analyzed the pattern of associations for 126 cytokines and found that the six patterns of cytokine interaction with human genes are consistent with their functional classifications. To define the disease-specific roles of cytokines we have collected gene sets for 11,944 diseases from DisGeNET. We used these gene sets to predict disease-specific gene associations with cytokines by calculating the normalized average Association Scores between disease-associated gene sets and the 126 cytokines; this creates a unique profile of inflammatory genes (both known and predicted) for each disease. We validated our predicted cytokine associations by comparing them to known associations for 171 diseases. The predicted cytokine profiles correlate (p-value < 0.0003) with the known ones in 95 diseases. We further characterized the profiles of each disease by calculating an "Inflammation Score" that summarizes different modes of immune responses.
Finally, by analyzing subnetworks formed between disease-specific pathogenesis genes, hormones, receptors, and cytokines, we identified the key genes responsible for interactions between pathogenesis and inflammatory responses. These genes and the corresponding cytokines used by different immune disorders suggest unique targets for drug discovery.


Subject(s)
Cytokines , Inflammation , Cytokines/metabolism , Humans , Immunity , Inflammation/genetics
11.
J Biomed Inform ; 145: 104474, 2023 09.
Article in English | MEDLINE | ID: mdl-37572825

ABSTRACT

Inferring knowledge from known relationships between drugs, proteins, genes, and diseases has great potential for clinical impact, such as predicting which existing drugs could be repurposed to treat rare diseases. Incorporating key biological context such as cell type or tissue of action into representations of extracted biomedical knowledge is essential for principled pharmacological discovery. Existing global, literature-derived knowledge graphs of interactions between drugs, proteins, genes, and diseases lack this essential information. In this study, we frame the task of associating biological context with protein-protein interactions extracted from text as a classification task using syntactic, semantic, and novel meta-discourse features. We introduce the Insider corpora, which are automatically generated PubMed-scale corpora for training classifiers for the context association task. These corpora are created by searching for precise syntactic cues of cell type and tissue relevancy to extracted regulatory relations. We report F1 scores of 0.955 and 0.862 for identifying relevant cell types and tissues, respectively, for our identified relations. By classifying with this framework, we demonstrate that the problem of context association can be addressed using intuitive, interpretable features. We demonstrate the potential of this approach to enrich text-derived knowledge bases with biological detail by incorporating cell type context into a protein-protein network for dengue fever.


Subject(s)
Data Mining , Knowledge Bases , Humans , PubMed , Rare Diseases
12.
PLoS Comput Biol ; 17(2): e1008631, 2021 02.
Article in English | MEDLINE | ID: mdl-33544718

ABSTRACT

For many prevalent complex diseases, treatment regimens are frequently ineffective. For example, despite multiple available immunomodulators and immunosuppressants, inflammatory bowel disease (IBD) remains difficult to treat. Heterogeneity in the disease across patients makes it challenging to select the optimal treatment regimens, and some patients do not respond to any of the existing treatment choices. Drug repurposing strategies for IBD have had limited clinical success and have not typically offered individualized patient-level treatment recommendations. In this work, we present NetPTP, a Network-based Personalized Treatment Prediction framework which models measured drug effects from gene expression data and applies them to patient samples to generate personalized ranked treatment lists. To accomplish this, we combine publicly available network, drug target, and drug effect data to generate treatment rankings using patient data. These ranked lists can then be used to prioritize existing treatments and discover new therapies for individual patients. We demonstrate how NetPTP captures and models drug effects, and we apply our framework to individual IBD samples to provide novel insights into IBD treatment.


Subject(s)
Drug Repositioning/methods , Immunosuppressive Agents/therapeutic use , Inflammatory Bowel Diseases/drug therapy , Precision Medicine/methods , Algorithms , Animals , Databases, Factual , Drug Design , Gene Expression Profiling , Humans , Mice , Phylogeny
13.
BMC Bioinformatics ; 22(1): 168, 2021 Mar 30.
Article in English | MEDLINE | ID: mdl-33784977

ABSTRACT

BACKGROUND: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio. RESULTS: Overall, we find slight female bias (52.1%) in human samples and male bias (62.5%) in mouse samples; this corresponds to a majority of mixed sex studies in humans and single sex studies in mice, split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and assess the frequency of metadata sex label misannotations (2-5%). CONCLUSIONS: Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.


Subject(s)
Databases, Factual , Gene Expression , Neoplasms , Sex Factors , Animals , Bias , Female , Male , Metadata , Mice , Neoplasms/genetics
14.
N Engl J Med ; 389(15): 1431-1434, 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37732608
15.
PLoS Comput Biol ; 16(11): e1008399, 2020 11.
Article in English | MEDLINE | ID: mdl-33137098

ABSTRACT

Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene whose protein product metabolizes more than 20% of clinically used drugs. Genetic variations in CYP2D6 are responsible for interindividual heterogeneity in drug response that can lead to drug toxicity and ineffective treatment, making CYP2D6 one of the most important pharmacogenes. Prediction of CYP2D6 phenotype relies on curation of literature-derived functional studies to assign a functional status to CYP2D6 haplotypes. As the number of large-scale sequencing efforts grows, new haplotypes continue to be discovered, and assignment of function is challenging to maintain. To address this challenge, we have trained a convolutional neural network to predict functional status of CYP2D6 haplotypes, called Hubble.2D6. Hubble.2D6 predicts haplotype function from sequence data and was trained using two pre-training steps with a combination of real and simulated data. We find that Hubble.2D6 predicts CYP2D6 haplotype functional status with 88% accuracy in a held-out test set and explains 47.5% of the variance in in vitro functional data among star alleles with unknown function. Hubble.2D6 may be a useful tool for assigning function to haplotypes with uncurated function and for screening individuals who are at risk of being poor metabolizers.


Subject(s)
Cytochrome P-450 CYP2D6/genetics , Cytochrome P-450 CYP2D6/metabolism , Deep Learning , Alleles , Base Sequence , Computational Biology , Computer Simulation , DNA/genetics , Haplotypes , Humans , Microsomes, Liver/enzymology , Neural Networks, Computer , Pharmaceutical Preparations/metabolism , Pharmacogenomic Testing , Phenotype , Polymorphism, Genetic , Supervised Machine Learning
16.
BMC Gastroenterol ; 21(1): 160, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33836648

ABSTRACT

BACKGROUND: Defining clinical phenotypes provides opportunities for new diagnostics and may provide insights into early intervention and disease prevention. There is increasing evidence that patient-derived health data may contain information that complements traditional methods of clinical phenotyping. The utility of these data for defining meaningful phenotypic groups is of great interest because social media and online resources make it possible to query large cohorts of patients with health conditions. METHODS: We evaluated the degree to which patient-reported categorical data is useful for discovering subclinical phenotypes and evaluated its utility for discovering new measures of disease severity, treatment response and genetic architecture. Specifically, we examined the responses of 1961 patients with inflammatory bowel disease to questionnaires in search of sub-phenotypes. We applied machine learning methods to identify novel subtypes of Crohn's disease and studied their associations with drug responses. RESULTS: Using the patients' self-reported information, we identified two subpopulations of Crohn's disease; these subpopulations differ in disease severity, associations with smoking, and genetic transmission patterns. We also identified distinct features of drug response for the two Crohn's disease subtypes. These subtypes show a trend towards differential genotype signatures. CONCLUSION: Our findings suggest that patient-defined data can have unplanned utility for defining disease subtypes and may be useful for guiding treatment approaches.


Subject(s)
Crohn Disease , Inflammatory Bowel Diseases , Crohn Disease/diagnosis , Crohn Disease/drug therapy , Crohn Disease/genetics , Genotype , Humans , Phenotype , Surveys and Questionnaires
17.
J Biomed Inform ; 117: 103732, 2021 05.
Article in English | MEDLINE | ID: mdl-33737208

ABSTRACT

BACKGROUND: Understanding the relationships between genes, drugs, and disease states is at the core of pharmacogenomics. Two leading approaches for identifying these relationships in medical literature are manual curation led by human experts and automated approaches based on modern data mining. The former generates small amounts of high-quality data, and the latter offers large volumes of mixed-quality data. The algorithmically extracted relationships are often accompanied by supporting evidence, such as confidence scores, source articles, and surrounding contexts (excerpts) from the articles, that can be used as data quality indicators. Tools that can leverage these quality indicators to help the user gain access to larger and high-quality data are needed. APPROACH: We introduce GeneDive, a web application for pharmacogenomics researchers and precision medicine practitioners that makes gene, disease, and drug interactions data easily accessible and usable. GeneDive is designed to meet three key objectives: (1) provide functionality to manage the information-overload problem and facilitate easy assimilation of supporting evidence, (2) support longitudinal and exploratory research investigations, and (3) offer integration of user-provided interactions data without requiring data sharing. RESULTS: GeneDive offers multiple search modalities, visualizations, and other features that guide the user efficiently to the information of interest. To facilitate exploratory research, GeneDive makes the supporting evidence and context for each interaction readily available and allows the data quality threshold to be controlled by the user as per their risk tolerance level. The interactive search-visualization loop enables relationship discoveries between diseases, genes, and drugs that might not be explicitly described in literature but are emergent from the source medical corpus and deductive reasoning.
The ability to utilize user's data either in combination with the GeneDive native datasets or in isolation promotes richer data-driven exploration and discovery. These functionalities along with GeneDive's applicability for precision medicine, bringing the knowledge contained in biomedical literature to bear on particular clinical situations and improving patient care, are illustrated through detailed use cases. CONCLUSION: GeneDive is a comprehensive, broad-use biological interactions browser. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net. GeneDive Docker image is also available for download at this URL, allowing users to (1) import their own interaction data securely and privately; and (2) generate and test hypotheses across their own and other datasets.


Subject(s)
Pharmaceutical Preparations , Precision Medicine , Data Mining , Humans , Pharmacogenetics , Software
18.
J Med Internet Res ; 23(10): e27714, 2021 10 21.
Article in English | MEDLINE | ID: mdl-34673524

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) affect the health of hundreds of thousands of individuals annually in the United States, with associated costs of hundreds of billions of dollars. The monitoring and analysis of the severity of ADRs is limited by the current qualitative and categorical systems of severity classification. Previous efforts have generated quantitative estimates for a subset of ADRs but were limited in scope because of the time and costs associated with the efforts. OBJECTIVE: The aim of this study is to increase the number of ADRs for which there are quantitative severity estimates while improving the quality of these severity estimates. METHODS: We present a semisupervised approach that estimates ADR severity by using social media word embeddings to construct a lexical network of ADRs and perform label propagation. We used this method to estimate the severity of 28,113 ADRs, representing 12,198 unique ADR concepts from the Medical Dictionary for Regulatory Activities. RESULTS: Our Severity of Adverse Events Derived from Reddit (SAEDR) scores have good correlations with real-world outcomes. The SAEDR scores had Spearman correlations of 0.595, 0.633, and -0.748 for death, serious outcome, and no outcome, respectively, with ADR case outcomes in the Food and Drug Administration Adverse Event Reporting System. We investigated different methods for defining initial seed term sets and evaluated their impact on the severity estimates. We analyzed severity distributions for ADRs based on their appearance in boxed warning drug label sections, as well as for ADRs with sex-specific associations. We found that ADRs discovered in the postmarketing period had significantly greater severity than those discovered during the clinical trial (P<.001). 
We created quantitative drug-risk profile (DRIP) scores for 968 drugs that had a Spearman correlation of 0.377 with drugs ranked by the Food and Drug Administration Adverse Event Reporting System cases resulting in death, where the given drug was the primary suspect. CONCLUSIONS: Our SAEDR and DRIP scores are well correlated with the real-world outcomes of the entities they represent and have demonstrated utility in pharmacovigilance research. We make the SAEDR scores for 12,198 ADRs and the DRIP scores for 968 drugs publicly available to enable more quantitative analysis of pharmacovigilance data.
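For readers unfamiliar with the Spearman correlations quoted in this abstract, the following is a minimal sketch of how a rank correlation is computed. It is not the authors' code: the helper names `ranks` and `spearman` are hypothetical, the input values are made-up toy numbers rather than study data, and the rank-difference formula used here assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical toy values, not data from the study.
severity_scores = [0.10, 0.35, 0.50, 0.80, 0.95]
death_fraction = [0.01, 0.02, 0.10, 0.05, 0.30]
print(spearman(severity_scores, death_fraction))
```

Because the statistic depends only on ranks, it measures whether higher severity scores tend to go with more frequent outcomes, without assuming a linear relationship.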


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Social Media , Adverse Drug Reaction Reporting Systems , Drug Labeling , Female , Humans , Male , Pharmacovigilance
19.
BMC Bioinformatics ; 21(1): 217, 2020 May 27.
Article in English | MEDLINE | ID: mdl-32460703

ABSTRACT

BACKGROUND: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types. RESULTS: We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature abstracts. For this work, we defined a chemical reaction relationship as the transformation of chemical A to chemical B. We built and evaluated our system on small annotated sets of chemical reaction relationships from two corpora: curated bacteria-related abstracts from the MetaCyc database (MetaCyc_Corpus) and a more general set of abstracts annotated with MeSH (Medical Subject Headings) term Bacteria (Bacteria_Corpus; a superset of MetaCyc_Corpus). For the MetaCyc_Corpus, we obtained 84% precision and 41% recall (55% F1 score). Extending to the more general Bacteria_Corpus decreased precision to 62% with only a four-point drop in recall to 37% (46% F1 score). Overall, the Bacteria_Corpus contained two orders of magnitude more candidate chemical reaction relationships (nine million candidates vs 68,0000 candidates) and had a larger class imbalance (2.5% positives vs 5% positives) as compared to the MetaCyc_Corpus. In total, we extracted 6871 chemical reaction relationships from nine million candidates in the Bacteria_Corpus. CONCLUSIONS: With this work, we built a database of chemical reaction relationships from almost 900,000 scientific abstracts without a large training set of labeled annotations. 
Further, we showed the generalizability of our initial application built on MetaCyc documents enriched with chemical reactions to a general set of articles related to bacteria.
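The F1 scores quoted in this abstract follow from the reported precision and recall, since F1 is their harmonic mean. A short Python check is sketched below; the function name `f1_score` is chosen here for illustration and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
metacyc_f1 = f1_score(0.84, 0.41)    # MetaCyc_Corpus
bacteria_f1 = f1_score(0.62, 0.37)   # Bacteria_Corpus

print(f"MetaCyc_Corpus F1: {metacyc_f1:.0%}")
print(f"Bacteria_Corpus F1: {bacteria_f1:.0%}")
```

Rounded to whole percentages, these reproduce the 55% and 46% figures given in the abstract.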


Subject(s)
Data Mining/methods , Bacteria/metabolism , Biochemical Phenomena , Databases, Factual , Humans , Machine Learning , Publications , Software
20.
Hum Mol Genet ; 27(R1): R72-R78, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29635477

ABSTRACT

The field of pharmacogenomics is an area of great potential for near-term human health impacts from the big genomic data revolution. Pharmacogenomics research momentum is building with numerous hypotheses currently being investigated through the integration of molecular profiles of different cell lines and large genomic data sets containing information on cellular and human responses to therapies. Additionally, the results of previous pharmacogenetic research efforts have been formulated into clinical guidelines that are beginning to impact how healthcare is conducted on the level of the individual patient. This trend will only continue with the recent release of new datasets containing linked genotype and electronic medical record data. This review discusses key resources available for pharmacogenomics and pharmacogenetics research and highlights recent work within the field.


Subject(s)
Big Data , Genomics/trends , Pharmacogenetics/trends , Genotype , Humans , Pharmacogenomic Testing/trends