Search | Nursing VHL Search Portal

EpiGraphDB: a database and data mining platform for health data science.

Liu, Yi; Elsworth, Benjamin; Erola, Pau; Haberland, Valeriia; Hemani, Gibran; Lyon, Matt; Zheng, Jie; Lloyd, Oliver; Vabistsevits, Marina; Gaunt, Tom R.

Bioinformatics ; 37(9): 1304-1311, 2021 06 09.

Article in English | MEDLINE | ID: mdl-33165574

ABSTRACT

MOTIVATION: The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research. RESULTS: We developed EpiGraphDB (https://epigraphdb.org/), a graph database containing an array of different biomedical and epidemiological relationships and an analytical platform to support their use in human population health data science. In addition, we present three case studies that illustrate the value of this platform. The first uses EpiGraphDB to evaluate potential pleiotropic relationships, addressing mis-inference in systematic causal analysis. In the second case study, we illustrate how protein-protein interaction data offer opportunities to identify new drug targets. The final case study integrates causal inference using Mendelian randomization with relationships mined from the biomedical literature to 'triangulate' evidence from different sources. AVAILABILITY AND IMPLEMENTATION: The EpiGraphDB platform is openly available at https://epigraphdb.org. Code for replicating case study results is available at https://github.com/MRCIEU/epigraphdb as Jupyter notebooks using the API, and https://mrcieu.github.io/epigraphdb-r using the R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Science , Software , Data Mining , Databases, Factual , Humans , Phenotype

Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application.

Lightbody, Gaye; Haberland, Valeriia; Browne, Fiona; Taggart, Laura; Zheng, Huiru; Parkes, Eileen; Blayney, Jaine K.

Brief Bioinform ; 20(5): 1795-1811, 2019 09 27.

Article in English | MEDLINE | ID: mdl-30084865

ABSTRACT

There has been an exponential growth in the performance and output of sequencing technologies (omics data) with full genome sequencing now producing gigabases of reads on a daily basis. These data may hold the promise of personalized medicine, leading to routinely available sequencing tests that can guide patient treatment decisions. In the era of high-throughput sequencing (HTS), computational considerations, data governance and clinical translation are the greatest rate-limiting steps. To ensure that the analysis, management and interpretation of such extensive omics data is exploited to its full potential, key factors, including sample sourcing, technology selection and computational expertise and resources, need to be considered, leading to an integrated set of high-performance tools and systems. This article provides an up-to-date overview of the evolution of HTS and the accompanying tools, infrastructure and data management approaches that are emerging in this space, which, if used within a multidisciplinary context, may ultimately facilitate the development of personalized medicine.

Subject(s)

Biomedical Research , High-Throughput Nucleotide Sequencing/methods , Precision Medicine , Cloud Computing , Computational Biology , Computer Security , Ethics

Erratum to: EpiGraphDB: a database and data mining platform for health data science.

Liu, Yi; Elsworth, Benjamin; Erola, Pau; Haberland, Valeriia; Hemani, Gibran; Lyon, Matt; Zheng, Jie; Lloyd, Oliver; Vabistsevits, Marina; Gaunt, Tom R.

Bioinformatics ; 37(2): 288, 2021 04 19.

Article in English | MEDLINE | ID: mdl-33693535

Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases.

Zheng, Jie; Haberland, Valeriia; Baird, Denis; Walker, Venexia; Haycock, Philip C; Hurle, Mark R; Gutteridge, Alex; Erola, Pau; Liu, Yi; Luo, Shan; Robinson, Jamie; Richardson, Tom G; Staley, James R; Elsworth, Benjamin; Burgess, Stephen; Sun, Benjamin B; Danesh, John; Runz, Heiko; Maranville, Joseph C; Martin, Hannah M; Yarmolinsky, James; Laurin, Charles; Holmes, Michael V; Liu, Jimmy Z; Estrada, Karol; Santos, Rita; McCarthy, Linda; Waterworth, Dawn; Nelson, Matthew R; Smith, George Davey; Butterworth, Adam S; Hemani, Gibran; Scott, Robert A; Gaunt, Tom R.

Nat Genet ; 52(10): 1122-1131, 2020 10.

Article in English | MEDLINE | ID: mdl-32895551

ABSTRACT

The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes (https://www.epigraphdb.org/pqtl/). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.

Subject(s)

Blood Proteins/genetics , Genetic Predisposition to Disease , Mendelian Randomization Analysis , Proteome/genetics , Genome-Wide Association Study , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics

Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data.

Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Southam, Lorraine; Esparza-Gordillo, Jorge; Haberland, Valeriia; Zheng, Jie; Johnson, Toby; Koprulu, Mine; Zengini, Eleni; Steinberg, Julia; Wilkinson, Jeremy M; Bhatnagar, Sahir; Hoffman, Joshua D; Buchan, Natalie; Süveges, Dániel; Yerges-Armstrong, Laura; Smith, George Davey; Gaunt, Tom R; Scott, Robert A; McCarthy, Linda C; Zeggini, Eleftheria.

Nat Genet ; 51(2): 230-236, 2019 02.

Article in English | MEDLINE | ID: mdl-30664745

ABSTRACT

Osteoarthritis is the most common musculoskeletal disease and the leading cause of disability globally. Here, we performed a genome-wide association study for osteoarthritis (77,052 cases and 378,169 controls), analyzing four phenotypes: knee osteoarthritis, hip osteoarthritis, knee and/or hip osteoarthritis, and any osteoarthritis. We discovered 64 signals, 52 of them novel, more than doubling the number of established disease loci. Six signals fine-mapped to a single variant. We identified putative effector genes by integrating expression quantitative trait loci (eQTL) colocalization, fine-mapping, and human rare-disease, animal-model, and osteoarthritis tissue expression data. We found enrichment for genes underlying monogenic forms of bone development diseases, and for the collagen formation and extracellular matrix organization biological pathways. Ten of the likely effector genes, including TGFB1 (transforming growth factor beta 1), FGF18 (fibroblast growth factor 18), CTSK (cathepsin K), and IL11 (interleukin 11), have therapeutics approved or in clinical trials, with mechanisms of action supportive of evaluation for efficacy in osteoarthritis.

Subject(s)

Genetic Predisposition to Disease/genetics , Osteoarthritis, Hip/genetics , Adult , Aged , Biological Specimen Banks , Case-Control Studies , Female , Genome-Wide Association Study/methods , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , United Kingdom

The MR-Base platform supports systematic causal inference across the human phenome.

Hemani, Gibran; Zheng, Jie; Elsworth, Benjamin; Wade, Kaitlin H; Haberland, Valeriia; Baird, Denis; Laurin, Charles; Burgess, Stephen; Bowden, Jack; Langdon, Ryan; Tan, Vanessa Y; Yarmolinsky, James; Shihab, Hashem A; Timpson, Nicholas J; Evans, David M; Relton, Caroline; Martin, Richard M; Davey Smith, George; Gaunt, Tom R; Haycock, Philip C.

Elife ; 7, 2018 05 30.

Article in English | MEDLINE | ID: mdl-29846171

ABSTRACT

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.

Subject(s)

Mendelian Randomization Analysis , Cholesterol, LDL/metabolism , Coronary Disease/etiology , Databases, Genetic , Genetic Pleiotropy , Genome-Wide Association Study , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics
