Search | VHL Regional Portal

Knowledge-Based Biomedical Data Science.

Callahan, Tiffany J; Tripodi, Ignacio J; Pielke-Lombardo, Harrison; Hunter, Lawrence E.

Annu Rev Biomed Data Sci ; 3: 23-41, 2020 Jul.

Article in English | MEDLINE | ID: mdl-33954284

ABSTRACT

Knowledge-based biomedical data science involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey recent progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as progress on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing to construct knowledge graphs, and the expansion of novel knowledge-based approaches to clinical and biological domains.

Pre-analytic Considerations for Mass Spectrometry-Based Untargeted Metabolomics Data.

Reinhold, Dominik; Pielke-Lombardo, Harrison; Jacobson, Sean; Ghosh, Debashis; Kechris, Katerina.

Methods Mol Biol ; 1978: 323-340, 2019.

Article in English | MEDLINE | ID: mdl-31119672

ABSTRACT

Metabolomics is the science of characterizing and quantifying small molecule metabolites in biological systems. These metabolites give organisms their biochemical characteristics, providing a link between genotype, environment, and phenotype. With these opportunities also come data challenges, such as compound annotation, missing values, and batch effects. We present the steps of a general pipeline to process untargeted mass spectrometry data to alleviate the latter two challenges. We assume a matrix of metabolite abundances, with metabolites in rows and samples in columns. The steps in the pipeline include summarizing technical replicates (if available), filtering, imputing, transforming, and normalizing the data. In each of these steps, a method and parameters should be chosen based on the assumptions one is willing to make, the question of interest, and diagnostic tools. Besides giving a general pipeline that can be adapted by the reader, our goal is to review diagnostic tools and criteria that are helpful when making decisions in each step of the pipeline and when assessing the effectiveness of normalization and batch correction. We conclude with a list of useful packages and a discussion of some alternative approaches that might be more appropriate for the reader's data.
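The pipeline steps named in the abstract (summarizing technical replicates, filtering, imputing, transforming, normalizing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's method: the function names, the 50% missingness threshold, half-minimum imputation, the log2 transform, and per-sample median centering are all choices made here for the example. The abstract itself stresses that each method and parameter should be chosen from the data and the question of interest. Missing values are represented as `None`, and observed abundances are assumed to be strictly positive.

```python
import math
from statistics import mean, median

# All names and parameter choices below are hypothetical illustrations.
# Input: a list of rows (metabolites), each a list of per-sample abundances,
# with None marking a missing value.

def summarize_replicates(matrix, groups):
    """Average technical replicates; groups is a list of column-index lists."""
    out = []
    for row in matrix:
        new_row = []
        for cols in groups:
            observed = [row[j] for j in cols if row[j] is not None]
            new_row.append(mean(observed) if observed else None)
        out.append(new_row)
    return out

def filter_metabolites(matrix, max_missing_fraction=0.5):
    """Keep metabolites with at least one observed value and few enough gaps."""
    kept = []
    for row in matrix:
        missing = sum(1 for v in row if v is None)
        if missing < len(row) and missing / len(row) <= max_missing_fraction:
            kept.append(list(row))
    return kept

def impute_half_minimum(matrix):
    """Fill each gap with half the smallest observed value in that row."""
    out = []
    for row in matrix:
        fill = min(v for v in row if v is not None) / 2.0
        out.append([fill if v is None else v for v in row])
    return out

def log2_transform(matrix):
    """Log-transform abundances (assumes strictly positive values)."""
    return [[math.log2(v) for v in row] for row in matrix]

def median_normalize(matrix):
    """Center each sample (column) so its median log-abundance is zero."""
    n_samples = len(matrix[0])
    col_medians = [median(row[j] for row in matrix) for j in range(n_samples)]
    return [[row[j] - col_medians[j] for j in range(n_samples)] for row in matrix]

def preprocess(matrix, max_missing_fraction=0.5):
    """Filter, impute, transform, and normalize, in that order."""
    m = filter_metabolites(matrix, max_missing_fraction)
    m = impute_half_minimum(m)
    m = log2_transform(m)
    return median_normalize(m)
```

Replicate summarization is kept separate from `preprocess` because it applies only when technical replicates are available. Batch correction and the diagnostic checks the abstract mentions are outside the scope of this sketch.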

Subject(s)

Databases, Factual; Mass Spectrometry/methods; Metabolomics/methods; Genotype; Humans; Phenotype

GSEA-InContext: identifying novel and common patterns in expression experiments.

Powers, Rani K; Goodspeed, Andrew; Pielke-Lombardo, Harrison; Tan, Aik-Choon; Costello, James C.

Bioinformatics ; 34(13): i555-i564, 2018 Jul 01.

Article in English | MEDLINE | ID: mdl-29950010

ABSTRACT

Motivation: Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate pathway-level changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment. Results: We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis. Availability and implementation: GSEA-InContext, implemented in Python, supplementary results, and the background expression compendium are available at: https://github.com/CostelloLab/GSEA-InContext.
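The competitive null described under Motivation, in which an observed enrichment score is compared with scores from gene sets drawn at random from the same experiment, can be illustrated with a short sketch. This is not the GSEA or GSEA-InContext implementation. It assumes an unweighted running-sum enrichment score and a two-sided comparison of absolute scores, and the function names and parameters are hypothetical. It uses only the Python standard library.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up at hits, down at misses,
    and return the signed value with the largest absolute deviation."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_genes, gene_set, n_permutations=1000, seed=0):
    """Compare the observed score with scores of random gene sets of the
    same size drawn from the input experiment (gene set permutation)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(1 for gene in ranked_genes if gene in gene_set)
    extreme = 0
    for _ in range(n_permutations):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)
```

Here `ranked_genes` is a list of gene identifiers ordered by differential expression. The random draws in `permutation_p_value` treat every gene as equally likely to land anywhere in the ranking, which is the assumption the abstract questions. An in-context variant would instead derive the null from a background set of experiments; the authors' actual procedure is in the linked repository and is not reproduced here.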

Subject(s)

Algorithms; Gene Expression Profiling/methods; Metabolic Networks and Pathways; Humans
