1.
J Cheminform ; 14(1): 50, 2022 Jul 28.
Article in English | MEDLINE | ID: mdl-35902962

ABSTRACT

In virtual screening for drug discovery, hit enrichment curves are widely used to assess the performance of ranking algorithms with regard to their ability to identify early enrichment. Unfortunately, researchers almost never consider the uncertainty associated with estimating such curves before declaring differences between performance of competing algorithms. Uncertainty is often large because the testing fractions of interest to researchers are small. Appropriate inference is complicated by two sources of correlation that are often overlooked: correlation across different testing fractions within a single algorithm, and correlation between competing algorithms. Additionally, researchers are often interested in making comparisons along the entire curve, not only at a few testing fractions. We develop inferential procedures to address the needs of both those interested in a few testing fractions and those interested in the entire curve. For the former, four hypothesis testing and (pointwise) confidence interval approaches are investigated, and a newly developed EmProc approach is found to be most effective. For inference along entire curves, EmProc-based confidence bands are recommended for simultaneous coverage and minimal width. While we focus on the hit enrichment curve, this work is also appropriate for lift curves that are used throughout the machine learning community. Our inferential procedures trivially extend to enrichment factors, as well.
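As a rough illustration of the quantities this abstract discusses, the sketch below computes point estimates of a hit enrichment curve (the share of all actives recovered within the top-ranked testing fraction) and the corresponding enrichment factor. It is not the EmProc procedure and computes no confidence intervals or bands; the function names, the tie handling, and the rounding of the number of compounds tested are assumptions made for this example.

```python
from math import ceil


def hit_enrichment(scores, labels, fractions):
    """Fraction of all actives recovered within each top-ranked testing fraction.

    scores: predicted activity scores (higher means ranked earlier).
    labels: 1 for an active compound, 0 for an inactive one.
    fractions: testing fractions in (0, 1].
    """
    n = len(scores)
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    # Rank compounds from highest to lowest score; ties keep input order
    # (an assumption of this sketch, not a recommendation).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    curve = []
    for f in fractions:
        # Number of compounds tested at fraction f; rounding first guards
        # against floating-point noise such as 0.1 * 30 = 3.0000000000000004.
        k = ceil(round(f * n, 9))
        curve.append(sum(ranked[:k]) / total_actives)
    return curve


def enrichment_factor(recall, fraction):
    """Recall at a testing fraction divided by that fraction (1.0 = random ranking)."""
    return recall / fraction


# Hypothetical toy data: 10 compounds, 3 of them active.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
example_curve = hit_enrichment(example_scores, example_labels, [0.2, 0.5, 1.0])
```

With so few actives, the estimate at a small testing fraction rests on only one or two hits, which is the source of the large uncertainty the abstract points to.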

2.
Analyst ; 145(22): 7197-7209, 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33094747

ABSTRACT

Since its inception, the main goal of the lipidomics field has been to characterize lipid species and their respective biological roles. However, difficulties in both full speciation and biological interpretation have rendered these objectives extremely challenging and as a result, limited our understanding of lipid mechanisms and dysregulation. While mass spectrometry-based advancements have significantly increased the ability to identify lipid species, less progress has been made surrounding biological interpretations. We have therefore developed a Structural-based Connectivity and Omic Phenotype Evaluations (SCOPE) cheminformatics toolbox to aid in these evaluations. SCOPE enables the assessment and visualization of two main lipidomic associations: structure/biological connections and metadata linkages either separately or in tandem. To assess structure and biological relationships, SCOPE utilizes key lipid structural moieties such as head group and fatty acyl composition and links them to their respective biological relationships through hierarchical clustering and grouped heatmaps. Metadata arising from phenotypic and environmental factors such as age and diet is then correlated with the lipid structures and/or biological relationships, utilizing Toxicological Prioritization Index (ToxPi) software. Here, SCOPE is demonstrated for various applications from environmental studies to clinical assessments to showcase new biological connections not previously observed with other techniques.


Subject(s)
Cheminformatics , Lipidomics , Lipids , Mass Spectrometry , Phenotype
3.
Mol Omics ; 16(6): 521-532, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-32966491

ABSTRACT

To fully enable the development of diagnostic tools and progressive pharmaceutical drugs, it is imperative to understand the molecular changes occurring before and during disease onset and progression. Systems biology assessments utilizing multi-omic analyses (e.g., the combination of proteomics, lipidomics, genomics, etc.) have shown enormous value in determining molecules prevalent in diseases and their associated mechanisms. Herein, we utilized multi-omic evaluations, multi-dimensional analysis methods, and new cheminformatics-based visualization tools to provide an in-depth understanding of the molecular changes taking place in preeclampsia (PRE) and gestational diabetes mellitus (GDM) patients. Since PRE and GDM are two prevalent pregnancy complications that result in adverse health effects for both the mother and fetus during pregnancy and later in life, a better understanding of each is essential. The multi-omic evaluations performed here provide new insight into the end-stage molecular profiles of each disease, thereby supplying information potentially crucial for earlier diagnosis and treatments.


Subject(s)
Diabetes, Gestational/genetics , Genomics , Pre-Eclampsia/genetics , Case-Control Studies , Female , Humans , Lipidomics , Metabolic Networks and Pathways , Pregnancy
4.
J Cheminform ; 11(1): 43, 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31236709

ABSTRACT

Developing predictive and transparent approaches to the analysis of metabolite profiles across patient cohorts is of critical importance for understanding the events that trigger or modulate traits of interest (e.g., disease progression, drug metabolism, chemical risk assessment). However, metabolites' chemical structures are still rarely used in the statistical modeling workflows that establish these trait-metabolite relationships. Herein, we present a novel cheminformatics-based approach capable of identifying predictive, interpretable, and reproducible trait-metabolite relationships. As a proof-of-concept, we utilize a previously published case study consisting of metabolite profiles from non-small-cell lung cancer (NSCLC) adenocarcinoma patients and healthy controls. By characterizing each structurally annotated metabolite using both computed molecular descriptors and patient metabolite concentration profiles, we show that these complementary features enhance the identification and understanding of key metabolites associated with cancer. Ultimately, we built multi-metabolite classification models for assessing patients' cancer status using specific groups of metabolites identified based on high structural similarity through chemical clustering. We subsequently performed a metabolic pathway enrichment analysis to identify potential mechanistic relationships between metabolites and NSCLC adenocarcinoma. This cheminformatics-inspired approach relies on the metabolites' structural features and chemical properties to provide critical information about metabolite-trait associations. This method could ultimately facilitate biological understanding and advance research based on metabolomics data, especially with respect to the identification of novel biomarkers.

5.
J Cheminform ; 10(1): 57, 2018 Nov 28.
Article in English | MEDLINE | ID: mdl-30488298

ABSTRACT

The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of these models. While focused on implementing methods for model fitting and assessment that have been accepted by experts in the cheminformatics field, all of the methods in chemmodlab have broad utility for the machine learning community. chemmodlab contains several assessment utilities, including a plotting function that constructs accumulation curves and a function that computes many performance measures. The most novel feature of chemmodlab is the ease with which statistically significant performance differences for many machine learning models are presented by means of the multiple comparisons similarity plot. Differences are assessed using repeated k-fold cross validation, where blocking increases precision and multiplicity adjustments are applied. chemmodlab is freely available on CRAN at https://cran.r-project.org/web/packages/chemmodlab/index.html .
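The idea of blocking in repeated k-fold cross validation can be sketched as follows: every model is evaluated on the same set of splits, so each split acts as a block and performance differences are taken within it. This is a minimal Python illustration, not chemmodlab's R interface; the function names and the per-split error inputs are hypothetical, and no multiplicity adjustment is implemented here.

```python
import random


def repeated_kfold_splits(n_samples, k, n_repeats, seed=0):
    """Return n_repeats independent partitions of range(n_samples) into k folds.

    The same list of splits is meant to be reused for every model under
    comparison, so that each split serves as a block.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [sorted(indices[i::k]) for i in range(k)]
        splits.append(folds)
    return splits


def paired_differences(errors_a, errors_b):
    """Per-split differences between two models scored on identical splits."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both models must be scored on the same splits")
    return [a - b for a, b in zip(errors_a, errors_b)]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical usage: two repeats of 3-fold cross validation on 10 samples.
example_splits = repeated_kfold_splits(10, 3, 2, seed=1)
```

Differencing within each split removes the variation that comes from the split itself, which is why blocking makes comparisons between models more precise.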

6.
Mol Biol Evol ; 33(12): 3314-3316, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27634869

ABSTRACT

Modern phylogenomic analyses often result in large collections of phylogenetic trees representing uncertainty in individual gene trees, variation across genes, or both. Extracting phylogenetic signal from these tree sets can be challenging, as they are difficult to visualize, explore, and quantify. To overcome some of these challenges, we have developed TreeScaper, an application for tree set visualization as well as the identification of distinct phylogenetic signals. GUI and command-line versions of TreeScaper and a manual with tutorials can be downloaded from https://github.com/whuang08/TreeScaper/releases . TreeScaper is distributed under the GNU General Public License.


Subject(s)
Phylogeny , Sequence Analysis, DNA/methods , Computer Simulation , Databases, Nucleic Acid , Evolution, Molecular , Software