Results 1 - 20 of 25
1.
Brain Behav Immun ; 113: 303-316, 2023 10.
Article in English | MEDLINE | ID: mdl-37516387

ABSTRACT

Metabolomics, proteomics and DNA methylome assays, when done in tandem from the same blood sample and analyzed together, offer an opportunity to evaluate the molecular basis of post-traumatic stress disorder (PTSD) course and pathogenesis. We performed separate metabolomics, proteomics, and DNA methylome assays on blood samples from two well-characterized cohorts of 159 active duty male participants with relatively recent onset PTSD (<1.5 years) and 300 male veterans with chronic PTSD (>7 years). Analyses of the multi-omics datasets from these two independent cohorts were used to identify convergent and distinct molecular profiles that might constitute potential signatures of severity and progression of PTSD and its comorbid conditions. Molecular signatures indicative of homeostatic processes such as signaling and metabolic pathways involved in cellular remodeling, neurogenesis, molecular safeguards against oxidative stress, metabolism of polyunsaturated fatty acids, regulation of normal immune response, post-transcriptional regulation, cellular maintenance and markers of longevity were significantly activated in the active duty participants with recent PTSD. In contrast, we observed significantly altered multimodal molecular signatures associated with chronic inflammation, neurodegeneration, cardiovascular and metabolic disorders, and cellular attrition in the veterans with chronic PTSD. Activation status of signaling and metabolic pathways at the early and late timepoints of PTSD demonstrated differential molecular changes, related to homeostatic processes in the recent phase and to multi-system syndromes in the chronic phase. Molecular alterations in recent PTSD seem to indicate some sort of recalibration or compensatory response, possibly directed at mitigating the pathological trajectory of the disorder.


Subject(s)
Stress Disorders, Post-Traumatic , Veterans , Humans , Male , Stress Disorders, Post-Traumatic/genetics , Stress Disorders, Post-Traumatic/metabolism , Epigenomics , Proteomics , Metabolomics
2.
Mol Psychiatry ; 26(8): 4300-4314, 2021 08.
Article in English | MEDLINE | ID: mdl-33339956

ABSTRACT

Post-traumatic stress disorder (PTSD) is a heterogeneous condition evidenced by the absence of objective physiological measurements applicable to all who meet the criteria for the disorder as well as divergent responses to treatments. This study capitalized on biological diversity observed within the PTSD group following epigenome-wide analysis of a well-characterized Discovery cohort (N = 166) consisting of 83 male combat-exposed veterans with PTSD and 83 combat veterans without PTSD, in order to identify patterns that might distinguish subtypes. Computational analysis of DNA methylation (DNAm) profiles identified two PTSD biotypes within the PTSD+ group, G1 and G2, linked to 34 clinical features that are associated with PTSD and PTSD comorbidities. The G2 biotype was associated with an increased PTSD risk and had higher polygenic risk scores and greater methylation compared to the G1 biotype and healthy controls. The findings were validated at a 3-year follow-up (N = 59) of the same individuals as well as in two independent veteran cohorts (N = 54 and N = 38) and an active duty cohort (N = 133). In some cases, for example Dopamine-PKA-CREB and GABA-PKC-CREB signaling pathways, the biotypes were oppositely dysregulated, suggesting that the biotypes were not simply a function of a dimensional relationship with symptom severity, but may represent distinct biological risk profiles underpinning PTSD. The identification of two novel distinct epigenetic biotypes for PTSD may have future utility in understanding biological and clinical heterogeneity in PTSD and potential applications in risk assessment for active duty military personnel under non-clinician-administered settings, and improvement of PTSD diagnostic markers.


Subject(s)
Military Personnel , Stress Disorders, Post-Traumatic , Veterans , Epigenesis, Genetic/genetics , Epigenome , Humans , Male , Stress Disorders, Post-Traumatic/genetics
3.
BMC Bioinformatics ; 22(1): 44, 2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33535967

ABSTRACT

BACKGROUND: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. RESULTS: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely-applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities were enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that use of GEOlimma provides greater experimental power to detect DE genes compared to Limma, due to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset. CONCLUSIONS: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses compared to the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
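
As a rough illustration of how differential expression frequencies could be turned into gene-specific prior probabilities and then combined with evidence from a new experiment, the sketch below uses a smoothed frequency and the odds form of Bayes' theorem. The function names, the pseudocount, and the use of a log Bayes factor as the per-experiment evidence are assumptions made for illustration; this is not the GEOlimma implementation.

```python
import math

def de_prior(de_count, n_comparisons, pseudocount=1.0):
    """Smoothed fraction of past comparisons in which a gene was DE.

    The pseudocount is an assumption; it keeps the prior strictly in (0, 1).
    """
    return (de_count + pseudocount) / (n_comparisons + 2.0 * pseudocount)

def posterior_log_odds(log_bayes_factor, prior):
    """Posterior log-odds of DE = log Bayes factor + prior log-odds."""
    return log_bayes_factor + math.log(prior / (1.0 - prior))

# Hypothetical gene that was DE in 500 of the 2481 pairwise comparisons.
prior = de_prior(500, 2481)
print(round(prior, 3), round(posterior_log_odds(1.5, prior), 3))
```

With the same evidence from the new experiment, a gene that was frequently DE in past comparisons ends up with higher posterior odds than a gene that rarely was.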


Subject(s)
Computational Biology , Gene Expression Profiling , Bayes Theorem , Child , Female , Humans , Male , Oligonucleotide Array Sequence Analysis , Sample Size
4.
Mol Psychiatry ; 25(12): 3337-3349, 2020 12.
Article in English | MEDLINE | ID: mdl-31501510

ABSTRACT

Post-traumatic stress disorder (PTSD) impacts many veterans and active duty soldiers, but diagnosis can be problematic due to biases in self-disclosure of symptoms, stigma within military populations, and limitations identifying those at risk. Prior studies suggest that PTSD may be a systemic illness, affecting not just the brain, but the entire body. Therefore, disease signals likely span multiple biological domains, including genes, proteins, cells, tissues, and organism-level physiological changes. Identification of these signals could aid in diagnostics, treatment decision-making, and risk evaluation. In the search for PTSD diagnostic biomarkers, we ascertained over one million molecular, cellular, physiological, and clinical features from three cohorts of male veterans. In a discovery cohort of 83 warzone-related PTSD cases and 82 warzone-exposed controls, we identified a set of 343 candidate biomarkers. These candidate biomarkers were selected from an integrated approach using (1) data-driven methods, including Support Vector Machine with Recursive Feature Elimination and other standard or published methodologies, and (2) hypothesis-driven approaches, using previous genetic studies for polygenic risk, or other PTSD-related literature. After reassessment of ~30% of these participants, we refined this set of markers from 343 to 28, based on their performance and ability to track changes in phenotype over time. The final diagnostic panel of 28 features was validated in an independent cohort (26 cases, 26 controls) with good performance (AUC = 0.80, 81% accuracy, 85% sensitivity, and 77% specificity). The identification and validation of this diverse diagnostic panel represents a powerful and novel approach to improve accuracy and reduce bias in diagnosing combat-related PTSD.
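
The reported validation metrics follow from a 2x2 confusion matrix. The sketch below assumes 22 true positives and 20 true negatives among the 26 cases and 26 controls; these counts are inferred from the rounded percentages and are not stated in the abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of cases correctly flagged
        "specificity": tn / (tn + fp),   # fraction of controls correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Assumed counts for the independent validation cohort (26 cases, 26 controls).
metrics = diagnostic_metrics(tp=22, fn=4, tn=20, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The AUC cannot be recovered from a single confusion matrix, because it summarizes performance across all decision thresholds.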


Subject(s)
Military Personnel , Stress Disorders, Post-Traumatic , Veterans , Biomarkers , Brain , Humans , Male , Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/genetics
5.
Bioinformatics ; 35(21): 4411-4412, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31038667

ABSTRACT

SUMMARY: Although heteroplasmy has been studied extensively in animal systems, there is a lack of tools for analyzing, exploring and visualizing heteroplasmy at the genome-wide level in other taxonomic systems. We introduce icHET, which is a computational workflow that produces an interactive visualization that facilitates the exploration, analysis and discovery of heteroplasmy across multiple genomic samples. icHET works on short reads from multiple samples from any organism with an organellar reference genome (mitochondrial or plastid) and a nuclear reference genome. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/vtphan/HeteroplasmyWorkflow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Animals , Genome , Workflow
6.
PLoS Comput Biol ; 12(12): e1005220, 2016 12.
Article in English | MEDLINE | ID: mdl-27930676

ABSTRACT

We present StochSS: Stochastic Simulation as a Service, an integrated development environment for modeling and simulation of both deterministic and discrete stochastic biochemical systems in up to three dimensions. An easy to use graphical user interface enables researchers to quickly develop and simulate a biological model on a desktop or laptop, which can then be expanded to incorporate increasing levels of complexity. StochSS features state-of-the-art simulation engines. As the demand for computational power increases, StochSS can seamlessly scale computing resources in the cloud. In addition, StochSS can be deployed as a multi-user software environment where collaborators share computational resources and exchange models via a public model repository. We demonstrate the capabilities and ease of use of StochSS with an example of model development and simulation at increasing levels of complexity.


Subject(s)
Computational Biology/methods , Computer Simulation , Software , Stochastic Processes
7.
Bioinformatics ; 31(9): 1428-35, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25573914

ABSTRACT

MOTIVATION: Stochastic promoter switching between transcriptionally active (ON) and inactive (OFF) states is a major source of noise in gene expression. It is often implicitly assumed that transitions between promoter states are memoryless, i.e. promoters spend an exponentially distributed time interval in each of the two states. However, increasing evidence suggests that promoter ON/OFF times can be non-exponential, hinting at more complex transcriptional regulatory architectures. Given the essential role of gene expression in all cellular functions, efficient computational techniques for characterizing promoter architectures are critically needed. RESULTS: We have developed a novel model reduction for promoters with arbitrary numbers of ON and OFF states, allowing us to approximate complex promoter switching behavior with Weibull-distributed ON/OFF times. Using this model reduction, we created bursty Monte Carlo expectation-maximization with modified cross-entropy method ('bursty MCEM(2)'), an efficient parameter estimation and model selection technique for inferring the number and configuration of promoter states from single-cell gene expression data. Application of bursty MCEM(2) to data from the endogenous mouse glutaminase promoter reveals nearly deterministic promoter OFF times, consistent with a multi-step activation mechanism consisting of 10 or more inactive states. Our novel approach to modeling promoter fluctuations together with bursty MCEM(2) provides powerful tools for characterizing transcriptional bursting across genes under different environmental conditions. AVAILABILITY AND IMPLEMENTATION: R source code implementing bursty MCEM(2) is available upon request. CONTACT: absingh@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
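
One way to see why "nearly deterministic" OFF times point to many inactive states is to compare coefficients of variation (CV). A memoryless (exponential) waiting time has CV = 1, which is a Weibull distribution with shape 1; larger Weibull shapes give a smaller CV, and a chain of n identical exponential steps has CV = 1/sqrt(n). The sketch below is a minimal illustration of these standard formulas and is not part of bursty MCEM(2); the function names are hypothetical.

```python
import math

def weibull_cv(shape):
    """Coefficient of variation of a Weibull distribution (independent of scale)."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / (g1 * g1) - 1.0)

def multistep_cv(n_steps):
    """CV of the total time through n identical, sequential exponential steps."""
    return 1.0 / math.sqrt(n_steps)

for shape in (1.0, 2.0, 4.0):
    print(shape, round(weibull_cv(shape), 3))
print(round(multistep_cv(10), 3))
```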


Subject(s)
Gene Expression , Models, Genetic , Animals , Computer Simulation , Glutaminase/genetics , Mice , Monte Carlo Method , Promoter Regions, Genetic , Single-Cell Analysis , Stochastic Processes
8.
BMC Med Inform Decis Mak ; 16(1): 124, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27658851

ABSTRACT

BACKGROUND: Trauma is the leading cause of death between the ages of 1 and 44 in the United States. Blood loss is the primary cause of these deaths. The discrimination of states through which patients transition would be helpful in understanding the disease process, and in identification of critical states and appropriate interventions. Even though these states are strongly associated with patients' blood composition data, there has not been a way to directly identify them. Statistical tools such as hidden Markov models can be used to infer the discrete latent states from the blood composition data. METHODS: We applied a hidden Markov model to time-series multivariate patient measurements from the UCSF/San Francisco General Hospital and Trauma Center. Ten blood factor related measurements were used to identify the model: factors II, V, VII, VIII, IX, X, antithrombin III, protein C, prothrombin time and partial thromboplastin time. Missing data in the time-series dataset were accounted for in the hidden Markov model. The number of states was determined by minimizing the Bayesian information criterion across different numbers of states. RESULTS: After preprocessing, 1090 patients with a total of 2176 time point measurements were included in the analysis. The hidden Markov model identified 6 disease states and 3 stages. We analyzed their relationships to the blood composition data and the coagulation cascade. The states are strongly indicative of the disease progression status of patients. CONCLUSIONS: Six disease states and 3 stages associated with coagulopathy in trauma were identified in our study. The hidden Markov model can be useful in identifying latent states by using patients' time-series multivariate data. The information obtained from the states and stages can be useful in the clinical setting.
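
Choosing the number of hidden states by minimizing the Bayesian information criterion (BIC) can be sketched as follows. The parameter count assumes diagonal-Gaussian emissions, which is an assumption rather than the emission model stated in the abstract, and the fitted log-likelihoods are expected to come from whatever HMM fitting routine is used.

```python
import math

def hmm_param_count(n_states, n_features):
    """Free parameters of an HMM with diagonal-Gaussian emissions (assumed form)."""
    initial = n_states - 1                  # initial distribution sums to 1
    transitions = n_states * (n_states - 1) # each transition row sums to 1
    emissions = 2 * n_states * n_features   # one mean and one variance per feature
    return initial + transitions + emissions

def bic(log_likelihood, n_params, n_obs):
    """BIC = k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def select_n_states(log_likelihoods, n_features, n_obs):
    """log_likelihoods maps each candidate state count to its fitted log-likelihood."""
    return min(
        log_likelihoods,
        key=lambda s: bic(log_likelihoods[s], hmm_param_count(s, n_features), n_obs),
    )
```

Under this assumed parameterization, a 6-state model over the 10 measurements has `hmm_param_count(6, 10)` = 155 free parameters, which is the penalty side of the trade-off against the improved fit.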

9.
J Mol Graph Model ; 122: 108488, 2023 07.
Article in English | MEDLINE | ID: mdl-37121167

ABSTRACT

Pharmacophore models are three-dimensional arrangements of molecular features required for biological activity that are used in ligand identification efforts for many biological targets, including G protein-coupled receptors (GPCR). Though GPCR are integral membrane proteins of considerable interest as targets for drug development, many of these receptors lack known ligands or experimentally determined structures necessary for ligand- or structure-based pharmacophore model generation, respectively. Thus, we here present a structure-based pharmacophore modeling approach that uses fragments placed with Multiple Copy Simultaneous Search (MCSS) to generate high-performing pharmacophore models in the context of experimentally determined, as well as modeled GPCR structures. Moreover, we have addressed the oft-neglected topic of pharmacophore model selection via development of a cluster-then-predict machine learning workflow. Herein score-based pharmacophore models were generated in experimentally determined and modeled structures of 13 class A GPCR and resulted in pharmacophore models exhibiting high enrichment factors when used to search a database containing 569 class A GPCR ligands. In addition, classification of pharmacophore models with the best performing cluster-then-predict logistic regression classifier resulted in positive predictive values (PPV) of 0.88 and 0.76 for selecting high enrichment pharmacophore models from among those generated in experimentally determined and modeled structures, respectively.
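
The two performance measures named above have simple definitions, sketched below. The enrichment factor compares the hit rate among retrieved compounds with the hit rate expected from random selection, and the positive predictive value is the fraction of models predicted to be high-enrichment that really are. The example numbers in the comments are hypothetical and are not taken from the study.

```python
def enrichment_factor(hits_retrieved, n_retrieved, actives_total, n_database):
    """Hit rate among retrieved compounds relative to the database-wide rate."""
    hit_rate = hits_retrieved / n_retrieved
    base_rate = actives_total / n_database
    return hit_rate / base_rate

def positive_predictive_value(true_positives, false_positives):
    """Fraction of positive predictions that are correct."""
    return true_positives / (true_positives + false_positives)

# Hypothetical screen: 10 actives among 100 retrieved, 50 actives in 5000 compounds.
print(enrichment_factor(10, 100, 50, 5000))
# Hypothetical classifier output: 22 correct and 3 incorrect "high enrichment" calls.
print(positive_predictive_value(22, 3))
```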


Subject(s)
Pharmacophore , Receptors, G-Protein-Coupled , Ligands , Receptors, G-Protein-Coupled/chemistry , Signal Transduction , Protein Binding
10.
Cell Rep Med ; 4(5): 101045, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37196634

ABSTRACT

Post-traumatic stress disorder (PTSD) is a multisystem syndrome. Integration of systems-level multi-modal datasets can provide a molecular understanding of PTSD. Proteomic, metabolomic, and epigenomic assays are conducted on blood samples of two cohorts of well-characterized PTSD cases and controls: 340 veterans and 180 active-duty soldiers. All participants had been deployed to Iraq and/or Afghanistan and exposed to military-service-related criterion A trauma. Molecular signatures are identified from a discovery cohort of 218 veterans (109/109 PTSD+/-). Identified molecular signatures are tested in 122 separate veterans (62/60 PTSD+/-) and in 180 active-duty soldiers (PTSD+/-). Molecular profiles are computationally integrated with upstream regulators (genetic/methylation/microRNAs) and functional units (mRNAs/proteins/metabolites). Reproducible molecular features of PTSD are identified, including activated inflammation, oxidative stress, metabolic dysregulation, and impaired angiogenesis. These processes may play a role in psychiatric and physical comorbidities, including impaired repair/wound healing mechanisms and cardiovascular, metabolic, and psychiatric diseases.


Subject(s)
Military Personnel , Stress Disorders, Post-Traumatic , Veterans , Humans , Military Personnel/psychology , Veterans/psychology , Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/genetics , Stress Disorders, Post-Traumatic/psychology , Proteomics , Inflammation
11.
BMC Bioinformatics ; 13: 12, 2012 Jan 18.
Article in English | MEDLINE | ID: mdl-22257533

ABSTRACT

BACKGROUND: In a complex disease, the expression of many genes can be significantly altered, leading to the appearance of a differentially expressed "disease module". Some of these genes directly correspond to the disease phenotype, (i.e. "driver" genes), while others represent closely-related first-degree neighbours in gene interaction space. The remaining genes consist of further removed "passenger" genes, which are often not directly related to the original cause of the disease. For prognostic and diagnostic purposes, it is crucial to be able to separate the group of "driver" genes and their first-degree neighbours, (i.e. "core module") from the general "disease module". RESULTS: We have developed COMBINER: COre Module Biomarker Identification with Network ExploRation. COMBINER is a novel pathway-based approach for selecting highly reproducible discriminative biomarkers. We applied COMBINER to three benchmark breast cancer datasets for identifying prognostic biomarkers. COMBINER-derived biomarkers exhibited 10-fold higher reproducibility than other methods, with up to 30-fold greater enrichment for known cancer-related genes, and 4-fold enrichment for known breast cancer susceptible genes. More than 50% and 40% of the resulting biomarkers were cancer and breast cancer specific, respectively. The identified modules were overlaid onto a map of intracellular pathways that comprehensively highlighted the hallmarks of cancer. Furthermore, we constructed a global regulatory network intertwining several functional clusters and uncovered 13 confident "driver" genes of breast cancer metastasis. CONCLUSIONS: COMBINER can efficiently and robustly identify disease core module genes and construct their associated regulatory network. In the same way, it is potentially applicable in the characterization of any disease that can be probed with microarrays.


Subject(s)
Biomarkers/analysis , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Gene Regulatory Networks , Genes, Neoplasm , Breast Neoplasms/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Protein Interaction Maps
12.
BMC Bioinformatics ; 13: 68, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548918

ABSTRACT

BACKGROUND: A prerequisite for the mechanistic simulation of a biochemical system is detailed knowledge of its kinetic parameters. Despite recent experimental advances, the estimation of unknown parameter values from observed data is still a bottleneck for obtaining accurate simulation results. Many methods exist for parameter estimation in deterministic biochemical systems; methods for discrete stochastic systems are less well developed. Given the probabilistic nature of stochastic biochemical models, a natural approach is to choose parameter values that maximize the probability of the observed data with respect to the unknown parameters, a.k.a. the maximum likelihood parameter estimates (MLEs). MLE computation for all but the simplest models requires the simulation of many system trajectories that are consistent with experimental data. For models with unknown parameters, this presents a computational challenge, as the generation of consistent trajectories can be an extremely rare occurrence. RESULTS: We have developed Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM(2)): an accelerated method for calculating MLEs that combines advances in rare event simulation with a computationally efficient version of the Monte Carlo expectation-maximization (MCEM) algorithm. Our method requires no prior knowledge regarding parameter values, and it automatically provides a multivariate parameter uncertainty estimate. We applied the method to five stochastic systems of increasing complexity, progressing from an analytically tractable pure-birth model to a computationally demanding model of yeast-polarization. Our results demonstrate that MCEM(2) substantially accelerates MLE computation on all tested models when compared to a stand-alone version of MCEM. Additionally, we show how our method identifies parameter values for certain classes of models more accurately than two recently proposed computationally efficient methods. CONCLUSIONS: This work provides a novel, accelerated version of a likelihood-based parameter estimation method that can be readily applied to stochastic biochemical systems. In addition, our results suggest opportunities for added efficiency improvements that will further enhance our ability to mechanistically simulate biological processes.
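
For mass-action stochastic kinetics, the maximization step of a Monte Carlo EM scheme has a closed form once trajectories consistent with the data are available: each rate constant is the total number of firings of its reaction divided by the total time-integrated propensity per unit rate constant. The sketch below shows only that textbook update, with a hypothetical input layout; it does not reproduce the cross-entropy acceleration or any other detail of MCEM(2).

```python
def m_step(trajectories):
    """Update rate constants from simulated trajectories consistent with the data.

    Each trajectory is a list with one (firings, exposure) pair per reaction:
    how often the reaction fired, and the time integral of its propensity
    divided by its rate constant.
    """
    n_reactions = len(trajectories[0])
    estimates = []
    for j in range(n_reactions):
        firings = sum(traj[j][0] for traj in trajectories)
        exposure = sum(traj[j][1] for traj in trajectories)
        estimates.append(firings / exposure)
    return estimates

# Two hypothetical trajectories of a two-reaction system.
print(m_step([[(4, 2.0), (1, 10.0)], [(6, 3.0), (3, 10.0)]]))
```

The costly part in practice is generating the consistent trajectories in the first place, which is the rare-event problem the abstract describes.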


Subject(s)
Biochemical Phenomena , Computer Simulation/statistics & numerical data , Models, Biological , Monte Carlo Method , Algorithms , Cell Polarity , GTP-Binding Proteins/metabolism , Kinetics , Likelihood Functions , Probability , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/physiology , Stochastic Processes
13.
CBE Life Sci Educ ; 21(4): ar64, 2022 12.
Article in English | MEDLINE | ID: mdl-36112620

ABSTRACT

Plant awareness disparity (PAD, formerly plant blindness) is the idea that students tend not to notice or appreciate the plants in their environment. This phenomenon often leads to naïve points of view, such as that plants are not important or do not do anything for humans. There are four components of PAD: attitude (not liking plants), attention (not noticing plants), knowledge (not understanding the importance of plants), and relative interest (finding animals more interesting than plants). Many interventions have been suggested to prevent PAD, but without an instrument shown to support valid inferences about PAD, it is difficult to tell whether these interventions are successful. We have developed and validated the Plant Awareness Disparity Index (PAD-I) to measure PAD and its four components in undergraduate biology students. The study population was 74.32% female and 69.08% white, indicating that further analysis is necessary if this instrument is to be used in a more diverse student population. We collected validity evidence based upon test content, response processes, and internal structure. Our findings demonstrate that our instrument generates reliable inferences regarding PAD with a Cronbach's alpha of 0.884 and a six-factor structure that aligns conceptually with the four components of PAD.
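
Cronbach's alpha, the reliability statistic reported above, can be computed from item-level responses as k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is a minimal version using sample variances; the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per item."""
    items = list(zip(*responses))  # transpose to one tuple per item
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)

# Hypothetical answers from four respondents to a two-item scale.
print(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]))
```

Alpha approaches 1 when items rise and fall together across respondents, and it is undefined when every respondent has the same total score.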


Subject(s)
Attitude , Students , Animals , Humans
14.
PLoS Comput Biol ; 6(3): e1000718, 2010 Mar 26.
Article in English | MEDLINE | ID: mdl-20361040

ABSTRACT

Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available software package that is immediately applicable to any human microarray study.


Subject(s)
Algorithms , Databases, Protein , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Insulin Resistance/physiology , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Proteome/metabolism , Artificial Intelligence
15.
J Chem Phys ; 134(4): 044110, 2011 Jan 28.
Article in English | MEDLINE | ID: mdl-21280690

ABSTRACT

In biochemical systems, the occurrence of a rare event can be accompanied by catastrophic consequences. Precise characterization of these events using Monte Carlo simulation methods is often intractable, as the number of realizations needed to witness even a single rare event can be very large. The weighted stochastic simulation algorithm (wSSA) [J. Chem. Phys. 129, 165101 (2008)] and its subsequent extension [J. Chem. Phys. 130, 174103 (2009)] alleviate this difficulty with importance sampling, which effectively biases the system toward the desired rare event. However, extensive computation coupled with substantial insight into a given system is required, as there is currently no automatic approach for choosing wSSA parameters. We present a novel modification of the wSSA--the doubly weighted SSA (dwSSA)--that makes possible a fully automated parameter selection method. Our approach uses the information-theoretic concept of cross entropy to identify parameter values yielding minimum variance rare event probability estimates. We apply the method to four examples: a pure birth process, a birth-death process, an enzymatic futile cycle, and a yeast polarization model. Our results demonstrate that the proposed method (1) enables probability estimation for a class of rare events that cannot be interrogated with the wSSA, and (2) for all examples tested, reduces the number of runs needed to achieve comparable accuracy by multiple orders of magnitude. For a particular rare event in the yeast polarization model, our method transforms a projected simulation time of 600 years to three hours. Furthermore, by incorporating information-theoretic principles, our approach provides a framework for the development of more sophisticated influencing schemes that should further improve estimation accuracy.
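
The idea of weighting both the reaction choice and the waiting time can be sketched on a toy constant-rate birth process, where the rare event is reaching a threshold count before a deadline. Waiting times are drawn with a biased propensity gamma * c, and each step multiplies the trajectory weight by the likelihood ratio (a / b) * exp((b - a) * tau). The model, the hand-picked gamma, and the function names are assumptions for illustration; the cross-entropy step that selects gamma automatically is not shown, and the toy model is not necessarily the pure birth process used in the paper.

```python
import math
import random

def dwssa_birth(c, gamma, threshold, t_end, n_runs, seed=0):
    """Importance-sampling estimate of P(count reaches threshold before t_end)."""
    rng = random.Random(seed)
    a = c            # true propensity of the birth reaction
    b = gamma * c    # biased propensity used for sampling
    total = 0.0
    for _ in range(n_runs):
        t, x, w = 0.0, 0, 1.0
        while x < threshold:
            tau = rng.expovariate(b)      # biased waiting time
            if t + tau > t_end:
                break                     # deadline passed, event missed
            t += tau
            x += 1
            w *= (a / b) * math.exp((b - a) * tau)  # likelihood ratio for this step
        if x >= threshold:
            total += w
    return total / n_runs

def exact_tail(c, t_end, threshold):
    """Exact P(N >= threshold) for a Poisson count with mean c * t_end."""
    lam = c * t_end
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                     for k in range(threshold))

print(dwssa_birth(1.0, 8.0, 8, 1.0, 20000), exact_tail(1.0, 1.0, 8))
```

Because this toy system has a single reaction, biasing the reaction choice alone would change nothing; only the bias on the waiting time makes the rare event frequent in the simulated runs.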


Subject(s)
Automation/methods , Biochemical Phenomena , Molecular Dynamics Simulation , Algorithms , Automation/statistics & numerical data , Probability , Stochastic Processes , Thermodynamics
16.
J Chem Phys ; 135(23): 234108, 2011 Dec 21.
Article in English | MEDLINE | ID: mdl-22191865

ABSTRACT

In recent years there has been substantial growth in the development of algorithms for characterizing rare events in stochastic biochemical systems. Two such algorithms, the state-dependent weighted stochastic simulation algorithm (swSSA) and the doubly weighted SSA (dwSSA) are extensions of the weighted SSA (wSSA) by H. Kuwahara and I. Mura [J. Chem. Phys. 129, 165101 (2008)]. The swSSA substantially reduces estimator variance by implementing system state-dependent importance sampling (IS) parameters, but lacks an automatic parameter identification strategy. In contrast, the dwSSA provides for the automatic determination of state-independent IS parameters, thus it is inefficient for systems whose states vary widely in time. We present a novel modification of the dwSSA--the state-dependent doubly weighted SSA (sdwSSA)--that combines the strengths of the swSSA and the dwSSA without inheriting their weaknesses. The sdwSSA automatically computes state-dependent IS parameters via the multilevel cross-entropy method. We apply the method to three examples: a reversible isomerization process, a yeast polarization model, and a lac operon model. Our results demonstrate that the sdwSSA offers substantial improvements over previous methods in terms of both accuracy and efficiency.


Subject(s)
Algorithms , Biochemical Phenomena , Molecular Dynamics Simulation , Stochastic Processes , GTP-Binding Proteins/chemistry , Isomerism , Lac Operon , Probability , Thermodynamics
17.
Transl Psychiatry ; 11(1): 227, 2021 04 20.
Article in English | MEDLINE | ID: mdl-33879773

ABSTRACT

We sought to find clinical subtypes of posttraumatic stress disorder (PTSD) in veterans 6-10 years post-trauma exposure based on current symptom assessments and to examine whether blood biomarkers could differentiate them. Samples were males deployed to Iraq and Afghanistan studied by the PTSD Systems Biology Consortium: a discovery sample of 74 PTSD cases and 71 healthy controls (HC), and a validation sample of 26 PTSD cases and 36 HC. A machine learning method, random forests (RF), in conjunction with a clustering method, partitioning around medoids, was used to identify subtypes derived from 16 self-report and clinician assessment scales, including the clinician-administered PTSD scale for DSM-IV (CAPS). Two subtypes were identified, designated S1 and S2, differing on mean current CAPS total scores: S2 = 75.6 (sd 14.6) and S1 = 54.3 (sd 6.6). S2 had greater symptom severity scores than both S1 and HC on all scale items. The mean first principal component score derived from clinical summary scales was three times higher in S2 than in S1. Distinct RFs were grown to classify S1 and S2 vs. HCs and vs. each other on feature classes comprising multi-omic blood markers, current medical comorbidities, neurocognitive functioning, demographics, pre-military trauma, and psychiatric history. Among these classes, in each RF intergroup comparison of S1, S2, and HC, multi-omic biomarkers yielded the highest AUC-ROCs (0.819-0.922); other classes added little to further discrimination of the subtypes. Among the top five biomarkers in each of these RFs were methylation, micro RNA, and lactate markers, suggesting their biological role in symptom severity.
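
Partitioning around medoids groups observations around representative members of the data set, using any pairwise dissimilarity. The sketch below is a simplified swap-based k-medoids with a naive initialization, applied to hypothetical one-dimensional scores with absolute difference as the distance. The study's actual dissimilarity measure, initialization, and software are not given in the abstract, so everything here is an assumption for illustration.

```python
def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def k_medoids(points, k, dist):
    medoids = list(range(k))  # naive start (assumption): the first k points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try swapping medoid i for a non-medoid and keep any improvement.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    best, medoids, improved = cost, trial, True
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Hypothetical severity scores forming two well-separated groups.
scores = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]
print(k_medoids(scores, 2, lambda a, b: abs(a - b)))
```

Because medoids are actual observations, the method needs only a dissimilarity between pairs of observations, not coordinates in a feature space.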


Subject(s)
Military Personnel , Stress Disorders, Post-Traumatic , Veterans , Diagnostic and Statistical Manual of Mental Disorders , Humans , Machine Learning , Male , Stress Disorders, Post-Traumatic/diagnosis
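
The clustering step named in this abstract, partitioning around medoids, can be sketched as below. This is a simplified illustration, not the study's pipeline: the function names `total_cost` and `pam`, the naive initialisation, and the first-improvement swap rule are assumptions. The sketch takes any precomputed dissimilarity matrix; the abstract does not state which dissimilarity was used, so plain absolute differences between made-up scores stand in here.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Swap-based k-medoids on a precomputed dissimilarity matrix.

    Returns (medoid indices, cluster label per point).
    """
    n = len(dist)
    medoids = list(range(k))  # naive start (assumption); PAM's BUILD step is smarter
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:  # accept any swap that lowers the total cost
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Hypothetical severity scores (not study data), forming two obvious groups.
scores = [54, 55, 53, 76, 75, 77]
dist = [[abs(x - y) for y in scores] for x in scores]
medoids, labels = pam(dist, 2)
```

Because each accepted swap strictly lowers the total cost and there are finitely many medoid sets, the loop terminates. Medoids are actual observations, which is why the method works with an arbitrary dissimilarity matrix rather than requiring coordinates.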
18.
J Biomed Inform ; 43(6): 932-44, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20619355

ABSTRACT

As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anti-cancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.


Subject(s)
Data Mining/methods , Gene Expression Profiling/methods , Gene Expression , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Gene Regulatory Networks , Humans , Polysaccharides/metabolism , Principal Component Analysis , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/metabolism
19.
PeerJ ; 8: e8668, 2020.
Article in English | MEDLINE | ID: mdl-32201640

ABSTRACT

Histopathological images contain rich phenotypic descriptions of the molecular processes underlying disease progression. Convolutional neural networks, state-of-the-art image analysis techniques in computer vision, automatically learn representative features from such images which can be useful for disease diagnosis, prognosis, and subtyping. Hepatocellular carcinoma (HCC) is the most common type of primary liver malignancy. Despite the high mortality rate of HCC, little previous work has made use of CNN models to explore the use of histopathological images for prognosis and clinical survival prediction of HCC. We applied three pre-trained CNN models (VGG 16, Inception V3 and ResNet 50) to extract features from HCC histopathological images. Sample visualization and classification analyses based on these features showed a very clear separation between cancer and normal samples. In a univariate Cox regression analysis, 21.4% and 16% of image features on average were significantly associated with overall survival (OS) and disease-free survival (DFS), respectively. We also observed significant correlations between these features and integrated biological pathways derived from gene expression and copy number variation. Using an elastic net regularized Cox Proportional Hazards model of OS constructed from Inception image features, we obtained a concordance index (C-index) of 0.789 and a significant log-rank test (p = 7.6E-18). We also performed unsupervised classification to identify HCC subgroups from image features. The optimal two subgroups discovered using Inception model image features showed significant differences in both OS (C-index = 0.628 and p = 7.39E-07) and DFS (C-index = 0.558 and p = 0.012).
Our work demonstrates the utility of image features extracted with pre-trained models: they can be used to build accurate prognostic models of HCC and to highlight significant correlations between these features, clinical survival, and relevant biological pathways. Image features extracted from HCC histopathological images using the pre-trained CNN models VGG 16, Inception V3 and ResNet 50 can accurately distinguish normal and cancer samples. Furthermore, these image features are significantly correlated with survival and relevant biological pathways.
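
The concordance index reported in this abstract can be illustrated with a minimal sketch of Harrell's C-index for right-censored data. The function name `concordance_index` and the toy inputs are assumptions for illustration; the abstract does not say which implementation the authors used. A pair of patients is comparable when the one with the shorter follow-up time actually had the event, and the pair is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    times:  follow-up time per subject
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must have an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (not study data): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.9, 0.7, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the 0.789, 0.628 and 0.558 figures quoted above.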

20.
BMC Bioinformatics ; 9: 214, 2008 Apr 25.
Article in English | MEDLINE | ID: mdl-18439292

ABSTRACT

BACKGROUND: The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes. RESULTS: M-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from .541 to .623 when compared to a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response. CONCLUSION: This work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.


Subject(s)
Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Software , Computer Simulation , Data Interpretation, Statistical , Models, Biological , Systems Integration
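
The AUC figures quoted in this abstract (.541 vs .623) measure how well a ranked gene list separates differentially expressed genes from the rest. A minimal sketch of the rank-based definition follows; the function name `auc` and the toy scores are assumptions, and this is not M-BISON's own evaluation code. The AUC equals the probability that a randomly chosen positive item scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted score per item (higher means more likely positive)
    labels: 1 for positive items (e.g. truly DE genes), 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (not study data): one positive is ranked below one negative.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise form is quadratic in the number of items; for genome-scale lists a sort-based rank-sum computation gives the same value more cheaply.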