1.
Mol Psychiatry ; 26(8): 4300-4314, 2021 08.
Article in English | MEDLINE | ID: mdl-33339956

ABSTRACT

Post-traumatic stress disorder (PTSD) is a heterogeneous condition, as evidenced by the absence of objective physiological measurements applicable to all who meet the criteria for the disorder, as well as by divergent responses to treatments. This study capitalized on the biological diversity within the PTSD group observed following epigenome-wide analysis of a well-characterized Discovery cohort (N = 166), consisting of 83 male combat-exposed veterans with PTSD and 83 combat veterans without PTSD, in order to identify patterns that might distinguish subtypes. Computational analysis of DNA methylation (DNAm) profiles identified two PTSD biotypes within the PTSD+ group, G1 and G2, associated with 34 clinical features related to PTSD and PTSD comorbidities. The G2 biotype was associated with increased PTSD risk and had higher polygenic risk scores and greater methylation compared to the G1 biotype and healthy controls. The findings were validated at a 3-year follow-up (N = 59) of the same individuals, as well as in two independent veteran cohorts (N = 54 and N = 38) and an active duty cohort (N = 133). In some cases, for example the Dopamine-PKA-CREB and GABA-PKC-CREB signaling pathways, the biotypes were oppositely dysregulated, suggesting that the biotypes were not simply a function of a dimensional relationship with symptom severity, but may represent distinct biological risk profiles underpinning PTSD. The identification of two novel, distinct epigenetic biotypes for PTSD may have future utility in understanding biological and clinical heterogeneity in PTSD, potential applications in risk assessment for active duty military personnel in non-clinician-administered settings, and improvement of PTSD diagnostic markers.


Subject(s)
Military Personnel; Stress Disorders, Post-Traumatic; Veterans; Epigenesis, Genetic/genetics; Epigenome; Humans; Male; Stress Disorders, Post-Traumatic/genetics
2.
BMC Bioinformatics ; 22(1): 44, 2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33535967

ABSTRACT

BACKGROUND: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. RESULTS: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely applied Limma method for differential expression analysis. We first quantified differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets and converted differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities were enriched in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities were enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that GEOlimma provides greater experimental power to detect DE genes than Limma, owing to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset. CONCLUSIONS: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses than the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.


Subject(s)
Computational Biology; Gene Expression Profiling; Bayes Theorem; Child; Female; Humans; Male; Oligonucleotide Array Sequence Analysis; Sample Size
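The abstract says GEOlimma converts differential expression frequencies across 2481 pairwise comparisons into DE prior probabilities, but it does not give the formula. The following is a minimal Python sketch of one plausible conversion. The function name `de_prior_probabilities`, the additive pseudocount smoothing, and the gene names are assumptions for illustration only. This is not GEOlimma's code.

```python
def de_prior_probabilities(de_counts, n_comparisons, pseudocount=1.0):
    """Turn per-gene DE call counts into smoothed prior probabilities of DE.

    de_counts maps gene -> number of comparisons (out of n_comparisons) in
    which the gene was called differentially expressed. Additive smoothing
    keeps every prior strictly between 0 and 1. This smoothing rule is an
    illustrative assumption, not the published GEOlimma formula.
    """
    if n_comparisons <= 0:
        raise ValueError("n_comparisons must be positive")
    priors = {}
    for gene, count in de_counts.items():
        if not 0 <= count <= n_comparisons:
            raise ValueError(f"DE count for {gene} is out of range")
        priors[gene] = (count + pseudocount) / (n_comparisons + 2 * pseudocount)
    return priors


# Hypothetical gene names and counts out of 2481 comparisons.
priors = de_prior_probabilities({"GENE_A": 900, "GENE_B": 12, "GENE_C": 0}, 2481)
for gene, p in sorted(priors.items(), key=lambda item: -item[1]):
    print(f"{gene}: {p:.4f}")
```

In limma's empirical Bayes moderation, the B-statistic (log-odds of differential expression) depends on an assumed proportion of DE genes that is shared by all genes. Per-gene priors like these are one way to make that quantity gene-specific. This is presumably their role in GEOlimma, although the abstract does not say so explicitly.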
3.
Mol Psychiatry ; 25(12): 3337-3349, 2020 12.
Article in English | MEDLINE | ID: mdl-31501510

ABSTRACT

Post-traumatic stress disorder (PTSD) impacts many veterans and active duty soldiers, but diagnosis can be problematic due to biases in self-disclosure of symptoms, stigma within military populations, and limitations identifying those at risk. Prior studies suggest that PTSD may be a systemic illness, affecting not just the brain, but the entire body. Therefore, disease signals likely span multiple biological domains, including genes, proteins, cells, tissues, and organism-level physiological changes. Identification of these signals could aid in diagnostics, treatment decision-making, and risk evaluation. In the search for PTSD diagnostic biomarkers, we ascertained over one million molecular, cellular, physiological, and clinical features from three cohorts of male veterans. In a discovery cohort of 83 warzone-related PTSD cases and 82 warzone-exposed controls, we identified a set of 343 candidate biomarkers. These candidate biomarkers were selected from an integrated approach using (1) data-driven methods, including Support Vector Machine with Recursive Feature Elimination and other standard or published methodologies, and (2) hypothesis-driven approaches, using previous genetic studies for polygenic risk, or other PTSD-related literature. After reassessment of ~30% of these participants, we refined this set of markers from 343 to 28, based on their performance and ability to track changes in phenotype over time. The final diagnostic panel of 28 features was validated in an independent cohort (26 cases, 26 controls) with good performance (AUC = 0.80, 81% accuracy, 85% sensitivity, and 77% specificity). The identification and validation of this diverse diagnostic panel represents a powerful and novel approach to improve accuracy and reduce bias in diagnosing combat-related PTSD.


Subject(s)
Military Personnel; Stress Disorders, Post-Traumatic; Veterans; Biomarkers; Brain; Humans; Male; Stress Disorders, Post-Traumatic/diagnosis; Stress Disorders, Post-Traumatic/genetics
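The reported accuracy, sensitivity, and specificity can be checked against the size of the validation cohort (26 cases, 26 controls). The only integer counts consistent with the percentages are 22 of 26 cases and 20 of 26 controls classified correctly. These counts are a back-calculation, not figures stated in the abstract. A small Python sketch of the arithmetic:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # share of cases called positive
    specificity = tn / (tn + fp)               # share of controls called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed (back-calculated) counts: 22/26 cases and 20/26 controls correct.
sens, spec, acc = diagnostic_metrics(tp=22, fn=4, tn=20, fp=6)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# sensitivity=0.85 specificity=0.77 accuracy=0.81
```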
4.
Bioinformatics ; 35(21): 4411-4412, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31038667

ABSTRACT

SUMMARY: Although heteroplasmy has been studied extensively in animal systems, there is a lack of tools for analyzing, exploring and visualizing heteroplasmy at the genome-wide level in other taxonomic systems. We introduce icHET, which is a computational workflow that produces an interactive visualization that facilitates the exploration, analysis and discovery of heteroplasmy across multiple genomic samples. icHET works on short reads from multiple samples from any organism with an organellar reference genome (mitochondrial or plastid) and a nuclear reference genome. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/vtphan/HeteroplasmyWorkflow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics; Software; Animals; Genome; Workflow
5.
PLoS Comput Biol ; 12(12): e1005220, 2016 12.
Article in English | MEDLINE | ID: mdl-27930676

ABSTRACT

We present StochSS: Stochastic Simulation as a Service, an integrated development environment for modeling and simulation of both deterministic and discrete stochastic biochemical systems in up to three dimensions. An easy-to-use graphical user interface enables researchers to quickly develop and simulate a biological model on a desktop or laptop, which can then be expanded to incorporate increasing levels of complexity. StochSS features state-of-the-art simulation engines. As the demand for computational power increases, StochSS can seamlessly scale computing resources in the cloud. In addition, StochSS can be deployed as a multi-user software environment where collaborators share computational resources and exchange models via a public model repository. We demonstrate the capabilities and ease of use of StochSS with an example of model development and simulation at increasing levels of complexity.


Subject(s)
Computational Biology/methods; Computer Simulation; Software; Stochastic Processes
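StochSS targets discrete stochastic biochemical models. As a generic illustration of that model class, the following is a minimal Python sketch of Gillespie's direct-method stochastic simulation algorithm (SSA) for a hypothetical birth-death system. The reactions and rate constants are assumptions chosen for the example. The code does not use StochSS's API or simulation engines.

```python
import random


def gillespie_birth_death(k_birth, k_death, x0, t_end, seed=0):
    """Direct-method SSA for a hypothetical birth-death process.

    Reactions: 0 -> X with propensity k_birth; X -> 0 with propensity k_death * x.
    Returns (time, copy number) pairs, one per reaction event plus the start.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_birth = k_birth
        a_death = k_death * x
        a_total = a_birth + a_death
        if a_total == 0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory


traj = gillespie_birth_death(k_birth=10.0, k_death=0.1, x0=0, t_end=200.0)
print(f"{len(traj) - 1} reactions; final copy number {traj[-1][1]}")
```

The stationary distribution of this particular system is Poisson with mean k_birth / k_death (100 here). This makes it a convenient sanity check for any SSA implementation.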
6.
Bioinformatics ; 31(9): 1428-35, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25573914

ABSTRACT

MOTIVATION: Stochastic promoter switching between transcriptionally active (ON) and inactive (OFF) states is a major source of noise in gene expression. It is often implicitly assumed that transitions between promoter states are memoryless, i.e. promoters spend an exponentially distributed time interval in each of the two states. However, increasing evidence suggests that promoter ON/OFF times can be non-exponential, hinting at more complex transcriptional regulatory architectures. Given the essential role of gene expression in all cellular functions, efficient computational techniques for characterizing promoter architectures are critically needed. RESULTS: We have developed a novel model reduction for promoters with arbitrary numbers of ON and OFF states, allowing us to approximate complex promoter switching behavior with Weibull-distributed ON/OFF times. Using this model reduction, we created bursty Monte Carlo expectation-maximization with modified cross-entropy method ('bursty MCEM(2)'), an efficient parameter estimation and model selection technique for inferring the number and configuration of promoter states from single-cell gene expression data. Application of bursty MCEM(2) to data from the endogenous mouse glutaminase promoter reveals nearly deterministic promoter OFF times, consistent with a multi-step activation mechanism consisting of 10 or more inactive states. Our novel approach to modeling promoter fluctuations together with bursty MCEM(2) provides powerful tools for characterizing transcriptional bursting across genes under different environmental conditions. AVAILABILITY AND IMPLEMENTATION: R source code implementing bursty MCEM(2) is available upon request. CONTACT: absingh@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression; Models, Genetic; Animals; Computer Simulation; Glutaminase/genetics; Mice; Monte Carlo Method; Promoter Regions, Genetic; Single-Cell Analysis; Stochastic Processes
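The link between nearly deterministic OFF times and many inactive states can be illustrated numerically. Suppose the promoter must pass through k sequential OFF states, each exited at the same exponential rate. This is a simplifying assumption made here, not the paper's fitted model. The total OFF time is then Erlang-distributed with coefficient of variation (CV) 1/sqrt(k). So 10 states already give a CV of about 0.32, compared with 1.0 for a single memoryless state. A short Python check:

```python
import math
import random
import statistics


def simulate_off_times(n_states, rate, n_samples, seed=0):
    """OFF time modeled as passage through n_states sequential inactive states,
    each exited at the same exponential rate (an Erlang distribution)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(n_states))
            for _ in range(n_samples)]


def coefficient_of_variation(samples):
    return statistics.stdev(samples) / statistics.mean(samples)


for k in (1, 2, 5, 10, 20):
    cv = coefficient_of_variation(simulate_off_times(k, rate=1.0, n_samples=20000))
    print(f"k={k:2d}  simulated CV={cv:.3f}  theory={1 / math.sqrt(k):.3f}")
```

The paper approximates such behavior with Weibull-distributed ON/OFF times. A Weibull distribution with shape parameter above 1 likewise has a CV below 1.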
7.
BMC Med Inform Decis Mak ; 16(1): 124, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27658851

ABSTRACT

BACKGROUND: Trauma is the leading cause of death between the ages of 1 and 44 in the United States. Blood loss is the primary cause of these deaths. Discriminating the states through which patients transition would help in understanding the disease process and in identifying critical states and appropriate interventions. Even though these states are strongly associated with patients' blood composition data, there has not been a way to identify them directly. Statistical tools such as hidden Markov models can be used to infer the discrete latent states from the blood composition data. METHODS: We applied a hidden Markov model to time-series multivariate patient measurements from the UCSF/San Francisco General Hospital and Trauma Center. Ten blood-factor-related measurements were used to identify the model: factors II, V, VII, VIII, IX, X, antithrombin III, protein C, prothrombin time and partial thromboplastin time. Missing data in the time-series dataset were accounted for in the hidden Markov model. The number of states was determined by minimizing the Bayesian information criterion across different numbers of states. RESULTS: After preprocessing, 1090 patients with a total of 2176 time point measurements were included in the analysis. The hidden Markov model identified 6 disease states and 3 stages. We analyzed their relationships to the blood composition data and the coagulation cascade. The states are strongly indicative of the disease progression status of patients. CONCLUSIONS: Six disease states and 3 stages associated with coagulopathy in trauma were identified in our study. The hidden Markov model can be useful in identifying latent states by using patients' time-series multivariate data. The information obtained from the states and stages can be useful in the clinical setting.
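The abstract describes fitting a hidden Markov model to multivariate time series with missing values and choosing the number of states by minimizing BIC. The following is a minimal sketch of those two ingredients. It assumes diagonal Gaussian emissions, which the abstract does not specify. It includes a log-space forward algorithm in which a missing measurement is marginalized out (dropped from the emission density). It also includes a BIC helper whose parameter count follows that assumed model. The function names and toy parameters are hypothetical.

```python
import math

import numpy as np


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal Gaussian over the observed (non-NaN) entries of x.

    Dropping missing dimensions marginalizes them out; a fully missing
    observation therefore contributes log(1) = 0.
    """
    observed = ~np.isnan(x)
    if not observed.any():
        return 0.0
    d = x[observed] - mean[observed]
    v = var[observed]
    return float(-0.5 * np.sum(np.log(2 * np.pi * v) + d * d / v))


def hmm_log_likelihood(obs, start, trans, means, variances):
    """Forward algorithm in log space for one multivariate time series.

    obs: (T, d) array, NaN marks a missing measurement.
    start: (K,) initial state probabilities; trans: (K, K) transition matrix.
    means, variances: (K, d) per-state diagonal Gaussian emission parameters.
    """
    n_states = len(start)
    log_trans = np.log(trans)

    def emissions(x):
        return np.array([log_gaussian_diag(x, means[k], variances[k])
                         for k in range(n_states)])

    log_alpha = np.log(start) + emissions(obs[0])
    for t in range(1, len(obs)):
        # log-sum-exp over the previous state i for every current state j
        log_alpha = emissions(obs[t]) + np.logaddexp.reduce(
            log_alpha[:, None] + log_trans, axis=0)
    return float(np.logaddexp.reduce(log_alpha))


def bic(log_likelihood, n_states, n_dims, n_observations):
    """BIC = -2 ln L + p ln n, with p counting free parameters of this model:
    initial probabilities, transition rows, and per-state means and variances."""
    p = (n_states - 1) + n_states * (n_states - 1) + 2 * n_states * n_dims
    return -2.0 * log_likelihood + p * math.log(n_observations)


# Toy two-state model with two measurements per time point (hypothetical values).
obs = np.array([[1.0, np.nan], [0.5, 0.2], [np.nan, np.nan]])
start = np.array([0.6, 0.4])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([[1.0, 0.0], [0.0, 1.0]])
variances = np.ones((2, 2))
ll = hmm_log_likelihood(obs, start, trans, means, variances)
print(f"log-likelihood {ll:.3f}, BIC {bic(ll, 2, 2, len(obs)):.3f}")
```

In practice the parameters would be estimated (for example by Baum-Welch) for each candidate number of states, and the candidate with the smallest BIC retained.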

8.
J Mol Graph Model ; 122: 108488, 2023 07.
Article in English | MEDLINE | ID: mdl-37121167

ABSTRACT

Pharmacophore models are three-dimensional arrangements of molecular features required for biological activity that are used in ligand identification efforts for many biological targets, including G protein-coupled receptors (GPCR). Though GPCR are integral membrane proteins of considerable interest as targets for drug development, many of these receptors lack known ligands or experimentally determined structures necessary for ligand- or structure-based pharmacophore model generation, respectively. Thus, we here present a structure-based pharmacophore modeling approach that uses fragments placed with Multiple Copy Simultaneous Search (MCSS) to generate high-performing pharmacophore models in the context of experimentally determined, as well as modeled GPCR structures. Moreover, we have addressed the oft-neglected topic of pharmacophore model selection via development of a cluster-then-predict machine learning workflow. Herein score-based pharmacophore models were generated in experimentally determined and modeled structures of 13 class A GPCR and resulted in pharmacophore models exhibiting high enrichment factors when used to search a database containing 569 class A GPCR ligands. In addition, classification of pharmacophore models with the best performing cluster-then-predict logistic regression classifier resulted in positive predictive values (PPV) of 0.88 and 0.76 for selecting high enrichment pharmacophore models from among those generated in experimentally determined and modeled structures, respectively.


Subject(s)
Pharmacophore; Receptors, G-Protein-Coupled; Ligands; Receptors, G-Protein-Coupled/chemistry; Signal Transduction; Protein Binding
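The enrichment factor and positive predictive value (PPV) mentioned in the abstract are simple ratios. The following Python sketch uses the common definitions. All counts are hypothetical, because the abstract does not report the underlying numbers.

```python
def enrichment_factor(actives_in_hits, n_hits, actives_total, n_database):
    """EF = (fraction of retrieved hits that are actives) / (fraction of the database that is active)."""
    if n_hits == 0:
        return 0.0  # nothing retrieved, no enrichment
    return (actives_in_hits / n_hits) / (actives_total / n_database)


def positive_predictive_value(tp, fp):
    """PPV = TP / (TP + FP): share of models predicted 'high enrichment' that truly are."""
    return tp / (tp + fp)


# Hypothetical screen: 10 of 20 retrieved compounds are actives, in a database
# where 50 of 1000 compounds are actives.
print(enrichment_factor(10, 20, 50, 1000))
# Hypothetical classifier output: 22 true positives and 3 false positives.
print(positive_predictive_value(22, 3))
```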
9.
Cell Rep Med ; 4(5): 101045, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37196634

ABSTRACT

Post-traumatic stress disorder (PTSD) is a multisystem syndrome. Integration of systems-level multi-modal datasets can provide a molecular understanding of PTSD. Proteomic, metabolomic, and epigenomic assays are conducted on blood samples of two cohorts of well-characterized PTSD cases and controls: 340 veterans and 180 active-duty soldiers. All participants had been deployed to Iraq and/or Afghanistan and exposed to military-service-related criterion A trauma. Molecular signatures are identified from a discovery cohort of 218 veterans (109/109 PTSD+/-). Identified molecular signatures are tested in 122 separate veterans (62/60 PTSD+/-) and in 180 active-duty soldiers (PTSD+/-). Molecular profiles are computationally integrated with upstream regulators (genetic/methylation/microRNAs) and functional units (mRNAs/proteins/metabolites). Reproducible molecular features of PTSD are identified, including activated inflammation, oxidative stress, metabolic dysregulation, and impaired angiogenesis. These processes may play a role in psychiatric and physical comorbidities, including impaired repair/wound healing mechanisms and cardiovascular, metabolic, and psychiatric diseases.


Subject(s)
Military Personnel; Stress Disorders, Post-Traumatic; Veterans; Humans; Military Personnel/psychology; Veterans/psychology; Stress Disorders, Post-Traumatic/diagnosis; Stress Disorders, Post-Traumatic/genetics; Stress Disorders, Post-Traumatic/psychology; Proteomics; Inflammation
10.
BMC Bioinformatics ; 13: 12, 2012 Jan 18.
Article in English | MEDLINE | ID: mdl-22257533

ABSTRACT

BACKGROUND: In a complex disease, the expression of many genes can be significantly altered, leading to the appearance of a differentially expressed "disease module". Some of these genes directly correspond to the disease phenotype (i.e., "driver" genes), while others represent closely related first-degree neighbours in gene interaction space. The remaining genes consist of further removed "passenger" genes, which are often not directly related to the original cause of the disease. For prognostic and diagnostic purposes, it is crucial to be able to separate the group of "driver" genes and their first-degree neighbours (i.e., the "core module") from the general "disease module". RESULTS: We have developed COMBINER: COre Module Biomarker Identification with Network ExploRation. COMBINER is a novel pathway-based approach for selecting highly reproducible discriminative biomarkers. We applied COMBINER to three benchmark breast cancer datasets for identifying prognostic biomarkers. COMBINER-derived biomarkers exhibited 10-fold higher reproducibility than other methods, with up to 30-fold greater enrichment for known cancer-related genes, and 4-fold enrichment for known breast cancer susceptibility genes. More than 50% and 40% of the resulting biomarkers were cancer and breast cancer specific, respectively. The identified modules were overlaid onto a map of intracellular pathways that comprehensively highlighted the hallmarks of cancer. Furthermore, we constructed a global regulatory network intertwining several functional clusters and uncovered 13 confident "driver" genes of breast cancer metastasis. CONCLUSIONS: COMBINER can efficiently and robustly identify disease core module genes and construct their associated regulatory network. In the same way, it is potentially applicable to the characterization of any disease that can be probed with microarrays.


Subject(s)
Biomarkers/analysis; Breast Neoplasms/diagnosis; Breast Neoplasms/genetics; Gene Regulatory Networks; Genes, Neoplasm; Breast Neoplasms/metabolism; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Protein Interaction Maps
11.
BMC Bioinformatics ; 13: 68, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548918

ABSTRACT

BACKGROUND: A prerequisite for the mechanistic simulation of a biochemical system is detailed knowledge of its kinetic parameters. Despite recent experimental advances, the estimation of unknown parameter values from observed data is still a bottleneck for obtaining accurate simulation results. Many methods exist for parameter estimation in deterministic biochemical systems; methods for discrete stochastic systems are less well developed. Given the probabilistic nature of stochastic biochemical models, a natural approach is to choose parameter values that maximize the probability of the observed data with respect to the unknown parameters, a.k.a. the maximum likelihood parameter estimates (MLEs). MLE computation for all but the simplest models requires the simulation of many system trajectories that are consistent with experimental data. For models with unknown parameters, this presents a computational challenge, as the generation of consistent trajectories can be an extremely rare occurrence. RESULTS: We have developed Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM(2)): an accelerated method for calculating MLEs that combines advances in rare event simulation with a computationally efficient version of the Monte Carlo expectation-maximization (MCEM) algorithm. Our method requires no prior knowledge regarding parameter values, and it automatically provides a multivariate parameter uncertainty estimate. We applied the method to five stochastic systems of increasing complexity, progressing from an analytically tractable pure-birth model to a computationally demanding model of yeast-polarization. Our results demonstrate that MCEM(2) substantially accelerates MLE computation on all tested models when compared to a stand-alone version of MCEM. Additionally, we show how our method identifies parameter values for certain classes of models more accurately than two recently proposed computationally efficient methods. 
CONCLUSIONS: This work provides a novel, accelerated version of a likelihood-based parameter estimation method that can be readily applied to stochastic biochemical systems. In addition, our results suggest opportunities for added efficiency improvements that will further enhance our ability to mechanistically simulate biological processes.


Subject(s)
Biochemical Phenomena; Computer Simulation/statistics & numerical data; Models, Biological; Monte Carlo Method; Algorithms; Cell Polarity; GTP-Binding Proteins/metabolism; Kinetics; Likelihood Functions; Probability; Saccharomyces cerevisiae/enzymology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/physiology; Stochastic Processes
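The M-step at the heart of MCEM has a closed form for mass-action models. If a reaction has propensity theta * h(x), the complete-data maximum likelihood estimate is the number of times the reaction fired divided by the time integral of h(x). MCEM averages these two sufficient statistics over simulated trajectories that are consistent with the data. The Python sketch below applies the formula to unconditioned trajectories of a hypothetical pure-birth (Yule) model. It therefore illustrates only the M-step, not the rare-event E-step that MCEM(2) accelerates.

```python
import random


def simulate_yule(theta, x0, t_end, rng):
    """SSA for the pure-birth reaction X -> 2X with propensity theta * x (x0 > 0).

    Returns the sufficient statistics of one trajectory: the number of
    birth events and the integral of x(t) over [0, t_end].
    """
    t, x = 0.0, x0
    n_events, integral = 0, 0.0
    while True:
        dt = rng.expovariate(theta * x)
        if t + dt > t_end:
            integral += x * (t_end - t)   # x stays constant until t_end
            return n_events, integral
        integral += x * dt
        t += dt
        x += 1
        n_events += 1


def m_step(trajectories):
    """Complete-data MLE: total event count divided by total integral of h(x) = x."""
    total_events = sum(r for r, _ in trajectories)
    total_integral = sum(g for _, g in trajectories)
    return total_events / total_integral


rng = random.Random(1)
trajectories = [simulate_yule(theta=0.5, x0=10, t_end=5.0, rng=rng) for _ in range(200)]
theta_hat = m_step(trajectories)
print(f"theta_hat = {theta_hat:.3f} (true value 0.5)")
```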
12.
CBE Life Sci Educ ; 21(4): ar64, 2022 12.
Article in English | MEDLINE | ID: mdl-36112620

ABSTRACT

Plant awareness disparity (PAD, formerly plant blindness) is the idea that students tend not to notice or appreciate the plants in their environment. This phenomenon often leads to naïve points of view, such as that plants are not important or do not do anything for humans. There are four components of PAD: attitude (not liking plants), attention (not noticing plants), knowledge (not understanding the importance of plants), and relative interest (finding animals more interesting than plants). Many interventions have been suggested to prevent PAD, but without an instrument shown to support valid inferences about PAD, it is difficult to tell whether these interventions are successful. We have developed and validated the Plant Awareness Disparity Index (PAD-I) to measure PAD and its four components in undergraduate biology students. The study population was 74.32% female and 69.08% white, indicating that further analysis is necessary if this instrument is to be used in a more diverse student population. We collected validity evidence based on test content, response processes, and internal structure. Our findings demonstrate that our instrument generates reliable inferences regarding PAD, with a Cronbach's alpha of 0.884 and a six-factor structure that aligns conceptually with the four components of PAD.


Subject(s)
Attitude; Students; Animals; Humans
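The reported reliability (Cronbach's alpha = 0.884) is conventionally computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) for k items. The following is a minimal Python sketch of that formula. The response matrix is hypothetical and is not PAD-I data.

```python
import statistics


def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))                        # one tuple per item
    item_vars = [statistics.variance(col) for col in items]
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses: 4 respondents x 3 items.
data = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5]]
print(round(cronbach_alpha(data), 3))
```

Sample variances (n - 1 denominator) are used for both the items and the total score, so the choice cancels consistently in the ratio.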
13.
PLoS Comput Biol ; 6(3): e1000718, 2010 Mar 26.
Article in English | MEDLINE | ID: mdl-20361040

ABSTRACT

Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. 
We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available software package that is immediately applicable to any human microarray study.


Subject(s)
Algorithms; Databases, Protein; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Insulin Resistance/physiology; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Proteome/metabolism; Artificial Intelligence
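Beyond its use of SVD and of coexpression in public compendia, the abstract does not describe SAGAT's procedure. The following is a generic illustration of the underlying idea and assumes nothing about SAGAT's implementation. A noisy expression profile is projected onto the top singular vectors of a genes-by-arrays compendium, so that only variation along previously observed coexpression patterns is kept. The data below are simulated and hypothetical.

```python
import numpy as np


def coexpression_projection(compendium, sample, k):
    """Project one expression profile onto the top-k left singular vectors of a
    genes x arrays compendium (generic SVD denoising, not SAGAT itself)."""
    u, _, _ = np.linalg.svd(compendium, full_matrices=False)
    basis = u[:, :k]                        # genes x k orthonormal basis
    return basis @ (basis.T @ sample)


rng = np.random.default_rng(0)
n_genes, n_arrays = 200, 50
# Hypothetical compendium generated by 3 latent coexpression modules.
modules = rng.normal(size=(n_genes, 3))
compendium = modules @ rng.normal(size=(3, n_arrays))
true_profile = modules @ np.array([1.0, -0.5, 2.0])
noisy = true_profile + rng.normal(scale=0.5, size=n_genes)
denoised = coexpression_projection(compendium, noisy, k=3)
print(np.linalg.norm(noisy - true_profile), np.linalg.norm(denoised - true_profile))
```

Because the toy compendium has rank 3, only the noise component lying inside that 3-dimensional subspace survives the projection.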
14.
J Chem Phys ; 134(4): 044110, 2011 Jan 28.
Article in English | MEDLINE | ID: mdl-21280690

ABSTRACT

In biochemical systems, the occurrence of a rare event can be accompanied by catastrophic consequences. Precise characterization of these events using Monte Carlo simulation methods is often intractable, as the number of realizations needed to witness even a single rare event can be very large. The weighted stochastic simulation algorithm (wSSA) [J. Chem. Phys. 129, 165101 (2008)] and its subsequent extension [J. Chem. Phys. 130, 174103 (2009)] alleviate this difficulty with importance sampling, which effectively biases the system toward the desired rare event. However, extensive computation coupled with substantial insight into a given system is required, as there is currently no automatic approach for choosing wSSA parameters. We present a novel modification of the wSSA--the doubly weighted SSA (dwSSA)--that makes possible a fully automated parameter selection method. Our approach uses the information-theoretic concept of cross entropy to identify parameter values yielding minimum variance rare event probability estimates. We apply the method to four examples: a pure birth process, a birth-death process, an enzymatic futile cycle, and a yeast polarization model. Our results demonstrate that the proposed method (1) enables probability estimation for a class of rare events that cannot be interrogated with the wSSA, and (2) for all examples tested, reduces the number of runs needed to achieve comparable accuracy by multiple orders of magnitude. For a particular rare event in the yeast polarization model, our method transforms a projected simulation time of 600 years to three hours. Furthermore, by incorporating information-theoretic principles, our approach provides a framework for the development of more sophisticated influencing schemes that should further improve estimation accuracy.


Subject(s)
Automation/methods; Biochemical Phenomena; Molecular Dynamics Simulation; Algorithms; Automation/statistics & numerical data; Probability; Stochastic Processes; Thermodynamics
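The importance-sampling idea behind the wSSA family can be shown on a case with an exact answer. Take a hypothetical zero-order birth reaction with propensity k. The number of events by time T is Poisson(kT), so the rare-event probability P(N(T) >= n) is known. The propensity is scaled by gamma, and each run is reweighted by the likelihood ratio gamma^(-N) * exp((gamma - 1) * k * T). This gives an unbiased estimate. In this Python sketch the bias gamma is fixed by hand. Choosing it automatically via cross entropy is what the dwSSA adds.

```python
import math
import random


def weighted_ssa_tail(k, t_end, n_target, gamma, n_runs, seed=0):
    """Importance-sampling estimate of P(N(t_end) >= n_target) for a zero-order
    birth reaction with propensity k, simulated with propensity gamma * k."""
    rng = random.Random(seed)
    log_gamma = math.log(gamma)
    total = 0.0
    for _ in range(n_runs):
        t, n = 0.0, 0
        while True:
            t += rng.expovariate(gamma * k)   # biased waiting time
            if t > t_end:
                break
            n += 1
        if n >= n_target:
            # Likelihood ratio of the original path to the biased path.
            total += math.exp(-n * log_gamma + (gamma - 1.0) * k * t_end)
    return total / n_runs


def exact_poisson_tail(lam, n_target, n_max=400):
    """P(Poisson(lam) >= n_target), summed term by term to avoid cancellation."""
    return sum(math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
               for n in range(n_target, n_max))


estimate = weighted_ssa_tail(k=1.0, t_end=10.0, n_target=30, gamma=3.0, n_runs=20000)
exact = exact_poisson_tail(10.0, 30)
print(f"IS estimate {estimate:.3e}, exact {exact:.3e}")
```

With k = 1, T = 10 and n = 30, the target probability is on the order of 1e-7. Unbiased direct simulation would therefore need millions of runs to observe even one such event.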
15.
J Chem Phys ; 135(23): 234108, 2011 Dec 21.
Article in English | MEDLINE | ID: mdl-22191865

ABSTRACT

In recent years there has been substantial growth in the development of algorithms for characterizing rare events in stochastic biochemical systems. Two such algorithms, the state-dependent weighted stochastic simulation algorithm (swSSA) and the doubly weighted SSA (dwSSA), are extensions of the weighted SSA (wSSA) by H. Kuwahara and I. Mura [J. Chem. Phys. 129, 165101 (2008)]. The swSSA substantially reduces estimator variance by implementing system state-dependent importance sampling (IS) parameters, but lacks an automatic parameter identification strategy. In contrast, the dwSSA provides for the automatic determination of state-independent IS parameters, and is therefore inefficient for systems whose states vary widely in time. We present a novel modification of the dwSSA, the state-dependent doubly weighted SSA (sdwSSA), that combines the strengths of the swSSA and the dwSSA without inheriting their weaknesses. The sdwSSA automatically computes state-dependent IS parameters via the multilevel cross-entropy method. We apply the method to three examples: a reversible isomerization process, a yeast polarization model, and a lac operon model. Our results demonstrate that the sdwSSA offers substantial improvements over previous methods in terms of both accuracy and efficiency.


Subject(s)
Algorithms; Biochemical Phenomena; Molecular Dynamics Simulation; Stochastic Processes; GTP-Binding Proteins/chemistry; Isomerism; Lac Operon; Probability; Thermodynamics
16.
Transl Psychiatry ; 11(1): 227, 2021 04 20.
Article in English | MEDLINE | ID: mdl-33879773

ABSTRACT

We sought to find clinical subtypes of posttraumatic stress disorder (PTSD) in veterans 6-10 years post-trauma exposure based on current symptom assessments and to examine whether blood biomarkers could differentiate them. Samples were males deployed to Iraq and Afghanistan studied by the PTSD Systems Biology Consortium: a discovery sample of 74 PTSD cases and 71 healthy controls (HC), and a validation sample of 26 PTSD cases and 36 HC. A machine learning method, random forests (RF), was used in conjunction with a clustering method, partitioning around medoids, to identify subtypes derived from 16 self-report and clinician assessment scales, including the clinician-administered PTSD scale for DSM-IV (CAPS). Two subtypes were identified, designated S1 and S2, differing on mean current CAPS total scores: S2 = 75.6 (sd 14.6) and S1 = 54.3 (sd 6.6). S2 had greater symptom severity scores than both S1 and HC on all scale items. The mean first principal component score derived from clinical summary scales was three times higher in S2 than in S1. Distinct RFs were grown to classify S1 and S2 vs. HCs and vs. each other on multi-omic blood markers and on feature classes of current medical comorbidities, neurocognitive functioning, demographics, pre-military trauma, and psychiatric history. Among these classes, in each RF intergroup comparison of S1, S2, and HC, multi-omic biomarkers yielded the highest AUC-ROCs (0.819-0.922); other classes added little to further discrimination of the subtypes. Among the top five biomarkers in each of these RFs were methylation, microRNA, and lactate markers, suggesting a biological role for them in symptom severity.


Subject(s)
Military Personnel; Stress Disorders, Post-Traumatic; Veterans; Diagnostic and Statistical Manual of Mental Disorders; Humans; Machine Learning; Male; Stress Disorders, Post-Traumatic/diagnosis
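The AUC-ROC values reported for the random-forest comparisons have a simple probabilistic reading. They give the probability that a randomly chosen case receives a higher score than a randomly chosen control. The following is a minimal Python sketch of that Mann-Whitney formulation, with hypothetical scores.

```python
def auc_mann_whitney(case_scores, control_scores):
    """AUC-ROC as the probability that a randomly chosen case scores higher
    than a randomly chosen control; tied pairs count one half."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores (e.g. random-forest vote fractions).
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise loop is O(n * m). For large cohorts a rank-based computation gives the same value more efficiently.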
17.
J Biomed Inform ; 43(6): 932-44, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20619355

ABSTRACT

As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anti-cancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.


Subject(s)
Data Mining/methods , Gene Expression Profiling/methods , Gene Expression , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Gene Regulatory Networks , Humans , Polysaccharides/metabolism , Principal Component Analysis , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/metabolism
18.
PeerJ ; 8: e8668, 2020.
Article in English | MEDLINE | ID: mdl-32201640

ABSTRACT

Histopathological images contain rich phenotypic descriptions of the molecular processes underlying disease progression. Convolutional neural networks (CNNs), state-of-the-art image analysis techniques in computer vision, automatically learn representative features from such images that can be useful for disease diagnosis, prognosis, and subtyping. Hepatocellular carcinoma (HCC) is the sixth most common type of primary liver malignancy. Despite the high mortality rate of HCC, little previous work has used CNN models to explore histopathological images for prognosis and clinical survival prediction in HCC. We applied three pre-trained CNN models (VGG 16, Inception V3 and ResNet 50) to extract features from HCC histopathological images. Sample visualization and classification analyses based on these features showed a very clear separation between cancer and normal samples. In a univariate Cox regression analysis, on average 21.4% and 16% of image features were significantly associated with overall survival (OS) and disease-free survival (DFS), respectively. We also observed significant correlations between these features and integrated biological pathways derived from gene expression and copy number variation. Using an elastic-net regularized Cox proportional hazards model of OS built from Inception image features, we obtained a concordance index (C-index) of 0.789 and a significant log-rank test (p = 7.6E-18). We also performed unsupervised classification to identify HCC subgroups from image features. The optimal two subgroups discovered using Inception model image features showed significant differences in both OS (C-index = 0.628 and p = 7.39E-07) and DFS (C-index = 0.558 and p = 0.012).
Our work demonstrates the utility of extracting image features with pre-trained models: these features can be used to build accurate prognostic models of HCC, and they show significant correlations with clinical survival and relevant biological pathways. Image features extracted from HCC histopathological images using the pre-trained CNN models VGG 16, Inception V3 and ResNet 50 can accurately distinguish normal and cancer samples. Furthermore, these image features are significantly correlated with survival and relevant biological pathways.
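The reported C-index values summarize how well predicted risk orders patients by survival time. A minimal sketch of Harrell's concordance index is shown below, assuming right-censored data and a risk score where higher means worse prognosis (for example the linear predictor of a Cox model). The abstract does not state which C-index variant was used; this version skips pairs with tied survival times and counts tied risk scores as one half. The function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk:   predicted risk scores (higher = earlier expected event)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk failed first: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data for five patients.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 1, 0, 1, 0]
risk = [2.1, 0.9, 1.5, 0.2, 1.0]
print(round(harrell_c_index(times, events, risk), 3))  # 0.833
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.789, 0.628 and 0.558 values above are read.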

19.
BMC Bioinformatics ; 9: 214, 2008 Apr 25.
Article in English | MEDLINE | ID: mdl-18439292

ABSTRACT

BACKGROUND: The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes. RESULTS: M-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from 0.541 to 0.623 compared with a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response. CONCLUSION: This work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.
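The improvement from 0.541 to 0.623 is stated as an area under the ROC curve (AUC) for ranking genes as DE or not. A minimal sketch of that metric is shown below, using the rank-statistic (Mann-Whitney) form: the probability that a randomly chosen true DE gene receives a higher score than a randomly chosen non-DE gene, with ties counted as one half. The function name and toy scores are hypothetical and are not M-BISON output.

```python
def roc_auc(scores, labels):
    """AUC-ROC as P(score of a random positive > score of a random negative),
    counting ties as one half (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known DE gene, 0 = not DE; scores are predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.1, 0.7, 0.3]
print(round(roc_auc(scores, labels), 3))  # 0.889
```

On this scale 0.5 is chance-level ranking, so the reported change from 0.541 to 0.623 is a move away from near-random ranking.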


Subject(s)
Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Software , Computer Simulation , Data Interpretation, Statistical , Models, Biological , Systems Integration
20.
BMC Syst Biol ; 10(1): 109, 2016 11 25.
Article in English | MEDLINE | ID: mdl-27884189

ABSTRACT

BACKGROUND: Despite the increasing availability of high-performance computing capabilities, analysis and characterization of stochastic biochemical systems remain a computational challenge. To address this challenge, the Stochastic Parameter Search for Events (SParSE) was developed to automatically identify reaction rates under which a user-specified event occurs with a user-specified probability. SParSE consists of three main components: the multi-level cross-entropy method, which identifies biasing parameters to push the system toward the event of interest; the related inverse biasing method; and an optional interpolation of identified parameters. While effective for many examples, SParSE depends on the existence of a sufficient amount of intrinsic stochasticity in the system of interest. In the absence of this stochasticity, SParSE may converge slowly or not at all. RESULTS: We have developed SParSE++, a substantially improved algorithm for characterizing target events in terms of system parameters. SParSE++ makes use of a series of novel parameter leaping methods that accelerate the convergence rate to the target event, particularly in low-stochasticity cases. In addition, the interpolation stage is modified to compute multiple interpolants and to choose the optimal one in a statistically rigorous manner. We demonstrate the performance of SParSE++ on four example systems: a birth-death process, a reversible isomerization model, SIRS disease dynamics, and a yeast polarization model. In all four cases, SParSE++ shows significantly improved computational efficiency over SParSE, with the largest improvements resulting from analyses with the strictest error tolerances. CONCLUSIONS: As researchers continue to model realistic biochemical systems, the need for efficient methods to characterize target events will grow.
The algorithmic advancements provided by SParSE++ fulfill this need, enabling characterization of computationally intensive biochemical events that are currently resistant to analysis.
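Searching for rates that give a target event probability requires estimating that probability for each candidate parameter set by simulating the stochastic system. The sketch below shows only this evaluation step for the birth-death example, using Gillespie's direct method and plain Monte Carlo. It does not implement the cross-entropy biasing, parameter leaping, or interpolation described above, and the event definition (population reaching a threshold by a fixed time), the function names, and all rate values are hypothetical.

```python
import random


def max_population(x0, k_birth, k_death, t_end, rng):
    """Gillespie direct method for a birth-death process
    (0 -> X at rate k_birth, X -> 0 at rate k_death * x).
    Returns the largest population reached by time t_end."""
    t, x, x_max = 0.0, x0, x0
    while True:
        a_birth = k_birth
        a_total = a_birth + k_death * x
        if a_total == 0:
            return x_max  # no reaction can fire
        t += rng.expovariate(a_total)  # exponential waiting time to the next reaction
        if t > t_end:
            return x_max
        if rng.random() * a_total < a_birth:  # choose which reaction fires
            x += 1
            x_max = max(x_max, x)
        else:
            x -= 1


def event_probability(threshold, k_birth, k_death, x0=0, t_end=10.0,
                      n_runs=1000, seed=0):
    """Plain Monte Carlo estimate of P(population reaches threshold by t_end)."""
    rng = random.Random(seed)
    hits = sum(max_population(x0, k_birth, k_death, t_end, rng) >= threshold
               for _ in range(n_runs))
    return hits / n_runs


# Hypothetical event: population reaches 15 within 10 time units.
for k_birth in (1.0, 2.0, 4.0):
    print(k_birth, event_probability(15, k_birth, k_death=0.1))
```

A parameter search would wrap `event_probability` and adjust the rates until the estimate matches the target. When the event is rare under the current rates, plain Monte Carlo needs very many runs to observe it at all, which is the situation the biasing methods in SParSE and SParSE++ are designed to handle.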


Subject(s)
Computational Biology/methods , Algorithms , Death , Humans , Isomerism , Parturition , Saccharomyces cerevisiae/metabolism , Stochastic Processes , Systemic Inflammatory Response Syndrome/transmission