Results 1 - 20 of 32
1.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37099704

ABSTRACT

MOTIVATION: The human microbiome, which is linked to various diseases by growing evidence, has a profound impact on human health. Since changes in the composition of the microbiome across time are associated with disease and clinical outcomes, microbiome analysis should be performed in a longitudinal study. However, due to limited sample sizes and differing numbers of timepoints for different subjects, a significant amount of data cannot be utilized, directly affecting the quality of analysis results. Deep generative models have been proposed to address this lack of data issue. Specifically, a generative adversarial network (GAN) has been successfully utilized for data augmentation to improve prediction tasks. Recent studies have also shown improved performance of GAN-based models for missing value imputation in a multivariate time series dataset compared with traditional imputation methods. RESULTS: This work proposes DeepMicroGen, a bidirectional recurrent neural network-based GAN model, trained on the temporal relationship between the observations, to impute the missing microbiome samples in longitudinal studies. DeepMicroGen outperforms standard baseline imputation methods, showing the lowest mean absolute error for both simulated and real datasets. Finally, the proposed model improved the predicted clinical outcome for allergies, by providing imputation for an incomplete longitudinal dataset used to train the classifier. AVAILABILITY AND IMPLEMENTATION: DeepMicroGen is publicly available at https://github.com/joungmin-choi/DeepMicroGen.
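
To make the evaluation setup concrete, here is a minimal Python sketch of one kind of standard baseline (linear interpolation over time) and of the mean absolute error used to score imputations. It is not DeepMicroGen or its code. The function names `impute_linear` and `mean_absolute_error` and the toy abundance values are assumptions made purely for illustration.

```python
def impute_linear(values):
    """Fill None entries of one taxon's time series by linear interpolation.

    Gaps at either end are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed timepoint is required")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]      # leading gap: copy first observation
        elif right is None:
            out[i] = values[left]       # trailing gap: copy last observation
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def mean_absolute_error(truth, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


# Toy relative abundances for one taxon; None marks a missing sample.
observed = [None, 1.0, None, None, 4.0, None]
imputed = impute_linear(observed)
```

A learned imputer would be compared against such a baseline by masking samples whose true values are known and computing the error on the masked positions only.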


Subject(s)
Microbiota; Humans; Longitudinal Studies; Neural Networks, Computer; Sample Size; Time Factors
2.
PLoS Comput Biol ; 18(1): e1009847, 2022 01.
Article in English | MEDLINE | ID: mdl-35089921

ABSTRACT

The cell cycle of Caulobacter crescentus involves polar morphogenesis and an asymmetric cell division driven by precise interactions and regulations of proteins, which makes Caulobacter an ideal model organism for investigating bacterial cell development and differentiation. The abundance of molecular data accumulated on Caulobacter motivates systems biologists to analyze the complex regulatory network of the cell cycle via quantitative modeling. In this paper, we propose a comprehensive model to accurately characterize the underlying mechanisms of cell cycle regulation based on the study of: a) chromosome replication and methylation; b) interactive pathways of five master regulatory proteins including DnaA, GcrA, CcrM, CtrA, and SciP, as well as novel consideration of their corresponding mRNAs; c) cell cycle-dependent proteolysis of CtrA through hierarchical protease complexes. The temporal dynamics of our simulation results closely replicate an extensive set of experimental observations and capture the main phenotype of seven mutant strains of Caulobacter crescentus. Collectively, the proposed model can be used to predict phenotypes of other mutant cases, especially for nonviable strains which are hard to cultivate and observe. Moreover, the module of cyclic proteolysis is an efficient tool to study the metabolism of proteins with similar mechanisms.


Subject(s)
Caulobacter crescentus; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Caulobacter crescentus/genetics; Caulobacter crescentus/metabolism; Cell Cycle/physiology; DNA-Binding Proteins/metabolism; Gene Expression Regulation, Bacterial; Proteolysis
3.
Simulation ; 94(11): 993-1008, 2018 Nov.
Article in English | MEDLINE | ID: mdl-31303682

ABSTRACT

The growing size and complexity of molecular network models make them increasingly difficult to construct and understand. Modifying a model that consists of tens of reactions is no easy task. Attempting the same on a model containing hundreds of reactions can seem nearly impossible. We present the JigCell Model Connector, a software tool that supports large-scale molecular network modeling. Our approach to developing large models is to combine smaller models, making the result easier to comprehend. At the base, the smaller models (called modules) are defined by small collections of reactions. Modules connect together to form larger modules through clearly defined interfaces, called ports. In this work, we enhance the port concept by defining three types of ports. An output port is linked to an internal component that will send a value. An input port is linked to an internal component that will receive a value. An equivalence port is linked to an internal component that will both receive and send values. Not all modules connect together in the same way; therefore, multiple connection options need to exist.

4.
Hum Genomics ; 9: 18, 2015 Jul 30.
Article in English | MEDLINE | ID: mdl-26223264

ABSTRACT

BACKGROUND: Many genetic variants have been identified in the human genome. The functional effects of a single variant have been intensively studied. However, the joint effects of multiple variants in the same genes have been largely ignored due to their complexity or lack of data. This paper uses HMMvar, a hidden Markov model based approach, to investigate the combined effect of multiple variants from the 1000 Genomes Project. Two tumor suppressor genes, TP53 and phosphatase and tensin homolog (PTEN), are also studied for the joint effect of compensatory indel variants. RESULTS: Results show that there are cases where the joint effect of having multiple variants in the same genes is significantly different from that of a single variant. The deleterious effect of a single indel variant can be alleviated by its compensatory indels in TP53 and PTEN. Compound mutations in two genes, β-MHC and MyBP-C, leading to more severe cardiovascular disease compared to single mutations, are also validated. CONCLUSIONS: This paper extends the functionality of HMMvar, a tool for assigning a quantitative score to a variant, to measure not only the deleterious effect of a single variant but also the joint effect of multiple variants. HMMvar is the first tool that can predict the functional effects of both single and general multiple variations on proteins. The precomputed scores for multiple variants from the 1000 Genomes Project and the HMMvar package are available at https://bioinformatics.cs.vt.edu/zhanglab/HMMvar/.


Subject(s)
Cardiovascular Diseases/genetics; Genetic Variation/genetics; PTEN Phosphohydrolase/genetics; Tumor Suppressor Protein p53/genetics; Cardiovascular Diseases/pathology; Genome, Human; Human Genome Project; Humans; INDEL Mutation/genetics; Markov Chains; Polymorphism, Single Nucleotide
5.
BMC Bioinformatics ; 16: 351, 2015 Oct 30.
Article in English | MEDLINE | ID: mdl-26518340

ABSTRACT

BACKGROUND: Numerous tools have been developed to predict the fitness effects (i.e., neutral, deleterious, or beneficial) of genetic variants on corresponding proteins. However, prediction in terms of whether a variant causes the variant-bearing protein to lose the original function or gain new function is also needed for better understanding of how the variant contributes to disease/cancer. To address this problem, the present work introduces and computationally defines four types of functional outcome of a variant: gain, loss, switch, and conservation of function. The deployment of multiple hidden Markov models is proposed to computationally classify mutations by the four functional impact types. RESULTS: The functional outcome is predicted for over a hundred thyroid stimulating hormone receptor (TSHR) mutations, as well as cancer-related mutations in oncogenes or tumor suppressor genes. The results show that the proposed computational method is effective in fine-grained prediction of the functional outcome of a mutation, and can be used to help elucidate the molecular mechanism of disease/cancer-causing mutations. The program is freely available at http://bioinformatics.cs.vt.edu/zhanglab/HMMvar/download.php. CONCLUSION: This work is the first to computationally define and predict the functional impact of mutations as loss, switch, gain, or conservation of function. These fine-grained predictions can be especially useful for identifying mutations that cause or are linked to cancer.


Subject(s)
Computational Biology/methods; Genetic Variation; Humans; Internet; Markov Chains; Mutation; Neoplasms/genetics; Neoplasms/pathology; Receptors, Thyrotropin/genetics; User-Computer Interface
6.
BMC Bioinformatics ; 15: 5, 2014 Jan 09.
Article in English | MEDLINE | ID: mdl-24405700

ABSTRACT

BACKGROUND: With the development of sequencing technologies, more and more sequence variants are available for investigation. Different classes of variants in the human genome have been identified, including single nucleotide substitutions, insertion and deletion, and large structural variations such as duplications and deletions. Insertion and deletion (indel) variants comprise a major proportion of human genetic variation. However, little is known about their effects on humans. The absence of understanding is largely due to the lack of both biological data and computational resources. RESULTS: This paper presents a new indel functional prediction method HMMvar based on HMM profiles, which capture the conservation information in sequences. The results demonstrate that a scoring strategy based on HMM profiles can achieve good performance in identifying deleterious or neutral variants for different data sets, and can predict the protein functional effects of both single and multiple mutations. CONCLUSIONS: This paper proposed a quantitative prediction method, HMMvar, to predict the effect of genetic variation using hidden Markov models. The HMM based pipeline program implementing the method HMMvar is freely available at https://bioinformatics.cs.vt.edu/zhanglab/hmm.


Subject(s)
Genetic Variation; Genome, Human/genetics; INDEL Mutation/genetics; INDEL Mutation/physiology; Computational Biology/methods; Genome, Human/physiology; Humans; Markov Chains; Models, Genetic; Models, Statistical; Proteins/genetics; Proteins/metabolism; Proteins/physiology; ROC Curve
7.
Proc Natl Acad Sci U S A ; 107(28): 12511-6, 2010 Jul 13.
Article in English | MEDLINE | ID: mdl-20571120

ABSTRACT

Biological processes such as circadian rhythms, cell division, metabolism, and development occur as ordered sequences of events. The synchronization of these coordinated events is essential for proper cell function, and hence the determination of critical time points in biological processes is an important component of all biological investigations. In particular, such critical time points establish logical ordering constraints on subprocesses, impose prerequisites on temporal regulation and spatial compartmentalization, and situate dynamic reorganization of functional elements in preparation for subsequent stages. Thus, building temporal phenomenological representations of biological processes from genome-wide datasets is relevant in formulating biological hypotheses on how processes are mechanistically regulated, how the regulations vary on an evolutionary scale, and how their inadvertent dysregulation leads to a diseased state or fatality. This paper presents a general framework (GOALIE) to reconstruct temporal models of cellular processes from time-course gene expression data. We mathematically formulate the problem as one of optimally segmenting datasets into a succession of "informative" windows such that time points within a window expose concerted clusters of gene action whereas time points straddling window boundaries constitute points of significant restructuring. We illustrate here how GOALIE successfully brings out the interplay between multiple yeast processes, inferred from combined experimental datasets for the cell cycle and the metabolic cycle.


Subject(s)
Cell Physiological Phenomena; Biological Phenomena; Cell Cycle/genetics; Cell Division; Cluster Analysis; Gene Expression; Saccharomyces cerevisiae/genetics
8.
BMC Bioinformatics ; 12: 191, 2011 May 23.
Article in English | MEDLINE | ID: mdl-21635719

ABSTRACT

BACKGROUND: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before. RESULTS: In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent. Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. Results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from the previously documented biological networks, exemplified by the exceptionally high clustering coefficient and the large number of connected components. CONCLUSIONS: The current finding may provide guidance in devising computational methods to reduce the degree of overlaps between the HMMs representing the same superfamilies, which may in turn enable more efficient large-scale sequence searches against the database of HMMs.
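
The two network statistics named in this abstract, the number of connected components and the clustering coefficient, can be illustrated with a short Python sketch. It uses a toy undirected graph whose node labels stand in for HMMs. The edges are invented for illustration and are not HHsearch output, and the convention that nodes with fewer than two neighbors contribute a coefficient of 0 is an assumption.

```python
from collections import deque


def build_graph(nodes, edges):
    """Undirected graph as a dict mapping each node to its neighbor set."""
    graph = {n: set() for n in nodes}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def connected_components(graph):
    """Return a list of node sets, one per connected component (BFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components


def average_clustering(graph):
    """Mean local clustering coefficient; degree < 2 counts as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges that exist between pairs of neighbors.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in graph[nbr_list[i]]
        )
        total += links / (k * (k - 1) / 2)
    return total / len(graph)


# Toy network: a triangle with a pendant node, one pair, one isolated node.
toy = build_graph("ABCDEFG", [("A", "B"), ("B", "C"), ("A", "C"),
                              ("C", "D"), ("E", "F")])
components = connected_components(toy)
clustering = average_clustering(toy)
```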


Subject(s)
Databases, Protein; Markov Chains; Proteins/chemistry; Proteins/genetics; Evolution, Molecular; Multigene Family; Protein Structure, Secondary
9.
J Chem Phys ; 134(5): 054105, 2011 Feb 07.
Article in English | MEDLINE | ID: mdl-21303090

ABSTRACT

Typical multiscale biochemical models contain fast-scale and slow-scale reactions, where "fast" reactions fire much more frequently than "slow" ones. This feature often causes stiffness in discrete stochastic simulation methods such as Gillespie's algorithm and the Tau-Leaping method leading to inefficient simulation. This paper proposes a new strategy to automatically detect stiffness and identify species that cause stiffness for the Tau-Leaping method, as well as two stiffness reduction methods. Numerical results on a stiff decaying dimerization model and a heat shock protein regulation model demonstrate the efficiency and accuracy of the proposed methods for multiscale biochemical systems.
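
As background for the stiffness issue described here, the following Python sketch implements Gillespie's direct method on a toy system with a fast reversible reaction and a slow conversion. The model, the rate constants, and the helper name `run_demo` are assumptions chosen for illustration. They are not the decaying dimerization or heat shock models of the paper, and the sketch does not implement Tau-Leaping or the proposed stiffness detection.

```python
import math
import random


def gillespie(x0, stoich, propensities, t_end, rng):
    """Gillespie direct method.

    Returns the final state and how many times each reaction fired.
    """
    t = 0.0
    x = list(x0)
    counts = [0] * len(stoich)
    while True:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 <= 0.0:
            break                        # nothing can fire any more
        # Waiting time to the next event is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Pick reaction j with probability a[j] / a0.
        r = rng.random() * a0
        acc = 0.0
        j = len(a) - 1
        for k, ak in enumerate(a):
            acc += ak
            if r < acc:
                j = k
                break
        for i, delta in enumerate(stoich[j]):
            x[i] += delta
        counts[j] += 1
    return x, counts


def run_demo(seed=0):
    """Toy stiff system: S1 <-> S2 (fast), S2 -> S3 (slow)."""
    c_fast, c_slow = 100.0, 0.1          # assumed rate constants
    stoich = [(-1, 1, 0), (1, -1, 0), (0, -1, 1)]
    propensities = [
        lambda x: c_fast * x[0],
        lambda x: c_fast * x[1],
        lambda x: c_slow * x[1],
    ]
    return gillespie([100, 0, 0], stoich, propensities,
                     t_end=1.0, rng=random.Random(seed))


state, firings = run_demo()
```

In this toy system almost all simulated events belong to the fast reversible pair, which is why exact event-by-event simulation becomes inefficient for multiscale models.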


Subject(s)
Computer Simulation; Models, Biological; Algorithms; Biochemical Phenomena; Computer Simulation/economics; Dimerization; Heat-Shock Proteins/metabolism; Models, Chemical; Stochastic Processes; Time Factors
10.
J Chem Theory Comput ; 16(7): 4669-4684, 2020 Jul 14.
Article in English | MEDLINE | ID: mdl-32450041

ABSTRACT

Accuracy of protein-ligand binding free energy calculations utilizing implicit solvent models is critically affected by parameters of the underlying dielectric boundary, specifically, the atomic and water probe radii. Here, a global multidimensional optimization pipeline is developed to find optimal atomic radii specifically for protein-ligand binding calculations in implicit solvent. The computational pipeline has these three key components: (1) a massively parallel implementation of a deterministic global optimization algorithm (VTDIRECT95), (2) an accurate yet reasonably fast generalized Born implicit solvent model (GBNSR6), and (3) a novel robustness metric that helps distinguish between nearly degenerate local minima via a postprocessing step of the optimization. A graph-based "kT-connectivity" approach to explore and visualize the multidimensional energy landscape is proposed: local minima that can be reached from the global minimum without exceeding a given energy threshold (kT) are considered to be connected. As an illustration of the capabilities of the optimization pipeline, we apply it to find a global optimum in the space of just five radii: four atomic (O, H, N, and C) radii and water probe radius. The optimized radii, ρW = 1.37 Å, ρC = 1.40 Å, ρH = 1.55 Å, ρN = 2.35 Å, and ρO = 1.28 Å, lead to a closer agreement of electrostatic binding free energies with the explicit solvent reference than two commonly used sets of radii previously optimized for small molecules. At the same time, the ability of the optimizer to find the global optimum reveals fundamental limits of the common two-dielectric implicit solvation model: the computed electrostatic binding free energies are still almost 4 kcal/mol away from the explicit solvent reference. The proposed computational approach opens the possibility to further improve the accuracy of practical computational protocols for binding free energy calculations.


Subject(s)
Ligands; Proteins/chemistry; Algorithms; Models, Chemical; Protein Binding; Proteins/metabolism; Solvents/chemistry; Static Electricity; Thermodynamics
11.
J Bioinform Comput Biol ; 7(2): 339-56, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19340919

ABSTRACT

We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster formation and rearrangement around putative segment boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant regrouping across segmentation boundaries. The results of the segmentation algorithm can be summarized as Gantt charts revealing temporal dependencies in the ordering of key biological processes. Applications to the yeast metabolic cycle and the yeast cell cycle are described.
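
The segmentation quality measures mentioned above are built on the Kullback-Leibler divergence. A minimal Python sketch of that divergence for discrete distributions follows. It only shows the building block, not the paper's segmentation objective, and the function name `kl_divergence` and the example distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute 0. A zero q_i where p_i > 0 makes the
    divergence infinite, which is reported as an error here.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            raise ValueError("q must be positive wherever p is positive")
        total += pi * math.log(pi / qi)
    return total


# Cluster-membership proportions on either side of a hypothetical boundary.
before = [0.5, 0.5]
after = [0.9, 0.1]
divergence = kl_divergence(before, after)
```

The divergence is zero when the two distributions coincide and grows as they differ, and it is not symmetric in its arguments.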


Subject(s)
Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Models, Biological; Computer Simulation
12.
IEEE Trans Pattern Anal Mach Intell ; 31(2): 275-87, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19110493

ABSTRACT

Hidden Markov model (HMM) classifier design is considered for the analysis of sequential data, incorporating both labeled and unlabeled data for training; the balance between the use of labeled and unlabeled data is controlled by an allocation parameter λ ∈ [0, 1), where λ = 0 corresponds to purely supervised HMM learning (based only on the labeled data) and λ = 1 corresponds to unsupervised HMM-based clustering (based only on the unlabeled data). The associated estimation problem can typically be reduced to solving a set of fixed-point equations in the form of a "natural-parameter homotopy." This paper applies a homotopy method to track a continuous path of solutions, starting from a local supervised solution (λ = 0) to a local unsupervised solution (λ = 1). The homotopy method is guaranteed to track with probability one from λ = 0 to λ = 1 if the λ = 0 solution is unique; this condition is not satisfied for the HMM since the maximum likelihood supervised solution (λ = 0) is characterized by many local optima. A modified form of the homotopy map for HMMs assures a track from λ = 0 to λ = 1. Following this track leads to a formulation for selecting λ ∈ [0, 1) for a semisupervised solution and it also provides a tool for selection from among multiple local-optimal supervised solutions. The results of applying the proposed method to measured and synthetic sequential data verify its robustness and feasibility compared to the conventional EM approach for semisupervised HMM training.


Subject(s)
Algorithms; Artificial Intelligence; Markov Chains; Models, Statistical; Models, Theoretical; Pattern Recognition, Automated/methods; Computer Simulation; Data Interpretation, Statistical
13.
Methods Mol Biol ; 1945: 119-139, 2019.
Article in English | MEDLINE | ID: mdl-30945244

ABSTRACT

Biologists seek to create increasingly complex molecular regulatory network models. Writing such a model is a creative effort that requires flexible analysis tools and better modeling languages than offered by many of today's biochemical model editors. Our Multistate Model Builder (MSMB) supports multistate models created using different modeling styles that suit the modeler rather than the software. MSMB defines a simple but powerful syntax to describe multistate species. Our syntax reduces the number of reactions needed to encode the model, thereby reducing the cognitive load involved with model creation. MSMB gives extensive feedback during all stages of model creation. Users can activate error notifications, and use these notifications as a guide toward a consistent, syntactically correct model. Any consistent model can be exported to SBML or COPASI formats. We show the effectiveness of MSMB's multistate syntax through realistic models of cell cycle regulation and mRNA transcription. MSMB is an open-source project implemented in Java and it uses the COPASI API. Complete information and the installation package can be found at http://copasi.org/Projects/ .


Subject(s)
Computational Biology/methods; Models, Biological; Software; Systems Biology/methods; Algorithms; Computer Graphics; Computer Simulation; Programming Languages
14.
Article in English | MEDLINE | ID: mdl-29990127

ABSTRACT

Parameter estimation in discrete or continuous deterministic cell cycle models is challenging for several reasons, including the nature of what can be observed, and the accuracy and quantity of those observations. The challenge is even greater for stochastic models, where the number of simulations and amount of empirical data must be even larger to obtain statistically valid parameter estimates. The two main contributions of this work are (1) stochastic model parameter estimation based on directly matching multivariate probability distributions, and (2) a new quasi-Newton algorithm class QNSTOP for stochastic optimization problems. QNSTOP directly uses the random objective function value samples rather than creating ensemble statistics. QNSTOP is used here to directly match empirical and simulated joint probability distributions rather than matching summary statistics. Results are given for a current state-of-the-art stochastic cell cycle model of budding yeast, whose predictions match well some summary statistics and one-dimensional distributions from empirical data, but do not match well the empirical joint distributions. The nature of the mismatch provides insight into the weakness in the stochastic model.


Subject(s)
Cell Cycle/physiology; Saccharomycetales; Systems Biology/methods; Algorithms; Computer Simulation; Models, Biological; Saccharomycetales/cytology; Saccharomycetales/genetics; Saccharomycetales/physiology; Stochastic Processes
15.
BMC Med Genomics ; 11(1): 78, 2018 Sep 10.
Article in English | MEDLINE | ID: mdl-30200981

ABSTRACT

BACKGROUND: CRISPR/CAS9 (epi)genome editing revolutionized the field of gene and cell therapy. Our previous study demonstrated that a rapid and robust reactivation of the HIV latent reservoir by a catalytically-deficient Cas9 (dCas9)-synergistic activation mediator (SAM) via HIV long terminal repeat (LTR)-specific MS2-mediated single guide RNAs (msgRNAs) directly induces cellular suicide without additional immunotherapy. However, potential off-target effects remain a concern for any clinical application of Cas9 genome editing and dCas9 epigenome editing. After dCas9 treatment, potential off-target responses have been analyzed through different strategies such as mRNA sequence analysis and functional screening. In this study, a comprehensive analysis of the host transcriptome including mRNA, lncRNA, and alternative splicing was performed using human cell lines expressing dCas9-SAM and HIV-targeting msgRNAs. RESULTS: The control scrambled msgRNA (LTR_Zero), and two LTR-specific msgRNAs (LTR_L and LTR_O) groups show very similar expression profiles of the whole transcriptome. Among 839 identified lncRNAs, none exhibited significantly different expression in LTR_L vs. LTR_Zero group. In LTR_O group, only TERC and scaRNA2 lncRNAs were significantly decreased. Among 142,791 mRNAs, four genes were differentially expressed in LTR_L vs. LTR_Zero group. There were 21 genes significantly downregulated in LTR_O vs. either LTR_Zero or LTR_L group and one third of them are histone related. The distributions of different types of alternative splicing were very similar either within or between groups. There were no apparent changes in all the lncRNA and mRNA transcripts between the LTR_L and LTR_Zero groups. CONCLUSION: This is an extremely comprehensive study demonstrating the rare off-target effects of the HIV-specific dCas9-SAM system in human cells. This finding is encouraging for the safe application of dCas9-SAM technology to induce target-specific reactivation of latent HIV for an effective "shock-and-kill" strategy.


Subject(s)
CRISPR-Associated Protein 9/metabolism; HIV Long Terminal Repeat/genetics; HIV-1/genetics; HIV-1/physiology; High-Throughput Nucleotide Sequencing; RNA, Long Noncoding/genetics; Virus Activation/genetics; Alternative Splicing; Gene Expression Profiling; HeLa Cells; Humans; Polymorphism, Single Nucleotide; RNA, Messenger/genetics; Sequence Analysis, RNA
16.
J Comput Biol ; 14(7): 950-60, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17803372

ABSTRACT

This work extends the work of Whitlock in examining the critical effective population sizes from the fixation of both deleterious and beneficial mutations under drift and selection to prevent mutation breakdown of the population. The validity of approximations for the probability of fixation depends on the nature of the assumed distribution for the fitness effect of both types of mutations. Using no approximation for the probability of fixation and assuming a heavy tailed fitness effect distribution, the current model indicates that the coefficients of variation for the fitness effect distributions of both types of mutations and the fitness effect distribution mean for the beneficial mutations are important predictors of the critical effective population size. The current model further predicts that very small populations can be sustained if the fitness effect variances for both types of mutations and the mean for beneficial mutations are large.


Subject(s)
Genetics, Population; Models, Genetic; Mutation; Population Density; Selection, Genetic; Animals; Genetic Drift; Mathematics
17.
J Comput Biol ; 14(1): 97-112, 2007.
Article in English | MEDLINE | ID: mdl-17381349

ABSTRACT

Earlier work rigorously derived a general probabilistic model for the PCR process that includes as a special case the Velikanov-Kapral model where all nucleotide reaction rates are the same. In this model, the probability of binding of deoxy-nucleoside triphosphate (dNTP) molecules with template strands is derived from the microscopic chemical kinetics. A recursive solution for the probability function of binding of dNTPs is developed for a single cycle and is used to calculate expected yield for a multicycle PCR. The model is able to reproduce important features of the PCR amplification process quantitatively. With a set of favorable reaction conditions, the amplification of the target sequence is fast enough to rapidly outnumber all side products. Furthermore, the final yield of the target sequence in a multicycle PCR run always approaches an asymptotic limit that is less than one. The amplification process itself is highly sensitive to initial concentrations and the reaction rates of addition to the template strand of each type of dNTP in the solution. This paper extends the earlier Saha model with a physics based model of the dependence of the reaction rates on temperature, and estimates parameters in this new model by nonlinear regression. The calibrated model is validated using RT-PCR data.


Subject(s)
Computer Simulation; Models, Statistical; Nucleic Acid Hybridization; Polymerase Chain Reaction/methods; Glyceraldehyde-3-Phosphate Dehydrogenase (Phosphorylating)/chemistry; Nucleotides/chemistry
18.
BMC Syst Biol ; 11(1): 30, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28241833

ABSTRACT

BACKGROUND: Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. RESULTS: Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50. CONCLUSIONS: By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
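
For readers unfamiliar with the metric, the AUC reported here equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The Python sketch below computes it by direct pairwise comparison. The function name `auc` and the toy labels and scores are assumptions, and this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive, 0 for negative. Tied scores count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy classifier scores for four examples.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to random ranking, which matches the randomized-model baseline quoted in the abstract.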


Subject(s)
Cell Cycle Proteins/metabolism; Cell Cycle; Models, Biological; Cell Cycle Proteins/genetics; Mutation; Phenotype; Saccharomycetales/cytology; Saccharomycetales/genetics; Saccharomycetales/metabolism
19.
Sci Rep ; 7(1): 14106, 2017 10 26.
Article in English | MEDLINE | ID: mdl-29074871

ABSTRACT

Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and thus exhaustive.
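
What "equivalent indels" means can be shown with a brute-force Python sketch: two indels are equivalent when applying each one to the same reference yields the same sequence. This is only the definition made executable. It is not the UPS-indel coordinate system, and the function names, the tuple layout `(position, deleted, inserted)`, and the toy reference are assumptions.

```python
def apply_indel(ref, pos, deleted="", inserted=""):
    """Apply one indel at 0-based position pos and return the new sequence."""
    if ref[pos:pos + len(deleted)] != deleted:
        raise ValueError("deleted bases do not match the reference")
    return ref[:pos] + inserted + ref[pos + len(deleted):]


def equivalent(ref, indel_a, indel_b):
    """True if both indels produce the same sequence from the same reference."""
    return apply_indel(ref, *indel_a) == apply_indel(ref, *indel_b)


reference = "GCACAT"
# Deleting "CA" at position 1 or "AC" at position 2 both give "GCAT".
same_deletion = equivalent(reference, (1, "CA", ""), (2, "AC", ""))
# Deleting "GC" at position 0 gives "ACAT", a different result.
other_deletion = equivalent(reference, (1, "CA", ""), (0, "GC", ""))
```

In a repeat region the same biological event can therefore be recorded at several positions, which is the source of the database redundancy the abstract describes.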

20.
Article in English | MEDLINE | ID: mdl-17048401

ABSTRACT

Converting a biochemical reaction network to a set of kinetic rate equations is tedious and error prone. We describe known interface paradigms for inputting models of intracellular regulatory networks: graphical layout (diagrams), wizards, scripting languages, and direct entry of chemical equations. We present the JigCell Model Builder, which allows users to define models as a set of reaction equations using a spreadsheet (an example of direct entry of equations) and outputs model definitions in the Systems Biology Markup Language, Level 2. We present the results of two usability studies. The spreadsheet paradigm demonstrated its effectiveness in reducing the number of errors made by modelers when compared to hand conversion of a wiring diagram to differential equations. A comparison of representatives of the four interface paradigms for a simple model of the cell cycle was conducted which measured time, mouse clicks, and keystrokes to enter the model, and the number of screens needed to view the contents of the model. All four paradigms had similar data entry times. The spreadsheet and scripting language approaches require significantly fewer screens to view the models than do the wizard or graphical layout approaches.
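
The conversion from a reaction list to rate equations that the abstract calls tedious can be sketched in a few lines of Python, assuming mass-action kinetics. The data layout, the function name `mass_action_rhs`, and the toy reaction are assumptions for illustration. This is not JigCell Model Builder code and it does not produce SBML.

```python
def mass_action_rhs(reactions, conc):
    """Right-hand side dx/dt of the rate equations under mass-action kinetics.

    reactions: list of (reactants, products, k), where reactants and
               products map species name -> stoichiometric coefficient.
    conc:      dict mapping species name -> current concentration.
    """
    dxdt = {species: 0.0 for species in conc}
    for reactants, products, k in reactions:
        # Reaction rate: k times the product of reactant concentrations,
        # each raised to its stoichiometric coefficient.
        rate = k
        for species, coeff in reactants.items():
            rate *= conc[species] ** coeff
        for species, coeff in reactants.items():
            dxdt[species] -= coeff * rate
        for species, coeff in products.items():
            dxdt[species] += coeff * rate
    return dxdt


# Toy network with a single reaction: A + B -> C with rate constant 2.
toy_reactions = [({"A": 1, "B": 1}, {"C": 1}, 2.0)]
toy_conc = {"A": 3.0, "B": 4.0, "C": 0.0}
derivatives = mass_action_rhs(toy_reactions, toy_conc)
```

Generating the equations from a table of reactions, rather than writing them by hand, removes the bookkeeping step where sign and coefficient errors usually creep in.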


Subject(s)
Metabolism/physiology; Models, Biological; Systems Biology/methods; User-Computer Interface; Algorithms; Computer Simulation; Kinetics