1.
PLoS Comput Biol ; 18(3): e1009273, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35255084

ABSTRACT

The understanding of bacterial gene function has been greatly enhanced by recent advancements in the deep sequencing of microbial genomes. Transposon insertion sequencing methods combine next-generation sequencing techniques with transposon mutagenesis to explore the essentiality of genes under different environmental conditions. We propose a model-based method that uses regularized negative binomial regression to estimate the change in transposon insertions attributable to gene-environment changes in this genetic interaction study, without transformations or uniform normalization. An empirical Bayes model for estimating the local false discovery rate combines unique and total count information to test for genes that show a statistically significant change in transposon counts. When applied to RB-TnSeq (randomized barcode transposon sequencing) and Tn-seq (transposon sequencing) libraries made in strains of Caulobacter crescentus, using both total and unique count data, the model identified a set of conditionally beneficial or conditionally detrimental genes for each target condition, shedding light on their functions and roles under various stress conditions.


Subject(s)
DNA Transposable Elements, Essential Genes, Bayes Theorem, DNA Transposable Elements/genetics, Essential Genes/genetics, High-Throughput Nucleotide Sequencing/methods, Insertional Mutagenesis
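The abstract above names negative binomial regression as the core model but gives no formulas. As a minimal sketch of the likelihood such a regression would maximize, assuming a log link, a single binary condition covariate, and a fixed dispersion parameter, the following may help. The function names, the toy counts, and the parameter values are hypothetical and are not taken from the paper; the regularization and the empirical Bayes step are not shown.

```python
import math


def nb_logpmf(y: int, mu: float, size: float) -> float:
    """Log-probability of count y under a negative binomial distribution
    with mean `mu` and dispersion `size` (variance = mu + mu**2 / size)."""
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def nb_mean(intercept: float, beta: float, x: float) -> float:
    """Log-link mean: mu = exp(intercept + beta * x)."""
    return math.exp(intercept + beta * x)


def log_likelihood(counts, conditions, intercept, beta, size):
    """Sum of negative binomial log-probabilities over all observations."""
    return sum(
        nb_logpmf(y, nb_mean(intercept, beta, x), size)
        for y, x in zip(counts, conditions)
    )


# Hypothetical insertion counts for one gene: three control replicates
# (x = 0) and three treated replicates (x = 1).
counts = [10, 12, 9, 40, 38, 45]
conditions = [0, 0, 0, 1, 1, 1]

ll_null = log_likelihood(counts, conditions, math.log(10.0), 0.0, 10.0)
ll_effect = log_likelihood(counts, conditions, math.log(10.0), math.log(4.0), 10.0)
print(ll_effect > ll_null)
```

In this parameterization, `beta` is the log fold change in expected insertion count between the two conditions, which is the kind of quantity a gene-by-environment analysis would test.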
2.
BMC Bioinformatics ; 21(1): 215, 2020 May 26.
Article in English | MEDLINE | ID: mdl-32456609

ABSTRACT

BACKGROUND: It has recently become possible to collect next-generation DNA sequencing data sets composed of multiple samples from multiple biological units, where each sample may come from a single cell or from bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples so that developers of analysis methods can assess accuracy and precision. RESULTS: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples, where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing variant callers or other genomic tools. CONCLUSIONS: The DNA sequencing data generated by our simulator are representative of real data and integrate seamlessly with standard downstream analysis tools.


Subject(s)
High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Software, Humans
3.
bioRxiv ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38464212

ABSTRACT

Every protein progresses through a natural lifecycle from birth to maturation to death; this process is coordinated by the protein homeostasis system. Environmental or physiological conditions trigger pathways that maintain the homeostasis of the proteome. An open question is how these pathways are modulated to respond to the many stresses that an organism encounters during its lifetime. To address this question, we tested how the fitness landscape changes in response to environmental and genetic perturbations using directed and massively parallel transposon mutagenesis in Caulobacter crescentus. We developed a general computational pipeline for the analysis of gene-by-environment interactions in transposon mutagenesis experiments. This pipeline uses a combination of general linear models (GLMs), statistical knockoffs, and a nonparametric Bayesian statistical model to identify essential genetic network components that are shared across environmental perturbations. This analysis allows us to quantify the similarity of proteotoxic environmental perturbations from the perspective of the fitness landscape. We find that essential genes vary more by genetic background than by environmental conditions, with limited overlap among mutant strains targeting different facets of the protein homeostasis system. We also identified 146 unique fitness determinants across different strains, with 19 genes common to at least two strains, showing varying resilience to proteotoxic stresses. Experiments exposing cells to a combination of genetic perturbations and dual environmental stressors show that perturbations that are quantitatively dissimilar from the perspective of the fitness landscape are likely to have a synergistic effect on the growth defect.

4.
Ann Appl Stat ; 15(2): 925-951, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34262633

ABSTRACT

There are distinguishing features or "hallmarks" of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity, as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important for the personalized design of targeted therapeutic combinations and for monitoring possible recurrence after treatment. We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement, and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy, and we use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulated data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analysis of a real acute lymphoblastic leukemia sequencing dataset shows that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.
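The abstract does not spell out the Gamma-Poisson representation. One standard finite-dimensional fact that constructions of this kind typically rely on is that independent Gamma variables, once normalized, follow a Dirichlet distribution. The sketch below illustrates only that fact; it is not the paper's sampler, and the function name and concentration values are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw Dirichlet(alphas) weights by normalizing independent
    Gamma(alpha_k, 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
alphas = [1.0, 2.0, 3.0]  # hypothetical concentration parameters

weights = sample_dirichlet(alphas, rng)
print(sum(weights))  # the weights form a probability vector

# The average of many draws approaches alphas / sum(alphas).
n_draws = 20000
first_component = sum(sample_dirichlet(alphas, rng)[0] for _ in range(n_draws)) / n_draws
print(first_component)
```

Working with the unnormalized Gamma variables rather than the normalized weights is what generally makes conjugate updates against Poisson counts possible in samplers of this family.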

5.
J Appl Physiol (1985) ; 119(4): 396-403, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-26112238

ABSTRACT

This investigation developed models to estimate aspects of physical activity and sedentary behavior from three-axis high-frequency wrist-worn accelerometer data. The models were developed and tested on 20 participants (n = 10 males, n = 10 females, mean age = 24.1, mean body mass index = 23.9), who wore an ActiGraph GT3X+ accelerometer on their dominant wrist and an ActiGraph GT3X on the hip while performing a variety of scripted activities. Energy expenditure was concurrently measured by a portable indirect calorimetry system. Those calibration data were then used to develop and assess both machine-learning models and simpler models with fewer unknown parameters (linear regression and decision trees) to estimate metabolic equivalent scores (METs) and to classify activity intensity, sedentary time, and locomotion time. The wrist models, applied to 15-s windows, estimated METs [random forest: root mean squared error (rMSE) = 1.21 METs, hip: rMSE = 1.67 METs] and activity intensity (random forest: 75% correct, hip: 60% correct) better than a previously developed model that used counts per minute measured at the hip. In a separate set of comparisons, the simpler decision trees classified activity intensity (random forest: 75% correct, decision tree: 74% correct), sedentary time (random forest: 96% correct, decision tree: 97% correct), and locomotion time (random forest: 99% correct, decision tree: 96% correct) nearly as well as or better than the machine-learning approaches. Preliminary investigation of the models' performance on two free-living people suggests that they may work well outside of controlled conditions.


Subject(s)
Actigraphy/instrumentation, Health Behavior, Statistical Models, Motor Activity, Sedentary Behavior, Computer-Assisted Signal Processing, Wrist/physiology, Activities of Daily Living, Adult, Biomechanical Phenomena, Body Mass Index, Indirect Calorimetry, Decision Trees, Energy Metabolism, Equipment Failure, Female, Humans, Linear Models, Machine Learning, Male, Reproducibility of Results, Time Factors, Young Adult
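The accelerometer abstract above reports two kinds of score: root mean squared error for MET estimates and percent correct for classification. A minimal sketch of how these two metrics are usually computed follows. The function names and the example values are hypothetical and are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def percent_correct(predicted_labels, true_labels):
    """Percentage of windows whose predicted class matches the true class."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return 100.0 * matches / len(true_labels)


# Hypothetical per-window MET estimates against calorimetry measurements.
met_error = rmse([1.2, 3.5, 6.0], [1.0, 4.0, 5.5])

# Hypothetical intensity classes for four windows.
accuracy = percent_correct(
    ["sedentary", "light", "moderate", "vigorous"],
    ["sedentary", "light", "moderate", "moderate"],
)
print(met_error, accuracy)
```

Both metrics are computed per window, so the 15-s window length mentioned in the abstract determines how many observations enter each average.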