Results 1 - 20 of 71
1.
Nature ; 598(7879): 103-110, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain [1-3]. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions [4]. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.


Subject(s)
Epigenomics , Gene Expression Profiling , Motor Cortex/cytology , Neurons/classification , Single-Cell Analysis , Transcriptome , Animals , Atlases as Topic , Datasets as Topic , Epigenesis, Genetic , Female , Male , Mice , Motor Cortex/anatomy & histology , Neurons/cytology , Neurons/metabolism , Organ Specificity , Reproducibility of Results
2.
BMC Bioinformatics ; 25(1): 198, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789920

ABSTRACT

BACKGROUND: Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on existing clustering literature to develop tools specific to single-cell data. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist, in general, there is little assurance that any given set of parameters will represent an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. RESULTS: Here, we propose Dune, a new method for optimizing the trade-off between the resolution of the clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize their concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging for reducing the number of clusters, in terms of replicability of the resultant merged clusters as well as concordance with ground truth. Dune is available as an R package on Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/Dune.html. CONCLUSIONS: Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces the reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.
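
The merging idea in this abstract can be illustrated with a minimal Python sketch. It is not the Dune package (which is written in R) and makes no claim about its internals: the use of the adjusted Rand index (ARI) as the concordance measure, the greedy search, and the function names `adjusted_rand_index`, `mean_ari` and `greedy_merge` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two label lists of equal length."""
    n = len(a)
    pairs_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_ab - expected) / (maximum - expected)


def mean_ari(partitions):
    """Average ARI over all pairs of partitions (assumed concordance score)."""
    scores = [adjusted_rand_index(p, q) for p, q in combinations(partitions, 2)]
    return sum(scores) / len(scores)


def greedy_merge(partitions):
    """Repeatedly apply the single within-partition merge that most raises mean ARI."""
    partitions = [list(p) for p in partitions]
    current = mean_ari(partitions)
    while True:
        best = None
        for i, labels in enumerate(partitions):
            # Try merging every pair of clusters inside partition i.
            for keep, drop in combinations(sorted(set(labels)), 2):
                merged = [keep if x == drop else x for x in labels]
                trial = partitions[:i] + [merged] + partitions[i + 1:]
                score = mean_ari(trial)
                if score > current + 1e-12 and (best is None or score > best[0]):
                    best = (score, i, merged)
        if best is None:  # no merge improves concordance: stop
            return partitions, current
        current, index, merged = best
        partitions[index] = merged


# Hypothetical toy input: the second partition over-splits the first group.
merged, score = greedy_merge([list("aaabbb"), list("xxyzzz")])
```

In this toy input the only merge that raises concordance is joining clusters `x` and `y` in the second partition, after which the two partitions agree and the search stops.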


Subject(s)
RNA-Seq , Single-Cell Analysis , Software , Single-Cell Analysis/methods , RNA-Seq/methods , Cluster Analysis , Algorithms , Sequence Analysis, RNA/methods , Humans , Transcriptome/genetics , Reproducibility of Results , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis
3.
Biostatistics ; 24(4): 1085-1105, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-35861622

ABSTRACT

An endeavor central to precision medicine is predictive biomarker discovery; such biomarkers define patient subpopulations which stand to benefit most, or least, from a given treatment. The identification of these biomarkers is often the byproduct of the related but fundamentally different task of treatment rule estimation. Using treatment rule estimation methods to identify predictive biomarkers in clinical trials where the number of covariates exceeds the number of participants often results in high false discovery rates. The higher than expected number of false positives translates to wasted resources when conducting follow-up experiments for drug target identification and diagnostic assay development. Patient outcomes are in turn negatively affected. We propose a variable importance parameter for directly assessing the importance of potentially predictive biomarkers and develop a flexible nonparametric inference procedure for this estimand. We prove that our estimator is double robust and asymptotically linear under loose conditions in the data-generating process, permitting valid inference about the importance metric. The statistical guarantees of the method are verified in a thorough simulation study representative of randomized control trials with moderate and high-dimensional covariate vectors. Our procedure is then used to discover predictive biomarkers from among the tumor gene expression data of metastatic renal cell carcinoma patients enrolled in recently completed clinical trials. We find that our approach more readily discerns predictive from nonpredictive biomarkers than procedures whose primary purpose is treatment rule estimation. An open-source software implementation of the methodology, the uniCATE R package, is briefly introduced.


Subject(s)
Biomedical Research , Carcinoma, Renal Cell , Kidney Neoplasms , Humans , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Biomarkers , Computer Simulation
4.
Bioinformatics ; 38(Suppl 1): i36-i44, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35758804

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have widely been used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single-nucleotide polymorphisms to mobile genetic elements. This approach does not require a reference genome, making it easier to account for accessory genes. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects. RESULTS: Here, we overcome this issue by testing covariates built from closed connected subgraphs (CCSs) of the de Bruijn graph defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS both in terms of power and interpretability. However, a method naively testing all possible subgraphs would be powerless due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypothesis has successfully been used to address both problems in similar contexts. We leverage this concept to test all CCSs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. Our method integrates with existing visual tools to facilitate interpretation. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_ISMB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
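
To make the k-mer and de Bruijn graph vocabulary of this abstract concrete, here is a small Python sketch. It is an illustration only and not the authors' implementation: it ignores reverse complements and graph compaction, it does not enumerate closed connected subgraphs, and the rule that a subgraph covariate equals 1 when a genome contains any k-mer of the subgraph is an assumption. The names `kmers`, `de_bruijn_graph`, `subgraph_covariate` and the toy genomes are hypothetical.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return the set of k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def de_bruijn_graph(genomes, k):
    """Undirected graph linking k-mers that are adjacent (overlap by k - 1) in some genome."""
    graph = defaultdict(set)
    for sequence in genomes.values():
        for i in range(len(sequence) - k):
            left, right = sequence[i:i + k], sequence[i + 1:i + k + 1]
            graph[left].add(right)
            graph[right].add(left)
    return graph


def subgraph_covariate(genomes, nodes, k):
    """Assumed rule: 1 if a genome contains at least one k-mer of the subgraph, else 0."""
    return {name: int(bool(kmers(seq, k) & nodes)) for name, seq in genomes.items()}


# Hypothetical toy genomes: s1 and s2 carry two versions of the same locus.
genomes = {"s1": "ACGTAC", "s2": "ACCTAC", "s3": "GGGGGG"}
graph = de_bruijn_graph(genomes, k=3)

single = subgraph_covariate(genomes, {"GTA"}, k=3)          # one k-mer only
grouped = subgraph_covariate(genomes, {"GTA", "CTA"}, k=3)  # both versions together
```

Here the single k-mer `GTA` is present only in `s1`, while the grouped covariate is 1 for both `s1` and `s2`, which is the kind of dilution the abstract describes when versions of a gene are tested separately.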


Subject(s)
Genome-Wide Association Study , Software , Algorithms , Bacteria/genetics , Sequence Analysis, DNA/methods
5.
Bioinformatics ; 36(11): 3422-3430, 2020 Jun 01.
Article in English | MEDLINE | ID: mdl-32176249

ABSTRACT

MOTIVATION: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously. RESULTS: Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA, which extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets. AVAILABILITY AND IMPLEMENTATION: A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub. CONTACT: philippe_boileau@berkeley.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Principal Component Analysis
6.
Chem Res Toxicol ; 34(12): 2549-2557, 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-34788011

ABSTRACT

We previously developed an adductomics pipeline that employed nanoflow liquid chromatography and high-resolution tandem mass spectrometry (nLC-HR-MS/MS) plus informatics to perform an untargeted detection of modifications to Cys34 in the tryptic T3 peptide of human serum albumin (HSA) (21ALVLIAFAQYLQQC34PFEDHVK41). In order to detect these peptide modifications without targeting specific masses, the pipeline interrogates MS2 ions that are signatures of the T3 peptide. The pipeline had been pilot-tested with archived plasma from healthy human subjects, and several of the 43 Cys34 adducts were highly associated with the smoking status. In the current investigation, we adapted the pipeline to include modifications to the ε-amino group of Lys525 (a major glycation site in HSA) and thereby extend the coverage to products of Schiff bases that cannot be produced at Cys34. Because trypsin is generally unable to digest proteins at modified lysines, our pipeline detects miscleaved tryptic peptides with the sequence 525KQTALVELVK534. Adducts of both Lys525 and Cys34 are measured in a single nLC-HR-MS/MS run by increasing the mass range of precursor ions in MS1 scans and including both triply and doubly charged precursor ions for collision-induced dissociation fragmentation. For proof of principle, we applied the Cys34/Lys525 pipeline to archived plasma specimens from a subset of the same volunteer subjects used in the original investigation. Twelve modified Lys525 peptides were detected, including products of glycation (fructosyl-lysine plus advanced-glycated-end products), acetylation, and elimination of ammonia and water. Surprisingly, the carbamylated and glycated adducts were present at significantly lower levels in smoking subjects. By including a larger class of in vivo nucleophilic substitution reactions, the Cys34/Lys525 adductomics pipeline expands exposomic investigations of unknown human exposure to reactive electrophiles derived from both exogenous and endogenous sources.


Subject(s)
Cysteine/chemistry , Lysine/chemistry , Serum Albumin, Human/chemistry , Cysteine/blood , Healthy Volunteers , Humans , Lysine/blood , Male , Models, Molecular , Peptides/blood , Peptides/chemistry
7.
Nat Methods ; 14(6): 565-571, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28504683

ABSTRACT

Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/standards , RNA/genetics , Sequence Analysis, RNA/standards , Single-Cell Analysis/standards , Transcriptome/genetics , Data Interpretation, Statistical , High-Throughput Nucleotide Sequencing/methods , Reference Values
8.
BMC Bioinformatics ; 20(1): 334, 2019 Jun 14.
Article in English | MEDLINE | ID: mdl-31200644

ABSTRACT

BACKGROUND: Untargeted metabolomics datasets contain large proportions of uninformative features that can impede subsequent statistical analysis such as biomarker discovery and metabolic pathway analysis. Thus, there is a need for versatile and data-adaptive methods for filtering data prior to investigating the underlying biological phenomena. Here, we propose a data-adaptive pipeline for filtering metabolomics data that are generated by liquid chromatography-mass spectrometry (LC-MS) platforms. Our data-adaptive pipeline includes novel methods for filtering features based on blank samples, proportions of missing values, and estimated intra-class correlation coefficients. RESULTS: Using metabolomics datasets that were generated in our laboratory from samples of human blood, as well as two public LC-MS datasets, we compared our data-adaptive filtering method with traditional methods that rely on non-method specific thresholds. The data-adaptive approach outperformed traditional approaches in terms of removing noisy features and retaining high quality, biologically informative ones. The R code for running the data-adaptive filtering method is provided at https://github.com/courtneyschiffman/Metabolomics-Filtering. CONCLUSIONS: Our proposed data-adaptive filtering pipeline is intuitive and effectively removes uninformative features from untargeted metabolomics datasets. It is particularly relevant for interrogation of biological phenomena in data derived from complex matrices associated with biospecimens.


Subject(s)
Metabolomics/methods , Tandem Mass Spectrometry/methods , Chromatography, Liquid , Colorectal Neoplasms/metabolism , Databases as Topic , Humans , Metabolic Networks and Pathways
9.
PLoS Comput Biol ; 14(9): e1006378, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30180157

ABSTRACT

Clustering of genes and/or samples is a common task in gene expression analysis. The goals in clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This is an application that is particularly appropriate when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly creating large gene expression datasets with the goal of detecting previously unknown heterogeneity within cells. It is common in the detection of novel subtypes to run many clustering algorithms, as well as rely on subsampling and ensemble methods to improve robustness. We introduce a Bioconductor R package, clusterExperiment, that implements a general and flexible strategy we call Resampling-based Sequential Ensemble Clustering (RSEC). RSEC enables the user to easily create multiple, competing clusterings of the data based on different techniques and associated tuning parameters, including easy integration of resampling and sequential clustering, and then provides methods for consolidating the multiple clusterings into a final consensus clustering. The package is modular and allows the user to separately apply the individual components of the RSEC procedure, i.e., apply multiple clustering algorithms, create a consensus clustering or choose tuning parameters, and merge clusters. Additionally, clusterExperiment provides a variety of visualization tools for the clustering process, as well as methods for the identification of possible cluster signatures or biomarkers. The R package clusterExperiment is publicly available through the Bioconductor Project, with a detailed manual (vignette) and well-documented help pages for each function.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Hypothalamus/physiology , Olfactory Mucosa/physiology , Algorithms , Animals , Astrocytes/physiology , Biomarkers , Cluster Analysis , Databases, Factual , Humans , Microglia/physiology , Multigene Family , Neurons/physiology , Oligodendroglia/physiology , Programming Languages , Sequence Analysis, RNA , Software
10.
Anal Bioanal Chem ; 411(11): 2351-2362, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30783713

ABSTRACT

Metabolism of chemicals from the diet, exposures to xenobiotics, the microbiome, and lifestyle factors (e.g., smoking, alcohol intake) produce electrophiles that react with nucleophilic sites in circulating proteins, notably Cys34 of human serum albumin (HSA). To discover potential risk factors resulting from in utero exposures, we are investigating HSA-Cys34 adducts in archived newborn dried blood spots (DBS) that reflect systemic exposures during the last month of gestation. The workflow includes extraction of proteins from DBS, measurement of hemoglobin (Hb) to normalize for blood volume, addition of methanol to enrich HSA by precipitation of Hb and other interfering proteins, digestion with trypsin, and detection of HSA-Cys34 adducts via nanoflow liquid chromatography-high-resolution mass spectrometry. As proof-of-principle, we applied the method to 49 archived DBS collected from newborns whose mothers either actively smoked during pregnancy or were nonsmokers. Twenty-six HSA-Cys34 adducts were detected, including Cys34 oxidation products, mixed disulfides with low molecular weight thiols (e.g., cysteine, homocysteine, glutathione, cysteinylglycine), and other modifications. Data were normalized with a novel method ("scone") to remove unwanted technical variation arising from HSA digestion, blood volume, DBS age, mass spectrometry analysis, and batch effects. Using an ensemble of linear and nonlinear models, the Cys34 adduct of cyanide was found to consistently discriminate between newborns of smoking and nonsmoking mothers with a mean fold change (smoking/nonsmoking) of 1.31. These results indicate that DBS adductomics is suitable for investigating in utero exposures to reactive chemicals and metabolites that may influence disease risks later in life.


Subject(s)
Cysteine/analysis , Dried Blood Spot Testing/methods , Serum Albumin, Human/chemistry , Tandem Mass Spectrometry/methods , Chromatography, High Pressure Liquid/methods , Female , Humans , Infant, Newborn , Maternal Exposure/adverse effects , Oxidation-Reduction , Pregnancy , Prenatal Exposure Delayed Effects/blood , Smoking/adverse effects , Smoking/blood
11.
BMC Genomics ; 19(1): 477, 2018 Jun 19.
Article in English | MEDLINE | ID: mdl-29914354

ABSTRACT

BACKGROUND: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. RESULTS: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. CONCLUSIONS: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.


Subject(s)
Cell Lineage , Gene Expression Profiling/methods , Cluster Analysis , Humans , Myoblasts, Skeletal/metabolism , Single-Cell Analysis , Software
12.
BMC Cancer ; 18(1): 996, 2018 Oct 19.
Article in English | MEDLINE | ID: mdl-30340609

ABSTRACT

BACKGROUND: Epidemiologists are beginning to employ metabolomics and lipidomics with archived blood from incident cases and controls to discover causes of cancer. Although several such studies have focused on colorectal cancer (CRC), they all followed targeted or semi-targeted designs that limited their ability to find discriminating molecules and pathways related to the causes of CRC. METHODS: Using an untargeted design, we measured lipophilic metabolites in prediagnostic serum from 66 CRC patients and 66 matched controls from the European Prospective Investigation into Cancer and Nutrition (Turin, Italy). Samples were analyzed by liquid chromatography-high-resolution mass spectrometry (LC-MS), resulting in 8690 features for statistical analysis. RESULTS: Rather than the usual multiple-hypothesis-testing approach, we based variable selection on an ensemble of regression methods, which found nine features to be associated with case-control status. We then regressed each selected feature on time-to-diagnosis to determine whether the feature was likely to be either a potentially causal biomarker or a reactive product of disease progression (reverse causality). CONCLUSIONS: Of the nine selected LC-MS features, four appear to be involved in CRC etiology and merit further investigation in prospective studies of CRC. Four other features appear to be related to progression of the disease (reverse causality), and may represent biomarkers of value for early detection of CRC.


Subject(s)
Biomarkers, Tumor/blood , Colorectal Neoplasms/blood , Colorectal Neoplasms/diagnosis , Metabolomics/methods , Adult , Aged , Case-Control Studies , Cohort Studies , Colorectal Neoplasms/epidemiology , Europe/epidemiology , Female , Humans , Male , Middle Aged , Prospective Studies
13.
Nature ; 471(7339): 473-9, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21179090

ABSTRACT

Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.


Subject(s)
Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental/genetics , Transcription, Genetic/genetics , Alternative Splicing/genetics , Animals , Base Sequence , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Exons/genetics , Female , Genes, Insect/genetics , Genome, Insect/genetics , Male , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis , Protein Isoforms/genetics , RNA Editing/genetics , RNA, Messenger/analysis , RNA, Messenger/genetics , RNA, Small Untranslated/analysis , RNA, Small Untranslated/genetics , Sequence Analysis , Sex Characteristics
14.
Genome Res ; 21(2): 193-202, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20921232

ABSTRACT

Alternative splicing is generally controlled by proteins that bind directly to regulatory sequence elements and either activate or repress splicing of adjacent splice sites in a target pre-mRNA. Here, we have combined RNAi and mRNA-seq to identify exons that are regulated by Pasilla (PS), the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2. We identified 405 splicing events in 323 genes that are significantly affected upon depletion of ps, many of which were annotated as being constitutively spliced. The sequence regions upstream and within PS-repressed exons and downstream from PS-activated exons are enriched for YCAY repeats, and these are consistent with the location of these motifs near NOVA-regulated exons in mammals. Thus, the RNA regulatory map of PS and NOVA1/2 is highly conserved between insects and mammals despite the fact that the target gene orthologs regulated by PS and NOVA1/2 are almost entirely nonoverlapping. This observation suggests that the regulatory codes of individual RNA binding proteins may be nearly immutable, yet the regulatory modules controlled by these proteins are highly evolvable.
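
The YCAY motif mentioned in this abstract is easy to search for programmatically. The following Python sketch counts YCAY occurrences in a sequence, where Y is the IUPAC code for a pyrimidine (C or T, or U in RNA). It only counts motifs; it does not reproduce the enrichment analysis of the study, and the function name `count_ycay` is hypothetical.

```python
import re

# Y = pyrimidine (C, T, or U). The lookahead lets overlapping motifs be counted.
YCAY = re.compile(r"(?=([CTU]CA[CTU]))")


def count_ycay(sequence):
    """Count YCAY occurrences in a DNA or RNA string, including overlapping ones."""
    return len(YCAY.findall(sequence.upper()))


# Two motifs (TCAT at position 0, TCAC at position 3) sharing one base.
n_motifs = count_ycay("TCATCAC")
```

Dividing such counts by window length for regions upstream of, within, and downstream from exons would give a simple motif density to compare between regulated and unregulated exons; that comparison is left out here because the abstract does not specify how enrichment was tested.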


Subject(s)
Drosophila/genetics , Mammals/genetics , RNA, Messenger/metabolism , Alternative Splicing , Animals , Antigens, Neoplasm/genetics , Cells, Cultured , Computational Biology , Conserved Sequence/genetics , Drosophila Proteins/genetics , Exons , Gene Expression Profiling , Introns , Molecular Sequence Data , Nerve Tissue Proteins/genetics , Neuro-Oncological Ventral Antigen , RNA Interference , RNA Precursors/genetics , RNA Precursors/metabolism , RNA, Messenger/genetics , RNA-Binding Proteins/genetics , Ribonucleoproteins/genetics
15.
Nat Commun ; 15(1): 833, 2024 Jan 27.
Article in English | MEDLINE | ID: mdl-38280860

ABSTRACT

In single-cell RNA sequencing (scRNA-Seq), gene expression is assessed individually for each cell, allowing the investigation of developmental processes, such as embryogenesis and cellular differentiation and regeneration, at unprecedented resolution. In such dynamic biological systems, cellular states form a continuum, e.g., for the differentiation of stem cells into mature cell types. This process is often represented via a trajectory in a reduced-dimensional representation of the scRNA-Seq dataset. While many methods have been suggested for trajectory inference, it is often unclear how to handle multiple biological groups or conditions, e.g., inferring and comparing the differentiation trajectories of wild-type and knock-out stem cell populations. In this manuscript, we present condiments, a method for the inference and downstream interpretation of cell trajectories across multiple conditions. Our framework allows the interpretation of differences between conditions at the trajectory, cell population, and gene expression levels. We start by integrating datasets from multiple conditions into a single trajectory. By comparing the cell's conditions along the trajectory's path, we can detect large-scale changes, indicative of differential progression or fate selection. We also demonstrate how to detect subtler changes by finding genes that exhibit different behaviors between these conditions along a differentiation path.


Subject(s)
Single-Cell Analysis , Stem Cells , Single-Cell Analysis/methods , Cell Differentiation/genetics , Embryonic Development , Sequence Analysis, RNA/methods , Condiments , Gene Expression Profiling/methods
16.
Stat Appl Genet Mol Biol ; 11(2), 2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22499689

ABSTRACT

We provide a brief editorial introduction to a special issue of Statistical Applications in Genetics and Molecular Biology dedicated to the workshop on "Computational Statistical Methods for Genomics and Systems Biology", held at the Centre de recherches mathématiques in Montreal in April 2011.


Subject(s)
Genomics/methods , Systems Biology/methods , Humans
17.
Proc Natl Acad Sci U S A ; 107(11): 5058-63, 2010 Mar 16.
Article in English | MEDLINE | ID: mdl-20194736

ABSTRACT

The search to understand how genomes innovate in response to selection dominates the field of evolutionary biology. Powerful molecular evolution approaches have been developed to test individual loci for signatures of selection. In many cases, however, an organism's response to changes in selective pressure may be mediated by multiple genes, whose products function together in a cellular process or pathway. Here we assess the prevalence of polygenic evolution in pathways in the yeasts Saccharomyces cerevisiae and S. bayanus. We first established short-read sequencing methods to detect cis-regulatory variation in a diploid hybrid between the species. We then tested for the scenario in which selective pressure in one species to increase or decrease the activity of a pathway has driven the accumulation of cis-regulatory variants that act in the same direction on gene expression. Application of this test revealed a variety of yeast pathways with evidence for directional regulatory evolution. In parallel, we also used population genomic sequencing data to compare protein and cis-regulatory variation within and between species. We identified pathways with evidence for divergence within S. cerevisiae, and we detected signatures of positive selection between S. cerevisiae and S. bayanus. Our results point to polygenic, pathway-level change as a common evolutionary mechanism among yeasts. We suggest that pathway analyses, including our test for directional regulatory evolution, will prove to be a relevant and powerful strategy in many evolutionary genomic applications.


Subject(s)
Biological Evolution , Metabolic Networks and Pathways/genetics , Multifactorial Inheritance/genetics , Saccharomyces/genetics , Alleles , Base Sequence , Exosomes/metabolism , Gene Expression Regulation, Fungal , Genetic Variation , Hybridization, Genetic , RNA, Fungal/genetics , Regulatory Sequences, Nucleic Acid/genetics , Selection, Genetic , Species Specificity
18.
J Comput Graph Stat ; 32(2): 601-612, 2023.
Article in English | MEDLINE | ID: mdl-37273839

ABSTRACT

The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience. Thus, a variety of estimators have been derived to overcome the shortcomings of the canonical estimator in such settings. Yet, selecting an optimal estimator from among the plethora available remains an open challenge. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. We propose a general class of loss functions for covariance matrix estimation and establish accompanying finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validation selector. In numerical experiments, we demonstrate the optimality of our proposed selector in moderate sample sizes and across diverse data-generating processes. The practical benefits of our procedure are highlighted in a dimension reduction application to single-cell transcriptome sequencing data.

19.
Cancers (Basel) ; 15(4), 2023 Feb 05.
Article in English | MEDLINE | ID: mdl-36831356

ABSTRACT

Leukemia is the most common cancer in children in industrialized countries, and its initiation often occurs prenatally. Folic acid is a key vitamin in the production and modification of DNA, and prenatal folic acid intake is known to reduce the risk of childhood leukemia. We characterized the one-carbon (folate) metabolism nutrients that may influence risk of childhood acute lymphoblastic leukemia (ALL) among 122 cases diagnosed at age 0-14 years during 1988-2011 and 122 controls matched on sex, age, and race/ethnicity. Using hydrophilic interaction chromatography (HILIC) applied to neonatal dried blood spots, we evaluated 11 folate pathway metabolites, overall and by sex, race/ethnicity, and age at diagnosis. To conduct the prediction analyses, the 244 samples were separated into learning (75%) and test (25%) sets, maintaining the matched pairings. The learning set was used to train classification methods, which were evaluated on the test set. High classification error rates indicate that the folate pathway metabolites measured have little predictive capacity for pediatric ALL. In conclusion, the one-carbon metabolism nutrients measured at birth were unable to predict subsequent leukemia in children. These negative findings reflect the last weeks of pregnancy; our study does not address the impact of these nutrients at the time of conception or during the first trimester of pregnancy, periods that are critical for the embryo's DNA methylation programming.
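
The learning/test split described in this abstract keeps each case together with its matched control. A minimal sketch of such a pair-preserving split follows; the function name, the `pair_ids` input format, and the seed are assumptions for illustration and are not taken from the study.

```python
import random


def split_matched_pairs(pair_ids, test_fraction=0.25, seed=0):
    """Split sample indices into learning and test sets without breaking pairs.

    pair_ids[i] is the (hypothetical) matched-pair label of sample i, so a
    case and its control share one label and always land in the same set.
    """
    unique_pairs = sorted(set(pair_ids))
    random.Random(seed).shuffle(unique_pairs)
    n_test = round(len(unique_pairs) * test_fraction)
    test_pairs = set(unique_pairs[:n_test])
    learn = [i for i, p in enumerate(pair_ids) if p not in test_pairs]
    test = [i for i, p in enumerate(pair_ids) if p in test_pairs]
    return learn, test


# 122 matched pairs, two samples each (244 samples in total).
pair_ids = [pair for pair in range(122) for _ in range(2)]
learn, test = split_matched_pairs(pair_ids)
```

Sampling whole pairs rather than individual samples means the test set stays balanced between cases and controls, and no test-set control has its matched case in the learning set.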

20.
Cancer Epidemiol Biomarkers Prev ; 32(9): 1217-1226, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37409972

ABSTRACT

BACKGROUND: The higher incidence of non-Hodgkin lymphoma (NHL) in males is not well understood. Although reactive oxygen species (ROS) have been implicated as causes of NHL, they cannot be measured directly in archived blood. METHODS: We performed untargeted adductomics of stable ROS adducts in human serum albumin (HSA) from 67 incident NHL cases and 82 matched controls from the European Prospective Investigation into Cancer and Nutrition-Italy cohort. Regression and classification methods were employed to select features associated with NHL in all subjects and in males and females separately. RESULTS: Sixty-seven HSA-adduct features were quantified by liquid chromatography-high-resolution mass spectrometry at Cys34 (n = 55) and Lys525 (n = 12). Three features were selected for association with NHL in all subjects, while seven were selected for males and five for females, with minimal overlap. Two selected features were more abundant in cases and seven in controls, suggesting that altered homeostasis of ROS may affect NHL incidence. Heat maps revealed differential clustering of features between sexes, suggesting differences in operative pathways. CONCLUSIONS: Adduct clusters dominated by Cys34 oxidation products and disulfides further implicate ROS and redox biology in the etiology of NHL. Sex differences in dietary and alcohol consumption also help to explain the limited overlap of feature selection between sexes. Intriguingly, a disulfide of methanethiol from enteric microbial metabolism was more abundant in male cases, thereby implicating microbial translocation as a potential contributor to NHL in males. IMPACT: Only two of the ROS adducts associated with NHL overlapped between sexes, and one adduct implicates microbial translocation as a risk factor.


Subject(s)
Lymphoma, Non-Hodgkin , Serum Albumin, Human , Humans , Male , Female , Serum Albumin, Human/chemistry , Serum Albumin, Human/metabolism , Reactive Oxygen Species , Sex Characteristics , Incidence , Prospective Studies , Cysteine/analysis , Cysteine/chemistry , Cysteine/metabolism , Lymphoma, Non-Hodgkin/epidemiology