Search | BVS CLAP/SMR-OPS/OMS

Improving replicability in single-cell RNA-Seq cell type discovery with Dune.

Roux de Bézieux, Hector; Street, Kelly; Fischer, Stephan; Van den Berge, Koen; Chance, Rebecca; Risso, Davide; Gillis, Jesse; Ngai, John; Purdom, Elizabeth; Dudoit, Sandrine.

BMC Bioinformatics ; 25(1): 198, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38789920

ABSTRACT

BACKGROUND: Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on existing clustering literature to develop tools specific to single-cell data. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist, in general, there is little assurance that any given set of parameters will represent an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. RESULTS: Here, we propose Dune, a new method for optimizing the trade-off between the resolution of the clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize the concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging to reduce the number of clusters, both in the replicability of the resulting merged clusters and in concordance with ground truth. Dune is available as an R package on Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/Dune.html. CONCLUSIONS: Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces the reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.
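The merging idea described in this abstract can be illustrated with a small sketch. It is not the Dune package's code: the abstract does not say how concordance is measured, so the Adjusted Rand Index (ARI) is used here as an assumed concordance measure, and the function names (`adjusted_rand_index`, `mean_ari`, `merge_once`, `merge_until_stable`) are hypothetical. The sketch greedily applies whichever within-partition merge most improves the average pairwise ARI and stops when no merge helps.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two label lists over the same cells (n >= 2)."""
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_ari(partitions):
    """Average ARI over all pairs of partitions (assumed concordance measure)."""
    scores = [adjusted_rand_index(p, q) for p, q in combinations(partitions, 2)]
    return sum(scores) / len(scores)


def merge_once(partitions):
    """Try every pair of clusters within every partition and apply the single
    merge that most increases the mean ARI. Returns (partitions, improved)."""
    best_score, best = mean_ari(partitions), None
    for i, labels in enumerate(partitions):
        for c1, c2 in combinations(sorted(set(labels)), 2):
            merged = [c1 if x == c2 else x for x in labels]
            trial = partitions[:i] + [merged] + partitions[i + 1:]
            score = mean_ari(trial)
            if score > best_score:
                best_score, best = score, trial
    return (best, True) if best is not None else (partitions, False)


def merge_until_stable(partitions):
    """Repeat greedy merging until no merge improves the mean ARI."""
    improved = True
    while improved:
        partitions, improved = merge_once(partitions)
    return partitions


# Two partitions of six cells; the second one over-splits the last cluster.
refined = merge_until_stable([[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2]])
```

In this toy input, the only merge that raises concordance is joining clusters 1 and 2 of the second partition, after which the two partitions agree and merging stops.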

Subject(s)

RNA-Seq; Single-Cell Analysis; Software; Single-Cell Analysis/methods; RNA-Seq/methods; Cluster Analysis; Algorithms; Sequence Analysis, RNA/methods; Humans; Transcriptome/genetics; Reproducibility of Results; Gene Expression Profiling/methods; Single-Cell Gene Expression Analysis

CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS.

Roux de Bézieux, Hector; Lima, Leandro; Perraudeau, Fanny; Mary, Arnaud; Dudoit, Sandrine; Jacob, Laurent.

Bioinformatics ; 38(Suppl 1): i36-i44, 2022 Jun 24.

Article in English | MEDLINE | ID: mdl-35758804

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have been widely used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single-nucleotide polymorphisms to mobile genetic elements. This approach does not require a reference genome, making it easier to account for accessory genes. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects. RESULTS: Here, we overcome this issue by testing covariates built from closed connected subgraphs (CCSs) of the de Bruijn graph defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS both in terms of power and interpretability. However, a method naively testing all possible subgraphs would be powerless due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all CCSs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. Our method integrates with existing visual tools to facilitate interpretation. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_ISMB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
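To make the "diluted effects" point concrete, here is a minimal sketch of a de Bruijn graph over k-mers and of a covariate defined on a group of nodes. It is not CALDERA's implementation and does not enumerate closed connected subgraphs. The toy genomes, the function names, and the rule that a subgraph counts as present when a genome contains at least one of its k-mers are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """All length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(genomes, k):
    """Node-centric de Bruijn graph: nodes are k-mers seen in any genome; a k-mer
    points to every k-mer whose (k-1)-prefix equals its own (k-1)-suffix."""
    nodes = {km for g in genomes.values() for km in kmers(g, k)}
    by_prefix = defaultdict(set)
    for km in nodes:
        by_prefix[km[:-1]].add(km)
    edges = {km: set(by_prefix.get(km[1:], set())) for km in nodes}
    return nodes, edges


def presence(genomes, k):
    """Map each k-mer to the set of genome names that contain it."""
    table = defaultdict(set)
    for name, g in genomes.items():
        for km in kmers(g, k):
            table[km].add(name)
    return table


def subgraph_covariate(subgraph, table, names):
    """Assumed rule: 1 if the genome holds at least one k-mer of the subgraph."""
    return {n: int(any(n in table.get(km, set()) for km in subgraph)) for n in names}


# Hypothetical strains: s1 and s2 carry two variants of the same short region.
genomes = {"s1": "ACGTAC", "s2": "ACGAAC", "s3": "TTTTTT"}
nodes, edges = de_bruijn_graph(genomes, 3)
table = presence(genomes, 3)
single = subgraph_covariate({"CGT"}, table, genomes)          # only s1 carries CGT
grouped = subgraph_covariate({"CGT", "CGA"}, table, genomes)  # s1 and s2 both match
```

Testing the single k-mer `CGT` separates s1 from s2 even though both carry a version of the same region, while the grouped covariate treats the two variants as one entity.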

Subject(s)

Genome-Wide Association Study; Software; Algorithms; Bacteria/genetics; Sequence Analysis, DNA/methods

Trajectory inference across multiple conditions with condiments.

Roux de Bézieux, Hector; Van den Berge, Koen; Street, Kelly; Dudoit, Sandrine.

Nat Commun ; 15(1): 833, 2024 Jan 27.

Article in English | MEDLINE | ID: mdl-38280860

ABSTRACT

In single-cell RNA sequencing (scRNA-Seq), gene expression is assessed individually for each cell, allowing the investigation of developmental processes, such as embryogenesis, cellular differentiation, and regeneration, at unprecedented resolution. In such dynamic biological systems, cellular states form a continuum, e.g., for the differentiation of stem cells into mature cell types. This process is often represented via a trajectory in a reduced-dimensional representation of the scRNA-Seq dataset. While many methods have been suggested for trajectory inference, it is often unclear how to handle multiple biological groups or conditions, e.g., inferring and comparing the differentiation trajectories of wild-type and knock-out stem cell populations. In this manuscript, we present condiments, a method for the inference and downstream interpretation of cell trajectories across multiple conditions. Our framework allows the interpretation of differences between conditions at the trajectory, cell population, and gene expression levels. We start by integrating datasets from multiple conditions into a single trajectory. By comparing the cells' conditions along the trajectory's path, we can detect large-scale changes, indicative of differential progression or fate selection. We also demonstrate how to detect subtler changes by finding genes that exhibit different behaviors between these conditions along a differentiation path.
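One simple way to picture "comparing the cells' conditions along the trajectory's path" is to compare how cells from each condition are distributed along pseudotime. The sketch below computes a two-sample Kolmogorov-Smirnov statistic for that purpose. The abstract does not name the test condiments uses, so the choice of statistic is an assumption, and the pseudotime values and the name `ks_statistic` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical cumulative distribution functions of x and y."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # share of x values <= t
        fy = bisect_right(ys, t) / len(ys)  # share of y values <= t
        gap = max(gap, abs(fx - fy))
    return gap


# Hypothetical pseudotimes along one lineage for two conditions.
wild_type = [0.1, 0.2, 0.3, 0.4]
knock_out = [0.6, 0.7, 0.8, 0.9]
shift = ks_statistic(wild_type, knock_out)     # conditions fully separated
no_shift = ks_statistic(wild_type, wild_type)  # identical distributions
```

A statistic near 1 means the two conditions occupy different stretches of the trajectory, which is the kind of large-scale change the abstract calls differential progression. A value near 0 means cells from both conditions progress similarly.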

Subject(s)

Single-Cell Analysis; Stem Cells; Single-Cell Analysis/methods; Cell Differentiation/genetics; Embryonic Development; Sequence Analysis, RNA/methods; Condiments; Gene Expression Profiling/methods

Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects.

Van den Berge, Koen; Chou, Hsin-Jung; Roux de Bézieux, Hector; Street, Kelly; Risso, Davide; Ngai, John; Dudoit, Sandrine.

Cell Rep Methods ; 2(11): 100321, 2022 Nov 21.

Article in English | MEDLINE | ID: mdl-36452861

ABSTRACT

The assay for transposase-accessible chromatin using sequencing (ATAC-seq) allows the study of epigenetic regulation of gene expression by assessing chromatin configuration for an entire genome. Despite its popularity, there have been limited studies investigating the analytical challenges related to ATAC-seq data, with most studies leveraging tools developed for bulk transcriptome sequencing. Here, we show that GC-content effects are omnipresent in ATAC-seq datasets. Since the GC-content effects are sample specific, they can bias downstream analyses such as clustering and differential accessibility analysis. We introduce a normalization method based on smooth-quantile normalization within GC-content bins and evaluate it together with 11 different normalization procedures on 8 public ATAC-seq datasets. Accounting for GC-content effects in the normalization is crucial for common downstream ATAC-seq data analyses, improving accuracy and interpretability. Through case studies, we show that exploratory data analysis is essential to guide the choice of an appropriate normalization method for a given dataset.
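The idea of normalizing within GC-content bins can be sketched as follows. The abstract describes smooth-quantile normalization. This sketch swaps in plain full-quantile normalization inside each bin because it is simpler to show, so it is not the authors' method. The function names, the equal-size binning, and the toy counts are assumptions.

```python
def quantile_normalize(columns):
    """Full-quantile normalization: every column receives the same distribution,
    namely the rank-wise mean of the sorted columns."""
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    reference = [sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)]
    out = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from low to high
        normed = [0.0] * n
        for rank, i in enumerate(order):
            normed[i] = reference[rank]
        out.append(normed)
    return out


def gc_binned_quantile_normalize(counts, gc, n_bins):
    """counts: one list per sample with one entry per peak; gc: GC fraction per
    peak. Peaks are split into equal-size GC bins and normalized within each."""
    n_peaks = len(gc)
    by_gc = sorted(range(n_peaks), key=lambda i: gc[i])
    size = -(-n_peaks // n_bins)  # ceiling division
    result = [[0.0] * n_peaks for _ in counts]
    for start in range(0, n_peaks, size):
        idx = by_gc[start:start + size]
        normed = quantile_normalize([[s[i] for i in idx] for s in counts])
        for s_out, s_norm in zip(result, normed):
            for i, v in zip(idx, s_norm):
                s_out[i] = v
    return result


# Two hypothetical samples, four peaks; sample 2 has three times the signal.
counts = [[1, 2, 10, 20], [3, 6, 30, 60]]
gc = [0.30, 0.35, 0.60, 0.65]
normalized = gc_binned_quantile_normalize(counts, gc, n_bins=2)
```

Because each bin is normalized separately, a sample-specific distortion that affects only high-GC peaks is corrected within the high-GC bin without altering the low-GC peaks.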

Subject(s)

Benchmarking; Chromatin Immunoprecipitation Sequencing; Epigenesis, Genetic; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing

Medical Food Assessment Using a Smartphone App With Continuous Glucose Monitoring Sensors: Proof-of-Concept Study.

Roux de Bézieux, Hector; Bullard, James; Kolterman, Orville; Souza, Michael; Perraudeau, Fanny.

JMIR Form Res ; 5(3): e20175, 2021 Mar 04.

Article in English | MEDLINE | ID: mdl-33661120

ABSTRACT

BACKGROUND: Novel wearable biosensors, ubiquitous smartphone ownership, and telemedicine are converging to enable new paradigms of clinical research. A new generation of continuous glucose monitoring (CGM) devices provides access to clinical-grade measurement of interstitial glucose levels. Adoption of these sensors has become widespread for the management of type 1 diabetes and is accelerating in type 2 diabetes. In parallel, individuals are adopting health-related smartphone-based apps to monitor and manage care. OBJECTIVE: We conducted a proof-of-concept study to investigate the potential of collecting robust, annotated, real-time clinical study measures of glucose levels without clinic visits. METHODS: Self-administered meal-tolerance tests were conducted to assess the impact of a proprietary synbiotic medical food on glucose control in a 6-week, double-blind, placebo-controlled, 2×2 cross-over pilot study (n=6). The primary endpoint was incremental glucose measured using Abbott Freestyle Libre CGM devices associated with a smartphone app that provided a visual diet log. RESULTS: All subjects completed the study and mastered CGM device usage. Over 40 days, 3000 data points on average per subject were collected across three sensors. No adverse events were recorded, and subjects reported general satisfaction with sensor management, the study product, and the smartphone app, with an average self-reported satisfaction score of 8.25/10. Despite a lack of sufficient power to achieve statistical significance, we demonstrated that we can detect meaningful changes in the postprandial glucose response in real-world settings, pointing to the merits of larger studies in the future. CONCLUSIONS: We have shown that CGM devices can provide a comprehensive picture of glucose control without clinic visits. CGM device usage in conjunction with our custom smartphone app can lower the participation burden for subjects while reducing study costs, and allows for robust integration of multiple valuable data types with glucose levels remotely. TRIAL REGISTRATION: ClinicalTrials.gov NCT04424888; http://clinicaltrials.gov/ct2/show/NCT04424888.

Klf5 establishes bi-potential cell fate by dual regulation of ICM and TE specification genes.

Kinisu, Martin; Choi, Yong Jin; Cattoglio, Claudia; Liu, Ke; Roux de Bézieux, Hector; Valbuena, Raeline; Pum, Nicole; Dudoit, Sandrine; Huang, Haiyan; Xuan, Zhenyu; Kim, Sang Yong; He, Lin.

Cell Rep ; 37(6): 109982, 2021 Nov 09.

Article in English | MEDLINE | ID: mdl-34758315

ABSTRACT

Early blastomeres of mouse preimplantation embryos exhibit bi-potential cell fate, capable of generating both embryonic and extra-embryonic lineages in blastocysts. Here we identify three major two-cell-stage (2C)-specific endogenous retroviruses (ERVs) as the molecular hallmark of this bi-potential plasticity. Using the long terminal repeats (LTRs) of all three 2C-specific ERVs, we identify Krüppel-like factor 5 (Klf5) as their major upstream regulator. Klf5 is essential for bi-potential cell fate; a single Klf5-overexpressing embryonic stem cell (ESC) generates terminally differentiated embryonic and extra-embryonic lineages in chimeric embryos, and Klf5 directly induces inner cell mass (ICM) and trophectoderm (TE) specification genes. Intriguingly, Klf5 and Klf4 act redundantly during ICM specification, whereas Klf5 deficiency alone impairs TE specification. Klf5 is regulated by multiple 2C-specific transcription factors, particularly Dux, and the Dux/Klf5 axis is evolutionarily conserved. The 2C-specific transcription program converges on Klf5 to establish bi-potential cell fate, enabling a cell state with dual activation of ICM and TE genes.

Subject(s)

Blastocyst Inner Cell Mass/cytology; Blastocyst; Cell Lineage; Embryonic Stem Cells/cytology; Gene Expression Regulation, Developmental; Kruppel-Like Transcription Factors/metabolism; Trophoblasts/cytology; Animals; Blastocyst Inner Cell Mass/metabolism; Cell Differentiation; Embryonic Stem Cells/metabolism; Female; Kruppel-Like Transcription Factors/genetics; Male; Mice; Mice, Inbred C3H; Mice, Inbred C57BL; RNA-Seq; Transcription Factors/genetics; Transcription Factors/metabolism; Trophoblasts/metabolism

Trajectory-based differential expression analysis for single-cell sequencing data.

Van den Berge, Koen; Roux de Bézieux, Hector; Street, Kelly; Saelens, Wouter; Cannoodt, Robrecht; Saeys, Yvan; Dudoit, Sandrine; Clement, Lieven.

Nat Commun ; 11(1): 1201, 2020 Mar 05.

Article in English | MEDLINE | ID: mdl-32139671

ABSTRACT

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression. Downstream of trajectory inference, it is vital to discover genes that are (i) associated with the lineages in the trajectory, or (ii) differentially expressed between lineages, to illuminate the underlying biological processes. Current data analysis procedures, however, either fail to exploit the continuous resolution provided by trajectory inference, or fail to pinpoint the exact types of differential expression. We introduce tradeSeq, a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of both within-lineage and between-lineage differential expression. By incorporating observation-level weights, the model can additionally account for zero inflation. We evaluate the method on simulated datasets and on real datasets from droplet-based and full-length protocols, and show that it yields biological insights through a clear interpretation of the data.

Subject(s)

Gene Expression Profiling; Sequence Analysis, RNA; Single-Cell Analysis; Animals; Bone Marrow/metabolism; Computer Simulation; Databases, Genetic; Gene Expression Regulation; Mice; Models, Statistical; Olfactory Mucosa/metabolism; Principal Component Analysis
