Results 1 - 7 of 7
1.
Brief Bioinform; 23(5), 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36089561

ABSTRACT

We present CLEAR, a novel self-supervised Contrastive LEArning framework for representing single-cell ribonucleic acid (RNA)-sequencing data and supporting its downstream analysis. Compared with current methods, CLEAR overcomes the heterogeneity of experimental data through a specifically designed representation-learning task, and can therefore handle batch effects and dropout events simultaneously. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch-effect removal, and pseudo-time inference. The proposed method successfully identifies and illustrates inflammation-related mechanisms in a COVID-19 study of 43,695 single cells from peripheral blood mononuclear cells.


Subject(s)
COVID-19 , RNA , COVID-19/genetics , Cluster Analysis , Data Analysis , Humans , Leukocytes, Mononuclear , RNA-Seq , Sequence Analysis, RNA/methods
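The CLEAR abstract above describes a self-supervised contrastive representation-learning task but does not state its loss. A common contrastive objective is InfoNCE: two augmented views of the same cell form a positive pair, and the other cells in the batch act as negatives. The sketch below computes an InfoNCE loss over cosine similarities in plain Python. It is an illustration of the general technique, not CLEAR's published implementation; the function names, the temperature of 0.5 and the toy embeddings are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.5) -> float:
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] are embeddings of two augmented views of cell i
    (the positive pair); positives[j] for j != i act as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


# Hypothetical embeddings of three cells under two augmentations.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
mismatched = [view_b[1], view_b[2], view_b[0]]  # positive pairs broken on purpose
print(info_nce(view_a, view_b) < info_nce(view_a, mismatched))  # True
```

Minimizing this loss pulls the two views of each cell together and pushes different cells apart; in a real pipeline the embeddings would come from a trainable encoder and the augmentations (for example, added noise or masked genes) would be chosen to mimic technical variation such as dropout.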
2.
Mol Syst Biol; 19(6): e11490, 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37063090

ABSTRACT

High-content image-based cell phenotyping provides fundamental insights into a broad variety of life science disciplines. Striving for accurate conclusions and meaningful impact demands high reproducibility standards, which are particularly relevant for high-quality open-access data sharing and meta-analysis. However, the sources and degree of biological and technical variability, and thus the reproducibility and usefulness of meta-analysis of results from live-cell microscopy, have not been systematically investigated. Here, using high-content data describing features of cell migration and morphology, we determine the sources of variability across different scales, including between laboratories, persons, experiments, technical repeats, cells, and time points. Significant technical variability occurred between laboratories and, to a lesser extent, between persons, which limits the value of directly meta-analyzing data from different laboratories. However, batch effect removal markedly improved the ability to combine image-based datasets of perturbation experiments. Thus, reproducible quantitative high-content cell image analysis of perturbation effects and meta-analysis depend on standardized procedures combined with batch correction.


Subject(s)
Reproducibility of Results , Cell Movement
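The abstract above apportions variability among scales such as laboratories and persons, but does not give the statistical model used. As a minimal sketch of the underlying idea, the code below performs a single-level, one-way ANOVA split of the total sum of squares and reports the share explained by group membership (eta squared). The grouping into two laboratories and the speed values are hypothetical, and a nested design (persons within laboratories, experiments within persons) would need a hierarchical or mixed model rather than this one-level decomposition.

```python
from statistics import fmean
from typing import Dict, List


def between_group_fraction(groups: Dict[str, List[float]]) -> float:
    """Share of total variance (sum of squares) explained by group membership.

    This is eta squared from a one-way ANOVA split:
    SS_total = SS_between + SS_within.
    """
    values = [x for xs in groups.values() for x in xs]
    grand_mean = fmean(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(xs) * (fmean(xs) - grand_mean) ** 2 for xs in groups.values())
    return ss_between / ss_total


# Hypothetical per-cell migration speeds measured in two laboratories.
speeds = {"lab_A": [1.0, 1.2, 0.8], "lab_B": [2.0, 2.2, 1.8]}
print(round(between_group_fraction(speeds), 2))  # 0.9
```

A large between-laboratory share like this one is the situation in which pooling raw values is risky; subtracting each laboratory's mean (a crude form of batch correction) drives this share to zero.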
3.
BMC Bioinformatics; 17 Suppl 5: 194, 2016 Jun 06.
Article in English | MEDLINE | ID: mdl-27294826

ABSTRACT

BACKGROUND: We address the problem of integratively analyzing multiple gene-expression microarray datasets in order to reconstruct gene-gene interaction networks. Integrating multiple datasets is generally believed to increase statistical power and to yield a better characterization of the system under study. However, systematic variation across studies makes network reverse-engineering particularly challenging. We compare two approaches frequently used in the literature to address systematic biases: meta-analysis methods, which first compute appropriate statistics on each dataset and then summarize them, and data-merging methods, which directly analyze the pooled data after removing possible biases. This comparative evaluation is performed on both synthetic and real data, the latter consisting of two manually curated microarray compendia comprising several E. coli and yeast studies, respectively. Furthermore, the reconstruction of the regulatory network of the transcription factor Ikaros in human peripheral blood mononuclear cells (PBMCs) is presented as a case study. RESULTS: The meta-analysis and data-merging methods included in our experiments performed comparably on both synthetic and real data. Furthermore, both approaches outperformed (a) the naïve solution of merging data while ignoring possible biases, and (b) the results expected when only one of the available datasets is analyzed in isolation. Using correlation statistics proved more effective than using p-values for correctly ranking candidate interactions. The results from the PBMC case study indicate that these findings generalize to different types of network reconstruction algorithms. CONCLUSIONS: Ignoring the systematic variation that differentiates heterogeneous studies can produce results that are statistically indistinguishable from random guessing. Meta-analysis and data-merging methods proved equally effective at addressing this issue, so researchers may safely select whichever approach best suits their specific application.


Subject(s)
Algorithms , Gene Regulatory Networks/genetics , Area Under Curve , Escherichia coli/genetics , Escherichia coli/metabolism , Humans , Leukocytes, Mononuclear/cytology , Leukocytes, Mononuclear/metabolism , Meta-Analysis as Topic , ROC Curve , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
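The abstract above contrasts meta-analysis (statistics computed per dataset, then summarized) with data merging (biases removed, then data pooled), and notes that naïve pooling can fail. The sketch below scores one candidate gene pair with correlation statistics in each of the three ways. The specific choices are assumptions for illustration, not necessarily the methods the study evaluated: Fisher z-transform averaging with weights n - 3 for meta-analysis, and per-dataset z-scoring of each gene for data merging. The toy data are hypothetical: study 2 is study 1 shifted by a constant offset.

```python
import math
from statistics import fmean, pstdev
from typing import List, Tuple

# One dataset = (expression of gene X, expression of gene Y) over its samples.
Dataset = Tuple[List[float], List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def meta_analysis_corr(datasets: List[Dataset]) -> float:
    """Meta-analysis: correlate within each dataset, then pool Fisher z-scores
    with inverse-variance weights (n - 3). Assumes n > 3 and |r| < 1 per dataset."""
    num = den = 0.0
    for x, y in datasets:
        weight = len(x) - 3
        num += weight * math.atanh(pearson(x, y))
        den += weight
    return math.tanh(num / den)


def zscore(v: List[float]) -> List[float]:
    """Center and scale a vector to mean 0 and (population) standard deviation 1."""
    m, s = fmean(v), pstdev(v)
    return [(a - m) / s for a in v]


def merged_corr(datasets: List[Dataset]) -> float:
    """Data merging: standardize each gene within each dataset (removing
    dataset-specific location and scale), then pool samples and correlate."""
    xs: List[float] = []
    ys: List[float] = []
    for x, y in datasets:
        xs += zscore(x)
        ys += zscore(y)
    return pearson(xs, ys)


def naive_corr(datasets: List[Dataset]) -> float:
    """Naive merging: pool raw values, ignoring between-study shifts."""
    xs = [a for x, _ in datasets for a in x]
    ys = [b for _, y in datasets for b in y]
    return pearson(xs, ys)


# Hypothetical toy data: study 2 is study 1 shifted by +10 (a pure batch offset).
study1 = ([1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 3.0, 4.0, 2.0, 1.0])
study2 = ([11.0, 12.0, 13.0, 14.0, 15.0], [15.0, 13.0, 14.0, 12.0, 11.0])
studies = [study1, study2]
print(round(meta_analysis_corr(studies), 3))  # -0.9
print(round(merged_corr(studies), 3))         # -0.9
print(round(naive_corr(studies), 3))          # 0.859
```

Both bias-aware approaches recover the within-study negative association, while naïve pooling reports a strong positive one, which mirrors the abstract's point that ignoring systematic variation can mislead network reconstruction.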
4.
Brief Bioinform; 14(4): 469-90, 2013 Jul.
Article in English | MEDLINE | ID: mdl-22851511

ABSTRACT

Genomic data integration is a key goal on the path toward large-scale genomic data analysis. The process is very challenging because of the diverse sources of information produced by genomics experiments. In this work, we review methods designed to combine genomic data recorded from microarray gene expression (MAGE) experiments. It is widely acknowledged that the main source of variation between different MAGE datasets is the so-called 'batch effect'. The methods reviewed here perform data integration by removing (or, more precisely, attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are essential for assessing the efficiency and quality of the data integration process. We provide a systematic description of the MAGE data integration methodology, together with basic recommendations to help users choose appropriate tools for integrating MAGE data for large-scale analysis and evaluate those tools from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB http://insilico.ulb.ac.be.


Subject(s)
Genomics/methods , Oligonucleotide Array Sequence Analysis , Transcriptome , Computer Simulation , Databases, Genetic , Gene Expression , Genetic Variation , Genome
5.
J Comput Biol; 28(5): 501-513, 2021 May.
Article in English | MEDLINE | ID: mdl-33470876

ABSTRACT

Dimensionality reduction is an important first step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. In addition to enabling visualization of the profiled cells, such representations are used by many downstream analysis methods, ranging from pseudo-time reconstruction to clustering to alignment of scRNA-seq data from different experiments, platforms, and laboratories. Both supervised and unsupervised methods have been proposed to reduce the dimension of scRNA-seq data. However, all methods to date are sensitive to batch effects. When batches correlate with cell types, as is often the case, batch effects can lead to representations that are batch-specific rather than cell-type-specific. To overcome this, we developed a domain adversarial neural network model for learning a reduced-dimension representation of scRNA-seq data. The adversarial model simultaneously optimizes two objectives: the first is the accuracy of cell-type assignment, and the second is the inability to distinguish the batch (domain). We tested the method by using the resulting representation to align several different datasets. As we show, by overcoming batch effects our method was able to correctly separate cell types, improving on several prior methods suggested for this task. Analysis of the top features used by the network indicates that, by taking the batch impact into account, the reduced representation is much better able to focus on key genes for each cell type.


Subject(s)
Computational Biology/methods , Sequence Analysis, RNA/methods , Algorithms , Animals , Humans , Single-Cell Analysis , Supervised Machine Learning
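The two objectives named in the abstract above (accurate cell-type assignment, inability to distinguish the batch) can be written as a pair of losses in the standard domain-adversarial (DANN-style) formulation. The sketch below shows only that loss arithmetic for a single cell, with no network or training loop. The function names, the trade-off weight `lam` and the example logits are assumptions; the paper's exact architecture and weighting are not given in the abstract.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability of the true label."""
    return -math.log(softmax(logits)[label])


def dann_objectives(cell_type_logits: List[float], cell_type: int,
                    batch_logits: List[float], batch: int,
                    lam: float = 1.0) -> Tuple[float, float]:
    """Return (encoder objective, batch-discriminator objective) for one cell.

    The encoder and cell-type head minimize the cell-type loss while
    maximizing the discriminator's batch loss (hence the minus sign); the
    discriminator minimizes its own batch loss. Frameworks usually realize
    the minus sign with a gradient reversal layer scaled by lam.
    """
    cell_type_loss = cross_entropy(cell_type_logits, cell_type)
    batch_loss = cross_entropy(batch_logits, batch)
    return cell_type_loss - lam * batch_loss, batch_loss


# Hypothetical logits for one cell of type 0 sequenced in batch 0.
confused, _ = dann_objectives([2.0, 0.0], 0, [0.0, 0.0], 0)    # batch unpredictable
revealing, _ = dann_objectives([2.0, 0.0], 0, [5.0, -5.0], 0)  # batch easy to predict
print(confused < revealing)  # True: the encoder prefers batch-invariant features
```

Because the encoder's objective improves as the batch discriminator gets worse, training pushes the reduced representation toward features that predict cell type but carry little batch information, which is the behavior the abstract describes.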
6.
Genome Biol; 22(1): 10, 2021 Jan 04.
Article in English | MEDLINE | ID: mdl-33397454

ABSTRACT

Distinguishing biological from technical variation is crucial when integrating and comparing single-cell genomics datasets across different experiments. Existing methods lack the ability to explicitly distinguish these two types of variation, which often leads to the removal of both. Here, we present scMC, an integration method that removes technical variation while preserving intrinsic biological variation. scMC learns biological variation via variance analysis in order to subtract technical variation inferred in an unsupervised manner. Application of scMC to both simulated and real datasets from single-cell RNA-seq and ATAC-seq experiments demonstrates its ability to detect context-shared and context-specific biological signals via accurate alignment.


Subject(s)
Databases, Genetic , Genomics/methods , Sequence Analysis, RNA/methods , Algorithms , Chromatin Immunoprecipitation Sequencing , Epigenomics , Single-Cell Analysis/methods , Transcriptome
7.
Transl Cancer Res; 3(3): 260-265, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-25258704

ABSTRACT

RNAseq technology is replacing microarray technology as the tool of choice for gene expression profiling. Although RNAseq provides much richer data than microarrays, its analysis has been much more challenging. Among the many difficulties of RNAseq analysis, correctly adjusting for batch effects is pivotal for large-scale RNAseq-based studies. Batch effects in RNAseq data are most obvious in microRNA (miRNA) sequencing studies. Using real miRNA sequencing (miRNAseq) data, we evaluated several batch-removal techniques and discuss their effectiveness. We illustrate that, by adjusting for batch effects, more reliable differentially expressed genes can be identified. Our study of batch effects in miRNAseq data can serve as a guideline for future miRNAseq studies that may be affected by batch effects.
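The abstract above does not name the batch-removal techniques it compares. As a hedged illustration of what such an adjustment does, the sketch below applies one of the simplest options: log-transform counts per million, then, for each miRNA, subtract its batch mean and add back its overall mean. This is an assumed example technique, not necessarily one the authors evaluated; methods such as ComBat go further by shrinking batch means and variances with empirical Bayes. The function names and toy values are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List


def log_cpm(counts: List[int], pseudocount: float = 1.0) -> List[float]:
    """log2 counts-per-million for one sample (a list of miRNA read counts)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def center_by_batch(samples: List[List[float]], batches: List[str]) -> List[List[float]]:
    """For each feature, subtract its batch mean and add back its overall mean."""
    n_features = len(samples[0])
    overall = [fmean(s[g] for s in samples) for g in range(n_features)]
    batch_means: Dict[str, List[float]] = {}
    for b in set(batches):
        members = [s for s, sb in zip(samples, batches) if sb == b]
        batch_means[b] = [fmean(s[g] for s in members) for g in range(n_features)]
    return [
        [s[g] - batch_means[b][g] + overall[g] for g in range(n_features)]
        for s, b in zip(samples, batches)
    ]


# Hypothetical log-scale values for 2 miRNAs in 4 samples; batch B is shifted by +10.
samples = [[1.0, 5.0], [3.0, 7.0], [11.0, 15.0], [13.0, 17.0]]
batches = ["A", "A", "B", "B"]
print(center_by_batch(samples, batches))
# [[6.0, 10.0], [8.0, 12.0], [6.0, 10.0], [8.0, 12.0]]
```

Mean-centering is only safe when batches are balanced with respect to the biological conditions being compared; if a condition is confined to one batch, this adjustment removes the real difference along with the batch effect, so the study design matters as much as the correction method.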
