Results 1 - 14 of 14
1.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36585784

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) clustering and labelling methods are used to determine precise cellular composition of tissue samples. Automated labelling methods rely on either unsupervised, cluster-based approaches or supervised, cell-based approaches to identify cell types. The high complexity of cancer poses a unique challenge, as tumor microenvironments are often composed of diverse cell subpopulations with unique functional effects that may lead to disease progression, metastasis and treatment resistance. Here, we assess 17 cell-based and 9 cluster-based scRNA-seq labelling algorithms using 8 cancer datasets, providing a comprehensive large-scale assessment of such methods in a cancer-specific context. Using several performance metrics, we show that cell-based methods generally achieved higher performance and were faster than cluster-based methods. Cluster-based methods more successfully labelled non-malignant cell types, likely because of a lack of gene signatures for relevant malignant cell subpopulations. Larger cell numbers present in some cell types in training data positively impacted prediction scores for cell-based methods. Finally, we examined which methods performed favorably when trained and tested on separate patient cohorts in scenarios similar to clinical applications, and which were able to accurately label particularly small or under-represented cell populations in the given datasets. We conclude that scPred and SVM show the best overall performances with cancer-specific data and provide further suggestions for algorithm selection. Our analysis pipeline for assessing the performance of cell type labelling algorithms is available at https://github.com/shooshtarilab/scRNAseq-Automated-Cell-Type-Labelling.


Subject(s)
Neoplasms , Single-Cell Gene Expression Analysis , Humans , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Algorithms , Neoplasms/genetics , Cluster Analysis , Gene Expression Profiling/methods , Tumor Microenvironment
2.
Nucleic Acids Res ; 48(W1): W372-W379, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32479601

ABSTRACT

CReSCENT: CanceR Single Cell ExpressioN Toolkit (https://crescent.cloud), is an intuitive and scalable web portal incorporating a containerized pipeline execution engine for standardized analysis of single-cell RNA sequencing (scRNA-seq) data. While scRNA-seq data for tumour specimens are readily generated, subsequent analysis requires high-performance computing infrastructure and user expertise to build analysis pipelines and tailor interpretation for cancer biology. CReSCENT uses public data sets and preconfigured pipelines that are accessible to computational biology non-experts and are user-editable to allow optimization, comparison, and reanalysis for specific experiments. Users can also upload their own scRNA-seq data for analysis and results can be kept private or shared with other users.


Subject(s)
Neoplasms/genetics , RNA-Seq/methods , Single-Cell Analysis/methods , Software , Humans , Neoplasms/immunology , T-Lymphocytes/metabolism
3.
Int J Mol Sci ; 23(19), 2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36232752

ABSTRACT

Several disease risk variants reside on non-coding regions of DNA, particularly in open chromatin regions of specific cell types. Identifying the cell types relevant to complex traits through the integration of chromatin accessibility data and genome-wide association studies (GWAS) data can help to elucidate the mechanisms of these traits. In this study, we created a collection of associations between the combinations of chromatin accessibility data (bulk and single-cell) with an array of 201 complex phenotypes. We integrated the GWAS data of these 201 phenotypes with bulk chromatin accessibility data from 137 cell types measured by DNase-I hypersensitive sequencing and found significant results (FDR adjusted p-value ≤ 0.05) for at least one cell type in 21 complex phenotypes, such as atopic dermatitis, Graves' disease, and body mass index. With the integration of single-cell chromatin accessibility data measured by an assay for transposase-accessible chromatin with high-throughput sequencing (scATAC-seq), taken from 111 adult and 111 fetal cell types, the resolution of association was magnified, enabling the identification of further cell types. This resulted in the identification of significant correlations (FDR adjusted p-value ≤ 0.05) between 15 categories of single-cell subtypes and 59 phenotypes ranging from autoimmune diseases like Graves' disease to cardiovascular traits like diastolic/systolic blood pressure.
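The significance threshold quoted in this abstract (FDR-adjusted p-value ≤ 0.05) can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, and the function name `bh_adjust` and the input p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per cell type tested against one phenotype.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = bh_adjust(pvals)
# Apply the abstract's cutoff of 0.05 to the adjusted values.
significant = [q <= 0.05 for q in adj]
```

With these made-up inputs, only the two smallest p-values remain significant after adjustment, even though four of the five raw values are below 0.05.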


Subject(s)
Chromatin , Graves Disease , Chromatin/genetics , DNA/genetics , Deoxyribonucleases/genetics , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing/methods , Humans , Phenotype , Transposases/genetics
4.
Am J Hum Genet ; 101(1): 75-86, 2017 Jul 06.
Article in English | MEDLINE | ID: mdl-28686857

ABSTRACT

Genome-wide association studies in autoimmune and inflammatory diseases (AID) have uncovered hundreds of loci mediating risk. These associations are preferentially located in non-coding DNA regions and in particular in tissue-specific DNase I hypersensitivity sites (DHSs). While these analyses clearly demonstrate the overall enrichment of disease risk alleles on gene regulatory regions, they are not designed to identify individual regulatory regions mediating risk or the genes under their control, and thus uncover the specific molecular events driving disease risk. To do so we have departed from standard practice by identifying regulatory regions which replicate across samples and connect them to the genes they control through robust re-analysis of public data. We find significant evidence of regulatory potential in 78/301 (26%) risk loci across nine autoimmune and inflammatory diseases, and we find that individual genes are targeted by these effects in 53/78 (68%) of these. Thus, we are able to generate testable mechanistic hypotheses of the molecular changes that drive disease risk.


Subject(s)
Autoimmune Diseases/genetics , Genetic Epigenesis , Nucleic Acid Regulatory Sequences/genetics , Alleles , Human Chromosome Pair 6/genetics , Deoxyribonuclease I/metabolism , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Inflammation/genetics , Organ Specificity/genetics , Physical Chromosome Mapping , Reproducibility of Results , Risk Factors
5.
PLoS Genet ; 12(6): e1006121, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27305007

ABSTRACT

Using robust, integrated analysis of multiple genomic datasets, we show that genes depleted for non-synonymous de novo mutations form a subnetwork of 72 members under strong selective constraint. We further show this subnetwork is preferentially expressed in the early development of the human hippocampus and is enriched for genes mutated in neurological Mendelian disorders. We thus conclude that carefully orchestrated developmental processes are under strong constraint in early brain development, and perturbations caused by mutation have adverse outcomes subject to strong purifying selection. Our findings demonstrate that selective forces can act on groups of genes involved in the same process, supporting the notion that purifying selection can act coordinately on multiple genes. Our approach provides a statistically robust, interpretable way to identify the tissues and developmental times where groups of disease genes are active.


Subject(s)
Gene Regulatory Networks/genetics , Inborn Genetic Diseases/genetics , Genome/genetics , Hippocampus/embryology , Protein Interaction Maps/genetics , Genetic Variation/genetics , Humans , Genetic Models , Mutation/genetics
6.
Viruses ; 15(4), 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-37112833

ABSTRACT

Epstein-Barr virus (EBV) causes lifelong infection in over 90% of the world's population. EBV infection leads to several types of B cell and epithelial cancers due to the viral reprogramming of host-cell growth and gene expression. EBV is associated with 10% of stomach/gastric adenocarcinomas (EBVaGCs), which have distinct molecular, pathological, and immunological characteristics compared to EBV-negative gastric adenocarcinomas (EBVnGCs). Publicly available datasets, such as The Cancer Genome Atlas (TCGA), contain comprehensive transcriptomic, genomic, and epigenomic data for thousands of primary human cancer samples, including EBVaGCs. Additionally, single-cell RNA-sequencing data are becoming available for EBVaGCs. These resources provide a unique opportunity to explore the role of EBV in human carcinogenesis, as well as differences between EBVaGCs and their EBVnGC counterparts. We have constructed a suite of web-based tools called the EBV Gastric Cancer Resource (EBV-GCR), which utilizes TCGA and single-cell RNA-seq data and can be used for research related to EBVaGCs. These web-based tools allow investigators to gain in-depth biological and clinical insights by exploring the effects of EBV on cellular gene expression, associations with patient outcomes, immune landscape features, and differential gene methylation, featuring both whole-tissue and single-cell analyses.


Subject(s)
Adenocarcinoma , Epstein-Barr Virus Infections , Stomach Neoplasms , Humans , Human Herpesvirus 4/genetics , Stomach Neoplasms/genetics , Carcinogenesis
7.
Sci Rep ; 13(1): 8106, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37202401

ABSTRACT

International consortia, including ENCODE, Roadmap Epigenomics, Genomics of Gene Regulation and Blueprint Epigenome, have made large-scale datasets of open chromatin regions publicly available. While these datasets are extremely useful for studying mechanisms of gene regulation in disease and cell development, they only identify open chromatin regions in individual samples. A uniform comparison of accessibility of the same regulatory sites across multiple samples is necessary to correlate open chromatin accessibility and expression of target genes across matched cell types. Additionally, although replicate samples are available for the majority of cell types, comprehensive replication-based quality checking of individual regulatory sites is still lacking. We integrated 828 DNase-I hypersensitive sequencing samples, processed them uniformly, and then clustered their regulatory regions across all samples. We checked the quality of open-chromatin regions using our replication test. This has resulted in a comprehensive, quality-checked database of Open CHROmatin (OCHROdb) regions for 194 unique human cell types and cell lines which can serve as a reference for gene regulatory studies involving open chromatin. We have made this resource publicly available: users can download the whole database, or query it for their genomic regions of interest and visualize the results in an interactive genome browser.


Subject(s)
Chromatin , Gene Expression Regulation , Humans , Chromatin/genetics , Genomics , Nucleic Acid Regulatory Sequences , Epigenomics/methods
8.
PLoS One ; 17(9): e0272302, 2022.
Article in English | MEDLINE | ID: mdl-36084081

ABSTRACT

MOTIVATION: The tumour microenvironment (TME) contains various cells including stromal fibroblasts, immune and malignant cells, and its composition can be elucidated using single-cell RNA sequencing (scRNA-seq). scRNA-seq datasets from several cancer types are available, yet we lack a comprehensive database to collect and present related TME data in an easily accessible format. RESULTS: We therefore built a TME scRNA-seq database, and created the R package TMExplorer to facilitate investigation of the TME. TMExplorer provides an interface to easily access all available datasets and their metadata. The users can search for datasets using a thorough range of characteristics. The TMExplorer allows for examination of the TME using scRNA-seq in a way that is streamlined and allows for easy integration into already existing scRNA-seq analysis pipelines.


Subject(s)
Single-Cell Analysis , Software , Gene Expression Profiling , RNA Sequence Analysis , Tumor Microenvironment/genetics
9.
Comput Struct Biotechnol J ; 20: 6375-6387, 2022.
Article in English | MEDLINE | ID: mdl-36420149

ABSTRACT

Tumors are complex biological entities that comprise cell types of different origins, with different mutational profiles and different patterns of transcriptional dysregulation. The exploration of data related to cancer biology requires careful analytical methods to reflect the heterogeneity of cell populations in cancer samples. Single-cell techniques are now able to capture the transcriptional profiles of individual cells. However, the complexity of RNA-seq data, especially in cancer samples, makes it challenging to cluster single-cell profiles into groups that reflect the underlying cell types. We have developed a framework for a systematic examination of single-cell RNA-seq clustering algorithms for cancer data, which uses a range of well-established metrics to generate a unified quality score and algorithm ranking. To demonstrate this framework, we examined clustering performance of 15 different single-cell RNA-seq clustering algorithms on eight different cancer datasets. Our results suggest that the single-cell RNA-seq clustering algorithms fall into distinct groups by performance, with the highest clustering quality on non-malignant cells achieved by three algorithms: Seurat, bigSCale and Cell Ranger. However, for malignant cells, two additional algorithms often reach a better performance, namely Monocle and SC3. Their ability to detect known rare cell types was also among the best, along with Seurat. Our approach and results can be used by a broad audience of practitioners who analyze single-cell transcriptomic data in cancer research.

10.
Cancers (Basel) ; 13(8), 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33924498

ABSTRACT

Reactivation of the multi-subunit ribonucleoprotein telomerase is the primary telomere maintenance mechanism in cancer, but it is rate-limited by the enzymatic component, telomerase reverse transcriptase (TERT). While regulatory in nature, TERT alternative splice variant/isoform regulation and functions are not fully elucidated and are further complicated by their highly diverse expression and nature. Our primary objective was to characterize TERT isoform expression across 7887 neoplastic and 2099 normal tissue samples using The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression Project (GTEx), respectively. We confirmed the global overexpression and splicing shift towards full-length TERT in neoplastic tissue. Stratifying by tissue type we found uncharacteristic TERT expression in normal brain tissue subtypes. Stratifying by tumor-specific subtypes, we detailed TERT expression differences potentially regulated by subtype-specific molecular characteristics. Focusing on β-deletion splicing regulation, we found the NOVA1 trans-acting factor to mediate alternative splicing in a cancer-dependent manner. Of relevance to future tissue-specific studies, we clustered cancer cell lines with tumors from related origin based on TERT isoform expression patterns. Taken together, our work has reinforced the need for tissue and tumour-specific TERT investigations, provided avenues to do so, and brought to light the current technical limitations of bioinformatic analyses of TERT isoform expression.

11.
BMC Bioinformatics ; 11: 403, 2010 Jul 28.
Article in English | MEDLINE | ID: mdl-20667133

ABSTRACT

BACKGROUND: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable for many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL. RESULTS: We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., "events" in flow cytometry, typically corresponding to cells). Compared to two state of the art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in proper identification of populations with non-elliptical shapes, low density populations close to dense ones, minor subpopulations of a major population and rare populations. CONCLUSIONS: This work is the first successful attempt to apply spectral methodology on flow cytometry data. An implementation of our algorithm as an R package is freely available through BioConductor.


Subject(s)
Algorithms , Flow Cytometry/methods , Cluster Analysis
12.
Cytometry A ; 77(9): 873-80, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20629196

ABSTRACT

The immune response in humans is usually assessed using immunogenicity assays to provide biomarkers as correlates of protection (CoP). Flow cytometry is the assay of choice to measure intracellular cytokine staining (ICS) of cell-mediated immune (CMI) biomarkers. For CMI analysis, the integrated mean fluorescence intensity (iMFI) was introduced as a metric to represent the total functional CMI response as a CoP. iMFI is computed by multiplying the relative frequency (percent positive) of cells expressing a particular cytokine with the MFI of that population, and correlates better with protection in challenge models than either the percentage or the MFI of the cytokine-positive population. While determination of the iMFI as a CoP can readily be accomplished in animal models that allow challenge/protection experiments, this is not feasible in humans for ethical reasons. As a first step toward extending the iMFI concept to humans, we investigated the correlation of the iMFI derived from a human innate immune response ICS assay with functional cytokine release into the culture supernatant, as innate cytokines need to be released to have a functional impact. Next, we developed a quantitatively more correlative mathematical approach for calculating the functional response of cytokine-producing cells by incorporating the assignment of different weights to the magnitude (frequency of cytokine-positive cells) and the quality (the MFI) of the observed innate immune response. We refer to this model as generalized iMFI.
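The iMFI definition given in this abstract (the percent of cytokine-positive cells multiplied by the MFI of that population) can be written as a short worked example. This is only a sketch: the function name `imfi` and the sample numbers are hypothetical, and the weighted "generalized iMFI" is not reproduced because the abstract does not state its form.

```python
def imfi(percent_positive, mfi):
    """Integrated MFI: percent of cytokine-positive cells times their MFI."""
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percent_positive must be between 0 and 100")
    return percent_positive * mfi


# Hypothetical samples: (percent cytokine-positive, MFI of the positive cells).
sample_a = imfi(2.0, 1500.0)  # few responding cells, bright staining
sample_b = imfi(6.0, 500.0)   # more responding cells, dimmer staining
```

Both hypothetical samples give an iMFI of 3000.0, which shows how the metric combines magnitude (frequency) and quality (MFI) into one total-response value.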


Subject(s)
Cytokines/analysis , Flow Cytometry/methods , Fluorescence , Innate Immunity , Staining and Labeling/methods , Adult , Antigen-Presenting Cells/immunology , Cytokines/metabolism , Female , Humans , Male , Immunological Models , Statistics as Topic , Toll-Like Receptors/agonists
13.
Front Immunol ; 11: 1691, 2020.
Article in English | MEDLINE | ID: mdl-32849590

ABSTRACT

Mucosa-associated invariant T (MAIT) cells are unconventional, innate-like T lymphocytes that recognize vitamin B metabolites of microbial origin among other antigens displayed by the monomorphic molecule MHC class I-related protein 1 (MR1). Abundant in human tissues, reactive to local inflammatory cues, and endowed with immunomodulatory and cytolytic functions, MAIT cells are likely to play key roles in human malignancies. They accumulate in various tumor microenvironments (TMEs) where they often lose some of their functional capacities. However, the potential roles of MAIT cells in anticancer immunity or cancer progression and their significance in shaping clinical outcomes remain largely unknown. In this study, we analyzed publicly available bulk and single-cell tumor transcriptomic datasets to investigate the tissue distribution, phenotype, and prognostic significance of MAIT cells across several human cancers. We found that expanded MAIT cell clonotypes were often shared between the blood, tumor tissue and adjacent healthy tissue of patients with colorectal, hepatocellular, and non-small cell lung carcinomas. Gene expression comparisons between tumor-infiltrating and healthy tissue MAIT cells revealed the presence of activation and/or exhaustion programs within the TMEs of primary hepatocellular and colorectal carcinomas. Interestingly, in basal and squamous cell carcinomas of the skin, programmed cell death-1 (PD-1) blockade upregulated the expression of several effector genes in tumor-infiltrating MAIT cells. We derived a signature comprising stable and specific MAIT cell gene markers across several tissue compartments and cancer types. By applying this signature to estimate MAIT cell abundance in pan-cancer gene expression data, we demonstrate that a heavier intratumoral MAIT cell presence is positively correlated with a favorable prognosis in esophageal carcinoma but predicts poor overall survival in colorectal and squamous cell lung carcinomas. Finally, in colorectal carcinoma and four other cancer types, we found a positive correlation between MR1 expression and estimated MAIT cell abundance. Collectively, our findings indicate that MAIT cells serve important but diverse roles in human cancers. Our work provides useful models and resources that employ gene expression data platforms to enable future studies in the realm of MAIT cell biology.


Subject(s)
Datasets as Topic , Gene Expression Profiling/methods , Mucosal-Associated Invariant T Cells/immunology , Neoplasms/immunology , Transcriptome/immunology , Humans , Phenotype
14.
BMC Res Notes ; 4: 50, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21385382

ABSTRACT

BACKGROUND: Flow cytometry is a widely used analytical technique for examining microscopic particles, such as cells. The Flow Cytometry Standard (FCS) was developed in 1984 for storing flow data and it is supported by all instrument and third party software vendors. However, FCS does not capture the full scope of flow cytometry (FCM)-related data and metadata, and data standards have recently been developed to address this shortcoming. FINDINGS: The Data Standards Task Force (DSTF) of the International Society for the Advancement of Cytometry (ISAC) has developed several data standards to complement the raw data encoded in FCS files. Efforts started with the Minimum Information about a Flow Cytometry Experiment, a minimal data reporting standard of details necessary to include when publishing FCM experiments to facilitate third party understanding. MIFlowCyt is now being recommended to authors by publishers as part of manuscript submission, and manuscripts are being checked by reviewers and editors for compliance. Gating-ML was then introduced to capture gating descriptions - an essential part of FCM data analysis describing the selection of cell populations of interest. The Classification Results File Format was developed to accommodate results of the gating process, mostly within the context of automated clustering. Additionally, the Archival Cytometry Standard bundles data with all the other components describing experiments. Here, we introduce these recent standards and provide the very first example of how they can be used to report FCM data including analysis and results in a standardized, computationally exchangeable form. CONCLUSIONS: Reporting standards and open file formats are essential for scientific collaboration and independent validation. The recently developed FCM data standards are now being incorporated into third party software tools and data repositories, which will ultimately facilitate understanding and data reuse.
