Results 1 - 20 of 22
1.
Nat Biotechnol; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429430

ABSTRACT

Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through two newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.

2.
Nat Commun; 15(1): 1014, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38307875

ABSTRACT

A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader .


Subject(s)
Algorithms; Machine Learning; Technology; Awareness; Supervised Machine Learning; Single-Cell Analysis
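
Illustrative sketch for entry 2: the abstract refers to active learning as a way to spend a limited label budget on the most informative cells. The snippet below shows one common active learning strategy, uncertainty sampling by predictive entropy. It is not necessarily one of the strategies benchmarked in the paper and it is not the API of the `leader` R package; the function names and the example probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one cell's predicted cell-type probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_cells_to_label(predicted_probs, budget):
    """Return the indices of the `budget` most uncertain cells, highest entropy first."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical classifier output for four unlabeled cells over three cell types.
probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, so the most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
to_label = select_cells_to_label(probs, budget=2)
print(to_label)  # indices of the cells an annotator would be asked to label next
```

In a full loop the newly labeled cells would be added to the training set, the classifier refit, and the selection repeated until the label budget is exhausted.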
3.
Thorax; 79(4): 307-315, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38195644

ABSTRACT

BACKGROUND: Low-dose CT screening can reduce lung cancer-related mortality. However, most screen-detected pulmonary abnormalities do not develop into cancer and it often remains challenging to identify malignant nodules, particularly among indeterminate nodules. We aimed to develop and assess prediction models based on radiological features to discriminate between benign and malignant pulmonary lesions detected on a baseline screen. METHODS: Using four international lung cancer screening studies, we extracted 2060 radiomic features for each of 16 797 nodules (513 malignant) among 6865 participants. After filtering out low-quality radiomic features, 642 radiomic and 9 epidemiological features remained for model development. We used cross-validation and grid search to assess three machine learning (ML) models (eXtreme Gradient Boosted Trees, random forest, least absolute shrinkage and selection operator (LASSO)) for their ability to accurately predict risk of malignancy for pulmonary nodules. We report model performance based on the area under the curve (AUC) and calibration metrics in the held-out test set. RESULTS: The LASSO model yielded the best predictive performance in cross-validation and was fit in the full training set based on optimised hyperparameters. Our radiomics model had a test-set AUC of 0.93 (95% CI 0.90 to 0.96) and outperformed the established Pan-Canadian Early Detection of Lung Cancer model (AUC 0.87, 95% CI 0.85 to 0.89) for nodule assessment. Our model performed well among both solid (AUC 0.93, 95% CI 0.89 to 0.97) and subsolid nodules (AUC 0.91, 95% CI 0.85 to 0.95). CONCLUSIONS: We developed highly accurate ML models based on radiomic and epidemiological features from four international lung cancer screening studies that may be suitable for assessing indeterminate screen-detected pulmonary nodules for risk of malignancy.


Subject(s)
Lung Neoplasms; Multiple Pulmonary Nodules; Humans; Lung Neoplasms/diagnosis; Early Detection of Cancer; Radiomics; Tomography, X-Ray Computed; Canada; Multiple Pulmonary Nodules/pathology; Machine Learning; Retrospective Studies
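
Illustrative sketch for entry 3: the abstract reports model performance as the area under the ROC curve (AUC). The AUC can be read as the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign one. The snippet computes it directly from that definition. The risk scores and labels are made up for illustration and have no connection to the study data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted malignancy risks for six nodules (1 = malignant, 0 = benign).
risk = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1]
truth = [1, 1, 1, 0, 0, 0]
print(auc(risk, truth))
```

The pairwise loop is quadratic in the number of nodules; it is written this way for readability, and a rank-based formulation would be preferable for large test sets.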
4.
FASEB J; 36(10): e22560, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36165236

ABSTRACT

Angiogenesis inhibitor drugs targeting vascular endothelial growth factor (VEGF) signaling to the endothelial cell (EC) are used to treat various cancer types. However, primary or secondary resistance to therapy is common. Clinical and pre-clinical studies suggest that alternative pro-angiogenic factors are upregulated after VEGF pathway inhibition. Therefore, identification of alternative pro-angiogenic pathway(s) is critical for the development of more effective anti-angiogenic therapy. Here we study the role of apelin as a pro-angiogenic G-protein-coupled receptor ligand in tumor growth and angiogenesis. We found that loss of apelin in mice delayed the primary tumor growth of Lewis lung carcinoma 1 and B16F10 melanoma when combined with the VEGF receptor tyrosine kinase inhibitor, sunitinib. Targeting apelin in combination with sunitinib markedly reduced the tumor vessel density, and decreased microvessel remodeling. Apelin loss reduced angiogenic sprouting and tip cell marker gene expression in comparison to the sunitinib-alone-treated mice. Single-cell RNA sequencing of tumor EC demonstrated that the loss of apelin prevented EC tip cell differentiation. Thus, apelin is a potent pro-angiogenic cue that supports initiation of tumor neovascularization. Together, our data suggest that targeting apelin may be useful as adjuvant therapy in combination with VEGF signaling inhibition to inhibit the growth of advanced tumors.


Subject(s)
Neoplasms, Experimental; Neoplasms; Angiogenesis Inhibitors/pharmacology; Animals; Apelin; Ligands; Mice; Neoplasms/drug therapy; Neoplasms, Experimental/drug therapy; Neovascularization, Pathologic/drug therapy; Protein Kinase Inhibitors/pharmacology; Receptors, G-Protein-Coupled/physiology; Receptors, Vascular Endothelial Growth Factor; Sunitinib/pharmacology; Vascular Endothelial Growth Factor A/metabolism; Vascular Endothelial Growth Factors/therapeutic use
5.
J Gen Intern Med; 37(1): 154-161, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34755268

ABSTRACT

IMPORTANCE: SARS-CoV-2 has infected over 200 million people worldwide, resulting in more than 4 million deaths. Randomized controlled trials are the single best tool to identify effective treatments against this novel pathogen. OBJECTIVE: To describe the characteristics of randomized controlled trials of treatments for COVID-19 in the United States launched in the first 9 months of the pandemic. DESIGN, SETTING, AND PARTICIPANTS: We conducted a cross-sectional study of all completed or actively enrolling randomized, interventional, clinical trials for the treatment of COVID-19 in the United States registered on www.clinicaltrials.gov as of August 10, 2020. We excluded trials of vaccines and other interventions intended to prevent COVID-19. MAIN OUTCOMES AND MEASURES: We used descriptive statistics to characterize the clinical trials and the statistical power for the available studies. For the late-phase trials (i.e., phase 3 and 2/3 studies), we compared the geographic distribution of the clinical trials with the geographic distribution of people diagnosed with COVID-19. RESULTS: We identified 200 randomized controlled trials of treatments for people with COVID-19. Across all trials, 87 (43.5%) were single-center, 64 (32.0%) were unblinded, and 80 (40.0%) were sponsored by industry. The most common treatments included monoclonal antibodies (N=46 trials), small molecule immunomodulators (N=28), antiviral medications (N=24 trials), and hydroxychloroquine (N=20 trials). Of the 9 trials completed by August 2020, the median sample size was 450 (IQR 67-1113); of the 191 ongoing trials, the median planned sample size was 150 (IQR 60-400). Of the late-phase trials (N=54), the most common primary outcome was a severity scale (N=23, 42.6%), followed by a composite of mortality and ventilation (N=10, 18.5%), and mortality alone (N=6, 11.1%). Among these late-phase trials, all trials of antivirals, monoclonal antibodies, or chloroquine/hydroxychloroquine had a power of less than 25% to detect a 20% relative risk reduction in mortality. Had the individual trials for a given class of treatments instead formed a single trial, the power to detect that same reduction in mortality would have been greater than 98%. There was large variability in access to trials with the highest number of trials per capita in the Northeast and the lowest in the Midwest. CONCLUSIONS AND RELEVANCE: A large number of randomized trials were launched early in the pandemic to evaluate treatments for COVID-19. However, many trials were underpowered for important clinical endpoints and substantial geographic disparities were observed, highlighting the importance of improving national clinical trial infrastructure.


Subject(s)
COVID-19; Cross-Sectional Studies; Humans; Pandemics; Randomized Controlled Trials as Topic; SARS-CoV-2; Treatment Outcome; United States/epidemiology
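
Illustrative sketch for entry 5: the abstract contrasts the low power of individual trials with the much higher power of a hypothetical pooled trial. The snippet shows the general mechanism with a textbook normal approximation for a two-sided test of two proportions. The 15% control-arm mortality, the equal two-arm split, and the choice to pool ten trials are assumptions made only for illustration; the paper's own power method is not described in the abstract, so these numbers are not expected to reproduce its estimates.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_control, relative_risk_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two independent proportions."""
    p_treat = p_control * (1.0 - relative_risk_reduction)
    se = math.sqrt(
        p_control * (1.0 - p_control) / n_per_arm
        + p_treat * (1.0 - p_treat) / n_per_arm
    )
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(p_control - p_treat) / se
    # Probability of rejecting in the direction of the true effect;
    # the (tiny) chance of rejecting in the wrong direction is ignored.
    return NormalDist().cdf(z_effect - z_crit)


# Assumed 15% control-arm mortality and a 20% relative risk reduction.
single_trial = power_two_proportions(0.15, 0.20, n_per_arm=75)    # 150 participants in total
pooled_trial = power_two_proportions(0.15, 0.20, n_per_arm=750)   # ten such trials combined
print(f"single trial: {single_trial:.2f}, pooled trial: {pooled_trial:.2f}")
```

Because the standard error shrinks with the square root of the sample size, pooling participants into one trial raises power far more than running the same participants through many small, separately analyzed trials.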
6.
NEJM Evid; 1(5): EVIDe2200062, 2022 May.
Article in English | MEDLINE | ID: mdl-38319201

ABSTRACT

The Basics of Machine Learning. When a person is pregnant, a key question is how to establish the "date" of the pregnancy. Classically, the date was based on the last menstrual period (LMP). For the past 3 decades or more, in high-resource countries, this has been done using "hospital-grade" ultrasound machines, with testing performed by trained sonographers. In many parts of the world, neither the machines nor the trained sonographers are accessible. In an article published in NEJM Evidence, Pokaprakarn et al. [1] asked whether a low-cost handheld ultrasound device combined with artificial intelligence (AI) could substitute for the expensive machines and trained sonographers.

7.
Cell Syst; 12(12): 1173-1186.e5, 2021 Dec 15.
Article in English | MEDLINE | ID: mdl-34536381

ABSTRACT

A major challenge in the analysis of highly multiplexed imaging data is the assignment of cells to a priori known cell types. Existing approaches typically solve this by clustering cells followed by manual annotation. However, these often require several subjective choices and cannot explicitly assign cells to an uncharacterized type. To help address these issues we present Astir, a probabilistic model to assign cells to cell types by integrating prior knowledge of marker proteins. Astir uses deep recognition neural networks for fast inference, allowing for annotations at the million-cell scale in the absence of a previously annotated reference. We apply Astir to over 2.4 million cells from suspension and imaging datasets and demonstrate its scalability, robustness to sample composition, and interpretable uncertainty estimates. We envision deployment of Astir either for a first broad cell type assignment or to accurately annotate cells that may serve as biomarkers in multiple disease contexts. A record of this paper's transparent peer review process is included in the supplemental information.


Subject(s)
Neural Networks, Computer; Proteomics; Cluster Analysis
8.
Nature; 595(7868): 585-590, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34163070

ABSTRACT

Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNAs), has been impeded by lack of time-series single-cell sampling of polyclonal populations and temporal statistical models [1-7]. Here we generated 42,000 genomes from multi-year time-series single-cell whole-genome sequencing of breast epithelium and primary triple-negative breast cancer (TNBC) patient-derived xenografts (PDXs), revealing the nature of CNA-defined clonal fitness dynamics induced by TP53 mutation and cisplatin chemotherapy. Using a new Wright-Fisher population genetics model [8,9] to infer clonal fitness, we found that TP53 mutation alters the fitness landscape, reproducibly distributing fitness over a larger number of clones associated with distinct CNAs. Furthermore, in TNBC PDX models with mutated TP53, inferred fitness coefficients from CNA-based genotypes accurately forecast experimentally enforced clonal competition dynamics. Drug treatment in three long-term serially passaged TNBC PDXs resulted in cisplatin-resistant clones emerging from low-fitness phylogenetic lineages in the untreated setting. Conversely, high-fitness clones from treatment-naive controls were eradicated, signalling an inversion of the fitness landscape. Finally, upon release of drug, selection pressure dynamics were reversed, indicating a fitness cost of treatment resistance. Together, our findings define clonal fitness linked to both CNA and therapeutic resistance in polyclonal tumours.


Subject(s)
DNA Copy Number Variations; Drug Resistance, Neoplasm; Triple Negative Breast Neoplasms/genetics; Animals; Cell Line, Tumor; Cisplatin/pharmacology; Clone Cells/pathology; Female; Genetic Fitness; Humans; Mice; Models, Statistical; Neoplasm Transplantation; Tumor Suppressor Protein p53/genetics; Whole Genome Sequencing
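
Illustrative sketch for entry 8: the abstract describes inferring clonal fitness with a Wright-Fisher population genetics model. The snippet simulates the textbook discrete-generation Wright-Fisher process with selection, in which each clone's frequency is reweighted by (1 + s) and the next generation is resampled. It is a forward simulation only, not the authors' inference procedure, and the clone frequencies, fitness coefficients, and population size are hypothetical.

```python
import random


def expected_next_frequencies(freqs, fitness):
    """Selection step: reweight each clone by (1 + s) and renormalize to sum to 1."""
    weights = [f * (1.0 + s) for f, s in zip(freqs, fitness)]
    total = sum(weights)
    return [w / total for w in weights]


def wright_fisher_step(freqs, fitness, pop_size, rng):
    """One generation: selection followed by multinomial resampling of pop_size cells."""
    probs = expected_next_frequencies(freqs, fitness)
    draws = rng.choices(range(len(probs)), weights=probs, k=pop_size)
    counts = [0] * len(probs)
    for clone in draws:
        counts[clone] += 1
    return [c / pop_size for c in counts]


rng = random.Random(0)            # fixed seed so the run is repeatable
freqs = [0.5, 0.3, 0.2]           # hypothetical starting clone frequencies
fitness = [0.0, 0.05, 0.2]        # hypothetical selection coefficients per clone
for _ in range(50):
    freqs = wright_fisher_step(freqs, fitness, pop_size=1000, rng=rng)
print(freqs)
```

The resampling step introduces genetic drift, so a clone with a positive selection coefficient tends to expand but is not guaranteed to; fitting fitness coefficients to observed time series is the inverse problem the paper addresses.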
9.
J Pathol; 254(3): 254-264, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33797756

ABSTRACT

Hereditary diffuse gastric cancer (HDGC) is a cancer syndrome caused by germline variants in CDH1, the gene encoding the cell-cell adhesion molecule E-cadherin. Loss of E-cadherin in cancer is associated with cellular dedifferentiation and poor prognosis, but the mechanisms through which CDH1 loss initiates HDGC are not known. Using single-cell RNA sequencing, we explored the transcriptional landscape of a murine organoid model of HDGC to characterize the impact of CDH1 loss in early tumourigenesis. Progenitor populations of stratified squamous and simple columnar epithelium, characteristic of the mouse stomach, showed lineage-specific transcriptional programs. Cdh1 inactivation resulted in shifts along the squamous differentiation trajectory associated with aberrant expression of genes central to gastrointestinal epithelial differentiation. Cytokeratin 7 (CK7), encoded by the differentiation-dependent gene Krt7, was a specific marker for early neoplastic lesions in CDH1 carriers. Our findings suggest that deregulation of developmental transcriptional programs may precede malignancy in HDGC. © 2021 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.


Subject(s)
Cadherins/genetics; Cell Transformation, Neoplastic/genetics; Gene Expression Regulation, Neoplastic/genetics; Genetic Predisposition to Disease/genetics; Stomach Neoplasms/genetics; Animals; Cell Transformation, Neoplastic/pathology; Disease Models, Animal; Mice; Mice, Transgenic; Organoids; Single-Cell Analysis; Stomach Neoplasms/pathology; Transcriptome
10.
Phys Biol; 17(6): 061001, 2020 Sep 19.
Article in English | MEDLINE | ID: mdl-32759485

ABSTRACT

Single-cell technologies have revolutionized biomedical research by enabling scalable measurement of the genome, transcriptome, proteome, and epigenome of multiple systems at single-cell resolution. Now widely applied to cancer models, these assays offer new insights into tumour heterogeneity, which underlies cancer initiation, progression, and relapse. However, the large quantities of high-dimensional, noisy data produced by single-cell assays can complicate data analysis, obscuring biological signals with technical artifacts. In this review article, we outline the major challenges in analyzing single-cell cancer genomics data and survey the current computational tools available to tackle these. We further outline unsolved problems that we consider major opportunities for future methods development to help interpret the vast quantities of data being generated.


Subject(s)
Computational Biology/methods; Genome; Genomics/methods; Neoplasms/genetics; Single-Cell Analysis/methods; Computer Simulation; Humans
11.
J Pathol; 252(2): 201-214, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32686114

ABSTRACT

Endometrial carcinoma, the most common gynaecological cancer, develops from endometrial epithelium which is composed of secretory and ciliated cells. Pathologic classification is unreliable and there is a need for prognostic tools. We used single cell sequencing to study organoid model systems derived from normal endometrium to discover novel markers specific for endometrial ciliated or secretory cells. A marker of secretory cells (MPST) and several markers of ciliated cells (FAM92B, WDR16, and DYDC2) were validated by immunohistochemistry on organoids and tissue sections. We performed single cell sequencing on endometrial and ovarian tumours and found both secretory-like and ciliated-like tumour cells. We found that ciliated cell markers (DYDC2, CTH, FOXJ1, and p73) and the secretory cell marker MPST were expressed in endometrial tumours and positively correlated with disease-specific and overall survival of endometrial cancer patients. These findings suggest that expression of differentiation markers in tumours correlates with less aggressive disease, as would be expected for tumours that retain differentiation capacity, albeit cryptic in the case of ciliated cells. These markers could be used to improve the risk stratification of endometrial cancer patients, thereby improving their management. We further assessed whether consideration of MPST expression could refine the ProMiSE molecular classification system for endometrial tumours. We found that higher expression levels of MPST could be used to refine stratification of three of the four ProMiSE molecular subgroups, and that any level of MPST expression was able to significantly refine risk stratification of the copy number high subgroup which has the worst prognosis. Taken together, this shows that single cell sequencing of putative cells of origin has the potential to uncover novel biomarkers that could be used to guide management of cancers.
© 2020 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.


Subject(s)
Biomarkers, Tumor/analysis; Carcinoma, Endometrioid/pathology; Endometrial Neoplasms/pathology; Sequence Analysis, RNA/methods; Cell Differentiation; Female; Humans; Organoids; Transcriptome
12.
Genome Biol; 21(1): 31, 2020 Feb 07.
Article in English | MEDLINE | ID: mdl-32033589

ABSTRACT

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.


Subject(s)
Data Science/methods; Genomics/methods; RNA-Seq/methods; Single-Cell Analysis/methods; Animals; Humans
13.
Genome Biol; 20(1): 210, 2019 Oct 17.
Article in English | MEDLINE | ID: mdl-31623682

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying complex biological systems, such as tumor heterogeneity and tissue microenvironments. However, the sources of technical and biological variation in primary solid tumor tissues and patient-derived mouse xenografts for scRNA-seq are not well understood. RESULTS: We use low temperature (6 °C) protease and collagenase (37 °C) to identify the transcriptional signatures associated with tissue dissociation across a diverse scRNA-seq dataset comprising 155,165 cells from patient cancer tissues, patient-derived breast cancer xenografts, and cancer cell lines. We observe substantial variation in standard quality control metrics of cell viability across conditions and tissues. From the contrast between tissue protease dissociation at 37 °C or 6 °C, we observe that collagenase digestion results in a stress response. We derive a core gene set of 512 heat shock and stress response genes, including FOS and JUN, induced by collagenase (37 °C), which are minimized by dissociation with a cold active protease (6 °C). While induction of these genes was highly conserved across all cell types, cell type-specific responses to collagenase digestion were observed in patient tissues. CONCLUSIONS: The method and conditions of tumor dissociation influence cell yield and transcriptome state and are both tissue- and cell-type dependent. Interpretation of stress pathway expression differences in cancer single-cell studies, including components of surface immune recognition such as MHC class I, may be especially confounded. We define a core set of 512 genes that can assist with the identification of such effects in dissociated scRNA-seq experiments.


Subject(s)
Genomics/methods; Neoplasms/metabolism; Sequence Analysis, RNA; Single-Cell Analysis; Animals; Cold Temperature; Collagenases; Humans; Mice; Peptide Hydrolases; Stress, Physiological; Transcriptome
14.
Nat Methods; 16(10): 1007-1015, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31501550

ABSTRACT

Single-cell RNA sequencing has enabled the decomposition of complex tissues into functionally distinct cell types. Often, investigators wish to assign cells to cell types through unsupervised clustering followed by manual annotation or via 'mapping' to existing data. However, manual interpretation scales poorly to large datasets, mapping approaches require purified or pre-annotated data and both are prone to batch effects. To overcome these issues, we present CellAssign, a probabilistic model that leverages prior knowledge of cell-type marker genes to annotate single-cell RNA sequencing data into predefined or de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while controlling for batch and sample effects. We demonstrate the advantages of CellAssign through extensive simulations and analysis of tumor microenvironment composition in high-grade serous ovarian cancer and follicular lymphoma.


Subject(s)
Gene Expression Profiling; Lymphoma, Follicular/pathology; Probability; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Tumor Microenvironment; Humans; Lymphoma, Follicular/immunology
15.
Genome Biol; 20(1): 54, 2019 Mar 12.
Article in English | MEDLINE | ID: mdl-30866997

ABSTRACT

Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone.


Subject(s)
Biomarkers, Tumor/genetics; Cystadenocarcinoma, Serous/genetics; High-Throughput Nucleotide Sequencing/methods; Models, Statistical; Ovarian Neoplasms/genetics; Single-Cell Analysis/methods; Software; Triple Negative Breast Neoplasms/genetics; Animals; Clone Cells; Cystadenocarcinoma, Serous/pathology; Female; Humans; Mice, Inbred NOD; Mice, SCID; Ovarian Neoplasms/pathology; Triple Negative Breast Neoplasms/pathology; Tumor Cells, Cultured; Xenograft Model Antitumor Assays
16.
Bioinformatics; 35(1): 28-35, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-29939207

ABSTRACT

Motivation: Pseudotime estimation from single-cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results: Here we introduce an orthogonal Bayesian approach termed 'Ouija' that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify 'metastable' states (discrete cell types along the continuous trajectories) that recapitulate known cell types. Availability and implementation: An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods; Single-Cell Analysis; Software; Algorithms; Bayes Theorem; Computational Biology
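
Illustrative sketch for entry 16: the abstract says marker genes are modelled as switch-like or transient along the trajectory, each with an associated switch or peak time. The snippet writes down one plausible pair of mean-expression curves for those two behaviours, a sigmoid and a bell-shaped bump. The exact parametric forms, the parameter names, and the values are assumptions for illustration and are not taken from the Ouija or ouijaflow code.

```python
import math


def switch_like(t, peak_expression, strength, switch_time):
    """Sigmoidal mean expression: low before switch_time, high after (for strength > 0)."""
    return peak_expression / (1.0 + math.exp(-strength * (t - switch_time)))


def transient(t, peak_expression, peak_time, width):
    """Bell-shaped mean expression that rises and then falls around peak_time."""
    return peak_expression * math.exp(-((t - peak_time) ** 2) / (2.0 * width ** 2))


# Evaluate both hypothetical genes on a pseudotime grid from 0 to 1.
grid = [i / 10 for i in range(11)]
for t in grid:
    early_switch = switch_like(t, peak_expression=2.0, strength=10.0, switch_time=0.5)
    pulse = transient(t, peak_expression=4.0, peak_time=0.3, width=0.1)
    print(f"t={t:.1f}  switch={early_switch:.3f}  transient={pulse:.3f}")
```

Under such a parameterization, comparing the switch times or peak times of two genes is what allows genes, and not only cells, to be ordered along the trajectory.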
17.
Cell Stem Cell; 24(1): 93-106.e6, 2019 Jan 03.
Article in English | MEDLINE | ID: mdl-30503143

ABSTRACT

Induced pluripotent stem cell (iPSC)-derived dopamine neurons provide an opportunity to model Parkinson's disease (PD), but neuronal cultures are confounded by asynchronous and heterogeneous appearance of disease phenotypes in vitro. Using high-resolution, single-cell transcriptomic analyses of iPSC-derived dopamine neurons carrying the GBA-N370S PD risk variant, we identified a progressive axis of gene expression variation leading to endoplasmic reticulum stress. Pseudotime analysis of genes differentially expressed (DE) along this axis identified the transcriptional repressor histone deacetylase 4 (HDAC4) as an upstream regulator of disease progression. HDAC4 was mislocalized to the nucleus in PD iPSC-derived dopamine neurons and repressed genes early in the disease axis, leading to late deficits in protein homeostasis. Treatment of iPSC-derived dopamine neurons with HDAC4-modulating compounds upregulated genes early in the DE axis and corrected PD-related cellular phenotypes. Our study demonstrates how single-cell transcriptomics can exploit cellular heterogeneity to reveal disease mechanisms and identify therapeutic targets.


Subject(s)
Dopaminergic Neurons/pathology; Gene Expression Regulation; Histone Deacetylases/metabolism; Induced Pluripotent Stem Cells/pathology; Parkinson Disease/pathology; Repressor Proteins/metabolism; Single-Cell Analysis/methods; Disease Progression; Dopamine/metabolism; Dopaminergic Neurons/metabolism; Endoplasmic Reticulum Stress; Gene Expression Profiling; Glucosylceramidase/genetics; Histone Deacetylases/genetics; Humans; Induced Pluripotent Stem Cells/metabolism; Mutation; Parkinson Disease/genetics; Parkinson Disease/metabolism; Phenotype; Repressor Proteins/genetics; Transcriptome
18.
Nat Commun; 9(1): 2442, 2018 Jun 22.
Article in English | MEDLINE | ID: mdl-29934517

ABSTRACT

Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell 'omics and cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single cell genomics to wider applications.


Subject(s)
Gene Expression Profiling/methods; Genomics/methods; Models, Genetic; Algorithms; Datasets as Topic; Humans; Single-Cell Analysis; Time Factors
19.
Wellcome Open Res; 2: 19, 2017 Mar 15.
Article in English | MEDLINE | ID: mdl-28503665

ABSTRACT

Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical-Bayes like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

20.
Bioinformatics; 33(8): 1179-1186, 2017 Apr 15.
Article in English | MEDLINE | ID: mdl-28088763

ABSTRACT

Motivation: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability and Implementation: The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater . Contact: davis@ebi.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Programming Languages; Sequence Analysis, RNA/methods; Sequence Analysis, RNA/standards; Single-Cell Analysis/methods; Software; Cell Line; Humans; Principal Component Analysis; Quality Control; RNA/genetics; Statistics as Topic
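
Illustrative sketch for entry 20: the abstract describes quality control of scRNA-seq data before downstream analysis. The snippet computes three per-cell summaries that are commonly used for that purpose: total counts, number of detected genes, and the percentage of counts from a control gene set. It is plain Python and not the scater API (scater is an R/Bioconductor package); the function name, the use of a mitochondrial "MT-" prefix as the control set, and the tiny count matrix are assumptions for illustration.

```python
def per_cell_qc(counts, gene_names, control_prefix="MT-"):
    """Compute simple QC metrics for a cells x genes matrix given as a list of lists."""
    is_control = [name.startswith(control_prefix) for name in gene_names]
    metrics = []
    for cell in counts:
        total = sum(cell)
        detected = sum(1 for c in cell if c > 0)
        control = sum(c for c, flag in zip(cell, is_control) if flag)
        pct_control = 100.0 * control / total if total > 0 else 0.0
        metrics.append(
            {"total_counts": total, "genes_detected": detected, "pct_control": pct_control}
        )
    return metrics


# Hypothetical counts for two cells over three genes.
genes = ["ACTB", "GAPDH", "MT-CO1"]
matrix = [
    [10, 0, 10],  # half of this cell's counts come from the control gene
    [5, 5, 0],
]
qc = per_cell_qc(matrix, genes)
for row in qc:
    print(row)
```

Cells with very low total counts, few detected genes, or an unusually high control percentage are typical candidates for removal, with thresholds chosen per dataset.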
...