1.
Biom J ; 66(2): e2300060, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38351217

ABSTRACT

When modeling competing risks (CR) survival data, several techniques have been proposed in both the statistical and machine learning literature. State-of-the-art methods have extended classical approaches with more flexible assumptions that can improve predictive performance, allow high-dimensional data and missing values, among others. Despite this, modern approaches have not been widely employed in applied settings. This article aims to aid the uptake of such methods by providing a condensed compendium of CR survival methods with a unified notation and interpretation across approaches. We highlight available software and, when possible, demonstrate their usage via reproducible R vignettes. Moreover, we discuss two major concerns that can affect benchmark studies in this context: the choice of performance metrics and reproducibility.


Subjects
Machine Learning; Models, Statistical; Reproducibility of Results
2.
J R Soc Med ; : 1410768231223584, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38345538

ABSTRACT

OBJECTIVES: We undertook a national analysis to characterise and identify risk factors for acute respiratory infections (ARIs) resulting in hospitalisation during the winter period in Scotland. DESIGN: A population-based retrospective cohort analysis. SETTING: Scotland. PARTICIPANTS: The study involved 5.4 million residents in Scotland. MAIN OUTCOME MEASURES: Cox proportional hazard models were used to estimate adjusted hazard ratios (aHRs) and 95% confidence intervals (CIs) for the association between risk factors and ARI hospitalisation. RESULTS: Between 1 September 2022 and 31 January 2023, there were 22,284 (10.9% of 203,549 with any emergency hospitalisation) ARI hospitalisations (1759 in children and 20,525 in adults) in Scotland. Compared with the reference group of children aged 6-17 years, the risk of ARI hospitalisation was higher in children aged 3-5 years (aHR = 4.55; 95% CI: 4.11-5.04). Compared with those aged 25-29 years, the risk of ARI hospitalisation was highest among the oldest adults aged ≥80 years (aHR = 7.86; 95% CI: 7.06-8.76). Adults from more deprived areas (most deprived vs. least deprived, aHR = 1.64; 95% CI: 1.57-1.72), with existing health conditions (≥5 vs. 0 health conditions, aHR = 4.84; 95% CI: 4.53-5.18) or with history of all-cause emergency admissions (≥6 vs. 0 previous emergency admissions, aHR = 7.53; 95% CI: 5.48-10.35) were at a higher risk of ARI hospitalisations. The risk increased by the number of existing health conditions and previous emergency admission. Similar associations were seen in children. CONCLUSIONS: Younger children, older adults, those from more deprived backgrounds and individuals with greater numbers of pre-existing conditions and previous emergency admission were at increased risk for winter hospitalisations for ARI.
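The adjusted hazard ratios above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming the intervals are the usual Wald intervals for a Cox model, that is, symmetric on the log scale (estimate exp(β), limits exp(β ± 1.96·SE)). Under that assumption the point estimate should sit close to the geometric mean of the interval limits. The function name and the dictionary labels are illustrative and do not come from the article.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(lower, upper):
    """Recover the implied HR and SE of log(HR) from CI limits,
    assuming a Wald interval that is symmetric on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    implied_hr = math.exp((log_lo + log_hi) / 2)   # geometric mean of the limits
    implied_se = (log_hi - log_lo) / (2 * Z_95)    # half-width on the log scale
    return implied_hr, implied_se


# Values quoted in the abstract: (reported aHR, lower limit, upper limit)
reported = {
    "children 3-5 vs 6-17 years": (4.55, 4.11, 5.04),
    "adults >=80 vs 25-29 years": (7.86, 7.06, 8.76),
    "most vs least deprived": (1.64, 1.57, 1.72),
}

for label, (hr, lo, hi) in reported.items():
    implied_hr, se = log_scale_summary(lo, hi)
    print(f"{label}: reported {hr:.2f}, implied {implied_hr:.2f}, SE(log HR) ~ {se:.3f}")
```

Small discrepancies between the reported and implied values are expected, because the published limits are rounded to two decimal places.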

3.
Genome Biol ; 24(1): 278, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38053194

ABSTRACT

BACKGROUND: Epigenetic scores (EpiScores) can provide biomarkers of lifestyle and disease risk. Projecting new datasets onto a reference panel is challenging due to separation of technical and biological variation with array data. Normalisation can standardise data distributions but may also remove population-level biological variation. RESULTS: We compare two birth cohorts (Lothian Birth Cohorts of 1921 and 1936; n = 387 for LBC1921 and n = 498 for LBC1936) with blood-based DNA methylation assessed at the same chronological age (79 years) and processed in the same lab but in different years and experimental batches. We examine the effect of 16 normalisation methods on a novel BMI EpiScore (trained in an external cohort, n = 18,413), and Horvath's pan-tissue DNA methylation age, when the cohorts are normalised separately and together. The BMI EpiScore explains a maximum variance of R² = 24.5% in BMI in LBC1936 (SWAN normalisation). Although there are cross-cohort R² differences, the normalisation method makes a minimal difference to within-cohort estimates. Conversely, a range of absolute differences are seen for individual-level EpiScore estimates for BMI and age when cohorts are normalised separately versus together. While within-array methods result in identical EpiScores whether a cohort is normalised on its own or together with the second dataset, a range of differences is observed for between-array methods. CONCLUSIONS: Normalisation methods returning similar EpiScores, whether cohorts are analysed separately or together, will minimise technical variation when projecting new data onto a reference panel. These methods are important for cases where raw data is unavailable and joint normalisation of cohorts is computationally expensive.


Subjects
DNA Methylation; Epigenomics; Humans; Aged; Biomarkers; Epigenesis, Genetic
4.
Acta Neuropathol Commun ; 11(1): 84, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217978

ABSTRACT

The myelinated white matter tracts of the central nervous system (CNS) are essential for fast transmission of electrical impulses and are often differentially affected in human neurodegenerative diseases across CNS region, age and sex. We hypothesize that this selective vulnerability is underpinned by physiological variation in white matter glia. Using single nucleus RNA sequencing of human post-mortem white matter samples from the brain, cerebellum and spinal cord and subsequent tissue-based validation we found substantial glial heterogeneity with tissue region: we identified region-specific oligodendrocyte precursor cells (OPCs) that retain developmental origin markers into adulthood, distinguishing them from mouse OPCs. Region-specific OPCs give rise to similar oligodendrocyte populations, however spinal cord oligodendrocytes exhibit markers such as SKAP2 which are associated with increased myelin production and we found a spinal cord selective population particularly equipped for producing long and thick myelin sheaths based on the expression of genes/proteins such as HCN2. Spinal cord microglia exhibit a more activated phenotype compared to brain microglia, suggesting that the spinal cord is a more pro-inflammatory environment, a difference that intensifies with age. Astrocyte gene expression correlates strongly with CNS region, however, astrocytes do not show a more activated state with region or age. Across all glia, sex differences are subtle but the consistent increased expression of protein-folding genes in male donors hints at pathways that may contribute to sex differences in disease susceptibility. These findings are essential to consider for understanding selective CNS pathologies and developing tailored therapeutic strategies.


Subjects
Neuroglia; White Matter; Humans; Female; Male; Mice; Animals; Neuroglia/metabolism; Spinal Cord/pathology; Myelin Sheath/metabolism; Oligodendroglia/pathology
5.
Clin Gastroenterol Hepatol ; 21(11): 2918-2927.e6, 2023 10.
Article in English | MEDLINE | ID: mdl-37004971

ABSTRACT

BACKGROUND AND AIMS: The progressive nature of Crohn's disease is highly variable and hard to predict. In addition, symptoms correlate poorly with mucosal inflammation. There is therefore an urgent need to better characterize the heterogeneity of disease trajectories in Crohn's disease by utilizing objective markers of inflammation. We aimed to better understand this heterogeneity by clustering Crohn's disease patients with similar longitudinal fecal calprotectin profiles. METHODS: We performed a retrospective cohort study at the Edinburgh IBD Unit, a tertiary referral center, and used latent class mixed models to cluster Crohn's disease subjects using fecal calprotectin observed within 5 years of diagnosis. Information criteria, alluvial plots, and cluster trajectories were used to decide the optimal number of clusters. Chi-square test, Fisher's exact test, and analysis of variance were used to test for associations with variables commonly assessed at diagnosis. RESULTS: Our study cohort comprised 356 patients with newly diagnosed Crohn's disease and 2856 fecal calprotectin measurements taken within 5 years of diagnosis (median 7 per subject). Four distinct clusters were identified by characteristic calprotectin profiles: a cluster with consistently high fecal calprotectin and 3 clusters characterized by different downward longitudinal trends. Cluster membership was significantly associated with smoking (P = .015), upper gastrointestinal involvement (P < .001), and early biologic therapy (P < .001). CONCLUSIONS: Our analysis demonstrates a novel approach to characterizing the heterogeneity of Crohn's disease by using fecal calprotectin. The group profiles do not simply reflect different treatment regimens and do not mirror classical disease progression endpoints.


Subjects
Crohn Disease; Humans; Crohn Disease/diagnosis; Crohn Disease/therapy; Biomarkers; Retrospective Studies; Leukocyte L1 Antigen Complex; Disease Progression; Inflammation; Feces; Severity of Illness Index
6.
Nat Aging ; 3(4): 450-458, 2023 04.
Article in English | MEDLINE | ID: mdl-37117793

ABSTRACT

Type 2 diabetes mellitus (T2D) presents a major health and economic burden that could be alleviated with improved early prediction and intervention. While standard risk factors have shown good predictive performance, we show that the use of blood-based DNA methylation information leads to a significant improvement in the prediction of 10-year T2D incidence risk. Previous studies have been largely constrained by linear assumptions, the use of cytosine-guanine pairs one-at-a-time and binary outcomes. We present a flexible approach (via an R package, MethylPipeR) based on a range of linear and tree-ensemble models that incorporate time-to-event data for prediction. Using the Generation Scotland cohort (training set: 374 cases, 9,461 controls; test set: 252 cases, 4,526 controls) our best-performing model (area under the receiver operating characteristic curve (AUC) = 0.872, area under the precision-recall curve (PRAUC) = 0.302) showed notable improvement in 10-year onset prediction beyond standard risk factors (AUC = 0.839, PRAUC = 0.227). Replication was observed in the German-based KORA study (n = 1,451, 142 cases, P = 1.6 × 10⁻⁵).


Subjects
Diabetes Mellitus, Type 2; Humans; Diabetes Mellitus, Type 2/diagnosis; Cohort Studies; DNA Methylation/genetics; Predictive Value of Tests; Risk Factors
7.
Genome Med ; 15(1): 12, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36855161

ABSTRACT

BACKGROUND: Epigenetic clocks can track both chronological age (cAge) and biological age (bAge). The latter is typically defined by physiological biomarkers and risk of adverse health outcomes, including all-cause mortality. As cohort sample sizes increase, estimates of cAge and bAge become more precise. Here, we aim to develop accurate epigenetic predictors of cAge and bAge, whilst improving our understanding of their epigenomic architecture. METHODS: First, we perform large-scale (N = 18,413) epigenome-wide association studies (EWAS) of chronological age and all-cause mortality. Next, to create a cAge predictor, we use methylation data from 24,674 participants from the Generation Scotland study, the Lothian Birth Cohorts (LBC) of 1921 and 1936, and 8 other cohorts with publicly available data. In addition, we train a predictor of time to all-cause mortality as a proxy for bAge using the Generation Scotland cohort (1214 observed deaths). For this purpose, we use epigenetic surrogates (EpiScores) for 109 plasma proteins and the 8 component parts of GrimAge, one of the current best epigenetic predictors of survival. We test this bAge predictor in four external cohorts (LBC1921, LBC1936, the Framingham Heart Study and the Women's Health Initiative study). RESULTS: Through the inclusion of linear and non-linear age-CpG associations from the EWAS, feature pre-selection in advance of elastic net regression, and a leave-one-cohort-out (LOCO) cross-validation framework, we obtain cAge prediction with a median absolute error equal to 2.3 years. Our bAge predictor was found to slightly outperform GrimAge in terms of the strength of its association to survival (HR for GrimAge = 1.47 [1.40, 1.54] with p = 1.08 × 10⁻⁵², and HR for bAge = 1.52 [1.44, 1.59] with p = 2.20 × 10⁻⁶⁰). Finally, we introduce MethylBrowsR, an online tool to visualise epigenome-wide CpG-age associations.
CONCLUSIONS: The integration of multiple large datasets, EpiScores, non-linear DNAm effects, and new approaches to feature selection has facilitated improvements to the blood-based epigenetic prediction of biological and chronological age.


Subjects
Epigenome; Epigenomics; Humans; Female; Research Design; Aging/genetics; Epigenesis, Genetic
8.
J Am Coll Cardiol ; 81(2): 156-168, 2023 01 17.
Article in English | MEDLINE | ID: mdl-36631210

ABSTRACT

BACKGROUND: Despite poor cardiovascular outcomes, there are no dedicated, validated risk stratification tools to guide investigation or treatment in type 2 myocardial infarction. OBJECTIVES: The goal of this study was to derive and validate a risk stratification tool for the prediction of death or future myocardial infarction in patients with type 2 myocardial infarction. METHODS: The T2-risk score was developed in a prospective multicenter cohort of consecutive patients with type 2 myocardial infarction. Cox proportional hazards models were constructed for the primary outcome of myocardial infarction or death at 1 year using variables selected a priori based on clinical importance. Discrimination was assessed by area under the receiver operating characteristic curve (AUC). Calibration was investigated graphically. The tool was validated in a single-center cohort of consecutive patients and in a multicenter cohort study from sites across Europe. RESULTS: There were 1,121, 250, and 253 patients in the derivation, single-center, and multicenter validation cohorts, with the primary outcome occurring in 27% (297 of 1,121), 26% (66 of 250), and 14% (35 of 253) of patients, respectively. The T2-risk score incorporating age, ischemic heart disease, heart failure, diabetes mellitus, myocardial ischemia on electrocardiogram, heart rate, anemia, estimated glomerular filtration rate, and maximal cardiac troponin concentration had good discrimination (AUC: 0.76; 95% CI: 0.73-0.79) for the primary outcome and was well calibrated. Discrimination was similar in the consecutive patient (AUC: 0.83; 95% CI: 0.77-0.88) and multicenter (AUC: 0.74; 95% CI: 0.64-0.83) cohorts. T2-risk provided improved discrimination over the Global Registry of Acute Coronary Events 2.0 risk score in all cohorts.
CONCLUSIONS: The T2-risk score performed well in different health care settings and could help clinicians to prognosticate, as well as target investigation and preventative therapies more effectively. (High-Sensitivity Troponin in the Evaluation of Patients With Suspected Acute Coronary Syndrome [High-STEACS]; NCT01852123).


Subjects
Anterior Wall Myocardial Infarction; Diabetes Mellitus, Type 2; Myocardial Infarction; Humans; Risk Assessment; Cohort Studies; Prospective Studies; Prognosis; Predictive Value of Tests; Troponin I; Myocardial Infarction/diagnosis; Myocardial Infarction/epidemiology; Myocardial Infarction/drug therapy; Risk Factors
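To make the discrimination measure in this entry concrete, the sketch below computes an AUC as the proportion of (event, non-event) patient pairs in which the patient with the event has the higher risk score, with ties counted as one half. This is a generic illustration for a binary outcome at a fixed horizon and ignores censoring; it is not the T2-risk score itself, and the scores and outcomes are invented for the example.

```python
def auc(scores, outcomes):
    """AUC as the probability that a randomly chosen patient with the event
    has a higher risk score than one without (ties count one half)."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical risk scores and 1-year outcomes (1 = death or myocardial infarction).
risk = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
event = [0, 0, 1, 0, 0, 1, 1, 1]
print(auc(risk, event))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to perfect separation of patients with and without the outcome.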
9.
PLoS Comput Biol ; 18(6): e1010163, 2022 06.
Article in English | MEDLINE | ID: mdl-35727848

ABSTRACT

Single-cell multi-omics assays offer unprecedented opportunities to explore epigenetic regulation at cellular level. However, high levels of technical noise and data sparsity frequently lead to a lack of statistical power in correlative analyses, identifying very few, if any, significant associations between different molecular layers. Here we propose SCRaPL, a novel computational tool that increases power by carefully modelling noise in the experimental systems. We show on real and simulated multi-omics single-cell data sets that SCRaPL achieves higher sensitivity and better robustness in identifying correlations, while maintaining a similar level of false positives as standard analyses based on Pearson and Spearman correlation.


Subjects
Epigenesis, Genetic; Bayes Theorem
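For reference, the two standard analyses that the abstract uses as a baseline can be sketched in a few lines. The code below implements plain Pearson and Spearman correlation only; it does not implement SCRaPL or any noise modelling. The per-cell values are hypothetical and chosen so that the relationship is monotone but non-linear.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-cell values for one gene: methylation rate vs. expression.
methylation = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
expression = [math.exp(-5 * m) for m in methylation]  # monotone but non-linear
print(pearson(methylation, expression), spearman(methylation, expression))
```

Because Spearman works on ranks, it reports a perfect negative association here, while Pearson is weaker in magnitude since the relationship is not linear. Neither accounts for measurement noise, which is the gap the abstract describes.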
10.
F1000Res ; 11: 59, 2022.
Article in English | MEDLINE | ID: mdl-38779464

ABSTRACT

Cell-to-cell gene expression variability is an inherent feature of complex biological systems, such as immunity and development. Single-cell RNA sequencing is a powerful tool to quantify this heterogeneity, but it is prone to strong technical noise. In this article, we describe a step-by-step computational workflow that uses the BASiCS Bioconductor package to robustly quantify expression variability within and between known groups of cells (such as experimental conditions or cell types). BASiCS uses an integrated framework for data normalisation, technical noise quantification and downstream analyses, propagating statistical uncertainty across these steps. Within a single seemingly homogeneous cell population, BASiCS can identify highly variable genes that exhibit strong heterogeneity as well as lowly variable genes with stable expression. BASiCS also uses a probabilistic decision rule to identify changes in expression variability between cell populations, whilst avoiding confounding effects related to differences in technical noise or in overall abundance. Using a publicly available dataset, we guide users through a complete pipeline that includes preliminary steps for quality control, as well as data exploration using the scater and scran Bioconductor packages. The workflow is accompanied by a Docker image that ensures the reproducibility of our results.

11.
Genome Biol ; 22(1): 114, 2021 04 20.
Article in English | MEDLINE | ID: mdl-33879195

ABSTRACT

High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression. scMET is available at https://github.com/andreaskapou/scMET.


Subjects
Bayes Theorem; DNA Methylation; Epigenesis, Genetic; Epigenomics/methods; Genetic Heterogeneity; Single-Cell Analysis/methods; Software; Algorithms; Computational Biology/methods
12.
Genome Biol ; 21(1): 31, 2020 02 07.
Article in English | MEDLINE | ID: mdl-32033589

ABSTRACT

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.


Subjects
Data Science/methods; Genomics/methods; RNA-Seq/methods; Single-Cell Analysis/methods; Animals; Humans
13.
Circulation ; 141(3): 161-171, 2020 01 21.
Article in English | MEDLINE | ID: mdl-31587565

ABSTRACT

BACKGROUND: The introduction of more sensitive cardiac troponin assays has led to increased recognition of myocardial injury in acute illnesses other than acute coronary syndrome. The Universal Definition of Myocardial Infarction recommends high-sensitivity cardiac troponin testing and classification of patients with myocardial injury based on pathogenesis, but the clinical implications of implementing this guideline are not well understood. METHODS: In a stepped-wedge cluster randomized, controlled trial, we implemented a high-sensitivity cardiac troponin assay and the recommendations of the Universal Definition in 48 282 consecutive patients with suspected acute coronary syndrome. In a prespecified secondary analysis, we compared the primary outcome of myocardial infarction or cardiovascular death and secondary outcome of noncardiovascular death at 1 year across diagnostic categories. RESULTS: Implementation increased the diagnosis of type 1 myocardial infarction by 11% (510/4471), type 2 myocardial infarction by 22% (205/916), and acute and chronic myocardial injury by 36% (443/1233) and 43% (389/898), respectively. Compared with those without myocardial injury, the rate of the primary outcome was highest in those with type 1 myocardial infarction (cause-specific hazard ratio [HR] 5.64 [95% CI, 5.12-6.22]), but was similar across diagnostic categories, whereas noncardiovascular deaths were highest in those with acute myocardial injury (cause specific HR 2.65 [95% CI, 2.33-3.01]). Despite modest increases in antiplatelet therapy and coronary revascularization after implementation in patients with type 1 myocardial infarction, the primary outcome was unchanged (cause specific HR 1.00 [95% CI, 0.82-1.21]). Increased recognition of type 2 myocardial infarction and myocardial injury did not lead to changes in investigation, treatment or outcomes. 
CONCLUSIONS: Implementation of high-sensitivity cardiac troponin assays and the recommendations of the Universal Definition of Myocardial Infarction identified patients at high-risk of cardiovascular and noncardiovascular events but was not associated with consistent increases in treatment or improved outcomes. Trials of secondary prevention are urgently required to determine whether this risk is modifiable in patients without type 1 myocardial infarction. CLINICAL TRIAL REGISTRATION: https://www.clinicaltrials.gov. Unique identifier: NCT01852123.


Subjects
Myocardial Infarction/blood; Myocardial Infarction/diagnosis; Troponin I/metabolism; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Myocardial Infarction/classification; Predictive Value of Tests; Risk Assessment
16.
Cell Syst ; 7(3): 284-294.e12, 2018 09 26.
Article in English | MEDLINE | ID: mdl-30172840

ABSTRACT

Cell-to-cell transcriptional variability in otherwise homogeneous cell populations plays an important role in tissue function and development. Single-cell RNA sequencing can characterize this variability in a transcriptome-wide manner. However, technical variation and the confounding between variability and mean expression estimates hinder meaningful comparison of expression variability between cell populations. To address this problem, we introduce an analysis approach that extends the BASiCS statistical framework to derive a residual measure of variability that is not confounded by mean expression. This includes a robust procedure for quantifying technical noise in experiments where technical spike-in molecules are not available. We illustrate how our method provides biological insight into the dynamics of cell-to-cell expression variability, highlighting a synchronization of biosynthetic machinery components in immune cells upon activation. In contrast to the uniform up-regulation of the biosynthetic machinery, CD4+ T cells show heterogeneous up-regulation of immune-related and lineage-defining genes during activation and differentiation.


Subjects
Biological Variation, Population; CD4-Positive T-Lymphocytes/physiology; Models, Theoretical; Sequence Analysis, RNA/methods; Single-Cell Analysis; Animals; Cell Differentiation/genetics; Cell Lineage/genetics; Computer Simulation; Gene Expression Regulation; Immunity/genetics; Lymphocyte Activation/genetics; Mice; Mice, Inbred C57BL; Transcriptome
17.
Nat Methods ; 14(6): 565-571, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28504683

ABSTRACT

Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing/standards; RNA/genetics; Sequence Analysis, RNA/standards; Single-Cell Analysis/standards; Transcriptome/genetics; Data Interpretation, Statistical; High-Throughput Nucleotide Sequencing/methods; Reference Values
18.
Science ; 355(6332): 1433-1436, 2017 03 31.
Article in English | MEDLINE | ID: mdl-28360329

ABSTRACT

Aging is characterized by progressive loss of physiological and cellular functions, but the molecular basis of this decline remains unclear. We explored how aging affects transcriptional dynamics using single-cell RNA sequencing of unstimulated and stimulated naïve and effector memory CD4+ T cells from young and old mice from two divergent species. In young animals, immunological activation drives a conserved transcriptomic switch, resulting in tightly controlled gene expression characterized by a strong up-regulation of a core activation program, coupled with a decrease in cell-to-cell variability. Aging perturbed the activation of this core program and increased expression heterogeneity across populations of cells in both species. These discoveries suggest that increased cell-to-cell transcriptional variability will be a hallmark feature of aging across most, if not all, mammalian tissues.


Subjects
Aging/genetics; Aging/immunology; CD4-Positive T-Lymphocytes/immunology; Immunologic Memory/genetics; Transcriptome; Animals; Cellular Senescence/genetics; Cellular Senescence/immunology; Genetic Variation; Lymphocyte Activation/genetics; Male; Mice; Mice, Inbred C57BL; Receptors, Antigen, T-Cell/metabolism; Sequence Analysis, RNA; Single-Cell Analysis
19.
Genome Biol ; 17: 70, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-27083558

ABSTRACT

Traditional differential expression tools are limited to detecting changes in overall expression, and fail to uncover the rich information provided by single-cell level data sets. We present a Bayesian hierarchical model that builds upon BASiCS to study changes that lie beyond comparisons of means, incorporating built-in normalization and quantifying technical artifacts by borrowing information from spike-in genes. Using a probabilistic approach, we highlight genes undergoing changes in cell-to-cell heterogeneity but whose overall expression remains unchanged. Control experiments validate our method's performance and a case study suggests that novel biological insights can be revealed. Our method is implemented in R and available at https://github.com/catavallejos/BASiCS.


Subjects
Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Animals; Bayes Theorem; Genetic Heterogeneity; Humans; Mice; Mouse Embryonic Stem Cells/cytology; Web Browser
20.
PLoS Comput Biol ; 11(6): e1004333, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26107944

ABSTRACT

Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse embryonic stem cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; RNA, Messenger/analysis; Single-Cell Analysis/methods; Animals; Bayes Theorem; Embryonic Stem Cells/metabolism; Embryonic Stem Cells/physiology; Mice; Oligonucleotide Array Sequence Analysis; RNA, Messenger/genetics; RNA, Messenger/metabolism; Reproducibility of Results
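The detection criterion described in this entry rests on tail posterior probabilities. The following is a minimal sketch of that general idea, assuming posterior draws of each gene's biological variance contribution (a fraction between 0 and 1) are already available from some sampler. It does not call the BASiCS package; the function names, thresholds, gene labels and draws are all hypothetical.

```python
def tail_probability(draws, threshold, upper=True):
    """Monte Carlo estimate of P(parameter > threshold | data),
    or of P(parameter < threshold | data) when upper is False."""
    if upper:
        hits = sum(d > threshold for d in draws)
    else:
        hits = sum(d < threshold for d in draws)
    return hits / len(draws)


def classify_gene(draws, high=0.6, low=0.4, evidence=0.8):
    """Label a gene when a tail posterior probability exceeds the evidence threshold.
    All three thresholds are illustrative choices, not values from the article."""
    if tail_probability(draws, high, upper=True) > evidence:
        return "highly variable"
    if tail_probability(draws, low, upper=False) > evidence:
        return "lowly variable"
    return "not classified"


# Hypothetical posterior draws of the biological variance contribution per gene.
posterior_draws = {
    "gene_a": [0.82, 0.77, 0.91, 0.68, 0.85, 0.79, 0.88, 0.73, 0.90, 0.81],
    "gene_b": [0.12, 0.20, 0.09, 0.15, 0.31, 0.18, 0.11, 0.25, 0.14, 0.22],
    "gene_c": [0.45, 0.62, 0.38, 0.55, 0.71, 0.49, 0.58, 0.41, 0.66, 0.52],
}

for gene, draws in posterior_draws.items():
    print(gene, classify_gene(draws))
```

A gene whose draws straddle both thresholds, like the third one here, is left unclassified, which reflects posterior uncertainty rather than a point estimate alone.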