Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Nat Methods ; 20(8): 1196-1202, 2023 08.
Artigo em Inglês | MEDLINE | ID: mdl-37429993

RESUMO

Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.


Assuntos
Algoritmos , Benchmarking , Humanos , Animais , Camundongos , Análise por Conglomerados , RNA , Análise de Célula Única/métodos , Análise de Sequência de RNA/métodos , Perfilação da Expressão Gênica/métodos
2.
Biostatistics ; 23(4): 1150-1164, 2022 10 14.
Artigo em Inglês | MEDLINE | ID: mdl-35770795

RESUMO

Single-cell RNA sequencing (scRNA-seq) quantifies gene expression for individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the annotation of cells into known cell types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type annotation methods. We find limitations with current approaches due to the reliance on known marker genes or from overfitting because of systematic differences, or batch effects, between studies. Here, we present a statistical approach that leverages public data sets to combine information across thousands of genes, uses a latent variable model to define cell-type-specific barcodes and account for batch effect variation, and probabilistically annotates cell-type identity from a reference of known cell types. The barcoding approach also provides a new way to discover marker genes. Using a range of data sets, including those generated to represent imperfect real-world reference data, we demonstrate that our approach substantially outperforms current reference-based methods, particularly when predicting across studies.


Assuntos
Perfilação da Expressão Gênica , Análise de Célula Única , Expressão Gênica , Perfilação da Expressão Gênica/métodos , Humanos , RNA-Seq , Análise de Sequência de RNA/métodos , Software
3.
Cancer Epidemiol Biomarkers Prev ; 33(1): 158-169, 2024 01 09.
Artigo em Inglês | MEDLINE | ID: mdl-37943166

RESUMO

BACKGROUND: KRAS is among the most commonly mutated oncogenes in cancer, and previous studies have shown associations with survival in many cancer contexts. Evidence from both clinical observations and mouse experiments further suggests that these associations are allele- and tissue-specific. These findings motivate using clinical data to understand gene interactions and clinical covariates within different alleles and tissues. METHODS: We analyze genomic and clinical data from the AACR Project GENIE Biopharma Collaborative for samples from lung, colorectal, and pancreatic cancers. For each of these cancer types, we report epidemiological associations for different KRAS alleles, apply principal component analysis (PCA) to discover groups of genes co-mutated with KRAS, and identify distinct clusters of patient profiles with implications for survival. RESULTS: KRAS mutations were associated with inferior survival in lung, colon, and pancreas, although the specific mutations implicated varied by disease. Tissue- and allele-specific associations with smoking, sex, age, and race were found. Tissue-specific genetic interactions with KRAS were identified by PCA, which were clustered to produce five, four, and two patient profiles in lung, colon, and pancreas. Membership in these profiles was associated with survival in all three cancer types. CONCLUSIONS: KRAS mutations have tissue- and allele-specific associations with inferior survival, clinical covariates, and genetic interactions. IMPACT: Our results provide greater insight into the tissue- and allele-specific associations with KRAS mutations and identify clusters of patients that are associated with survival and clinical attributes from combinations of genetic interactions with KRAS mutations.


Assuntos
Neoplasias Pulmonares , Neoplasias Pancreáticas , Animais , Humanos , Pulmão , Neoplasias Pulmonares/genética , Mutação , Pâncreas , Neoplasias Pancreáticas/genética , Proteínas Proto-Oncogênicas p21(ras)/genética
4.
Ann Appl Stat ; 17(3): 2212-2235, 2023 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-37786772

RESUMO

Mutations in the BRCA1 and BRCA2 genes are known to be highly associated with breast cancer. Identifying both shared and unique transcript expression patterns in blood samples from these groups can shed insight into if and how the disease mechanisms differ among individuals by mutation status, but this is challenging in the high-dimensional setting. A recent method, Bayesian Multi-Study Factor Analysis (BMSFA), identifies latent factors common to all studies (or equivalently, groups) and latent factors specific to individual studies. However, BMSFA does not allow for factors shared by more than one but less than all studies. This is critical in our context, as we may expect some but not all signals to be shared by BRCA1-and BRCA2-mutation carriers but not necessarily other high-risk groups. We extend BMSFA by introducing a new method, Tetris, for Bayesian combinatorial multi-study factor analysis, which identifies latent factors that any combination of studies or groups can share. We model the subsets of studies that share latent factors with an Indian Buffet Process, and offer a way to summarize uncertainty in the sharing patterns using credible balls. We test our method with an extensive range of simulations, and showcase its utility not only in dimension reduction but also in covariance estimation. When applied to transcript expression data from high-risk families grouped by mutation status, Tetris reveals the features and pathways characterizing each group and the sharing patterns among them. Finally, we further extend Tetris to discover groupings of samples when group labels are not provided, which can elucidate additional structure in these data.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA