ABSTRACT
Cell-type annotation is a critical step in single-cell data analysis. As cell annotation methods have proliferated, they need to be evaluated so that researchers can use them effectively. Reference datasets are essential for such evaluation, but the cell labels of current reference datasets come mainly from computational methods, which may carry computational biases and may not reflect the actual cell types. This study first constructed an experimentally labeled single-cell dataset of immune cell subtypes from a single batch and then systematically evaluated 18 cell annotation methods. We assessed these methods under five scenarios: intra-dataset validation, immune cell-subtype validation, unsupervised clustering, inter-dataset annotation, and unknown cell-type prediction. Accuracy and the adjusted Rand index (ARI) served as evaluation metrics. SVM, scBERT, and scDeepSort were the best-performing supervised methods. Seurat was the best-performing unsupervised clustering method, but it could not fully fit the actual cell-type distribution. Our results indicate that experimentally labeled immune cell-subtype datasets reveal the deficiencies of unsupervised clustering methods and provide new dataset support for supervised methods.
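The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors computed them, so the function names `accuracy` and `adjusted_rand_index` and the toy labels below are assumptions made for illustration only; the ARI follows the standard pair-counting definition.

```python
from collections import Counter
from math import comb


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    """ARI between a reference partition and a clustering (pair-counting form)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions trivially agree
    # Pairs of cells grouped together in both partitions, in each one alone.
    together_both = sum(comb(c, 2) for c in Counter(zip(true_labels, cluster_ids)).values())
    together_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    together_clust = sum(comb(c, 2) for c in Counter(cluster_ids).values())
    expected = together_true * together_clust / total_pairs
    maximum = (together_true + together_clust) / 2
    if maximum == expected:
        return 1.0  # both partitions are all-singletons or a single group
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: reference cell types, supervised predictions, cluster IDs.
reference = ["T", "T", "B", "B", "NK", "NK"]
predicted = ["T", "T", "B", "NK", "NK", "NK"]
clusters = [2, 2, 0, 0, 1, 1]

print(accuracy(reference, predicted))
print(adjusted_rand_index(reference, clusters))
```

Accuracy requires predicted labels that share the reference vocabulary, so it suits the supervised scenarios, whereas ARI compares groupings regardless of how clusters are named, which is why it fits the unsupervised clustering scenario.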
Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , Computational Biology/methods , Molecular Sequence Annotation , RNA-Seq/methods , Single-Cell Gene Expression Analysis
ABSTRACT
BACKGROUND: Whole-exome and targeted cancer gene panel sequencing by next-generation sequencing is widely used to assist individualized treatment decisions. Multiple algorithms have been developed to estimate DNA copy numbers from sequencing data, making a comprehensive, genome-wide view of chromosomal integrity possible. We aimed to classify gastric cancers by chromosomal integrity to guide personalized therapy. METHODS: We investigated copy number variations (CNVs) across the entire genome of 124 gastric carcinomas via exome or targeted sequencing. Based on the CNV results, chromosomal integrity was classified as chromosomal stability (CS), chromosomal instability (CIN), or an intermediate state (CIN/CS). Chromosomal integrity was then correlated with molecular features and clinical characteristics. RESULTS: According to their state of chromosomal integrity, gastric carcinomas can be stratified into two cohorts: CS and CIN. CIN status was significantly associated with TP53 mutation, but not with RB1, phosphatase and tensin homolog (PTEN), or other reported DNA damage repair genes. The mutation frequency of the TP53 gene was highly relevant. Our study provides initial evidence of the clinical significance of chromosomal integrity: CIN patients were prone to HER2-positive disease and mucinous adenocarcinoma, whereas CS patients had diffuse-subtype, poorly differentiated tumors but longer overall survival. CONCLUSIONS: We classified gastric carcinomas into two states of chromosomal integrity with clinical implications. This dichotomy is applicable to clinical translation. We propose that classifying gastric cancers by chromosomal integrity could enable personalized therapy and may benefit patient stratification in future clinical trials.
Subjects
Mucinous Adenocarcinoma , Stomach Neoplasms , Chromosomal Instability/genetics , DNA Copy Number Variations/genetics , Humans , Mutation , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology
ABSTRACT
Circular RNA (circRNA) is a distinct, circular form of long non-coding RNA (lncRNA) with specific roles in transcriptional regulation and multiple biological processes. Distinguishing circRNAs from other lncRNAs is necessary for research in this area. In this study, we designed an attention-based multi-instance learning (MIL) network architecture that takes raw sequences as input, learns the sparse features of RNA sequences, and performs the circRNA identification task. The model outperformed state-of-the-art models. Moreover, after validating the effectiveness of the attention mechanism on a handwritten digit dataset, we obtained the key sequence loci underlying circRNA recognition from the corresponding attention scores. Motif enrichment analysis then identified some of the key motifs for circRNA formation. In conclusion, we designed a deep learning network architecture suited to learning gene sequences with sparse features and applied it to the circRNA identification task; the model has strong representation capability for indicating some key loci.
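How attention scores can point to key loci may be easier to see in a small sketch of attention-based MIL pooling. The abstract gives no architectural details, so this uses a commonly used formulation (score each instance with a tanh projection, normalize with softmax, take the weighted sum) as an assumption; the function name `attention_pool`, the parameters `V` and `w`, and the toy values are all hypothetical, and in a real model `V` and `w` would be learned.

```python
import math


def attention_pool(instances, V, w):
    """Attention-based MIL pooling over one bag of instance embeddings.

    instances: list of K embeddings, each a list of D floats
               (e.g. one embedding per window of an RNA sequence).
    V:         L x D projection matrix given as a list of rows (hypothetical).
    w:         scoring vector of length L (hypothetical).
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax turns raw scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The bag embedding is the attention-weighted sum of instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return bag, weights


# Toy bag of three 2-dimensional instance embeddings (illustrative values only).
instances = [[0.0, 0.0], [2.0, 2.0], [-1.0, -1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]

bag, weights = attention_pool(instances, V, w)
# Rank instances by attention weight; high-weight windows are candidate key loci.
ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
print(ranked, weights)
```

Because the weights sum to one over the instances in a bag, they can be read as relative importance, which is the property that makes it possible to map high-scoring instances back to sequence positions.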