ABSTRACT
BACKGROUND: In large-scale high-throughput sequencing projects and biobank construction, sample tagging is essential to prevent sample mix-ups. Despite the availability of fingerprint panels for DNA data, little research has been conducted on sample tagging of whole genome bisulfite sequencing (WGBS) data. This study aims to construct a pipeline and identify applicable fingerprint panels to address this problem. RESULTS: Using autosome-wide A/T polymorphic single nucleotide variants (SNVs) obtained from whole genome sequencing (WGS) and WGBS of individuals from the Third China National Stroke Registry, we designed a fingerprint panel and constructed an optimized pipeline for tagging WGBS data. This pipeline used Bis-SNP to call genotypes from the WGBS data, and optimized genotype comparison by eliminating wildtype homozygous and missing genotypes and retaining only variants with identical genomic coordinates and reference/alternative alleles. WGS-based and WGBS-based genotypes called from identical or different samples were extensively compared using hap.py. In the first batch of 94 samples, the genotype consistency rates using the autosome-wide A/T polymorphic SNV panel were 71.01%-84.23% for matched and 51.43%-60.50% for mismatched WGS and WGBS data. This capability to tag WGBS data was validated in a second batch of 240 samples, with genotype consistency rates of 70.61%-84.65% and 49.58%-61.42% for the matched and mismatched data, respectively. By testing six fingerprint panels whose variant counts spanned different orders of magnitude, we also determined that the number of genetic variants required to correctly tag WGBS data was on the order of thousands. We confirmed this result with two self-designed panels of 1351 and 1278 SNVs, respectively. Furthermore, this study confirmed that neither the number of genetic variants with identical coordinates and ref/alt alleles nor the number of variants with identical genotypes could correctly tag WGBS data. CONCLUSION: This study proposed an optimized pipeline, applicable fingerprint panels, and a lower boundary for the number of fingerprint genetic variants needed for correct sample tagging of WGBS data, which are valuable for tagging WGBS data and integrating multi-omics data for biobanks.
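The genotype comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which relied on Bis-SNP and hap.py); it only shows the filtering idea under stated assumptions: genotypes are given as VCF-style strings such as "0/1", a site is dropped if either call is missing or homozygous wildtype, and the function names `normalize`, `is_informative` and `consistency_rate` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: a variant is keyed by
# (chromosome, position, ref allele, alt allele); the value is a
# VCF-style genotype string such as "0/1", or None when missing.
VariantKey = Tuple[str, int, str, str]
Genotypes = Dict[VariantKey, Optional[str]]


def normalize(gt: Optional[str]) -> Optional[Tuple[str, ...]]:
    """Return the alleles as a sorted tuple, or None for a missing call."""
    if gt is None:
        return None
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def is_informative(gt: Optional[Tuple[str, ...]]) -> bool:
    """True unless the call is missing or homozygous wildtype (all '0')."""
    return gt is not None and any(allele != "0" for allele in gt)


def consistency_rate(wgs: Genotypes, wgbs: Genotypes) -> Optional[float]:
    """Fraction of comparable sites whose genotypes agree.

    Only keys present in both call sets are compared, which enforces
    identical coordinates and identical ref/alt alleles. Assumption:
    a site is skipped if EITHER call is missing or wildtype homozygous.
    """
    compared = 0
    matched = 0
    for key in wgs.keys() & wgbs.keys():
        a = normalize(wgs[key])
        b = normalize(wgbs[key])
        if not (is_informative(a) and is_informative(b)):
            continue
        compared += 1
        matched += a == b
    return matched / compared if compared else None
```

A matched WGS/WGBS pair would be expected to yield a clearly higher rate than a mismatched pair; any decision threshold between the two ranges reported in the abstract would have to be chosen by the user.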
Subjects
Genome , Sulfites , Humans , Whole Genome Sequencing , Genotype , DNA Methylation , DNA , High-Throughput Nucleotide Sequencing
ABSTRACT
BACKGROUND: With the widespread use of multiplex amplicon sequencing (MAS) in genetic variation detection, an efficient tool is required to remove primer sequences from short reads to ensure the reliability of downstream analysis. Although some tools are currently available, their efficiency and accuracy need improvement when trimming large numbers of primers in high-throughput targeted genome sequencing. This issue is becoming more urgent considering the potential clinical implementation of MAS for processing patient samples. We developed pTrimmer, which can handle thousands of primers simultaneously with greatly improved accuracy and performance. RESULTS: pTrimmer combines a k-mer algorithm with the Needleman-Wunsch algorithm, which ensures its accuracy even in the presence of sequencing errors. pTrimmer improves sensitivity by 28.59% and accuracy by 11.87% compared with similar tools. In simulation, pTrimmer achieved a sensitivity of 99.96% and an accuracy of 97.38%, compared with 70.85% sensitivity and 58.73% accuracy for cutPrimers. pTrimmer is also notably faster: about 370 times faster than cutPrimers and 17,000 times faster than cutadapt per thread. Trimming 2158 pairs of primers from 11 million reads (Illumina PE 150 bp) takes only 37 s and consumes no more than 100 MB of memory. CONCLUSIONS: pTrimmer is designed to trim primer sequences from multiplex amplicon sequencing and target sequencing data. It is highly sensitive and specific compared with three other similar tools, which could help users obtain more reliable mutational information for downstream analysis.
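The combination of k-mer lookup and Needleman-Wunsch alignment mentioned in this abstract can be sketched as follows. This is an illustrative Python outline, not pTrimmer's actual implementation: the k-mer size, the scoring scheme, the acceptance threshold and all function names are assumptions made for the example. A k-mer index proposes candidate primers cheaply, and a global alignment then confirms the candidate while tolerating a few sequencing errors.

```python
from typing import Dict, List, Optional, Set, Tuple

K = 8  # assumed k-mer size, chosen only for illustration


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def build_kmer_index(primers: List[str],
                     k: int = K) -> Dict[Tuple[str, int], Set[int]]:
    """Map (k-mer, offset within primer) to the primers containing it."""
    index: Dict[Tuple[str, int], Set[int]] = {}
    for pid, primer in enumerate(primers):
        for off in range(len(primer) - k + 1):
            index.setdefault((primer[off:off + k], off), set()).add(pid)
    return index


def trim_primer(read: str, primers: List[str],
                index: Dict[Tuple[str, int], Set[int]],
                k: int = K,
                min_score_fraction: float = 0.8) -> Tuple[str, Optional[int]]:
    """Remove a 5' primer from the read if one aligns well enough.

    Returns (possibly trimmed read, primer index or None).
    """
    longest = max(len(p) for p in primers)
    candidates: Set[int] = set()
    # Step 1: k-mer lookup at matching offsets proposes candidate primers.
    for off in range(min(len(read), longest) - k + 1):
        candidates |= index.get((read[off:off + k], off), set())
    # Step 2: Needleman-Wunsch confirms the best candidate.
    best: Optional[Tuple[int, int]] = None
    for pid in sorted(candidates):
        primer = primers[pid]
        score = needleman_wunsch(primer, read[:len(primer)])
        if score >= min_score_fraction * len(primer):
            if best is None or score > best[0]:
                best = (score, pid)
    if best is None:
        return read, None
    return read[len(primers[best[1]]):], best[1]
```

The index is built once per primer set, so each read needs only a handful of dictionary lookups before any alignment is attempted; this is one plausible reason such a design scales to thousands of primers.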
Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Humans
ABSTRACT
Liquid biopsy is being integrated into cancer diagnostics and surveillance. However, critical questions remain, such as how to precisely evaluate cancer mutation burden and interpret the corresponding clinical implications. Herein, we evaluated the role of peripheral blood cell-free DNA (cfDNA) in characterizing the dynamic mutation alterations of 48 cancer driver genes in cervical cancer patients. We performed targeted deep sequencing on 93 plasma cfDNA samples from 57 cervical cancer patients and from this developed an algorithm, allele fraction deviation (AFD), to monitor the dynamic changes of genomic aberrations in an unbiased manner. Different treatments, including chemotherapy (n = 22), radiotherapy (n = 14) and surgery (n = 15), led to a significant decrease in AFD values (Wilcoxon, p = 0.029). The decrease in cfDNA AFD values was accompanied by shrinkage of the tumor in most patients. However, in a subgroup of patients whose cfDNA AFD values did not reflect a reduction in tumor size, progressive disease (metastasis) was detected. Furthermore, a low AFD value at diagnosis followed by a later increase in AFD value also successfully predicted relapse. These results show that plasma cfDNA, together with targeted deep sequencing, may help predict treatment response and disease development in cervical cancer.
Subjects
Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/genetics , Circulating Tumor DNA/blood , Circulating Tumor DNA/genetics , Uterine Cervical Neoplasms/blood , Uterine Cervical Neoplasms/genetics , Adult , Aged , Alleles , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Chemoradiotherapy/methods , DNA, Neoplasm/blood , DNA, Neoplasm/genetics , Female , Genome/genetics , Genomics/methods , Humans , Middle Aged , Mutation/genetics , Uterine Cervical Neoplasms/drug therapy , Uterine Cervical Neoplasms/radiotherapy
ABSTRACT
PURPOSE: Mixed phenotype acute leukemia (MPAL) is a rare subtype of acute leukemia, and how its genomic basis changes over time remains unclear. We aimed to investigate the longitudinal genomic evolution of MPAL from diagnosis to relapse. METHODS: We performed whole genome sequencing (WGS) on bone marrow (BM) samples obtained at four stages of the disease in a male patient with Philadelphia chromosome positive (Ph+) MPAL: primary, complete cytogenetic remission (CCR), complete molecular remission (CMR), and relapse, over a 3-year follow-up period. RESULTS: A total of 156 single-nucleotide variants (SNVs) and indels were detected, which exhibited distinctive evolutionary behaviors. Seventeen mutations disappeared quickly upon DCTER treatment and never came back. Seven mutations disappeared initially but recurred upon withdrawal of TKI treatment. Notably, ten mutations emerged in spite of active DCTER chemotherapy. Moreover, copy number loss played a critical role in monitoring MPAL progression, with 7, 0, 0, and 383 losses detected at the primary, CCR, CMR, and relapse stages, respectively. CONCLUSION: This longitudinal genomic investigation of a Ph+ MPAL patient established an MPAL evolution model in which the primary tumor acquired additional variations leading to tumor relapse. Moreover, copy number loss remained a valuable hallmark of MPAL progression.
Subjects
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Clonal Evolution , DNA Copy Number Variations , Leukemia, Biphenotypic, Acute/genetics , Neoplasm Recurrence, Local/genetics , Adult , DNA Mutational Analysis , Disease Progression , Humans , Leukemia, Biphenotypic, Acute/drug therapy , Leukemia, Biphenotypic, Acute/pathology , Longitudinal Studies , Male , Neoplasm Recurrence, Local/pathology , Philadelphia Chromosome , Whole Genome Sequencing
ABSTRACT
INTRODUCTION: Targeted therapies are based on specific gene alterations. Various specimen types have been used to determine gene alterations; however, no systematic comparisons have yet been made. Herein, we assessed alterations in selected cancer-associated genes across varying sample sites in lung cancer patients. MATERIALS AND METHODS: Targeted deep sequencing of 48 tumor-related genes was applied to 153 samples from 55 lung cancer patients obtained from six sources: formalin-fixed paraffin-embedded (FFPE) tumor tissues, pleural effusion supernatant (PES), pleural effusion cell sediments (PEC), white blood cells (WBCs), oral epithelial cells (OECs), and plasma. RESULTS: Mutations were detected in 96% (53/55) of the patients and in 83% (40/48) of the selected genes. Each sample type exhibited a characteristic mutational pattern. As anticipated, TP53 was the most frequently affected gene (54.5% of patients); however, it was followed by NOTCH1 (36%, across all sample types). EGFR was altered in 32.7% of patients and KRAS in 10.9%. This high-EGFR/low-KRAS frequency is in accordance with other TCGA cohorts of Asian origin but differs from the Caucasian population, where KRAS is the more dominant mutation. Additionally, 66% (31/47) of PEC samples had copy number variants (CNVs) in at least one gene. Whereas most genes showed both loss and gain, NOTCH1 loss was identified in 21% of patients, with no gain observed. Based on the relative prevalence of mutations and CNVs, we divided lung cancer patients into SNV-dominated, CNV-dominated, and codominated groups. CONCLUSIONS: Our results confirm previous reports that EGFR mutations are more prevalent than KRAS mutations in Chinese lung cancer patients. NOTCH1 gene alterations are more common than previously reported and suggest a role for NOTCH1 modifications in tumor metastasis. Furthermore, genetic material from malignant pleural effusion cell sediments may offer a noninvasive means of identifying CNVs and informing treatment decisions.
Subjects
Carcinoma, Non-Small-Cell Lung/genetics , High-Throughput Nucleotide Sequencing/methods , Lung Neoplasms/genetics , Receptor, Notch1/genetics , DNA Copy Number Variations , Humans , Mutation , Mutation Rate , Neoplasm Metastasis , Sequence Analysis, DNA
ABSTRACT
Investigation of spontaneous mutations by next-generation sequencing technology has attracted extensive attention lately due to the fundamental roles of spontaneous mutations in evolution and pathological processes. However, these studies focused only on mutations accumulated over many generations during long-term (possibly years of) culturing, not on freshly generated mutations that occur at very low frequencies. In this study, we established a molecularly barcoded deep sequencing strategy to detect low-abundance spontaneous mutations in the genomes of bacterial cell cultures. Genome-wide spontaneous mutations in 15 Escherichia coli cell culture samples were defined with high confidence (P < 0.01). We also developed a hotspot-calling approach based on the run-length encoding algorithm to find genomic regions that are vulnerable to spontaneous mutations. The mutation hotspots appeared to be highly conserved across the bacterial samples. Further biological annotation of these regions indicated that most of the spontaneous mutations were located in repeat domains or nonfunctional domains of the genomes, suggesting the existence of mechanisms that prevent the occurrence of mutations in crucial genic areas. This study provides a more faithful picture of mutation occurrence and spectra in a single expansion process without long-term culturing.
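The abstract does not describe how run-length encoding is applied to hotspot calling, so the following Python sketch shows only one plausible reading and should not be taken as the authors' method. It assumes 0-based mutation positions, a fixed bin size, and a hotspot defined as a run of consecutive mutated bins of at least `min_bins`; the function names and parameters are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def run_length_encode(flags: Iterable[bool]) -> List[Tuple[bool, int]]:
    """Collapse a sequence into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(flags)]


def call_hotspots(mutated_positions: Iterable[int], genome_length: int,
                  bin_size: int = 100,
                  min_bins: int = 3) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals of candidate hotspots.

    Assumption: a hotspot is a run of at least `min_bins` consecutive
    bins that each contain one or more mutated positions.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size
    flags = [False] * n_bins
    for pos in mutated_positions:
        flags[pos // bin_size] = True  # positions are 0-based

    hotspots: List[Tuple[int, int]] = []
    start_bin = 0
    for mutated, length in run_length_encode(flags):
        if mutated and length >= min_bins:
            end = min((start_bin + length) * bin_size, genome_length)
            hotspots.append((start_bin * bin_size, end))
        start_bin += length
    return hotspots
```

Comparing the intervals returned for different samples, for example by counting overlaps, would be one way to examine the cross-sample conservation of hotspots that the abstract reports.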