ABSTRACT
Microsatellite polymorphism has always been a challenge for genome assembly and sequence alignment because of sequencing errors, short read lengths, and the high incidence of polymerase slippage in microsatellite regions. Although microsatellites carry valuable information, their variation has not received enough attention to become a routine step in genome sequence analysis pipelines. After the completion of the 1000 Genomes Project, which aimed to establish the most detailed catalog of human genetic variation, the consortium released only two microsatellite prediction sets, generated by two tools. Many other large research efforts have likewise failed to shed light on microsatellite variation. We evaluated the performance of three local assembly methods in three experimental settings, focusing on genotype-based performance, the impact of coverage, and preprocessing that includes flanking regions. All of these experiments supported our initial expectations about assembly. We also demonstrate that overlap-layout-consensus (OLC)-based assembly methods show higher sensitivity for microsatellite variant calling than a de Bruijn graph-based approach. We conclude that OLC-based assembly is the better method for genotyping microsatellites. Our pipeline is available at https://github.com/gulfemd/STRAssembly.
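The abstract does not spell out why OLC assembly should outperform a de Bruijn graph on microsatellites. One plausible explanation, sketched below under simplifying assumptions, is that a de Bruijn graph breaks reads into k-mers, so two alleles of a tandem repeat longer than k yield identical k-mer sets and collapse into the same cycle, whereas OLC keeps whole reads, which can span the repeat and its flanks. The alleles, flanks, and the helper `debruijn_edges` are hypothetical; the sketch ignores k-mer multiplicities, sequencing errors, and everything a real assembler does beyond graph construction.

```python
def debruijn_edges(seq: str, k: int) -> set[tuple[str, str]]:
    """Edge set of a de Bruijn graph built from one sequence (hypothetical helper).

    Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer
    suffix; repeated k-mers collapse into a single edge (multiplicity ignored).
    """
    return {(seq[i:i + k - 1], seq[i + 1:i + k]) for i in range(len(seq) - k + 1)}


# Hypothetical (CA)n microsatellite alleles with short made-up flanks.
left, right = "GGT", "TCC"
allele_a = left + "CA" * 5 + right   # (CA)5
allele_b = left + "CA" * 8 + right   # (CA)8

for k in (5, 13):
    same = debruijn_edges(allele_a, k) == debruijn_edges(allele_b, k)
    print(f"k={k}: identical graphs? {same}")
# k=5: identical graphs? True
# k=13: identical graphs? False
```

At k = 5 the (CA)5 and (CA)8 alleles produce the same graph, with the repeat reduced to a two-node cycle (CACA to ACAC and back). At k = 13, one k-mer spans the whole shorter repeat plus flanking bases, so it occurs only in the (CA)5 allele and the graphs differ; whole reads used by OLC play an analogous role when they span the repeat.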
ABSTRACT
The human reference genome serves as the foundation for genomics by providing a scaffold for the alignment of sequencing reads, but it currently reflects only a single consensus haplotype, which impairs analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h on a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall without affecting specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
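To illustrate why aligning to a graph can improve mapping sensitivity, the toy sketch below encodes a single SNP as a "bubble" (two alternative nodes between shared flanks) and compares a read's best ungapped mismatch count against the linear reference with its best count against any path through the graph. The node sequences, the read, and the helpers `paths` and `best_ungapped_mismatches` are hypothetical; enumerating every path and ignoring gaps are simplifications for clarity, and this is not the authors' implementation.

```python
def paths(edges: dict[str, list[str]], start: str, end: str) -> list[list[str]]:
    """All node paths from start to end in a small DAG (depth-first)."""
    if start == end:
        return [[end]]
    return [[start] + rest for nxt in edges.get(start, []) for rest in paths(edges, nxt, end)]


def best_ungapped_mismatches(read: str, target: str) -> int:
    """Fewest mismatches of read against any same-length window of target."""
    return min(
        sum(a != b for a, b in zip(read, target[i:i + len(read)]))
        for i in range(len(target) - len(read) + 1)
    )


# Hypothetical toy graph: a SNP bubble (ref "A" / alt "G") between shared flanks.
nodes = {"s": "TTGCA", "ref": "A", "alt": "G", "t": "CCTGA"}
edges = {"s": ["ref", "alt"], "ref": ["t"], "alt": ["t"]}

linear_ref = nodes["s"] + nodes["ref"] + nodes["t"]
haplotypes = ["".join(nodes[n] for n in p) for p in paths(edges, "s", "t")]

read = "GCAGCCT"  # hypothetical read carrying the alt allele
print(best_ungapped_mismatches(read, linear_ref))                  # 1
print(min(best_ungapped_mismatches(read, h) for h in haplotypes))  # 0
```

The read carrying the alternative allele incurs one mismatch against the linear reference but matches the graph path through `alt` exactly. The number of paths grows exponentially with the number of bubbles, so a practical graph aligner would typically align against the graph structure directly rather than enumerate haplotypes as this toy does.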
Subjects
Genome, Human/genetics; Genomics/methods; Humans; Polymorphism, Single Nucleotide/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Sequence Deletion/genetics; Whole Genome Sequencing/methods
ABSTRACT
BACKGROUND: Synchronous multifocal tumours are commonly observed in urothelial carcinomas of the bladder. Two origins have been proposed for these physically independent tumours: intraluminal migration (clonal) or spontaneous transformation of multiple cells by carcinogens (field effect). It is unclear which model is correct, with several studies supporting each hypothesis. A potential cause of this uncertainty may be the small number of genetic mutations previously used to quantify the relationship between these tumours.
METHODS: To better understand the genetic lineage of these tumours, we conducted high-depth exome sequencing of synchronous multifocal pTa urothelial bladder cancers, using multiple samples from three patients.
RESULTS: Phylogenetic analysis of high-confidence single nucleotide variants (SNVs) demonstrated that the sequenced multifocal bladder cancers arose from a clonal origin in all three patients (bootstrap value 100%). Interestingly, in two patients the most common type of tumour-associated SNV was a cytosine mutation at TpC* dinucleotides (Fisher's exact test p < 10^-41), likely caused by APOBEC-mediated deamination. Incorporating these results into our clonal model, we found that TpC*-type mutations occurred 2-5× more often among SNVs on the ancestral branches than on the more recent private branches (p < 10^-4), suggesting that TpC* mutations largely occurred early in tumour development.
CONCLUSIONS: These results demonstrate that synchronous multifocal bladder cancers frequently arise from a clonal origin. Our data also suggest that APOBEC-mediated mutations occur early in tumour development and may be a driver of tumourigenesis in non-muscle-invasive urothelial bladder cancer.
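The TpC* analysis involves two steps: classifying each SNV by its reference sequence context, and comparing how often TpC* mutations occur on ancestral versus private branches. The sketch below assumes that TpC* means a mutated C whose 5' neighbour is T, also counting the reverse-complement case (a mutated G whose 3' neighbour is A), and applies a two-sided Fisher's exact test to a 2×2 table. The abstract does not say which test produced the ancestral-versus-private p-value, so that choice is an assumption; the helper names and SNV contexts are hypothetical illustrative data, not results from the study.

```python
from math import comb


def is_tpc(context: str) -> bool:
    """Classify a 3-base reference context (5' base, mutated base, 3' base).

    Assumption: TpC* means a mutated C with a 5' T; a mutated G with a 3' A
    is the same site read from the opposite strand.
    """
    five, mid, three = context
    return (mid == "C" and five == "T") or (mid == "G" and three == "A")


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical reference contexts of SNVs on each branch type.
ancestral = ["TCA", "TCT", "AGA", "TCG", "CGA", "ACG", "TCC", "GGA"]
private = ["ACG", "GTA", "CCA", "TCA", "ATG", "GCT", "CAT", "AGT"]

anc_tpc = sum(map(is_tpc, ancestral))
prv_tpc = sum(map(is_tpc, private))
# Rows: ancestral, private; columns: TpC*, other.
table = (anc_tpc, len(ancestral) - anc_tpc, prv_tpc, len(private) - prv_tpc)

p = fisher_exact_two_sided(*table)
print(table)           # (7, 1, 1, 7)
print(f"p = {p:.4f}")  # p = 0.0101
```

For this toy table the exact two-sided p-value is 1/99. With real data, a library routine such as SciPy's `fisher_exact` would be the usual choice; the stdlib version here only keeps the sketch self-contained.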