ABSTRACT
We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction, improving the raw accuracy of the aligned reads to >99.9% and allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads, yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered by at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve the haplotype phase of nearly two-thirds of the genotypes obtained and to produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach to detect, from mate-paired reads, indels smaller than the standard deviation of the library insert size, and we find deletions that were also detected by our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. More genetic variation in the human genome remains to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
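The abstract does not describe how indels smaller than the insert-size standard deviation are found. One standard statistical argument for why this is possible at all: if n mate-paired clones span a locus, the standard error of their mean insert size is the library SD divided by √n, so a shift much smaller than the SD can still stand out. The sketch below illustrates only that argument, assuming insert sizes are independent draws from the library distribution. The helper name `local_insert_shift` and all numbers are hypothetical, and this is not presented as the authors' method.

```python
import math
from statistics import mean


def local_insert_shift(local_inserts, library_mean, library_sd):
    """Return (shift_bp, z_score) for mate-paired clones spanning one locus.

    Hypothetical helper. Assumes each insert size is an independent draw from
    the library-wide distribution, so the mean of n spanning clones has a
    standard error of library_sd / sqrt(n).
    """
    n = len(local_inserts)
    if n == 0:
        raise ValueError("need at least one spanning clone")
    # Positive shift: clones map farther apart than expected (suggests a deletion
    # in the sample); negative shift: closer together (suggests an insertion).
    shift = mean(local_inserts) - library_mean
    standard_error = library_sd / math.sqrt(n)
    return shift, shift / standard_error


if __name__ == "__main__":
    # Made-up library: mean 1000 bp, SD 100 bp; 100 clones span the locus.
    inserts = [980, 1080] * 50  # local mean 1030 bp
    shift, z = local_insert_shift(inserts, 1000, 100)
    # A 30 bp shift is well below the 100 bp SD, yet z = 30 / (100 / 10) = 3.0.
    print(f"shift = {shift} bp, z = {z:.1f}")
```

In practice the independence assumption is optimistic (library size distributions are rarely normal, and mapping artifacts correlate), so any real threshold on the z-score would need calibration against known events.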
Subjects
Base Pairing; Computational Biology/methods; Genetic Variation; Genome, Human; Ligases; Sequence Analysis, DNA/methods; Africa; Base Sequence; Genomics; Genotype; Heterozygote; Homozygote; Humans; Polymorphism, Single Nucleotide; Reference Standards
ABSTRACT
We developed the SNPlex Genotyping System to address the need for accurate genotyping data, high sample throughput, study design flexibility, and cost efficiency. The system uses oligonucleotide ligation/polymerase chain reaction and capillary electrophoresis to analyze bi-allelic single nucleotide polymorphism genotypes. It is well suited for single nucleotide polymorphism genotyping efforts in which throughput and cost efficiency are essential. The SNPlex Genotyping System offers a high degree of flexibility and scalability, allowing the selection of custom-defined sets of SNPs for medium- to high-throughput genotyping projects, and is therefore suitable for a broad range of study designs. In this article we describe the principle and applications of the SNPlex Genotyping System, as well as a set of single nucleotide polymorphism selection tools and validated assay resources that accelerate the assay design process. We developed the control pool, an oligonucleotide ligation probe set for training and quality-control purposes, which interrogates 48 SNPs simultaneously. We present performance data from this control pool obtained by testing genomic DNA samples from 44 individuals. In addition, we present data from a study that analyzed 521 SNPs in 92 individuals. Combined, both studies show the SNPlex Genotyping System to have a 99.32% overall call rate, 99.95% precision, and 99.84% concordance with genotypes obtained by TaqMan probe-based assays. The SNPlex Genotyping System is an efficient and reliable tool for a broad range of genotyping applications, supported by applications for study design, data analysis, and data management.
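The abstract reports call rate, precision, and concordance without defining them. The sketch below shows common definitions for two of them, assuming call rate is the fraction of attempted genotypes that yield a call and concordance is the fraction of agreeing genotypes among those called by both SNPlex and TaqMan. The helper names and toy data are hypothetical. Precision might plausibly be computed by applying the same `concordance` helper to replicate runs, but that reading is an assumption, not the paper's stated definition.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # e.g. "AG"; None marks a failed (no-call) genotype


def _normalize(genotype: str) -> str:
    # Treat "GA" and "AG" as the same unordered bi-allelic genotype.
    return "".join(sorted(genotype))


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of attempted genotypes that produced a call."""
    if not calls:
        raise ValueError("no genotypes attempted")
    return sum(g is not None for g in calls) / len(calls)


def concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> float:
    """Fraction of matching genotypes among positions called by both assays."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call lists must be aligned sample-by-sample")
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no genotypes called by both assays")
    return sum(_normalize(a) == _normalize(b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data for one SNP across four samples (not from the study).
    snplex = ["AG", "AA", None, "GG"]
    taqman = ["GA", "AA", "AG", "AG"]
    print(call_rate(snplex))            # 3 of 4 attempted genotypes called
    print(concordance(snplex, taqman))  # 2 of 3 jointly called genotypes agree
```

Excluding no-calls from the concordance denominator keeps the two metrics separate: a failed call lowers the call rate but is not counted as a disagreement.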