Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.
Faber-Hammond, Joshua J; Brown, Kim H.
Affiliation
  • Faber-Hammond JJ; Department of Biology, Portland State University, 1719 SW 10th Ave., SRTC 246, Portland, 97207-0751, USA.
  • Brown KH; Department of Biology, Portland State University, 1719 SW 10th Ave., SRTC 246, Portland, 97207-0751, USA. kibr2@pdx.edu.
Hum Genet; 135(7): 727-40, 2016 Jul.
Article in En | MEDLINE | ID: mdl-27061184
ABSTRACT
The completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This limitation is highlighted by the presence of high-quality sequence reads that fail to map to the HGR. Unmapped sequences generally represent 2-5 % of total reads and may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. To find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual, and then combines the assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence not incorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40 % showing high sequence complexity. Genomic coordinates were generated for 99.9 %, with 52.5 % exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR; more importantly, our data highlight that with this method, low-coverage (~10-20×) next-generation sequencing can still be used to identify novel unmapped sequences and to explore biological functions contributing to human phenotypic variation, disease, and functionality for personal genomic medicine.
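The "export unmappable reads" step of the pipeline described above can be illustrated with a minimal sketch. It assumes each mate was aligned as an independent single-end read and written as SAM text, so the SAM FLAG bit 0x4 ("segment unmapped") marks a read that failed to map on its own. The function names `unmapped_reads` and `to_fastq` and the sample records are hypothetical and are not taken from the authors' pipeline; in practice a tool such as `samtools view -f 4` performs the same filtering. The resulting FASTQ would then feed the per-individual de novo assembly step.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit 0x4: segment unmapped


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each unmapped record in SAM text.

    Hypothetical helper: assumes reads were aligned as separate single-end
    reads, so FLAG 0x4 refers to the read itself, not its mate.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, ...)
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & UNMAPPED_FLAG and qual != "*":
            # records without stored base qualities ("*") are skipped here
            yield name, seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Format (name, sequence, quality) tuples as four-line FASTQ records."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


# Usage with made-up records: r1 maps to chr1, r2 is unmapped.
sample = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF",
]
print(to_fastq(unmapped_reads(sample)), end="")
```

Filtering on the read's own FLAG, rather than on pair status, mirrors the abstract's choice to map mates as separate single reads: one mate can be exported even when its partner aligns to the reference.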
Subject(s)

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Genome, Human / Sequence Analysis, DNA / Genomics / High-Throughput Nucleotide Sequencing Type of study: Prognostic_studies Limits: Humans Language: En Journal: Hum Genet Year: 2016 Document type: Article Affiliation country: United States
