findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies.
Sun, Hequan; Ding, Jia; Piednoël, Mathieu; Schneeberger, Korbinian.
Affiliation
  • Sun H; Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany.
  • Ding J; Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany.
  • Piednoël M; Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany.
  • Schneeberger K; Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany.
Bioinformatics; 34(4): 550-557, 2018 Feb 15.
Article in En | MEDLINE | ID: mdl-29444236
ABSTRACT
Motivation: Analyzing k-mer frequencies in whole-genome sequencing data is becoming a common method for estimating genome size (GS). However, it remains uninvestigated how accurate the method is, and especially whether it can capture intra-species GS variation.

Results: We present findGSE, which fits skew normal distributions to k-mer frequencies to estimate GS. findGSE outperformed existing tools in an extensive simulation study. In estimating the GSs of 89 Arabidopsis thaliana accessions, findGSE showed the highest capability in capturing GS variations. In an application with 71 female and 71 male human individuals, findGSE delivered an average of 3039 Mb as the haploid human GS, while female genomes were on average 41 Mb larger than male genomes, in astonishing agreement with the size difference between the X and Y chromosomes. Further analysis showed that human GS variations are linked to geographical patterns and differ significantly between populations, which can be explained by variable abundances of LINE-1 retrotransposons.

Availability and implementation: The R package of findGSE is freely available at https://github.com/schneebergerlab/findGSE and is supported on Linux and Mac systems.

Contact: schneeberger@mpipz.mpg.de.

Supplementary information: Supplementary data are available at Bioinformatics online.
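The general idea of estimating GS from k-mer frequencies, as named in the abstract, can be illustrated with a minimal Python sketch. This is not findGSE's method: it does not fit skew normal distributions, and it is not the package's API. It uses the simpler, widely used approximation GS ≈ (total k-mer occurrences) / (k-mer depth at the main peak). The function names and the `min_depth` cutoff for discarding likely error k-mers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count all overlapping substrings of length k in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_histogram(reads, k):
    """Map each k-mer frequency to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        counts.update(count_kmers(read, k))
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Peak-based estimate: total k-mer occurrences / depth of the main peak.

    min_depth is an assumed cutoff that drops low-frequency k-mers,
    which mostly stem from sequencing errors.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers occurs.
    peak_depth = max(usable, key=lambda d: usable[d])
    # Total k-mer occurrences among the retained k-mers.
    total = sum(d * n for d, n in usable.items())
    return total / peak_depth


# Toy histogram: 500 singleton (error-like) k-mers and a main peak at depth 10.
toy_histogram = {1: 500, 9: 10, 10: 80, 11: 10}
toy_estimate = estimate_genome_size(toy_histogram)
```

On real data the histogram would come from a dedicated k-mer counter, and heterozygous or repeat-rich genomes produce several peaks that this single-peak sketch ignores. Handling such cases is where a fitted model, such as the skew normal fit described in the abstract, becomes relevant.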
Subject(s)

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Software / Genome, Human / Sequence Analysis, DNA / Genome, Plant / Genome Size Limits: Female / Humans / Male Language: En Journal: Bioinformatics Journal subject: Medical Informatics Year: 2018 Type: Article Affiliation country: Germany
