Deep repeat resolution: the assembly of the Drosophila Histone Complex.
Nucleic Acids Res; 47(3): e18, 2019 Feb 20.
Article in English | MEDLINE | ID: mdl-30476267
Though the advent of long-read sequencing technologies has led to a leap in contiguity of de novo genome assemblies, current reference genomes of higher organisms still do not provide unbroken sequences of complete chromosomes. Even with reads exceeding 30 000 base pairs, some repetitive structures still cannot be resolved by current state-of-the-art assemblers. The most challenging of these structures are tandemly arrayed repeats, which occur in the genomes of all eukaryotes. Untangling tandem repeat clusters is exceptionally difficult, since the rare differences between repeat copies are obscured by the high error rate of long reads. Solving this problem would constitute a major step towards computing fully assembled genomes. Here, using the Drosophila Histone Complex as an example, we demonstrate that machine learning algorithms can exploit the distinguishing patterns of single nucleotide variants among repeat copies in very noisy data to resolve a large, highly conserved repeat cluster. The ideas explored in this paper are a first step towards the automated assembly of complex repeat structures and promise to be applicable to a wide range of eukaryotic genomes.
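The core difficulty described above, rare copy-specific variants hidden among sequencing errors, can be illustrated with a toy sketch. This is not the authors' machine-learning method. It assumes reads are already aligned to a single repeat unit, are gapless and of equal length, and carry substitution errors only (real long reads are dominated by indels). A column is treated as a candidate variant site when its second most common base is too frequent to be random error. Reads are then grouped greedily by their bases at those sites, tolerating a small number of mismatches, and a per-cluster majority consensus recovers each copy. The function names and the `min_minor_frac` and `max_mismatch` parameters are hypothetical.

```python
from collections import Counter


def variant_columns(reads, min_minor_frac=0.25):
    """Return columns whose second most common base is too frequent to be noise.

    Hypothetical heuristic: assumes gapless, equal-length, pre-aligned reads.
    """
    n = len(reads)
    cols = []
    for j in range(len(reads[0])):
        top = Counter(r[j] for r in reads).most_common(2)
        if len(top) == 2 and top[1][1] / n >= min_minor_frac:
            cols.append(j)
    return cols


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, cols, max_mismatch=1):
    """Greedily group reads by their bases at the candidate variant columns.

    A read joins the first cluster whose representative signature differs by
    at most max_mismatch bases, which tolerates a sequencing error at a variant
    site. Otherwise it starts a new cluster with its own signature.
    """
    clusters = []  # list of (representative signature, [read indices])
    for i, r in enumerate(reads):
        sig = "".join(r[j] for j in cols)
        for rep, members in clusters:
            if hamming(sig, rep) <= max_mismatch:
                members.append(i)
                break
        else:
            clusters.append((sig, [i]))
    return clusters


def consensus(reads, members):
    """Column-wise majority vote over the reads assigned to one cluster."""
    return "".join(
        Counter(reads[i][j] for i in members).most_common(1)[0][0]
        for j in range(len(reads[0]))
    )


if __name__ == "__main__":
    # Two toy repeat copies that differ at positions 2, 4 and 7,
    # each sampled four times with occasional substitution errors.
    reads = [
        "ACGTACGTAC", "ACGTACGTAT", "ACGTGCGTAC", "ACGTACGTAC",
        "ACCTGCGAAC", "TCCTGCGAAC", "ACCTGCGAAC", "ACCTGCGTAC",
    ]
    cols = variant_columns(reads)
    for sig, members in cluster_reads(reads, cols):
        print(sig, members, consensus(reads, members))
```

In this toy input the two copies differ at three variant columns, so allowing one mismatch per read still keeps the clusters apart. When copies differ at only one or two sites among many reads with errors, a fixed threshold like this breaks down. That limitation is what motivates the more robust statistical approaches the abstract refers to.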
Collections: 01-internacional
Database: MEDLINE
Main subject: Histones / Drosophila melanogaster / Computer Heuristics
Limits: Animals
Language: English
Year of publication: 2019
Document type: Article