Search | BVS Regional Portal

1.

ESKEMAP: exact sketch-based read mapping.

Schulz, Tizian; Medvedev, Paul.

Algorithms Mol Biol ; 19(1): 19, 2024 May 04.

Article in English | MEDLINE | ID: mdl-38704605

ABSTRACT

BACKGROUND: Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a "similar sequence". Traditionally, "similar sequence" was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold. RESULTS: In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in $O(|t| + |p| + \ell^2)$ time and $O(\ell \log \ell)$ space, where $|t|$ is the number of k-mers inside the sketch of the reference, $|p|$ is the number of k-mers inside the read's sketch and $\ell$ is the number of times that k-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm's performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.
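To make the quantities in the running-time bound concrete, the following Python sketch computes |t|, |p| and ℓ for a toy text and pattern. It assumes a simple window-minimizer sketch with lexicographic ordering, which is an illustrative choice and not necessarily the sketching scheme used in the paper. The function names `minimizer_sketch` and `count_shared_occurrences` are hypothetical, and ℓ is interpreted here as the number of text-sketch positions whose k-mer also appears in the pattern sketch. The dynamic programming algorithm itself is not implemented.

```python
def minimizer_sketch(seq, k, w):
    """Return the k-mers chosen as window minimizers (assumed sketching scheme).

    Each window of w consecutive k-mers contributes its lexicographically
    smallest k-mer (leftmost on ties); a position is reported only once.
    Sequences with fewer than w k-mers yield an empty sketch.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    last_pos = -1
    for start in range(max(len(kmers) - w + 1, 0)):
        window = kmers[start:start + w]
        offset = min(range(len(window)), key=lambda j: (window[j], j))
        pos = start + offset
        if pos != last_pos:  # skip a minimizer already reported by the previous window
            sketch.append(kmers[pos])
            last_pos = pos
    return sketch


def count_shared_occurrences(text_sketch, pattern_sketch):
    """Count text-sketch positions whose k-mer occurs in the pattern sketch."""
    pattern_kmers = set(pattern_sketch)
    return sum(1 for kmer in text_sketch if kmer in pattern_kmers)


# Toy data, chosen only for illustration.
t = minimizer_sketch("ACGTACGT", k=2, w=3)
p = minimizer_sketch("ACGT", k=2, w=3)
ell = count_shared_occurrences(t, p)
print(len(t), len(p), ell)
```

In repetitive references ℓ can grow much larger than |p|, which is why the $\ell^2$ term tends to dominate the bound in such regions.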

2.

Compression algorithm for colored de Bruijn graphs.

Rahman, Amatur; Dufresne, Yoann; Medvedev, Paul.

Algorithms Mol Biol ; 19(1): 20, 2024 May 26.

Article in English | MEDLINE | ID: mdl-38797858

ABSTRACT

A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. Numerous indexing data structures have been proposed that store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools on all datasets, with no other tool able to consistently achieve less than 44% space overhead. The software is available at http://github.com/medvedevgroup/ESSColor .
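The definition at the start of this abstract can be illustrated with a minimal Python sketch that maps each k-mer to the set of samples (colors) containing it. This is only a naive in-memory representation under simplifying assumptions: the function name `build_colored_kmer_set` and the toy samples are hypothetical, reverse complements are ignored, and nothing here reflects how ESS-color stores or compresses the data.

```python
from collections import defaultdict


def build_colored_kmer_set(samples, k):
    """Map each k-mer to the set of sample names (colors) in which it occurs."""
    colors = defaultdict(set)
    for name, seq in samples.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return dict(colors)


# Toy input: two samples sharing the k-mer "CGT".
samples = {"s1": "ACGT", "s2": "CGTA"}
cdbg = build_colored_kmer_set(samples, k=3)
for kmer in sorted(cdbg):
    print(kmer, sorted(cdbg[kmer]))
```

Storing one explicit color set per k-mer, as done here, is the kind of representation whose size motivates dedicated compression.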

3.

The complete sequence and comparative analysis of ape sex chromosomes.

Makova, Kateryna D; Pickett, Brandon D; Harris, Robert S; Hartley, Gabrielle A; Cechova, Monika; Pal, Karol; Nurk, Sergey; Yoo, DongAhn; Li, Qiuhui; Hebbar, Prajna; McGrath, Barbara C; Antonacci, Francesca; Aubel, Margaux; Biddanda, Arjun; Borchers, Matthew; Bornberg-Bauer, Erich; Bouffard, Gerard G; Brooks, Shelise Y; Carbone, Lucia; Carrel, Laura; Carroll, Andrew; Chang, Pi-Chuan; Chin, Chen-Shan; Cook, Daniel E; Craig, Sarah J C; de Gennaro, Luciana; Diekhans, Mark; Dutra, Amalia; Garcia, Gage H; Grady, Patrick G S; Green, Richard E; Haddad, Diana; Hallast, Pille; Harvey, William T; Hickey, Glenn; Hillis, David A; Hoyt, Savannah J; Jeong, Hyeonsoo; Kamali, Kaivan; Pond, Sergei L Kosakovsky; LaPolice, Troy M; Lee, Charles; Lewis, Alexandra P; Loh, Yong-Hwee E; Masterson, Patrick; McGarvey, Kelly M; McCoy, Rajiv C; Medvedev, Paul; Miga, Karen H; Munson, Katherine M.

Nature ; 630(8016): 401-411, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38811727

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.

Subjects

Hominidae, X Chromosome, Y Chromosome, Animals, Female, Male, Gorilla gorilla/genetics, Hominidae/genetics, Hominidae/classification, Hylobatidae/genetics, Pan paniscus/genetics, Pan troglodytes/genetics, Phylogeny, Pongo abelii/genetics, Pongo pygmaeus/genetics, Telomere/genetics, X Chromosome/genetics, Y Chromosome/genetics, Molecular Evolution, DNA Copy Number Variations/genetics, Humans, Endangered Species, Reference Standards

4.

Theoretical Analysis of Sequencing Bioinformatics Algorithms and Beyond.

Medvedev, Paul.

Commun ACM ; 66(7): 118-125, 2023 Jul.

Article in English | MEDLINE | ID: mdl-38736702

5.

Exact Sketch-Based Read Mapping.

Schulz, Tizian; Medvedev, Paul.

Leibniz Int Proc Inform ; 273, 2023 Sep.

Article in English | MEDLINE | ID: mdl-38831964

ABSTRACT

Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a "similar sequence". Traditionally, "similar sequence" was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold. In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in $O(|t| + |p| + \ell^2)$ time and $\Theta(\ell^2)$ space, where $|t|$ is the number of k-mers inside the sketch of the reference, $|p|$ is the number of k-mers inside the read's sketch and $\ell$ is the number of times that k-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm's performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.

6.

Compression Algorithm for Colored de Bruijn Graphs.

Rahman, Amatur; Dufresne, Yoann; Medvedev, Paul.

Leibniz Int Proc Inform ; 273, 2023 Sep.

Article in English | MEDLINE | ID: mdl-38712341

ABSTRACT

A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. Numerous indexing data structures have been proposed that store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools on all datasets, with no other tool able to consistently achieve less than 44% space overhead.
