Search | Portal Regional da BVS

Statistical framework to determine indel-length distribution.

Wygoda, Elya; Loewenthal, Gil; Moshe, Asher; Alburquerque, Michael; Mayrose, Itay; Pupko, Tal.

Bioinformatics ; 40(2), 2024 02 01.

Article in English | MEDLINE | ID: mdl-38269647

ABSTRACT

MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased because alignment methods assume that indel lengths follow a geometric distribution. RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.
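To illustrate why the choice between a geometric and a Zipf length distribution matters, the following minimal sketch compares the two on a truncated support. It is not taken from SpartaSim or SpartaPipeline; the function names, the truncation at 50 positions, and the parameter values (a = 1.5, p = 0.5) are assumptions chosen only for illustration.

```python
# Illustrative sketch only: names and parameter values are hypothetical,
# not part of the authors' software.

def truncated_zipf_pmf(a, max_len):
    """P(length = k) proportional to k**(-a) for k = 1..max_len."""
    weights = [k ** -a for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def truncated_geometric_pmf(p, max_len):
    """P(length = k) proportional to (1 - p)**(k - 1) * p for k = 1..max_len."""
    weights = [(1 - p) ** (k - 1) * p for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def tail_probability(pmf, min_len):
    """Probability that an indel spans at least min_len positions."""
    return sum(pmf[min_len - 1:])


if __name__ == "__main__":
    zipf = truncated_zipf_pmf(a=1.5, max_len=50)
    geom = truncated_geometric_pmf(p=0.5, max_len=50)
    print("P(length >= 20), Zipf:     ", tail_probability(zipf, 20))
    print("P(length >= 20), geometric:", tail_probability(geom, 20))
```

Under these assumed parameters the Zipf model puts far more probability on long indels than the geometric model does, which is the kind of difference a model-selection procedure would need to detect.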

Subjects

Algorithms, Software, Bayes Theorem, Sequence Alignment, INDEL Mutation, Molecular Evolution

GenomeFLTR: filtering reads made easy.

Dotan, Edo; Alburquerque, Michael; Wygoda, Elya; Huchon, Dorothée; Pupko, Tal.

Nucleic Acids Res ; 51(W1): W232-W236, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37177997

ABSTRACT

In the last decade, advances in sequencing technology have led to an exponential increase in genomic data. These new data have dramatically changed our understanding of the evolution and function of genes and genomes. Despite improvements in sequencing technologies, identifying contaminated reads remains a complex task for many research groups. Here, we introduce GenomeFLTR, a new web server to filter contaminated reads. Reads are compared against existing sequence databases from various representative organisms to detect potential contaminants. The main features implemented in GenomeFLTR are: (i) automated updating of the relevant databases; (ii) fast comparison of each read against the database; (iii) the ability to create user-specified databases; (iv) a user-friendly interactive dashboard to investigate the origin and frequency of the contaminations; (v) the generation of a contamination-free file. Availability: https://genomefltr.tau.ac.il/.

Subjects

Genomics, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Genome/genetics, Nucleic Acid Databases, Software

The evolutionary dynamics that retain long neutral genomic sequences in face of indel deletion bias: a model and its application to human introns.

Loewenthal, Gil; Wygoda, Elya; Nagar, Natan; Glick, Lior; Mayrose, Itay; Pupko, Tal.

Open Biol ; 12(12): 220223, 2022 12.

Article in English | MEDLINE | ID: mdl-36514983

ABSTRACT

Insertions and deletions (indels) of short DNA segments are common evolutionary events. Numerous studies showed that deletions occur more often than insertions in both prokaryotes and eukaryotes. This raises the question of why neutral sequences are not eradicated from the genome. We suggest that this is due to a phenomenon we term border-induced selection. Accordingly, a neutral sequence is bordered between conserved regions. Deletions occurring near the borders occasionally protrude into the conserved region and are thereby subject to strong purifying selection. Thus, for short neutral sequences, an insertion bias is expected. Here, we develop a set of increasingly complex models of indel dynamics that incorporate border-induced selection. Furthermore, we show that short conserved sequences within the neutrally evolving sequence help explain: (i) the presence of very long sequences; (ii) the high variance of sequence lengths; and (iii) the possible emergence of multimodality in sequence length distributions. Finally, we fitted our models to the human intron length distribution, as introns are thought to be mostly neutral and bordered by conserved exons. We show that when accounting for the occurrence of short conserved sequences within introns, we reproduce the main features, including the presence of long introns and the multimodality of the intron length distribution.

Subjects

Molecular Evolution, INDEL Mutation, Humans, Introns, Genome, Genomics

An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements.

Moshe, Asher; Wygoda, Elya; Ecker, Noa; Loewenthal, Gil; Avram, Oren; Israeli, Omer; Hazkani-Covo, Einat; Pe'er, Itsik; Pupko, Tal.

Mol Biol Evol ; 39(11), 2022 11 03.

Article in English | MEDLINE | ID: mdl-36282896

ABSTRACT

The inference of genome rearrangement events has been extensively studied, as they play a major role in molecular evolution. However, probabilistic evolutionary models that explicitly imitate the evolutionary dynamics of such events, as well as methods to infer model parameters, are yet to be fully utilized. Here, we developed a probabilistic approach to infer genome rearrangement rate parameters using an Approximate Bayesian Computation (ABC) framework. We developed two genome rearrangement models, a basic model, which accounts for genomic changes in gene order, and a more sophisticated one which also accounts for changes in chromosome number. We characterized the ABC inference accuracy using simulations and applied our methodology to both prokaryotic and eukaryotic empirical datasets. Knowledge of genome-rearrangement rates can help elucidate their role in evolution as well as help simulate genomes with evolutionary dynamics that reflect empirical genomes.

Subjects

Molecular Evolution, Genome, Bayes Theorem, Computer Simulation, Genomics

A Probabilistic Model for Indel Evolution: Differentiating Insertions from Deletions.

Loewenthal, Gil; Rapoport, Dana; Avram, Oren; Moshe, Asher; Wygoda, Elya; Itzkovitch, Alon; Israeli, Omer; Azouri, Dana; Cartwright, Reed A; Mayrose, Itay; Pupko, Tal.

Mol Biol Evol ; 38(12): 5769-5781, 2021 12 09.

Article in English | MEDLINE | ID: mdl-34469521

ABSTRACT

Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
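The idea of estimating parameters from summary statistics with approximate Bayesian computation can be sketched with a basic rejection sampler. This is a toy example under stated assumptions, not the authors' inference scheme: it assumes a single geometric length parameter p, a uniform prior on (0.05, 0.95), the mean indel length as the only summary statistic, and a hypothetical tolerance of 0.1. All function names are illustrative.

```python
import random

# Toy ABC rejection sampler. The model, prior, summary statistic, and
# tolerance are assumptions for illustration, not the published method.


def simulate_lengths(p, n, rng):
    """Draw n indel lengths from a geometric distribution on {1, 2, ...}."""
    lengths = []
    for _ in range(n):
        k = 1
        while rng.random() >= p:  # extend the indel with probability 1 - p
            k += 1
        lengths.append(k)
    return lengths


def summary(lengths):
    """Summary statistic: the mean indel length."""
    return sum(lengths) / len(lengths)


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is close to the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.05, 0.95)  # draw a candidate from the assumed prior
        simulated = simulate_lengths(p, len(observed), rng)
        if abs(summary(simulated) - target) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate_lengths(0.4, 200, rng)  # stand-in for empirical data
    posterior = abc_rejection(observed, 2000, 0.1, rng)
    print("accepted draws:", len(posterior))
    print("posterior mean of p:", sum(posterior) / len(posterior))
```

The accepted draws approximate the posterior of p. A richer model, such as one with separate insertion and deletion parameters, would need more summary statistics and a distance defined over all of them.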

Subjects

Molecular Evolution, INDEL Mutation, Bayes Theorem, Statistical Models, Phylogeny
