2.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38269647

ABSTRACT

MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased due to the assumption of alignment methods that indel lengths follow a geometric distribution. RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub, at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.


Subjects
Algorithms , Software , Bayes Theorem , Sequence Alignment , INDEL Mutation , Molecular Evolution
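
The contrast this abstract draws between geometric and Zipf indel-length distributions can be illustrated with a small sketch. It is not the authors' method, which relies on Approximate Bayesian Computation over bias-corrected alignments; it only fits truncated versions of both distributions to a list of indel lengths by grid-search maximum likelihood. The sample `lengths`, the truncation bound `MAX_LEN`, and all function names are hypothetical.

```python
import math

MAX_LEN = 50  # assumed upper bound on indel length (hypothetical)

def zipf_pmf(a, max_len=MAX_LEN):
    """Truncated Zipf: P(k) proportional to k**(-a), k = 1..max_len."""
    weights = [k ** (-a) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def geometric_pmf(p, max_len=MAX_LEN):
    """Truncated geometric: P(k) proportional to (1 - p)**(k - 1), k = 1..max_len."""
    weights = [(1 - p) ** (k - 1) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(lengths, pmf):
    """Log-likelihood of independent lengths under a pmf indexed from length 1."""
    return sum(math.log(pmf[k - 1]) for k in lengths)

def best_fit(lengths, family, grid):
    """Grid-search maximum likelihood; returns (log-likelihood, parameter)."""
    return max((log_likelihood(lengths, family(theta)), theta) for theta in grid)

# Hypothetical heavy-tailed sample: many short indels and a few long ones.
lengths = [1] * 60 + [2] * 15 + [3] * 8 + [5] * 5 + [10] * 4 + [20] * 3 + [40] * 2

zipf_ll, zipf_a = best_fit(lengths, zipf_pmf, [1.0 + 0.05 * i for i in range(41)])
geom_ll, geom_p = best_fit(lengths, geometric_pmf, [0.01 * i for i in range(1, 100)])

print(f"Zipf:      a = {zipf_a:.2f}, log-likelihood = {zipf_ll:.1f}")
print(f"Geometric: p = {geom_p:.2f}, log-likelihood = {geom_ll:.1f}")
```

On a sample with a long tail such as this one, the geometric model has to trade off the many length-1 events against the few long ones, which is why a heavy-tailed Zipf model tends to reach a higher likelihood.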
3.
Nature ; 624(7990): 109-114, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37938778

ABSTRACT

There are two main life cycles in plants: annual and perennial [1,2]. These life cycles are associated with different traits that determine ecosystem function [3,4]. Although life cycles are textbook examples of plant adaptation to different environments, we lack comprehensive knowledge regarding their global distributional patterns. Here we assembled an extensive database of plant life cycle assignments of 235,000 plant species coupled with millions of georeferenced datapoints to map the worldwide biogeography of these plant species. We found that annual plants are half as common as initially thought [5-8], accounting for only 6% of plant species. Our analyses indicate that annuals are favoured in hot and dry regions. However, a more accurate model shows that the prevalence of annual species is driven by temperature and precipitation in the driest quarter (rather than yearly means), explaining, for example, why some Mediterranean systems have more annuals than desert systems. Furthermore, this pattern remains consistent among different families, indicating convergent evolution. Finally, we demonstrate that increasing climate variability and anthropogenic disturbance increase annual favourability. Considering future climate change, we predict an increase in annual prevalence for 69% of the world's ecoregions by 2060. Overall, our analyses raise concerns for ecosystem services provided by perennial plants, as ongoing changes are leading to a higher proportion of annual plants globally.


Subjects
Ecosystem , Geographic Mapping , Phylogeography , Plant Physiological Phenomena , Plants , Acclimatization , Biological Evolution , Climate Change/statistics & numerical data , Factual Databases , Desert Climate , Human Activities , Mediterranean Region , Plants/classification , Rain , Temperature
4.
Methods Mol Biol ; 2703: 123-129, 2023.
Article in English | MEDLINE | ID: mdl-37646942

ABSTRACT

For decades, plant biologists have been interested in the determination and documentation of chromosome numbers for extant taxa. This central cytological character has been used as an important phylogenetic marker and as an indicator for major genomic events such as polyploidy and dysploidy. Due to their significance and the relative ease by which chromosome numbers can be obtained, chromosome numbers have been extensively recorded across the plant kingdom and documented in a wide variety of resources. This makes the collection process a wearing task, often leading to partial data retrieval. In 2015, the Chromosome Counts Database (CCDB) was assembled, being an online unified community resource. This database compiles dozens of different chromosome counts sources, of which a significant portion had been unavailable before in a digitized, searchable format. The vast amount of data assembled in CCDB has already enabled a large number of analyses to examine the evolution of different plant hierarchies, as well as the application of various follow-up analyses, such as ploidy-level inference using chromEvol. CCDB ( http://ccdb.tau.ac.il/ ) encourages data sharing among the botanical community and is expected to continue expanding as additional chromosome numbers are recorded.


Subjects
Documentation , Information Storage and Retrieval , Phylogeny , Factual Databases , Genomics
5.
Genome Biol Evol ; 15(7)2023 Jul 03.
Article in English | MEDLINE | ID: mdl-37401440

ABSTRACT

Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence-absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies.


Subjects
Arabidopsis , Genomics , Genomics/methods , Plant Genome , DNA Sequence Analysis , Molecular Sequence Annotation , Plants/genetics , Arabidopsis/genetics
6.
New Phytol ; 240(3): 918-927, 2023 11.
Article in English | MEDLINE | ID: mdl-37337836
7.
Methods Mol Biol ; 2672: 529-547, 2023.
Article in English | MEDLINE | ID: mdl-37335498

ABSTRACT

The ChromEvol software was the first to implement a likelihood-based approach, using probabilistic models that depict the pattern of chromosome number change along a specified phylogeny. The initial models have been completed and expanded during the last years. New parameters that model polyploid chromosome evolution have been implemented in ChromEvol v.2. In recent years, new and more complex models have been developed. The BiChrom model is able to implement two distinct chromosome models for the two possible trait states of a binary character of interest. ChromoSSE jointly implements chromosome evolution, speciation, and extinction. In the near future, we will be able to study chromosome evolution with increasingly complex models.


Subjects
Chromosomes , Molecular Evolution , Humans , Likelihood Functions , Chromosomes/genetics , Phylogeny , Polyploidy
8.
Nat Plants ; 9(4): 572-587, 2023 04.
Article in English | MEDLINE | ID: mdl-36973414

ABSTRACT

Plant genomes are characterized by large and complex gene families that often result in similar and partially overlapping functions. This genetic redundancy severely hampers current efforts to uncover novel phenotypes, delaying basic genetic research and breeding programmes. Here we describe the development and validation of Multi-Knock, a genome-scale clustered regularly interspaced short palindromic repeat toolbox that overcomes functional redundancy in Arabidopsis by simultaneously targeting multiple gene-family members, thus identifying genetically hidden components. We computationally designed 59,129 optimal single-guide RNAs that each target two to ten genes within a family at once. Furthermore, partitioning the library into ten sublibraries directed towards a different functional group allows flexible and targeted genetic screens. From the 5,635 single-guide RNAs targeting the plant transportome, we generated over 3,500 independent Arabidopsis lines that allowed us to identify and characterize the first known cytokinin tonoplast-localized transporters in plants. With the ability to overcome functional redundancy in plants at the genome-scale level, the developed strategy can be readily deployed by scientists and breeders for basic research and to expedite breeding efforts.


Subjects
Arabidopsis , Clustered Regularly Interspaced Short Palindromic Repeats , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Arabidopsis/genetics , Plant Breeding , Plants/genetics , Plant Genome , CRISPR-Cas Systems , Genetically Modified Plants/genetics , Gene Editing
9.
New Phytol ; 238(4): 1733-1744, 2023 05.
Article in English | MEDLINE | ID: mdl-36759331

ABSTRACT

Changes in chromosome numbers, including polyploidy and dysploidy events, play a key role in eukaryote evolution as they could expedite reproductive isolation and have the potential to foster phenotypic diversification. Deciphering the pattern of chromosome-number change within a phylogeny currently relies on probabilistic evolutionary models. All currently available models assume time homogeneity, such that the transition rates are identical throughout the phylogeny. Here, we develop heterogeneous models of chromosome-number evolution that allow multiple transition regimes to operate in distinct parts of the phylogeny. The partition of the phylogeny into distinct transition regimes may be specified by the researcher or, alternatively, identified using a sequential testing approach. Once the number and locations of shifts in the transition pattern are determined, a second search phase identifies regimes with similar transition dynamics, which could indicate convergent evolution. Using simulations, we study the performance of the developed model to detect shifts in patterns of chromosome-number evolution and demonstrate its applicability by analyzing the evolution of chromosome numbers within the Cyperaceae plant family. The developed model extends the capabilities of probabilistic models of chromosome-number evolution and should be particularly helpful for the analyses of large phylogenies that include multiple distinct subclades.


Subjects
Chromosomes , Cyperaceae , Phylogeny , Cyperaceae/genetics , Polyploidy , Plants/genetics , Molecular Evolution
10.
Protein Sci ; 32(3): e4582, 2023 03.
Article in English | MEDLINE | ID: mdl-36718848

ABSTRACT

The ConSurf web server for the analysis of proteins, RNA, and DNA provides a quick and accurate estimate of the per-site evolutionary rate among homologues. The analysis reveals functionally important regions, such as catalytic and ligand-binding sites, which often evolve slowly. Since the last report in 2016, ConSurf has been improved in multiple ways. It now has a user-friendly interface that makes it easier to perform the analysis and to visualize the results. Evolutionary rates are calculated based on a set of homologous sequences, collected using hidden Markov model-based search tools, recently embedded in the pipeline. Using these, and following the removal of redundancy, ConSurf assembles a representative set of effective homologues for protein and nucleic acid queries to enable informative analysis of the evolutionary patterns. The analysis is particularly insightful when the evolutionary rates are mapped on the macromolecule structure. In this respect, the availability of AlphaFold model structures of essentially all UniProt proteins makes ConSurf particularly relevant to the research community. The UniProt ID of a query protein with an available AlphaFold model can now be used to start a calculation. Another important improvement is the Python re-implementation of the entire computational pipeline, making it easier to maintain. This Python pipeline is now available for download as a standalone version. We demonstrate some of ConSurf's key capabilities by the analysis of caveolin-1, the main protein of membrane invaginations called caveolae.


Subjects
Biological Evolution , Molecular Evolution , Protein Conformation , Conserved Sequence/genetics , Proteins/chemistry , Software
11.
Methods Mol Biol ; 2545: 175-187, 2023.
Article in English | MEDLINE | ID: mdl-36720813

ABSTRACT

Chromosome numbers have long been used for the identification of key genomic events such as polyploidy and dysploidy. These inferences are often challenging, particularly when applied to large phylogenies, or clades in which more than a few chromosome number transitions have occurred. Here we describe the chromEvol computational framework that infers shifts in chromosome numbers along a phylogeny using probabilistic models of chromosome number change. Given chromosome count data and an associated phylogeny, chromEvol identifies such patterns by fitting probabilistic models of chromosome number evolution to the data. We describe the chromEvol workflow using available online tools, including the specification of the desired models, the examination of model fit to the data, and the inference of ploidy levels. The pipeline can be used by the wide scientific community and requires no previous computational or programming skills.


Subjects
Genomics , Statistical Models , Humans , Phylogeny , Ploidies , Polyploidy
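
To make "probabilistic models of chromosome number change" more concrete, the following is a minimal sketch of the kind of continuous-time Markov rate matrix such models are built on. It is not chromEvol's implementation. It assumes three constant rates (single-chromosome gain, single-chromosome loss, and whole-genome duplication) and a hypothetical upper bound `max_chrom`; the function name and the rate values are illustrative only.

```python
def rate_matrix(max_chrom, gain, loss, dupl):
    """Build a rate matrix Q over chromosome numbers 1..max_chrom.

    Row/column index i corresponds to chromosome number i + 1.
    """
    n = max_chrom
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        count = i + 1
        if count + 1 <= max_chrom:
            q[i][count] += gain          # count -> count + 1 (ascending dysploidy)
        if count - 1 >= 1:
            q[i][count - 2] += loss      # count -> count - 1 (descending dysploidy)
        if 2 * count <= max_chrom:
            q[i][2 * count - 1] += dupl  # count -> 2 * count (polyploidy)
        q[i][i] = -sum(q[i])             # each row of Q sums to zero
    return q

# Hypothetical rates, chosen only for illustration.
q = rate_matrix(10, gain=0.1, loss=0.2, dupl=0.05)
for row in q[:4]:
    print([round(x, 2) for x in row])
```

Note that for a single chromosome both gain and duplication lead to two chromosomes, so the two rates add up in that entry. Fitting such a model means choosing the rates that maximize the likelihood of the observed counts at the tips of the phylogeny.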
12.
Open Biol ; 12(12): 220223, 2022 12.
Article in English | MEDLINE | ID: mdl-36514983

ABSTRACT

Insertions and deletions (indels) of short DNA segments are common evolutionary events. Numerous studies showed that deletions occur more often than insertions in both prokaryotes and eukaryotes. This raises the question of why neutral sequences are not eradicated from the genome. We suggest that this is due to a phenomenon we term border-induced selection. Accordingly, a neutral sequence is bordered between conserved regions. Deletions occurring near the borders occasionally protrude into the conserved region and are thereby subject to strong purifying selection. Thus, for short neutral sequences, an insertion bias is expected. Here, we develop a set of increasingly complex models of indel dynamics that incorporate border-induced selection. Furthermore, we show that short conserved sequences within the neutrally evolving sequence help explain: (i) the presence of very long sequences; (ii) the high variance of sequence lengths; and (iii) the possible emergence of multimodality in sequence length distributions. Finally, we fitted our models to the human intron length distribution, as introns are thought to be mostly neutral and bordered by conserved exons. We show that when accounting for the occurrence of short conserved sequences within introns, we reproduce the main features, including the presence of long introns and the multimodality of intron distribution.


Subjects
Molecular Evolution , INDEL Mutation , Humans , Introns , Genome , Genomics
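
The border-induced selection argument in this abstract can be sketched with a few lines of arithmetic. This is a deliberately simplified toy, not one of the models developed in the paper: it assumes deletions of a single fixed length that start at a uniformly chosen position inside the neutral segment and extend in one direction, and that any deletion reaching the conserved border is eliminated. The function names and the raw deletion-to-insertion ratio are hypothetical.

```python
from fractions import Fraction

def deletion_acceptance(neutral_len, deletion_len):
    """Fraction of deletions that stay entirely inside the neutral segment."""
    surviving_starts = max(0, neutral_len - deletion_len + 1)
    return Fraction(surviving_starts, neutral_len)

def effective_ratio(neutral_len, deletion_len, raw_ratio):
    """Deletion-to-insertion ratio after removing border-crossing deletions.

    Insertions are assumed to be always tolerated inside the neutral segment.
    """
    return raw_ratio * deletion_acceptance(neutral_len, deletion_len)

raw_ratio = Fraction(3, 2)  # hypothetical: deletions arise 1.5x as often as insertions
for neutral_len in (10, 50, 1000):
    ratio = effective_ratio(neutral_len, 5, raw_ratio)
    bias = "insertion bias" if ratio < 1 else "deletion bias"
    print(f"length {neutral_len:>4}: effective ratio {float(ratio):.3f} ({bias})")
```

Under these assumptions a short segment loses a large share of its deletions to purifying selection at the border, so the effective ratio can drop below one even though deletions arise more often than insertions.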
13.
Bioinformatics ; 38(Suppl 1): i118-i124, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758778

ABSTRACT

MOTIVATION: In recent years, full-genome sequences have become increasingly available and as a result many modern phylogenetic analyses are based on very long sequences, often with over 100 000 sites. Phylogenetic reconstructions of large-scale alignments are challenging for likelihood-based phylogenetic inference programs and usually require using a powerful computer cluster. Current tools for alignment trimming prior to phylogenetic analysis do not promise a significant reduction in the alignment size and are claimed to have a negative effect on the accuracy of the obtained tree. RESULTS: Here, we propose an artificial-intelligence-based approach, which provides means to select the optimal subset of sites and a formula by which one can compute the log-likelihood of the entire data based on this subset. Our approach is based on training a regularized Lasso-regression model that optimizes the log-likelihood prediction accuracy while putting a constraint on the number of sites used for the approximation. We show that computing the likelihood based on 5% of the sites already provides accurate approximation of the tree likelihood based on the entire data. Furthermore, we show that using this Lasso-based approximation during a tree search decreased running-time substantially while retaining the same tree-search performance. AVAILABILITY AND IMPLEMENTATION: The code was implemented in Python version 3.8 and is available through GitHub (https://github.com/noaeker/lasso_positions_sampling). The datasets used in this paper were retrieved from Zhou et al. (2018) as described in section 3. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Artificial Intelligence , Software , Likelihood Functions , Phylogeny
14.
Proc Biol Sci ; 288(1959): 20210533, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34547912

ABSTRACT

The role of plant-pollinator interactions in the rapid radiation of the angiosperms has long fascinated evolutionary biologists. Studies have brought evidence for pollinator-driven diversification of various plant lineages, particularly plants with specialized flowers and concealed rewards. By contrast, little is known about how this crucial interaction has shaped macroevolutionary patterns of floral visitors. In particular, there is currently no empirical evidence that floral host association has increased diversification in bees, the most prominent group of floral visitors that essentially rely on angiosperm pollen. In this study, we examine how floral host preference influenced diversification in eucerine bees (Apidae, Eucerini), which exhibit large variations in their floral associations. We combine quantitative pollen analyses with a recently proposed phylogenetic hypothesis, and use a state speciation and extinction probabilistic approach. Using this framework, we provide the first evidence that multiple evolutionary transitions from host plants with accessible pollen to restricted pollen from 'bee-flowers' have significantly increased the diversification of a bee clade. We suggest that exploiting host plants with restricted pollen has allowed the exploitation of a new ecological niche for eucerine bees and contributed both to their colonization of vast regions of the world and their rapid diversification.


Subjects
Flowers , Pollination , Animals , Bees , Biological Evolution , Phylogeny , Pollen
15.
Mol Biol Evol ; 38(12): 5769-5781, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34469521

ABSTRACT

Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.


Subjects
Molecular Evolution , INDEL Mutation , Bayes Theorem , Statistical Models , Phylogeny
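
The summary-statistic-based approximate Bayesian computation (ABC) mentioned in this abstract follows a general recipe: draw parameters from a prior, simulate data, and keep the draws whose summary statistics fall closest to the observed ones. The sketch below shows that recipe on a toy problem, and is not the authors' model. It estimates a single geometric indel-length parameter `p` rather than separate insertion and deletion rates and length distributions; the uniform prior, the two summary statistics, the unscaled Euclidean distance, and all function names are assumptions made for illustration.

```python
import math
import random

def sample_indel_length(p, rng):
    """Draw one geometric length (1, 2, ...) by inverse-transform sampling."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return 1 + int(math.log(u) / math.log(1.0 - p))

def simulate_lengths(p, n, rng):
    return [sample_indel_length(p, rng) for _ in range(n)]

def summary(lengths):
    """Summary statistics: mean length and fraction of length-1 indels."""
    n = len(lengths)
    return (sum(lengths) / n, sum(1 for k in lengths if k == 1) / n)

def distance(s1, s2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def abc_rejection(observed, n_sims, keep, rng):
    """Keep the `keep` prior draws whose simulations are closest to the data."""
    obs_stats = summary(observed)
    draws = []
    for _ in range(n_sims):
        p = rng.uniform(0.05, 0.95)  # assumed uniform prior on p
        sim = simulate_lengths(p, len(observed), rng)
        draws.append((distance(summary(sim), obs_stats), p))
    draws.sort()
    return [p for _, p in draws[:keep]]

rng = random.Random(0)
true_p = 0.3  # hypothetical "true" value used to generate pseudo-observed data
observed = simulate_lengths(true_p, 200, rng)
posterior = abc_rejection(observed, n_sims=2000, keep=20, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"approximate posterior mean of p: {posterior_mean:.3f}")
```

Keeping a fixed number of closest draws, rather than using a fixed distance threshold, guarantees a non-empty sample. A realistic analysis would scale the summary statistics so that no single one dominates the distance.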
16.
J Mol Evol ; 89(6): 329-340, 2021 07.
Article in English | MEDLINE | ID: mdl-34059925

ABSTRACT

Preventing and controlling epidemics caused by vector-borne viruses are particularly challenging due to their diverse pool of hosts and highly adaptive nature. Many vector-borne viruses belong to the Flavivirus genus, whose members vary greatly in host range and specificity. Members of the Flavivirus genus can be categorized into four main groups: insect-specific viruses that are maintained solely in arthropod populations, mosquito-borne viruses and tick-borne viruses that are transmitted to vertebrate hosts by mosquitoes or ticks via blood feeding, and those with no known vector. The mosquito-borne group encompasses the yellow fever, dengue, and West Nile viruses, all of which are globally spread and cause severe morbidity in humans. The Flavivirus genus is genetically diverse, and its members are subject to different host-specific and vector-specific selective constraints, which do not always align. Thus, understanding the underlying genetic differences that led to the diversity in host range within this genus is an important aspect in deciphering the mechanisms that drive host compatibility and can aid in the constant arms race against viral threats. Here, we review the phylogenetic relationships between members of the genus, their infection bottlenecks, and phenotypic and genomic differences. We further discuss methods that utilize these differences for prediction of host shifts in flaviviruses and can contribute to viral surveillance efforts.


Subjects
Culicidae , Flavivirus , Animals , Culicidae/genetics , Flavivirus/genetics , Host Specificity , Humans , Mosquito Vectors/genetics , Phylogeny
17.
Nat Commun ; 12(1): 1983, 2021 03 31.
Article in English | MEDLINE | ID: mdl-33790270

ABSTRACT

Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.


Subjects
Algorithms , Molecular Evolution , Machine Learning , Phylogeny , Animals , Genetic Databases/statistics & numerical data , Protein Databases/statistics & numerical data , Humans , Genetic Models
18.
Genome Biol Evol ; 13(2)2021 02 03.
Article in English | MEDLINE | ID: mdl-33566095

ABSTRACT

Chromosome numbers have been widely used to describe the most fundamental genomic attribute of an organism or a lineage. Although providing strong phylogenetic signal, chromosome numbers vary remarkably among eukaryotes at all levels of taxonomic resolution. Changes in chromosome numbers regularly serve as indication of major genomic events, most notably polyploidy and dysploidy. Here, we review recent advancements in our ability to make inferences regarding historical events that led to alterations in the number of chromosomes of a lineage. We first describe the mechanistic processes underlying changes in chromosome numbers, focusing on structural chromosomal rearrangements. Then, we focus on experimental procedures, encompassing comparative cytogenomics and genomics approaches, and on computational methodologies that are based on explicit models of chromosome-number evolution. Together, these tools offer valuable predictions regarding historical events that have changed chromosome numbers and genome structures, as well as their phylogenetic and temporal placements.


Subjects
Plant Chromosomes , Molecular Evolution , Genetic Models , Chromosome Painting , Plant Genome , Genomics , Polyploidy
19.
Mol Ecol Resour ; 21(4): 1393-1403, 2021 May.
Article in English | MEDLINE | ID: mdl-33533167

ABSTRACT

The study of intraspecific genomic variation in eukaryotic species has been the focus of numerous genome resequencing projects in recent years. One emerging approach for the analysis of intraspecific diversity uses the concept of a pan-genome, which theoretically represents the full set of genomic sequences and coding genes from all individuals of a given species. This approach has many advantages over reference-based methods and has been successfully applied to study both prokaryotic and eukaryotic species. However, the process of pan-genome construction still presents considerable scientific and technical challenges, especially for eukaryotic species with large and complex genomes. Although general approaches for the construction of pan-genomes have been devised, currently available software tools implement only certain modules of the entire computational procedure. Therefore, each pan-genome project requires the development of tailored analysis pipelines, thus complicating and prolonging the process and impairing research reproducibility and comparison across studies. Here, we present Panoramic, a software package for the automatic construction of eukaryotic pan-genomes. Panoramic takes raw sequencing reads as input and applies two alternative approaches for pan-genome construction. Panoramic makes pan-genome construction a considerably easier task by providing a simple user interface and efficient data processing algorithms. We demonstrate the use of Panoramic by constructing the pan-genome of the model plant species Arabidopsis thaliana from sequencing data of 20 diverse ecotypes.


Subjects
Eukaryota , Genome , Genomics , Software , Eukaryota/genetics , Reproducibility of Results
20.
Syst Biol ; 70(3): 608-622, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33252676

ABSTRACT

Detecting the signature of selection in coding sequences and associating it with shifts in phenotypic states can unveil genes underlying complex traits. Of the various signatures of selection exhibited at the molecular level, changes in the pattern of selection at protein-coding genes have been of main interest. To this end, phylogenetic branch-site codon models are routinely applied to detect changes in selective patterns along specific branches of the phylogeny. Many of these methods rely on a prespecified partition of the phylogeny into branch categories, thus treating the course of trait evolution as fully resolved and assuming that phenotypic transitions have occurred only at speciation events. Here, we present TraitRELAX, a new phylogenetic model that alleviates these strong assumptions by explicitly accounting for the uncertainty in the evolution of both trait and coding sequences. This joint statistical framework enables the detection of changes in selection intensity upon repeated trait transitions. We evaluated the performance of TraitRELAX using simulations and then applied it to two case studies. Using TraitRELAX, we found an intensification of selection in the primate SEMG2 gene in polygynandrous species compared to species of other mating forms, as well as changes in the intensity of purifying selection operating on sixteen bacterial genes upon transitioning from a free-living to an endosymbiotic lifestyle. [Evolutionary selection; intensification; $\gamma$-proteobacteria; genotype-phenotype; relaxation; SEMG2.]


Subjects
Molecular Evolution , Phenotype , Genetic Selection , Animals , Codon , Genetic Models , Phylogeny , Primates/genetics