Results 1 - 14 of 14
1.
Syst Biol ; 71(5): 1178-1194, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35244183

ABSTRACT

Reconstructing accurate historical relationships within a species poses numerous challenges, not least in many plant groups in which gene flow is high enough to extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these, we inferred within-species trees and evaluated their significance and robustness using five qualitatively different inference methods. Despite the low sequence diversity, large census population sizes, and presence of wide-ranging pollen and seed dispersal agents, phylogenetic trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California in Sonora, Mexico. Together with striking decreases in marginal likelihood found to the north, this supports hypotheses that saguaro's current range reflects postglacial expansion from the refugia in the south of its range. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used rather than gene tree methods that are widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects. 
[Phylogenomics; phylogeography; rooting; Sonoran Desert.].


Subjects
Cactaceae , Cactaceae/genetics , Genetic Hybridization , Phylogeny , Phylogeography , DNA Sequence Analysis
2.
J Exp Bot ; 71(9): 2782-2795, 2020 05 09.
Article in English | MEDLINE | ID: mdl-31989164

ABSTRACT

The presence of varied numbers of CALCINEURIN B-LIKE10 (CBL10) calcium sensor genes in species across the Brassicaceae and the demonstrated role of CBL10 in salt tolerance in Arabidopsis thaliana and Eutrema salsugineum provided a unique opportunity to determine if CBL10 function is modified in different species and linked to salt tolerance. Salinity effects on species growth and cross-species complementation were used to determine the extent of conservation and divergence of CBL10 function in four species representing major lineages within the core Brassicaceae (A. thaliana, E. salsugineum, Schrenkiella parvula, and Sisymbrium irio) as well as the first diverging lineage (Aethionema arabicum). Evolutionary and functional analyses indicate that CBL10 duplicated within expanded lineage II of the Brassicaceae and that, while portions of CBL10 function are conserved across the family, there are species-specific variations in CBL10 function. Paralogous CBL10 genes within a species diverged in expression and function probably contributing to the maintenance of the duplicated gene pairs. Orthologous CBL10 genes diverged in function in a species-specific manner, suggesting that functions arose post-speciation. Multiple CBL10 genes and their functional divergence may have expanded calcium-mediated signaling responses and contributed to the ability of certain members of the Brassicaceae to maintain growth in salt-affected soils.


Subjects
Arabidopsis Proteins , Arabidopsis , Brassicaceae , Calcium-Binding Proteins , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , Brassicaceae/genetics , Brassicaceae/metabolism , Calcium , Salt Tolerance
3.
Proc Natl Acad Sci U S A ; 114(45): 12003-12008, 2017 11 07.
Article in English | MEDLINE | ID: mdl-29078296

ABSTRACT

Few clades of plants have proven as difficult to classify as cacti. One explanation may be an unusually high level of convergent and parallel evolution (homoplasy). To evaluate support for this phylogenetic hypothesis at the molecular level, we sequenced the genomes of four cacti in the especially problematic tribe Pachycereeae, which contains most of the large columnar cacti of Mexico and adjacent areas, including the iconic saguaro cactus (Carnegiea gigantea) of the Sonoran Desert. We assembled a high-coverage draft genome for saguaro and lower coverage genomes for three other genera of tribe Pachycereeae (Pachycereus, Lophocereus, and Stenocereus) and a more distant outgroup cactus, Pereskia. We used these to construct 4,436 orthologous gene alignments. Species tree inference consistently returned the same phylogeny, but gene tree discordance was high: 37% of gene trees having at least 90% bootstrap support conflicted with the species tree. Evidently, discordance is a product of long generation times and moderately large effective population sizes, leading to extensive incomplete lineage sorting (ILS). In the best supported gene trees, 58% of apparent homoplasy at amino sites in the species tree is due to gene tree-species tree discordance rather than parallel substitutions in the gene trees themselves, a phenomenon termed "hemiplasy." The high rate of genomic hemiplasy may contribute to apparent parallelisms in phenotypic traits, which could confound understanding of species relationships and character evolution in cacti.


Subjects
Cactaceae/genetics , Plant Genome/genetics , Base Sequence , Molecular Evolution , Genomics/methods , Mexico , Genetic Models , North America , Phylogeny
4.
Syst Biol ; 64(5): 709-26, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25999395

ABSTRACT

Terraces are sets of trees with precisely the same likelihood or parsimony score, which can be induced by missing sequences in partitioned multi-locus phylogenetic data matrices. The potentially large set of trees on a terrace can be characterized by enumeration algorithms or consensus methods that exploit the pattern of partial taxon coverage in the data, independent of the sequence data themselves. Terraces can add ambiguity and complexity to phylogenetic inference, particularly in settings where inference is already challenging: data sets with many taxa and relatively few loci. In this article we present five new findings about terraces and their impacts on phylogenetic inference. First, we clarify assumptions about partitioning scheme model parameters that are necessary for the existence of terraces. Second, we explore the dependence of terrace size on partitioning scheme and indicate how to find the partitioning scheme associated with the largest terrace containing a given tree. Third, we highlight the impact of terrace size on bootstrap estimates of confidence limits in clades, and characterize the surprising result that the bootstrap proportion for a clade, as it is usually calculated, can be entirely determined by the frequency of bipartitions on a terrace, with some bipartitions receiving high support even when incorrect. Fourth, we dissect some effects of prior distributions of edge lengths on the computed posterior probabilities of clades on terraces, to understand an example in which long edges "attract" each other in Bayesian inference. Fifth, we describe how assuming relationships between edge-lengths of different loci, as an attempt to avoid terraces, can also be problematic when taxon coverage is partial, specifically when heterotachy is present. Finally, we discuss strategies for remediation of some of these problems. 
One promising approach finds a minimal set of taxa which, when deleted from the data matrix, reduces the size of a terrace to a single tree.


Subjects
Classification/methods , Computer Simulation/standards , Phylogeny , Genetic Models
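
The abstract for item 4 notes that a terrace depends only on the pattern of partial taxon coverage, not on the sequences themselves. A minimal sketch of that idea follows, assuming the usual characterization that two trees lie on the same terrace when they induce the same subtree on each locus's taxon set. For simplicity it treats trees as rooted nested tuples, which is an assumption of this example and not something the abstract specifies; the function names `restrict` and `same_terrace` are hypothetical.

```python
def restrict(tree, taxa):
    """Return a canonical form of the subtree induced on `taxa`.

    `tree` is a rooted tree as nested tuples with string leaves (an
    assumption of this sketch). Leaves outside `taxa` are dropped and
    single-child nodes are suppressed. Returns None if nothing remains.
    """
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kids = [k for k in (restrict(child, taxa) for child in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]          # suppress the unary node
    return frozenset(kids)      # child order does not matter


def same_terrace(tree_a, tree_b, coverage):
    """True if both trees induce identical subtrees for every locus.

    `coverage` maps a locus name to the set of taxa sequenced for it.
    """
    return all(
        restrict(tree_a, taxa) == restrict(tree_b, taxa)
        for taxa in coverage.values()
    )


# Hypothetical coverage pattern: taxa C and D are never sampled together.
coverage = {"locus1": {"A", "B", "C"}, "locus2": {"A", "B", "D"}}
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "B"), "D"), "C")
t3 = ((("A", "C"), "B"), "D")
```

Here `t1` and `t2` differ only in how C and D attach, which no locus can resolve, so they share a terrace under this coverage; `t3` disagrees with `t1` on locus1 and does not.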
5.
Ann Bot ; 111(6): 1263-75, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23104672

ABSTRACT

BACKGROUND AND AIMS: Plants display a wide range of traits that allow them to use animals for vital tasks. To attract and reward aggressive ants that protect developing leaves and flowers from consumers, many plants bear extrafloral nectaries (EFNs). EFNs are exceptionally diverse in morphology and locations on a plant. In this study the evolution of EFN diversity is explored by focusing on the legume genus Senna, in which EFNs underwent remarkable morphological diversification and occur in over 80 % of the approx. 350 species. METHODS: EFN diversity in location, morphology and plant ontogeny was characterized in wild and cultivated plants, using scanning electron microscopy and microtome sectioning. From these data EFN evolution was reconstructed in a phylogenetic framework comprising 83 Senna species. KEY RESULTS: Two distinct kinds of EFNs exist in two unrelated clades within Senna. 'Individualized' EFNs (iEFNs), located on the compound leaves and sometimes at the base of pedicels, display a conspicuous, gland-like nectary structure, are highly diverse in shape and characterize the species-rich EFN clade. Previously overlooked 'non-individualized' EFNs (non-iEFNs) embedded within stipules, bracts, and sepals are cryptic and may represent a new synapomorphy for clade II. Leaves bear EFNs consistently throughout plant ontogeny. In one species, however, early seedlings develop iEFNs between the first pair of leaflets, but later leaves produce them at the leaf base. This ontogenetic shift reflects our inferred diversification history of iEFN location: ancestral leaves bore EFNs between the first pair of leaflets, while leaves derived from them bore EFNs either between multiple pairs of leaflets or at the leaf base. CONCLUSIONS: EFNs are more diverse than previously thought. EFN-bearing plant parts provide different opportunities for EFN presentation (i.e. location) and individualization (i.e. morphology), with implications for EFN morphological evolution, EFN-ant protective mutualisms and the evolutionary role of EFNs in plant diversification.


Subjects
Ants/physiology , Biological Evolution , Senna/anatomy & histology , Animals , Phenotype , Plant Nectar/metabolism , Senna/genetics , Senna/growth & development , Symbiosis
6.
BMC Evol Biol ; 10: 155, 2010 May 25.
Article in English | MEDLINE | ID: mdl-20500873

ABSTRACT

BACKGROUND: Phylogenomic studies based on multi-locus sequence data sets are usually characterized by partial taxon coverage, in which sequences for some loci are missing for some taxa. The impact of missing data has been widely studied in phylogenetics, but it has proven difficult to distinguish effects due to error in tree reconstruction from effects due to missing data per se. We approach this problem using an explicitly phylogenomic criterion of success, decisiveness, which refers to whether the pattern of taxon coverage allows for uniquely defining a single tree for all taxa. RESULTS: We establish theoretical bounds on the impact of missing data on decisiveness. Results are derived for two contexts: a fixed taxon coverage pattern, such as that observed from an already assembled data set, and a randomly generated pattern derived from a process of sampling new data, such as might be observed in an ongoing comparative genomics sequencing project. Lower bounds on how many loci are needed for decisiveness are derived for the former case, and both lower and upper bounds for the latter. When data are not decisive for all trees, we estimate the probability of decisiveness and the chances that a given edge in the tree will be distinguishable. Theoretical results are illustrated using several empirical examples constructed by mining sequence databases, genomic libraries such as ESTs and BACs, and complete genome sequences. CONCLUSION: Partial taxon coverage among loci can limit phylogenomic inference by making it impossible to distinguish among multiple alternative trees. However, even though lack of decisiveness is typical of many sparse phylogenomic data sets, it is often still possible to distinguish a large fraction of edges in the tree.


Subjects
Computational Biology/methods , Genomics/methods , Phylogeny , Data Mining , Genetic Databases , Genomic Library , Genetic Models , DNA Sequence Analysis
7.
BMC Evol Biol ; 7 Suppl 1: S3, 2007 Feb 08.
Article in English | MEDLINE | ID: mdl-17288576

ABSTRACT

BACKGROUND: Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants. RESULTS: A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead. CONCLUSION: Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.


Subjects
Algorithms , Molecular Evolution , Expressed Sequence Tags , Gene Duplication , Magnoliopsida/genetics , Phylogeny , Genetic Models , DNA Sequence Analysis/methods
8.
PLoS One ; 10(2): e0117987, 2015.
Article in English | MEDLINE | ID: mdl-25679219

ABSTRACT

Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.


Subjects
Biology/methods , Genetic Databases , Trees/classification , Trees/genetics , User-Computer Interface
9.
Algorithms Mol Biol ; 8(1): 18, 2013 Jul 09.
Article in English | MEDLINE | ID: mdl-23837994

ABSTRACT

BACKGROUND: A multi-labeled tree, or MUL-tree, is a phylogenetic tree where two or more leaves share a label, e.g., a species name. A MUL-tree can imply multiple conflicting phylogenetic relationships for the same set of taxa, but can also contain conflict-free information that is of interest and yet is not obvious. RESULTS: We define the information content of a MUL-tree T as the set of all conflict-free quartet topologies implied by T, and define the maximal reduced form of T as the smallest tree that can be obtained from T by pruning leaves and contracting edges while retaining the same information content. We show that any two MUL-trees with the same information content exhibit the same reduced form. This introduces an equivalence relation among MUL-trees with potential applications to comparing MUL-trees. We present an efficient algorithm to reduce a MUL-tree to its maximally reduced form and evaluate its performance on empirical datasets in terms of both quality of the reduced tree and the degree of data reduction achieved. CONCLUSIONS: Our measure of conflict-free information content based on quartets is simple and topologically appealing. In the experiments, the maximally reduced form is often much smaller than the original tree, yet retains most of the taxa. The reduction algorithm is quadratic in the number of leaves and its complexity is unaffected by the multiplicity of leaf labels or the degree of the nodes.
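
The definition in item 9 (a MUL-tree is a tree in which two or more leaves share a label) can be illustrated with a short check. This is only a sketch of the definition, not the reduction algorithm the abstract describes; the nested-tuple tree encoding and the function names are assumptions made for the example.

```python
from collections import Counter


def leaf_labels(tree):
    """Yield leaf labels of a tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree:
            yield from leaf_labels(child)


def repeated_labels(tree):
    """Return, in sorted order, the labels carried by two or more leaves."""
    counts = Counter(leaf_labels(tree))
    return sorted(label for label, n in counts.items() if n > 1)


def is_mul_tree(tree):
    """True if at least one label appears on more than one leaf."""
    return bool(repeated_labels(tree))


# Hypothetical gene tree in which species "A" appears twice.
gene_tree = (("A", "B"), ("A", "C"))
species_tree = (("A", "B"), "C")
```

With these inputs, `is_mul_tree(gene_tree)` is true because "A" labels two leaves, while `species_tree` is singly labeled.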

10.
Science ; 333(6041): 448-50, 2011 Jul 22.
Article in English | MEDLINE | ID: mdl-21680810

ABSTRACT

A key step in assembling the tree of life is the construction of species-rich phylogenies from multilocus--but often incomplete--sequence data sets. We describe previously unknown structure in the landscape of solutions to the tree reconstruction problem, comprising sometimes vast "terraces" of trees with identical quality, arranged on islands of phylogenetically similar trees. Phylogenetic ambiguity within a terrace can be characterized efficiently and then ameliorated by new algorithms for obtaining a terrace's maximum-agreement subtree or by identifying the smallest set of new targets for additional sequencing. Algorithms to find optimal trees or estimate Bayesian posterior tree distributions may need to navigate strategically in the neighborhood of large terraces in tree space.


Subjects
Arthropods/classification , Colubridae/classification , Magnoliopsida/classification , Statistical Models , Phylogeny , Algorithms , Animals , Arthropods/genetics , Bayes Theorem , Biological Evolution , Colubridae/genetics , Statistical Data Interpretation , Genomics/methods , Likelihood Functions , Magnoliopsida/genetics , Poaceae/classification , Poaceae/genetics , Sequence Alignment
11.
Syst Biol ; 55(5): 818-36, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17060202

ABSTRACT

A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa x 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa x 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed. 
A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.


Subjects
Nucleic Acid Databases , Fabaceae/classification , Phylogeny , DNA Sequence Analysis/methods , Algorithms , Base Sequence , Cluster Analysis , Computational Biology/methods , Fabaceae/genetics , Molecular Sequence Data , Plant Proteins/genetics , Sequence Alignment
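
Item 11 reports supermatrix sparsity as the share of cells containing gaps or question marks. A minimal sketch of how such a figure could be computed is shown below; the dictionary layout, the `missing_fraction` name, and the toy sequences are assumptions for illustration and are not taken from the paper's pipeline.

```python
def missing_fraction(matrix, missing="-?"):
    """Fraction of cells in a taxon-by-character matrix that are missing.

    `matrix` maps a taxon name to its aligned sequence string. Any
    character found in `missing` counts as a gap or unknown cell.
    """
    total = sum(len(seq) for seq in matrix.values())
    if total == 0:
        raise ValueError("matrix has no cells")
    gaps = sum(seq.count(ch) for seq in matrix.values() for ch in missing)
    return gaps / total


# Toy matrix: 3 taxa x 4 characters, 6 of 12 cells missing.
toy = {
    "taxon1": "AC??",
    "taxon2": "----",
    "taxon3": "ACGT",
}
```

For the toy matrix the function returns 0.5, since six of the twelve cells are gaps or question marks.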
12.
Mol Biol Evol ; 22(4): 914-24, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15625184

ABSTRACT

Covarion models of molecular evolution allow the rate of evolution of a site to vary through time. There are few simple and effective tests for covarion evolution, and consequently, little is known about the presence of covarion processes in molecular evolution. We describe two new tests for covarion evolution and demonstrate with simulations that they perform well under a wide range of conditions. A survey of covarion evolution in sequenced plastid genomes found evidence of covarion drift in at least 26 out of 57 genes. Covarion evolution is most evident in first and second codon positions of the plastid genes, and there is no evidence of covarion evolution in third codon positions. Therefore, the significant covarion tests are likely due to changes in the selective constraints of amino acids. The frequency of covarion evolution within the plastid genome suggests that covarion processes of evolution were important in generating the observed patterns of sequence variation among plastid genomes.


Subjects
Molecular Evolution , Genome , Plastids/genetics , Codon , Statistical Models
13.
Am J Bot ; 90(8): 1215-28, 2003 Aug.
Article in English | MEDLINE | ID: mdl-21659222

ABSTRACT

Phylogenetic analyses of Loasaceae that apply DNA sequence data from the plastid trnL-trnF region and matK gene in both maximum-parsimony and maximum-likelihood searches are presented. The results place subfamily Loasoideae as the sister of a subfamily Gronovioideae-Mentzelia clade. Schismocarpus is the sister of the Loasoideae-Gronovioideae-Mentzelia clade. The Schismocarpus-Loasoideae-Gronovioideae-Mentzelia clade is the sister of Eucnide. Several clades in Loasoideae receive strong support, providing insights on generic circumscription problems. Within Mentzelia, several major clades receive strong support, which clarifies relationships among previously circumscribed sections. Prior taxonomic and phylogenetic hypotheses are modeled using topology constraints in parsimony and likelihood analyses; tree lengths and likelihoods, respectively, are compared from constrained and unconstrained analyses to evaluate the relative support for various hypotheses. We use the Shimodaira-Hasegawa (SH) test to establish the significance of the differences between constrained and unconstrained topologies. The SH test rejects topologies based on hypotheses for (1) the placement of gronovioids as the sister of the rest of Loasaceae, (2) the monophyly of subfamily Mentzelioideae as well as Gronovioideae and Loasoideae, (3) the monophyly of Loasa sensu lato as circumscribed by Urban and Gilg, and (4) the monophyly of Mentzelia torreyi and Mentzelia sect. Bartonia.

14.
Science ; 306(5699): 1172-4, 2004 Nov 12.
Article in English | MEDLINE | ID: mdl-15539599

ABSTRACT

We assess the phylogenetic potential of approximately 300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.


Subjects
Biological Evolution , Nucleic Acid Databases , Protein Databases , Phylogeny , Animals , Anopheles/classification , Anopheles/genetics , Biodiversity , Classification , Computational Biology , Multigene Family , Plant Proteins/genetics , Plants/classification , Plants/genetics , Spodoptera/classification , Spodoptera/genetics