1.
Bioinformatics ; 37(18): 2848-2857, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33792639

ABSTRACT

MOTIVATION: Microbial gene catalogs are data structures that organize genes found in microbial communities, providing a reference for standardized analysis of the microbes across samples and studies. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. RESULTS: As a case study, we investigate one such catalog, the Integrated Gene Catalog (IGC); however, our observations apply broadly to most gene catalogs constructed to date. We focus on both the approach used to construct this catalog and on its effectiveness when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the IGC and call into question the broad usefulness of gene catalogs more generally. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. AVAILABILITY AND IMPLEMENTATION: All supporting scripts for our analyses can be found on GitHub: https://github.com/SethCommichaux/IGC.git. The supporting data can be downloaded from: https://obj.umiacs.umd.edu/igc-analysis/IGC_analysis_data.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenome , Microbiota , Microbiota/genetics , Metagenomics
2.
Syst Biol ; 68(6): 1052-1061, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31034053

ABSTRACT

BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
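The "partial likelihood arrays" mentioned in this abstract can be illustrated with a minimal sketch of one pruning step in a phylogenetic likelihood calculation. This is not BEAGLE's API: the function names below are hypothetical, the Jukes-Cantor (JC69) model is an assumption chosen for simplicity, and the three-tip tree and its branch lengths are made up for illustration.

```python
import math

STATES = "ACGT"

def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t
    (expected substitutions per site). Model choice is an assumption."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability of ending in the starting state
    diff = 0.25 - 0.25 * e   # probability of ending in one specific other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_partials(base):
    """Partial likelihoods at a tip: 1 for the observed base, 0 otherwise."""
    return [1.0 if s == base else 0.0 for s in STATES]

def combine(left, t_left, right, t_right):
    """One pruning step: partial likelihoods at a parent node computed
    from the partial likelihoods of its two children."""
    p_left, p_right = jc69_matrix(t_left), jc69_matrix(t_right)
    parent = []
    for i in range(4):
        from_left = sum(p_left[i][j] * left[j] for j in range(4))
        from_right = sum(p_right[i][j] * right[j] for j in range(4))
        parent.append(from_left * from_right)
    return parent

def site_likelihood(root_partials):
    """Site likelihood: root partials weighted by equal base frequencies."""
    return sum(0.25 * p for p in root_partials)

# Hypothetical tree ((tip1:0.1, tip2:0.1):0.05, tip3:0.2) at a single site.
inner = combine(tip_partials("A"), 0.1, tip_partials("A"), 0.1)
root = combine(inner, 0.05, tip_partials("G"), 0.2)
likelihood = site_likelihood(root)
```

Each call to `combine` depends only on the two child arrays, so steps at unrelated nodes and at different alignment sites are independent of one another; this independence is what makes such calculations amenable to the parallel hardware the abstract describes.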


Subject(s)
Classification/methods , Software/standards , Data Interpretation, Statistical , Phylogeny
3.
Front Zool ; 15: 43, 2018.
Article in English | MEDLINE | ID: mdl-30473719

ABSTRACT

BACKGROUND: A number of shelled and shell-less gastropods are known to use multiple defensive mechanisms, including internally generated or externally obtained biochemically active compounds and structures. Within Nudipleura, nudibranchs within Cladobranchia possess such a special defense: the ability to sequester cnidarian nematocysts - small capsules that can inject venom into the tissues of other organisms. This ability is distributed across roughly 600 species within Cladobranchia, and many questions still remain in regard to the comparative morphology and evolution of the cnidosac - the structure that houses sequestered nematocysts (called kleptocnides). In this paper, we describe cnidosac morphology across the main groups of Cladobranchia in which it occurs, and place variation in its structure in a phylogenetic context to better understand the evolution of nematocyst sequestration. RESULTS: Overall, we find that the length, size and structure of the entrance to the cnidosac varies more than expected based on previous work, as does the structure of the exit, the musculature surrounding the cnidosac, and the position and orientation of the kleptocnides. The sequestration of nematocysts has originated at least twice within Cladobranchia based on the phylogeny presented here using 94 taxa and 409 genes. CONCLUSIONS: The cnidosac is not homologous to cnidosac-like structures found in Hancockiidae. Additionally, the presence of a sac at the distal end of the digestive gland may have originated prior to the sequestration of nematocysts. This study provides a more complete picture of variation in, and evolution of, morphological characters associated with nematocyst sequestration in Cladobranchia.

4.
J Infect Dis ; 216(4): 468-476, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28931241

ABSTRACT

Background: Amplified copy number in the plasmepsin II/III genes within Plasmodium falciparum has been associated with decreased sensitivity to piperaquine. To examine this association and test whether additional loci might also contribute, we performed a genome-wide association study of ex vivo P. falciparum susceptibility to piperaquine. Methods: Plasmodium falciparum DNA from 183 samples collected primarily from Cambodia was genotyped at 33716 genome-wide single nucleotide polymorphisms (SNPs). Linear mixed models and random forests were used to estimate associations between parasite genotypes and piperaquine susceptibility. Candidate polymorphisms were evaluated for their association with dihydroartemisinin-piperaquine treatment outcomes in an independent dataset. Results: Single nucleotide polymorphisms on multiple chromosomes were associated with piperaquine 90% inhibitory concentrations (IC90) in a genome-wide analysis. Fine-mapping of genomic regions implicated in genome-wide analyses identified multiple SNPs in linkage disequilibrium with each other that were significantly associated with piperaquine IC90, including a novel mutation within the gene encoding the P. falciparum chloroquine resistance transporter, PfCRT. This mutation (F145I) was associated with dihydroartemisinin-piperaquine treatment failure after adjusting for the presence of amplified plasmepsin II/III, which was also associated with decreased piperaquine sensitivity. Conclusions: Our data suggest that, in addition to plasmepsin II/III copy number, other loci, including pfcrt, may also be involved in piperaquine resistance.


Subject(s)
Drug Resistance/genetics , Membrane Transport Proteins/genetics , Plasmodium falciparum/genetics , Protozoan Proteins/genetics , Quinolines/pharmacology , Artemisinins/pharmacology , Aspartic Acid Endopeptidases/genetics , Aspartic Acid Endopeptidases/metabolism , Cambodia , DNA Copy Number Variations , DNA, Protozoan/genetics , Genetic Loci , Genome-Wide Association Study , Genotyping Techniques , Humans , Inhibitory Concentration 50 , Linkage Disequilibrium , Membrane Transport Proteins/metabolism , Mutation , Plasmodium falciparum/drug effects , Polymorphism, Single Nucleotide , Proportional Hazards Models , Protozoan Proteins/metabolism , Sensitivity and Specificity , Treatment Failure
5.
Annu Rev Entomol ; 62: 265-283, 2017 01 31.
Article in English | MEDLINE | ID: mdl-27860521

ABSTRACT

Until recently, deep-level phylogeny in Lepidoptera, the largest single radiation of plant-feeding insects, was very poorly understood. Over the past two decades, building on a preceding era of morphological cladistic studies, molecular data have yielded robust initial estimates of relationships both within and among the ∼43 superfamilies, with unsolved problems now yielding to much larger data sets from high-throughput sequencing. Here we summarize progress on lepidopteran phylogeny since 1975, emphasizing the superfamily level, and discuss some resulting advances in our understanding of lepidopteran evolution.


Subject(s)
Biological Evolution , Lepidoptera/classification , Phylogeny , Animals , Evolution, Molecular , Lepidoptera/genetics
6.
BMC Evol Biol ; 17(1): 221, 2017 Oct 26.
Article in English | MEDLINE | ID: mdl-29073890

ABSTRACT

BACKGROUND: The impact of predator-prey interactions on the evolution of many marine invertebrates is poorly understood. Since barriers to genetic exchange are less obvious in the marine realm than in terrestrial or freshwater systems, non-allopatric divergence may play a fundamental role in the generation of biodiversity. In this context, shifts between major prey types could constitute important factors explaining the biodiversity of marine taxa, particularly in groups with highly specialized diets. However, the scarcity of marine specialized consumers for which reliable phylogenies exist hampers attempts to test the role of trophic specialization in evolution. In this study, RNA-Seq data is used to produce a phylogeny of Cladobranchia, a group of marine invertebrates that feed on a diverse array of prey taxa but mostly specialize on cnidarians. The broad range of prey type preferences allegedly present in two major groups within Cladobranchia suggest that prey type shifts are relatively common over evolutionary timescales. RESULTS: In the present study, we generated a well-supported phylogeny of the major lineages within Cladobranchia using RNA-Seq data, and used ancestral state reconstruction analyses to better understand the evolution of prey preference. These analyses answered several fundamental questions regarding the evolutionary relationships within Cladobranchia, including support for a clade of species from Arminidae as sister to Tritoniidae (which both preferentially prey on Octocorallia). Ancestral state reconstruction analyses supported a cladobranchian ancestor with a preference for Hydrozoa and show that the few transitions identified only occur from lineages that prey on Hydrozoa to those that feed on other types of prey. CONCLUSIONS: There is strong phylogenetic correlation with prey preference within Cladobranchia, suggesting that prey type specialization within this group has inertia. Shifts between different types of prey have occurred rarely throughout the evolution of Cladobranchia, indicating that this may not have been an important driver of the diversity within this group.


Subject(s)
Biological Evolution , Cnidaria/genetics , Food Chain , Gastropoda/genetics , Animals , Cnidaria/classification , Gastropoda/classification , Gastropoda/physiology , Phylogeny , Sequence Analysis, RNA
7.
Proc Natl Acad Sci U S A ; 110(1): 240-5, 2013 Jan 02.
Article in English | MEDLINE | ID: mdl-23248304

ABSTRACT

The recent emergence of artemisinin-resistant Plasmodium falciparum malaria in western Cambodia could threaten prospects for malaria elimination. Identification of the genetic basis of resistance would provide tools for molecular surveillance, aiding efforts to contain resistance. Clinical trials of artesunate efficacy were conducted in Bangladesh, in northwestern Thailand near the Myanmar border, and at two sites in western Cambodia. Parasites collected from trial participants were genotyped at 8,079 single nucleotide polymorphisms (SNPs) using a P. falciparum-specific SNP array. Parasite genotypes were examined for signatures of recent positive selection and association with parasite clearance phenotypes to identify regions of the genome associated with artemisinin resistance. Four SNPs on chromosomes 10 (one), 13 (two), and 14 (one) were significantly associated with delayed parasite clearance. The two SNPs on chromosome 13 are in a region of the genome that appears to be under strong recent positive selection in Cambodia. The SNPs on chromosomes 10 and 13 lie in or near genes involved in postreplication repair, a DNA damage-tolerance pathway. Replication and validation studies are needed to refine the location of loci responsible for artemisinin resistance and to understand the mechanism behind it; however, two SNPs on chromosomes 10 and 13 may be useful markers of delayed parasite clearance in surveillance for artemisinin resistance in Southeast Asia.


Subject(s)
Artemisinins/pharmacology , Drug Resistance/genetics , Genetic Loci/genetics , Plasmodium falciparum/genetics , Selection, Genetic , Asia, Southeastern , Genetic Markers/genetics , Genotype , Likelihood Functions , Odds Ratio , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Regression Analysis
8.
J Infect Dis ; 211(5): 670-9, 2015 Mar 01.
Article in English | MEDLINE | ID: mdl-25180241

ABSTRACT

BACKGROUND: The emergence of artemisinin-resistant Plasmodium falciparum in Southeast Asia threatens malaria treatment efficacy. Mutations in a kelch protein encoded on P. falciparum chromosome 13 (K13) have been associated with resistance in vitro and in field samples from Cambodia. METHODS: P. falciparum infections from artesunate efficacy trials in Bangladesh, Cambodia, Laos, Myanmar, and Vietnam were genotyped at 33 716 genome-wide single-nucleotide polymorphisms (SNPs). Linear mixed models were used to test associations between parasite genotypes and parasite clearance half-lives following artesunate treatment. K13 mutations were tested for association with artemisinin resistance, and extended haplotypes on chromosome 13 were examined to determine whether mutations arose focally and spread or whether they emerged independently. RESULTS: The presence of nonreference K13 alleles was associated with prolonged parasite clearance half-life (P = 1.97 × 10⁻¹²). Parasites with a mutation in any of the K13 kelch domains displayed longer parasite clearance half-lives than parasites with wild-type alleles. Haplotype analysis revealed both population-specific emergence of mutations and independent emergence of the same mutation in different geographic areas. CONCLUSIONS: K13 appears to be a major determinant of artemisinin resistance throughout Southeast Asia. While we found some evidence of spreading resistance, there was no evidence of resistance moving westward from Cambodia into Myanmar.


Subject(s)
Antimalarials/pharmacology , Artemisinins/pharmacology , Drug Resistance , Malaria, Falciparum/parasitology , Mutation , Plasmodium falciparum/drug effects , Asia, Southeastern , Genotype , Humans , Plasmodium falciparum/genetics , Plasmodium falciparum/isolation & purification , Polymorphism, Single Nucleotide , Protozoan Proteins/genetics
9.
Syst Biol ; 63(5): 812-8, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24789072

ABSTRACT

We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a garli 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The garli web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the garli web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results.


Subject(s)
Classification/methods , Phylogeny , Software , Access to Information , Internet
10.
NPJ Syst Biol Appl ; 10(1): 44, 2024 Apr 27.
Article in English | MEDLINE | ID: mdl-38678051

ABSTRACT

Malaria vaccine development is hampered by extensive antigenic variation and complex life stages of Plasmodium species. Vaccine development has focused on a small number of antigens, many of which were identified without utilizing systematic genome-level approaches. In this study, we implement a machine learning-based reverse vaccinology approach to predict potential new malaria vaccine candidate antigens. We assemble and analyze P. falciparum proteomic, structural, functional, immunological, genomic, and transcriptomic data, and use positive-unlabeled learning to predict potential antigens based on the properties of known antigens and remaining proteins. We prioritize candidate antigens based on model performance on reference antigens with different genetic diversity and quantify the protein properties that contribute most to identifying top candidates. Candidate antigens are characterized by gene essentiality, gene ontology, and gene expression in different life stages to inform future vaccine development. This approach provides a framework for identifying and prioritizing candidate vaccine antigens for a broad range of pathogens.


Subject(s)
Antigens, Protozoan , Malaria Vaccines , Malaria, Falciparum , Plasmodium falciparum , Plasmodium falciparum/immunology , Plasmodium falciparum/genetics , Malaria Vaccines/immunology , Antigens, Protozoan/immunology , Antigens, Protozoan/genetics , Malaria, Falciparum/immunology , Malaria, Falciparum/prevention & control , Machine Learning , Humans , Proteomics/methods , Vaccine Development/methods , Protozoan Proteins/immunology , Protozoan Proteins/genetics , Computational Biology/methods
11.
Syst Biol ; 61(1): 170-3, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21963610

ABSTRACT

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.


Subject(s)
Computational Biology/methods , Phylogeny , Software , Algorithms , Computing Methodologies , Evolution, Molecular , Genome
12.
R Soc Open Sci ; 10(3): 220939, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36998763

ABSTRACT

Platyhelminthes (flatworms) are a diverse invertebrate phylum useful for exploring life-history evolution. Within Platyhelminthes, only two clades develop through a larval stage: free-living polyclads and parasitic neodermatans. Neodermatan larvae are considered evolutionarily derived, whereas polyclad larvae are hypothesized to be ancestral due to ciliary band similarities among polyclad and other spiralian larvae. However, larval evolution has been challenging to investigate within polyclads due to low support for deeper phylogenetic relationships. To investigate polyclad life-history evolution, we generated transcriptomic data for 21 species of polyclads to build a well-supported phylogeny for the group. The resulting tree provides strong support for deeper nodes, and we recover a new monophyletic clade of early branching cotyleans. We then used ancestral state reconstructions to investigate ancestral modes of development within Polycladida and more broadly within flatworms. In polyclads, we were unable to reconstruct the ancestral state of deeper nodes with significant support because early branching clades show diverse modes of development. This suggests a complex history of larval evolution in polyclads that likely includes multiple losses and/or multiple gains. However, our ancestral state reconstruction across a previously published platyhelminth phylogeny supports a direct developing prorhynchid/polyclad ancestor, which suggests that a larval stage in the life cycle evolved along the polyclad stem lineage or within polyclads.

13.
J Control Release ; 362: 371-380, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37657693

ABSTRACT

Effective eye drop delivery systems for treating diseases of the posterior segment have yet to be clinically validated. Further, adherence to eye drop regimens is often problematic due to the difficulty and inconvenience of repetitive dosing. Here, we describe a strategy for topically dosing a peptide-drug conjugate to achieve effective and sustained therapeutic sunitinib concentrations to protect retinal ganglion cells (RGCs) in a rat model of optic nerve injury. We combined two promising delivery technologies, namely, a hypotonic gel-forming eye drop delivery system, and an engineered melanin binding and cell-penetrating peptide that sustains intraocular drug residence time. We found that once daily topical dosing of HR97-SunitiGel provided up to 2 weeks of neuroprotection after the last dose, effectively doubling the therapeutic window observed with SunitiGel. For chronic ocular diseases affecting the posterior segment, the convenience of an eye drop combined with intermittent dosing frequency could result in greater patient adherence, and thus, improved disease management.

14.
Nat Commun ; 14(1): 2509, 2023 05 02.
Article in English | MEDLINE | ID: mdl-37130851

ABSTRACT

Sustained drug delivery strategies have many potential benefits for treating a range of diseases, particularly chronic diseases that require treatment for years. For many chronic ocular diseases, patient adherence to eye drop dosing regimens and the need for frequent intraocular injections are significant barriers to effective disease management. Here, we utilize peptide engineering to impart melanin binding properties to peptide-drug conjugates to act as a sustained-release depot in the eye. We develop a super learning-based methodology to engineer multifunctional peptides that efficiently enter cells, bind to melanin, and have low cytotoxicity. When the lead multifunctional peptide (HR97) is conjugated to brimonidine, an intraocular pressure lowering drug that is prescribed for three times per day topical dosing, intraocular pressure reduction is observed for up to 18 days after a single intracameral injection in rabbits. Further, the cumulative intraocular pressure lowering effect increases ~17-fold compared to free brimonidine injection. Engineered multifunctional peptide-drug conjugates are a promising approach for providing sustained therapeutic delivery in the eye and beyond.


Subject(s)
Drug Delivery Systems , Melanins , Animals , Rabbits , Brimonidine Tartrate , Peptides , Machine Learning
15.
BMC Bioinformatics ; 13: 92, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22574964

ABSTRACT

BACKGROUND: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. RESULTS: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. CONCLUSIONS: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.


Subject(s)
Algorithms , Computational Biology/methods , Genomics/methods , Software , Metagenomics , Phylogeny , Sequence Alignment
16.
Brief Bioinform ; 11(6): 537-43, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20798182

ABSTRACT

The major opportunities for broader incorporation of bioinformatics in education can be placed into three general categories: general applicability of bioinformatics in life science and related curricula; inherent fit of bioinformatics for promoting student learning in most biology programs; and the general experience and associated comfort students have with computers and technology. Conversely, the major challenges for broader incorporation of bioinformatics in education can be placed into three general categories: required infrastructure and logistics; instructor knowledge of bioinformatics and continuing education; and the breadth of bioinformatics, and the diversity of students and educational objectives. Broader incorporation of bioinformatics at all education levels requires overcoming the challenges to using transformative computer-requiring learning activities, assisting faculty in collecting assessment data on mastery of student learning outcomes, as well as creating more faculty development opportunities that span diverse skill levels, with an emphasis placed on providing resource materials that are kept up-to-date as the field and tools change.


Subject(s)
Computational Biology/education , Curriculum , Educational Status , Humans , Learning , Students , Teaching
17.
Syst Biol ; 60(6): 782-96, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21840842

ABSTRACT

This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample (to 26 genes, 18.4 kb) but only in a third (41) of the taxa. The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes, while partitioning sites into sets undergoing mostly nonsynonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78-85% bootstrap) was weak or nonexistent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study. A "more-genes-only" data set (41 taxa × 26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses.


Subject(s)
Classification/methods , Lepidoptera/classification , Lepidoptera/genetics , Phylogeny , Animals , Genes, Insect/genetics , Genetic Heterogeneity , Nucleotides/genetics , Statistics as Topic
18.
BMC Evol Biol ; 11: 182, 2011 Jun 24.
Article in English | MEDLINE | ID: mdl-21702958

ABSTRACT

BACKGROUND: Researchers conducting molecular phylogenetic studies are frequently faced with the decision of what to do when weak branch support is obtained for key nodes of importance. As one solution, the researcher may choose to sequence additional orthologous genes of appropriate evolutionary rate for the taxa in the study. However, generating large, complete data matrices can become increasingly difficult as the number of characters increases. A few empirical studies have shown that augmenting genes even for a subset of taxa can improve branch support. However, because each study differs in the number of characters and taxa, there is still a need for additional studies that examine whether incomplete sampling designs are likely to aid in increasing deep node resolution. We target Gracillariidae, a Cretaceous-age (~100 Ma) group of leaf-mining moths to test whether the strategy of adding genes for a subset of taxa can improve branch support for deep nodes. We initially sequenced ten genes (8,418 bp) for 57 taxa that represent the major lineages of Gracillariidae plus outgroups. After finding that many deep divergences remained weakly supported, we sequenced eleven additional genes (6,375 bp) for a 27-taxon subset. We then compared results from different data sets to assess whether one sampling design can be favored over another. The concatenated data set comprising all genes and all taxa and three other data sets of different taxon and gene sub-sampling design were analyzed with maximum likelihood. Each data set was subject to five different models and partitioning schemes of non-synonymous and synonymous changes. Statistical significance of non-monophyly was examined with the Approximately Unbiased (AU) test. RESULTS: Partial augmentation of genes led to high support for deep divergences, especially when non-synonymous changes were analyzed alone. Increasing the number of taxa without an increase in number of characters led to lower bootstrap support; increasing the number of characters without increasing the number of taxa generally increased bootstrap support. More than three-quarters of nodes were supported with bootstrap values greater than 80% when all taxa and genes were combined. Gracillariidae, Lithocolletinae + Leucanthiza, and Acrocercops and Parectopa groups were strongly supported in nearly every analysis. Gracillaria group was well supported in some analyses, but less so in others. We find strong evidence for the exclusion of Douglasiidae from Gracillarioidea sensu Davis and Robinson (1998). Our results strongly support the monophyly of a G.B.R.Y. clade, a group comprised of Gracillariidae + Bucculatricidae + Roeslerstammiidae + Yponomeutidae, when analyzed with non-synonymous changes only, but this group was frequently split when synonymous and non-synonymous substitutions were analyzed together. CONCLUSIONS: 1) Partially or fully augmenting a data set with more characters increased bootstrap support for particular deep nodes, and this increase was dramatic when non-synonymous changes were analyzed alone. Thus, the addition of sites that have low levels of saturation and compositional heterogeneity can greatly improve results. 2) Gracillarioidea, as defined by Davis and Robinson (1998), clearly do not include Douglasiidae, and changes to current classification will be required. 3) Gracillariidae were monophyletic in all analyses conducted, and nearly all species can be placed into one of six strongly supported clades though relationships among these remain unclear. 4) The difficulty in determining the phylogenetic placement of Bucculatricidae is probably attributable to compositional heterogeneity at the third codon position. From our tests for compositional heterogeneity and strong bootstrap values obtained when synonymous changes are excluded, we tentatively conclude that Bucculatricidae is closely related to Gracillariidae + Roeslerstammiidae + Yponomeutidae.


Subject(s)
Insect Proteins/genetics , Moths/classification , Moths/genetics , Phylogeny , Plant Leaves/parasitology , Animals , Molecular Sequence Data , Moths/physiology
19.
BMC Evol Biol ; 9: 280, 2009 Dec 02.
Article in English | MEDLINE | ID: mdl-19954545

ABSTRACT

BACKGROUND: In the mega-diverse insect order Lepidoptera (butterflies and moths; 165,000 described species), deeper relationships are little understood within the clade Ditrysia, to which 98% of the species belong. To begin addressing this problem, we tested the ability of five protein-coding nuclear genes (6.7 kb total), and character subsets therein, to resolve relationships among 123 species representing 27 (of 33) superfamilies and 55 (of 100) families of Ditrysia under maximum likelihood analysis. RESULTS: Our trees show broad concordance with previous morphological hypotheses of ditrysian phylogeny, although most relationships among superfamilies are weakly supported. There are also notable surprises, such as a consistently closer relationship of Pyraloidea than of butterflies to most Macrolepidoptera. Monophyly is significantly rejected by one or more character sets for the putative clades Macrolepidoptera as currently defined (P < 0.05) and Macrolepidoptera excluding Noctuoidea and Bombycoidea sensu lato (P ≤ 0.005), and nearly so for the superfamily Drepanoidea as currently defined (P < 0.08). Superfamilies are typically recovered or nearly so, but usually without strong support. Relationships within superfamilies and families, however, are often robustly resolved. We provide some of the first strong molecular evidence on deeper splits within Pyraloidea, Tortricoidea, Geometroidea, Noctuoidea and others. Separate analyses of mostly synonymous versus non-synonymous character sets revealed notable differences (though not strong conflict), including a marked influence of compositional heterogeneity on apparent signal in the third codon position (nt3). As available model partitioning methods cannot correct for this variation, we assessed overall phylogeny resolution through separate examination of trees from each character set. 
Exploration of "tree space" with GARLI, using grid computing, showed that hundreds of searches are typically needed to find the best-feasible phylogeny estimate for these data. CONCLUSION: Our results (a) corroborate the broad outlines of the current working phylogenetic hypothesis for Ditrysia, (b) demonstrate that some prominent features of that hypothesis, including the position of the butterflies, need revision, and (c) resolve the majority of family and subfamily relationships within superfamilies as thus far sampled. Much further gene and taxon sampling will be needed, however, to strongly resolve individual deeper nodes.


Subject(s)
Biological Evolution , Lepidoptera/classification , Lepidoptera/genetics , Animals , Bayes Theorem , Phylogeny , Sequence Analysis, DNA
20.
Syst Biol ; 57(6): 920-38, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19085333

ABSTRACT

This study attempts to resolve relationships among and within the four basal arthropod lineages (Pancrustacea, Myriapoda, Euchelicerata, Pycnogonida) and to assess the widespread expectation that remaining phylogenetic problems will yield to increasing amounts of sequence data. Sixty-eight regions of 62 protein-coding nuclear genes (approximately 41 kilobases (kb)/taxon) were sequenced for 12 taxonomically diverse arthropod taxa and a tardigrade outgroup. Parsimony, likelihood, and Bayesian analyses of total nucleotide data generally strongly supported the monophyly of each of the basal lineages represented by more than one species. Other relationships within the Arthropoda were also supported, with support levels depending on method of analysis and inclusion/exclusion of synonymous changes. Removing third codon positions, where the assumption of base compositional homogeneity was rejected, altered the results. Removing the final class of synonymous mutations--first codon positions encoding leucine and arginine, which were also compositionally heterogeneous--yielded a data set that was consistent with a hypothesis of base compositional homogeneity. Furthermore, under such a data-exclusion regime, all 68 gene regions individually were consistent with base compositional homogeneity. Restricting likelihood analyses to nonsynonymous change recovered trees with strong support for the basal lineages but not for other groups that were variably supported with more inclusive data sets. In a further effort to increase phylogenetic signal, three types of data exploration were undertaken. (1) Individual genes were ranked by their average rate of nonsynonymous change, and three rate categories were assigned--fast, intermediate, and slow. Then, bootstrap analysis of each gene was performed separately to see which taxonomic groups received strong support. 
Five taxonomic groups were strongly supported independently by two or more genes, and these genes mostly belonged to the slow or intermediate categories, whereas groups supported only by a single gene region tended to be from genes of the fast category, arguing that fast genes provide a less consistent signal. (2) A sensitivity analysis was performed in which increasing numbers of genes were excluded, beginning with the fastest. The number of strongly supported nodes increased up to a point and then decreased slightly. Recovery of Hexapoda required removal of fast genes. Support for Mandibulata (Pancrustacea + Myriapoda) also increased, at times to "strong" levels, with removal of the fastest genes. (3) Concordance selection was evaluated by clustering genes according to their ability to recover Pancrustacea, Euchelicerata, or Myriapoda and analyzing the three clusters separately. All clusters of genes recovered the three concordance clades but were at times inconsistent in the relationships recovered among and within these clades, a result that indicates that the a priori concordance criteria may bias phylogenetic signal in unexpected ways. In a further attempt to increase support of taxonomic relationships, sequence data from 49 additional taxa for three slow genes (i.e., EF-1 alpha, EF-2, and Pol II) were combined with the various 13-taxon data sets. The 62-taxon analyses supported the results of the 13-taxon analyses and provided increased support for additional pancrustacean clades found in an earlier analysis including only EF-1 alpha, EF-2, and Pol II.


Subject(s)
Arthropods/classification , Arthropods/genetics , Open Reading Frames/genetics , Phylogeny , Animals , Base Composition/genetics , Cell Nucleus/genetics