Results 1 - 20 of 26
1.
Mol Phylogenet Evol ; 189: 107932, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751827

ABSTRACT

Diplomystidae is an early-diverged family of freshwater catfish endemic to southern South America. We have recently collected five juvenile specimens belonging to this family from the Bueno River Basin, a basin from which the only previous record was a single juvenile specimen collected in 1996. This finding confirms the distribution of the family further south in northern Patagonia, but poses new questions about the origin of this population in an area with a strong glacial history. We used phylogenetic analyses to evaluate three different hypotheses that could explain the origin of this population in the basin. First, the population could have originated in Atlantic basins (east of the Andes) and dispersed to the Bueno Basin after the Last Glacial Maximum (LGM) via river reversals, as has been proposed for other populations of Diplomystes as well as for other freshwater species from Patagonia. Second, the population could have originated in the geographically close Valdivia Basin (west of the Andes) and dispersed south to its current location in the Bueno Basin. Third, regardless of its geographic origin (west or east of the Andes), the Bueno Basin population could have a longer history in the basin, surviving in situ through the LGM. In addition, we conducted species delimitation analyses using a recently developed method that uses a protracted model of speciation. Our goal was to test the species status of the Bueno Basin population along with another controversial population in Central Chile (Biobío Basin), which appeared highly divergent in previous studies with mtDNA. The phylogenetic analyses showed that the population from the Bueno Basin is more closely related to Atlantic than to Pacific lineages, although with a deep divergence that predated the LGM, supporting in situ survival rather than postglacial dispersal. In addition, these analyses also showed that the species D. nahuelbutaensis is polyphyletic, supporting the need for a taxonomic reevaluation.
The species delimitation analyses supported two new species which are described using molecular diagnostic characters: Diplomystes arratiae sp. nov. from the Biobío, Carampangue, and Laraquete basins, maintaining D. nahuelbutaensis valid only for the Imperial Basin, and Diplomystes habitae sp. nov. from the Bueno Basin. This study greatly increases the number of species within both the family Diplomystidae and Patagonia, and contributes substantially to the knowledge of the evolution of southern South American freshwater biodiversity during its glacial history. Given the important contribution to the phylogenetic diversity of the family, we recommend a high conservation priority for both new species. Finally, this study highlights an exemplary scenario where species descriptions based only on DNA data are particularly valuable, bringing additional elements to the ongoing debate on DNA-based taxonomy.


Subject(s)
Catfishes , Animals , Phylogeny , Catfishes/genetics , Chile , Mitochondrial DNA/genetics , Phylogeography , Genetic Variation
2.
PLoS Comput Biol ; 17(5): e1008924, 2021 05.
Article in English | MEDLINE | ID: mdl-33983918

ABSTRACT

The "multispecies" coalescent (MSC) model that underlies many genomic species-delimitation approaches is problematic because it does not distinguish between genetic structure associated with species versus that of populations within species. Consequently, as both the genomic and spatial resolution of data increase, a proliferation of artifactual species results as within-species population lineages, detected due to restrictions in gene flow, are identified as distinct species. The toll of this extends beyond systematic studies, getting magnified across the many disciplines that rely upon an accurate framework of identified species. Here we present the first of a new class of approaches that addresses this issue by incorporating an extended speciation process for species delimitation. We model the formation of population lineages and their subsequent development into independent species as separate processes and provide a way to incorporate current understanding of the species boundaries in the system through specification of species identities of a subset of population lineages. As a result, species boundaries and within-species lineage boundaries can be discriminated across the entire system, and species identities can be assigned to the remaining lineages of unknown affinities with quantified probabilities. In addition to the identification of species units in nature, the primary goal of species delimitation, the incorporation of a speciation model also gives us insights into the links between population and species-level processes. By explicitly accounting for restrictions in gene flow not only between, but also within, species, we also address the limits of genetic data for delimiting species. Specifically, while genetic data alone is not sufficient for accurate delimitation, when considered in conjunction with other information we are able to not only learn about species boundaries, but also about the tempo of the speciation process itself.


Subject(s)
Genetic Speciation , Genetic Models , Algorithms , Animals , Computational Biology , Computer Simulation , Gene Flow , Population Genetics , Statistical Models , Phylogeny , Software , Species Specificity , Time Factors
3.
PLoS Comput Biol ; 17(9): e1008949, 2021 09.
Article in English | MEDLINE | ID: mdl-34516547

ABSTRACT

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.


Subject(s)
Taxonomic DNA Barcoding , Phylogeny , Algorithms , Bayes Theorem , Mitochondrial DNA/genetics , Humans , Markov Chains , Monte Carlo Method , Single Nucleotide Polymorphism
4.
Proc Natl Acad Sci U S A ; 114(7): 1607-1612, 2017 02 14.
Article in English | MEDLINE | ID: mdl-28137871

ABSTRACT

The multispecies coalescent model underlies many approaches used for species delimitation. In previous work assessing the performance of species delimitation under this model, speciation was treated as an instantaneous event rather than as an extended process involving distinct phases of speciation initiation (structuring) and completion. Here, we use data generated by simulations that explicitly model speciation as an extended process rather than an instantaneous event and carry out species delimitation inference on these data under the multispecies coalescent. We show that the multispecies coalescent diagnoses genetic structure, not species, and that it does not statistically distinguish structure associated with population isolation vs. species boundaries. Because of the misidentification of population structure as putative species, our work raises questions about the practice of genome-based species discovery, with cascading consequences in other fields. Specifically, all fields that rely on species as units of analysis, from conservation biology to studies of macroevolutionary dynamics, will be impacted by inflated estimates of the number of species, especially as genomic resources provide unprecedented power for detecting increasingly finer-scaled genetic structure under the multispecies coalescent. As such, our work also represents a general call for systematic study to reconsider a reliance on genomic data alone. Until new methods are developed that can discriminate between structure due to population-level processes and that due to species boundaries, genomic-based results should only be considered a hypothesis that requires validation of delimited species with multiple data types, such as phenotypic and ecological information.


Subject(s)
Gene Flow , Genetic Speciation , Genome/genetics , Genetic Models , Animals , Computer Simulation , Molecular Evolution , Humans , Phenotype , Phylogeny , Species Specificity
5.
BMC Evol Biol ; 18(1): 123, 2018 08 10.
Article in English | MEDLINE | ID: mdl-30097006

ABSTRACT

BACKGROUND: Macroevolutionary modeling of species diversification plays important roles in inferring large-scale biodiversity patterns. It allows estimation of speciation and extinction rates and statistical testing of their relationships with different ecological factors. However, macroevolutionary patterns are ultimately generated by microevolutionary processes acting at population levels, especially when speciation and extinction are considered protracted instead of point events. Neglecting the connection between micro- and macroevolution may hinder our ability to fully understand the underlying mechanisms that drive the observed patterns. RESULTS: In this simulation study, we used the protracted speciation framework to demonstrate that distinct microevolutionary scenarios can generate very similar biodiversity patterns (e.g., latitudinal diversity gradient). We also showed that current macroevolutionary models may not be able to distinguish these different scenarios. CONCLUSIONS: Given the compounded nature of speciation and extinction rates, one needs to be cautious when inferring causal relationships between ecological factors and macroevolutionary rates. Future studies that incorporate microevolutionary processes into current modeling approaches are needed.


Subject(s)
Biological Evolution , Animals , Biodiversity , Birds/physiology , Biological Extinction , Genetic Speciation , Phylogeny
6.
Am J Bot ; 105(3): 376-384, 2018 03.
Article in English | MEDLINE | ID: mdl-29710372

ABSTRACT

PREMISE OF THE STUDY: Discordant gene trees are commonly encountered when sequences from thousands of loci are applied to estimate phylogenetic relationships. Several processes contribute to this discord. Yet, we have no methods that jointly model different sources of conflict when estimating phylogenies. An alternative to analyzing entire genomes or all the sequenced loci is to identify a subset of loci for phylogenetic analysis. If we can identify data partitions that are most likely to reflect descent from a common ancestor (i.e., discordant loci that indeed reflect incomplete lineage sorting [ILS], as opposed to some other process, such as lateral gene transfer [LGT]), we can analyze this subset using powerful coalescent-based species-tree approaches. METHODS: Test data sets were simulated where discord among loci could arise from ILS and LGT. Data sets were analyzed using the newly developed program CLASSIPHY (Huang et al., ) to assess whether our ability to distinguish the cause of discord among loci varied when ILS and LGT occurred in the recent versus deep past and whether the accuracy of these inferences was affected by the mutational process. KEY RESULTS: We show that accuracy of probabilistic classification of individual loci by the cause of discord differed when ILS and LGT events occurred more recently compared with the distant past and that the signal-to-noise ratio arising from the mutational process contributes to difficulties in inferring LGT data partitions. CONCLUSIONS: We discuss our findings in terms of the promise and limitations of identifying subsets of loci for species-tree inference that will not violate the underlying coalescent model (i.e., data partitions in which ILS, and not LGT, contributes to discord). We also discuss the empirical implications of our work given the many recalcitrant nodes in the tree of life (e.g., origins of angiosperms, amniotes, or Neoaves), and recent arguments for concatenating loci.


Subject(s)
Horizontal Gene Transfer , Genetic Loci , Genetic Speciation , Genetic Models , Phylogeny , Computer Simulation , Genome , Magnoliopsida/genetics , Mutation
7.
Syst Biol ; 65(3): 525-45, 2016 May.
Article in English | MEDLINE | ID: mdl-26715585

ABSTRACT

Current statistical biogeographical analysis methods are limited in the ways ecology can be related to the processes of diversification and geographical range evolution, requiring conflation of geography and ecology, and/or assuming ecologies that are uniform across all lineages and invariant in time. This precludes the possibility of studying a broad class of macroevolutionary biogeographical theories that relate geographical and species histories through lineage-specific ecological and evolutionary dynamics, such as taxon cycle theory. Here we present a new model that generates phylogenies under a complex of superpositioned geographical range evolution, trait evolution, and diversification processes that can communicate with each other. We present a likelihood-free method of inference under our model using discriminant analysis of principal components of summary statistics calculated on phylogenies, with the discriminant functions trained on data generated by simulations under our model. This approach of model selection by classification of empirical data with respect to data generated under training models is shown to be efficient and robust, and to perform well over a broad range of parameter space defined by the relative rates of dispersal, trait evolution, and diversification processes. We apply our method to a case study of the taxon cycle, that is, testing for habitat and trophic level constraints in the dispersal regimes of the Wallacean avifaunal radiation.


Subject(s)
Discriminant Analysis , Machine Learning , Biological Models , Phylogeography/methods , Computer Simulation , Phylogeny
8.
PLoS Comput Biol ; 11(7): e1004365, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26200800

ABSTRACT

There has been an explosion of research on host-associated microbial communities (i.e., microbiomes). Much of this research has focused on surveys of microbial diversities across a variety of host species, including humans, with a view to understanding how these microbiomes are distributed across space and time, and how they correlate with host health, disease, phenotype, physiology and ecology. Fewer studies have focused on how these microbiomes may have evolved. In this paper, we develop an agent-based framework to study the dynamics of microbiome evolution. Our framework incorporates neutral models of how hosts acquire their microbiomes, and how the environmental microbial community that is available to the hosts is assembled. Most importantly, our framework also incorporates a Wright-Fisher genealogical model of hosts, so that the dynamics of microbiome evolution are studied on an evolutionary timescale. Our results indicate that the extent of parental contribution to microbial availability from one generation to the next significantly impacts the diversity of microbiomes: the greater the parental contribution, the less diverse the microbiomes. In contrast, even when there is only a very small contribution from a constant environmental pool, microbial communities can remain highly diverse. Finally, we show that our models may be used to construct hypotheses about the types of processes that operate to assemble microbiomes over evolutionary time.


Subject(s)
Biological Evolution , Ecosystem , Genetic Variation/genetics , Host Specificity/genetics , Microbiota/genetics , Genetic Models , Computer Simulation
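
The qualitative result in this abstract (more parental contribution, less diverse microbiomes) can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions and is not the authors' framework: the function name `simulate`, all parameter names and defaults, and the uniform environmental pool are hypothetical. Hosts reproduce under Wright-Fisher sampling (each offspring picks one parent at random), and each microbe slot is filled from the parent with probability `parental`, otherwise from a constant environmental pool.

```python
import random


def simulate(parental, hosts=30, microbes=20, taxa=50, generations=200, seed=1):
    """Return mean within-host richness after a toy neutral simulation.

    All names and defaults are illustrative assumptions, not values from
    the paper. `parental` is the probability that a microbe slot is
    inherited from the parent instead of drawn from the environment.
    """
    rng = random.Random(seed)

    def from_environment():
        # Constant environmental pool: every taxon is equally available.
        return rng.randrange(taxa)

    population = [[from_environment() for _ in range(microbes)]
                  for _ in range(hosts)]

    for _ in range(generations):
        offspring = []
        for _ in range(hosts):
            # Wright-Fisher step: each new host samples one parent uniformly.
            parent = rng.choice(population)
            child = [rng.choice(parent) if rng.random() < parental
                     else from_environment()
                     for _ in range(microbes)]
            offspring.append(child)
        population = offspring

    # Richness = number of distinct taxa carried by a host.
    return sum(len(set(host)) for host in population) / hosts


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        print(p, simulate(p))
```

Sweeping `parental` from 0 to 1 in this sketch shows richness falling as inheritance dominates, which is the direction of the effect the abstract reports; the magnitudes carry no meaning.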
9.
PLoS One ; 19(1): e0291801, 2024.
Article in English | MEDLINE | ID: mdl-38206953

ABSTRACT

Phylogenetic analysis of protein sequences provides a powerful means of identifying novel protein functions and subfamilies, and for identifying and resolving annotation errors. However, automation of functional clustering based on phylogenetic trees has been challenging and most of it is done manually. Clustering phylogenetic trees usually requires the delineation of tree-based thresholds (e.g., distances), leading to an ad hoc problem. We propose a new phylogenetic clustering approach that identifies clusters without using ad hoc distances or other pre-defined values. Our workflow combines uniform manifold approximation and projection (UMAP) with Gaussian mixture models as a k-means-like procedure to automatically group sequences into clusters. We then apply a "second pass" clade identification algorithm to resolve non-monophyletic groups. We tested our approach with several well-curated protein families (outer membrane porins, acyltransferase, and nuclear receptors) and showed our automated methods recapitulated known subfamilies. We also applied our methods to a broad range of different protein families from multiple databases, including Pfam, PANTHER, and UniProt, and to alignments of RNA viral genomes. Our results showed that AutoPhy rapidly generated monophyletic clusters (subfamilies) within phylogenetic trees evolving at very different rates both within and among phylogenies. The phylogenetic clusters generated by AutoPhy resolved misannotations and identified new protein functional groups and novel viral strains.


Subject(s)
Algorithms , Proteins , Phylogeny , Proteins/genetics , Porins/genetics , Amino Acid Sequence
10.
Ecol Evol ; 14(3): e11067, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38435021

ABSTRACT

Climate change has the potential to disrupt species interactions across global ecosystems. Ectotherm-endotherm interactions may be especially prone to this risk due to the possible mismatch between the species in physiological response and performance. However, few studies have examined how changing temperatures might differentially impact species' niches or available suitable habitat when they have very different modes of thermoregulation. An ideal system for studying this interaction is the predator-prey system. In this study, we used ecological niche modeling to characterize the niche overlap and examine biogeography in past and future climate conditions of prairie rattlesnakes (Crotalus viridis) and Ord's kangaroo rats (Dipodomys ordii), an ectotherm-endotherm pair typifying a predator-prey species interaction. Our models show a high niche overlap between these two species (D = 0.863 and I = 0.979) and further affirm similar paleoecological distributions during the last glacial maximum (LGM) and mid-Holocene (MH). Under future climate change scenarios, we found that prairie rattlesnakes may experience a reduction in overall suitable habitat (RCP 2.6 = -1.82%, 4.5 = -4.62%, 8.5 = -7.34%), whereas Ord's kangaroo rats may experience an increase (RCP 2.6 = 9.8%, 4.5 = 11.71%, 8.5 = 8.37%). We found a shared trend of stable suitable habitat at northern latitudes but reduced suitability in southern portions of the range, and we propose future monitoring and conservation be focused on those areas. Overall, we demonstrate a biogeographic example of how interacting ectotherm-endotherm species may have mismatched responses under climate change scenarios and the models presented here can serve as a starting point for further investigation into the biogeography of these systems.
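
The overlap values D and I quoted in this abstract are presumably Schoener's D and the Hellinger-based I statistic that are standard in niche-modeling work; the abstract does not say so explicitly, so treat that reading as an assumption. Under it, a minimal sketch of both statistics for two suitability surfaces sampled on the same grid cells could look like this (function names are illustrative):

```python
import math


def normalize(scores):
    """Rescale non-negative suitability scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("suitability scores must sum to a positive value")
    return [s / total for s in scores]


def schoener_d(suit_x, suit_y):
    """Schoener's D: 1 minus half the summed absolute differences."""
    px, py = normalize(suit_x), normalize(suit_y)
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(px, py))


def hellinger_i(suit_x, suit_y):
    """I statistic: 1 minus half the squared Hellinger distance."""
    px, py = normalize(suit_x), normalize(suit_y)
    return 1.0 - 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                           for a, b in zip(px, py))


if __name__ == "__main__":
    # Made-up suitability scores over five grid cells, for illustration only.
    snake = [0.9, 0.7, 0.4, 0.1, 0.0]
    rodent = [0.8, 0.8, 0.5, 0.2, 0.1]
    print(schoener_d(snake, rodent), hellinger_i(snake, rodent))
```

Both statistics range from 0 (no overlap) to 1 (identical surfaces), and I is typically larger than D for the same pair, which is consistent with the ordering of the reported values.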

11.
BMC Bioinformatics ; 14: 158, 2013 May 13.
Article in English | MEDLINE | ID: mdl-23668630

ABSTRACT

BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image. 
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.


Subject(s)
Phylogeny , Software , Internet
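
The "prune away unneeded parts" step described in this abstract can be sketched in a few lines. This is not code from the phylotastic project: the nested-tuple tree representation and the function name `prune` are assumptions made for illustration, and branch lengths, name resolution, and annotations are left out.

```python
def prune(tree, keep):
    """Prune a nested-tuple tree down to the taxa named in `keep`.

    Leaves are strings and internal nodes are tuples of children.
    Internal nodes left with a single child are collapsed, and None is
    returned when no requested taxon occurs below a node.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse the unary node
    return tuple(kept)


if __name__ == "__main__":
    # A made-up "expert" tree and a user query of three taxa.
    expert_tree = ((("A", "B"), "C"), ("D", "E"))
    print(prune(expert_tree, {"A", "C", "D"}))
```

Collapsing single-child nodes keeps the result a proper tree over just the queried taxa, which is what a downstream branch-length or dating service would expect as input.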
12.
Syst Biol ; 61(4): 675-89, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22357728

ABSTRACT

In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input-output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.


Subject(s)
Biological Evolution , Computational Biology/standards , Programming Languages , Biodiversity , Classification , Informatics , Biological Models , Phylogeny , Software
13.
Mov Ecol ; 11(1): 72, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37919756

ABSTRACT

BACKGROUND: Kangaroo rats are small mammals that are among the most abundant vertebrates in many terrestrial ecosystems in Western North America and are considered both keystone species and ecosystem engineers, providing numerous linkages between other species as both consumers and resources. However, there are challenges to studying the behavior and activity of these species due to the difficulty of observing large numbers of individuals that are small, secretive, and nocturnal. Our goal was to develop an integrated approach of miniaturized animal-borne accelerometry and radiotelemetry to classify the cryptic behavior and activity cycles of kangaroo rats and test hypotheses of how their behavior is influenced by light cycles, moonlight, and weather. METHODS: We provide a proof-of-concept approach to effectively quantify behavioral patterns of small-bodied (< 50 g), nocturnal, and terrestrial free-ranging mammals using large acceleration datasets by combining low-mass, miniaturized animal-borne accelerometers with radiotelemetry and advanced machine learning techniques. We developed a method of attachment and retrieval for deploying accelerometers and a non-disruptive method of gathering observational validation datasets for acceleration data on free-ranging nocturnal small mammals, and used these techniques on Merriam's kangaroo rats to analyze how behavioral patterns relate to abiotic factors. RESULTS: We found that Merriam's kangaroo rats are only active during the nighttime phases of the diel cycle and are particularly active during later light phases of the night (i.e., late night, morning twilight, and dawn). We found no reduction in activity or foraging associated with moonlight, indicating that kangaroo rats are actually more lunarphilic than lunarphobic. We also found that kangaroo rats increased foraging effort on more humid nights, most likely as a mechanism to avoid cutaneous water loss.
CONCLUSIONS: Small mammals are often integral to ecosystem functionality, as many of these species are highly abundant ecosystem engineers driving linkages in energy flow and nutrient transfer across trophic levels. Our work represents the first continuous detailed quantitative description of fine-scale behavioral activity budgets in kangaroo rats, and lays out a general framework for how to use miniaturized biologging devices on small and nocturnal mammals to examine behavioral responses to environmental factors.

14.
Mol Ecol Resour ; 22(1): 430-438, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34288531

ABSTRACT

A wide range of data types can be used to delimit species and various computer-based tools dedicated to this task are now available. Although these formalized approaches have significantly contributed to increasing the objectivity of species delimitation (SD) under different assumptions, they are not routinely used by alpha-taxonomists. One obvious shortcoming is the lack of interoperability among the various independently developed SD programs. Given the frequent incongruences between species partitions inferred by different SD approaches, researchers applying these methods often seek to compare these alternative species partitions to evaluate the robustness of the species boundaries. This procedure is excessively time-consuming at present, and the lack of a standard format for species partitions is a major obstacle. Here, we propose a standardized format, SPART, to enable compatibility between different SD tools exporting or importing partitions. This format reports the partitions and describes, for each of them, the assignment of individuals to the "inferred species". The syntax also allows support values to be optionally reported, as well as original trees and the full command lines used in the respective SD analyses. Two variants of this format are proposed, overall using the same terminology but presenting the data either optimized for human readability (matricial SPART) or in a format in which each partition forms a separate block (SPART.XML). ABGD, DELINEATE, GMYC, PTP and TR2 have already been adapted to output SPART files and a new version of LIMES has been developed to import, export, merge and split them.
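
The comparison task that motivates this format, checking whether two tools split the same individuals into the same groups, can be sketched independently of any file syntax. The snippet below does not parse or emit SPART; it assumes each partition has already been loaded into a plain dictionary mapping individual IDs to species labels, and the function names and sample IDs are hypothetical.

```python
def to_groups(assignment):
    """Turn {individual: label} into a set of frozensets, one per species."""
    groups = {}
    for individual, label in assignment.items():
        groups.setdefault(label, set()).add(individual)
    return {frozenset(members) for members in groups.values()}


def compare_partitions(a, b):
    """Compare two species partitions of the same individuals.

    Labels are ignored: only the grouping of individuals matters, so two
    tools that name their species differently can still agree.
    """
    if set(a) != set(b):
        raise ValueError("partitions must cover the same individuals")
    groups_a, groups_b = to_groups(a), to_groups(b)
    return {
        "identical": groups_a == groups_b,
        "shared": groups_a & groups_b,
        "only_a": groups_a - groups_b,
        "only_b": groups_b - groups_a,
    }


if __name__ == "__main__":
    # Made-up output of two delimitation analyses on four individuals.
    tool_one = {"i1": "sp1", "i2": "sp1", "i3": "sp2", "i4": "sp3"}
    tool_two = {"i1": "X", "i2": "X", "i3": "Y", "i4": "Y"}
    print(compare_partitions(tool_one, tool_two))
```

Groups listed under `shared` are the species boundaries on which both analyses agree, which is the kind of robustness check the abstract describes.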

15.
Bioinformatics ; 26(12): 1569-71, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20421198

ABSTRACT

DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. AVAILABILITY: The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy).


Subject(s)
Computational Biology/methods , Phylogeny , Software , Algorithms , Factual Databases , Programming Languages
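
The idea of hashing splits to compute tree distances quickly can be illustrated without the library itself. The sketch below is not DendroPy code and does not use its API; it is a standard-library approximation in which trees are nested tuples of leaf names (an assumed representation), each bipartition is encoded as an integer bitmask, and the unweighted Robinson-Foulds distance is the size of the symmetric difference between the two sets of masks.

```python
def leaf_names(tree):
    """Collect leaf names from a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaf_names(child))
    return names


def splits(tree, index):
    """Return the non-trivial bipartitions of `tree` as integer bitmasks."""
    n = len(index)
    full = (1 << n) - 1
    found = set()

    def visit(node):
        if isinstance(node, str):
            return 1 << index[node]
        mask = 0
        for child in node:
            mask |= visit(child)
        # A split and its complement are the same bipartition, so always
        # store the side that contains the first taxon (bit 0).
        norm = mask if mask & 1 else full ^ mask
        size = bin(norm).count("1")
        if 2 <= size <= n - 2:  # skip single-leaf and whole-tree splits
            found.add(norm)
        return mask

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance between two trees."""
    names = sorted(leaf_names(tree_a))
    if names != sorted(leaf_names(tree_b)):
        raise ValueError("trees must have identical leaf sets")
    index = {name: i for i, name in enumerate(names)}
    return len(splits(tree_a, index) ^ splits(tree_b, index))


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", ("D", "E")))
    t2 = (("A", "C"), ("B", ("D", "E")))
    print(rf_distance(t1, t2))
```

Because each split is a single integer, set operations on the masks replace any node-by-node tree traversal during comparison, which is the source of the speed the abstract attributes to a splits-based mapping.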
16.
J Chromatogr A ; 1660: 462656, 2021 Dec 20.
Article in English | MEDLINE | ID: mdl-34798444

ABSTRACT

Nontargeted analysis based on mass spectrometry is an increasingly common practice in environmental monitoring for identifying contaminants of emerging concern. Nontargeted analysis performed using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC/TOF-MS) generates large numbers of possible analytes. Moreover, the default spectral library similarity score-based search algorithm used by LECO® ChromaTOF® does not ensure that high similarity scores result in correct library matches. Additional manual screening is therefore necessary, but it is prone to human error, especially when dealing with large amounts of data. To improve the speed and accuracy of chemical identification, we developed CINeMA.py (Classification Is Never Manual Again). This programming suite automates GC×GC/TOF-MS data interpretation by determining the confidence of a match between the observed analyte mass spectrum and the library hit generated by the LECO® ChromaTOF® software from the NIST Electron Ionization Mass Spectral (NIST EI-MS) library. Our script allows the user to evaluate the confidence of the match using an algorithmic method that mimics the manual curation process and two different machine learning approaches (neural networks and random forest). The script allows the user to adjust various parameters (e.g., similarity threshold) and study their effects on prediction accuracy. To test CINeMA.py, we used data from two different environmental contaminant studies: an EPA study on household dust and a study on stormwater runoff. Using a reference set based on the analysis performed by highly trained users of the ChromaTOF and GC×GC/TOF-MS systems, the random forest model had the highest prediction accuracies of 86% and 83% on the EPA and Stormwater data sets, respectively. The algorithmic approach had the second-best prediction accuracy (82% and 79%), while the neural network had the lowest accuracy (63% and 67%). All the approaches required less than 1 min to classify 986 observed analytes, whereas manual data analysis required hours or days to complete. Our methods were also able to detect high-confidence matches missed during the manual review. Overall, CINeMA.py provides users with a powerful suite of tools that should significantly speed up data analysis while reducing the possibility of manual errors and discrepancies among users, and it may be applicable to nontargeted analysis on other GC/EI-MS instruments.
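The effect of a similarity threshold on prediction accuracy, as mentioned in the abstract, can be illustrated with a minimal Python sketch. This is not CINeMA.py's code: the function names, the similarity scale of 0 to 999, the threshold values, and the data are all hypothetical and invented for illustration. It only shows the general idea of accepting a library hit when its score meets a threshold and then scoring the decisions against a manually curated reference set.

```python
def classify_by_threshold(similarities, threshold):
    """Accept a library hit when its similarity score meets the threshold.

    Hypothetical helper: a plain cutoff, not the CINeMA.py algorithm.
    """
    return [score >= threshold for score in similarities]


def accuracy(predicted, reference):
    """Fraction of predictions that agree with the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Invented example data: similarity scores on an assumed 0-999 scale and
# expert decisions (True = the library match was judged correct).
similarities = [910, 650, 780, 820, 400]
reference = [True, False, False, True, False]

# Compare two candidate thresholds against the reference set.
results = {
    threshold: accuracy(classify_by_threshold(similarities, threshold), reference)
    for threshold in (700, 800)
}
print(results)
```

With this toy data the stricter threshold agrees with the reference more often, because one hit with a fairly high score (780) was rejected by the expert. That is the kind of case the abstract describes, where a high similarity score does not guarantee a correct match.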


Subject(s)
Electrons , Software , Algorithms , Environmental Monitoring , Gas Chromatography-Mass Spectrometry , Humans
17.
Mol Biol Evol ; 26(1): 1-3, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18984901

ABSTRACT

Wheeler WC and Pickett KM (2008. Topology-Bayes versus clade-Bayes in phylogenetic analysis. Mol Biol Evol. 25:447-453.) discuss two ways of summarizing the posterior probability distribution of a Bayesian phylogenetic analysis, which they refer to as "topology-Bayes" and "clade-Bayes." They claim that the clade-Bayes approach leads to problems such as "exaggerated clade support, inconsistently biased priors, and the impossibility of topology hypothesis testing," which are not problems for the topology-Bayes approach. However, their argument for topology-Bayes over clade-Bayes is based on errors in the interpretation of summary statistics associated with Bayesian phylogenetic analysis. Although there is a well-documented difference between the maximum posterior probability topology and the majority-rule consensus topology (the established terms for topology-Bayes and clade-Bayes summaries, respectively), both have a place in phylogenetic analysis. Choice of summarization strategy should be driven by choice of parameters that need to be estimated versus those to be marginalized given the evolutionary questions being asked or hypotheses being tested.
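The difference between the two summaries named in the abstract can be illustrated with a small Python sketch. The posterior sample below is invented for illustration, and the helper names are hypothetical. Each rooted topology on taxa A to D is represented, as a simplifying assumption, by the set of its nontrivial clades. The maximum posterior probability topology is the most frequently sampled tree, while the majority-rule consensus keeps every clade found in more than half of the samples, marginalizing over the rest of the tree.

```python
from collections import Counter


def clade(*taxa):
    """A clade is represented as an immutable set of taxon names."""
    return frozenset(taxa)


def map_topology(samples):
    """Return the most frequently sampled topology and its posterior frequency."""
    counts = Counter(samples)
    topology, n = counts.most_common(1)[0]
    return topology, n / len(samples)


def majority_rule_clades(samples, threshold=0.5):
    """Return clades whose posterior frequency exceeds the threshold."""
    counts = Counter(c for topology in samples for c in topology)
    total = len(samples)
    return {c: n / total for c, n in counts.items() if n / total > threshold}


# Three rooted topologies, each given by its nontrivial clades.
t1 = frozenset({clade("A", "B"), clade("A", "B", "C")})  # (((A,B),C),D)
t2 = frozenset({clade("A", "B"), clade("C", "D")})       # ((A,B),(C,D))
t3 = frozenset({clade("B", "C"), clade("A", "B", "C")})  # ((A,(B,C)),D)

# Invented posterior sample of ten trees.
samples = [t1] * 3 + [t2] * 3 + [t3] * 4

best, best_prob = map_topology(samples)
consensus = majority_rule_clades(samples)

print("MAP topology is t3:", best == t3, "with frequency", best_prob)
for c, freq in sorted(consensus.items(), key=lambda item: sorted(item[0])):
    print(sorted(c), freq)
```

In this toy sample the most frequent topology is t3, yet the clades that pass the majority-rule cutoff are (A,B) and (A,B,C), which together form t1. The two summaries answer different questions, one about whole trees and one about individual clades, which is consistent with the abstract's point that both have a place.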


Subject(s)
Bayes Theorem , Phylogeny , Evolution, Molecular , Models, Genetic
18.
Mol Phylogenet Evol ; 54(2): 561-70, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19679193

ABSTRACT

We investigated the phylogenetic relationships and estimated the history of species diversification and biogeography in the bufonid genus Ansonia from Southeast Asia, a group distinctive for its tadpoles, which are adapted to life in strong currents, chiefly in montane regions but also in lowland rainforests. We estimated phylogenetic relationships among 32 named and unnamed taxa using 2461 bp of sequence from the mitochondrial 12S rRNA, tRNA(val), and 16S rRNA genes with equally-weighted parsimony, maximum likelihood, and Bayesian methods of inference. Monophyletic clades of Southeast Asian members of the genus Ansonia are well-supported, allowing general biogeographic conclusions to be drawn. The genus is divided into two major clades. One of these contains two reciprocally monophyletic subclades, one from the Malay Peninsula and Thailand and the other from Borneo. The other major clade primarily consists of Bornean taxa but also includes a monophyletic group of two Philippine species and a single peninsular Malaysian species. We estimated absolute divergence times using Bayesian methods with external calibration points to reconstruct the relative timing of faunal exchange between the major landmasses of Southeast Asia.


Subject(s)
Bufonidae/genetics , Evolution, Molecular , Phylogeny , Animals , Asia, Southeastern , Bayes Theorem , Bufonidae/classification , DNA, Mitochondrial/genetics , Geography , Likelihood Functions , Sequence Alignment , Sequence Analysis, DNA
19.
Mol Phylogenet Evol ; 57(2): 598-619, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20601009

ABSTRACT

Southeast Asia's widespread species offer unique opportunities to explore the effects of geographical barriers to dispersal on patterns of vertebrate lineage diversification. We analyzed mitochondrial gene sequences (16S rDNA) from a geographically widespread sample of 266 Southeast Asian tree frogs, including 244 individuals of Polypedates leucomystax and its close relatives. Our expectation was that lineages on island archipelagos would exhibit more substantial geographic structure, corresponding to the geological history of terrestrial connectivity in this region, compared to the Asian mainland. Contrary to predictions, we found evidence of numerous highly divergent lineages from a limited area on the Asian mainland, but fewer lineages with shallower divergences throughout oceanic islands of the Philippines and Indonesia. Surprisingly and in numerous instances, lineages in the archipelagos span distinct biogeographical provinces. Phylogeographic analyses identified four major haplotype clades; summary statistics, mismatch distributions, and Bayesian coalescent inference of demography provide support for recent range expansion, population growth, and/or admixture in the Philippine and some Sulawesi populations. We speculate that the current range of P. leucomystax in Southeast Asia is much larger now than in the recent past. Conversion of forested areas to monoculture agriculture and transportation of agricultural products between islands may have facilitated unprecedented population and range expansion in P. leucomystax throughout thousands of islands in the Philippine and Indonesian archipelagos.


Subject(s)
Anura/classification , Anura/genetics , Phylogeography , Animals , DNA, Ribosomal/genetics , Genetic Variation/genetics , Humans , Indonesia , Philippines
20.
Trends Ecol Evol ; 33(6): 390-398, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29685579

ABSTRACT

The development of process-based probabilistic models for historical biogeography has transformed the field by grounding it in modern statistical hypothesis testing. However, most of these models abstract away biological differences, reducing species to interchangeable lineages. We present here the case for reintegration of biology into probabilistic historical biogeographical models, allowing a broader range of questions about biogeographical processes beyond ancestral range estimation or simple correlation between a trait and a distribution pattern, as well as allowing us to assess how inferences about ancestral ranges themselves might be impacted by differential biological traits. We show how new approaches to inference might cope with the computational challenges resulting from the increased complexity of these trait-based historical biogeographical models.


Subject(s)
Animal Distribution , Life History Traits , Phenotype , Phylogeny , Phylogeography , Plant Dispersal , Computational Biology , Models, Biological , Models, Statistical