Results 1 - 6 of 6
1.
Methods Mol Biol ; 2744: 53-76, 2024.
Article in English | MEDLINE | ID: mdl-38683311

ABSTRACT

DNA sequences are increasingly used for large-scale biodiversity inventories. Because these genetic data avoid the time-consuming initial sorting of specimens based on their phenotypic attributes, they have recently been incorporated into taxonomic workflows for overlooked and diverse taxa. Major statistical developments have accompanied this new practice, and several models have been proposed to delimit species with single-locus DNA sequences. However, the approaches proposed to date make different assumptions regarding taxon lineage history, leading to strong discordance whenever methods are compared. Distance-based methods, such as Automatic Barcode Gap Discovery (ABGD) and Assemble Species by Automatic Partitioning (ASAP), rely on the detection of a barcode gap (i.e., the lack of overlap between the distributions of intraspecific and interspecific genetic distances) and the associated threshold in genetic distances. Network-based methods, as exemplified by the REfined Single Linkage (RESL) algorithm used to generate Barcode Index Numbers (BINs), use connectivity statistics to hierarchically cluster related haplotypes into molecular operational taxonomic units (MOTUs), which serve as species proxies. Tree-based methods, including Poisson Tree Processes (PTP) and the General Mixed Yule Coalescent (GMYC), fit statistical models to phylogenetic trees within maximum likelihood or Bayesian frameworks. Multiple web servers and stand-alone versions of these methods are now available, complicating the choice of the most appropriate approach for a given taxon of interest. For instance, tree-based methods require an initial phylogenetic reconstruction, and multiple programs, such as RAxML and BEAST, are available for this purpose. Across all examined species delimitation methods, judicious parameter setting is paramount, as different model parameterizations can lead to different conclusions.
The objective of this chapter is to guide users step by step through all the procedures involved in each of these methods, while aggregating all the information required to conduct these analyses. The Materials section details how to prepare and format input files, including options to align sequences and to conduct tree reconstruction with maximum likelihood and Bayesian inference. The Methods section presents the procedures and options available to conduct species delimitation analyses, including distance-, network-, and tree-based models. Finally, limits and future developments are discussed in the Notes section. Most importantly, the species delimitation methods discussed herein are categorized according to five indicators: reliability, availability, scalability, understandability, and usability, all of which are fundamental properties that any approach needs in order to gain unanimous adoption within the DNA barcoding community moving forward.


Subjects
Algorithms; DNA Barcoding, Taxonomic; Phylogeny; DNA Barcoding, Taxonomic/methods; Software; Biodiversity; Sequence Analysis, DNA/methods; Haplotypes/genetics
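Distance- and network-based delimitation both start from pairwise genetic distances between aligned sequences. The Python sketch below is a deliberately simplified, hypothetical illustration: it groups sequences into MOTUs by single-linkage clustering at a fixed, user-chosen p-distance threshold. It is not ABGD, ASAP, or RESL, which infer the threshold or refine clusters with additional statistics; the function names, the toy sequences, and the choice to skip gap and N sites are assumptions made for the example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length.

    Assumption: sites with '-' or 'N' in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def threshold_motus(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: sequences share a MOTU if they are linked
    by a chain of pairwise distances <= threshold (hypothetical helper)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy aligned sequences (hypothetical data)
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "TTGTACGAAT",
    "s4": "TTGTACGAAA",
}
print(sorted(sorted(c) for c in threshold_motus(seqs, 0.15)))
print(len(threshold_motus(seqs, 0.35)))
```

With a threshold of 0.15 the toy data split into two MOTUs, {s1, s2} and {s3, s4}; at 0.35 the 0.3 distance between s2 and s3 chains everything into a single MOTU, a small-scale example of how parameter choice alone changes the delimitation.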
2.
Methods Mol Biol ; 2744: 375-390, 2024.
Article in English | MEDLINE | ID: mdl-38683332

ABSTRACT

DNA barcoding has largely established itself as a mainstay of rapid molecular taxonomic identification in both academic and applied research. Its use as a molecular identification method depends on a "DNA barcode gap", the separation between the maximum within-species difference and the minimum between-species difference. Previous work indicates that the presence of a gap hinges on sampling effort for focal taxa and their close relatives. Furthermore, both theory and empirical work indicate that a gap may not exist for related pairs of biological species. Here, we present a novel evaluation approach in the form of an easily calculated set of nonparametric metrics that quantify the extent of proportional overlap between the inter- and intraspecific distributions of pairwise differences among target species and their conspecifics. The metrics are based on a simple count of the overlapping records for a species that fall within the bounds of the maximum intraspecific distance and the minimum interspecific distance. Our approach takes advantage of the asymmetric directionality inherent in pairwise genetic distance distributions, which has not previously been done in the DNA barcoding literature. As a case study, we apply the metrics to the predatory diving beetle genus Agabus, which poses significant identification challenges owing to its morphological uniformity, despite being relatively easy to sample and having a well-established taxonomy. The results show that target species and their nearest-neighbor species are tightly clustered and therefore difficult to distinguish, demonstrating that DNA barcoding can fail to fully resolve species in certain cases. Moving forward, we suggest that the proposed metrics be integrated into a common framework and reported in any study that uses DNA barcoding for identification.
In so doing, the importance of the DNA barcode gap and its components for the success of DNA-based identification using DNA barcodes can be better appreciated.


Subjects
DNA Barcoding, Taxonomic; DNA Barcoding, Taxonomic/methods; Animals; Coleoptera/genetics; Coleoptera/classification; DNA/genetics; DNA/analysis; Species Specificity
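The count-based idea in the second abstract can be illustrated with pairwise distances. The Python sketch below is a hypothetical reading of it, not the authors' published metric definitions: for one target species it computes the maximum intraspecific distance, the minimum interspecific distance, and two directional proportions, namely the share of intraspecific distances that reach the minimum interspecific distance and the share of interspecific distances that fall at or below the maximum intraspecific distance. Uncorrected p-distances, comparison against all other species rather than only the nearest neighbor, and all names are assumptions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gap_overlap(target: str, labels: dict[str, str], seqs: dict[str, str]) -> dict[str, float]:
    """Barcode gap summary for one target species (hypothetical metric definitions).

    labels maps record ID -> species name; seqs maps record ID -> aligned sequence.
    """
    members = [r for r, sp in labels.items() if sp == target]
    others = [r for r, sp in labels.items() if sp != target]
    intra = [p_distance(seqs[a], seqs[b]) for a, b in combinations(members, 2)]
    inter = [p_distance(seqs[a], seqs[b]) for a in members for b in others]
    if not inter:
        raise ValueError("need at least one record from another species")
    max_intra = max(intra, default=0.0)  # singleton species have no intraspecific pairs
    min_inter = min(inter)
    return {
        "max_intra": max_intra,
        "min_inter": min_inter,
        "gap": min_inter - max_intra,  # positive only when a barcode gap exists
        # direction 1: intraspecific distances reaching into the interspecific range
        "prop_intra_overlap": sum(d >= min_inter for d in intra) / len(intra) if intra else 0.0,
        # direction 2: interspecific distances reaching into the intraspecific range
        "prop_inter_overlap": sum(d <= max_intra for d in inter) / len(inter),
    }


# Toy records (hypothetical data): species X has a clear gap from species Y
labels = {"x1": "X", "x2": "X", "y1": "Y"}
seqs = {"x1": "AAAA", "x2": "AAAC", "y1": "CCCC"}
print(gap_overlap("X", labels, seqs))
```

Keeping the two proportions separate, rather than pooling them, is what lets the asymmetry show: a species can have many interspecific distances intruding into its intraspecific range while few of its own distances intrude the other way.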
3.
Biodivers Data J ; 11: e96480, 2023.
Article in English | MEDLINE | ID: mdl-38327328

ABSTRACT

Here, we introduce VLF, an R package that determines the distribution of very low frequency variants (VLFs) in nucleotide and amino acid sequences for the analysis of errors in DNA sequence records. The package allows users to assess VLFs in aligned and trimmed protein-coding sequences by automatically calculating the frequency of each nucleotide or amino acid at every sequence position and outputting those that occur below a user-specified frequency (default p = 0.001). These results can then be used to explore fundamental population genetic and phylogeographic patterns, mechanisms, and processes at the microevolutionary level, such as nucleotide and amino acid sequence conservation. Our package extends earlier work on an implementation of VLF analysis in Microsoft Excel, which was found to be both computationally slow and error prone, and we compare the results of that implementation with our own. Results from the two implementations are highly consistent for a large DNA barcode dataset of bird species, and the differences are readily explained by manual human error and by inadequate Linnean taxonomy (specifically, species synonymy). VLF is also applied to a subset of avian barcodes to assess the extent of biological artifacts at the species level for the Canada goose (Branta canadensis), as well as to a large dataset of DNA barcodes for fishes of forensic and regulatory importance. The novelty of VLF and its benefits over the previous implementation include its high level of automation, speed, scalability, and ease of use, all desirable characteristics that will be extremely valuable as more sequence data rapidly accumulate in popular reference databases such as BOLD and GenBank.
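The core VLF computation described above, column-wise residue frequencies compared against a cut-off, can be sketched in a few lines. This Python illustration assumes the input sequences are already aligned and trimmed to a common length; it does not reproduce the VLF R package's interface or its handling of gaps, ambiguity codes, or amino acid translation. The function name is hypothetical; the default p = 0.001 follows the abstract.

```python
from collections import Counter


def very_low_frequency_variants(aligned: list[str], p: float = 0.001) -> dict[int, dict[str, float]]:
    """Map each alignment column (0-based) to the residues whose frequency there is below p.

    Works the same way for nucleotide or amino acid strings (hypothetical helper).
    """
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned and trimmed to a common length")
    n = len(aligned)
    flagged: dict[int, dict[str, float]] = {}
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)  # residue counts in this column
        rare = {res: c / n for res, c in counts.items() if c / n < p}
        if rare:
            flagged[pos] = rare
    return flagged


# Toy alignment (hypothetical data): one sequence carries a variant at position 0
seqs = ["ACGT"] * 1999 + ["TCGT"]
print(very_low_frequency_variants(seqs))
```

Because the test is a strict "below p", a singleton variant is flagged at the default p = 0.001 only when the alignment contains more than 1,000 sequences; with smaller datasets the cut-off has to be raised for any variant to qualify.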

4.
PeerJ ; 9: e11157, 2021.
Article in English | MEDLINE | ID: mdl-33976967

ABSTRACT

Although the butterflies of North America have received considerable taxonomic attention, overlooked species and instances of hybridization continue to be revealed. The present study assembles a DNA barcode reference library for this fauna to identify groups whose patterns of sequence variation suggest the need for further taxonomic study. Based on 14,626 records from 814 species, DNA barcodes were obtained for 96% of the fauna. The maximum intraspecific distance averaged one quarter of the minimum distance to the nearest neighbor, producing a barcode gap in 76% of the species. Most species (80%) were monophyletic; the others were para- or polyphyletic. Although 15% of currently recognized species shared barcodes, the incidence of such taxa was far higher in regions exposed to Pleistocene glaciations than in those that were ice-free. Nearly 10% of species displayed high intraspecific variation (>2.5%), suggesting the need for further investigation of potential cryptic diversity. Aside from aiding the identification of all life stages of North American butterflies, the reference library has provided new perspectives on the incidence of both cryptic and potentially over-split species, setting the stage for future studies that can further explore the evolutionary dynamics of this group.
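The library-level statistics quoted above (the ratio of maximum intraspecific distance to nearest-neighbor distance, the share of species with a barcode gap, and the share with intraspecific variation above 2.5%) can be computed from per-species distance summaries. The Python sketch below assumes those summaries are already available as proportions, averages the per-species ratios (the abstract does not say how its average was taken), and treats a gap as present when the nearest-neighbor distance strictly exceeds the maximum intraspecific distance; the names and toy values are hypothetical.

```python
from statistics import mean


def library_summary(species_stats: dict[str, tuple[float, float]],
                    high_intra: float = 0.025) -> dict[str, float]:
    """Summarize a barcode reference library (hypothetical helper).

    species_stats maps species -> (max intraspecific distance, distance to nearest
    neighbor), both as proportions, so 0.025 corresponds to 2.5%.
    """
    n = len(species_stats)
    # per-species ratio of max intraspecific to nearest-neighbor distance
    ratios = [mx / nn for mx, nn in species_stats.values() if nn > 0]
    return {
        "mean_ratio": mean(ratios) if ratios else float("nan"),
        "frac_with_gap": sum(nn > mx for mx, nn in species_stats.values()) / n,
        "frac_high_intra": sum(mx > high_intra for mx, _ in species_stats.values()) / n,
    }


# Toy per-species summaries (hypothetical data)
stats = {
    "sp1": (0.01, 0.04),
    "sp2": (0.03, 0.06),
    "sp3": (0.02, 0.01),  # no gap: nearest neighbor is closer than the farthest conspecific
    "sp4": (0.0, 0.05),   # singleton-like species with no intraspecific variation
}
print(library_summary(stats))
```

A mean ratio near 0.25 would correspond to the "one quarter" figure in the abstract; species flagged by `frac_high_intra` are the candidates for cryptic diversity, and those without a gap are the candidates for shared barcodes or over-splitting.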

5.
PeerJ Comput Sci ; 6: e243, 2020.
Article in English | MEDLINE | ID: mdl-33816897

ABSTRACT

Assessing levels of standing genetic variation within species requires robust sampling for accurate specimen identification using molecular techniques such as DNA barcoding; however, statistical estimators of what constitutes a robust sample are currently lacking. Such estimates are needed because most species are currently represented by only one or a few sequences in existing databases and can safely be assumed to be undersampled. Unfortunately, the sample sizes of 5-10 specimens per species typically seen in DNA barcoding studies are often insufficient to capture within-species genetic diversity adequately. Here, we introduce a novel iterative extrapolation simulation algorithm for haplotype accumulation curves, called HACSim (Haplotype Accumulation Curve Simulator), that can be employed to calculate the sample sizes likely needed to observe the full range of DNA barcode haplotype variation that exists for a species. Using uniform and non-uniform haplotype frequency distributions, the notion of sampling sufficiency (the sample size at which sampling accuracy is maximized and above which no new sampling information is likely to be gained) can be gleaned. HACSim can be employed in two primary ways to estimate specimen sample sizes: (1) to simulate haplotype sampling in hypothetical species, and (2) to simulate haplotype sampling in real species mined from public reference sequence databases such as the Barcode of Life Data Systems (BOLD) or GenBank, for any genomic marker of interest. While the algorithm is globally convergent, its runtime depends heavily on the initial sample size and the skewness of the corresponding haplotype frequency distribution.

6.
Ecol Evol ; 9(5): 2996-3010, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30891232

ABSTRACT

DNA barcoding has greatly accelerated the pace of specimen identification to the species level, as well as of species delineation. Whereas applying DNA barcoding to match unknown specimens to known species is straightforward, its use for species delimitation is more controversial, as species discovery hinges critically on present levels of haplotype diversity and on the patterning of standing genetic variation within and between species. Typical sample sizes for molecular biodiversity assessment using DNA barcodes range from 5 to 10 individuals per species. However, the sample sizes necessary to fully gauge haplotype variation at the species level are presumed to be strongly taxon-specific. Importantly, little attention has been paid to determining the specimen sample sizes necessary to reveal the majority of intraspecific haplotype variation within any one species. In this paper, we present a brief outline of the current literature and methods on intraspecific sample size estimation for assessing the completeness of COI DNA barcode haplotype sampling. By reviewing foundational statistical and population genetic models, we stress the importance of adequate sample sizes for studies of molecular biodiversity across a variety of metazoan taxa, with specific application to ray-finned fishes (Chordata: Actinopterygii). Finally, promising avenues for further research in this area are highlighted.
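One standard statistical model behind sample-size reasoning of this kind is binomial detection: if a haplotype has population frequency f and specimens are sampled independently, the probability of observing it at least once in n specimens is 1 - (1 - f)^n, so for a target probability P the smallest adequate n is ln(1 - P) / ln(1 - f), rounded up. The Python sketch below assumes independent sampling from a large population; the function names and the 5% frequency in the example are illustrative choices, not values taken from the paper.

```python
import math


def detection_probability(f: float, n: int) -> float:
    """Probability that a haplotype at population frequency f appears at least once in n specimens."""
    return 1 - (1 - f) ** n


def min_sample_size(f: float, confidence: float = 0.95) -> int:
    """Smallest n with detection_probability(f, n) >= confidence (binomial model)."""
    if not (0 < f < 1 and 0 < confidence < 1):
        raise ValueError("f and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - f))


# A haplotype carried by 5% of individuals (illustrative value)
print(round(detection_probability(0.05, 10), 3))
print(min_sample_size(0.05))
```

Under this model, ten specimens (the upper end of the typical 5-10 range) detect a haplotype at 5% frequency only about 40% of the time, whereas 59 specimens are needed to reach 95% confidence.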
