1.
Invertebr Syst ; 38, 2024 Jun.
Article En | MEDLINE | ID: mdl-38838190

Hymenoptera has some of the highest diversity and number of individuals among insects. Many of these species potentially play key roles as food sources, pest controllers and pollinators. However, little is known about their diversity and biology, and ~80% of the species have not yet been described. Classical taxonomy based on morphology is a rather slow process but DNA barcoding has already brought considerable progress in identification. Innovative methods such as image-based identification and automation can further speed up the process. We present a proof of concept for image data recognition of a parasitic wasp family, the Diapriidae (Hymenoptera), obtained as part of the GBOL III project. These tiny (1.2-4.5 mm) wasps were photographed and identified using DNA barcoding to provide a solid ground truth for training a neural network. Taxonomic identification was used down to the genus level. Subsequently, three different neural network architectures were trained, evaluated and optimised. As a result, 11 different genera of diapriids and one mixed group of 'other Hymenoptera' can be classified with an average accuracy of 96%. Additionally, the sex of the specimen can be classified automatically with an accuracy of >97%.


Neural Networks, Computer , Wasps , Animals , Wasps/genetics , Wasps/anatomy & histology , DNA Barcoding, Taxonomic , Image Processing, Computer-Assisted/methods , Female , Classification/methods , Species Specificity , Male
2.
STAR Protoc ; 5(2): 103125, 2024 Jun 21.
Article En | MEDLINE | ID: mdl-38870016

The ecosystem management actions taxonomy (EMAT) consists of actions taken by humans and wildlife that affect an ecosystem. Here, I present a protocol for discovering machine-readable entities of the EMAT. I describe steps for acquiring stories from online locations, collecting them into a story file, and processing them through a software package to extract those actions that match EMAT taxa. I then detail procedures for using the story file to learn new EMAT taxa.


Ecosystem , Software , Humans , Animals , Classification/methods , Conservation of Natural Resources/methods
4.
Genome Biol Evol ; 16(5), 2024 May 02.
Article En | MEDLINE | ID: mdl-38748485

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.
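
The abstract reports that integrating multiple database-based methods improves accuracy but does not say how the outputs are combined. As a minimal sketch only, assuming a simple majority vote over per-tool labels (the function name `majority_vote`, the `min_agreement` parameter, and the example labels are hypothetical and not taken from the study), the idea could look like this in Python:

```python
from collections import Counter


def majority_vote(predictions, min_agreement=0.5):
    """Combine the taxon labels that several classifiers gave to one read.

    predictions: list of labels, with None meaning "this tool made no call".
    Returns the most common label if it is supported by more than
    `min_agreement` of the tools that made a call, otherwise None.
    NOTE: majority voting is an assumption for illustration, not the
    integration scheme used by the authors.
    """
    calls = [p for p in predictions if p is not None]
    if not calls:
        return None
    label, count = Counter(calls).most_common(1)[0]
    return label if count / len(calls) > min_agreement else None


# Example: three hypothetical tools classify the same read.
consensus = majority_vote(["Escherichia", "Escherichia", "Salmonella"])
```

A strict majority threshold means that an even split between two labels yields no call, which trades some sensitivity for fewer conflicting assignments.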


High-Throughput Nucleotide Sequencing , Machine Learning , Databases, Genetic , Computational Biology/methods , Classification/methods
5.
Philos Trans R Soc Lond B Biol Sci ; 379(1904): 20230124, 2024 Jun 24.
Article En | MEDLINE | ID: mdl-38705180

DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPU) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve over a 1000 × speedup compared with the central processing unit (CPU)-based implementation without compromising PROTAX's key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
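
The similarity and nearest-neighbour operations mentioned above can be illustrated with a small CPU-only sketch. This is not PROTAX or PROTAX-GPU code: the similarity measure (fraction of identical sites in pre-aligned sequences), the function names and the toy reference library are all assumptions made for illustration.

```python
import heapq


def similarity(a, b):
    """Fraction of identical sites between two aligned, equal-length sequences.

    Sites where either sequence has a gap ('-') or an unknown base ('N')
    are skipped. Returns 0.0 if no site can be compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def nearest_references(query, references, k=2):
    """Return the k reference names most similar to the query.

    references: dict mapping a reference name to its aligned sequence.
    Result: list of (name, similarity), best match first.
    """
    scored = ((similarity(query, seq), name) for name, seq in references.items())
    return [(name, sim) for sim, name in heapq.nlargest(k, scored)]


# Hypothetical toy reference library.
refs = {"ref_a": "ACGTACGT", "ref_b": "ACGTTCGT", "ref_c": "TTTTTTTT"}
hits = nearest_references("ACGTACGA", refs, k=2)
```

A brute-force scan like this scales linearly with the size of the reference library for every query, which is the kind of workload the abstract describes moving to GPUs.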


Algorithms , DNA Barcoding, Taxonomic , DNA Barcoding, Taxonomic/methods , Classification/methods , Computer Graphics , Animals
6.
J Am Med Inform Assoc ; 31(7): 1493-1502, 2024 Jun 20.
Article En | MEDLINE | ID: mdl-38742455

BACKGROUND: Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES: This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS: We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator. RESULTS: The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. CONCLUSION: The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.


Electronic Health Records , Natural Language Processing , Electronic Health Records/classification , Humans , Classification/methods , Medical Errors/classification
7.
J Dent ; 146: 105058, 2024 Jul.
Article En | MEDLINE | ID: mdl-38729286

OBJECTIVES: This review aimed to map taxonomy frameworks, descriptions, and applications of immersive technologies in the dental literature. DATA: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines were followed, and the protocol was registered on the Open Science Framework platform (https://doi.org/10.17605/OSF.IO/H6N8M). SOURCES: A systematic search was conducted in MEDLINE (via PubMed), Scopus, and Cochrane Library databases, and complemented by a manual search. STUDY SELECTION: A total of 84 articles were included, with 81 % published between 2019 and 2023. Most studies were experimental (62 %), including education (25 %), protocol feasibility (20 %), in vitro (11 %), and cadaver (6 %). Other study types included clinical report/technique article (24 %), clinical study (9 %), technical note/tip to reader (4 %), and randomized controlled trial (1 %). Three-quarters of the included studies were published in oral and maxillofacial surgery (38 %), dental education (26 %), and implant (12 %) disciplines. Methods of display included head-mounted display device (HMD) (55 %), see-through screen (32 %), 2D screen display (11 %), and projector display (2 %). Descriptions of immersive realities were fragmented and inconsistent, with no clear taxonomy framework for the umbrella and subset terms, including virtual reality (VR), augmented reality (AR), mixed reality (MR), augmented virtuality (AV), extended reality, and X reality. CONCLUSIONS: Immersive reality applications in dentistry are gaining popularity with a notable surge in the number of publications in the last 5 years. Ambiguities are apparent in the descriptions of immersive realities. A taxonomy framework based on method of display (full or partial) and reality class (VR, AR, or MR) is proposed. CLINICAL SIGNIFICANCE: Understanding different reality classes can be perplexing due to their blurred boundaries and conceptual overlap. Immersive technologies offer novel educational and clinical applications. This domain is developing fast. With the current fragmented and inconsistent terminologies, a comprehensive taxonomy framework is necessary.


Dentistry , Humans , Classification , Education, Dental , Virtual Reality , Augmented Reality
8.
Syst Parasitol ; 101(3): 34, 2024 May 03.
Article En | MEDLINE | ID: mdl-38700784

Although most Latin binomial names of species are valid, many are eventually unaccepted when they are found to be synonyms of previously described species, or superseded by a new combination when the species they denote are moved to a different genus. What proportion of parasite species names become unaccepted over time, and how long does it take for incorrect names to become unaccepted? Here, we address these questions using a dataset comprising thousands of species names of parasitic helminths from four higher taxa (Acanthocephala, Nematoda, Cestoda, and Trematoda). Overall, among species names proposed in the past two-and-a-half centuries, nearly one-third have since been unaccepted, the most common reason being that they have been superseded by a new combination. A greater proportion of older names (proposed pre-1950) have since been unaccepted compared to names proposed more recently; however, most taxonomic acts leading to species names being unaccepted (through either synonymy or reclassification) occurred in the past few decades. Overall, the average longevity of helminth species names that are currently unaccepted was 29 years; although many remained in use for over 100 years, about 50% of the total were invalidated within 20 years of first being proposed. The patterns observed were roughly the same for all four higher helminth taxa considered here. Our results provide a quantitative illustration of the self-correcting nature of parasite taxonomy, and can also help to calibrate future estimates of total parasite biodiversity.


Helminths , Terminology as Topic , Animals , Helminths/classification , Species Specificity , Classification
9.
Int J Med Inform ; 187: 105438, 2024 Jul.
Article En | MEDLINE | ID: mdl-38579660

BACKGROUND: Taxonomies are needed for automated analysis of clinical data in healthcare. Few reviews exist of the taxonomy development methods used in health sciences. This systematic review aimed to describe the scope of the available taxonomies relative to patient safety, the methods used for taxonomy development, and the strengths and limitations of the methods. The purpose of this systematic review is to guide future taxonomy development projects. METHODS: The CINAHL, PubMed, Scopus, and Web of Science databases were searched for studies from January 2012 to April 25, 2023. Two authors selected the studies using inclusion and exclusion criteria and critical appraisal checklists. The data were analysed inductively, and the results were reported narratively. RESULTS: The studies (n = 13) across healthcare mainly concerned taxonomies of adverse events and medication safety, with few addressing specialised fields or information technology. Critical appraisal indicated inadequate reporting of the taxonomy development methods used. Ten phases of taxonomy development were identified: (1) defining purpose and (2) the theory base for development, (3) relevant data sources' identification, (4) main terms' identification and definitions, (5) items' coding and pooling, (6) reliability and validity evaluation of coding and/or codes, (7) development of a hierarchical structure, (8) testing the structure, (9) piloting the taxonomy and (10) reporting application and validation of the final taxonomy. Seventeen statistical tests and seven software systems were utilised, but automated data extraction methods were used rarely. Strengths were multimethod and multi-stakeholder approaches, code and hierarchy testing, and piloting; limitations were time consumption and small samples in testing. CONCLUSION: New taxonomies are needed for diverse specialities and information technology related to patient safety. A structured method is needed for taxonomy development, reporting and appraisal to strengthen taxonomies' quality. A new guide was proposed for taxonomy development, for which testing is required. PROSPERO registration number CRD42023411022.


Patient Safety , Humans , Classification/methods , Medical Informatics
10.
Methods Mol Biol ; 2744: 33-52, 2024.
Article En | MEDLINE | ID: mdl-38683310

The use of DNA has helped to improve and speed up species identification and delimitation. However, it also provides new challenges to taxonomists. Incongruence of outcome from various markers and delimitation methods, bias from sampling and skewed species distribution, implemented models, and the choice of methods/priors may mislead results and also may, in conclusion, increase elements of subjectivity in species taxonomy. The lack of direct diagnostic outcome from most contemporary molecular delimitation approaches and the need for a reference to existing and best sampled trait reference systems reveal the need for refining the criteria of species diagnosis and diagnosability in the current framework of nomenclature codes and good practices to avoid nomenclatorial instability, parallel taxonomies, and consequently more and new taxonomic impediment.


DNA , DNA/genetics , DNA Barcoding, Taxonomic/methods , Classification/methods , Phylogeny , Species Specificity
11.
Methods Mol Biol ; 2744: 77-104, 2024.
Article En | MEDLINE | ID: mdl-38683312

Over the past two decades, DNA barcoding has become the most popular exploration approach in molecular taxonomy, whether for identification, discovery, delimitation, or description of species. The present contribution focuses on the utility of DNA barcoding for taxonomic research activities related to species delimitation, emphasizing the following aspects: (1) to what extent DNA barcoding can be a valuable ally for fundamental taxonomic research, (2) its methodological and theoretical limitations, (3) the conceptual background and practical use of pairwise distances between DNA barcode sequences in taxonomy, and (4) the different ways in which DNA barcoding can be combined with complementary means of investigation within a broader integrative framework. In this chapter, we recall and discuss the key conceptual advances that have led to the so-called renaissance of taxonomy, elaborate a detailed glossary for the terms specific to this discipline (see Glossary in Chap. 35), and propose a newly designed step-by-step species delimitation protocol starting from DNA barcode data that includes steps from the preliminary elaboration of an optimal sampling strategy to the final decision-making process which potentially leads to nomenclatural changes.
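
As a rough illustration of how pairwise distances between barcode sequences are used in practice, the sketch below computes uncorrected p-distances and compares the largest within-species distance with the smallest between-species distance. It is not the chapter's protocol: the function names, the toy sequences and the species labels are hypothetical, and the sequences are assumed to be aligned already.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences, ignoring sites where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_summary(barcodes):
    """Summarise pairwise distances for labelled barcodes.

    barcodes: list of (species_label, aligned_sequence).
    Returns (max intraspecific distance, min interspecific distance);
    either value is None if no such pair exists.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(barcodes, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return (max(intra) if intra else None, min(inter) if inter else None)


# Hypothetical toy data: two specimens of one species, one of another.
toy = [
    ("sp1", "ACGTACGTAC"),
    ("sp1", "ACGTACGTAT"),
    ("sp2", "ACGAACCTAA"),
]
max_intra, min_inter = distance_summary(toy)
```

When the smallest between-species distance clearly exceeds the largest within-species distance, distance-based delimitation is straightforward; overlapping ranges are one of the limitations that motivate combining barcodes with other evidence.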


DNA Barcoding, Taxonomic , DNA Barcoding, Taxonomic/methods , Classification/methods , Phylogeny , Animals , Species Specificity
12.
Syst Appl Microbiol ; 47(2-3): 126498, 2024 May.
Article En | MEDLINE | ID: mdl-38442686

Codes of nomenclature that provide well-regulated and stable frameworks for the naming of taxa are a fundamental underpinning of biological research. These Codes themselves require systems that govern their administration, interpretation and emendation. Here we review the provisions that have been made for the governance of the recently introduced Code of Nomenclature of Prokaryotes Described from Sequence Data (SeqCode), which provides a nomenclatural framework for the valid publication of names of Archaea and Bacteria using isolate genome, metagenome-assembled genome or single-amplified genome sequences as type material. The administrative structures supporting the SeqCode are designed to be open and inclusive. Direction is provided by the SeqCode Community, which we encourage those with an interest in prokaryotic systematics to join.


Archaea , Bacteria , Community Participation , Terminology as Topic , Archaea/classification , Archaea/genetics , Bacteria/genetics , Bacteria/classification , Classification/methods
16.
Syst Biol ; 73(1): 207-222, 2024 May 27.
Article En | MEDLINE | ID: mdl-38224495

In recent years, the study of hybridization and introgression has made significant progress, with ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) emerging as a key area for research. Accurately identifying ghost introgression, however, presents a challenge. To address this issue, we focused on simple cases involving 3 species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated the performance of popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method, Bayesian Phylogenetics and Phylogeography (BPP), in detecting ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to differentiate ghost introgression from introgression between sampled non-sister species, frequently leading to incorrect identification of donor and recipient species. By contrast, the full-likelihood method BPP, which uses multilocus sequence alignments directly and hence takes into account both gene-tree topologies and branch lengths, is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.


Classification , Phylogeny , Classification/methods , Genetic Introgression , Hybridization, Genetic , Phylogeography/methods , Computer Simulation
17.
Syst Biol ; 73(1): 183-206, 2024 May 27.
Article En | MEDLINE | ID: mdl-38189575

Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, whose intervals we demonstrate have similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) intervals and greatly overlap with HPDs, but have lower precision (are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
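
The calibration step of conformalized quantile regression can be sketched in a few lines. This follows the standard split-conformal recipe rather than the authors' code, and it assumes that lower and upper quantile predictions from some already-trained model are available for a held-out calibration set; the function names and the toy numbers are hypothetical.

```python
import math


def cqr_correction(lower, upper, y_true, alpha=0.1):
    """Compute the conformal correction Q from a calibration set.

    lower, upper: predicted lower/upper quantiles for each calibration point.
    y_true: observed values for the same points.
    The conformity score of a point is how far the true value falls outside
    its predicted interval (negative if it lies inside).
    Q is the ceil((n + 1) * (1 - alpha))-th smallest score; if that rank
    exceeds n, the calibration set is too small and Q is infinite.
    """
    n = len(y_true)
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, y_true))
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]


def conformalize(lo, hi, q):
    """Widen (or shrink, if q < 0) a predicted interval by the correction q."""
    return lo - q, hi + q


# Hypothetical calibration data: three points with predicted interval [1, 2].
q = cqr_correction([1, 1, 1], [2, 2, 2], [1.5, 2.5, 3.0], alpha=0.5)
interval = conformalize(1.0, 2.0, q)
```

Because the correction is estimated from held-out data, the adjusted intervals carry a marginal coverage guarantee regardless of how well the underlying quantile model is specified, which is consistent with the more conservative intervals described in the abstract.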


Deep Learning , Phylogeography , Phylogeography/methods , Likelihood Functions , Phylogeny , Classification/methods , Bayes Theorem , Viruses/genetics , Viruses/classification
...