Search | VHL Regional Portal

1.

Protocol to discover machine-readable entities of the ecosystem management actions taxonomy.

Haas, Timothy C.

STAR Protoc ; 5(2): 103125, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38870016

ABSTRACT

The ecosystem management actions taxonomy (EMAT) consists of actions taken by humans and wildlife that affect an ecosystem. Here, I present a protocol for discovering machine-readable entities of the EMAT. I describe steps for acquiring stories from online locations, collecting them into a story file, and processing them through a software package to extract those actions that match EMAT taxa. I then detail procedures for using the story file to learn new EMAT taxa.

Subject(s)

Ecosystem , Software , Humans , Animals , Classification/methods , Conservation of Natural Resources/methods

2.

Image-based recognition of parasitoid wasps using advanced neural networks.

Shirali, Hossein; Hübner, Jeremy; Both, Robin; Raupach, Michael; Reischl, Markus; Schmidt, Stefan; Pylatiuk, Christian.

Invertebr Syst ; 38, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38838190

ABSTRACT

Hymenoptera has some of the highest diversity and number of individuals among insects. Many of these species potentially play key roles as food sources, pest controllers and pollinators. However, little is known about their diversity and biology, and ~80% of the species have not yet been described. Classical taxonomy based on morphology is a rather slow process, but DNA barcoding has already brought considerable progress in identification. Innovative methods such as image-based identification and automation can further speed up the process. We present a proof of concept for image data recognition of a parasitic wasp family, the Diapriidae (Hymenoptera), obtained as part of the GBOL III project. These tiny (1.2-4.5 mm) wasps were photographed and identified using DNA barcoding to provide a solid ground truth for training a neural network. Taxonomic identification was used down to the genus level. Subsequently, three different neural network architectures were trained, evaluated and optimised. As a result, 11 different genera of diaprids and one mixed group of 'other Hymenoptera' can be classified with an average accuracy of 96%. Additionally, the sex of the specimen can be classified automatically with an accuracy of >97%.

Subject(s)

Neural Networks, Computer , Wasps , Animals , Wasps/genetics , Wasps/anatomy & histology , DNA Barcoding, Taxonomic , Image Processing, Computer-Assisted/methods , Female , Classification/methods , Species Specificity , Male

3.

Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data.

Tian, Qinzhong; Zhang, Pinglu; Zhai, Yixiao; Wang, Yansu; Zou, Quan.

Genome Biol Evol ; 16(5), 2024 05 02.

Article in English | MEDLINE | ID: mdl-38748485

ABSTRACT

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.

Subject(s)

High-Throughput Nucleotide Sequencing , Machine Learning , Databases, Genetic , Computational Biology/methods , Classification/methods

4.

A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction.

Fu, Sunyang; Wang, Liwei; He, Huan; Wen, Andrew; Zong, Nansu; Kumari, Anamika; Liu, Feifan; Zhou, Sicheng; Zhang, Rui; Li, Chenyu; Wang, Yanshan; St Sauver, Jennifer; Liu, Hongfang; Sohn, Sunghwan.

J Am Med Inform Assoc ; 31(7): 1493-1502, 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38742455

ABSTRACT

BACKGROUND: Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES: This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS: We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator. RESULTS: The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies.
CONCLUSION: The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.

Subject(s)

Electronic Health Records , Natural Language Processing , Electronic Health Records/classification , Humans , Classification/methods , Medical Errors/classification

5.

PROTAX-GPU: a scalable probabilistic taxonomic classification system for DNA barcodes.

Li, Roy; Ratnasingham, Sujeevan; Zarubiieva, Iuliia; Somervuo, Panu; Taylor, Graham W.

Philos Trans R Soc Lond B Biol Sci ; 379(1904): 20230124, 2024 Jun 24.

Article in English | MEDLINE | ID: mdl-38705180

ABSTRACT

DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPU) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve a speedup of over 1000× compared with the central processing unit (CPU)-based implementation without compromising PROTAX's key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.

Subject(s)

Algorithms , DNA Barcoding, Taxonomic , DNA Barcoding, Taxonomic/methods , Classification/methods , Computer Graphics , Animals

6.

Species Diagnosis and DNA Taxonomy.

Ahrens, Dirk.

Methods Mol Biol ; 2744: 33-52, 2024.

Article in English | MEDLINE | ID: mdl-38683310

ABSTRACT

The use of DNA has helped to improve and speed up species identification and delimitation. However, it also provides new challenges to taxonomists. Incongruence of outcome from various markers and delimitation methods, bias from sampling and skewed species distribution, implemented models, and the choice of methods/priors may mislead results and also may, in conclusion, increase elements of subjectivity in species taxonomy. The lack of direct diagnostic outcome from most contemporary molecular delimitation approaches and the need for a reference to existing and best sampled trait reference systems reveal the need for refining the criteria of species diagnosis and diagnosability in the current framework of nomenclature codes and good practices to avoid nomenclatorial instability, parallel taxonomies, and consequently more and new taxonomic impediment.

Subject(s)

DNA , DNA/genetics , DNA Barcoding, Taxonomic/methods , Classification/methods , Phylogeny , Species Specificity

7.

DNA Barcoding in Species Delimitation: From Genetic Distances to Integrative Taxonomy.

Miralles, Aurélien; Puillandre, Nicolas; Vences, Miguel.

Methods Mol Biol ; 2744: 77-104, 2024.

Article in English | MEDLINE | ID: mdl-38683312

ABSTRACT

Over the past two decades, DNA barcoding has become the most popular exploration approach in molecular taxonomy, whether for identification, discovery, delimitation, or description of species. The present contribution focuses on the utility of DNA barcoding for taxonomic research activities related to species delimitation, emphasizing the following aspects: (1) to what extent DNA barcoding can be a valuable ally for fundamental taxonomic research, (2) its methodological and theoretical limitations, (3) the conceptual background and practical use of pairwise distances between DNA barcode sequences in taxonomy, and (4) the different ways in which DNA barcoding can be combined with complementary means of investigation within a broader integrative framework. In this chapter, we recall and discuss the key conceptual advances that have led to the so-called renaissance of taxonomy, elaborate a detailed glossary for the terms specific to this discipline (see Glossary in Chap. 35), and propose a newly designed step-by-step species delimitation protocol starting from DNA barcode data that includes steps from the preliminary elaboration of an optimal sampling strategy to the final decision-making process which potentially leads to nomenclatural changes.
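
The pairwise distances between DNA barcode sequences mentioned in aspect (3) can be illustrated with a minimal sketch. It computes the uncorrected p-distance (the proportion of differing sites) between pre-aligned sequences. The function names and the toy sequences are hypothetical and are not taken from the chapter, which may rely on other distance measures or substitution models.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def distance_table(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distances for every unordered pair of named sequences."""
    names = sorted(seqs)
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy (made-up) barcode fragments: two specimens of one putative species
# and one specimen of another.
barcodes = {
    "sp1_a": "ACGTACGTAC",
    "sp1_b": "ACGTACGTAT",
    "sp2_a": "ACGAACTTAT",
}
table = distance_table(barcodes)
```

In this toy table the within-species pair differs at one site in ten, while the between-species pairs differ at two or three sites. Distance-based delimitation looks for exactly this kind of separation, although real data sets rarely show it so cleanly.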

Subject(s)

DNA Barcoding, Taxonomic , DNA Barcoding, Taxonomic/methods , Classification/methods , Phylogeny , Animals , Species Specificity

8.

Taxonomy development methods regarding patient safety in health sciences - A systematic review.

Syyrilä, Tiina; Koskiniemi, Saija; Manias, Elizabeth; Härkänen, Marja.

Int J Med Inform ; 187: 105438, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38579660

ABSTRACT

BACKGROUND: Taxonomies are needed for automated analysis of clinical data in healthcare. Few reviews of the taxonomy development methods used in health sciences are found. This systematic review aimed to describe the scope of the available taxonomies relative to patient safety, the methods used for taxonomy development, and the strengths and limitations of the methods. The purpose of this systematic review is to guide future taxonomy development projects. METHODS: The CINAHL, PubMed, Scopus, and Web of Science databases were searched for studies from January 2012 to April 25, 2023. Two authors selected the studies using inclusion and exclusion criteria and critical appraisal checklists. The data were analysed inductively, and the results were reported narratively. RESULTS: The studies (n = 13) across healthcare mainly concerned taxonomies of adverse events and medication safety, with little coverage of specialised fields and information technology. Critical appraisal indicated inadequate reporting of the taxonomy development methods used. Ten phases of taxonomy development were identified: (1) defining purpose and (2) the theory base for development, (3) relevant data sources' identification, (4) main terms' identification and definitions, (5) items' coding and pooling, (6) reliability and validity evaluation of coding and/or codes, (7) development of a hierarchical structure, (8) testing the structure, (9) piloting the taxonomy and (10) reporting application and validation of the final taxonomy. Seventeen statistical tests and seven software systems were utilised, but automated data extraction methods were used rarely. Multimethod and multi-stakeholder approaches, code and hierarchy testing, and piloting were strengths; time consumption and small samples in testing were limitations. CONCLUSION: New taxonomies are needed on diverse specialities and information technology related to patient safety.
A structured method is needed for taxonomy development, reporting and appraisal to strengthen taxonomies' quality. A new guide was proposed for taxonomy development, for which testing is required. PROSPERO registration number: CRD42023411022.

Subject(s)

Patient Safety , Humans , Classification/methods , Medical Informatics

9.

Quis custodiet ipsos custodes? A call for community participation in the governance of the SeqCode.

Sutcliffe, Iain C; Rodriguez-R, Luis M; Venter, Stephanus N; Whitman, William B.

Syst Appl Microbiol ; 47(2-3): 126498, 2024 May.

Article in English | MEDLINE | ID: mdl-38442686

ABSTRACT

Codes of nomenclature that provide well-regulated and stable frameworks for the naming of taxa are a fundamental underpinning of biological research. These Codes themselves require systems that govern their administration, interpretation and emendment. Here we review the provisions that have been made for the governance of the recently introduced Code of Nomenclature of Prokaryotes Described from Sequence Data (SeqCode), which provides a nomenclatural framework for the valid publication of names of Archaea and Bacteria using isolate genome, metagenome-assembled genome or single-amplified genome sequences as type material. The administrative structures supporting the SeqCode are designed to be open and inclusive. Direction is provided by the SeqCode Community, which we encourage those with an interest in prokaryotic systematics to join.

Subject(s)

Archaea , Bacteria , Community Participation , Terminology as Topic , Archaea/classification , Archaea/genetics , Bacteria/genetics , Bacteria/classification , Classification/methods

10.

200 years of naming dinosaurs: scientists call for overhaul of antiquated system.

Sanderson, Katharine.

Nature ; 626(8001): 936-937, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38378958

Subject(s)

Classification , Dinosaurs , Paleontology , Animals , Classification/methods , Dinosaurs/classification , History, 19th Century , History, 20th Century , History, 21st Century , Paleontology/history , Paleontology/methods , Paleontology/trends

11.

Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong.

Thompson, Ammon; Liebeskind, Benjamin J; Scully, Erik J; Landis, Michael J.

Syst Biol ; 73(1): 183-206, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38189575

ABSTRACT

Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, whose intervals we demonstrate have similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) intervals and greatly overlap with them, but have lower precision (are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training.
Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
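
As an illustration of the uncertainty-quantification step named in this abstract, the following minimal sketch shows the calibration stage of conformalized quantile regression in its generic split-conformal form. It assumes that lower and upper quantile predictions are already available for a held-out calibration set. The function names and toy numbers are hypothetical, and this is not the authors' implementation.

```python
import math


def cqr_margin(lower, upper, observed, alpha=0.1):
    """Calibration margin for conformalized quantile regression.

    lower, upper: predicted lower/upper quantiles on a calibration set.
    observed: the true values for the same calibration points.
    alpha: target miscoverage rate (0.1 aims at 90% coverage).
    """
    # Conformity score: how far each true value falls outside its predicted
    # interval (negative when it lies inside).
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, observed)
    )
    n = len(scores)
    # Finite-sample corrected rank of the (1 - alpha) empirical quantile.
    rank = math.ceil((1 - alpha) * (n + 1))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def conformalize(lo, hi, margin):
    """Widen (or shrink, if margin < 0) a predicted interval by the margin."""
    return lo - margin, hi + margin


# Toy calibration set: the model always predicts the interval [10, 20].
cal_lower = [10] * 9
cal_upper = [20] * 9
cal_truth = [15, 22, 9, 19, 25, 12, 17, 21, 10]
margin = cqr_margin(cal_lower, cal_upper, cal_truth, alpha=0.2)
interval = conformalize(10, 20, margin)
```

A positive margin widens the raw quantile interval, which is consistent with the abstract's remark that the conformalized intervals were more conservative than HPD intervals.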

Subject(s)

Deep Learning , Phylogeography , Phylogeography/methods , Likelihood Functions , Phylogeny , Classification/methods , Bayes Theorem , Viruses/genetics , Viruses/classification

12.

Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information.

Pang, Xiao-Xu; Zhang, Da-Yong.

Syst Biol ; 73(1): 207-222, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38224495

ABSTRACT

In recent years, the study of hybridization and introgression has made significant progress, with ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) emerging as a key area for research. Accurately identifying ghost introgression, however, presents a challenge. To address this issue, we focused on simple cases involving 3 species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated the performance of popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method, Bayesian Phylogenetics and Phylogeography (BPP), in detecting ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to differentiate ghost introgression from introgression between sampled non-sister species, frequently leading to incorrect identification of donor and recipient species. By contrast, the full-likelihood method BPP, which uses multilocus sequence alignments directly and hence takes into account both gene-tree topologies and branch lengths, is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.

Subject(s)

Classification , Phylogeny , Classification/methods , Genetic Introgression , Hybridization, Genetic , Phylogeography/methods , Computer Simulation

13.

The Limits of the Constant-rate Birth-Death Prior for Phylogenetic Tree Topology Inference.

Khurana, Mark P; Scheidwasser-Clow, Neil; Penn, Matthew J; Bhatt, Samir; Duchêne, David A.

Syst Biol ; 73(1): 235-246, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38153910

ABSTRACT

Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
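
The notion of phylogenetic imbalance used in this abstract can be made concrete with a small sketch. It computes the Colless index, one common imbalance measure, on a rooted binary tree written as nested tuples. The abstract does not say which topology indices the authors used, so the choice of index, the tree encoding and the function name are assumptions made for illustration only.

```python
def colless_index(tree):
    """Colless imbalance of a rooted binary tree given as nested 2-tuples.

    Leaves are any non-tuple values (e.g. taxon names). The index sums,
    over all internal nodes, the absolute difference in the number of
    tips between the two child subtrees.
    """
    def walk(node):
        # Returns (number of tips below node, imbalance accumulated below node).
        if not isinstance(node, tuple):
            return 1, 0
        left, right = node
        n_left, i_left = walk(left)
        n_right, i_right = walk(right)
        return n_left + n_right, i_left + i_right + abs(n_left - n_right)

    return walk(tree)[1]


balanced = (("A", "B"), ("C", "D"))      # fully balanced, 4 tips
caterpillar = ((("A", "B"), "C"), "D")   # maximally imbalanced, 4 tips

balanced_score = colless_index(balanced)
caterpillar_score = colless_index(caterpillar)
```

Higher values indicate more imbalanced (ladder-like) trees. Comparing trees of different sizes would normally require a normalised version of the index, which this sketch omits.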

Subject(s)

Classification , Phylogeny , Classification/methods , Models, Biological , Models, Genetic , Bayes Theorem , Birth Rate

14.

Phylogenetic Biodiversity Metrics Should Account for Both Accumulation and Attrition of Evolutionary Heritage.

Rosindell, James; Manson, Kerry; Gumbs, Rikki; Pearse, William D; Steel, Mike.

Syst Biol ; 73(1): 158-182, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38102727

ABSTRACT

Phylogenetic metrics are essential tools used in the study of ecology, evolution and conservation. Phylogenetic diversity (PD) in particular is one of the most prominent measures of biodiversity and is based on the idea that biological features accumulate along the edges of phylogenetic trees that are summed. We argue that PD and many other phylogenetic biodiversity metrics fail to capture an essential process that we term attrition. Attrition is the gradual loss of features through causes other than extinction. Here we introduce "EvoHeritage", a generalization of PD that is founded on the joint processes of accumulation and attrition of features. We argue that while PD measures evolutionary history, EvoHeritage is required to capture a more pertinent subset of evolutionary history including only components that have survived attrition. We show that EvoHeritage is not the same as PD on a tree with scaled edges; instead, accumulation and attrition interact in a more complex non-monophyletic way that cannot be captured by edge lengths alone. This leads us to speculate that the one-dimensional edge lengths of classic trees may be insufficiently flexible to capture the nuances of evolutionary processes. We derive a measure of EvoHeritage and show that it elegantly reproduces species richness and PD at opposite ends of a continuum based on the intensity of attrition. We demonstrate the utility of EvoHeritage in ecology as a predictor of community productivity compared with species richness and PD. We also show how EvoHeritage can quantify living fossils and resolve their associated controversy. We suggest how the existing calculus of PD-based metrics and other phylogenetic biodiversity metrics can and should be recast in terms of EvoHeritage accumulation and attrition.
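
The idea that PD sums the edges of a phylogenetic tree can be illustrated with a minimal sketch. It assumes a rooted tree stored as a hypothetical mapping from each node to its parent and connecting edge length, and it computes the rooted form of PD, meaning the total length of all edges on the paths from the chosen tips to the root. It does not implement EvoHeritage or attrition, and the toy tree and names are illustrative only.

```python
def phylogenetic_diversity(parent_edge, tips):
    """Rooted phylogenetic diversity of a set of tips.

    parent_edge maps each non-root node to (parent, edge_length).
    Each edge on the union of tip-to-root paths is counted exactly once.
    """
    visited = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Climb towards the root, stopping at the root itself or at a node
        # whose upward path has already been counted.
        while node in parent_edge and node not in visited:
            visited.add(node)
            parent, length = parent_edge[node]
            total += length
            node = parent
    return total


# Toy tree: root -> X (2.0) -> A (1.0), B (1.0); root -> C (3.0)
toy_tree = {
    "A": ("X", 1.0),
    "B": ("X", 1.0),
    "X": ("root", 2.0),
    "C": ("root", 3.0),
}
pd_sisters = phylogenetic_diversity(toy_tree, ["A", "B"])
pd_distant = phylogenetic_diversity(toy_tree, ["A", "C"])
```

In the toy tree the two sister tips share the edge above X, so their PD is lower than that of the more distantly related pair A and C.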

Subject(s)

Biodiversity , Phylogeny , Biological Evolution , Classification/methods , Models, Biological

15.

Data-specific substitution models improve protein-based phylogenetics.

Brazão, João M; Foster, Peter G; Cox, Cymon J.

PeerJ ; 11: e15716, 2023.

Article in English | MEDLINE | ID: mdl-37576497

ABSTRACT

Calculating amino-acid substitution models that are specific for individual protein data sets is often difficult due to the computational burden of estimating large numbers of rate parameters. In this study, we tested the computational efficiency and accuracy of five methods used to estimate substitution models, namely Codeml, FastMG, IQ-TREE, P4 (maximum likelihood), and P4 (Bayesian inference). Data-specific substitution models were estimated from simulated alignments (with different lengths) that were generated from a known simulation model and simulation tree. Each of the resulting data-specific substitution models was used to calculate the maximum likelihood score of the simulation tree and simulated data that was used to calculate the model, and compared with the maximum likelihood scores of the known simulation model and simulation tree on the same simulated data. Additionally, the commonly-used empirical models, cpREV and WAG, were assessed similarly. Data-specific models performed better than the empirical models, which under-fitted the simulated alignments, had the highest difference to the simulation model maximum-likelihood score, clustered further from the simulation model in principal component analysis ordination, and inferred less accurate trees. Data-specific models and the simulation model shared statistically indistinguishable maximum-likelihood scores, indicating that the five methods were reasonably accurate at estimating substitution models by this measure. Nevertheless, tree statistics showed differences between optimal maximum likelihood trees. Unlike other model estimating methods, trees inferred using data-specific models generated with IQ-TREE and P4 (maximum likelihood) were not significantly different from the trees derived from the simulation model in each analysis, indicating that these two methods alone were the most accurate at estimating data-specific models. 
To show the benefits of using data-specific protein models several published data sets were reanalysed using IQ-TREE-estimated models. These newly estimated models were a better fit to the data than the empirical models that were used by the original authors, often inferred longer trees, and resulted in different tree topologies in more than half of the re-analysed data sets. The results of this study show that software availability and high computation burden are not limitations to generating better-fitting data-specific amino-acid substitution models for phylogenetic analyses.

Subject(s)

Classification , Models, Genetic , Phylogeny , Proteins , Amino Acid Substitution , Bayes Theorem , Computer Simulation , Proteins/genetics , Classification/methods

16.

Tree Visualization By One Table (tvBOT): a web application for visualizing, modifying and annotating phylogenetic trees.

Xie, Jianmin; Chen, Yuerong; Cai, Guanjing; Cai, Runlin; Hu, Zhong; Wang, Hui.

Nucleic Acids Res ; 51(W1): W587-W592, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37144476

ABSTRACT

tvBOT is a user-friendly and efficient web application for visualizing, modifying, and annotating phylogenetic trees. It is highly efficient in data preparation without requiring redundant style and syntax data. Tree annotations are powered by a data-driven engine that only requires practical data organized in uniform formats and saved as one table file. A layer manager is developed to manage annotation dataset layers, allowing the addition of a specific layer by selecting the columns of a corresponding annotation data file. Furthermore, tvBOT renders style adjustments in real-time and diversified ways. All style adjustments can be made on a highly interactive user interface and are available for mobile devices. The display engine allows the changes to be updated and rendered in real-time. In addition, tvBOT supports the combination display of 26 annotation dataset types to achieve multiple formats for tree annotations with reusable phylogenetic data. Besides several publication-ready graphics formats, JSON format can be exported to save the final drawing state and all related data, which can be shared with other users, uploaded to restore the final drawing state for re-editing or used as a style template for quickly retouching a new tree file. tvBOT is freely available at: https://www.chiplot.online/tvbot.html.

Subject(s)

Classification , Data Visualization , Phylogeny , Computer Graphics , Internet , Software , User-Computer Interface , Classification/methods

17.

Rethink changing species names that honour real people.

Garbino, Guilherme S T.

Nature ; 616(7957): 433, 2023 04.

Article in English | MEDLINE | ID: mdl-37072514

Subject(s)

Classification , Names , Species Specificity , Classification/methods

18.

Using pose estimation to identify regions and points on natural history specimens.

He, Yichen; Cooney, Christopher R; Maddock, Steve; Thomas, Gavin H.

PLoS Comput Biol ; 19(2): e1010933, 2023 02.

Article in English | MEDLINE | ID: mdl-36812227

ABSTRACT

A key challenge in mobilising growing numbers of digitised biological specimens for scientific research is finding high-throughput methods to extract phenotypic measurements on these datasets. In this paper, we test a pose estimation approach based on Deep Learning capable of accurately placing point labels to identify key locations on specimen images. We then apply the approach to two distinct challenges that each requires identification of key features in a 2D image: (i) identifying body region-specific plumage colouration on avian specimens and (ii) measuring morphometric shape variation in Littorina snail shells. For the avian dataset, 95% of images are correctly labelled and colour measurements derived from these predicted points are highly correlated with human-based measurements. For the Littorina dataset, more than 95% of landmarks were accurately placed relative to expert-labelled landmarks and predicted landmarks reliably captured shape variation between two distinct shell ecotypes ('crab' vs 'wave'). Overall, our study shows that pose estimation based on Deep Learning can generate high-quality and high-throughput point-based measurements for digitised image-based biodiversity datasets and could mark a step change in the mobilisation of such data. We also provide general guidelines for using pose estimation methods on large-scale biological datasets.

Subject(s)

Birds , Classification , Snails , Animals , Birds/anatomy & histology , Snails/anatomy & histology , Classification/methods

19.

DNA barcodes for the pipefish genus Corythoichthys (Actinopterygii: Syngnathiformes) from the Indian Ocean provide insights into cryptic diversity.

Shalu, Kannan; Thomas, Liju; Ramvilas, Ghosh; Shabeena, Kadapurathillam S; Philip, Siby; Sureshkumar, Sivanpillai; Raghavan, Rajeev; Ranjeet, Kutty.

J Fish Biol ; 102(3): 680-688, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36602224

ABSTRACT

The syngnathiform genus Corythoichthys comprises a group of taxonomically complex, tail-brooding (Syngnathinae) pipefishes widely distributed in the Indo-Pacific region. Due to the presence of overlapping interspecific morphological characters, reliable taxonomic information on Corythoichthys is still lacking. Using 52 CO1 sequences, including seven newly generated, a phylogenetic analysis was carried out to understand the genetic diversity, distribution and 'species groups' within the genus Corythoichthys. Species delimitation using Automatic Barcode Gap Discovery (ABGD) analysis confirmed the presence of 13 species which include 'species-complexes' previously considered as a single taxon. Our results revealed the presence of three species groups, 'C. amplexus', 'C. conspicillatus' and 'C. haematopterus' and four unidentified/undescribed species in the wider Indo-Pacific realm. Interestingly, 60 sequences and a mitogenome identified as Corythoichthys in GenBank are misidentified at the genus level. Based on our findings, we suggest that the taxonomy and systematics of Corythoichthys need to be re-examined and validated using integrative methods, and care should be taken while selecting specimens for genetic studies.

Subject(s)

Biodiversity , Classification , DNA Barcoding, Taxonomic , Smegmamorpha , Animals , Indian Ocean , Phylogeny , Smegmamorpha/classification , Smegmamorpha/genetics , Species Specificity , Classification/methods

20.

A refreshed approach to homology-Prioritizing epistemology over metaphysics.

Minelli, Alessandro.

J Morphol ; 284(1): e21533, 2023 01.

Article in English | MEDLINE | ID: mdl-36342140

ABSTRACT

Unease with the inclusion of "sameness" in Owen's definition of homology characterizes a substantial part of the literature on this subject, where this term has acquired an increasingly strict metaphysical flavor. Taking for granted the existence of body features that are "the same," their existence has been explained by appealing to universal laws of form, as the product of common ancestry, or in terms of proximal causes responsible for the emergence of conserved developmental modules. However, a fundamentally different approach is possible, if we shift attention from metaphysics to epistemology. We may reword Owen's statement as follows: organs of different animals, in so far as they can be described as the same despite any difference in form and function, are called homologues. The proposed framework provides an umbrella for both the traditional, all-or-nothing concept of homology, and the less fashionable alternatives of factorial or partial homology, as well as for an extension of homology from form to function. No less attractive is the prospect of also handling ghost homologues, the body parts or organs of which there is no objective evidence in a given clade, but which can nevertheless be represented in a description that encapsulates some of the traits observable in their extant homologue in the sister clade. Stripped of its different and constraining metaphysical explanations, homology survives as an anchor concept to which different nomadic disciplines and research agendas can be associated.

Subject(s)

Classification , Metaphysics , Phylogeny , Animals , Knowledge , Phenotype , Classification/methods

