Results 1 - 3 of 3
1.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38441258

ABSTRACT

MOTIVATION: Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices.

RESULTS: We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships.

AVAILABILITY AND IMPLEMENTATION: Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.
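The difference between full and partial rejection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). The toy hierarchy, the cell type names, the function names, and the choice to score an internal node by summing the probabilities of the leaves beneath it are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); labels are illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "T cell": "root",
    "B cell": "root",
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
}


def children(node: str) -> List[str]:
    """Return the direct children of a node."""
    return [child for child, parent in PARENT.items() if parent == node]


def leaves_under(node: str) -> List[str]:
    """Return all leaf labels in the subtree rooted at node."""
    kids = children(node)
    if not kids:
        return [node]
    found: List[str] = []
    for kid in kids:
        found.extend(leaves_under(kid))
    return found


def node_probability(node: str, leaf_probs: Dict[str, float]) -> float:
    """Assumed scoring rule: a node's probability is the sum over its leaves."""
    return sum(leaf_probs.get(leaf, 0.0) for leaf in leaves_under(node))


def full_rejection(leaf_probs: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable leaf, or None when its score is below threshold."""
    best = max(leaf_probs, key=leaf_probs.get)
    return best if leaf_probs[best] >= threshold else None


def partial_rejection(leaf_probs: Dict[str, float], threshold: float) -> str:
    """Descend from the root and stop at the last node that is still confident."""
    node = "root"
    while True:
        kids = children(node)
        if not kids:
            return node  # reached a leaf: full-resolution label
        best = max(kids, key=lambda k: node_probability(k, leaf_probs))
        if node_probability(best, leaf_probs) < threshold:
            return node  # stop here: an internal node (or the root) is returned
        node = best


if __name__ == "__main__":
    probs = {"CD4 T cell": 0.45, "CD8 T cell": 0.40, "B cell": 0.15}
    print(full_rejection(probs, threshold=0.7))
    print(partial_rejection(probs, threshold=0.7))
```

With the toy probabilities above and a threshold of 0.7, full rejection assigns no label, whereas partial rejection returns the internal node "T cell", which keeps part of the label information. Returning "root" from the partial variant is equivalent to rejecting the cell entirely.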


Subject(s)
Gene Expression Profiling , Transcriptome , Reproducibility of Results , Uncertainty , Machine Learning , Single-Cell Analysis , RNA Sequence Analysis
2.
Nat Plants ; 10(3): 390-401, 2024 03.
Article in English | MEDLINE | ID: mdl-38467801

ABSTRACT

Scientific testing including stable isotope ratio analysis (SIRA) and trace element analysis (TEA) is critical for establishing plant origin, tackling deforestation and enforcing economic sanctions. Yet methods combining SIRA and TEA into robust models for origin verification and determination are lacking. Here we report (1) a large Eastern European timber reference database (Betula, Fagus, Pinus, Quercus) tailored to sanctioned products following the Ukraine invasion; (2) a statistical test to verify samples against a claimed origin; and (3) a probabilistic model of SIRA, TEA and genus distribution data, using Gaussian processes, to determine timber harvest location. Our verification method rejects 40-60% of simulated false claims, depending on the spatial scale of the claim, and maintains a low probability of rejecting correct origin claims. Our determination method predicts harvest location within 180 to 230 km of true location. Our results showcase the power of combining data types with probabilistic modelling to identify and scrutinize timber harvest location claims.
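The general idea of testing a sample against a claimed origin can be sketched as follows. This is not the test reported in the paper, whose details are not given in the abstract. The sketch assumes exactly two independent, normally distributed features (for example, one isotope ratio and one trace element concentration) whose mean and standard deviation at the claimed origin are known. Under that assumption the sum of squared z-scores follows a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(-x/2). All function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def origin_claim_p_value(
    sample: Sequence[float],
    ref_mean: Sequence[float],
    ref_sd: Sequence[float],
) -> float:
    """P-value for the hypothesis that the sample comes from the claimed origin.

    Assumes two independent Gaussian features, so the test statistic is
    chi-square distributed with 2 degrees of freedom.
    """
    if not (len(sample) == len(ref_mean) == len(ref_sd) == 2):
        raise ValueError("this sketch handles exactly two features")
    stat = sum(((s - m) / sd) ** 2 for s, m, sd in zip(sample, ref_mean, ref_sd))
    # Upper tail of a chi-square distribution with 2 degrees of freedom.
    return math.exp(-stat / 2.0)


def reject_claim(
    sample: Sequence[float],
    ref_mean: Sequence[float],
    ref_sd: Sequence[float],
    alpha: float = 0.05,
) -> bool:
    """Reject the origin claim when the p-value falls below alpha."""
    return origin_claim_p_value(sample, ref_mean, ref_sd) < alpha


if __name__ == "__main__":
    # Hypothetical reference values for a claimed harvest location.
    mean, sd = [10.0, 2.0], [1.0, 0.5]
    print(reject_claim([10.2, 2.1], mean, sd))  # sample close to the reference
    print(reject_claim([13.0, 3.5], mean, sd))  # sample three SDs away on both axes
```

Choosing a small `alpha` keeps the probability of rejecting a correct claim low, at the cost of letting more false claims through, which mirrors the trade-off the abstract describes.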


Subject(s)
Fagus , Pinus , Ukraine , Betula , Plant Genes
3.
Comput Struct Biotechnol J ; 19: 6157-6168, 2021.
Article in English | MEDLINE | ID: mdl-34938408

ABSTRACT

Today, machine learning methods are commonly deployed for bacterial species identification using MALDI-TOF mass spectrometry data. However, most of the studies reported in the literature consider only very traditional machine learning methods on small datasets that contain a limited number of species. In this paper we present benchmarking results on an unprecedented scale for a wide range of machine learning methods, using datasets that contain almost 100,000 spectra and more than 1000 different species. The size and the diversity of the data allow us to compare three important identification scenarios that are often not distinguished in the literature, i.e., identification for novel biological replicates, novel strains and novel species that are not present in the training data. The results demonstrate that acceptable identification rates are obtained in all three scenarios, but the numbers are typically lower than those reported in studies with a more limited analysis. Using hierarchical classification methods, we also demonstrate that taxonomic information is in general not well preserved in MALDI-TOF mass spectrometry data. For the novel species scenario, we apply for the first time neural networks with Monte Carlo dropout, which have been shown to be successful in other domains, such as computer vision, for the detection of novel species.
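Monte Carlo dropout, mentioned at the end of this abstract, keeps dropout active at prediction time, averages the class probabilities over many stochastic forward passes, and uses the spread of that average as an uncertainty signal. The following is a minimal sketch of that idea using only the standard library. The single linear layer, the hand-picked weights, the use of predictive entropy as the novelty score, and all function names are assumptions for illustration and do not reflect the networks or scoring used in the paper.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    drop_rate: float,
    rng: random.Random,
) -> List[float]:
    """One forward pass of a linear layer with inverted dropout on the inputs."""
    keep = 1.0 - drop_rate
    masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, masked)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    drop_rate: float = 0.5,
    n_passes: int = 100,
    seed: int = 0,
) -> List[float]:
    """Average class probabilities over n_passes stochastic forward passes."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(n_passes):
        probs = stochastic_forward(x, weights, drop_rate, rng)
        for k, p in enumerate(probs):
            mean[k] += p / n_passes
    return mean


def predictive_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of the averaged predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_novel(probs: Sequence[float], entropy_threshold: float) -> bool:
    """Flag a sample as a possible novel class when entropy exceeds the threshold."""
    return predictive_entropy(probs) > entropy_threshold


if __name__ == "__main__":
    # Hypothetical weights: one row per known species, one column per feature.
    weights = [[4.0, 0.0], [0.0, 4.0]]
    mean_probs = mc_dropout_predict([1.0, 1.0], weights, drop_rate=0.5)
    print(mean_probs, predictive_entropy(mean_probs))
```

The entropy threshold is a tuning parameter. In practice it would be chosen on held-out data that contains both known and deliberately withheld species.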
