Results 1 - 19 of 19
1.
Sensors (Basel) ; 23(9)2023 May 05.
Article in English | MEDLINE | ID: mdl-37177695

ABSTRACT

Monitoring the shoreline over time is essential to quickly identify and mitigate environmental issues such as coastal erosion. Monitoring using satellite images has two great advantages, namely global coverage and frequent measurement updates, but adequate methods are needed to extract shoreline information from such images. Valuable unsupervised methods exist for this purpose, but more recent research has concentrated on deep learning because of its greater potential in terms of generality, flexibility, and measurement accuracy; this potential, in turn, derives from the information contained in large datasets of labeled samples. The first problem to solve, therefore, lies in obtaining large datasets suitable for this specific measurement problem, which is a difficult task, typically requiring human analysis of a large number of images. In this article, we propose a technique to automatically create a dataset of labeled satellite images suitable for training machine learning models for shoreline detection. The method is based on the integration of satellite photos with certified, publicly accessible shoreline data. It involves several automatic processing steps aimed at building the best possible dataset, with images including both sea and land regions, and correct labeling even in the presence of complicated water edges (which can be open or closed curves). The use of independently certified measurements for labeling the satellite images avoids the great effort required to annotate them manually by visual inspection, as is done in other works in the literature; this is especially true when convoluted shorelines are considered. In addition, possible errors due to the subjective interpretation of satellite images are eliminated. The method is developed and used specifically to build a new dataset of Sentinel-2 images, denoted SNOWED, but it is applicable to different satellite images with trivial modifications. The accuracy of the labels in SNOWED is determined directly by the uncertainty of the shoreline data used, which leads to sub-pixel errors in most cases. Furthermore, the quality of the SNOWED dataset is assessed through visual comparison of a random sample of images and their corresponding labels, and its functionality is shown by training a neural model for sea-land segmentation.
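To make the core labeling idea concrete, here is a minimal sketch, independent of the authors' pipeline: given a certified coastline polygon already reprojected into the pixel coordinates of one tile, every pixel can be labeled sea or land with a point-in-polygon test. The polygon vertices and tile size below are made-up placeholders.

```python
import numpy as np
from matplotlib.path import Path

# Hypothetical certified coastline for one tile, as a closed polygon in pixel
# coordinates (in practice, official shoreline data reprojected onto the tile grid).
land_polygon = Path([(0, 0), (255, 40), (255, 255), (0, 200)])

H = W = 256  # tile size (placeholder)
ys, xs = np.mgrid[0:H, 0:W]
pixels = np.column_stack([xs.ravel(), ys.ravel()])

# 1 = land, 0 = sea; this binary mask is the segmentation label paired
# with the satellite image in the training dataset.
label = land_polygon.contains_points(pixels).reshape(H, W).astype(np.uint8)
```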

2.
Entropy (Basel) ; 25(5)2023 May 13.
Article in English | MEDLINE | ID: mdl-37238549

ABSTRACT

Affective understanding of language is an important research focus in artificial intelligence. Large-scale annotated datasets of Chinese textual affective structure (CTAS) are the foundation for subsequent higher-level analysis of documents. However, very few datasets for CTAS have been published. This paper introduces a new benchmark dataset for the CTAS task to promote development in this research direction. Specifically, our benchmark is a CTAS dataset with the following advantages: (a) it is based on Weibo, the most popular Chinese social media platform used by the public to express opinions; (b) it includes the most comprehensive affective structure labels available at present; and (c) we propose a maximum entropy Markov model that incorporates neural network features and experimentally demonstrate that it outperforms the two baseline models.
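As a sketch of the decoding side of such a model (not the authors' implementation): a maximum entropy Markov model scores each label conditioned on the previous label and local features, and the best label sequence is recovered with Viterbi decoding over the resulting log-probability tables. The START-state convention and toy sizes below are assumptions.

```python
import numpy as np

def viterbi(log_probs):
    """log_probs[t] is a (K, K) table: log P(cur | prev, features at t);
    at t = 0, row 0 is treated as a START state (an assumption of this sketch)."""
    best = log_probs[0][0]          # scores for each label at position 0
    back = []
    for table in log_probs[1:]:
        scores = best[:, None] + table        # (prev, cur)
        back.append(scores.argmax(axis=0))    # best previous label per current label
        best = scores.max(axis=0)
    path = [int(best.argmax())]               # best final label, then backtrack
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Toy example: 3 positions, 2 affective labels.
rng = np.random.default_rng(0)
tables = [np.log(rng.dirichlet(np.ones(2), size=2)) for _ in range(3)]
print(viterbi(tables))
```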

3.
ISPRS J Photogramm Remote Sens ; 178: 68-80, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34433999

ABSTRACT

As remote sensing (RS) data obtained from different sensors become widely and openly available, multimodal data processing and analysis techniques have been garnering increasing interest in the RS and geoscience community. However, due to the gap between modalities in terms of imaging sensors, resolutions, and contents, embedding their complementary information into a consistent, compact, accurate, and discriminative representation remains, to a great extent, challenging. To this end, we propose a shared and specific feature learning (S2FL) model. S2FL is capable of decomposing multimodal RS data into modality-shared and modality-specific components, enabling more effective blending of information from multiple modalities, particularly for heterogeneous data sources. Moreover, to better assess multimodal baselines and the newly proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 (hyperspectral and multispectral data), Berlin (hyperspectral and synthetic aperture radar (SAR) data), and Augsburg (hyperspectral, SAR, and digital surface model (DSM) data), are released and used for land cover classification. Extensive experiments conducted on the three datasets demonstrate the superiority and advancement of our S2FL model in the task of land cover classification in comparison with previously proposed state-of-the-art baselines. Furthermore, the baseline codes and datasets used in this paper will be made freely available at https://github.com/danfenghong/ISPRS_S2FL.
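The shared/specific decomposition idea can be sketched in generic PyTorch (this is not the released S2FL code, and all dimensions are placeholders): one shared projection per modality aligned by a penalty term, plus one specific projection per modality, fused for classification.

```python
import torch
import torch.nn as nn

class SharedSpecific(nn.Module):
    """Toy shared/specific decomposition for two modalities (e.g., HSI + SAR)."""
    def __init__(self, d1, d2, d_latent, n_classes):
        super().__init__()
        self.shared1 = nn.Linear(d1, d_latent)   # modality-shared projections
        self.shared2 = nn.Linear(d2, d_latent)
        self.spec1 = nn.Linear(d1, d_latent)     # modality-specific projections
        self.spec2 = nn.Linear(d2, d_latent)
        self.head = nn.Linear(3 * d_latent, n_classes)

    def forward(self, x1, x2):
        s1, s2 = self.shared1(x1), self.shared2(x2)
        # Alignment penalty pulls the shared components of the two modalities together.
        align = ((s1 - s2) ** 2).mean()
        fused = torch.cat([(s1 + s2) / 2, self.spec1(x1), self.spec2(x2)], dim=-1)
        return self.head(fused), align

model = SharedSpecific(d1=144, d2=4, d_latent=32, n_classes=7)
logits, align = model(torch.randn(8, 144), torch.randn(8, 4))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 7, (8,))) + 0.1 * align
```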

4.
BMC Bioinformatics ; 20(1): 485, 2019 Sep 23.
Article in English | MEDLINE | ID: mdl-31547800

ABSTRACT

BACKGROUND: A massive amount of proteomic data is generated on a daily basis; nonetheless, annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical multi-label classification (HMC) methods to predict annotations using the Functional Catalogue (FunCat) or Gene Ontology (GO) label hierarchies. Most of these studies employed benchmark datasets created more than a decade ago, and thus trained their models on outdated information. In this work, we provide an updated version of these datasets. By querying recent versions of FunCat and GO yeast annotations, we provide 24 new datasets in total. We compare four HMC methods, providing baseline results for the new datasets. Furthermore, we also evaluate whether the predictive models are able to discover new or wrong annotations by training them on the old data and evaluating their results against the most recent information. RESULTS: The results demonstrated that the method based on predictive clustering trees, Clus-Ensemble, proposed in 2008, achieved superior results compared with more recent methods on the standard evaluation task. For the discovery of new knowledge, Clus-Ensemble performed better when discovering new annotations in the FunCat taxonomy, whereas hierarchical multi-label classification with genetic algorithm (HMC-GA), a method based on genetic algorithms, was overall superior when detecting annotations that had been removed. In the GO datasets, Clus-Ensemble once again had the upper hand when discovering new annotations, while HMC-GA performed better at detecting removed annotations; in this evaluation, however, there were fewer significant differences among the methods. CONCLUSIONS: The experiments showed that protein function prediction is a very challenging task that should be further investigated. We believe that the baseline results associated with the updated datasets provided in this work should be considered guidelines for future studies; nonetheless, the old versions of the datasets should not be disregarded, since other tasks in machine learning could benefit from them.
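For context, a standard step in hierarchical multi-label evaluation with FunCat or GO is closing each predicted label set under the ancestor relation before scoring, since annotating a term implies all of its ancestors. A minimal sketch with a made-up toy hierarchy (real hierarchies are loaded from the released ontology files):

```python
# Parent map for a toy hierarchy (None marks the root); terms are placeholders.
parents = {"root": None, "metab": "root", "glyco": "metab", "transport": "root"}

def propagate(terms):
    """Close a set of predicted terms under the ancestor relation."""
    closed = set()
    for t in terms:
        while t is not None:
            closed.add(t)
            t = parents[t]
    return closed

pred = propagate({"glyco"})
true = propagate({"glyco", "transport"})
precision = len(pred & true) / len(pred)   # hierarchical precision
recall = len(pred & true) / len(true)      # hierarchical recall
print(f"hP={precision:.2f} hR={recall:.2f}")
```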


Subjects
Machine Learning; Molecular Sequence Annotation/methods; Proteomics/methods; Cluster Analysis; Eukaryota/metabolism; Gene Ontology; Humans
5.
BMC Bioinformatics ; 19(1): 461, 2018 Nov 29.
Article in English | MEDLINE | ID: mdl-30497376

ABSTRACT

BACKGROUND: Benchmark datasets are essential for both method development and performance assessment. These datasets must meet numerous requirements, representativeness being one. In the case of variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects. RESULTS: We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how representative the proteins in the benchmark datasets were of the entire human protein universe. We investigated the distributions of variants across chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications, and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in the VariBench or VariSNP databases. We also tested whether the pathogenic variant datasets contained neutral variants, defined as those with a high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets. CONCLUSIONS: None of the datasets was found to be well representative, although many of the tested datasets had quite good coverage of the different protein characteristics. Dataset size correlates with representativeness but only weakly with the performance of methods trained on them. The results imply that dataset representativeness is an important factor that should be taken into account in predictor development and testing.
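One statistical check of this kind, comparing a benchmark's variant distribution against a reference distribution, can be sketched as a chi-square goodness-of-fit test. The counts below are fabricated for illustration; the paper's actual tests and data differ.

```python
import numpy as np
from scipy.stats import chisquare

# Variant counts per chromosome in a hypothetical benchmark dataset
# versus the proteome-wide expectation (toy numbers for 4 chromosomes).
observed = np.array([120, 80, 40, 10])
expected_prop = np.array([0.35, 0.30, 0.20, 0.15])

# chisquare requires expected counts on the same total as the observed ones.
expected = expected_prop * observed.sum()
stat, p = chisquare(observed, f_exp=expected)
print(f"chi2={stat:.1f}, p={p:.3g}")  # small p => distribution not representative
```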


Subjects
Benchmarking; Databases as Topic; Chromosomes/genetics; Databases, Protein; Gene Frequency; Gene Ontology; Genetic Variation; Humans; Molecular Sequence Annotation; Protein Domains; Proteins/chemistry
6.
BMC Bioinformatics ; 19(1): 334, 2018 Sep 21.
Article in English | MEDLINE | ID: mdl-30241466

ABSTRACT

BACKGROUND: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Moreover, most of the previous methods incorporated only a single input feature type, which limits their applicability across the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers. RESULTS: In ECPred, each EC number constitutes an individual class and therefore has an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total, including 6 main classes, 55 subclasses, 163 sub-subclasses, and 634 substrate classes. The proposed method is tested and compared with state-of-the-art enzyme function prediction tools using independent temporal hold-out and no-Pfam datasets constructed during this study. CONCLUSIONS: ECPred is presented both as a stand-alone and as a web-based tool providing probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. The datasets of this study will also be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at https://github.com/cansyl/ECPred. The ECPred web server can be accessed at http://cansyl.metu.edu.tr/ECPred.html.
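The top-down hierarchical strategy (one classifier per EC number, expanding only below confident nodes) can be sketched as follows. The tree, scores, and threshold are placeholder stubs, not ECPred's trained ensembles.

```python
# Toy EC tree: each node has a binary scorer returning P(sequence belongs here).
# In ECPred every EC number has its own trained ensemble; these lambdas are stubs.
tree = {
    "1": (lambda seq: 0.9, ["1.1", "1.2"]),
    "1.1": (lambda seq: 0.8, []),
    "1.2": (lambda seq: 0.2, []),
    "2": (lambda seq: 0.1, ["2.1"]),
    "2.1": (lambda seq: 0.7, []),
}

def predict(seq, roots=("1", "2"), threshold=0.5):
    """Descend the EC hierarchy, expanding only nodes that pass the threshold."""
    hits, frontier = [], list(roots)
    while frontier:
        ec = frontier.pop()
        score, children = tree[ec]
        if score(seq) >= threshold:
            hits.append(ec)
            frontier.extend(children)   # only explore below confident nodes
    return hits

print(predict("MKV..."))  # ['1', '1.1'] for this toy tree
```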


Subjects
Computational Biology/methods; Enzymes/classification; Enzymes/metabolism; Sequence Analysis, Protein/methods; Software; Terminology as Topic; Algorithms; Humans
7.
Int J Neural Syst ; 34(3): 2450009, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38318751

ABSTRACT

Large-scale benchmark datasets are crucial to advancing research within the computer science communities. They enable the development of more sophisticated AI models and serve as "golden" benchmarks for evaluating their performance. Thus, ensuring the quality of these datasets is of utmost importance for academic research and the progress of AI systems. For the emerging vision-language tasks, several datasets have been created and are frequently used, such as Flickr30k, COCO, and NoCaps, which typically contain a large number of images paired with ground-truth textual descriptions. In this paper, an automatic method is proposed to assess the quality of large-scale benchmark datasets designed for vision-language tasks. In particular, a new cross-modal matching model is developed that can automatically score the textual descriptions of visual images. This model is then employed to evaluate the quality of vision-language datasets by automatically assigning a score to each "ground-truth" description of every image. With good agreement between manual and automated scoring results on the datasets, our findings reveal significant disparities in the quality of the ground-truth descriptions included in the benchmark datasets. Even more surprisingly, a small portion of the descriptions are evidently unsuitable as reliable ground-truth references. These discoveries emphasize the need for careful use of these publicly accessible benchmark databases.
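In spirit, such scoring resembles embedding both modalities and rating each caption by its similarity to its image. A generic sketch follows, with random stand-in encoders; the paper's actual matching model is not described in this abstract, so everything here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(img):      # stand-in for a trained image encoder
    return rng.standard_normal(64)

def encode_text(caption):   # stand-in for a trained text encoder
    return rng.standard_normal(64)

def quality_score(img, caption):
    """Cosine similarity between modality embeddings as a caption-quality proxy."""
    v, w = encode_image(img), encode_text(caption)
    return float(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))

# Captions whose score falls far below the dataset average would be flagged
# as unreliable ground truth.
print(quality_score("image.png", "a dog on a beach"))
```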


Subjects
Benchmarking; Databases, Factual
8.
Genome Biol ; 24(1): 202, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37674236

ABSTRACT

BACKGROUND: Quantitative proteomics is an indispensable tool in life science research. However, there is a lack of reference materials for evaluating the reproducibility of label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) measurements among different instruments and laboratories. RESULTS: Here, we develop the Quartet standard as a proteome reference material with built-in truths and distribute aliquots of the same material to 15 laboratories running nine conventional LC-MS/MS platforms across six cities in China. Relative abundances of over 12,000 proteins from 816 mass spectrometry files are obtained and compared for reproducibility among the instruments and laboratories, ultimately yielding proteomics benchmark datasets. The proteome spans a wide dynamic range of about seven orders of magnitude, and the injection order has marked effects on quantitative, rather than qualitative, characteristics. CONCLUSION: Overall, the Quartet offers valuable standard materials and data resources for improving the quality control of proteomic analyses as well as the reproducibility and reliability of research findings.
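A typical reproducibility summary for a study of this kind is the per-protein coefficient of variation (CV) across laboratories. A minimal pandas sketch on fabricated abundances (the real study compares 816 files across 15 labs):

```python
import pandas as pd

# Rows: proteins; columns: the same Quartet aliquot measured in three labs
# (fabricated relative abundances for illustration).
df = pd.DataFrame(
    {"lab_A": [1.00, 0.52, 3.10],
     "lab_B": [1.08, 0.47, 2.90],
     "lab_C": [0.95, 0.55, 3.30]},
    index=["P1", "P2", "P3"],
)

cv = df.std(axis=1) / df.mean(axis=1) * 100   # percent CV per protein
print(cv.round(1))  # lower CV => better inter-lab reproducibility
```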


Subjects
Proteomics; Tandem Mass Spectrometry; Chromatography, Liquid; Reproducibility of Results; Proteome
9.
Genome Biol ; 24(1): 221, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37798733

ABSTRACT

Genomic benchmark datasets are essential to driving the fields of genomics and bioinformatics. They provide a snapshot of the performance of sequencing technologies and analytical methods and highlight future challenges. However, they depend on the sequencing technology, the reference genome, and the available benchmarking methods. Creating a genomic benchmark dataset is therefore laborious and highly challenging, often involving multiple sequencing technologies, different variant-calling tools, and extensive manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.


Subjects
Benchmarking; Genomics; Genomics/methods; Computational Biology/methods; Genome; High-Throughput Nucleotide Sequencing/methods
10.
Nutrients ; 15(12)2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37375655

ABSTRACT

Food classification serves as the basic step of image-based dietary assessment, predicting the types of foods in each input image. However, foods in real-world scenarios are typically long-tail distributed: a small number of food types are consumed far more frequently than others, which causes a severe class-imbalance issue and hinders overall performance. In addition, none of the existing long-tailed classification methods focus on food data, which can be more challenging due to the inter-class similarity and intra-class diversity among food images. In this work, two new benchmark datasets for long-tailed food classification are introduced, Food101-LT and VFN-LT, where the number of samples in VFN-LT exhibits a real-world long-tailed food distribution. A novel two-phase framework is then proposed to address the class imbalance by (1) undersampling the head classes to remove redundant samples while maintaining the learned information through knowledge distillation, and (2) oversampling the tail classes via visually aware data augmentation. By comparing our method with existing state-of-the-art long-tailed classification methods, we show the effectiveness of the proposed framework, which obtains the best performance on both the Food101-LT and VFN-LT datasets. The results demonstrate the potential of applying the proposed method to related real-life applications.
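Stripped of the knowledge-distillation and augmentation components, the two-phase rebalancing idea reduces to resampling each class toward a common target size. A toy sketch (the class names, counts, and plain duplication are illustrative simplifications of the paper's framework):

```python
import random

random.seed(0)

def rebalance(samples_by_class, target):
    """Undersample classes above `target`; oversample (with replacement) those below.
    The real framework pairs these steps with knowledge distillation and
    visually aware augmentation rather than plain duplication."""
    balanced = {}
    for label, samples in samples_by_class.items():
        if len(samples) > target:                    # head class
            balanced[label] = random.sample(samples, target)
        else:                                        # tail class
            balanced[label] = samples + random.choices(samples, k=target - len(samples))
    return balanced

data = {"rice": list(range(900)), "soup": list(range(40))}
print({k: len(v) for k, v in rebalance(data, 200).items()})  # {'rice': 200, 'soup': 200}
```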


Subjects
Food; Food/classification
11.
Multimed Tools Appl ; 81(24): 35001-35026, 2022.
Article in English | MEDLINE | ID: mdl-33584121

ABSTRACT

Image segmentation is an essential phase of computer vision in which useful information is extracted from an image, ranging from finding objects while moving across a room to detecting abnormalities in a medical image. As image pixels are generally unlabelled, the most commonly used approach is clustering. This paper reviews existing clustering-based image segmentation methods. Two main families of clustering methods are surveyed, namely hierarchical and partitional clustering. As partitional clustering is computationally more efficient, the methods belonging to this class are studied in greater depth. The literature divides partitional clustering methods into three categories: K-means-based methods, histogram-based methods, and meta-heuristic-based methods. A survey of the performance parameters used for quantitative evaluation of segmentation results is also included, and the publicly available benchmark datasets for image segmentation are briefly described.
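As an example of the partitional family the survey focuses on, K-means segmentation of an image by pixel color takes only a few lines (a sketch on a random placeholder image; scikit-learn shown, but any K-means implementation works):

```python
import numpy as np
from sklearn.cluster import KMeans

img = np.random.rand(64, 64, 3)            # placeholder RGB image in [0, 1]
pixels = img.reshape(-1, 3)                # one sample per pixel, RGB features

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
segments = kmeans.labels_.reshape(64, 64)  # each pixel mapped to a cluster id
print(np.bincount(segments.ravel()))       # pixels per segment
```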

12.
Front Artif Intell ; 5: 991242, 2022.
Article in English | MEDLINE | ID: mdl-36213165

ABSTRACT

Even in highly developed countries, as much as 15-30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task that aims to make text understandable to everyone by replacing complex vocabulary and expressions with simpler ones while preserving the original meaning. It has attracted considerable attention in the last 20 years, and fully automatic lexical simplification systems have been proposed for various languages. The main obstacle to progress in the field is the absence of high-quality datasets for building and evaluating lexical simplification systems. In this study, we present a new benchmark dataset for lexical simplification in English, Spanish, and (Brazilian) Portuguese, and provide details about the data selection and annotation procedures to enable the compilation of comparable datasets in other languages and domains. As the first multilingual lexical simplification dataset in which instances in all three languages were selected and annotated using comparable procedures, this is the first dataset to offer a direct comparison of lexical simplification systems across three languages. To showcase the usability of the dataset, we adapt two state-of-the-art lexical simplification systems with differing architectures (neural vs. non-neural) to all three languages and evaluate their performance on our new dataset. For a fairer comparison, we use several evaluation measures that capture varied aspects of the systems' efficacy and discuss their strengths and weaknesses. We find that the state-of-the-art neural lexical simplification system outperforms the state-of-the-art non-neural system in all three languages according to all evaluation measures. More importantly, we find that the state-of-the-art neural systems perform significantly better for English than for Spanish and Portuguese, raising the question of whether such an architecture can be used for successful lexical simplification in other languages, especially low-resourced ones.
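A classic non-neural baseline that such benchmarks are used to evaluate is frequency-based substitute ranking: candidate replacements for a complex word are ordered by corpus frequency, on the assumption that frequent words are simpler. A self-contained sketch with a toy frequency table (real systems use large frequency lists or language-model scores):

```python
# Toy corpus frequencies (fabricated counts for illustration).
freq = {"help": 9_000_000, "aid": 1_200_000, "assistance": 800_000, "succour": 9_000}

def rank_substitutes(candidates):
    """Order candidate substitutes from simplest (most frequent) to hardest."""
    return sorted(candidates, key=lambda w: freq.get(w, 0), reverse=True)

# For the complex word "succour", a frequency baseline proposes:
print(rank_substitutes(["aid", "assistance", "help"]))  # ['help', 'aid', 'assistance']
```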

13.
Comput Struct Biotechnol J ; 19: 4825-4839, 2021.
Article in English | MEDLINE | ID: mdl-34522290

ABSTRACT

Prediction of protein localization plays an important role in understanding protein function and mechanisms. In this paper, we propose a general deep learning-based localization prediction framework, MULocDeep, which can predict multiple localizations of a protein at both the subcellular and suborganellar levels. We collected a dataset with 44 suborganellar localization annotations in 10 major subcellular compartments, the most comprehensive suborganelle localization dataset to date. We also experimentally generated an independent dataset of mitochondrial proteins in Arabidopsis thaliana cell cultures, Solanum tuberosum tubers, and Vicia faba roots and made this dataset publicly available. Evaluations using the above datasets show that, overall, MULocDeep outperforms other major methods at both the subcellular and suborganellar levels. Furthermore, MULocDeep assesses each amino acid's contribution to localization, which provides insights into the mechanisms of protein sorting and localization motifs. A web server can be accessed at http://mu-loc.org.

14.
PeerJ Comput Sci ; 7: e735, 2021.
Article in English | MEDLINE | ID: mdl-34977344

ABSTRACT

BACKGROUND AND OBJECTIVES: Kinship verification and recognition (KVR) is the ability of a machine to identify genetic or blood relationships, and their degree, between people from their facial images. The face is used because it is one of the most significant means by which people recognize each other. Automatic KVR is an interesting area of investigation that greatly affects real-world applications, such as searching for lost family members, forensics, and historical and genealogical studies. This paper presents a comprehensive survey describing KVR applications and kinship types. It reviews current studies, from handcrafted features through shallow metric learning to deep learning feature-based techniques. Furthermore, the most commonly used kinship datasets are discussed, which in turn opens the way for future research directions in this field. The limitations of KVR are also discussed, such as insufficient illumination, noise, occlusion, and age-variation problems. Finally, future research directions are presented, such as age and gender variation problems. METHODS: We applied a literature-survey methodology to retrieve data from academic databases. Inclusion and exclusion criteria were set, and three stages were followed to select articles. Finally, the main KVR stages, along with the main methods in each stage, were presented. We believe that surveys help researchers easily detect areas that require more development and investigation. RESULTS: Handcrafted, metric learning, and deep learning approaches have been widely utilized for kinship verification and recognition from facial images. CONCLUSIONS: Despite the scientific efforts aimed at addressing this hot research topic, many future research areas require investigation, such as age and gender variation. The presented survey makes it easier for researchers to identify new areas that require more investigation and research.

16.
Mol Genet Genomic Med ; 8(9): e1206, 2020 09.
Article in English | MEDLINE | ID: mdl-32160417

ABSTRACT

BACKGROUND: ACMG/AMP and AMP/ASCO/CAP have released guidelines for variant interpretation, and ESHG for diagnostic sequencing. These guidelines contain recommendations, including on the use of computational prediction methods. The guidelines per se, and the way they are implemented, cause some problems. METHODS: Logical reasoning based on domain knowledge. RESULTS: According to the guidelines, several methods have to be used and they have to agree. This means that the methods with the poorest performance overrule the better ones. The choice of prediction method(s) should be made by experts based on systematic benchmarking studies reporting all the relevant performance measures. Currently, variant interpretation methods have been applied mainly to amino acid substitutions and splice site variants; however, predictors for some other types of variations are available, and there will be tools for new application areas in the near future. Common problems in prediction method usage are discussed. Neither the number of features used for method training nor the number of variation types predicted by a tool is an indicator of method performance. Many published gene-, protein-, or disease-specific benchmark studies suffer from datasets that are too small, rendering the results useless. In the case of binary predictors, an equal number of positive and negative cases is beneficial for training; any imbalance has to be corrected for performance assessment. Predictors cannot be better than the data they are based on and used for training and testing. Minor allele frequency (MAF) can help to detect likely benign cases, but the recommended MAF threshold is apparently too high. The fact that many rare variants are disease-causing or disease-related does not mean that rare variants in general are harmful. How large a portion of the tested variants a tool can predict (coverage) is not a quality measure. CONCLUSION: Methods used for variant interpretation have to be carefully selected. It should be possible to use only one predictor with proven good performance, or a limited number of complementary predictors with state-of-the-art performance. Bear in mind that diseases and pathogenicity form a continuum, and variants are not dichotomous, i.e., either pathogenic or benign.
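The MAF point can be made concrete with a simple filter: variants whose minor allele frequency in a population database exceeds a chosen threshold are treated as likely benign. The threshold and variant records below are illustrative only; the abstract argues that commonly recommended cutoffs are too high.

```python
variants = [
    {"id": "var1", "maf": 0.0001},
    {"id": "var2", "maf": 0.08},
    {"id": "var3", "maf": 0.003},
]

MAF_THRESHOLD = 0.01  # illustrative cutoff, not a recommended clinical value

likely_benign = [v["id"] for v in variants if v["maf"] > MAF_THRESHOLD]
needs_interpretation = [v["id"] for v in variants if v["maf"] <= MAF_THRESHOLD]
print(likely_benign, needs_interpretation)  # ['var2'] ['var1', 'var3']
```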


Subjects
Diagnosis, Computer-Assisted/methods; Genetic Testing/methods; Polymorphism, Genetic; Practice Guidelines as Topic; Sequence Analysis, DNA/methods; Datasets as Topic/standards; Diagnosis, Computer-Assisted/standards; Genetic Testing/standards; Genetics, Medical/organization & administration; Genetics, Medical/standards; Humans; Sequence Analysis, DNA/standards; Societies, Medical; Software/standards
17.
PeerJ ; 5: e3893, 2017.
Article in English | MEDLINE | ID: mdl-29372115

ABSTRACT

BACKGROUND: As next-generation sequencing technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationships in public health. Most new programs skip the traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes and then summarizing the results in a matrix of single-nucleotide polymorphisms (SNPs) or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. METHODS: We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses this standard format. RESULTS: Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni), plus one simulated dataset for which the "known tree" can accurately be called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. DISCUSSION: These five benchmark datasets will help standardize the comparison of current and future phylogenomic pipelines and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format and, if relevant, will add them to our GitHub site. Together, these datasets, the dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines.
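The matrix these pipelines hand to standard phylogenetics software is, at heart, a table of per-position variant calls, from which pairwise SNP distances follow directly. A toy sketch (the isolate names and calls are fabricated, and real pipelines handle missing data and masking):

```python
# Toy SNP matrix: one row per isolate, one column per variant position.
calls = {
    "isolate1": "ACGTA",
    "isolate2": "ACGTC",
    "isolate3": "TCGAC",
}

def snp_distance(a, b):
    """Number of positions at which two isolates' calls differ."""
    return sum(x != y for x, y in zip(calls[a], calls[b]))

names = sorted(calls)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, snp_distance(a, b))
# Such a distance matrix feeds tree building, whose output is then checked
# against the "known" epidemiological tree in the benchmark.
```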

18.
Vision Res ; 116(Pt B): 258-68, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25951756

ABSTRACT

Scores of visual attention models have been developed over the past several decades of research. Differences in implementation, assumptions, and evaluation have made comparing these models very difficult. Taxonomies have been constructed in an attempt to organize and classify the models, but they are not sufficient for quantifying which classes of models are most capable of explaining the available data. At the same time, a multitude of physiological and behavioral findings have been published measuring various aspects of human and non-human primate visual attention. All of these elements highlight the need to integrate the computational models with the data by (1) operationalizing the definitions of visual attention tasks and (2) designing benchmark datasets to measure success on specific tasks under these definitions. In this paper, we provide examples of operationalizing and benchmarking different visual attention tasks, along with the relevant design considerations.
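One concrete instance of operationalizing such a task: saliency prediction is routinely benchmarked by asking whether a model's saliency map scores human fixation locations above randomly chosen pixels, summarized as an AUC. A minimal sketch with a placeholder map and fixations (not a metric defined in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
saliency = rng.random((48, 64))                  # placeholder model saliency map
fixations = [(10, 12), (30, 40), (20, 33)]       # placeholder human fixations (row, col)

pos = np.array([saliency[r, c] for r, c in fixations])
neg = saliency.ravel()                           # all pixels as the negative pool

# AUC = probability that a fixated pixel outscores a randomly chosen pixel.
auc = np.mean([(neg < p).mean() + 0.5 * (neg == p).mean() for p in pos])
print(f"AUC = {auc:.2f}")                        # 0.5 => chance-level model
```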


Subjects
Attention/physiology; Computer Simulation; Models, Neurological; Visual Perception/physiology; Animals; Humans
19.
Gigascience ; 4: 20, 2015.
Article in English | MEDLINE | ID: mdl-25941567

ABSTRACT

BACKGROUND: Three-dimensional (3D) imaging mass spectrometry (MS) is an analytical chemistry technique for the 3D molecular analysis of a tissue specimen, an entire organ, or microbial colonies on an agar plate. 3D imaging MS has unique advantages over existing 3D imaging techniques, offers novel perspectives for understanding the spatial organization of biological processes, and has growing potential to be introduced into routine use in both biology and medicine. Owing to the sheer quantity of data generated, the visualization, analysis, and interpretation of 3D imaging MS data remain a significant challenge. Bioinformatics research in this field is hampered by the lack of publicly available benchmark datasets needed to evaluate and compare algorithms. FINDINGS: High-quality 3D imaging MS datasets from different biological systems were acquired at several labs, supplied with overview images and scripts demonstrating how to read them, and deposited into MetaboLights, an open repository for metabolomics data. 3D imaging MS data were collected from five samples using two types of 3D imaging MS. 3D matrix-assisted laser desorption/ionization (MALDI) imaging MS data were collected from murine pancreas, murine kidney, human oral squamous cell carcinoma, and interacting microbial colonies cultured in Petri dishes. 3D desorption electrospray ionization (DESI) imaging MS data were collected from a human colorectal adenocarcinoma. CONCLUSIONS: With the aim of stimulating computational research in the field of 3D imaging MS, selected high-quality 3D imaging MS datasets are provided that can be used by algorithm developers as benchmark datasets.
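Per analyte, data of this kind reduce to intensities on an (x, y, z) grid; assembling a 3D ion image for one m/z channel from a flat list of records can be sketched as below. The records are fabricated, and real readers follow the scripts supplied with the MetaboLights depositions.

```python
import numpy as np

# Flat records as (x, y, z, intensity) for one m/z channel of interest.
records = [(0, 0, 0, 5.0), (1, 0, 0, 7.5), (0, 1, 1, 2.0), (1, 1, 1, 9.1)]

volume = np.zeros((2, 2, 2))   # grid dimensions are placeholders
for x, y, z, inten in records:
    volume[x, y, z] = inten

# The volume can now be sliced (volume[:, :, z]) or rendered with any 3D viewer.
print(volume[:, :, 1])
```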


Subjects
Spectrometry, Mass, Electrospray Ionization; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization; Animals; Benchmarking; Databases, Factual; Humans; Imaging, Three-Dimensional; Metabolomics; Mice; Reproducibility of Results