Results 1 - 20 of 59
1.
BMC Bioinformatics ; 23(1): 267, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35804309

ABSTRACT

BACKGROUND: Modern mass spectrometry has revolutionized the detection and analysis of metabolites, but it has also caused data volumes to skyrocket, with metabolomics repositories filling up with thousands of datasets. While there are many software tools for the analysis of individual experiments with a few to dozens of chromatograms, we see a demand for a contemporary software solution capable of processing and analyzing hundreds or even thousands of experiments in an integrative manner with standardized workflows. RESULTS: Here, we introduce MetHoS, an automated web-based software platform for the processing, storage and analysis of large amounts of mass spectrometry-based metabolomics data sets originating from different metabolomics studies. MetHoS is based on Big Data frameworks to enable parallel processing, distributed storage and distributed analysis of even larger data sets across clusters of computers in a highly scalable manner. It has been designed to allow the processing and analysis of any number of experiments and samples in an integrative manner. To demonstrate the capabilities of MetHoS, thousands of experiments were downloaded from the MetaboLights database and used to perform large-scale processing, storage and statistical analysis in a proof-of-concept study. CONCLUSIONS: MetHoS is suitable for the large-scale processing, storage and analysis of metabolomics data aimed at untargeted metabolomic analyses. It is freely available at: https://methos.cebitec.uni-bielefeld.de/ . Users interested in analyzing their own data are encouraged to apply for an account.
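The abstract does not name the Big Data frameworks MetHoS builds on, so the following is only a minimal sketch of the distributed-processing idea, using PySpark as a stand-in; process_experiment and the file names are illustrative placeholders, not the MetHoS implementation.

    # Hypothetical sketch: process many metabolomics experiments in parallel
    # across a cluster. PySpark stands in for whatever frameworks MetHoS uses.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("metabolomics-batch").getOrCreate()

    def process_experiment(path):
        # placeholder for per-experiment processing (e.g., peak picking)
        return {"path": path, "n_peaks": 0}

    paths = ["exp_%04d.mzML" % i for i in range(1000)]   # assumed file layout
    results = spark.sparkContext.parallelize(paths).map(process_experiment).collect()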


Subjects
Metabolomics, Software, Electronic Data Processing, Mass Spectrometry, Metabolomics/methods, Workflow
2.
Sensors (Basel) ; 22(14)2022 Jul 19.
Article in English | MEDLINE | ID: mdl-35891060

ABSTRACT

Data augmentation is an established technique in computer vision for improving training generalization and dealing with low data volume. Most data augmentation and computer vision research focuses on everyday images such as traffic data. Applying computer vision techniques in domains like the marine sciences has proven less straightforward in the past due to special characteristics, such as very low data volume and class imbalance, caused by costly manual annotation by human domain experts and generally low species abundances. However, the data volume acquired today with moving platforms that collect large image collections from remote marine habitats, like the deep benthos, for marine biodiversity assessment and monitoring makes the use of automatic computer vision detection and classification inevitable. In this work, we investigate the effect of data augmentation in the context of taxonomic classification in underwater, i.e., benthic, images. First, we show that established data augmentation methods (i.e., geometric and photometric transformations) perform differently on marine image collections than on established image collections like the Cityscapes dataset, which shows everyday traffic images. Some of the methods even decrease learning performance when applied to marine image collections. Second, we propose new data augmentation combination policies motivated by our observations, compare their effect to those proposed by the AutoAugment algorithm, and show that the proposed augmentation policy outperforms the AutoAugment results for marine image collections. We conclude that in the case of small marine image datasets, background knowledge and heuristics should sometimes be applied to design an effective data augmentation method.
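As a point of reference for the kind of policies discussed, here is a minimal sketch of a combined geometric and photometric augmentation pipeline using torchvision; the actual operations and magnitudes of the proposed marine policies are not given in the abstract, so these choices are assumptions.

    # Illustrative augmentation policy (not the authors' exact policy):
    # geometric transforms suit benthic images, which have no canonical
    # orientation; photometric jitter mimics underwater lighting variation.
    import torchvision.transforms as T

    marine_policy = T.Compose([
        T.RandomHorizontalFlip(p=0.5),                 # geometric
        T.RandomRotation(degrees=180),                 # geometric
        T.ColorJitter(brightness=0.2, contrast=0.2),   # photometric
        T.ToTensor(),
    ])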


Subjects
Deep Learning, Algorithms, Biodiversity, Ecosystem, Humans, Image Processing, Computer-Assisted/methods
3.
Syst Biol ; 69(6): 1231-1253, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32298457

ABSTRACT

Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000-20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term (ideally perpetual) data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasi-facts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy. For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges, to adapt the existing infrastructure of data centers to a specimen-centered concept, and quantitative challenges, to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000-40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.]


Subjects
Classification, Databases, Factual/standards, Animals, Databases, Factual/trends
4.
Sensors (Basel) ; 21(4)2021 Feb 06.
Article in English | MEDLINE | ID: mdl-33561961

ABSTRACT

In recent years, an increasing number of cabled Fixed Underwater Observatories (FUOs) have been deployed, many of them equipped with digital cameras recording high-resolution digital image time series for a given period. The manual extraction of quantitative information from these data regarding resident species is necessary to link the image time series information to data from other sensors, but requires computational support to overcome the bottleneck of manual analysis. As a priori knowledge about the objects of interest in the images is almost never available, computational methods are required that do not depend on the availability of a large training data set of annotated images. In this paper, we propose a new strategy for collecting and using training data for machine learning-based observatory image interpretation much more efficiently. The method combines the training efficiency of a special active learning procedure with the advantages of deep learning feature representations. The method is tested on two highly disparate data sets. In our experiments, we show that the proposed method, ALMI, achieves a classification accuracy A > 90% with fewer than N = 258 training samples on one data set, and A > 80% after N = 150 iterations (i.e., training samples) on the other, outperforming the reference method in terms of accuracy and the amount of training data required.
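The abstract describes ALMI only at a high level; the following is a schematic pool-based active learning round under the common least-confidence criterion, with a scikit-learn classifier as a stand-in on top of deep feature vectors. The function name and query rule are assumptions, not the authors' implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def query_next_samples(features, labeled_idx, labels, n_query=10):
        # Train on the currently labeled pool of deep feature vectors.
        clf = LogisticRegression(max_iter=1000).fit(features[labeled_idx], labels)
        proba = clf.predict_proba(features)          # class probabilities, whole pool
        uncertainty = 1.0 - proba.max(axis=1)        # least-confidence score
        uncertainty[labeled_idx] = -np.inf           # never re-query labeled samples
        return np.argsort(uncertainty)[-n_query:]    # indices to annotate next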

5.
Bioinformatics ; 35(10): 1802-1804, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30346487

ABSTRACT

MOTIVATION: Live cell imaging plays a pivotal role in understanding cell growth. Yet, there is a lack of visualization alternatives for quick qualitative characterization of colonies. RESULTS: SeeVis is a Python workflow for automated and qualitative visualization of time-lapse microscopy data. It automatically pre-processes the movie frames, finds particles, traces their trajectories and visualizes them in a space-time cube offering three different color mappings to highlight different features. It supports the user in developing a mental model for the data. SeeVis completes these steps in 1.15 s/frame and creates a visualization with a selected color mapping. AVAILABILITY AND IMPLEMENTATION: https://github.com/ghattab/seevis/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Microfluidics, Software, Microscopy, Workflow
6.
BMC Bioinformatics ; 20(1): 303, 2019 Jun 04.
Article in English | MEDLINE | ID: mdl-31164082

ABSTRACT

BACKGROUND: The spatial distribution and colocalization of functionally related metabolites is analyzed in order to investigate the spatial (and functional) aspects of molecular networks. We propose community detection for the analysis of m/z images, grouping molecules with correlated spatial distributions into communities that hint at functional networks or pathway activity. To detect communities, we investigate a spectral approach that optimizes the modularity measure. We present an analysis pipeline and an online interactive visualization tool to facilitate explorative analysis of the results. The approach is illustrated with synthetic benchmark data and two real-world data sets (a barley seed and a glioblastoma section). RESULTS: For the barley sample data set, our approach reproduces the findings of a previous work that identified groups of molecules whose distributions correlate with anatomical structures of the barley seed. The analysis of the glioblastoma section data revealed that some molecular compositions are locally focused, indicating the existence of a meaningful separation into at least two areas. This result is in line with prior histological knowledge. In addition to confirming prior findings, the resulting graph structures revealed new subcommunities of m/z images (i.e., metabolites) with more detailed distribution patterns. Another result of our work is the development of an interactive web tool called GRINE (Analysis of GRaph mapped Image Data NEtworks). CONCLUSIONS: The proposed method was successfully applied to identify molecular communities of laterally co-localized molecules. For both application examples, the detected communities showed inherent substructures that could easily be investigated with the proposed visualization tool. This shows the potential of this approach as a complementary addition to pixel clustering methods.
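To make the pipeline concrete, here is a minimal sketch of the graph construction and community detection step: m/z images become nodes, spatial correlation becomes edge weight, and communities are found by modularity optimization. networkx's greedy optimizer stands in for the spectral approach used in the paper, and the correlation threshold is an assumption.

    import numpy as np
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def mz_communities(intensities, threshold=0.7):
        # intensities: (n_mz_images, n_pixels) array of flattened m/z images
        corr = np.corrcoef(intensities)
        G = nx.Graph()
        G.add_nodes_from(range(corr.shape[0]))
        for i in range(corr.shape[0]):
            for j in range(i + 1, corr.shape[0]):
                if corr[i, j] > threshold:           # keep strongly co-localized pairs
                    G.add_edge(i, j, weight=corr[i, j])
        return list(greedy_modularity_communities(G, weight="weight"))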


Subjects
Data Visualization, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods, Brain Neoplasms/pathology, Cluster Analysis, Glioblastoma/pathology, Hordeum, Humans, Principal Component Analysis, Seeds/anatomy & histology, Seeds/chemistry
7.
Bioinformatics ; 29(19): 2452-9, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-23918246

ABSTRACT

MOTIVATION: The research area of metabolomics has achieved tremendous popularity and development over the last couple of years. Owing to its unique interdisciplinarity, it requires combining knowledge from various scientific disciplines. Advances in high-throughput technology and the consequently growing quality and quantity of data put new demands on applied analytical and computational methods. Exploration of the resulting datasets furthermore relies on powerful tools for data mining and visualization. RESULTS: To cover and keep up with these requirements, we have created MeltDB 2.0, a next-generation web application addressing the storage, sharing, standardization, integration and analysis of metabolomics experiments. New features improve both the efficiency and effectiveness of the entire processing pipeline for chromatographic raw data, from pre-processing to the derivation of new biological knowledge. First, the generation of high-quality metabolic datasets has been vastly simplified. Second, the new statistics toolbox allows these datasets to be investigated according to a wide spectrum of scientific and explorative questions. AVAILABILITY: The system is publicly available at https://meltdb.cebitec.uni-bielefeld.de. A login is required but freely available.


Subjects
Metabolomics/methods, Software Design, Cluster Analysis, Data Mining, Databases, Genetic, Internet
8.
Mol Cell Proteomics ; 11(8): 512-26, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22493176

ABSTRACT

Liquid chromatography coupled to tandem mass spectrometry in combination with stable-isotope labeling is an established and widespread method to measure gene expression at the protein level. However, it is often not considered that two opposing processes determine the amount of a protein in a cell: its synthesis as well as its degradation. With this work, we provide an integrative, high-throughput method, from the experimental setup to the bioinformatics analysis, to measure synthesis and degradation rates of an organism's proteome. The applicability of the approach is demonstrated with an investigation of the heat shock response, a well-understood regulatory mechanism in bacteria, in the biotechnologically relevant Corynebacterium glutamicum. Using a multilabeling approach with both heavy stable nitrogen and carbon isotopes, cells are metabolically labeled in a pulse-chase experiment to trace the labels' incorporation into newly synthesized proteins and their loss during protein degradation. Our work aims not only at the calculation of protein turnover rates but also at their statistical evaluation, including variance and hierarchical cluster analysis, using the rich internet application QuPE.
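For intuition, a degradation rate can be estimated from the chase phase under standard first-order kinetics, P_old(t) = P_old(0) * exp(-k_deg * t); this is a textbook calculation with placeholder numbers, not the exact QuPE pipeline.

    import numpy as np

    t = np.array([0.0, 0.5, 1.0, 2.0, 4.0])             # hours after label switch (assumed)
    frac_old = np.array([1.0, 0.78, 0.61, 0.37, 0.14])  # old-label signal fraction (assumed)

    k_deg = -np.polyfit(t, np.log(frac_old), 1)[0]      # slope of ln(fraction) vs time
    half_life = np.log(2) / k_deg
    print(f"k_deg = {k_deg:.2f} 1/h, half-life = {half_life:.2f} h")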


Subjects
Bacterial Proteins/metabolism, Computational Biology/methods, Corynebacterium glutamicum/metabolism, Proteomics/methods, Amino Acid Sequence, Bacterial Proteins/analysis, Bacterial Proteins/classification, Carbon Isotopes, Chromatography, Liquid, Cluster Analysis, Corynebacterium glutamicum/growth & development, Heat-Shock Response, Hot Temperature, Internet, Isotope Labeling/methods, Molecular Sequence Data, Nitrogen Isotopes, Peptides/analysis, Peptides/metabolism, Proteolysis, Reproducibility of Results, Spectrometry, Mass, Electrospray Ionization, Temperature
9.
Bioinformatics ; 28(8): 1143-50, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22390938

ABSTRACT

MOTIVATION: Bioimaging techniques are rapidly developing toward higher resolution and dimensionality. The increase in dimensionality is achieved by techniques such as multitag fluorescence imaging, Matrix-Assisted Laser Desorption/Ionization (MALDI) imaging or Raman imaging, which record an N-dimensional intensity array for each pixel, representing local abundances of molecules, residues or interaction patterns. The analysis of such multivariate bioimages (MBIs) calls for new approaches to support users in the analysis of both feature domains: space (i.e., sample morphology) and molecular colocation or interaction. In this article, we present our approach WHIDE (Web-based Hyperbolic Image Data Explorer), which combines principles from computational learning, dimension reduction and visualization in a free web application. RESULTS: We applied WHIDE to a set of MBIs recorded with the multitag fluorescence Toponome Imaging System. The MBIs show fields of view in tissue sections from a colon cancer study, in which we compare tissue from normal/healthy colon with tissue classified as tumor. Our results show that WHIDE efficiently reduces the complexity of the data by mapping each pixel to a cluster, referred to as a Molecular Co-Expression Phenotype, and provides a structural basis for a sophisticated multimodal visualization that combines topology-preserving pseudocoloring with information visualization. The wide range of WHIDE's applicability is demonstrated with examples from toponome imaging, high-content screens and MALDI imaging (shown in the Supplementary Material). AVAILABILITY AND IMPLEMENTATION: The WHIDE tool can be accessed via the BioIMAX website http://ani.cebitec.uni-bielefeld.de/BioIMAX/; Login: whidetestuser; Password: whidetest.
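The Molecular Co-Expression Phenotype idea reduces to assigning each pixel's N-dimensional intensity vector to a cluster; the sketch below uses k-means purely as a stand-in for WHIDE's actual learning method, which the abstract does not detail.

    import numpy as np
    from sklearn.cluster import KMeans

    def mcep_map(mbi, n_clusters=16):
        # mbi: (H, W, N) multivariate bioimage; returns an (H, W) cluster map
        pixels = mbi.reshape(-1, mbi.shape[-1])
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixels)
        return labels.reshape(mbi.shape[:2])   # basis for topology-preserving pseudocoloring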


Subjects
Data Mining, Diagnostic Imaging/methods, Colonic Neoplasms/pathology, Humans, Internet, Principal Component Analysis, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods, Spectrum Analysis, Raman
10.
PLoS One ; 18(7): e0282723, 2023.
Article in English | MEDLINE | ID: mdl-37467187

ABSTRACT

Fixed underwater observatories (FUOs), equipped with digital cameras and other sensors, are becoming more commonly used to record different kinds of time series data for marine habitat monitoring. With increasing numbers of campaigns, numbers of sensors and campaign durations, the volume and heterogeneity of the data, ranging from simple temperature time series to series of HD images or video, call for new data science approaches to analyze the data. While some works have been published on the analysis of data from one campaign, we address the problem of analyzing time series data from two consecutive monitoring campaigns (starting late 2017 and late 2018) in the same habitat. While data from campaigns in two separate years provide an interesting basis for marine biology research, they also present new data science challenges, such as marine image analysis across data from more than one campaign. In this paper, we analyze the polyp activity of two Paragorgia arborea cold water coral (CWC) colonies using FUO data collected from November 2017 to June 2018 and from December 2018 to April 2019. We successfully apply convolutional neural networks (CNNs) for the segmentation and classification of the coral and the polyp activities. The resulting polyp activity data alone showed interesting temporal patterns, with differences and similarities between the two time periods. A one-month "sleeping" period in spring with almost no activity was observed in both coral colonies, but with a shift of approximately one month. A time series prediction experiment allowed us to predict the polyp activity from the non-image sensor data using recurrent neural networks (RNNs). The results pave the way toward a new multi-sensor monitoring strategy for Paragorgia arborea behaviour.
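As an illustration of the prediction experiment, a minimal recurrent model mapping non-image sensor channels to a polyp-activity estimate per time step might look as follows in PyTorch; the layer sizes and sensor count are placeholders, not the authors' configuration.

    import torch.nn as nn

    class ActivityRNN(nn.Module):
        def __init__(self, n_sensors=4, hidden=32):
            super().__init__()
            self.rnn = nn.GRU(n_sensors, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)       # scalar activity per time step

        def forward(self, x):                      # x: (batch, time, n_sensors)
            out, _ = self.rnn(x)
            return self.head(out).squeeze(-1)      # (batch, time) activity estimates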


Subjects
Anthozoa, Animals, Data Science, Ecosystem, Water, Neural Networks, Computer
11.
PLoS One ; 18(2): e0272103, 2023.
Article in English | MEDLINE | ID: mdl-36827378

ABSTRACT

Diatoms represent one of the morphologically and taxonomically most diverse groups of microscopic eukaryotes. Light microscopy-based taxonomic identification and enumeration of frustules, the silica shells of these microalgae, is broadly used in aquatic ecology and biomonitoring. One key step in emerging digital variants of such investigations is segmentation, a task that has been addressed before, but usually in manually captured megapixel-sized images of individual diatom cells with a mostly clean background. In this paper, we applied deep learning-based segmentation methods to gigapixel-sized, high-resolution scans of diatom slides with a realistically cluttered background. This setup requires large slide scans to be subdivided into small images (tiles) to apply a segmentation model to them. This subdivision (tiling), when done using a sliding window approach, often leads to cropping relevant objects at the boundaries of individual tiles. We hypothesized that in the case of diatom analysis, reducing the amount of such cropped objects in the training data can improve segmentation performance by allowing for a better discrimination of relevant, intact frustules or valves from small diatom fragments, which are considered irrelevant when counting diatoms. We tested this hypothesis by comparing a standard sliding window / fixed-stride tiling approach with two new approaches we term object-based tile positioning, with and without an object integrity constraint. With all three tiling approaches, we trained Mask R-CNN and U-Net models with different amounts of training data and compared their performance. Object-based tiling with the object integrity constraint led to an improvement in pixel-based precision by 12-17 percentage points without substantially impairing recall when compared with standard sliding window tiling. We thus propose that training segmentation models with object-based tiling schemes can improve diatom segmentation from large gigapixel-sized images but could potentially also be relevant for other image domains.
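The contrast between the tiling schemes can be sketched directly: fixed-stride tiling enumerates windows regardless of annotations, while the object integrity constraint rejects tiles that would crop an annotated object. The helper names and box convention (x0, y0, x1, y1) below are illustrative, not the authors' code.

    def fixed_stride_tiles(width, height, tile=512, stride=512):
        # standard sliding-window tiling, ignoring object positions
        for y in range(0, height - tile + 1, stride):
            for x in range(0, width - tile + 1, stride):
                yield (x, y, x + tile, y + tile)

    def respects_object_integrity(tile_box, objects):
        # reject a tile if any annotated object it touches is not fully inside it
        tx0, ty0, tx1, ty1 = tile_box
        for ox0, oy0, ox1, oy1 in objects:
            overlaps = ox0 < tx1 and ox1 > tx0 and oy0 < ty1 and oy1 > ty0
            inside = ox0 >= tx0 and oy0 >= ty0 and ox1 <= tx1 and oy1 <= ty1
            if overlaps and not inside:
                return False                       # object would be cropped
        return True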


Subjects
Deep Learning, Diatoms, Microscopy, Image Processing, Computer-Assisted/methods
12.
Sci Data ; 9(1): 414, 2022 07 15.
Article in English | MEDLINE | ID: mdl-35840583

ABSTRACT

Underwater images are used to explore and monitor ocean habitats, generating huge datasets with unusual data characteristics that preclude traditional data management strategies. Due to the lack of universally adopted data standards, image data collected from the marine environment are increasing in heterogeneity, preventing objective comparison. The extraction of actionable information thus remains challenging, particularly for researchers not directly involved with the image data collection. Standardized formats and procedures are needed to enable sustainable image analysis and processing tools, as are solutions for publishing images in long-term repositories to ensure the reuse of data. The FAIR principles (Findable, Accessible, Interoperable, Reusable) provide a framework for such data management goals. We propose the use of image FAIR Digital Objects (iFDOs) and present an infrastructure environment to create and exploit such FAIR digital objects. We show how these iFDOs can be created, validated, managed and stored, and which data associated with imagery should be curated. The goal is to reduce image management overheads while simultaneously creating visibility for image acquisition and publication efforts.
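To indicate what such a digital object carries, here is an illustrative, non-normative sketch of iFDO-like metadata as a Python mapping; the actual field names are defined by the published iFDO schema and may differ from these placeholders.

    ifdo_like = {
        "image-set-name": "example_expedition_2021_camera1",          # placeholder
        "image-set-uuid": "00000000-0000-0000-0000-000000000000",     # placeholder
        "image-acquisition": {"latitude": 54.1, "longitude": 7.9, "depth-m": 120.0},
        "image-license": "CC-BY-4.0",                                 # enables reuse
    }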

13.
BMC Bioinformatics ; 12: 297, 2011 Jul 21.
Article in English | MEDLINE | ID: mdl-21777450

ABSTRACT

BACKGROUND: Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and the generation of hypotheses, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or findings with collaborating experts at various locations. RESULTS: In this paper, we present a novel Web 2.0 approach, BioIMAX, for the collaborative exploration and analysis of multivariate image data. It combines the web's collaboration and distribution architecture with the interface interactivity and computational power of desktop applications, an approach recently termed rich internet applications. CONCLUSIONS: BioIMAX allows scientists to discuss and share data or results with collaborating experts and to visualize, annotate, and explore multivariate image data within one web-based platform from any location via a standard web browser, requiring only a username and a password. BioIMAX can be accessed at http://ani.cebitec.uni-bielefeld.de/BioIMAX with the username "test" and the password "test1" for testing purposes.


Subjects
Image Processing, Computer-Assisted, Software, Computer-Assisted Instruction, Cooperative Behavior, Diagnostic Imaging, Internet
14.
Proteome Sci ; 9: 30, 2011 Jun 11.
Article in English | MEDLINE | ID: mdl-21663690

ABSTRACT

BACKGROUND: Mass spectrometry-based proteomics has reached a stage where it is possible to comprehensively analyze the whole proteome of a cell in one experiment. Here, the use of stable isotopes has become a standard technique to yield relative abundance values of proteins. In recent times, more and more experiments are being conducted that do not merely depict a static image of the up- or down-regulated proteins at a distinct time point but instead compare developmental stages of an organism or varying experimental conditions. RESULTS: Although the scientific questions behind these experiments are of course manifold, two questions commonly arise: 1) which proteins are differentially regulated with respect to the selected experimental conditions, and 2) are there groups of proteins that show similar abundance ratios, indicating that they have a similar turnover? We give advice on how these two questions can be answered and comprehensively compare a variety of commonly applied computational methods and their outcomes. CONCLUSIONS: This work provides guidance through the jungle of computational methods for analyzing mass spectrometry-based isotope-labeled datasets and recommends an effective and easy-to-use evaluation strategy. We demonstrate our approach with three recently published datasets on Bacillus subtilis 12 and Corynebacterium glutamicum 3. Special focus is placed on the application and validation of cluster analysis methods. All applied methods were implemented within the rich internet application QuPE 4. Results can be found at http://qupe.cebitec.uni-bielefeld.de.
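For the second question, grouping proteins by similar abundance ratios, a minimal hierarchical clustering sketch with SciPy looks as follows; the data, distance metric and cluster count are placeholders, and the paper compares several such methods rather than prescribing this one.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    ratios = np.random.rand(50, 4)       # placeholder: 50 proteins x 4 conditions
    Z = linkage(ratios, method="average", metric="euclidean")
    clusters = fcluster(Z, t=5, criterion="maxclust")   # assign proteins to 5 groups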

15.
MethodsX ; 8: 101218, 2021.
Article in English | MEDLINE | ID: mdl-34434741

ABSTRACT

The present work describes a new computer-assisted image analysis method for the rapid, simple, objective and reproducible quantification of actively discharged fungal spores, which can serve as a manual for laboratories working in this context. The method can be used with conventional laboratory equipment: bright field microscopes, standard scanners and the open-source software ImageJ. Compared to other conidia quantification methods based on computer-assisted image analysis, the presented method has greater potential for application to large-scale sample quantities. The key to faster quantification is the calculation of the linear relationship between the gray value and the automatically counted number of conidia, which only has to be performed once at the beginning of the analysis. Afterwards, the gray value is used as the single parameter for quantification. The fast, easy and objective determination of sporulation capacity facilitates quality control of fungal formulations designed for biological pest control.
• Rapid, simple, objective and reproducible quantification of fungal sporulation suitable for large-scale sample quantities.
• Requires conventional laboratory equipment and open-source software without technical or computational expertise.
• The number of automatically counted conidia can be correlated with the gray value; after an initial linear fit, the gray value can be applied as the single quantification parameter.
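The calibration idea translates directly into a few lines of analysis code; the numbers below are placeholders standing in for the one-time calibration measurements described above.

    import numpy as np

    gray = np.array([30.0, 55.0, 80.0, 110.0])      # mean gray values (placeholder)
    counts = np.array([1200, 2500, 3900, 5600])     # auto-counted conidia (placeholder)

    slope, intercept = np.polyfit(gray, counts, 1)  # linear calibration, performed once

    def conidia_from_gray(g):
        return slope * g + intercept                # gray value as single parameter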

16.
Sci Rep ; 11(1): 4606, 2021 02 25.
Article in English | MEDLINE | ID: mdl-33633175

ABSTRACT

Mass Spectrometry Imaging (MSI) is an established and still evolving technique for the spatial analysis of molecular co-location in biological samples. Nowadays, MSI is expanding into new domains such as clinical pathology. In order to increase the value of MSI data, software for visual analysis is required that is intuitive and technique-independent. Here, we present QUIMBI (QUIck exploration tool for Multivariate BioImages), a new tool for the visual analysis of MSI data. QUIMBI is an interactive visual exploration tool that provides the user with a convenient and straightforward visual exploration of the morphological and spectral features of MSI data. To improve the overall quality of MSI data by reducing non-tissue-specific signals and to ensure optimal compatibility with QUIMBI, the tool is combined with the new pre-processing tool ProViM (Processing for Visualization and multivariate analysis of MSI Data), presented in this work. The features of the proposed visual analysis approach for MSI data are demonstrated with two use cases. The results show that the use of ProViM and QUIMBI not only provides a new, fast and intuitive visual analysis, but also allows the detection of new co-location patterns in MSI data that are difficult to find with other methods.


Subjects
Diagnostic Imaging/methods, Image Processing, Computer-Assisted/methods, Mass Spectrometry/methods, Animals, Humans, Kidney/anatomy & histology, Male, Mice, Pseudoxanthoma Elasticum/pathology, Skin/pathology, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods, Vibrissae/anatomy & histology
17.
Bioinformatics ; 25(23): 3128-34, 2009 Dec 01.
Article in English | MEDLINE | ID: mdl-19808875

ABSTRACT

MOTIVATION: The goal of present-day omics sciences is to understand biological systems as a whole in terms of interactions of the individual cellular components. One of the main building blocks in this field of study is proteomics, where tandem mass spectrometry (LC-MS/MS) in combination with isotopic labelling techniques provides a common way to obtain direct insight into regulation at the protein level. Methods to identify and quantify the peptides contained in a sample are well established, and their output usually results in lists of identified proteins and calculated relative abundance values. The next step is to move beyond these abstract lists and apply statistical inference methods to compare measurements, to identify genes that are significantly up- or down-regulated, or to detect clusters of proteins with similar expression profiles. RESULTS: We introduce the Rich Internet Application (RIA) Qupe, which provides comprehensive data management and analysis functions for LC-MS/MS experiments. Starting with the import of mass spectra, the system guides the experimenter through the process of protein identification by database search, the calculation of protein abundance ratios, and in particular, the statistical evaluation of the quantification results, including multivariate analysis methods such as analysis of variance and hierarchical cluster analysis. While a data model to store these results has been developed, a well-defined programming interface facilitates the integration of novel approaches. A compute cluster is utilized to distribute computationally intensive calculations, and a web service allows information to be interchanged with other omics software applications. To demonstrate that Qupe represents a step forward in quantitative proteomics analysis, an application study on Corynebacterium glutamicum has been carried out. AVAILABILITY AND IMPLEMENTATION: Qupe is implemented in Java utilizing Hibernate, Echo2, R and the Spring framework. We encourage the usage of the RIA in the sense of the 'software as a service' concept, maintained on our servers and accessible at the following location: http://qupe.cebitec.uni-bielefeld.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods, Mass Spectrometry/methods, Proteome/analysis, Proteomics/methods, Software, Databases, Protein, Internet
18.
Nucleic Acids Res ; 36(7): 2230-9, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18285365

ABSTRACT

Metagenomics is providing striking insights into the ecology of microbial communities. The recently developed massively parallel 454 pyrosequencing technique offers the opportunity to rapidly obtain metagenomic sequences at low cost and without cloning bias. However, the phylogenetic analysis of the short reads produced represents a significant computational challenge. We describe the phylogenetic algorithm CARMA for predicting the source organisms of environmental 454 reads. The algorithm searches for conserved Pfam domain and protein families in the unassembled reads of a sample. These gene fragments (environmental gene tags, EGTs) are classified into a higher-order taxonomy based on the reconstruction of a phylogenetic tree of each matching Pfam family. The method exhibits high accuracy for a wide range of taxonomic groups, and EGTs as short as 27 amino acids can be phylogenetically classified down to the rank of genus. The algorithm was applied in a comparative study of three aquatic microbial samples obtained by 454 pyrosequencing. Profound differences in the taxonomic composition of these samples could be clearly revealed.


Subjects
Algorithms, Environmental Microbiology, Genomics/methods, Phylogeny, DNA/classification, RNA, Ribosomal, 16S/classification, Software, Water Microbiology
19.
Front Artif Intell ; 3: 49, 2020.
Article in English | MEDLINE | ID: mdl-33733166

ABSTRACT

Deep artificial neural networks have become the go-to method for many machine learning tasks. In the field of computer vision, deep convolutional neural networks achieve state-of-the-art performance for tasks such as classification, object detection, or instance segmentation. As deep neural networks become more and more complex, their inner workings become increasingly opaque, rendering them a "black box" whose decision-making process is no longer comprehensible. In recent years, various methods have been presented that attempt to peek inside the black box and to visualize the inner workings of deep neural networks, with a focus on deep convolutional neural networks for computer vision. These methods can serve as a toolbox to facilitate the design and inspection of neural networks for computer vision and the interpretation of the network's decision-making process. Here, we present the new tool Interactive Feature Localization in Deep neural networks (IFeaLiD), which provides a novel visualization approach to convolutional neural network layers. The tool interprets neural network layers as multivariate feature maps and visualizes the similarity between the feature vectors of individual pixels of an input image in a heat map display. The similarity display can reveal how the input image is perceived by different layers of the network and how the perception of one particular image region compares to the perception of the remaining image. IFeaLiD runs interactively in a web browser and can process even high-resolution feature maps in real time by using GPU acceleration with WebGL 2. We present examples from four computer vision datasets with feature maps from different layers of a pre-trained ResNet101. IFeaLiD is open source and available online at https://ifealid.cebitec.uni-bielefeld.de.
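The similarity display described above boils down to comparing one reference pixel's feature vector with every other pixel of a layer. Below is a NumPy sketch of that core computation, using cosine similarity as one plausible measure (the abstract does not name the metric); IFeaLiD itself performs this on the GPU via WebGL 2, and the shapes here are illustrative.

    import numpy as np

    def similarity_heatmap(fmap, ref_yx):
        # fmap: (H, W, C) feature map; ref_yx: (row, col) of the reference pixel
        ref = fmap[ref_yx]                                  # (C,) feature vector
        flat = fmap.reshape(-1, fmap.shape[-1])             # (H*W, C)
        sims = flat @ ref / (np.linalg.norm(flat, axis=1) * np.linalg.norm(ref) + 1e-12)
        return sims.reshape(fmap.shape[:2])                 # (H, W) heat map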

20.
Sci Rep ; 10(1): 14416, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32879374

ABSTRACT

Deep convolutional neural networks are emerging as the state-of-the-art method for the supervised classification of images, also in the context of taxonomic identification. Different morphologies and imaging technologies applied across organismal groups lead to highly specific image domains, which require customization of deep learning solutions. Here we provide an example using deep convolutional neural networks (CNNs) for the taxonomic identification of the morphologically diverse microalgal group of diatoms. Using a combination of high-resolution slide-scanning microscopy, web-based collaborative image annotation and diatom-tailored image analysis, we assembled a diatom image database from two Southern Ocean expeditions. We use these data to investigate the effect of CNN architecture, background masking, data set size and possible concept drift on image classification performance. Surprisingly, VGG16, a relatively old network architecture, showed the best performance and generalization ability on our images. Differing from a previous study, we found that background masking slightly improved performance. In general, training only a classifier on top of convolutional layers pre-trained on extensive but not domain-specific image data showed surprisingly high performance (F1 scores around 97%) with relatively few (100-300) examples per class, indicating that domain adaptation to a novel taxonomic group can be feasible with a limited investment of effort.
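The training setup the abstract describes, a classifier head on top of frozen, generically pre-trained convolutional layers, can be sketched with torchvision's VGG16; the head replacement below is a common transfer-learning pattern and an assumption, not the authors' exact configuration.

    import torch.nn as nn
    from torchvision import models

    def diatom_classifier(n_classes):
        net = models.vgg16(weights="IMAGENET1K_V1")      # generic (non-domain) pre-training
        for p in net.features.parameters():
            p.requires_grad = False                      # freeze convolutional layers
        net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, n_classes)
        return net                                       # train only the classifier head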
