Results 1 - 20 of 39
1.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38441258

ABSTRACT

MOTIVATION: Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices. RESULTS: We evaluate three annotation approaches, (i) full rejection, (ii) partial rejection, and (iii) no rejection, for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships. AVAILABILITY AND IMPLEMENTATION: Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.
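
The difference between full and partial rejection described in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy hierarchy, the cell type names, the probabilities, and the bottom-up walk over summed leaf probabilities. It is not taken from the authors' repository and may differ from how their classifiers traverse the hierarchy.

```python
# Hypothetical toy hierarchy (child -> parent); names are illustrative only.
PARENT = {
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
    "T cell": "Lymphocyte",
    "B cell": "Lymphocyte",
    "Lymphocyte": "root",
}


def node_probability(node, leaf_probs):
    """Probability of a node, taken as the sum over the leaves below it."""
    total = 0.0
    for leaf, p in leaf_probs.items():
        current = leaf
        while current != "root":
            if current == node:
                total += p
                break
            current = PARENT[current]
    return total


def full_rejection(leaf_probs, threshold):
    """Return the top leaf, or None when its confidence is below the threshold."""
    label = max(leaf_probs, key=leaf_probs.get)
    return label if leaf_probs[label] >= threshold else None


def partial_rejection(leaf_probs, threshold):
    """Walk up from the top leaf until an ancestor is confident enough."""
    node = max(leaf_probs, key=leaf_probs.get)
    while node != "root":
        if node_probability(node, leaf_probs) >= threshold:
            return node
        node = PARENT[node]
    return None  # no node below the root reached the threshold


# Assumed leaf probabilities for one cell.
probs = {"CD4 T cell": 0.45, "CD8 T cell": 0.40, "B cell": 0.15}
print(full_rejection(probs, 0.7))     # None: the cell stays unlabeled
print(partial_rejection(probs, 0.7))  # T cell: a coarser label is kept
```

With a threshold of 0.7 the flat decision discards the cell entirely, while the partial variant still reports the internal node "T cell", which is the sense in which partial rejection preserves label information.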


Subjects
Gene Expression Profiling; Transcriptome; Reproducibility of Results; Uncertainty; Machine Learning; Single-Cell Analysis; Sequence Analysis, RNA
2.
BMC Bioinformatics ; 25(1): 59, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38321386

ABSTRACT

The prediction of interactions between novel drugs and biological targets is a vital step in the early stage of the drug discovery pipeline. Many deep learning approaches have been proposed over the last decade, with a substantial fraction of them sharing the same underlying two-branch architecture. Their distinction is limited to the use of different types of feature representations and branches (multi-layer perceptrons, convolutional neural networks, graph neural networks and transformers). In contrast, the strategy used to combine the outputs (embeddings) of the branches has remained mostly the same. The same general architecture has also been used extensively in the area of recommender systems, where the choice of an aggregation strategy is still an open question. In this work, we investigate the effectiveness of three different embedding aggregation strategies in the area of drug-target interaction (DTI) prediction. We formally define these strategies and prove their universal approximator capabilities. We then present experiments that compare the different strategies on benchmark datasets from the area of DTI prediction, showcasing conditions under which specific strategies could be the obvious choice.


Subjects
Benchmarking; Drug Discovery; Electric Power Supplies; Neural Networks, Computer
3.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33834200

ABSTRACT

The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.


Subjects
Deep Learning; Escherichia coli/genetics; Genome, Bacterial; Genomics/methods; Transcription Initiation Site; Base Sequence; Binding Sites; DNA, Bacterial/genetics; DNA, Bacterial/metabolism; Escherichia coli/metabolism; Promoter Regions, Genetic/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
4.
Bioinformatics ; 38(3): 597-603, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34718418

ABSTRACT

MOTIVATION: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, outlining the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes. RESULTS: We adapt the transformer neural network architecture to operate on methylation matrices through combining axial attention with sliding window self-attention. The obtained CpG Transformer displays state-of-the-art performances on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget. AVAILABILITY AND IMPLEMENTATION: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation; Epigenome; Base Sequence; Sequence Analysis, DNA/methods; Neural Networks, Computer
5.
Brief Bioinform ; 21(1): 262-271, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30329015

ABSTRACT

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore.
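
The distinction between prediction settings drawn in this abstract can be made concrete with a hold-out sketch. The function names, the vertex labels (`d1`, `t1`, ...) and the way mixed pairs are dropped are assumptions for illustration; this is not the RLScore API and it does not reproduce the paper's algebraic cross-validation shortcuts.

```python
from itertools import product


def split_new_pairs(pairs, test_pairs):
    """Both vertices are seen in training; only the pair itself is held out."""
    train = [p for p in pairs if p not in test_pairs]
    test = [p for p in pairs if p in test_pairs]
    return train, test


def split_new_vertices(pairs, held_rows=(), held_cols=()):
    """Hold out whole vertices: no training pair may touch a held-out vertex."""
    train, test = [], []
    for r, c in pairs:
        row_out, col_out = r in held_rows, c in held_cols
        if not row_out and not col_out:
            train.append((r, c))
        elif (row_out or not held_rows) and (col_out or not held_cols):
            test.append((r, c))
        # When both rows and columns are held out, pairs that mix a held-out
        # vertex with a seen vertex belong to neither set and are dropped.
    return train, test


# Hypothetical bipartite network: 3 row vertices x 2 column vertices.
rows, cols = ["d1", "d2", "d3"], ["t1", "t2"]
pairs = list(product(rows, cols))

# Setting 1: predict a missing edge between known vertices.
train, test = split_new_pairs(pairs, {("d1", "t1")})
print(len(train), len(test))  # 5 1

# Setting 2: predict all edges of a new row vertex.
train, test = split_new_vertices(pairs, held_rows={"d3"})
print(len(train), len(test))  # 4 2

# Setting 3: predict edges where both vertices are new.
train, test = split_new_vertices(pairs, held_rows={"d3"}, held_cols={"t2"})
print(len(train), len(test))  # 2 1
```

Evaluating a model on the first split says little about its performance on the second or third, which is why hyperparameters should be tuned with the split that matches the intended use.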

6.
Nucleic Acids Res ; 47(6): e36, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30753697

ABSTRACT

Annotation of gene expression in prokaryotes is often corrected because of small variations in the annotated gene regions observed between different (sub)species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network utilizing features extracted from ribosome profiling information and binding site sequence patterns, which proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, integrating the information gained from both the high-throughput ribosome profiling data and the ribosome binding translation initiation sequence region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, used for the identification of open reading frames in prokaryotes without a priori knowledge of the translational landscape. The effectiveness of DeepRibo is highlighted through extensive validation of the model trained on various sets of data, using multi-species sequence similarity and proteins verified by mass spectrometry and Edman degradation.


Subjects
Algorithms; Molecular Sequence Annotation/methods; Prokaryotic Cells/metabolism; Protein Biosynthesis/physiology; Ribosomes/metabolism; Binding Sites; Computational Biology/methods; Datasets as Topic; High-Throughput Screening Assays/methods; Neural Networks, Computer; Open Reading Frames; Prokaryotic Cells/chemistry; Protein Processing, Post-Translational; Sequence Alignment/methods; Signal Transduction
7.
Anal Chem ; 92(11): 7523-7531, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32330016

ABSTRACT

In diagnostics of infectious diseases, matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry (MALDI-TOF MS) can be applied for the identification of pathogenic microorganisms. However, to achieve a trustworthy identification from MALDI-TOF MS data, a significant amount of biomass should be considered. The bacterial load that potentially occurs in a sample is therefore routinely amplified by culturing, which is a time-consuming procedure. In this paper, we show that culturing can be avoided by conducting MALDI-TOF MS on individual bacterial cells. This results in a more rapid identification of species with an acceptable accuracy. We propose a deep learning architecture to analyze the data and compare its performance with traditional supervised machine learning algorithms. We illustrate our workflow on a large data set that contains bacterial species related to urinary tract infections. Overall we obtain accuracies up to 85% in discriminating five different species.


Subjects
Deep Learning; Gram-Negative Bacteria/cytology; Gram-Negative Bacteria/pathogenicity; Gram-Positive Bacteria/cytology; Gram-Positive Bacteria/pathogenicity; Single-Cell Analysis; Aerosols/chemistry; Gram-Negative Bacteria/isolation & purification; Gram-Positive Bacteria/isolation & purification; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
8.
Cytometry A ; 97(7): 713-726, 2020 07.
Article in English | MEDLINE | ID: mdl-31889414

ABSTRACT

Investigating phenotypic heterogeneity can help to better understand and manage microbial communities. However, characterizing phenotypic heterogeneity remains a challenge, as there is no standardized analysis framework. Several optical tools are available, such as flow cytometry and Raman spectroscopy, which describe optical properties of the individual cell. In this work, we compare Raman spectroscopy and flow cytometry to study phenotypic heterogeneity in bacterial populations. The growth stages of three replicate Escherichia coli populations were characterized using both technologies. Our findings show that flow cytometry detects and quantifies shifts in phenotypic heterogeneity at the population level due to its high-throughput nature. Raman spectroscopy, on the other hand, offers a much higher resolution at the single-cell level (i.e., more biochemical information is recorded). Therefore, it can identify distinct phenotypic populations when coupled with analyses tailored toward single-cell data. In addition, it provides information about biomolecules that are present, which can be linked to cell functionality. We propose a computational workflow to distinguish between bacterial phenotypic populations using Raman spectroscopy and validated this approach with an external data set. We recommend using flow cytometry to quantify phenotypic heterogeneity at the population level, and Raman spectroscopy to perform a more in-depth analysis of heterogeneity at the single-cell level. © 2019 International Society for Advancement of Cytometry.


Subjects
Bacteria; Spectrum Analysis, Raman; Escherichia coli/genetics; Flow Cytometry; Phenotype; Single-Cell Analysis
9.
Cytometry A ; 95(7): 782-791, 2019 07.
Article in English | MEDLINE | ID: mdl-31099963

ABSTRACT

Recent years have seen an increased interest in employing data analysis techniques for the automated identification of cell populations in the field of cytometry. These techniques highly depend on the use of a distance metric, a function that quantifies the distances between single-cell measurements. In most cases, researchers simply use the Euclidean distance metric. In this article, we exploit the availability of single-cell labels to find an optimal Mahalanobis distance metric derived from the data. We show that such a Mahalanobis distance metric results in an improved identification of cell populations compared with the Euclidean distance metric. Once determined, it can be used for the analysis of multiple samples that were measured under the same experimental setup. We illustrate this approach for cytometry data from two different origins, that is, flow cytometry applied to microbial cells and mass cytometry for the analysis of human blood cells. We also illustrate that such a distance metric results in an improved identification of cell populations when clustering methods are employed. Generally, these results imply that the performance of data analysis techniques can be improved by using a more advanced distance metric. © 2019 International Society for Advancement of Cytometry.
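
For readers unfamiliar with the two metrics compared in this abstract, the sketch below computes a Mahalanobis-type distance sqrt((x - y)^T M (x - y)) and shows that it reduces to the Euclidean distance when M is the identity. The matrix `M` and the two measurement vectors are made-up values; the abstract does not describe how the authors learn their metric from labels, and that step is not shown here.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a symmetric PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    quad = sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


# The identity matrix recovers the plain Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical learned matrix that down-weights the second channel.
M = [[1.0, 0.0], [0.0, 0.25]]

x, y = [0.0, 0.0], [3.0, 8.0]
print(mahalanobis(x, y, identity))  # sqrt(73), about 8.54
print(mahalanobis(x, y, M))         # 5.0
```

Two cells that look far apart under the Euclidean metric can end up close once a noisy channel is down-weighted, which is the mechanism by which a data-derived metric can change population assignments.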


Subjects
Flow Cytometry/methods; Machine Learning; Pattern Recognition, Automated/methods; Algorithms; Bacteria/cytology; Blood Cells/cytology; Cluster Analysis; Humans; Microbiota; Single-Cell Analysis
10.
Appl Environ Microbiol ; 85(8), 2019 04 15.
Article in English | MEDLINE | ID: mdl-30796063

ABSTRACT

Isogenic bacterial populations are known to exhibit phenotypic heterogeneity at the single-cell level. Because of difficulties in assessing the phenotypic heterogeneity of a single taxon in a mixed community, the importance of this deeper level of organization remains relatively unknown for natural communities. In this study, we have used membrane-based microcosms that allow the probing of the phenotypic heterogeneity of a single taxon while interacting with a synthetic or natural community. Individual taxa were studied under axenic conditions, as members of a coculture with physical separation, and as a mixed culture. Phenotypic heterogeneity was assessed through both flow cytometry and Raman spectroscopy. Using this setup, we investigated the effect of microbial interactions on the individual phenotypic heterogeneities of two interacting drinking water isolates. Through flow cytometry we have demonstrated that interactions between these bacteria lead to a reduction of their individual phenotypic diversities and that this adjustment is conditional on the bacterial taxon. Single-cell Raman spectroscopy confirmed a taxon-dependent phenotypic shift due to the interaction. In conclusion, our data suggest that bacterial interactions may be a general driver of phenotypic heterogeneity in mixed microbial populations.
IMPORTANCE: Laboratory studies have shown the impact of phenotypic heterogeneity on the survival and functionality of isogenic populations. Because phenotypic heterogeneity plays an important role in pathogenicity and virulence, antibiotic resistance, biotechnological applications, and ecosystem properties, it is crucial to understand its influencing factors. An unanswered question is whether bacteria in mixed communities influence the phenotypic heterogeneity of their community partners. We found that coculturing bacteria leads to a reduction in their individual phenotypic heterogeneities, which led us to the hypothesis that the individual phenotypic diversity of a taxon is dependent on the community composition.


Subjects
Axenic Culture; Bacteria/growth & development; Bacterial Physiological Phenomena; Coculture Techniques; Microbial Interactions/physiology; Bacteria/genetics; Biodiversity; DNA, Bacterial; Ecosystem; Enterobacter/genetics; Enterobacter/growth & development; Enterobacter/physiology; Environment; Environmental Microbiology; Flow Cytometry; Genetic Heterogeneity; Phenotype; Pseudomonas/genetics; Pseudomonas/growth & development; Pseudomonas/physiology; Virulence
11.
Nucleic Acids Res ; 45(7): e51, 2017 04 20.
Article in English | MEDLINE | ID: mdl-27986855

ABSTRACT

In microRNA (miRNA) target prediction, typically two levels of information need to be modeled: the number of potential miRNA binding sites present in a target mRNA and the genomic context of each individual site. Single model structures cope poorly with this complex training data structure, which consists of feature vectors of unequal length as a consequence of the varying number of miRNA binding sites in different mRNAs. To circumvent this problem, we developed a two-layered, stacked model, in which the influence of binding site context is separately modeled. Using logistic regression and random forests, we applied the stacked model approach to a unique data set of 7990 probed miRNA-mRNA interactions, thereby including the largest number of miRNAs in model training to date. Compared to lower-complexity models, a particular stacked model, named miSTAR (miRNA stacked model target prediction; www.mi-star.org), displays a higher general performance and precision on top-scoring predictions. More importantly, our model outperforms published and widely used miRNA target prediction algorithms. Finally, we highlight flaws in cross-validation schemes for evaluation of miRNA target prediction models and adopt a fairer and more stringent approach.


Subjects
3' Untranslated Regions; MicroRNAs/metabolism; Models, Genetic; Algorithms; Binding Sites; Humans; Machine Learning; RNA, Messenger/metabolism; Software
12.
Neural Comput ; 30(8): 2245-2283, 2018 08.
Article in English | MEDLINE | ID: mdl-29894652

ABSTRACT

Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.

13.
Cytometry A ; 91(12): 1184-1191, 2017 12.
Article in English | MEDLINE | ID: mdl-29165907

ABSTRACT

Multicolor approaches are challenging for microbial flow cytometry; as flow cytometers are mainly developed for biomedical applications, modern instruments contain more detectors than needed. Some of these additional fluorescence detectors measure biological information due to spectral overlap, yet the extent to which this information is relevant for the identification of bacterial populations is ambiguous. In this paper we characterize the usefulness of these additional detectors. We propose a data-driven detector selection method to select the smallest subset of detectors that will optimally discriminate between bacterial populations. Using a detector elimination strategy, we show that one or more detectors can be removed without loss of resolving power. A number of additional detectors are included in the final subset, which help to improve the identification of bacterial populations. Experimental data were retrieved from two types of modern cytometers with different configurations. The method reveals a clear ordering of detector importances, which depends on the instrument from which the data were retrieved. In addition, we were able to pinpoint unexpected behavior of SYBR Green I in the red spectrum. As the field of microbial flow cytometry is maturing, these results motivate the construction of a different kind of cytometric instruments for microbiologists, for which the number of detectors is reduced, but tailored toward the characteristics of microbial experiments. © 2017 International Society for Advancement of Cytometry.


Subjects
Bacteria/isolation & purification; Flow Cytometry/methods
14.
Nat Plants ; 10(3): 390-401, 2024 03.
Article in English | MEDLINE | ID: mdl-38467801

ABSTRACT

Scientific testing including stable isotope ratio analysis (SIRA) and trace element analysis (TEA) is critical for establishing plant origin, tackling deforestation and enforcing economic sanctions. Yet methods combining SIRA and TEA into robust models for origin verification and determination are lacking. Here we report (1) a large Eastern European timber reference database (Betula, Fagus, Pinus, Quercus) tailored to sanctioned products following the Ukraine invasion; (2) a statistical test to verify samples against a claimed origin; and (3) a probabilistic model of SIRA, TEA and genus distribution data, using Gaussian processes, to determine timber harvest location. Our verification method rejects 40-60% of simulated false claims, depending on the spatial scale of the claim, and maintains a low probability of rejecting correct origin claims. Our determination method predicts harvest location within 180 to 230 km of true location. Our results showcase the power of combining data types with probabilistic modelling to identify and scrutinize timber harvest location claims.


Subjects
Fagus; Pinus; Ukraine; Betula; Genes, Plant
15.
Article in English | MEDLINE | ID: mdl-33125335

ABSTRACT

In genomics, a wide range of machine learning methodologies have been investigated to annotate biological sequences for positions of interest such as transcription start sites, translation initiation sites, methylation sites, splice sites and promoter start sites. In recent years, this area has been dominated by convolutional neural networks, which typically outperform previously-designed methods as a result of automated scanning for influential sequence motifs. However, those architectures do not allow for the efficient processing of the full genomic sequence. As an improvement, we introduce transformer architectures for whole genome sequence labeling tasks. We show that these architectures, recently introduced for natural language processing, are better suited for processing and annotating long DNA sequences. We apply existing networks and introduce an optimized method for the calculation of attention from input nucleotides. To demonstrate this, we evaluate our architecture on several sequence labeling tasks, and find it to achieve state-of-the-art performances when comparing it to specialized models for the annotation of transcription start sites, translation initiation sites and 4mC methylation in E. coli.


Subjects
Escherichia coli; Genomics; Base Sequence; Machine Learning; Neural Networks, Computer
16.
Intensive Crit Care Nurs ; 68: 103117, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34393009

ABSTRACT

OBJECTIVE: To determine risk factors for pressure injury in distinct intensive care subpopulations according to admission type (Medical; Surgical elective; Surgical emergency; Trauma/Burns). METHODOLOGY/DESIGN: Predictive modelling using generalised linear mixed models with backward elimination on prospectively gathered data of 13 044 adult intensive care patients. SETTINGS: 1110 intensive care units, 89 countries worldwide. MAIN OUTCOME MEASURES: Pressure injury risk factors. RESULTS: A generalised linear mixed model including admission type outperformed a model without admission type (p = 0.004). Admission type Trauma/Burns was not retained in the model and was excluded from further analyses. For the other three admission types (Medical, Surgical elective, and Surgical emergency), backward elimination resulted in distinct prediction models with 23, 17, and 16 predictors, respectively, and only five common predictors. The area under the receiver operating characteristic curve was 0.79 for Medical admissions and 0.88 for both the Surgical elective and Surgical emergency models. CONCLUSIONS: Risk factors for pressure injury differ according to whether intensive care patients have been admitted for medical reasons, or elective or emergency surgery. Prediction models for pressure injury should target distinct subpopulations with differing pressure injury risk profiles. Type of intensive care admission is a simple and easily retrievable parameter to distinguish between such subgroups.


Subjects
Critical Care; Intensive Care Units; Pressure Ulcer; Adult; Humans; Hospital Mortality; Hospitalization; Retrospective Studies; Risk Factors; ROC Curve
17.
Methods Mol Biol ; 2516: 51-59, 2022.
Article in English | MEDLINE | ID: mdl-35922621

ABSTRACT

A major goal in synthetic biology is the engineering of synthetic gene circuits with a predictable, controlled and designed outcome. This creates a need for building blocks that can modulate gene expression without interference with the native cell system. A tool allowing forward engineering of promoters with predictable transcription initiation frequency is still lacking. Promoter libraries specific for σ70 to ensure the orthogonality of gene expression were built in Escherichia coli and labeled using fluorescence-activated cell sorting to obtain high-throughput DNA sequencing data to train a convolutional neural network. We were able to confirm in vivo that the model is able to predict the promoter transcription initiation frequency (TIF) of new promoter sequences. Here, we provide an online tool for promoter design (ProD) in E. coli, which can be used to tailor output sequences of desired promoter TIF or predict the TIF of a custom sequence.


Subjects
Escherichia coli Proteins; Escherichia coli; Escherichia coli/genetics; Escherichia coli/metabolism; Escherichia coli Proteins/metabolism; High-Throughput Nucleotide Sequencing; Promoter Regions, Genetic; Synthetic Biology
18.
mSphere ; 6(1), 2021 02 03.
Article in English | MEDLINE | ID: mdl-33536320

ABSTRACT

Microbial flow cytometry can rapidly characterize the status of microbial communities. Upon measurement, large amounts of quantitative single-cell data are generated, which need to be analyzed appropriately. Cytometric fingerprinting approaches are often used for this purpose. Traditional approaches either require a manual annotation of regions of interest, do not fully consider the multivariate characteristics of the data, or result in many community-describing variables. To address these shortcomings, we propose an automated model-based fingerprinting approach based on Gaussian mixture models, which we call PhenoGMM. The method successfully quantifies changes in microbial community structure based on flow cytometry data, which can be expressed in terms of cytometric diversity. We evaluate the performance of PhenoGMM using data sets from both synthetic and natural ecosystems and compare the method with a generic binning fingerprinting approach. PhenoGMM supports the rapid and quantitative screening of microbial community structure and dynamics.
IMPORTANCE: Microorganisms are vital components in various ecosystems on Earth. In order to investigate the microbial diversity, researchers have largely relied on the analysis of 16S rRNA gene sequences from DNA. Flow cytometry has been proposed as an alternative technology to characterize microbial community diversity and dynamics. The technology enables a fast measurement of optical properties of individual cells. So-called fingerprinting techniques are needed in order to describe microbial community diversity and dynamics based on flow cytometry data. In this work, we propose a more advanced fingerprinting strategy based on Gaussian mixture models. We evaluated our workflow on data sets from both synthetic and natural ecosystems, illustrating its general applicability for the analysis of microbial flow cytometry data. PhenoGMM supports a rapid and quantitative analysis of microbial community structure using flow cytometry.


Subjects
Flow Cytometry/methods; Microbiota; Normal Distribution; Biodiversity
19.
ISME J ; 15(1): 354-358, 2021 01.
Article in English | MEDLINE | ID: mdl-32879459

ABSTRACT

Variations in the gut microbiome have been associated with changes in health state such as Crohn's disease (CD). Most surveys characterize the microbiome through analysis of the 16S rRNA gene. An alternative technology that can be used is flow cytometry. In this report, we reanalyzed a disease cohort that has been characterized by both technologies. Changes in microbial community structure are reflected in both types of data. We demonstrate that cytometric fingerprints can be used as a diagnostic tool in order to classify samples according to CD state. These results highlight the potential of flow cytometry to perform rapid diagnostics of microbiome-associated diseases.


Subjects
Crohn Disease; Gastrointestinal Microbiome; Microbiota; Crohn Disease/diagnosis; Feces; Humans; RNA, Ribosomal, 16S/genetics
20.
Comput Struct Biotechnol J ; 19: 6157-6168, 2021.
Article in English | MEDLINE | ID: mdl-34938408

ABSTRACT

Today machine learning methods are commonly deployed for bacterial species identification using MALDI-TOF mass spectrometry data. However, most of the studies reported in the literature only consider very traditional machine learning methods on small datasets that contain a limited number of species. In this paper we present benchmarking results on an unprecedented scale for a wide range of machine learning methods, using datasets that contain almost 100,000 spectra and more than 1000 different species. The size and the diversity of the data allow us to compare three important identification scenarios that are often not distinguished in the literature, i.e., identification for novel biological replicates, novel strains and novel species that are not present in the training data. The results demonstrate that in all three scenarios acceptable identification rates are obtained, but the numbers are typically lower than those reported in studies with a more limited analysis. Using hierarchical classification methods, we also demonstrate that taxonomic information is in general not well preserved in MALDI-TOF mass spectrometry data. For the novel species scenario, we apply for the first time neural networks with Monte Carlo dropout, which have been shown to be successful in other domains, such as computer vision, for the detection of novel species.
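
The idea behind using Monte Carlo dropout for novelty detection can be sketched as follows: the same spectrum is passed through the network several times with dropout left on, and disagreement between the passes is read as uncertainty. The abstract does not say which uncertainty score or threshold the authors use, so the predictive entropy, the threshold of 0.7 and the probability vectors below are all assumptions chosen for illustration.

```python
import math


def mc_dropout_novelty(samples, threshold):
    """Summarize T stochastic forward passes.

    samples: list of T class-probability vectors for one input.
    Returns (predicted class index, predictive entropy, novelty flag).
    """
    n_classes = len(samples[0])
    # Average the class probabilities over the T passes.
    mean = [sum(s[k] for s in samples) / len(samples) for k in range(n_classes)]
    # Entropy of the averaged distribution (assumed uncertainty score).
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    predicted = max(range(n_classes), key=lambda k: mean[k])
    return predicted, entropy, entropy > threshold


# Made-up outputs: the passes agree for a known species ...
known = [[0.90, 0.05, 0.05], [0.95, 0.03, 0.02], [0.92, 0.04, 0.04]]
# ... and disagree strongly for a species absent from training.
novel = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

print(mc_dropout_novelty(known, 0.7))  # low entropy, not flagged
print(mc_dropout_novelty(novel, 0.7))  # entropy near log(3), flagged
```

In practice the threshold would have to be calibrated on held-out data, since a fixed value such as 0.7 has no meaning outside this toy example.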
