1.
BMC Bioinformatics ; 25(1): 174, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698340

ABSTRACT

BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction .


Subjects
Computational Biology, Proteins, Proteins/chemistry, Proteins/metabolism, Computational Biology/methods, Protein Databases, Algorithms
2.
PLoS Comput Biol ; 18(6): e1010177, 2022 06.
Article in English | MEDLINE | ID: mdl-35658018

ABSTRACT

Engineered microbial cells present a sustainable alternative to fossil-based synthesis of chemicals and fuels. Cellular synthesis routes are readily assembled and introduced into microbial strains using state-of-the-art synthetic biology tools. However, the optimization of the strains required to reach industrially feasible production levels is far less efficient. It typically relies on trial and error, leading to high uncertainty in total duration and cost. New techniques that can cope with the complexity and limited mechanistic knowledge of cellular regulation are needed to guide strain optimization. In this paper, we put forward a multi-agent reinforcement learning (MARL) approach that learns from experiments to tune the metabolic enzyme levels so that production is improved. Our method is model-free and does not assume prior knowledge of the microbe's metabolic network or its regulation. The multi-agent approach is well suited to make use of parallel experiments such as the multi-well plates commonly used for screening microbial strains. We demonstrate the method's capabilities using the genome-scale kinetic model of Escherichia coli, k-ecoli457, as a surrogate for in vivo cell behaviour in cultivation experiments. We investigate aspects of the method's performance relevant to practical applicability in strain engineering, i.e. the speed of convergence towards the optimum response, noise tolerance, and the statistical stability of the solutions found. We further evaluate the proposed MARL approach in improving L-tryptophan production by the yeast Saccharomyces cerevisiae, using publicly available experimental data on the performance of a combinatorial strain library. Overall, our results show that multi-agent reinforcement learning is a promising approach for guiding strain optimization beyond mechanistic knowledge, with the goal of obtaining industrially attractive production levels faster and more reliably.


Subjects
Metabolic Engineering, Saccharomyces cerevisiae, Escherichia coli/genetics, Escherichia coli/metabolism, Metabolic Engineering/methods, Metabolic Networks and Pathways, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Synthetic Biology
3.
Nat Methods ; 16(4): 299-302, 2019 04.
Article in English | MEDLINE | ID: mdl-30886413

ABSTRACT

Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4 (https://bio.informatik.uni-jena.de/sirius/), which provides a fast computational approach for molecular structure identification. SIRIUS 4 integrates CSI:FingerID for searching in molecular structure databases. Using SIRIUS 4, we achieved identification rates of more than 70% on challenging metabolomics datasets.


Subjects
Metabolomics/methods, Molecular Structure, Computer-Assisted Signal Processing, Tandem Mass Spectrometry/methods, Algorithms, Bayes Theorem, Biomarkers, Cluster Analysis, Computational Biology/methods, Computer Graphics, Factual Databases, Electronic Data Processing, Internet, Isotopes, Likelihood Functions, Metabolome, Neural Networks (Computer), Programming Languages, User-Computer Interface
4.
Bioinformatics ; 37(12): 1724-1731, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33244585

ABSTRACT

MOTIVATION: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications solely based on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2). RESULTS: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining MS2 data and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features has MS2 measurements available besides MS1. AVAILABILITY AND IMPLEMENTATION: Software and data are freely available at https://github.com/aalto-ics-kepaco/msms_rt_score_integration. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Tandem Mass Spectrometry, Liquid Chromatography, Statistical Models
5.
Bioinformatics ; 37(Suppl_1): i93-i101, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252952

ABSTRACT

MOTIVATION: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes comprehensive experimental screening infeasible in practice. Machine-learning models offer time- and cost-efficient means to aid this process by prioritizing the most effective drug combinations for further pre-clinical and clinical validation. However, the complexity of the underlying interaction patterns across multiple drug doses and in different cellular contexts poses challenges to the predictive modeling of drug combination effects. RESULTS: We introduce comboLTR, a highly time-efficient method for learning complex, non-linear target functions for describing the responses of therapeutic agent combinations in various doses and cancer cell contexts. The method is based on polynomial regression via powerful latent tensor reconstruction. It uses a combination of recommender system-style features indexing the data tensor of response values in different contexts, and chemical and multi-omics features as inputs. We demonstrate that comboLTR outperforms state-of-the-art methods in terms of predictive performance and running time, and produces highly accurate results even in the challenging and practical inference scenario where full dose-response matrices are predicted for completely new drug combinations with no available combination and monotherapy response measurements in any training cell line. AVAILABILITY AND IMPLEMENTATION: comboLTR code is available at https://github.com/aalto-ics-kepaco/ComboLTR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Neoplasms, Algorithms, Cell Line, Drug Combinations, Humans
6.
PLoS Comput Biol ; 17(5): e1008920, 2021 05.
Article in English | MEDLINE | ID: mdl-33945539

ABSTRACT

Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.


Subjects
Microbial Genetics/statistics & numerical data, Genomics/statistics & numerical data, Metabolomics/statistics & numerical data, Software, Biosynthetic Pathways/genetics, Computational Biology, Data Mining, Factual Databases, Genetic Databases, Microbial Genome, Microbiological Phenomena, Multigene Family, Regression Analysis
7.
Bioinformatics ; 35(14): i548-i557, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510676

ABSTRACT

MOTIVATION: Metabolic flux balance analysis (FBA) is a standard tool in analyzing metabolic reaction rates compatible with measurements, steady-state and the metabolic reaction network stoichiometry. Flux analysis methods commonly place model assumptions on fluxes due to the convenience of formulating the problem as a linear programming model, while many methods do not consider the inherent uncertainty in flux estimates. RESULTS: We introduce a novel paradigm of Bayesian metabolic flux analysis that models the reactions of the whole genome-scale cellular system in probabilistic terms, and can infer the full flux vector distribution of genome-scale metabolic systems based on exchange and intracellular (e.g. 13C) flux measurements, steady-state assumptions, and objective function assumptions. The Bayesian model couples all fluxes jointly in a simple truncated multivariate posterior distribution, which reveals informative flux couplings. Our model is a plug-in replacement for conventional metabolic balance methods, such as FBA. Our experiments indicate that we can characterize the genome-scale flux covariances, reveal flux couplings, and determine more intracellular unobserved fluxes in Clostridium acetobutylicum from 13C data than flux variability analysis. AVAILABILITY AND IMPLEMENTATION: The COBRA compatible software is available at github.com/markusheinonen/bamfa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Clostridium acetobutylicum, Metabolic Flux Analysis, Bayes Theorem, Metabolic Networks and Pathways, Biological Models
8.
Appl Microbiol Biotechnol ; 104(24): 10515-10529, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33147349

ABSTRACT

In this work, deoxyribose-5-phosphate aldolase (Ec DERA, EC 4.1.2.4) from Escherichia coli was chosen as the protein engineering target for improving the substrate preference towards smaller, non-phosphorylated aldehyde donor substrates, in particular towards acetaldehyde. The initial broad set of mutations was directed to 24 amino acid positions in the active site or in the close vicinity, based on the 3D complex structure of the E. coli DERA wild-type aldolase. The specific activity of the DERA variants containing one to three amino acid mutations was characterised using three different substrates. A novel machine learning (ML) model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations. This led to the most clear-cut (two- to threefold) improvement in acetaldehyde (C2) addition capability with the concomitant abolishment of the activity towards the natural donor molecule glyceraldehyde-3-phosphate (C3P) as well as the non-phosphorylated equivalent (C3). The Ec DERA variants were also tested on aldol reaction utilising formaldehyde (C1) as the donor. Ec DERA wild-type was shown to be able to carry out this reaction, and furthermore, some of the improved variants on acetaldehyde addition reaction turned out to have also improved activity on formaldehyde. KEY POINTS: • DERA aldolases are promiscuous enzymes. • Synthetic utility of DERA aldolase was improved by protein engineering approaches. • Machine learning methods aid the protein engineering of DERA.


Subjects
Escherichia coli, Fructose-Bisphosphate Aldolase, Aldehyde-Lyases/genetics, Aldehyde-Lyases/metabolism, Escherichia coli/genetics, Escherichia coli/metabolism, Fructose-Bisphosphate Aldolase/genetics, Machine Learning, Protein Engineering, Substrate Specificity
9.
Bioinformatics ; 34(17): i875-i883, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423079

ABSTRACT

Motivation: Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification. In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results: We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. To this end, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run. Availability and implementation: Implementation of the method is available at https://version.aalto.fi/gitlab/bache1/retention_order_prediction.git.
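
The idea of learning from retention order rather than retention time can be illustrated with a small sketch. The Python below is not the authors' implementation. It assumes a hypothetical list of `(system, molecule, retention_time)` records with made-up values, and only shows how within-system comparisons turn heterogeneous retention times into pairwise order labels that can be pooled across LC systems.

```python
from itertools import combinations


def retention_order_pairs(measurements):
    """Convert retention times into pairwise elution-order labels.

    measurements: iterable of (system, molecule, retention_time).
    Returns a set of (earlier, later) molecule pairs. Times are only
    compared within the same LC system, so absolute time scales of
    different systems never need to be aligned.
    """
    by_system = {}
    for system, molecule, rt in measurements:
        by_system.setdefault(system, []).append((molecule, rt))

    pairs = set()
    for items in by_system.values():
        for (m1, t1), (m2, t2) in combinations(items, 2):
            if t1 < t2:
                pairs.add((m1, m2))
            elif t2 < t1:
                pairs.add((m2, m1))
            # Ties carry no order information and are skipped.
    return pairs


# Hypothetical toy data: two systems with very different time scales.
toy_data = [
    ("system_1", "mol_a", 0.8),
    ("system_1", "mol_c", 2.1),
    ("system_2", "mol_a", 1.9),
    ("system_2", "mol_b", 3.0),
    ("system_2", "mol_c", 5.4),
]
order_pairs = retention_order_pairs(toy_data)
print(sorted(order_pairs))
```

Both toy systems agree that `mol_a` elutes before `mol_c` even though the raw times differ, which is the property the abstract describes as being better conserved between instruments. A ranking model could then be trained on such pairs; that step is outside this sketch.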


Subjects
Liquid Chromatography/methods, Tandem Mass Spectrometry/methods
10.
Bioinformatics ; 34(14): 2409-2417, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29420676

ABSTRACT

Motivation: In the analysis of metabolism, two distinct and complementary approaches are frequently used: Principal component analysis (PCA) and stoichiometric flux analysis. PCA is able to capture the main modes of variability in a set of experiments and does not make many prior assumptions about the data, but does not inherently take into account the flux mode structure of metabolism. Stoichiometric flux analysis methods, such as Flux Balance Analysis (FBA) and Elementary Mode Analysis, on the other hand, are able to capture the metabolic flux modes; however, they are primarily designed for the analysis of single samples at a time, and are not best suited for exploratory analysis on large sets of samples. Results: We propose a new methodology for the analysis of metabolism, called Principal Metabolic Flux Mode Analysis (PMFA), which marries the PCA and stoichiometric flux analysis approaches in an elegant regularized optimization framework. In short, the method incorporates a variance maximization objective from PCA coupled with a stoichiometric regularizer, which penalizes projections that are far from any flux modes of the network. For interpretability, we also introduce a sparse variant of PMFA that favours flux modes that contain a small number of reactions. Our experiments demonstrate the versatility and capabilities of our methodology. The proposed method can be applied to genome-scale metabolic networks in an efficient way, as PMFA does not enumerate elementary modes. In addition, the method is more robust on out-of-steady-state experimental data than competing flux mode analysis approaches. Availability and implementation: Matlab software for PMFA and SPMFA and the dataset used for experiments are available at https://github.com/aalto-ics-kepaco/PMFA. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Metabolic Flux Analysis/methods, Metabolic Networks and Pathways, Biological Models, Software, Principal Component Analysis
11.
Bioinformatics ; 34(13): i509-i518, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949975

ABSTRACT

Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco. Supplementary information: Supplementary data are available at Bioinformatics online.
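
To see why avoiding the explicit pairwise matrix matters, a small sketch may help. It is not the pairwiseMKL algorithm, and the abstract does not state how the pairwise kernels are constructed. The sketch assumes the common product (Kronecker) construction, in which the kernel between pairs (d, t) and (d', t') is `Kd[d][d'] * Kt[t][t']`, and uses hypothetical toy matrices. It compares a prediction that touches every entry of the pairwise kernel with the factored form `Kd · A · Ktᵀ`, which never builds it.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def predict_naive(Kd, Kt, A):
    """Evaluate every pairwise kernel entry Kd[d][d2] * Kt[t][t2] explicitly.

    Cost grows with (n_drugs * n_targets) ** 2.
    """
    nd, nt = len(Kd), len(Kt)
    return [[sum(Kd[d][d2] * Kt[t][t2] * A[d2][t2]
                 for d2 in range(nd) for t2 in range(nt))
             for t in range(nt)]
            for d in range(nd)]


def predict_factored(Kd, Kt, A):
    """Same predictions via Kd @ A @ Kt^T, without the pairwise matrix."""
    return matmul(matmul(Kd, A), transpose(Kt))


# Hypothetical toy inputs: 2 drugs, 3 targets, one dual coefficient per pair.
Kd = [[2.0, 1.0],
      [1.0, 3.0]]
Kt = [[1.0, 0.0, 1.0],
      [0.0, 2.0, 0.0],
      [1.0, 0.0, 1.0]]
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

naive = predict_naive(Kd, Kt, A)
fast = predict_factored(Kd, Kt, A)
print(fast)
```

The two functions return the same values, but the factored form only multiplies the small per-drug and per-target matrices. With hundreds of thousands of pairs, as in the experiments described above, this kind of factorization is what keeps memory use manageable.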


Subjects
Antineoplastic Agents/pharmacology, Computational Biology/methods, Drug Discovery/methods, Neoplasms/drug therapy, Support Vector Machine, Antineoplastic Agents/therapeutic use, Tumor Cell Line, Humans, Neoplasms/enzymology, Neoplasms/metabolism, Protein Kinases/drug effects, Protein Kinases/metabolism, Signal Transduction, Software, Treatment Outcome
12.
PLoS Comput Biol ; 13(8): e1005678, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28787438

ABSTRACT

Due to relatively high costs and labor required for experimental profiling of the full target space of chemical compounds, various machine learning models have been proposed as cost-effective means to advance this process in terms of predicting the most potent compound-target interactions for subsequent verification. However, most of the model predictions lack direct experimental validation in the laboratory, making their practical benefits for drug discovery or repurposing applications largely unknown. Here, we therefore introduce and carefully test a systematic computational-experimental framework for the prediction and pre-clinical verification of drug-target interactions using a well-established kernel-based regression algorithm as the prediction model. To evaluate its performance, we first predicted unmeasured binding affinities in a large-scale kinase inhibitor profiling study, and then experimentally tested 100 compound-kinase pairs. The relatively high correlation of 0.77 (p < 0.0001) between the predicted and measured bioactivities supports the potential of the model for filling the experimental gaps in existing compound-target interaction maps. Further, we subjected the model to a more challenging task of predicting target interactions for a new candidate drug compound that lacks prior binding profile information. As a specific case study, we used tivozanib, an investigational VEGF receptor inhibitor with currently unknown off-target profile. Among 7 kinases with high predicted affinity, we experimentally validated 4 new off-targets of tivozanib, namely the Src-family kinases FRK and FYN A, the non-receptor tyrosine kinase ABL1, and the serine/threonine kinase SLK. Our subsequent experimental validation protocol effectively avoids any possible information leakage between the training and validation data, and therefore enables rigorous model validation for practical applications. These results demonstrate that the kernel-based modeling approach offers practical benefits for probing novel insights into the mode of action of investigational compounds, and for the identification of new target selectivities for drug repurposing applications.


Subjects
Computational Biology/methods, Drug Discovery/methods, Statistical Models, Protein Kinase Inhibitors, Algorithms, Factual Databases, Humans, Protein Binding, Protein Kinase Inhibitors/chemistry, Protein Kinase Inhibitors/metabolism, Protein Kinase Inhibitors/pharmacology, Reproducibility of Results
13.
Proc Natl Acad Sci U S A ; 112(41): 12580-5, 2015 Oct 13.
Article in English | MEDLINE | ID: mdl-26392543

ABSTRACT

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.


Subjects
Protein Databases, Machine Learning, Mass Spectrometry, Metabolomics, Animals, Humans
15.
Microbiology (Reading) ; 163(6): 829-839, 2017 06.
Article in English | MEDLINE | ID: mdl-28635591

ABSTRACT

Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimization problem. However, the available methodologies are either impractical, due to a combinatorial explosion in the number of experiments to be performed, or are inaccessible to most experimentalists due to the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple-to-use and freely available graphical user interface to empower a broad range of experimental biologists to employ complex evolutionary algorithms to optimize their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation. We demonstrate the utility of this method using an example in which the culture conditions for the microbial production of a bioactive human protein are optimized. CamOptimus is available through: (https://doi.org/10.17863/CAM.10257).


Subjects
Computational Biology/methods, Muramidase/biosynthesis, Pichia/genetics, Algorithms, Biological Evolution, Biotechnology, Computational Biology/instrumentation, Humans, Internet, Muramidase/genetics, Pichia/metabolism, Recombinant Proteins/biosynthesis, Recombinant Proteins/genetics, Software
16.
Bioinformatics ; 32(12): i28-i36, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307628

ABSTRACT

MOTIVATION: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. RESULTS: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. CONTACT: celine.brouard@aalto.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
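
The two phases described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' code: it assumes a ridge-regularized regression for phase one, Gaussian kernels on both sides, a preimage search restricted to a given candidate list, and hypothetical feature vectors standing in for spectra and molecules. All function names and values are made up for the example.

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rank_candidates(X_train, Y_train, x_new, candidates, lam=0.1):
    """Return candidate indices ordered from best to worst match."""
    n = len(X_train)
    # Phase 1: kernel ridge regression into the output feature space.
    # The prediction is h(x) = sum_i alpha_i * psi(y_i), with
    # alpha = (Kx + lam * I)^-1 kx(x_new).
    K = [[gaussian_kernel(X_train[i], X_train[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_new = [gaussian_kernel(xi, x_new) for xi in X_train]
    alpha = solve(K, k_new)

    # Phase 2: preimage over the candidate list. Minimizing
    # ||psi(c) - h(x)||^2 only needs output-kernel evaluations; the term
    # that does not depend on the candidate c is dropped.
    def distance(c):
        cross = sum(a * gaussian_kernel(y, c) for a, y in zip(alpha, Y_train))
        return gaussian_kernel(c, c) - 2.0 * cross

    return sorted(range(len(candidates)), key=lambda i: distance(candidates[i]))


# Hypothetical toy data: 2-D "spectrum" features, 3-bit "molecule" vectors.
X_train = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
Y_train = [[1, 0, 0], [0, 1, 1], [0, 1, 0]]
candidates = [[0, 1, 1], [1, 0, 0]]
ranked = rank_candidates(X_train, Y_train, [0.05, 0.95], candidates)
print(ranked)
```

The query spectrum is closest to the first training spectrum, so the candidate that matches that training molecule is ranked first. Because both phases work through kernel evaluations only, the output objects never need an explicit vector representation, which is the property the abstract highlights.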


Subjects
Computational Biology/methods, Machine Learning, Metabolomics, Molecular Structure, Tandem Mass Spectrometry, Algorithms, Chemical Databases
17.
Bioinformatics ; 32(13): 1981-9, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27153689

ABSTRACT

MOTIVATION: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. RESULTS: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/aalto-ics-kepaco CONTACTS: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods, Genome-Wide Association Study, Multivariate Analysis, Algorithms, Genotype, Humans, Phenotype, Single Nucleotide Polymorphism
18.
BMC Med ; 14: 68, 2016 Apr 07.
Article in English | MEDLINE | ID: mdl-27055815

ABSTRACT

BACKGROUND: New treatment options are needed to maintain and improve therapy for tuberculosis, which caused the death of 1.5 million people in 2013 despite potential for an 86% treatment success rate. A greater understanding of Mycobacterium tuberculosis (M.tb) bacilli that persist through drug therapy will aid drug development programs. Predictive biomarkers for treatment efficacy are also a research priority. METHODS AND RESULTS: Genome-wide transcriptional profiling was used to map the mRNA signatures of M.tb from the sputa of 15 patients before and 3, 7 and 14 days after the start of standard regimen drug treatment. The mRNA profiles of bacilli through the first 2 weeks of therapy reflected drug activity at 3 days with transcriptional signatures at days 7 and 14 consistent with reduced M.tb metabolic activity similar to the profile of pre-chemotherapy bacilli. These results suggest that a pre-existing drug-tolerant M.tb population dominates sputum before and after early drug treatment, and that the mRNA signature at day 3 marks the killing of a drug-sensitive sub-population of bacilli. Modelling patient indices of disease severity with bacterial gene expression patterns demonstrated that both microbiological and clinical parameters were reflected in the divergent M.tb responses and provided evidence that factors such as bacterial load and disease pathology influence the host-pathogen interplay and the phenotypic state of bacilli. Transcriptional signatures were also defined that predicted measures of early treatment success (rate of decline in bacterial load over 3 days, TB test positivity at 2 months, and bacterial load at 2 months). CONCLUSIONS: This study defines the transcriptional signature of M.tb bacilli that have been expectorated in sputum after two weeks of drug therapy, characterizing the phenotypic state of bacilli that persist through treatment. We demonstrate that variability in clinical manifestations of disease is detectable in bacterial sputa signatures, and that the changing M.tb mRNA profiles 0-2 weeks into chemotherapy predict the efficacy of treatment 6 weeks later. These observations advocate assaying dynamic bacterial phenotypes through drug therapy as biomarkers for treatment success.


Subjects
Antitubercular Agents/administration & dosage , Drug Monitoring/methods , Mycobacterium tuberculosis , Messenger RNA/analysis , Pulmonary Tuberculosis , Bacillus , Chromosome Mapping/methods , Female , Humans , Male , Middle Aged , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/isolation & purification , Predictive Value of Tests , Sputum/microbiology , Pulmonary Tuberculosis/diagnosis , Pulmonary Tuberculosis/drug therapy , Pulmonary Tuberculosis/microbiology
19.
Bioinformatics ; 30(12): i157-64, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931979

ABSTRACT

MOTIVATION: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures. RESULTS: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidates list.


Subjects
Artificial Intelligence , Metabolomics/methods , Tandem Mass Spectrometry/methods , Algorithms , Molecular Structure , Software
20.
Appl Environ Microbiol ; 81(20): 7088-97, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26231646

ABSTRACT

Refrigerated food processing facilities are specific man-made niches likely to harbor cold-tolerant bacteria. To characterize this type of microbiota and study the link between processing plant and product microbiomes, we followed and compared microbiota associated with the raw materials and processing stages of a vacuum-packaged, cooked sausage product affected by a prolonged quality fluctuation with occasional spoilage manifestations during shelf life. A total of 195 samples were subjected to culturing and amplicon sequence analyses. Abundant mesophilic psychrotrophs were detected within the microbiomes throughout the different compartments of the production plant environment. However, each of the main genera of food safety and quality interest, e.g., Leuconostoc, Brochothrix, and Yersinia, had their own characteristic patterns of contamination. Bacteria from the genus Leuconostoc, commonly causing spoilage of cold-stored, modified-atmosphere-packaged foods, were detected in high abundance (up to >98%) in the sausages studied. The same operational taxonomic units (OTUs) were, however, detected in lower abundances in raw meat and emulsion (average relative abundance of 2%±5%), as well as on the processing plant surfaces (<4%). A completely different abundance profile was found for OTUs phylogenetically close to the species Yersinia pseudotuberculosis. These OTUs were detected in high abundance (up to 28%) on the processing plant surfaces but to a lesser extent (<1%) in raw meat, sausage emulsion, and sausages. The fact that Yersinia-like OTUs were found on the surfaces of a high-hygiene packaging compartment raises food safety concerns related to their resilient existence on surfaces.


Subjects
Bacteria/classification , Bacteria/isolation & purification , Biota , Environmental Microbiology , Meat Products/microbiology , Meat/microbiology , Refrigeration , Bacteria/genetics , Cold Temperature , Food Handling , Food Safety , Molecular Sequence Data , DNA Sequence Analysis