RESUMO
In metabolomics and other fields dealing with small compounds, mass spectrometry is applied as a sensitive high-throughput technique. Recently, fragmentation trees have been proposed to automatically analyze the fragmentation mass spectra recorded by such instruments. Computationally, this leads to the problem of finding a maximum weight subtree in an edge-weighted and vertex-colored graph, such that every color appears, at most once in the solution. We introduce new heuristics and an exact algorithm for this Maximum Colorful Subtree problem and evaluate them against existing algorithms on real-world and artificial datasets. Our tree completion heuristic consistently scores better than other heuristics, while the integer programming-based algorithm produces optimal trees with modest running times. Our fast and accurate heuristic can help determine molecular formulas based on fragmentation trees. On the other hand, optimal trees from the integer linear program are useful if structure is relevant, for example for tree alignments.
Assuntos
Algoritmos , Espectrometria de Massas/métodos , Bases de Dados como Assunto , Metaboloma , Metabolômica , Programação Linear , Reserpina/metabolismo , Fatores de TempoRESUMO
The automated fragmentation analysis of high resolution EI mass spectra based on a fragmentation tree algorithm is introduced. Fragmentation trees are constructed from EI spectra by automated signal extraction and evaluation. These trees explain relevant fragmentation reactions and assign molecular formulas to fragments. The method enables the identification of the molecular ion and the molecular formula of a metabolite if the molecular ion is present in the spectrum. These identifications are independent of existing library knowledge and, thus, support assignment and structural elucidation of unknown compounds. The method works even if the molecular ion is of very low abundance or hidden under contaminants with higher masses. We apply the algorithm to a selection of 50 derivatized and underivatized metabolites and demonstrate that in 78% of cases the molecular ion can be correctly assigned. The automatically constructed fragmentation trees correspond very well to published mechanisms and allow the assignment of specific relevant fragments and fragmentation pathways even in the most complex EI-spectra in our dataset. This method will be very helpful in the automated analysis of metabolites that are not included in common libraries and it thus has the potential to support the explorative character of metabolomics studies.
Assuntos
Algoritmos , Análise por Conglomerados , Cromatografia Gasosa-Espectrometria de Massas/métodos , Metabolômica/métodos , Bases de Dados Factuais , Modelos QuímicosRESUMO
MOTIVATION: Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of 'unknown' small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns. RESULTS: Aligning fragmentation trees is computationally hard. Nevertheless, we present three exact algorithms for the problem: a dynamic programming (DP) algorithm, a sparse variant of the DP, and an Integer Linear Program (ILP). Evaluation of our methods on three different datasets showed that thousands of alignments can be computed in a matter of minutes using DP, even for 'challenging' instances. Running times of the sparse DP were an order of magnitude better than for the classical DP. The ILP was clearly outperformed by both DP approaches. We also found that for both DP algorithms, computing the 1% slowest alignments required as much time as computing the 99% fastest.
Assuntos
Algoritmos , Biologia Computacional/métodos , Espectrometria de Massas , Metabolômica/métodos , Bases de Dados FactuaisRESUMO
Mass spectrometry allows sensitive, automated, and high-throughput analysis of small molecules. In principle, tandem mass spectrometry allows us to identify "unknown" small molecules not in any database, but the automated interpretation of such data is in its infancy. Fragmentation trees have recently been introduced for the automated analysis of the fragmentation patterns of small molecules. We present a method for the automated comparison of such fragmentation patterns, based on aligning the compounds' fragmentation trees. We cluster compounds based solely on their fragmentation patterns and show a good agreement with known compound classes. Fragmentation pattern similarities are strongly correlated with the chemical similarity of molecules. We present a tool for searching a database for compounds with fragmentation pattern similar to an unknown sample compound. We apply this tool to metabolites from Icelandic poppy. Our method allows fully automated computational identification of small molecules that cannot be found in any database.
Assuntos
Espectrometria de Massas/métodos , Estatística como Assunto/métodos , Análise por Conglomerados , Bases de Dados Factuais , Papaver/químicaRESUMO
Since metabolites cannot be predicted from the genome sequence, high-throughput de novo identification of small molecules is highly sought. Mass spectrometry (MS) in combination with a fragmentation technique is commonly used for this task. Unfortunately, automated analysis of such data is in its infancy. Recently, fragmentation trees have been proposed as an analysis tool for such data. Additional fragmentation steps (MS(n)) reveal more information about the molecule. We propose to use MS(n) data for the computation of fragmentation trees, and present the Colorful Subtree Closure problem to formalize this task: There, we search for a colorful subtree inside a vertex-colored graph, such that the weight of the transitive closure of the subtree is maximal. We give several negative results regarding the tractability and approximability of this and related problems. We then present an exact dynamic programming algorithm, which is parameterized by the number of colors in the graph and is swift in practice. Evaluation of our method on a dataset of 45 reference compounds showed that the quality of constructed fragmentation trees is improved by using MS(n) instead of MS² measurements.
Assuntos
Algoritmos , Interpretação Estatística de Dados , Espectrometria de Massas/métodos , Metaboloma , Espectrometria de Massas/normas , Modelos Químicos , Peso Molecular , Padrões de ReferênciaRESUMO
The structural elucidation of organic compounds in complex biofluids and tissues remains a significant analytical challenge. For mass spectrometry, the manual interpretation of collision-induced dissociation (CID) mass spectra is cumbersome and requires expert knowledge, as the fragmentation mechanisms of ions formed from small molecules are not completely understood. The automated identification of compounds is generally limited to searching in spectral libraries. Here, we present a method for interpreting the CID spectra of the organic compound's protonated ions by computing fragmentation trees that establish not only the molecular formula of the compound and all fragment ions but also the dependencies between fragment ions. This is an important step toward the automated identification of unknowns from the CID spectra of compounds that are not in any database.
Assuntos
Compostos Orgânicos/química , Estatística como Assunto/métodos , Espectrometria de Massas em Tandem/métodos , Sistemas Inteligentes , SoftwareRESUMO
Glycans are molecules made from simple sugars that form complex tree structures. Glycans constitute one of the most important protein modifications and identification of glycans remains a pressing problem in biology. Unfortunately, the structure of glycans is hard to predict from the genome sequence of an organism. In this paper, we consider the problem of deriving the topology of a glycan solely from tandem mass spectrometry (MS) data. We study, how to generate glycan tree candidates that sufficiently match the sample mass spectrum, avoiding the combinatorial explosion of glycan structures. Unfortunately, the resulting problem is known to be computationally hard. We present an efficient exact algorithm for this problem based on fixed-parameter algorithmics that can process a spectrum in a matter of seconds. We also report some preliminary results of our method on experimental data, combining it with a preliminary candidate evaluation scheme. We show that our approach is fast in applications, and that we can reach very well de novo identification results. Finally, we show how to count the number of glycan topologies for a fixed size or a fixed mass. We generalize this result to count the number of (labeled) trees with bounded out degree, improving on results obtained using Pólya's enumeration theorem.
Assuntos
Algoritmos , Biologia Computacional/métodos , Polissacarídeos/química , Espectrometria de Massas em Tandem/métodos , Configuração de Carboidratos , Bases de Dados FactuaisRESUMO
MOTIVATION: Mass spectrometry is among the most widely used technologies in proteomics and metabolomics. Being a high-throughput method, it produces large amounts of data that necessitates an automated analysis of the spectra. Clearly, database search methods for protein analysis can easily be adopted to analyze metabolite mass spectra. But for metabolites, de novo interpretation of spectra is even more important than for protein data, because metabolite spectra databases cover only a small fraction of naturally occurring metabolites: even the model plant Arabidopsis thaliana has a large number of enzymes whose substrates and products remain unknown. The field of bio-prospection searches biologically diverse areas for metabolites which might serve as pharmaceuticals. De novo identification of metabolite mass spectra requires new concepts and methods since, unlike proteins, metabolites possess a non-linear molecular structure. RESULTS: In this work, we introduce a method for fully automated de novo identification of metabolites from tandem mass spectra. Mass spectrometry data is usually assumed to be insufficient for identification of molecular structures, so we want to estimate the molecular formula of the unknown metabolite, a crucial step for its identification. The method first calculates all molecular formulas that explain the parent peak mass. Then, a graph is build where vertices correspond to molecular formulas of all peaks in the fragmentation mass spectra, whereas edges correspond to hypothetical fragmentation steps. Our algorithm afterwards calculates the maximum scoring subtree of this graph: each peak in the spectra must be scored at most once, so the subtree shall contain only one explanation per peak. Unfortunately, finding this subtree is NP-hard. We suggest three exact algorithms (including one fixed parameter tractable algorithm) as well as two heuristics to solve the problem. Tests on real mass spectra show that the FPT algorithm and the heuristics solve the problem suitably fast and provide excellent results: for all 32 test compounds the correct solution was among the top five suggestions, for 26 compounds the first suggestion of the exact algorithm was correct. AVAILABILITY: http://www.bio.inf.uni-jena.de/tandemms
Assuntos
Algoritmos , Proteínas de Arabidopsis/química , Proteínas de Arabidopsis/metabolismo , Arabidopsis/metabolismo , Perfilação da Expressão Gênica/métodos , Espectrometria de Massas/métodosRESUMO
MOTIVATION: An important tool for analyzing biological networks is the ability to perform homology searches, i.e. given a pattern network one would like to be able to search for occurrences of similar (sub)networks within a set of host networks. In the context of metabolic pathways, Pinter et al. [Bioinformatics, 2005] proposed to solve this computationally hard problem by restricting it to the case where both the pattern and host networks are trees. This restriction, however, severely limits the applicability of their algorithm. RESULTS: We propose a very fast and simple algorithm for the alignment of metabolic pathways that does not restrict the topology of the host or pattern network in any way; instead, our algorithm exploits a natural property of metabolic networks that we call 'local diversity property'. Experiments on a test bed of metabolic pathways from the BioCyc database indicate that our algorithm is much faster than the restricted algorithm of Pinter et al.-the metabolic pathways of two organisms can be aligned in mere seconds-and yet has a wider range of applicability and yields new biological insights. Our ideas can likely be extended to work for the alignment of various types of biological networks other than metabolic pathways. AVAILABILITY: Our algorithm has been implemented in C++ as a user-friendly metabolic pathway alignment tool called METAPAT. The tool runs under Linux or Windows and can be downloaded at http://theinf1.informatik.uni-jena.de/metapat/
Assuntos
Algoritmos , Expressão Gênica/fisiologia , Modelos Biológicos , Reconhecimento Automatizado de Padrão/métodos , Proteoma/metabolismo , Alinhamento de Sequência/métodos , Transdução de Sinais/fisiologia , Simulação por Computador , Variação Genética/genética , Proteoma/genéticaRESUMO
SUMMARY: Motifs are small connected subnetworks that a network displays in significantly higher frequencies than would be expected for a random network. They have recently gathered much attention as a concept to uncover structural design principles of complex biological networks. FANMOD is a tool for fast network motif detection; it relies on recently developed algorithms to improve the efficiency of network motif detection by some orders of magnitude over existing tools. This facilitates the detection of larger motifs in bigger networks than previously possible. Additional benefits of FANMOD are the ability to analyze colored networks, a graphical user interface and the ability to export results to a variety of machine- and human-readable file formats including comma-separated values and HTML.