Results 1 - 20 of 23
1.
Bioinformatics ; 37(13): 1805-1813, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33471063

ABSTRACT

MOTIVATION: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets. RESULTS: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. In the prophage dataset, the accuracy in host prediction was 86%, considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified. AVAILABILITY AND IMPLEMENTATION: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.
Nucleic Acids Res ; 47(D1): D678-D686, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407573

ABSTRACT

The Integrated Microbial Genome/Virus (IMG/VR) system v.2.0 (https://img.jgi.doe.gov/vr/) is the largest publicly available data management and analysis platform dedicated to viral genomics. Since the last report, published in the 2016 NAR Database Issue, the data has tripled in size and currently contains genomes of 8389 cultivated reference viruses, 12 498 previously published curated prophages derived from cultivated microbial isolates, and 735 112 viral genomic fragments computationally predicted from assembled shotgun metagenomes. Nearly 60% of the viral genomes and genome fragments are clustered into 110 384 viral Operational Taxonomic Units (vOTUs) with two or more members. To improve data quality and predictions of host specificity, IMG/VR v.2.0 now separates prokaryotic and eukaryotic viruses, utilizes known prophage sequences to improve taxonomic assignments, and provides viral genome quality scores based on the estimated genome completeness. New features also include enhanced BLAST search capabilities for external queries. Finally, geographic map visualization to locate user-selected viral genomes or genome fragments has been implemented and download options have been extended. All of these features make IMG/VR v.2.0 a key resource for the study of viruses.


Subjects
Data Management/methods , Genome, Viral , Genomics/methods , Software
3.
BMC Bioinformatics ; 21(Suppl 6): 434, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33203352

ABSTRACT

BACKGROUND: The alignment of protein-protein interaction networks was recently formulated as an integer quadratic programming problem, along with a linearization that can be solved by integer linear programming software tools. However, the resulting integer linear program has a huge number of variables and constraints, rendering it of no practical use. RESULTS: We present a compact integer linear programming reformulation of the protein-protein interaction network alignment problem, which can be solved using state-of-the-art mathematical modeling and integer linear programming software tools, along with empirical results showing that small biological networks, such as virus-host protein-protein interaction networks, can be aligned in a reasonable amount of time on a personal computer and the resulting alignments are structurally coherent and biologically meaningful. CONCLUSIONS: The implementation of the integer linear programming reformulation using current mathematical modeling and integer linear programming software tools provided biologically meaningful alignments of virus-host protein-protein interaction networks.


Subjects
Programming, Linear , Protein Interaction Maps , Software , Algorithms , Models, Theoretical
4.
BMC Bioinformatics ; 21(Suppl 6): 265, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33203353

ABSTRACT

BACKGROUND: All molecular functions and biological processes are carried out by groups of proteins that interact with each other. Metaproteomic data continuously generates new proteins whose molecular functions and relations must be discovered. Protein-protein interaction networks (PPIN) are a widely accepted structure for modelling functional relations between proteins, and their analysis and alignment have become a key ingredient in the study and prediction of protein-protein interactions, protein function, and evolutionarily conserved assembly pathways of protein complexes. Several PPIN aligners have been proposed, but attaining the right balance between network topology and biological information is one of the most difficult and key points in the design of any PPIN alignment algorithm. RESULTS: Motivated by the challenge of well-balanced and efficient algorithms, we have designed and implemented AligNet, a parameter-free pairwise PPIN alignment algorithm aimed at bridging the gap between topologically efficient and biologically meaningful matchings. A comparison of the results obtained with AligNet and with the best aligners shows that AligNet indeed achieves a good balance between topological and biological matching. CONCLUSION: In this paper we present AligNet, a new pairwise global PPIN aligner that produces biologically meaningful alignments by achieving a good balance between structural matching and protein function conservation, with more efficient computations than state-of-the-art tools.


Subjects
Protein Interaction Mapping , Protein Interaction Maps , Proteins , Algorithms , Biological Evolution , Proteins/metabolism
5.
ScientificWorldJournal ; 2014: 254279, 2014.
Article in English | MEDLINE | ID: mdl-24982934

ABSTRACT

Several polynomial time computable metrics on the class of semibinary tree-sibling time consistent phylogenetic networks are available in the literature; in particular, the problem of deciding if two networks of this kind are isomorphic is in P. In this paper, we show that if we remove the semibinarity condition, then the problem becomes much harder. More precisely, we prove that the isomorphism problem for generic tree-sibling time consistent phylogenetic networks is polynomially equivalent to the graph isomorphism problem. Since the latter is believed not to belong to P, the chances are that it is impossible to define a metric on the class of all tree-sibling time consistent phylogenetic networks that can be computed in polynomial time.


Subjects
Algorithms , Phylogeny , Computational Biology , Humans
6.
Nat Commun ; 15(1): 544, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38228587

ABSTRACT

What a strain is and how many strains make up a natural bacterial population remain elusive concepts despite their apparent importance for assessing the role of intra-population diversity in disease emergence or response to environmental perturbations. To advance these concepts, we sequenced 138 randomly selected Salinibacter ruber isolates from two solar salterns and assessed these genomes against companion short-read metagenomes from the same samples. The genome-aggregate average nucleotide identity (ANI) values among these isolates were bimodally distributed, with four-fold lower occurrence of values between 99.2% and 99.8% relative to ANI >99.8% or <99.2%, revealing a natural "gap" in the sequence space within species. Accordingly, we used this ANI gap to define genomovars and a higher ANI value of >99.99% and shared gene-content >99.0% to define strains. Using these thresholds and extrapolating from how many metagenomic reads each genomovar uniquely recruited, we estimated that, although our 138 isolates represented about 80% of the Sal. ruber population, the total population in one saltern pond is composed of 5,500 to 11,000 genomovars, the great majority of which appear to be rare in-situ. These data also revealed that the most frequently recovered isolate in lab media was often not the most abundant genomovar in-situ, suggesting that cultivation biases are significant, even in cases where cultivation procedures are thought to be robust. The methodology and ANI thresholds outlined here should represent a useful guide for future microdiversity surveys of additional microbial species.
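
The thresholds in this abstract can be read as a small decision rule for a pair of genomes. The sketch below is illustrative only: the strain rule (ANI >99.99% and shared gene content >99.0%) comes from the abstract, whereas the abstract only places the genomovar boundary somewhere in the 99.2%-99.8% gap, so the `genomovar_cutoff` default of 99.5 is an assumed midpoint. The function name and return labels are hypothetical.

```python
def classify_pair(ani, shared_gene_content, genomovar_cutoff=99.5):
    """Classify a genome pair from ANI (%) and shared gene content (%).

    The strain rule follows the abstract. genomovar_cutoff is an ASSUMED
    value inside the 99.2-99.8 gap; the abstract does not state it.
    """
    if ani > 99.99 and shared_gene_content > 99.0:
        return "same strain"
    if ani >= genomovar_cutoff:
        return "same genomovar"
    return "different genomovar"
```

A pair with ANI 99.995% but only 98% shared gene content falls back to "same genomovar" under this rule, because both strain conditions must hold.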


Subjects
Bacteria , Bacteroidetes , Bacteria/genetics , Bacteroidetes/genetics , Metagenomics/methods , Metagenome/genetics , Phylogeny , Genome, Bacterial/genetics
7.
PLoS One ; 18(2): e0281047, 2023.
Article in English | MEDLINE | ID: mdl-36758030

ABSTRACT

Metabolism is characterised by chemical reactions linked to each other, creating a complex network structure. The whole metabolic network is divided into pathways of chemical reactions, such that every pathway is a metabolic function. A simplified representation of metabolism, which we call an abstract metabolic network, is a graph in which metabolic pathways are nodes and there is an edge between two nodes if their corresponding pathways share one or more compounds. The abstract metabolic network of a given organism results in a small network that requires low computational power to be analysed and makes it a suitable model to perform a large-scale comparison of organisms' metabolism. To explore the potentials and limits of such a basic representation, we considered a comprehensive set of KEGG organisms, represented through their abstract metabolic network. We performed pairwise comparisons using graph kernel methods and analysed the results through exploratory data analysis and machine learning techniques. The results show that abstract metabolic networks discriminate macro evolutionary events, indicating that they are expressive enough to capture key steps in metabolism evolution.
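
The definition of an abstract metabolic network given in this abstract (pathways as nodes, an edge whenever two pathways share at least one compound) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name, the input format and the toy pathway and compound labels are all assumptions.

```python
from itertools import combinations

def abstract_metabolic_network(pathway_compounds):
    """Return the edge set of the abstract metabolic network.

    pathway_compounds maps a pathway name to the set of compounds it uses.
    Two pathways are joined by an edge when they share at least one compound.
    """
    edges = set()
    for a, b in combinations(sorted(pathway_compounds), 2):
        if pathway_compounds[a] & pathway_compounds[b]:
            edges.add((a, b))
    return edges

# Toy input: pathway names and compound labels are illustrative only.
toy = {
    "glycolysis": {"glucose", "pyruvate", "ATP"},
    "tca_cycle": {"pyruvate", "citrate"},
    "urea_cycle": {"ornithine", "urea"},
}
edges = abstract_metabolic_network(toy)
```

In the toy input only glycolysis and the TCA cycle share a compound (pyruvate), so the network has a single edge and the urea cycle is an isolated node.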


Subjects
Machine Learning , Metabolic Networks and Pathways , Models, Biological
8.
J Comput Biol ; 28(12): 1181-1195, 2021 12.
Article in English | MEDLINE | ID: mdl-34714118

ABSTRACT

The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time, but it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, it can also be computed in linear time, and it is not (uniformly) equivalent to the RF distance.
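
For rooted trees described by sets of clusters, the classical RF distance mentioned in this abstract can be illustrated as the size of the symmetric difference of the two cluster sets. The sketch below assumes that convention (some authors halve the value) and a hypothetical nested-tuple encoding of trees; it is a simple quadratic-space illustration rather than a linear-time algorithm, and it does not implement the GRF distance.

```python
def clusters(tree):
    """Return the set of clusters (leaf-label sets) of a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            labels = frozenset([node])
        else:
            labels = frozenset().union(*(leaves(child) for child in node))
        found.add(labels)
        return labels

    leaves(tree)
    return found

def rf_distance(t1, t2):
    """Size of the symmetric difference of the two cluster sets."""
    return len(clusters(t1) ^ clusters(t2))

# Two rooted trees on the same taxa that differ in one inner cluster.
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
```

Here `t1` contains the cluster {a, b} and `t2` contains {a, c}; every other cluster is shared, so the distance under this convention is 2.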


Subjects
Classification/methods , Computational Biology/methods , Algorithms , Models, Genetic , Phylogeny
9.
PLoS One ; 16(2): e0246962, 2021.
Article in English | MEDLINE | ID: mdl-33577575

ABSTRACT

Metabolic pathway comparison and interaction between different species can reveal important information for drug engineering and medical science. In the literature, proposals for reconstructing and comparing metabolic networks present two main problems: network reconstruction usually requires human intervention to integrate information from different sources and, in metabolic comparison, the size of the networks leads to a challenging computational problem. We propose to automatically reconstruct a metabolic network on the basis of KEGG database information. Our proposal relies on a two-level representation of the huge metabolic network: the first level is graph-based and depicts pathways as nodes and relations between pathways as edges; the second level represents each metabolic pathway in terms of its reaction content. The two-level representation complies with the KEGG database, which decomposes the metabolism of all the different organisms into "reference" pathways in a standardised way. On the basis of this two-level representation, we introduce some similarity measures for both levels. They allow for both a local comparison, pathway by pathway, and a global comparison of the entire metabolism. We developed a tool, MetNet, that implements the proposed methodology. MetNet makes it possible to automatically reconstruct the metabolic network of two organisms selected in KEGG and to compare their two networks both quantitatively and visually. We validate our methodology by presenting some experiments performed with MetNet.


Subjects
Metabolic Networks and Pathways , Metabolomics/methods , Animals , Cluster Analysis , Humans , Software , Symbiosis
10.
J Math Biol ; 61(2): 253-276, 2010 Aug.
Article in English | MEDLINE | ID: mdl-19760227

ABSTRACT

Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path-length matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics in spaces $M_n(\mathbb{R})$ of real-valued $n \times n$ matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using $L^p$ metrics on $M_n(\mathbb{R})$, with $p \in \mathbb{R}_{>0}$.
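
The limitation described in this abstract can be seen on a small example. The sketch below assumes that each path length is split at the lowest common ancestor (LCA) of the two taxa, giving the pair (distance from LCA to i, distance from LCA to j); the abstract only says the split is done "in a suitable way", so this reading, the parent-map encoding and all names are assumptions. Trees are non-weighted, so every arc has length 1.

```python
from itertools import combinations

def ancestors(parent, node):
    """Return the list [node, parent(node), ..., root]."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def split_path_lengths(parent, leaves):
    """Map each leaf pair (i, j) to (dist(LCA, i), dist(LCA, j))."""
    result = {}
    for i, j in combinations(sorted(leaves), 2):
        up_i, up_j = ancestors(parent, i), ancestors(parent, j)
        depth_in_j = {node: d for d, node in enumerate(up_j)}
        # The first ancestor of i that is also an ancestor of j is the LCA.
        d_i = next(d for d, node in enumerate(up_i) if node in depth_in_j)
        result[(i, j)] = (d_i, depth_in_j[up_i[d_i]])
    return result

def path_lengths(parent, leaves):
    """Classical path lengths: the two split lengths added together."""
    return {pair: a + b
            for pair, (a, b) in split_path_lengths(parent, leaves).items()}

# Two non-isomorphic, non-binary rooted trees (root "r", inner node "u"),
# encoded as child -> parent maps.
tree_x = {"a": "r", "b": "r", "u": "r", "c": "u", "d": "u"}
tree_y = {"c": "r", "d": "r", "u": "r", "a": "u", "b": "u"}
leaves = ["a", "b", "c", "d"]
```

Both toy trees have the same classical path lengths for every pair of leaves, yet their split path lengths differ: for the pair (a, c) the split is (1, 2) in `tree_x` and (2, 1) in `tree_y`.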


Subjects
Models, Genetic , Phylogeny , Algorithms , Statistical Distributions
11.
PLoS One ; 15(12): e0236304, 2020.
Article in English | MEDLINE | ID: mdl-33284827

ABSTRACT

MOTIVATION: Besides its socio-economic consequences, the pandemic of COVID-19, the infectious disease caused by the newly discovered coronavirus SARS-CoV-2, has had a deep impact on the scientific community, which has considerably increased its efforts to discover the infection strategies of the new virus. Among the extensive and crucial research that has been carried out in the last months, the analysis of the virus-host relationship plays an important role in drug discovery. Virus-host protein-protein interactions are the active agents in virus replication, and the analysis of virus-host protein-protein interaction networks is fundamental to the study of the virus-host relationship. RESULTS: We have adapted and implemented a recent integer linear programming model for protein-protein interaction network alignment to virus-host networks, and obtained a consensus alignment of the SARS-CoV-1 and SARS-CoV-2 virus-host protein-protein interaction networks. Despite the lack of shared human proteins in these virus-host networks, and the low number of preserved virus-host interactions, the consensus alignment revealed aligned human proteins that share a function related to viral infection, as well as human proteins of high functional similarity that interact with SARS-CoV-1 and SARS-CoV-2 proteins, whose alignment would preserve these virus-host interactions.


Subjects
Host Microbial Interactions/physiology , Protein Interaction Maps/physiology , SARS-CoV-2/metabolism , COVID-19/virology , Coronavirus/metabolism , Coronavirus Infections/virology , Humans , Models, Theoretical , Pandemics , Pneumonia, Viral/virology , Programming, Linear , Protein Binding/physiology , Proteins/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Virus Replication/physiology
12.
Biology (Basel) ; 10(1), 2020 Dec 24.
Article in English | MEDLINE | ID: mdl-33374107

ABSTRACT

Defining the essential gene components for a system to be considered alive is a crucial step toward the synthesis of artificial life. Fifteen years ago, Gil and coworkers proposed the core of a putative minimal bacterial genome, which would provide the capability to achieve metabolic homeostasis, reproduce, and evolve to a bacterium in an ideally controlled environment. They also proposed a simplified metabolic chart capable of providing energy and basic components for a minimal living cell. For this work, we have identified the components of the minimal metabolic network based on the aforementioned studies, associated them with the KEGG database and, by applying the MetaDAG methodology, determined its Metabolic Building Blocks (MBB) and reconstructed its metabolic Directed Acyclic Graph (m-DAG). The reaction graph of this metabolic network consists of 80 compounds and 98 reactions, while its m-DAG has 36 MBBs. Additionally, we identified 12 essential reactions in the m-DAG that are critical for maintaining the connectivity of this network. In a similar manner, we reconstructed the m-DAG of JCVI-syn3.0, which is an artificially designed and manufactured viable cell whose genome arose by minimizing the one from Mycoplasma mycoides JCVI-syn1.0, and of "Candidatus Nasuia deltocephalinicola", the bacterium with the smallest natural genome known to date. The comparison of the m-DAGs derived from a theoretical, an artificial, and a natural genome denotes slightly different lifestyles, with a consistent core metabolism. The MetaDAG methodology we employ uses homogeneous descriptors and identifiers from the KEGG database, so that comparisons between bacterial strains are not only easy but also suitable for many research fields. The modeling of m-DAGs based on minimal metabolisms can be the first step for the synthesis and manipulation of minimal cells.

13.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32055857

ABSTRACT

The Symbiotic Genomes Database (SymGenDB; http://symbiogenomesdb.uv.es/) is a public resource of manually curated associations between organisms involved in symbiotic relationships, maintaining a catalog of completely sequenced/finished bacterial genomes exclusively. It originally consisted of three modules where users could search for the bacteria involved in a specific symbiotic relationship, their genomes and their genes (including their orthologs). In this update, we present an additional module that includes a representation of the metabolic network of each organism included in the database, as Directed Acyclic Graphs (MetaDAGs). This module provides unique opportunities to explore the metabolism of each individual organism and/or to evaluate the shared and joint metabolic capabilities of the organisms of the same genera included in our listing, allowing users to construct predictive analyses of metabolic associations and complementation between systems. We also report a ~25% increase in manually curated content in the database, i.e. bacterial genomes and their associations, with a final count of 2328 bacterial genomes associated with 498 hosts. We describe new querying possibilities for all the modules, as well as new display features for the MetaDAGs module, providing a relevant range of content and utility. This update continues to improve SymGenDB and can help elucidate the mechanisms by which organisms depend on each other.


Subjects
Databases, Genetic , Genomics , Metadata , Symbiosis/genetics , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics
14.
Bioinformatics ; 24(13): 1481-8, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18477576

ABSTRACT

MOTIVATION: The presence of reticulate evolutionary events in phylogenies turns phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences. RESULTS: We present in this article a distance measure on the class of semi-binary tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled-trees. The practical interest of this distance measure is twofold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data. AVAILABILITY: The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this article. SUPPLEMENTARY INFORMATION: Some counterexamples, proofs of the results not included in this article, and some computational experiments are available at Bioinformatics online.


Subjects
Chromosome Mapping/methods , Evolution, Molecular , Models, Genetic , Phylogeny , Proteome/genetics , Sequence Analysis, DNA/methods , Signal Transduction/genetics , Computer Simulation
15.
PLoS One ; 12(10): e0186626, 2017.
Article in English | MEDLINE | ID: mdl-29023538

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0177031.].

16.
PLoS One ; 12(5): e0177031, 2017.
Article in English | MEDLINE | ID: mdl-28493998

ABSTRACT

In this paper we propose a new methodology for the analysis of metabolic networks. We use the notion of strongly connected components of a graph, called in this context metabolic building blocks. Every strongly connected component is contracted to a single node in such a way that the resulting graph is a directed acyclic graph, called a metabolic DAG, with a considerably reduced number of nodes. The property of being a directed acyclic graph brings out a background graph topology that reveals the connectivity of the metabolic network, as well as bridges, isolated nodes and cut nodes. Altogether, this becomes key information for the discovery of functional metabolic relations. Our methodology has been applied to the glycolysis and the purine metabolic pathways for all organisms in the KEGG database, although it is general enough to work on any database. As expected, using the metabolic DAG formalism, a considerable reduction in the size of the metabolic networks has been obtained, especially in the case of the purine pathway due to its relatively larger size. As a proof of concept, from the information captured by a metabolic DAG and its corresponding metabolic building blocks, we obtain the core of the glycolysis pathway and the core of the purine metabolism pathway and detect some essential metabolic building blocks that reveal the key reactions in both pathways. Finally, the application of our methodology to the glycolysis pathway and the purine metabolism pathway reproduces the tree of life for the whole set of the organisms represented in the KEGG database, which supports the utility of this research.
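
The contraction step described in this abstract (each strongly connected component becomes one node, and the result is a directed acyclic graph) is the standard graph condensation. The sketch below uses Tarjan's algorithm as one common way to find the components; the abstract does not say which algorithm the authors used, and the function names, the adjacency-list format and the toy reaction labels are assumptions. The recursive form is only suitable for small graphs.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to a list of successors."""
    index, low, on_stack, stack, components = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    for v in graph:
        if v not in index:
            visit(v)
    return components

def condense(graph):
    """Contract each strongly connected component into one node of a DAG."""
    components = strongly_connected_components(graph)
    block_of = {v: c for c in components for v in c}
    edges = {
        (block_of[v], block_of[w])
        for v in graph for w in graph[v]
        if block_of[v] != block_of[w]
    }
    return components, edges

# Toy network: r1 -> r2 -> r3 -> r1 is a cycle that feeds r4.
toy = {"r1": ["r2"], "r2": ["r3"], "r3": ["r1", "r4"], "r4": []}
blocks, dag_edges = condense(toy)
```

In the toy network the cycle r1, r2, r3 collapses into one building block, so four nodes reduce to two and a single edge remains in the condensed graph.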


Subjects
Metabolic Networks and Pathways , Metabolomics/methods , Algorithms , Computer Graphics , Glycolysis , Humans , Models, Biological , Purines/metabolism
18.
BMC Syst Biol ; 8: 58, 2014 May 20.
Article in English | MEDLINE | ID: mdl-24886436

ABSTRACT

BACKGROUND: Comparing the metabolic pathways of different species is useful for understanding metabolic functions and can help in studying diseases and engineering drugs. Several comparison techniques for metabolic pathways have been introduced in the literature as a first attempt in this direction. The approaches are based on some simplified representation of metabolic pathways and on a related definition of a similarity score (or distance measure) between two pathways. More recent comparative research focuses on alignment techniques that can identify similar parts between pathways. RESULTS: We propose a methodology for the pairwise comparison and alignment of metabolic pathways that aims at providing the largest conserved substructure of the pathways under consideration. The proposed methodology has been implemented in a tool called MP-Align, which has been used to perform several validation tests. The results showed that our similarity score makes it possible to discriminate between different domains and to reconstruct a meaningful phylogeny from metabolic data. The results further demonstrate that our alignment algorithm correctly identifies subpathways sharing a common biological function. CONCLUSION: The results of the validation tests performed with MP-Align are encouraging. A comparison with another proposal in the literature showed that our alignment algorithm is particularly well-suited to finding the largest conserved subpathway of the pathways under examination.


Subjects
Computational Biology/methods , Metabolic Networks and Pathways , Algorithms , Computer Graphics , Glycolysis , Reproducibility of Results , Species Specificity
19.
Article in English | MEDLINE | ID: mdl-20660951

ABSTRACT

Galled trees, directed acyclic graphs that model evolutionary histories with isolated hybridization events, have become very popular due to both their biological significance and the existence of polynomial-time algorithms for their reconstruction. In this paper, we establish to what extent several distance measures for the comparison of evolutionary networks are metrics for galled trees, and hence, when they can be safely used to evaluate galled tree reconstruction methods.


Subjects
Phylogeny , Computational Biology/methods , Evolution, Molecular , Gene Expression Profiling/methods , Hybridization, Genetic , Models, Genetic
20.
Article in English | MEDLINE | ID: mdl-19179698

ABSTRACT

The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartitions distance and the mu-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their least and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time consistent phylogenetic network.


Subjects
Computational Biology/methods , Evolution, Molecular , Models, Genetic , Phylogeny , Algorithms , Gene Transfer, Horizontal , Hybridization, Genetic , Recombination, Genetic , Time Factors