Search | BVS Regional Portal

1.

An analysis of the graph processing landscape.

Coimbra, Miguel E; Francisco, Alexandre P; Veiga, Luís.

J Big Data ; 8(1): 55, 2021.

Article in English | MEDLINE | ID: mdl-33850687

ABSTRACT

The value of graph-based big data can be unlocked by exploring the topology and metrics of the networks they represent, and the computational approaches to this exploration take on many forms. For the use-case of performing global computations over a graph, it is first ingested into a graph processing system from one of many digital representations. Extracting information from graphs involves processing all their elements globally, which can be done with single-machine systems (with varying approaches to hardware usage), distributed systems (either homogeneous or heterogeneous groups of machines) and systems dedicated to high-performance computing (HPC). For these systems focused on processing the bulk of graph elements, common use-cases consist in executing for example algorithms for vertex ranking or community detection, which produce insights on graph structure and relevance of their elements. Many distributed systems (such as Flink, Spark) and libraries (e.g. Gelly, GraphX) have been built to enable these tasks and improve performance. This is achieved with techniques ranging from classic load balancing (often geared to reduce communication overhead) to exploring trade-offs between delaying computation and relaxing accuracy. In this survey we firstly familiarize the reader with common graph datasets and applications in the world of today. We provide an overview of different aspects of the graph processing landscape and describe classes of systems based on a set of dimensions we describe. The dimensions we detail encompass paradigms to express graph processing, different types of systems to use, coordination and communication models in distributed graph processing, partitioning techniques and different definitions related to the potential for a graph to be updated. 
This survey is aimed at the experienced software engineer or researcher as well as the graduate student looking for an understanding of the landscape of solutions (and their limitations) for graph processing.
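
As an illustration of the vertex ranking use-case mentioned in the abstract, the following is a minimal single-machine sketch of PageRank by power iteration. It is not taken from the survey: the function name `pagerank`, the damping factor of 0.85, the iteration count and the toy graph are all assumptions chosen for the example, and a distributed system such as Spark/GraphX would partition this computation rather than loop over a dictionary.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {vertex: [out-neighbours]}.

    Every neighbour must itself be a key of `adjacency` (assumption of this sketch).
    """
    vertices = list(adjacency)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not adjacency[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in vertices}
        for v in vertices:
            for w in adjacency[v]:
                # Each vertex splits its rank evenly among its out-neighbours.
                new_rank[w] += damping * rank[v] / len(adjacency[v])
        rank = new_rank
    return rank


# Hypothetical toy graph: "c" is pointed to by three other vertices.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

The ranks sum to one after every iteration, so they can be read as a probability distribution over vertices.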

2.

Distance-based phylogenetic inference from typing data: a unifying view.

Vaz, Cátia; Nascimento, Marta; Carriço, João A; Rocher, Tatiana; Francisco, Alexandre P.

Brief Bioinform ; 22(3), 2021 05 20.

Article in English | MEDLINE | ID: mdl-32734294

ABSTRACT

Typing methods are widely used in the surveillance of infectious diseases, outbreaks investigation and studies of the natural history of an infection. Moreover, their use is becoming standard, in particular with the introduction of high-throughput sequencing. On the other hand, the data being generated are massive and many algorithms have been proposed for a phylogenetic analysis of typing data, addressing both correctness and scalability issues. Most of the distance-based algorithms for inferring phylogenetic trees follow the closest pair joining scheme. This is one of the approaches used in hierarchical clustering. Moreover, although phylogenetic inference algorithms may seem rather different, the main difference among them resides on how one defines cluster proximity and on which optimization criterion is used. Both cluster proximity and optimization criteria rely often on a model of evolution. In this work, we review, and we provide a unified view of these algorithms. This is an important step not only to better understand such algorithms but also to identify possible computational bottlenecks and improvements, important to deal with large data sets.

Subjects

Algorithms, Nucleic Acid Databases, Molecular Evolution, High-Throughput Nucleotide Sequencing, Genetic Models, Phylogeny

3.

GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens.

Zhou, Zhemin; Alikhan, Nabil-Fareed; Sergeant, Martin J; Luhmann, Nina; Vaz, Cátia; Francisco, Alexandre P; Carriço, João André; Achtman, Mark.

Genome Res ; 28(9): 1395-1404, 2018 09.

Article in English | MEDLINE | ID: mdl-30049790

ABSTRACT

Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm that supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data. GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting edge navigation of genomic relationships among bacterial pathogens.

Subjects

Bacteria/genetics, Taxonomic DNA Barcoding/methods, Bacterial Genome, Phylogeny, Software, Alleles, Bacteria/classification, Bacteria/pathogenicity

4.

Using Machine Learning to Improve the Prediction of Functional Outcome in Ischemic Stroke Patients.

Monteiro, Miguel; Fonseca, Ana Catarina; Freitas, Ana Teresa; Pinho E Melo, Teresa; Francisco, Alexandre P; Ferro, Jose M; Oliveira, Arlindo L.

IEEE/ACM Trans Comput Biol Bioinform ; 15(6): 1953-1959, 2018.

Article in English | MEDLINE | ID: mdl-29994736

ABSTRACT

Ischemic stroke is a leading cause of disability and death worldwide among adults. The individual prognosis after stroke is extremely dependent on treatment decisions physicians take during the acute phase. In the last five years, several scores such as the ASTRAL, DRAGON, and THRIVE have been proposed as tools to help physicians predict the patient functional outcome after a stroke. These scores are rule-based classifiers that use features available when the patient is admitted to the emergency room. In this paper, we apply machine learning techniques to the problem of predicting the functional outcome of ischemic stroke patients, three months after admission. We show that a pure machine learning approach achieves only a marginally superior Area Under the ROC Curve (AUC) ( 0.808±0.085) than that of the best score ( 0.771±0.056) when using the features available at admission. However, we observed that by progressively adding features available at further points in time, we can significantly increase the AUC to a value above 0.90. We conclude that the results obtained validate the use of the scores at the time of admission, but also point to the importance of using more features, which require more advanced methods, when possible.
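
For readers unfamiliar with the metric reported above, the sketch below computes the Area Under the ROC Curve as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as one half. The function name `auc` and the toy scores are assumptions for illustration only; they are unrelated to the study's data or code.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    # A win counts 1, a tie counts 0.5, a loss counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```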

Subjects

Brain Ischemia, Computer-Assisted Diagnosis/methods, Machine Learning, Algorithms, Area Under Curve, Brain Ischemia/diagnosis, Brain Ischemia/epidemiology, Brain Ischemia/physiopathology, Brain Ischemia/therapy, Humans, Treatment Outcome

5.

Large-Scale Simulations of Bacterial Populations Over Complex Networks.

Teixeira, Andreia Sofia; Monteiro, Pedro T; Carriço, João A; Santos, Francisco C; Francisco, Alexandre P.

J Comput Biol ; 25(8): 850-861, 2018 08.

Article in English | MEDLINE | ID: mdl-29985650

ABSTRACT

The understanding of bacterial population genetics and evolution is crucial in epidemic outbreak studies and pathogen surveillance. However, all epidemiological studies are limited to their sampling capacities which, by being usually biased or limited due to economic constraints, can hamper the real knowledge of the bacterial population structure of a given species. To this end, mathematical models and large-scale simulations can provide a quantitative analytical framework that can be used to assess how or if limited sampling can infer the true population structure. In this article, we address the large-scale simulation of genetic evolution of bacterial populations, using Wright-Fisher model, in the presence of complex host contact networks. We present an efficient approach for large-scale simulations over complex host contact networks, using MapReduce on top of Apache Spark and GraphX API. We evaluate the relation between cluster computing power and simulations speedup and include insights on how bacterial population diversity can be affected by mutation and recombination rates, and network topology.
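
To make the Wright-Fisher model named in the abstract concrete, here is a minimal single-population sketch with mutation only. It deliberately omits the host contact network, recombination and the Spark/GraphX machinery described in the article; the function name `wright_fisher`, the infinite-alleles mutation scheme and all parameter values are assumptions made for this example.

```python
import random


def wright_fisher(pop_size, generations, mutation_rate, seed=0):
    """Simulate a neutral Wright-Fisher population; returns the final allele list."""
    rng = random.Random(seed)          # seeded for reproducibility
    population = [0] * pop_size        # everyone starts with allele 0
    next_allele = 1
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Each child picks its parent uniformly at random, with replacement.
            allele = rng.choice(population)
            # Infinite-alleles assumption: every mutation creates a new allele.
            if rng.random() < mutation_rate:
                allele = next_allele
                next_allele += 1
            offspring.append(allele)
        population = offspring         # non-overlapping generations
    return population


final = wright_fisher(pop_size=100, generations=50, mutation_rate=0.01)
diversity = len(set(final))            # number of distinct alleles that survive
```

Population diversity in this toy setting depends only on population size and mutation rate; the article studies how recombination and network topology change that picture.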

Subjects

Bacteria/classification, Bacteria/genetics, Biological Evolution, Computer Simulation, Gene Regulatory Networks, Population Genetics, Algorithms, Humans, Genetic Models, Phylogeny

6.

Fast phylogenetic inference from typing data.

Carriço, João A; Crochemore, Maxime; Francisco, Alexandre P; Pissis, Solon P; Ribeiro-Gonçalves, Bruno; Vaz, Cátia.

Algorithms Mol Biol ; 13: 4, 2018.

Article in English | MEDLINE | ID: mdl-29467814

ABSTRACT

BACKGROUND: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is important also to note that most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. RESULTS: We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method, and how it can be used to speedup querying local phylogenetic patterns over large typing databases.
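
The problem statement above (pairwise Hamming distances under a threshold) can be illustrated with a naive baseline. The sketch below is quadratic in the number of profiles and is not the average-case linear-time algorithm proposed in the article; it only shows the input and output of the task, with an early exit once the threshold is exceeded. The names `hamming_within` and `close_pairs` and the toy profiles are assumptions.

```python
from itertools import combinations


def hamming_within(a, b, threshold):
    """Return the Hamming distance of a and b, or None once it exceeds threshold."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    distance = 0
    for x, y in zip(a, b):
        if x != y:
            distance += 1
            if distance > threshold:
                return None            # early exit: pair is too far apart
    return distance


def close_pairs(profiles, threshold):
    """Map (i, j) index pairs to their distance for all pairs within threshold."""
    result = {}
    for (i, p), (j, q) in combinations(enumerate(profiles), 2):
        d = hamming_within(p, q, threshold)
        if d is not None:
            result[(i, j)] = d
    return result


# Hypothetical allelic profiles over four loci.
profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 2, 2), (3, 3, 3, 3)]
pairs = close_pairs(profiles, threshold=2)
```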

7.

PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods.

Nascimento, Marta; Sousa, Adriano; Ramirez, Mário; Francisco, Alexandre P; Carriço, João A; Vaz, Cátia.

Bioinformatics ; 33(1): 128-129, 2017 01 01.

Article in English | MEDLINE | ID: mdl-27605102

ABSTRACT

High Throughput Sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited on public databases such as Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available do not scale well to these large datasets, nor provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results. AVAILABILITY AND IMPLEMENTATION: http://www.phyloviz.net/ (licensed under GPLv3). CONTACT: cvaz@inesc-id.pt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Bacteria/genetics, Bacterial Genome, Phylogeny, DNA Sequence Analysis/methods, Software, Bacteria/classification, Bacterial Typing Techniques/methods, Multilocus Sequence Typing/methods, Single Nucleotide Polymorphism

8.

PHYLOViZ Online: web-based tool for visualization, phylogenetic inference, analysis and sharing of minimum spanning trees.

Ribeiro-Gonçalves, Bruno; Francisco, Alexandre P; Vaz, Cátia; Ramirez, Mário; Carriço, João André.

Nucleic Acids Res ; 44(W1): W246-51, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27131357

ABSTRACT

High-throughput sequencing methods generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories and created the possibility of generating similar information for hundreds to thousands of strains more in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to do these analyses without software installation and to enable easy accessing and sharing of data and analyses results from any Internet enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third party web service or software. PHYLOViZ Online is freely available at https://online.phyloviz.net.

Subjects

Alleles, Phylogeny, Streptococcus pneumoniae/classification, User-Computer Interface, Algorithms, Computer Graphics, Genetic Databases, Datasets as Topic, High-Throughput Nucleotide Sequencing, Information Dissemination, Information Storage and Retrieval, Internet, Single Nucleotide Polymorphism, Serogroup, Streptococcus pneumoniae/genetics

9.

Not seeing the forest for the trees: size of the minimum spanning trees (MSTs) forest and branch significance in MST-based phylogenetic analysis.

Teixeira, Andreia Sofia; Monteiro, Pedro T; Carriço, João A; Ramirez, Mário; Francisco, Alexandre P.

PLoS One ; 10(3): e0119315, 2015.

Article in English | MEDLINE | ID: mdl-25799056

ABSTRACT

Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. But, for the research community, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in such trees. In this paper, we address this problem for MSTs by introducing a new edge betweenness metric for undirected and weighted graphs. This spanning edge betweenness metric is defined as the fraction of equivalent MSTs where a given edge is present. The metric provides a per edge statistic that is similar to that of the bootstrap approach frequently used in phylogenetics to support the grouping of taxa. We provide methods for the exact computation of this metric based on the well known Kirchhoff's matrix tree theorem. Moreover, we implement and make available a module for the PHYLOViZ software and evaluate the proposed metric concerning both effectiveness and computational performance. Analysis of trees generated using multilocus sequence typing data (MLST) and the goeBURST algorithm revealed that the space of possible MSTs in real data sets is extremely large. Selection of the edge to be represented using bootstrap could lead to unreliable results since alternative edges are present in the same fraction of equivalent MSTs. The choice of the MST to be presented, results from criteria implemented in the algorithm that must be based in biologically plausible models.
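
The role of Kirchhoff's matrix tree theorem can be sketched on a simplified case. The code below counts all spanning trees of an unweighted, connected graph as a cofactor of its Laplacian, and derives the fraction of spanning trees that contain a given edge by removing that edge and recounting. This is an assumption-laden simplification: the metric in the article is restricted to minimum spanning trees of weighted graphs, which requires additional handling of edge weights that is not shown here. The function names and the example graph are hypothetical.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Matrix tree theorem: spanning-tree count = any cofactor of the Laplacian."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Drop the last row and column, then take the determinant exactly
    # by Gaussian elimination over rationals.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0                   # singular minor: graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def edge_fraction(n, edges, edge):
    """Fraction of spanning trees of a connected graph that contain `edge`."""
    total = count_spanning_trees(n, edges)
    remaining = list(edges)
    remaining.remove(edge)
    without = count_spanning_trees(n, remaining)
    return Fraction(total - without, total)


# Hypothetical graph: a 4-cycle 0-1-2-3-0 plus the chord 0-2.
square_with_chord = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
chord_support = edge_fraction(4, square_with_chord, (0, 2))
```

An edge present in every spanning tree (a bridge) gets a fraction of 1, which is the sense in which the value acts as a per-edge support statistic.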

Subjects

Algorithms, Computer Graphics, Phylogeny, Bacteria/classification

10.

TypOn: the microbial typing ontology.

Vaz, Cátia; Francisco, Alexandre P; Silva, Mickael; Jolley, Keith A; Bray, James E; Pouseele, Hannes; Rothganger, Joerg; Ramirez, Mário; Carriço, João A.

J Biomed Semantics ; 5(1): 43, 2014.

Article in English | MEDLINE | ID: mdl-25584183

ABSTRACT

Bacterial identification and characterization at subspecies level is commonly known as Microbial Typing. Currently, these methodologies are fundamental tools in Clinical Microbiology and bacterial population genetics studies to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resistance. Due to advances in DNA sequencing technology, these methods have evolved to become focused on sequence-based methodologies. The need to have a common understanding of the concepts described and the ability to share results within the community at a global level are increasingly important requisites for the continued development of portable and accurate sequence-based typing methods, especially with the recent introduction of Next Generation Sequencing (NGS) technologies. In this paper, we present an ontology designed for the sequence-based microbial typing field, capable of describing any of the sequence-based typing methodologies currently in use and being developed, including novel NGS-based methods. This is a fundamental step to accurately describe, analyze, curate, and manage information for microbial typing based on sequence-based typing methods.

11.

Interactogeneous: disease gene prioritization using heterogeneous networks and full topology scores.

Gonçalves, Joana P; Francisco, Alexandre P; Moreau, Yves; Madeira, Sara C.

PLoS One ; 7(11): e49634, 2012.

Article in English | MEDLINE | ID: mdl-23185389

ABSTRACT

Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. 
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson's disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.

Subjects

Biomarkers/metabolism, Disease/genetics, Algorithms, Area Under Curve, Computational Biology/methods, Data Mining, Genetic Databases, Gene Regulatory Networks, Genes, Humans, Statistical Models, ROC Curve, Reproducibility of Results

12.

Regulatory Snapshots: integrative mining of regulatory modules from expression time series and regulatory networks.

Gonçalves, Joana P; Aires, Ricardo S; Francisco, Alexandre P; Madeira, Sara C.

PLoS One ; 7(5): e35977, 2012.

Article in English | MEDLINE | ID: mdl-22563474

ABSTRACT

Explaining regulatory mechanisms is crucial to understand complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gain insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Custom graphics are finally depicted to expose the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. 
Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested ability to discern the primary role of a gene (target or regulator). Prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots.

Subjects

Gene Expression Profiling/methods, Gene Regulatory Networks, Software, Genetic Transcription/genetics, Algorithms, Epithelial-Mesenchymal Transition/genetics, Heat-Shock Response/genetics, Hot Temperature, Humans, Genetic Models, Reproducibility of Results, Saccharomyces cerevisiae/genetics

13.

Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis.

Vinga, Susana; Carvalho, Alexandra M; Francisco, Alexandre P; Russo, Luís Ms; Almeida, Jonas S.

Algorithms Mol Biol ; 7(1): 10, 2012 May 02.

Article in English | MEDLINE | ID: mdl-22551152

ABSTRACT

BACKGROUND: Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within 2^-L distance of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations. RESULTS: The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm. CONCLUSIONS: The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
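
The suffix property stated in the abstract can be checked with a small sketch of the CGR iteration for DNA. Each step moves the current point halfway toward the corner assigned to the next symbol, so every shared trailing symbol halves the gap between two sequences. The corner assignment below (A, C, G, T on the unit square), the starting point (1/2, 1/2), the use of the per-coordinate (Chebyshev) distance and the function names are assumptions of this example, not details taken from the article.

```python
from fractions import Fraction

# Assumed corner layout on the unit square; other conventions exist.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr(sequence):
    """Return the exact CGR coordinate reached after reading `sequence`."""
    x = y = Fraction(1, 2)             # start at the centre of the square
    for symbol in sequence:
        cx, cy = CORNERS[symbol]
        # Move halfway from the current point toward the symbol's corner.
        x = (x + cx) / 2
        y = (y + cy) / 2
    return x, y


def chebyshev(p, q):
    """Largest per-coordinate gap between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# "ACGGCA" and "TTGGCA" share the 4-long suffix "GGCA".
gap = chebyshev(cgr("ACGGCA"), cgr("TTGGCA"))
```

Exact rationals are used so the bound can be compared without floating-point error; with floats the precision limits how long a suffix can be resolved.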

14.

PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods.

Francisco, Alexandre P; Vaz, Cátia; Monteiro, Pedro T; Melo-Cristino, José; Ramirez, Mário; Carriço, João A.

BMC Bioinformatics ; 13: 87, 2012 May 08.

Article in English | MEDLINE | ID: mdl-22568821

ABSTRACT

BACKGROUND: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available but this wealth of data remains underused and are frequently poorly annotated since no user-friendly tool exists to analyze and explore it. RESULTS: PHYLOViZ is platform independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. CONCLUSIONS: PHYLOViZ is a user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.

Subjects

Phylogeny, DNA Sequence Analysis/methods, Software, Algorithms, Minisatellite Repeats, Multilocus Sequence Typing/methods, Single Nucleotide Polymorphism, Streptococcus pneumoniae/classification, Streptococcus pneumoniae/genetics

15.

TFRank: network-based prioritization of regulatory associations underlying transcriptional responses.

Gonçalves, Joana P; Francisco, Alexandre P; Mira, Nuno P; Teixeira, Miguel C; Sá-Correia, Isabel; Oliveira, Arlindo L; Madeira, Sara C.

Bioinformatics ; 27(22): 3149-57, 2011 Nov 15.

Article in English | MEDLINE | ID: mdl-21965816

ABSTRACT

MOTIVATION: Uncovering mechanisms underlying gene expression control is crucial to understand complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor-target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior with overlapping, indirect effect that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. RESULTS: We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in human, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with short interval to metastasis.

Subjects

Gene Expression Regulation, Gene Regulatory Networks, Transcription Factors/metabolism, Genetic Transcription, Acetic Acid/pharmacology, Binding Sites, Humans, Neoplasm Metastasis, Quinine/pharmacology, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism, Genetic Transcription/drug effects

16.

Global optimal eBURST analysis of multilocus typing data using a graphic matroid approach.

Francisco, Alexandre P; Bugalho, Miguel; Ramirez, Mário; Carriço, João A.

BMC Bioinformatics ; 10: 152, 2009 May 18.

Article in English | MEDLINE | ID: mdl-19450271

ABSTRACT

BACKGROUND: Multilocus Sequence Typing (MLST) is a frequently used typing method for the analysis of the clonal relationships among strains of several clinically relevant microbial species. MLST is based on the sequence of housekeeping genes that result in each strain having a distinct numerical allelic profile, which is abbreviated to a unique identifier: the sequence type (ST). The relatedness between two strains can then be inferred by the differences between allelic profiles. For a more comprehensive analysis of the possible patterns of evolutionary descent, a set of rules was proposed and implemented in the eBURST algorithm. These rules allow the division of a data set into several clusters of related strains, dubbed clonal complexes, by implementing a simple model of clonal expansion and diversification. Within each clonal complex, the rules identify which links between STs correspond to the most probable pattern of descent. However, the eBURST algorithm is not globally optimized, which can result in links, within the clonal complexes, that violate the rules proposed. RESULTS: Here, we present a globally optimized implementation of the eBURST algorithm - goeBURST. The search for a global optimal solution led to the formalization of the problem as a graphic matroid, for which greedy algorithms that provide an optimal solution exist. Several public data sets of MLST data were tested and differences between the two implementations were found and are discussed for five bacterial species: Enterococcus faecium, Streptococcus pneumoniae, Burkholderia pseudomallei, Campylobacter jejuni and Neisseria spp. A novel feature implemented in goeBURST is the representation of the level of tiebreak rule reached before deciding if a link should be drawn, which can be used to visually evaluate the reliability of the represented hypothetical pattern of descent. 
CONCLUSION: goeBURST is a globally optimized implementation of the eBURST algorithm, that identifies alternative patterns of descent for several bacterial species. Furthermore, the algorithm can be applied to any multilocus typing data based on the number of differences between numeric profiles. A software implementation is available at http://goeBURST.phyloviz.net.
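
The greedy approach on a graphic matroid mentioned in the abstract corresponds to Kruskal-style construction of a spanning tree: sort candidate links and add each one unless it closes a cycle. The sketch below applies this to allelic profiles using the number of differing loci as the link weight. It is a simplified illustration, not the goeBURST implementation: ties are broken here only by ST identifier, which is an assumption standing in for the tiebreak rules of the actual algorithm, and the function names and profiles are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_spanning_tree(profiles):
    """profiles: dict {st_id: allelic profile}. Returns a list of (u, v, distance)."""
    parent = {st: st for st in profiles}

    def find(st):
        # Union-find lookup with path halving.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    # Sort by distance first, then by ST ids (simplified tiebreak, an assumption).
    candidates = sorted(
        (hamming(profiles[u], profiles[v]), min(u, v), max(u, v))
        for u, v in combinations(profiles, 2)
    )
    tree = []
    for distance, u, v in candidates:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:           # adding the link keeps the forest acyclic
            parent[root_u] = root_v
            tree.append((u, v, distance))
    return tree


# Hypothetical sequence types over three loci.
sts = {1: (1, 1, 1), 2: (1, 1, 2), 3: (1, 2, 2), 4: (1, 1, 3)}
links = greedy_spanning_tree(sts)
```

Because the sort key fixes a total order on links, the output is a single deterministic tree; changing the tiebreak changes which of the equally short links is drawn.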

Subjects

Bacteria/genetics, Cluster Analysis, Genomics/methods, DNA Sequence Analysis/methods, Software, Algorithms, Genetic Databases, Molecular Evolution, Bacterial Genes, Genetic Models

17.

YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae.

Monteiro, Pedro T; Mendes, Nuno D; Teixeira, Miguel C; d'Orey, Sofia; Tenreiro, Sandra; Mira, Nuno P; Pais, Hélio; Francisco, Alexandre P; Carvalho, Alexandra M; Lourenço, Artur B; Sá-Correia, Isabel; Oliveira, Arlindo L; Freitas, Ana T.

Nucleic Acids Res ; 36(Database issue): D132-6, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18032429

ABSTRACT

The Yeast search for transcriptional regulators and consensus tracking (YEASTRACT) information system (www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in September 2007, this database contains over 30 990 regulatory associations between Transcription Factors (TFs) and target genes and includes 284 specific DNA binding sites for 108 characterized TFs. Computational tools are also provided to facilitate the exploitation of the gathered data when solving a number of biological questions, in particular the ones that involve the analysis of global gene expression results. In this new release, YEASTRACT includes DISCOVERER, a set of computational tools that can be used to identify complex motifs over-represented in the promoter regions of co-regulated genes. The motifs identified are then clustered in families, represented by a position weight matrix and are automatically compared with the known transcription factor binding sites described in YEASTRACT. Additionally, in this new release, it is possible to generate graphic depictions of transcriptional regulatory networks for documented or potential regulatory associations between TFs and target genes. The visual display of these networks of interactions is instrumental in functional studies. Tutorials are available on the system to exemplify the use of all the available tools.

Subjects

Nucleic Acid Databases, Gene Regulatory Networks, Genetic Promoter Regions, Saccharomyces cerevisiae/genetics, Transcription Factors/metabolism, Binding Sites, Fungal Gene Expression Regulation, Internet, Software
