Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

Ultra-sensitive molecular residual disease detection through whole genome sequencing with single-read error correction.

Li, Xinxing; Liu, Tao; Bacchiocchi, Antonella; Li, Mengxing; Cheng, Wen; Wittkop, Tobias; Mendez, Fernando; Wang, Yingyu; Tang, Paul; Yao, Qianqian; Bosenberg, Marcus W; Sznol, Mario; Yan, Qin; Faham, Malek; Weng, Li; Halaban, Ruth; Jin, Hai; Hu, Zhiqian.

medRxiv ; 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38260271

ABSTRACT

While whole genome sequencing (WGS) of cell-free DNA (cfDNA) holds enormous promise for molecular residual disease (MRD) detection, its performance is limited by the WGS error rate. Here we introduce AccuScan, an efficient cfDNA WGS technology that enables genome-wide error correction at single read level, achieving an error rate of 4.2×10⁻⁷, which is about two orders of magnitude lower than a read-centric de-noising method. When applied to MRD detection, AccuScan demonstrated analytical sensitivity down to 10⁻⁶ circulating tumor allele fraction at 99% sample level specificity. In colorectal cancer, AccuScan showed 90% landmark sensitivity for predicting relapse. It also showed robust MRD performance with esophageal cancer using samples collected as early as 1 week after surgery, and predictive value for immunotherapy monitoring with melanoma patients. Overall, AccuScan provides a highly accurate WGS solution for MRD, empowering circulating tumor DNA detection in the parts-per-million range without requiring high sample input or personalized reagents. One Sentence Summary: AccuScan showed remarkable ultra-low limit of detection with a short turnaround time, low sample requirement and a simple workflow for MRD detection.

2.

Multiplex Identification of Antigen-Specific T Cell Receptors Using a Combination of Immune Assays and Immune Receptor Sequencing.

Klinger, Mark; Pepin, Francois; Wilkins, Jen; Asbury, Thomas; Wittkop, Tobias; Zheng, Jianbiao; Moorhead, Martin; Faham, Malek.

PLoS One ; 10(10): e0141561, 2015.

Article in English | MEDLINE | ID: mdl-26509579

ABSTRACT

Monitoring antigen-specific T cells is critical for the study of immune responses and development of biomarkers and immunotherapeutics. We developed a novel multiplex assay that combines conventional immune monitoring techniques and immune receptor repertoire sequencing to enable identification of T cells specific to large numbers of antigens simultaneously. We multiplexed 30 different antigens and identified 427 antigen-specific clonotypes from 5 individuals with frequencies as low as 1 per million T cells. The clonotypes identified were validated in several ways, including repeatability, concordance with published clonotypes, and high correlation with ELISPOT. Applying this technology, we have shown that the vast majority of shared antigen-specific clonotypes identified in different individuals display the same specificity. We also showed that shared antigen-specific clonotypes are simpler sequences and are present at higher frequencies compared to non-shared clonotypes specific to the same antigen. In conclusion, this technology enables sensitive and quantitative monitoring of T cells specific for hundreds or thousands of antigens simultaneously, allowing the study of T cell responses with an unprecedented resolution and scale.

Subjects

ELISPOT , T-Lymphocyte Epitopes/immunology , High-Throughput Nucleotide Sequencing , T-Cell Antigen Receptors/genetics , Immunologic Receptors/genetics , T-Cell Antigen Receptor Specificity/genetics , T-Cell Antigen Receptor Specificity/immunology , Clonal Evolution/genetics , Clonal Evolution/immunology , ELISPOT/methods , ELISPOT/standards , Humans , Reproducibility of Results

3.

Genome and proteome annotation using automatically recognized concepts and functional networks.

Bivol, Adrian; Wittkop, Tobias; Davis, Darcy; Mooney, Sean D.

AMIA Jt Summits Transl Sci Proc ; 2013: 26, 2013.

Article in English | MEDLINE | ID: mdl-24303290

ABSTRACT

Many tools have been developed for prediction of the function or disease association of genes and proteins, and this continues to be a highly active area of bioinformatics research. Typically, these methods predict which concepts should be annotated to genes or proteins, using terms from ontologies such as Gene Ontology (GO), largely overlooking other ontologies that are available. Here, we set out to broadly evaluate novel, automatically retrieved, gene-term annotations and identify those concepts of publicly available ontologies that can be predicted using a generalized tool for prediction of annotations. We identified terms that perform better than expected by chance using randomly generated gene sets and show that both manually curated terms in GO and automatically recognized terms can be used to develop reasonable predictive models. In all, we characterize terms in over 250 ontologies and identify more than 127,000 statistically significant terms that can be predicted on human genes.

4.

STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation.

Wittkop, Tobias; TerAvest, Emily; Evani, Uday S; Fleisch, K Mathew; Berman, Ari E; Powell, Corey; Shah, Nigam H; Mooney, Sean D.

BMC Bioinformatics ; 14: 53, 2013 Feb 14.

Article in English | MEDLINE | ID: mdl-23409969

ABSTRACT

BACKGROUND: Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. RESULTS: As a consequence we have developed the method Statistical Tracking of Ontological Phrases (STOP) that expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. CONCLUSION: Multiple ontologies have been developed for gene and protein annotation; by using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/.

Subjects

Genes , Molecular Sequence Annotation , Proteins , Software , Controlled Vocabulary , Humans , Huntington Disease/genetics , Huntington Disease/metabolism , Internet , Parkinson Disease/genetics , Parkinson Disease/metabolism , Protein Interaction Mapping

5.

A large-scale evaluation of computational protein function prediction.

Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian.

Nat Methods ; 10(3): 221-7, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Subjects

Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Protein Databases , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity

6.

Density parameter estimation for finding clusters of homologous proteins--tracing actinobacterial pathogenicity lifestyles.

Röttger, Richard; Kalaghatgi, Prabhav; Sun, Peng; Soares, Siomar de Castro; Azevedo, Vasco; Wittkop, Tobias; Baumbach, Jan.

Bioinformatics ; 29(2): 215-22, 2013 Jan 15.

Article in English | MEDLINE | ID: mdl-23142964

ABSTRACT

MOTIVATION: Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles. RESULTS: Our main contribution is a method for identifying a suitable and robust density parameter for protein homology detection without a given gold standard. Therefore, we study the core genome of 89 actinobacteria. This allows us to incorporate background knowledge, i.e. the assumption that a set of evolutionarily closely related species should share a comparably high number of evolutionarily conserved proteins (emerging from phylum-specific housekeeping genes). We apply our strategy to find genes/proteins that are specific for certain actinobacterial lifestyles, i.e. different types of pathogenicity. The whole study was performed with transitivity clustering, as it only requires a single intuitive density parameter and has been shown to be well applicable for the task of protein sequence clustering. Note, however, that the presented strategy generally does not depend on our clustering method but can easily be adapted to other clustering approaches. AVAILABILITY: All results are publicly available at http://transclust.mmci.uni-saarland.de/actino_core/ or as Supplementary Material of this article. 
CONTACT: roettger@mpi-inf.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Actinobacteria/classification , Bacterial Proteins/chemistry , Amino Acid Sequence Homology , Actinobacteria/genetics , Actinobacteria/pathogenicity , Algorithms , Bacterial Proteins/genetics , Cluster Analysis , Bacterial Genome , Genetic Models , Phylogeny , Sequence Alignment

7.

Genetic correction of Huntington's disease phenotypes in induced pluripotent stem cells.

An, Mahru C; Zhang, Ningzhe; Scott, Gary; Montoro, Daniel; Wittkop, Tobias; Mooney, Sean; Melov, Simon; Ellerby, Lisa M.

Cell Stem Cell ; 11(2): 253-63, 2012 Aug 03.

Article in English | MEDLINE | ID: mdl-22748967

ABSTRACT

Huntington's disease (HD) is caused by a CAG expansion in the huntingtin gene. Expansion of the polyglutamine tract in the huntingtin protein results in massive cell death in the striatum of HD patients. We report that human induced pluripotent stem cells (iPSCs) derived from HD patient fibroblasts can be corrected by the replacement of the expanded CAG repeat with a normal repeat using homologous recombination, and that the correction persists in iPSC differentiation into DARPP-32-positive neurons in vitro and in vivo. Further, correction of the HD-iPSCs normalized pathogenic HD signaling pathways (cadherin, TGF-β, BDNF, and caspase activation) and reversed disease phenotypes such as susceptibility to cell death and altered mitochondrial bioenergetics in neural stem cells. The ability to make patient-specific, genetically corrected iPSCs from HD patients will provide relevant disease models in identical genetic backgrounds and is a critical step for the eventual use of these cells in cell replacement therapy.

Subjects

Huntington Disease/genetics , Huntington Disease/pathology , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Cell Differentiation , Cultured Cells , Humans , Induced Pluripotent Stem Cells/pathology , Phenotype

8.

DEFOG: discrete enrichment of functionally organized genes.

Wittkop, Tobias; Berman, Ari E; Fleisch, K Mathew; Mooney, Sean D.

Integr Biol (Camb) ; 4(7): 795-804, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22706384

ABSTRACT

High-throughput biological experiments commonly result in a list of genes or proteins of interest. In order to understand the observed changes of the genes and to generate new hypotheses, one needs to understand the functions and roles of the genes and how those functions relate to the experimental conditions. Typically, statistical tests are performed in order to detect enriched Gene Ontology categories or pathways, i.e. the categories are observed in the genes of interest more often than is expected by chance. Depending on the number of genes and the complexity and quantity of functions in which they are involved, such an analysis can easily result in hundreds of enriched terms. To this end we developed DEFOG, a web-based application that facilitates the functional analysis of gene sets by hierarchically organizing the genes into functionally related modules. Our computational pipeline utilizes three powerful tools to achieve this goal: (1) GeneMANIA creates a functional consensus network of the genes of interest based on gene-list-specific data fusion of hundreds of genomic networks from publicly available sources; (2) Transitivity Clustering organizes those genes into a clear hierarchy of functionally related groups, and (3) Ontologizer performs a Gene Ontology enrichment analysis on the resulting gene clusters. DEFOG integrates this computational pipeline within an easy-to-use web interface, thus allowing for a novel visual analysis of gene sets that aids in the discovery of potentially important biological mechanisms and facilitates the creation of new hypotheses. DEFOG is available at http://www.mooneygroup.org/defog.

Subjects

Cluster Analysis , Computational Biology/methods , Genetic Databases , Genomics/methods , Aging/genetics , Algorithms , Animals , Computer Graphics , Gene Expression Profiling/methods , Gene Regulatory Networks , Humans , Internet , Multigene Family , Oligonucleotide Array Sequence Analysis , Software
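The enrichment test the DEFOG abstract describes (a category observed among the genes of interest more often than expected by chance) is commonly computed as a one-sided hypergeometric test. The Python sketch below is an illustrative assumption, not DEFOG's or Ontologizer's code; Ontologizer also implements other variants (such as parent-child methods) that differ from this plain term-for-term version. The function names and the example numbers are hypothetical, and the Bonferroni step is included only to show why testing hundreds of terms calls for multiple-testing correction.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: number of genes in the background population
    K: number of background genes annotated with the term
    n: number of genes in the list of interest
    k: number of genes of interest annotated with the term
    """
    upper = min(K, n)
    # Sum the probabilities of drawing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, 200 annotated with a term,
    # 50 genes of interest, 8 of which carry the term.
    p = enrichment_p_value(N=20_000, K=200, n=50, k=8)
    print(f"p = {p:.3g}")
    print(bonferroni([p, 0.01, 0.2]))
```

Exact integer arithmetic via `math.comb` keeps the tail sum precise for moderate gene counts; for genome-scale sweeps a library routine such as a hypergeometric survival function would usually be preferred.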

9.

clusterMaker: a multi-algorithm clustering plugin for Cytoscape.

Morris, John H; Apeltsin, Leonard; Newman, Aaron M; Baumbach, Jan; Wittkop, Tobias; Su, Gang; Bader, Gary D; Ferrin, Thomas E.

BMC Bioinformatics ; 12: 436, 2011 Nov 09.

Article in English | MEDLINE | ID: mdl-22070249

ABSTRACT

BACKGROUND: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. RESULTS: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. 
CONCLUSIONS: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.

Subjects

Algorithms , Saccharomyces cerevisiae/genetics , Software , Animals , Cluster Analysis , Genomics , Mice , Protein Interaction Maps , Racemases and Epimerases/genetics , Saccharomyces cerevisiae/enzymology

10.

Comprehensive cluster analysis with Transitivity Clustering.

Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan.

Nat Protoc ; 6(3): 285-95, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21372810

ABSTRACT

Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

Subjects

Cluster Analysis , Computational Biology/methods , Automated Pattern Recognition/methods , Sequence Alignment/methods , Software , Nucleic Acid Databases , Protein Databases , Gene Expression Profiling , Internet , Molecular Sequence Data , Sequence Analysis/methods , Sequence Homology , User-Computer Interface

11.

Partitioning biological data with transitivity clustering.

Wittkop, Tobias; Emig, Dorothea; Lange, Sita; Rahmann, Sven; Albrecht, Mario; Morris, John H; Böcker, Sebastian; Stoye, Jens; Baumbach, Jan.

Nat Methods ; 7(6): 419-20, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20508635

Subjects

Cluster Analysis , Statistical Data Interpretation , Animals , Humans

12.

Efficient online transcription factor binding site adjustment by integrating transitive graph projection with MoRAine 2.0.

Wittkop, Tobias; Rahmann, Sven; Baumbach, Jan.

J Integr Bioinform ; 7(3), 2010 Mar 25.

Article in English | MEDLINE | ID: mdl-20375458

ABSTRACT

UNLABELLED: We investigated the problem of imprecisely determined prokaryotic transcription factor (TF) binding sites (TFBSs). We found that the identification and reinvestigation of questionable binding motifs may result in improved models of these motifs. Subsequent model-based predictions of gene regulatory interactions may be performed with increased accuracy when the TFBSs annotation underlying these models has been re-adjusted. We present MoRAine 2.0, a significantly improved version of MoRAine. It can automatically identify cases of unfavorable TFBS strand annotations and imprecisely determined TFBS positions. With release 2.0, we close the gap between reasonable running time and high accuracy. Furthermore, it requires only minimal input from the user: (1) the input TFBS sequences and (2) the length of the flanking sequences. CONCLUSIONS: MoRAine 2.0 is an easy-to-use, integrated, and publicly available web tool for the re-annotation of questionable TFBSs. It can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.

Subjects

Computational Biology/methods , Internet , Software , Transcription Factors/metabolism , Base Sequence , Binding Sites , Molecular Sequence Data , Position-Specific Scoring Matrices

13.

Integrated analysis and reconstruction of microbial transcriptional gene regulatory networks using CoryneRegNet.

Baumbach, Jan; Wittkop, Tobias; Kleindt, Christiane Katja; Tauch, Andreas.

Nat Protoc ; 4(6): 992-1005, 2009.

Article in English | MEDLINE | ID: mdl-19498379

ABSTRACT

CoryneRegNet is the reference database and analysis platform for corynebacterial gene regulatory networks. It provides web-based access to integrated data on gene regulatory interactions of corynebacteria relevant to human medicine and biotechnology, Escherichia coli and Mycobacterium tuberculosis. To facilitate the analysis and reconstruction of the corresponding networks, CoryneRegNet provides user-friendly interfaces for bioinformatics analysis and network visualization tools. This protocol describes four major workflows: (1) querying the regulatory network of a gene of interest, (2) prediction and interspecies transfer of gene regulatory interactions, (3) visualization and comparison of predicted or known networks and (4) integration of gene expression data analysis and visualization. This protocol guides the user through the most important features of CoryneRegNet and takes 45-60 min to complete.

Subjects

Computational Biology/methods , Corynebacterium/genetics , Genetic Databases , Bacterial Gene Expression Regulation , Gene Regulatory Networks , Internet , Transcription Factors/genetics , Genetic Transcription , User-Computer Interface

14.

MoRAine--a web server for fast computational transcription factor binding motif re-annotation.

Baumbach, Jan; Wittkop, Tobias; Weile, Jochen; Kohl, Thomas; Rahmann, Sven.

J Integr Bioinform ; 5(2), 2008 Aug 25.

Article in English | MEDLINE | ID: mdl-20134062

ABSTRACT

BACKGROUND: A precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5' --> 3' relative to the target gene. Mixing the two possible orientations of a motif results in poor information content of subsequently computed position frequency matrices (PFMs) and sequence logos. Since these PFMs are used to predict further TFBMs, we address the question of whether the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance. RESULTS: We present MoRAine, an algorithm that re-annotates transcription factor binding motifs. Each motif with experimental evidence underlying a PFM is compared against each other such motif. The goal is to re-annotate TFBMs by possibly switching their strands and shifting them a few positions in order to maximize the information content of the resulting adjusted PFM. We present two heuristic strategies to perform this optimization and subsequently show that MoRAine significantly improves the corresponding sequence logos. Furthermore, we justify the method by evaluating specificity, sensitivity, true positive, and false positive rates of PFM-based TFBM predictions for E. coli using the original database motifs and the MoRAine-adjusted motifs. The classification performance is considerably increased if MoRAine is used as a preprocessing step. CONCLUSIONS: MoRAine is integrated into a publicly available web server and can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.

Subjects

Internet , Software , Transcription Factors/chemistry , Transcription Factors/metabolism , Algorithms , Base Sequence , Binding Sites , Molecular Sequence Data
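The quantity MoRAine maximizes, the information content of a position frequency matrix, is easy to state in code. The sketch below assumes equal-length uppercase DNA motifs, a uniform background (0.25 per base) and no pseudocounts. `greedy_strand_fix` is a hypothetical, strand-flip-only hill climber written for illustration; it is not one of the two heuristics described in the paper, and it ignores the positional shifts that MoRAine also performs.

```python
from math import log2

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def information_content(motifs: list[str]) -> float:
    """Total information content (bits) of the PFM built from equal-length motifs.

    Per column: sum over bases of f * log2(f / 0.25), i.e. 2 minus the
    Shannon entropy, assuming a uniform background and no pseudocounts.
    """
    total = 0.0
    for column in zip(*motifs):
        for base in ALPHABET:
            f = column.count(base) / len(column)
            if f > 0:
                total += f * log2(f / 0.25)
    return total


def greedy_strand_fix(motifs: list[str]) -> list[str]:
    """Flip single motifs to their reverse complement while that raises the IC.

    Terminates because every accepted flip strictly increases the IC and
    there are only finitely many orientation assignments.
    """
    motifs = list(motifs)
    improved = True
    while improved:
        improved = False
        for i in range(len(motifs)):
            candidate = motifs[:i] + [revcomp(motifs[i])] + motifs[i + 1:]
            # Small tolerance so float round-off never counts as an improvement.
            if information_content(candidate) > information_content(motifs) + 1e-12:
                motifs = candidate
                improved = True
    return motifs


if __name__ == "__main__":
    # Hypothetical toy input: the third site was recorded on the opposite strand.
    sites = ["AACG", "AACG", revcomp("AACG")]
    fixed = greedy_strand_fix(sites)
    print(fixed, information_content(sites), information_content(fixed))
```

A single-flip hill climber can stall in local optima on real data, which is one reason a dedicated method compares motifs pairwise rather than relying on a greedy pass like this one.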

15.

Large scale clustering of protein sequences with FORCE--a layout based heuristic for weighted cluster editing.

Wittkop, Tobias; Baumbach, Jan; Lobo, Francisco P; Rahmann, Sven.

BMC Bioinformatics ; 8: 396, 2007 Oct 17.

Article in English | MEDLINE | ID: mdl-17941985

ABSTRACT

BACKGROUND: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that has to be efficiently clustered with constant or increased accuracy, at increased speed. RESULTS: We advocate that the model of weighted cluster editing, also known as transitive graph projection is well-suited to protein clustering. We present the FORCE heuristic that is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192 187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. CONCLUSION: FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets are available at http://gi.cebitec.uni-bielefeld.de/comet/force/.

Subjects

Algorithms , Cluster Analysis , Automated Pattern Recognition/methods , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Amino Acid Sequence , Artificial Intelligence , Molecular Sequence Data

16.

Exact and heuristic algorithms for weighted cluster editing.

Rahmann, Sven; Wittkop, Tobias; Baumbach, Jan; Martin, Marcel; Truss, Anke; Böcker, Sebastian.

Comput Syst Bioinformatics Conf ; 6: 391-401, 2007.

Article in English | MEDLINE | ID: mdl-17951842

ABSTRACT

Clustering objects according to given similarity or distance values is a ubiquitous problem in computational biology with diverse applications, e.g., in defining families of orthologous genes, or in the analysis of microarray experiments. While there exists a plenitude of methods, many of them produce clusterings that can be further improved. "Cleaning up" initial clusterings can be formalized as projecting a graph on the space of transitive graphs; it is also known as the cluster editing or cluster partitioning problem in the literature. In contrast to previous work on cluster editing, we allow arbitrary weights on the similarity graph. To solve the so-defined weighted transitive graph projection problem, we present (1) the first exact fixed-parameter algorithm, (2) a polynomial-time greedy algorithm that returns the optimal result on a well-defined subset of "close-to-transitive" graphs and works heuristically on other graphs, and (3) a fast heuristic that uses ideas similar to those from the Fruchterman-Reingold graph layout algorithm. We compare quality and running times of these algorithms on both artificial graphs and protein similarity graphs derived from the 66 organisms of the COG dataset.

Subjects

Algorithms , Cluster Analysis , Documentation/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Automated Pattern Recognition/methods
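Weighted cluster editing, as summarized in the abstract above, asks for the cheapest set of edge deletions and insertions that turns a weighted similarity graph into a disjoint union of cliques. The exhaustive Python sketch below only illustrates that objective on tiny graphs; it is not the fixed-parameter algorithm, the greedy algorithm, or the layout-based heuristic from the paper. It assumes pair weights of the form similarity minus a threshold t (positive weight: the edge is present and deleting it costs w; negative weight: the edge is absent and inserting it costs |w|), in the spirit of the single density parameter described for Transitivity Clustering. All names, similarities and the threshold are hypothetical.

```python
from itertools import combinations
from typing import Callable, Iterator


def set_partitions(items: list[str]) -> Iterator[list[list[str]]]:
    """Yield every partition of `items` (Bell-number many, so tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partial


def editing_cost(partition: list[list[str]], weight: Callable[[str, str], float]) -> float:
    """Cost of turning the weighted graph into the cliques given by `partition`."""
    label = {v: i for i, block in enumerate(partition) for v in block}
    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        w = weight(u, v)
        if label[u] == label[v]:
            cost += max(-w, 0.0)  # insert a missing edge inside a cluster
        else:
            cost += max(w, 0.0)   # delete an existing edge between clusters
    return cost


def exact_cluster_editing(nodes: list[str], weight: Callable[[str, str], float]) -> list[list[str]]:
    """Brute-force optimum: try every partition and keep the cheapest."""
    return min(set_partitions(list(nodes)), key=lambda p: editing_cost(p, weight))


# Hypothetical pairwise similarities; unlisted pairs have similarity 0.
SIMILARITY = {frozenset(("a", "b")): 0.9, frozenset(("c", "d")): 0.8,
              frozenset(("a", "c")): 0.1}
THRESHOLD = 0.5  # hypothetical density parameter t


def weight(u: str, v: str) -> float:
    """Edge weight = similarity - threshold (symmetric in u and v)."""
    return SIMILARITY.get(frozenset((u, v)), 0.0) - THRESHOLD


if __name__ == "__main__":
    best = exact_cluster_editing(["a", "b", "c", "d"], weight)
    print(best, editing_cost(best, weight))
```

Raising the threshold turns more pairs into negative-weight non-edges and therefore yields more, smaller clusters; this is the trade-off the density parameter controls.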

17.

CoryneRegNet 3.0--an interactive systems biology platform for the analysis of gene regulatory networks in corynebacteria and Escherichia coli.

Baumbach, Jan; Wittkop, Tobias; Rademacher, Katrin; Rahmann, Sven; Brinkrolf, Karina; Tauch, Andreas.

J Biotechnol ; 129(2): 279-89, 2007 Apr 30.

Article in English | MEDLINE | ID: mdl-17229482

ABSTRACT

CoryneRegNet is an ontology-based data warehouse for the reconstruction and visualization of transcriptional regulatory interactions in prokaryotes. To extend the biological content of CoryneRegNet, we added comprehensive data on transcriptional regulations in the model organism Escherichia coli K-12, originally deposited in the international reference database RegulonDB. The enhanced web interface of CoryneRegNet offers several types of search options. The results of a search are displayed in a table-based style and include a visualization of the genetic organization of the respective gene region. Information on DNA binding sites of transcriptional regulators is depicted by sequence logos. The results can also be displayed by several layouters implemented in the graphical user interface GraphVis, allowing, for instance, the visualization of genome-wide network reconstructions and the homology-based inter-species comparison of reconstructed gene regulatory networks. In an application example, we compare the composition of the gene regulatory networks involved in the SOS response of E. coli and Corynebacterium glutamicum. CoryneRegNet is available at the following URL: http://www.cebitec.uni-bielefeld.de/groups/gi/software/coryneregnet/.

Subjects

Corynebacterium glutamicum/genetics , Genetic Databases , Escherichia coli/genetics , Gene Regulatory Networks/genetics , Systems Biology , Gene Expression Regulation
