Search | BVS Regional Portal

Domain tree-based analysis of protein architecture evolution.

Forslund, Kristoffer; Henricson, Anna; Hollich, Volker; Sonnhammer, Erik L L.

Mol Biol Evol ; 25(2): 254-64, 2008 Feb.

Article in English | MEDLINE | ID: mdl-18025066

ABSTRACT

Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.
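
The abstract names maximum parsimony as the way ancestral architectures are inferred on a domain tree. A minimal sketch of that idea follows, using textbook Fitch parsimony on a rooted binary tree. It is not the authors' implementation: the nested-tuple tree encoding, the function names, the tie-breaking rule, and the toy architecture labels are all assumptions made for illustration, and the bootstrap filtering mentioned in the abstract is left out.

```python
def fitch_sets(tree, leaf_state):
    """Bottom-up Fitch pass: map every node to its set of candidate states.

    `tree` is a nested tuple of unique leaf names (hypothetical encoding),
    e.g. (("p1", "p2"), "p3"). `leaf_state` maps leaf name -> architecture.
    """
    sets = {}

    def visit(node):
        if isinstance(node, str):
            sets[node] = {leaf_state[node]}
            return
        left, right = node
        visit(left)
        visit(right)
        common = sets[left] & sets[right]
        # Intersection if non-empty, otherwise union (one change is implied).
        sets[node] = common if common else sets[left] | sets[right]

    visit(tree)
    return sets


def assign_states(tree, sets):
    """Top-down pass: keep the parent's state when allowed, else pick one."""
    assigned = {}

    def visit(node, parent_state):
        options = sets[node]
        # Assumed tie-break: alphabetically smallest candidate.
        state = parent_state if parent_state in options else min(options)
        assigned[node] = state
        if not isinstance(node, str):
            for child in node:
                visit(child, state)

    visit(tree, None)
    return assigned


def count_origins(tree, leaf_state, target):
    """Count independent appearances of `target` in one parsimonious labelling."""
    assigned = assign_states(tree, fitch_sets(tree, leaf_state))
    origins = 1 if assigned[tree] == target else 0

    def visit(node):
        nonlocal origins
        if isinstance(node, str):
            return
        for child in node:
            if assigned[child] == target and assigned[node] != target:
                origins += 1
            visit(child)

    visit(tree)
    return origins


# Toy domain tree: six proteins, two architectures (hypothetical labels).
tree = ((("p1", "p2"), "p3"), ("p4", ("p5", "p6")))
states = {"p1": "A+B", "p2": "A+B", "p3": "A",
          "p4": "A", "p5": "A+B", "p6": "A"}
print(count_origins(tree, states, "A+B"))
```

In this toy tree the architecture "A+B" appears in two separate clades whose common ancestors are reconstructed as "A", so the sketch reports it as arising more than once, which is the kind of event the abstract calls domain architecture reinvention.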

Subject(s)

Algorithms , Molecular Evolution , Tertiary Protein Structure , Computational Biology , Software

PfamAlyzer: domain-centric homology search.

Hollich, Volker; Sonnhammer, Erik L L.

Bioinformatics ; 23(24): 3382-3, 2007 Dec 15.

Article in English | MEDLINE | ID: mdl-17977882

ABSTRACT

PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures using a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist identification of distant homologs with shared domain architecture. AVAILABILITY: PfamAlyzer has been integrated with the Pfam database and can be accessed at http://pfam.cgb.ki.se/pfamalyzer.

Subject(s)

Database Management Systems , Protein Databases , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , User-Computer Interface , Information Storage and Retrieval/methods , Programming Languages , Tertiary Protein Structure , Amino Acid Sequence Homology

Pfam: clans, web tools and services.

Finn, Robert D; Mistry, Jaina; Schuster-Böckler, Benjamin; Griffiths-Jones, Sam; Hollich, Volker; Lassmann, Timo; Moxon, Simon; Marshall, Mhairi; Khanna, Ajay; Durbin, Richard; Eddy, Sean R; Sonnhammer, Erik L L; Bateman, Alex.

Nucleic Acids Res ; 34(Database issue): D247-51, 2006 Jan 01.

Article in English | MEDLINE | ID: mdl-16381856

ABSTRACT

Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://pfam.cgb.ki.se/).

Subject(s)

Protein Databases , Proteins/classification , Computer Graphics , Internet , Markov Chains , Tertiary Protein Structure , Proteins/chemistry , Sequence Alignment , Software , User-Computer Interface

Assessment of protein distance measures and tree-building methods for phylogenetic tree reconstruction.

Hollich, Volker; Milchert, Lena; Arvestad, Lars; Sonnhammer, Erik L L.

Mol Biol Evol ; 22(11): 2257-64, 2005 Nov.

Article in English | MEDLINE | ID: mdl-16049194

ABSTRACT

Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed poorer than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed equally well as more complex methods.
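
The abstract compares uncorrected distances with protein-adapted Jukes-Cantor and Kimura corrections but does not give their formulas. The sketch below uses the commonly cited textbook forms of these corrections, on the assumption that they correspond to what was evaluated; the function names and the example sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of differing, ungapped aligned columns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_protein(p, n_states=20):
    """Jukes-Cantor correction generalised to an alphabet of n_states letters."""
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf  # saturated: the correction is undefined
    return -b * math.log(1 - p / b)


def kimura_protein(p):
    """Kimura's empirical protein correction, d = -ln(1 - p - 0.2 p^2)."""
    arg = 1 - p - 0.2 * p * p
    if arg <= 0:
        return math.inf
    return -math.log(arg)


# Hypothetical aligned fragments differing in one of ten columns.
p = p_distance("ACDEFGHIKL", "ACDEFGHIKV")
print(p, jukes_cantor_protein(p), kimura_protein(p))
```

Both corrections return a value somewhat larger than the observed fraction of differences, reflecting their attempt to account for multiple substitutions at the same position.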

Subject(s)

Classification/methods , Molecular Evolution , Genetic Models , Phylogeny , Proteins/genetics , Base Sequence , Cluster Analysis , Computer Simulation , Evaluation Studies as Topic , Sequence Alignment

Scoredist: a simple and robust protein sequence distance estimator.

Sonnhammer, Erik L L; Hollich, Volker.

BMC Bioinformatics ; 6: 108, 2005 Apr 27.

Article in English | MEDLINE | ID: mdl-15857510

ABSTRACT

BACKGROUND: Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number of true substitutions given an observed alignment. So far, the most accurate protein distance estimators have looked for the optimal matrix in a series of transition probability matrices, e.g. the Dayhoff series. The evolutionary distance between two aligned sequences is here estimated as the evolutionary distance of the optimal matrix. The optimal matrix can be found either by an iterative search for the Maximum Likelihood matrix, or by integration to find the Expected Distance. As a consequence, these methods are more complex to implement and computationally heavier than correction-based methods. Another problem is that the result may vary substantially depending on the evolutionary model used for the matrices. An ideal distance estimator should produce consistent and accurate distances independent of the evolutionary model used. RESULTS: We propose a correction-based protein sequence estimator called Scoredist. It uses a logarithmic correction of observed divergence based on the alignment score according to the BLOSUM62 score matrix. We evaluated Scoredist and a number of optimal matrix methods using three evolutionary models for both training and testing (Dayhoff, Jones-Taylor-Thornton, and Muller-Vingron), as well as Whelan and Goldman solely for testing. Test alignments with known distances between 0.01 and 2 substitutions per position (1-200 PAM) were simulated using ROSE. Scoredist proved as accurate as the optimal matrix methods, yet substantially more robust. When trained on one model but tested on another one, Scoredist was nearly always more accurate. The Jukes-Cantor and Kimura correction methods were also tested, but were substantially less accurate. CONCLUSION: The Scoredist distance estimator is fast to implement and run, and combines robustness with accuracy. Scoredist has been incorporated into the Belvu alignment viewer, which is available at ftp://ftp.cgb.ki.se/pub/prog/belvu/.
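
The abstract describes Scoredist only as a logarithmic correction of observed divergence based on the alignment score. The sketch below illustrates that general idea and should not be read as the published estimator: the normalisation against self-scores and an expected random score, the parameter names, the toy match/mismatch scoring function (standing in for BLOSUM62), and the `scale` calibration factor are all assumptions.

```python
import math


def score_based_distance(seq_a, seq_b, score, expected_score, scale=1.0):
    """Log-corrected distance from a normalised alignment score (illustrative).

    score(a, b)     -> substitution score for one aligned residue pair
    expected_score  -> assumed mean score of a random residue pair
    scale           -> assumed calibration factor (left at 1.0 here)
    """
    cols = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not cols:
        raise ValueError("no ungapped columns to compare")
    n = len(cols)
    s_ab = sum(score(a, b) for a, b in cols)   # observed score
    s_aa = sum(score(a, a) for a, _ in cols)   # self-score of sequence A
    s_bb = sum(score(b, b) for _, b in cols)   # self-score of sequence B
    s_rand = expected_score * n                # score expected by chance
    upper = (s_aa + s_bb) / 2 - s_rand
    normalised = (s_ab - s_rand) / upper
    if normalised <= 0:
        return math.inf  # no better than random: distance is saturated
    return -math.log(normalised) * scale


def toy_score(a, b):
    """Hypothetical stand-in for a real substitution matrix."""
    return 4 if a == b else -1


# Mean toy score of a random pair under uniform residue frequencies.
TOY_EXPECTED = (1 / 20) * 4 + (19 / 20) * (-1)

print(score_based_distance("ACDEFGHIKL", "ACDEFGHIKV", toy_score, TOY_EXPECTED))
```

With a real substitution matrix, conservative replacements lower the normalised score less than radical ones, which is what distinguishes a score-based correction from one based purely on the fraction of mismatches.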

Subject(s)

Computational Biology/methods , Protein Sequence Analysis/methods , Software , Algorithms , Amino Acid Sequence , Biological Evolution , Calibration , Computer Simulation , Molecular Evolution , Likelihood Functions , Statistical Models , Molecular Sequence Data , Monte Carlo Method , Automated Pattern Recognition , Phylogeny , Amino Acid Sequence Homology

Creation of a minimal tiling path of genomic clones for Drosophila: provision of a common resource.

Hollich, Volker; Johnson, Eric; Furlong, Eileen E; Beckmann, Boris; Carlson, Joseph; Celniker, Susan E; Hoheisel, Jörg D.

Biotechniques ; 37(2): 282-4, 2004 Aug.

Article in English | MEDLINE | ID: mdl-15335221

ABSTRACT

On the basis of shotgun subclone libraries used in the sequencing of the Drosophila melanogaster genome, a minimal tiling path of subclones across much of the genome was determined. About 320,000 shotgun clones for chromosomes X(12-20), 2R, 2L, 3R, and 4 were available from the Berkeley Drosophila Genome Project. The clone inserts have an average length of 3.4 kb and are amenable to standard PCR amplification. The resulting tiling path covers 86.2% of chromosome X(12-20), 86.2% of chromosomal arm 2R, 79.0% of 2L, 89.6% of 3R, and 80.5% of chromosome 4. In total, the 25,135 clones represent 76.7 Mb (equivalent to about 67% of the genome) and would be suitable for producing a microarray on a single slide.
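
The abstract does not say how the minimal tiling path was computed. One standard way to choose few clones that cover as much of a region as possible is greedy interval covering, sketched below under that assumption. The function name, the half-open `(start, end)` coordinates, and the example clones are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover over clones given as (start, end) tuples.

    Returns (path, covered_bases). Gaps with no clone are skipped, so the
    covered length can be smaller than the region length.
    """
    clones = sorted(clones)
    path = []
    covered = 0
    pos = region_start
    i = 0
    while pos < region_end:
        best = None
        # Among clones starting at or before pos, keep the one reaching furthest.
        while i < len(clones) and clones[i][0] <= pos:
            if clones[i][1] > pos and (best is None or clones[i][1] > best[1]):
                best = clones[i]
            i += 1
        if best is None:
            if i == len(clones):
                break                # no clones left: rest of region uncovered
            pos = clones[i][0]       # gap: jump to the next clone start
            continue
        path.append(best)
        covered += min(best[1], region_end) - pos
        pos = best[1]
    return path, covered


# Hypothetical clones on a 12 kb region, with one 500 bp gap.
clones = [(0, 3400), (1000, 4200), (3000, 6500), (6400, 9000), (9500, 12000)]
path, covered = minimal_tiling_path(clones, 0, 12000)
print(path, covered / 12000)
```

The ratio of covered bases to region length corresponds to the kind of per-chromosome coverage percentage reported in the abstract.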

Subject(s)

Chromosome Mapping/methods , Molecular Cloning/methods , Drosophila melanogaster/genetics , Oligonucleotide Array Sequence Analysis/methods , Polymerase Chain Reaction/methods , DNA Sequence Analysis/methods , Animals

The Pfam protein families database.

Bateman, Alex; Coin, Lachlan; Durbin, Richard; Finn, Robert D; Hollich, Volker; Griffiths-Jones, Sam; Khanna, Ajay; Marshall, Mhairi; Moxon, Simon; Sonnhammer, Erik L L; Studholme, David J; Yeats, Corin; Eddy, Sean R.

Nucleic Acids Res ; 32(Database issue): D138-41, 2004 Jan 01.

Article in English | MEDLINE | ID: mdl-14681378

ABSTRACT

Pfam is a large collection of protein families and domains. Over the past 2 years the number of families in Pfam has doubled and now stands at 6190 (version 10.0). Methodology improvements for searching the Pfam collection locally as well as via the web are described. Other recent innovations include modelling of discontinuous domains allowing Pfam domain definitions to be closer to those found in structure databases. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://Pfam.cgb.ki.se/).

Subject(s)

Protein Databases , Proteins/chemistry , Proteins/classification , Animals , Computational Biology , Humans , Internet , Molecular Models , Multigene Family , Tertiary Protein Structure

OrthoGUI: graphical presentation of Orthostrapper results.

Hollich, Volker; Storm, Christian E V; Sonnhammer, Erik L L.

Bioinformatics ; 18(9): 1272-3, 2002 Sep.

Article in English | MEDLINE | ID: mdl-12217923

ABSTRACT

SUMMARY: Orthostrapper is a program that calculates orthology support values for pairs of sequences in a multiple alignment (Storm and Sonnhammer, Bioinformatics, 18, 92-99, 2002). Here we present OrthoGUI, a web interface and display tool for Orthostrapper analysis. OrthoGUI visualizes the Orthostrapper output in both tabular and tree representations, and can also apply a clustering algorithm to identify groups of multiple orthologs, which are indicated by colour coding. AVAILABILITY: http://www.cgb.ki.se/OrthoGUI CONTACT: erik.sonnhammer@cgb.ki.se

Subject(s)

Computer Graphics , Protein Databases , Information Storage and Retrieval/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Sequence Homology , ATP-Binding Cassette Transporters/genetics , ATP-Binding Cassette Transporters/metabolism , Adenosine Triphosphate/metabolism , Algorithms , Data Display , Humans , Internet , Protein Binding/genetics , Sensitivity and Specificity , Species Specificity , User-Computer Interface
