Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 8 de 8
Filtrar
1.
Nucleic Acids Res ; 36(Database issue): D398-401, 2008 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-17932066

RESUMEN

In protein research, structural classifications of protein domains provided by databases such as SCOP play an important role. However, as such databases have to be curated and prepared carefully, they update only up to a few times per year, and in between newly entered PDB structures cannot be used in cases where a structural classification is required. The Automated Protein Structure Identification (AutoPSI) database delivers predicted SCOP classifications for several thousand yet unclassified PDB entries as well as millions of UniProt sequences in an automated fashion. In order to obtain predictions, we make use of two recently published methods, namely AutoSCOP (sequence-based) and Vorolign (structure-based) and the consensus of both. With our predictions, we bridge the gap between SCOP versions for proteins with known structures in the PDB and additionally make structure predictions for a very large number of UniProt proteins. AutoPSI is freely accessible at http://www.bio.ifi.lmu.de/AutoPSIDB.


Asunto(s)
Bases de Datos de Proteínas , Estructura Terciaria de Proteína , Internet , Proteínas/clasificación , Análisis de Secuencia de Proteína , Interfaz Usuario-Computador
2.
Bioinformatics ; 23(5): 651-3, 2007 Mar 01.
Artículo en Inglés | MEDLINE | ID: mdl-17237069

RESUMEN

UNLABELLED: Given the growing amount of biological data, data mining methods have become an integral part of bioinformatics research. Unfortunately, standard data mining tools are often not sufficiently equipped for handling raw data such as e.g. amino acid sequences. One popular and freely available framework that contains many well-known data mining algorithms is the Waikato Environment for Knowledge Analysis (Weka). In the BioWeka project, we introduce various input formats for bioinformatics data and bioinformatics methods like alignments to Weka. This allows users to easily combine them with Weka's classification, clustering, validation and visualization facilities on a single platform and therefore reduces the overhead of converting data between different data formats as well as the need to write custom evaluation procedures that can deal with many different programs. We encourage users to participate in this project by adding their own components and data formats to BioWeka. AVAILABILITY: The software, documentation and tutorial are available at http://www.bioweka.org.


Asunto(s)
Biología Computacional/métodos , Programas Informáticos , Algoritmos , Bases de Datos Factuales , Interfaz Usuario-Computador
3.
Bioinformatics ; 23(10): 1203-10, 2007 May 15.
Artículo en Inglés | MEDLINE | ID: mdl-17379694

RESUMEN

MOTIVATION: The sequence patterns contained in the available motif and hidden Markov model (HMM) databases are a valuable source of information for protein sequence annotation. For structure prediction and fold recognition purposes, we computed mappings from such pattern databases to the protein domain hierarchy given by the ASTRAL compendium and applied them to the prediction of SCOP classifications. Our aim is to make highly confident predictions also for non-trivial cases if possible and abstain from a prediction otherwise, and thus to provide a method that can be used as a first step in a pipeline of prediction methods. We describe two successful examples for such pipelines. With the AutoSCOP approach, it is possible to make predictions in a large-scale manner for many domains of the available sequences in the well-known protein sequence databases. RESULTS: AutoSCOP computes unique sequence patterns and pattern combinations for SCOP classifications. For instance, we assign a SCOP superfamily to a pattern found in its members whenever the pattern does not occur in any other SCOP superfamily. Especially on the fold and superfamily level, our method achieves both high sensitivity (above 93%) and high specificity (above 98%) on the difference set between two ASTRAL versions, due to being able to abstain from unreliable predictions. Further, on a harder test set filtered at low sequence identity, the combination with profile-profile alignments improves accuracy and performs comparably even to structure alignment methods. Integrating our method with structure alignment, we are able to achieve an accuracy of 99% on SCOP fold classifications on this set. In an analysis of false assignments of domains from new folds/superfamilies/families to existing SCOP classifications, AutoSCOP correctly abstains for more than 70% of the domains belonging to new folds and superfamilies, and more than 80% of the domains belonging to new families. These findings show that our approach is a useful additional filter for SCOP classification prediction of protein domains in combination with well-known methods such as profile-profile alignment. AVAILABILITY: A web server where users can input their domain sequences is available at http://www.bio.ifi.lmu.de/autoscop.


Asunto(s)
Biología Computacional/métodos , Bases de Datos de Proteínas , Reconocimiento de Normas Patrones Automatizadas , Proteínas/química , Programas Informáticos , Animales , Ratones , Pliegue de Proteína , Estructura Terciaria de Proteína
4.
Bioinformatics ; 23(2): e205-11, 2007 Jan 15.
Artículo en Inglés | MEDLINE | ID: mdl-17237093

RESUMEN

UNLABELLED: Vorolign, a fast and flexible structural alignment method for two or more protein structures is introduced. The method aligns protein structures using double dynamic programming and measures the similarity of two residues based on the evolutionary conservation of their corresponding Voronoi-contacts in the protein structure. This similarity function allows aligning protein structures even in cases where structural flexibilities exist. Multiple structural alignments are generated from a set of pairwise alignments using a consistency-based, progressive multiple alignment strategy. RESULTS: The performance of Vorolign is evaluated for different applications of protein structure comparison, including automatic family detection as well as pairwise and multiple structure alignment. Vorolign accurately detects the correct family, superfamily or fold of a protein with respect to the SCOP classification on a set of difficult target structures. A scan against a database of >4000 proteins takes on average 1 min per target. The performance of Vorolign in calculating pairwise and multiple alignments is found to be comparable with other pairwise and multiple protein structure alignment methods. AVAILABILITY: Vorolign is freely available for academic users as a web server at http://www.bio.ifi.lmu.de/Vorolign


Asunto(s)
Algoritmos , Modelos Químicos , Modelos Moleculares , Proteínas/química , Proteínas/ultraestructura , Alineación de Secuencia/métodos , Análisis de Secuencia de Proteína/métodos , Secuencia de Aminoácidos , Simulación por Computador , Datos de Secuencia Molecular , Homología de Secuencia
5.
Methods Inf Med ; 57(S 01): e92-e105, 2018 07.
Artículo en Inglés | MEDLINE | ID: mdl-30016815

RESUMEN

INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. "Smart Medical Information Technology for Healthcare (SMITH)" is one of four consortia funded by the German Medical Informatics Initiative (MI-I) to create an alliance of universities, university hospitals, research institutions and IT companies. SMITH's goals are to establish Data Integration Centers (DICs) at each SMITH partner hospital and to implement use cases which demonstrate the usefulness of the approach. OBJECTIVES: To give insight into architectural design issues underlying SMITH data integration and to introduce the use cases to be implemented. GOVERNANCE AND POLICIES: SMITH implements a federated approach as well for its governance structure as for its information system architecture. SMITH has designed a generic concept for its data integration centers. They share identical services and functionalities to take best advantage of the interoperability architectures and of the data use and access process planned. The DICs provide access to the local hospitals' Electronic Medical Records (EMR). This is based on data trustee and privacy management services. DIC staff will curate and amend EMR data in the Health Data Storage. METHODOLOGY AND ARCHITECTURAL FRAMEWORK: To share medical and research data, SMITH's information system is based on communication and storage standards. We use the Reference Model of the Open Archival Information System and will consistently implement profiles of Integrating the Health Care Enterprise (IHE) and Health Level Seven (HL7) standards. Standard terminologies will be applied. The SMITH Market Place will be used for devising agreements on data access and distribution. 3LGM2 for enterprise architecture modeling supports a consistent development process.The DIC reference architecture determines the services, applications and the standardsbased communication links needed for efficiently supporting the ingesting, data nourishing, trustee, privacy management and data transfer tasks of the SMITH DICs. The reference architecture is adopted at the local sites. Data sharing services and the market place enable interoperability. USE CASES: The methodological use case "Phenotype Pipeline" (PheP) constructs algorithms for annotations and analyses of patient-related phenotypes according to classification rules or statistical models based on structured data. Unstructured textual data will be subject to natural language processing to permit integration into the phenotyping algorithms. The clinical use case "Algorithmic Surveillance of ICU Patients" (ASIC) focusses on patients in Intensive Care Units (ICU) with the acute respiratory distress syndrome (ARDS). A model-based decision-support system will give advice for mechanical ventilation. The clinical use case HELP develops a "hospital-wide electronic medical record-based computerized decision support system to improve outcomes of patients with blood-stream infections" (HELP). ASIC and HELP use the PheP. The clinical benefit of the use cases ASIC and HELP will be demonstrated in a change of care clinical trial based on a step wedge design. DISCUSSION: SMITH's strength is the modular, reusable IT architecture based on interoperability standards, the integration of the hospitals' information management departments and the public-private partnership. The project aims at sustainability beyond the first 4-year funding period.


Asunto(s)
Atención a la Salud , Tecnología de la Información , Algoritmos , Gestión Clínica , Comunicación , Sistemas de Apoyo a Decisiones Clínicas , Registros Electrónicos de Salud , Almacenamiento y Recuperación de la Información , Unidades de Cuidados Intensivos , Modelos Teóricos , Fenotipo , Políticas
6.
J Bioinform Comput Biol ; 2(2): 289-307, 2004 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-15297983

RESUMEN

Recognition of protein-DNA binding sites in genomic sequences is a crucial step for discovering biological functions of genomic sequences. Explosive growth in availability of sequence information has resulted in a demand for binding site detection methods with high specificity. The motivation of the work presented here is to address this demand by a systematic approach based on Maximum Likelihood Estimation. A general framework is developed in which a large class of binding site detection methods can be described in a uniform and consistent way. Protein-DNA binding is determined by binding energy, which is an approximately linear function within the space of sequence words. All matrix based binding word detectors can be regarded as different linear classifiers which attempt to estimate the linear separation implied by the binding energy function. The standard approaches of consensus sequences and profile matrices are described using this framework. A maximum likelihood approach for determining this linear separation leads to a novel matrix type, called the binding matrix. The binding matrix is the most specific matrix based classifier which is consistent with the input set of known binding words. It achieves significant improvements in specificity compared to other matrices. This is demonstrated using 95 sets of experimentally determined binding words provided by the TRANSFAC database.


Asunto(s)
Algoritmos , Proteínas de Unión al ADN/química , Proteínas de Unión al ADN/genética , ADN/química , ADN/genética , Alineación de Secuencia/métodos , Análisis de Secuencia/métodos , Sitios de Unión/genética , Funciones de Verosimilitud , Unión Proteica/genética , Factores de Transcripción/química , Factores de Transcripción/genética
7.
Bioinformatics ; 22(2): 181-7, 2006 Jan 15.
Artículo en Inglés | MEDLINE | ID: mdl-16267083

RESUMEN

MOTIVATION: The prediction of protein domains is a crucial task for functional classification, homology-based structure prediction and structural genomics. In this paper, we present the SSEP-Domain protein domain prediction approach, which is based on the application of secondary structure element alignment (SSEA) and profile-profile alignment (PPA) in combination with InterPro pattern searches. SSEA allows rapid screening for potential domain regions while PPA provides us with the necessary specificity for selecting significant hits. The combination with InterPro patterns allows finding domain regions without solved structural templates if sequence family definitions exist. RESULTS: A preliminary version of SSEP-Domain was ranked among the top-performing domain prediction servers in the CASP 6 and CAFASP 4 experiments. Evaluation of the final version shows further improvement over these results together with a significant speed-up. AVAILABILITY: The server is available at http://www.bio.ifi.lmu.de/SSEP/


Asunto(s)
Algoritmos , Proteínas/química , Alineación de Secuencia/métodos , Análisis de Secuencia de Proteína/métodos , Programas Informáticos , Secuencia de Aminoácidos , Datos de Secuencia Molecular , Estructura Secundaria de Proteína , Estructura Terciaria de Proteína , Proteínas/análisis , Proteínas/clasificación , Homología de Secuencia de Aminoácido
8.
Bioinformatics ; 21(24): 4425-6, 2005 Dec 15.
Artículo en Inglés | MEDLINE | ID: mdl-16216828

RESUMEN

SUMMARY: Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.


Asunto(s)
Proteínas/química , Proteínas/genética , Alineación de Secuencia/estadística & datos numéricos , Programas Informáticos , Biología Computacional , Bases de Datos de Proteínas , Estructura Molecular
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA