Search | BVS España Search Portal

Automatic extraction of protein point mutations using a graph bigram association.

Lee, Lawrence C; Horn, Florence; Cohen, Fred E.

PLoS Comput Biol ; 3(2): e16, 2007 Feb 02.

Article in English | MEDLINE | ID: mdl-17274683

ABSTRACT

Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are described in the peer-reviewed published literature. We describe an application, Mutation GraB (Graph Bigram), that identifies, extracts, and verifies point mutations from biomedical literature. The principal problem of point mutation extraction is to link the point mutation with its associated protein and organism of origin. Our algorithm uses a graph-based bigram traversal to identify these relevant associations and exploits the Swiss-Prot protein database to verify this information. The graph bigram method is different from other models for point mutation extraction in that it incorporates frequency and positional data of all terms in an article to drive the point mutation-protein association. Our method was tested on 589 articles describing point mutations from the G protein-coupled receptor (GPCR), tyrosine kinase, and ion channel protein families. We evaluated our graph bigram metric against a word-proximity metric for term association on datasets of full-text literature in these three different protein families. Our testing shows that the graph bigram metric achieves a higher F-measure for the GPCRs (0.79 versus 0.76), protein tyrosine kinases (0.72 versus 0.69), and ion channel transporters (0.76 versus 0.74). Importantly, in situations where more than one protein can be assigned to a point mutation and disambiguation is required, the graph bigram metric achieves a precision of 0.84 compared with the word distance metric precision of 0.73. We believe the graph bigram search metric to be a significant improvement over previous search metrics for point mutation extraction and to be applicable to text-mining applications requiring the association of words.

Subject(s)

Algorithms; Artificial Intelligence; Databases, Protein; Mutation; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Sequence Alignment/methods; Sequence Homology, Amino Acid
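
The abstract above compares the graph bigram metric against a word-proximity metric and reports results as F-measures. As a rough illustration of that baseline only (not the authors' code, and not the graph bigram method itself), the Python sketch below assigns each mutation token to the nearest protein-name token by word distance and computes an F-measure from precision and recall. The function names, the one-letter mutation pattern, and the protein names RECA and RECB are all hypothetical.

```python
import re

# Assumed one-letter notation for a point mutation: wild type, position, mutant (e.g. "D113A").
MUTATION = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]$")


def nearest_protein(tokens, protein_names):
    """Assign each mutation token to the protein-name token closest to it in word distance."""
    protein_positions = [(i, t) for i, t in enumerate(tokens) if t in protein_names]
    assignments = {}
    for i, token in enumerate(tokens):
        if MUTATION.match(token) and protein_positions:
            # Ties are resolved in favour of the earlier protein mention.
            _, name = min(protein_positions, key=lambda p: abs(p[0] - i))
            assignments[token] = name
    return assignments


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sentence with made-up protein names, for illustration only.
tokens = "The D113A mutant of RECA lost binding whereas RECB carrying S193A retained it".split()
result = nearest_protein(tokens, {"RECA", "RECB"})
```

A pure distance rule like this has no way to use term frequency or document-wide positional data, which is the information the abstract says the graph bigram metric adds.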

Phosphorylation and intramolecular stabilization of the ligand binding domain in the nuclear receptor steroidogenic factor 1.

Desclozeaux, Marion; Krylova, Irina N; Horn, Florence; Fletterick, Robert J; Ingraham, Holly A.

Mol Cell Biol ; 22(20): 7193-203, 2002 Oct.

Article in English | MEDLINE | ID: mdl-12242296

ABSTRACT

Steroidogenic factor 1 (SF-1) is an orphan nuclear receptor with no known ligand. We showed previously that phosphorylation at serine 203 located N-terminal to the ligand binding domain (LBD) enhanced cofactor recruitment, analogous to the ligand-mediated recruitment in ligand-dependent receptors. In this study, results of biochemical analyses and an LBD helix assembly assay suggest that the SF-1 LBD adopts an active conformation, with helices 1 and 12 packed against the predicted alpha-helical bundle, in the apparent absence of ligand. Fine mapping of the previously defined proximal activation function in SF-1 showed that the activation function mapped fully to helix 1 of the LBD. Limited proteolyses demonstrate that phosphorylation of S203 in the hinge region mimics the stabilizing effects of ligand on the LBD. Moreover, similar effects were observed in an SF-1/thyroid hormone LBD chimera receptor, illustrating that the S203 phosphorylation effects are transferable to a heterologous ligand-dependent receptor. Our collective data suggest that the hinge together with helix 1 is an individualized specific motif, which is tightly associated with its cognate LBD. For SF-1, we find that this intramolecular association and hence receptor activity are further enhanced by mitogen-activated protein kinase phosphorylation, thus mimicking many of the ligand-induced changes observed for ligand-dependent receptors.

Subject(s)

DNA-Binding Proteins/metabolism; Helix-Loop-Helix Motifs; Receptors, Thyroid Hormone/metabolism; Transcription Factors/metabolism; 3T3 Cells; Amino Acid Sequence; Animals; Binding Sites; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/genetics; Fushi Tarazu Transcription Factors; Homeodomain Proteins; Humans; Ligands; Mice; Mitogen-Activated Protein Kinases/metabolism; Models, Molecular; Molecular Sequence Data; Phosphorylation; Protein Structure, Tertiary; Receptors, Cytoplasmic and Nuclear; Receptors, Thyroid Hormone/chemistry; Receptors, Thyroid Hormone/genetics; Recombinant Fusion Proteins/chemistry; Recombinant Fusion Proteins/genetics; Recombinant Fusion Proteins/metabolism; Sequence Homology, Amino Acid; Steroidogenic Factor 1; Thyroid Hormone Receptors beta; Transcription Factors/chemistry; Transcription Factors/genetics

GRIS: glycoprotein-hormone receptor information system.

Van Durme, Joost; Horn, Florence; Costagliola, Sabine; Vriend, Gert; Vassart, Gilbert.

Mol Endocrinol ; 20(9): 2247-55, 2006 Sep.

Article in English | MEDLINE | ID: mdl-16543405

ABSTRACT

The glycoprotein-hormone receptor information system (GRIS) presents a comprehensive view on all available molecular data for the lutropin/choriogonadotropin receptor, follitropin receptor, and thyrotropin receptor G protein-coupled receptors. It features a mutation database presently containing 696 point mutations, combined with all sequences and the associated homology models. The mutation information was automatically extracted from the literature and manually augmented with respect to constitutivity, surface expression, sensitivity to hormones, and binding affinity. All information in this integrated system is presented in a G protein-coupled receptor specialist-friendly way. A series of interactive tools such as rotamer analysis, mutation prediction, or cavity visualization aids with the design and interpretation of experiments. A universal residue numbering system has been introduced to ease database searches as well as the use of the information in conjunction with literature data from diverse origins. Users can upload new mutations. GRIS is freely accessible at http://gris.ulb.ac.be/.

Subject(s)

Computational Biology; Glycoproteins/chemistry; Glycoproteins/metabolism; Hormones/chemistry; Hormones/metabolism; Amino Acid Sequence; Animals; Glycoproteins/genetics; Humans; Ligands; Models, Molecular; Molecular Sequence Data; Mutation/genetics; Protein Binding; Protein Structure, Tertiary; Sequence Alignment

NRSAS: Nuclear Receptor Structure Analysis Servers.

Bettler, Emmanuel; Krause, Roland; Horn, Florence; Vriend, Gert.

Nucleic Acids Res ; 31(13): 3400-3, 2003 Jul 01.

Article in English | MEDLINE | ID: mdl-12824335

ABSTRACT

We present a coherent series of servers that can perform a large number of structure analyses on nuclear hormone receptors. These servers are part of the NucleaRDB project, which provides a powerful information system for nuclear hormone receptors. The computations performed by the servers include homology modelling, structure validation, calculating contacts, accessibility values, hydrogen bonding patterns, predicting mutations and a host of two- and three-dimensional visualisations. The Nuclear Receptor Structure Analysis Servers (NRSAS) are freely accessible at http://www.cmbi.kun.nl/NR/servers/html/ and in-house copies can be obtained upon request.

Subject(s)

Hormones; Receptors, Cytoplasmic and Nuclear/chemistry; Hydrogen Bonding; Internet; Models, Molecular; Mutation; Protein Conformation; Receptors, Cytoplasmic and Nuclear/genetics; Software; Structural Homology, Protein

GPCRDB information system for G protein-coupled receptors.

Horn, Florence; Bettler, Emmanuel; Oliveira, Laerte; Campagne, Fabien; Cohen, Fred E; Vriend, Gerrit.

Nucleic Acids Res ; 31(1): 294-7, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12520006

ABSTRACT

The GPCRDB is a molecular class-specific information system that collects, combines, validates and disseminates heterogeneous data on G protein-coupled receptors (GPCRs). The database stores data on sequences, ligand binding constants and mutations. The system also provides computationally derived data such as sequence alignments, homology models, and a series of query and visualization tools. The GPCRDB is updated automatically once every 4-5 months and is freely accessible at http://www.gpcr.org/7tm/.

Subject(s)

Databases, Protein; Receptors, Cell Surface; Amino Acid Sequence; Computational Biology; Heterotrimeric GTP-Binding Proteins/metabolism; Humans; Information Systems; Ligands; Models, Molecular; Mutation; Receptors, Cell Surface/chemistry; Receptors, Cell Surface/genetics; Receptors, Cell Surface/metabolism; Sequence Alignment

NRMD: Nuclear Receptor Mutation Database.

Van Durme, Joost J J; Bettler, Emmanuel; Folkertsma, Simon; Horn, Florence; Vriend, Gert.

Nucleic Acids Res ; 31(1): 331-3, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12520016

ABSTRACT

The NRMD is a database for nuclear receptor mutation information. It includes mutation information from SWISS-PROT/TrEMBL, several web-based mutation data resources, and data extracted from the literature in a fully automatic manner. Because it is also possible to add mutations manually, a hundred mutations were added for completeness. At present, the NRMD contains information about 893 mutations in 54 nuclear receptors. A common numbering scheme for all nuclear receptors eases the use of the information for many kinds of studies. The NRMD is freely available to academia and industry as a stand-alone version at: www.receptors.org/NR/.

Subject(s)

Databases, Nucleic Acid; Mutation; Receptors, Cytoplasmic and Nuclear/genetics; Animals

A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain.

Folkertsma, Simon; van Noort, Paula; Van Durme, Joost; Joosten, Henk-Jan; Bettler, Emmanuel; Fleuren, Wilco; Oliveira, Laerte; Horn, Florence; de Vlieg, Jacob; Vriend, Gerrit.

J Mol Biol ; 341(2): 321-35, 2004 Aug 06.

Article in English | MEDLINE | ID: mdl-15276826

ABSTRACT

Literature studies, 3D structure data, and a series of sequence analysis techniques were combined to reveal important residues in the structure and function of the ligand-binding domain of nuclear hormone receptors. A structure-based multiple sequence alignment allowed for the seamless combination of data from many different studies on different receptors into one single functional model. It was recently shown that a combined analysis of sequence entropy and variability can divide residues into five classes: (1) the main function or active site, (2) support for the main function, (3) signal transduction, (4) modulator or ligand binding and (5) the rest. Mutation data extracted from the literature and intermolecular contacts observed in nuclear receptor structures were analyzed in view of this classification and showed that the main function or active site residues of the nuclear receptor ligand-binding domain are involved in cofactor recruitment. Furthermore, the sequence entropy-variability analysis identified the presence of signal transduction residues that are located between the ligand, cofactor and dimer sites, suggesting communication between these regulatory binding sites. Experimental and computational results agreed well for most residues for which mutation data and intermolecular contact data were available. This allows us to predict the role of the residues for which no functional data is available yet. This study illustrates the power of family-based approaches towards the analysis of protein function, and it points out the problems and possibilities presented by the massive amounts of data that are becoming available in the "omics era". The results shed light on the nuclear receptor family that is involved in processes ranging from cancer to infertility, and that is one of the more important targets in the pharmaceutical industry.

Subject(s)

Amino Acids/chemistry; Multigene Family/physiology; Mutation; Protein Conformation; Receptors, Cytoplasmic and Nuclear/chemistry; Amino Acids/metabolism; Binding Sites; Entropy; Humans; Ligands; Models, Molecular; Protein Binding; Receptors, Cytoplasmic and Nuclear/metabolism; Sequence Alignment; Transduction, Signal
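
The abstract above relies on a combined analysis of sequence entropy and variability per alignment position. A minimal Python sketch of those two per-column quantities follows. It assumes a gap-free alignment of equal-length sequences, Shannon entropy with the natural logarithm, and variability defined as the number of distinct residue types in a column. These definitions are assumptions and may differ from those used in the paper, and the thresholds that separate the five residue classes are not given in the abstract, so they are not modelled here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def column_variability(column):
    """Number of distinct residue types observed in one alignment column."""
    return len(set(column))


def entropy_variability(alignment):
    """Return an (entropy, variability) pair for every column of a gap-free alignment."""
    return [(column_entropy(col), column_variability(col)) for col in zip(*alignment)]


# Toy alignment (hypothetical sequences): conserved, two-state, and fully variable columns.
toy_alignment = ["ACD", "ACE", "AGF", "AGG"]
profile = entropy_variability(toy_alignment)
```

In the toy alignment, the first column is fully conserved (entropy 0, variability 1), while the last column has four different residues and therefore the highest entropy and variability.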

Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors.

Horn, Florence; Lau, Anthony L; Cohen, Fred E.

Bioinformatics ; 20(4): 557-68, 2004 Mar 01.

Article in English | MEDLINE | ID: mdl-14990452

ABSTRACT

MOTIVATION: The amount of genomic and proteomic data that is published daily in the scientific literature is outstripping the ability of experimental scientists to stay current. Reviews, the traditional medium for collating published observations, are also unable to keep pace. For some specific classes of information (e.g. sequences and protein structures), obligatory data deposition policies have helped. However, a great deal of other valuable information is spread throughout the literature hindering coherent access. We are involved in the Molecular Class-Specific Information System (MCSIS) project, a collaborative effort to design and automate the maintenance of protein family databases. The first two databases, the GPCRDB and NucleaRDB, are focused on G protein-coupled receptors (GPCRs) and nuclear hormone receptors (NRs), respectively. The main aim of the MCSIS project is to gather heterogeneous data from across a variety of electronic and literature sources in order to draw new inferences about the target protein families. RESULTS: We present a computational method that identifies and extracts mutation data from the scientific literature. We focused on the extraction of single point mutations for the GPCR and NR superfamilies. After validation by plausibility filters, the mutation data is integrated into the corresponding MCSIS where it is combined with structural and sequence information already stored in these databases. We extracted and validated 2736 true point mutations from 914 articles on GPCRs and 785 true point mutations from 1094 articles on NRs. The current version of our automated extraction algorithm identifies 49.3% of the GPCR point mutations with a specificity of 87.9%, and 64.5% of the NR point mutations with a specificity of 85.8%. MuteXt routinely analyzes 100 electronic articles in approximately 1 h.

Subject(s)

Algorithms; Database Management Systems; Databases, Bibliographic; Information Storage and Retrieval/methods; Mutation; Periodicals as Topic; Proteins/chemistry; Sequence Analysis, Protein/methods; Amino Acid Substitution; Databases, Protein; MEDLINE; Natural Language Processing; Proteins/classification; Receptors, G-Protein-Coupled/chemistry; Receptors, G-Protein-Coupled/classification; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment/methods; Software; Software Validation
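
The MuteXt abstract above describes extracting single point mutations from text and validating them with plausibility filters. The Python sketch below illustrates one way such a pipeline could look. It is not MuteXt's implementation: the regular expression, the function names, and the specific filter (checking that the reported wild-type residue matches a reference sequence at the stated position) are assumptions made for illustration, and the text and sequence are made up.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = "|".join(THREE_TO_ONE)

# Matches one-letter ("A3T") or three-letter ("Gly5Ala") point-mutation mentions.
PATTERN = re.compile(rf"\b(?:([{AA1}])(\d+)([{AA1}])|({AA3})(\d+)({AA3}))\b")


def extract_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    found = []
    for m in PATTERN.finditer(text):
        if m.group(1):
            wt, pos, mut = m.group(1), m.group(2), m.group(3)
        else:
            wt, pos, mut = THREE_TO_ONE[m.group(4)], m.group(5), THREE_TO_ONE[m.group(6)]
        found.append((wt, int(pos), mut))
    return found


def is_plausible(mutation, sequence):
    """Assumed filter: the wild-type residue must match the sequence at that 1-based position."""
    wt, pos, mut = mutation
    return wt != mut and 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


# Made-up text and reference sequence, for illustration only.
text = "Both A3T and Gly5Ala reduced binding, but W4R had no effect."
sequence = "MKAVGL"
mutations = extract_mutations(text)
plausible = [m for m in mutations if is_plausible(m, sequence)]
```

Here W4R is extracted by the pattern but rejected by the filter, because position 4 of the made-up sequence is V rather than W.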
