1.
Res Sq ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

2.
medRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads available at pop.evemodel.org.

3.
Science ; 380(6640): 93-101, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36926954

ABSTRACT

Although most cancer drugs modulate the activities of cellular pathways by changing posttranslational modifications (PTMs), little is known regarding the extent and the time- and dose-response characteristics of drug-regulated PTMs. In this work, we introduce a proteomic assay called decryptM that quantifies drug-PTM modulation for thousands of PTMs in cells to shed light on target engagement and drug mechanism of action. Examples range from detecting DNA damage by chemotherapeutics, to identifying drug-specific PTM signatures of kinase inhibitors, to demonstrating that rituximab kills CD20-positive B cells by overactivating B cell receptor signaling. DecryptM profiling of 31 cancer drugs in 13 cell lines demonstrates the broad applicability of the approach. The resulting 1.8 million dose-response curves are provided as an interactive molecular resource in ProteomicsDB.


Subject(s)
Antineoplastic Agents , Apoptosis , Protein Processing, Post-Translational , Proteomics , Antigens, CD20/metabolism , Antineoplastic Agents/pharmacology , Apoptosis/drug effects , B-Lymphocytes/drug effects , Cell Line, Tumor , DNA Damage , Protein Processing, Post-Translational/drug effects , Proteomics/methods , Receptors, Antigen, B-Cell/metabolism , Signal Transduction , Humans
4.
Nat Methods ; 17(5): 495-503, 2020 05.
Article in English | MEDLINE | ID: mdl-32284610

ABSTRACT

We have used a mass spectrometry-based proteomic approach to compile an atlas of the thermal stability of 48,000 proteins across 13 species ranging from archaea to humans and covering melting temperatures of 30-90 °C. Protein sequence, composition and size affect thermal stability in prokaryotes, and eukaryotic proteins show a nonlinear relationship between the degree of disordered protein structure and thermal stability. The data indicate that evolutionary conservation of protein complexes is reflected by similar thermal stability of their proteins, and we show examples in which genomic alterations can affect thermal stability. Proteins of the respiratory chain were found to be very stable in many organisms, and human mitochondria showed close to normal respiration at 46 °C. We also noted cell-type-specific effects that can affect protein stability or the efficacy of drugs. This meltome atlas broadly defines the proteome amenable to thermal profiling in biology and drug discovery and can be explored online at http://meltomeatlas.proteomics.wzw.tum.de:5003/ and http://www.proteomicsdb.org.


Subject(s)
Gene Expression Regulation , Prokaryotic Cells/metabolism , Proteins/chemistry , Proteins/metabolism , Proteome/analysis , Transition Temperature , Animals , Electron Transport Chain Complex Proteins/metabolism , Humans , Mitochondria/metabolism , Protein Stability , Software , Species Specificity
5.
Orthopade ; 49(2): 183-189, 2020 Feb.
Article in German | MEDLINE | ID: mdl-31919555

ABSTRACT

BACKGROUND: There are case descriptions of pronounced peri-implant inflammatory reactions and necrosis in non-infectious knee joint replacements with metal-polyethylene pairing. OBJECTIVES: Due to the histopathological similarities to the dysfunctional metal-on-metal (MoM) hip joint replacement, MoM-like reactions in knee joint arthroplasty ("ARMD-KEP") are proposed and a histopathological comparison is made. MATERIALS AND METHODS: This analysis evaluates five cases of "ARMD-KEP" using: (1) the SLIM consensus classification, (2) the particle algorithm, (3) the CD3 focus score and (4) the ALVAL score. The comparison groups consist of 11 adverse cases of MoM hip and 20 cases of knee joint arthroplasty without adverse reaction. RESULTS: The ARMD-KEP cases were identified as SLIM type VI. Their median ALVAL score was 10. The CD3 focus score confirmed an adverse reaction. Particle corrosion was found in two of five cases. CONCLUSIONS: These data indicate that, in rare cases, an adverse MoM-like reaction may be present in knee replacements, with inflammatory and immunological expression similar to that of the adverse MoM reaction in the hip. The pathomechanisms can be discussed as follows: (1) secondary metal-metal contact, (2) dysfunctional loading of the coupling mechanism and (3) corrosion of the metal components. Much like trunnionosis in the hip, the term "hingiosis" is proposed for corrosion phenomena in dysfunctional conditions of coupled knee endoprosthetic systems.


Subject(s)
Arthroplasty, Replacement, Knee , Metal-on-Metal Joint Prostheses , Prosthesis Failure , Humans , Polyethylene , Prosthesis Design , Reoperation
6.
Mol Syst Biol ; 15(2): e8503, 2019 02 18.
Article in English | MEDLINE | ID: mdl-30777892

ABSTRACT

Genome-, transcriptome- and proteome-wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein-level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue-specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.


Subject(s)
Genome, Human/genetics , Proteome/genetics , Tissue Distribution/genetics , Transcriptome/genetics , Gene Expression Regulation/genetics , Humans , Mass Spectrometry/methods , Proteomics/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
7.
Mol Syst Biol ; 15(2): e8513, 2019 02 18.
Article in English | MEDLINE | ID: mdl-30777893

ABSTRACT

Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein-to-mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across 29 human tissues using matched transcriptomes and proteomes. We estimated by regression the contribution of known sequence determinants of protein synthesis and degradation in addition to 45 mRNA and 3 protein sequence motifs that we found by association testing. While PTR ratios span more than 2 orders of magnitude, our integrative model predicts PTR ratios at a median precision of 3.2-fold. A reporter assay provided functional support for two novel UTR motifs, and an immobilized mRNA affinity competition-binding assay identified motif-specific bound proteins for one motif. Moreover, our integrative model led to a new metric of codon optimality that captures the effects of codon frequency on protein synthesis and degradation. Altogether, this study shows that a large fraction of PTR ratio variation in human tissues can be predicted from sequence, and it identifies many new candidate post-transcriptional regulatory elements.


Subject(s)
Proteins/genetics , Proteome/genetics , Tissue Distribution/genetics , Transcriptome/genetics , Gene Expression Regulation/genetics , Genome, Human/genetics , Humans , Mass Spectrometry/methods , Proteomics/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
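
Note on record 7: the abstract reports protein-to-mRNA (PTR) ratios and a fold-level prediction error. The sketch below shows one plausible way such quantities could be computed. It is not the authors' code: the function names, the log10 scale, and the reading of "median precision" as the median absolute error in log10 space (converted back to a fold change) are all assumptions made for illustration.

```python
import math
from statistics import median


def log10_ptr(protein_abundance, mrna_abundance):
    """Return the log10 protein-to-mRNA ratio for one gene in one tissue.

    Both inputs are assumed to be positive abundances on linear scales.
    """
    return math.log10(protein_abundance / mrna_abundance)


def median_fold_error(observed_log10, predicted_log10):
    """Median absolute error in log10 space, expressed as a fold change.

    This is an assumed definition of a "median fold" error, not
    necessarily the one used in the paper.
    """
    errors = [abs(o - p) for o, p in zip(observed_log10, predicted_log10)]
    return 10 ** median(errors)


# Hypothetical values: three genes, observed vs. predicted log10 PTR ratios.
observed = [0.0, 1.0, 2.0]
predicted = [0.5, 1.0, 1.0]
fold_error = median_fold_error(observed, predicted)
```

Under this reading, an error of about 3.2-fold corresponds to a median absolute error of roughly 0.5 in log10 units.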
8.
Bioinformatics ; 35(9): 1582-1584, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30304492

ABSTRACT

SUMMARY: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy to use, flexible command line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users. AVAILABILITY AND IMPLEMENTATION: https://github.com/debbiemarkslab/evcouplings.


Subject(s)
Sequence Analysis , Software , Proteins , RNA , Sequence Alignment
10.
Mol Cell ; 71(1): 178-190.e8, 2018 07 05.
Article in English | MEDLINE | ID: mdl-29979965

ABSTRACT

The TP53 gene is frequently mutated in human cancer. Research has focused predominantly on six major "hotspot" codons, which account for only ∼30% of cancer-associated p53 mutations. To comprehensively characterize the consequences of the p53 mutation spectrum, we created a synthetically designed library and measured the functional impact of ∼10,000 DNA-binding domain (DBD) p53 variants in human cells in culture and in vivo. Our results highlight the differential outcome of distinct p53 mutations in human patients and elucidate the selective pressure driving p53 conservation throughout evolution. Furthermore, while loss of anti-proliferative functionality largely correlates with the occurrence of cancer-associated p53 mutations, we observe that selective gain-of-function may further favor particular mutants in vivo. Finally, when combined with additional acquired p53 mutations, seemingly neutral TP53 SNPs may modulate phenotypic outcome and, presumably, tumor progression.


Subject(s)
Evolution, Molecular , Gene Library , Mutation , Neoplasms/genetics , Tumor Suppressor Protein p53/genetics , Animals , HEK293 Cells , Humans , Mice , Mice, Nude , Neoplasms/metabolism , Polymorphism, Single Nucleotide , Protein Domains , Tumor Suppressor Protein p53/metabolism
11.
Proteins ; 86(10): 1064-1074, 2018 10.
Article in English | MEDLINE | ID: mdl-30020551

ABSTRACT

Binding small ligands such as ions or macromolecules such as DNA, RNA, and other proteins is one important aspect of the molecular function of proteins. Many binding sites remain without experimental annotations. Predicting binding sites on a per-residue level is challenging, but if 3D structures are known, information about coevolving residue pairs (evolutionary couplings) can predict catalytic residues through mutual information. Here, we predicted protein binding sites from evolutionary couplings derived from a global statistical model using maximum entropy. Additionally, we included information from sequence variation. A simple method using a weighted sum over eight scores substantially outperformed random (F1 = 19.3% ± 0.7% vs F1 = 2% for random). Training a neural network on these eight scores (along with predicted solvent accessibility and conservation in protein families) improved performance substantially (F1 = 26.2% ± 0.8%). Although the machine learning was limited by the small data set and possibly wrong annotations of binding sites, the predicted binding sites formed spatial clusters in the protein. The source code of the binding site predictions is available through GitHub: https://github.com/Rostlab/bindPredict.


Subject(s)
Evolution, Molecular , Proteins/chemistry , Binding Sites , Biological Evolution , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Databases, Protein , Entropy , Genetic Variation , Humans , Machine Learning , Models, Biological , Models, Molecular , Neural Networks, Computer , Protein Binding , Proteins/genetics , Proteins/metabolism
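
Note on record 11: the abstract describes a weighted sum over per-residue scores evaluated by F1. The sketch below illustrates that general idea only. It is not the bindPredict implementation: the function names, the two-score toy input, the weights, and the threshold are hypothetical, and the paper's eight scores are not reproduced here.

```python
def weighted_sum_predict(scores, weights, threshold):
    """Return the residues whose weighted score sum reaches the threshold.

    scores maps a residue position to a list of per-residue scores;
    weights has one entry per score. All values here are illustrative.
    """
    return {
        residue
        for residue, values in scores.items()
        if sum(w * v for w, v in zip(weights, values)) >= threshold
    }


def f1_score(predicted, actual):
    """Per-residue F1: harmonic mean of precision and recall."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy protein with two scores per residue.
toy_scores = {10: [1.0, 0.0], 11: [0.2, 0.1], 12: [0.5, 0.5]}
predicted_sites = weighted_sum_predict(toy_scores, [0.5, 0.5], 0.4)
toy_f1 = f1_score(predicted_sites, {10, 13})
```

Because binding residues are a small minority of all residues, F1 is a more informative summary than raw accuracy for this kind of prediction.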
12.
PLoS Comput Biol ; 14(3): e1005983, 2018 03.
Article in English | MEDLINE | ID: mdl-29499035

ABSTRACT

Immunogenicity is a major problem during the development of biotherapeutics since it can lead to rapid clearance of the drug and adverse reactions. The challenge for biotherapeutic design is therefore to identify mutants of the protein sequence that minimize immunogenicity in a target population whilst retaining pharmaceutical activity and protein function. Current approaches are moderately successful in designing sequences with reduced immunogenicity, but do not account for the varying frequencies of different human leucocyte antigen alleles in a specific population and in addition, since many designs are non-functional, require costly experimental post-screening. Here, we report a new method for de-immunization design using multi-objective combinatorial optimization. The method simultaneously optimizes the likelihood of a functional protein sequence at the same time as minimizing its immunogenicity tailored to a target population. We bypass the need for three-dimensional protein structure or molecular simulations to identify functional designs by automatically generating sequences using probabilistic models that have been used previously for mutation effect prediction and structure prediction. As proof-of-principle we designed sequences of the C2 domain of Factor VIII and tested them experimentally, resulting in a good correlation with the predicted immunogenicity of our model.


Subject(s)
Antibodies, Monoclonal/immunology , Antibodies, Monoclonal/therapeutic use , Computational Biology/methods , Protein Engineering/methods , Recombinant Proteins/immunology , Recombinant Proteins/therapeutic use , Amino Acid Sequence , Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/genetics , Humans , Recombinant Proteins/chemistry , Recombinant Proteins/genetics
13.
Nature ; 556(7699): 118-121, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29590088

ABSTRACT

The shape, elongation, division and sporulation (SEDS) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was for a long time poorly understood, but recent work has revealed that the prototypical SEDS family member RodA is a peptidoglycan polymerase, a role previously attributed exclusively to members of the penicillin-binding protein family. This discovery has made RodA and other SEDS proteins promising targets for the development of next-generation antibiotics. However, little is known regarding the molecular basis of SEDS activity, and no structural data are available for RodA or any homologue thereof. Here we report the crystal structure of Thermus thermophilus RodA at a resolution of 2.9 Å, determined using evolutionary covariance-based fold prediction to enable molecular replacement. The structure reveals a ten-pass transmembrane fold with large extracellular loops, one of which is partially disordered. The protein contains a highly conserved cavity in the transmembrane domain, reminiscent of ligand-binding sites in transmembrane receptors. Mutagenesis experiments in Bacillus subtilis and Escherichia coli show that perturbation of this cavity abolishes RodA function both in vitro and in vivo, indicating that this cavity is catalytically essential. These results provide a framework for understanding bacterial cell wall synthesis and SEDS protein function.


Subject(s)
Crystallography, X-Ray/methods , Nucleotidyltransferases/chemistry , Peptidoglycan/metabolism , Thermus thermophilus/enzymology , Bacillus subtilis/genetics , Biocatalysis , Cell Wall/enzymology , Cell Wall/metabolism , Escherichia coli/genetics , Models, Molecular , Nucleotidyltransferases/metabolism , Protein Domains , Protein Folding , Structure-Activity Relationship , Thermus thermophilus/genetics
14.
Nat Biotechnol ; 35(2): 128-135, 2017 02.
Article in English | MEDLINE | ID: mdl-28092658

ABSTRACT

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ∼7,000 human proteins at http://evmutation.org/.


Subject(s)
Conserved Sequence/genetics , DNA Mutational Analysis/methods , Epistasis, Genetic/genetics , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Proteome/genetics , Amino Acid Sequence/genetics , Evolution, Molecular , Humans , Molecular Sequence Data , Mutation/genetics , Proteome/chemistry
15.
Cell ; 167(1): 158-170.e12, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27662088

ABSTRACT

Protein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information. The approach is highly accurate (79%) for residue interactions as tested in more than 60 known disordered regions captured in a bound or specific condition. Assessing the potential for structure of more than 1,000 apparently disordered regions of human proteins reveals a continuum of structural order with at least 50% with clear propensity for three- or two-dimensional states. Co-evolutionary constraints reveal hitherto unseen structures of functional importance in apparently disordered proteins.


Subject(s)
Intrinsically Disordered Proteins/chemistry , Directed Molecular Evolution/methods , Genomics , Humans , Intrinsically Disordered Proteins/genetics , Protein Structure, Secondary , Protein Structure, Tertiary , Proteome/chemistry , Proteome/genetics
16.
Nat Methods ; 12(8): 751-4, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26121406

ABSTRACT

Accurate determination of protein structure by NMR spectroscopy is challenging for larger proteins, for which experimental data are often incomplete and ambiguous. Evolutionary sequence information together with advances in maximum entropy statistical methods provide a rich complementary source of structural constraints. We have developed a hybrid approach (evolutionary coupling-NMR spectroscopy; EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings and demonstrate accurate structure determination for several proteins 6-41 kDa in size.


Subject(s)
Computational Biology/methods , Magnetic Resonance Spectroscopy/methods , Proteins/chemistry , Algorithms , Crystallography, X-Ray , Evolution, Molecular , Humans , Hydrodynamics , Imaging, Three-Dimensional , Models, Statistical , Molecular Conformation , Protein Conformation , Proto-Oncogene Proteins/chemistry , Proto-Oncogene Proteins p21(ras) , ras Proteins/chemistry
17.
Nat Commun ; 6: 6077, 2015 Jan 13.
Article in English | MEDLINE | ID: mdl-25584517

ABSTRACT

Insect odorant receptors (ORs) comprise an enormous protein family that translates environmental chemical signals into neuronal electrical activity. These heptahelical receptors are proposed to function as ligand-gated ion channels and/or to act metabotropically as G protein-coupled receptors (GPCRs). Resolving their signalling mechanism has been hampered by the lack of tertiary structural information and primary sequence similarity to other proteins. We use amino acid evolutionary covariation across these ORs to define restraints on structural proximity of residue pairs, which permit de novo generation of three-dimensional models. The validity of our analysis is supported by the location of functionally important residues in highly constrained regions of the protein. Importantly, insect OR models exhibit a distinct transmembrane domain packing arrangement to that of canonical GPCRs, establishing the structural unrelatedness of these receptor families. The evolutionary couplings and models predict odour binding and ion conduction domains, and provide a template for rational structure-activity dissection.


Subject(s)
Amino Acids/chemistry , Evolution, Molecular , Insecta/metabolism , Receptors, Odorant/chemistry , Amino Acids/genetics , Animals , Receptors, Odorant/genetics , Xenopus
18.
Elife ; 3, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25255213

ABSTRACT

Protein-protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein-protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein-protein interaction networks and used for interaction predictions at residue resolution.


Subject(s)
Escherichia coli Proteins/chemistry , Escherichia coli/genetics , Genome, Bacterial , Protein Interaction Mapping , Databases, Protein , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Evolution, Molecular , Gene Expression , Gene Regulatory Networks , Models, Molecular , Protein Binding , Protein Conformation
19.
BMC Bioinformatics ; 15: 85, 2014 Mar 26.
Article in English | MEDLINE | ID: mdl-24669753

ABSTRACT

BACKGROUND: Twenty years of improved technology and growing numbers of sequences now render residue-residue contact constraints in large protein families through correlated mutations accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. RESULTS: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. CONCLUSIONS: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).


Subject(s)
Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Protein Conformation , Proteins/genetics
20.
BMC Bioinformatics ; 14 Suppl 3: S7, 2013.
Article in English | MEDLINE | ID: mdl-23514582

ABSTRACT

BACKGROUND: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. METHODS: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. RESULTS AND CONCLUSIONS: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.


Subject(s)
Proteins/physiology , Sequence Homology, Amino Acid , Algorithms , Proteins/genetics