Results 1 - 20 of 23
1.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38213002

ABSTRACT

MOTIVATION: methyLImp, a method we recently introduced for the missing value estimation of DNA methylation data, has demonstrated competitive imputation performance compared with existing general-purpose approaches. However, its running time was considerably long, making it infeasible for large datasets with numerous missing values. RESULTS: methyLImp2 makes possible computations that were previously infeasible. We achieved this by introducing two important modifications that significantly reduce the original running time without sacrificing prediction performance. First, we implemented a chromosome-wise parallel version of methyLImp. This parallelization reduced the runtime by several 10-fold in our experiments. Second, to handle large datasets, we introduced a mini-batch approach that uses only a subset of the samples for the imputation, which further reduces the running time from days to hours or even minutes in large datasets. AVAILABILITY AND IMPLEMENTATION: The R package methyLImp2 is under review for Bioconductor. It is currently freely available on GitHub at https://github.com/annaplaksienko/methyLImp2.


Subjects
Computational Biology , DNA Methylation
2.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35794713

ABSTRACT

In recent years there has been widespread interest in researching biomarkers of aging that could predict physiological vulnerability better than chronological age. Aging, in fact, is one of the most relevant risk factors for a wide range of maladies, and molecular surrogates of this phenotype could enable better patient stratification. Among the most promising of such biomarkers is DNA methylation-based biological age. Given the potential and variety of computational implementations (epigenetic clocks), we here present a systematic review of such clocks. Furthermore, we provide a large-scale performance comparison across different tissues and diseases in terms of age prediction accuracy and age acceleration, a measure of deviance from physiology. Our analysis offers both a state-of-the-art overview of the computational techniques developed so far and a heterogeneous picture of performances, which can be helpful in orienting future research.


Subjects
DNA Methylation , Epigenesis, Genetic , Biomarkers , Epigenomics/methods
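
The abstract above mentions age acceleration without defining it. One common convention, assumed here and not taken from the review itself, is the residual obtained when clock-predicted age is regressed on chronological age. A minimal sketch with made-up values (the function names and the data are illustrative only):

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def age_acceleration(chronological, predicted):
    """Residual of predicted age after regressing on chronological age.

    Assumption: this residual definition is one of several in use;
    a simple difference (predicted - chronological) is another.
    """
    slope, intercept = fit_line(chronological, predicted)
    return [p - (slope * c + intercept) for c, p in zip(chronological, predicted)]


# Hypothetical ages in years, for illustration only.
chronological = [30.0, 40.0, 50.0, 60.0]
predicted = [33.0, 41.0, 49.0, 65.0]
residuals = age_acceleration(chronological, predicted)
```

A positive residual marks a sample whose predicted age is higher than expected for its chronological age within this cohort; the residuals of a least-squares fit with an intercept sum to zero by construction.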
3.
Nucleic Acids Res ; 49(W1): W199-W206, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34038548

ABSTRACT

Methylage is an epigenetic marker of biological age that exploits the correlation between the methylation state of specific CG dinucleotides (CpGs) and chronological age (in years), gestational age (in weeks), cellular age (in cell cycles or as telomere length, in kilobases). Using DNA methylation data, methylage is measurable via the so called epigenetic clocks. Importantly, alterations of the correlation between methylage and age (age acceleration or deceleration) have been stably associated with pathological states and occur long before clinical signs of diseases become overt, making epigenetic clocks a potentially disruptive tool in preventive, diagnostic and also in forensic applications. Nevertheless, methylage dependency from CpGs selection, mathematical modelling, tissue specificity and age range, still makes the potential of this biomarker limited. In order to enhance model comparisons, interchange, availability, robustness and standardization, we organized a selected set of clocks within a hub webservice, EstimAge (Estimate of methylation Age, http://estimage.iac.rm.cnr.it), which intuitively and informatively enables quick identification, computation and comparison of available clocks, with the support of standard statistics.


Subjects
DNA Methylation , Software , CpG Islands , Epigenesis, Genetic , Internet , Time Factors
4.
Bioinformatics ; 37(4): 506-513, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32976564

ABSTRACT

MOTIVATION: Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue-residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps. RESULTS: Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison. AVAILABILITY AND IMPLEMENTATION: The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Algorithms , Computational Biology , Databases, Nucleic Acid , Sequence Alignment
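
For orientation, the congruence coefficient named in the abstract above can be illustrated for a single, already aligned pair of equal-size maps. The sketch below uses the standard Tucker form (sum of element-wise products divided by the product of the two norms). It does not reproduce the paper's mean and variance computation over all alignments, and the function name and example maps are assumptions made for illustration.

```python
from math import sqrt


def congruence_coefficient(a, b):
    """Tucker congruence coefficient between two equal-shape matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    # Sum of element-wise products, and squared norms of each map.
    num = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = sum(x * x for ra in a for x in ra)
    norm_b = sum(y * y for rb in b for y in rb)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("coefficient is undefined for an all-zero map")
    return num / sqrt(norm_a * norm_b)


# Two toy symmetric binary contact maps (illustrative values only).
map_a = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
map_b = [[0, 1, 1],
         [1, 0, 1],
         [1, 1, 0]]
score = congruence_coefficient(map_a, map_b)
```

For non-negative maps such as these the value lies between 0 and 1, and identical maps score 1.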
5.
J Proteome Res ; 19(7): 2873-2878, 2020 07 02.
Article in English | MEDLINE | ID: mdl-31971806

ABSTRACT

Omics techniques provide a spectrum of information at the genomic level, whose analysis can characterize complex traits at a molecular level. The relationship among genotype and phenotype implies that from genome information the molecular pathways and biological processes underlying a given phenotype are discovered. In dealing with this problem, gene enrichment analysis has become the most widely adopted strategy. Here we present NETGE-PLUS, a Web server for standard and network-based functional interpretation of gene sets of human and of model organisms, including Sus scrofa, Saccharomyces cerevisiae, Escherichia coli, and Arabidopsis thaliana. NETGE-PLUS enables the functional enrichment of both simple and ranked lists of genes, introducing also the possibility of exploring relationships among KEGG pathways. A Web interface makes data retrieval complete and user-friendly. NETGE-PLUS is publicly available at http://net-ge2.biocomp.unibo.it.


Subjects
Arabidopsis , Software , Arabidopsis/genetics , Databases, Genetic , Genomics , Humans , Information Storage and Retrieval , Internet , Probability
6.
Bioinformatics ; 35(19): 3786-3793, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30796811

ABSTRACT

MOTIVATION: DNA methylation is a stable epigenetic mark with major implications in both physiological (development, aging) and pathological conditions (cancers and numerous diseases). Recent research involving methylation focuses on the development of molecular age estimation methods based on DNA methylation levels (mAge). An increasing number of studies indicate that divergences between mAge and chronological age may be associated with age-related diseases. Current advances in high-throughput technologies have allowed the characterization of DNA methylation levels throughout the human genome. However, experimental methylation profiles often contain multiple missing values that can affect the analysis of the data and also mAge estimation. Although several imputation methods exist, a major deficiency lies in their inability to cope with large datasets, such as DNA methylation chips. Specific methods for imputing missing methylation data are therefore needed. RESULTS: We present a simple and computationally efficient imputation method, methyLImp, based on linear regression. The rationale of the approach lies in the observation that methylation levels show a high degree of inter-sample correlation. We performed a comparative study of our approach with other imputation methods on DNA methylation data of healthy and disease samples from different tissues. Performance has been assessed both in terms of imputation accuracy and in terms of the impact imputed values have on mAge estimation. In comparison to existing methods, our linear regression model performs equally well or better, with good computational efficiency. The results of our analysis provide recommendations for accurate estimation of missing methylation values. AVAILABILITY AND IMPLEMENTATION: The R-package methyLImp is freely available at https://github.com/pdilena/methyLImp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation , Epigenomics , Humans , Linear Models , Oligonucleotide Array Sequence Analysis , Research Design
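
To make the idea of regression-based imputation concrete, the sketch below fills missing entries of one methylation vector from a single correlated vector using ordinary least squares. This is a deliberately simplified stand-in and not the methyLImp algorithm: the single predictor, the function name, the clamping to the [0, 1] beta-value range, and the data are all assumptions made for illustration.

```python
def impute_by_regression(target, predictor):
    """Fill None entries of `target` from `predictor` via a least-squares line.

    The line is fitted on positions where both vectors are observed.
    """
    pairs = [(p, t) for p, t in zip(predictor, target)
             if p is not None and t is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete observations")
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_t = sum(t for _, t in pairs) / len(pairs)
    sxx = sum((p - mean_p) ** 2 for p, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    slope = sum((p - mean_p) * (t - mean_t) for p, t in pairs) / sxx
    intercept = mean_t - slope * mean_p

    filled = []
    for p, t in zip(predictor, target):
        if t is None:
            if p is None:
                raise ValueError("cannot impute: predictor is also missing")
            # Assumption: values are beta values, so clamp to [0, 1].
            t = min(1.0, max(0.0, slope * p + intercept))
        filled.append(t)
    return filled


# Hypothetical beta values; the third target entry is missing.
predictor = [0.10, 0.20, 0.30, 0.40, 0.50]
target = [0.20, 0.40, None, 0.80, 1.00]
imputed = impute_by_regression(target, predictor)
```

Observed entries are returned unchanged; only the missing positions are replaced by the fitted value.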
7.
Bioinformatics ; 32(22): 3489-3491, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27485441

ABSTRACT

MOTIVATION: Gene enrichment is a requisite for the interpretation of biological complexity related to specific molecular pathways and biological processes. Furthermore, when interpreting NGS data and human variations, including those related to pathologies, gene enrichment allows the inclusion of other genes that in the human interactome space may also play key roles in the emergence of the phenotype. Here, we describe NET-GE, a web server for associating biological processes and pathways to sets of human proteins involved in the same phenotype. RESULTS: NET-GE is based on protein-protein interaction networks, following the notion that for a set of proteins, the context of their specific interactions can better define their function and the processes they can be related to in the biological complexity of the cell. Our method is suited to extract statistically validated enriched terms from Gene Ontology, KEGG and REACTOME annotation databases. Furthermore, NET-GE is effective even when the number of input proteins is small. AVAILABILITY AND IMPLEMENTATION: NET-GE web server is publicly available and accessible at http://net-ge.biocomp.unibo.it/enrich. CONTACT: gigi@biocomp.unibo.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genes , Software , Databases, Factual , Humans , Internet , Proteins/genetics , Proteins/metabolism
8.
Bioinformatics ; 31(7): 1053-9, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25429059

ABSTRACT

MOTIVATION: Mechanotransduction (the ability to output a biochemical signal from a mechanical input) is related to the initiation and progression of a broad spectrum of molecular events. Yet, the characterization of mechanotransduction lacks some of the most basic tools: for instance, it can hardly be recognized by enrichment analysis tools, nor could we find any pathway representation. This greatly limits computational testing and hypothesis generation on the biological relevance of mechanotransduction and its involvement in disease or physiological mechanisms. RESULTS: We here present a molecular map of mechanotransduction, built in CellDesigner to ensure that maximum information is embedded in a compact network format. To validate the map's necessity we tested its redundancy in comparison with existing pathways, and to estimate its sufficiency, we quantified its ability to reproduce biological events with dynamic simulations, using Signaling Petri Networks. AVAILABILITY AND IMPLEMENTATION: The SBML language map is available in the Supplementary Data: core_map.xml, basic_map.xml. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Genes/genetics , Mechanotransduction, Cellular , Metabolic Networks and Pathways , Models, Biological , Signal Transduction/physiology , Software , Autoimmunity/genetics , Computer Simulation , Humans
9.
Eukaryot Cell ; 14(11): 1114-26, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26342020

ABSTRACT

Candida albicans is associated with humans as both a harmless commensal organism and a pathogen. Cph2 is a transcription factor whose DNA binding domain is similar to that of mammalian sterol response element binding proteins (SREBPs). SREBPs are master regulators of cellular cholesterol levels and are highly conserved from fungi to mammals. However, ergosterol biosynthesis is regulated by the zinc finger transcription factor Upc2 in C. albicans and several other yeasts. Cph2 is not necessary for ergosterol biosynthesis but is important for colonization in the murine gastrointestinal (GI) tract. Here we demonstrate that Cph2 is a membrane-associated transcription factor that is processed to release the N-terminal DNA binding domain like SREBPs, but its cleavage is not regulated by cellular levels of ergosterol or oxygen. Chromatin immunoprecipitation sequencing (ChIP-seq) shows that Cph2 binds to the promoters of HMS1 and other components of the regulatory circuit for GI tract colonization. In addition, 50% of Cph2 targets are also bound by Hms1 and other factors of the regulatory circuit. Several common targets function at the head of the glycolysis pathway. Thus, Cph2 is an integral part of the regulatory circuit for GI colonization that regulates glycolytic flux. Transcriptome sequencing (RNA-seq) shows a significant overlap in genes differentially regulated by Cph2 and hypoxia, and Cph2 is important for optimal expression of some hypoxia-responsive genes in glycolysis and the citric acid cycle. We suggest that Cph2 and Upc2 regulate hypoxia-responsive expression in different pathways, consistent with a synthetic lethal defect of the cph2 upc2 double mutant in hypoxia.


Subjects
Basic Helix-Loop-Helix Transcription Factors/genetics , Candida albicans/genetics , Fungal Proteins/genetics , Gene Expression Regulation, Fungal , Base Sequence , Basic Helix-Loop-Helix Transcription Factors/metabolism , Candida albicans/metabolism , Candida albicans/pathogenicity , Fungal Proteins/metabolism , Molecular Sequence Data , Protein Binding , Response Elements , Transcriptome , Virulence/genetics
10.
BMC Bioinformatics ; 16: 346, 2015 Oct 28.
Article in English | MEDLINE | ID: mdl-26511083

ABSTRACT

BACKGROUND: Functional annotation of genes and gene products is a major challenge in the post-genomic era. Nowadays, gene function curation is largely based on manual assignment of Gene Ontology (GO) annotations to genes by using published literature. The annotation task is extremely time-consuming, therefore there is an increasing interest in automated tools that can assist human experts. RESULTS: Here we introduce GOTA, a GO term annotator for biomedical literature. The proposed approach makes use only of information that is readily available from public repositories and it is easily expandable to handle novel sources of information. We assess the classification capabilities of GOTA on a large benchmark set of publications. The overall performances are encouraging in comparison to the state of the art in multi-label classification over large taxonomies. Furthermore, the experimental tests provide some interesting insights into the potential improvement of automated annotation tools. CONCLUSIONS: GOTA implements a flexible and expandable model for GO annotation of biomedical literature. The current version of the GOTA tool is freely available at http://gota.apice.unibo.it.


Subjects
User-Computer Interface , Animals , Data Mining , Gene Ontology , Humans , Internet , Molecular Sequence Annotation
11.
BMC Genomics ; 16 Suppl 8: S6, 2015.
Article in English | MEDLINE | ID: mdl-26110971

ABSTRACT

BACKGROUND: Enrichment analysis is a widely applied procedure for shedding light on the molecular mechanisms and functions at the basis of phenotypes, for enlarging the dataset of possibly related genes/proteins and for helping interpretation and prioritization of newly determined variations. Several standard and network-based enrichment methods are available. Both approaches rely on the annotations that characterize the genes/proteins included in the input set; network-based ones also include, in different ways, physical and functional relationships among different genes or proteins that can be extracted from the available biological networks of interactions. RESULTS: Here we describe a novel procedure based on the extraction from the STRING interactome of sub-networks connecting proteins that share the same Gene Ontology (GO) terms for Biological Process (BP). Enrichment analysis is performed by mapping the protein set to be analyzed on the sub-networks, and then by collecting the corresponding annotations. We test the ability of our enrichment method to find annotation terms disregarded by other available enrichment methods. We benchmarked 244 sets of proteins associated with different Mendelian diseases, according to the OMIM web resource. In 143 cases (58%), the network-based procedure extracts GO terms neglected by the standard method, and in 86 cases (35%), some of the newly enriched GO terms are not included in the set of annotations characterizing the input proteins. We present in detail six cases where our network-based enrichment provides an insight into the biological basis of the diseases, outperforming other freely available network-based methods. CONCLUSIONS: Considering a set of proteins in the context of their interaction network can help in better defining their functions. Our novel method exploits the information contained in the STRING database for building the minimal connecting network containing all the proteins annotated with the same GO term. The enrichment procedure is performed considering the GO-specific network modules and, when tested on the OMIM-derived benchmark sets, it is able to extract enrichment terms neglected by other methods. Our procedure is effective even when the size of the input protein set is small, requiring at least two input proteins.


Subjects
Biological Phenomena , Databases, Genetic , Gene Regulatory Networks , Computational Biology , Humans , Mendelian Randomization Analysis , Proteins/genetics , Proteins/metabolism , Software
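
As background for the "standard" enrichment that the abstract above contrasts with its network-based procedure, a term is conventionally scored with a one-sided hypergeometric test. The sketch below shows that textbook test only; the network-based extraction of STRING sub-networks is not reproduced, and the function name and counts are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: number of background proteins
    annotated:  background proteins carrying the term
    sample:     size of the input protein set
    overlap:    input proteins carrying the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background proteins, 5 annotated with the term,
# an input set of 4 proteins of which 2 carry the term.
p_value = enrichment_p_value(population=20, annotated=5, sample=4, overlap=2)
```

In practice the p-values of all tested terms would then be corrected for multiple testing; that step is omitted here.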
12.
Front Bioinform ; 4: 1306244, 2024.
Article in English | MEDLINE | ID: mdl-38501111

ABSTRACT

Introduction: DNA methylation clocks present advantageous characteristics with respect to the ambitious goal of identifying very early markers of disease, based on the concept that accelerated ageing is a reliable predictor in this sense. Methods: Such tools, being epigenomic based, are expected to be conditioned by sex and tissue specificities, and this work quantifies this dependency as well as the dependency on the regression model and the size of the training set. Results: Our quantitative results indicate that elastic-net penalization is the best performing strategy, and better so when, unsurprisingly, the data set is bigger; sex does not appear to condition clock performance, and tissue-specific clocks appear to perform better than generic blood clocks. Finally, when considering all trained clocks, we identified a subset of genes that, to the best of our knowledge, have not been presented yet and might deserve further investigation: CPT1A, MMP15, SHROOM3, SLIT3, and SYNGR. Conclusion: These factual starting points can be useful for the future medical translation of clocks and in particular in the debate between multi-tissue clocks, generally trained on a large majority of blood samples, and tissue-specific clocks.

13.
BMC Bioinformatics ; 14: 159, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23672344

ABSTRACT

BACKGROUND: Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. For the inference of such novel information, the comparison of molecular pathways requires to account for imperfect matches (flexibility) and to efficiently handle complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps. RESULTS: Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits -when necessary- supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. The algorithm has been here successfully used to highlight the contact point between various human pathways in the Reactome database. CONCLUSIONS: MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways.


Subjects
Metabolic Networks and Pathways , Signal Transduction , Software , Algorithms , Amino Acids/metabolism , Citric Acid Cycle , Computer Graphics , Databases, Factual , Electron Transport , Humans , Mitosis , Systems Biology , Wnt Signaling Pathway
14.
Bioinformatics ; 28(19): 2449-57, 2012 Oct 01.
Article in English | MEDLINE | ID: mdl-22847931

ABSTRACT

MOTIVATION: Residue-residue contact prediction is important for protein structure prediction and other applications. However, the accuracy of current contact predictors often barely exceeds 20% on long-range contacts, falling short of the level required for ab initio structure prediction. RESULTS: Here, we develop a novel machine learning approach for contact map prediction using three steps of increasing resolution. First, we use 2D recursive neural networks to predict coarse contacts and orientations between secondary structure elements. Second, we use an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands. Third, we use a deep neural network architecture to organize and progressively refine the prediction of contacts, integrating information over both space and time. We train the architecture on a large set of non-redundant proteins and test it on a large set of non-homologous domains, as well as on the set of protein domains used for contact prediction in the two most recent CASP8 and CASP9 experiments. For long-range contacts, the accuracy of the new CMAPpro predictor is close to 30%, a significant increase over existing approaches. AVAILABILITY: CMAPpro is available as part of the SCRATCH suite at http://scratch.proteomics.ics.uci.edu/. CONTACT: pfbaldi@uci.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Artificial Intelligence , Computational Biology/methods , Neural Networks, Computer , Proteins/chemistry , Algorithms , Protein Structure, Secondary , Protein Structure, Tertiary
15.
Bioinformatics ; 26(18): 2250-8, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20610612

ABSTRACT

MOTIVATION: Searching for structural similarity is a key issue of protein functional annotation. The maximum contact map overlap (CMO) is one of the possible measures of protein structure similarity. Exact and approximate methods known to optimize the CMO are computationally expensive and this hampers their applicability to large-scale comparison of protein structures. RESULTS: In this article, we describe a heuristic algorithm (Al-Eigen) for finding a solution to the CMO problem. Our approach relies on the approximation of contact maps by eigendecomposition. We obtain good overlaps of two contact maps by computing the optimal global alignment of a few principal eigenvectors. Our algorithm is simple and fast, and its running time is independent of the number of contacts in the map. Experimental testing indicates that the algorithm is comparable to exact CMO methods in terms of the overlap quality and to structural alignment methods in terms of structure similarity detection, and that it is fast enough to be suited for large-scale comparison of protein structures. Furthermore, our preliminary tests indicate that it is quite robust to noise, which makes it suitable for structural similarity detection also for noisy and incomplete contact maps. AVAILABILITY: Available at http://bioinformatics.cs.unibo.it/Al-Eigen.


Subjects
Algorithms , Proteins/chemistry , Computational Biology/methods , Protein Conformation , Proteins/physiology
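
The quantity that the CMO problem maximizes can be stated compactly: for a given residue alignment, count the contacts of one map whose two endpoints are both aligned and land on a contact in the other map. The sketch below evaluates that overlap for a fixed alignment. It is not the Al-Eigen heuristic, which searches for a good alignment via eigendecomposition, and the function name and example data are assumptions.

```python
def contact_map_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of map A preserved in map B under `alignment`.

    contacts_a, contacts_b: iterables of (i, j) residue index pairs
    alignment: dict mapping residue indices of A to residue indices of B
    """
    # Store B's contacts as unordered pairs so (i, j) and (j, i) match.
    b_pairs = {frozenset(pair) for pair in contacts_b}
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            if frozenset((alignment[i], alignment[j])) in b_pairs:
                shared += 1
    return shared


# Toy maps and alignment (illustrative values only).
contacts_a = [(0, 2), (1, 3), (2, 4)]
contacts_b = [(0, 2), (1, 4), (3, 5)]
alignment = {0: 0, 1: 1, 2: 2, 3: 4, 4: 5}
overlap = contact_map_overlap(contacts_a, contacts_b, alignment)
```

An exact CMO method would maximize this count over all order-preserving alignments, which is what makes the problem expensive.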
16.
PLoS One ; 15(3): e0229763, 2020.
Article in English | MEDLINE | ID: mdl-32155174

ABSTRACT

INTRODUCTION: Meta-analysis is a powerful means for leveraging the hundreds of experiments being run worldwide into more statistically powerful analyses. This is also true for the analysis of omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, therefore only pre-processed values -obtained after computational manipulation- are available. Pre-processing is possibly different among studies and may then affect meta-analysis by introducing non-biological sources of variability. MATERIAL AND METHODS: To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine among the most common pipelines found in literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing (9 × 9), we first evaluated the between-sample variability among control subjects and, then, we identified genomic positions that are differentially methylated between cases and controls (differential analysis). RESULTS AND CONCLUSION: The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn: (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.


Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing/standards , Meta-Analysis as Topic , Sequence Analysis, DNA/standards , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Software/standards
17.
Bioinformatics ; 24(10): 1313-5, 2008 May 15.
Article in English | MEDLINE | ID: mdl-18381401

ABSTRACT

UNLABELLED: Fault Tolerant Contact Map Reconstruction (FT-COMAR) is a heuristic algorithm for the reconstruction of the protein three-dimensional structure from (possibly) incomplete (i.e. containing unknown entries) and noisy contact maps. FT-COMAR runs within minutes, allowing its application to a large-scale number of predictions. AVAILABILITY: http://bioinformatics.cs.unibo.it/FT-COMAR


Subjects
Algorithms , Models, Chemical , Models, Molecular , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/ultrastructure , Software , Binding Sites , Computer Simulation , Protein Binding , Protein Conformation , Reproducibility of Results , Sensitivity and Specificity
20.
Article in English | MEDLINE | ID: mdl-20855922

ABSTRACT

Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In literature, there is no justification for the adoption of the MCLACHLAN instead of other substitution matrices. In this paper, we approach the problem of computing the optimal similarity matrix for contact prediction with correlated mutations, i.e., the similarity matrix that maximizes the accuracy of contact prediction with correlated mutations. We describe an optimization procedure, based on the gradient descent method, for computing the optimal similarity matrix and perform an extensive number of experimental tests. Our tests show that there is a large number of optimal matrices that perform similarly to MCLACHLAN. We also obtain that the upper limit to the accuracy achievable in protein contact prediction is independent of the optimized similarity matrix. This suggests that the poor scoring of the correlated mutations approach may be due to the choice of the linear correlation function in evaluating correlated mutations.


Subjects
Computational Biology/methods , Models, Statistical , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Databases, Protein , Mutation
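
The correlated-mutation score described in the abstract above pairs a residue similarity matrix with the Pearson correlation coefficient. The sketch below shows one simple variant: for two alignment columns, it collects the similarity of every unordered pair of sequences at each column and correlates the two resulting vectors. The identity-based `toy_similarity` is a hypothetical stand-in for the MCLACHLAN matrix, and sequence weighting and the paper's optimization procedure are omitted.

```python
from itertools import combinations
from math import sqrt


def toy_similarity(a, b):
    """Hypothetical stand-in for a substitution matrix such as MCLACHLAN."""
    return 1.0 if a == b else 0.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def correlated_mutation_score(alignment, i, j, similarity=toy_similarity):
    """Correlate pairwise residue similarities at columns i and j."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    pairs = list(combinations(range(len(alignment)), 2))
    sims_i = [similarity(col_i[k], col_i[l]) for k, l in pairs]
    sims_j = [similarity(col_j[k], col_j[l]) for k, l in pairs]
    return pearson(sims_i, sims_j)


# Toy alignment: columns 0 and 1 change together, column 2 does not follow.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
covarying = correlated_mutation_score(toy_alignment, 0, 1)
independent = correlated_mutation_score(toy_alignment, 0, 2)
```

Swapping `toy_similarity` for a lookup into a real substitution matrix changes only the similarity values, which is the degree of freedom the abstract's optimization explores.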