Results 1 - 20 of 29
1.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37847785

ABSTRACT

MOTIVATION: The usefulness of supervised molecular property prediction (MPP) is well recognized in many applications. However, the insufficiency and imbalance of labeled data make the learning problem difficult. Moreover, the reliability of the predictions is a hurdle in the deployment of MPP models in safety-critical fields. RESULTS: We propose the Evidential Meta-model for Molecular Property Prediction (EM3P2) method, which returns uncertainty estimates along with its predictions. EM3P2 trains an evidential graph isomorphism network classifier using multi-task molecular property datasets under the model-agnostic meta-learning (MAML) framework while addressing the problem of data imbalance. Our results showed better prediction performance compared to existing meta-MPP models. Furthermore, we showed that the uncertainty estimates returned by EM3P2 can be used to reject uncertain predictions for applications that require higher confidence. AVAILABILITY AND IMPLEMENTATION: Source code is available for download at https://github.com/Ajou-DILab/EM3P2.
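
The rejection idea in this abstract can be illustrated with a small sketch. The abstract does not state how EM3P2 derives its uncertainty, so the code below assumes the Dirichlet-based formulation commonly used in evidential deep learning (alpha = evidence + 1, uncertainty = K / sum(alpha)). The function names and the `max_uncertainty` threshold are hypothetical and are not taken from the EM3P2 repository.

```python
from typing import List, Optional, Tuple


def evidential_summary(evidence: List[float]) -> Tuple[List[float], float]:
    """Convert non-negative per-class evidence into probabilities and an uncertainty mass.

    Assumed formulation (not stated in the abstract): Dirichlet parameters
    alpha_k = evidence_k + 1, strength S = sum(alpha), expected probability
    alpha_k / S, and uncertainty u = K / S for K classes.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alphas = [e + 1.0 for e in evidence]
    strength = sum(alphas)
    probs = [a / strength for a in alphas]
    uncertainty = len(evidence) / strength
    return probs, uncertainty


def predict_or_reject(evidence: List[float], max_uncertainty: float = 0.5) -> Optional[int]:
    """Return the predicted class index, or None when the uncertainty is too high."""
    probs, uncertainty = evidential_summary(evidence)
    if uncertainty > max_uncertainty:
        return None  # reject: hand the sample to a human or another assay
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    print(predict_or_reject([9.0, 1.0]))   # strong evidence for class 0
    print(predict_or_reject([0.2, 0.1]))   # little evidence overall, so rejected
```

With strong evidence the uncertainty mass is small and a class is returned; with almost no evidence the uncertainty approaches 1 and the prediction is withheld.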


Subjects
Computational Biology , Supervised Machine Learning
2.
Exp Mol Med ; 55(8): 1734-1742, 2023 08.
Article in English | MEDLINE | ID: mdl-37524869

ABSTRACT

The detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual positive variants originating from a wide range of tumor purities and sequencing depths, as well as actual negative variants derived from sequencer-specific sequencing errors. A deep learning model named AIVariant, trained on this extended dataset, outperforms previously reported methods when tested under various tumor purities and sequencing depths, especially low tumor purity and sequencing depth.


Subjects
Deep Learning , Neoplasms , Humans , Gene Frequency , Computational Biology/methods , Algorithms , Neoplasms/genetics , Neoplasms/diagnosis , Mutation
3.
Transl Psychiatry ; 11(1): 590, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34785643

ABSTRACT

Established genetic risk factors for Alzheimer's disease (AD) account for only a portion of AD heritability. The aim of this study was to identify novel associations between genetic variants and AD-specific brain atrophy. We conducted genome-wide association studies for brain magnetic resonance imaging measures of hippocampal volume and entorhinal cortical thickness in 2643 Koreans meeting the clinical criteria for AD (n = 209), mild cognitive impairment (n = 1449) or normal cognition (n = 985). A missense variant, rs77359862 (R274W), in the SHANK-associated RH Domain Interactor (SHARPIN) gene was associated with entorhinal cortical thickness (p = 5.0 × 10⁻⁹) and hippocampal volume (p = 5.1 × 10⁻¹²). The variant showed an increased risk of developing AD in the mediation analyses. This variant was also associated with amyloid-β accumulation (p = 0.03) and measures of memory (p = 1.0 × 10⁻⁴) and executive function (p = 0.04). We also found significant association of other SHARPIN variants with hippocampal volume in the Alzheimer's Disease Neuroimaging Initiative (rs3417062, p = 4.1 × 10⁻⁶) and AddNeuroMed (rs138412600, p = 5.9 × 10⁻⁵) cohorts. Further, molecular dynamics simulations and co-immunoprecipitation indicated that the variant significantly reduced the binding of linear ubiquitination assembly complex proteins, SHARPIN and HOIL-1 Interacting Protein (HOIP), altering the downstream NF-κB signaling pathway. These findings suggest that SHARPIN plays an important role in the pathogenesis of AD.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Amyloid beta-Peptides/metabolism , Brain/diagnostic imaging , Brain/metabolism , Cognitive Dysfunction/diagnostic imaging , Cognitive Dysfunction/genetics , Genome-Wide Association Study , Humans , Magnetic Resonance Imaging , Nerve Tissue Proteins , Ubiquitins
4.
PLoS One ; 15(7): e0236197, 2020.
Article in English | MEDLINE | ID: mdl-32701958

ABSTRACT

Genome-wide association studies of gastric cancer (GC) cases have revealed common gastric cancer susceptibility loci with low effect size. We investigated rare variants with high effect size via whole-exome sequencing (WES) of subjects with familial clustering of gastric cancer. WES was performed on DNA from the blood of 19 gastric cancer patients and 36 unaffected family members from 14 families with two or more gastric cancer patients. Linkage analysis combined with association tests was performed using the Pedigree Variant Annotation, Analysis, and Search Tool (pVAAST) software. Based on the logarithm of odds (LOD) and permutation-based composite likelihood ratio test (CLRT) from pVAAST, MUC4 was identified as a predisposing gene (LOD P-value = 1.9×10⁻⁵; permutation-based P-value of CLRT ≤ 9.9×10⁻⁹). In a larger cohort consisting of 597 GC patients and 9,759 healthy controls genotyped with a SNP array, we discovered common variants in MUC4 regions (rs148735556, rs11717039, and rs547775645) significantly associated with GC, supporting the association of MUC4 with gastric cancer. The MUC4 variants were also found at higher frequency in The Cancer Genome Atlas Study (TCGA) germline samples of patients with multiple cancer types. Immunohistochemistry indicated that MUC4 was downregulated in the noncancerous gastric mucosa of subjects with MUC4 germline missense variants, suggesting that loss of the protective function of MUC4 predisposes an individual to gastric cancer. Rare variants in MUC4 can be novel gastric cancer susceptibility loci in Koreans with familial clustering of gastric cancer.


Subjects
Exome Sequencing , Genetic Linkage , Genetic Predisposition to Disease , Genetic Variation , Mucin-4/genetics , Cohort Studies , Family , Female , Germ Cells/metabolism , Humans , Male , Middle Aged , Mucin-4/chemistry , Pedigree , Reproducibility of Results , Stomach/pathology , Stomach Neoplasms/genetics
5.
BMC Bioinformatics ; 20(Suppl 13): 381, 2019 Jul 24.
Article in English | MEDLINE | ID: mdl-31337329

ABSTRACT

BACKGROUND: How can we obtain fast and high-quality clusters in genome-scale bio-networks? Graph clustering is a powerful tool applied to bio-networks to solve various biological problems such as protein complex detection, disease module detection, and gene function prediction. In particular, MCL (Markov Clustering) has been spotlighted due to its superior performance on bio-networks. MCL, however, is skewed towards finding a large number of very small clusters (size 1-3) and fails to detect many larger clusters (size 10+). To resolve this fragmentation problem, MLR-MCL (Multi-level Regularized MCL) has been developed. MLR-MCL still suffers from fragmentation and, in some cases, generates unrealistically large clusters. RESULTS: In this paper, we propose PS-MCL (Parallel Shotgun Coarsened MCL), a parallel graph clustering method outperforming MLR-MCL in terms of running time and cluster quality. PS-MCL adopts an efficient coarsening scheme, called SC (Shotgun Coarsening), to improve graph coarsening in MLR-MCL. SC allows merging multiple nodes at a time, which leads to improvements in quality, time and space usage. Also, PS-MCL parallelizes the main operations used in MLR-MCL, including matrix multiplication. CONCLUSIONS: Experiments show that PS-MCL dramatically alleviates the fragmentation problem, and outperforms MLR-MCL in quality and running time. We also show that the running time of PS-MCL is effectively reduced with parallelization.
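
For readers unfamiliar with the baseline, the following is a minimal sketch of plain Markov Clustering (alternating expansion and inflation on a column-stochastic matrix). It is not MLR-MCL or PS-MCL: it has no regularization, coarsening, or parallelism, and the parameter defaults are illustrative assumptions.

```python
from typing import FrozenSet, List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation step: element-wise power followed by column re-normalization."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency: Matrix, inflation: float = 2.0, iterations: int = 50,
        eps: float = 1e-6) -> Set[FrozenSet[int]]:
    """Cluster an undirected graph given as a dense adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, a common convention that stabilizes the iteration.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters: Set[FrozenSet[int]] = set()
    for i in range(n):
        if m[i][i] > eps:  # row i belongs to an attractor node
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return clusters


if __name__ == "__main__":
    triangle_pair = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(triangle_pair))  # two clusters, one per triangle
```

Raising the `inflation` parameter sharpens the flow and tends to produce more, smaller clusters, which is the fragmentation behaviour the abstract refers to.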


Subjects
Algorithms , Proteins/metabolism , Cluster Analysis , Markov Chains , Protein Interaction Maps , Proteins/chemistry
6.
PLoS One ; 13(9): e0204101, 2018.
Article in English | MEDLINE | ID: mdl-30265692

ABSTRACT

Can structural information of proteins generate essential features for predicting the deleterious effect of a single nucleotide variant (SNV) independent of the known existence of the SNV in diseases? In this work, we answer the question by examining the performance of features generated from prior knowledge with the goal of determining the pathogenic effect of rare variants in rare disease. We take the approach of prioritizing SNV loci focusing on protein structure-based features. The proposed structure-based features are generated from geometric, physical, chemical, and functional properties of the variant loci and structural neighbors of the loci utilizing multiple homologous structures. The performance of the structure-based features alone, trained on 80% of HumVar-HumDiv combination (HumVD-train) and tested on 20% of HumVar-HumDiv (HumVD-test), ClinVar and ClinVar rare variant rare disease (ClinVarRVRD) datasets, showed high levels of discernibility in determining the SNV's pathogenic or benign effects on patients. Combined structure- and sequence-based features generated from prior knowledge on a random forest model further improved the F scores to 0.84 (HumVD-test), 0.75 (ClinVar), and 0.75 (ClinVarRVRD). Including features based on the difference between wild-type in addition to the features based on loci information increased the F score slightly more to 0.90 (HumVD-test), 0.78 (ClinVar), and 0.76 (ClinVarRVRD). The empirical examination and high F scores of the results based on loci information alone suggest that the location of an SNV plays a primary role in determining the functional impact of a mutation and that structure-based features can help enhance the prediction performance.


Subjects
Polymorphism, Single Nucleotide/genetics , Rare Diseases/genetics , Algorithms , BRCA2 Protein/genetics , Binding Sites , Computational Biology , Genetic Loci , Humans , Machine Learning , ROC Curve , Reproducibility of Results
7.
PLoS One ; 13(7): e0200579, 2018.
Article in English | MEDLINE | ID: mdl-30044837

ABSTRACT

How can we find patterns and anomalies in a tensor, i.e., multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives at each time step? Finding patterns and anomalies in multi-dimensional data has many important applications, including building safety monitoring, health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard tensor decomposition results are not directly interpretable, and the few methods that propose to increase interpretability need to be made faster, more memory efficient, and more accurate for large and quickly generated data in the online environment. We propose two versions of a fast, accurate, and directly interpretable tensor decomposition method we call CTD that is based on an efficient sampling method. First is the static version of CTD, i.e., CTD-S, that provably guarantees up to 11× higher accuracy than that of the state-of-the-art method. Also, CTD-S is made up to 2.3× faster and up to 24× more memory-efficient than the state-of-the-art method by removing redundancy. Second is the dynamic version of CTD, i.e., CTD-D, which is the first interpretable dynamic tensor decomposition method ever proposed. It is also made up to 82× faster than the already fast CTD-S by exploiting factors at the previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in online distributed denial of service (DDoS) attack detection and online troll detection.


Subjects
Computer Communication Networks/organization & administration , Computer Security , Data Analysis , Information Systems/organization & administration , Algorithms , Feasibility Studies , Time Factors
9.
Bioinformatics ; 34(24): 4151-4158, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29931238

ABSTRACT

Motivation: Given multi-platform genome data with prior knowledge of functional gene sets, how can we extract interpretable latent relationships between patients and genes? More specifically, how can we devise a tensor factorization method which produces an interpretable gene factor matrix based on functional gene set information while maintaining the decomposition quality and speed? Results: We propose GIFT, a Guided and Interpretable Factorization for Tensors. GIFT provides interpretable factor matrices by encoding prior knowledge as a regularization term in its objective function. We apply GIFT to the PanCan12 dataset (TCGA multi-platform genome data) and compare the performance with P-Tucker, our baseline method without prior knowledge constraint, and Silenced-TF, our naive interpretable method. Results show that GIFT produces interpretable factorizations with high scalability and accuracy. Furthermore, we demonstrate how results of GIFT can be used to reveal significant relations between (cancer, gene sets, genes) and validate the findings based on literature evidence. Availability and implementation: The code and datasets used in the paper are available at https://github.com/leesael/GIFT. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome , Neoplasms/genetics , Computational Biology , Humans , Software
10.
Int J Mol Sci ; 17(6)2016 Jun 01.
Article in English | MEDLINE | ID: mdl-27258269

ABSTRACT

How can complex relationships among molecular or clinico-pathological entities of neurological disorders be represented and analyzed? Graphs seem to be the current answer to the question no matter the type of information: molecular data, brain images or neural signals. We review a wide spectrum of graph representation and graph analysis methods and their application in the study of both the genomic level and the phenotypic level of the neurological disorder. We find numerous research works that create, process and analyze graphs formed from one or a few data types to gain an understanding of specific aspects of the neurological disorders. Furthermore, with the increasing number of data of various types becoming available for neurological disorders, we find that integrative analysis approaches that combine several types of data are being recognized as a way to gain a global understanding of the diseases. Although there are still not many integrative analyses of graphs due to the complexity in analysis, multi-layer graph analysis is a promising framework that can incorporate various data types. We describe and discuss the benefits of the multi-layer graph framework for studies of neurological disease.


Subjects
Cluster Analysis , Models, Biological , Nervous System Diseases/etiology , Nervous System Diseases/metabolism , Animals , Brain/metabolism , Brain/physiopathology , Computer Simulation , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways , Nerve Net , Nervous System Diseases/pathology , Neural Pathways , Protein Interaction Maps , Signal Transduction
12.
Int J Data Min Bioinform ; 12(4): 417-33, 2015.
Article in English | MEDLINE | ID: mdl-26510295

ABSTRACT

In this work, we propose an LMDS-based binding-site search for improving the search speed of the Patch-Surfer method. Although Patch-Surfer is efficient in recognizing protein-ligand binding partners, further speedup is necessary to address multiple-user access. This speedup is realised by exploiting Landmark Multi-Dimensional Scaling (LMDS), which computes embedding coordinates for data points based on their distances from landmark points. When selecting the landmark points, we adopt two approaches: random and greedy selection. Our method approximately retrieves top-k results, and the accuracy increases as we exploit more landmark points. Although the two landmark selection approaches show comparable results, greedy selection shows the best performance when the number of landmark points is large. Using our method, the searching time is reduced by up to 99% while almost 80% of the exact top-k results are retrieved. Additionally, LMDS-based binding-site search+ improves the retrieval accuracy from 80% to 95% while sacrificing the speedup ratio from 99% to 90% compared to Patch-Surfer.
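
The abstract does not describe how its greedy landmark selection works. One common greedy strategy for landmark methods is max-min (farthest-point) selection, sketched below purely as an assumption: each new landmark is the point farthest from all landmarks chosen so far. The function name, the Euclidean metric, and the `first` parameter are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def greedy_landmarks(points: List[Point], k: int, first: int = 0) -> List[int]:
    """Pick k landmark indices by max-min (farthest-point) selection.

    Assumption: 'greedy' is interpreted as max-min selection with Euclidean
    distance; the paper may use a different criterion.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    landmarks = [first]
    # Distance from every point to its nearest landmark chosen so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(landmarks) < k:
        # The next landmark is the point farthest from the current landmark set.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        landmarks.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return landmarks


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    print(greedy_landmarks(pts, 3))
```

Random selection, the other approach named in the abstract, would simply sample k distinct indices instead; max-min selection spreads landmarks across the data at the cost of extra distance computations.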


Subjects
Databases, Protein , Models, Molecular , Proteins/chemistry , Sequence Analysis, Protein/methods , Binding Sites , Ligands , Proteins/genetics
13.
Bioinformatics ; 31(22): 3653-9, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26209432

ABSTRACT

MOTIVATION: As the quantity of genomic mutation data increases, the likelihood of finding patients with similar genomic profiles, for various disease inferences, increases. However, so does the difficulty in identifying them. Similarity search based on patient mutation profiles can solve various translational bioinformatics tasks, including prognostics and treatment efficacy predictions for better clinical decision making through large volumes of data. However, this is a challenging problem due to the heterogeneous and sparse characteristics of the mutation data as well as their high dimensionality. RESULTS: To solve this problem, we introduce a compact representation and search strategy based on Gene-Ontology and orthogonal non-negative matrix factorization. Statistical significance between the identified cancer subtypes and their clinical features is computed for validation; results show that our method can identify and characterize clinically meaningful tumor subtypes comparable or better in most datasets than the recently introduced Network-Based Stratification method while enabling real-time search. To the best of our knowledge, this is the first attempt to simultaneously characterize and represent somatic mutational data for efficient search purposes. AVAILABILITY: The implementations are available at: https://sites.google.com/site/postechdm/research/implementation/orgos. CONTACT: sael@cs.stonybrook.edu or hwanjoyu@postech.ac.kr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Ontology , Mutation/genetics , Chi-Square Distribution , Humans , Neoplasms/genetics , Reproducibility of Results , Survival Analysis
14.
Methods Mol Biol ; 1137: 105-17, 2014.
Article in English | MEDLINE | ID: mdl-24573477

ABSTRACT

The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITEcsc. The server is available at http://kiharalab.org/3d-surfer/.


Subjects
Models, Molecular , Protein Conformation , Proteins/chemistry , Software , Web Browser , Databases, Protein
15.
Int J Mol Sci ; 14(10): 20635-57, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-24132151

ABSTRACT

With the accumulation of next generation sequencing data, there is increasing interest in the study of intra-species difference in molecular biology, especially in relation to disease analysis. Furthermore, the dynamics of the protein is being identified as a critical factor in its function. Although the accuracy of protein structure prediction methods is high, provided there are structural templates, most methods are still insensitive to amino-acid differences at critical points that may change the overall structure. Also, predicted structures are inherently static and do not provide information about structural change over time. It is challenging to address the sensitivity and the dynamics by computational structure predictions alone. However, with the fast development of diverse mass spectrometry coupled experiments, low-resolution but fast and sensitive structural information can be obtained. This information can then be integrated into the structure prediction process to further improve the sensitivity and address the dynamics of the protein structures. For this purpose, this article focuses on reviewing two aspects: the types of mass spectrometry coupled experiments and structural data that are obtainable through those experiments; and the structure prediction methods that can utilize these data as constraints. Also, a short review of current efforts in integrating experimental data into structural modeling is provided.


Subjects
Proteins/chemistry , Computer Simulation , Humans , Mass Spectrometry/methods , Molecular Structure
16.
BMC Med Inform Decis Mak ; 13 Suppl 1: S8, 2013.
Article in English | MEDLINE | ID: mdl-23691543

ABSTRACT

Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins; thus, finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by the 3D-Zernike Descriptor (3DZD), which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, so it is hard to efficiently process many simultaneous requests for structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of the 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points within distance θ of the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
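
The two-stage retrieval described here (fetch 10 × k candidates using only the first few descriptor attributes, then re-rank them with the full descriptor) can be sketched as follows. This is an illustration only: a brute-force scan stands in for the iDistance and iKernel indexes, Euclidean distance is assumed, and the names `prefix_len` and `factor` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence

Vector = Sequence[float]


def top_k_two_stage(query: Vector, database: List[Vector], k: int,
                    prefix_len: int, factor: int = 10) -> List[int]:
    """Approximate top-k search using a reduced (prefix) representation first.

    Stage 1 ranks all entries by distance on the first `prefix_len` attributes
    and keeps factor * k candidates (a brute-force stand-in for a reduced index).
    Stage 2 re-ranks only those candidates with the full descriptors.
    """
    q_prefix = query[:prefix_len]
    candidates = heapq.nsmallest(
        factor * k,
        range(len(database)),
        key=lambda i: math.dist(q_prefix, database[i][:prefix_len]),
    )
    return heapq.nsmallest(
        k, candidates, key=lambda i: math.dist(query, database[i])
    )


if __name__ == "__main__":
    db = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (1.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
    q = (0.0, 0.0, 0.0)
    print(top_k_two_stage(q, db, k=2, prefix_len=2, factor=2))
    print(top_k_two_stage(q, db, k=2, prefix_len=2, factor=1))
```

In the toy data, entry 1 looks close on the first two attributes but is far away on the third. With a candidate pool of 2 × k the re-ranking step recovers the true neighbours, whereas a pool of only k candidates returns entry 1 by mistake, which shows why the candidate set is oversampled.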


Subjects
Abstracting and Indexing/methods , Computational Biology , Information Storage and Retrieval , Protein Structure, Tertiary , Algorithms , Computational Biology/methods , Computational Biology/statistics & numerical data , Databases, Protein , Humans , Imaging, Three-Dimensional/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Models, Molecular , Protein Conformation
17.
BMC Bioinformatics ; 13 Suppl 2: S7, 2012 Mar 13.
Article in English | MEDLINE | ID: mdl-22536870

ABSTRACT

BACKGROUND: Many solved tertiary structures of unknown function do not have global sequence and structural similarities to proteins of known function. Often, functional clues for unknown proteins can be obtained by predicting small ligand molecules that bind to the proteins. METHODS: In our previous work, we developed an alignment-free local surface-based pocket comparison method, named Patch-Surfer, which predicts ligand molecules that are likely to bind to a protein of interest. Given a query pocket in a protein, Patch-Surfer searches a database of known pockets and finds ones similar to the query. Here, we have extended the database of ligand binding pockets for Patch-Surfer to cover diverse types of binding ligands. RESULTS AND CONCLUSION: We selected 9393 representative pockets with 2707 different ligand types from the Protein Data Bank. We tested Patch-Surfer on the extended pocket database to predict the binding ligands of 75 non-homologous proteins that bind one of seven different ligands. Patch-Surfer achieved the average enrichment factor at 0.1 percent of over 20.0. The results did not depend on the sequence similarity of the query protein to proteins in the database, indicating that Patch-Surfer can identify correct pockets even in the absence of known homologous structures in the database.


Subjects
Databases, Protein , Protein Conformation , Algorithms , Binding Sites , Humans , Ligands , Proteins/chemistry , Proteins/metabolism , Proteins/physiology
18.
Protein Sci ; 21(5): 686-96, 2012 May.
Article in English | MEDLINE | ID: mdl-22374910

ABSTRACT

Bacterial formyl-CoA:oxalate CoA-transferase (FCOCT) and oxalyl-CoA decarboxylase work in tandem to perform a proton-consuming decarboxylation that has been suggested to have a role in generalized acid resistance. FCOCT is the product of uctB in the acidophilic acetic acid bacterium Acetobacter aceti. As expected for an acid-resistance factor, UctB remains folded at the low pH values encountered in the A. aceti cytoplasm. A comparison of crystal structures of FCOCTs and related proteins revealed few features in UctB that would distinguish it from nonacidophilic proteins and thereby account for its acid stability properties, other than a strikingly featureless electrostatic surface. The apparently neutral surface is a result of a "speckled" charge decoration, in which charged surface residues are surrounded by compensating charges but do not form salt bridges. A quantitative comparison among orthologs identified a pattern of residue substitution in UctB that may be a consequence of selection for protein stability by constant exposure to acetic acid. We suggest that this surface charge pattern, which is a distinctive feature of A. aceti proteins, creates a stabilizing electrostatic network without stiffening the protein or compromising protein-solvent interactions.


Subjects
Acetobacter/physiology , Bacterial Proteins/chemistry , Coenzyme A-Transferases/chemistry , Acetic Acid , Acetobacter/enzymology , Bacterial Proteins/metabolism , Coenzyme A-Transferases/metabolism , Ethanol , Hydrogen-Ion Concentration , Models, Molecular , Protein Stability , Static Electricity , Substrate Specificity
19.
J Struct Funct Genomics ; 13(2): 111-23, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22270458

ABSTRACT

The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In a parallel effort to experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e., structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. Combined with experimental methods, we hope that computational methods will make a leading contribution to the functional elucidation of the protein structures.


Subjects
Algorithms , Databases, Protein , Proteins/analysis , Sequence Analysis, Protein/methods , Software , Binding Sites , Computational Biology/methods , Internet , Molecular Sequence Annotation , Protein Conformation , Proteins/chemistry , Reproducibility of Results , Sequence Homology, Amino Acid , Structure-Activity Relationship
20.
Proteins ; 80(4): 1177-95, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22275074

ABSTRACT

Functional elucidation of proteins is one of the essential tasks in biology. The function of a protein, specifically the small ligand molecules that bind to it, can be predicted by finding similar local surface regions in binding sites of known proteins. Here, we developed an alignment-free local surface comparison method for predicting a ligand molecule which binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness. Representing a pocket by a set of patches is effective for absorbing differences in global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of a mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets. Patch-Surfer was benchmarked on three datasets, which consist of a total of 390 proteins that bind to one of 21 ligands. Patch-Surfer showed superior performance to existing methods including a global pocket comparison method, Pocket-Surfer, which we have previously introduced. In particular, as intended, the accuracy showed a large improvement for flexible ligand molecules, which bind to pockets in different conformations.
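
To make the patch-matching idea concrete, here is a minimal sketch of a plain minimum-cost bipartite matching between two small sets of patch descriptors. It does not reproduce the modified weighted matching used by Patch-Surfer, whose details are not given in the abstract. The descriptors are stand-in numeric vectors, Euclidean distance is assumed, and the exhaustive search is only practical for a handful of patches.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def match_patches(pocket_a: List[Descriptor],
                  pocket_b: List[Descriptor]) -> Tuple[float, List[Tuple[int, int]]]:
    """Find the minimum total-distance one-to-one matching between two patch sets.

    Every patch of the smaller pocket is matched to a distinct patch of the
    larger one. Returns (total_distance, [(index_in_a, index_in_b), ...]).
    Brute force over permutations: illustrative only, exponential in size.
    """
    swapped = len(pocket_a) > len(pocket_b)
    small, large = (pocket_b, pocket_a) if swapped else (pocket_a, pocket_b)
    best_cost = math.inf
    best_perm: Tuple[int, ...] = ()
    for perm in itertools.permutations(range(len(large)), len(small)):
        cost = sum(math.dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Report pairs in (pocket_a index, pocket_b index) order.
    pairs = [(j, i) if swapped else (i, j) for i, j in enumerate(best_perm)]
    return best_cost, pairs


if __name__ == "__main__":
    a = [(0.0, 0.0), (5.0, 5.0)]
    b = [(5.0, 5.0), (0.0, 0.0), (9.0, 9.0)]
    print(match_patches(a, b))
```

Because the matching ignores the order and overall arrangement of patches, two pockets with different global shapes can still score as similar when their local patches agree, which is the property the abstract highlights. For realistic patch counts, a polynomial-time assignment solver would replace the permutation search.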


Subjects
Algorithms , Computational Biology/methods , Proteins/chemistry , Software , Structural Homology, Protein , Binding Sites , Chemical Phenomena , Databases, Protein , Hydrophobic and Hydrophilic Interactions , Ligands , Static Electricity , Surface Properties , Time Factors