Results 1 - 20 of 35
1.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37332013

ABSTRACT

We report the structure-based pathogenicity relationship identifier (SPRI), a novel computational tool for accurate evaluation of pathological effects of missense single mutations and prediction of higher-order spatially organized units of mutational clusters. SPRI can effectively extract properties determining pathogenicity encoded in protein structures, and can identify deleterious missense mutations of germ line origin associated with Mendelian diseases, as well as mutations of somatic origin associated with cancer drivers. It compares favorably to other methods in predicting deleterious mutations. Furthermore, SPRI can discover spatially organized pathogenic higher-order spatial clusters (patHOS) of deleterious mutations, including those of low recurrence, and can be used for discovery of candidate cancer driver genes and driver mutations. We further demonstrate that SPRI can take advantage of AlphaFold2 predicted structures and can be deployed for saturation mutation analysis of the whole human proteome.
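
To make "saturation mutation analysis" concrete, here is a minimal sketch that enumerates every single-residue substitution of a short sequence. The function name and the toy sequence are hypothetical, and no pathogenicity scoring is attempted; the abstract does not describe how SPRI itself generates or scores variants.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def saturation_mutations(sequence):
    """Return every single-residue substitution as strings such as 'M1A'."""
    mutations = []
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                mutations.append(f"{wild_type}{position}{mutant}")
    return mutations

# Toy three-residue sequence (an assumption for illustration only).
variants = saturation_mutations("MKT")
print(len(variants), variants[:3])
```

Each position contributes 19 substitutions, so a protein of length n yields 19n candidate missense variants to evaluate.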


Subject(s)
Mutation, Missense , Neoplasms , Humans , Virulence , Mutation , Neoplasms/genetics , Computational Biology/methods
2.
J Chem Inf Model ; 64(7): 2445-2453, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37903033

ABSTRACT

miRNAs (microRNAs) target specific mRNA (messenger RNA) sites to regulate their translation. Although miRNA targeting can rely on seed region base pairing, animal miRNAs, including human miRNAs, typically cooperate with several cofactors, leading to various noncanonical pairing rules. Therefore, identifying the binding sites of animal miRNAs remains challenging. Because experiments for mapping miRNA targets are costly, computational methods are preferred as a first step for extracting potential miRNA-mRNA fragment binding pairs. However, existing prediction tools can produce many false positives because of the prevalent noncanonical miRNA binding behaviors and the information-biased negative training sets used to construct them. To overcome these obstacles, we first prepared an information-balanced ground-truth data set of miRNA binding pairs. A miRNA-mRNA interaction-aware model was then designed to help identify miRNA binding events. On the test set, our model (auROC = 94.4%) outperformed existing models by at least 2.8% in auROC. Furthermore, we showed that this model can suggest potential binding patterns for miRNA-mRNA sequence interacting pairs. Finally, we made the prepared data sets and the designed model available at http://cosbi2.ee.ncku.edu.tw/mirna_binding/download.
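
The auROC figure quoted above can be read as the probability that a randomly chosen true binding pair is scored higher than a randomly chosen non-binding pair. The sketch below computes it that way on made-up scores; the function name and the toy data are assumptions, not the authors' evaluation code.

```python
def au_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical model scores: 1 = binding pair, 0 = non-binding pair.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2]
toy_labels = [1, 1, 1, 0, 0]
toy_auc = au_roc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

The quadratic loop is fine for small test sets; a rank-based formula gives the same value more efficiently on large ones.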


Subject(s)
MicroRNAs , Animals , Humans , MicroRNAs/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Algorithms , Computational Biology/methods
3.
Mol Biol Evol ; 38(7): 2715-2731, 2021 06 25.
Article in English | MEDLINE | ID: mdl-33674876

ABSTRACT

SARS-CoV-2 infects humans through the binding of viral S-protein (spike protein) to human angiotensin I converting enzyme 2 (ACE2). The structure of the ACE2-S-protein complex has been deciphered and we focused on the 27 ACE2 residues that bind to S-protein. From human sequence databases, we identified nine ACE2 variants at ACE2-S-protein binding sites. We used both experimental assays and protein structure analysis to evaluate the effect of each variant on the binding affinity of ACE2 to S-protein. We found one variant causing complete binding disruption, two and three variants, respectively, strongly and mildly reducing the binding affinity, and two variants strongly enhancing the binding affinity. We then collected the ACE2 gene sequences from 57 nonhuman primates. Among the 6 apes and 20 Old World monkeys (OWMs) studied, we found no new variants. In contrast, all 11 New World monkeys (NWMs) studied share four variants each causing a strong reduction in binding affinity, the Philippine tarsier also possesses three such variants, and 18 of the 19 prosimian species studied share one variant causing a strong reduction in binding affinity. Moreover, one OWM and three prosimian variants increased binding affinity by >50%. Based on these findings, we proposed that the common ancestor of primates was strongly resistant to SARS-CoV-2 and that the common ancestor of NWMs was completely resistant, as is the Philippine tarsier, whereas apes and OWMs, like most humans, are susceptible. This study increases our understanding of the differences in susceptibility to SARS-CoV-2 infection among primates.


Subject(s)
COVID-19 , Disease Resistance/genetics , Peptidyl-Dipeptidase A , SARS-CoV-2 , Animals , COVID-19/genetics , COVID-19/immunology , Chlorocebus aethiops , Humans , Macaca mulatta , Peptidyl-Dipeptidase A/genetics , Peptidyl-Dipeptidase A/immunology , SARS-CoV-2/genetics , SARS-CoV-2/immunology
4.
Proc Natl Acad Sci U S A ; 116(38): 19009-19018, 2019 09 17.
Article in English | MEDLINE | ID: mdl-31484772

ABSTRACT

How negative selection, positive selection, and population size contribute to the large variation in nucleotide substitution rates among RNA viruses remains unclear. Here, we studied the ratios of nonsynonymous-to-synonymous substitution rates (dN/dS) in protein-coding genes of human RNA and DNA viruses and mammals. Among the 21 RNA viruses studied, 18 showed a genome-average dN/dS from 0.01 to 0.10, indicating that over 90% of nonsynonymous mutations are eliminated by negative selection. Only HIV-1 showed a dN/dS (0.31) higher than that (0.22) in mammalian genes. By comparing the dN/dS values among genes in the same genome and among species or strains, we found that both positive selection and population size play significant roles in the dN/dS variation among genes and species. Indeed, even in flaviviruses and picornaviruses, which showed the lowest ratios among the 21 species studied, positive selection appears to have contributed significantly to dN/dS. We found that the view that positive selection occurs much more frequently in influenza A subtype H3N2 than in subtype H1N1 holds only for the hemagglutinin and neuraminidase genes, not for other genes. Moreover, we found no support for the view that vector-borne RNA viruses have lower dN/dS ratios than non-vector-borne viruses. In addition, we found a correlation between dN and dS, implying a correlation between dN and the mutation rate. Interestingly, only 2 of the 8 DNA viruses studied showed a dN/dS < 0.10, while 4 showed a dN/dS > 0.22. These observations increase our understanding of the mechanisms of RNA virus evolution.
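
The step from "dN/dS of 0.01 to 0.10" to "over 90% of nonsynonymous mutations are eliminated" can be checked with simple arithmetic. The sketch below assumes that synonymous changes are effectively neutral and that positive selection is negligible, so the surviving fraction of nonsynonymous mutations is roughly dN/dS; the function name is hypothetical and the ratios are the ones quoted in the abstract.

```python
def fraction_removed(dn_ds):
    """Approximate share of nonsynonymous mutations removed by negative selection.

    Assumes synonymous changes are neutral and positive selection is negligible.
    """
    if dn_ds < 0:
        raise ValueError("dN/dS cannot be negative")
    return max(0.0, 1.0 - dn_ds)

# Ratios quoted in the abstract: RNA virus range, mammalian genes, HIV-1.
for ratio in (0.01, 0.10, 0.22, 0.31):
    print(f"dN/dS = {ratio:.2f} -> about {fraction_removed(ratio):.0%} removed")
```

Under the same assumptions, a ratio above 1 would give no net removal, which is why the result is clamped at zero.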


Subject(s)
Evolution, Molecular , RNA Virus Infections/virology , RNA Viruses/genetics , Selection, Genetic , Viral Proteins/genetics , Animals , Genome, Viral , Humans , Mammals , Mutation Rate
5.
BMC Bioinformatics ; 22(1): 503, 2021 Oct 16.
Article in English | MEDLINE | ID: mdl-34656087

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) are small non-coding RNAs (ncRNAs) that silence genomic transposable elements. Researchers have also found that piRNAs regulate various endogenous transcripts. However, there is no systematic understanding of piRNA binding patterns or of how piRNAs target genes. While various prediction methods have been developed for other similar ncRNAs (e.g., miRNAs), piRNAs have distinctive characteristics and require their own computational model for binding target prediction. RESULTS: Recently, transcriptome-wide piRNA binding events in C. elegans were probed by PRG-1 CLASH experiments. Based on the probed piRNA-messenger RNA (mRNA) binding pairs, we devised the first deep learning architecture based on multi-head attention to computationally identify piRNA-targeted mRNA sites. In the devised deep network, the given piRNA and mRNA segment sequences are first one-hot encoded and then undergo a combined operation of convolution and squeezing-extraction to unravel motif patterns. We also incorporate a novel multi-head attention sub-network to extract the hidden piRNA binding rules that can simulate the biological piRNA target recognition process. Finally, the true piRNA-mRNA binding pairs are identified by a deep fully connected sub-network. Our model obtains a high discriminatory power, with an AUC of 93.3% on an independent test set, and successfully extracts the verified binding pattern of a synthetic piRNA. These results demonstrate that the devised model achieves high prediction performance and suggests testable potential biological piRNA binding rules. CONCLUSIONS: In this research, we developed the first deep learning method to identify piRNA targeting sites on C. elegans mRNAs. The developed deep learning method is demonstrated to be of high accuracy and can provide biological insights into piRNA-mRNA binding patterns. The piRNA binding target identification network can be downloaded from http://cosbi2.ee.ncku.edu.tw/data_download/piRNA_mRNA_binding.
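
The first step of the described network, one-hot encoding of the piRNA and mRNA segments, can be sketched as follows. The alphabet order, the handling of DNA-style "T", and the all-zero row for unknown bases are assumptions made for illustration; the abstract does not specify these details.

```python
RNA_ALPHABET = "ACGU"  # assumed column order

def one_hot(sequence):
    """Encode an RNA sequence as a list of 4-element 0/1 rows, one row per base.

    'T' is treated as 'U'; any other unknown symbol becomes an all-zero row.
    """
    rows = []
    for base in sequence.upper().replace("T", "U"):
        rows.append([1 if base == letter else 0 for letter in RNA_ALPHABET])
    return rows

# Toy segment containing an ambiguous base 'N'.
encoded = one_hot("GAUN")
for row in encoded:
    print(row)
```

A matrix of this shape (sequence length by 4) is the usual input to a convolutional layer that scans for sequence motifs.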


Subject(s)
Caenorhabditis elegans Proteins , MicroRNAs , Animals , Argonaute Proteins , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , DNA Transposable Elements , RNA, Messenger/genetics , RNA, Small Interfering/genetics
6.
BMC Bioinformatics ; 22(Suppl 10): 271, 2021 May 25.
Article in English | MEDLINE | ID: mdl-34058988

ABSTRACT

BACKGROUND: Translational regulation is one important aspect of gene expression regulation. Dysregulation of translation results in abnormal cell physiology and leads to diseases. Ribosome profiling (RP), also called ribo-seq, is a powerful experimental technique to study translational regulation. It can capture a snapshot of translation by deep sequencing of ribosome-protected mRNA fragments. Many ribosome profiling data processing tools have been developed. However, almost all tools analyze ribosome profiling data at the gene level. Since different isoforms of a gene may produce different proteins with distinct biological functions, it is advantageous to analyze ribosome profiling data at the isoform level. To meet this need, we previously developed a pipeline to analyze 610 public human ribosome profiling datasets at the isoform level and constructed the HRPDviewer database. RESULTS: To allow other researchers to use our pipeline as well, here we implement it as an easy-to-use software tool called RPiso. Compared to Ribomap (a widely used tool which provides isoform-level ribosome profiling analyses), RPiso (1) estimates isoform abundance more accurately, (2) supports analyses on more species, and (3) provides a web-based viewer for interactively visualizing ribosome profiling data on the selected mRNA isoforms. CONCLUSIONS: In this study, we developed the RPiso software tool ( http://cosbi7.ee.ncku.edu.tw/RPiso/ ) to provide isoform-level ribosome profiling analyses. RPiso is very easy to install and execute. RPiso also provides a web-based viewer for interactively visualizing ribosome profiling data on the selected mRNA isoforms. We believe that RPiso is a useful tool for researchers to analyze and visualize their own ribosome profiling data at the isoform level.


Subject(s)
Protein Biosynthesis , Ribosomes , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomes/genetics , Ribosomes/metabolism , Software
7.
BMC Med Inform Decis Mak ; 18(Suppl 2): 42, 2018 07 23.
Article in English | MEDLINE | ID: mdl-30066644

ABSTRACT

BACKGROUND: Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Automatic extraction of such information and storing it in structured form could help researchers more easily access such information and also make it possible to incorporate it in advanced integrative analysis. In this study, we developed a novel approach to extract bio-entity relationship information using Natural Language Processing (NLP) and a graph-theoretic algorithm. METHODS: Our method, called GRGT (Grammatical Relationship Graph for Triplets), not only extracts the pairs of terms that have certain relationships, but also extracts the type of relationship (the word describing the relationship). In addition, the directionality of the relationship can also be extracted. Our method is based on the assumption that a triplet exists for a pair of interactions. A triplet is defined as two terms (entities) and an interaction word describing the relationship of the two terms in a sentence. We first use a sentence parsing tool to obtain the sentence structure represented as a dependency graph where words are nodes and edges are typed dependencies. The shortest paths among the pairs of words in the triplet are then extracted, which form the basis for our information extraction method. A flexible pattern matching scheme was then used to match a triplet graph with an unknown relationship to those triplet graphs with labels (True or False) in the database. RESULTS: We applied the method on three benchmark datasets to extract protein-protein interactions (PPIs), and obtained better precision than the top performing methods in the literature. CONCLUSIONS: We have developed a method to extract protein-protein interactions from biomedical literature. PPIs extracted by our method have higher precision than those extracted by other methods, suggesting that our method can be used to effectively extract PPIs and deposit them into databases. Beyond extracting PPIs, our method could be easily extended to extracting relationship information between other bio-entities.
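
The shortest-path step in the METHODS can be sketched with a breadth-first search over an undirected view of the dependency graph. The edge list below is hand-written for a made-up sentence rather than produced by a parser, dependency types are omitted, and the function name is hypothetical; it illustrates the idea only, not the GRGT implementation.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search for the shortest word path in a dependency graph.

    `edges` is a list of (head, dependent) pairs, treated as undirected.
    Returns the list of words on the path, or None if no path exists.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hand-written edges for the toy sentence "ProteinA strongly activates ProteinB in yeast".
toy_edges = [
    ("activates", "ProteinA"),
    ("activates", "ProteinB"),
    ("activates", "strongly"),
    ("activates", "yeast"),
    ("yeast", "in"),
]
path = shortest_path(toy_edges, "ProteinA", "ProteinB")
print(path)
```

In this toy case the path between the two entities passes through the interaction word, which is the kind of triplet structure the abstract describes.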


Subject(s)
Algorithms , Information Storage and Retrieval/methods , Natural Language Processing , Proteins/metabolism , Databases, Factual
8.
Proc Natl Acad Sci U S A ; 109(4): 1170-5, 2012 Jan 24.
Article in English | MEDLINE | ID: mdl-22238424

ABSTRACT

Protein structure and function are closely related, especially in functional surfaces, which are local spatial regions that perform the biological functions. Also, protein structures tend to evolve more slowly than amino acid sequences. We have therefore developed a method to classify proteins using the structures of functional surfaces; we call it protein surface classification (PSC). PSC may reflect functional relationships among proteins and may detect evolutionary relationships among highly divergent sequences. We focused on the surfaces of ligand-bound regions because they represent well-defined structures. Specifically, we used structural attributes to measure similarities between binding surfaces and constructed a PSC library of ~2,000 binding surface types from the bound forms. Using flavin mononucleotide-binding proteins and glycosidases as examples, we show how the evolutionary position of an uncharacterized protein can be defined and its function inferred from the characterized members of the same surface subtype. We found that proteins with the same enzyme nomenclature may be divided into subtypes and that two proteins in the same CATH (Class, Architecture, Topology, Homologous superfamily) fold may belong to two different surface types. In conclusion, our approach complements the sequence-based and fold-domain classifications and has the advantage of associating the shape of a protein with its biological function. As an expandable library, PSC provides a resource of spatial patterns for studying the evolution of protein structure and function.


Subject(s)
Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/classification , Surface Properties , Binding Sites/genetics , Databases, Genetic , Flavin Mononucleotide/metabolism , Oxidoreductases/chemistry , Oxidoreductases/classification , Protein Folding , Proteins/metabolism
9.
Nucleic Acids Res ; 40(Web Server issue): W435-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22669905

ABSTRACT

We recently proposed to classify proteins by their functional surfaces. Using the structural attributes of functional surfaces, we inferred the pairwise relationships of proteins and constructed an expandable database of protein surface classification (PSC). As the functional surface(s) of a protein is the local region where the protein performs its function, our classification may reflect the functional relationships among proteins. Currently, PSC contains a library of 1974 surface types that include 25,857 functional surfaces identified from 24,170 bound structures. The search tool in PSC empowers users to explore related surfaces that share similar local structures and core functions. Each functional surface is characterized by structural attributes, which are geometric, physicochemical or evolutionary features. The attributes have been normalized as descriptors and integrated to produce a profile for each functional surface in PSC. In addition, binding ligands are recorded for comparisons among homologs. PSC allows users to exploit related binding surfaces to reveal the changes in functionally important residues on homologs that have led to functional divergence during evolution. The substitutions at the key residues of a spatial pattern may determine the functional evolution of a protein. In PSC (http://pocket.uchicago.edu/psc/), a pool of changes in residues on similar functional surfaces is provided.


Subject(s)
Proteins/chemistry , Proteins/classification , Software , Structural Homology, Protein , Alcohol Dehydrogenase/chemistry , Cluster Analysis , Humans , Internet , Surface Properties
10.
Proc Natl Acad Sci U S A ; 108(13): 5313-8, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21402946

ABSTRACT

Protein binding site residues, especially catalytic residues, play a central role in protein function. Because more than 99% of the ∼ 12 million protein sequences in the nonredundant protein database have no structural information, it is desirable to develop methods to predict the binding site residues of a protein from its primary sequence. This task is highly challenging, because the binding site residues constitute only a small portion of a protein. However, the binding site residues of a protein are clustered in its functional pocket(s), and their spatial patterns tend to be conserved in evolution. To take advantage of these evolutionary and structural principles, we constructed a database of ∼ 50,000 templates (called the pocket-containing segment database), each of which includes not only a sequence segment that contains a functional pocket but also the structural attributes of the pocket. To use this database, we designed a template-matching technique, termed residue-matching profiling, and established a criterion for selecting templates for a query sequence. Finally, we developed a probabilistic model for assigning spatial scores to matched residues between the template and query sequence in local alignments using a set of selected scoring matrices and for computing the binding likelihood of each matched residue in the query sequence. From the likelihoods, one can predict the binding site residues in the query sequence. An automated computational pipeline was developed for our method. A performance evaluation shows that our method achieves a 70% precision in predicting binding site residues at 60% sensitivity.
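
The closing performance figures (precision and sensitivity) are straightforward to compute once predicted and true binding residues are expressed as sets of residue positions. The sketch below uses invented residue numbers and a hypothetical function name; it shows the definitions, not the authors' evaluation pipeline.

```python
def precision_and_sensitivity(predicted, actual):
    """Return (precision, sensitivity) for predicted vs. true binding residues."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(actual) if actual else 0.0
    return precision, sensitivity

# Invented residue positions: 10 true sites, 8 predictions, 6 of them correct.
true_sites = {12, 15, 48, 51, 77, 80, 102, 130, 131, 140}
predicted_sites = {12, 15, 48, 51, 77, 80, 33, 90}
precision, sensitivity = precision_and_sensitivity(predicted_sites, true_sites)
print(precision, sensitivity)
```

Because binding residues are a small minority of any protein, precision and sensitivity are more informative here than overall accuracy, which would be high even for a predictor that labels nothing as binding.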


Subject(s)
Amino Acid Sequence , Biological Evolution , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein/methods , Binding Sites , Databases, Protein , Models, Molecular , Molecular Sequence Data , Protein Folding , Protein Structure, Tertiary
11.
Comput Methods Programs Biomed ; 254: 108260, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38878357

ABSTRACT

BACKGROUND AND OBJECTIVE: Proteome microarrays are one of the popular high-throughput screening methods for large-scale investigation of protein interactions in cells. These interactions can be measured on protein chips when coupled with fluorescence-labeled probes, helping indicate potential biomarkers or discover drugs. Several computational tools were developed to help analyze the protein chip results. However, existing tools fail to provide a user-friendly interface for biologists and present only one or two data analysis methods suitable for limited experimental designs, restricting the use cases. METHODS: To facilitate biomarker examination using protein chips, we implemented a user-friendly and comprehensive web tool called BAPCP (Biomarker Analysis tool for Protein Chip Platforms) to deal with diverse chip data distributions. RESULTS: BAPCP is well integrated with standard chip result files and includes 7 data normalization methods and 7 custom-designed quality control/differential analysis filters for biomarker extraction among experiment groups. Moreover, it can handle cost-efficient chip designs that repeat several blocks/samples within one single slide. Using experiments with the human coronavirus (HCoV) protein microarray and the E. coli proteome chip that helps study the immune response of Kawasaki disease as examples, we demonstrated that BAPCP can shorten the time-consuming, week-long manual biomarker identification process to merely 3 min. CONCLUSIONS: The developed BAPCP tool provides substantial analysis support for protein interaction studies and conforms to the necessity of expanding computer usage and exchanging information in bioscience and medicine. The web service of BAPCP is available at https://cosbi.ee.ncku.edu.tw/BAPCP/.


Subject(s)
Biomarkers , Protein Array Analysis , Software , Biomarkers/metabolism , Humans , Internet , Proteome , User-Computer Interface , Escherichia coli , Proteomics/methods , Computational Biology
12.
Nucleic Acids Res ; 38(Database issue): D288-95, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19880384

ABSTRACT

fPOP (footprinting Pockets Of Proteins, http://pocket.uchicago.edu/fpop/) is a relational database of the protein functional surfaces identified by analyzing the shapes of binding sites in approximately 42,700 structures, including both holo and apo forms. We previously used a purely geometric method to extract the spatial patterns of functional surfaces (split pockets) in approximately 19,000 bound structures and constructed a database, SplitPocket (http://pocket.uchicago.edu/). These functional surfaces are now used as spatial templates to predict the binding surfaces of unbound structures. To conduct a shape comparison, we use the Smith-Waterman algorithm to footprint an unbound pocket fragment with those of the functional surfaces in SplitPocket. The pairwise alignment of the unbound and bound pocket fragments is used to evaluate the local structural similarity via geometric matching. The final results of our large-scale computation, including approximately 90,000 identified or predicted functional surfaces, are stored in fPOP. This database provides an easily accessible resource for studying functional surfaces, assessing conformational changes between bound and unbound forms and analyzing functional divergence. Moreover, it may facilitate the exploration of the physicochemical textures of molecules and the inference of protein function. Finally, our approach provides a framework for classification of proteins into families on the basis of their functional surfaces.
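
The Smith-Waterman algorithm mentioned above finds the best-scoring local alignment between two sequences. The sketch below is a plain textbook version on character strings with a linear gap penalty; the scoring values are arbitrary assumptions, and the way fPOP encodes pocket fragments and scores matches is not described in the abstract.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the matrix; scores never drop below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))

# Toy fragments sharing the local motif "ACGT".
result = smith_waterman("XXACGTXX", "YYACGTYY")
print(result)
```

Local alignment suits this use because only part of an unbound pocket fragment is expected to resemble a bound template, so flanking residues should not be forced into the alignment.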


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Protein , Proteins/chemistry , Algorithms , Animals , Bacterial Proteins/chemistry , Binding Sites , Computational Biology/trends , Computer Simulation , Humans , Information Storage and Retrieval/methods , Internet , Protein Conformation , Protein Structure, Tertiary , Software
13.
Sci Rep ; 12(1): 2565, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35173175

ABSTRACT

Alpha/beta hydrolase domain-containing protein 5 (ABHD5) is a highly conserved protein that regulates various lipid metabolic pathways via interactions with members of the perilipin (PLIN) and Patatin-like phospholipase domain-containing protein (PNPLA) protein families. Loss-of-function mutations in ABHD5 result in Chanarin-Dorfman Syndrome (CDS), characterized by ectopic lipid accumulation in numerous cell types and severe ichthyosis. Recent data demonstrate that ABHD5 is the target of synthetic and endogenous ligands that might be therapeutically beneficial for treating metabolic diseases and cancers. However, the structural basis of ABHD5 functional activities, such as protein-protein interactions and ligand binding, is presently unknown. To address this gap, we constructed theoretical structural models of ABHD5 by comparative modeling and topological shape analysis to assess the spatial patterns of ABHD5 conformations computed in protein dynamics. We identified functionally important residues on the ABHD5 surface for lipolysis activation by PNPLA2, lipid droplet targeting and PLIN-binding. We validated the computational model by examining the effects of mutating key residues in ABHD5 on an array of functional assays. Our integrated computational and experimental findings provide new insights into the structural basis of the diverse functions of ABHD5 as well as pathological mutations that result in CDS.


Subject(s)
1-Acylglycerol-3-Phosphate O-Acyltransferase/chemistry , 1-Acylglycerol-3-Phosphate O-Acyltransferase/metabolism , Computational Biology/methods , Lipase/metabolism , Lipid Droplets/metabolism , Mutation , 1-Acylglycerol-3-Phosphate O-Acyltransferase/genetics , Humans , Ligands , Lipid Droplets/chemistry , Protein Conformation
14.
Comput Biol Med ; 151(Pt B): 106314, 2022 12.
Article in English | MEDLINE | ID: mdl-36455295

ABSTRACT

Comparative analysis of the functional features of multiple gene lists is now a routine task due to the advancement of high-throughput experiments. Several enrichment analysis tools were developed in the past. However, these tools mainly focus on one gene list and contain only gene ontology or interaction features. Worse, comparative investigation and customized feature set reanalysis are still unavailable. Therefore, we constructed the YMLA (Yeast Multiple List Analyzer) platform in this research. YMLA includes 39 yeast features and facilitates comparative analysis among multiple gene lists via tabular views, heatmaps, and network plots. Moreover, the customized feature set reanalysis function was implemented in YMLA to help form mechanism hypotheses based on a selected enriched feature subset. We demonstrated the biological applicability of YMLA via example lists consisting of genes with top/bottom translation efficiency values. The analysis results provided by YMLA reveal novel facts consistent with previous experiments. YMLA is available at https://cosbi7.ee.ncku.edu.tw/YMLA/.
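
Enrichment analysis of a gene list is commonly framed as a one-sided hypergeometric test: given how many genes in the genome carry a feature, how surprising is the number found in the list? The sketch below shows that calculation with invented counts. The abstract does not state which test YMLA uses, so the choice of the hypergeometric test and the function name are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, list_size, hits):
    """One-sided p-value P(X >= hits) for drawing `list_size` genes without
    replacement from `population` genes, of which `annotated` carry the feature."""
    upper = min(annotated, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Invented counts: 10 genes overall, 4 annotated, a list of 3 genes with 2 annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=4, list_size=3, hits=2)
print(round(p_value, 4))
```

When several gene lists and many features are compared, the resulting p-values would normally also need a multiple-testing correction.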


Subject(s)
Saccharomyces cerevisiae , Software , Saccharomyces cerevisiae/genetics
15.
Nucleic Acids Res ; 37(Web Server issue): W384-9, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19406922

ABSTRACT

SplitPocket (http://pocket.uchicago.edu/) is a web server to identify functional surfaces of proteins from structure coordinates. Using the Alpha Shape Theory, we previously developed an analytical approach to identify protein functional surfaces by the geometric concept of a split pocket, which is a pocket split by a binding ligand. Our geometric approach extracts site-specific spatial information from coordinates of structures. To reduce the search space, probe radii are designed according to the physicochemical textures of molecules. The method uses the weighted Delaunay triangulation and the discrete flow algorithm to obtain geometric measurements and spatial patterns for each predicted pocket. It can also measure the hydrophobicity of a surface patch. Furthermore, we quantify the evolutionary conservation of surface patches by an index derived from the entropy scores in HSSP (homology-derived secondary structure of proteins). We have used the method to examine approximately 1.16 million potential pockets and identified the split pockets in >26,000 structures in the Protein Data Bank. This integrated web server of functional surfaces provides a source of spatial patterns to serve as templates for predicting the functional surfaces of unbound structures involved in binding activities. These spatial patterns should also be useful for protein functional inference, structural evolution and drug design.
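
Entropy-based conservation scores of the kind mentioned above rest on the Shannon entropy of each alignment column: a fully conserved position has zero entropy, and variable positions score higher. The sketch below computes that per-column entropy and averages it over a patch. The averaging step, the function names and the toy columns are assumptions; the exact index SplitPocket derives from HSSP is not given in the abstract.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    if not column:
        raise ValueError("column must contain at least one residue")
    total = len(column)
    counts = Counter(column)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

def patch_entropy(columns):
    """Mean column entropy over the positions that make up a surface patch."""
    return sum(column_entropy(col) for col in columns) / len(columns)

# Toy patch: one fully conserved position and one split evenly between two residues.
toy_patch = ["AAAA", "AACC"]
mean_entropy = patch_entropy(toy_patch)
print(mean_entropy)
```

Lower mean entropy indicates a more conserved patch, which is the usual signal that the patch is functionally important.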


Subject(s)
Proteins/chemistry , Software , Catalytic Domain , Humans , Internet , Ligands , Mitogen-Activated Protein Kinases/chemistry , Models, Molecular , Protein Conformation , src-Family Kinases/chemistry
16.
Comput Struct Biotechnol J ; 19: 3692-3707, 2021.
Article in English | MEDLINE | ID: mdl-34285772

ABSTRACT

Phosphoinositides (PIs) are a family of eight lipids consisting of phosphatidylinositol (PtdIns) and its seven phosphorylated forms. PIs have important regulatory functions in the cell including lipid signaling, protein transport, and membrane trafficking. Yeast has been recognized as a eukaryotic model system to study lipid-protein interactions. Hundreds of yeast PI-binding proteins have been identified, but this research knowledge remains scattered. Moreover, the complete PI-binding spectrum and potential PI-binding domains have not been interlinked. No comprehensive databases are available to support lipid-protein interaction research on phosphoinositides. Here we constructed the first knowledgebase of Yeast Phosphoinositide-Binding Proteins (YPIBP), a repository consisting of 679 PI-binding proteins collected from high-throughput proteome-array and lipid-array studies, QuickGO, and rigorous literature mining. YPIBP also contains protein domain information in the categories of lipid-binding domains, lipid-related domains and other domains. YPIBP provides search and browse modes along with two enrichment analyses (PI-binding enrichment analysis and domain enrichment analysis). An interactive visualization is given to summarize the PI-domain-protein interactome. Finally, three case studies are given to demonstrate the utility of YPIBP. The YPIBP knowledgebase consolidates present knowledge and provides new insights into PI-binding proteins by offering a comprehensive and in-depth interaction network of these proteins. YPIBP is available at http://cosbi7.ee.ncku.edu.tw/YPIBP/.

17.
Comput Struct Biotechnol J ; 19: 5149-5159, 2021.
Article in English | MEDLINE | ID: mdl-34589189

ABSTRACT

Transcript isoforms regulated by alternative splicing can substantially impact carcinogenesis, so cancer studies need to examine both differential gene expression and abnormal isoform distributions. The Cancer Genome Atlas (TCGA) project was launched in 2008 to collect cancer-related genome mutation raw data from the population. While many repositories tried to add insights into the raw data in TCGA, no existing database provides both comprehensive gene-level and isoform-level cancer stage marker investigation and survival analysis. We constructed Cancer DEIso to facilitate in-depth analyses for both gene-level and isoform-level human cancer studies. Patient RNA-seq data, sample sheets, patient clinical data, and human genome datasets were collected and processed in Cancer DEIso. Four functions to search differentially expressed genes/isoforms between cancer stages were implemented: (i) search for potential gene/isoform markers for a specified cancer type and two of its stages; (ii) search for potentially induced cancer types and stages for a gene/isoform; (iii) expression survival analysis of a given gene/isoform for a given cancer; (iv) visualization of gene/isoform expression comparisons between stages. As an example, we demonstrate that Cancer DEIso can indicate potential colorectal cancer isoform diagnostic markers that are not easily detected when only gene-level expressions are considered. Cancer DEIso is available at http://cosbi4.ee.ncku.edu.tw/DEIso/.

18.
Database (Oxford) ; 2020, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-33186464

ABSTRACT

High-throughput omics technologies are now routinely used in biological research. From omics data, researchers can easily obtain two gene lists (e.g. stress-induced genes vs. stress-repressed genes) related to their biological question. The next step is to apply enrichment analysis tools to identify functional/regulatory features that differ between the two gene lists for further investigation. Although various enrichment analysis tools are already available, two challenges remain. First, most existing tools are designed to analyze only one gene list, so they cannot directly compare two gene lists. Second, almost all existing tools focus on identifying enriched qualitative features (e.g. gene ontology [GO] terms, pathways, domains, etc.). Many quantitative features (e.g. number of mRNA isoforms of a gene, mRNA half-life, protein half-life, transcriptional plasticity, translational efficiency, etc.) are available for yeast, but no existing tool analyzes them. To address these two challenges, we present the Yeast Quantitative Features Comparator (YQFC), which can directly compare various quantitative features between two yeast gene lists. In YQFC, we comprehensively collected and processed 85 quantitative features from the yeast literature and yeast databases. For each quantitative feature, YQFC provides three statistical tests (t-test, U test and KS test) to test whether the feature differs significantly between the two input yeast gene lists. The distinct quantitative features identified by YQFC may help researchers study the molecular mechanisms that differentiate the two input gene lists. We believe that YQFC is a useful tool for expediting biological research that uses high-throughput omics technologies. DATABASE URL: http://cosbi2.ee.ncku.edu.tw/YQFC/.
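As a rough illustration of the three tests named in the abstract, the sketch below computes the test statistics for one quantitative feature measured in two gene lists. It is not YQFC's code: the function names and the feature values are hypothetical, the t-test is assumed to be Welch's unequal-variance form, and only the statistics are computed (p-values would normally come from a statistics library).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a-value > b-value count 1, ties 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    def ecdf(sample, v):
        return sum(1 for x in sample if x <= v) / len(sample)
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in sorted(set(a) | set(b)))


# Hypothetical values of one quantitative feature for two gene lists.
induced = [2.0, 3.0, 4.0, 5.0]
repressed = [6.0, 7.0, 8.0, 9.0]

print("t  =", round(welch_t(induced, repressed), 3))
print("U  =", mann_whitney_u(induced, repressed))
print("KS =", ks_statistic(induced, repressed))
```

The three statistics respond to different kinds of difference: the t statistic to a shift in means, U to a shift in ranks, and KS to any difference in the shape of the two distributions.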


Subject(s)
Databases, Genetic , Saccharomyces cerevisiae , Computational Biology , Proteins , Saccharomyces cerevisiae/genetics , Software
19.
Proteins ; 76(4): 959-76, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19326458

ABSTRACT

The function of a protein is often fulfilled via molecular interactions on its surfaces, so identifying the functional surface(s) of a protein is helpful for understanding its function. Here, we introduce the concept of a split pocket, which is a pocket that is split by a cognate ligand. We use a geometric approach that is site-specific. Specifically, we first compute a set of all pockets in the protein with its ligand(s) and a set of all pockets with the ligand(s) removed and then compare the two sets of pockets to identify the split pocket(s) of the protein. To reduce the search space and expedite the process of surface partitioning, we design probe radii according to the physicochemical textures of molecules. Our method achieves a success rate of 96% on a benchmark test set. We conduct a large-scale computation to identify approximately 19,000 split pockets from 11,328 structures (1.16 million potential pockets); for each pocket, we obtain residue composition, solvent-accessible area, and molecular volume. With this database of split pockets, our method can be used to predict the functional surfaces of unbound structures. Indeed, the functional surface of an unbound protein may often be found from its similarity to remotely related bound forms that belong to distinct folds. Finally, we apply our method to identify glucose-binding proteins, including unbound structures. Our study demonstrates the power of geometric and evolutionary matching for studying protein functional evolution and provides a framework for classifying protein functions by local spatial patterns of functional surfaces.
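The comparison step described above (pockets computed with the ligand present versus with the ligand removed) can be sketched as a set comparison. This is a simplified assumption, not the authors' geometric algorithm: pockets are represented here as sets of hypothetical atom identifiers, the pocket computation itself is taken as given, and a ligand-free pocket is flagged as split when it no longer appears unchanged once the ligand is present.

```python
def find_split_pockets(pockets_with_ligand, pockets_without_ligand):
    """Return ligand-free pockets that do not survive unchanged when the ligand is present.

    Each pocket is an iterable of atom identifiers (hypothetical representation).
    """
    bound = {frozenset(p) for p in pockets_with_ligand}
    return [set(p) for p in pockets_without_ligand if frozenset(p) not in bound]


# Hypothetical pockets: with the ligand removed, atoms 1-6 line one large pocket;
# with the ligand in place, that pocket appears as two smaller ones.
without_ligand = [{1, 2, 3, 4, 5, 6}, {10, 11, 12}]
with_ligand = [{1, 2, 3}, {4, 5, 6}, {10, 11, 12}]

split = find_split_pockets(with_ligand, without_ligand)
print(split)
```

The pocket lined by atoms 10 to 12 is identical in both sets and is therefore ignored, while the large pocket is reported as a candidate split pocket.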


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Animals , Computer Simulation , Humans , Ligands , Models, Molecular , Molecular Sequence Data , Protein Conformation , Sequence Alignment
20.
Article in English | MEDLINE | ID: mdl-35261984

ABSTRACT

With the rapid progress of cancer genome studies, many missense mutations in populations of somatic cells of different cancer types and at different stages have been identified. However, it is challenging to understand the implications of these cancer-related variants. We have developed a computational method that integrates structural, topographical, and evolutionary information to assess the biochemical effects and the extent of deleteriousness of cancer-related variants. We have mapped somatic missense mutations from the Catalogue of Somatic Mutations In Cancer (COSMIC) to 3D structures in the Protein Data Bank (PDB). Our results show that a large portion of these missense mutations is located in protein surface pockets, which often serve as structural and functional units of cancer variants. We provide a detailed analysis of several examples and an assessment of the importance of these variants, including predictions of previously unreported cancer variants, along with independent evidence from the literature. Furthermore, we show that our predictions can shed light on the functional roles and mechanisms of the predicted cancer variants.
