Results 1 - 20 of 32

1.
Nucleic Acids Res ; 51(D1): D1353-D1359, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36399499

ABSTRACT

The Open Targets Platform (https://platform.opentargets.org/) is an open source resource to systematically assist drug target identification and prioritisation using publicly available data. Since our last update, we have reimagined, redesigned, and rebuilt the Platform in order to streamline data integration and harmonisation, expand the ways in which users can explore the data, and improve the user experience. The gene-disease causal evidence has been enhanced and expanded to better capture disease causality across rare, common, and somatic diseases. For target and drug annotations, we have incorporated new features that help assess target safety and tractability, including genetic constraint, PROTACtability assessments, and AlphaFold structure predictions. We have also introduced new machine learning applications for knowledge extraction from the published literature, clinical trial information, and drug labels. The new technologies and frameworks introduced since the last update will ease the introduction of new features and the creation of separate instances of the Platform adapted to user requirements. Our new Community forum, expanded training materials, and outreach programme support our users in a range of use cases.

2.
J Proteome Res ; 19(3): 1209-1221, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32008325

ABSTRACT

Even though several families of eukaryotic ß-barrel outer membrane proteins have been discovered in the last few years, their computational characterization and their annotation in public databases are far from complete. The PFAM database includes only a few characteristic profiles for these families, and in most cases the profile hidden Markov models (pHMMs) have been trained using prokaryotic and eukaryotic proteins together. Here, we present for the first time a comprehensive computational analysis of eukaryotic transmembrane ß-barrels. Based on an extensive literature search, we built twelve characteristic pHMMs that can discriminate eukaryotic ß-barrels from other classes of proteins (globular and bacterial ß-barrel ones), as well as between mitochondrial and chloroplastic ones. We built eight novel profiles for the chloroplastic ß-barrel families that are not present in the PFAM database, updated the profile for the MDM10 family (PF12519) in the PFAM database, and divided the porin family (PF01459) into two separate families, namely VDAC and TOM40.


Subject(s)
Eukaryota , Porins , Eukaryota/genetics , Eukaryotic Cells , Mitochondria , Proteins
3.
Bioinformatics ; 35(13): 2208-2215, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30445435

ABSTRACT

MOTIVATION: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are basically unsupervised models; however, in the most important applications they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input, and the set of parameters that maximizes the joint probability of sequences and labels is estimated. A main problem with this approach is that, in most cases, labels are hard to obtain and thus the amount of training data is limited. On the other hand, plenty of unclassified (unlabeled) sequences deposited in public databases could potentially contribute to the training procedure. Exploiting such sequences together with labeled ones is called semi-supervised learning and could be very helpful in many applications. RESULTS: We propose here a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, in which the missing labels of the unlabeled or partially labeled data are treated as missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
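A minimal sketch of the central idea, assuming a hypothetical two-state toy model: when labels are treated as missing data, the forward recursion zeroes out any state whose label contradicts a known label and leaves every state open where the label is missing (`None`). A full EM variant would also need the matching backward pass and re-estimation step; the state names, probabilities and the `constrained_forward` helper are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy model: state "M" (membrane, label "M") and "O" (other, label "O"),
# emitting symbols "h" (hydrophobic) or "p" (polar).
STATES = ["M", "O"]
LABEL = {"M": "M", "O": "O"}
START = {"M": 0.5, "O": 0.5}
TRANS = {"M": {"M": 0.8, "O": 0.2}, "O": {"M": 0.3, "O": 0.7}}
EMIT = {"M": {"h": 0.9, "p": 0.1}, "O": {"h": 0.2, "p": 0.8}}


def constrained_forward(
    obs: Sequence[str],
    labels: Sequence[Optional[str]],
    states: List[str] = STATES,
    state_label: Dict[str, str] = LABEL,
    start: Dict[str, float] = START,
    trans: Dict[str, Dict[str, float]] = TRANS,
    emit: Dict[str, Dict[str, float]] = EMIT,
) -> float:
    """P(obs, labels): sum over state paths that agree with every known label.

    labels[t] is None for an unlabeled position, so all states are allowed there.
    No scaling is applied, so this is only suitable for short toy sequences.
    """
    if len(obs) != len(labels):
        raise ValueError("obs and labels must have the same length")

    def allowed(t: int, s: str) -> bool:
        return labels[t] is None or state_label[s] == labels[t]

    alpha = {s: start[s] * emit[s][obs[0]] if allowed(0, s) else 0.0 for s in states}
    for t in range(1, len(obs)):
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
            if allowed(t, s) else 0.0
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    seq = "hhph"
    print(constrained_forward(seq, [None] * len(seq)))       # fully unlabeled
    print(constrained_forward(seq, ["M", None, None, "O"]))  # partially labeled
```

Summing `constrained_forward` over every possible labelling of a sequence recovers the fully unlabeled value, which is the sense in which unknown labels are marginalized out as missing data.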


Subject(s)
Supervised Machine Learning , Algorithms , Markov Chains , Models, Statistical , Sequence Analysis
4.
Bioinformatics ; 35(24): 5309-5312, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31250907

ABSTRACT

SUMMARY: JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. AVAILABILITY AND IMPLEMENTATION: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
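The sketch below is not JUCHMME code and does not reproduce its interface. It only illustrates, with hypothetical toy parameters, one of the standard algorithms such a toolkit provides for a discrete-alphabet HMM: Viterbi decoding of the most probable state path, computed in log space to avoid numerical underflow.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy model: two states over a two-symbol alphabet.
STATES = ["M", "O"]
START = {"M": 0.5, "O": 0.5}
TRANS = {"M": {"M": 0.8, "O": 0.2}, "O": {"M": 0.3, "O": 0.7}}
EMIT = {"M": {"h": 0.9, "p": 0.1}, "O": {"h": 0.2, "p": 0.8}}


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(
    obs: Sequence[str],
    states: List[str] = STATES,
    start: Dict[str, float] = START,
    trans: Dict[str, Dict[str, float]] = TRANS,
    emit: Dict[str, Dict[str, float]] = EMIT,
) -> Tuple[List[str], float]:
    """Return the most probable state path and its log joint probability."""
    score = {s: _log(start[s]) + _log(emit[s][obs[0]]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for symbol in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Best predecessor r for entering state s at this position.
            best = max(states, key=lambda r: score[r] + _log(trans[r][s]))
            new_score[s] = score[best] + _log(trans[best][s]) + _log(emit[s][symbol])
            ptr[s] = best
        score = new_score
        backpointers.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    path, logp = viterbi("hhp")
    print(path, math.exp(logp))
```

For the input `"hhp"` this toy model decodes the path `['M', 'M', 'O']` with a joint probability of about 0.05184.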


Subject(s)
Algorithms , Software , Sequence Analysis
5.
Nucleic Acids Res ; 45(D1): D219-D227, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899601

ABSTRACT

The Database of Protein Disorder (DisProt, URL: www.disprot.org) has been significantly updated and upgraded since its last major renewal in 2007. The current release holds information on more than 800 entries of IDPs/IDRs, i.e. intrinsically disordered proteins or regions that exist and function without a well-defined three-dimensional structure. We have re-curated previous entries to purge DisProt of conflicting cases, and also upgraded the functional classification scheme to reflect continuous advances in the field over the past 10 years or so. We define IDPs as proteins that are disordered along their entire sequence, i.e. entirely lack structural elements, and IDRs as regions of at least five consecutive residues without well-defined structure. We base our assessment of disorder strictly on experimental evidence, such as X-ray crystallography and nuclear magnetic resonance (primary techniques) and a broad range of other experimental approaches (secondary techniques). Confident and ambiguous annotations are highlighted separately. DisProt 7.0 presents classified knowledge regarding the experimental characterization and functional annotations of IDPs/IDRs, and is intended to provide an invaluable resource for the research community for a better understanding of structural disorder and for developing better computational tools for studying disordered proteins.
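The IDR and IDP definitions above can be expressed as a simple run-length scan. This sketch assumes a hypothetical per-residue annotation string in which `D` marks an experimentally disordered residue and any other character marks an ordered or unannotated one; it is not the DisProt data format.

```python
from typing import List, Tuple


def disordered_regions(mask: str, min_len: int = 5, flag: str = "D") -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of `flag` runs with length >= min_len."""
    regions: List[Tuple[int, int]] = []
    run_start = None
    for i, ch in enumerate(mask + "\0"):  # the sentinel closes a run at the sequence end
        if ch == flag:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_len:
                regions.append((run_start + 1, i))  # i is the 0-based index after the run
            run_start = None
    return regions


def is_idp(mask: str, flag: str = "D") -> bool:
    """Entirely disordered: every residue carries the disorder flag."""
    return len(mask) > 0 and all(ch == flag for ch in mask)


if __name__ == "__main__":
    print(disordered_regions("OODDDDDOODDDO"))  # [(3, 7)]; the 3-residue run is too short
```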


Subject(s)
Databases, Protein , Intrinsically Disordered Proteins , Animals , Crystallography, X-Ray , Fluorescence Resonance Energy Transfer , Forecasting , Forms and Records Control , Humans , Intrinsically Disordered Proteins/classification , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation
6.
Bioinformatics ; 33(10): 1521-1527, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28108451

ABSTRACT

MOTIVATION: In the context of genome-wide association studies (GWAS), a variety of statistical techniques are available for conducting the analysis, but in most cases the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has not been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and to make the software available to the scientific community. RESULTS: The CATT under recessive, additive and dominant models of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 statistic, were implemented in Stata. For MAX and MIN2, we calculated their asymptotic null distributions by numerical integration, resulting in a great gain in computational time without loss of accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. AVAILABILITY AND IMPLEMENTATION: A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. CONTACT: pbagos@compgen.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
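As an illustration of two of the statistics named above, the sketch below computes the Cochran-Armitage trend statistic Z for a 2x3 genotype table under recessive, additive and dominant scores, and the MAX statistic as the largest |Z| over the three models. It uses the textbook variance formula and is not the Stata implementation; in particular, it does not compute the asymptotic null distribution of MAX, so no p-value is returned. Ordering genotype counts as (non-risk homozygote, heterozygote, risk homozygote) is an assumption of this sketch.

```python
import math
from typing import Sequence

# Genotype scores for (aa, Aa, AA), where A is the assumed risk allele.
SCORES = {"recessive": (0, 0, 1), "additive": (0, 1, 2), "dominant": (0, 1, 1)}


def catt_z(cases: Sequence[int], controls: Sequence[int], scores: Sequence[float]) -> float:
    """Cochran-Armitage trend statistic (approximately N(0, 1) under the null)."""
    n = [r + s for r, s in zip(cases, controls)]
    big_r, big_s = sum(cases), sum(controls)
    big_n = big_r + big_s
    t = sum(x * (r * big_s - s * big_r) for x, r, s in zip(scores, cases, controls))
    sum_xn = sum(x * ni for x, ni in zip(scores, n))
    sum_xxn = sum(x * x * ni for x, ni in zip(scores, n))
    var_t = (big_r * big_s / big_n) * (big_n * sum_xxn - sum_xn ** 2)
    if var_t <= 0:
        raise ValueError("degenerate table: the trend statistic is undefined")
    return t / math.sqrt(var_t)


def max_statistic(cases: Sequence[int], controls: Sequence[int]) -> float:
    """MAX = largest |Z| over the recessive, additive and dominant models."""
    return max(abs(catt_z(cases, controls, sc)) for sc in SCORES.values())


if __name__ == "__main__":
    cases, controls = (1, 1, 2), (2, 1, 1)
    for model, sc in SCORES.items():
        print(model, catt_z(cases, controls, sc))
    print("MAX", max_statistic(cases, controls))
```

Z is unchanged by linear rescaling of the scores, so (0, 1, 2) and (0, 0.5, 1) give the same additive test.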


Subject(s)
Genetics, Population/methods , Genome-Wide Association Study/statistics & numerical data , Meta-Analysis as Topic , Models, Genetic , Software , Genetic Predisposition to Disease , Genomics/methods , Humans , Hypertension/genetics , Polymorphism, Single Nucleotide , Statistics as Topic
7.
Bioinformatics ; 32(8): 1158-62, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26644416

ABSTRACT

MOTIVATION: The translocon recognizes sufficiently hydrophobic regions of a protein and inserts them into the membrane. Computational methods try to determine which hydrophobic regions are recognized by the translocon. Although these predictions are quite accurate, many methods still fail to distinguish marginally hydrophobic transmembrane (TM) helices from equally hydrophobic regions in soluble protein domains. In vivo, this problem is most likely avoided by targeting of the TM proteins, so that non-TM proteins never see the translocon. Proteins are targeted to the translocon by an N-terminal signal peptide. The targeting is also aided by the fact that the N-terminal helix is more hydrophobic than other TM helices. In addition, we recently found that the C-terminal helix is more hydrophobic than central helices. This information has not been used in earlier topology predictors. RESULTS: Here, we use the fact that the N- and C-terminal helices are more hydrophobic to develop a new version of the first-principle-based topology predictor, SCAMPI. The new predictor has two main advantages: first, it can be used to efficiently separate membrane and non-membrane proteins directly without the use of an extra prefilter, and second, it shows improved performance for predicting the topology of membrane proteins that contain large non-membrane domains. AVAILABILITY AND IMPLEMENTATION: The predictor, a web server and all datasets are available at http://scampi.bioinfo.se/ CONTACT: arne@bioinfo.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
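SCAMPI's first-principle model of translocon recognition is not reproduced here. As a much simpler stand-in that shows what "sufficiently hydrophobic region" means computationally, the sketch below scans a sequence with the classical Kyte-Doolittle hydropathy scale, a 19-residue window and a 1.6 mean-hydropathy threshold, merging overlapping hits. The scale is a swapped-in technique, and the window and threshold are conventional choices, not SCAMPI parameters.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window (KeyError on non-standard residues)."""
    vals = [KD[a] for a in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def candidate_segments(seq: str, window: int = 19, threshold: float = 1.6) -> List[Tuple[int, int]]:
    """1-based inclusive spans covered by windows whose mean exceeds the threshold."""
    hits: List[Tuple[int, int]] = []
    for i, score in enumerate(window_scores(seq, window)):
        if score > threshold:
            start, end = i + 1, i + window
            if hits and start <= hits[-1][1] + 1:
                hits[-1] = (hits[-1][0], end)  # extend the previous overlapping hit
            else:
                hits.append((start, end))
    return hits


if __name__ == "__main__":
    print(candidate_segments("K" * 10 + "L" * 20 + "K" * 10))
```

Because merged spans include the full extent of every qualifying window, the reported boundaries are coarse; the example above reports positions 6 to 35 for a 20-residue leucine stretch at positions 11 to 30.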


Subject(s)
Hydrophobic and Hydrophilic Interactions , Protein Structure, Secondary , Computational Biology , Forecasting , Membrane Proteins , Protein Sorting Signals
8.
Bioinformatics ; 32(10): 1571-3, 2016 05 15.
Article in English | MEDLINE | ID: mdl-26794316

ABSTRACT

UNLABELLED: Accurate topology prediction of transmembrane ß-barrels is still an open question. Here, we present BOCTOPUS2, an improved topology prediction method for transmembrane ß-barrels that can also identify the barrel domain, predict the topology and identify the orientation of residues in transmembrane ß-strands. The major novelty of BOCTOPUS2 is the use of the dyad-repeat pattern of lipid- and pore-facing residues observed in transmembrane ß-barrels. In a cross-validation test on a benchmark set of 42 proteins, BOCTOPUS2 predicts the correct topology in 69% of the proteins, an improvement of more than 10% over the best earlier method (BOCTOPUS); in addition, it produces significantly fewer erroneous predictions on non-transmembrane ß-barrel proteins. AVAILABILITY AND IMPLEMENTATION: The BOCTOPUS2 webserver, along with the full dataset and source code, is available at http://boctopus.bioinfo.se/ CONTACT: arne@bioinfo.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Membrane Proteins/chemistry , Computational Biology , Models, Molecular , Programming Languages , Protein Structure, Secondary
9.
Bioinformatics ; 32(17): i665-i671, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587687

ABSTRACT

MOTIVATION: The PRED-TMBB method is based on Hidden Markov Models and is capable of predicting the topology of beta-barrel outer membrane proteins and discriminating them from water-soluble ones. Here, we present an updated version of the method, PRED-TMBB2, with several newly developed features that improve its performance. The inclusion of a properly defined end state allows for better modeling of the beta-barrel domain, while different emission probabilities for the adjacent residues in strands are used to incorporate knowledge concerning the asymmetric amino acid distribution occurring there. Furthermore, the training was performed using newly developed algorithms in order to optimize the labels of the training sequences. Moreover, the method is retrained on a larger, non-redundant dataset that includes recently solved structures, and a newly developed decoding method was added to the already available options. Finally, the method now allows the incorporation of evolutionary information in the form of multiple sequence alignments. RESULTS: The results of a strict cross-validation procedure show that PRED-TMBB2 with homology information performs significantly better than other available prediction methods. It yields 76% correct topology predictions and outperforms the best available predictor by 7%, with an overall SOV of 0.9. Regarding detection of beta-barrel proteins, PRED-TMBB2, using just the query sequence as input, achieves an MCC value of 0.92, outperforming even predictors that are designed for this task and are much slower. AVAILABILITY AND IMPLEMENTATION: The method, along with all datasets used, is freely available for academic users at http://www.compgen.org/tools/PRED-TMBB2 CONTACT: pbagos@compgen.org.
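For reference, the Matthews correlation coefficient (MCC) quoted above for beta-barrel detection is computed from the binary confusion matrix. A minimal sketch, assuming a 1/0 encoding for beta-barrel/non-beta-barrel and the common convention of returning 0 when any marginal total is empty:

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for 1 = positive (e.g. beta-barrel), 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient in [-1, 1]."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))  # one missed positive
```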


Subject(s)
Membrane Proteins , Algorithms , Computational Biology , Markov Chains , Protein Structure, Secondary , Sequence Alignment , Sequence Homology, Amino Acid
10.
Nucleic Acids Res ; 43(W1): W401-7, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25969446

ABSTRACT

TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server is now slightly faster, even though a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional features of submitting batch files and accessing the server programmatically using standard interfaces, thus making it ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance of 4% compared to the best-scoring methods currently available, and TOPCONS is the only method that can identify signal peptides and still maintain state-of-the-art performance in topology prediction.
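To illustrate what a consensus prediction combines, the sketch below merges per-residue topology strings from several predictors by simple majority vote. This is a deliberately simplified stand-in, not the TOPCONS algorithm; the labels `i` (inside), `o` (outside) and `M` (membrane) are an assumed encoding, and ties go to the first predictor listed.

```python
from collections import Counter
from typing import List


def majority_consensus(predictions: List[str]) -> str:
    """Per-residue majority vote over equal-length topology strings."""
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more topology strings of equal length")
    out = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        # Tie-break: the first predictor (in list order) whose label has the top count.
        out.append(next(c for c in column if counts[c] == top))
    return "".join(out)


if __name__ == "__main__":
    print(majority_consensus(["iiMMMoo", "iiMMooo", "iMMMMoo"]))  # iiMMMoo
```

A plain per-residue vote may produce topologies that are not physically plausible (for example, very short membrane segments), so a practical consensus method needs some structured decoding on top of the combined predictions.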


Subject(s)
Membrane Proteins/chemistry , Protein Sorting Signals , Software , Algorithms , Humans , Internet , Protein Conformation , Structural Homology, Protein
12.
Article in English | MEDLINE | ID: mdl-38686701

ABSTRACT

CONTEXT: The role of glucagon-like peptide-1 (GLP-1) in Type 2 diabetes (T2D) and obesity is not fully understood. OBJECTIVE: We investigate the association of cardiometabolic, diet and lifestyle parameters with fasting and postprandial GLP-1 in people at risk of, or living with, T2D. METHOD: We analysed cross-sectional data from the two Innovative Medicines Initiative (IMI) Diabetes Research on Patient Stratification (DIRECT) cohorts: cohort 1 (n=2127), individuals at risk of diabetes; cohort 2 (n=789), individuals with new-onset T2D. RESULTS: Our multiple regression analysis reveals that fasting total GLP-1 is associated with an insulin-resistant phenotype, and we observe a strong independent relationship with male sex, increased adiposity and liver fat, particularly in the prediabetes population. In contrast, we show that incremental GLP-1 decreases with worsening glycaemia, higher adiposity, liver fat, male sex and reduced insulin sensitivity in the prediabetes cohort. Higher fasting total GLP-1 was associated with a low intake of wholegrain, fruit and vegetables in people with prediabetes, and with a high intake of red meat and alcohol in people with diabetes. CONCLUSION: These studies provide novel insights into the association between fasting and incremental GLP-1, metabolic traits of diabetes and obesity, and dietary intake, and raise intriguing questions regarding the relevance of fasting GLP-1 in the pathophysiology of T2D.

13.
Nucleic Acids Res ; 39(Database issue): D324-31, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20952406

ABSTRACT

We describe here OMPdb, which is currently the most complete and comprehensive collection of integral ß-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69,354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data but also submit advanced text searches, run BLAST queries against the database protein sequences, and run domain searches against the collection of profile Hidden Markov Models that represent each family's domain organization. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane ß-barrels.


Subject(s)
Bacterial Outer Membrane Proteins/chemistry , Databases, Protein , Bacterial Outer Membrane Proteins/classification , Gram-Negative Bacteria , Protein Structure, Tertiary
14.
Nat Commun ; 14(1): 5062, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37604891

ABSTRACT

We evaluate the shared genetic regulation of mRNA molecules, proteins and metabolites derived from whole blood from 3029 human donors. We find abundant allelic heterogeneity, where multiple variants regulate a particular molecular phenotype, and pleiotropy, where a single variant associates with multiple molecular phenotypes over multiple genomic regions. The highest proportion of shared genetic regulation is detected between gene expression and proteins (66.6%), with a further median proportion of shared genetic associations of 78.3% and 62.4% between plasma proteins and gene expression across 49 different tissues. We represent the genetic and molecular associations in networks including 2828 known GWAS variants, showing that GWAS variants are more often connected to gene expression in trans than other molecular phenotypes in the network. Our work provides a roadmap to understanding molecular networks and deriving the underlying mechanism of action of GWAS variants using different molecular phenotypes in an accessible tissue.


Subject(s)
Genomics , Multifactorial Inheritance , Humans , Phenotype , RNA, Messenger , Research Personnel
15.
Proteomics ; 12(14): 2282-94, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22685073

ABSTRACT

For current state-of-the-art methods, the accuracy of membrane protein topology prediction has been reported to be above 80%. However, this performance has only been observed in small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and also on four "genome-scale" data sets, including one recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine the ability of prediction methods to separate membrane from nonmembrane proteins. The results show that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods to distinguish membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the high accuracies reported on the smaller benchmark sets are not quite maintained in larger-scale benchmarks. Instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods calls into question earlier estimates of the global properties of the membrane proteome. Finally, we suggest a pipeline that estimates these properties using a combination of the best predictors and could be applied in large-scale proteomics studies of membrane proteins.


Subject(s)
Computational Biology/methods , Membrane Proteins/chemistry , Proteome/chemistry , Databases, Protein , Glycosylation , Humans , Linear Models , Protein Structure, Secondary , Sequence Alignment
16.
Nat Biotechnol ; 40(7): 1023-1025, 2022 07.
Article in English | MEDLINE | ID: mdl-34980915

ABSTRACT

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.


Subject(s)
Language , Protein Sorting Signals , Algorithms , Amino Acid Sequence , Protein Sorting Signals/genetics , Proteins
17.
Bioinformatics ; 26(22): 2811-7, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-20847219

ABSTRACT

MOTIVATION: Computational prediction of signal peptides is of great importance in computational biology. In addition to the general secretory pathway (Sec), Bacteria, Archaea and chloroplasts possess another major pathway that utilizes the Twin-Arginine translocase (Tat), which recognizes longer and less hydrophobic signal peptides carrying a distinctive pattern of two consecutive arginines (RR) in the n-region. A major functional difference between the Sec and Tat export pathways is that the former translocates secreted proteins unfolded through a protein-conducting channel, whereas the latter translocates completely folded proteins using an unknown mechanism. The purpose of this work is to develop a novel method for predicting and discriminating Sec from Tat signal peptides with better accuracy. RESULTS: We report the development of a novel method, PRED-TAT, which is capable of discriminating Sec from Tat signal peptides and predicting their cleavage sites. The method is based on Hidden Markov Models and possesses a modular architecture suitable for both Sec and Tat signal peptides. On an independent test set of experimentally verified Tat signal peptides, PRED-TAT clearly outperforms the previously proposed methods TatP and TATFIND, whereas, when evaluated as a Sec signal peptide predictor, it compares favorably to top-scoring predictors such as SignalP and Phobius. The method is freely available for academic users at http://www.compgen.org/tools/PRED-TAT/.
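PRED-TAT itself relies on Hidden Markov Models. Purely to make the "two consecutive arginines in the n-region" feature concrete, the naive heuristic below reports the first RR pair near the N-terminus. The 35-residue search window is an arbitrary assumption, and a motif scan of this kind is not a substitute for an HMM-based predictor.

```python
import re
from typing import Optional

TWIN_ARGININE = re.compile("RR")


def find_twin_arginine(seq: str, n_region_len: int = 35) -> Optional[int]:
    """1-based position of the first 'RR' in the first n_region_len residues, else None.

    n_region_len is an assumed, illustrative window, not a value from PRED-TAT.
    """
    match = TWIN_ARGININE.search(seq[:n_region_len].upper())
    return match.start() + 1 if match else None


if __name__ == "__main__":
    print(find_twin_arginine("MSRRQFLKATAA"))  # 3
```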


Subject(s)
Computational Biology/methods , Markov Chains , Protein Sorting Signals , Databases, Protein , Membrane Transport Proteins/chemistry , Protein Folding , Secretory Pathway
18.
Bioinformatics ; 26(19): 2490-2, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20601677

ABSTRACT

UNLABELLED: ExTopoDB is a publicly accessible database of experimentally derived topological models of transmembrane proteins. It contains information collected from studies in the literature that report the use of biochemical methods for determining the topology of α-helical transmembrane proteins. Knowledge of transmembrane protein topology is highly important for understanding protein function, and ExTopoDB provides an up-to-date, complete and comprehensive dataset of experimentally determined topologies of α-helical transmembrane proteins. Topological information is combined with transmembrane topology prediction, resulting in more reliable topological models. AVAILABILITY: http://bioinformatics.biol.uoa.gr/ExTopoDB.


Subject(s)
Databases, Protein , Membrane Proteins/chemistry , Software , Protein Conformation , Sequence Analysis, Protein
19.
Comput Struct Biotechnol J ; 19: 6090-6097, 2021.
Article in English | MEDLINE | ID: mdl-34849210

ABSTRACT

Hidden Markov Models (HMMs) are amongst the most successful methods for predicting protein features in biological sequence analysis. However, there are biological problems where the Markovian assumption is not sufficient, since the sequence context can provide useful information for prediction purposes. Several extensions of HMMs have appeared in the literature to overcome these limitations. Here we apply a hybrid method that combines HMMs and Neural Networks (NNs), termed Hidden Neural Networks (HNNs), to biological sequence analysis in a straightforward manner. In this framework, the traditional HMM probability parameters are replaced by NN outputs. As a case study, we focus on the topology prediction of alpha-helical and beta-barrel membrane proteins. The HNNs show performance gains compared to standard HMMs, and the respective predictors outperform the top-scoring methods in the field. The implementation of HNNs can be found in the package JUCHMME, downloadable from http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. The updated PRED-TMBB2 and HMM-TM prediction servers can be accessed at www.compgen.org.
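The core HNN idea, replacing an HMM emission probability e_s(x_t) with a neural-network output that sees a window of sequence context, can be sketched as follows. This is an illustration under stated assumptions, not the JUCHMME implementation: the network is a hypothetical single linear layer with a softmax over states, its weights are random rather than trained, and the window half-width of 2 is arbitrary. Because the outputs are normalized over states rather than over the residue alphabet, the forward value is a model score, not a sequence probability.

```python
import math
import random
from typing import Dict, List, Sequence

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def one_hot_window(seq: str, t: int, half: int) -> List[float]:
    """One-hot encode residues t-half..t+half; positions past either end stay all-zero."""
    vec: List[float] = []
    for j in range(t - half, t + half + 1):
        block = [0.0] * len(ALPHABET)
        if 0 <= j < len(seq):
            block[ALPHABET.index(seq[j])] = 1.0  # raises ValueError on non-standard residues
        vec.extend(block)
    return vec


class TinyEmissionNet:
    """Hypothetical one-layer network: sequence window -> softmax score per HMM state."""

    def __init__(self, states: Sequence[str], half: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.states = list(states)
        self.half = half
        dim = len(ALPHABET) * (2 * half + 1)
        self.weights = {s: [rng.gauss(0.0, 0.1) for _ in range(dim)] for s in self.states}
        self.bias = {s: 0.0 for s in self.states}

    def scores(self, seq: str, t: int) -> Dict[str, float]:
        x = one_hot_window(seq, t, self.half)
        z = {s: self.bias[s] + sum(w * xi for w, xi in zip(self.weights[s], x)) for s in self.states}
        top = max(z.values())  # subtract the max for a numerically stable softmax
        e = {s: math.exp(v - top) for s, v in z.items()}
        total = sum(e.values())
        return {s: v / total for s, v in e.items()}


def hnn_forward(
    seq: str,
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    net: TinyEmissionNet,
) -> float:
    """Forward recursion with e_s(x_t) replaced by net.scores(seq, t)[s]."""
    em = net.scores(seq, 0)
    alpha = {s: start[s] * em[s] for s in states}
    for t in range(1, len(seq)):
        em = net.scores(seq, t)
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * em[s] for s in states}
    return sum(alpha.values())


if __name__ == "__main__":
    net = TinyEmissionNet(["M", "O"])
    trans = {"M": {"M": 0.8, "O": 0.2}, "O": {"M": 0.3, "O": 0.7}}
    print(hnn_forward("MKLLVAG", ["M", "O"], {"M": 0.5, "O": 0.5}, trans, net))
```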

20.
Front Bioinform ; 1: 646581, 2021.
Article in English | MEDLINE | ID: mdl-36303794

ABSTRACT

OMPdb (www.ompdb.org) was introduced in 2011 as a database for ß-barrel outer membrane proteins from Gram-negative bacteria and at that time included 69,354 entries classified into 85 families. The database has been updated continuously using a collection of characteristic profile Hidden Markov Models able to discriminate between the different families of prokaryotic transmembrane ß-barrels. The number of families has ultimately increased to a total of 129 in the current, second major version of OMPdb. New additions have been made in parallel with efforts to update existing families and add novel families. Here, we present the upgrade of OMPdb, which from now on aims to become a global repository for all transmembrane ß-barrel proteins, both eukaryotic and bacterial.
