ABSTRACT
AIMS/HYPOTHESIS: Type 2 diabetes is a chronic condition characterised by hyperglycaemia. Our aim was to characterise metabolomic profiles, assess their association with the glycaemic spectrum and investigate causal relationships between metabolites and type 2 diabetes. METHODS: As part of the Innovative Medicines Initiative - Diabetes Research on Patient Stratification (IMI-DIRECT) consortium, 3000 plasma samples were measured with the Biocrates AbsoluteIDQ p150 Kit and Metabolon analytics. A total of 911 metabolites (132 from targeted metabolomics, 779 from untargeted metabolomics) passed quality control. Multivariable linear and logistic regression estimates were calculated with the concentration/peak area of each metabolite as the explanatory variable and glycaemic status as the dependent variable. The analysis was adjusted for age, sex, BMI and study centre in the basic model, and additionally for alcohol, smoking, BP, fasting HDL-cholesterol and fasting triacylglycerol in the full model. Statistical significance was Bonferroni corrected throughout. Beyond associations, we investigated mediation and causal effects using a causal mediation test and two-sample Mendelian randomisation (2SMR), respectively. RESULTS: In the targeted metabolomics, we observed four (15), 34 (99) and 50 (108) metabolites (the number of metabolites observed in untargeted metabolomics appears in parentheses) that were significantly different when comparing normal glucose regulation vs impaired glucose regulation/prediabetes, normal glucose regulation vs type 2 diabetes, and impaired glucose regulation vs type 2 diabetes, respectively. Significant metabolites were mainly branched-chain amino acids (BCAAs), with some derivatised BCAAs, lipids, xenobiotics and a few unknowns.
Metabolites such as lysophosphatidylcholine a C17:0, sum of hexoses, amino acids from BCAA metabolism (including leucine, isoleucine, valine, N-lactoylvaline, N-lactoylleucine and formiminoglutamate) and lactate, as well as an unknown metabolite (X-24295), were associated with HbA1c progression rate and were significant mediators of type 2 diabetes from baseline to 18 and 48 months of follow-up. 2SMR was used to estimate the causal effect of an exposure on an outcome using summary statistics from UK Biobank genome-wide association studies. We found that type 2 diabetes had a causal effect on the levels of three metabolites (hexose, glutamate and caproate [fatty acid (FA) 6:0]), whereas lipids such as specific phosphatidylcholines (PCs) (namely PC aa C36:2, PC aa C36:5, PC ae C36:3 and PC ae C34:3) as well as the two n-3 fatty acids stearidonate (18:4n3) and docosapentaenoate (22:5n3) potentially had a causal role in the development of type 2 diabetes. CONCLUSIONS/INTERPRETATION: Our findings identify known BCAAs and lipids, along with novel N-lactoyl-amino acid metabolites, that are significantly associated with prediabetes and diabetes and that mediate the effect of diabetes from baseline to follow-up (18 and 48 months). Causal inference using genetic variants indicates that lipid metabolism and n-3 fatty acids potentially have a causal role in the development of type 2 diabetes, whereas type 2 diabetes has a causal effect on the sum of hexoses. The identified metabolite markers are useful for stratifying individuals based on their risk progression and should enable targeted interventions.
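As an illustration of the Bonferroni correction mentioned in the abstract above, here is a minimal Python sketch. The family-wise alpha of 0.05, the choice of 911 as the number of tests, the function names and the metabolite names and p-values are all assumptions made for the example. The abstract does not state the exact threshold, nor whether the targeted and untargeted panels were corrected together.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag each named p-value as significant (True) or not (False) after correction."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical p-values; 911 is the number of metabolites that passed QC.
example_p = {"metabolite_A": 1e-6, "metabolite_B": 1e-3}
flags = bonferroni_significant(example_p, n_tests=911)
print(bonferroni_threshold(0.05, 911))
print(flags)
```

With these assumed inputs, only a p-value below roughly 5.5e-5 would survive correction, so a nominally significant p-value of 0.001 would not.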
ABSTRACT
Even though several families of eukaryotic β-barrel outer membrane proteins have been discovered in the last few years, their computational characterization and their annotation in public databases are far from complete. The PFAM database includes only very few characteristic profiles for these families, and in most cases, the profile hidden Markov models (pHMMs) have been trained using prokaryotic and eukaryotic proteins together. Here, we present for the first time a comprehensive computational analysis of eukaryotic transmembrane β-barrels. Twelve characteristic pHMMs were built, based on an extensive literature search, which can discriminate eukaryotic β-barrels from other classes of proteins (globular and bacterial β-barrel ones), as well as between mitochondrial and chloroplastic ones. We built eight novel profiles for the chloroplastic β-barrel families that are not present in the PFAM database, updated the profile for the MDM10 family (PF12519) in the PFAM database and divided the porin family (PF01459) into two separate families, namely, VDAC and TOM40.
Subjects
Eukaryota, Porins, Eukaryota/genetics, Eukaryotic Cells, Mitochondria, Proteins
ABSTRACT
MOTIVATION: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are basically unsupervised models. However, in the most important applications, they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input and the set of parameters that maximize the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of cases, labels are hard to find and thus the amount of training data is limited. On the other hand, there are plenty of unclassified (unlabeled) sequences deposited in the public databases that could potentially contribute to the training procedure. Training on labeled and unlabeled sequences together is called semi-supervised learning and could be very helpful in many applications. RESULTS: Here, we propose a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, where the missing labels of the unlabeled or partially labeled data are considered as the missing data. We apply the algorithm to several biological problems, namely, the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
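The abstract above treats missing labels as missing data. One common way to express this in an HMM is a forward pass that only allows states whose label agrees with the observed label and allows every state where the label is missing. The Python sketch below illustrates that idea on a toy two-state model. It is not the authors' algorithm or the JUCHMME implementation: the model parameters, the function name `constrained_forward` and the `h`/`p` alphabet are assumptions for illustration, and a full EM procedure would also need a matching backward pass and parameter re-estimation.

```python
def constrained_forward(obs, labels, states, state_label, start, trans, emit):
    """Probability of `obs` summed over all state paths consistent with `labels`.

    labels[t] is None where the position is unlabeled; otherwise only states
    whose label equals labels[t] may be occupied at position t.
    """
    def allowed(state, t):
        return labels[t] is None or state_label[state] == labels[t]

    # Initialisation: start probability times emission, zero if label disagrees.
    alpha = {
        s: (start[s] * emit[s][obs[0]] if allowed(s, 0) else 0.0)
        for s in states
    }
    # Recursion over the remaining positions.
    for t in range(1, len(obs)):
        alpha = {
            s: (sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs[t]]
                if allowed(s, t) else 0.0)
            for s in states
        }
    return sum(alpha.values())


# Toy model (hypothetical numbers): "M" = membrane state, "L" = loop state;
# symbols "h" = hydrophobic residue, "p" = polar residue.
states = ["M", "L"]
state_label = {"M": "M", "L": "L"}
start = {"M": 0.5, "L": 0.5}
trans = {"M": {"M": 0.8, "L": 0.2}, "L": {"M": 0.2, "L": 0.8}}
emit = {"M": {"h": 0.9, "p": 0.1}, "L": {"h": 0.2, "p": 0.8}}

obs = "hhpp"
p_unlabeled = constrained_forward(obs, [None] * 4, states, state_label, start, trans, emit)
p_partial = constrained_forward(obs, ["M", None, None, "L"], states, state_label, start, trans, emit)
p_full = constrained_forward(obs, ["M", "M", "L", "L"], states, state_label, start, trans, emit)
print(p_unlabeled, p_partial, p_full)
```

Because each added label removes paths from the sum, the three probabilities can only decrease from unlabeled to partially labeled to fully labeled. In the fully labeled case the sum collapses to the joint probability of the sequence and its single labeled path, as in supervised training.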
Subjects
Supervised Machine Learning, Algorithms, Markov Chains, Statistical Models, Sequence Analysis
ABSTRACT
SUMMARY: JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. AVAILABILITY AND IMPLEMENTATION: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms, Software, Sequence Analysis
ABSTRACT
The Database of Protein Disorder (DisProt, URL: www.disprot.org) has been significantly updated and upgraded since its last major renewal in 2007. The current release holds information on more than 800 entries of IDPs/IDRs, i.e. intrinsically disordered proteins or regions that exist and function without a well-defined three-dimensional structure. We have re-curated previous entries to purge DisProt of conflicting cases, and also upgraded the functional classification scheme to reflect continuous advances in the field over the past 10 years or so. We define IDPs as proteins that are disordered along their entire sequence, i.e. entirely lack structural elements, and IDRs as regions of at least five consecutive residues without well-defined structure. We base our assessment of disorder strictly on experimental evidence, such as X-ray crystallography and nuclear magnetic resonance (primary techniques) and a broad range of other experimental approaches (secondary techniques). Confident and ambiguous annotations are highlighted separately. DisProt 7.0 presents classified knowledge regarding the experimental characterization and functional annotations of IDPs/IDRs, and is intended to provide an invaluable resource for the research community for a better understanding of structural disorder and for developing better computational tools for studying disordered proteins.
Subjects
Protein Databases, Intrinsically Disordered Proteins, Animals, X-Ray Crystallography, Fluorescence Resonance Energy Transfer, Forecasting, Forms and Records Control, Humans, Intrinsically Disordered Proteins/classification, Biomolecular Nuclear Magnetic Resonance, Protein Conformation
ABSTRACT
MOTIVATION: In genome-wide association studies (GWAS), a variety of statistical techniques is available for the analysis, but in most cases the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. RESULTS: The CATT under a recessive, additive and dominant model of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2, were implemented in Stata. For MAX and MIN2, we calculated the asymptotic null distributions by numerical integration, resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. AVAILABILITY AND IMPLEMENTATION: A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. CONTACT: pbagos@compgen.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
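To make the statistics named above concrete, the following Python sketch computes the Cochran-Armitage trend test Z statistic for a 2x3 genotype table under recessive, additive and dominant scores. It then takes MAX as the largest absolute Z of the three. This is the textbook form of these statistics, not the GWAR/Stata code. The function names and the genotype counts are hypothetical, and the sketch does not compute the MAX p-value, which the abstract says the authors obtain by numerical integration of the asymptotic null distribution.

```python
from math import sqrt

# Genotype scores for (aa, Aa, AA) under each assumed inheritance model.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend test Z statistic for a 2x3 case-control table."""
    col_totals = [r + s for r, s in zip(cases, controls)]
    total = sum(col_totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_x_cases = sum(x * r for x, r in zip(scores, cases))
    sum_x_cols = sum(x * n for x, n in zip(scores, col_totals))
    sum_x2_cols = sum(x * x * n for x, n in zip(scores, col_totals))
    # Trend numerator and its variance under the null of no association.
    t = total * sum_x_cases - n_cases * sum_x_cols
    var = (n_cases * n_controls / total) * (total * sum_x2_cols - sum_x_cols ** 2)
    return t / sqrt(var)


def max_statistic(cases, controls):
    """Return MAX = max |Z| over the three models, plus the per-model Z values."""
    z_by_model = {m: catt_z(cases, controls, x) for m, x in SCORES.items()}
    return max(abs(z) for z in z_by_model.values()), z_by_model


# Hypothetical genotype counts (aa, Aa, AA).
cases = (10, 20, 30)
controls = (30, 20, 10)
max_z, z_by_model = max_statistic(cases, controls)
print(z_by_model, max_z)
```

MAX must be compared against its own null distribution rather than the standard normal, because taking the maximum over three correlated tests inflates the statistic.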
Subjects
Population Genetics/methods, Genome-Wide Association Study/statistics & numerical data, Meta-Analysis as Topic, Genetic Models, Software, Genetic Predisposition to Disease, Genomics/methods, Humans, Hypertension/genetics, Single Nucleotide Polymorphism, Statistics as Topic
ABSTRACT
Accurate topology prediction of transmembrane β-barrels is still an open question. Here, we present BOCTOPUS2, an improved topology prediction method for transmembrane β-barrels that can also identify the barrel domain, predict the topology and identify the orientation of residues in transmembrane β-strands. The major novelty of BOCTOPUS2 is the use of the dyad-repeat pattern of lipid and pore facing residues observed in transmembrane β-barrels. In a cross-validation test on a benchmark set of 42 proteins, BOCTOPUS2 predicts the correct topology in 69% of the proteins, an improvement of more than 10% over the best earlier method (BOCTOPUS). In addition, it produces significantly fewer erroneous predictions on non-transmembrane β-barrel proteins. AVAILABILITY AND IMPLEMENTATION: BOCTOPUS2 webserver along with full dataset and source code is available at http://boctopus.bioinfo.se/ CONTACT: arne@bioinfo.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Membrane Proteins/chemistry, Computational Biology, Molecular Models, Programming Languages, Protein Secondary Structure
ABSTRACT
MOTIVATION: The PRED-TMBB method is based on Hidden Markov Models and is capable of predicting the topology of beta-barrel outer membrane proteins and discriminating them from water-soluble ones. Here, we present an updated version of the method, PRED-TMBB2, with several newly developed features that improve its performance. The inclusion of a properly defined end state allows for better modeling of the beta-barrel domain, while different emission probabilities for the adjacent residues in strands are used to incorporate knowledge concerning the asymmetric amino acid distribution occurring there. Furthermore, the training was performed using newly developed algorithms in order to optimize the labels of the training sequences. Moreover, the method is retrained on a larger, non-redundant dataset which includes recently solved structures, and a newly developed decoding method was added to the already available options. Finally, the method now allows the incorporation of evolutionary information in the form of multiple sequence alignments. RESULTS: The results of a strict cross-validation procedure show that PRED-TMBB2 with homology information performs significantly better than other available prediction methods. It yields 76% correct topology predictions and outperforms the best available predictor by 7%, with an overall SOV of 0.9. Regarding detection of beta-barrel proteins, PRED-TMBB2, using just the query sequence as input, achieves an MCC value of 0.92, outperforming even predictors that are designed for this task and are much slower. AVAILABILITY AND IMPLEMENTATION: The method, along with all datasets used, is freely available for academic users at http://www.compgen.org/tools/PRED-TMBB2 CONTACT: pbagos@compgen.org.
Subjects
Membrane Proteins, Algorithms, Computational Biology, Markov Chains, Protein Secondary Structure, Sequence Alignment, Amino Acid Sequence Homology
ABSTRACT
MOTIVATION: The translocon recognizes sufficiently hydrophobic regions of a protein and inserts them into the membrane. Computational methods try to determine which hydrophobic regions are recognized by the translocon. Although these predictions are quite accurate, many methods still fail to distinguish marginally hydrophobic transmembrane (TM) helices from equally hydrophobic regions in soluble protein domains. In vivo, this problem is most likely avoided by targeting of the TM-proteins, so that non-TM proteins never see the translocon. Proteins are targeted to the translocon by an N-terminal signal peptide. The targeting is also aided by the fact that the N-terminal helix is more hydrophobic than other TM-helices. In addition, we recently found that the C-terminal helix is more hydrophobic than central helices. This information has not been used in earlier topology predictors. RESULTS: Here, we use the fact that the N- and C-terminal helices are more hydrophobic to develop a new version of the first-principle-based topology predictor, SCAMPI. The new predictor has two main advantages: first, it can be used to efficiently separate membrane and non-membrane proteins directly without the use of an extra prefilter, and second, it shows improved performance for predicting the topology of membrane proteins that contain large non-membrane domains. AVAILABILITY AND IMPLEMENTATION: The predictor, a web server and all datasets are available at http://scampi.bioinfo.se/ CONTACT: arne@bioinfo.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Hydrophobic and Hydrophilic Interactions, Protein Secondary Structure, Computational Biology, Forecasting, Membrane Proteins, Protein Sorting Signals
ABSTRACT
TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server is now even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it ideal for proteome-wide analyses. As an indication, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance of 4% compared to the currently available best-scoring methods, and TOPCONS is the only method that can identify signal peptides and still maintain state-of-the-art performance in topology predictions.
Subjects
Membrane Proteins/chemistry, Protein Sorting Signals, Software, Algorithms, Humans, Internet, Protein Conformation, Protein Structural Homology
ABSTRACT
CONTEXT: The role of glucagon-like peptide-1 (GLP-1) in type 2 diabetes (T2D) and obesity is not fully understood. OBJECTIVE: We investigated the association of cardiometabolic, diet, and lifestyle parameters with fasting and postprandial GLP-1 in people at risk of, or living with, T2D. METHODS: We analyzed cross-sectional data from the two Innovative Medicines Initiative (IMI) Diabetes Research on Patient Stratification (DIRECT) cohorts: cohort 1 (n = 2127), individuals at risk of diabetes; cohort 2 (n = 789), individuals with new-onset T2D. RESULTS: Our multiple regression analysis reveals that fasting total GLP-1 is associated with an insulin-resistant phenotype, and we observe a strong independent relationship with male sex, increased adiposity, and liver fat, particularly in the prediabetes population. In contrast, we showed that incremental GLP-1 decreases with worsening glycemia, higher adiposity, liver fat, male sex, and reduced insulin sensitivity in the prediabetes cohort. Higher fasting total GLP-1 was associated with a low intake of wholegrain, fruit, and vegetables in people with prediabetes, and with a high intake of red meat and alcohol in people with diabetes. CONCLUSION: These studies provide novel insights into the association between fasting and incremental GLP-1, metabolic traits of diabetes and obesity, and dietary intake, and raise intriguing questions regarding the relevance of fasting GLP-1 in the pathophysiology of T2D.
Subjects
Type 2 Diabetes Mellitus, Diet, Glucagon-Like Peptide 1, Life Style, Prediabetic State, Humans, Male, Female, Type 2 Diabetes Mellitus/blood, Type 2 Diabetes Mellitus/metabolism, Glucagon-Like Peptide 1/blood, Glucagon-Like Peptide 1/metabolism, Cross-Sectional Studies, Middle Aged, Prediabetic State/blood, Prediabetic State/metabolism, Aged, Adult, Insulin Resistance, Fasting/blood, Obesity/blood, Obesity/metabolism, Cohort Studies, Blood Glucose/metabolism, Blood Glucose/analysis, Adiposity/physiology
ABSTRACT
We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69,354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data but also submit advanced text searches and run BLAST queries against the database protein sequences, as well as domain searches against the collection of profile Hidden Markov Models that represent each family's domain organization. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels.
Subjects
Bacterial Outer Membrane Proteins/chemistry, Protein Databases, Bacterial Outer Membrane Proteins/classification, Gram-Negative Bacteria, Protein Tertiary Structure
ABSTRACT
We evaluate the shared genetic regulation of mRNA molecules, proteins and metabolites derived from whole blood from 3029 human donors. We find abundant allelic heterogeneity, where multiple variants regulate a particular molecular phenotype, and pleiotropy, where a single variant associates with multiple molecular phenotypes over multiple genomic regions. The highest proportion of shared genetic regulation is detected between gene expression and proteins (66.6%), with a further median shared genetic association across 49 different tissues of 78.3% and 62.4% between plasma proteins and gene expression. We represent the genetic and molecular associations in networks including 2828 known GWAS variants, showing that GWAS variants are more often connected to gene expression in trans than other molecular phenotypes in the network. Our work provides a roadmap to understanding molecular networks and deriving the underlying mechanism of action of GWAS variants using different molecular phenotypes in an accessible tissue.
Subjects
Genomics, Multifactorial Inheritance, Humans, Phenotype, Messenger RNA, Research Personnel
ABSTRACT
For current state-of-the-art methods, the prediction of correct topology of membrane proteins has been reported to be above 80%. However, this performance has only been observed in small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and also on four "genome-scale" data sets, including one recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine the ability of prediction methods to separate membrane from nonmembrane proteins. The results show that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods to distinguish membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the reported high accuracies in the smaller benchmark sets are not quite maintained in larger scale benchmarks. Instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods questions earlier estimates about the global properties of the membrane proteome. Finally, we suggest a pipeline to estimate these properties using a combination of the best predictors that could be applied in large-scale proteomics studies of membrane proteins.
Subjects
Computational Biology/methods, Membrane Proteins/chemistry, Proteome/chemistry, Protein Databases, Glycosylation, Humans, Linear Models, Protein Secondary Structure, Sequence Alignment
ABSTRACT
Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.
Subjects
Language, Protein Sorting Signals, Algorithms, Amino Acid Sequence, Protein Sorting Signals/genetics, Proteins
ABSTRACT
MOTIVATION: Computational prediction of signal peptides is of great importance in computational biology. In addition to the general secretory pathway (Sec), Bacteria, Archaea and chloroplasts possess another major pathway that utilizes the Twin-Arginine translocase (Tat), which recognizes longer and less hydrophobic signal peptides carrying a distinctive pattern of two consecutive Arginines (RR) in the n-region. A major functional difference between the Sec and Tat export pathways is that the former translocates secreted proteins unfolded through a protein-conducting channel, whereas the latter translocates completely folded proteins using an unknown mechanism. The purpose of this work is to develop a novel method for predicting and discriminating Sec from Tat signal peptides with better accuracy. RESULTS: We report the development of a novel method, PRED-TAT, which is capable of discriminating Sec from Tat signal peptides and predicting their cleavage sites. The method is based on Hidden Markov Models and possesses a modular architecture suitable for both Sec and Tat signal peptides. On an independent test set of experimentally verified Tat signal peptides, PRED-TAT clearly outperforms the previously proposed methods TatP and TATFIND, whereas, when evaluated as a Sec signal peptide predictor, it compares favorably to top-scoring predictors such as SignalP and Phobius. The method is freely available for academic users at http://www.compgen.org/tools/PRED-TAT/.
Subjects
Computational Biology/methods, Markov Chains, Protein Sorting Signals, Protein Databases, Membrane Transport Proteins/chemistry, Protein Folding, Secretory Pathway
ABSTRACT
ExTopoDB is a publicly accessible database of experimentally derived topological models of transmembrane proteins. It contains information collected from studies in the literature that report the use of biochemical methods for the determination of the topology of α-helical transmembrane proteins. Knowledge of the topology of transmembrane proteins is highly important for understanding their function, and ExTopoDB provides an up-to-date, complete and comprehensive dataset of experimentally determined topologies of α-helical transmembrane proteins. Topological information is combined with transmembrane topology prediction, resulting in more reliable topological models. AVAILABILITY: http://bioinformatics.biol.uoa.gr/ExTopoDB.
Subjects
Protein Databases, Membrane Proteins/chemistry, Software, Protein Conformation, Protein Sequence Analysis
ABSTRACT
OMPdb (www.ompdb.org) was introduced as a database for β-barrel outer membrane proteins from Gram-negative bacteria in 2011 and then included 69,354 entries classified into 85 families. The database has been updated continuously using a collection of characteristic profile Hidden Markov Models able to discriminate between the different families of prokaryotic transmembrane β-barrels. The number of families has increased ultimately to a total of 129 families in the current, second major version of OMPdb. New additions have been made in parallel with efforts to update existing families and add novel families. Here, we present the upgrade of OMPdb, which from now on aims to become a global repository for all transmembrane β-barrel proteins, both eukaryotic and bacterial.
ABSTRACT
Hidden Markov Models (HMMs) are amongst the most successful methods for predicting protein features in biological sequence analysis. However, there are biological problems where the Markovian assumption is not sufficient since the sequence context can provide useful information for prediction purposes. Several extensions of HMMs have appeared in the literature in order to overcome their limitations. We apply here a hybrid method that combines HMMs and Neural Networks (NNs), termed Hidden Neural Networks (HNNs), for biological sequence analysis in a straightforward manner. In this framework, the traditional HMM probability parameters are replaced by NN outputs. As a case study, we focus on the topology prediction of alpha-helical and beta-barrel membrane proteins. The HNNs show performance gains compared to standard HMMs and the respective predictors outperform the top-scoring methods in the field. The implementation of HNNs can be found in the package JUCHMME, downloadable from http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. The updated PRED-TMBB2 and HMM-TM prediction servers can be accessed at www.compgen.org.