Search | BVS (Virtual Health Library) - Ministry of Health

1.

Proteome encoded determinants of protein sorting into extracellular vesicles.

Waury, Katharina; Gogishvili, Dea; Nieuwland, Rienk; Chatterjee, Madhurima; Teunissen, Charlotte E; Abeln, Sanne.

J Extracell Biol ; 3(1): e120, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38938677

ABSTRACT

Extracellular vesicles (EVs) are membranous structures released by cells into the extracellular space and are thought to be involved in cell-to-cell communication. While EVs and their cargo are promising biomarker candidates, sorting mechanisms of proteins to EVs remain unclear. In this study, we ask if it is possible to determine EV association based on the protein sequence. Additionally, we ask what the most important determinants are for EV association. We answer these questions with explainable AI models, using human proteome data from EV databases to train and validate the model. It is essential to correct the datasets for contaminants introduced by coarse EV isolation workflows and for experimental bias caused by mass spectrometry. In this study, we show that it is indeed possible to predict EV association from the protein sequence: a simple sequence-based model for predicting EV proteins achieved an area under the curve of 0.77 ± 0.01, which increased further to 0.84 ± 0.00 when incorporating curated post-translational modification (PTM) annotations. Feature analysis shows that EV-associated proteins are stable, polar, and structured with low isoelectric point compared to non-EV proteins. PTM annotations emerged as the most important features for correct classification; specifically, palmitoylation is one of the most prevalent EV sorting mechanisms for unique proteins. Palmitoylation and nitrosylation sites are especially prevalent in EV proteins that are determined by very strict isolation protocols, indicating they could potentially serve as quality control criteria for future studies. This computational study offers an effective sequence-based predictor of EV associated proteins with extensive characterisation of the human EV proteome that can explain for individual proteins which factors contribute to their EV association.
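The area under the curve reported above can be read as the probability that a randomly chosen EV protein receives a higher score than a randomly chosen non-EV protein. The following is a minimal sketch of that rank-based calculation; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' pipeline or data.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = EV-associated, 0 = non-EV (not from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of proteins, which is fine for illustration; a sort-based implementation would be preferable at proteome scale.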

2.

The immunological landscape of peripheral blood in glioblastoma patients and immunological consequences of age and dexamethasone treatment.

Dusoswa, Sophie A; Verhoeff, Jan; van Asten, Saskia; Lübbers, Joyce; van den Braber, Marlous; Peters, Sophie; Abeln, Sanne; Crommentuijn, Matheus H W; Wesseling, Pieter; Vandertop, William Peter; Twisk, Jos W R; Würdinger, Thomas; Noske, David; van Kooyk, Yvette; Garcia-Vallejo, Juan J.

Front Immunol ; 15: 1343484, 2024.

Article in English | MEDLINE | ID: mdl-38318180

ABSTRACT

Background: Glioblastomas manipulate the immune system both locally and systemically, yet glioblastoma-associated changes in peripheral blood immune composition are poorly studied. Age and dexamethasone administration in glioblastoma patients have been hypothesized to limit the effectiveness of immunotherapy, but their effects remain unclear. We compared peripheral blood immune composition in patients with different types of brain tumor to determine the influence of age, dexamethasone treatment, and tumor volume. Methods: High-dimensional mass cytometry was used to characterise peripheral blood mononuclear cells of 169 patients with glioblastoma, lower grade astrocytoma, metastases and meningioma. We used blood from medically-refractory epilepsy patients and healthy controls as control groups. Immune phenotyping was performed using FlowSOM and t-SNE analysis in R followed by supervised annotation of the resulting clusters. We conducted multiple linear regression analysis between intracranial pathology and cell type abundance, corrected for clinical variables. We tested correlations between cell type abundance and survival with Cox-regression analyses. Results: Glioblastoma patients had significantly fewer naive CD4+ T cells, but higher percentages of mature NK cells than controls. Decreases of naive CD8+ T cells and alternative monocytes and an increase of memory B cells in glioblastoma patients were influenced by age and dexamethasone treatment, and only memory B cells by tumor volume. Progression free survival was associated with percentages of CD4+ regulatory T cells and double negative T cells. Conclusion: High-dimensional mass cytometry of peripheral blood in patients with different types of intracranial tumor provides insight into the relation between intracranial pathology and peripheral immune status. Wide immunosuppression associated with age and pre-operative dexamethasone treatment provides further evidence for their deleterious effects on treatment with immunotherapy.

Subjects

Glioblastoma, Humans, Glioblastoma/drug therapy, Glioblastoma/pathology, Mononuclear Leukocytes/pathology, CD4-Positive T-Lymphocytes, Immunotherapy/methods, Dexamethasone/therapeutic use

3.

Structural flexibility and heterogeneity of recombinant human glial fibrillary acidic protein (GFAP).

Gogishvili, Dea; Illes-Toth, Eva; Harris, Matthew J; Hopley, Christopher; Teunissen, Charlotte E; Abeln, Sanne.

Proteins ; 92(5): 649-664, 2024 May.

Article in English | MEDLINE | ID: mdl-38149328

ABSTRACT

Glial fibrillary acidic protein (GFAP) is a promising biomarker for brain and spinal cord disorders. Recent studies have highlighted the differences in the reliability of GFAP measurements in different biological matrices. The reason for these discrepancies is poorly understood as our knowledge of the protein's 3-dimensional conformation, proteoforms, and aggregation remains limited. Here, we investigate the structural properties of GFAP under different conditions. For this, we characterized recombinant GFAP proteins from various suppliers and applied hydrogen-deuterium exchange mass spectrometry (HDX-MS) to provide a snapshot of the conformational dynamics of GFAP in artificial cerebrospinal fluid (aCSF) compared to the phosphate buffer. Our findings indicate that recombinant GFAP exists in various conformational species. Furthermore, we show that GFAP dimers remained intact under denaturing conditions. HDX-MS experiments show an overall decrease in H-bonding and an increase in solvent accessibility of GFAP in aCSF compared to the phosphate buffer, with clear indications of mixed EX2 and EX1 kinetics. To understand possible structural interface regions and the evolutionary conservation profiles, we combined HDX-MS results with the predicted GFAP-dimer structure by AlphaFold-Multimer. We found that deprotected regions with high structural flexibility in aCSF overlap with predicted conserved dimeric 1B and 2B domain interfaces. Structural property predictions combined with the HDX data show an overall deprotection and signatures of aggregation in aCSF. We anticipate that the outcomes of this research will contribute to a deeper understanding of the structural flexibility of GFAP and ultimately shed light on its behavior in different biological matrices.

Subjects

Deuterium Exchange Measurement, Glial Fibrillary Acidic Protein, Phosphates, Humans, Deuterium Exchange Measurement/methods, Glial Fibrillary Acidic Protein/chemistry, Glial Fibrillary Acidic Protein/genetics, Glial Fibrillary Acidic Protein/physiology, Protein Conformation, Reproducibility of Results, Recombinant Proteins

4.

Deciphering Protein Secretion from the Brain to Cerebrospinal Fluid for Biomarker Discovery.

Waury, Katharina; de Wit, Renske; Verberk, Inge M W; Teunissen, Charlotte E; Abeln, Sanne.

J Proteome Res ; 22(9): 3068-3080, 2023 Sep 01.

Article in English | MEDLINE | ID: mdl-37606934

ABSTRACT

Cerebrospinal fluid (CSF) is an essential matrix for the discovery of neurological disease biomarkers. However, the high dynamic range of protein concentrations in CSF hinders the detection of the least abundant protein biomarkers by untargeted mass spectrometry. It is thus beneficial to gain a deeper understanding of the secretion processes within the brain. Here, we aim to explore if and how the secretion of brain proteins to the CSF can be predicted. By combining a curated CSF proteome and the brain elevated proteome of the Human Protein Atlas, brain proteins were classified as CSF or non-CSF secreted. A machine learning model was trained on a range of sequence-based features to differentiate between CSF and non-CSF groups and effectively predict the brain origin of proteins. The classification model achieves an area under the curve of 0.89 if using high confidence CSF proteins. The most important prediction features include the subcellular localization, signal peptides, and transmembrane regions. The classifier generalized well to the larger brain detected proteome and is able to correctly predict novel CSF proteins identified by affinity proteomics. In addition to elucidating the underlying mechanisms of protein secretion, the trained classification model can support biomarker candidate selection.

Subjects

Biomedical Research, Proteome, Humans, Brain, Protein Transport, Biological Transport, Cerebrospinal Fluid Proteins

5.

Methods to Discover and Validate Biofluid-Based Biomarkers in Neurodegenerative Dementias.

Teunissen, Charlotte E; Kimble, Leighann; Bayoumy, Sherif; Bolsewig, Katharina; Burtscher, Felicia; Coppens, Salomé; Das, Shreyasee; Gogishvili, Dea; Fernandes Gomes, Bárbara; Gómez de San José, Nerea; Mavrina, Ekaterina; Meda, Francisco J; Mohaupt, Pablo; Mravinacová, Sára; Waury, Katharina; Wojdala, Anna Lidia; Abeln, Sanne; Chiasserini, Davide; Hirtz, Christophe; Gaetani, Lorenzo; Vermunt, Lisa; Bellomo, Giovanni; Halbgebauer, Steffen; Lehmann, Sylvain; Månberg, Anna; Nilsson, Peter; Otto, Markus; Vanmechelen, Eugeen; Verberk, Inge M W; Willemse, Eline; Zetterberg, Henrik.

Mol Cell Proteomics ; 22(10): 100629, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37557955

ABSTRACT

Neurodegenerative dementias are progressive diseases that cause neuronal network breakdown in different brain regions, often because of accumulation of misfolded proteins, such as amyloids, in the brain extracellular matrix or inside neurons or other cell types of the brain. Several diagnostic protein biomarkers in body fluids are being used and implemented, such as for Alzheimer's disease. However, there is still a lack of biomarkers for co-pathologies and other causes of dementia. Such biofluid-based biomarkers enable precision medicine approaches for diagnosis and treatment, allow us to learn more about underlying disease processes, and facilitate the development of patient inclusion and evaluation tools in clinical trials. When designing studies to discover novel biofluid-based biomarkers, choice of technology is an important starting point, yet there are many technologies to choose among. To address this, we here review the technologies that are currently available in research settings and, in some cases, in clinical laboratory practice. This presents a form of lexicon on each technology addressing its use in research and clinics, its strengths and limitations, and a future perspective.

Subjects

Alzheimer Disease, Humans, Brain, Biomarkers, Neurons, Precision Medicine, Amyloid beta-Peptides

6.

Discovery of novel CSF biomarkers to predict progression in dementia using machine learning.

Gogishvili, Dea; Vromen, Eleonora M; Koppes-den Hertog, Sascha; Lemstra, Afina W; Pijnenburg, Yolande A L; Visser, Pieter Jelle; Tijms, Betty M; Del Campo, Marta; Abeln, Sanne; Teunissen, Charlotte E; Vermunt, Lisa.

Sci Rep ; 13(1): 6531, 2023 Apr 21.

Article in English | MEDLINE | ID: mdl-37085545

ABSTRACT

Providing an accurate prognosis for individual dementia patients remains a challenge since they greatly differ in rates of cognitive decline. In this study, we used machine learning techniques with the aim to identify cerebrospinal fluid (CSF) biomarkers that predict the rate of cognitive decline within dementia patients. First, longitudinal mini-mental state examination scores (MMSE) of 210 dementia patients were used to create fast and slow progression groups. Second, we trained random forest classifiers on CSF proteomic profiles and obtained a well-performing prediction model for the progression group (ROC-AUC = 0.82). As a third step, Shapley values and Gini feature importance measures were used to interpret the model performance and identify top biomarker candidates for predicting the rate of cognitive decline. Finally, we explored the potential for each of the 20 top candidates in internal sensitivity analyses. TNFRSF4 and TGF β-1 emerged as the top markers, being lower in fast-progressing patients compared to slow-progressing patients. Proteins of which a low concentration was associated with fast progression were enriched for cell signalling and immune response pathways. None of our top markers stood out as strong individual predictors of subsequent cognitive decline. This could be explained by small effect sizes per protein and biological heterogeneity among dementia patients. Taken together, this study presents a novel progression biomarker identification framework and protein leads for personalised prediction of cognitive decline in dementia.

Subjects

Alzheimer Disease, Cognitive Dysfunction, Humans, Alzheimer Disease/diagnosis, Alzheimer Disease/cerebrospinal fluid, Amyloid beta-Peptides/cerebrospinal fluid, tau Proteins/cerebrospinal fluid, Proteomics, Biomarkers/cerebrospinal fluid, Cognitive Dysfunction/diagnosis, Machine Learning, Disease Progression

7.

Clusters of co-abundant proteins in the brain cortex associated with fronto-temporal lobar degeneration.

Bridel, Claire; van Gils, Juami H M; Miedema, Suzanne S M; Hoozemans, Jeroen J M; Pijnenburg, Yolande A L; Smit, August B; Rozemuller, Annemieke J M; Abeln, Sanne; Teunissen, Charlotte E.

Alzheimers Res Ther ; 15(1): 59, 2023 Mar 23.

Article in English | MEDLINE | ID: mdl-36949537

ABSTRACT

BACKGROUND: Frontotemporal lobar degeneration (FTLD) is characterized pathologically by neuronal and glial inclusions of hyperphosphorylated tau or by neuronal cytoplasmic inclusions of TDP43. This study aimed at deciphering the molecular mechanisms leading to these distinct pathological subtypes. METHODS: To this end, we performed an unbiased mass spectrometry-based proteomic and systems-level analysis of the middle frontal gyrus cortices of FTLD-tau (n = 6), FTLD-TDP (n = 15), and control patients (n = 5). We validated these results in an independent patient cohort (total n = 24). RESULTS: The middle frontal gyrus cortex proteome was most significantly altered in FTLD-tau compared to controls (294 differentially expressed proteins at FDR = 0.05). The proteomic modifications in FTLD-TDP were more heterogeneous (49 differentially expressed proteins at FDR = 0.1). Weighted co-expression network analysis revealed 17 modules of co-regulated proteins, 13 of which were dysregulated in FTLD-tau. These modules included proteins associated with oxidative phosphorylation, scavenger mechanisms, chromatin regulation, and clathrin-mediated transport in both the frontal and temporal cortex of FTLD-tau. The most strongly dysregulated subnetworks identified cyclin-dependent kinase 5 (CDK5) and polypyrimidine tract-binding protein 1 (PTBP1) as key players in the disease process. Dysregulation of 9 of these modules was confirmed in independent validation data sets of FTLD-tau and control temporal and frontal cortex (total n = 24). Dysregulated modules were primarily associated with changes in astrocyte and endothelial cell protein abundance levels, indicating pathological changes in FTD are not limited to neurons. CONCLUSIONS: Using this innovative workflow and zooming in on the most strongly dysregulated proteins of the identified modules, we were able to identify disease-associated mechanisms in FTLD-tau with high potential as biomarkers and/or therapeutic targets.

Subjects

DNA-Binding Proteins, Frontal Lobe, Frontotemporal Dementia, Temporal Lobe, tau Proteins, Frontal Lobe/metabolism, Temporal Lobe/metabolism, Neurodegenerative Diseases/metabolism, Frontotemporal Dementia/metabolism, Humans, Male, Female, Proteomics, tau Proteins/metabolism, DNA-Binding Proteins/metabolism, Biomarkers/metabolism, Netherlands

8.

Therapeutic potential of compounds targeting SARS-CoV-2 helicase.

Halma, Matthew T J; Wever, Mark J A; Abeln, Sanne; Roche, Didier; Wuite, Gijs J L.

Front Chem ; 10: 1062352, 2022.

Article in English | MEDLINE | ID: mdl-36561139

ABSTRACT

The economic and societal impact of COVID-19 has made the development of vaccines and drugs to combat SARS-CoV-2 infection a priority. While the SARS-CoV-2 spike protein has been widely explored as a drug target, the SARS-CoV-2 helicase (nsp13) does not have any approved medication. The helicase shares 99.8% similarity with its SARS-CoV-1 homolog and was shown to be essential for viral replication. This review summarizes and builds on existing research on inhibitors of SARS-CoV-1 and SARS-CoV-2 helicases. Our analysis of the toxicity and specificity of these compounds sets out the road forward for the repurposing of existing drugs and the development of new SARS-CoV-2 helicase inhibitors.

9.

Bioinformatics tools and data resources for assay development of fluid protein biomarkers.

Waury, Katharina; Willemse, Eline A J; Vanmechelen, Eugeen; Zetterberg, Henrik; Teunissen, Charlotte E; Abeln, Sanne.

Biomark Res ; 10(1): 83, 2022 Nov 15.

Article in English | MEDLINE | ID: mdl-36380380

ABSTRACT

Fluid protein biomarkers are important tools in clinical research and health care to support diagnosis and to monitor patients. Especially within the field of dementia, novel biomarkers could address the current challenges of providing an early diagnosis and of selecting trial participants. While the great potential of fluid biomarkers is recognized, their implementation in routine clinical use has been slow. One major obstacle is the often unsuccessful translation of biomarker candidates from explorative high-throughput techniques to sensitive antibody-based immunoassays. In this review, we propose the incorporation of bioinformatics into the workflow of novel immunoassay development to overcome this bottleneck and thus facilitate the development of novel biomarkers towards clinical laboratory practice. Due to the rapid progress within the field of bioinformatics many freely available and easy-to-use tools and data resources exist which can aid the researcher at various stages. Current prediction methods and databases can support the selection of suitable biomarker candidates, as well as the choice of appropriate commercial affinity reagents. Additionally, we examine methods that can determine or predict the epitope - an antibody's binding region on its antigen - and can help to make an informed choice on the immunogenic peptide used for novel antibody production. Selected use cases for biomarker candidates help illustrate the application and interpretation of the introduced tools.

10.

INFLECT: an R-package for cytometry cluster evaluation using marker modality.

Verhoeff, Jan; Abeln, Sanne; Garcia-Vallejo, Juan J.

BMC Bioinformatics ; 23(1): 487, 2022 Nov 16.

Article in English | MEDLINE | ID: mdl-36384426

ABSTRACT

BACKGROUND: Current methods of high-dimensional unsupervised clustering of mass cytometry data lack means to monitor and evaluate clustering results. Whether unsupervised clustering is correct is typically evaluated by agreement with dimensionality reduction techniques or based on benchmarking with manually classified cells. The ambiguity and lack of reproducibility of sequential gating has been replaced with ambiguity in interpretation of clustering results. On the other hand, spurious overclustering of data leads to loss of statistical power. We have developed INFLECT, an R-package designed to give insight into clustering results and provide an optimal number of clusters. In our approach, a mass cytometry dataset is overclustered intentionally to ensure the smallest phenotypically different subsets are captured using FlowSOM. A range of metacluster number endpoints are generated and evaluated using marker interquartile range and distribution unimodality checks. The fraction of marker distributions that pass these checks is taken as a measure of clustering success. The fraction of unimodal distributions within metaclusters is plotted against the number of generated metaclusters and reaches a plateau of diminishing returns. The inflection point at which this occurs gives an optimal point of capturing cellular heterogeneity versus statistical power. RESULTS: We applied INFLECT to four publicly available mass cytometry datasets of different size and number of markers. The unimodality score consistently reached a plateau, with an inflection point dependent on dataset size and number of dimensions. We tested both ConsensusClusterPlus metaclustering and hierarchical clustering. While hierarchical clustering is less computationally expensive and thus faster, it achieved similar results to ConsensusClusterPlus. The four datasets consisted of labeled data and we compared INFLECT metaclustering to published results. INFLECT identified a higher optimal number of metaclusters for all datasets. We illustrated the underlying heterogeneity within labels, showing that these labels encompass distinct types of cells. CONCLUSION: INFLECT addresses a knowledge gap in high-dimensional cytometry analysis, namely assessing clustering results. This is done through monitoring marker distributions for interquartile range and unimodality across a range of metacluster numbers. The inflection point is the optimal trade-off between cellular heterogeneity and statistical power, applied in this work for FlowSOM clustering on mass cytometry datasets.
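The idea of reading an optimal metacluster number off a plateauing score curve can be sketched as follows. This uses a generic knee heuristic (the point lying furthest above the straight line joining the curve's endpoints) and is written in Python for illustration only; it is not the criterion implemented in the INFLECT R-package, and the function name `knee_point` and the toy numbers are assumptions.

```python
def knee_point(ks, scores):
    """Return the k whose normalised score lies furthest above the straight
    line joining the first and last points of an increasing, saturating curve."""
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need matching lists with at least three points")
    dx = ks[-1] - ks[0]
    dy = scores[-1] - scores[0]
    if dx == 0 or dy == 0:
        raise ValueError("curve must span a non-zero range on both axes")
    best_k, best_gap = ks[0], float("-inf")
    for k, s in zip(ks, scores):
        x = (k - ks[0]) / dx        # position along the k axis, scaled to 0..1
        y = (s - scores[0]) / dy    # score, scaled to 0..1
        gap = y - x                 # height above the endpoint-to-endpoint line
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Hypothetical curve: fraction of unimodal marker distributions per metacluster count.
metacluster_counts = [5, 10, 15, 20, 25, 30]
unimodal_fraction = [0.50, 0.70, 0.82, 0.86, 0.87, 0.88]
print(knee_point(metacluster_counts, unimodal_fraction))
```

On the toy curve the heuristic selects the point after which additional metaclusters add little to the unimodality score, which is the trade-off the abstract describes.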

Subjects

Reproducibility of Results, Cluster Analysis, Biomarkers

11.

Tumour break load is a biologically relevant feature of genomic instability with prognostic value in colorectal cancer.

Lakbir, Soufyan; Lahoz, Sara; Cuatrecasas, Miriam; Camps, Jordi; Glas, Roel A; Heringa, Jaap; Meijer, Gerrit A; Abeln, Sanne; Fijneman, Remond J A.

Eur J Cancer ; 177: 94-102, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36334560

ABSTRACT

BACKGROUND: Clinically implemented prognostic biomarkers are lacking for the 80% of colorectal cancers (CRCs) that exhibit chromosomal instability (CIN). CIN is characterised by chromosome segregation errors and double-strand break repair defects that lead to somatic copy number aberrations (SCNAs) and chromosomal rearrangement-associated structural variants (SVs), respectively. We hypothesise that the number of SVs is a distinct feature of genomic instability and defined a new measure to quantify SVs: the tumour break load (TBL). The present study aimed to characterise the biological impact and clinical relevance of TBL in CRC. METHODS: Disease-free survival and SCNA data were obtained from The Cancer Genome Atlas and two independent CRC studies. TBL was defined as the sum of SCNA-associated SVs. RNA gene expression data of microsatellite stable (MSS) CRC samples were used to train an RNA-based TBL classifier. Dichotomised DNA-based TBL data were used for survival analysis. RESULTS: TBL shows large variation in CRC with poor correlation to tumour mutational burden and fraction of genome altered. TBL impact on tumour biology was illustrated by the high accuracy of classifying cancers in TBL-high and TBL-low (area under the receiver operating characteristic curve [AUC]: 0.88; p < 0.01). High TBL was associated with disease recurrence in 85 stages II-III MSS CRCs from The Cancer Genome Atlas (hazard ratio [HR]: 6.1; p = 0.007) and in two independent validation series of 57 untreated stages II-III (HR: 4.1; p = 0.012) and 74 untreated stage II MSS CRCs (HR: 2.4; p = 0.01). CONCLUSION: TBL is a prognostic biomarker in patients with non-metastatic MSS CRC with great potential to be implemented in routine molecular diagnostics.

Subjects

Colorectal Neoplasms, Microsatellite Instability, Humans, Chromosomal Instability, Colorectal Neoplasms/genetics, Colorectal Neoplasms/pathology, Genomic Instability, Local Neoplasm Recurrence/genetics, Prognosis, RNA

12.

Multi-task learning to leverage partially annotated data for PPI interface prediction.

Capel, Henriette; Feenstra, K Anton; Abeln, Sanne.

Sci Rep ; 12(1): 10487, 2022 Jun 21.

Article in English | MEDLINE | ID: mdl-35729253

ABSTRACT

Protein-protein interactions (PPI) are crucial for protein functioning; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as the PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residue. Our results show that the multi-task learning strategy significantly outperforms single task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein's functional properties of interest are only partially annotated.
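Handling missing annotations in the cost function can be illustrated with a masked loss, in which residues without a label for a given task simply do not contribute to that task's term. The sketch below is a simplified assumption rather than the authors' implementation: both tasks are treated as binary, missing labels are encoded as `None`, and the names `masked_bce`, `multitask_loss` and the task weights are hypothetical.

```python
import math


def masked_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over annotated residues only.
    A label of None marks a residue with no annotation for this task."""
    total, count = 0.0, 0
    for t, p in zip(y_true, y_pred):
        if t is None:
            continue  # missing label: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


def multitask_loss(targets, predictions, weights):
    """Weighted sum of per-task masked losses."""
    return sum(weights[task] * masked_bce(targets[task], predictions[task])
               for task in targets)


# Hypothetical four-residue protein: PPI labels known for two residues only,
# buried/exposed labels known for all four.
targets = {"ppi": [1, None, 0, None], "buried": [0, 1, 1, 0]}
predictions = {"ppi": [0.8, 0.5, 0.3, 0.9], "buried": [0.2, 0.7, 0.6, 0.1]}
weights = {"ppi": 1.0, "buried": 0.5}
print(round(multitask_loss(targets, predictions, weights), 4))
```

Because the auxiliary task still produces a loss on residues that lack PPI labels, a shared model can learn from proteins that carry structural features but no interface annotation, which is the situation the abstract describes.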

Subjects

Algorithms, Proteins, Proteins/metabolism

13.

PIPENN: protein interface prediction from sequence with an ensemble of neural nets.

Stringer, Bas; de Ferrante, Hans; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton; Haydarlou, Reza.

Bioinformatics ; 38(8): 2111-2118, 2022 Apr 12.

Article in English | MEDLINE | ID: mdl-35150231

ABSTRACT

MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Machine Learning, Proteins, Proteins/chemistry, Software, Amino Acid Sequence, Nucleotides, Computational Biology/methods

14.

How sticky are our proteins? Quantifying hydrophobicity of the human proteome.

van Gils, Juami Hermine Mariama; Gogishvili, Dea; van Eck, Jan; Bouwmeester, Robbin; van Dijk, Erik; Abeln, Sanne.

Bioinform Adv ; 2(1): vbac002, 2022.

Article in English | MEDLINE | ID: mdl-36699344

ABSTRACT

Summary: Proteins tend to bury hydrophobic residues inside their core during the folding process to provide stability to the protein structure and to prevent aggregation. Nevertheless, proteins do expose some 'sticky' hydrophobic residues to the solvent. These residues can play an important functional role, e.g. in protein-protein and membrane interactions. Here, we first investigate how hydrophobic protein surfaces are by providing three measures for surface hydrophobicity: the total hydrophobic surface area (THSA), the relative hydrophobic surface area (RHSA) and, using our MolPatch method, the largest hydrophobic patch (LHP). Secondly, we analyze how difficult it is to predict these measures from sequence: by adapting solvent accessibility predictions from NetSurfP2.0, we obtain well-performing prediction methods for the THSA and RHSA, while predicting LHP is more challenging. Finally, we analyze implications of exposed hydrophobic surfaces: we show that hydrophobic proteins typically have low expression, suggesting cells avoid an overabundance of sticky proteins. Availability and implementation: The data underlying this article are available in GitHub at https://github.com/ibivu/hydrophobic_patches. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
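The first two surface measures can be sketched from per-residue solvent-accessible surface areas: the total hydrophobic surface area sums the exposed area of hydrophobic residues, and the relative value divides that by the total exposed area. The sketch below rests on assumptions not stated in the abstract: the set of residues treated as hydrophobic, the per-residue areas (in Å²) and the function name `surface_hydrophobicity` are all illustrative, and the authors' exact definitions may differ.

```python
# Assumed set of hydrophobic residues (one-letter codes); the paper may use another.
HYDROPHOBIC = set("AVILMFWC")


def surface_hydrophobicity(sequence, asa):
    """Return (total, relative) hydrophobic surface area.
    sequence: one-letter amino acid string
    asa:      per-residue solvent-accessible surface area, same length"""
    if len(sequence) != len(asa):
        raise ValueError("sequence and asa must have the same length")
    total_area = sum(asa)
    hydrophobic_area = sum(a for aa, a in zip(sequence, asa) if aa in HYDROPHOBIC)
    relative = hydrophobic_area / total_area if total_area > 0 else 0.0
    return hydrophobic_area, relative


# Hypothetical five-residue peptide with made-up accessible areas.
thsa, rhsa = surface_hydrophobicity("MKLVE", [40.0, 120.0, 10.0, 30.0, 100.0])
print(thsa, round(rhsa, 3))
```

The largest hydrophobic patch is not covered here because it requires the three-dimensional arrangement of surface atoms, not just per-residue totals.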

15.

SeRenDIP-CE: sequence-based interface prediction for conformational epitopes.

Hou, Qingzhen; Stringer, Bas; Waury, Katharina; Capel, Henriette; Haydarlou, Reza; Xue, Fuzhong; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton.

Bioinformatics ; 37(20): 3421-3427, 2021 Oct 25.

Article in English | MEDLINE | ID: mdl-33974039

ABSTRACT

MOTIVATION: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen's epitope region, as a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. RESULTS: We collected and curated a high quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody-antigen structure of the COVID19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input, which will help make the method immediately applicable in a wide range of biomedical and biomolecular research. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

16.

The hydrophobic effect characterises the thermodynamic signature of amyloid fibril growth.

van Gils, Juami Hermine Mariama; van Dijk, Erik; Peduzzo, Alessia; Hofmann, Alexander; Vettore, Nicola; Schützmann, Marie P; Groth, Georg; Mouhib, Halima; Otzen, Daniel E; Buell, Alexander K; Abeln, Sanne.

PLoS Comput Biol ; 16(5): e1007767, 2020 05.

Article in English | MEDLINE | ID: mdl-32365068

ABSTRACT

Many proteins have the potential to aggregate into amyloid fibrils, protein polymers associated with a wide range of human disorders such as Alzheimer's and Parkinson's disease. The thermodynamic stability of amyloid fibrils, in contrast to that of folded proteins, is not well understood: the balance between entropic and enthalpic terms, including the chain entropy and the hydrophobic effect, is poorly characterised. Using a combination of theory, in vitro experiments, simulations of a coarse-grained protein model and meta-data analysis, we delineate the enthalpic and entropic contributions that dominate amyloid fibril elongation. Our prediction of a characteristic temperature-dependent enthalpic signature is confirmed by calorimetric experiments and a meta-analysis of published data. From these results we are able to define the necessary conditions to observe cold denaturation of amyloid fibrils. Overall, we show that amyloid fibril elongation is associated with a negative heat capacity, the magnitude of which correlates closely with the hydrophobic surface area that is buried upon fibril formation, highlighting the importance of hydrophobicity for fibril stability.

Subjects

Amyloid/chemistry , Amyloid/physiology , Amyloid/metabolism , Amyloid beta-Peptides/chemistry , Amyloid beta-Peptides/physiology , Amyloidogenic Proteins/chemistry , Amyloidogenic Proteins/physiology , Humans , Hydrophobic and Hydrophilic Interactions , Models, Theoretical , Molecular Dynamics Simulation , Protein Denaturation , Protein Folding , Temperature , Thermodynamics

17.

The potential use of big data in oncology.

Willems, Stefan M; Abeln, Sanne; Feenstra, K Anton; de Bree, Remco; van der Poel, Egge F; Baatenburg de Jong, Robert J; Heringa, Jaap; van den Brekel, Michiel W M.

Oral Oncol ; 98: 8-12, 2019 11.

Article in English | MEDLINE | ID: mdl-31521885

ABSTRACT

In this era of information technology, big data analysis is entering biomedical sciences. But what is big data, where does it come from and what can we do with it? In this commentary, the main sources of big data are explained, especially in (head and neck) oncology. It also touches upon the need to integrate various sources of clinical, pathological and quality-of-life data. It discusses some initiatives in linking such datasets on a nation-wide scale in the Netherlands. Finally, it touches upon important issues regarding governance, FAIRness of data and the need to put in place the infrastructure required to exploit the full potential of big data sets in head and neck cancer.

Subjects

Big Data , Medical Informatics/methods , Medical Oncology , Databases, Factual , Head and Neck Neoplasms/epidemiology , Humans , Information Dissemination , Medical Oncology/methods , Netherlands/epidemiology , Precision Medicine/methods , Quality of Health Care

18.

Tailor-made multiple sequence alignments using the PRALINE 2 alignment toolkit.

Dijkstra, Maurits J J; van der Ploeg, Atze J; Feenstra, K Anton; Fokkink, Wan J; Abeln, Sanne; Heringa, Jaap.

Bioinformatics ; 35(24): 5315-5317, 2019 12 15.

Article in English | MEDLINE | ID: mdl-31368486

ABSTRACT

SUMMARY: PRALINE 2 is a toolkit for custom multiple sequence alignment workflows. It can be used to incorporate sequence annotations, such as secondary structure or (DNA) motifs, into the alignment scoring, as well as to customize many other aspects of a progressive multiple alignment workflow. AVAILABILITY AND IMPLEMENTATION: PRALINE 2 is implemented in Python and available as open source software on GitHub: https://github.com/ibivu/PRALINE/.

Subjects

Software , DNA , Protein Structure, Secondary , Sequence Alignment

19.

SeRenDIP: SEquential REmasteriNg to DerIve profiles for fast and accurate predictions of PPI interface positions.

Hou, Qingzhen; De Geest, Paul F G; Griffioen, Christian J; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton.

Bioinformatics ; 35(22): 4794-4796, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31116381

ABSTRACT

MOTIVATION: Interpretation of ubiquitous protein sequence data has become a bottleneck in biomolecular research, due to a lack of structural and other experimental annotation data for these proteins. Prediction of protein interaction sites from sequence may be a viable substitute. We therefore recently developed a sequence-based random forest method for protein-protein interface prediction, which performed significantly better than other methods on both homomeric and heteromeric protein-protein interactions. Here, we present a webserver that implements this method efficiently. RESULTS: With the aim of accelerating our previous approach, we obtained sequence conservation profiles by re-mastering the alignment of homologous sequences found by PSI-BLAST. This yielded a more than 10-fold speedup and at least the same accuracy as reported previously for our method; these results allowed us to offer the method as a webserver. The webserver interface is targeted to the non-expert user. The input is simply a sequence of the protein of interest, and the output is a table with scores indicating the likelihood of having an interaction interface at a certain position. As the method is sequence-based and not sensitive to the type of protein interaction, we expect this webserver to be of interest to many biological researchers in academia and in industry. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software , Algorithms , Amino Acid Sequence , Proteins , Sequence Analysis, Protein

20.

Motif-Aware PRALINE: Improving the alignment of motif regions.

Dijkstra, Maurits; Bawono, Punto; Abeln, Sanne; Feenstra, K Anton; Fokkink, Wan; Heringa, Jaap.

PLoS Comput Biol ; 14(11): e1006547, 2018 11.

Article in English | MEDLINE | ID: mdl-30383764

ABSTRACT

Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.

Subjects

Amino Acid Motifs , DNA/chemistry , Proteins/chemistry , Sequence Alignment/standards , Algorithms , Amino Acid Sequence , Conserved Sequence , HIV-1/chemistry , Sequence Homology, Amino Acid , env Gene Products, Human Immunodeficiency Virus/chemistry
