Results 1 - 20 of 52
1.
Bioinformatics ; 40(Suppl 2): ii37-ii44, 2024 09 01.
Article in English | MEDLINE | ID: mdl-39230704

ABSTRACT

MOTIVATION: Genomic instability is a hallmark of cancer, leading to many somatic alterations. Identifying which alterations have a system-wide impact is a challenging task. Nevertheless, this is an essential first step for prioritizing potential biomarkers. We developed CIBRA (Computational Identification of Biologically Relevant Alterations), a method that determines the system-wide impact of genomic alterations on tumor biology by integrating two distinct omics data types: one indicating genomic alterations (e.g. genomics), and another defining a system-wide expression response (e.g. transcriptomics). CIBRA was evaluated with genome-wide screens in 33 cancer types using primary and metastatic cancer data from the Cancer Genome Atlas and Hartwig Medical Foundation. RESULTS: We demonstrate the capability of CIBRA by successfully confirming the impact of point mutations in experimentally validated oncogenes and tumor suppressor genes (0.79 AUC). Surprisingly, many genes affected by structural variants were identified to have a strong system-wide impact (30.3%), suggesting that their role in cancer development has thus far been largely under-reported. Additionally, CIBRA can identify impact with only 10 cases and controls, providing a novel way to prioritize genomic alterations with a prominent role in cancer biology. Our findings demonstrate that CIBRA can identify cancer drivers by combining genomics and transcriptomics data. Moreover, our work shows an unexpected substantial system-wide impact of structural variants in cancer. Hence, CIBRA has the potential to preselect and refine current definitions of genomic alterations to derive more nuanced biomarkers for diagnostics, disease progression, and treatment response. AVAILABILITY AND IMPLEMENTATION: The R package CIBRA is available at https://github.com/AIT4LIFE-UU/CIBRA.


Subject(s)
Genomics , Neoplasms , Humans , Neoplasms/genetics , Neoplasms/metabolism , Genomics/methods , Computational Biology/methods , Oncogenes , Biomarkers, Tumor/genetics , Genomic Instability
2.
J Neurochem ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39289040

ABSTRACT

Glial fibrillary acidic protein (GFAP) is a well-established biomarker of reactive astrogliosis in the central nervous system because of its elevated levels following brain injury and various neurological disorders. The advent of ultra-sensitive methods for measuring low-abundant proteins has significantly enhanced our understanding of GFAP levels in the serum or plasma of patients with diverse neurological diseases. Clinical studies have demonstrated that GFAP holds promise both as a diagnostic and prognostic biomarker, including but not limited to individuals with Alzheimer's disease. GFAP exhibits diverse forms and structures, herein referred to as its proteoform complexity, encompassing conformational dynamics, isoforms and post-translational modifications (PTMs). In this review, we explore how the proteoform complexity of GFAP influences its detection, which may affect the differential diagnostic performance of GFAP in different biological fluids and can provide valuable insights into underlying biological processes. Additionally, proteoforms are often disease-specific, and our review provides suggestions and highlights areas to focus on for the development of new assays for measuring GFAP, including isoforms, PTMs, discharge mechanisms, breakdown products, higher-order species and interacting partners. By addressing the knowledge gaps highlighted in this review, we aim to support the clinical translation and interpretation of GFAP in both CSF and blood and the development of reliable, reproducible and specific prognostic and diagnostic tests. To enhance disease pathology comprehension and optimise GFAP as a biomarker, a thorough understanding of detected proteoforms in biofluids is essential.

3.
J Extracell Biol ; 3(1): e120, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38938677

ABSTRACT

Extracellular vesicles (EVs) are membranous structures released by cells into the extracellular space and are thought to be involved in cell-to-cell communication. While EVs and their cargo are promising biomarker candidates, sorting mechanisms of proteins to EVs remain unclear. In this study, we ask if it is possible to determine EV association based on the protein sequence. Additionally, we ask what the most important determinants are for EV association. We answer these questions with explainable AI models, using human proteome data from EV databases to train and validate the model. It is essential to correct the datasets for contaminants introduced by coarse EV isolation workflows and for experimental bias caused by mass spectrometry. In this study, we show that it is indeed possible to predict EV association from the protein sequence: a simple sequence-based model for predicting EV proteins achieved an area under the curve of 0.77 ± 0.01, which increased further to 0.84 ± 0.00 when incorporating curated post-translational modification (PTM) annotations. Feature analysis shows that EV-associated proteins are stable, polar, and structured with low isoelectric point compared to non-EV proteins. PTM annotations emerged as the most important features for correct classification; specifically, palmitoylation is one of the most prevalent EV sorting mechanisms for unique proteins. Palmitoylation and nitrosylation sites are especially prevalent in EV proteins that are determined by very strict isolation protocols, indicating they could potentially serve as quality control criteria for future studies. This computational study offers an effective sequence-based predictor of EV associated proteins with extensive characterisation of the human EV proteome that can explain for individual proteins which factors contribute to their EV association.
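
The area under the curve reported above can be read as the probability that a randomly chosen positive example (here, an EV-associated protein) scores higher than a randomly chosen negative one. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking identity.

    labels: sequence of 0/1 values (1 = positive class).
    scores: sequence of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data (not from the study): two positives, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for proteome-scale data.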

4.
Front Immunol ; 15: 1343484, 2024.
Article in English | MEDLINE | ID: mdl-38318180

ABSTRACT

Background: Glioblastomas manipulate the immune system both locally and systemically, yet glioblastoma-associated changes in peripheral blood immune composition are poorly studied. Age and dexamethasone administration in glioblastoma patients have been hypothesized to limit the effectiveness of immunotherapy, but their effects remain unclear. We compared peripheral blood immune composition in patients with different types of brain tumor to determine the influence of age, dexamethasone treatment, and tumor volume. Methods: High-dimensional mass cytometry was used to characterise peripheral blood mononuclear cells of 169 patients with glioblastoma, lower grade astrocytoma, metastases and meningioma. We used blood from medically-refractory epilepsy patients and healthy controls as control groups. Immune phenotyping was performed using FlowSOM and t-SNE analysis in R followed by supervised annotation of the resulting clusters. We conducted multiple linear regression analysis between intracranial pathology and cell type abundance, corrected for clinical variables. We tested correlations between cell type abundance and survival with Cox-regression analyses. Results: Glioblastoma patients had significantly fewer naive CD4+ T cells, but higher percentages of mature NK cells than controls. Decreases of naive CD8+ T cells and alternative monocytes and an increase of memory B cells in glioblastoma patients were influenced by age and dexamethasone treatment, and only memory B cells by tumor volume. Progression free survival was associated with percentages of CD4+ regulatory T cells and double negative T cells. Conclusion: High-dimensional mass cytometry of peripheral blood in patients with different types of intracranial tumor provides insight into the relation between intracranial pathology and peripheral immune status. Wide immunosuppression associated with age and pre-operative dexamethasone treatment provides further evidence for their deleterious effects on treatment with immunotherapy.


Subject(s)
Glioblastoma , Humans , Glioblastoma/drug therapy , Glioblastoma/pathology , Leukocytes, Mononuclear/pathology , CD4-Positive T-Lymphocytes , Immunotherapy/methods , Dexamethasone/therapeutic use
5.
Proteins ; 92(5): 649-664, 2024 May.
Article in English | MEDLINE | ID: mdl-38149328

ABSTRACT

Glial fibrillary acidic protein (GFAP) is a promising biomarker for brain and spinal cord disorders. Recent studies have highlighted the differences in the reliability of GFAP measurements in different biological matrices. The reason for these discrepancies is poorly understood as our knowledge of the protein's 3-dimensional conformation, proteoforms, and aggregation remains limited. Here, we investigate the structural properties of GFAP under different conditions. For this, we characterized recombinant GFAP proteins from various suppliers and applied hydrogen-deuterium exchange mass spectrometry (HDX-MS) to provide a snapshot of the conformational dynamics of GFAP in artificial cerebrospinal fluid (aCSF) compared to the phosphate buffer. Our findings indicate that recombinant GFAP exists in various conformational species. Furthermore, we show that GFAP dimers remained intact under denaturing conditions. HDX-MS experiments show an overall decrease in H-bonding and an increase in solvent accessibility of GFAP in aCSF compared to the phosphate buffer, with clear indications of mixed EX2 and EX1 kinetics. To understand possible structural interface regions and the evolutionary conservation profiles, we combined HDX-MS results with the predicted GFAP-dimer structure by AlphaFold-Multimer. We found that deprotected regions with high structural flexibility in aCSF overlap with predicted conserved dimeric 1B and 2B domain interfaces. Structural property predictions combined with the HDX data show an overall deprotection and signatures of aggregation in aCSF. We anticipate that the outcomes of this research will contribute to a deeper understanding of the structural flexibility of GFAP and ultimately shed light on its behavior in different biological matrices.


Subject(s)
Deuterium Exchange Measurement , Glial Fibrillary Acidic Protein , Phosphates , Humans , Deuterium Exchange Measurement/methods , Glial Fibrillary Acidic Protein/chemistry , Glial Fibrillary Acidic Protein/genetics , Glial Fibrillary Acidic Protein/physiology , Protein Conformation , Reproducibility of Results , Recombinant Proteins
6.
J Proteome Res ; 22(9): 3068-3080, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37606934

ABSTRACT

Cerebrospinal fluid (CSF) is an essential matrix for the discovery of neurological disease biomarkers. However, the high dynamic range of protein concentrations in CSF hinders the detection of the least abundant protein biomarkers by untargeted mass spectrometry. It is thus beneficial to gain a deeper understanding of the secretion processes within the brain. Here, we aim to explore if and how the secretion of brain proteins to the CSF can be predicted. By combining a curated CSF proteome and the brain elevated proteome of the Human Protein Atlas, brain proteins were classified as CSF or non-CSF secreted. A machine learning model was trained on a range of sequence-based features to differentiate between CSF and non-CSF groups and effectively predict the brain origin of proteins. The classification model achieves an area under the curve of 0.89 if using high confidence CSF proteins. The most important prediction features include the subcellular localization, signal peptides, and transmembrane regions. The classifier generalized well to the larger brain detected proteome and is able to correctly predict novel CSF proteins identified by affinity proteomics. In addition to elucidating the underlying mechanisms of protein secretion, the trained classification model can support biomarker candidate selection.


Subject(s)
Biomedical Research , Proteome , Humans , Brain , Protein Transport , Biological Transport , Cerebrospinal Fluid Proteins
7.
Mol Cell Proteomics ; 22(10): 100629, 2023 10.
Article in English | MEDLINE | ID: mdl-37557955

ABSTRACT

Neurodegenerative dementias are progressive diseases that cause neuronal network breakdown in different brain regions, often because of accumulation of misfolded proteins, such as amyloids, in the brain extracellular matrix or inside neurons or other cell types of the brain. Several diagnostic protein biomarkers in body fluids are being used and implemented, such as for Alzheimer's disease. However, there is still a lack of biomarkers for co-pathologies and other causes of dementia. Such biofluid-based biomarkers enable precision medicine approaches for diagnosis and treatment, allow us to learn more about underlying disease processes, and facilitate the development of patient inclusion and evaluation tools in clinical trials. When designing studies to discover novel biofluid-based biomarkers, choice of technology is an important starting point. However, there are many technologies to choose from. To address this, we review here the technologies that are currently available in research settings and, in some cases, in clinical laboratory practice. This presents a form of lexicon on each technology, addressing its use in research and clinics, its strengths and limitations, and a future perspective.


Subject(s)
Alzheimer Disease , Humans , Brain , Biomarkers , Neurons , Precision Medicine , Amyloid beta-Peptides
8.
Sci Rep ; 13(1): 6531, 2023 04 21.
Article in English | MEDLINE | ID: mdl-37085545

ABSTRACT

Providing an accurate prognosis for individual dementia patients remains a challenge since they greatly differ in rates of cognitive decline. In this study, we used machine learning techniques to identify cerebrospinal fluid (CSF) biomarkers that predict the rate of cognitive decline within dementia patients. First, longitudinal mini-mental state examination (MMSE) scores of 210 dementia patients were used to create fast and slow progression groups. Second, we trained random forest classifiers on CSF proteomic profiles and obtained a well-performing prediction model for the progression group (ROC-AUC = 0.82). As a third step, Shapley values and Gini feature importance measures were used to interpret the model performance and identify top biomarker candidates for predicting the rate of cognitive decline. Finally, we explored the potential for each of the 20 top candidates in internal sensitivity analyses. TNFRSF4 and TGF-β1 emerged as the top markers, being lower in fast-progressing patients compared to slow-progressing patients. Proteins of which a low concentration was associated with fast progression were enriched for cell signalling and immune response pathways. None of our top markers stood out as strong individual predictors of subsequent cognitive decline. This could be explained by small effect sizes per protein and biological heterogeneity among dementia patients. Taken together, this study presents a novel progression biomarker identification framework and protein leads for personalised prediction of cognitive decline in dementia.
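
The abstract does not say how the fast and slow progression groups were derived from the longitudinal MMSE scores. One plausible approach, sketched here purely as an assumption, is to fit a least-squares slope per patient and split the cohort at the median slope. The function names, the median cutoff and the toy patients are all hypothetical.

```python
from statistics import median


def mmse_slope(times_years, scores):
    """Ordinary least-squares slope of MMSE score against time (points per year)."""
    n = len(times_years)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two visits with matching times and scores")
    mean_t = sum(times_years) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("visit times must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_years, scores))
    return sxy / sxx


def progression_groups(patients):
    """Label each patient 'fast' or 'slow' by a median split on the MMSE slope.

    patients: dict mapping patient id -> (visit times in years, MMSE scores).
    Assumption: a slope below the cohort median (steeper decline) means 'fast'.
    """
    slopes = {pid: mmse_slope(t, s) for pid, (t, s) in patients.items()}
    cutoff = median(slopes.values())
    return {pid: ("fast" if slope < cutoff else "slow")
            for pid, slope in slopes.items()}


# Made-up toy cohort, not study data.
toy_patients = {
    "p1": ([0, 1, 2], [26, 22, 18]),  # loses 4 points per year
    "p2": ([0, 1, 2], [27, 26, 25]),  # loses 1 point per year
    "p3": ([0, 1, 2], [24, 22, 20]),  # loses 2 points per year
    "p4": ([0, 2], [28, 28]),         # stable
}
toy_groups = progression_groups(toy_patients)
```

A per-patient slope ignores differences in follow-up length and visit count; a mixed-effects model would be a common alternative in practice.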


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnosis , Alzheimer Disease/cerebrospinal fluid , Amyloid beta-Peptides/cerebrospinal fluid , tau Proteins/cerebrospinal fluid , Proteomics , Biomarkers/cerebrospinal fluid , Cognitive Dysfunction/diagnosis , Machine Learning , Disease Progression
9.
Alzheimers Res Ther ; 15(1): 59, 2023 03 23.
Article in English | MEDLINE | ID: mdl-36949537

ABSTRACT

BACKGROUND: Frontotemporal lobar degeneration (FTLD) is characterized pathologically by neuronal and glial inclusions of hyperphosphorylated tau or by neuronal cytoplasmic inclusions of TDP43. This study aimed at deciphering the molecular mechanisms leading to these distinct pathological subtypes. METHODS: To this end, we performed an unbiased mass spectrometry-based proteomic and systems-level analysis of the middle frontal gyrus cortices of FTLD-tau (n = 6), FTLD-TDP (n = 15), and control patients (n = 5). We validated these results in an independent patient cohort (total n = 24). RESULTS: The middle frontal gyrus cortex proteome was most significantly altered in FTLD-tau compared to controls (294 differentially expressed proteins at FDR = 0.05). The proteomic modifications in FTLD-TDP were more heterogeneous (49 differentially expressed proteins at FDR = 0.1). Weighted co-expression network analysis revealed 17 modules of co-regulated proteins, 13 of which were dysregulated in FTLD-tau. These modules included proteins associated with oxidative phosphorylation, scavenger mechanisms, chromatin regulation, and clathrin-mediated transport in both the frontal and temporal cortex of FTLD-tau. The most strongly dysregulated subnetworks identified cyclin-dependent kinase 5 (CDK5) and polypyrimidine tract-binding protein 1 (PTBP1) as key players in the disease process. Dysregulation of 9 of these modules was confirmed in independent validation data sets of FTLD-tau and control temporal and frontal cortex (total n = 24). Dysregulated modules were primarily associated with changes in astrocyte and endothelial cell protein abundance levels, indicating pathological changes in FTD are not limited to neurons. CONCLUSIONS: Using this innovative workflow and zooming in on the most strongly dysregulated proteins of the identified modules, we were able to identify disease-associated mechanisms in FTLD-tau with high potential as biomarkers and/or therapeutic targets.
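
The abstract reports differentially expressed proteins at FDR thresholds of 0.05 and 0.1 but does not name the correction procedure. The Benjamini-Hochberg step-up procedure is a common choice and is sketched below as an assumption; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    (1-based, p-values sorted ascending) with p_(k) <= k / m * fdr, then
    reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Made-up p-values for five proteins, not study data.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
hits_strict = benjamini_hochberg(toy_p, fdr=0.05)
hits_lenient = benjamini_hochberg(toy_p, fdr=0.1)
```

Relaxing the threshold from 0.05 to 0.1 admits more proteins on the same p-values, which is worth keeping in mind when comparing the counts reported for the two FTLD subtypes.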


Subject(s)
DNA-Binding Proteins , Frontal Lobe , Frontotemporal Dementia , Temporal Lobe , tau Proteins , Frontal Lobe/metabolism , Temporal Lobe/metabolism , Neurodegenerative Diseases/metabolism , Frontotemporal Dementia/metabolism , Humans , Male , Female , Proteomics , tau Proteins/metabolism , DNA-Binding Proteins/metabolism , Biomarkers/metabolism , Netherlands
10.
Front Chem ; 10: 1062352, 2022.
Article in English | MEDLINE | ID: mdl-36561139

ABSTRACT

The economic and societal impact of COVID-19 has made the development of vaccines and drugs to combat SARS-CoV-2 infection a priority. While the SARS-CoV-2 spike protein has been widely explored as a drug target, no approved medication targets the SARS-CoV-2 helicase (nsp13). The helicase shares 99.8% similarity with its SARS-CoV-1 homolog and was shown to be essential for viral replication. This review summarizes and builds on existing research on inhibitors of SARS-CoV-1 and SARS-CoV-2 helicases. Our analysis of the toxicity and specificity of these compounds sets the path forward for the repurposing of existing drugs and the development of new SARS-CoV-2 helicase inhibitors.

11.
Eur J Cancer ; 177: 94-102, 2022 12.
Article in English | MEDLINE | ID: mdl-36334560

ABSTRACT

BACKGROUND: Clinically implemented prognostic biomarkers are lacking for the 80% of colorectal cancers (CRCs) that exhibit chromosomal instability (CIN). CIN is characterised by chromosome segregation errors and double-strand break repair defects that lead to somatic copy number aberrations (SCNAs) and chromosomal rearrangement-associated structural variants (SVs), respectively. We hypothesise that the number of SVs is a distinct feature of genomic instability and defined a new measure to quantify SVs: the tumour break load (TBL). The present study aimed to characterise the biological impact and clinical relevance of TBL in CRC. METHODS: Disease-free survival and SCNA data were obtained from The Cancer Genome Atlas and two independent CRC studies. TBL was defined as the sum of SCNA-associated SVs. RNA gene expression data of microsatellite stable (MSS) CRC samples were used to train an RNA-based TBL classifier. Dichotomised DNA-based TBL data were used for survival analysis. RESULTS: TBL shows large variation in CRC with poor correlation to tumour mutational burden and fraction of genome altered. TBL impact on tumour biology was illustrated by the high accuracy of classifying cancers in TBL-high and TBL-low (area under the receiver operating characteristic curve [AUC]: 0.88; p < 0.01). High TBL was associated with disease recurrence in 85 stages II-III MSS CRCs from The Cancer Genome Atlas (hazard ratio [HR]: 6.1; p = 0.007) and in two independent validation series of 57 untreated stages II-III (HR: 4.1; p = 0.012) and 74 untreated stage II MSS CRCs (HR: 2.4; p = 0.01). CONCLUSION: TBL is a prognostic biomarker in patients with non-metastatic MSS CRC with great potential to be implemented in routine molecular diagnostics.
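
The abstract defines TBL as the sum of SCNA-associated SVs and uses a dichotomised value for survival analysis, without giving the counting rule or the cutoff. One plausible reading, shown here only as an assumption, is to count the positions where adjacent copy-number segments on the same chromosome change state. The segment format, the function names and the cutoff argument are hypothetical.

```python
def tumour_break_load(segments):
    """Count copy-number state changes between adjacent segments.

    segments: list of (chromosome, start, end, state) tuples, where state is a
    label such as 'gain', 'loss' or 'neutral'. Adjacent segments on different
    chromosomes never count as a break.
    """
    ordered = sorted(segments, key=lambda seg: (seg[0], seg[1]))
    breaks = 0
    for prev, cur in zip(ordered, ordered[1:]):
        same_chromosome = prev[0] == cur[0]
        if same_chromosome and prev[3] != cur[3]:
            breaks += 1
    return breaks


def dichotomise(tbl, cutoff):
    """Label a sample TBL-high or TBL-low; the cutoff is a free parameter here."""
    return "TBL-high" if tbl > cutoff else "TBL-low"


# Made-up segmentation for one sample, not study data.
toy_segments = [
    ("chr1", 0, 100, "neutral"),
    ("chr1", 100, 200, "gain"),
    ("chr1", 200, 300, "gain"),
    ("chr1", 300, 400, "loss"),
    ("chr2", 0, 500, "neutral"),
    ("chr2", 500, 900, "loss"),
]
toy_tbl = tumour_break_load(toy_segments)
```

In this toy sample chr1 contributes two state changes and chr2 one; the identical neighbouring 'gain' segments do not add a break.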


Subject(s)
Colorectal Neoplasms , Microsatellite Instability , Humans , Chromosomal Instability , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Genomic Instability , Neoplasm Recurrence, Local/genetics , Prognosis , RNA
12.
BMC Bioinformatics ; 23(1): 487, 2022 Nov 16.
Article in English | MEDLINE | ID: mdl-36384426

ABSTRACT

BACKGROUND: Current methods of high-dimensional unsupervised clustering of mass cytometry data lack means to monitor and evaluate clustering results. Whether unsupervised clustering is correct is typically evaluated by agreement with dimensionality reduction techniques or based on benchmarking with manually classified cells. The ambiguity and lack of reproducibility of sequential gating has been replaced with ambiguity in interpretation of clustering results. On the other hand, spurious overclustering of data leads to loss of statistical power. We have developed INFLECT, an R package designed to give insight into clustering results and provide an optimal number of clusters. In our approach, a mass cytometry dataset is overclustered intentionally to ensure the smallest phenotypically different subsets are captured using FlowSOM. A range of metacluster number endpoints are generated and evaluated using marker interquartile range and distribution unimodality checks. The fraction of marker distributions that pass these checks is taken as a measure of clustering success. The fraction of unimodal distributions within metaclusters is plotted against the number of generated metaclusters and reaches a plateau of diminishing returns. The inflection point at which this occurs gives an optimal point of capturing cellular heterogeneity versus statistical power. RESULTS: We applied INFLECT to four publicly available mass cytometry datasets of different size and number of markers. The unimodality score consistently reached a plateau, with an inflection point dependent on dataset size and number of dimensions. We tested both ConsensusClusterPlus metaclustering and hierarchical clustering. While hierarchical clustering is less computationally expensive and thus faster, it achieved similar results to ConsensusClusterPlus. The four datasets consisted of labeled data and we compared INFLECT metaclustering to published results. INFLECT identified a higher optimal number of metaclusters for all datasets. We illustrated the underlying heterogeneity within labels, showing that these labels encompass distinct types of cells. CONCLUSION: INFLECT addresses a knowledge gap in high-dimensional cytometry analysis, namely assessing clustering results. This is done through monitoring marker distributions for interquartile range and unimodality across a range of metacluster numbers. The inflection point is the optimal trade-off between cellular heterogeneity and statistical power, applied in this work for FlowSOM clustering on mass cytometry datasets.
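
The idea of picking the point where the unimodality score stops improving can be illustrated with a simple diminishing-returns rule. This is a hypothetical sketch in Python, not the INFLECT R package or its actual inflection criterion; the function name, the `min_gain` threshold and the toy curve are assumptions.

```python
def plateau_point(cluster_counts, unimodal_fraction, min_gain=0.01):
    """Return the smallest cluster count after which the score gain per step
    drops below min_gain.

    cluster_counts: numbers of metaclusters that were evaluated.
    unimodal_fraction: fraction of marker distributions passing the checks
        at each cluster count (same order as cluster_counts).
    Falls back to the largest evaluated count if no plateau is found.
    """
    pairs = sorted(zip(cluster_counts, unimodal_fraction))
    for (k, score), (_, next_score) in zip(pairs, pairs[1:]):
        if next_score - score < min_gain:
            return k
    return pairs[-1][0]


# Made-up curve: the score climbs quickly, then flattens after 20 metaclusters.
toy_counts = [5, 10, 15, 20, 25, 30]
toy_scores = [0.40, 0.62, 0.78, 0.86, 0.865, 0.867]
toy_optimum = plateau_point(toy_counts, toy_scores)
```

The gain is measured per evaluated step rather than per added cluster, so the result depends on how densely the range of metacluster numbers is sampled.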


Subject(s)
Reproducibility of Results , Cluster Analysis , Biomarkers
13.
Biomark Res ; 10(1): 83, 2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36380380

ABSTRACT

Fluid protein biomarkers are important tools in clinical research and health care to support diagnosis and to monitor patients. Especially within the field of dementia, novel biomarkers could address the current challenges of providing an early diagnosis and of selecting trial participants. While the great potential of fluid biomarkers is recognized, their implementation in routine clinical use has been slow. One major obstacle is the often unsuccessful translation of biomarker candidates from explorative high-throughput techniques to sensitive antibody-based immunoassays. In this review, we propose the incorporation of bioinformatics into the workflow of novel immunoassay development to overcome this bottleneck and thus facilitate the development of novel biomarkers towards clinical laboratory practice. Due to the rapid progress within the field of bioinformatics many freely available and easy-to-use tools and data resources exist which can aid the researcher at various stages. Current prediction methods and databases can support the selection of suitable biomarker candidates, as well as the choice of appropriate commercial affinity reagents. Additionally, we examine methods that can determine or predict the epitope - an antibody's binding region on its antigen - and can help to make an informed choice on the immunogenic peptide used for novel antibody production. Selected use cases for biomarker candidates help illustrate the application and interpretation of the introduced tools.

14.
Sci Rep ; 12(1): 10487, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35729253

ABSTRACT

Protein-protein interactions (PPIs) are crucial for protein function; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. To use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residues. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein's functional properties of interest are only partially annotated.
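
Handling missing data in the cost function usually means masking: residues without an annotation for a given task contribute nothing to that task's loss, while the other tasks still provide a training signal. The sketch below illustrates this with binary tasks only and is an assumption about the general technique, not the authors' model; the function names, task names and toy values are hypothetical.

```python
import math


def masked_bce(predictions, targets):
    """Mean binary cross-entropy over residues whose target is not None."""
    eps = 1e-12
    terms = []
    for p, t in zip(predictions, targets):
        if t is None:
            continue  # missing annotation: skipped, so it adds no loss
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        terms.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(terms) / len(terms) if terms else 0.0


def multitask_loss(task_predictions, task_targets, weights=None):
    """Weighted sum of masked per-task losses.

    task_predictions / task_targets: dicts mapping task name -> per-residue list.
    weights: optional dict of task weights (default 1.0 per task).
    """
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * masked_bce(task_predictions[task], task_targets[task])
        for task in task_predictions
    )


# Made-up protein with structural labels but no PPI interface annotation.
toy_predictions = {"ppi": [0.9, 0.2, 0.7], "buried": [0.8, 0.1, 0.6]}
toy_targets = {"ppi": [None, None, None], "buried": [1, 0, 1]}
toy_loss = multitask_loss(toy_predictions, toy_targets)
```

In the toy example the protein still contributes to training through the `buried` task even though its PPI labels are entirely missing, which is how a dataset extended with structural features alone could help.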


Subject(s)
Algorithms , Proteins , Proteins/metabolism
15.
Bioinformatics ; 38(8): 2111-2118, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35150231

ABSTRACT

MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Software , Amino Acid Sequence , Nucleotides , Computational Biology/methods
16.
Bioinform Adv ; 2(1): vbac002, 2022.
Article in English | MEDLINE | ID: mdl-36699344

ABSTRACT

Summary: Proteins tend to bury hydrophobic residues inside their core during the folding process to provide stability to the protein structure and to prevent aggregation. Nevertheless, proteins do expose some 'sticky' hydrophobic residues to the solvent. These residues can play an important functional role, e.g. in protein-protein and membrane interactions. Here, we first investigate how hydrophobic protein surfaces are by providing three measures for surface hydrophobicity: the total hydrophobic surface area (THSA), the relative hydrophobic surface area (RHSA) and, using our MolPatch method, the largest hydrophobic patch (LHP). Secondly, we analyze how difficult it is to predict these measures from sequence: by adapting solvent accessibility predictions from NetSurfP2.0, we obtain well-performing prediction methods for the THSA and RHSA, while predicting LHP is more challenging. Finally, we analyze implications of exposed hydrophobic surfaces: we show that hydrophobic proteins typically have low expression, suggesting cells avoid an overabundance of sticky proteins. Availability and implementation: The data underlying this article are available in GitHub at https://github.com/ibivu/hydrophobic_patches. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
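
The total and relative hydrophobic surface areas can be illustrated with a residue-level calculation. This is a simplified sketch under stated assumptions, not the authors' definition: it treats a fixed set of residue types as hydrophobic, takes per-residue solvent-accessible areas as given, and defines the relative measure as the hydrophobic share of the total accessible area. The residue set, function name and toy values are hypothetical.

```python
# Assumption: residue types counted as hydrophobic in this sketch.
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_surface(sequence, accessible_area):
    """Return (total, relative) hydrophobic surface area.

    sequence: one-letter amino acid string.
    accessible_area: per-residue solvent-accessible surface area, same length
        as sequence, in any consistent area unit.
    """
    if len(sequence) != len(accessible_area):
        raise ValueError("sequence and accessible_area must have equal length")
    total_area = sum(accessible_area)
    hydrophobic_area = sum(
        area for residue, area in zip(sequence, accessible_area)
        if residue in HYDROPHOBIC
    )
    relative = hydrophobic_area / total_area if total_area > 0 else 0.0
    return hydrophobic_area, relative


# Made-up five-residue example, not real structure data.
toy_total, toy_relative = hydrophobic_surface("ALKDF", [10.0, 30.0, 80.0, 60.0, 20.0])
```

The largest hydrophobic patch is not covered here because it depends on which exposed residues are spatially adjacent, which requires 3D coordinates.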

17.
Bioinformatics ; 37(20): 3421-3427, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33974039

ABSTRACT

MOTIVATION: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen's epitope region, as a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. RESULTS: We collected and curated a high quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody-antigen structure of the COVID-19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input, which will help make the method immediately applicable in a wide range of biomedical and biomolecular research. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

18.
PLoS Comput Biol ; 16(5): e1007767, 2020 05.
Article in English | MEDLINE | ID: mdl-32365068

ABSTRACT

Many proteins have the potential to aggregate into amyloid fibrils, protein polymers associated with a wide range of human disorders such as Alzheimer's and Parkinson's disease. The thermodynamic stability of amyloid fibrils, in contrast to that of folded proteins, is not well understood: the balance between entropic and enthalpic terms, including the chain entropy and the hydrophobic effect, is poorly characterised. Using a combination of theory, in vitro experiments, simulations of a coarse-grained protein model and meta-data analysis, we delineate the enthalpic and entropic contributions that dominate amyloid fibril elongation. Our prediction of a characteristic temperature-dependent enthalpic signature is confirmed by calorimetric experiments and a meta-analysis of published data. From these results we are able to define the conditions necessary to observe cold denaturation of amyloid fibrils. Overall, we show that amyloid fibril elongation is associated with a negative heat capacity, the magnitude of which correlates closely with the hydrophobic surface area that is buried upon fibril formation, highlighting the importance of hydrophobicity for fibril stability.


Subject(s)
Amyloid/chemistry , Amyloid/physiology , Amyloid/metabolism , Amyloid beta-Peptides/chemistry , Amyloid beta-Peptides/physiology , Amyloidogenic Proteins/chemistry , Amyloidogenic Proteins/physiology , Humans , Hydrophobic and Hydrophilic Interactions , Models, Theoretical , Molecular Dynamics Simulation , Protein Denaturation , Protein Folding , Temperature , Thermodynamics
19.
Oral Oncol ; 98: 8-12, 2019 11.
Article in English | MEDLINE | ID: mdl-31521885

ABSTRACT

In this era of information technology, big data analysis is entering the biomedical sciences. But what is big data, where does it come from and what can we do with it? This commentary explains the main sources of big data, especially in (head and neck) oncology. It also touches upon the need to integrate various sources of clinical, pathological and quality-of-life data, and discusses some initiatives to link such datasets on a nationwide scale in the Netherlands. Finally, it touches upon important issues regarding governance, FAIRness of data and the need to put in place the infrastructure required to exploit the full potential of big data sets in head and neck cancer.


Subject(s)
Big Data , Medical Informatics/methods , Medical Oncology , Databases, Factual , Head and Neck Neoplasms/epidemiology , Humans , Information Dissemination , Medical Oncology/methods , Netherlands/epidemiology , Precision Medicine/methods , Quality of Health Care
20.
Bioinformatics ; 35(24): 5315-5317, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31368486

ABSTRACT

SUMMARY: PRALINE 2 is a toolkit for custom multiple sequence alignment workflows. It can be used to incorporate sequence annotations, such as secondary structure or (DNA) motifs, into the alignment scoring, as well as to customize many other aspects of a progressive multiple alignment workflow. AVAILABILITY AND IMPLEMENTATION: PRALINE 2 is implemented in Python and available as open source software on GitHub: https://github.com/ibivu/PRALINE/.


Subject(s)
Software , DNA , Protein Structure, Secondary , Sequence Alignment