Results 1 - 20 of 33
1.
JCO Clin Cancer Inform ; 8: e2400008, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38875514

ABSTRACT

PURPOSE: Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)-based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with clinical and genomic heterogeneities. METHODS: We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction + Hierarchical Density-Based Spatial Clustering of Applications with Noise (UMAP + HDBSCAN) methods, compared with the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and the performance of the clinical models, integrating them into distributed infrastructure. RESULTS: UMAP + HDBSCAN clustering obtained a more granular patient stratification, achieving a higher average silhouette coefficient (0.16) than HDP (0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% and 85.8% ± 0.8%, respectively). AI methods for survival prediction outperformed conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival stood out in both the internal (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02).
SHAP analysis revealed that similar features drove patients' subgroups and outcomes in both training and validation cohorts. Federated implementation improved the accuracy of developed models. CONCLUSION: MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. AI-based approaches demonstrated superior accuracy in capturing genomic similarities and providing individual prognostic information compared with conventional statistical methods. Its federated implementation ensures broad clinical application, guaranteeing high performance and data protection.
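
The abstract compares clusterings by their average silhouette coefficient. As a rough illustration of what that number measures, the following minimal sketch computes it in plain Python for a toy two-dimensional data set. The function names, the Euclidean distance, and the toy points are assumptions made for illustration; they are not taken from the MOSAIC pipeline, which clusters UMAP embeddings with HDBSCAN.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of every point (Euclidean distance).

    Assumes at least two distinct cluster labels are present.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # By convention a singleton cluster gets a silhouette of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores


def average_silhouette(points, labels):
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = average_silhouette(points, [0, 0, 1, 1])  # labels follow the geometry
bad = average_silhouette(points, [0, 1, 0, 1])   # labels cut across it
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.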


Subjects
Artificial Intelligence, Precision Medicine, Humans, Prognosis, Precision Medicine/methods, Female, Rare Diseases/classification, Rare Diseases/genetics, Rare Diseases/diagnosis, Male, Deep Learning, Neoplasms/classification, Neoplasms/genetics, Neoplasms/diagnosis, Myelodysplastic Syndromes/diagnosis, Myelodysplastic Syndromes/classification, Myelodysplastic Syndromes/genetics, Myelodysplastic Syndromes/therapy, Algorithms, Middle Aged, Aged, Cluster Analysis
2.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38754097

ABSTRACT

MOTIVATION: Mutational signatures are a critical component in deciphering the genetic alterations that underlie cancer development and have become a valuable resource to understand the genomic changes during tumorigenesis. Therefore, it is essential to employ precise and accurate methods for their extraction to ensure that the underlying patterns are reliably identified and can be effectively utilized in new strategies for diagnosis, prognosis, and treatment of cancer patients. RESULTS: We present MUSE-XAE, a novel method for mutational signature extraction from cancer genomes using an explainable autoencoder. Our approach employs a hybrid architecture consisting of a nonlinear encoder that can capture nonlinear interactions among features, and a linear decoder which ensures the interpretability of the active signatures. We evaluated and compared MUSE-XAE with other available tools on both synthetic and real cancer datasets and demonstrated that it achieves superior performance in terms of precision and sensitivity in recovering mutational signature profiles. MUSE-XAE extracts highly discriminative mutational signature profiles by enhancing the classification of primary tumour types and subtypes in real world settings. This approach could facilitate further research in this area, with neural networks playing a critical role in advancing our understanding of cancer genomics. AVAILABILITY AND IMPLEMENTATION: MUSE-XAE software is freely available at https://github.com/compbiomed-unito/MUSE-XAE.
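
To make the interpretability argument concrete, here is a toy forward pass of a hybrid autoencoder in plain Python: a nonlinear (softplus) encoder produces non-negative exposures, and a linear decoder reconstructs the mutation catalogue as a weighted sum of signature profiles. This is only a sketch under stated assumptions: the single-layer encoder, the weights, and the four-channel profiles are hypothetical and do not reproduce the MUSE-XAE architecture or its training procedure.

```python
import math


def softplus(x):
    # Smooth nonlinearity that keeps the encoded exposures positive.
    return math.log1p(math.exp(x))


def encode(counts, weights):
    """Nonlinear encoder: one softplus layer mapping a catalogue to exposures."""
    return [softplus(sum(c * w for c, w in zip(counts, row))) for row in weights]


def decode(exposures, signatures):
    """Linear decoder: catalogue = sum over signatures of exposure * profile."""
    n_channels = len(signatures[0])
    return [
        sum(e * sig[k] for e, sig in zip(exposures, signatures))
        for k in range(n_channels)
    ]


# Hypothetical signature profiles over 4 mutation channels (real ones use 96).
signatures = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
# Hypothetical encoder weights, one row per signature.
encoder_weights = [
    [0.1, 0.1, -0.1, -0.1],
    [-0.1, -0.1, 0.1, 0.1],
]

catalogue = [10.0, 12.0, 3.0, 2.0]
exposures = encode(catalogue, encoder_weights)
reconstruction = decode(exposures, signatures)
```

Because the decoder is linear, each decoder row can be read directly as a signature profile and each exposure as the contribution of that signature to the sample, which is the property the abstract refers to as interpretability.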


Subjects
Mutation, Neoplasms, Humans, Neoplasms/genetics, Algorithms, Software, Genomics/methods, Computational Biology/methods, Computer Neural Networks
3.
Gut ; 73(5): 825-834, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38199805

ABSTRACT

OBJECTIVE: Hyperferritinaemia is associated with liver fibrosis severity in patients with metabolic dysfunction-associated steatotic liver disease (MASLD), but the longitudinal implications have not been thoroughly investigated. We assessed the role of serum ferritin in predicting long-term outcomes or death. DESIGN: We evaluated the relationship between baseline serum ferritin and longitudinal events in a multicentre cohort of 1342 patients. Four survival models considering ferritin with confounders or non-invasive scoring systems were applied with repeated five-fold cross-validation schema. Prediction performance was evaluated in terms of Harrell's C-index and its improvement by including ferritin as a covariate. RESULTS: Median follow-up time was 96 months. Liver-related events occurred in 7.7%, hepatocellular carcinoma in 1.9%, cardiovascular events in 10.9%, extrahepatic cancers in 8.3% and all-cause mortality in 5.8%. Hyperferritinaemia was associated with a 50% increased risk of liver-related events and 27% of all-cause mortality. A stepwise increase in baseline ferritin thresholds was associated with a statistical increase in C-index, ranging between 0.02 (lasso-penalised Cox regression) and 0.03 (ridge-penalised Cox regression); the risk of developing liver-related events mainly increased from threshold 215.5 µg/L (median HR=1.71 and C-index=0.71) and the risk of overall mortality from threshold 272 µg/L (median HR=1.49 and C-index=0.70). The inclusion of serum ferritin thresholds (215.5 µg/L and 272 µg/L) in predictive models increased the performance of Fibrosis-4 and Non-Alcoholic Fatty Liver Disease Fibrosis Score in the longitudinal risk assessment of liver-related events (C-indices>0.71) and overall mortality (C-indices>0.65). CONCLUSIONS: This study supports the potential use of serum ferritin values for predicting the long-term prognosis of patients with MASLD.
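
Prediction performance in this study is reported as Harrell's C-index. The following minimal sketch shows how that statistic can be computed for right-censored data in plain Python. The function name and the toy follow-up data are assumptions for illustration, and pairs with tied event times are simply skipped, which is one of several conventions in use.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time of each subject
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a comparable pair needs an observed event at the earlier time
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy follow-up data (months); the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ranking = harrell_c_index(times, events, [0.1, 0.4, 0.7, 0.9])
uninformative = harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to a model that ranks patients no better than chance, so the increases of 0.02 to 0.03 reported above are measured against that scale.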


Subjects
Liver Neoplasms, Metabolic Diseases, Non-alcoholic Fatty Liver Disease, Humans, Non-alcoholic Fatty Liver Disease/pathology, Liver Cirrhosis/pathology, Fibrosis, Liver Neoplasms/complications, Ferritins
4.
Front Oncol ; 13: 1242639, 2023.
Article in English | MEDLINE | ID: mdl-37869094

ABSTRACT

Introduction: Prostate cancer (PCa) is the most frequent tumor among men in Europe and has both indolent and aggressive forms. There are several treatment options, the choice of which depends on multiple factors. To further improve current prognostication models, we established the Turin Prostate Cancer Prognostication (TPCP) cohort, an Italian retrospective biopsy cohort of patients with PCa and long-term follow-up. This work presents this new cohort with its main characteristics and the distributions of some of its core variables, along with its potential contributions to PCa research. Methods: The TPCP cohort includes consecutive non-metastatic patients with first positive biopsy for PCa performed between 2008 and 2013 at the main hospital in Turin, Italy. The follow-up ended on December 31, 2021. The primary outcome is the occurrence of metastasis; death from PCa and overall mortality are the secondary outcomes. In addition to numerous clinical variables, the study's prognostic variables include histopathologic information assigned by a centralized uropathology review using a digital pathology software system specialized for the study of PCa, tumor DNA methylation in candidate genes, and features extracted from digitized slide images via Deep Neural Networks. Results: The cohort includes 891 patients followed up for a median time of 10 years. During this period, 97 patients had progression to metastatic disease and 301 died; of these, 56 died from PCa. In total, 65.3% of the cohort has a Gleason score less than or equal to 3 + 4, and 44.5% has a clinical stage cT1. Consistent with previous studies, age and clinical stage at diagnosis are important prognostic factors: the crude cumulative incidence of metastatic disease during the 14 years of follow-up increases from 9.1% among patients younger than 64 to 16.2% for patients in the age group of 75-84, and from 6.1% for cT1 stage to 27.9% in cT3 stage.
Discussion: This study stands to be an important resource for updating existing prognostic models for PCa on an Italian cohort. In addition, the integrated collection of multi-modal data will allow development and/or validation of new models including new histopathological, digital, and molecular markers, with the goal of better directing clinical decisions to manage patients with PCa.

5.
Adv Radiat Oncol ; 8(5): 101228, 2023.
Article in English | MEDLINE | ID: mdl-37405256

ABSTRACT

Purpose: The objective of this work was to investigate the ability of machine learning models to use treatment plan dosimetry for prediction of clinician approval of treatment plans (no further planning needed) for left-sided whole breast radiation therapy with boost. Methods and Materials: Investigated plans were generated to deliver a dose of 40.05 Gy to the whole breast in 15 fractions over 3 weeks, with the tumor bed simultaneously boosted to 48 Gy. In addition to the manually generated clinical plan of each of the 120 patients from a single institution, an automatically generated plan was included for each patient to enhance the number of study plans to 240. In random order, the treating clinician retrospectively scored all 240 plans as (1) approved without further planning to seek improvement or (2) further planning needed, while being blind for type of plan generation (manual or automated). In total, 2 × 5 classifiers were trained and evaluated for ability to correctly predict the clinician's plan evaluations: random forest (RF) and constrained logistic regression (LR) classifiers, each trained for 5 different sets of dosimetric plan parameters (feature sets [FS]). Importances of included features for predictions were investigated to better understand clinicians' choices. Results: Although all 240 plans were in principle clinically acceptable for the clinician, only for 71.5% was no further planning required. For the most extensive FS, accuracy, area under the receiver operating characteristic curve, and Cohen's κ for generated RF/LR models for prediction of approval without further planning were 87.2 ± 2.0/86.7 ± 2.2, 0.80 ± 0.03/0.86 ± 0.02, and 0.63 ± 0.05/0.69 ± 0.04, respectively. In contrast to LR, RF performance was independent of the applied FS. 
For both RF and LR, whole breast excluding boost PTV (PTV40.05Gy) was the most important structure for predictions, with importance factors of 44.6% and 43%, respectively, with the dose received by 95% of the volume of PTV40.05 (D95%) as the most important parameter in most cases. Conclusions: The investigated use of machine learning to predict clinician approval of treatment plans is highly promising. Including nondosimetric parameters could further increase classifiers' performances. The tool could become useful for aiding treatment planners in generating plans with a high probability of being directly approved by the treating clinician.
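
Agreement between the classifiers and the clinician is reported as Cohen's κ. As a small illustration of how κ corrects raw accuracy for chance agreement, the sketch below computes it in plain Python. The ten plan labels are invented for the example (1 = approved without further planning, 0 = further planning needed) and are not data from the study.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters labelled independently at their own rates.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten plans.
clinician = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
model = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

kappa = cohen_kappa(clinician, model)
```

In this toy case the raw agreement is 80%, but because both raters approve 70% of plans the chance agreement is 58%, so κ is only about 0.52.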

6.
Clin Gastroenterol Hepatol ; 21(13): 3314-3321.e3, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37149016

ABSTRACT

BACKGROUND AND AIMS: Nonalcoholic fatty liver disease (NAFLD) is a complex disease, resulting from the interplay between environmental determinants and genetic variations. Single nucleotide polymorphism rs738409 C>G in the PNPLA3 gene is associated with hepatic fibrosis and with higher risk of developing hepatocellular carcinoma. Here, we analyzed a longitudinal cohort of biopsy-proven NAFLD subjects with the aim to identify individuals in whom genetics may have a stronger impact on disease progression. METHODS: We retrospectively analyzed 756 consecutive, prospectively enrolled biopsy-proven NAFLD subjects from Italy, United Kingdom, and Spain who were followed for a median of 84 months (interquartile range, 65-109 months). We stratified the study cohort according to sex, body mass index (BMI)

Subjects
Hepatocellular Carcinoma, Esophageal and Gastric Varices, Liver Neoplasms, Non-alcoholic Fatty Liver Disease, Humans, Female, Male, Middle Aged, Non-alcoholic Fatty Liver Disease/complications, Non-alcoholic Fatty Liver Disease/genetics, Non-alcoholic Fatty Liver Disease/epidemiology, Hepatocellular Carcinoma/epidemiology, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/complications, Retrospective Studies, Esophageal and Gastric Varices/complications, Gastrointestinal Hemorrhage/complications, Genotype, Single Nucleotide Polymorphism, Liver Neoplasms/epidemiology, Liver Neoplasms/genetics, Liver Neoplasms/complications, Genetic Predisposition to Disease
7.
Environ Int ; 173: 107864, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36913779

ABSTRACT

BACKGROUND: The exposome drivers are less studied than its consequences but may be crucial in identifying population subgroups with unfavourable exposures. OBJECTIVES: We used three approaches to study the socioeconomic position (SEP) as a driver of the early-life exposome in Turin children of the NINFEA cohort (Italy). METHODS: Forty-two environmental exposures, collected at 18 months of age (N = 1989), were classified in 5 groups (lifestyle, diet, meteoclimatic, traffic-related, built environment). We performed cluster analysis to identify subjects sharing similar exposures, and intra-exposome-group Principal Component Analysis (PCA) to reduce the dimensionality. SEP at childbirth was measured through the Equivalised Household Income Indicator. SEP-exposome association was evaluated using: 1) an Exposome Wide Association Study (ExWAS), a one-exposure (SEP) one-outcome (exposome) approach; 2) multinomial regression of cluster membership on SEP; 3) regressions of each intra-exposome-group PC on SEP. RESULTS: In the ExWAS, medium/low SEP children were more exposed to greenness, pet ownership, passive smoking, TV screen and sugar; less exposed to NO2, NOX, PM25abs, humidity, built environment, traffic load, unhealthy food facilities, fruit, vegetables, eggs, grain products, and childcare than high SEP children. Medium/low SEP children were more likely to belong to a cluster with poor diet, less air pollution, and to live in the suburbs than high SEP children. Medium/low SEP children were more exposed to lifestyle PC1 (unhealthy lifestyle) and diet PC2 (unhealthy diet), and less exposed to PC1s of the built environment (urbanization factors), diet (mixed diet), and traffic (air pollution) than high SEP children. CONCLUSIONS: The three approaches provided consistent and complementary results, suggesting that children with lower SEP are less exposed to urbanization factors and more exposed to unhealthy lifestyles and diet. 
The simplest method, the ExWAS, conveys most of the information and is more replicable in other populations. Clustering and PCA may facilitate results interpretation and communication.


Subjects
Air Pollution, Exposome, Humans, Child, Birth Cohort, Environmental Exposure/analysis, Socioeconomic Factors
8.
Comput Methods Programs Biomed ; 226: 107111, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36108572

ABSTRACT

BACKGROUND/AIM: The current availability of large volumes of clinical data has provided medical departments with the opportunity for large-scale analyses, but it has also brought forth the need for an effective strategy of data-storage and data-analysis that is both technically feasible and economically sustainable in the context of limited resources and manpower. Therefore, the aim of this study was to develop a widely-usable data-collection and data-analysis workflow that could be applied in medical departments to perform high-volume relational data analysis on real-time data. METHODS: A sample project, based on a research database on prostate-specific-membrane-antigen/positron-emission-tomography scans performed in prostate cancer patients at our department, was used to develop a new workflow for data-collection and data-analysis. A checklist of requirements for a successful data-collection/analysis strategy, based on shared clinical research experience, was used as reference standard. Software libraries were selected based on widespread availability, reliability, cost, and technical expertise of the research team (REDCap-v11.0.0 for collaborative data-collection, Python-v3.8.5 for data retrieval and SQLite-v3.31.1 for data storage). The primary objective of this study was to develop and implement a workflow to: a) easily store large volumes of structured data into a relational database, b) perform scripted analyses on relational data retrieved in real-time from the database. The secondary objective was to enhance the strategy cost-effectiveness by using open-source/cost-free software libraries. RESULTS: A fully working data strategy was developed and successfully applied to a sample research project. The REDCap platform provided a remote and secure method to collaboratively collect large volumes of standardized relational data, with low technical difficulty and role-based access-control. 
A Python program was written to retrieve live data through the REDCap API and persist them to an SQLite database, preserving data relationships. SQL enabled the retrieval of complex datasets, while Python allowed for scripted data computation and analysis. Only cost-free software libraries were used, and the sample code was made available through a GitHub repository. CONCLUSIONS: A REDCap-based data-collection and data-analysis workflow, suitable for high-volume relational data-analysis on live data, was developed and successfully implemented using open-source software.
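
The persistence and query steps of such a workflow can be sketched with Python's built-in sqlite3 module. The example below is an assumption-laden illustration, not the authors' published code: the table layout, the field names (record_id, age, suv_max), and the hard-coded records standing in for a REDCap API export are all hypothetical, and the API call itself is omitted.

```python
import sqlite3

# Hypothetical records, shaped like the list of dictionaries that a JSON
# export from a data-collection API might return.
patients = [
    {"record_id": "P001", "age": 67},
    {"record_id": "P002", "age": 72},
]
scans = [
    {"record_id": "P001", "suv_max": 8.4},
    {"record_id": "P001", "suv_max": 12.1},
    {"record_id": "P002", "suv_max": 5.0},
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("PRAGMA foreign_keys = ON")  # enforce the patient/scan relationship
conn.executescript(
    """
    CREATE TABLE patient (
        record_id TEXT PRIMARY KEY,
        age INTEGER
    );
    CREATE TABLE scan (
        scan_id INTEGER PRIMARY KEY,
        record_id TEXT NOT NULL REFERENCES patient(record_id),
        suv_max REAL
    );
    """
)
# Named placeholders are filled from each dictionary's keys.
conn.executemany("INSERT INTO patient VALUES (:record_id, :age)", patients)
conn.executemany(
    "INSERT INTO scan (record_id, suv_max) VALUES (:record_id, :suv_max)", scans
)
conn.commit()

# A relational query: number of scans and highest uptake value per patient.
rows = conn.execute(
    """
    SELECT p.record_id, COUNT(s.scan_id), MAX(s.suv_max)
    FROM patient AS p
    LEFT JOIN scan AS s ON s.record_id = p.record_id
    GROUP BY p.record_id
    ORDER BY p.record_id
    """
).fetchall()
conn.close()
```

Keeping one table per repeating instrument, linked by the record identifier, is what lets SQL joins reproduce the one-to-many relationships of the original data collection forms.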


Subjects
Data Analysis, Software, Humans, Workflow, Reproducibility of Results, Factual Databases
9.
Lab Anim (NY) ; 51(7): 191-202, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35726023

ABSTRACT

Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid neoplasm in dogs and in humans. It is characterized by a remarkable degree of clinical heterogeneity that is not completely elucidated by molecular data. This poses a major barrier to understanding the disease and its response to therapy, or when treating dogs with DLBCL within clinical trials. We performed an integrated analysis of exome (n = 77) and RNA sequencing (n = 43) data in a cohort of canine DLBCL to define the genetic landscape of this tumor. A wide range of signaling pathways and cellular processes were found in common with human DLBCL, but the frequencies of the most recurrently mutated genes (TRAF3, SETD2, POT1, TP53, MYC, FBXW7, DDX3X and TBL1XR1) differed. We developed a prognostic model integrating exonic variants and clinical and transcriptomic features to predict the outcome in dogs with DLBCL. These results comprehensively define the genetic drivers of canine DLBCL and can be prospectively utilized to identify new therapeutic opportunities.


Subjects
Diffuse Large B-Cell Lymphoma, Animals, Dogs, Genomics, Humans, Diffuse Large B-Cell Lymphoma/genetics, Diffuse Large B-Cell Lymphoma/therapy, Diffuse Large B-Cell Lymphoma/veterinary, Signal Transduction
10.
Gut ; 71(2): 382-390, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33541866

ABSTRACT

OBJECTIVE: The full phenotypic expression of non-alcoholic fatty liver disease (NAFLD) in lean subjects is incompletely characterised. We aimed to investigate prevalence, characteristics and long-term prognosis of Caucasian lean subjects with NAFLD. DESIGN: The study cohort comprises 1339 biopsy-proven NAFLD subjects from four countries (Italy, UK, Spain and Australia), stratified into lean and non-lean (body mass index (BMI) 10 483 person-years), 4.7% of lean vs 7.7% of non-lean patients reported liver-related events (p=0.37). No difference in survival was observed compared with non-lean NAFLD (p=0.069). CONCLUSIONS: Caucasian lean subjects with NAFLD may progress to advanced liver disease, develop metabolic comorbidities and experience cardiovascular disease (CVD) as well as liver-related mortality, independent of longitudinal progression to obesity and PNPLA3 genotype. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD where the disease manifests at lower overall BMI thresholds. LAY SUMMARY: NAFLD may affect and progress in both obese and lean individuals. Lean subjects are predominantly males, have a younger age at diagnosis and are more prevalent in some geographic areas. During the follow-up, lean subjects can develop hepatic and extrahepatic disease, including metabolic comorbidities, in the absence of weight gain. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD.


Subjects
Non-alcoholic Fatty Liver Disease/complications, Thinness/complications, White People, Adult, Body Mass Index, Cohort Studies, Female, Humans, Male, Middle Aged, Non-alcoholic Fatty Liver Disease/mortality, Non-alcoholic Fatty Liver Disease/pathology, Prognosis, Survival Rate, Thinness/mortality, Thinness/pathology
11.
Front Genet ; 13: 1049501, 2022.
Article in English | MEDLINE | ID: mdl-36685831

ABSTRACT

The high cosine similarity between some single-base substitution mutational signatures and their characteristic flat profiles could suggest the presence of overfitting and mathematical artefacts. The newest version (v3.3) of the signature database available in the Catalogue Of Somatic Mutations In Cancer (COSMIC) provides a collection of 79 mutational signatures, more than double the number in the previous version (30 profiles available in COSMIC signatures v2), which makes the associations between signatures and specific mutagenic processes more critical. This study both provides a systematic assessment of the de novo extraction task through simulation scenarios based on the latest version of the COSMIC signatures and highlights, through a novel approach using archetypal analysis, which COSMIC signatures are redundant and more likely to be considered as mathematical artefacts. Twenty-nine archetypes were able to reconstruct the profile of all the COSMIC signatures with cosine similarity > 0.8. Interestingly, these archetypes tend to group similar original signatures sharing either the same aetiology or similar biological processes. We believe that these findings will be useful to encourage the development of new de novo extraction methods that avoid redundancy of information among the signatures while preserving the biological interpretation.
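
The redundancy argument rests on cosine similarity between signature profiles. The short sketch below shows, in plain Python, why flat profiles are hard to tell apart with this measure. The four-channel profiles are invented toy vectors (real single-base substitution signatures have 96 channels), and the function name is an assumption.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / norm


# Toy profiles over 4 channels; each sums to 1 like a signature.
flat = [0.25, 0.25, 0.25, 0.25]
nearly_flat = [0.30, 0.20, 0.30, 0.20]
peaked = [0.97, 0.01, 0.01, 0.01]

flat_vs_nearly_flat = cosine_similarity(flat, nearly_flat)
flat_vs_peaked = cosine_similarity(flat, peaked)
```

Two visibly different flat-ish profiles already score about 0.98, well above the 0.8 reconstruction threshold quoted above, whereas a flat and a sharply peaked profile score only about 0.52.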

12.
J Hepatol ; 75(4): 786-794, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34090928

ABSTRACT

BACKGROUND & AIMS: Non-invasive scoring systems (NSS) are used to identify patients with non-alcoholic fatty liver disease (NAFLD) who are at risk of advanced fibrosis, but their reliability in predicting long-term outcomes for hepatic/extrahepatic complications or death and their concordance in cross-sectional and longitudinal risk stratification remain uncertain. METHODS: The most common NSS (NFS, FIB-4, BARD, APRI) and the Hepamet fibrosis score (HFS) were assessed in 1,173 European patients with NAFLD from tertiary centres. Performance for fibrosis risk stratification and for the prediction of long-term hepatic/extrahepatic events, hepatocarcinoma (HCC) and overall mortality were evaluated in terms of AUC and Harrell's c-index. For longitudinal data, NSS-based Cox proportional hazard models were trained on the whole cohort with repeated 5-fold cross-validation, sampling for testing from the 607 patients with all NSS available. RESULTS: Cross-sectional analysis revealed HFS as the best performer for the identification of significant (F0-1 vs. F2-4, AUC = 0.758) and advanced (F0-2 vs. F3-4, AUC = 0.805) fibrosis, while NFS and FIB-4 showed the best performance for detecting histological cirrhosis (range AUCs 0.85-0.88). Considering longitudinal data (follow-up between 62 and 110 months), NFS and FIB-4 were the best at predicting liver-related events (c-indices>0.7), NFS for HCC (c-index = 0.9 on average), and FIB-4 and HFS for overall mortality (c-indices >0.8). All NSS showed limited performance (c-indices <0.7) for extrahepatic events. CONCLUSIONS: Overall, NFS, HFS and FIB-4 outperformed APRI and BARD for both cross-sectional identification of fibrosis and prediction of long-term outcomes, confirming that they are useful tools for the clinical management of patients with NAFLD at increased risk of fibrosis and liver-related complications or death. 
LAY SUMMARY: Non-invasive scoring systems are increasingly being used in patients with non-alcoholic fatty liver disease to identify those at risk of advanced fibrosis and hence clinical complications. Herein, we compared various non-invasive scoring systems and identified those that were best at identifying risk, as well as those that were best for the prediction of long-term outcomes, such as liver-related events, liver cancer and death.
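
The cross-sectional comparisons above are expressed as AUC. One way to read that number is as the probability that a randomly chosen patient with the condition scores higher than a randomly chosen patient without it. The plain-Python sketch below computes it that way; the function name and the four toy scores are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = advanced fibrosis on biopsy, scores from a hypothetical NSS.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)
```

Here three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.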


Subjects
Non-alcoholic Fatty Liver Disease/complications, Predictive Value of Tests, Research Design/standards, Time, Adult, Area Under Curve, Cross-Sectional Studies, Female, Humans, Liver/pathology, Male, Middle Aged, Non-alcoholic Fatty Liver Disease/mortality, Prognosis, ROC Curve, Reproducibility of Results, Research Design/trends, Severity of Illness Index
13.
BMC Biol ; 19(1): 3, 2021 Jan 13.
Article in English | MEDLINE | ID: mdl-33441128

ABSTRACT

BACKGROUND: Identifying variants that drive tumor progression (driver variants) and distinguishing these from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step for understanding tumorigenesis and precision oncology. Various bioinformatics methods have attempted to solve this complex task. RESULTS: In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias which prevents the machine learning (ML) methods from actually learning variant-level functional effects, despite their excellent performance. This effect results from the fact that in these data sets, the driver variants map to a few driver genes, while the passenger variants spread across thousands of genes, and thus just learning to recognize driver genes provides almost perfect predictions. CONCLUSIONS: To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that all genes covered by the data contain both driver and passenger variants. As a result, we show that the tested predictors experience a significant drop in performance, which should not be considered as poorer modeling, but rather as correcting unwarranted optimism. Finally, we propose a weighting procedure to completely eliminate the gene effects on such predictions, thus precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that indeed this task is still open.
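
The construction bias described here can be reproduced with a deliberately naive baseline that looks only at the gene a variant falls in. The sketch below is a toy illustration in plain Python: the gene names and labels are placeholders invented for the example, and the "predictor" is a majority vote per gene, not any of the published methods evaluated in the study.

```python
from collections import Counter, defaultdict


def gene_only_predictor(train):
    """Build a predictor that ignores the variant and votes by gene.

    train -- list of (gene, is_driver) pairs
    """
    votes = defaultdict(Counter)
    for gene, is_driver in train:
        votes[gene][is_driver] += 1

    def predict(gene):
        if gene not in votes:
            return False  # unseen genes default to "passenger"
        return votes[gene].most_common(1)[0][0]

    return predict


def accuracy(predict, data):
    return sum(predict(gene) == label for gene, label in data) / len(data)


# Biased data: drivers sit in a few genes, passengers are spread over others.
biased_train = [
    ("GENE_D1", True), ("GENE_D1", True), ("GENE_D2", True),
    ("GENE_P1", False), ("GENE_P2", False), ("GENE_P3", False),
]
biased_test = [
    ("GENE_D1", True), ("GENE_D2", True),
    ("GENE_P1", False), ("GENE_P4", False),
]

# Balanced data: every gene carries both a driver and a passenger variant.
balanced_train = [
    ("GENE_D1", True), ("GENE_D1", False),
    ("GENE_D2", True), ("GENE_D2", False),
]
balanced_test = list(balanced_train)

biased_accuracy = accuracy(gene_only_predictor(biased_train), biased_test)
balanced_accuracy = accuracy(gene_only_predictor(balanced_train), balanced_test)
```

On the biased toy set the gene-only baseline is perfect without inspecting a single variant, while on the balanced set it cannot exceed chance, which mirrors the drop in performance the authors report.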


Subjects
Carcinogenesis/genetics, Disease Progression, Machine Learning, Medical Oncology/instrumentation, Neoplasms/genetics, Precision Medicine/instrumentation, Neoplasms/pathology
14.
Comput Struct Biotechnol J ; 18: 1968-1979, 2020.
Article in English | MEDLINE | ID: mdl-32774791

ABSTRACT

Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
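
The anti-symmetry mentioned in point 3 means that the stability change of a mutation and of its reverse should cancel: ΔΔG(A→B) = −ΔΔG(B→A). One simple way to check whether a predictor respects this is to average the sum of its direct and inverse predictions, as in the plain-Python sketch below. The function name and the prediction values (in kcal/mol) are hypothetical and serve only to show the calculation.

```python
def antisymmetry_bias(direct, inverse):
    """Mean of (direct + inverse) / 2 over paired predictions.

    direct  -- predicted ddG for each mutation A -> B
    inverse -- predicted ddG for the reverse mutation B -> A, same order
    A perfectly anti-symmetric predictor gives 0.
    """
    if len(direct) != len(inverse):
        raise ValueError("direct and inverse predictions must be paired")
    return sum(d + i for d, i in zip(direct, inverse)) / (2 * len(direct))


# Hypothetical predictions for three mutations.
direct = [1.2, -0.5, 2.0]

# An anti-symmetric predictor flips the sign for the reverse mutation.
consistent_inverse = [-1.2, 0.5, -2.0]
# A biased predictor tends to call both directions destabilizing.
biased_inverse = [-0.2, 1.0, -0.6]

consistent_bias = antisymmetry_bias(direct, consistent_inverse)
skewed_bias = antisymmetry_bias(direct, biased_inverse)
```

A bias far from zero indicates that the method has learned the over-representation of destabilizing mutations in the training data rather than the underlying thermodynamics.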

15.
PLoS Comput Biol ; 16(4): e1007722, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32352965

ABSTRACT

Protein solubility is a key aspect for many biotechnological, biomedical and industrial processes, such as the production of active proteins and antibodies. In addition, understanding the molecular determinants of the solubility of proteins may be crucial to shed light on the molecular mechanisms of diseases caused by aggregation processes such as amyloidosis. Here we present SKADE, a novel Neural Network protein solubility predictor, and we show how it can provide novel insight into the protein solubility mechanisms, thanks to its neural attention architecture. First, we show that SKADE compares favourably with state-of-the-art tools while using just the protein sequence as input. Then, thanks to the neural attention mechanism, we use SKADE to investigate the patterns learned during training and we analyse its decision process. We use this property to show that, while the attention profiles do not correlate with obvious sequence aspects such as biophysical properties of the amino acids, they suggest that N- and C-termini are the most relevant regions for solubility prediction and are predictive for complex emergent properties such as aggregation-prone regions involved in beta-amyloidosis and contact density. Moreover, SKADE is able to identify mutations that increase or decrease the overall solubility of the protein, allowing it to be used to perform large-scale in-silico mutagenesis of proteins in order to maximize their solubility.


Subjects
Computational Biology/methods, Nerve Net/physiology, Solubility, Algorithms, Amino Acid Sequence/physiology, Amino Acids, Animals, Computer Simulation, Humans, Molecular Models, Protein Conformation, Proteins/chemistry, Proteins/metabolism, Software
16.
Hum Mutat ; 40(9): 1392-1399, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31209948

ABSTRACT

Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions in FXN with Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔGH2O). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the ΔΔGH2O value associated with the variants and to classify them as destabilizing or not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.


Subjects
Amino Acid Substitution, Iron-Binding Proteins/chemistry, Iron-Binding Proteins/genetics, Algorithms, Circular Dichroism, Humans, Molecular Models, Protein Conformation, Protein Folding, Protein Stability, Frataxin
17.
Vet Comp Oncol ; 17(3): 308-316, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30805995

ABSTRACT

Canine malignant melanoma (MM) is a highly aggressive tumour with a low survival rate and represents an ideal spontaneous model for its human counterpart. Considerable progress has recently been made, but achieving therapeutic success in canine melanoma remains challenging. Little is known about the mechanisms behind pathogenesis and melanoma development, and the molecular response to radiotherapy has never been explored before. A faster and deeper understanding of cancer mutational processes and developmental mechanisms is now possible through next-generation sequencing technologies. In this study, we matched whole-exome and transcriptome sequencing in four dogs affected by MM, at diagnosis and at disease progression, to identify possible genetic mechanisms associated with therapy failure. In agreement with previous studies, a genetic similarity between canine MM and its human counterpart was observed. Several somatic mutations were functionally related to the MAPK, PI3K/AKT and p53 signalling pathways, but were located in genes other than BRAF, RAS and KIT. At disease progression, several mutations were related to therapy effects. Natural killer cell-mediated cytotoxicity and several immune-system-related pathways were activated, opening a new scenario on the microenvironment in this tumour. In conclusion, this study suggests a potential role of the immune system associated with radiotherapy in canine melanoma, but a larger sample size combined with functional studies is needed.


Subjects
Dog Diseases/radiotherapy, Melanoma/veterinary, Transcriptome/radiation effects, Animals, Base Sequence, Chromosome Aberrations, Dogs, Female, Gene Expression Profiling, Neoplastic Gene Expression Regulation/radiation effects, Male, Melanoma/radiotherapy, Mutation
18.
Anal Bioanal Chem ; 410(12): 2949-2959, 2018 May.
Article in English | MEDLINE | ID: mdl-29532191

ABSTRACT

Surface active maghemite nanoparticles (SAMNs) are able to recognize and bind selected proteins in complex biological systems, forming a hard protein corona. Upon a 5-min incubation in bovine whey from mastitis-affected cows, a significant enrichment was observed of a single peptide with a molecular weight of 4338 Da, originating from the proteolysis of αS1-casein. Notably, among the large number of macromolecules in bovine milk, the detection of this specific peptide can hardly be accomplished by conventional analytical techniques. The selective formation of a stable bond between the peptide and SAMNs is due to the stability gained by adsorption-induced surface restructuring of the nanomaterial. We attributed the surface recognition properties of SAMNs to the chelation of iron(III) sites on their surface by sterically compatible carboxylic groups of the peptide. The specific peptide recognition by SAMNs allows its easy determination by MALDI-TOF mass spectrometry, and a threshold value of its normalized peak intensity was identified by a logistic regression approach and suggested for the rapid diagnosis of the pathology. Thus, the present report proposes the analysis of the hard protein corona on nanomaterials as a basis for developing fast analytical procedures for the diagnosis of mastitis in cows. Moreover, the huge simplification of proteome complexity obtained by exploiting the selectivity of the peculiar SAMN surface topography, which is due to the iron(III) distribution pattern, could be of general interest, leading to competitive applications in food science and in biomedicine and allowing the rapid determination of hidden biomarkers by a cutting-edge diagnostic strategy.
Graphical abstract: The topography of iron(III) sites on surface active maghemite nanoparticles (SAMNs) allows the recognition of sterically compatible carboxylic groups on proteins and peptides in complex biological matrices. The analysis of the hard protein corona on SAMNs led to the determination of a biomarker for cow mastitis in milk by MALDI-TOF mass spectrometry.
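How a logistic regression model yields a diagnostic threshold on a single predictor can be sketched as follows. The intercept and slope are hypothetical placeholders, not the coefficients fitted in the study, and the choice of a 0.5 probability cutoff is likewise an assumption made for illustration.

```python
import math


def mastitis_probability(intensity: float, intercept: float, slope: float) -> float:
    """Logistic model: P(mastitis | normalized peak intensity)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * intensity)))


def intensity_threshold(intercept: float, slope: float, probability: float = 0.5) -> float:
    """Intensity at which the model's predicted probability equals `probability`.
    Solves intercept + slope * x = logit(probability) for x."""
    logit = math.log(probability / (1.0 - probability))
    return (logit - intercept) / slope


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
INTERCEPT, SLOPE = -4.0, 10.0

threshold = intensity_threshold(INTERCEPT, SLOPE)
```

With these placeholder coefficients the cutoff is (0 − (−4)) / 10 = 0.4: samples whose normalized peak intensity exceeds 0.4 would be assigned a probability above 0.5.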


Subjects
Ferric Compounds/chemistry, Bovine Mastitis/diagnosis, Milk Proteins/analysis, Nanoparticles/chemistry, Protein Corona/analysis, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, Whey/chemistry, Amino Acid Sequence, Animals, Biomarkers/analysis, Cattle, Female, Milk/chemistry, Molecular Models, Peptides/analysis, Proteomics/methods
19.
Hum Mutat ; 38(9): 1064-1071, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28102005

ABSTRACT

SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) with disease, taking protein functional annotation into account. The method is a binary classifier that implements a support vector machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from the protein sequence with functional annotation encoded by Gene Ontology (GO) terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve of 0.88, with a low false-positive rate. In almost all editions of the Critical Assessment of Genome Interpretation (CAGI) experiments, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper, we summarize the best results obtained by SNPs&GO on disease-related variations in four CAGI challenges involving the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013), and NAGLU (CAGI 2016). The evaluation of these results provides insights into the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs.
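The two evaluation measures quoted above, overall accuracy and area under the ROC curve, can be computed for any binary classifier that outputs a score. The sketch below is generic and makes no claim about how SNPs&GO was evaluated in practice; the labels, scores and the 0.5 cutoff are invented for illustration.

```python
def accuracy(labels, scores, cutoff=0.5):
    """Fraction of variants whose thresholded score matches the true label."""
    predictions = [1 if score >= cutoff else 0 for score in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); equivalent to the Mann-Whitney U statistic."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = disease-related, 0 = neutral.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]

example_accuracy = accuracy(example_labels, example_scores)
example_auc = roc_auc(example_labels, example_scores)
```

In this toy example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, while only two of the four thresholded predictions are right, so the accuracy is 0.5. The difference shows why the two measures are usually reported together.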


Subjects
Amino Acid Substitution, Checkpoint Kinase 2/genetics, Computational Biology/methods, Cyclin-Dependent Kinase Inhibitor p16/genetics, DNA Repair Enzymes/genetics, DNA-Binding Proteins/genetics, alpha-N-Acetylgalactosaminidase/genetics, Acid Anhydride Hydrolases, Algorithms, Gene Ontology, Genetic Predisposition to Disease, Humans, Molecular Sequence Annotation, ROC Curve, Support Vector Machine
20.
Bioinformatics ; 31(20): 3269-75, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26079349

ABSTRACT

MOTIVATION: Molecular recognition of N-terminal targeting peptides is the most common mechanism controlling the import of nuclear-encoded proteins into mitochondria and chloroplasts. When experimental information is lacking, computational methods can annotate targeting peptides and determine their cleavage sites in order to characterize protein localization, function, and mature protein sequences. The problem of discriminating mitochondrial from chloroplastic propeptides is particularly relevant when annotating the proteomes of photosynthetic eukaryotes, which are endowed with both types of sequences. RESULTS: Here, we introduce TPpred3, a computational method that, given any eukaryotic protein sequence, performs three different tasks: (i) the detection of targeting peptides; (ii) their classification as mitochondrial or chloroplastic; and (iii) the precise localization of the cleavage sites in an organelle-specific framework. Our implementation builds on our previously introduced TPpred. Here, we integrate a new N-to-1 Extreme Learning Machine specifically designed for the classification task (ii). For the last task, we introduce an organelle-specific Support Vector Machine that exploits sequence motifs retrieved with an extensive motif-discovery analysis of a large set of mitochondrial and chloroplastic proteins. We show that TPpred3 outperforms state-of-the-art methods in all three tasks. AVAILABILITY AND IMPLEMENTATION: The method server and datasets are available at http://tppred3.biocomp.unibo.it. CONTACT: gigi@biocomp.unibo.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chloroplast Proteins/chemistry, Machine Learning, Mitochondrial Proteins/chemistry, Protein Sequence Analysis/methods, Chloroplast Proteins/metabolism, Chloroplasts/metabolism, Eukaryota/metabolism, Mitochondria/metabolism, Mitochondrial Proteins/metabolism, Peptides/chemistry, Protein Transport, Software, Support Vector Machine