Results 1 - 20 of 135
1.
Sci Rep ; 14(1): 13188, 2024 06 08.
Article in English | MEDLINE | ID: mdl-38851759

ABSTRACT

Genome interpretation (GI) encompasses the computational attempts to model the relationship between genotype and phenotype, with the goal of understanding how the first leads to the second. While traditional approaches have focused on sub-problems such as predicting the effect of single nucleotide variants or finding genetic associations, recent advances in neural networks (NNs) have made it possible to develop end-to-end GI models that take genomic data as input and predict phenotypes as output. However, technical and modeling issues still need to be fixed for these models to be effective, including the widespread underdetermination of genomic datasets, which makes them unsuitable for training large, overfitting-prone NNs. Here we propose novel GI models to address this issue, exploring the use of two types of transfer learning approaches and proposing a novel Biologically Meaningful Sparse NN layer specifically designed for end-to-end GI. Our models predict the leaf and seed ionome in A. thaliana, obtaining results comparable to those of our previous over-parameterized model while reducing the number of parameters 8.8-fold. We also investigate how population stratification influences the evaluation of model performance, highlighting how it leads to (1) an instance of Simpson's paradox, and (2) model generalization limitations.


Subject(s)
Arabidopsis , Genome, Plant , Plant Leaves , Seeds , Arabidopsis/genetics , Plant Leaves/genetics , Plant Leaves/metabolism , Seeds/genetics , Seeds/metabolism , Neural Networks, Computer , Genomics/methods , Phenotype , Models, Genetic , Genotype
2.
JCO Clin Cancer Inform ; 8: e2400008, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38875514

ABSTRACT

PURPOSE: Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)-based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with clinical and genomic heterogeneities. METHODS: We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction + Hierarchical Density-Based Spatial Clustering of Applications with Noise (UMAP + HDBSCAN) methods, compared with the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and the performance of the clinical models, integrating them into distributed infrastructure. RESULTS: UMAP + HDBSCAN clustering obtained a more granular patient stratification, achieving a higher average silhouette coefficient (0.16) with respect to HDP (0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% vs 85.8% ± 0.8%). AI methods for survival prediction outperform conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival stands out in the internal (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02).
SHAP analysis revealed that similar features drove patients' subgroups and outcomes in both training and validation cohorts. Federated implementation improved the accuracy of developed models. CONCLUSION: MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. AI-based approaches demonstrated superior accuracy in capturing genomic similarities and providing individual prognostic information compared with conventional statistical methods. Its federated implementation ensures broad clinical application, guaranteeing high performance and data protection.


Subject(s)
Artificial Intelligence , Precision Medicine , Humans , Prognosis , Precision Medicine/methods , Female , Rare Diseases/classification , Rare Diseases/genetics , Rare Diseases/diagnosis , Male , Deep Learning , Neoplasms/classification , Neoplasms/genetics , Neoplasms/diagnosis , Myelodysplastic Syndromes/diagnosis , Myelodysplastic Syndromes/classification , Myelodysplastic Syndromes/genetics , Myelodysplastic Syndromes/therapy , Algorithms , Middle Aged , Aged , Cluster Analysis
3.
medRxiv ; 2024 May 07.
Article in English | MEDLINE | ID: mdl-38766179

ABSTRACT

Genetic variants in the genes GRIN1, GRIN2A, GRIN2B, and GRIN2D, which encode subunits of the N-methyl-D-aspartate receptor (NMDAR), have been associated with severe and heterogeneous neurologic diseases. Missense variants in these genes can result in gain or loss of NMDAR function, requiring opposite therapeutic treatments. Computational methods that predict pathogenicity and molecular functional effects are therefore crucial for accurate diagnosis and therapeutic applications. We assembled missense variants: 201 from patients, 631 from the general population, and 159 characterized by electrophysiological readouts showing whether they enhance or reduce receptor function. This includes new functional data from 47 variants reported here for the first time. We found that pathogenic/benign variants and variants that increase/decrease channel function were distributed unevenly on the protein structure, with spatial proximity to ligands bound to the agonist and antagonist binding sites being key predictive features. Leveraging distances from ligands, we developed two independent machine learning-based predictors for NMDAR missense variants: a pathogenicity predictor which outperforms currently available predictors (AUC=0.945, MCC=0.726), and the first binary predictor of molecular function (increase or decrease) (AUC=0.809, MCC=0.523). Using these, we reclassified variants of uncertain significance in the ClinVar database and refined a previous genome-informed epidemiological model to estimate the birth incidence of molecular mechanism-defined GRIN disorders. Our findings demonstrate that distance from ligands is an important feature in NMDARs that can enhance variant pathogenicity prediction and enable functional prediction. Further studies with larger numbers of phenotypically and functionally characterized variants will enhance the potential clinical utility of this method.
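
The abstract above reports the Matthews correlation coefficient (MCC) for its binary predictors. As a minimal sketch of how that metric is defined, the function below computes MCC from the four cells of a confusion matrix. The function name and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal is empty and MCC is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only (hypothetical, not data from the abstract).
example = mcc(tp=90, tn=85, fp=15, fn=10)
```

MCC ranges from -1 (total disagreement) to +1 (perfect prediction) and, unlike accuracy, stays informative when the two classes are imbalanced.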

4.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38754097

ABSTRACT

MOTIVATION: Mutational signatures are a critical component in deciphering the genetic alterations that underlie cancer development and have become a valuable resource to understand the genomic changes during tumorigenesis. Therefore, it is essential to employ precise and accurate methods for their extraction to ensure that the underlying patterns are reliably identified and can be effectively utilized in new strategies for diagnosis, prognosis, and treatment of cancer patients. RESULTS: We present MUSE-XAE, a novel method for mutational signature extraction from cancer genomes using an explainable autoencoder. Our approach employs a hybrid architecture consisting of a nonlinear encoder that can capture nonlinear interactions among features, and a linear decoder which ensures the interpretability of the active signatures. We evaluated and compared MUSE-XAE with other available tools on both synthetic and real cancer datasets and demonstrated that it achieves superior performance in terms of precision and sensitivity in recovering mutational signature profiles. MUSE-XAE extracts highly discriminative mutational signature profiles by enhancing the classification of primary tumour types and subtypes in real world settings. This approach could facilitate further research in this area, with neural networks playing a critical role in advancing our understanding of cancer genomics. AVAILABILITY AND IMPLEMENTATION: MUSE-XAE software is freely available at https://github.com/compbiomed-unito/MUSE-XAE.


Subject(s)
Mutation , Neoplasms , Humans , Neoplasms/genetics , Algorithms , Software , Genomics/methods , Computational Biology/methods , Neural Networks, Computer
5.
Int J Cardiol ; 405: 131933, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38437950

ABSTRACT

BACKGROUND: The impact of statin therapy on cardiovascular outcomes after ST-elevation acute myocardial infarction (STEMI) in real-world patients is understudied. AIMS: To identify predictors of low adherence to and discontinuation of statin therapy within 6 months after STEMI and to estimate their impact on cardiovascular outcomes at 1-year follow-up. METHODS: We evaluated real-world adherence to statin therapy by comparing the number of tablets bought with the number expected at 1-year follow-up, using pharmacy registries. A total of 6043 STEMI patients admitted from 2012 to 2017 were enrolled in the FAST STEMI registry and followed up for 4.7 ± 1.6 years; 304 patients with intraprocedural and intrahospital deaths were excluded. The main outcomes evaluated were all-cause death, cardiovascular death, myocardial infarction, major and minor bleeding events, and ischemic stroke. The compliance cut-off chosen was 80%, as mainly reported in the literature. RESULTS: From a total of 5744 patients, 418 (7.2%) patients interrupted statin therapy within 6 months after STEMI, whereas 3337 (58.1%) presented >80% adherence to statin therapy. Optimal statin adherence (>80%) was a protective factor against both cardiovascular (0.1% vs 4.6%; AdjHR 0.025, 95%CI 0.008-0.079, p < 0.001) and all-cause mortality (0.3% vs 13.4%; AdjHR 0.032, 95%CI 0.018-0.059, p < 0.001) at 1-year follow-up. Further, a significant reduction in ischemic stroke incidence (1% vs 2.5%, p = 0.001) was seen in the optimally adherent group. Statin discontinuation within 6 months after STEMI was associated with an increase in both cardiovascular (5% vs 1.7%; AdjHR 2.23; 95%CI 1.37-3.65; p = 0.001) and all-cause mortality (14.8% vs 5.1%; AdjHR 2.32; 95%CI 1.73-3.11; p < 0.001) at 1-year follow-up. After multivariate analysis, age over 75 years, known ischemic cardiopathy and female gender were predictors of therapy discontinuation. 
Age over 75 years, chronic kidney disease, previous atrial fibrillation, vasculopathy, and known ischemic cardiopathy were found to be predictors of low statin adherence. CONCLUSIONS: In our real-world registry, low statin adherence and therapy discontinuation within 6 months after STEMI were independently associated with an increase in cardiovascular and all-cause mortality at 1-year follow-up. Low statin adherence led to higher rates of ischemic stroke.
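
The adherence measure in this abstract is the ratio of tablets bought to tablets expected over the follow-up, with a cut-off above 80% for optimal adherence. The sketch below shows that calculation under stated assumptions: the function names are hypothetical, and the one-tablet-per-day schedule in the example (365 expected tablets) is an assumption for illustration rather than a detail given in the abstract.

```python
def adherence_ratio(tablets_bought: int, tablets_expected: int) -> float:
    """Fraction of the expected tablets that the patient actually bought."""
    if tablets_expected <= 0:
        raise ValueError("tablets_expected must be positive")
    return tablets_bought / tablets_expected


def is_optimally_adherent(
    tablets_bought: int, tablets_expected: int, cutoff: float = 0.80
) -> bool:
    """True when adherence is strictly above the cut-off (>80% in the abstract)."""
    return adherence_ratio(tablets_bought, tablets_expected) > cutoff


# Hypothetical patient on an assumed one-tablet-per-day schedule for one year.
ratio = adherence_ratio(tablets_bought=300, tablets_expected=365)
```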


Subject(s)
Hydroxymethylglutaryl-CoA Reductase Inhibitors , Medication Adherence , Registries , ST Elevation Myocardial Infarction , Humans , ST Elevation Myocardial Infarction/drug therapy , ST Elevation Myocardial Infarction/mortality , Male , Hydroxymethylglutaryl-CoA Reductase Inhibitors/administration & dosage , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Female , Medication Adherence/statistics & numerical data , Aged , Middle Aged , Follow-Up Studies , Time Factors , Treatment Outcome
6.
Comput Biol Med ; 172: 108288, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38503094

ABSTRACT

Data sharing among different institutions represents one of the major challenges in developing distributed machine learning approaches, especially when data is sensitive, such as in medical applications. Federated learning is a possible solution, but requires fast communications and flawless security. Here, we propose SYNDSURV (SYNthetic Distributed SURVival), an alternative approach that simplifies the current state-of-the-art paradigm by allowing different centres to generate local simulated instances from real data and then gather them into a centralised hub, where an Artificial Intelligence (AI) model can learn in a standard way. The main advantage of this procedure is that it is model-agnostic, therefore prediction models can be directly applied in distributed applications without requiring particular adaptations as the current federated approaches do. To show the validity of our approach for medical applications, we tested it on a survival analysis task, offering a viable alternative to train AI models on distributed data. While federated learning has been mainly optimised for gradient-based approaches so far, our framework works with any predictive method, proving to be a comparable way of performing distributed learning without being too demanding towards each participating institute in terms of infrastructural requirements.


Subject(s)
Artificial Intelligence , Machine Learning , Survival Analysis
7.
Gut ; 73(5): 825-834, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38199805

ABSTRACT

OBJECTIVE: Hyperferritinaemia is associated with liver fibrosis severity in patients with metabolic dysfunction-associated steatotic liver disease (MASLD), but the longitudinal implications have not been thoroughly investigated. We assessed the role of serum ferritin in predicting long-term outcomes or death. DESIGN: We evaluated the relationship between baseline serum ferritin and longitudinal events in a multicentre cohort of 1342 patients. Four survival models considering ferritin with confounders or non-invasive scoring systems were applied with repeated five-fold cross-validation schema. Prediction performance was evaluated in terms of Harrell's C-index and its improvement by including ferritin as a covariate. RESULTS: Median follow-up time was 96 months. Liver-related events occurred in 7.7%, hepatocellular carcinoma in 1.9%, cardiovascular events in 10.9%, extrahepatic cancers in 8.3% and all-cause mortality in 5.8%. Hyperferritinaemia was associated with a 50% increased risk of liver-related events and 27% of all-cause mortality. A stepwise increase in baseline ferritin thresholds was associated with a statistical increase in C-index, ranging between 0.02 (lasso-penalised Cox regression) and 0.03 (ridge-penalised Cox regression); the risk of developing liver-related events mainly increased from threshold 215.5 µg/L (median HR=1.71 and C-index=0.71) and the risk of overall mortality from threshold 272 µg/L (median HR=1.49 and C-index=0.70). The inclusion of serum ferritin thresholds (215.5 µg/L and 272 µg/L) in predictive models increased the performance of Fibrosis-4 and Non-Alcoholic Fatty Liver Disease Fibrosis Score in the longitudinal risk assessment of liver-related events (C-indices>0.71) and overall mortality (C-indices>0.65). CONCLUSIONS: This study supports the potential use of serum ferritin values for predicting the long-term prognosis of patients with MASLD.
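
Prediction performance in this abstract is reported as Harrell's C-index. As a rough sketch of what that statistic measures, the function below counts, among comparable patient pairs, how often the patient with the shorter observed time was assigned the higher risk score. It is a simplified O(n²) version that skips pairs with tied times; the function name and the toy data are illustrative assumptions, not the study's implementation or data.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored data.

    times: observed follow-up times; events: 1 if the event occurred,
    0 if censored; risk_scores: higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): 4 of the 5 comparable pairs are ordered correctly.
toy_c = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported improvements of 0.02 to 0.03 in context.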


Subject(s)
Liver Neoplasms , Metabolic Diseases , Non-alcoholic Fatty Liver Disease , Humans , Non-alcoholic Fatty Liver Disease/pathology , Liver Cirrhosis/pathology , Fibrosis , Liver Neoplasms/complications , Ferritins
8.
Genes (Basel) ; 14(12)2023 12 17.
Article in English | MEDLINE | ID: mdl-38137050

ABSTRACT

Missense variation in genomes can affect protein structure stability and, in turn, cell physiology. Predicting the impact of those variations is relevant, and the best-performing computational tools exploit protein structure information. However, most protein sequence variants currently lack an experimentally resolved structure, in which case comparative or ab initio tools can provide one. Here, we evaluate the impact of model structures, compared to experimental structures, on the predictors of protein stability changes upon single-point mutations, where no significant changes are expected between the original and the mutated structures. We show that there are substantial differences among the computational tools. Methods that rely on coarse-grained representation are less sensitive to the underlying protein structures. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated from comparative modeling, even on single-residue substitutions.


Subject(s)
Computational Biology , Point Mutation , Computational Biology/methods , Proteins/metabolism , Protein Stability , Amino Acid Sequence
9.
Front Oncol ; 13: 1242639, 2023.
Article in English | MEDLINE | ID: mdl-37869094

ABSTRACT

Introduction: Prostate cancer (PCa) is the most frequent tumor among men in Europe and has both indolent and aggressive forms. There are several treatment options, the choice of which depends on multiple factors. To further improve current prognostication models, we established the Turin Prostate Cancer Prognostication (TPCP) cohort, an Italian retrospective biopsy cohort of patients with PCa and long-term follow-up. This work presents this new cohort with its main characteristics and the distributions of some of its core variables, along with its potential contributions to PCa research. Methods: The TPCP cohort includes consecutive non-metastatic patients with first positive biopsy for PCa performed between 2008 and 2013 at the main hospital in Turin, Italy. The follow-up ended on December 31, 2021. The primary outcome is the occurrence of metastasis; death from PCa and overall mortality are the secondary outcomes. In addition to numerous clinical variables, the study's prognostic variables include histopathologic information assigned by a centralized uropathology review using a digital pathology software system specialized for the study of PCa, tumor DNA methylation in candidate genes, and features extracted from digitized slide images via Deep Neural Networks. Results: The cohort includes 891 patients followed up for a median time of 10 years. During this period, 97 patients had progression to metastatic disease and 301 died; of these, 56 died from PCa. In total, 65.3% of the cohort has a Gleason score less than or equal to 3 + 4, and 44.5% has a clinical stage cT1. Consistent with previous studies, age and clinical stage at diagnosis are important prognostic factors: the crude cumulative incidence of metastatic disease during the 14 years of follow-up increases from 9.1% among patients younger than 64 to 16.2% for patients in the age group of 75-84, and from 6.1% for cT1 stage to 27.9% in cT3 stage.
Discussion: This study stands to be an important resource for updating existing prognostic models for PCa on an Italian cohort. In addition, the integrated collection of multi-modal data will allow development and/or validation of new models including new histopathological, digital, and molecular markers, with the goal of better directing clinical decisions to manage patients with PCa.

10.
PLoS Comput Biol ; 19(9): e1011474, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37721960

ABSTRACT

Genetic markers (especially short tandem repeats or STRs) located on the X chromosome are a valuable resource to solve complex kinship cases in forensic genetics, in addition to or as an alternative to autosomal STRs. Groups of tightly linked markers are combined into haplotypes, thus increasing the discriminating power of tests. However, this approach requires precise knowledge of the recombination rates between adjacent markers. The International Society of Forensic Genetics recommends that recombination rate estimation on the X chromosome be performed from pedigree genetic data while taking into account the confounding effect of mutations. However, implementations that satisfy these requirements have several drawbacks: they were never publicly released, they are very slow and/or need cluster-level hardware and strong computational expertise to use. To address these key concerns we developed Recombulator-X, a new open-source Python tool. The most challenging issue, namely the running time, was addressed with dynamic programming techniques to greatly reduce the computational complexity of the algorithm. Compared to the previous methods, Recombulator-X reduces the estimation times from weeks or months to less than one hour for typical datasets. Moreover, the estimation process, including preprocessing, has been streamlined and packaged into a simple command-line tool that can be run on a normal PC. Where previous approaches were limited to small panels of STR markers (up to 15), our tool can handle greater numbers (up to 100) of mixed STR and non-STR markers. In conclusion, Recombulator-X makes the estimation process much simpler, faster and accessible to researchers without a computational background, hopefully spurring increased adoption of best practices.

11.
J Mol Biol ; 435(20): 168245, 2023 10 15.
Article in English | MEDLINE | ID: mdl-37625584

ABSTRACT

The study of protein folding plays a crucial role in improving our understanding of protein function and of the relationship between genetics and phenotypes. In particular, understanding the thermodynamics and kinetics of the folding process is important for uncovering the mechanisms behind human disorders caused by protein misfolding. To address this issue, it is essential to collect and curate experimental kinetic and thermodynamic data on protein folding. K-Pro is a new database designed for collecting and storing experimental kinetic data on monomeric proteins with a two-state folding mechanism. With 1,529 records from 62 proteins corresponding to 65 structures, K-Pro contains various kinetic parameters such as the logarithm of the folding and unfolding rates, Tanford's β and the ϕ values. When available, the database also includes thermodynamic parameters associated with the kinetic data. K-Pro features a user-friendly interface that allows browsing and downloading kinetic data of interest. The graphical interface provides a visual representation of the protein and mutants, and it is cross-linked to key databases such as PDB, UniProt, and PubMed. K-Pro is open and freely accessible through https://folding.biofold.org/k-pro and supports the latest versions of popular browsers.


Subject(s)
Databases, Protein , Protein Folding , Proteins , Humans , Kinetics , Protein Denaturation , Proteins/chemistry , Proteins/genetics , Thermodynamics , Mutant Proteins/chemistry , Mutant Proteins/genetics
12.
Adv Radiat Oncol ; 8(5): 101228, 2023.
Article in English | MEDLINE | ID: mdl-37405256

ABSTRACT

Purpose: The objective of this work was to investigate the ability of machine learning models to use treatment plan dosimetry for prediction of clinician approval of treatment plans (no further planning needed) for left-sided whole breast radiation therapy with boost. Methods and Materials: Investigated plans were generated to deliver a dose of 40.05 Gy to the whole breast in 15 fractions over 3 weeks, with the tumor bed simultaneously boosted to 48 Gy. In addition to the manually generated clinical plan of each of the 120 patients from a single institution, an automatically generated plan was included for each patient to enhance the number of study plans to 240. In random order, the treating clinician retrospectively scored all 240 plans as (1) approved without further planning to seek improvement or (2) further planning needed, while being blind for type of plan generation (manual or automated). In total, 2 × 5 classifiers were trained and evaluated for ability to correctly predict the clinician's plan evaluations: random forest (RF) and constrained logistic regression (LR) classifiers, each trained for 5 different sets of dosimetric plan parameters (feature sets [FS]). Importances of included features for predictions were investigated to better understand clinicians' choices. Results: Although all 240 plans were in principle clinically acceptable for the clinician, only for 71.5% was no further planning required. For the most extensive FS, accuracy, area under the receiver operating characteristic curve, and Cohen's κ for generated RF/LR models for prediction of approval without further planning were 87.2 ± 2.0/86.7 ± 2.2, 0.80 ± 0.03/0.86 ± 0.02, and 0.63 ± 0.05/0.69 ± 0.04, respectively. In contrast to LR, RF performance was independent of the applied FS. 
For both RF and LR, whole breast excluding boost PTV (PTV40.05Gy) was the most important structure for predictions, with importance factors of 44.6% and 43%, respectively, and the dose received by 95% of the PTV40.05Gy volume (D95%) was the most important parameter in most cases. Conclusions: The investigated use of machine learning to predict clinician approval of treatment plans is highly promising. Including nondosimetric parameters could further increase classifiers' performances. The tool could become useful for aiding treatment planners in generating plans with a high probability of being directly approved by the treating clinician.
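
One of the metrics reported in this abstract is Cohen's κ, which corrects the raw agreement between predicted and clinician labels for the agreement expected by chance. The following is a minimal sketch of that definition; the function name and the toy labels are illustrative assumptions and do not come from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical): 1 = approved, 0 = further planning needed.
clinician = [1, 1, 1, 0]
predicted = [1, 1, 0, 0]
kappa = cohen_kappa(clinician, predicted)
```

Here the raw agreement is 0.75 and the chance agreement is 0.5, giving κ = 0.5, which shows how κ can be much lower than accuracy when one class dominates.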

13.
Artif Intell Med ; 142: 102588, 2023 08.
Article in English | MEDLINE | ID: mdl-37316101

ABSTRACT

BACKGROUND: Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterised by the progressive loss of motor neurons in the brain and spinal cord. The fact that ALS's disease course is highly heterogeneous, and its determinants not fully known, combined with ALS's relatively low prevalence, renders the successful application of artificial intelligence (AI) techniques particularly arduous. OBJECTIVE: This systematic review aims at identifying areas of agreement and unanswered questions regarding two notable applications of AI in ALS, namely the automatic, data-driven stratification of patients according to their phenotype, and the prediction of ALS progression. Differently from previous works, this review is focused on the methodological landscape of AI in ALS. METHODS: We conducted a systematic search of the Scopus and PubMed databases, looking for studies on data-driven stratification methods based on unsupervised techniques resulting in (A) automatic group discovery or (B) a transformation of the feature space allowing patient subgroups to be identified; and for studies on internally or externally validated methods for the prediction of ALS progression. We described the selected studies according to the following characteristics, when applicable: variables used, methodology, splitting criteria and number of groups, prediction outcomes, validation schemes, and metrics. RESULTS: Of the starting 1604 unique reports (2837 combined hits between Scopus and PubMed), 239 were selected for thorough screening, leading to the inclusion of 15 studies on patient stratification, 28 on prediction of ALS progression, and 6 on both stratification and prediction. In terms of variables used, most stratification and prediction studies included demographics and features derived from the ALSFRS or ALSFRS-R scores, which were also the main prediction targets. 
The most represented stratification methods were K-means, and hierarchical and expectation-maximisation clustering; while random forests, logistic regression, the Cox proportional hazard model, and various flavours of deep learning were the most widely used prediction methods. Predictive model validation was, albeit unexpectedly, quite rarely performed in absolute terms (leading to the exclusion of 78 eligible studies), with the overwhelming majority of included studies resorting to internal validation only. CONCLUSION: This systematic review highlighted a general agreement in terms of input variable selection for both stratification and prediction of ALS progression, and in terms of prediction targets. A striking lack of validated models emerged, as well as a general difficulty in reproducing many published studies, mainly due to the absence of the corresponding parameter lists. While deep learning seems promising for prediction applications, its superiority with respect to traditional methods has not been established; there is, instead, ample room for its application in the subfield of patient stratification. Finally, an open question remains on the role of new environmental and behavioural variables collected via novel, real-time sensors.


Subject(s)
Amyotrophic Lateral Sclerosis , Humans , Amyotrophic Lateral Sclerosis/diagnosis , Artificial Intelligence , Brain , Cluster Analysis , Databases, Factual
14.
Front Mol Biosci ; 10: 1169109, 2023.
Article in English | MEDLINE | ID: mdl-37234922

ABSTRACT

Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.

15.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37255310

ABSTRACT

MOTIVATION: The prediction of reliable Drug-Target Interactions (DTIs) is a key task in computer-aided drug design and repurposing. Here, we present a new approach based on data fusion for DTI prediction built on top of the NXTfusion library, which generalizes the Matrix Factorization paradigm by extending it to the nonlinear inference over Entity-Relation graphs. RESULTS: We benchmarked our approach on five datasets and we compared our models against state-of-the-art methods. Our models outperform most of the existing methods and, simultaneously, retain the flexibility to predict both DTIs as binary classification and regression of the real-valued drug-target affinity, competing with models built explicitly for each task. Moreover, our findings suggest that the validation of DTI methods should be stricter than what has been proposed in some previous studies, focusing more on mimicking real-life DTI settings where predictions for previously unseen drugs, proteins, and drug-protein pairs are needed. These settings are exactly the context in which the benefit of integrating heterogeneous information with our Entity-Relation data fusion approach is the most evident. AVAILABILITY AND IMPLEMENTATION: All software and data are available at https://github.com/eugeniomazzone/CPI-NXTFusion and https://pypi.org/project/NXTfusion/.


Subject(s)
Drug Development , Software , Proteins , Drug Interactions , Drug Design
16.
Nucleic Acids Res ; 51(W1): W451-W458, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37246737

ABSTRACT

One of the primary challenges in human genetics is determining the functional impact of single nucleotide variants (SNVs) and insertions and deletions (InDels), whether coding or noncoding. In the past, methods have been created to detect disease-related single amino acid changes, but only some can assess the influence of noncoding variations. CADD is the most commonly used and advanced algorithm for predicting the diverse effects of genome variations. It employs a combination of sequence conservation and functional features derived from the ENCODE project data. To use CADD, a large set of pre-calculated information must be downloaded during the installation process. To streamline the variant annotation process, we developed PhD-SNPg, a machine-learning tool that is easy to install and lightweight, relying solely on sequence-based features. Here we present an updated version, trained on a larger dataset, that can also predict the impact of InDel variations. Despite its simplicity, PhD-SNPg performs similarly to CADD, making it ideal for rapid genome interpretation and as a benchmark for tool development.


Subject(s)
Algorithms , Human Genome , Humans , INDEL Mutation , Machine Learning , Single Nucleotide Polymorphism
17.
Clin Gastroenterol Hepatol ; 21(13): 3314-3321.e3, 2023 12.
Article in English | MEDLINE | ID: mdl-37149016

ABSTRACT

BACKGROUND AND AIMS: Nonalcoholic fatty liver disease (NAFLD) is a complex disease, resulting from the interplay between environmental determinants and genetic variations. Single nucleotide polymorphism rs738409 C>G in the PNPLA3 gene is associated with hepatic fibrosis and with higher risk of developing hepatocellular carcinoma. Here, we analyzed a longitudinal cohort of biopsy-proven NAFLD subjects with the aim to identify individuals in whom genetics may have a stronger impact on disease progression. METHODS: We retrospectively analyzed 756 consecutive, prospectively enrolled biopsy-proven NAFLD subjects from Italy, United Kingdom, and Spain who were followed for a median of 84 months (interquartile range, 65-109 months). We stratified the study cohort according to sex, body mass index (BMI)

Subject(s)
Hepatocellular Carcinoma , Esophageal and Gastric Varices , Liver Neoplasms , Non-alcoholic Fatty Liver Disease , Humans , Female , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/complications , Non-alcoholic Fatty Liver Disease/genetics , Non-alcoholic Fatty Liver Disease/epidemiology , Hepatocellular Carcinoma/epidemiology , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/complications , Retrospective Studies , Esophageal and Gastric Varices/complications , Gastrointestinal Hemorrhage/complications , Genotype , Single Nucleotide Polymorphism , Liver Neoplasms/epidemiology , Liver Neoplasms/genetics , Liver Neoplasms/complications , Genetic Predisposition to Disease
18.
Environ Int ; 173: 107864, 2023 03.
Article in English | MEDLINE | ID: mdl-36913779

ABSTRACT

BACKGROUND: The exposome drivers are less studied than its consequences but may be crucial in identifying population subgroups with unfavourable exposures. OBJECTIVES: We used three approaches to study the socioeconomic position (SEP) as a driver of the early-life exposome in Turin children of the NINFEA cohort (Italy). METHODS: Forty-two environmental exposures, collected at 18 months of age (N = 1989), were classified in 5 groups (lifestyle, diet, meteoclimatic, traffic-related, built environment). We performed cluster analysis to identify subjects sharing similar exposures, and intra-exposome-group Principal Component Analysis (PCA) to reduce the dimensionality. SEP at childbirth was measured through the Equivalised Household Income Indicator. SEP-exposome association was evaluated using: 1) an Exposome Wide Association Study (ExWAS), a one-exposure (SEP) one-outcome (exposome) approach; 2) multinomial regression of cluster membership on SEP; 3) regressions of each intra-exposome-group PC on SEP. RESULTS: In the ExWAS, medium/low SEP children were more exposed to greenness, pet ownership, passive smoking, TV screen and sugar; less exposed to NO2, NOX, PM25abs, humidity, built environment, traffic load, unhealthy food facilities, fruit, vegetables, eggs, grain products, and childcare than high SEP children. Medium/low SEP children were more likely to belong to a cluster with poor diet, less air pollution, and to live in the suburbs than high SEP children. Medium/low SEP children were more exposed to lifestyle PC1 (unhealthy lifestyle) and diet PC2 (unhealthy diet), and less exposed to PC1s of the built environment (urbanization factors), diet (mixed diet), and traffic (air pollution) than high SEP children. CONCLUSIONS: The three approaches provided consistent and complementary results, suggesting that children with lower SEP are less exposed to urbanization factors and more exposed to unhealthy lifestyles and diet. The simplest method, the ExWAS, conveys most of the information and is more replicable in other populations. Clustering and PCA may facilitate results interpretation and communication.


Subject(s)
Air Pollution , Exposome , Humans , Child , Birth Cohort , Environmental Exposure/analysis , Socioeconomic Factors