Results 1 - 20 of 138
1.
Bioinformatics ; 40(5), 2024 05 02.
Article in English | MEDLINE | ID: mdl-38754097

ABSTRACT

MOTIVATION: Mutational signatures are a critical component in deciphering the genetic alterations that underlie cancer development and have become a valuable resource to understand the genomic changes during tumorigenesis. Therefore, it is essential to employ precise and accurate methods for their extraction to ensure that the underlying patterns are reliably identified and can be effectively utilized in new strategies for diagnosis, prognosis, and treatment of cancer patients. RESULTS: We present MUSE-XAE, a novel method for mutational signature extraction from cancer genomes using an explainable autoencoder. Our approach employs a hybrid architecture consisting of a nonlinear encoder that can capture nonlinear interactions among features, and a linear decoder which ensures the interpretability of the active signatures. We evaluated and compared MUSE-XAE with other available tools on both synthetic and real cancer datasets and demonstrated that it achieves superior performance in terms of precision and sensitivity in recovering mutational signature profiles. MUSE-XAE extracts highly discriminative mutational signature profiles by enhancing the classification of primary tumour types and subtypes in real world settings. This approach could facilitate further research in this area, with neural networks playing a critical role in advancing our understanding of cancer genomics. AVAILABILITY AND IMPLEMENTATION: MUSE-XAE software is freely available at https://github.com/compbiomed-unito/MUSE-XAE.


Subjects
Mutation; Neoplasms; Humans; Neoplasms/genetics; Algorithms; Software; Genomics/methods; Computational Biology/methods; Neural Networks, Computer
3.
Nucleic Acids Res ; 51(W1): W451-W458, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37246737

ABSTRACT

One of the primary challenges in human genetics is determining the functional impact of single nucleotide variants (SNVs) and insertions and deletions (InDels), whether coding or noncoding. In the past, methods have been created to detect disease-related single amino acid changes, but only some can assess the influence of noncoding variations. CADD is the most commonly used and advanced algorithm for predicting the diverse effects of genome variations. It employs a combination of sequence conservation and functional features derived from the ENCODE project data. To use CADD, a large set of pre-calculated information must be downloaded during the installation process. To streamline the variant annotation process, we developed PhD-SNPg, a machine-learning tool that is easy to install and lightweight, relying solely on sequence-based features. Here we present an updated version, trained on a larger dataset, that can also predict the impact of InDel variations. Despite its simplicity, PhD-SNPg performs similarly to CADD, making it ideal for rapid genome interpretation and as a benchmark for tool development.


Subjects
Algorithms; Genome, Human; Humans; INDEL Mutation; Machine Learning; Polymorphism, Single Nucleotide
4.
BMC Biol ; 22(1): 202, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39256748

ABSTRACT

BACKGROUND: Seafood is increasingly traded worldwide, but its supply chain is particularly prone to fraud. To increase consumer confidence, prevent illegal trade, and provide independent validation for eco-labelling, accurate tools for seafood traceability are needed. Here we show that the use of microbiome profiling (MP) coupled with machine learning (ML) allows precise tracing of the origin of Manila clams harvested in areas separated by small geographic distances. The study was designed to represent a real-world scenario. Clams were collected in different seasons across the most important production area in Europe (lagoons along the northern Adriatic coast) to cover the known seasonal variation in microbiome composition for the species. DNA extracted from samples underwent the same depuration process as commercial products (i.e. at least 12 h in open flow systems). RESULTS: Machine learning-based analysis of microbiome profiles was carried out using two completely independent sets of data (collected at the same locations but in different years), one for training the algorithm, and the other for testing its accuracy and assessing the temporal stability of the signal. Briefly, gills (GI) and digestive gland (DG) of clams were collected in summer and winter over two different years (i.e. from 2018 to 2020) in one banned area and four farming sites. 16S DNA metabarcoding was performed on clam tissues and the resulting amplicon sequence variant (ASV) table was used as input for ML MP. The best predictive performance was obtained using the combined information of GI and DG (consensus analysis), showing a Cohen's kappa score > 0.95 when the target was the classification of samples collected from the banned area and those harvested at farming sites. Classification of the four different farming areas showed slightly lower accuracy, with a score of 0.76. CONCLUSIONS: We show here that MP coupled with ML is an effective tool to trace the origin of shellfish products. The tool is extremely robust against seasonal and inter-annual variability, as well as product depuration, and is ready for implementation in routine assessment to prevent the trade of illegally harvested or mislabeled shellfish.


Subjects
Bivalvia; Machine Learning; Microbiota; Seafood; Seafood/microbiology; Animals; Bivalvia/microbiology; Commerce
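
The abstract above reports classifier quality as a Cohen's kappa score. As a rough illustration of what that number measures, the following sketch computes kappa (observed agreement corrected for the agreement expected by chance) in plain Python. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Fraction of samples on which the two labelings agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected if both labelings were independent draws
    # from their own marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings are constant and identical
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not data from the study).
y_true = ["banned", "banned", "farm", "farm", "farm", "banned"]
y_pred = ["banned", "banned", "farm", "farm", "banned", "banned"]
kappa = cohen_kappa(y_true, y_pred)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why it is often preferred over raw accuracy when class sizes differ.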
5.
Gut ; 73(5): 825-834, 2024 04 05.
Article in English | MEDLINE | ID: mdl-38199805

ABSTRACT

OBJECTIVE: Hyperferritinaemia is associated with liver fibrosis severity in patients with metabolic dysfunction-associated steatotic liver disease (MASLD), but the longitudinal implications have not been thoroughly investigated. We assessed the role of serum ferritin in predicting long-term outcomes or death. DESIGN: We evaluated the relationship between baseline serum ferritin and longitudinal events in a multicentre cohort of 1342 patients. Four survival models considering ferritin with confounders or non-invasive scoring systems were applied with repeated five-fold cross-validation schema. Prediction performance was evaluated in terms of Harrell's C-index and its improvement by including ferritin as a covariate. RESULTS: Median follow-up time was 96 months. Liver-related events occurred in 7.7%, hepatocellular carcinoma in 1.9%, cardiovascular events in 10.9%, extrahepatic cancers in 8.3% and all-cause mortality in 5.8%. Hyperferritinaemia was associated with a 50% increased risk of liver-related events and 27% of all-cause mortality. A stepwise increase in baseline ferritin thresholds was associated with a statistical increase in C-index, ranging between 0.02 (lasso-penalised Cox regression) and 0.03 (ridge-penalised Cox regression); the risk of developing liver-related events mainly increased from threshold 215.5 µg/L (median HR=1.71 and C-index=0.71) and the risk of overall mortality from threshold 272 µg/L (median HR=1.49 and C-index=0.70). The inclusion of serum ferritin thresholds (215.5 µg/L and 272 µg/L) in predictive models increased the performance of Fibrosis-4 and Non-Alcoholic Fatty Liver Disease Fibrosis Score in the longitudinal risk assessment of liver-related events (C-indices>0.71) and overall mortality (C-indices>0.65). CONCLUSIONS: This study supports the potential use of serum ferritin values for predicting the long-term prognosis of patients with MASLD.


Subjects
Liver Neoplasms; Metabolic Diseases; Non-alcoholic Fatty Liver Disease; Humans; Non-alcoholic Fatty Liver Disease/pathology; Liver Cirrhosis/pathology; Fibrosis; Liver Neoplasms/complications; Ferritins
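
The abstract above evaluates survival models by Harrell's C-index and by the change in C-index when ferritin is added as a covariate. The sketch below shows one simple way to compute the index in plain Python. It is a minimal version that skips pairs with tied times, and the function name, the two score vectors and all values are hypothetical illustrations, not the study's data or code.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs whose predicted risks are ordered correctly.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    A pair (i, j) is comparable when i had an observed event before j's time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in months, event flags, model scores.
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]
baseline_scores = [0.9, 0.4, 0.1, 0.6]   # assumed model without the extra covariate
extended_scores = [0.9, 0.4, 0.7, 0.6]   # assumed model with the extra covariate

c_baseline = harrell_c_index(times, events, baseline_scores)
c_extended = harrell_c_index(times, events, extended_scores)
c_gain = c_extended - c_baseline
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the difference between the two values plays the role of the C-index improvement mentioned in the abstract.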
6.
Brief Bioinform ; 23(2), 2022 03 10.
Article in English | MEDLINE | ID: mdl-35021190

ABSTRACT

Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and understanding the genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and 'all' available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21-0.5 and 0-0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51-0.62. The tested methods seem relatively insensitive to the physiological conditions, also performing well on the variants measured at more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.


Subjects
Point Mutation; Proteins; Mutation; Protein Stability; Proteins/chemistry; Thermodynamics
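
The antisymmetry property discussed in the abstract above requires that the predicted ΔΔG of a direct variant (A → B) be the negative of the predicted ΔΔG of its reverse (B → A). The sketch below shows two summary measures that are commonly used for this check: the Pearson correlation between direct and reverse predictions (ideally -1) and the mean bias (ideally 0). Whether the study used exactly these measures is an assumption. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def antisymmetry_report(direct, reverse):
    """Return (correlation, bias) for paired direct/reverse predictions.

    An antisymmetric predictor gives correlation -1 and bias 0.
    """
    corr = pearson(direct, reverse)
    bias = sum(d + r for d, r in zip(direct, reverse)) / (2 * len(direct))
    return corr, bias


# Hypothetical predictions in kcal/mol (not data from the study).
direct_ok = [1.0, -0.5, 2.0]
reverse_ok = [-1.0, 0.5, -2.0]       # exact sign flips: antisymmetric
direct_biased = [-1.0, -0.5, -2.0]
reverse_biased = [-1.2, -0.4, -1.9]  # predicts destabilization both ways

corr_ok, bias_ok = antisymmetry_report(direct_ok, reverse_ok)
corr_biased, bias_biased = antisymmetry_report(direct_biased, reverse_biased)
```

The second toy predictor illustrates the failure mode: it calls both a variant and its reverse destabilizing, so the correlation is positive and the bias is far from zero.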
7.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37255310

ABSTRACT

MOTIVATION: The prediction of reliable Drug-Target Interactions (DTIs) is a key task in computer-aided drug design and repurposing. Here, we present a new approach based on data fusion for DTI prediction built on top of the NXTfusion library, which generalizes the Matrix Factorization paradigm by extending it to the nonlinear inference over Entity-Relation graphs. RESULTS: We benchmarked our approach on five datasets and we compared our models against state-of-the-art methods. Our models outperform most of the existing methods and, simultaneously, retain the flexibility to predict both DTIs as binary classification and regression of the real-valued drug-target affinity, competing with models built explicitly for each task. Moreover, our findings suggest that the validation of DTI methods should be stricter than what has been proposed in some previous studies, focusing more on mimicking real-life DTI settings where predictions for previously unseen drugs, proteins, and drug-protein pairs are needed. These settings are exactly the context in which the benefit of integrating heterogeneous information with our Entity-Relation data fusion approach is the most evident. AVAILABILITY AND IMPLEMENTATION: All software and data are available at https://github.com/eugeniomazzone/CPI-NXTFusion and https://pypi.org/project/NXTfusion/.


Subjects
Drug Development; Software; Proteins; Drug Interactions; Drug Design
8.
PLoS Comput Biol ; 19(9): e1011474, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37721960

ABSTRACT

Genetic markers (especially short tandem repeats or STRs) located on the X chromosome are a valuable resource to solve complex kinship cases in forensic genetics in addition or alternatively to autosomal STRs. Groups of tightly linked markers are combined into haplotypes, thus increasing the discriminating power of tests. However, this approach requires precise knowledge of the recombination rates between adjacent markers. The International Society of Forensic Genetics recommends that recombination rate estimation on the X chromosome is performed from pedigree genetic data while taking into account the confounding effect of mutations. However, implementations that satisfy these requirements have several drawbacks: they were never publicly released, they are very slow and/or need cluster-level hardware and strong computational expertise to use. In order to address these key concerns we developed Recombulator-X, a new open-source Python tool. The most challenging issue, namely the running time, was addressed with dynamic programming techniques to greatly reduce the computational complexity of the algorithm. Compared to the previous methods, Recombulator-X reduces the estimation times from weeks or months to less than one hour for typical datasets. Moreover, the estimation process, including preprocessing, has been streamlined and packaged into a simple command-line tool that can be run on a normal PC. Where previous approaches were limited to small panels of STR markers (up to 15), our tool can handle greater numbers (up to 100) of mixed STR and non-STR markers. In conclusion, Recombulator-X makes the estimation process much simpler, faster and accessible to researchers without a computational background, hopefully spurring increased adoption of best practices.

9.
Article in English | MEDLINE | ID: mdl-39259210

ABSTRACT

Beta-blockers are a crucial part of post-myocardial infarction (MI) pharmacological therapy. Recent studies have raised questions about their efficacy in patients without reduced left ventricular ejection fraction (LVEF). This study aims to assess adherence to beta-blockers after discharge for ST-segment elevation myocardial infarction (STEMI) and the impact of adherence on outcomes based on LVEF at discharge. The retrospective registry FAST-STEMI evaluated real-world adherence to the main cardiovascular drugs in STEMI patients between 2012 and 2017 by comparing purchased tablets to expected ones at one year through pharmacy registries. Optimal adherence was defined as ≥80%. Primary outcomes included all-cause and cardiovascular death, while secondary outcomes were myocardial infarction, major/minor bleeding events, and ischemic stroke. The study included 4688 patients discharged on beta-blockers. Mean age was 64 ± 12.3 years, 76% were male, and mean LVEF was 49.2 ± 8.8%. Mean adherence at one year was 87.1%. Optimal adherence was associated with lower all-cause (adjHR 0.62, 95%CI 0.41-0.92, p 0.02) and cardiovascular mortality (adjHR 0.55, 95%CI 0.26-0.98, p 0.043). In LVEF ≤40% patients, optimal adherence was linked to reduced all-cause and cardiovascular mortality, but this was not found in patients with either preserved or mildly reduced LVEF. Predictors of cardiovascular mortality included older age, chronic kidney disease, male gender, and atrial fibrillation. Optimal adherence to beta-blocker therapy in all-comers STEMI patients reduced all-cause and cardiovascular mortality at 1 year; once stratified by LVEF, this effect is confirmed only in patients with reduced LVEF (< 40%) at hospital discharge.

10.
Nucleic Acids Res ; 50(3): e16, 2022 02 22.
Article in English | MEDLINE | ID: mdl-34792168

ABSTRACT

In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from data availability to data interpretation, thus delaying the promised breakthroughs in genetics and precision medicine in human genetics, and in phenotype prediction to improve plant adaptation to climate change and resistance to bioaggressors in plant sciences. In this paper, we propose a novel Genome Interpretation paradigm, which aims at directly modeling the genotype-to-phenotype relationship, and we focus on A. thaliana since it is the best studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from Whole Genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥0.4, and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from the GWAS-derived known associated genes. Galiana is also fully interpretable, thanks to the Saliency Maps gradient-based approaches. We followed this interpretation approach to identify 36 novel genes that are likely to be associated with flowering traits, finding evidence for 6 of them in the existing literature.


Subjects
Arabidopsis; Arabidopsis/genetics; Genome; Genome-Wide Association Study; Genotype; Neural Networks, Computer; Phenotype; Whole Genome Sequencing
11.
Nucleic Acids Res ; 50(W1): W222-W227, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524565

ABSTRACT

Estimating the functional effect of single amino acid variants in proteins is fundamental for predicting the change in the thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding, between the wild-type and the variant protein (ΔΔG). Here, we present the web-server of the DDGun method, which was previously developed for the ΔΔG prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. It is antisymmetric, as it predicts opposite ΔΔG values for direct (A → B) and reverse (B → A) single and multiple site variants. DDGun is available in two versions, one based on only sequence information and the other one based on sequence and structure information. Despite being untrained, DDGun reaches prediction performances comparable to those of trained methods. Here we make DDGun available as a web server. For the web server version, we updated the protein sequence database used for the computation of the evolutionary features, and we compiled two new data sets of protein variants to do a blind test of its performances. On these blind data sets of single and multiple site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ΔΔG of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Besides being used for the prediction of ΔΔG, we suggest that DDGun should be adopted as a benchmark method to assess the predictive capabilities of newly developed methods. Releasing DDGun as a web-server, stand-alone program and docker image will facilitate the necessary process of method comparison to improve ΔΔG prediction.


Subjects
Amino Acids; Protein Stability; Proteins; Amino Acids/genetics; Computers; Databases, Protein; Proteins/genetics; Proteins/chemistry
12.
Clin Gastroenterol Hepatol ; 21(13): 3314-3321.e3, 2023 12.
Article in English | MEDLINE | ID: mdl-37149016

ABSTRACT

BACKGROUND AND AIMS: Nonalcoholic fatty liver disease (NAFLD) is a complex disease, resulting from the interplay between environmental determinants and genetic variations. Single nucleotide polymorphism rs738409 C>G in the PNPLA3 gene is associated with hepatic fibrosis and with higher risk of developing hepatocellular carcinoma. Here, we analyzed a longitudinal cohort of biopsy-proven NAFLD subjects with the aim to identify individuals in whom genetics may have a stronger impact on disease progression. METHODS: We retrospectively analyzed 756 consecutive, prospectively enrolled biopsy-proven NAFLD subjects from Italy, United Kingdom, and Spain who were followed for a median of 84 months (interquartile range, 65-109 months). We stratified the study cohort according to sex, body mass index (BMI)

Subjects
Carcinoma, Hepatocellular; Esophageal and Gastric Varices; Liver Neoplasms; Non-alcoholic Fatty Liver Disease; Humans; Female; Male; Middle Aged; Non-alcoholic Fatty Liver Disease/complications; Non-alcoholic Fatty Liver Disease/genetics; Non-alcoholic Fatty Liver Disease/epidemiology; Carcinoma, Hepatocellular/epidemiology; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/complications; Retrospective Studies; Esophageal and Gastric Varices/complications; Gastrointestinal Hemorrhage/complications; Genotype; Polymorphism, Single Nucleotide; Liver Neoplasms/epidemiology; Liver Neoplasms/genetics; Liver Neoplasms/complications; Genetic Predisposition to Disease
13.
Brief Bioinform ; 22(2): 2172-2181, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32266404

ABSTRACT

Most living organisms rely on double-stranded DNA (dsDNA) to store their genetic information and perpetuate themselves. This biological information has been considered as the main target of evolution. However, here we show that symmetries and patterns in the dsDNA sequence can emerge from the physical peculiarities of the dsDNA molecule itself and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure. The randomness justifies the human codon biases and context-dependent mutation patterns in human populations. Thus, the DNA 'exceptional symmetries,' emerged from the randomness, have to be taken into account when looking for the DNA encoded information. Our results suggest that the double helix energy constraints and, more generally, the physical properties of the dsDNA are the hard drivers of the overall DNA sequence architecture, whereas the selective biological processes act as soft drivers, which only under extraordinary circumstances overtake the overall entropy content of the genome.


Subjects
DNA/genetics; Evolution, Molecular; Sequence Analysis, DNA/methods; Humans
14.
Brief Bioinform ; 22(1): 601-603, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885042

ABSTRACT

A review, recently published in this journal by Fang (2019), showed that methods trained for the prediction of protein stability changes upon mutation have a very critical bias: they neglect that a protein variation (A → B) and its reverse (B → A) must have opposite values of the free energy difference (ΔΔG_AB = −ΔΔG_BA). In this letter, we complement Fang's paper by presenting a more general view of the problem. In particular, a machine learning-based method, published in 2015 (INPS), addressed the bias issue directly. We include the analysis of the missing method, showing that INPS is nearly insensitive to the addressed problem.


Subjects
Algorithms; Machine Learning; Mutation; Protein Stability
15.
Gut ; 71(2): 382-390, 2022 02.
Article in English | MEDLINE | ID: mdl-33541866

ABSTRACT

OBJECTIVE: The full phenotypic expression of non-alcoholic fatty liver disease (NAFLD) in lean subjects is incompletely characterised. We aimed to investigate prevalence, characteristics and long-term prognosis of Caucasian lean subjects with NAFLD. DESIGN: The study cohort comprises 1339 biopsy-proven NAFLD subjects from four countries (Italy, UK, Spain and Australia), stratified into lean and non-lean (body mass index (BMI) 10 483 person-years), 4.7% of lean vs 7.7% of non-lean patients reported liver-related events (p=0.37). No difference in survival was observed compared with non-lean NAFLD (p=0.069). CONCLUSIONS: Caucasian lean subjects with NAFLD may progress to advanced liver disease, develop metabolic comorbidities and experience cardiovascular disease (CVD) as well as liver-related mortality, independent of longitudinal progression to obesity and PNPLA3 genotype. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD where the disease manifests at lower overall BMI thresholds. LAY SUMMARY: NAFLD may affect and progress in both obese and lean individuals. Lean subjects are predominantly males, have a younger age at diagnosis and are more prevalent in some geographic areas. During the follow-up, lean subjects can develop hepatic and extrahepatic disease, including metabolic comorbidities, in the absence of weight gain. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD.


Subjects
Non-alcoholic Fatty Liver Disease/complications; Thinness/complications; White People; Adult; Body Mass Index; Cohort Studies; Female; Humans; Male; Middle Aged; Non-alcoholic Fatty Liver Disease/mortality; Non-alcoholic Fatty Liver Disease/pathology; Prognosis; Survival Rate; Thinness/mortality; Thinness/pathology
16.
BMC Genomics ; 23(1): 87, 2022 Jan 31.
Article in English | MEDLINE | ID: mdl-35100973

ABSTRACT

BACKGROUND: Genomic DNA has been shaped by mutational processes through evolution. The cellular machinery for error correction and repair has left its marks in the nucleotide composition along with structural and functional constraints. Therefore, the probability of observing a base in a certain position in the human genome is highly context-dependent. RESULTS: Here we develop context-dependent nucleotide models. We first investigate models of nucleotides conditioned on sequence context. We develop a bidirectional Markov model that uses an average of the probability from a Markov model applied to both strands of the sequence and thus depends on up to 14 bases to each side of the nucleotide. We show how the genome predictability varies across different types of genomic regions. Surprisingly, this model can predict a base from its context with an average of more than 50% accuracy. For somatic variants we show a tendency towards higher probability for the variant base than for the reference base. Inspired by DNA substitution models, we develop a model of mutability that estimates a mutation matrix (called the alpha matrix) on top of the nucleotide distribution. The alpha matrix can be estimated from a much smaller context than the nucleotide model, but the final model will still depend on the full context of the nucleotide model. With the bidirectional Markov model of order 14 and an alpha matrix dependent on just one base to each side, we obtain a model that compares well with a model of mutability that estimates mutation probabilities directly conditioned on three nucleotides to each side. For somatic variants in particular, our model fits better than the simpler model. Interestingly, the model is not very sensitive to the size of the context for the alpha matrix. CONCLUSIONS: Our study found strong context dependencies of nucleotides in the human genome. The best model uses a context of 14 nucleotides to each side. Based on these models, a substitution model was constructed that separates into the context model and a matrix dependent on a small context. The model fits somatic variants particularly well.


Subjects
DNA; Nucleotides; DNA/genetics; Genome, Human; Genomics; Humans; Nucleotides/genetics; Probability
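
The bidirectional Markov model in the abstract above averages the prediction of a Markov model applied to each strand, so the probability of a base depends on the k bases on either side. The sketch below is one possible reading of that idea in plain Python, not the authors' implementation. The add-one smoothing, the function names and the toy sequence are assumptions, and a small order (k = 2) is used in place of the order 14 reported in the abstract.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def train_markov(seqs, k):
    """Count which base follows each k-mer context, pooling both strands."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        for strand in (seq, reverse_complement(seq)):
            for i in range(k, len(strand)):
                counts[strand[i - k:i]][strand[i]] += 1
    return counts


def cond_prob(counts, context, base, pseudo=1.0):
    """P(base | context) with add-one style smoothing (an assumption here)."""
    ctx = counts.get(context, {})
    total = sum(ctx.values()) + 4 * pseudo
    return (ctx.get(base, 0) + pseudo) / total


def bidirectional_prob(counts, seq, i, k, base):
    """Average the forward-strand and reverse-strand predictions at position i.

    Requires k <= i < len(seq) - k so that both flanks are complete.
    """
    left = seq[i - k:i]            # k bases preceding position i
    right = seq[i + 1:i + 1 + k]   # k bases following position i
    p_forward = cond_prob(counts, left, base)
    # On the opposite strand the base is complemented and its preceding
    # context is the reverse complement of the right flank.
    p_reverse = cond_prob(counts, reverse_complement(right), COMPLEMENT[base])
    return 0.5 * (p_forward + p_reverse)


# Hypothetical toy sequence; real use would train on genomic sequence.
seq = "ACGTACGTACGT"
k = 2
counts = train_markov([seq], k)
probs = {b: bidirectional_prob(counts, seq, 5, k, b) for b in "ACGT"}
best_base = max(probs, key=probs.get)
```

Because each one-directional distribution sums to one over the four bases, their average does too, so `probs` is itself a valid distribution over the base at position 5.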
17.
Hum Genet ; 141(10): 1649-1658, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35098354

ABSTRACT

Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been extensively used to predict structure, interactions and functions in macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide variant deleteriousness ranking algorithms consider different features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments in the context of pathogenicity predictions at DNA and protein levels. We showed that protein sequence-based information is slightly more informative in the annotation of Clinvar missense variants than that obtained at the DNA level. Furthermore, to achieve the performance of state-of-the-art methods, such as CADD and REVEL, the conservation of reference and variant, encoded as frequencies of reference/alternate alleles or wild-type/mutant residues, should be included. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. As expected, our method results in ~3% lower area under the receiver-operating characteristic curve (AUC) when compared with an ensemble-based algorithm (REVEL). Nevertheless, the combination of predictions of multiple methods can help to identify more reliable predictions. These observations indicate that for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.


Subjects
Computational Biology; Nucleic Acids; Algorithms; Amino Acid Sequence; Computational Biology/methods; Humans; Mutation, Missense; Sequence Alignment
18.
Bioinformatics ; 36(24): 5709-5711, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33492342

ABSTRACT

SUMMARY: Identifying pathogenic variants and annotating them is a major challenge in human genetics, especially for the non-coding ones. Several tools have been developed and used to predict the functional effect of genetic variants. However, the calibration assessment of the predictions has received little attention. Calibration refers to the idea that if a model predicts a group of variants to be pathogenic with a probability P, it is expected that the same fraction P of true positives is found in the observed set. For instance, a well-calibrated classifier should label the variants such that among the ones to which it gave a probability value close to 0.7, approximately 70% actually belong to the pathogenic class. Poorly calibrated algorithms can be misleading and potentially harmful for clinical decision making. AVAILABILITY AND IMPLEMENTATION: The dataset used for testing the methods is available through the DOI:10.5281/zenodo.4448197. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
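
The calibration idea described in the abstract above can be checked by grouping predictions into probability bins and comparing, within each bin, the mean predicted probability with the observed fraction of pathogenic variants. The sketch below does this in plain Python and also computes the expected calibration error, a common summary that the abstract itself does not mention. The function names, the bin count and the toy data are hypothetical.

```python
def reliability_table(probs, labels, n_bins=10):
    """Per-bin (mean predicted probability, observed positive fraction, count).

    labels are 1 for pathogenic and 0 for benign; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac_pos = sum(y for _, y in members) / len(members)
            table.append((mean_p, frac_pos, len(members)))
    return table


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean gap between predicted and observed frequencies."""
    n = len(probs)
    return sum(
        count / n * abs(mean_p - frac_pos)
        for mean_p, frac_pos, count in reliability_table(probs, labels, n_bins)
    )


# Hypothetical toy predictions (not the dataset referenced in the abstract).
labels_a = [1] * 7 + [0] * 3
calibrated = [0.7] * 10          # says 0.7, and 70% are pathogenic
labels_b = [1] * 5 + [0] * 5
overconfident = [0.9] * 10       # says 0.9, but only 50% are pathogenic

ece_calibrated = expected_calibration_error(calibrated, labels_a)
ece_overconfident = expected_calibration_error(overconfident, labels_b)
```

The first toy predictor matches the 0.7 example given in the abstract and has an error near zero, while the second overstates its confidence by 0.4.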

19.
BMC Biol ; 19(1): 3, 2021 01 13.
Article in English | MEDLINE | ID: mdl-33441128

ABSTRACT

BACKGROUND: Identifying variants that drive tumor progression (driver variants) and distinguishing these from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step for understanding tumorigenesis and precision oncology. Various bioinformatics methods have attempted to solve this complex task. RESULTS: In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias which prevents the machine learning (ML) methods from actually learning variant-level functional effects, despite their excellent performance. This effect results from the fact that in these data sets, the driver variants map to a few driver genes, while the passenger variants spread across thousands of genes, and thus just learning to recognize driver genes provides almost perfect predictions. CONCLUSIONS: To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that all genes covered by the data contain both driver and passenger variants. As a result, we show that the tested predictors experience a significant drop in performance, which should not be considered as poorer modeling, but rather as correcting unwarranted optimism. Finally, we propose a weighting procedure to completely eliminate the gene effects on such predictions, thus precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that indeed this task is still open.


Subjects
Carcinogenesis/genetics; Disease Progression; Machine Learning; Medical Oncology/instrumentation; Neoplasms/genetics; Precision Medicine/instrumentation; Neoplasms/pathology
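
The construction bias described in the abstract above can be illustrated with a deliberately naive predictor that ignores the variant and only remembers which genes carried driver variants in training. The sketch below is a toy demonstration under that assumption. The gene and variant identifiers are placeholders, and the numbers it produces say nothing about the real data sets.

```python
def fit_gene_only(train):
    """Memorise genes with at least one driver variant; ignore the variant itself."""
    return {gene for gene, _variant, label in train if label == "driver"}


def predict(driver_genes, gene):
    return "driver" if gene in driver_genes else "passenger"


def accuracy(driver_genes, data):
    hits = sum(predict(driver_genes, gene) == label for gene, _variant, label in data)
    return hits / len(data)


# Placeholder records: (gene, variant id, label). All names are hypothetical.
biased_train = [
    ("GENE_A", "v1", "driver"),
    ("GENE_A", "v2", "driver"),
    ("GENE_B", "v3", "passenger"),
    ("GENE_C", "v4", "passenger"),
]
# Biased test set: drivers sit in a known driver gene, passengers elsewhere.
biased_test = [
    ("GENE_A", "v5", "driver"),
    ("GENE_B", "v6", "passenger"),
    ("GENE_D", "v7", "passenger"),
]
# Balanced test set: every gene contains both a driver and a passenger.
balanced_test = [
    ("GENE_A", "v8", "driver"),
    ("GENE_A", "v9", "passenger"),
    ("GENE_E", "v10", "driver"),
    ("GENE_E", "v11", "passenger"),
]

driver_genes = fit_gene_only(biased_train)
acc_biased = accuracy(driver_genes, biased_test)
acc_balanced = accuracy(driver_genes, balanced_test)
```

On the biased toy set the gene-only rule looks perfect, while on the balanced set it falls to chance level, which mirrors the drop in performance the abstract attributes to removing the gene-level shortcut.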
20.
J Hepatol ; 75(4): 786-794, 2021 10.
Article in English | MEDLINE | ID: mdl-34090928

ABSTRACT

BACKGROUND & AIMS: Non-invasive scoring systems (NSS) are used to identify patients with non-alcoholic fatty liver disease (NAFLD) who are at risk of advanced fibrosis, but their reliability in predicting long-term outcomes for hepatic/extrahepatic complications or death and their concordance in cross-sectional and longitudinal risk stratification remain uncertain. METHODS: The most common NSS (NFS, FIB-4, BARD, APRI) and the Hepamet fibrosis score (HFS) were assessed in 1,173 European patients with NAFLD from tertiary centres. Performance for fibrosis risk stratification and for the prediction of long-term hepatic/extrahepatic events, hepatocarcinoma (HCC) and overall mortality were evaluated in terms of AUC and Harrell's c-index. For longitudinal data, NSS-based Cox proportional hazard models were trained on the whole cohort with repeated 5-fold cross-validation, sampling for testing from the 607 patients with all NSS available. RESULTS: Cross-sectional analysis revealed HFS as the best performer for the identification of significant (F0-1 vs. F2-4, AUC = 0.758) and advanced (F0-2 vs. F3-4, AUC = 0.805) fibrosis, while NFS and FIB-4 showed the best performance for detecting histological cirrhosis (range AUCs 0.85-0.88). Considering longitudinal data (follow-up between 62 and 110 months), NFS and FIB-4 were the best at predicting liver-related events (c-indices>0.7), NFS for HCC (c-index = 0.9 on average), and FIB-4 and HFS for overall mortality (c-indices >0.8). All NSS showed limited performance (c-indices <0.7) for extrahepatic events. CONCLUSIONS: Overall, NFS, HFS and FIB-4 outperformed APRI and BARD for both cross-sectional identification of fibrosis and prediction of long-term outcomes, confirming that they are useful tools for the clinical management of patients with NAFLD at increased risk of fibrosis and liver-related complications or death. 
LAY SUMMARY: Non-invasive scoring systems are increasingly being used in patients with non-alcoholic fatty liver disease to identify those at risk of advanced fibrosis and hence clinical complications. Herein, we compared various non-invasive scoring systems and identified those that were best at identifying risk, as well as those that were best for the prediction of long-term outcomes, such as liver-related events, liver cancer and death.


Subjects
Non-alcoholic Fatty Liver Disease/complications; Predictive Value of Tests; Research Design/standards; Time; Adult; Area Under Curve; Cross-Sectional Studies; Female; Humans; Liver/pathology; Male; Middle Aged; Non-alcoholic Fatty Liver Disease/mortality; Prognosis; ROC Curve; Reproducibility of Results; Research Design/trends; Severity of Illness Index