Results 1 - 5 of 5
1.
Bioinformatics ; 37(21): 3865-3873, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34086846

ABSTRACT

MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
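The abstract does not define meanRank, so the following is only a minimal sketch of one plausible reading: rank each association separately on every evidence variable, then average those ranks. The association labels, the three evidence columns, and the "higher is better" direction are all assumptions made for illustration, not the published pipeline.

```python
from statistics import mean

def rank_descending(values):
    """Return 1-based ranks (1 = largest value); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mean_rank(table):
    """table maps an association to a list of evidence values (higher = stronger).

    Returns the mean of the per-variable ranks; a lower score means the
    association ranks better on average.
    """
    keys = list(table)
    n_vars = len(table[keys[0]])
    per_var = [rank_descending([table[k][m] for k in keys]) for m in range(n_vars)]
    return {k: mean(per_var[m][i] for m in range(n_vars)) for i, k in enumerate(keys)}

# Hypothetical gene-trait associations with three made-up evidence variables.
evidence = {
    "GENE_A~trait_1": [5, 12.0, 3.1],
    "GENE_B~trait_1": [2, 30.0, 1.0],
    "GENE_C~trait_1": [1, 8.0, 0.5],
}
scores = mean_rank(evidence)
ranking = sorted(scores, key=scores.get)  # best (lowest mean rank) first
```

Averaging ranks rather than raw values keeps variables on different scales from dominating one another, which is one reason a rank-based aggregate suits heterogeneous evidence.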


Subject(s)
Genome-Wide Association Study , Lighting , Genotype , Polymorphism, Single Nucleotide , Phenotype
2.
PLoS Comput Biol ; 16(9): e1008244, 2020 09.
Article in English | MEDLINE | ID: mdl-32960884

ABSTRACT

Alcohol-related liver disease (ALD) is the cause of more than half of all liver-related deaths. Sustained excess drinking causes fatty liver and alcohol-related steatohepatitis, which may progress to alcoholic liver fibrosis (ALF) and eventually to alcohol-related liver cirrhosis (ALC). Unfortunately, it is difficult to identify patients with early-stage ALD, as they are largely asymptomatic. Consequently, the majority of ALD patients are diagnosed only once ALD has reached decompensated cirrhosis, a symptomatic phase marked by the development of complications such as bleeding and ascites. The main goal of this study is to discover relevant upstream diagnoses that help explain the development of ALD, and to highlight meaningful downstream diagnoses that represent its progression to liver failure. Here, we use data from the Danish health registries covering the entire population of Denmark over nineteen years (1996-2014) to examine whether it is possible to identify patients likely to develop ALF or ALC based on their past medical history. To this end, we explore a knowledge discovery approach, using high-dimensional statistical and machine learning techniques to extract and analyze data from the Danish National Patient Registry. Consistent with the late diagnosis of ALD, we find that ALC is the most common form of ALD in the registry data and that ALC patients have a strong over-representation of diagnoses associated with liver dysfunction. By contrast, we identify a small number of patients diagnosed with ALF who appear to be much less sick than those with ALC. We perform a matched case-control study using the group of patients with ALC as cases and their matched patients with non-ALD as controls. Machine learning models (SVM, RF, LightGBM and NaiveBayes) trained and tested on the set of ALC patients achieve high classification performance (AUC = 0.89). When the same trained models are tested on the small set of ALF patients, their performance unsurprisingly drops substantially (AUC = 0.67 for NaiveBayes). The statistical and machine learning results underscore small groups of upstream and downstream comorbidities that accurately detect ALC patients and show promise in the prediction of ALF. Some of these groups are conditions caused either by alcohol or by malnutrition associated with alcohol overuse. Others are comorbidities related either to trauma and lifestyle or to complications of cirrhosis, such as oesophageal varices. Our findings highlight the potential of this approach to uncover knowledge in registry data related to ALD.
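For readers unfamiliar with the AUC figures quoted above, here is a minimal sketch of how an AUC can be computed from case/control labels and model scores, using the pairwise (Mann-Whitney) definition. The labels and scores are invented for illustration and are unrelated to the registry data or the study's models.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    labels: 1 for cases, 0 for controls. Tied scores count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical classifier outputs for three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.25, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 7 of the 9 case-control pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering of cases and controls and 1.0 to perfect separation, which is why a drop from 0.89 to 0.67 marks a substantial loss of discrimination.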


Subject(s)
Liver Diseases, Alcoholic/epidemiology , Liver Diseases, Alcoholic/pathology , Machine Learning , Models, Statistical , Aged , Aged, 80 and over , Comorbidity , Denmark , Female , Humans , Liver Failure/prevention & control , Male , Middle Aged , Registries , Risk Factors
3.
J Proteome Res ; 16(6): 2262-2272, 2017 06 02.
Article in English | MEDLINE | ID: mdl-28440083

ABSTRACT

The evolution of human health is a continuum of transitions, involving multifaceted processes at multiple levels, and there is an urgent need for integrative biomarkers that can characterize and predict progression toward disease development. The objective of this work was to apply a systems metabolomics approach to predict metabolic syndrome (MetS) development. A case-control design was used within the French occupational GAZEL cohort (n = 112 males: discovery study; n = 94: replication/validation study). Our integrative strategy was to combine untargeted metabolomics with clinical, sociodemographic, and food habit parameters to describe early phenotypes and build multidimensional predictive models. Different models were built from the discriminant variables, and prediction performances were optimized either when reducing the number of metabolites used or when keeping the associated signature. We illustrated that a selected reduced metabolic profile was able to reveal subtle phenotypic differences 5 years before MetS occurrence. Moreover, the resulting metabolomic markers, when combined with clinical characteristics, improved the prediction of disease development. The validation study showed that this predictive performance was specific to the MetS component. This work also demonstrates the value of such an approach for discovering subphenotypes that will need further characterization before a shift to molecular reclassification and targeting of MetS is possible.


Subject(s)
Metabolic Syndrome/diagnosis , Metabolomics/methods , Predictive Value of Tests , Systems Biology/methods , Biomarkers , Case-Control Studies , Disease Progression , France , Humans , Male , Middle Aged , Phenotype
4.
Database (Oxford) ; 2022, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35348648

ABSTRACT

Scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in the number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated weekly and are available via a web interface at https://diseases.jensenlab.org, from which they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.


Subject(s)
Data Mining , Genome-Wide Association Study , Databases, Factual
5.
Front Mol Biosci ; 3: 30, 2016.
Article in English | MEDLINE | ID: mdl-27458587

ABSTRACT

Untargeted metabolomics is a powerful phenotyping tool for better understanding the biological mechanisms involved in human pathology development and for identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is to identify early predictive markers of clinical outcome a few years before occurrence, it becomes essential to use appropriate algorithms and workflows to be able to discover subtle effects in this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy focused on evaluating a combination of numeric-symbolic approaches for feature selection, with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, and especially on machine learning methods (SVM-RFE, RF, RF-RFE) and on univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As the resampling method, LOOCV was applied to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these approaches were compared and used to determine variable stability by means of Formal Concept Analysis. The results revealed the value of RF-Gini combined with ANOVA for feature selection, as these two complementary methods selected the 48 best candidates for prediction. Using linear logistic regression on this reduced dataset yielded the best performance in terms of prediction accuracy and number of false positives, with a model including the 5 top variables. These results therefore highlight the value of feature selection methods and the importance of working on reduced datasets for the identification of predictive biomarkers from untargeted metabolomics data.
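The combination of univariate ANOVA ranking and LOOCV described above can be sketched as follows. This is an illustrative toy, not the study's workflow: a nearest-centroid classifier stands in for the SVM, RF and logistic regression models named in the abstract, the data matrix is invented, and all function names are hypothetical. The point it shows is that feature selection is repeated inside each leave-one-out fold so that the held-out sample never influences which features are kept.

```python
from statistics import mean

def f_statistic(x, y):
    """One-way ANOVA F statistic of feature values x across the classes in y."""
    groups = {}
    for value, label in zip(x, y):
        groups.setdefault(label, []).append(value)
    grand = mean(x)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between, df_within = len(groups) - 1, len(x) - len(groups)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / df_between) / (within / df_within)

def top_k_features(X, y, k):
    """Indices of the k features with the largest F statistic."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: f_statistic(cols[j], y), reverse=True)
    return ranked[:k]

def nearest_centroid(X_train, y_train, x):
    """Predict the class whose mean vector is closest to x (stand-in classifier)."""
    centroids = {}
    for label in set(y_train):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy with feature selection redone in every fold."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_k_features(X_tr, y_tr, k)  # uses training samples only
        X_tr_sel = [[row[j] for j in feats] for row in X_tr]
        x_sel = [X[i][j] for j in feats]
        hits += nearest_centroid(X_tr_sel, y_tr, x_sel) == y[i]
    return hits / len(X)

# Hypothetical data: 6 samples x 3 features; only feature 0 separates the classes.
X = [[1.0, 5.0, 2.0], [1.2, 3.0, 2.1], [0.9, 4.0, 1.9],
     [3.0, 4.5, 2.0], [3.1, 3.5, 2.2], [2.9, 4.2, 1.8]]
y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X, y, k=1)
```

Selecting features on the full dataset before cross-validation would leak information from the held-out sample and inflate the accuracy estimate, a risk that grows when there are many variables and few individuals.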
