1.
PLoS One ; 17(7): e0271395, 2022.
Article in English | MEDLINE | ID: mdl-35830458

ABSTRACT

Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. Because most of these SNPs are located in non-coding regions of the genome, it is currently assumed that they influence the expression of nearby genes. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as "disease genes". Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare our performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to an AUC of 84.9%. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs.
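
The nearest-gene baseline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes genes are given as (name, chromosome, start, end) records and measures the distance from the SNP position to the closest edge of each gene body (zero if the SNP falls inside the gene). The `Gene` type, gene names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the gene body (inclusive)
    end: int    # last base of the gene body (inclusive)


def distance_to_gene(chrom: str, pos: int, gene: Gene) -> float:
    """Base-pair distance from a SNP to a gene body; 0 if the SNP lies inside it."""
    if gene.chrom != chrom:
        return float("inf")  # genes on other chromosomes are never "nearest"
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0


def nearest_gene(chrom: str, pos: int, genes: list[Gene]) -> str:
    """Baseline: predict that the SNP targets the gene closest to it on the genome."""
    return min(genes, key=lambda g: distance_to_gene(chrom, pos, g)).name


# Hypothetical toy annotation (names and coordinates are invented)
genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 12_000, 20_000),
    Gene("GENE_C", "chr2", 3_000, 4_000),
]
print(nearest_gene("chr1", 9_000, genes))  # GENE_B (3,000 bp away vs. 4,000 bp for GENE_A)
```

To use this baseline as a ranking score for an AUC, the negated distance of each SNP-gene pair can serve as the score.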


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Pattern Recognition, Automated , Phenotype
2.
Int J Mol Sci ; 21(16)2020 Aug 12.
Article in English | MEDLINE | ID: mdl-32806627

ABSTRACT

Fabry disease (FD) is a rare, X-linked lysosomal storage disease that mainly causes renal, cardiac and cerebral complications. Enzyme replacement therapy (ERT) with recombinant alpha-galactosidase A is available, but approximately 50% of male patients with classical FD develop inhibiting anti-drug antibodies (iADAs) that lead to reduced biochemical responses and an accelerated loss of renal function. Once immunization has occurred, iADAs tend to persist and tolerization is hard to achieve. Here we developed a pre-treatment prediction model for iADA development in FD using existing data from 120 male patients with classical FD from three European centers, all treated with ERT. We found that nonsense and frameshift mutations in the α-galactosidase A gene (p = 0.05), higher plasma lysoGb3 at baseline (p < 0.001) and agalsidase beta as first treatment (p = 0.006) were significantly associated with iADA development. The prediction performance of a Random Forest model using multiple variables (AUC-ROC: 0.77) was compared with that of a logistic regression (LR) model using the three significantly associated variables (AUC-ROC: 0.77). The LR model can be used to determine iADA risk in individual FD patients prior to treatment initiation. This helps to identify patients in whom adjusted treatment and/or immunomodulatory regimens may be considered to minimize the risk of iADA development.
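
The AUC-ROC values reported above can be read as the probability that a randomly chosen patient who developed iADAs receives a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that pairwise (Mann-Whitney) formulation follows, counting ties as one half; the scores and labels are made-up toy values, not data from the study, and the quadratic loop is only meant for small examples.

```python
def auc_roc(scores: list[float], labels: list[int]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical predicted iADA risk and observed outcome (1 = developed iADAs)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_roc(scores, labels))  # 0.888... (8 of 9 positive/negative pairs ordered correctly)
```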


Subject(s)
Antibodies/immunology , Fabry Disease/drug therapy , Fabry Disease/immunology , Isoenzymes/immunology , Isoenzymes/therapeutic use , Recombinant Proteins/immunology , Recombinant Proteins/therapeutic use , alpha-Galactosidase/immunology , alpha-Galactosidase/therapeutic use , Adolescent , Adult , Algorithms , Area Under Curve , Child , Cohort Studies , Humans , Logistic Models , Male , Middle Aged , ROC Curve , Risk Factors , Young Adult
3.
J Biomed Semantics ; 11(1): 9, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32819419

ABSTRACT

BACKGROUND: Knowledge graphs can represent the contents of biomedical literature and databases as subject-predicate-object triples, thereby enabling comprehensive analyses that identify, e.g., relationships between diseases. Some diseases are often diagnosed in patients in specific temporal sequences, which are referred to as disease trajectories. Here, we determine whether a sequence of two diseases forms a trajectory by leveraging the predicate information from paths between (disease) proteins in a knowledge graph. Furthermore, we determine the added value of directional information of predicates for this task. To do so, we create four feature sets, based on two methods for representing indirect paths, each with and without directional information of predicates (i.e., which protein is considered the subject and which the object). The added value of the directional information of predicates is quantified by comparing the classification performance of the feature sets that include or exclude it. RESULTS: Our method achieved a maximum area under the ROC curve of 89.8% and 74.5% when evaluated with two different reference sets. Use of directional information of predicates significantly improved performance, by 6.5 and 2.0 percentage points, respectively. CONCLUSIONS: Our work demonstrates that predicates between proteins can be used to identify disease trajectories. Using the directional information of predicates significantly improved performance over not using this information.
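
To make the difference between feature sets with and without directional information concrete, the sketch below counts the predicates on edges that directly link two proteins in a toy list of subject-predicate-object triples. This is only an illustration under stated assumptions: the protein identifiers, predicate names and the `predicate_features` helper are hypothetical, and the paper's feature sets are built from indirect paths, which this sketch does not model.

```python
from collections import Counter

Triple = tuple[str, str, str]  # (subject, predicate, object)


def predicate_features(a: str, b: str, triples: list[Triple], directional: bool) -> Counter:
    """Count predicates on edges that directly link proteins a and b.

    With directional=True, each predicate is tagged with which protein is the subject;
    with directional=False, 'a inhibits b' and 'b inhibits a' map to the same feature.
    """
    feats: Counter = Counter()
    for s, p, o in triples:
        if (s, o) == (a, b):
            feats[f"{p}:a->b" if directional else p] += 1
        elif (s, o) == (b, a):
            feats[f"{p}:b->a" if directional else p] += 1
    return feats


# Hypothetical knowledge-graph triples
triples = [
    ("P1", "inhibits", "P2"),
    ("P2", "inhibits", "P1"),
    ("P1", "binds", "P2"),
    ("P3", "activates", "P1"),  # does not link P1 and P2, so it is ignored
]
print(predicate_features("P1", "P2", triples, directional=True))   # three distinct features
print(predicate_features("P1", "P2", triples, directional=False))  # 'inhibits' counted twice
```

Dropping direction merges features, so a classifier sees fewer, denser columns; the abstract's comparison quantifies what that merging costs.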


Subject(s)
Biological Ontologies , Computer Graphics , Disease , Humans , Information Storage and Retrieval , ROC Curve , Semantics
4.
Sci Rep ; 9(1): 6281, 2019 04 18.
Article in English | MEDLINE | ID: mdl-31000794

ABSTRACT

Compounds that are candidates for drug repurposing can be ranked by leveraging knowledge available in the biomedical literature and databases. This knowledge, spread across a variety of sources, can be integrated within a knowledge graph, which thereby comprehensively describes known relationships between biomedical concepts such as drugs, diseases and genes. Our work uses the semantic information between drug and disease concepts as features, which are extracted from an existing knowledge graph that integrates 200 different biological knowledge sources. RepoDB, a standard drug repurposing database that describes drug-disease combinations that were approved or that failed in clinical trials, is used to train a random forest classifier. In 10-times repeated 10-fold cross-validation, the classifier achieves a mean area under the receiver operating characteristic curve (AUC) of 92.2%. We apply the classifier to prioritize 21 preclinical drug repurposing candidates that have been suggested for Autosomal Dominant Polycystic Kidney Disease (ADPKD). Mozavaptan, a vasopressin V2 receptor antagonist, is predicted to be the drug most likely to be approved after a clinical trial; it belongs to the same drug class as tolvaptan, the only treatment for ADPKD that is currently approved. We conclude that semantic properties of concepts in a knowledge graph can be exploited to prioritize drug repurposing candidates for testing in clinical trials.
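
The evaluation protocol above, 10-times repeated 10-fold cross-validation, can be sketched in plain Python as a generator of train/test index splits: each repetition shuffles the examples with a different random order and partitions them into 10 folds, each fold serving once as the test set. This is a generic sketch, not the authors' code, and `repeated_kfold` is a hypothetical helper; in practice one might instead use scikit-learn's `RepeatedStratifiedKFold`, which additionally preserves class proportions in each fold, a property this sketch does not provide.

```python
import random
from typing import Iterator


def repeated_kfold(n: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold cross-validation."""
    if n < k:
        raise ValueError("need at least k examples so that no test fold is empty")
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random order for every repetition
        # Deal the shuffled indices into k folds of (almost) equal size
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


# Example: 25 hypothetical drug-disease pairs -> 10 repeats x 10 folds = 100 splits
splits = list(repeated_kfold(25))
print(len(splits))  # 100
```

A mean AUC is then typically obtained by training on each `train` set, scoring the matching `test` set, and averaging the resulting AUCs over all splits.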


Subject(s)
Drug Repositioning/methods , Information Dissemination/methods , Polycystic Kidney, Autosomal Dominant/drug therapy , Semantics , Benzazepines/therapeutic use , Clinical Trials as Topic , Databases, Factual , Humans , Knowledge , Pattern Recognition, Automated
5.
J Biomed Semantics ; 9(1): 23, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30189889

ABSTRACT

BACKGROUND: Biomedical knowledge graphs have become important tools to computationally analyse the comprehensive body of biomedical knowledge. They represent knowledge as subject-predicate-object triples, in which the predicate indicates the relationship between subject and object. A triple can also contain provenance information, which consists of references to the sources of the triple (e.g., scientific publications or database entries). Knowledge graphs have been used to classify drug-disease pairs for drug efficacy screening, but existing computational methods have often ignored predicate and provenance information. Using this information, we aimed to develop a supervised machine learning classifier and to determine the added value of predicate and provenance information for drug efficacy screening. To ensure the biological plausibility of our method, we performed our research at the protein level, where drugs are represented by their drug target proteins and diseases by their disease proteins. RESULTS: Using random forests with repeated 10-fold cross-validation, our method achieved an area under the ROC curve (AUC) of 78.1% and 74.3% for two reference sets. We benchmarked against a state-of-the-art knowledge-graph technique that does not use predicate and provenance information, which obtained AUCs of 65.6% and 64.6%, respectively. Classifiers that used only predicate information outperformed classifiers that used only provenance information, and classifiers that used both performed best. CONCLUSION: We conclude that both predicate and provenance information provide added value for drug efficacy screening.


Subject(s)
Biological Ontologies , Computer Graphics , Drug Evaluation, Preclinical , False Negative Reactions , ROC Curve
6.
J Biomed Inform ; 71: 178-189, 2017 07.
Article in English | MEDLINE | ID: mdl-28579531

ABSTRACT

PROBLEM: Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers. METHOD: We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance. RESULTS: Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000; together, these accounted for 547 of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that did not rank near the top of the list, an extensive error analysis was performed. When evaluating the overall performance, we obtained an ROC-AUC of 0.974. DISCUSSION: Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
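
The two summary figures in the results (how many reference compounds fall within the top N of the ranking, and what share of the weight points those compounds carry) can be computed as below. The `top_n_coverage` helper, compound names, weights and cutoff are hypothetical toy values; only the form of the calculation mirrors the abstract.

```python
def top_n_coverage(ranking: list[str], weights: dict[str, int],
                   n: int) -> tuple[int, int, float, float]:
    """Return (hits, hit_weight, hit_fraction, weight_fraction) for reference compounds in the top n."""
    top = set(ranking[:n])
    hits = [c for c in weights if c in top]      # reference compounds ranked in the top n
    hit_weight = sum(weights[c] for c in hits)   # weight points those compounds carry
    total_weight = sum(weights.values())
    return len(hits), hit_weight, len(hits) / len(weights), hit_weight / total_weight


# Hypothetical ranking produced by the knowledge graph and weighted reference compounds
ranking = ["C1", "C7", "C3", "C9", "C2", "C5"]
weights = {"C1": 5, "C2": 1, "C3": 3, "C4": 2}
print(top_n_coverage(ranking, weights, n=4))  # 2 hits carrying 8 of 11 weight points
```

With the abstract's own numbers, the same ratios are 163/222 ≈ 73% of reference compounds and 547/644 ≈ 85% of weight points.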


Subject(s)
Biomarkers , Data Mining , Databases, Factual , Migraine Disorders/diagnosis , Semantics , Automation , Humans , Publications
7.
BMC Bioinformatics ; 16: 25, 2015 Jan 28.
Article in English | MEDLINE | ID: mdl-25627479

ABSTRACT

BACKGROUND: Two-dimensional differential gel electrophoresis (2D-DIGE) provides a powerful technique to separate proteins by their isoelectric point and apparent molecular mass and to quantify changes in protein expression. Abundant proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundance proteins. RESULTS: We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot, in combination with functional similarities of candidate proteins to already identified proteins, to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%. CONCLUSIONS: Our approach shows good performance on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which, in particular, low-abundance protein spots remain to be identified.
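
The leave-one-out evaluation described above can be illustrated with a simplified candidate ranking: for each identified spot, hide its identity, rank all candidate proteins, and count a hit if the true protein is among the top k. The scoring function here is a hypothetical stand-in, not the published method: it combines a normalized pI/mass mismatch with a bonus for functional similarity (Jaccard overlap of annotation sets) to the proteins on the other spots. The tolerances `pi_sd` and `mass_sd`, the weight `alpha`, and all protein names and values are invented for illustration.

```python
from typing import NamedTuple


class Protein(NamedTuple):
    name: str
    pi: float              # theoretical isoelectric point
    mass_kda: float        # theoretical molecular mass (kDa)
    terms: frozenset[str]  # functional annotations, e.g. GO terms (hypothetical)


class Spot(NamedTuple):
    pi: float              # measured (noisy) isoelectric point
    mass_kda: float        # measured apparent molecular mass (kDa)
    true_protein: str      # identity, used only for evaluation


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(spot: Spot, cand: Protein, identified: list[Protein],
          pi_sd: float = 0.5, mass_sd: float = 5.0, alpha: float = 1.0) -> float:
    """Lower is better: normalized pI/mass mismatch minus a functional-similarity bonus."""
    mismatch = (((spot.pi - cand.pi) / pi_sd) ** 2
                + ((spot.mass_kda - cand.mass_kda) / mass_sd) ** 2)
    # Best similarity to an identified protein other than the candidate itself
    similarity = max((jaccard(cand.terms, p.terms) for p in identified if p.name != cand.name),
                     default=0.0)
    return mismatch - alpha * similarity


def loo_top_k_rate(spots: list[Spot], candidates: list[Protein],
                   k: int = 5, alpha: float = 1.0) -> float:
    """Leave-one-out: hide each spot's identity in turn and check whether it ranks in the top k."""
    by_name = {c.name: c for c in candidates}
    hits = 0
    for i, spot in enumerate(spots):
        # All other spots play the role of already identified proteins
        identified = [by_name[s.true_protein] for j, s in enumerate(spots) if j != i]
        ranked = sorted(candidates, key=lambda c: score(spot, c, identified, alpha=alpha))
        if spot.true_protein in {c.name for c in ranked[:k]}:
            hits += 1
    return hits / len(spots)


# Hypothetical toy data: five candidates, three identified spots
candidates = [
    Protein("A", 5.0, 50.0, frozenset({"t1", "t2"})),
    Protein("B", 5.1, 51.0, frozenset({"t3"})),
    Protein("C", 7.0, 30.0, frozenset({"t1"})),
    Protein("D", 7.1, 31.0, frozenset({"t4"})),
    Protein("E", 9.0, 80.0, frozenset({"t5"})),
]
spots = [Spot(5.08, 50.8, "A"), Spot(7.08, 30.8, "C"), Spot(9.0, 80.0, "E")]
print(loo_top_k_rate(spots, candidates, k=1))             # 1.0 with functional similarity
print(loo_top_k_rate(spots, candidates, k=1, alpha=0.0))  # 0.333... with pI and mass only
```

The toy uses k=1 because with only five candidates a top-5 cutoff would trivially include every protein; it shows how functional similarity can promote the true protein when pI and mass alone favour a near neighbour.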


Subject(s)
Electrophoresis, Gel, Two-Dimensional/methods , HIV Infections/metabolism , Proteins/analysis , Proteomics/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , T-Lymphocytes/metabolism , Two-Dimensional Difference Gel Electrophoresis/methods , Cells, Cultured , HIV Infections/virology , HIV-1/metabolism , Humans , Peptide Fragments/analysis , T-Lymphocytes/virology
8.
Front Microbiol ; 3: 240, 2012.
Article in English | MEDLINE | ID: mdl-22783244

ABSTRACT

This mini-review summarizes techniques applied in, and results obtained with, proteomic studies of human immunodeficiency virus type 1 (HIV-1)-T cell interaction. Our group previously reported on the use of two-dimensional differential gel electrophoresis (2D-DIGE) coupled to matrix-assisted laser desorption time-of-flight peptide mass fingerprint analysis to study T cell responses upon HIV-1 infection. Only one in three differentially expressed proteins could be identified using this experimental setup. Here we report on our latest efforts to test models generated from this data set and to extend its analysis using novel bioinformatic algorithms. The 2D-DIGE results are compared with other studies, including a pilot study using one-dimensional peptide separation coupled to MS(E), a novel mass spectrometric approach. It can be concluded that although the latter method detects fewer proteins, it is much faster and less labor-intensive. Last but not least, recent developments and remaining challenges in the field of proteomic studies of HIV-1 infection, and in proteomics in general, are discussed.
