Results 1 - 20 of 133
1.
J Alzheimers Dis ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38759012

ABSTRACT

Background: Despite numerous past endeavors for the semantic harmonization of Alzheimer's disease (AD) cohort studies, an automatic tool has yet to be developed. Objective: As cohort studies form the basis of data-driven analysis, harmonizing them is crucial for cross-cohort analysis. We aimed to accelerate this task by constructing an automatic harmonization tool. Methods: We created a common data model (CDM) through cross-mapping data from 20 cohorts, three CDMs, and ontology terms, which was then used to fine-tune a BioBERT model. Finally, we evaluated the model using three previously unseen cohorts and compared its performance to a string-matching baseline model. Results: Here, we present our AD-Mapper interface for automatic harmonization of AD cohort studies, which outperformed a string-matching baseline on previously unseen cohort studies. We showcase our CDM comprising 1218 unique variables. Conclusion: AD-Mapper leverages semantic similarities in naming conventions across cohorts to improve mapping performance.
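The abstract compares AD-Mapper against a string-matching baseline. As a rough illustration of what such a baseline can look like, the sketch below scores a cohort variable name against CDM variable names using character-level similarity from Python's standard `difflib` module. This is an assumption for illustration, not the authors' baseline; the CDM variable names and the `normalize` and `best_match` helpers are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and treat '_' and '-' as spaces, so 'MMSE_Total' becomes 'mmse total'."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def best_match(cohort_var: str, cdm_vars: list[str]) -> tuple[str, float]:
    """Return the CDM variable whose normalized name is most similar, with its score in [0, 1]."""
    query = normalize(cohort_var)
    scored = [(var, SequenceMatcher(None, query, normalize(var)).ratio()) for var in cdm_vars]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical CDM variable names, for illustration only (not from the published CDM).
cdm_variables = ["age", "sex", "mmse total score", "apoe4 carrier", "education years"]
print(best_match("MMSE_Total", cdm_variables))
```

A matcher of this kind only sees surface spelling, so it cannot link an abbreviation to a differently worded variable of the same meaning; that is presumably where a model exploiting semantic similarity, such as the fine-tuned BioBERT described above, has an advantage.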

2.
Sci Data ; 11(1): 507, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755219

ABSTRACT

In the pharmaceutical industry, patent protection of drugs and medicines is considered important because of the high costs involved in developing novel drugs. Over the years, researchers have analyzed patent documents to identify freedom-to-operate spaces for novel drug candidates. To assist this, several well-established public patent document data repositories have enabled automated methodologies for extracting information on therapeutic agents. In this study, we examine one such publicly available patent database, SureChEMBL, which catalogues patent documents related to the life sciences. Our exploration begins by identifying patent compounds across public chemical data resources, followed by pinpointing the sections of patent documents in which the chemical annotations were found. Next, we assess the potential of these compounds to serve as drug candidates by evaluating their conformity to drug-likeness criteria. Lastly, we examine the drug development stage reported for these compounds to understand their clinical success. In summary, our investigation aims to provide a comprehensive overview of the patent compounds catalogued in SureChEMBL and to assess their relevance to pharmaceutical drug discovery.


Subjects
Drug Discovery , Patents as Topic , Databases, Factual , Drug Industry
5.
Database (Oxford) ; 2023, 2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38041858

ABSTRACT

Alzheimer's disease (AD) is one of the leading causes of dementia in the population, and it is imperative that we discern exactly why it has a strong molecular association with beta-amyloid and tau. Although the etiology and pathogenesis of AD are still not clearly understood, scientists worldwide have dedicated significant effort to discovering the molecular interactions linked to its pathological characteristics and potential treatments. Knowledge representations, such as domain ontologies encompassing our current understanding of AD, could greatly assist and contribute to disease research. This paper describes the construction and application of the integrated Alzheimer's Disease Ontology (ADO), which combines selected concepts from the former version of the ADO and the Alzheimer's Disease Mapping Ontology (ADMO). In addition to the existing entities available from these knowledge models, essential knowledge about AD from public sources, such as newly discovered risk factor genes and novel treatments, was also integrated. The ADO can also be leveraged in text mining scenarios, given that it is conceptually enriched with domain-specific knowledge and the relations between concepts. The integrated ADO consists of 39 855 total axioms. The ontology covers many aspects of the AD domain, including risk factor genes, clinical features, treatments and experimental models. It complies with the Open Biological and Biomedical Ontology principles and was accepted by the foundry. In this paper, we illustrate the role of the presented ontology in extracting textual information from the SCAIView database and key measures in an ADO-based corpus. Database URL: https://academic.oup.com/database.


Subjects
Alzheimer Disease , Biological Ontologies , Humans , Alzheimer Disease/genetics , Data Mining
6.
Front Neurol ; 14: 1187095, 2023.
Article in English | MEDLINE | ID: mdl-37545729

ABSTRACT

Efficient data sharing is hampered by an array of organizational, ethical, behavioral, and technical challenges, slowing research progress and reducing the utility of data generated by clinical research studies on neurodegenerative diseases. There is a particular need to address differences between public and private sector environments for research and data sharing, which have varying standards, expectations, motivations, and interests. The Neuronet data sharing Working Group (WG) was set up to understand the existing barriers to data sharing in public-private partnership projects, and to provide guidance to overcome these barriers, by convening data sharing experts from diverse projects in the IMI neurodegeneration portfolio. In this policy and practice review, we outline the challenges and learnings of the WG, providing the neurodegeneration community with examples of good practices and recommendations on how to overcome obstacles to data sharing. These obstacles range from organizational issues linked to the unique structure of cross-sectoral, collaborative research initiatives to technical issues that affect the storage, structure and annotations of individual datasets. We also identify sociotechnical hurdles, such as academic recognition and reward systems that disincentivise data sharing, and legal challenges linked to heightened perceptions of data privacy risk, compounded by a lack of clear guidance on GDPR compliance mechanisms for public-private research. Focusing on real-world, neuroimaging and digital biomarker data, we highlight particular challenges and learnings for data sharing, such as data management planning, development of ethical codes of conduct, and harmonization of protocols and curation processes. Cross-cutting solutions and enablers include the principles of transparency, standardization and co-design, from open, accessible metadata catalogs that enhance the findability of data to measures that increase visibility and trust in data reuse.

7.
Front Neurol ; 14: 1174079, 2023.
Article in English | MEDLINE | ID: mdl-37521302

ABSTRACT

The Innovative Medicines Initiative (IMI) was a European public-private partnership (PPP) undertaking intended to improve the drug development process, facilitate biomarker development, accelerate clinical trial timelines, improve success rates, and generally increase the competitiveness of European pharmaceutical sector research. Through the IMI, pharmaceutical research interests and the research agenda of the EU are supported by academic partnership and financed by both the pharmaceutical companies and public funds. Since its inception, the IMI has funded dozens of research partnerships focused on solving the core problems that have consistently obstructed the translation of research into clinical success. In this post-mortem review paper, we focus on six research initiatives that tackled foundational challenges of this nature: Aetionomy, EMIF, EPAD, EQIPD, eTRIKS, and PRISM. Several of these initiatives focused on neurodegenerative diseases; we therefore discuss the state of neurodegenerative research both at the start of the IMI and now, and the contributions that IMI partnerships made to progress in the field. Many of the initiatives we review had goals including, but not limited to, the establishment of translational, data-centric initiatives and the implementation of trans-diagnostic approaches that move beyond the candidate disease approach to assess symptom etiology without bias, challenging the construct of disease diagnosis. With participating senior scientists from each initiative, we discuss the successes of these initiatives, the challenges faced, and the merits and shortcomings of the IMI approach. Here, we distill their perspectives on the lessons learned, with the aim of positively influencing future funding policy and approaches.

8.
Bioinform Adv ; 3(1): vbad033, 2023.
Article in English | MEDLINE | ID: mdl-37016683

ABSTRACT

Motivation: Epilepsy is a multifaceted, complex disorder that requires a precise understanding of its classification, diagnosis, treatment and underlying disease mechanisms. Although scattered resources on epilepsy are available, comprehensive and structured knowledge is missing. To promote multidisciplinary knowledge exchange and facilitate advances in clinical management, especially in pre-clinical research, a disease-specific ontology is necessary. The presented ontology is designed to enable better interconnection between scientific community members in the epilepsy domain. Results: The Epilepsy Ontology (EPIO) is an assembly of structured knowledge on various aspects of epilepsy, developed according to Basic Formal Ontology (BFO) and Open Biological and Biomedical Ontology (OBO) Foundry principles. Concepts and definitions are collected from the latest International League Against Epilepsy (ILAE) classification, domain-specific ontologies and scientific literature. This ontology consists of 1879 classes and 28 151 axioms (2171 declaration axioms, 2219 logical axioms) from several aspects of epilepsy. This ontology is intended to be used for data management and text mining purposes. Availability and implementation: The current release of the ontology is publicly available under a Creative Commons 4.0 License, is shared via http://purl.obolibrary.org/obo/epso.owl, and is a community-based effort assembling various facets of the complex disease. The ontology is also deposited in BioPortal at https://bioportal.bioontology.org/ontologies/EPIO. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

9.
Nat Commun ; 14(1): 761, 2023 02 10.
Article in English | MEDLINE | ID: mdl-36765056

ABSTRACT

The anticipation of progression of Alzheimer's disease (AD) is crucial for evaluations of secondary prevention measures thought to modify the disease trajectory. However, it is difficult to forecast the natural progression of AD, notably because several functions decline at different ages and different rates in different patients. We evaluate here AD Course Map, a statistical model predicting the progression of neuropsychological assessments and imaging biomarkers for a patient from current medical and radiological data at early disease stages. We tested the method on more than 96,000 cases, with a pool of more than 4,600 patients from four continents. We measured the accuracy of the method for selecting participants displaying a progression of clinical endpoints during a hypothetical trial. We show that enriching the population with the predicted progressors decreases the required sample size by 38% to 50%, depending on trial duration, outcome, and targeted disease stage, from asymptomatic individuals at risk of AD to subjects with early and mild AD. We show that the method introduces no biases regarding sex or geographic locations and is robust to missing data. It performs best at the earliest stages of disease and is therefore highly suitable for use in prevention trials.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/psychology , Disease Progression , Neuroimaging/methods , Research Design , Biomarkers
10.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36322820

ABSTRACT

MOTIVATION: Drug discovery practitioners in industry and academia use semantic tools to extract information from online scientific literature and generate new insights into targets, therapeutics and diseases. However, owing to complexities in access and analysis, patent literature is often overlooked as a source of information. Because drug discovery is a highly competitive field, tools that tap into patent literature can give any actor in the field an advantage through better-informed decision-making. Hence, we aim to facilitate access to patent literature by creating an automatic tool that extracts information from patents described in existing public resources. RESULTS: Here, we present PEMT, a novel patent enrichment tool that takes advantage of public databases such as ChEMBL and SureChEMBL to extract relevant patent information linked to chemical structures and/or gene names described through FAIR principles and metadata annotations. PEMT aims to support drug discovery and research by establishing a patent landscape around genes of interest. The tool's pharmaceutical focus stems mainly from its subselection of International Patent Classification codes, but in principle it can be used for other patent fields, provided that a link between a concept and a chemical structure is investigated. Finally, we demonstrate a use case in rare diseases by generating a gene-patent list based on the epidemiological prevalence of these diseases and exploring their underlying patent landscapes. AVAILABILITY AND IMPLEMENTATION: PEMT is an open-source Python tool; its source code and PyPI package are available at https://github.com/Fraunhofer-ITMP/PEMT and https://pypi.org/project/PEMT/, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metadata , Software , Databases, Factual
11.
Article in English | MEDLINE | ID: mdl-36462601

ABSTRACT

Schizophrenia and bipolar disorder are characterized by highly similar neuropsychological signatures, implying shared neurobiological mechanisms between these two disorders. These disorders also have comorbidities, such as type 2 diabetes mellitus (T2DM). To date, an understanding of the mechanisms that mediate the link between these two disorders remains incomplete. In this work, we identify and investigate shared patterns across multiple schizophrenia, bipolar disorder and T2DM gene expression datasets through multiple strategies. Firstly, we investigate dysregulation patterns at the gene level and compare our findings against disease-specific knowledge graphs (KGs). Secondly, we analyze the concordance of co-expression patterns across datasets to identify disease-specific as well as common pathways. Thirdly, we examine enriched pathways across datasets and disorders to identify common biological mechanisms between them. Lastly, we investigate the correspondence of shared genetic variants between these two disorders and T2DM as well as the disease-specific KGs. In conclusion, our work reveals several shared candidate genes and pathways, particularly those related to the immune system, such as the TNF, IL-17 and NF-kappa B signaling pathways, and to the nervous system, such as the dopaminergic and GABAergic synapses, which we propose mediate the link between schizophrenia and bipolar disorder and their shared comorbidity, T2DM.


Subjects
Bipolar Disorder , Diabetes Mellitus, Type 2 , Schizophrenia , Humans , Bipolar Disorder/psychology , Schizophrenia/epidemiology , Schizophrenia/genetics , Comorbidity , Signal Transduction
12.
JAMIA Open ; 5(4): ooac087, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36380848

ABSTRACT

Objective: Healthcare data such as clinical notes are primarily recorded in an unstructured manner. If adequately translated into structured data, they can be utilized for health economics and lay the groundwork for better individualized patient care. To structure clinical notes, deep-learning methods, particularly transformer-based models like Bidirectional Encoder Representations from Transformers (BERT), have recently received much attention. Currently, biomedical applications focus primarily on the English language. While general-purpose German-language models such as GermanBERT and GottBERT have been published, adaptations for biomedical data are unavailable. This study evaluated the suitability of existing and novel transformer-based models for the German biomedical and clinical domain. Materials and Methods: We used 8 transformer-based models and pre-trained 3 new models on a newly generated biomedical corpus, and systematically compared them with each other. We annotated a new dataset of clinical notes and used it with 4 other corpora (BRONCO150, CLEF eHealth 2019 Task 1, GGPONC, and JSynCC) to perform named entity recognition (NER) and document classification tasks. Results: General-purpose language models can be used effectively for biomedical and clinical natural language processing (NLP) tasks; still, our newly trained BioGottBERT model outperformed GottBERT on both clinical NER tasks. However, training new biomedical models from scratch proved ineffective. Discussion: The potential of the domain-adaptation strategy is currently limited by a lack of pre-training data. Since general-purpose language models are only marginally inferior to domain-specific models, both options are suitable for developing German-language biomedical applications. Conclusion: General-purpose language models perform remarkably well on biomedical and clinical NLP tasks. If larger corpora become available in the future, domain-adapting these models may improve performance.

13.
J Clin Med ; 11(19), 2022 Oct 10.
Article in English | MEDLINE | ID: mdl-36233841

ABSTRACT

Excess labile heme, which occurs under hemolytic conditions, acts as a versatile modulator of the blood coagulation system. As such, heme provokes prothrombotic states, either by binding to plasma proteins or through interaction with participating cell types. However, despite several independent reports on these effects, apparently contradictory observations and significant knowledge gaps characterize this relationship, which hampers a complete understanding of heme-driven coagulopathies and the development of suitable and specific treatment options. Thus, a computational exploration of the complex network of heme-triggered effects in the blood coagulation system is presented herein. Combining hemostasis- and heme-specific terminology, the knowledge available thus far was curated and modeled in a mechanistic interactome. Further, these data were incorporated into the previously established heme knowledge graph, "HemeKG", to better comprehend the knowledge surrounding heme biology. Finally, a pathway enrichment analysis of these data provided deep insights into previously unknown links and novel experimental targets within the blood coagulation cascade and platelet activation pathways for further investigation of the prothrombotic nature of heme. In summary, this study allows, for the first time, a detailed network analysis of the effects of heme in the blood coagulation system.

14.
Bioinformatics ; 38(24): 5466-5468, 2022 12 13.
Article in English | MEDLINE | ID: mdl-36303318

ABSTRACT

MOTIVATION: A global medical crisis like the coronavirus disease 2019 (COVID-19) pandemic requires interdisciplinary and highly collaborative research from all over the world. One of the key challenges for collaborative research is a lack of interoperability among various heterogeneous data sources. Interoperability, standardization and mapping of datasets are necessary for data analysis and for applications in advanced algorithms, such as personalized risk prediction modeling. RESULTS: To ensure interoperability and compatibility among COVID-19 datasets, we present here a common data model (CDM) built from 11 different COVID-19 datasets from various geographical locations. The current version of the CDM holds 4639 data variables related to COVID-19, such as basic patient information (age, biological sex and diagnosis) as well as disease-specific data variables, for example, Anosmia and Dyspnea. Each data variable in the model is associated with specific data types, variable mappings, value ranges, data units and data encodings that could be used for standardizing any dataset. Moreover, its compatibility with established data standards like OMOP and FHIR makes the CDM well suited to COVID-19 data interoperability. AVAILABILITY AND IMPLEMENTATION: The CDM is available in a public repository: https://github.com/Fraunhofer-SCAI-Applied-Semantics/COVID-19-Global-Model. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
COVID-19 , Humans , Algorithms , Pandemics
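The abstract lists the metadata attached to each CDM variable: data type, variable mappings, value range, unit and encoding. A minimal sketch of how such metadata could drive the standardization of a single patient record follows. The `CDMVariable` class, its field names and the example mappings are hypothetical and do not reproduce the schema used in the linked repository.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CDMVariable:
    """Hypothetical CDM entry: name, type, unit, allowed range, value encoding, source columns."""
    name: str
    dtype: type
    unit: Optional[str] = None
    value_range: Optional[tuple[float, float]] = None
    encoding: dict[str, Any] = field(default_factory=dict)  # raw value (lowercased) -> standard value
    mappings: set[str] = field(default_factory=set)         # dataset-specific column names

    def standardize(self, raw: Any) -> Any:
        """Apply the encoding, cast to the declared type, then enforce the value range."""
        value = self.encoding.get(str(raw).strip().lower(), raw)
        value = self.dtype(value)
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                raise ValueError(f"{self.name}={value!r} outside [{low}, {high}]")
        return value


def standardize_record(record: dict, model: list[CDMVariable]) -> dict:
    """Rename dataset columns to CDM variable names and standardize their values."""
    out = {}
    for var in model:
        for column in sorted(var.mappings):  # sorted for deterministic column choice
            if column in record:
                out[var.name] = var.standardize(record[column])
                break
    return out


# Hypothetical variables and mappings, for illustration only.
model = [
    CDMVariable("age", int, unit="years", value_range=(0, 120), mappings={"Age", "age_years"}),
    CDMVariable("biological_sex", str,
                encoding={"m": "male", "1": "male", "f": "female", "2": "female"},
                mappings={"Sex", "gender"}),
    CDMVariable("anosmia", bool, encoding={"yes": True, "no": False},
                mappings={"Anosmia", "loss_of_smell"}),
]
print(standardize_record({"age_years": "57", "gender": "F", "loss_of_smell": "yes"}, model))
# {'age': 57, 'biological_sex': 'female', 'anosmia': True}
```

Keeping mappings, encodings and ranges as data rather than code means a new dataset would, under this design, only need new entries in `mappings` and `encoding`.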
16.
BMC Bioinformatics ; 23(1): 231, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35705903

ABSTRACT

Distinct gene expression patterns within cells are foundational for the diversity of functions and unique characteristics observed in specific contexts, such as human tissues and cell types. Though some biological processes commonly occur across contexts, by harnessing the vast amounts of available gene expression data, we can decipher the processes that are unique to a specific context. Therefore, with the goal of developing a portrait of context-specific patterns to better elucidate how they govern distinct biological processes, this work presents a large-scale exploration of transcriptomic signatures across three different contexts (i.e., tissues, cell types, and cell lines) by leveraging over 600 gene expression datasets categorized into 98 subcontexts. The strongest pairwise correlations between genes from these subcontexts are used for the construction of co-expression networks. Using a network-based approach, we then pinpoint patterns that are unique and common across these subcontexts. First, we focused on patterns at the level of individual nodes and evaluated their functional roles using a human protein-protein interactome as a referential network. Next, within each context, we systematically overlaid the co-expression networks to identify specific and shared correlations as well as relations already described in scientific literature. Additionally, in a pathway-level analysis, we overlaid node and edge sets from co-expression networks against pathway knowledge to identify biological processes that are related to specific subcontexts or groups of them. Finally, we have released our data and scripts at https://zenodo.org/record/5831786 and https://github.com/ContNeXt/ , respectively and developed ContNeXt ( https://contnext.scai.fraunhofer.de/ ), a web application to explore the networks generated in this work.


Subjects
Gene Regulatory Networks , Transcriptome , Gene Expression Profiling , Humans , Software
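The network construction step, which keeps only the strongest pairwise gene correlations, can be sketched with NumPy as below. Pearson correlation, ranking by absolute value and a fixed `top_k` cutoff are assumptions made for illustration; the abstract does not state which correlation measure or cutoff ContNeXt uses, and the toy matrix is invented.

```python
import numpy as np


def coexpression_edges(expr: np.ndarray, genes: list[str], top_k: int) -> list[tuple[str, str, float]]:
    """Return the top_k gene pairs ranked by absolute Pearson correlation.

    expr has shape (n_genes, n_samples): one row per gene, one column per sample.
    """
    corr = np.corrcoef(expr)                        # (n_genes, n_genes) Pearson matrix
    rows, cols = np.triu_indices(len(genes), k=1)   # each unordered gene pair once, no self-pairs
    strength = np.abs(corr[rows, cols])             # rank by magnitude; the sign is kept below
    order = np.argsort(strength)[::-1][:top_k]      # strongest pairs first
    return [(genes[rows[i]], genes[cols[i]], float(corr[rows[i], cols[i]])) for i in order]


# Toy expression matrix: 4 genes x 5 samples (illustrative values only).
genes = ["A", "B", "C", "D"]
expr = np.array([
    [1, 2, 3, 4, 5],    # A
    [2, 4, 6, 8, 10],   # B = 2 * A
    [5, 4, 3, 2, 1],    # C decreases as A increases
    [1, 3, 2, 5, 4],    # D is only partly correlated with A
], dtype=float)

for source, target, r in coexpression_edges(expr, genes, top_k=3):
    print(source, target, round(r, 3))
```

The resulting edge list is one network per subcontext; overlaying such networks, as the abstract describes, then reduces to comparing their edge sets.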
17.
Bioinformatics ; 38(15): 3850-3852, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35652780

ABSTRACT

MOTIVATION: The importance of clinical data in understanding the pathophysiology of complex disorders has prompted the launch of multiple initiatives designed to generate patient-level data from various modalities. While these studies can reveal important findings relevant to the disease, each study captures different yet complementary aspects and modalities which, when combined, generate a more comprehensive picture of disease etiology. However, achieving this requires a global integration of data across studies, which proves to be challenging given the lack of interoperability of cohort datasets. RESULTS: Here, we present the Data Steward Tool (DST), an application that allows for semi-automatic semantic integration of clinical data into ontologies and global data models and data standards. We demonstrate the applicability of the tool in the field of dementia research by establishing a Clinical Data Model (CDM) in this domain. The CDM currently consists of 277 common variables covering demographics (e.g. age and gender), diagnostics, neuropsychological tests and biomarker measurements. The DST combined with this disease-specific data model shows how interoperability between multiple, heterogeneous dementia datasets can be achieved. AVAILABILITY AND IMPLEMENTATION: The DST source code and Docker images are respectively available at https://github.com/SCAI-BIO/data-steward and https://hub.docker.com/r/phwegner/data-steward. Furthermore, the DST is hosted at https://data-steward.bio.scai.fraunhofer.de/data-steward. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Dementia , Semantics , Humans , Software , Dementia/diagnosis
18.
Patterns (N Y) ; 3(3): 100433, 2022 Mar 11.
Article in English | MEDLINE | ID: mdl-35510183

ABSTRACT

The high number of failed pre-clinical and clinical studies for compounds targeting Alzheimer disease (AD) has demonstrated that there is a need to reassess existing strategies. Here, we pursue a holistic, mechanism-centric drug repurposing approach combining computational analytics and experimental screening data. Based on this integrative workflow, we identified 77 druggable modifiers of tau phosphorylation (pTau). One of the upstream modulators of pTau, HDAC6, was screened with 5,632 drugs in a tau-specific assay, resulting in the identification of 20 repurposing candidates. Four compounds and their known targets were found to have a link to AD-specific genes. Our approach can be applied to a variety of AD-associated pathophysiological mechanisms to identify more repurposing candidates.

19.
Alzheimers Res Ther ; 14(1): 69, 2022 05 21.
Article in English | MEDLINE | ID: mdl-35598021

ABSTRACT

BACKGROUND: Currently, Alzheimer's disease (AD) cohort datasets are difficult to find and lack cross-cohort interoperability, and the actual content of publicly available datasets often only becomes clear to third-party researchers once data access has been granted. These aspects severely hinder the advancement of AD research through emerging data-driven approaches such as machine learning and artificial intelligence, and bias current data-driven findings towards the few commonly used, well-explored AD cohorts. To achieve robust and generalizable results, validation across multiple datasets is crucial. METHODS: We accessed and systematically investigated the content of 20 major AD cohort datasets at the data level. Both a medical professional and a data specialist manually curated and semantically harmonized the acquired datasets. Finally, we developed a platform that displays vital information about the available datasets. RESULTS: Here, we present ADataViewer, an interactive platform that facilitates the exploration of 20 cohort datasets with respect to longitudinal follow-up, demographics, ethnoracial diversity, measured modalities, and statistical properties of individual variables. It allows researchers to quickly identify AD cohorts that meet user-specified requirements for discovery and validation studies regarding available variables, sample sizes, and longitudinal follow-up. Additionally, we publish the underlying variable mapping catalog that harmonizes 1196 unique variables across the 20 cohorts and paves the way for interoperable AD datasets. CONCLUSIONS: In conclusion, ADataViewer facilitates fast, robust data-driven research by transparently displaying cohort dataset content and supporting researchers in selecting datasets that are suited for their envisioned study. The platform is available at https://adata.scai.fraunhofer.de/ .


Subjects
Alzheimer Disease , Artificial Intelligence , Cohort Studies , Humans , Sample Size
20.
Brief Bioinform ; 23(3), 2022 05 13.
Article in English | MEDLINE | ID: mdl-35453140

ABSTRACT

Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, and these may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or to default settings. Despite ongoing efforts to establish guidelines, meaningful results are still hampered by a lack of consensus or gold standards on how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, ranging from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges that originate from the outlined factors.


Subjects
Databases, Factual , Factor Analysis, Statistical , Longitudinal Studies
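Over-representation analysis is one of the most common enrichment approaches covered by benchmarks of this kind. The sketch below computes its one-sided hypergeometric p-value with only the Python standard library; the function name and the toy numbers are illustrative, not taken from the review. The background size `N` and the pathway size `K` enter the formula directly, which is one concrete way in which gene-universe definition and database choice can change the result.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the query list (e.g. differentially expressed genes)
    k: query genes that fall in the pathway
    P(X >= k) = sum_{i=k}^{min(n, K)} C(K, i) * C(N - K, n - i) / C(N, n)
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Same overlap (2 of 3 query genes in a 4-gene pathway), two background sizes.
print(hypergeometric_pvalue(k=2, n=3, K=4, N=10))  # 40/120 = 1/3
print(hypergeometric_pvalue(k=2, n=3, K=4, N=20))  # 100/1140, about 0.088
```

With an identical overlap, doubling the background from 10 to 20 genes lowers the p-value from 1/3 to roughly 0.088, so the choice of background alone can move a pathway across a significance threshold.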