Results 1 - 17 of 17
1.
Heliyon ; 10(7): e28560, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38590890

ABSTRACT

Single Sign-On (SSO) methods are the primary solution for authenticating users across multiple web systems. These mechanisms streamline the authentication procedure by avoiding duplicate development of authentication modules for each application. They also provide convenience to end-users by keeping them authenticated when switching between different contexts. To ensure this cross-application authentication, SSO relies on an Identity Provider (IdP), which is commonly set up and managed by each institution that needs to enforce SSO internally. However, the solution is less straightforward when several institutions need to cooperate in a single ecosystem. This could be tackled by centralizing the authentication mechanisms in one of the involved entities, a solution that concentrates responsibilities in ways that may be difficult for peers to accept. Moreover, such a solution is not appropriate for dynamic groups, where peers may join or leave frequently. In this paper, we propose an architecture that uses a trusted third-party service to authenticate multiple entities, ensuring the isolation of the user's attributes between this service and the institutional SSO systems. This architecture was validated in the EHDEN Portal, which aggregates the web tools and services of this European health project, to establish a Federated Authentication schema.
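The paper describes the architecture rather than a reference implementation. As a minimal, purely illustrative sketch of the trust pattern, assuming the third-party service issues signed JWTs (PyJWT shown; the issuer, audience, and claim layout below are hypothetical, not taken from the paper):

```python
# Hypothetical sketch: an institutional system validates a token issued by
# the trusted third-party authentication service. Only an opaque subject
# identifier crosses the boundary; the institutional IdP keeps the user's
# attributes to itself.
import jwt  # PyJWT

TRUSTED_ISSUER = "https://auth.example-third-party.eu"  # hypothetical
AUDIENCE = "ehden-portal"                               # hypothetical

def verify_federated_token(token: str, public_key: str) -> dict:
    """Validate a token from the trusted third-party service."""
    claims = jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=TRUSTED_ISSUER,
    )
    return {"subject": claims["sub"], "expires": claims["exp"]}
```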

2.
Comput Biol Med ; 159: 106867, 2023 06.
Article in English | MEDLINE | ID: mdl-37060770

ABSTRACT

A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive, and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and then uses supervised machine learning algorithms to select the gene expression signature. It was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in the R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
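The published implementation is in R (MicroGES, linked above); the following is a schematic Python analogue of the two phases, with the thresholds and classifier chosen only for illustration:

```python
# Sketch of the two-phase idea: (1) statistical tests yield several
# candidate gene sets at different thresholds, (2) a supervised model
# selects the candidate set with the best cross-validated accuracy.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def candidate_signatures(X, y, alphas=(0.001, 0.01, 0.05)):
    # one t-test per gene between the two phenotype groups
    _, p = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
    return {a: np.where(p < a)[0] for a in alphas}

def select_signature(X, y, candidates):
    best, best_acc = None, 0.0
    for alpha, genes in candidates.items():
        if len(genes) == 0:
            continue
        acc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, genes], y, cv=5).mean()
        if acc > best_acc:
            best, best_acc = genes, acc
    return best, best_acc
```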


Subject(s)
Autism Spectrum Disorder , Gene Expression Profiling , Humans , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Transcriptome , Algorithms
3.
Database (Oxford) ; 2023, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36882099

ABSTRACT

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and-as highlighted during the coronavirus disease 2019 pandemic-their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We therefore organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, covering both span detection [i.e. named entity recognition (NER)] and normalization (i.e. entity linking) using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall). This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
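The strict NER scores reported above reduce to set overlap between gold and predicted annotations; a small illustrative computation:

```python
# Illustrative strict span-level F-score: a prediction counts only if
# document, offsets and label all match a gold annotation exactly.
def strict_f_score(gold: set, predicted: set) -> tuple:
    """gold/predicted: sets of (doc_id, start, end, label) tuples."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("PMC1", 10, 17, "MeSH:D000432")}
pred = {("PMC1", 10, 17, "MeSH:D000432"), ("PMC1", 40, 45, "MeSH:D005947")}
print(strict_f_score(gold, pred))  # (0.5, 1.0, 0.666...)
```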


Subject(s)
COVID-19 , United States , Humans , National Library of Medicine (U.S.) , Data Mining , Databases, Factual , MEDLINE
4.
J Biomed Inform ; 137: 104272, 2023 01.
Article in English | MEDLINE | ID: mdl-36563828

ABSTRACT

BACKGROUND: Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle of a successful observational study is to define the research question and the study approach in advance of execution. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset. METHODS: We propose a strategy for retrieving semantically annotated biomedical datasets of interest, using an interface built by applying a methodology that transforms natural language questions into formal language queries. The advantages of creating biomedical semantic data are enhanced by natural language interfaces that allow complex queries to be issued without manipulating a logical query language. RESULTS: Our methodology was validated using Alzheimer's disease datasets published in a European platform for sharing and reusing biomedical data. We converted the data into a semantic format using biomedical ontologies in everyday use in the biomedical community and published it as a FAIR endpoint. We considered natural language questions of three types: single-concept questions, questions with exclusion criteria, and multi-concept questions. Finally, we analysed the performance of the question-answering module we used and its limitations. The source code is publicly available at https://bioinformatics-ua.github.io/BioKBQA/. CONCLUSION: We propose a strategy for using information extracted from biomedical data and transformed into a semantic format using open biomedical ontologies. Our method uses natural language to formulate questions that are answered using this semantic data, without the direct use of formal query languages.
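BioKBQA's internals are in the linked repository; as a hedged sketch of the general idea of mapping the three question types onto query templates (the predicate names below are placeholders, not BioKBQA's ontology):

```python
# Hypothetical template-based question-to-SPARQL translation for the three
# question types described above.
TEMPLATES = {
    "single_concept":
        "SELECT ?dataset WHERE {{ ?dataset :hasFinding {c1} . }}",
    "exclusion":
        "SELECT ?dataset WHERE {{ ?dataset :hasFinding {c1} . "
        "FILTER NOT EXISTS {{ ?dataset :hasFinding {c2} . }} }}",
    "multi_concept":
        "SELECT ?dataset WHERE {{ ?dataset :hasFinding {c1} , {c2} . }}",
}

def build_query(question_type: str, concepts: list) -> str:
    slots = {f"c{i + 1}": iri for i, iri in enumerate(concepts)}
    return TEMPLATES[question_type].format(**slots)

# e.g. datasets with dementia diagnoses but excluding diabetes
print(build_query("exclusion", [":Dementia", ":Diabetes"]))
```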


Subject(s)
Natural Language Processing , Semantics , Software , Language , Databases, Factual
5.
Healthcare (Basel) ; 10(11), 2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36421611

ABSTRACT

Biomedical databases often have restricted access policies and governance rules. Thus, an adequate description of their content is essential for researchers who wish to use them for medical research. A strategy for publishing information without disclosing patient-level data is through database fingerprinting and aggregate characterisations. However, this information is still presented in a format that makes it challenging to search, analyse, and decide on the best databases for a domain of study. Several strategies allow one to visualise and compare the characteristics of multiple biomedical databases. Our study focused on a European platform for sharing and disseminating biomedical data. We use semantic data visualisation techniques to assist in comparing descriptive metadata from several databases. The great advantage lies in streamlining the database selection process, ensuring that sensitive details are not shared. To address this goal, we have considered two levels of data visualisation, one characterising a single database and the other involving multiple databases in network-level visualisations. This study revealed the impact of the proposed visualisations and some open challenges in representing semantically annotated biomedical datasets. Identifying future directions in this scope was one of the outcomes of this work.
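As an illustration of the network-level visualisation idea, assuming each database is described by a bag of descriptive metadata terms (the sample metadata below is invented):

```python
# Databases become nodes; an edge is drawn whenever two databases share
# descriptive metadata terms, weighted by how many terms they share.
import networkx as nx

metadata = {
    "DB-A": {"dementia", "EHR", "primary care"},
    "DB-B": {"dementia", "claims"},
    "DB-C": {"claims", "EHR"},
}

graph = nx.Graph()
graph.add_nodes_from(metadata)
for a in metadata:
    for b in metadata:
        if a < b:
            shared = metadata[a] & metadata[b]
            if shared:
                graph.add_edge(a, b, weight=len(shared),
                               shared=sorted(shared))

for a, b, attrs in graph.edges(data=True):
    print(a, "--", b, attrs["shared"])
```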

6.
Stud Health Technol Inform ; 298: 163-164, 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36073478

ABSTRACT

Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, it becomes a more serious issue. A simple approach to avoiding improper disclosure is to ensure that all data that can be directly associated with an individual are removed from the original dataset. However, some studies have shown that simple anonymisation procedures can sometimes be reversed using specific patient characteristics. In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects' privacy. The anonymiser system was validated using the OMOP CDM data schema, which is widely adopted in observational research studies.
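A minimal sketch of the direct-identifier removal step on an OMOP-like PERSON row, assuming salted hashing for pseudonyms; note this illustrates only one step, not the paper's full secure architecture, and, as the abstract warns, such simple procedures alone can be insufficient:

```python
# Hypothetical de-identification step: direct identifiers are dropped and
# person_id is replaced by a salted one-way hash.
import hashlib

SALT = b"site-secret-salt"  # hypothetical per-site secret

def pseudonymise(person: dict) -> dict:
    digest = hashlib.sha256(
        SALT + str(person["person_id"]).encode()).hexdigest()
    return {
        "person_id": digest[:16],                  # stable pseudonym
        "gender_concept_id": person["gender_concept_id"],
        "year_of_birth": person["year_of_birth"],  # keep year only
        # day/month of birth, names and addresses intentionally dropped
    }

row = {"person_id": 1234, "gender_concept_id": 8532,
       "year_of_birth": 1950, "month_of_birth": 4}
print(pseudonymise(row))
```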


Subject(s)
Personally Identifiable Information , Privacy , Databases, Factual , Humans
7.
Stud Health Technol Inform ; 298: 167-168, 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36073480

ABSTRACT

In recent decades, the field of metagenomics, aided by next-generation sequencing (NGS) technologies, has grown exponentially and is now a cornerstone of medicine. However, even with current technologies, obtaining a conclusive identification of an organism can be challenging due to the reliance on reference-based methods. Consequently, when releasing a new repository of genomic data that contains de-novo sequences, characterizing its content is problematic. In this paper, we propose a novel method for organism identification and for the creation and characterization of genomic databases. For identification, we propose a three-step pipeline comprising reference-free reconstruction, reference-based classification, and features-based classification. For content exposure and extraction, the sequences and their identifications are aggregated into a web database catalogue.
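As an illustration of the features-based classification step, de-novo sequences can be represented as k-mer frequency vectors suitable for a conventional classifier (a sketch of the general technique, not the paper's exact pipeline):

```python
# Represent a nucleotide sequence as a normalised k-mer frequency vector.
from collections import Counter
from itertools import product

def kmer_vector(sequence: str, k: int = 3) -> list:
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k]
                     for i in range(len(sequence) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts[km] / total for km in kmers]

print(len(kmer_vector("ACGTACGTGGCC")))  # 64 features for k = 3
```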


Subject(s)
Genome , Genomics
8.
Semin Arthritis Rheum ; 56: 152050, 2022 10.
Article in English | MEDLINE | ID: mdl-35728447

ABSTRACT

BACKGROUND: Identification of rheumatoid arthritis (RA) patients at high risk of adverse health outcomes remains a major challenge. We aimed to develop and validate prediction models for a variety of adverse health outcomes in RA patients initiating first-line methotrexate (MTX) monotherapy. METHODS: Data from 15 claims and electronic health record databases across 9 countries were used. Models were developed and internally validated on the Optum® De-identified Clinformatics® Data Mart Database using L1-regularized logistic regression to estimate the risk of adverse health outcomes within 3 months (leukopenia, pancytopenia, infection), 2 years (myocardial infarction (MI) and stroke), and 5 years (cancers [colorectal, breast, uterine]) after treatment initiation. Candidate predictors included demographic variables and past medical history. Models were externally validated on all other databases. Performance was assessed using the area under the receiver operator characteristic curve (AUC) and calibration plots. FINDINGS: Models were developed and internally validated on 21,547 RA patients and externally validated on 131,928 RA patients. Models for serious infection (AUC: internal 0.74, external ranging from 0.62 to 0.83), MI (AUC: internal 0.76, external ranging from 0.56 to 0.82), and stroke (AUC: internal 0.77, external ranging from 0.63 to 0.95) showed good discrimination and adequate calibration. Models for the other outcomes showed modest internal discrimination (AUC < 0.65) and were not externally validated. INTERPRETATION: We developed and validated prediction models for a variety of adverse health outcomes in RA patients initiating first-line MTX monotherapy. Final models for serious infection, MI, and stroke demonstrated good performance across multiple databases and can be studied for clinical use. FUNDING: This activity under the European Health Data & Evidence Network (EHDEN) has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 806968. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA.
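The source databases are not public, but the modelling step itself, L1-regularised logistic regression evaluated by AUC, can be sketched on synthetic data:

```python
# Schematic modelling step on synthetic stand-in data: an L1-regularised
# logistic regression over baseline covariates, evaluated with AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))   # demographic/history covariates
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"internal AUC: {auc:.2f}; non-zero coefficients: "
      f"{np.count_nonzero(model.coef_)}")
```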


Subject(s)
Antirheumatic Agents , Arthritis, Rheumatoid , Stroke , Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/drug therapy , Cohort Studies , Humans , Methotrexate/therapeutic use , Outcome Assessment, Health Care , Stroke/etiology
9.
Stud Health Technol Inform ; 294: 585-586, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612156

ABSTRACT

Many clinical studies depend greatly on the efficient identification of relevant datasets. This selection can be performed in existing health data catalogues by searching the available metadata. The search process can be optimised through question-answering interfaces that help researchers explore the available data. However, when searching across distinct catalogues, the lack of metadata harmonisation imposes several bottlenecks. This paper presents a methodology that allows semantic search over several biomedical database catalogues by extracting the information using shared domain knowledge. The resulting pipeline allows the converted data to be published as FAIR endpoints, and it provides an end-user interface that accepts natural language questions.


Subject(s)
Metadata , Semantics , Databases, Factual , Language , Natural Language Processing
10.
J Biomed Inform ; 120: 103849, 2021 08.
Article in English | MEDLINE | ID: mdl-34214696

ABSTRACT

BACKGROUND: The content of the clinical notes continuously collected throughout patients' health histories has the potential to provide relevant information about treatments and diseases, and to increase the value of the structured data available in Electronic Health Record (EHR) databases. EHR databases are currently used in observational studies that lead to important findings in the medical and biomedical sciences. However, the information present in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data. METHODS: We propose a two-stage workflow to address an existing gap in Extraction, Transformation and Loading (ETL) procedures for observational databases. The first stage of the workflow extracts prescriptions present in patients' clinical notes, while the second stage harmonises the extracted information into its standard definition and stores the result in a common database schema used in observational studies. RESULTS: We validated this methodology using two distinct data sets, in which the goal was to extract and store drug-related information in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the annotator used, as well as its limitations. Finally, we described some practical examples of how users can explore these datasets once migrated to OMOP CDM databases. CONCLUSION: With this methodology, we demonstrated a strategy for using the information extracted from clinical notes in business intelligence tools, or for other applications such as data exploration through SQL queries. Moreover, the extracted information complements the data present in OMOP CDM databases with information that was not directly available in the EHR database.
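A toy illustration of the two stages, using an invented drug lexicon and concept IDs in place of a real vocabulary mapping:

```python
# Toy two-stage workflow: (1) extract drug mentions from free text with a
# dictionary lookup, (2) shape them as OMOP CDM DRUG_EXPOSURE-like rows.
# The lexicon and concept IDs are illustrative, not a real RxNorm mapping.
import re

LEXICON = {"aspirin": 1112807, "metformin": 1503297}  # name -> concept_id

def extract_drug_exposures(person_id: int, note_date: str, text: str):
    rows = []
    for name, concept_id in LEXICON.items():
        if re.search(rf"\b{name}\b", text, flags=re.IGNORECASE):
            rows.append({
                "person_id": person_id,
                "drug_concept_id": concept_id,
                "drug_exposure_start_date": note_date,
                "drug_source_value": name,
            })
    return rows

note = "Patient advised to continue Metformin 500 mg twice daily."
print(extract_drug_exposures(42, "2021-03-01", note))
```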


Subject(s)
Electronic Health Records , Pharmaceutical Preparations , Databases, Factual , Delivery of Health Care , Humans , Workflow
11.
Stud Health Technol Inform ; 281: 327-331, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042759

ABSTRACT

The process of refining the research question in a medical study depends greatly on the current background of the investigated subject. The information found in prior works can directly impact several stages of the study, namely the cohort definition stage. Besides previously published methods, researchers can also leverage other materials, such as the output of cohort selection tools, to enrich and accelerate their own work. However, this kind of information is not always captured by search engines. In this paper, we present a methodology, based on a combination of content-based retrieval and text annotation techniques, to identify relevant scientific publications related to a research question and to the selected data sources.
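The content-based retrieval component can be sketched with TF-IDF vectors and cosine similarity (document snippets invented for illustration):

```python
# Rank publications against a research question by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

publications = [
    "cohort selection criteria for heart failure observational study",
    "deep learning for radiology image segmentation",
    "defining study cohorts from electronic health records",
]
question = "how to define a cohort for an observational heart failure study"

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(publications)
query_vec = vectorizer.transform([question])
scores = cosine_similarity(query_vec, doc_matrix).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, f"{scores[idx]:.2f}", publications[idx])
```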


Subject(s)
Information Storage and Retrieval , Search Engine , Cohort Studies
12.
Comput Biol Med ; 130: 104180, 2021 03.
Article in English | MEDLINE | ID: mdl-33360272

ABSTRACT

Privacy issues, often arising from the high dimensionality and sensitivity of the data together with the associated access restrictions and policies, limit the analysis and cross-exploration of most distributed and private biobanks. These characteristics prevent collaboration between entities, constituting a barrier to emergent personalized and public health challenges, namely the discovery of new druggable targets, the identification of disease-causing genetic variants, or the study of rare diseases. In this paper, we propose a semi-automatic methodology for the analysis of distributed and private biobanks. The strategies involved in the proposed methodology efficiently enable the creation and execution of unified genomic studies using distributed repositories, without compromising the information present in the datasets. We apply the methodology to a case study in the current Covid-19 pandemic, combining the diagnostics from multiple entities while maintaining privacy, through a procedure that is identical at every site. Moreover, we show that the methodology follows a simple, intuitive, and practical scheme.
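A minimal sketch of the underlying privacy principle, assuming each site shares only aggregate counts (the site data below is invented):

```python
# Each biobank computes local aggregates; the coordinator merges them, so
# no record-level data ever leaves a site.
from collections import Counter

def local_summary(records: list) -> Counter:
    """Run inside each institution: aggregate diagnostics locally."""
    return Counter(r["diagnostic"] for r in records)

def merge_summaries(summaries: list) -> Counter:
    """Run by the coordinator: combine aggregates, never raw records."""
    total = Counter()
    for s in summaries:
        total.update(s)
    return total

site_a = [{"diagnostic": "positive"}, {"diagnostic": "negative"}]
site_b = [{"diagnostic": "positive"}, {"diagnostic": "positive"}]
print(merge_summaries([local_summary(site_a), local_summary(site_b)]))
```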


Subject(s)
Biological Specimen Banks , COVID-19 , Public Health , SARS-CoV-2 , Humans
13.
JMIR Med Inform ; 8(12): e22898, 2020 Dec 29.
Article in English | MEDLINE | ID: mdl-33372893

ABSTRACT

BACKGROUND: Electronic health records store large amounts of patient clinical data. Despite efforts to structure patient data, clinical notes containing rich patient information remain stored as free text, greatly limiting its exploitation. This includes family history, which is highly relevant for applications such as diagnosis and prognosis. OBJECTIVE: This study aims to develop automatic strategies for annotating family history information in clinical notes, focusing not only on the extraction of relevant entities such as family members and disease mentions but also on the extraction of relations between the identified entities. METHODS: This study extends a previous contribution to the 2019 National NLP Clinical Challenges (n2c2) track on family history extraction by improving a previously developed rule-based engine, using deep learning (DL) approaches for the extraction of entities from clinical notes, and combining both approaches in a hybrid end-to-end system capable of successfully extracting family member and observation entities and the relations between those entities. Furthermore, this study analyzes the impact of factors such as the use of external resources and different types of embeddings on the performance of DL models. RESULTS: The approaches developed were evaluated in a first task regarding entity extraction and in a second task concerning relation extraction. The proposed DL approach improved observation extraction, obtaining F1 scores of 0.8688 and 0.7907 in the training and test sets, respectively. However, DL approaches have limitations in the extraction of family members. The rule-based engine was adjusted to have higher generalizing capability and achieved family member extraction F1 scores of 0.8823 and 0.8092 in the training and test sets, respectively. The resulting hybrid system obtained F1 scores of 0.8743 and 0.7979 in the training and test sets, respectively. For the second task, the evaluator was adjusted to perform a stricter evaluation than the original, and the hybrid system obtained F1 scores of 0.6480 and 0.5082 in the training and test sets, respectively. CONCLUSIONS: We evaluated the impact of several factors on the performance of DL models, and we present an end-to-end system for extracting family history information from clinical notes, which can help in the structuring and reuse of this type of information. The final hybrid solution is provided in a publicly available code repository.
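The full hybrid system is in the publicly available repository mentioned above; as a heavily simplified sketch of the rule/model split (the patterns and the stand-in extractor are illustrative only):

```python
# Simplified hybrid idea: rules for family members (where rules generalised
# well) plus a pluggable model for observations (where DL performed better).
import re

FAMILY_PATTERN = re.compile(
    r"\b(mother|father|sister|brother|aunt|uncle|grandmother|grandfather)\b",
    re.IGNORECASE,
)

def extract_family_members(text: str) -> list:
    return [m.group(1).lower() for m in FAMILY_PATTERN.finditer(text)]

def extract_observations(text: str) -> list:
    """Stand-in for the DL extractor used in the paper."""
    known = ["diabetes", "hypertension", "breast cancer"]
    return [c for c in known if c in text.lower()]

note = "Mother has hypertension; maternal aunt diagnosed with breast cancer."
members = extract_family_members(note)
observations = extract_observations(note)
# naive relation pairing for illustration: link co-occurring entities
print([(m, o) for m in members for o in observations])
```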

14.
Stud Health Technol Inform ; 270: 93-97, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570353

ABSTRACT

Electronic health records contain valuable information on patients' clinical history in the form of free text. Manually analyzing millions of these documents is unfeasible, and automatic natural language processing methods are essential for efficiently exploiting these data. Within this, normalization of clinical entities, where the aim is to link entity mentions to reference vocabularies, is of utmost importance to successfully extract knowledge from clinical narratives. In this paper, we present sieve-based models combined with heuristics and word embeddings, and report the results of our participation in the 2019 n2c2 (National NLP Clinical Challenges) shared task on clinical concept normalization.
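A sieve-based normaliser applies increasingly permissive matchers in order and stops at the first hit; a minimal sketch with an invented vocabulary:

```python
# Cascading sieves: each is tried in order; the first hit resolves the
# mention. Vocabulary and abbreviations are invented for illustration.
VOCAB = {"myocardial infarction": "C0027051", "heart attack": "C0027051",
         "hypertension": "C0020538"}
ABBREVIATIONS = {"mi": "myocardial infarction", "htn": "hypertension"}

def sieve_exact(mention):      return VOCAB.get(mention)
def sieve_lowercase(mention):  return VOCAB.get(mention.lower())
def sieve_abbreviation(mention):
    expanded = ABBREVIATIONS.get(mention.lower())
    return VOCAB.get(expanded) if expanded else None

SIEVES = [sieve_exact, sieve_lowercase, sieve_abbreviation]

def normalise(mention: str):
    for sieve in SIEVES:
        concept = sieve(mention)
        if concept:
            return concept
    return None  # a word-embedding similarity sieve could go here

print(normalise("Heart Attack"), normalise("HTN"))  # C0027051 C0020538
```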


Subject(s)
Electronic Health Records , Heuristics , Natural Language Processing , Humans , Narration
15.
Stud Health Technol Inform ; 270: 1183-1184, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570570

ABSTRACT

Aiming to better understand the genetic and environmental associations of Alzheimer's disease, many clinical trials and scientific studies have been conducted. However, these studies are often based on a small number of participants. To address this limitation, there is an increasing demand for multi-cohort studies, which can provide higher statistical power and stronger clinical evidence. However, this data integration implies dealing with the diversity of cohort structures and the wide variability of concepts. Moreover, discovering similar cohorts to extend a running study is typically a demanding task. In this paper, we present a recommendation system for finding similar cohorts based on profile interests. The method uses collaborative filtering mixed with context-based retrieval techniques to find relevant cohorts in the scientific literature on Alzheimer's disease. The method was validated on a set of 62 cohorts.
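As an illustrative sketch of profile-based cohort matching (the profiles are invented, and the paper additionally mixes in collaborative-filtering signals):

```python
# Rank cohorts by cosine similarity between invented interest profiles.
import numpy as np

cohorts = ["Cohort-A", "Cohort-B", "Cohort-C"]
# rows: cohorts; columns: interest features (e.g. imaging, genetics, CSF)
profiles = np.array([[1.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0]])

def most_similar(index: int) -> str:
    target = profiles[index]
    sims = profiles @ target / (
        np.linalg.norm(profiles, axis=1) * np.linalg.norm(target))
    sims[index] = -1.0  # exclude the cohort itself
    return cohorts[int(sims.argmax())]

print(most_similar(0))  # closest match to the first cohort's interests
```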


Subject(s)
Algorithms , Alzheimer Disease , Humans
16.
BMC Med Inform Decis Mak ; 19(1): 121, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31266480

ABSTRACT

BACKGROUND: Many healthcare databases have been routinely collected over the past decades to support clinical practice and administrative services. However, their secondary use for research is often hindered by restricted governance rules. Furthermore, health research studies typically involve many participants with complementary roles and responsibilities, which require proper process management. RESULTS: From a wide set of requirements collected from European clinical studies, we developed TASKA, a task/workflow management system that helps to cope with the socio-technical issues arising in multidisciplinary and multi-setting clinical studies. The system is based on a two-layered architecture: 1) the backend engine, which follows a micro-kernel pattern for extensibility and exposes RESTful web services for decoupling from the web clients; and 2) the client, entirely developed in ReactJS, allowing the construction and management of studies through a graphical interface. TASKA is a GNU GPL open source project, accessible at https://github.com/bioinformatics-ua/taska. A demo version is also available at https://bioinformatics.ua.pt/taska. CONCLUSIONS: The system is currently used to support feasibility studies across several institutions and countries, in the context of the European Medical Information Framework (EMIF) project. The tool was shown to simplify the set-up of health studies, the management of participants and their roles, and the overall governance process.


Subject(s)
Health Services Research/organization & administration , Task Performance and Analysis , Databases, Factual , Humans , Software , User-Computer Interface , Workflow
17.
J Digit Imaging ; 32(5): 870-879, 2019 10.
Article in English | MEDLINE | ID: mdl-31201587

ABSTRACT

In recent decades, the amount of medical imaging studies and associated metadata has been rapidly increasing. Despite being mostly used to support medical diagnosis and treatment, many recent initiatives advocate using medical imaging studies not only in clinical research scenarios but also to improve the business practices of medical institutions. However, the continuous production of medical imaging studies, coupled with the tremendous amount of associated data, makes the real-time analysis of medical imaging repositories difficult using conventional tools and methodologies. These archives contain not only the image data itself but also a wide range of valuable metadata describing all the stakeholders involved in the examination. Exploring these data will increase the efficiency and quality of medical practice. In major centers, this represents a big data scenario where Business Intelligence (BI) and Data Analytics (DA) are rare and implemented through data warehousing approaches. This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories able to feed, in real time, a purpose-built BI application. The solution was designed to provide the necessary environment for leading research on top of live institutional repositories without requiring the creation of a data warehouse. It features an extensible dashboard with customizable charts and reports, and an intuitive web-based interface that empowers the use of novel data mining techniques, namely a variety of data cleansing tools, filters, and clustering functions. Therefore, the user is not required to master the programming skills commonly needed by data analysts and scientists, such as Python and R.
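As a hedged sketch of the extract step of such a framework, pulling a few DICOM metadata fields with pydicom (the file path and field choice are illustrative, not the paper's configuration):

```python
# Extract dashboard-friendly metadata from a DICOM object without reading
# pixel data, as a streaming ETL source might.
import pydicom

FIELDS = ["Modality", "StudyDate", "InstitutionName", "BodyPartExamined"]

def extract_metadata(path: str) -> dict:
    dataset = pydicom.dcmread(path, stop_before_pixels=True)
    # getattr with a default tolerates studies missing optional tags
    return {field: getattr(dataset, field, None) for field in FIELDS}

# record = extract_metadata("/archive/study-0001/image-001.dcm")
# a streaming consumer would push `record` into the live BI charts
```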


Subject(s)
Data Mining/methods , Data Warehousing/methods , Metadata/statistics & numerical data , Radiology Information Systems/organization & administration , Radiology Information Systems/statistics & numerical data , Data Mining/statistics & numerical data , Data Warehousing/statistics & numerical data , Humans