1.
Biogerontology ; 24(4): 555-562, 2023 08.
Article in English | MEDLINE | ID: mdl-37004691

ABSTRACT

Aging is a topic of paramount importance in an increasingly elderly society and has been the focus of extensive research. Decline in protein homeostasis (proteostasis) is a hallmark of aging and of several age-related diseases, but the specific proteins and mechanisms involved in proteostasis (de)regulation during aging remain largely unknown. Here, we used different text-mining tools, complemented with protein-protein interaction data, to address this complex topic. Analysis of the integrated protein interaction networks identified novel proteins and pathways associated with proteostasis mechanisms and with aging or age-related disorders, indicating that this approach is useful for identifying previously unknown links and for retrieving information on potential novel biomarkers or therapeutic targets.


Subject(s)
Proteostasis Deficiencies , Proteostasis , Humans , Aged , Proteostasis/physiology , Protein Folding , Aging/physiology , Data Mining
2.
J Biomed Inform ; 134: 104195, 2022 10.
Article in English | MEDLINE | ID: mdl-36150641

ABSTRACT

BACKGROUND: Electronic Health Records (EHRs) aggregate diverse information at the patient level, holding a trajectory that represents the evolution of the patient's health status over time. Although this information provides context and can be leveraged by physicians to monitor patient health and make more accurate prognoses and diagnoses, patient records can span very long periods; combined with the rapid rate at which medical data are generated, this makes clinical decision making more complex. Patient trajectory modelling can help by exploring existing information in a scalable manner, and can contribute to improving health care quality by fostering preventive medicine practices (e.g. earlier disease diagnosis). METHODS: We propose a solution to model patient trajectories that combines different types of information (e.g. clinical text, standard codes) and considers the temporal aspect of clinical data. This solution leverages two architectures: the first supports flexible sets of input features and converts patient admissions into dense representations; the second explores the extracted admission representations in a recurrent architecture, where patient trajectories are processed in sub-sequences using a sliding window mechanism. RESULTS: The solution was evaluated on two clinical outcomes, unexpected patient readmission and disease progression, using the publicly available Medical Information Mart for Intensive Care (MIMIC)-III clinical database. The results demonstrate the potential of the first architecture to model readmission and diagnosis prediction using single patient admissions. Information from clinical text did not show the discriminative power observed in other existing works, which may be explained by the need to fine-tune the clinicalBERT model. Finally, we demonstrate the potential of the sequence-based architecture, which uses a sliding window mechanism to represent the input data, attaining performance comparable to other existing solutions. CONCLUSION: Herein, we explored deep learning (DL)-based techniques to model patient trajectories and propose two flexible architectures that explore patient admissions on an individual and a sequence basis. Combining clinical text with other types of information led to positive results, which can be further improved by including a fine-tuned version of clinicalBERT in the architectures. The proposed solution is publicly available at https://github.com/bioinformatics-ua/PatientTM.


Subject(s)
Patient Readmission , Physicians , Disease Progression , Electronic Health Records , Humans , Prognosis
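The abstract states that patient trajectories are processed in sub-sequences using a sliding window mechanism. A minimal Python sketch of that windowing step follows, assuming a patient's admissions are already ordered in time. The window size, the step, and the choice to keep a short trajectory as a single shorter window are illustrative assumptions, not details taken from the PatientTM repository.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(admissions: Sequence[T], window: int, step: int = 1) -> List[List[T]]:
    """Split one patient's time-ordered admissions into overlapping sub-sequences.

    Assumption of this sketch: a trajectory shorter than `window` is kept as a
    single shorter sub-sequence so that no patient is dropped.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    if len(admissions) <= window:
        return [list(admissions)] if admissions else []
    return [
        list(admissions[i:i + window])
        for i in range(0, len(admissions) - window + 1, step)
    ]
```

For example, four admissions with a window of 3 and a step of 1 yield two overlapping windows, each of which could then be fed to a recurrent model as one training sequence.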
3.
Br J Sports Med ; 56(10): 577-587, 2022 May.
Article in English | MEDLINE | ID: mdl-35022162

ABSTRACT

OBJECTIVE: To review, and frequently update, the available evidence on injury risk factors and the epidemiology of injury in trail running. DESIGN: Living systematic review. Updated searches will be done every 6 months for a minimum period of 5 years. DATA SOURCES: Eight electronic databases were searched from inception to 18 March 2021. ELIGIBILITY CRITERIA: Studies that investigated injury risk factors and/or reported the epidemiology of injury in trail running. RESULTS: Nineteen eligible studies were included, of which 10 investigated injury risk factors among 2 785 participants. Significant intrinsic factors associated with injury are more running experience, being a level A runner and a higher total score on the propensity to sports accident questionnaire (PAD-22). A previous history of cramping and postrace biomarkers of muscle damage are associated with cramping. Younger age and low skin phototypes are associated with sunburn. Significant extrinsic factors associated with injury are neglecting warm-up, having no specialised running plan, training on asphalt, double training sessions per day and physical labour occupations. A slower race finishing time is associated with cramping, while more than 3 hours of training per day, using shade as the primary mode of sun protection and being single are associated with sunburn. Injury incidence ranged from 0.7 to 61.2 injuries per 1000 hours of running, and prevalence ranged from 1.3% to 90%. The lower limb was the most frequently reported region of injury, particularly blisters of the foot/toe. CONCLUSION: Few studies have investigated injury risk factors in trail running. Our review found eight intrinsic and nine extrinsic injury risk factors. This review highlights areas for future research that may aid in designing injury risk management strategies for safer participation in trail running. PROSPERO registration number: CRD42021240832.


Subject(s)
Athletic Injuries , Running , Sunburn , Athletic Injuries/epidemiology , Athletic Injuries/etiology , Foot , Humans , Incidence , Lower Extremity/injuries , Risk Factors , Running/injuries
4.
Medicina (Kaunas) ; 58(1), 2022 Jan 12.
Article in English | MEDLINE | ID: mdl-35056421

ABSTRACT

Background and Objectives: The treatment of peri-implantitis remains undefined. Regenerative treatment is expensive and technically demanding due to the need to handle biomaterials and membranes and to apply different decontamination methodologies. Resective treatment and implantoplasty might be a viable alternative. This case series presents a 24-month retrospective observational study of 10 peri-implantitis patients treated with implantoplasty. Materials and Methods: In the present case series, 10 peri-implantitis patients (20 implants) were treated with a resective approach and implantoplasty. Prior to implantoplasty, all patients underwent non-surgical treatment. The surgery consisted of raising a full-thickness flap and exposing the implant surface. The exposed, non-osseointegrated implant body was submitted to implantoplasty. The flap was apically repositioned and sutured. Patients were followed up for 24 months. Results: The mean initial probing depth (PD = 5.37 ± 0.86 mm), bleeding on probing (BoP = 0.12 ± 0.06%) and suppuration (Sup = 0.01 ± 0.01%) decreased significantly at the 12-month evaluation (PD = 2.90 ± 0.39 mm; BoP = 0.01 ± 0.01%; Sup = 0.00 ± 0.00%). Between the 12- and 24-month evaluations, there were no significant clinical changes (PD = 2.85 ± 0.45 mm; BoP = 0.01 ± 0.01%; Sup = 0.00 ± 0.00%). Mucosal recession (MR) increased significantly between baseline and 12 months (0.69 ± 0.99 mm vs. 1.96 ± 1.33 mm), but did not change significantly between the 12th and 24th month (1.94 ± 1.48 mm). The success rate was 100%, with no implant fracture or loss. Conclusions: Resective surgery and implantoplasty might be a valid option in some specific peri-implantitis cases. Properly designed clinical trials are needed to confirm this possibility.


Subject(s)
Peri-Implantitis , Humans , Peri-Implantitis/surgery , Periodontal Index , Research , Surgical Flaps
5.
Medicina (Kaunas) ; 58(5), 2022 Apr 30.
Article in English | MEDLINE | ID: mdl-35630045

ABSTRACT

Background and objectives: Starting multicomponent training sessions with aerobic-based or resistance-based exercises may have different effects on functional fitness and body composition. Thus, the aim of this study was to assess the effects of exercise order in multicomponent training on elderly women's physical fitness and body composition. Materials and Methods: A sample of 91 elderly women, aged between 60 and 81 years, were randomly divided into three groups (A, B, C). Group A performed a warm-up followed by aerobic training, strength training, stretching and cool-down; Group B performed a warm-up followed by strength training, aerobic training, stretching and cool-down; the control group (C) did not perform any exercise. Functional fitness and body composition were assessed at three time points during the 32-week intervention (baseline and after each 16-week period). One-way ANOVA for comparison between groups, repeated-measures ANOVA and multiple linear regression were used for statistical analysis. Results: Functional fitness and body composition varied over the 32 weeks of multicomponent training. However, group A appeared to show greater improvements in more variables. Conclusion: In the current study, group A obtained better results in most of the evaluated parameters. Thus, to improve functional fitness, a warm-up followed by aerobic training, strength training and relaxation may be the most suitable training sequence for elderly women.


Subject(s)
Exercise , Resistance Training , Aged , Aged, 80 and over , Body Composition , Exercise Therapy , Female , Humans , Middle Aged , Physical Fitness
6.
J Biomed Inform ; 120: 103849, 2021 08.
Article in English | MEDLINE | ID: mdl-34214696

ABSTRACT

BACKGROUND: The clinical notes continuously collected throughout patients' health histories have the potential to provide relevant information about treatments and diseases, and to increase the value of the structured data available in Electronic Health Record (EHR) databases. EHR databases are currently used in observational studies, which have led to important findings in the medical and biomedical sciences. However, the information in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data. METHODS: We propose a two-stage workflow to close an existing gap in Extraction, Transformation and Loading (ETL) procedures for observational databases. The first stage of the workflow extracts prescriptions from patients' clinical notes, while the second stage harmonises the extracted information into its standard definition and stores the result in a common database schema used in observational studies. RESULTS: We validated this methodology using two distinct data sets, in which the goal was to extract drug-related information and store it in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the annotator used, as well as its limitations. Finally, we describe some practical examples of how users can explore these datasets once migrated to OMOP CDM databases. CONCLUSION: With this methodology, we show a strategy for using the information extracted from clinical notes in business intelligence tools, or in other applications such as data exploration through SQL queries. In addition, the extracted information, which was not directly available in the EHR database, complements the data in OMOP CDM databases.


Subject(s)
Electronic Health Records , Pharmaceutical Preparations , Databases, Factual , Delivery of Health Care , Humans , Workflow
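The two-stage workflow can be illustrated with a deliberately small Python sketch: stage 1 extracts prescription mentions from a note, and stage 2 maps each mention to a standard concept identifier and loads it into a drug_exposure table. The regular-expression annotator, the concept identifiers and the three-column table are hypothetical simplifications; the real OMOP CDM drug_exposure table has more required fields, and the paper relies on a dedicated annotator rather than a regular expression.

```python
import re
import sqlite3
from typing import List, Tuple

# Stage 1 (hypothetical annotator): matches "<drug> <dose> mg" mentions.
# This regex is only a stand-in for a real clinical annotator.
PRESCRIPTION = re.compile(r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)

# Stage 2 (hypothetical vocabulary): source term -> standard concept id.
# These ids are invented; real pipelines map to OMOP standard vocabularies.
TOY_VOCABULARY = {"metformin": 1001, "aspirin": 1002}


def extract_prescriptions(note: str) -> List[Tuple[str, float]]:
    """Stage 1: pull (drug, dose_mg) pairs out of free text."""
    return [(m.group(1).lower(), float(m.group(2))) for m in PRESCRIPTION.finditer(note)]


def create_minimal_drug_exposure(conn: sqlite3.Connection) -> None:
    """A drastically reduced drug_exposure table (the real OMOP table has many more columns)."""
    conn.execute(
        "CREATE TABLE drug_exposure ("
        "person_id INTEGER, drug_concept_id INTEGER, drug_source_value TEXT)"
    )


def harmonise_and_load(conn: sqlite3.Connection, person_id: int, note: str) -> int:
    """Stage 2: map each mention to a concept id and insert it; returns rows written."""
    rows = []
    for drug, dose in extract_prescriptions(note):
        concept_id = TOY_VOCABULARY.get(drug, 0)  # 0 means "no matching concept" in OMOP
        rows.append((person_id, concept_id, f"{drug} {dose:g} mg"))
    conn.executemany(
        "INSERT INTO drug_exposure (person_id, drug_concept_id, drug_source_value) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return len(rows)
```

After loading, a query such as SELECT drug_concept_id, COUNT(*) FROM drug_exposure GROUP BY drug_concept_id is the kind of SQL exploration the abstract mentions; rows with drug_concept_id = 0 flag mentions that could not be harmonised.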
7.
Sensors (Basel) ; 21(2), 2021 Jan 09.
Article in English | MEDLINE | ID: mdl-33435334

ABSTRACT

A transmitarray antenna is evaluated to generate a multi-focusing spot area in the Fresnel region of the antenna in the Ka-band. The antenna is designed to focus the radiated field at a certain point using a central feeding configuration. The number of feeds is increased to create as many focusing spots as there are feeds. The feeds are placed along an arc defined in the principal planes of the transmitarray, radiating independent near-field spots and providing wide-angle spot scanning without displacing the antenna, with high isolation between feeds. To validate this concept, a transmitarray based on dielectric-only cells is designed and simulated under full-wave conditions. This design is then manufactured using a 3D printing technique, and the prototype is measured in a planar acquisition range. Measurements are performed for different feed positions to validate the multi-focusing capability of the antenna. Measurements and simulations show good agreement and validate the proposed design technique.

8.
Entropy (Basel) ; 22(1), 2020 Jan 16.
Article in English | MEDLINE | ID: mdl-33285880

ABSTRACT

Sources that generate symbolic sequences of an algorithmic nature may differ in statistical complexity because they create structures that follow algorithmic schemes, rather than generating symbols from a probabilistic function that assumes independence. In the case of Turing machines, this means that machines with the same algorithmic complexity can create tapes with different statistical complexity. In this paper, we use a compression-based approach to measure the global and local statistical complexity of specific Turing machine tapes with the same number of states and alphabet. Both measures are estimated using the best-order Markov model. For the global measure, we use the Normalized Compression (NC), while, for the local measures, we define and use normal and dynamic complexity profiles to quantify and localize regions of lower and higher statistical complexity. We assessed the validity of our methodology on synthetic and real genomic data, showing that it is tolerant to increasing rates of edits and block permutations. Regarding the analysis of the tapes, we localize patterns of higher statistical complexity in two regions, for different numbers of machine states. We show that these patterns are generated by a decrease in the tape's amplitude, given the setting of small rule cycles. Additionally, we compared our methodology with a measure that uses both algorithmic and statistical approaches (BDM) for the analysis of the tapes. Naturally, BDM is efficient given the algorithmic nature of the tapes. However, for a higher number of states, BDM is progressively approximated by our methodology. Finally, we provide a simple algorithm to increase the statistical complexity of a Turing machine tape while retaining the same algorithmic complexity. We supply a publicly available implementation of the algorithm in C++ under the GPLv3 license. All results can be reproduced in full with the scripts provided in the repository.
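The global measure above is the Normalized Compression computed with the best-order Markov model. A common definition is NC(x) = C(x) / (|x| log2 |A|), where C(x) is the number of bits needed to encode x and |A| is the alphabet size, so values near 1 indicate a statistically incompressible tape. The Python sketch below estimates C(x) with a single adaptive order-k finite-context model using an additive estimator (alpha) and keeps the order k that gives the shortest code. The value of alpha, the maximum order and the use of shorter contexts for the first k symbols are assumptions of this sketch; it is not the authors' C++ implementation.

```python
import math
from collections import defaultdict
from typing import Sequence


def code_length_bits(seq: Sequence[str], k: int, alphabet_size: int, alpha: float = 1.0) -> float:
    """Bits to encode `seq` with an adaptive order-k finite-context (Markov) model.

    Counts are updated after each symbol, so a decoder could mirror the model.
    The first k symbols use the shorter context that is available.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, sym in enumerate(seq):
        ctx = tuple(seq[max(0, i - k):i])
        p = (counts[ctx][sym] + alpha) / (totals[ctx] + alpha * alphabet_size)
        bits -= math.log2(p)
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


def normalized_compression(seq: Sequence[str], alphabet: Sequence[str],
                           max_order: int = 8, alpha: float = 1.0) -> float:
    """NC(x) = min_k C_k(x) / (|x| * log2 |A|), i.e. using the best order k."""
    if len(alphabet) < 2 or not seq:
        raise ValueError("need a non-empty sequence and an alphabet of at least 2 symbols")
    size = len(alphabet)
    best = min(code_length_bits(seq, k, size, alpha) for k in range(max_order + 1))
    return best / (len(seq) * math.log2(size))
```

On a perfectly periodic tape such as "01" repeated, an order-1 model predicts almost every symbol, so NC falls far below 1, whereas a pseudo-random binary string stays close to 1.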

9.
Pediatr Blood Cancer ; 64(12), 2017 Dec.
Article in English | MEDLINE | ID: mdl-28643377

ABSTRACT

INTRODUCTION: The main therapeutic intervention for sickle cell disease (SCD) is hydroxyurea (HU). The effect of HU is largely through dose-dependent induction of fetal hemoglobin (HbF). Poor HU adherence is common among adolescents. METHODS: Our 6-month, two-site pilot intervention trial, "HABIT," was led by culturally aligned community health workers (CHWs). CHWs provided support primarily through home visits, augmented by tailored text message reminders. Dyads of youth with SCD aged 10-18 years and a parent were enrolled. A customized HbF biomarker, the percentage decrease from each patient's highest historical HU-induced HbF ("Personal best"), was used to qualify for enrollment and to assess HU adherence. The two primary outcomes were (1) intervention feasibility and acceptability and (2) HU adherence, measured in three ways: monthly percentage improvement toward HbF Personal best, proportion of days covered (PDC) by HU, and self-report. RESULTS: Twenty-eight dyads were enrolled, of which 89% were retained. Feasibility and acceptability were excellent. Controlling for group assignment and month of intervention, the intervention group improved the percentage decrease from Personal best by 2.3% per month during months 0-4 (P = 0.30), with similar improvement in adherence demonstrated using pharmacy records. Self-reported adherence did not correlate. Dyads viewed CHWs as supportive for learning about SCD and HU, living with SCD, and making progress in coordinated self-management responsibility to support a daily HU habit. Most parents and youth appreciated text message HU reminders. CONCLUSIONS: The HABIT pilot intervention demonstrated feasibility and acceptability, with a promising effect toward improved medication adherence. Testing in a larger multisite intervention trial is warranted.


Subject(s)
Anemia, Sickle Cell/drug therapy , Community Health Workers , Hydroxyurea/therapeutic use , Medication Adherence , Adolescent , Anemia, Sickle Cell/blood , Child , Feasibility Studies , Female , Fetal Hemoglobin/analysis , Humans , Male , Pilot Projects
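Two of the adherence measures named in the abstract reduce to simple arithmetic. The Python sketch below computes the proportion of days covered (PDC) from dispensing records, and the percentage decrease of a current HbF value from a patient's Personal best. How overlapping fills are handled (here, counted once with no carry-forward of surplus supply) and the exact form of the Personal-best formula are assumptions; the trial's own calculations may differ.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def proportion_of_days_covered(fills: Iterable[Tuple[date, int]], start: date, end: date) -> float:
    """PDC = days in [start, end] with HU on hand / days in [start, end].

    Each fill is (fill_date, days_supply). Overlapping supplies are counted once,
    without carrying surplus forward (a simplifying assumption; some PDC
    definitions shift overlapping fills instead).
    """
    if end < start:
        raise ValueError("end must not precede start")
    period_days = (end - start).days + 1
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered) / period_days


def pct_decrease_from_personal_best(current_hbf: float, personal_best_hbf: float) -> float:
    """Percentage by which the current HbF falls short of the patient's highest HU-induced HbF."""
    return 100.0 * (personal_best_hbf - current_hbf) / personal_best_hbf
```

A smaller percentage decrease means HbF is closer to the Personal best, which is why improvement toward the Personal best can serve as a biomarker of adherence.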
10.
Lung ; 195(5): 575-585, 2017 10.
Article in English | MEDLINE | ID: mdl-28707107

ABSTRACT

INTRODUCTION: Cough in bronchiectasis is associated with significant impairment in health status. This study aimed to quantify cough frequency objectively with a cough monitor and to investigate its relationship with health status. A secondary aim was to identify clinical predictors of cough frequency. METHODS: Fifty-four patients with bronchiectasis were compared with thirty-five healthy controls. Objective 24-h cough, health status (cough-specific: Leicester Cough Questionnaire, LCQ; bronchiectasis-specific: Bronchiectasis Health Questionnaire, BHQ), cough severity and lung function were measured. The clinical predictors of cough frequency in bronchiectasis were determined in a multivariate analysis. RESULTS: Objective cough frequency was significantly raised in patients with bronchiectasis compared with healthy controls (geometric mean (standard deviation): 184.5 (4.0) vs. 20.6 (3.2) coughs/24 h; mean fold-difference (95% confidence interval): 8.9 (5.2, 15.2); p < 0.001), and patients had impaired health status. There was a significant correlation between objective cough frequency and subjective measures (LCQ r = -0.52 and BHQ r = -0.62, both p < 0.001). Sputum production, exacerbations (between the past 2 weeks and 12 months) and age were significantly associated with objective cough frequency in multivariate analysis, explaining 52% of the variance (p < 0.001). There was no statistically significant association between cough frequency and lung function. CONCLUSIONS: Cough is a common and significant symptom in patients with bronchiectasis. Sputum production, exacerbations and age, but not lung function, were independent predictors of cough frequency. Ambulatory objective cough monitoring provides novel insights and should be further investigated as an outcome measure in bronchiectasis.


Subject(s)
Bronchiectasis/physiopathology , Cough/physiopathology , Health Status , Quality of Life , Adult , Aged , Bronchiectasis/complications , Carrier State/physiopathology , Case-Control Studies , Cough/etiology , Disease Progression , Female , Forced Expiratory Volume , Humans , Male , Middle Aged , Pseudomonas Infections/complications , Pseudomonas Infections/physiopathology , Pseudomonas aeruginosa , Severity of Illness Index , Sputum , Surveys and Questionnaires , Visual Analog Scale , Vital Capacity
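Cough counts are right-skewed, so the abstract reports geometric means and a fold-difference between groups. A minimal Python sketch of both quantities follows, assuming strictly positive counts; zero counts would need an offset, which is not handled here, and the study's own model-based estimate may differ.

```python
import math
from statistics import fmean
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """exp(mean(log x)); every value must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(fmean([math.log(v) for v in values]))


def fold_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Ratio of geometric means, i.e. the back-transformed difference in mean log counts."""
    return geometric_mean(group_a) / geometric_mean(group_b)
```

As a check on the reported figures, the ratio of the two published geometric means is 184.5 / 20.6 ≈ 8.96, close to the reported mean fold-difference of 8.9; the small gap presumably reflects rounding of the published means.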
11.
J Med Syst ; 41(9): 141, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28780714

ABSTRACT

Historically, medical imaging repositories have been supported by on-premises infrastructures. However, the number of diagnostic imaging procedures has continuously increased over the last decades, imposing several challenges associated with storage volume, data redundancy and availability. Cloud platforms focus on delivering hardware and software services over the Internet, making them an appealing solution for repository outsourcing. Although this option may bring financial and technological benefits, it also presents new challenges. In medical imaging scenarios, communication latency is a critical issue that still hinders the adoption of this paradigm. This paper proposes an intelligent Cloud storage gateway that optimizes data access times. This is achieved through a new cache architecture that combines static rules and pattern recognition for eviction and prefetching. The evaluation results, obtained from experiments on a real-world dataset, show that cache hit ratios can reach around 80%, reducing image retrieval times by over 60%. The combined use of the proposed eviction and prefetching policies can significantly reduce communication latency, even when the cache is small compared with the total size of the repository. Apart from the performance gains, the proposed system can adjust to the specific workflows of different institutions.


Subject(s)
Diagnostic Imaging , Cloud Computing , Information Storage and Retrieval , Internet , Outsourced Services
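The gateway's cache combines eviction and prefetching policies. The Python sketch below shows only the skeleton of such a cache: least-recently-used (LRU) eviction plus a pluggable prefetch rule, with the hit ratio tracked as in the evaluation. The "series-N" key scheme and the prefetch-next-slices rule are hypothetical static rules; the paper's pattern-recognition component is not reproduced.

```python
from collections import OrderedDict
from typing import Callable, List


class PrefetchingLRUCache:
    """LRU eviction plus a pluggable prefetch rule (a simplified stand-in for the paper's policies)."""

    def __init__(self, capacity: int, prefetch_rule: Callable[[str], List[str]]):
        self.capacity = capacity
        self.prefetch_rule = prefetch_rule
        self.store: "OrderedDict[str, bool]" = OrderedDict()
        self.hits = 0
        self.requests = 0

    def _insert(self, key: str) -> None:
        self.store[key] = True
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str) -> bool:
        """Serve one image request; returns True on a cache hit."""
        self.requests += 1
        hit = key in self.store
        if hit:
            self.hits += 1
            self.store.move_to_end(key)
        else:
            self._insert(key)  # simulated fetch from the remote cloud repository
        for extra in self.prefetch_rule(key):
            if extra not in self.store:
                self._insert(extra)  # simulated background prefetch
        return hit

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


def next_slices(key: str, depth: int = 2) -> List[str]:
    """Hypothetical static rule: after 'series-N', prefetch 'series-(N+1)' .. 'series-(N+depth)'."""
    series, index = key.rsplit("-", 1)
    return [f"{series}-{int(index) + d}" for d in range(1, depth + 1)]
```

With next_slices as the rule and enough capacity, a viewer scrolling sequentially through a series hits the cache on every image after the first, while a rule that returns an empty list degrades to plain LRU.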
12.
Bioinformatics ; 29(15): 1915-6, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23736528

ABSTRACT

SUMMARY: The continuous growth of the biomedical scientific literature has motivated the development of text-mining tools able to process all this information efficiently. Although numerous domain-specific solutions are available, there is no web-based concept-recognition system that combines the ability to select multiple concept types to annotate, to reference external databases and to automatically annotate nested and intercepted concepts. BeCAS, the Biomedical Concept Annotation System, is an API for biomedical concept identification and a web-based tool that addresses these limitations. MEDLINE abstracts or free text can be annotated directly in the web interface, where identified concepts are enriched with links to reference databases. Through its customizable widget, BeCAS can also augment external web pages with concept highlighting features. Furthermore, all text-processing and annotation features are made available through an HTTP REST API, allowing integration into any text-processing pipeline. AVAILABILITY: BeCAS is freely available for non-commercial use at http://bioinformatics.ua.pt/becas. CONTACTS: tiago.nunes@ua.pt or jlo@ua.pt.


Subject(s)
Data Mining/methods , Software , Databases, Factual , Internet , MEDLINE
14.
Theor Biol Med Model ; 11 Suppl 1: S6, 2014 May 07.
Article in English | MEDLINE | ID: mdl-25077431

ABSTRACT

BACKGROUND: Social media platforms encourage people to share diverse aspects of their daily lives. Among these, shared health-related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluates the use of Twitter messages and search engine query logs to estimate and predict the incidence rate of influenza-like illness in Portugal. RESULTS: Based on a manually classified dataset of 2704 tweets from Portugal, we selected a set of 650 textual features to train a Naïve Bayes classifier to identify tweets mentioning flu or flu-like illness or symptoms. We obtained a precision of 0.78 and an F-measure of 0.83, based on cross-validation over the complete annotated set. Furthermore, we trained a multiple linear regression model to estimate the health-monitoring data from the Influenzanet project, using as predictors the relative frequencies obtained from the tweet classification results and from query logs, and achieved a correlation ratio of 0.89 (p<0.001). These classification and regression models were also applied to estimate flu incidence in the following flu season, achieving a correlation of 0.72. CONCLUSIONS: Previous studies addressing the estimation of disease incidence based on user-generated content have mostly focused on the English language. Our results further validate those studies and show that, by changing the initial steps of data preprocessing and feature extraction and selection, the proposed approaches can be adapted to other languages. Additionally, we investigated whether the predictive model created can be applied to data from the subsequent flu season. In this case, although the prediction result was good, an initial phase to adapt the regression model could be necessary to achieve more robust results.


Subject(s)
Influenza, Human/epidemiology , Internet , Search Engine , Humans , Linear Models , Portugal/epidemiology , ROC Curve , Statistics as Topic
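The tweet classifier above is a Naïve Bayes model evaluated by precision and F-measure. The Python sketch below is a generic multinomial Naïve Bayes with Laplace smoothing over raw tokens, not the authors' 650-feature setup, and the toy Portuguese training tokens are invented. It also shows that, if the reported F-measure is the balanced F1, a precision of 0.78 and an F-measure of 0.83 imply a recall of about 0.89.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Tuple[Counter, Dict[str, Counter], set]


def train_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Multinomial Naive Bayes: per-class document counts and token counts."""
    class_docs = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model: Model, tokens: List[str]) -> str:
    """Pick the class with the highest log posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R."""
    return precision * f1 / (2 * precision - f1)
```

Working the reported figures through implied_recall gives 0.78 × 0.83 / (2 × 0.78 − 0.83) ≈ 0.887.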
15.
Am J Respir Crit Care Med ; 187(9): 991-7, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23471466

ABSTRACT

RATIONALE: Cough can be assessed with visual analog scales (VAS), health status measures, and 24-hour cough frequency monitors (CF(24)). Evidence for their measurement properties in acute cough caused by upper respiratory tract infection (URTI), and longitudinal data, are limited. OBJECTIVES: To assess cough longitudinally in URTI with subjective and objective outcome measures and to determine the sample size for future studies. METHODS: Thirty-three previously healthy subjects with URTI completed the cough VAS, the Leicester Cough Questionnaire (LCQ-acute), and CF(24) monitoring (Leicester Cough Monitor) on three occasions, 4 days apart. Changes in subjects' condition were assessed with a global rating of change questionnaire. The potential for baseline first-hour cough frequency (CF(1)), VAS, and LCQ to identify low CF(24) was assessed. MEASUREMENTS AND MAIN RESULTS: Mean ± SD duration of cough at visit 1 was 4.1 ± 2.5 days. Geometric mean ± log SD baseline CF(24) and median (interquartile range) cough bouts were high (14.9 ± 0.4 coughs/h and 85 [39-195] bouts/24 h). Health status was severely impaired. There was a significant reduction in CF(24) and VAS, and improvement in LCQ, from visit 1 to visit 3. At visit 3, CF(24) remained above normal limits in 52% of subjects. The smallest changes in CF(24), LCQ, and VAS that subjects perceived as important were 54%, 2, and 17 mm change from baseline, respectively. The sample sizes required for parallel group studies to detect these changes are 27, 51, and 25 subjects per group, respectively. CF(1) (<20.5 coughs/h) was predictive of low CF(24). CONCLUSIONS: CF(24), VAS, and LCQ are responsive outcome tools for the assessment of acute cough. The smallest change in cough frequency perceived as important by subjects is 54%. The sample sizes required for future studies are modest and achievable.


Subject(s)
Cough/etiology , Respiratory Tract Infections/complications , Severity of Illness Index , Acute Disease , Adult , Cough/physiopathology , Female , Humans , Longitudinal Studies , Male , Monitoring, Physiologic , Quality of Life , Sample Size , Surveys and Questionnaires
16.
Database (Oxford) ; 2024, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38994795

ABSTRACT

Biomedical relation extraction is an ongoing challenge within the natural language processing community. Its application is important for understanding the scientific biomedical literature, with many use cases, such as drug discovery, precision medicine, disease diagnosis, treatment optimization and biomedical knowledge graph construction. Therefore, the development of a tool capable of effectively addressing this task has the potential to improve knowledge discovery by automating the extraction of relations from research manuscripts. The first track of the BioCreative VIII competition extended the scope of this challenge by introducing the detection of novel relations within the literature. This paper describes our participating system, which initially focused on jointly extracting and classifying novel relations between biomedical entities. We then describe our subsequent advancement to an end-to-end model. Specifically, we enhanced our initial system by incorporating it into a cascading pipeline that includes a tagger and a linker module. This integration enables the comprehensive extraction of relations and the classification of their novelty directly from raw text. Our experiments yielded promising results: our tagger module attained state-of-the-art named entity recognition performance, with a micro F1-score of 90.24, while our end-to-end system achieved a competitive novelty F1-score of 24.59. The code to run our system is publicly available at https://github.com/ieeta-pt/BioNExt. Database URL: https://github.com/ieeta-pt/BioNExt.


Subject(s)
Natural Language Processing , Data Mining/methods , Humans
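Scores such as the tagger's micro F1-score of 90.24 pool true positives, false positives and false negatives over all documents and entity types before computing precision and recall. A minimal Python sketch follows, assuming exact matching on hypothetical (start, end, type) span tuples; the official BioCreative VIII evaluation, including the novelty score, may use different matching rules.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (start_offset, end_offset, entity_type).
Span = Tuple[int, int, str]


def micro_prf(gold_docs: Iterable[Set[Span]], pred_docs: Iterable[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span matching."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # predicted spans that exactly match a gold span
        fp += len(pred - gold)  # predicted spans with no exact gold match
        fn += len(gold - pred)  # gold spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Under exact matching, a predicted span with a wrong boundary counts both as a false positive and as a false negative.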
17.
Database (Oxford) ; 2024, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39083461

ABSTRACT

The identification of medical concepts from clinical narratives is of great interest to the biomedical scientific community due to its importance for treatment improvement and drug development research. Biomedical named entity recognition (NER) in clinical texts is crucial for automated information extraction, facilitating patient record analysis, drug development, and medical research. Traditional approaches often focus on single-class NER tasks, yet recent advances emphasize the need to address multi-class scenarios, particularly in complex biomedical domains. This paper proposes a strategy to integrate a multi-head conditional random field (CRF) classifier for multi-class NER in Spanish clinical documents. Using a multi-head CRF model, our methodology handles overlapping entity instances of different types, a common challenge for traditional NER methodologies. This architecture enhances computational efficiency and ensures scalability for multi-class NER tasks while maintaining high performance. By combining four diverse datasets, SympTEMIST, MedProcNER, DisTEMIST, and PharmaCoNER, we expand the scope of NER to encompass five classes: symptoms, procedures, diseases, chemicals, and proteins. To the best of our knowledge, these datasets combined create the largest Spanish multi-class dataset focusing on biomedical entity recognition and linking for clinical notes, which is important for training a biomedical model in Spanish. We also provide entity linking to the multilingual Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) vocabulary, with the eventual goal of performing biomedical relation extraction. In experiments on Spanish clinical documents, our strategy provides results competitive with single-class NER models. For NER, our system achieves a combined micro-averaged F1-score of 78.73, with clinical mentions normalized to SNOMED CT with an end-to-end F1-score of 54.51. The code to run our system is publicly available at https://github.com/ieeta-pt/Multi-Head-CRF. Database URL: https://github.com/ieeta-pt/Multi-Head-CRF.


Subject(s)
Data Mining , Humans , Spain , Data Mining/methods , Natural Language Processing , Electronic Health Records
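The multi-head idea can be illustrated on the decoding side: each entity type gets its own CRF head with its own BIO tag sequence, so one token can sit inside a symptom span and a disease span at the same time. The pure-Python sketch below runs standard Viterbi decoding per head. The emission scores, the single transition matrix shared by all heads (which effectively forbids O followed by I) and the absence of start/end transitions are simplifying assumptions, not the authors' model.

```python
from typing import Dict, List, Sequence

TAGS = ["O", "B", "I"]
# TRANSITIONS[i][j]: score for moving from tag i to tag j (O -> I effectively forbidden).
TRANSITIONS = [
    [0.0, 0.0, -100.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]


def viterbi(emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring tag index sequence for a linear-chain CRF (log-space scores)."""
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag for ending in tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for bp in reversed(backpointers):
        best = bp[best]
        path.append(best)
    return path[::-1]


def decode_heads(head_emissions: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, List[str]]:
    """Decode every entity-type head independently, so spans of different types may overlap."""
    return {
        etype: [TAGS[k] for k in viterbi(em, TRANSITIONS)]
        for etype, em in head_emissions.items()
    }
```

Because each head is decoded on its own, a three-token mention can be tagged B I I by a symptom head while the last token is simultaneously tagged B by a disease head, which a single flat BIO sequence could not express.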
18.
BMC Bioinformatics ; 14: 281, 2013 Sep 24.
Article in English | MEDLINE | ID: mdl-24063607

ABSTRACT

BACKGROUND: Concept recognition is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. The development of such solutions is typically performed in an ad-hoc manner or using general information extraction frameworks, which are not optimized for the biomedical domain and normally require the integration of complex external libraries and/or the development of custom tools. RESULTS: This article presents Neji, an open source framework optimized for biomedical concept recognition built around four key characteristics: modularity, scalability, speed, and usability. It integrates modules for biomedical natural language processing, such as sentence splitting, tokenization, lemmatization, part-of-speech tagging, chunking and dependency parsing. Concept recognition is provided through dictionary matching and machine learning with normalization methods. Neji also integrates an innovative concept tree implementation, supporting overlapped concept names and respective disambiguation techniques. The most popular input and output formats, namely Pubmed XML, IeXML, CoNLL and A1, are also supported. On top of the built-in functionalities, developers and researchers can implement new processing modules or pipelines, or use the provided command-line interface tool to build their own solutions, applying the most appropriate techniques to identify heterogeneous biomedical concepts. 
Neji was evaluated against three gold standard corpora with heterogeneous biomedical concepts (CRAFT, AnEM and NCBI disease corpus), achieving high performance on named entity recognition (F1-measure for overlap matching: species 95%, cell 92%, cellular components 83%, genes and proteins 76%, chemicals 65%, biological processes and molecular functions 63%, disorders 85%, and anatomical entities 82%) and on entity normalization (F1-measure for overlap name matching with the correct identifier included in the returned list of identifiers: species 88%, cell 71%, cellular components 72%, genes and proteins 64%, chemicals 53%, and biological processes and molecular functions 40%). Neji provides fast, multi-threaded data processing, annotating up to 1,200 sentences per second when using dictionary-based concept identification. CONCLUSIONS: Considering the provided features and underlying characteristics, we believe that Neji is an important contribution to the biomedical community, streamlining the development of complex concept recognition solutions. Neji is freely available at http://bioinformatics.ua.pt/neji.


Subject(s)
Computational Biology/methods , Information Storage and Retrieval/methods , Natural Language Processing , Pattern Recognition, Automated/methods , Software , Databases, Factual
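The Neji abstract mentions dictionary matching with a concept tree that supports overlapping concept names. The sketch below illustrates that general idea with a token-level trie, written in Python for illustration. It is not Neji's Java API. It reports every dictionary match, including nested and overlapping ones, and leaves disambiguation to a later step. The dictionary, the concept IDs and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of a token-level dictionary trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.concepts: List[str] = []  # concept IDs whose name ends at this node


def build_trie(dictionary: Dict[str, List[str]]) -> TrieNode:
    """Index each (lower-cased, whitespace-tokenized) name with its concept IDs."""
    root = TrieNode()
    for name, concept_ids in dictionary.items():
        node = root
        for token in name.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.concepts.extend(concept_ids)
    return root


def match_all(tokens: List[str], root: TrieNode) -> List[Tuple[int, int, List[str]]]:
    """Return every dictionary match as (start, end, concept IDs), end exclusive.

    Overlapping and nested matches are all kept; choosing among them
    (disambiguation) is left to a later step.
    """
    lowered = [t.lower() for t in tokens]
    matches: List[Tuple[int, int, List[str]]] = []
    for start in range(len(lowered)):
        node = root
        for end in range(start, len(lowered)):
            node = node.children.get(lowered[end])
            if node is None:  # no dictionary name continues with this token
                break
            if node.concepts:
                matches.append((start, end + 1, list(node.concepts)))
    return matches


# Hypothetical dictionary with made-up concept IDs.
trie = build_trie({
    "breast": ["ANAT:1"],
    "breast cancer": ["DISO:1"],
    "cancer": ["DISO:2"],
    "cells": ["CELL:1"],
})
matches = match_all("Breast cancer cells".split(), trie)
print(matches)
# [(0, 1, ['ANAT:1']), (0, 2, ['DISO:1']), (1, 2, ['DISO:2']), (2, 3, ['CELL:1'])]
```

Keeping all candidate spans, rather than only the longest match, is what lets a later disambiguation step decide between "breast cancer" as a disorder and "breast" as an anatomical entity.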
19.
BMC Bioinformatics ; 14: 54, 2013 Feb 15.
Article in English | MEDLINE | ID: mdl-23413997

ABSTRACT

BACKGROUND: Automatic recognition of biomedical names is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. In recent years, various solutions have been implemented to tackle this problem. However, limitations regarding system characteristics, customization and usability still hinder their wider application outside text mining research. RESULTS: We present Gimli, an open-source, state-of-the-art tool for automatic recognition of biomedical names. Gimli includes an extended set of implemented, user-selectable features, such as orthographic, morphological, linguistic, conjunction and dictionary-based features. A simple and fast method to combine different trained models is also provided. Gimli achieves an F-measure of 87.17% on the GENETAG corpus and 72.23% on the JNLPBA corpus, significantly outperforming existing open-source solutions. CONCLUSIONS: Gimli is an off-the-shelf, ready-to-use tool for named entity recognition, providing trained and optimized models for the recognition of biomedical entities in scientific text. It can be used as a command-line tool offering full functionality, including training of new models and customization of the feature set and model parameters through a configuration file. Advanced users can integrate Gimli into their text mining workflows through the provided library, and extend or adapt its functionality. Based on the underlying system characteristics and functionality, both for end users and developers, and on the reported performance results, we believe that Gimli is a state-of-the-art solution for biomedical NER, contributing to faster and better research in the field. Gimli is freely available at http://bioinformatics.ua.pt/gimli.


Subject(s)
Data Mining/methods , Software , Vocabulary, Controlled , Cell Line , DNA , Genes , Humans , Proteins , RNA
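The Gimli abstract lists orthographic, morphological, conjunction and dictionary-based features. The sketch below shows what such token features can look like when prepared for a sequence-labelling model such as a CRF. The feature names, the word-shape rule and the lexicon argument are hypothetical and are not Gimli's actual feature set or configuration keys. The neighbouring-token features are only a simplified stand-in for feature conjunctions.

```python
import re
from typing import Dict, List, Set


def word_shape(token: str) -> str:
    """Map A-Z to 'A', a-z to 'a', digits to '0', then collapse repeated symbols."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens: List[str], i: int, lexicon: Set[str]) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        # orthographic
        "init_cap": tok[:1].isupper(),
        "all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_dash": "-" in tok,
        "shape": word_shape(tok),
        # morphological
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        # dictionary-based
        "in_lexicon": tok.lower() in lexicon,
    }
    # context window: neighbouring tokens, a simple way to form feature conjunctions
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return feats


feats = token_features(["the", "IL-2", "gene"], 1, lexicon={"il-2"})
print(feats["shape"], feats["all_caps"], feats["in_lexicon"])  # A-0 True True
```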
20.
Bioinformatics ; 28(9): 1253-61, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22419783

ABSTRACT

MOTIVATION: Named entity recognition (NER) is a basic task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best-performing solutions combine the outputs of selected annotation solutions measured against a single corpus. However, little effort has been spent on a systematic analysis of methods that harmonize annotation results and measure them against a combination of Gold Standard Corpora (GSCs). RESULTS: We present Totum, a machine learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. The experiments performed show that our approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and that it is an important contribution towards a homogeneous annotation of MEDLINE abstracts. AVAILABILITY AND IMPLEMENTATION: Totum is implemented in Java and its resources are available at http://bioinformatics.ua.pt/totum


Subject(s)
Artificial Intelligence , Data Mining , Molecular Sequence Annotation , Proteins/genetics , Animals , Humans , MEDLINE , Mice , Molecular Sequence Annotation/standards , Terminology as Topic , United States
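The Totum results report different F-measures for exact and nested alignment. The sketch below shows why a looser matching criterion raises the score. It assumes that "nested" means one span contains the other, which may differ from the paper's exact definition. It also uses lenient counting, where a span is correct if it matches at least one span on the other side. The spans are invented character offsets and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def exact(p: Span, g: Span) -> bool:
    """Boundaries must agree exactly."""
    return p == g


def nested(p: Span, g: Span) -> bool:
    """True if either span contains the other (assumed meaning of 'nested')."""
    return (p[0] <= g[0] and g[1] <= p[1]) or (g[0] <= p[0] and p[1] <= g[1])


def f_measure(pred: List[Span], gold: List[Span], match: Callable[[Span, Span], bool]) -> float:
    # lenient counting: a span is correct if it matches at least one span on the other side
    tp_pred = sum(1 for p in pred if any(match(p, g) for g in gold))
    tp_gold = sum(1 for g in gold if any(match(p, g) for p in pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 10), (20, 25)]
pred = [(0, 10), (19, 25), (40, 45)]
print(round(f_measure(pred, gold, exact), 2))   # 0.4
print(round(f_measure(pred, gold, nested), 2))  # 0.8
```

The prediction (19, 25) misses the gold boundary by one character. It fails exact alignment but passes nested alignment, and this kind of boundary disagreement is what separates the two scores.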