Results 1 - 6 of 6
1.
J Biomed Semantics ; 14(1): 1, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36721225

ABSTRACT

BACKGROUND: Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. OBJECTIVE: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen in an experimental context. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. METHODS: We developed a pathogen mention characterisation literature data set -READBiomed-Pathogens- automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. 
We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen. RESULTS: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. CONCLUSIONS: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. The performance of the pathogen mention identification and characterisation algorithms was additionally evaluated on a small manually annotated data set, which shows that the data set we generated allows characterising pathogens of interest. TRIAL REGISTRATION: N/A.


Subject(s)
Algorithms , Natural Language Processing , Databases, Genetic , MEDLINE , Machine Learning
2.
BMC Geriatr ; 22(1): 922, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36451137

ABSTRACT

BACKGROUND: Although the elderly population is generally frail, it is important to closely monitor their health deterioration to improve the care and support in residential aged care homes (RACs). Currently, the best identification approach is through time-consuming regular geriatric assessments. This study aimed to develop and validate a retrospective electronic frailty index (reFI) to track the health status of people staying at RACs using daily routine operational data records. METHODS: We had access to records of residents over the age of 65 from the Royal Freemasons Benevolent Institution RACs (Australia), spanning 2010 to 2021. The reFI was developed using the cumulative deficit frailty model, with its value calculated as the ratio of the number of present frailty deficits to the total number of possible frailty indicators (32). Frailty categories were defined using population quartiles. 1, 3 and 5-year mortality were used for validation. Survival analysis was performed using the Kaplan-Meier estimate. Hazard ratios (HRs) were estimated using Cox regression analyses and the association was assessed using receiver operating characteristic (ROC) curves. RESULTS: Two thousand five hundred eighty-eight residents were assessed, with an average length of stay of 1.2 ± 2.2 years. The RAC cohort was generally frail with an average reFI of 0.21 ± 0.11. According to the Kaplan-Meier estimate, survival varied significantly across different frailty categories (p < 0.01). The estimated hazard ratios (HRs) were 1.12 (95% CI 1.09-1.15), 1.11 (95% CI 1.07-1.14), and 1.1 (95% CI 1.04-1.17) at 1, 3 and 5 years. The ROC analysis of the reFI for mortality outcome showed an area under the curve (AUC) of ≥0.60 for 1, 3 and 5-year mortality. CONCLUSION: A novel reFI was developed using the routine data recorded at RACs. The reFI can identify changes in the frailty index over time for elderly people, which could potentially help in creating personalised care plans for addressing their health deterioration.


Subject(s)
Frailty , Aged , Humans , Retrospective Studies , Frailty/diagnosis , Frailty/epidemiology , Homes for the Aged , Electronics , Kaplan-Meier Estimate
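
The cumulative deficit calculation described in the abstract of item 2 (present deficits divided by 32 possible indicators, with categories defined by population quartiles) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the numeric category labels 1 to 4, and the choice of `statistics.quantiles` with its default method are assumptions made for illustration only.

```python
from statistics import quantiles

# Total possible frailty indicators, as stated in the abstract.
TOTAL_INDICATORS = 32


def frailty_index(present_deficits: int, total: int = TOTAL_INDICATORS) -> float:
    """Ratio of present frailty deficits to total possible indicators."""
    if not 0 <= present_deficits <= total:
        raise ValueError("present_deficits must lie between 0 and total")
    return present_deficits / total


def quartile_categories(scores: list[float]) -> list[int]:
    """Assign each score a category 1..4 by population quartile.

    The labels 1..4 are hypothetical; the abstract does not name the categories.
    """
    q1, q2, q3 = quantiles(scores, n=4)  # three cut points
    categories = []
    for score in scores:
        if score <= q1:
            categories.append(1)
        elif score <= q2:
            categories.append(2)
        elif score <= q3:
            categories.append(3)
        else:
            categories.append(4)
    return categories


if __name__ == "__main__":
    # Hypothetical deficit counts for eight residents (not data from the study).
    counts = [3, 5, 6, 7, 8, 10, 12, 15]
    scores = [frailty_index(c) for c in counts]
    print(list(zip(scores, quartile_categories(scores))))
```

In this sketch a resident with 8 of 32 deficits present receives an index of 0.25; how ties at the quartile boundaries were handled in the study is not stated in the abstract.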
3.
AMIA Annu Symp Proc ; 2020: 1325-1334, 2020.
Article in English | MEDLINE | ID: mdl-33936509

ABSTRACT

Recent research in predicting protein secondary structure populations (SSP) based on Nuclear Magnetic Resonance (NMR) chemical shifts has helped quantitatively characterise the structural conformational properties of intrinsically disordered proteins and regions (IDP/IDR). Different from protein secondary structure (SS) prediction, SSP prediction assumes a dynamic assignment of secondary structures that seem to correlate with disordered states. In this study, we designed single-task deep learning frameworks to predict IDP/IDR and SSP separately, and multitask deep learning frameworks to allow quantitative predictions of IDP/IDR evidenced by the simultaneously predicted SSP. According to independent test results, single-task deep learning models improve on the prediction performance of shallow models for SSP and IDP/IDR. In addition, performance for IDP/IDR prediction was further improved when SSP was simultaneously predicted in multitask models. With p53 as a use case, we demonstrate how predicted SSP is used to explain the IDP/IDR predictions for each functional region.


Subject(s)
Deep Learning , Intrinsically Disordered Proteins/chemistry , Protein Structure, Secondary
4.
Clin Colorectal Cancer ; 17(3): e569-e577, 2018 09.
Article in English | MEDLINE | ID: mdl-29980491

ABSTRACT

BACKGROUND: Multiple studies have defined the prognostic and potential predictive significance of the primary tumor side in metastatic colorectal cancer (CRC). However, the currently available data for early-stage disease are limited and inconsistent. MATERIALS AND METHODS: We explored the clinicopathologic, treatment, and outcome data from a multisite Australian CRC registry from 2003 to 2016. Tumors at and distal to the splenic flexure were considered a left primary (LP). RESULTS: For the 6547 patients identified, the median age at diagnosis was 69 years, 55% were men, and most (63%) had a LP. Comparing the outcomes for right primary (RP) versus LP, time-to-recurrence was similar for stage I and III disease, but longer for those with a stage II RP (hazard ratio [HR], 0.68; 95% confidence interval [CI], 0.52-0.90; P < .01). Adjuvant chemotherapy provided a consistent benefit in stage III disease, regardless of the tumor side. Overall survival (OS) was similar for those with stage I and II disease between LP and RP patients; however, those with stage III RP disease had poorer OS (HR, 1.30; 95% CI, 1.04-1.62; P < .05) and cancer-specific survival (HR, 1.55; 95% CI, 1.19-2.03; P < .01). Patients with stage IV RP, whether de novo metastatic (HR, 1.15; 95% CI, 0.95-1.39) or relapsed post-early-stage disease (HR, 1.35; 95% CI, 1.11-1.65; P < .01), had poorer OS. CONCLUSION: In early-stage CRC, the association of tumor side and effect on the time-to-recurrence and OS varies by stage. In stage III patients with an RP, poorer OS and cancer-specific survival outcomes are, in part, driven by inferior survival after recurrence, and tumor side did not influence adjuvant chemotherapy benefit.


Subject(s)
Antineoplastic Agents/therapeutic use , Colorectal Neoplasms/pathology , Neoplasm Recurrence, Local/epidemiology , Registries/statistics & numerical data , Aged , Australia/epidemiology , Chemotherapy, Adjuvant/methods , Colorectal Neoplasms/mortality , Colorectal Neoplasms/therapy , Disease-Free Survival , Female , Humans , Male , Middle Aged , Neoplasm Recurrence, Local/pathology , Neoplasm Staging , Prevalence , Prognosis , Proportional Hazards Models , Prospective Studies , Survival Analysis
5.
BMC Bioinformatics ; 16: 113, 2015 Apr 08.
Article in English | MEDLINE | ID: mdl-25887792

ABSTRACT

BACKGROUND: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. RESULTS: Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance. CONCLUSIONS: We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence on the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, care needs to be taken with the algorithms due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.


Subject(s)
Abstracting and Indexing/methods , Algorithms , Information Storage and Retrieval , MEDLINE , Medical Subject Headings , Artificial Intelligence , Humans , Semantics
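
The unigram and bigram bag-of-words features that the abstract of item 5 uses as its baseline representation can be sketched in a few lines of Python. This is a minimal illustration, not the feature extraction used in the paper: the lowercase alphanumeric tokenizer and the function names are assumptions, and no stop-word removal or weighting is applied.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_features(text: str, max_n: int = 2) -> Counter:
    """Count every n-gram of length 1..max_n; max_n=2 gives unigrams plus bigrams."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


if __name__ == "__main__":
    # Hypothetical citation title, used only to show the shape of the output.
    print(ngram_features("Protein binds protein"))
```

Each resulting key (for example `protein` or `protein binds`) would become one dimension of the feature vector handed to a classifier for a given MeSH heading.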
6.
J Bioinform Comput Biol ; 8(1): 163-79, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20183881

ABSTRACT

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus ("silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization--formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents.


Subject(s)
Computational Biology/standards , Data Mining/standards , Cooperative Behavior , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , Unified Medical Language System
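
The voting scheme mentioned in the abstract of item 6, which merges the output of several annotation systems into a single "silver standard" corpus, can be illustrated with a simple vote count. This sketch makes strong simplifying assumptions: annotations are compared by exact match on (start, end, concept identifier), whereas the abstract describes boundary reconciliation and semantic matching, and the threshold `min_votes=2` and all names are hypothetical rather than taken from CALBC.

```python
from collections import Counter
from typing import List, Set, Tuple

# An annotation is (start offset, end offset, concept identifier).
Annotation = Tuple[int, int, str]


def harmonise(system_annotations: List[Set[Annotation]],
              min_votes: int = 2) -> Set[Annotation]:
    """Keep every annotation proposed by at least min_votes systems.

    Exact matching is an assumption of this sketch; it does not reconcile
    differing boundaries or semantically equivalent concepts.
    """
    votes: Counter = Counter()
    for annotations in system_annotations:
        votes.update(set(annotations))  # each system votes once per annotation
    return {ann for ann, count in votes.items() if count >= min_votes}


if __name__ == "__main__":
    # Hypothetical output of three systems for one sentence.
    system_a = {(0, 3, "C001"), (10, 17, "C002")}
    system_b = {(0, 3, "C001"), (20, 25, "C003")}
    system_c = {(0, 3, "C001"), (10, 17, "C002")}
    print(sorted(harmonise([system_a, system_b, system_c])))
```

Raising `min_votes` trades recall for precision: fewer annotations survive, but each is backed by more systems.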