1.
BMC Geriatr ; 22(1): 922, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36451137

ABSTRACT

BACKGROUND: Although the elderly population is generally frail, it is important to closely monitor their health deterioration to improve care and support in residential aged care homes (RACs). Currently, the best identification approach is through time-consuming regular geriatric assessments. This study aimed to develop and validate a retrospective electronic frailty index (reFI) to track the health status of people staying at RACs using routine daily operational data records. METHODS: We had access to records of patients over the age of 65 from the Royal Freemasons Benevolent Institution RACs (Australia), spanning 2010 to 2021. The reFI was developed using the cumulative deficit frailty model, with its value calculated as the ratio of the number of frailty deficits present to the total number of possible frailty indicators (32). Frailty categories were defined using population quartiles. 1, 3 and 5-year mortality were used for validation. Survival analysis was performed using the Kaplan-Meier estimate. Hazard ratios (HRs) were estimated using Cox regression analyses and the association was assessed using receiver operating characteristic (ROC) curves. RESULTS: Two thousand five hundred eighty-eight residents were assessed, with an average length of stay of 1.2 ± 2.2 years. The RAC cohort was generally frail with an average reFI of 0.21 ± 0.11. According to the Kaplan-Meier estimate, survival varied significantly across different frailty categories (p < 0.01). The estimated HRs were 1.12 (95% CI 1.09-1.15), 1.11 (95% CI 1.07-1.14), and 1.1 (95% CI 1.04-1.17) at 1, 3 and 5 years. The ROC analysis of the reFI for mortality outcome showed an area under the curve (AUC) of ≥0.60 for 1, 3 and 5-year mortality. CONCLUSION: A novel reFI was developed using the routine data recorded at RACs. The reFI can identify changes in the frailty index over time for elderly people, which could potentially help in creating personalised care plans for addressing their health deterioration.


Subjects
Frailty , Aged , Humans , Retrospective Studies , Frailty/diagnosis , Frailty/epidemiology , Homes for the Aged , Electronics , Kaplan-Meier Estimate
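
Note on entry 1: the abstract describes the reFI as the ratio of deficits present to 32 possible indicators, with frailty categories defined by population quartiles. A minimal Python sketch of that arithmetic follows. The function names, the numeric category labels 1 to 4, and the sample cohort values are illustrative assumptions, not taken from the paper, and the quartile method is the standard-library default rather than the authors' stated choice.

```python
from bisect import bisect_left
from statistics import quantiles

TOTAL_INDICATORS = 32  # total possible frailty indicators, as stated in the abstract


def frailty_index(present_deficits, total=TOTAL_INDICATORS):
    """Cumulative-deficit index: deficits present divided by deficits possible."""
    if not 0 <= present_deficits <= total:
        raise ValueError("deficit count must lie between 0 and the total")
    return present_deficits / total


def quartile_category(value, cohort_values):
    """Return 1..4 for the cohort quartile that `value` falls into (hypothetical labels)."""
    cuts = quantiles(cohort_values, n=4)  # three cut points: Q1, median, Q3
    return bisect_left(cuts, value) + 1


# Made-up cohort of index values, for illustration only.
cohort = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
print(frailty_index(8))                 # 8 of 32 deficits present
print(quartile_category(0.40, cohort))  # highest quartile of this toy cohort
```

Recomputing the index at successive dates from the same records would give the trajectory over time that the abstract refers to.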
2.
J Biomed Inform ; 64: 158-167, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27742349

ABSTRACT

OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. METHODS: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. RESULTS: Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. CONCLUSION: Overall, linking data sources significantly improved classification performance for all the diseases examined. However, there is no single approach that suits all scenarios; the choice of the most effective combination of data sources depends on the specific disease to be classified.


Subjects
Data Mining , Disease/classification , Hospital Records , Natural Language Processing , Hospitalization , Humans , Patient Compliance , Support Vector Machine
3.
J Biomed Inform ; 53: 251-60, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25460203

ABSTRACT

BACKGROUND: Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging invasive fungal diseases, specifically of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour intensive exercise. We developed text classifiers for detecting such IFDs from free-text radiology (CT) reports, using machine-learning techniques. METHOD: We obtained free-text reports of CT scans performed over a specific hospitalisation period (2003-2011), for 264 IFD and 289 control patients from three tertiary hospitals. We analysed IFD evidence at patient, report, and sentence levels. Three infectious disease experts annotated the reports of 73 IFD-positive patients for language suggestive of IFD at sentence level, and graded the sentences as to whether they suggested or excluded the presence of IFD. Reliable agreement between annotators was obtained and this was used as training data for our classifiers. We tested a variety of Machine Learning (ML), rule based, and hybrid systems, with feature types including bags of words, bags of phrases, and bags of concepts, as well as report-level structured features. Evaluation was carried out over a robust framework with separate Development and Held-Out datasets. RESULTS: The best systems (using Support Vector Machines) achieved very high recall at report- and patient-levels over unseen data: 95% and 100% respectively. Precision at report-level over held-out data was 71%; however, most of the associated false-positive reports (53%) belonged to patients who had a previous positive report appropriately flagged by the classifier, reducing negative impact in practice. CONCLUSIONS: Our machine learning application holds the potential for developing systematic IFD surveillance systems for hospital populations.


Subjects
Aspergillosis/diagnosis , Data Mining/methods , Tomography, X-Ray Computed , Algorithms , Artificial Intelligence , Data Collection/methods , Electronic Data Processing , False Positive Reactions , Hospitalization , Humans , Machine Learning , Natural Language Processing , Radiology/methods , Reproducibility of Results , Support Vector Machine
4.
BJUI Compass ; 5(1): 121-141, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38179019

ABSTRACT

Objectives: To develop an online treatment decision aid (OTDA) to assist patients with low-risk prostate cancer (LRPC) and their partners in making treatment decisions. Patients and methods: Navigate, an OTDA for LRPC, was rigorously co-designed by patients with a confirmed diagnosis or at risk of LRPC and their partners, clinicians, researchers and website designers/developers. A theoretical model guided the development process. A mixed methods approach was used incorporating (1) evidence for essential design elements for OTDAs; (2) evidence for treatment options for LRPC; (3) an iterative co-design process involving stakeholder workshops and prototype review; and (4) expert rating using the International Patient Decision Aid Standards (IPDAS). Three co-design workshops with potential users (n = 12) and research and web-design team members (n = 10) were conducted. Results from each workshop informed modifications to the OTDA for testing in the subsequent workshop. Clinician (n = 6) and consumer (n = 9) feedback on usability and content on the penultimate version was collected. Results: The initial workshops identified key content and design features that were incorporated into the draft OTDA, re-workshopped and incorporated into the penultimate OTDA. Expert feedback on usability and content was also incorporated into the final OTDA. The final OTDA was deemed comprehensive, clear and appropriate and met all IPDAS criteria. Conclusion: Navigate is an interactive and acceptable OTDA for Australian men with LRPC designed by men for men using a co-design methodology. The effectiveness of Navigate in assisting patient decision-making is currently being assessed in a randomised controlled trial with patients with LRPC and their partners.

5.
BMC Bioinformatics ; 12 Suppl 2: S5, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21489224

ABSTRACT

AIM: Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD: We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification. RESULTS: For the classification tasks over all labels, our systems achieved micro-averaged f-scores of 80.9% and 66.9% over datasets of structured and unstructured abstracts respectively, using sequential features. In labeling only the key sentences, our systems produced f-scores of 89.3% and 74.0% over structured and unstructured abstracts respectively, using the same sequential features. The results over an external dataset were lower (f-scores of 63.1% for all labels, and 83.8% for key sentences). CONCLUSIONS: Of the features we used, the best for classifying any given sentence in an abstract were based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperformed feature sets used in previous work.


Subjects
Abstracting and Indexing/methods , Evidence-Based Medicine , Information Storage and Retrieval/methods , Electronic Data Processing/methods , Natural Language Processing , Semantics
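
Note on entry 5: the abstract reports micro-averaged f-scores over sentence labels. As a hedged illustration of what micro-averaging means, the sketch below pools true positives, false positives and false negatives across all sentences before computing precision, recall and F1. The function name and the toy data are assumptions for illustration ("Other" is a placeholder label), and this is not the authors' evaluation code.

```python
def micro_f_score(gold, predicted):
    """Micro-averaged F1 over per-sentence label sets.

    `gold` and `predicted` are equal-length lists of sets of labels,
    one set per sentence. Counts are pooled before the ratios are taken.
    """
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up annotations for four sentences.
gold = [{"Intervention"}, {"Outcome"}, {"Other"}, {"Outcome", "Intervention"}]
pred = [{"Intervention"}, {"Other"}, {"Other"}, {"Outcome"}]
print(round(micro_f_score(gold, pred), 3))
```

Because counts are pooled, frequent labels dominate a micro-averaged score, which is worth keeping in mind when comparing the all-label and key-sentence figures.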
6.
Appl Ergon ; 96: 103486, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34139375

ABSTRACT

This research empirically evaluates the introduction of speech to existing keyboard and mouse input modalities in an application used to control aircraft in a simulated, complex and dynamic environment. Task performance and task performance degradation are assessed for three levels of workload. Previous studies have evaluated task performance using these modalities; however, only a couple have evaluated task performance under varying workload. Even though speech is a common addition to modern control interfaces, the effect of varying workload on this combination of control modalities has not yet been reported. Thirty-six participants commanded simulated aircraft through generated obstacle courses to reach a Combat Air Patrol (CAP) point while also responding to a secondary task. There were nine conditions that varied the control modality (Keyboard and Mouse (KM), Voice (V), and Keyboard, Mouse and Voice (KMV)) and the workload, set by the number of aircraft being controlled (low, medium and high). Results showed that KM outperformed KMV and V for the low and medium workload levels. However, task performance with KMV was found to degrade the least as workload increased. KMV and KM were found to enable significantly more correct responses to the secondary task, which was delivered aurally. Participants reported a preference for the combined modalities (KMV), self-assessing that KMV most reduced their workload. This research suggests that the addition of a speech interface to existing keyboard and mouse modalities, for control of aircraft in a simulation, may help manage cognitive load and may assist in controlling more aircraft under higher workloads.


Subjects
Speech , Workload , Aircraft , Computer Simulation , Humans , Task Performance and Analysis
7.
Trials ; 22(1): 49, 2021 Jan 11.
Article in English | MEDLINE | ID: mdl-33430950

ABSTRACT

BACKGROUND: Active surveillance (AS) is the disease management option of choice for low-risk prostate cancer. Despite this, men with low-risk prostate cancer (LRPC) find management decisions distressing and confusing. We developed Navigate, an online decision aid to help men and their partners make management decisions consistent with their values. The aims are to evaluate the impact of Navigate on uptake of AS; decision-making preparedness; decisional conflict, regret and satisfaction; quality of illness communication; and prostate cancer-specific quality of life and anxiety. In addition, the healthcare cost impact, cost-effectiveness and patterns of use of Navigate will be assessed. This paper describes the study protocol. METHODS: Three hundred four men and their partners are randomly assigned one-to-one to Navigate or to the control arm. Randomisation is electronically generated and stratified by site. Navigate is an online decision aid that presents up-to-date, unbiased information on LRPC tailored to Australian men and their partners including each management option and potential side-effects, and an interactive values clarification exercise. Participants in the control arm will be directed to the website of Australia's peak national body for prostate cancer. Eligible patients will be men within 3 months of being diagnosed with LRPC, aged 18 years or older, and who are yet to make a treatment decision, who are deemed eligible for AS by their treating clinician and who have Internet access and sufficient English to participate. The primary outcome is self-reported uptake of AS as the first-line management option. Secondary outcomes include self-reported preparedness for decision-making; decisional conflict, regret and satisfaction; quality of illness communication; and prostate cancer-specific quality of life. Uptake of AS 1 month after consent will be determined through patient self-report. Men and their partners will complete study outcome measures before randomisation and 1, 3 and 6 months after study consent. DISCUSSION: The Navigate online decision aid has the potential to increase the choice of AS in LRPC, avoiding or delaying unnecessary radical treatments and associated side effects. In addition, Navigate is likely to reduce patients' and partners' confusion and distress in management decision-making and increase their quality of life. TRIAL REGISTRATION: Australian and New Zealand Clinical Trial Registry ACTRN12616001665426. Registered on 2 December 2016. All items from the WHO Trial Registration Data set can be found in this manuscript.


Subjects
Prostatic Neoplasms , Quality of Life , Australia , Decision Making , Decision Support Techniques , Humans , Male , New Zealand , Prostatic Neoplasms/therapy , Randomized Controlled Trials as Topic
8.
Front Res Metr Anal ; 6: 654438, 2021.
Article in English | MEDLINE | ID: mdl-33870071

ABSTRACT

Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the large volume of existing patents and the complex linguistic properties of chemical patents. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced text mining techniques for chemical patents. The ChEMU 2020 lab proposed two fundamental information extraction tasks focusing on chemical reaction processes described in chemical patents: (1) chemical named entity recognition, requiring identification of essential chemical entities and their roles in chemical reactions, as well as reaction conditions; and (2) event extraction, which aims at identification of event steps relating the entities involved in chemical reactions. The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. We further show the methods to be robust to variations in sampling of the test data. We provide a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong. We also present the methods adopted by participants, provide a detailed analysis of their performance, and carefully consider the potential impact of data leakage on interpretation of the results. The ChEMU 2020 Lab has shown the viability of automated methods to support information extraction of key information in chemical patents.

9.
BMC Med Inform Decis Mak ; 10: 58, 2010 Oct 12.
Article in English | MEDLINE | ID: mdl-20937152

ABSTRACT

BACKGROUND: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this requires the user(s) to examine every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized. METHODS: We explore the effectiveness of using ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using queries defined methodologically, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task. RESULTS: Our results show that ranked retrieval by itself is not viable for this search task requiring high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve the overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process. CONCLUSIONS: Outcomes of experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time-savings over the current search process in systematic reviewing.


Subjects
Information Storage and Retrieval/methods , Review Literature as Topic
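
Note on entry 9: the abstract describes ranking within a Boolean result set, so that the top of the list gives an early indication of query quality. The sketch below shows the general idea only. The query representation (a list of OR-groups that must all match), the term-count scoring, and all names and sample documents are assumptions for illustration; the paper's actual ranking function is not given in the abstract.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer, deliberately simplistic."""
    return text.lower().split()


def boolean_match(tokens, query):
    """`query` is a list of OR-groups; every group needs at least one matching term."""
    return all(any(term in tokens for term in group) for group in query)


def ranked_boolean_search(docs, query, rank_terms):
    """Filter with the Boolean query, then order the hits by a simple term count."""
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        if boolean_match(set(tokens), query):
            score = sum(tokens.count(term) for term in rank_terms)
            hits.append((score, doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # highest score first, ties by id
    return [doc_id for _, doc_id in hits]


# Made-up documents and query: statin AND (trial OR cohort).
docs = {
    "a": "statin therapy trial statin outcomes",
    "b": "statin cohort",
    "c": "randomised trial of aspirin",
    "d": "statin randomised trial",
}
query = [["statin"], ["trial", "cohort"]]
print(ranked_boolean_search(docs, query, ["statin", "trial"]))
```

The Boolean filter still determines which documents are returned, so recall is unchanged; only the order in which a reviewer sees them differs.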
10.
Neural Netw ; 128: 345-357, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32470799

ABSTRACT

Continual learning is the ability of a learning system to solve new tasks by utilizing previously acquired knowledge from learning and performing prior tasks without having significant adverse effects on the acquired prior knowledge. Continual learning is key to advancing machine learning and artificial intelligence. Progressive learning is a deep learning framework for continual learning that comprises three procedures: curriculum, progression, and pruning. The curriculum procedure is used to actively select a task to learn from a set of candidate tasks. The progression procedure is used to grow the capacity of the model by adding new parameters that leverage parameters learned in prior tasks, while learning from data available for the new task at hand, without being susceptible to catastrophic forgetting. The pruning procedure is used to counteract the growth in the number of parameters as further tasks are learned, as well as to mitigate negative forward transfer, in which prior knowledge unrelated to the task at hand may interfere and worsen performance. Progressive learning is evaluated on a number of supervised classification tasks in the image recognition and speech recognition domains to demonstrate its advantages compared with baseline methods. It is shown that, when tasks are related, progressive learning leads to faster learning that converges to better generalization performance using a smaller number of dedicated parameters.


Subjects
Deep Learning
11.
Hum Mutat ; 30(4): 496-510, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19306394

ABSTRACT

The remarkable progress in characterizing the human genome sequence, exemplified by the Human Genome Project and the HapMap Consortium, has led to the perception that knowledge and the tools (e.g., microarrays) are sufficient for many if not most biomedical research efforts. A large amount of data from diverse studies proves this perception inaccurate at best, and at worst, an impediment for further efforts to characterize the variation in the human genome. Because variation in genotype and environment are the fundamental basis to understand phenotypic variability and heritability at the population level, identifying the range of human genetic variation is crucial to the development of personalized nutrition and medicine. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) was proposed initially to systematically collect mutations that cause human disease and create a cyber infrastructure to link locus specific databases (LSDB). We report here the discussions and recommendations from the 2008 HVP planning meeting held in San Feliu de Guixols, Spain, in May 2008.


Subjects
Databases, Genetic , Genetic Variation , Genome, Human/genetics , Computational Biology/methods , Computational Biology/standards , Genetic Predisposition to Disease , Genotype , Humans , Information Dissemination , Mutation , Phenotype , Polymorphism, Genetic , Spain
12.
Neural Netw ; 92: 60-68, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28396068

ABSTRACT

Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances.


Subjects
Emotions , Machine Learning , Speech Recognition Software , Neural Networks, Computer
13.
Stud Health Technol Inform ; 235: 196-200, 2017.
Article in English | MEDLINE | ID: mdl-28423782

ABSTRACT

This paper introduces the annotation schema and annotation process for a corpus of clinical letters describing the disease course and treatment of oestrogen receptor positive breast cancer patients, after completion of primary surgery and radiotherapy treatment. Concepts related to therapy, clinical signs, and recurrence, as well as relationships linking these, are identified and annotated in 200 letters. This corpus will provide the basis for development of natural language processing tools for automatic extraction of key clinical factors from such letters.


Subjects
Breast Neoplasms/radiotherapy , Breast Neoplasms/surgery , Natural Language Processing , Breast Neoplasms/pathology , Female , Follow-Up Studies , Humans , Receptors, Estrogen
14.
JMIR Mhealth Uhealth ; 5(12): e184, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29212628

ABSTRACT

BACKGROUND: Optimal dosing of oral tyrosine kinase inhibitor therapy is critical to treatment success and survival of patients with chronic myeloid leukemia (CML). Drug intolerance secondary to toxicities and nonadherence are significant factors in treatment failure. OBJECTIVE: The objective of this study was to develop and pilot-test the clinical feasibility and acceptability of a mobile health system (REMIND) to increase oral drug adherence and patient symptom self-management among people with CML (chronic phase). METHODS: A multifaceted intervention was iteratively developed using the intervention development framework by Schofield and Chambers, consisting of defining the patient problem and iteratively refining the intervention. The clinical feasibility and acceptability were examined via patient and intervention nurse interviews, which were audiotaped, transcribed, and deductively content analyzed. RESULTS: The intervention comprised 2 synergistically operating elements: (1) daily medication reminders and routine assessment of side effects with evidence-based self-care advice delivered in real time and (2) question prompt list (QPL) questions and routinely collected individual patient adherence and side effect profile data used to shape nurses' consultations, which employed motivational interviewing to support adoption of self-management behaviors. A total of 4 consultations and daily alerts and advice were delivered over 10 weeks. In total, 58% (10/17) of patients and 2 nurses participated in the pilot study. Patients reported several benefits of the intervention: help in establishing medication routines, resolution of symptom uncertainty, increased awareness of self-care, and informed decision making. Nurses also endorsed the intervention: it assisted in establishing pill-taking routines and patients developing effective solutions to adherence challenges. CONCLUSIONS: The REMIND system with nurse support was usable and acceptable to both patients and nurses. It has the potential to improve adherence and side-effect management and should be further evaluated.

16.
Artif Intell Med ; 62(1): 11-21, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25001545

ABSTRACT

OBJECTIVE: We address the task of extracting information from free-text pathology reports, focusing on staging information encoded by the TNM (tumour-node-metastases) and ACPS (Australian clinico-pathological stage) systems. Staging information is critical for diagnosing the extent of cancer in a patient and for planning individualised treatment. Extracting such information into more structured form saves time, improves reporting, and underpins the potential for automated decision support. METHODS AND MATERIAL: We investigate the portability of a text mining model constructed from records from one health centre, by applying it directly to the extraction task over a set of records from a different health centre, with different reporting narrative characteristics. Other than a simple normalisation step on features associated with target labels, we apply the models from one system directly to the other. RESULTS: The best F-scores for in-hospital experiments are 81%, 85%, and 94% (for staging T, N, and M respectively), while best cross-hospital F-scores reach 84%, 81%, and 91% for the same respective categories. CONCLUSIONS: Our performance results compare favourably to the best levels reported in the literature, and--most relevant to our aim here--the cross-corpus results demonstrate the portability of the models we developed.


Subjects
Colorectal Neoplasms/pathology , Data Mining , Hospital Information Systems , Neoplasm Staging , Algorithms , Humans , Medical Records , Natural Language Processing
17.
PLoS One ; 9(9): e107797, 2014.
Article in English | MEDLINE | ID: mdl-25250675

ABSTRACT

PURPOSE: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports supportive for IMDs. PATIENTS AND METHODS: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12-weeks after, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected-control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician annotated reports including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver-operating-curve (ROC). RESULTS: For the development subset, sensitivity/specificity was 91% (95%CI 86% to 94%)/79% (95%CI 71% to 84%) and ROC area was 0.92 (95%CI 89% to 94%). Of 25 (5.6%) missed notifications, only 4 (0.9%) reports were regarded as clinically significant. CONCLUSION: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.


Subjects
Hematologic Neoplasms/complications , Lung Diseases/diagnosis , Mycoses/diagnosis , Natural Language Processing , Tomography, X-Ray Computed/methods , Adolescent , Adult , Aged , Aged, 80 and over , Case-Control Studies , Female , Humans , Lung Diseases/complications , Lung Diseases/microbiology , Male , Middle Aged , Mycoses/complications , Mycoses/microbiology , Population Surveillance , ROC Curve , Retrospective Studies , Young Adult
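
Note on entry 17: the abstract reports sensitivity and specificity of binary classifier predictions against a physician-annotated reference standard. The sketch below shows how those two figures follow from the confusion-matrix counts. The function name and the toy labels are illustrative assumptions, and the confidence intervals and ROC area reported in the abstract are not reproduced here.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired binary labels (1 = positive)."""
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    tn = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for ten reports: four reference positives, six negatives.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a surveillance setting the false negatives (the `fn` count) correspond to the missed notifications mentioned in the abstract.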
18.
Database (Oxford) ; 2013: bat019, 2013.
Article in English | MEDLINE | ID: mdl-23584833

ABSTRACT

This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.


Subjects
Data Mining/methods , Disease/genetics , Genetic Variation , Publications , Databases, Genetic , Humans , Semantics , Statistics as Topic