Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 8 de 8
Filtrar
1.
Lang Resour Eval ; 58(3): 1043-1071, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-39323984

RESUMO

Robot-assisted minimally invasive surgery is the gold standard for the surgical treatment of many pathological conditions since it guarantees to the patient shorter hospital stay and quicker recovery. Several manuals and academic papers describe how to perform these interventions and thus contain important domain-specific knowledge. This information, if automatically extracted and processed, can be used to extract or summarize surgical practices or develop decision making systems that can help the surgeon or nurses to optimize the patient's management before, during, and after the surgery by providing theoretical-based suggestions. However, general English natural language understanding algorithms have lower efficacy and coverage issues when applied to domain others than those they are typically trained on, and a domain specific textual annotated corpus is missing. To overcome this problem, we annotated the first robotic-surgery procedural corpus, with PropBank-style semantic labels. Starting from the original PropBank framebank, we enriched it by adding new lemmas, frames and semantic arguments required to cover missing information in general English but needed in procedural surgical language, releasing the Robotic-Surgery Procedural Framebank (RSPF). We then collected from robotic-surgery textbooks as-is sentences for a total of 32,448 tokens, and we annotated them with RSPF labels. We so obtained and publicly released the first annotated corpus of the robotic-surgical domain that can be used to foster further research on language understanding and procedural entities and relations extraction from clinical and surgical scientific literature.

2.
Foods ; 11(17)2022 Sep 02.
Artigo em Inglês | MEDLINE | ID: mdl-36076868

RESUMO

Besides the numerous studies in the last decade involving food and nutrition data, this domain remains low resourced. Annotated corpuses are very useful tools for researchers and experts of the domain in question, as well as for data scientists for analysis. In this paper, we present the annotation process of food consumption data (recipes) with semantic tags from different semantic resources-Hansard taxonomy, FoodOn ontology, SNOMED CT terminology and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities-recipes-which includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different approaches for annotating-the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the golden standard of the FoodBase corpus, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data-recipes-annotated with semantic tags from the aforementioned four different external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized for developing and training different information extraction pipelines using state-of-the-art NLP approaches for tracing knowledge about food safety applications.

3.
J Biomed Semantics ; 13(1): 14, 2022 05 23.
Artigo em Inglês | MEDLINE | ID: mdl-35606797

RESUMO

BACKGROUND: The evidence-based medicine paradigm requires the ability to aggregate and compare outcomes of interventions across different trials. This can be facilitated and partially automatized by information extraction systems. In order to support the development of systems that can extract information from published clinical trials at a fine-grained and comprehensive level to populate a knowledge base, we present a richly annotated corpus at two levels. At the first level, entities that describe components of the PICO elements (e.g., population's age and pre-conditions, dosage of a treatment, etc.) are annotated. The second level comprises schema-level (i.e., slot-filling templates) annotations corresponding to complex PICO elements and other concepts related to a clinical trial (e.g. the relation between an intervention and an arm, the relation between an outcome and an intervention, etc.). RESULTS: The final corpus includes 211 annotated clinical trial abstracts with substantial agreement between annotators at the entity and scheme level. The mean Kappa value for the glaucoma and T2DM corpora was 0.74 and 0.68, respectively, for single entities. The micro-averaged F1 score to measure inter-annotator agreement for complex entities (i.e. slot-filling templates) was 0.81.The BERT-base baseline method for entity recognition achieved average micro- F1 scores of 0.76 for glaucoma and 0.77 for diabetes with exact matching. CONCLUSIONS: In this work, we have created a corpus that goes beyond the existing clinical trial corpora, since it is annotated in a schematic way that represents the classes and properties defined in an ontology. Although the corpus is small, it has fine-grained annotations and could be used to fine-tune pre-trained machine learning models and transformers to the specific task of extracting information about clinical trial abstracts.For future work, we will use the corpus for training information extraction systems that extract single entities, and predict template slot-fillers (i.e., class data/object properties) to populate a knowledge base that relies on the C-TrO ontology for the description of clinical trials. The resulting corpus and the code to measure inter-annotation agreement and the baseline method are publicly available at https://zenodo.org/record/6365890.


Assuntos
Ensaios Clínicos como Assunto , Glaucoma , Armazenamento e Recuperação da Informação , Aprendizado de Máquina , Humanos , Bases de Conhecimento , Processamento de Linguagem Natural
4.
J Med Internet Res ; 23(7): e26770, 2021 07 30.
Artigo em Inglês | MEDLINE | ID: mdl-34328444

RESUMO

BACKGROUND: Patient portals tethered to electronic health records systems have become attractive web platforms since the enacting of the Medicare Access and Children's Health Insurance Program Reauthorization Act and the introduction of the Meaningful Use program in the United States. Patients can conveniently access their health records and seek consultation from providers through secure web portals. With increasing adoption and patient engagement, the volume of patient secure messages has risen substantially, which opens up new research and development opportunities for patient-centered care. OBJECTIVE: This study aims to develop a data model for patient secure messages based on the Fast Healthcare Interoperability Resources (FHIR) standard to identify and extract significant information. METHODS: We initiated the first draft of the data model by analyzing FHIR and manually reviewing 100 sentences randomly sampled from more than 2 million patient-generated secure messages obtained from the online patient portal at the Mayo Clinic Rochester between February 18, 2010, and December 31, 2017. We then annotated additional sets of 100 randomly selected sentences using the Multi-purpose Annotation Environment tool and updated the data model and annotation guideline iteratively until the interannotator agreement was satisfactory. We then created a larger corpus by annotating 1200 randomly selected sentences and calculated the frequency of the identified medical concepts in these sentences. Finally, we performed topic modeling analysis to learn the hidden topics of patient secure messages related to 3 highly mentioned microconcepts, namely, fatigue, prednisone, and patient visit, and to evaluate the proposed data model independently. RESULTS: The proposed data model has a 3-level hierarchical structure of health system concepts, including 3 macroconcepts, 28 mesoconcepts, and 85 microconcepts. Foundation and base macroconcepts comprise 33.99% (841/2474), clinical macroconcepts comprise 64.38% (1593/2474), and financial macroconcepts comprise 1.61% (40/2474) of the annotated corpus. The top 3 mesoconcepts among the 28 mesoconcepts are condition (505/2474, 20.41%), medication (424/2474, 17.13%), and practitioner (243/2474, 9.82%). Topic modeling identified hidden topics of patient secure messages related to fatigue, prednisone, and patient visit. A total of 89.2% (107/120) of the top-ranked topic keywords are actually the health concepts of the data model. CONCLUSIONS: Our data model and annotated corpus enable us to identify and understand important medical concepts in patient secure messages and prepare us for further natural language processing analysis of such free texts. The data model could be potentially used to automatically identify other types of patient narratives, such as those in various social media and patient forums. In the future, we plan to develop a machine learning and natural language processing solution to enable automatic triaging solutions to reduce the workload of clinicians and perform more granular content analysis to understand patients' needs and improve patient-centered care.


Assuntos
Registros Eletrônicos de Saúde , Medicare , Idoso , Criança , Humanos , Uso Significativo , Processamento de Linguagem Natural , Participação do Paciente , Estados Unidos
5.
J Biomed Inform ; 90: 103091, 2019 02.
Artigo em Inglês | MEDLINE | ID: mdl-30611893

RESUMO

"Psychiatric Treatment Adverse Reactions" (PsyTAR) corpus is an annotated corpus that has been developed using patients narrative data for psychiatric medications, particularly SSRIs (Selective Serotonin Reuptake Inhibitor) and SNRIs (Serotonin Norepinephrine Reuptake Inhibitor) medications. This corpus consists of three main components: sentence classification, entity identification, and entity normalization. We split the review posts into sentences and labeled them for presence of adverse drug reactions (ADRs) (2168 sentences), withdrawal symptoms (WDs) (438 sentences), sign/symptoms/illness (SSIs) (789 sentences), drug indications (517), drug effectiveness (EF) (1087 sentences), and drug infectiveness (INF) (337 sentences). In the entity identification phase, we identified and extracted ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792). In the entity normalization phase, we mapped the identified entities to the corresponding concepts in both UMLS (918 unique concepts) and SNOMED CT (755 unique concepts). Four annotators double coded the sentences and the span of identified entities by strictly following guidelines rules developed for this study. We used the PsyTAR sentence classification component to automatically train a range of supervised machine learning classifiers to identifying text segments with the mentions of ADRs, WDs, DIs, SSIs, EF, and INF. SVMs classifiers had the highest performance with F-Score 0.90. We also measured performance of the cTAKES (clinical Text Analysis and Knowledge Extraction System) in identifying patients' expressions of ADRs and WDs with and without adding PsyTAR dictionary to the core dictionary of cTAKES. Augmenting cTAKES dictionary with PsyTAR improved the F-score cTAKES by 25%. The findings imply that PsyTAR has significant implications for text mining algorithms aimed to identify information about adverse drug events and drug effectiveness from patients' narratives data, by linking the patients' expressions of adverse drug events to medical standard vocabularies. The corpus is publicly available at Zolnoori et al. [30].


Assuntos
Sistemas de Notificação de Reações Adversas a Medicamentos , Inibidores Seletivos de Recaptação de Serotonina/efeitos adversos , Inibidores da Recaptação de Serotonina e Norepinefrina/efeitos adversos , Algoritmos , Coleta de Dados , Mineração de Dados , Humanos , Farmacovigilância , Systematized Nomenclature of Medicine , Unified Medical Language System
6.
Stud Health Technol Inform ; 235: 196-200, 2017.
Artigo em Inglês | MEDLINE | ID: mdl-28423782

RESUMO

This paper introduces the annotation schema and annotation process for a corpus of clinical letters describing the disease course and treatment of oestrogen receptor positive breast cancer patients, after completion of primary surgery and radiotherapy treatment. Concepts related to therapy, clinical signs, and recurrence, as well as relationships linking these, are identified and annotated in 200 letters. This corpus will provide the basis for development of natural language processing tools for automatic extraction of key clinical factors from such letters.


Assuntos
Neoplasias da Mama/radioterapia , Neoplasias da Mama/cirurgia , Processamento de Linguagem Natural , Neoplasias da Mama/patologia , Feminino , Seguimentos , Humanos , Receptores de Estrogênio
7.
Beilstein J Nanotechnol ; 6: 1872-82, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26665057

RESUMO

To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called " NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.

8.
J Biomed Inform ; 55: 73-81, 2015 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-25817970

RESUMO

CSIRO Adverse Drug Event Corpus (Cadec) is a new rich annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules. Annotations contain mentions of concepts such as drugs, adverse effects, symptoms, and diseases linked to their corresponding concepts in controlled vocabularies, i.e., SNOMED Clinical Terms and MedDRA. The quality of the annotations is ensured by annotation guidelines, multi-stage annotations, measuring inter-annotator agreement, and final review of the annotations by a clinical terminologist. This corpus is useful for studies in the area of information extraction, or more generally text mining, from social media to detect possible adverse drug reactions from direct patient reports. The corpus is publicly available at https://data.csiro.au.(1).


Assuntos
Sistemas de Notificação de Reações Adversas a Medicamentos/organização & administração , Informação de Saúde ao Consumidor/organização & administração , Mineração de Dados/métodos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos/classificação , Mídias Sociais/organização & administração , Vocabulário Controlado , Conjuntos de Dados como Assunto/estatística & dados numéricos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos/epidemiologia , Guias como Assunto , Humanos , Aprendizado de Máquina , Processamento de Linguagem Natural , Mídias Sociais/classificação , Terminologia como Assunto
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA