Results 1 - 20 of 771
1.
Syst Rev ; 11(1): 229, 2022 10 25.
Article in English | MEDLINE | ID: mdl-36284336

ABSTRACT

BACKGROUND: Cluster randomized trials (CRTs) are becoming an increasingly important design. However, authors of CRTs do not always adhere to requirements to explicitly identify the design as cluster randomized in titles and abstracts, making retrieval from bibliographic databases difficult. Machine learning algorithms may improve their identification and retrieval. Therefore, we aimed to develop machine learning algorithms that accurately determine whether a bibliographic citation is a CRT report. METHODS: We trained, internally validated, and externally validated two convolutional neural networks and one support vector machine (SVM) algorithm to predict whether a citation is a CRT report or not. We exclusively used the information in an article citation, including the title, abstract, keywords, and subject headings. The algorithms' output was a probability from 0 to 1. We assessed algorithm performance using the area under the receiver operating characteristic curve (AUC). Each algorithm's performance was evaluated individually and together as an ensemble. We randomly selected 5000 from 87,633 citations to train and internally validate our algorithms. Of the 5000 selected citations, 589 (12%) were confirmed CRT reports. We then externally validated our algorithms on an independent set of 1916 randomized trial citations, with 665 (35%) confirmed CRT reports. RESULTS: In internal validation, the ensemble algorithm discriminated best for identifying CRT reports with an AUC of 98.6% (95% confidence interval: 97.8%, 99.4%), sensitivity of 97.7% (94.3%, 100%), and specificity of 85.0% (81.8%, 88.1%). In external validation, the ensemble algorithm had an AUC of 97.8% (97.0%, 98.5%), sensitivity of 97.6% (96.4%, 98.6%), and specificity of 78.2% (75.9%, 80.4%). All three individual algorithms performed well, but less so than the ensemble.
CONCLUSIONS: We successfully developed high-performance algorithms that identified whether a citation was a CRT report with high sensitivity and moderately high specificity. We provide open-source software to facilitate the use of our algorithms in practice.


Subjects
Algorithms , Machine Learning , Humans , MEDLINE , Randomized Controlled Trials as Topic , Subject Headings , Support Vector Machine
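
The ensemble evaluation described in the abstract above can be illustrated with a minimal Python sketch. It assumes the ensemble simply averages the probabilities of the three models, which the abstract does not state; the toy probabilities, the labels, the 0.5 cutoff, and the function names are all hypothetical.

```python
from statistics import mean


def ensemble_probability(model_probs):
    """Combine per-model probabilities for one citation by simple averaging (an assumption)."""
    return mean(model_probs)


def sensitivity_specificity(probs, labels, cutoff=0.5):
    """Classify a citation as a CRT report when prob >= cutoff, then score against labels."""
    tp = fn = tn = fp = 0
    for prob, is_crt in zip(probs, labels):
        predicted_crt = prob >= cutoff
        if is_crt and predicted_crt:
            tp += 1
        elif is_crt:
            fn += 1
        elif predicted_crt:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outputs of three models for five citations (1 = confirmed CRT report).
toy_model_probs = [
    [0.9, 0.8, 0.7],
    [0.6, 0.4, 0.8],
    [0.2, 0.1, 0.3],
    [0.7, 0.6, 0.8],
    [0.1, 0.3, 0.2],
]
toy_labels = [1, 1, 0, 0, 1]

ensemble_probs = [ensemble_probability(p) for p in toy_model_probs]
sensitivity, specificity = sensitivity_specificity(ensemble_probs, toy_labels)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Lowering the cutoff trades specificity for sensitivity, which is the pattern the reported results show (high sensitivity, moderately high specificity).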
2.
Health Info Libr J ; 39(3): 225-243, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34409740

ABSTRACT

BACKGROUND: Small databases, such as Health Management Information Consortium (HMIC) and Social Policy and Practice (SPP), can add value to systematic searches. Search strategies designed for large databases may not be appropriate in small sources. A different approach to translating strategies could ensure that small databases are searched efficiently. OBJECTIVES: To establish the contribution HMIC and SPP made to public health guidelines (PHGs); and to recommend an efficient method of translating search strategies. METHODS: Eight PHGs were analysed to establish how many included publications were retrieved from HMIC and SPP. Six options for translating strategies from MEDLINE, using variations of free text and subject terms, were compared. RESULTS: Health Management Information Consortium contributed 15 and SPP eight of the 483 publications cited in the PHGs. The free-text only search was the one option to miss an included publication. The heading word (with truncation) option was more precise than applying subject headings. DISCUSSION: There is a risk of missing relevant publications in free-text only searches and it is preferable to include subject terms efficiently. CONCLUSION: The heading word (with truncation) option did not miss the evidence included in the PHGs and was the most efficient method for translating MEDLINE to HMIC and SPP.


Subjects
Information Storage and Retrieval , Subject Headings , Dacarbazine/analogs & derivatives , Databases, Bibliographic , Humans , MEDLINE , Public Policy
3.
J Biomed Semantics ; 11(1): 10, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32873340

ABSTRACT

BACKGROUND: Up to 35% of nurses' working time is spent on care documentation. We describe the evaluation of a system aimed at assisting nurses in documenting patient care and potentially reducing the documentation workload. Our goal is to enable nurses to write or dictate nursing notes in a narrative manner without having to manually structure their text under subject headings. In the current care classification standard used in the targeted hospital, there are more than 500 subject headings to choose from, making it challenging and time consuming for nurses to use. METHODS: The task of the presented system is to automatically group sentences into paragraphs and assign subject headings. For classification the system relies on a neural network-based text classification model. The nursing notes are initially classified on sentence level. Subsequently coherent paragraphs are constructed from related sentences. RESULTS: Based on a manual evaluation conducted by a group of three domain experts, we find that in about 69% of the paragraphs formed by the system the topics of the sentences are coherent and the assigned paragraph headings correctly describe the topics. We also show that the use of a paragraph merging step reduces the number of paragraphs produced by 23% without affecting the performance of the system. CONCLUSIONS: The study shows that the presented system produces a coherent and logical structure for freely written nursing narratives and has the potential to reduce the time and effort nurses are currently spending on documenting care in hospitals.


Subjects
Documentation , Nurses , Automation , Hospitals , Language , Subject Headings
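
The two-stage idea in the abstract above (classify each sentence, then build paragraphs from related sentences) can be sketched in Python as follows. This is only an illustration under stated assumptions: the headings and sentences are invented, the sentence-level labels are taken as given rather than predicted by a neural model, and `merge_same_heading` is a hypothetical stand-in for the paragraph merging step, whose actual rule the abstract does not describe.

```python
from itertools import groupby


def build_paragraphs(labeled_sentences):
    """Group consecutive sentences that share the same predicted heading."""
    return [
        (heading, [sentence for _, sentence in group])
        for heading, group in groupby(labeled_sentences, key=lambda pair: pair[0])
    ]


def merge_same_heading(paragraphs):
    """Hypothetical merging step: combine all paragraphs that carry the same heading."""
    merged = {}
    for heading, sentences in paragraphs:
        merged.setdefault(heading, []).extend(sentences)
    return list(merged.items())


# Invented (heading, sentence) pairs standing in for sentence-level classifier output.
labeled = [
    ("Pain", "Reports mild headache."),
    ("Pain", "Given paracetamol."),
    ("Nutrition", "Ate well at lunch."),
    ("Pain", "Headache eased by evening."),
]

paragraphs = build_paragraphs(labeled)
merged = merge_same_heading(paragraphs)
print(len(paragraphs), "paragraphs before merging,", len(merged), "after")
```

In this toy note the merge reduces three paragraphs to two, which mirrors the kind of reduction in paragraph count the abstract reports.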
4.
J Clin Epidemiol ; 126: 116-121, 2020 10.
Article in English | MEDLINE | ID: mdl-32615208

ABSTRACT

OBJECTIVES: Embase is a biomedical and pharmacological bibliographic database of published literature, produced by Elsevier. In 2011, Embase introduced the Emtree term "diagnostic test accuracy study," after discussion with the diagnostic test accuracy (DTA) community of Cochrane. The aim of this study is to investigate the performance of this Emtree term when used to retrieve diagnostic accuracy studies. STUDY DESIGN AND SETTING: We first piloted a random selection of 1,000 titles from Embase and then repeated the process with 1,223 studies specifically limited to humans. Two researchers independently screened those for eligibility. From titles that were indicated as being relevant or potentially relevant by at least one assessor, the full texts were retrieved and screened. A third researcher retrieved the Emtree terms for each title and checked whether "diagnostic test accuracy study" was one of the attached Emtree terms. The results of both exercises were then cross-classified, and sensitivity and specificity of the Emtree term were estimated. RESULTS: Our pilot set consisted of 1,000 studies, of which 20 (2.0%) were studies from which DTA data could be extracted. Thirteen studies had the label DTA study, of which five were indeed DTA studies. The final set consisted of 1,223 studies, of which 33 (2.7%) were DTA studies. Twenty studies were labeled as DTA study, of which fourteen indeed were DTA studies. This resulted in a sensitivity of 42.4% (95% CI: 25.5% to 60.8%) and a specificity of 99.5% (95% CI: 98.9% to 99.8%). CONCLUSION: Although we planned to include a more focused set of studies in our second attempt, the percentage of DTA studies was similar in both attempts. The DTA label failed to retrieve most of the DTA studies and 30% of the studies labeled as being DTA study were in fact not DTA studies. The Emtree term DTA study does not meet the requirements to be useful for retrieving DTA studies accurately.


Subjects
Diagnostic Tests, Routine/standards , Information Storage and Retrieval/methods , Search Engine/standards , Data Accuracy , Databases, Bibliographic , Evidence-Based Medicine , Humans , Information Storage and Retrieval/standards , Predictive Value of Tests , Sensitivity and Specificity , Subject Headings
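
The sensitivity and specificity reported in the abstract above follow directly from the counts it gives for the final set (1,223 studies, 33 DTA studies, 20 labeled, 14 labeled and truly DTA). The Python sketch below reconstructs the 2x2 table from those counts; the function and parameter names are hypothetical.

```python
def two_by_two(total, condition_positive, test_positive, true_positive):
    """Derive 2x2 cell counts and accuracy measures from marginal totals."""
    tp = true_positive
    fp = test_positive - true_positive        # labeled DTA but not a DTA study
    fn = condition_positive - true_positive   # DTA study without the label
    tn = total - tp - fp - fn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Counts taken from the final set described in the abstract.
final_set = two_by_two(total=1223, condition_positive=33, test_positive=20, true_positive=14)
for name in ("sensitivity", "specificity", "ppv"):
    print(f"{name}: {100 * final_set[name]:.1f}%")
```

The computed sensitivity (14/33) and specificity (1184/1190) round to the reported 42.4% and 99.5%, and the positive predictive value of 70% corresponds to the statement that 30% of labeled studies were not DTA studies.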
5.
J Am Med Inform Assoc ; 27(1): 81-88, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31605490

ABSTRACT

OBJECTIVE: This study focuses on the task of automatically assigning standardized (topical) subject headings to free-text sentences in clinical nursing notes. The underlying motivation is to support nurses when they document patient care by developing a computer system that can assist in incorporating suitable subject headings that reflect the documented topics. Central in this study is performance evaluation of several text classification methods to assess the feasibility of developing such a system. MATERIALS AND METHODS: Seven text classification methods are evaluated using a corpus of approximately 0.5 million nursing notes (5.5 million sentences) with 676 unique headings extracted from a Finnish university hospital. Several of these methods are based on artificial neural networks. Evaluation is first done in an automatic manner for all methods, then a manual error analysis is done on a sample. RESULTS: We find that a method based on a bidirectional long short-term memory network performs best with an average recall of 0.5435 when allowed to suggest 1 subject heading per sentence and 0.8954 when allowed to suggest 10 subject headings per sentence. However, other methods achieve comparable results. The manual analysis indicates that the predictions are better than what the automatic evaluation suggests. CONCLUSIONS: The results indicate that several of the tested methods perform well in suggesting the most appropriate subject headings on sentence level. Thus, we find it feasible to develop a text classification system that can support the use of standardized terminologies and save nurses time and effort on care documentation.


Subjects
Abstracting and Indexing/methods , Natural Language Processing , Nursing Records , Standardized Nursing Terminology , Subject Headings , Electronic Health Records , Finland
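
The recall figures in the abstract above depend on how many headings the system may suggest per sentence. A minimal Python sketch of that "recall at k" measure follows; it assumes each sentence has exactly one correct heading, which the abstract does not state explicitly, and the headings and rankings are invented.

```python
def recall_at_k(ranked_suggestions, gold_heading, k):
    """Return 1.0 if the correct heading appears among the top-k suggestions, else 0.0."""
    return 1.0 if gold_heading in ranked_suggestions[:k] else 0.0


def average_recall_at_k(examples, k):
    """Average recall at k over (ranked_suggestions, gold_heading) pairs."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in examples) / len(examples)


# Invented ranked suggestions for four sentences, with one gold heading each.
examples = [
    (["Pain", "Medication", "Nutrition"], "Pain"),
    (["Nutrition", "Pain", "Breathing"], "Pain"),
    (["Breathing", "Medication", "Pain"], "Nutrition"),
    (["Medication", "Nutrition", "Pain"], "Medication"),
]

for k in (1, 3):
    print(f"average recall at {k}: {average_recall_at_k(examples, k):.2f}")
```

Allowing more suggestions can only keep recall the same or raise it, which is why the reported value at 10 suggestions is higher than at 1.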
6.
Int J Med Inform ; 129: 100-106, 2019 09.
Article in English | MEDLINE | ID: mdl-31445243

ABSTRACT

BACKGROUND: This work deals with Natural Language Processing applied to the clinical domain. Specifically, the work deals with Medical Entity Recognition (MER) on Electronic Health Records (EHRs). Developing a MER system entailed heavy data preprocessing and feature engineering until Deep Neural Networks (DNNs) emerged. However, the quality of the word representations in terms of embedded layers is still an important issue for the inference of the DNNs. GOAL: The main goal of this work is to develop a robust MER system adapting general-purpose DNNs to cope with the high lexical variability shown in EHRs. In addition, given that EHRs tend to be scarce while out-of-domain corpora are available, the aim is to assess the impact of the word representations on the performance of the MER as we move to other domains. To this end, exhaustive experimentation varying information generation methods and network parameters is crucial. METHODS: We adapted a general-purpose sequential tagger based on Bidirectional Long Short-Term Memory cells and Conditional Random Fields (CRFs) in order to make it tolerant to high lexical variability and a limited amount of corpora. To this end, we incorporated part-of-speech (POS) and semantic-tag embedding layers into the word representations. RESULTS: One of the strengths of this work is the exhaustive evaluation of dense word representations obtained by varying not only the domain and genre but also the learning algorithms and their parameter settings. With the proposed method, we attained an error reduction of 1.71 (5.7%) compared to the state of the art even though no preprocessing or feature engineering was used. CONCLUSIONS: Our results indicate that dense representations built taking word order into account leverage the entity extraction system. Besides, we found that using a medical corpus (not necessarily EHRs) to infer the representations improves the performance, even if it does not correspond to the same genre.


Subjects
Natural Language Processing , Algorithms , Electronic Health Records , Neural Networks, Computer , Semantics , Subject Headings
7.
AMIA Annu Symp Proc ; 2019: 1129-1138, 2019.
Article in English | MEDLINE | ID: mdl-32308910

ABSTRACT

With advances in Machine Learning (ML), neural network-based methods, such as Convolutional/Recurrent Neural Networks, have been proposed to assist terminology curators in the development and maintenance of terminologies. Bidirectional Encoder Representations from Transformers (BERT), a new language representation model, obtains state-of-the-art results on a wide array of general English NLP tasks. We explore BERT's applicability to medical terminology-related tasks. Utilizing the "next sentence prediction" capability of BERT, we show that the fine-tuning strategy of Transfer Learning (TL) from the BERT-Base model can address a challenging problem in automatic terminology enrichment: insertion of new concepts. Adding a pre-training strategy enhances the results. We apply our strategies to the two largest hierarchies of SNOMED CT, with one release as training data and the following release as test data. The combination of the two proposed TL models achieves an average F1 score of 0.85 and 0.86 for the two hierarchies, respectively.


Subjects
Machine Learning , Natural Language Processing , Subject Headings , Systematized Nomenclature of Medicine , Neural Networks, Computer
8.
Syst Rev ; 7(1): 200, 2018 11 20.
Article in English | MEDLINE | ID: mdl-30458825

ABSTRACT

BACKGROUND: Researchers performing systematic reviews (SRs) must carefully consider the relevance of thousands of citations retrieved from bibliographic database searches, the majority of which will be excluded later on close inspection. Well-developed bibliographic searches are generally created with thesaurus or index terms in combination with keywords found in the title and/or abstract fields of citation records. Records in the bibliographic database Embase contain many more thesaurus terms than MEDLINE. Here, we aim to examine how limiting searches to major thesaurus terms (in MEDLINE called focus terms) in Embase and MEDLINE as well as limiting to words in the title and abstract fields of those databases affects the overall recall of SR searches. METHODS: To examine the impact of using search techniques aimed at higher precision, we analyzed previously completed SRs and focused our original searches to major thesaurus terms or terms in title and/or abstract only in Embase.com or in Embase.com and MEDLINE (Ovid) combined. We examined the total number of search results in both Embase and MEDLINE and checked whether included references were retrieved by these more focused approaches. RESULTS: For 73 SRs, we limited Embase searches to major terms only while keeping the search in MEDLINE and other databases such as Web of Science as they were. The overall search yield (or total number of search results) was reduced by 8%. Six reviews (9%) lost more than 5% of the relevant references. Limiting Embase and MEDLINE to major thesaurus terms, the number of references was 13% lower. For 15% of the reviews, the loss of relevant references was more than 5%. Searching Embase for title and abstract caused a loss of more than 5% in 16 reviews (22%), while limiting Embase and MEDLINE that way this happened in 24 reviews (33%). CONCLUSIONS: Of the four search options, two options substantially reduced the overall search yield. However, this also resulted in a greater chance of losing relevant references, even though many references were still found in other databases such as Web of Science.


Subjects
Databases, Bibliographic , Information Storage and Retrieval/methods , MEDLINE , Subject Headings , Humans , Prospective Studies , Search Engine , Systematic Reviews as Topic
9.
Psychiatr Danub ; 30(3): 317-322, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30267524

ABSTRACT

BACKGROUND: Suicide is a complex action of suicidal methods and peripheral factors with seemingly threatening components representing the actual cause of the suicidal actions. It is especially those apparently unimportant factors that represent a crucial milestone in the network of all the other personal, cultural, genetic and biochemical factors, shaping the method of action and consequently deciding between life and death. SUBJECTS AND METHODS: Based on the Register of Suicides in the Republic of Slovenia kept by the University Psychiatric Clinic Ljubljana, we used a combination of attributes varying within a variable and between variables. Due to the limited applicability of standard statistical methods and analyses in such cases, we used a machine learning method, the Multimethod hybrid approach, which allows different approaches to machine learning (decision trees, genetic algorithms and supplementary vectors) to be combined. The research included 56712 persons attempting suicide and 21913 persons committing suicide. We chose a form of suicide action with both possible results: attempted suicide and suicide. RESULTS: Based on the machine learning analysis, we defined attributes of the action regarding their lethal effect: attempted suicide and suicide. The suicide register kept for the last 40 years shows hanging as the most commonly used suicidal method, used by men with the purpose of causing suicidal death rather than a suicidal attempt. On the other hand, use of medicaments is linked to suicidal attempts and is mostly used by females. CONCLUSIONS: Not all methods of suicidal action predict suicidal death; we therefore examined different methods of suicide to predict most accurately the link between the method and its effect in terms of suicide attempt or suicide. The machine learning method confirmed the attributes of suicide methods in connection with their different outcomes. This analytical method is useful in processing large databases since it enables one variable's intensity to affect other variables in terms of result and meaning. The identification of the most decisive risk factors for suicidal behaviour can serve as a basis for planning effective prevention strategies, timely identification, and adequate professional help for high-risk persons.


Subjects
Subject Headings , Suicide, Attempted/psychology , Suicide, Attempted/statistics & numerical data , Suicide/psychology , Suicide/statistics & numerical data , Adolescent , Adult , Cause of Death , Female , Humans , Machine Learning , Male , Middle Aged , Mortality , Multivariate Analysis , Registries/statistics & numerical data , Risk Factors , Sex Factors , Slovenia , Suicidal Ideation , Suicide, Attempted/prevention & control , Young Adult , Suicide Prevention
10.
Res Synth Methods ; 9(4): 587-601, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30103261

ABSTRACT

OBJECTIVE: Identify the most performant automated text classification method (eg, algorithm) for differentiating empirical studies from nonempirical works in order to facilitate systematic mixed studies reviews. METHODS: The algorithms were trained and validated with 8050 database records, which had previously been manually categorized as empirical or nonempirical. A Boolean mixed filter developed for filtering MEDLINE records (title, abstract, keywords, and full texts) was used as a baseline. The set of features (eg, characteristics from the data) included observable terms and concepts extracted from a metathesaurus. The efficiency of the approaches was measured using sensitivity, precision, specificity, and accuracy. RESULTS: The decision trees algorithm demonstrated the highest performance, surpassing the accuracy of the Boolean mixed filter by 30%. The use of full texts did not result in significant gains compared with title, abstract, keywords, and records. Results also showed that mixing concepts with observable terms can improve the classification. SIGNIFICANCE: Screening of records, identified in bibliographic databases, for relevant studies to include in systematic reviews can be accelerated with automated text classification.


Subjects
Databases, Bibliographic , Information Storage and Retrieval/methods , Research Design , Algorithms , Bayes Theorem , Data Mining/methods , Humans , Information Storage and Retrieval/standards , Models, Statistical , Pattern Recognition, Automated , Reference Standards , Search Engine , Sensitivity and Specificity , Subject Headings , Support Vector Machine , Systematic Reviews as Topic
11.
Sao Paulo Med J ; 136(2): 103-108, 2018.
Article in English | MEDLINE | ID: mdl-29340504

ABSTRACT

BACKGROUND: A high-quality electronic search is essential for ensuring accuracy and comprehensiveness among the records retrieved when conducting systematic reviews. Therefore, we aimed to identify the most efficient method for searching in both MEDLINE (through PubMed) and EMBASE, covering search terms with variant spellings, direct and indirect orders, and associations with MeSH and EMTREE terms (or lack thereof). DESIGN AND SETTING: Experimental study. UNESP, Brazil. METHODS: We selected and analyzed 37 search strategies that had specifically been developed for the field of anesthesiology. These search strategies were adapted in order to cover all potentially relevant search terms, with regard to variant spellings and direct and indirect orders, in the most efficient manner. RESULTS: When the strategies included variant spellings and direct and indirect orders, these adapted versions of the search strategies selected retrieved the same number of search results in MEDLINE (mean of 61.3%) and a higher number in EMBASE (mean of 63.9%) in the sample analyzed. The numbers of results retrieved through the searches analyzed here were not identical with and without associated use of MeSH and EMTREE terms. However, association of these terms from both controlled vocabularies retrieved a larger number of records than did the use of either one of them. CONCLUSIONS: In view of these results, we recommend that the search terms used should include both preferred and non-preferred terms (i.e. variant spellings and direct/indirect order of the same term) and associated MeSH and EMTREE terms, in order to develop highly-sensitive search strategies for systematic reviews.


Subjects
Anesthesiology , Information Storage and Retrieval/methods , Review Literature as Topic , Search Engine/methods , Subject Headings , Humans , MEDLINE
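
The recommendation in the abstract above (combine variant spellings and direct/indirect term orders with controlled-vocabulary terms) can be illustrated with a small Python sketch that assembles a PubMed-style query string. The helper name and the example terms are hypothetical and are not taken from the 37 strategies studied; `[tiab]` and `[MeSH Terms]` are PubMed field tags, and EMBASE would need its own syntax.

```python
def build_pubmed_query(free_text_terms, mesh_terms):
    """OR together free-text variants (title/abstract) and MeSH headings into one block."""
    parts = [f'"{term}"[tiab]' for term in free_text_terms]
    parts += [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    return "(" + " OR ".join(parts) + ")"


# Hypothetical concept block: two spellings, direct and indirect order, plus a MeSH heading.
query = build_pubmed_query(
    free_text_terms=[
        "general anesthesia",
        "general anaesthesia",
        "anesthesia, general",
        "anaesthesia, general",
    ],
    mesh_terms=["Anesthesia, General"],
)
print(query)
```

Because every variant is joined with OR, adding a term can only keep the result set the same or enlarge it, which is the behavior a highly sensitive strategy aims for.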
12.
Res Synth Methods ; 9(4): 602-614, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29314757

ABSTRACT

Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are not used much in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters. We estimated area under receiver operating characteristics (AUROC) using the Clinical Hedges dataset. We demonstrate that ML approaches better discriminate between RCTs and non-RCTs than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies) together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.


Subjects
Databases, Bibliographic , Information Storage and Retrieval/methods , Machine Learning , Randomized Controlled Trials as Topic , Review Literature as Topic , Algorithms , Evidence-Based Medicine , Humans , Information Storage and Retrieval/standards , ROC Curve , Registries , Reproducibility of Results , Search Engine , Sensitivity and Specificity , Subject Headings , Support Vector Machine
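
The abstract above distinguishes high-sensitivity use (systematic reviews) from high-precision use (rapid reviews) and mentions recommended probability cutoffs. The Python sketch below shows one way a cutoff could be chosen from validation data to reach a target sensitivity; it is not the authors' procedure, and the probabilities, labels, and function names are hypothetical.

```python
def cutoff_for_sensitivity(probs, labels, target):
    """Return the highest cutoff at which sensitivity (prob >= cutoff) reaches the target."""
    positives = sorted((p for p, y in zip(probs, labels) if y), reverse=True)
    if not positives:
        raise ValueError("at least one positive example is required")
    for rank, cutoff in enumerate(positives, start=1):
        if rank / len(positives) >= target:
            return cutoff
    return positives[-1]


def precision_at(probs, labels, cutoff):
    """Share of records predicted positive (prob >= cutoff) that are truly positive."""
    predicted = [y for p, y in zip(probs, labels) if p >= cutoff]
    return sum(predicted) / len(predicted)


# Invented validation scores: the first four records are RCTs, the last four are not.
probs = [0.95, 0.80, 0.60, 0.30, 0.70, 0.20, 0.10, 0.40]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for target in (1.0, 0.5):
    cutoff = cutoff_for_sensitivity(probs, labels, target)
    print(f"target sensitivity {target}: cutoff {cutoff}, "
          f"precision {precision_at(probs, labels, cutoff):.2f}")
```

In this toy data, demanding full sensitivity forces a low cutoff and admits more false positives, while relaxing the target allows a higher cutoff with better precision.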
13.
AMIA Annu Symp Proc ; 2018: 1157-1166, 2018.
Article in English | MEDLINE | ID: mdl-30815158

ABSTRACT

SNOMED CT is a large, complex and widely-used terminology. Auditing is part of the life cycle of terminologies. A review of terminologies' content can identify two error categories: commission errors, such as an incorrect parent or attribute relationship, indicating errors in a concept's modeling, and omission errors, such as missing a parent or attribute relationship, representing incomplete modeling of a concept. According to our experience, terminology curators are mostly interested in commission errors. In recent years, a long-term remodeling project has addressed modeling issues in SNOMED CT's Infectious disease and Congenital disease subhierarchies. In this longitudinal study, we investigated a posteriori the efficacy of complex concepts, called overlapping concepts, to identify commission errors during intensive auditing periods and during maintenance periods over several releases. The algorithmic implication is that when auditing resources are scarce, a methodology of auditing first, or only, the overlapping concepts will obtain a higher auditing yield.


Subjects
Subject Headings , Systematized Nomenclature of Medicine , Classification , Longitudinal Studies , Medical Records , Software
14.
J Wound Ostomy Continence Nurs ; 44(3): 277-282, 2017.
Article in English | MEDLINE | ID: mdl-28328646

ABSTRACT

PURPOSE: The objectives of this study were to characterize the odors of used incontinence products by descriptive analysis and to define attributes to be used in the analysis. A further objective was to investigate to what extent the odor profiles of used incontinence products differed from each other and, if possible, to group these profiles into classes. SUBJECTS AND SETTING: Used incontinence products were collected from 14 residents with urinary incontinence living in geriatric nursing homes in the Gothenburg area, Sweden. METHODS: Pieces were cut from the wet area of used incontinence products. They were placed in glass bottles and kept frozen until odor analysis was completed. A trained panel consisting of 8 judges experienced in this area of investigation defined terminology for odor attributes. The intensities of these attributes in the used products were determined by descriptive odor analysis. Data were analyzed both by analysis of variance (ANOVA) followed by the Tukey post hoc test and by principal component analysis and cluster analysis. RESULTS: An odor wheel, with 10 descriptive attributes, was developed. The total odor intensity, and the intensities of the attributes, varied considerably between different, used incontinence products. The typical odors varied from "sweetish" to "urinal," "ammonia," and "smoked." Cluster analysis showed that the used products, based on the quantitative odor data, could be divided into 5 odor classes with different profiles. CONCLUSIONS: The used products varied considerably in odor character and intensity. Findings suggest that odors in used absorptive products are caused by different types of compounds that may vary in concentration.


Subjects
Incontinence Pads , Odorants/analysis , Perception , Subject Headings , Aged , Aged, 80 and over , Cluster Analysis , Female , Humans , Male , Middle Aged , Nursing Homes/organization & administration , Sweden , Urinary Incontinence/nursing
15.
J Am Med Inform Assoc ; 24(4): 788-798, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28339775

ABSTRACT

OBJECTIVE: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. MATERIAL AND METHODS: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. RESULTS: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. CONCLUSIONS: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.


Subjects
Data Mining/methods , Subject Headings , Systematized Nomenclature of Medicine , Quality Assurance, Health Care
17.
AMIA Annu Symp Proc ; 2017: 364-373, 2017.
Article in English | MEDLINE | ID: mdl-29854100

ABSTRACT

Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a potential specific type of error and suggests a potential type of remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors and make correct suggestions (33/50 = 66%). Of the 25 evaluated subgraphs revealing multiple patterns, 22 were verified correct (22/25 = 88%). This shows the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus.


Subjects
National Cancer Institute (U.S.) , Vocabulary, Controlled , Data Mining , Quality Control , Subject Headings , United States
19.
J Med Internet Res ; 18(1): e1, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26728964

ABSTRACT

BACKGROUND: Conventional Web-based search engines may be unusable by individuals with low health literacy for finding health-related information, thus precluding their use by this population. OBJECTIVE: We describe a conversational search engine interface designed to allow individuals with low health and computer literacy to identify and learn about clinical trials on the Internet. METHODS: A randomized trial involving 89 participants compared the conversational search engine interface (n=43) to the existing conventional keyword- and facet-based search engine interface (n=46) for the National Cancer Institute Clinical Trials database. Each participant performed 2 tasks: finding a clinical trial for themselves and finding a trial that met prespecified criteria. RESULTS: Results indicated that all participants were more satisfied with the conversational interface based on 7-point self-reported satisfaction ratings (task 1: mean 4.9, SD 1.8 vs mean 3.2, SD 1.8, P<.001; task 2: mean 4.8, SD 1.9 vs mean 3.2, SD 1.7, P<.001) compared to the conventional Web form-based interface. All participants also rated the trials they found as better meeting their search criteria, based on 7-point self-reported scales (task 1: mean 3.7, SD 1.6 vs mean 2.7, SD 1.8, P=.01; task 2: mean 4.8, SD 1.7 vs mean 3.4, SD 1.9, P<.01). Participants with low health literacy failed to find any trials that satisfied the prespecified criteria for task 2 using the conventional search engine interface, whereas 36% (5/14) were successful at this task using the conversational interface (P=.05). CONCLUSIONS: Conversational agents can be used to improve accessibility to Web-based searches in general and clinical trials in particular, and can help decrease recruitment bias against disadvantaged populations.


Subjects
Clinical Trials as Topic , Databases as Topic , Health Literacy , Information Storage and Retrieval/methods , Search Engine , Subject Headings , Aged , Computer Literacy , Female , Humans , Internet , Male , Middle Aged , User-Computer Interface
20.
AMIA Annu Symp Proc ; 2016: 618-627, 2016.
Article in English | MEDLINE | ID: mdl-28269858

ABSTRACT

The National Cancer Institute Thesaurus (NCIt) is a reference terminology used to support clinical, translational and basic research as well as administrative activities. As medical knowledge evolves, concepts that might be missing from a particular needed subdomain are regularly added to the NCIt. However, terminology development is known to be labor-intensive and error-prone. Therefore, cost-effective semi-automated methods for identifying potentially missing concepts would be useful to terminology curators. Previously, we have developed a structural method leveraging the native term mappings of the Unified Medical Language System to identify potential concepts in several of its source vocabularies to enrich the SNOMED CT. In this paper, we tested an analogous method for NCIt. Concepts from eight UMLS source terminologies were identified as possibilities to enrich NCIt's conceptual content.


Subjects
National Cancer Institute (U.S.) , Subject Headings , Unified Medical Language System , Vocabulary, Controlled , Humans , Neoplasms , Systematized Nomenclature of Medicine , United States