Search | BVS IEC

Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features.

Gupta, Deepak; Loane, Russell; Gayen, Soumya; Demner-Fushman, Dina.

Knowl Based Syst ; 278, 2023 Oct 25.

Article in English | MEDLINE | ID: mdl-37780058

ABSTRACT

Nearest neighbor search (NNS) is a technique for locating the points in a high-dimensional space that are closest to a given query point. It has multiple applications in medicine, such as searching large medical imaging databases, disease classification, and diagnosis. However, when the number of points is very large, the brute-force approach to finding the nearest neighbor becomes computationally infeasible, so various approaches have been developed to make the search faster and more efficient. With a focus on medical imaging, this paper proposes DenseLinkSearch (DLS), an effective and efficient algorithm that searches and retrieves relevant images from heterogeneous sources of medical images. To this end, given a medical database, the proposed algorithm builds an index that consists of pre-computed links for each point in the database. The search algorithm uses the index to traverse the database efficiently in search of the nearest neighbor. We also explore the role of medical image feature representation in content-based medical image retrieval tasks. We propose a Transformer-based feature representation technique that outperformed existing pre-trained Transformer-based approaches on benchmark medical image retrieval datasets. We extensively tested the proposed NNS approach and compared its performance with state-of-the-art NNS approaches on benchmark datasets and on medical image datasets that we created. The proposed approach outperformed the existing approaches in both the accuracy of retrieved neighbors and retrieval speed. Compared with existing approximate NNS approaches, DLS achieved lower average time per query and ≥ 99% R@10 on 11 out of 13 benchmark datasets. We also found that the proposed medical feature representation approach represents medical images better than existing pre-trained image models. The proposed feature extraction strategy obtained improvements of 9.37%, 7.0%, and 13.33% in P@5, P@10, and P@20, respectively, over the best-performing pre-trained image model. The source code and datasets of our experiments are available at https://github.com/deepaknlp/DLS.
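The contrast between brute-force search and search over an index of pre-computed links can be illustrated with a minimal sketch. This is not the authors' DLS implementation: the abstract does not specify how links are built or traversed, so the link construction (each point linked to its `m` nearest points), the best-first traversal, and the `budget` parameter are all assumptions made for illustration. The recall function shows one common way to compute R@k against exact results.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(points, query, k):
    """Exact k nearest neighbors by scanning every point (O(n) per query)."""
    return heapq.nsmallest(k, range(len(points)), key=lambda i: dist(points[i], query))


def build_links(points, m):
    """Assumed index: link each point to its m nearest other points."""
    links = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        links.append(heapq.nsmallest(m, others, key=lambda j: dist(points[j], p)))
    return links


def link_search(points, links, query, k, start=0, budget=64):
    """Best-first traversal of the link graph; approximate, not guaranteed exact."""
    visited = {start}
    frontier = [(dist(points[start], query), start)]  # min-heap of candidates
    found = []
    while frontier and len(found) < budget:
        d, i = heapq.heappop(frontier)
        found.append((d, i))
        for j in links[i]:
            if j not in visited:
                visited.add(j)
                heapq.heappush(frontier, (dist(points[j], query), j))
    return [i for _, i in heapq.nsmallest(k, found)]


def recall_at_k(approx, exact):
    """Fraction of the exact top-k that the approximate result recovered."""
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    random.seed(0)
    points = [[random.random() for _ in range(8)] for _ in range(300)]
    links = build_links(points, m=8)
    query = [random.random() for _ in range(8)]
    exact = brute_force_knn(points, query, k=10)
    approx = link_search(points, links, query, k=10)
    print("R@10:", recall_at_k(approx, exact))
```

The traversal evaluates at most `budget` points per query instead of all of them, which is where the speed-up over brute force would come from. Recall depends on how well the links connect the region around the query.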

Effects of Porting Essie Tokenization and Normalization to Solr.

Gayen, Soumya; Gupta, Deepak; Loane, Russell F; Ide, Nicholas C; Demner-Fushman, Dina.

AMIA Annu Symp Proc ; 2023: 369-378, 2023.

Article in English | MEDLINE | ID: mdl-38222430

ABSTRACT

Searching for information is now an integral part of healthcare. Searches are enabled by search engines whose objective is to efficiently retrieve the information relevant to the user query. For retrieving biomedical text and literature, the Essie search engine developed at the National Library of Medicine (NLM) performs exceptionally well. However, Essie is a software system developed for NLM whose development and support have ceased. Solr, on the other hand, is a popular open-source enterprise search engine used by many of the world's largest internet sites, offering continuous development and improvement along with state-of-the-art features. In this paper, we present our approach to porting the key features of Essie and developing custom components for use in Solr. We demonstrate the effectiveness of the added components on three benchmark biomedical datasets. The custom components may aid the community in improving search methods for biomedical text retrieval.

Subjects

Information Storage and Retrieval , Software , United States , Humans , Search Engine , National Library of Medicine (U.S.) , Benchmarking , Internet

Essie: a concept-based search engine for structured biomedical text.

Ide, Nicholas C; Loane, Russell F; Demner-Fushman, Dina.

J Am Med Inform Assoc ; 14(3): 253-63, 2007.

Article in English | MEDLINE | ID: mdl-17329729

ABSTRACT

This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by the observation that query terms are often conceptually related to terms in a document without actually occurring in the document text. Essie's performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept-based query expansion is a useful approach for information retrieval in the biomedical domain.
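The idea that a query can match a document through a shared concept, even when the query's words do not occur in the text, can be shown with a toy sketch. It is not Essie's algorithm: the `SYNONYM_SETS` table is a hypothetical stand-in for a real terminology resource, the matching is simple whole-phrase containment, and no probabilistic ranking is implemented.

```python
import re

# Hypothetical concept table: each set holds phrases assumed to name one concept.
SYNONYM_SETS = [
    {"heart attack", "myocardial infarction"},
    {"cancer", "neoplasm", "tumor"},
]


def normalize(text):
    """Lowercase and reduce punctuation and whitespace to single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(query):
    """Return the normalized query phrase plus every phrase sharing a concept with it."""
    phrase = normalize(query)
    expanded = {phrase}
    for synonyms in SYNONYM_SETS:
        if phrase in synonyms:
            expanded |= synonyms
    return expanded


def matches(document, query):
    """True if any expanded phrase occurs as a whole-word phrase in the document."""
    padded = f" {normalize(document)} "
    return any(f" {p} " in padded for p in expand_query(query))


if __name__ == "__main__":
    doc = "Patients with acute myocardial infarction were enrolled."
    print(matches(doc, "heart attack"))  # the phrase "heart attack" never occurs in doc
```

Matching on whole phrases rather than single words keeps "heart attack" from matching documents that merely contain "heart" or "attack" separately.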

Subjects

Abstracting and Indexing , Algorithms , Information Storage and Retrieval/methods , User-Computer Interface , Genomics , National Library of Medicine (U.S.) , Software , Unified Medical Language System , United States

Previously unidentified duplicate registrations of clinical trials: an exploratory analysis of registry data worldwide.

van Valkenhoef, Gert; Loane, Russell F; Zarin, Deborah A.

Syst Rev ; 5(1): 116, 2016 07 15.

Article in English | MEDLINE | ID: mdl-27422636

ABSTRACT

BACKGROUND: Trial registries were established to combat publication bias by creating a comprehensive and unambiguous record of initiated clinical trials. However, the proliferation of registries and registration policies means that a single trial may be registered multiple times (i.e., "duplicates"). Because unidentified duplicates threaten our ability to identify trials unambiguously, we investigate to what degree duplicates have been identified across registries globally. METHODS: We retrieved all records from the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) search portal and made a list of all records identified as duplicates by the ICTRP. To investigate how to discriminate duplicates from non-duplicates, we applied text-based similarity scoring to various registration fields of both ICTRP-identified duplicates and arbitrary pairs of trials. We then used the best similarity measure to identify the most similar pairs of records and manually assessed a random sample of pairs not identified as duplicates by the ICTRP to estimate the number of previously unidentified (or "hidden") duplicates. RESULTS: Two hundred eighty-five thousand unique records, or 271 thousand unique trials after accounting for known duplicates, were retrieved from the ICTRP portal in April 2015. We found that the title field best discriminated duplicates from non-duplicates. Out of 41 billion total pair-wise comparisons, we identified the 474,000 pairs of titles with the highest similarity scores (>0.5). After manually assessing a random sample of 434 pairs, we estimated that 45% of all duplicate registrations currently go undetected and remain to be identified and confirmed as duplicates. Thus, the actual number of unique trials represented in this dataset is estimated to be approximately 258,000 (5% fewer). CONCLUSIONS: The ICTRP portal does not currently enable the unambiguous identification of trials across registries. Further research is needed to identify and verify the duplicates that currently go undetected. Sponsors, registries, and the ICTRP should consider actions to ensure duplicate registrations are easily identifiable.
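A small sketch can illustrate the pairwise title comparison and the scale reported above. The abstract does not name the similarity measure, so Jaccard similarity over title word sets is an assumption here, and the record IDs and titles are hypothetical. Only the 0.5 threshold and the record count come from the abstract.

```python
import re
from itertools import combinations


def tokens(title):
    """Lowercased set of alphanumeric word tokens in a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two titles' token sets (an assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(titles, threshold=0.5):
    """Score every pair of records; keep those above threshold, highest first."""
    scored = [
        (jaccard(titles[x], titles[y]), x, y)
        for x, y in combinations(sorted(titles), 2)
    ]
    return sorted((s for s in scored if s[0] > threshold), reverse=True)


def pair_count(n):
    """Number of unordered pairs among n records: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    titles = {
        "A": "Aspirin for Prevention of Stroke in Adults",
        "B": "Aspirin for the Prevention of Stroke in Adults",
        "C": "Vitamin D and Bone Density",
    }
    print(candidate_pairs(titles))  # [(0.875, 'A', 'B')]
    print(pair_count(285_000))      # 40612357500, i.e. about 41 billion
```

The pair count for 285,000 records, n(n-1)/2 = 40,612,357,500, is consistent with the "41 billion total pair-wise comparisons" reported in the abstract. At that scale an all-pairs loop like this one is only practical for small samples.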

Subjects

Biomedical Research , Databases, Factual/standards , Publication Bias , Registries/standards , Humans , World Health Organization

Strategies for supporting consumer health information seeking.

McCray, Alexa T; Ide, Nicholas C; Loane, Russell R; Tse, Tony.

Stud Health Technol Inform ; 107(Pt 2): 1152-6, 2004.

Article in English | MEDLINE | ID: mdl-15360993

ABSTRACT

Despite a growing number of available Web-based health information resources, consumers continue to face a variety of barriers as they attempt to access these resources. Developing a system that appropriately responds to user queries poses several challenges. Guided by an earlier study that analyzed a large number of queries submitted to ClinicalTrials.gov, we developed a variety of techniques to assist user information seeking. We tested the efficacy of these techniques by submitting the original user queries to our new search engine to determine if these techniques would result in better system performance. Overall, the number of query failures was reduced, but the largest improvement was found in the system's query suggestion capability. For a subset of query failures, the current system was able to cut the earlier failure rate almost in half, in most cases providing a suggestion rather than directly finding records. The techniques described here provide a new approach for responding to user queries. The techniques are tolerant of certain types of errors and provide feedback to assist users in reformulating their queries.
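One way to picture an error-tolerant response that suggests a query instead of failing is the sketch below. It is not the system described in the abstract: the term list is hypothetical, and fuzzy string matching with Python's standard `difflib.get_close_matches` is an assumed stand-in for whatever techniques the authors used.

```python
import difflib

# Hypothetical list of terms known to the search index.
INDEXED_TERMS = ["breast cancer", "diabetes", "asthma", "melanoma"]


def respond(query, indexed_terms=INDEXED_TERMS, cutoff=0.8):
    """Return ('match', ...), ('suggest', ...), or ('fail', []) for a user query."""
    q = " ".join(query.lower().split())  # normalize case and whitespace
    if q in indexed_terms:
        return ("match", [q])
    # Fall back to near matches so a typo yields a suggestion, not a failure.
    suggestions = difflib.get_close_matches(q, indexed_terms, n=3, cutoff=cutoff)
    if suggestions:
        return ("suggest", suggestions)
    return ("fail", [])


if __name__ == "__main__":
    print(respond("Breast  Cancer"))
    print(respond("brest cancer"))
    print(respond("xyzzy"))
```

The three outcomes mirror the distinction drawn in the abstract between directly finding records, offering a suggestion, and a query failure.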

Subjects

Health Education , Information Storage and Retrieval , Terminology as Topic , Abstracting and Indexing , Algorithms , Breast Neoplasms , Clinical Trials as Topic , Databases as Topic , Humans , Information Services , User-Computer Interface

dTagger: a POS tagger.

Divita, Guy; Browne, Allen C; Loane, Russell.

AMIA Annu Symp Proc ; : 200-3, 2006.

Article in English | MEDLINE | ID: mdl-17238331

ABSTRACT

The Lexical Systems Group at the National Library of Medicine (NLM) has developed a Part-of-Speech (POS) tagger to be freely distributed with the SPECIALIST NLP Tools. dTagger is specifically designed for use with the SPECIALIST lexicon, but it can be used with an arbitrary tag set. It is capable of single-word or multi-word chunking. It is trainable with previously annotated text, and a version that is tunable with untagged text is in development. The tagger allows users to add local lexicon content. It can report likelihoods for each sentence tagged. New words seen while tagging (the unknowns) are handled by shape identification, including heuristics based on suffix statistics gleaned during training. The performance of the supervised training is noted to be 95% on a modified version of the MedPost hand-annotated Medline abstracts. Eight percent of the terms within this corpus were multi-word entities.
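The suffix-statistics heuristic for unknown words can be sketched as follows. This is not dTagger's code: the tag names, the maximum suffix length of 3, the longest-suffix-first lookup, and the default tag are all assumptions for illustration, and the sketch covers only unknown-word guessing, not the tagger as a whole.

```python
from collections import Counter, defaultdict


def train_suffix_stats(tagged_words, max_len=3):
    """Count the tags seen for each word-final suffix of length 1..max_len."""
    stats = defaultdict(Counter)
    for word, tag in tagged_words:
        w = word.lower()
        for n in range(1, min(max_len, len(w)) + 1):
            stats[w[-n:]][tag] += 1
    return stats


def guess_tag(word, stats, max_len=3, default="noun"):
    """Tag an unknown word by the longest of its suffixes seen in training."""
    w = word.lower()
    for n in range(min(max_len, len(w)), 0, -1):
        counts = stats.get(w[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return default  # no suffix seen in training: fall back to an assumed default


if __name__ == "__main__":
    # Hypothetical training pairs with simplified tag names.
    training = [
        ("running", "verb"), ("jumping", "verb"),
        ("quickly", "adv"), ("slowly", "adv"),
        ("protein", "noun"),
    ]
    stats = train_suffix_stats(training)
    for unknown in ["binding", "rapidly", "xq"]:
        print(unknown, guess_tag(unknown, stats))
```

Trying the longest suffix first prefers the most specific evidence available and backs off to shorter suffixes only when the longer ones were never observed.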

Subjects

Abstracting and Indexing , Linguistics , Natural Language Processing , Vocabulary, Controlled , Algorithms , MEDLINE , Markov Chains , Software
