Search | VHL Search Portal

Best Match: New relevance search for PubMed.

Fiorini, Nicolas; Canese, Kathi; Starchenko, Grisha; Kireev, Evgeny; Kim, Won; Miller, Vadim; Osipov, Maxim; Kholodov, Michael; Ismagilov, Rafis; Mohan, Sunil; Ostell, James; Lu, Zhiyong.

PLoS Biol ; 16(8): e2005343, 2018 08.

Article in English | MEDLINE | ID: mdl-30153250

ABSTRACT

PubMed is a free search engine for biomedical literature accessed by millions of users from around the world each day. With the rapid growth of biomedical literature (about two articles are added every minute on average), finding and retrieving the most relevant papers for a given query is increasingly challenging. We present Best Match, a new relevance search algorithm for PubMed that leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the traditional date sort order. The Best Match algorithm is trained with past user searches with dozens of relevance-ranking signals (factors), the most important being the past usage of an article, publication date, relevance score, and type of article. This new algorithm demonstrates state-of-the-art retrieval performance in benchmarking experiments as well as an improved user experience in real-world testing (over 20% increase in user click-through rate). Since its deployment in June 2017, we have observed a significant increase (60%) in PubMed searches with relevance sort order: it now assists millions of PubMed searches each week. In this work, we hope to increase the awareness and transparency of this new relevance sort option for PubMed users, enabling them to retrieve information more effectively.

Subject(s)

Data Mining/methods , Information Storage and Retrieval/methods , Algorithms , Humans , MEDLINE , Machine Learning , PubMed , Publications , Search Engine
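
As a rough illustration of how several ranking signals of the kind named in this abstract might be combined, the sketch below re-ranks a few candidate records with a hand-weighted linear score. It is not the Best Match model: the abstract describes an algorithm trained on past user searches, whereas the weights, field names, record identifiers and the `rerank` helper here are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pmid: str            # placeholder identifier, not a real PMID
    relevance: float     # text-match score from a first-pass retrieval, assumed in [0, 1]
    past_usage: float    # normalised past usage of the article, assumed in [0, 1]
    age_years: float     # years since publication
    is_review: bool      # stand-in for the "type of article" signal


# Hypothetical weights chosen by hand; a trained ranker would learn these.
WEIGHTS = {"relevance": 0.5, "past_usage": 0.3, "recency": 0.15, "review": 0.05}


def score(c: Candidate) -> float:
    """Combine the signals into one number; newer articles get a higher recency term."""
    recency = 1.0 / (1.0 + c.age_years)
    return (
        WEIGHTS["relevance"] * c.relevance
        + WEIGHTS["past_usage"] * c.past_usage
        + WEIGHTS["recency"] * recency
        + WEIGHTS["review"] * (1.0 if c.is_review else 0.0)
    )


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    results = rerank([
        Candidate("A", relevance=0.9, past_usage=0.1, age_years=10, is_review=False),
        Candidate("B", relevance=0.7, past_usage=0.8, age_years=1, is_review=True),
        Candidate("C", relevance=0.2, past_usage=0.1, age_years=0, is_review=False),
    ])
    print([c.pmid for c in results])
```

In this toy data, a heavily used recent review can outrank an older article with a slightly better text match, which is the kind of trade-off a multi-signal ranker is meant to capture.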

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Agarwala, Richa; Bolton, Evan E; Brister, J Rodney; Canese, Kathi; Clark, Karen; Connor, Ryan; Fiorini, Nicolas; Funk, Kathryn; Hefferon, Timothy; Holmes, J Bradley; Kim, Sunghwan; Kimchi, Avi; Kitts, Paul A; Lathrop, Stacy; Lu, Zhiyong; Madden, Thomas L; Marchler-Bauer, Aron; Phan, Lon; Schneider, Valerie A; Schoch, Conrad L; Pruitt, Kim D; Ostell, James.

Nucleic Acids Res ; 47(D1): D23-D28, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Biotechnology/organization & administration , Databases, Genetic , Animals , Biotechnology/methods , Databases, Chemical , Humans , Software , United States/epidemiology , Web Browser
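
Since the abstract names the E-utilities as the programming interface to Entrez, a minimal sketch of building an ESearch request URL may help. It only constructs the URL and makes no network call; the `esearch_url` helper and its defaults are illustrative assumptions, and the current NCBI documentation should be checked for the full parameter list and usage guidelines.

```python
from urllib.parse import urlencode

# Base address of the E-utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build an ESearch URL that asks for matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("best match", retmax=5))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`.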

Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents.

Kim, Sun; Fiorini, Nicolas; Wilbur, W John; Lu, Zhiyong.

J Biomed Inform ; 75: 122-127, 2017 Nov.

Article in English | MEDLINE | ID: mdl-28986328

ABSTRACT

The main approach of traditional information retrieval (IR) is to examine how many words from a query appear in a document. A drawback of this approach, however, is that it may fail to detect relevant documents where no or only few words from a query are found. The semantic analysis methods such as LSA (latent semantic analysis) and LDA (latent Dirichlet allocation) have been proposed to address the issue, but their performance is not superior compared to common IR approaches. Here we present a query-document similarity measure motivated by the Word Mover's Distance. Unlike other similarity measures, the proposed method relies on neural word embeddings to compute the distance between words. This process helps identify related words when no direct matches are found between a query and a document. Our method is efficient and straightforward to implement. The experimental results on TREC Genomics data show that our approach outperforms the BM25 ranking function by an average of 12% in mean average precision. Furthermore, for a real-world dataset collected from the PubMed® search logs, we combine the semantic measure with BM25 using a learning to rank method, which leads to improved ranking scores by up to 25%. This experiment demonstrates that the proposed approach and BM25 nicely complement each other and together produce superior performance.

Subject(s)

Information Storage and Retrieval , PubMed , Semantics
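
The general idea of matching query words to related document words through embeddings can be sketched as follows. This is a simplified relaxation in the spirit of the Word Mover's Distance, not the measure proposed in the paper: each query word is simply paired with its closest document word. The two-dimensional vectors, vocabulary and function names are invented for illustration.

```python
import math

# Toy word embeddings (invented values; real embeddings have hundreds of dimensions).
EMB = {
    "tumor": (0.9, 0.1),
    "cancer": (0.85, 0.2),
    "therapy": (0.1, 0.9),
    "treatment": (0.15, 0.85),
    "protein": (0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def query_doc_similarity(query, doc):
    """Average, over query words, of the best cosine match among document words.

    Words missing from the embedding table are ignored; returns 0.0 if
    either side has no known words.
    """
    q = [w for w in query if w in EMB]
    d = [w for w in doc if w in EMB]
    if not q or not d:
        return 0.0
    return sum(max(cosine(EMB[a], EMB[b]) for b in d) for a in q) / len(q)


if __name__ == "__main__":
    # No shared words, yet the score is high because the words are related.
    print(query_doc_similarity(["tumor", "therapy"], ["cancer", "treatment"]))
    print(query_doc_similarity(["tumor"], ["therapy"]))
```

A score like this could then be fed, alongside BM25, into a learning-to-rank model as one more feature, which is the combination the abstract reports.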

USI: a fast and accurate approach for conceptual document annotation.

Fiorini, Nicolas; Ranwez, Sylvie; Montmain, Jacky; Ranwez, Vincent.

BMC Bioinformatics ; 16: 83, 2015 Mar 14.

Article in English | MEDLINE | ID: mdl-25887746

ABSTRACT

BACKGROUND: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage of the features of the document. RESULTS: In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity. CONCLUSIONS: By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion - instead of one score per concept.

Subject(s)

Abstracting and Indexing , Algorithms , Information Storage and Retrieval , Natural Language Processing , Semantics , User-Computer Interface , Humans , Medical Subject Headings , Pattern Recognition, Automated , Vocabulary, Controlled
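
The idea of annotating a new entity from its already indexed neighbors can be sketched as below. This is a plain frequency vote offered only as an illustration: USI itself scores a whole annotation with a global criterion, which this sketch does not reproduce. The document identifiers, the `min_support` threshold and the `suggest_concepts` helper are hypothetical.

```python
from collections import Counter


def suggest_concepts(neighbor_annotations, top_n=3, min_support=2):
    """Suggest concepts that appear in at least `min_support` neighbor documents.

    `neighbor_annotations` maps a neighbor document ID to its list of concepts.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(
        concept
        for concepts in neighbor_annotations.values()
        for concept in set(concepts)  # count each concept once per document
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, n in ranked if n >= min_support][:top_n]


if __name__ == "__main__":
    neighbors = {
        "doc1": ["Semantics", "Algorithms"],
        "doc2": ["Semantics", "Humans"],
        "doc3": ["Algorithms", "Semantics"],
    }
    print(suggest_concepts(neighbors))
```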

CompPhy: a web-based collaborative platform for comparing phylogenies.

Fiorini, Nicolas; Lefort, Vincent; Chevenet, François; Berry, Vincent; Chifolleau, Anne-Muriel Arigon.

BMC Evol Biol ; 14: 253, 2014 Dec 14.

Article in English | MEDLINE | ID: mdl-25496383

ABSTRACT

BACKGROUND: Collaborative tools are of great help in conducting projects involving distant workers. Recent web technologies have helped to build such tools for jointly editing office documents and scientific data, yet none are available for handling phylogenies. Though a large number of studies and projects in evolutionary biology and systematics involve collaborations between scientists of different institutes, current tree comparison visualization software and websites are directed toward single-user access. Moreover, tree comparison functionalities are dispersed between different software that mainly focus on high level single tree visualization but to the detriment of basic tree comparison features. RESULTS: The web platform presented here, named CompPhy, intends to fill this gap by allowing collaborative work on phylogenies and by gathering simple advanced tools dedicated to tree comparison. It offers functionalities for tree edition, tree comparison, supertree inference and data management in a collaborative environment. The latter aspect is a specific feature of the platform, allowing people located in different places to work together at the same time on a common project. CompPhy thus proposes shared tree visualization, both synchronous and asynchronous tree manipulation, data exchange/storage, as well as facilities to keep track of the progress of analyses in working sessions. Specific advanced comparison tools are also available, such as consensus and supertree inference, or automated branch swaps of compared trees. As projects can be readily created and shared, CompPhy is also a tool that can be used easily to interact with students in an educational setting, either in the classroom or for assignments. CONCLUSIONS: CompPhy is the first web platform devoted to the comparison of phylogenetic trees allowing real-time distant collaboration on a phylogenetic/phylogenomic project. This application can be accessed freely with a recent browser at the following page of the ATGC bioinformatics platform: http://www.atgc-montpellier.fr/compphy/ .

Subject(s)

Biomedical Research , Cooperative Behavior , Phylogeny , Software , Biology , Computational Biology/methods , Humans , Internet , User-Computer Interface

How user intelligence is improving PubMed.

Fiorini, Nicolas; Leaman, Robert; Lipman, David J; Lu, Zhiyong.

Nat Biotechnol ; 2018 Oct 01.

Article in English | MEDLINE | ID: mdl-30272675

ABSTRACT

PubMed is a widely used search engine for biomedical literature. It is developed and maintained by the US National Library of Medicine/National Center for Biotechnology Information and is visited daily by millions of users around the world. For decades, PubMed has used advanced artificial intelligence technologies that extract patterns of collective user activity, such as machine learning and natural language processing, to inform the algorithmic changes that ultimately improve a user's search experience. Although these efforts have led to objective improvements in search quality, the technical underpinnings remain largely invisible and go largely unnoticed by most users. Here we describe how these 'under-the-hood' techniques work within PubMed and report how their effectiveness and usage are assessed in real-world scenarios. In doing so, we hope to increase the transparency of the PubMed system and enable users to make more effective use of the search engine. We also identify open challenges and new opportunities for computational researchers to explore the potential of future improvements.

PubMed Labs: an experimental system for improving biomedical literature search.

Fiorini, Nicolas; Canese, Kathi; Bryzgunov, Rostyslav; Radetska, Ievgeniia; Gindulyte, Asta; Latterner, Martin; Miller, Vadim; Osipov, Maxim; Kholodov, Michael; Starchenko, Grisha; Kireev, Evgeny; Lu, Zhiyong.

Database (Oxford) ; 2018, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30239682

ABSTRACT

PubMed is a freely accessible system for searching the biomedical literature, with ~2.5 million users worldwide on an average workday. In order to better meet our users' needs in an era of information overload, we have recently developed PubMed Labs (www.pubmed.gov/labs), an experimental system for users to test new search features/tools (e.g. Best Match) and provide feedback, which enables us to make more informed decisions about potential changes to improve the search quality and overall usability of PubMed. In addition, PubMed Labs features a mobile-first and responsive layout that offers better support for accessing PubMed from increasingly popular mobiles and small-screen devices. In this paper, we detail PubMed Labs, its purpose, new features and best practices. We also encourage users to share their experience with us; based on this feedback, we are continuously improving PubMed Labs with more advanced features and a better user experience.

Subject(s)

PubMed , Publications , Search Engine , Statistics as Topic

Towards PubMed 2.0.

Fiorini, Nicolas; Lipman, David J; Lu, Zhiyong.

Elife ; 6, 2017 10 30.

Article in English | MEDLINE | ID: mdl-29083299

ABSTRACT

Staff from the National Center for Biotechnology Information in the US describe recent improvements to the PubMed search engine and outline plans for the future, including a new experimental site called PubMed Labs.

Subject(s)

Data Mining/methods , PubMed/trends , Search Engine/methods , Software

PubRunner: A light-weight framework for updating text mining results.

Anekalla, Kishore R; Courneya, J P; Fiorini, Nicolas; Lever, Jake; Muchow, Michael; Busby, Ben.

F1000Res ; 6: 612, 2017.

Article in English | MEDLINE | ID: mdl-29152221

ABSTRACT

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP, and publicizing the location of these results on the public PubRunner website. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
