ABSTRACT
Computational models in biomedicine rely on biological and clinical assumptions. The selection of these assumptions contributes substantially to modeling success or failure. Assumptions used by experts at the cutting edge of research, however, are rarely explicitly described in scientific publications. Some of these assumptions can be collected and assessed directly through interviews and surveys. Here we investigate diversity in expert views about a complex biological phenomenon, the process of cancer metastasis. We harvested individual viewpoints from 28 experts in clinical and molecular aspects of cancer metastasis and summarized them computationally. While experts predominantly agreed on the definition of individual steps involved in metastasis, no two expert scenarios for metastasis were identical. We computed the probability that any two experts would disagree on k or fewer metastatic stages and found that any two randomly selected experts are likely to disagree about several assumptions. When two or more of these experts review an article or a proposal about metastatic cascades, the probability that they will disagree with elements of the proposed model approaches 1. This diversity of conceptions has clear consequences for progress and deadlock in the field. We suggest that strong, incompatible views are common in biomedicine but largely invisible to biomedical experts themselves. We built a formal Markov model of metastasis to encapsulate expert convergence and divergence regarding the entire sequence of metastatic stages. This model revealed the stages of greatest disagreement, including the points at which cancer cells enter and leave the bloodstream. The model provides a formal probabilistic hypothesis against which researchers can evaluate data on the process of metastasis, enabling subsequent improvement of the model through Bayesian probabilistic updating.
Practically, we propose that model assumptions and hunches be harvested systematically and made available for modelers and scientists.
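The pairwise-disagreement probability and the pooled Markov chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage labels and the three scenarios are invented toy data, "disagreement" between two experts is assumed to mean a stage that one includes and the other omits, and transition probabilities are plain maximum-likelihood estimates from consecutive stages.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy scenarios, invented for illustration; NOT the 28 experts' data.
# Each expert scenario is an ordered list of metastatic stage labels.
scenarios = [
    ["growth", "invasion", "intravasation", "circulation", "extravasation", "colonization"],
    ["growth", "invasion", "intravasation", "extravasation", "colonization"],
    ["growth", "invasion", "circulation", "colonization"],
]


def disagreements(a, b):
    """Stages one expert includes and the other omits (size of the symmetric difference)."""
    return len(set(a) ^ set(b))


def prob_disagree_at_most(scenarios, k):
    """Fraction of all expert pairs that disagree on k or fewer stages."""
    pairs = list(combinations(scenarios, 2))
    return sum(disagreements(a, b) <= k for a, b in pairs) / len(pairs)


def transition_probabilities(scenarios):
    """Maximum-likelihood Markov transition probabilities pooled over all scenarios.

    START and END are added so that first and last stages are also modeled.
    """
    counts = defaultdict(Counter)
    for seq in scenarios:
        path = ["START"] + list(seq) + ["END"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


print(prob_disagree_at_most(scenarios, 2))
print(transition_probabilities(scenarios)["invasion"])
```

Treating the transition counts as Dirichlet parameters and adding counts from new observations is one common way to carry out a Bayesian update of such a chain; whether the study uses exactly this scheme is not stated here.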
Subjects
Biological Models, Neoplasm Metastasis, Computational Biology, Disease Progression, Expert Testimony, Humans, Markov Chains
ABSTRACT
A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the gap between an ontology and the collective discourse within its domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly: many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
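To make the idea of measuring how well an ontology represents a domain more concrete, the sketch below computes two simple, hypothetical proxies: the share of distinct corpus terms found in the ontology, and the share of term occurrences found in it. It also computes a Jaccard index for comparing two ontologies or thesauri. These are illustrative stand-ins, not the breadth and depth metrics defined in the study, and the tiny ontology and corpus are invented.

```python
from collections import Counter


def coverage(ontology_terms, corpus_tokens):
    """Two simple coverage proxies for an ontology against a tokenized corpus.

    type_coverage:  share of distinct corpus terms that the ontology contains.
    token_coverage: share of all term occurrences that the ontology contains,
                    so frequent terms weigh more than rare ones.
    """
    counts = Counter(corpus_tokens)
    if not counts:
        return 0.0, 0.0
    vocab = set(ontology_terms)
    covered = [term for term in counts if term in vocab]
    type_coverage = len(covered) / len(counts)
    token_coverage = sum(counts[t] for t in covered) / sum(counts.values())
    return type_coverage, token_coverage


def jaccard(a, b):
    """Overlap between two term sets, e.g. two ontologies or two thesauri."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented example data.
ontology = {"fever", "rash", "edema"}
corpus = "fever cough fever rash fever".split()
print(coverage(ontology, corpus))
print(jaccard(ontology, {"fever", "cough"}))
```

A low Jaccard index between two ontologies corresponds to the "small intersection" described above, while the gap between the two coverage proxies shows whether an ontology misses mainly rare or mainly common terms.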
Subjects
Information Storage and Retrieval
ABSTRACT
The BioText Search Engine is a freely available Web-based application that provides biologists with new ways to access the scientific literature. One novel feature is the ability to search and browse article figures and their captions. A grid view juxtaposes many different figures associated with the same keywords, providing new insight into the literature. An abstract/title search and list view shows at a glance many of the figures associated with each article. The interface is carefully designed according to usability principles and techniques. The search engine is a work in progress, and more functionality will be added over time. AVAILABILITY: http://biosearch.berkeley.edu.
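Caption search and the grid view can be approximated with an inverted index that maps caption words to figures. The sketch below is an assumption about how such a feature might work, not a description of BioText's internals; the figure records, the field names (`id`, `article`, `caption`) and the AND-style matching are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical figure records; BioText's real schema and index are not described here.
figures = [
    {"id": "fig1", "article": "A", "caption": "Western blot of p53 expression"},
    {"id": "fig2", "article": "B", "caption": "p53 localization in HeLa cells"},
    {"id": "fig3", "article": "B", "caption": "Cell cycle profile after treatment"},
]


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_caption_index(figures):
    """Map each caption word to the IDs of figures whose caption contains it."""
    index = defaultdict(set)
    for fig in figures:
        for word in tokenize(fig["caption"]):
            index[word].add(fig["id"])
    return index


def search_captions(index, query):
    """IDs of figures whose captions contain every query word, e.g. to fill a grid view."""
    sets = [index.get(word, set()) for word in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []


index = build_caption_index(figures)
print(search_captions(index, "p53"))
```

Grouping the matching figures by their `article` field would give the per-article list view instead of the keyword grid.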
Subjects
Abstracting and Indexing/methods, Artificial Intelligence, Biology/methods, Database Management Systems, Bibliographic Databases, Information Storage and Retrieval/methods, Natural Language Processing
ABSTRACT
BACKGROUND: Standard care for the rehabilitation of knee conditions involves exercise programs and information provision. Current methods of rehabilitation delivery struggle to keep up with large patient volumes and the length of treatment required to maximize recovery. The development of novel interventions to support self-management is therefore strongly recommended. Such interventions need to include information provision, goal setting, monitoring, feedback, and support groups, but the most effective methods of delivering them are poorly understood. The Internet offers a medium for intervention delivery with considerable potential for meeting these needs. OBJECTIVE: The objective of this study was to demonstrate the feasibility of a Web-based app and to conduct a preliminary review of its practicability as part of a complex medical intervention in the rehabilitation of knee disorders. This paper describes the development, implementation, and usability of such an app. METHODS: An interdisciplinary team of health care professionals and researchers, computer scientists, and app developers developed the TRAK app suite. The key functionality of the app includes information provision, a three-step exercise program based on standard care for the rehabilitation of knee conditions, self-monitoring with visual feedback, and a virtual support group. Two types of stakeholders (patients and physiotherapists) were recruited for the usability study. A usability questionnaire was used to collect both qualitative and quantitative information on computer and Internet usage, task completion, and subjective user preferences. RESULTS: A total of 16 patients and 15 physiotherapists participated in the usability study. Based on the System Usability Scale, the TRAK app has higher perceived usability than 70% of systems.
Both patients and physiotherapists agreed that the given Web-based approach would facilitate communication, provide information, help recall information, improve understanding, enable exercise progression, and support self-management in general. The Web app was found to be easy to use and user satisfaction was very high. The TRAK app suite can be accessed at http://apps.facebook.com/kneetrak/. CONCLUSIONS: The usability study suggests that a Web-based intervention is feasible and acceptable in supporting self-management of knee conditions.
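The System Usability Scale result rests on a fixed scoring rule: each respondent rates ten alternating positively and negatively worded statements from 1 to 5, and the adjusted ratings are summed and scaled to a 0 to 100 range. A minimal sketch of that standard scoring follows; the function names are hypothetical and no TRAK questionnaire data are included.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) for one respondent.

    responses: ten ratings from 1 (strongly disagree) to 5 (strongly agree), in item order.
    Odd-numbered items are positively worded and contribute (rating - 1);
    even-numbered items are negatively worded and contribute (5 - rating).
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5


def mean_sus(all_responses):
    """Study-level SUS: the mean of individual respondents' scores."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


print(sus_score([3] * 10))
print(sus_score([5, 1] * 5))
```

Converting a mean score into a percentile rank, as in the comparison with 70% of systems, additionally requires published normative SUS data, which this sketch does not include.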
ABSTRACT
Recent years have seen a gradual shift in the freely accessible content of biomedical publications, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has raised some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that of citances, the sentences containing citations to that article. We contrast the important points of an article as judged by its authors with those seen by its peers. Focusing on the area of molecular interactions, we perform manual and automatic analyses and find that the set of all citances to a target article not only covers most of the information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects that other citations and time have on the content of citances.
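Once concepts have been identified in each text, comparing an abstract with its citances reduces to set operations. The sketch assumes that concept extraction has already been done (the concept sets are invented examples) and expresses the extra citance concepts relative to the abstract's concept count; the study's own normalization behind the 20% figure may differ.

```python
def compare_concepts(abstract_concepts, citance_concept_sets):
    """Compare an abstract's concepts with those pooled from all citances to the article.

    Returns (share of abstract concepts that at least one citance also mentions,
             number of citance-only concepts relative to the abstract's concept count).
    """
    abstract = set(abstract_concepts)
    if not abstract:
        raise ValueError("abstract has no concepts")
    pooled = set().union(*citance_concept_sets)
    covered = len(abstract & pooled) / len(abstract)
    extra = len(pooled - abstract) / len(abstract)
    return covered, extra


# Invented example: one abstract and two citances, already reduced to concept sets.
abstract = {"p53", "MDM2", "binding", "yeast two-hybrid", "ubiquitination"}
citances = [{"p53", "MDM2", "binding"}, {"MDM2", "ubiquitination", "apoptosis"}]
print(compare_concepts(abstract, citances))  # (0.8, 0.2)
```

Pooling citances with a set union reflects the idea that it is the set of all citances, not any single one, that covers the abstract.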
ABSTRACT
When reading bioscience journal articles, many researchers focus their attention on the figures and their captions. This observation led to the development of the BioText literature search engine, a freely available Web-based application that allows biologists to search over the contents of Open Access journals and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study in which 20 biologist participants used and commented on the system. Nineteen of the 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full-text search results. Fifteen of the 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 was undecided. Ten of the 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting the results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to the full text of documents and the lifting of restrictions on showing figures in the search interface.
Subjects
Computer Graphics/trends, Bibliographic Databases/trends, Information Storage and Retrieval/trends, Search Engine, Abstracting and Indexing, PubMed, Publications, User-Computer Interface
ABSTRACT
This paper reports on the results of two questionnaires asking biologists about the incorporation of text-extracted entity information, specifically gene and protein names, into bioscience literature search user interfaces. Among the findings are that study participants want to see gene/protein metadata in combination with organism information; that a significant proportion would like to see gene names grouped by type (synonym, homolog, etc.), and that most participants want to see information that the system is confident about immediately, and see less certain information after taking additional action. These results inform future interface designs.
Subjects
Computational Biology, Genes, Information Storage and Retrieval, Proteins, Algorithms, Surveys and Questionnaires, Terminology as Topic, User-Computer Interface
ABSTRACT
BACKGROUND: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. RESULTS: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 of the 785 identifiers. CONCLUSION: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to that of human experts, at over 90% agreement. These results show the promise of such systems as tools to link the literature with biological databases.
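The evaluation and combination steps above can be sketched with set operations over gene identifiers. Precision, recall and the balanced F-measure follow their standard definitions; the `vote` function is a simple threshold vote chosen for illustration and is not necessarily one of the schemes used in BioCreative II. The identifiers `g1` to `g6` are placeholders, not Entrez Gene IDs.

```python
from collections import Counter


def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def precision_recall_f(predicted, gold):
    """Precision, recall and balanced F-measure for sets of gene identifiers."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f_measure(p, r)


def vote(system_outputs, min_votes):
    """Keep identifiers proposed by at least min_votes systems.

    min_votes=1 gives a pooled, maximum-recall union; higher thresholds trade recall for precision.
    """
    counts = Counter(i for output in system_outputs for i in set(output))
    return {i for i, n in counts.items() if n >= min_votes}


# Placeholder data for one hypothetical document.
gold = {"g1", "g2", "g3", "g4"}
systems = [{"g1", "g2", "g5"}, {"g1", "g3"}, {"g1", "g2", "g6"}]
print(precision_recall_f(vote(systems, 2), gold))
print(precision_recall_f(vote(systems, 1), gold))
```

Applying `f_measure` to the pooled system's precision of 0.23 and recall of 0.97 gives an F-measure of roughly 0.37, well below the 0.80 to 0.81 of the best individual systems.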
Subjects
Computational Biology/methods, Genes, Scientific Societies, Abstracting and Indexing, Animals, Genetic Databases, Humans, MEDLINE, PubMed, Reproducibility of Results
ABSTRACT
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of methods were used, and the results varied, with the highest F1 score reaching 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest-scoring submissions.
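Combining submissions can be illustrated with span-level voting: a mention is kept if enough systems tagged the same span. The sketch below uses exact matching on hypothetical `(sentence_id, start, end)` offsets and a simple vote threshold; the workshop's scoring rules and the combination method behind the 0.9066 figure may differ. The toy data show how three submissions that each score about 0.67 can combine into a perfect result because their errors do not overlap.

```python
from collections import Counter


def f1(predicted, gold):
    """Exact-match F1 over (sentence_id, start, end) mention spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(predicted), tp / len(gold)
    return 2 * p * r / (p + r)


def combine(submissions, min_votes):
    """Keep spans tagged by at least min_votes submissions (simple threshold voting)."""
    votes = Counter(span for sub in submissions for span in set(sub))
    return {span for span, n in votes.items() if n >= min_votes}


# Hypothetical gold spans and three hypothetical submissions, each with one distinct error.
gold = {(1, 0, 4), (1, 10, 15), (2, 3, 8)}
submissions = [
    {(1, 0, 4), (1, 10, 15), (2, 0, 8)},   # wrong left boundary on sentence 2
    {(1, 0, 4), (2, 3, 8), (2, 20, 25)},   # misses one mention, adds a false one
    {(1, 10, 15), (2, 3, 8), (1, 0, 3)},   # truncated first mention
]
print([round(f1(s, gold), 3) for s in submissions])
print(f1(combine(submissions, 2), gold))
```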
Subjects
Computational Biology/methods, Genes, Scientific Societies, Congresses as Topic
ABSTRACT
SUMMARY: BioIE is a rule-based system that extracts informative sentences relating to protein families, their structures, functions and diseases from the biomedical literature. Based on manual definition of templates and rules, it aims at precise sentence extraction rather than wide recall. After uploading source text or retrieving abstracts from MEDLINE, users can extract sentences based on predefined or user-defined template categories. BioIE also provides a brief insight into the syntactic and semantic context of the source text by looking at word, N-gram and MeSH-term distributions. Important applications of BioIE include, for example, the annotation of microarray data and of protein databases. AVAILABILITY: http://umber.sbs.man.ac.uk/dbbrowser/bioie/
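A rule-based, template-driven extractor of the kind described can be sketched with regular expressions. The template categories and patterns below are hypothetical, not BioIE's predefined templates, and sentence splitting is a naive punctuation rule. Matching only on specific keywords favors precision over recall, in line with the design goal stated above. The `ngram_counts` helper gives a word N-gram distribution of the source text.

```python
import re
from collections import Counter

# Hypothetical template categories and keyword rules, for illustration only.
TEMPLATES = {
    "structure": [r"\bdomain\b", r"\bhelix\b", r"\bfold\b"],
    "function": [r"\bcatalyz\w*", r"\bbinds?\b", r"\bregulat\w*"],
    "disease": [r"\bdisease\b", r"\bcancer\b", r"\bmutation\w*"],
}

SAMPLE = ("The kinase domain adopts a bilobal fold. "
          "The enzyme catalyzes phosphorylation. Mutations cause disease.")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract(text, category):
    """Sentences matching any rule of the given template category."""
    patterns = [re.compile(p, re.IGNORECASE) for p in TEMPLATES[category]]
    return [s for s in split_sentences(text) if any(p.search(s) for p in patterns)]


def ngram_counts(text, n=2):
    """Frequency distribution of lowercase word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(*(words[i:] for i in range(n))))


for category in TEMPLATES:
    print(category, extract(SAMPLE, category))
print(ngram_counts(SAMPLE).most_common(3))
```

User-defined template categories would correspond to adding new entries to `TEMPLATES`.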