Condensing biomedical journal texts through paragraph ranking.

Chiang, Jung-Hsien; Liu, Heng-Hui; Huang, Yi-Ting.

Bioinformatics ; 27(8): 1143-9, 2011 Apr 15.

Article in English | MEDLINE | ID: mdl-21330292

ABSTRACT

MOTIVATION: The growing availability of full-text scientific articles raises the important issue of how to most efficiently digest full-text content. Although article titles and abstracts provide accurate and concise information on an article's contents, their brevity inevitably entails the loss of detail. Full-text articles provide those details, but require more time to read. The primary goal of this study is to combine the advantages of concise abstracts and detail-rich full texts to ease the burden of reading. RESULTS: We retrieved abstract-related paragraphs from full-text articles through keywords shared between the abstract and paragraphs from the main text. Significant paragraphs were then recommended by applying a proposed paragraph ranking approach. Finally, the user was provided with a condensed text consisting of these significant paragraphs, sparing the user the time needed to peruse the whole article. We compared the performance of the proposed approach with a keyword counting approach and a PageRank-like approach. Evaluation covered two aspects: the importance of each retrieved paragraph and the information coverage of a set of retrieved paragraphs. In both evaluations, the proposed approach outperformed the other approaches. CONTACT: jchiang@mail.ncku.edu.tw.
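As a rough illustration of retrieving abstract-related paragraphs through shared keywords, the sketch below scores each paragraph by the number of keywords it shares with the abstract. It is closer to a simple keyword counting baseline than to the paragraph ranking approach proposed in the article, whose details the abstract does not give. The tokenizer, the stopword list, and the names `keywords` and `rank_paragraphs` are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for",
             "with", "on", "that", "by", "we"}


def keywords(text):
    """Return the set of lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def rank_paragraphs(abstract, paragraphs, top_k=2):
    """Return up to top_k (paragraph_index, shared_keyword_count) pairs.

    Paragraphs sharing no keyword with the abstract are dropped; the rest
    are ordered by shared keyword count, ties broken by document order.
    """
    abstract_kw = keywords(abstract)
    scored = [(i, len(abstract_kw & keywords(p)))
              for i, p in enumerate(paragraphs)]
    related = [(i, score) for i, score in scored if score > 0]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return related[:top_k]


if __name__ == "__main__":
    abstract = "Gene normalization links gene mentions to database identifiers."
    paragraphs = [
        "We thank the funding agency.",
        "Gene mentions are mapped to database identifiers using a lexicon.",
        "Normalization of gene names is hard.",
    ]
    print(rank_paragraphs(abstract, paragraphs))
```

A condensed text could then be assembled by joining the selected paragraphs in their original order.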

Subject(s)

Abstracting and Indexing , Data Mining/methods , Periodicals as Topic , Algorithms

Overview of BioCreative II gene normalization.

Morgan, Alexander A; Lu, Zhiyong; Wang, Xinglong; Cohen, Aaron M; Fluck, Juliane; Ruch, Patrick; Divoli, Anna; Fundel, Katrin; Leaman, Robert; Hakenberg, Jörg; Sun, Chengjie; Liu, Heng-hui; Torres, Rafael; Krauthammer, Michael; Lau, William W; Liu, Hongfang; Hsu, Chun-Nan; Schuemie, Martijn; Cohen, K Bretonnel; Hirschman, Lynette.

Genome Biol ; 9 Suppl 2: S3, 2008.

Article in English | MEDLINE | ID: mdl-18834494

ABSTRACT

BACKGROUND: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. RESULTS: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. CONCLUSION: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
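The figures quoted for the 'maximum recall' system can be checked against the usual definitions of recall and the balanced F-measure (the harmonic mean of precision and recall). The short sketch below does that arithmetic. The function name `f_measure` is an assumption of this example, and the F-measure it derives for the pooled system is computed here from the rounded values in the abstract, not reported by the authors.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall of the pooled system: identifiers found / identifiers in the test set.
pooled_recall = 763 / 785
print(round(pooled_recall, 2))  # 0.97, matching the reported recall

# Derived here (not stated in the abstract): with precision 0.23, the
# high recall still yields a low balanced F-measure.
print(round(f_measure(0.23, 0.97), 2))
```

This shows why the pooled run serves as a recall ceiling, not a competitive system: its low precision pulls the balanced score far below the 0.80 to 0.81 achieved by the best individual systems.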

Subject(s)

Computational Biology/methods , Genes , Societies, Scientific , Abstracting and Indexing , Animals , Databases, Genetic , Humans , MEDLINE , PubMed , Reproducibility of Results

GeneLibrarian: an effective gene-information summarization and visualization system.

Chiang, Jung-Hsien; Shin, Jyh-Wei; Liu, Heng-Hui; Chin, Chong-Liang.

BMC Bioinformatics ; 7: 392, 2006 Aug 29.

Article in English | MEDLINE | ID: mdl-16939640

ABSTRACT

BACKGROUND: Abundant information about gene products is stored in online searchable databases, such as annotation and literature databases. To efficiently obtain and digest such information, there is a pressing need for automated information summarization and functional-similarity clustering of genes. RESULTS: We have developed a novel method for semantic measurement of annotation and integrated it with a biomedical literature summarization system to establish a platform, GeneLibrarian, that provides users with well-organized information about any specific group of genes (e.g. one cluster of genes from a microarray chip) they might be interested in. GeneLibrarian generates a summarized viewgraph of candidate genes for a user based on the user's preferences and delivers the desired background information effectively. The summarization technique involves optimizing the text mining algorithm and Gene Ontology-based clustering method to enable the discovery of gene relations. CONCLUSION: GeneLibrarian is a Java-based web application that automates the process of retrieving critical information from the literature and expanding the number of potential genes for further analysis. This study concentrates on providing well-organized information to users, which we believe will be useful in their research. GeneLibrarian is available at http://gen.csie.ncku.edu.tw/GeneLibrarian/.

Subject(s)

Databases, Genetic , Information Storage and Retrieval/methods , Software , Algorithms , Cluster Analysis , Gene Expression/genetics , Internet , Reproducibility of Results , Semantics
