High-throughput functional annotation of novel gene products using document clustering.

Renner, A; Aszódi, A

Affiliation

Renner A; Structural Bioinformatics Laboratory, Novartis Forschungsinstitut GmbH, Vienna, Austria. Alexander.Renner@pharma.novartis.com

Pac Symp Biocomput: 54-68, 2000.

Article in En | MEDLINE | ID: mdl-10902156

ABSTRACT

Gene products differentially expressed in healthy vs. diseased tissues may be considered drug targets, since the change in their expression level can be related to the cause and progression of the disease studied. A significant portion of the proteins produced by these genes will be unknown, and consequently their function must be characterised. The experimental elucidation of biochemical function must be supported by computational tools that can help predict the possible function of a given protein from its amino acid sequence. We have designed a high-throughput system which automatically analyses amino acid sequences deduced from differentially represented cDNA clones. The system attempts to assign a biological function to protein sequences by carrying out searches in sequence databanks and by locating functionally relevant motifs in the query sequences. The results delivered by the various prediction methods consist of the annotations of matching sequences and/or motifs. These annotations are free-format texts written by humans and may therefore describe the same concept with synonymous words. It is desirable to present the results in such a way that annotations describing the same biological function are grouped together. To this end we devised an algorithm that enables the hierarchical clustering of free-format documents based on their contents. The system is capable of detecting and flagging conflicting annotations, and will speed up the interpretation of the function prediction results.
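
The abstract does not give the details of the clustering algorithm, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes each annotation is reduced to a set of words, that the distance between two annotations is the Jaccard distance between their word sets, and that clusters are merged bottom-up by average linkage until the closest pair is farther apart than a cutoff. The names `tokens`, `jaccard_distance`, `cluster`, the stopword list, and the `threshold` value are all hypothetical, and synonym handling is not attempted here.

```python
import re
from itertools import combinations

# Assumption: a tiny illustrative stopword list, not one taken from the paper.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "to", "for", "with"}


def tokens(text):
    """Return the set of lowercased words in an annotation, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster(annotations, threshold=0.7):
    """Agglomerative (bottom-up) clustering with average linkage.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed cutoff). Returns a sorted list of
    clusters, each a sorted list of indices into `annotations`.
    """
    bags = [tokens(t) for t in annotations]
    clusters = [[i] for i in range(len(annotations))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(jaccard_distance(bags[i], bags[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Made-up annotation strings for illustration only.
    hits = [
        "serine/threonine protein kinase",
        "protein kinase, serine/threonine specific",
        "zinc finger transcription factor",
        "transcription factor with zinc finger domain",
    ]
    for group in cluster(hits):
        print([hits[i] for i in group])
```

With the four made-up annotations in the usage example, the two kinase descriptions end up in one group and the two zinc finger descriptions in another. A group whose members came from different prediction methods yet landed in separate clusters would be a natural candidate for the conflict flag the abstract mentions, although how the original system decides this is not stated.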

Subject(s)

Proteins/genetics; Proteins/physiology; Sequence Alignment/methods; Algorithms; Cluster Analysis; Databases, Factual; Gene Expression; Humans; Medical Informatics Computing; Sequence Alignment/statistics & numerical data

Collection: 01-internacional | Database: MEDLINE | Main subject: Proteins / Sequence Alignment | Limits: Humans | Language: En | Journal: Pac Symp Biocomput | Journal subject: Biotechnology / Medical Informatics | Year: 2000 | Document type: Article | Affiliation country: Austria | Country of publication: United States
