High-throughput functional annotation of novel gene products using document clustering.
Pac Symp Biocomput; : 54-68, 2000.
Article in En | MEDLINE | ID: mdl-10902156
Gene products differentially expressed in healthy vs. diseased tissues may be considered drug targets, since the change in their expression level can be related to the cause and progression of the disease studied. A significant portion of the proteins produced by these genes will be unknown, and consequently their function must be characterised. The experimental elucidation of biochemical function must be supported by computational tools that help predict the possible function of a given protein from its amino acid sequence. We have designed a high-throughput system that automatically analyses amino acid sequences deduced from differentially represented cDNA clones. The system attempts to assign a biological function to protein sequences by searching sequence databanks and by locating functionally relevant motifs in the query sequences. The results delivered by the various prediction methods consist of the annotations of matching sequences and/or motifs. These annotations are free-format texts written by humans and may therefore describe the same concept with synonymous words. It is desirable to present the results in such a way that annotations describing the same biological function are grouped together. To this end we devised an algorithm that enables the hierarchical clustering of free-format documents based on their contents. The system is capable of detecting and flagging conflicting annotations, and will speed up the interpretation of the function prediction results.
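The abstract does not describe the clustering algorithm itself, so the following is only a minimal sketch of the general idea of grouping free-format annotations by content. It assumes a plain bag-of-words representation, cosine similarity, and average-linkage agglomerative merging that stops at a similarity threshold; these choices, the function names (`tokenize`, `cosine`, `cluster`), the `threshold` value, and the sample annotations are all illustrative assumptions, not the authors' method. In particular, this purely lexical sketch does not resolve synonyms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Repeatedly merges the two most similar clusters until the best
    similarity falls below `threshold`. Returns lists of document indices.
    """
    vectors = [Counter(tokenize(d)) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Made-up annotation strings, for illustration only.
annotations = [
    "serine threonine protein kinase",
    "protein kinase, serine/threonine specific",
    "zinc finger transcription factor",
    "transcription factor with zinc finger domain",
]
groups = cluster(annotations)
print(groups)  # → [[0, 1], [2, 3]]
```

Here the two kinase annotations and the two zinc-finger annotations end up in separate groups because they share no words across groups. Recording the order of merges rather than stopping at a threshold would give the full hierarchy.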
Collection: 01-internacional
Database: MEDLINE
Main subject: Proteins / Sequence Alignment
Limits: Humans
Language: En
Journal: Pac Symp Biocomput
Journal subject: Biotechnology / Medical Informatics
Year: 2000
Document type: Article
Affiliation country: Austria
Country of publication: United States