RESUMO
A highly automated RT-PCR-based approach has been established to validate novel human gene predictions with no prior experimental evidence of mRNA splicing (ab initio predictions). Ab initio gene predictions were selected for high-throughput validation using predicted protein classification, sequence similarity to other genomes, colocalization with an MPSS tag, or microarray expression. Initial microarray prioritization followed by RT-PCR validation was the most efficient combination, resulting in approximately 35% of the ab initio predictions being validated by RT-PCR. Of the 7252 novel genes that were prioritized and processed, 796 constituted real transcripts. In addition, high-throughput RACE successfully extended the 5' and/or 3' ends of >60% of RT-PCR-validated genes. Reevaluation of these transcripts produced 574 novel transcripts using RefSeq as a reference. RT-PCR sequencing in combination with RACE on ab initio gene predictions could be used to define the transcriptome across all species.