ABSTRACT
BACKGROUND: Secondary structure interactions within introns have been shown to be essential for efficient splicing of several yeast genes. The nature of these base-pairing interactions and their effect on splicing efficiency were most extensively studied in the ribosomal protein gene RPS17B (previously known as RP51B). It was determined that complementary pairing between two sequence segments located downstream of the 5' splice site and upstream of the branchpoint sequence promotes efficient splicing of the RPS17B pre-mRNA, presumably by shortening the branchpoint distance. However, no attempts were made to compute a shortened, 'structural' branchpoint distance, and thus the functional relationship between this distance and splicing efficiency remains unknown. RESULTS: In this paper we use computational RNA secondary structure prediction to analyze the secondary structure of the RPS17B intron. We show that it is necessary to consider suboptimal structure predictions and to compute the structural branchpoint distances in order to explain previously published splicing efficiency results. Our study reveals a tight correlation between this distance and the splicing efficiency levels of intron mutants described in the literature. We experimentally test this correlation on additional RPS17B mutants and on intron mutants within two other yeast genes. CONCLUSION: The proposed model of secondary structure requirements for efficient splicing is the first attempt to specify the functional relationship between pre-mRNA secondary structure and splicing. Our findings provide further insights into the role of pre-mRNA secondary structure in gene splicing in yeast and also offer a basis for improving computational methods for splice site identification and gene-finding.
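The idea of a shortened, 'structural' branchpoint distance can be illustrated with a small sketch. The abstract does not give the authors' definition, so the rule used here is an assumption made for illustration only: walking from the 5' splice site to the branchpoint, any helix that opens and closes between the two positions is crossed in a single step. The function names and the dot-bracket input are likewise hypothetical, and the input is assumed to be a balanced dot-bracket string such as those produced by common RNA folding programs.

```python
def pair_table(structure):
    """Map each paired position to its partner in a balanced dot-bracket string."""
    partner = {}
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i] = j
            partner[j] = i
    return partner


def structural_distance(structure, start, end):
    """Count steps from start to end, crossing each enclosed helix in one step.

    Assumed definition for illustration; not taken from the paper.
    """
    partner = pair_table(structure)
    pos, steps = start, 0
    while pos < end:
        mate = partner.get(pos)
        # Jump across a helix only if it opens here and closes at or before `end`.
        if mate is not None and pos < mate <= end:
            pos = mate
        else:
            pos += 1
        steps += 1
    return steps


if __name__ == "__main__":
    folded = "..((((....)))).."
    unfolded = "." * len(folded)
    print(structural_distance(unfolded, 0, 15))  # 15: no pairing, linear distance
    print(structural_distance(folded, 0, 15))    # 5: the hairpin is crossed in one step
```

Under this assumed rule, the same calculation could be repeated over a set of suboptimal structures, which is the kind of comparison the abstract argues is needed.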
Subjects
Introns, RNA Precursors/genetics, RNA Splicing, Messenger RNA/genetics, Saccharomyces cerevisiae/genetics, Algorithms, Base Pairing, Computational Biology, Fungal Genes, Fungal Genome, Mutation, Nucleic Acid Conformation, Fungal RNA/genetics, Ribosomal Proteins/genetics, Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
MOTIVATION: Despite constant improvements in prediction accuracy, gene-finding programs are still unable to provide automatic gene discovery with the desired correctness. Current programs can identify up to 75% of exons correctly, and fewer than 50% of predicted gene structures correspond to actual genes. New approaches to computational gene-finding are clearly needed. RESULTS: In this paper we explore the benefits of combining predictions from existing gene prediction programs. We introduce three novel methods for combining predictions from the programs Genscan and HMMgene. The methods primarily aim to improve the exon-level accuracy of gene-finding by identifying more probable exon boundaries and by eliminating false positive exon predictions. This approach results in improved accuracy at both the nucleotide and exon level, especially the latter, where the average improvement on the newly assembled dataset is 7.9% compared to the best result obtained by Genscan and HMMgene. When tested on a long genomic multi-gene sequence, our method that maintains reading frame consistency improved nucleotide-level specificity by 21.0% and exon-level specificity by 32.5% compared to the best result obtained by either of the two programs individually. AVAILABILITY: The scripts implementing our methods are available from http://www.cs.ubc.ca/labs/beta/genefinding/
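The general idea of combining exon predictions from two programs can be sketched as follows. This is a deliberately simplified illustration and not one of the three methods from the paper, whose details the abstract does not give. The `Exon` record, the `combine_exons` function and the 0.9 probability threshold are all assumptions. The rule shown keeps an exon when both programs predict identical boundaries, and otherwise keeps it only when the single program that predicted it assigned a high probability.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    start: int
    end: int
    strand: str
    prob: float  # exon probability reported by the predicting program


def _key(exon):
    """Identify an exon by its coordinates and strand."""
    return (exon.start, exon.end, exon.strand)


def combine_exons(a, b, threshold=0.9):
    """Keep exons both programs agree on, plus high-probability singletons.

    Hypothetical combination rule for illustration; the threshold is arbitrary.
    """
    index_b = {_key(e): e for e in b}
    combined = {}
    for e in a:
        match = index_b.get(_key(e))
        if match is not None:
            # Both programs agree exactly: keep the exon with the higher probability.
            combined[_key(e)] = e._replace(prob=max(e.prob, match.prob))
        elif e.prob >= threshold:
            combined[_key(e)] = e
    for e in b:
        if _key(e) not in combined and e.prob >= threshold:
            combined[_key(e)] = e
    return sorted(combined.values())


if __name__ == "__main__":
    program_a = [Exon(100, 200, "+", 0.95), Exon(300, 400, "+", 0.50)]
    program_b = [Exon(100, 200, "+", 0.80), Exon(700, 800, "+", 0.99)]
    for exon in combine_exons(program_a, program_b):
        print(exon)
```

This sketch does not resolve overlapping exons with conflicting boundaries and does not check reading frame consistency, both of which the abstract names as goals of the actual methods.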
Subjects
Algorithms, Database Management Systems, Exons/genetics, Information Storage and Retrieval/methods, Sequence Analysis/methods, Animals, Computing Methodologies, DNA/genetics, Genetic Databases, Drosophilidae/genetics, False Positive Reactions, Humans, Mice, Rats, Reading Frames/genetics, Reproducibility of Results, Sensitivity and Specificity, Sequence Alignment/methods
ABSTRACT
We recently demonstrated that combining the output from Genscan and HMMgene can provide increased accuracy of gene predictions. We have created a robust software system that runs the previously described algorithms on DNA sequences and provides a public web interface to the system for use by the biological community worldwide. The GeneComber system performs ab initio gene prediction by first taking a user-supplied DNA sequence and running Genscan and HMMgene. The outputs of Genscan and HMMgene are then integrated using the EUI, GI and EUI_frame algorithms. All results are stored in a relational database management system (RDBMS) and can then be retrieved through a web interface. The web interface provides a unified view of the GeneComber predictions by graphically overlaying outputs from Genscan, HMMgene, EUI, GI and EUI_frame. Outputs can also be retrieved in general feature format (GFF) or FASTA format. The software is written in the Perl programming language and is both dependent on and interoperable with the Bioperl toolkit. It includes high-level application programming interfaces (APIs) to run Genscan and HMMgene, and a database API to insert prediction results into an RDBMS. The APIs are assembled into the genecomber script, which is executed by the web interface or can be run directly from the Unix command line. The web interface is written in PHP and is structured so that it can be easily modified to view data from any database that stores gene structures. AVAILABILITY: The GeneComber public web interface and supplementary information are located at http://bioinformatics.ubc.ca/genecomber. The source code is released under the GNU General Public License and is available at ftp://ftp.bioinformatics.ubc.ca/pub/genecomber/software.
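As an illustration of the GFF output mentioned above, the following sketch formats exon predictions as tab-separated GFF lines. It is written in Python rather than Perl and does not reproduce GeneComber's own code or exact output. The `to_gff` function, the input tuple layout and the use of "EUI" as the source label are assumptions. The column order (sequence name, source, feature, start, end, score, strand, frame, attributes) follows the general GFF convention, with "." marking an unspecified value.

```python
def to_gff(seqname, source, exons):
    """Format (start, end, strand, score) exon tuples as tab-separated GFF lines.

    Hypothetical helper for illustration; frame and attributes are left as ".".
    """
    lines = []
    for start, end, strand, score in exons:
        fields = [
            seqname,          # sequence identifier
            source,           # program or algorithm that made the prediction
            "exon",           # feature type
            str(start),       # 1-based start coordinate
            str(end),         # inclusive end coordinate
            f"{score:.2f}",   # prediction score or probability
            strand,           # "+" or "-"
            ".",              # frame not specified in this sketch
            ".",              # no attributes
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines)


if __name__ == "__main__":
    predictions = [(100, 200, "+", 0.95), (700, 800, "+", 0.99)]
    print(to_gff("seq1", "EUI", predictions))
```

Emitting one line per exon with the source column set to the producing algorithm would let the outputs of several predictors be concatenated into one file and still be told apart.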