Search | Secretaria de Estado da Saúde

Comparison of intron-containing and intron-lacking human genes elucidates putative exonic splicing enhancers.

Fedorov, A; Saxonov, S; Fedorova, L; Daizadeh, I.

Nucleic Acids Res ; 29(7): 1464-9, 2001 Apr 01.

Article in English | MEDLINE | ID: mdl-11266547

ABSTRACT

Of the rules used by the splicing machinery to precisely determine intron-exon boundaries only a fraction is known. Recent evidence suggests that specific short sequences within exons help in defining these boundaries. Such sequences are known as exonic splicing enhancers (ESE). A possible bioinformatical approach to studying ESE sequences is to compare genes that harbor introns with genes that do not. For this purpose two non-redundant samples of 719 intron-containing and 63 intron-lacking human genes were created. We performed a statistical analysis on these datasets of intron-containing and intron-lacking human coding sequences and found a statistically significant difference (P = 0.01) between these samples in terms of 5-6mer oligonucleotide distributions. The difference is not created by a few strong signals present in the majority of exons, but rather by the accumulation of multiple weak signals through small variations in codon frequencies, codon biases and context-dependent codon biases between the samples. A list of putative novel human splicing regulation sequences has been elucidated by our analysis.
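As an illustration of the kind of comparison the abstract describes, the following minimal sketch tabulates overlapping k-mer frequencies in two sets of coding sequences and reports the per-oligonucleotide difference. The function names `kmer_frequencies` and `frequency_differences` are hypothetical, and the abstract does not specify the statistical test the authors used, so no significance testing is attempted here.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_differences(intron_containing, intron_lacking, k=6):
    """Per-k-mer frequency difference (intron-containing minus intron-lacking).

    Illustrative only: this is not the authors' procedure, just the raw
    quantity such a comparison would start from.
    """
    f_with = kmer_frequencies(intron_containing, k)
    f_without = kmer_frequencies(intron_lacking, k)
    kmers = set(f_with) | set(f_without)
    return {kmer: f_with.get(kmer, 0.0) - f_without.get(kmer, 0.0) for kmer in kmers}
```

Sorting the returned dictionary by value would surface the oligonucleotides most enriched in one sample; whether any of them is significant would need a separate test that accounts for codon usage in both samples.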

Subjects

Alternative Splicing , Enhancer Elements, Genetic/genetics , Exons/genetics , Genes/genetics , Introns/genetics , Base Composition , Databases, Factual , Humans , Open Reading Frames , Proteins/genetics , Statistics as Topic

EID: the Exon-Intron Database - an exhaustive database of protein-coding intron-containing genes.

Saxonov, S; Daizadeh, I; Fedorov, A; Gilbert, W.

Nucleic Acids Res ; 28(1): 185-90, 2000 Jan 01.

Article in English | MEDLINE | ID: mdl-10592221

ABSTRACT

To aid studies of molecular evolution and to assist in gene prediction research, we have constructed an Exon-Intron Database (EID) in FASTA format. Currently, the database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif information. There is 17% redundancy inherited from GenBank; a purge at the 99% identity level reduced the database to 42 460 genes (243 589 exons). We have created subdatabases of genes whose intron positions have been experimentally determined. One such database, constructed by comparing genomic and mRNA sequences, contains 11 242 genes (62 474 exons). A larger database of 22 196 genes (105 595 exons) was constructed by selecting on keywords to eliminate computer-predicted genes. By examining the two nucleotides adjacent to the intron boundary, we infer that there is a 2% rate of errors or other deviations from the standard GT...AG motif in nuclear genes. This criterion can be used to eliminate 4921 genes from the overall database. Various tools are provided to enable generation of user-specific subsets of the EID. The EID distribution can be obtained from http://mcb.harvard.edu/gilbert/EID
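The splice-motif criterion mentioned in the abstract can be sketched as a check on the first and last two nucleotides of each intron. The example below assumes introns are supplied as plain DNA strings; it does not parse the actual EID record layout, which the abstract does not describe, and the function names are hypothetical.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with GT and ends with AG (the standard motif)."""
    intron = intron.upper()
    # Require at least 4 bases so the donor and acceptor dinucleotides do not overlap.
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def noncanonical_fraction(introns):
    """Fraction of introns that deviate from the GT...AG motif."""
    if not introns:
        return 0.0
    deviating = sum(1 for seq in introns if not has_canonical_splice_sites(seq))
    return deviating / len(introns)
```

A gene-level filter of the kind the abstract alludes to could then drop any gene with at least one deviating intron, although the exact rule the authors applied is not stated.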

Subjects

Databases, Factual , Exons , Introns , Proteins/genetics , Base Sequence , Humans , Molecular Sequence Data

Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns.

Fedorov, A; Cao, X; Saxonov, S; de Souza, S J; Roy, S W; Gilbert, W.

Proc Natl Acad Sci U S A ; 98(23): 13177-82, 2001 Nov 06.

Article in English | MEDLINE | ID: mdl-11687643

ABSTRACT

Do introns delineate elements of protein tertiary structure? This issue is crucial to the debate about the role and origin of introns. We present an analysis of the full set of proteins with known three-dimensional structures that have homologs with intron positions recorded in GenBank. A computer program was generated that maps on a reference sequence the positions of all introns in homologous genes. We have applied this program to a set of 665 nonredundant protein sequences with defined three-dimensional structures in the Protein Data Bank (PDB), which yielded 8,217 introns in 407 proteins. For the subset of proteins corresponding to ancient conserved regions (ACR), we find that there is a correlation of phase-zero introns with the boundary regions of modules and no correlation for the phase-one and phase-two positions. However, for a subset of proteins without prokaryotic counterparts (131 non-ACR proteins), a set of presumably modern proteins (or proteins that have diverged extremely far from any ancestral form), we do not find any correlation of phase-zero intron positions with three-dimensional structure. Furthermore, we find an anticorrelation of phase-one intron positions with module boundaries: they actually have a preference for the interior of modules. This finding is explicable as a preference for phase-one introns to lie in glycines, between G/G sequences, the preference for glycines being anticorrelated with the three-dimensional modules. We interpret this anticorrelation as a sign that a number of phase-one introns, and hence many modern introns, have been inserted into G/G "protosplice" sequences.
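The abstract refers to phase-zero, phase-one and phase-two introns without defining them. By the usual convention, an intron's phase is the number of coding nucleotides upstream of it, modulo 3: phase 0 falls between codons, phase 1 after the first base of a codon, and phase 2 after the second. The sketch below computes phases from a list of coding exon lengths; the function name `intron_phases` is hypothetical, and this is not the authors' mapping program.

```python
from itertools import accumulate


def intron_phases(coding_exon_lengths):
    """Phase of each intron, given the coding length of each exon in order.

    An intron follows every exon except the last, so n exons yield n - 1 phases.
    """
    # Running total of coding bases at the end of each exon; the final total
    # marks the end of the coding sequence, not an intron, so it is dropped.
    upstream = list(accumulate(coding_exon_lengths))[:-1]
    return [n % 3 for n in upstream]
```

For example, coding exon lengths of 9, 4 and 5 place the first intron after 9 bases (phase 0) and the second after 13 bases (phase 1).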

Subjects

Evolution, Molecular , Introns
