1.
BMC Genomics ; 12: 361, 2011 Jul 12.
Article in English | MEDLINE | ID: mdl-21749696

ABSTRACT

BACKGROUND: The Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTRs) on their mRNAs, have been shown to be abundant in archaea. However, current large-scale in silico analyses of initiation mechanisms in bacteria are mainly based on SD-led initiation rather than leaderless initiation. The study of leaderless genes in bacteria remains open, which leaves the understanding of translation initiation mechanisms in prokaryotes uncertain. RESULTS: Here, we study signals in the translation initiation regions of all genes in 953 bacterial and 72 archaeal genomes, and then attempt to construct an evolutionary scenario for leaderless genes in bacteria. Using an algorithm designed to identify multiple signals in the upstream regions of the genes of a genome, we classify all genes into SD-led, TA-led and atypical genes according to the category of the most probable signal in their upstream sequences. In particular, the occurrence of TA-like signals about 10 bp upstream of the translation initiation site (TIS) in bacteria most probably indicates leaderless genes. CONCLUSIONS: Our analysis reveals that leaderless genes are widespread, although not dominant, in a variety of bacteria. In Actinobacteria and Deinococcus-Thermus in particular, more than twenty percent of genes are leaderless. In closely related bacterial genomes, our results imply that changes of translation initiation mechanism between genes deriving from a common ancestor are linearly dependent on the phylogenetic relationship. Analysis of the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria tends to decrease in evolution.
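As a rough illustration of the three-way classification described in this abstract, the following Python sketch labels an upstream sequence as SD-led, TA-led or atypical. It is not the authors' multi-signal algorithm: the motif patterns (`SD_PATTERN`, `TA_PATTERN`), the distance windows, the rule that an SD hit takes priority, and the function name `classify_upstream` are all assumptions chosen for illustration.

```python
import re

# Hypothetical motif definitions (assumptions, not taken from the paper).
# Lookaheads let overlapping occurrences be found.
SD_PATTERN = re.compile(r"(?=(GGAGG|AGGAG|GGAG|GAGG))")
TA_PATTERN = re.compile(r"(?=(TA[ACGT]{3}T))")


def _distances(pattern, seq):
    """Yield the distance (nt) from the end of each motif hit to the TIS."""
    for hit in pattern.finditer(seq):
        motif_end = hit.start() + len(hit.group(1))
        yield len(seq) - motif_end


def classify_upstream(upstream):
    """Classify the sequence immediately upstream of a start codon.

    `upstream` is read 5'->3' and ends right before the start codon.
    The distance windows below are illustrative guesses.
    """
    seq = upstream.upper()
    # Assumed rule: an SD-like core 4-12 nt from the TIS wins first.
    if any(4 <= d <= 12 for d in _distances(SD_PATTERN, seq)):
        return "SD-led"
    # Assumed rule: a TA-like hexamer ending 4-10 nt from the TIS,
    # i.e. centred roughly 10 bp upstream.
    if any(4 <= d <= 10 for d in _distances(TA_PATTERN, seq)):
        return "TA-led"
    return "atypical"
```

A real classifier would score probabilistic signal models learned per genome rather than match fixed strings; the fixed patterns here only make the SD-led / TA-led / atypical distinction concrete.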


Subjects
Archaea/genetics , Bacteria/genetics , Molecular Evolution , Bacterial Genes/genetics , Genomics , Translational Peptide Chain Initiation/genetics , Algorithms , Base Sequence , Archaeal Genes/genetics , Molecular Sequence Annotation , Reproducibility of Results
2.
Nature ; 466(7305): 503-7, 2010 Jul 22.
Article in English | MEDLINE | ID: mdl-20622853

ABSTRACT

X-linked mental retardation (XLMR) is a complex human disease that causes intellectual disability. Causal mutations have been found in approximately 90 X-linked genes; however, molecular and biological functions of many of these genetically defined XLMR genes remain unknown. PHF8 (PHD (plant homeo domain) finger protein 8) is a JmjC domain-containing protein and its mutations have been found in patients with XLMR and craniofacial deformities. Here we provide multiple lines of evidence establishing PHF8 as the first mono-methyl histone H4 lysine 20 (H4K20me1) demethylase, with additional activities towards histone H3K9me1 and me2. PHF8 is located around the transcription start sites (TSS) of approximately 7,000 RefSeq genes and in gene bodies and intergenic regions (non-TSS). PHF8 depletion resulted in upregulation of H4K20me1 and H3K9me1 at the TSS and H3K9me2 in the non-TSS sites, respectively, demonstrating differential substrate specificities at different target locations. PHF8 positively regulates gene expression, which is dependent on its H3K4me3-binding PHD and catalytic domains. Importantly, patient mutations significantly compromised PHF8 catalytic function. PHF8 regulates cell survival in the zebrafish brain and jaw development, thus providing a potentially relevant biological context for understanding the clinical symptoms associated with PHF8 patients. Lastly, genetic and molecular evidence supports a model whereby PHF8 regulates zebrafish neuronal cell survival and jaw development in part by directly regulating the expression of the homeodomain transcription factor MSX1/MSXB, which functions downstream of multiple signalling and developmental pathways. Our findings indicate that an imbalance of histone methylation dynamics has a critical role in XLMR.


Subjects
Brain/embryology , Brain/enzymology , Head/embryology , Histone Demethylases/metabolism , Histones/metabolism , Transcription Factors/metabolism , Zebrafish Proteins/metabolism , Zebrafish/embryology , Animals , Biocatalysis , Brain/cytology , Catalytic Domain , Cell Cycle , Tumor Cell Line , Cell Survival , Intergenic DNA/genetics , Gene Expression Regulation , Histone Demethylases/genetics , Histones/chemistry , Homeodomain Proteins/genetics , Humans , Jaw/cytology , Jaw/embryology , Lysine/metabolism , X-Linked Intellectual Disability/enzymology , X-Linked Intellectual Disability/genetics , Methylation , Neurons/cytology , Neurons/enzymology , Genetic Promoter Regions , Transcription Factors/deficiency , Transcription Factors/genetics , Transcription Initiation Site , Zebrafish/metabolism , Zebrafish Proteins/genetics
3.
BMC Genomics ; 10: 552, 2009 Nov 22.
Article in English | MEDLINE | ID: mdl-19930606

ABSTRACT

BACKGROUND: The genome of the human pathogen uropathogenic Escherichia coli (UPEC) strain CFT073 was sequenced and published in 2002, a significant step in pathogenic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now outdated to some degree, because some essential genes associated with its virulence are missing or misannotated. We carried out a systematic reannotation, combining automated annotation tools with manual efforts, to provide a comprehensive understanding of virulence for the CFT073 genome. RESULTS: The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them are found in genomic island (GI) regions, and more than one fifth are located in virulence-related pathogenicity islands (PAIs). Furthermore, the translational initiation sites (TISs) of 341 genes were relocated, which resulted in a high-quality gene start annotation. In addition, 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous RNA genes (sRNAs) was updated from 6 in RefSeq to 46 in the reannotation. Based on the adjustments made in the reannotation, subsequent analyses were conducted through both general and case studies of new virulence factors or new virulence-associated genes that are crucial during the urinary tract infection (UTI) process, including invasion, colonization, nutrient uptake and population density control. Furthermore, miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation, including the nucleotide data, the original RefSeq annotation, and all reannotated results, is freely available via http://mech.ctb.pku.edu.cn/CFT073/. CONCLUSION: The reannotation presents a more comprehensive picture of the mechanisms of uropathogenicity of UPEC strain CFT073. The new genes change the view of its uropathogenicity in many respects, particularly through new genes in GI regions and new virulence-associated factors. The reannotation thus serves as an important source of new information about genomic structure and organization, and gene function. Moreover, we expect that the detailed analysis will facilitate studies exploring novel virulence mechanisms and help guide experimental design.


Subjects
Escherichia coli/genetics , Escherichia coli/pathogenicity , Bacterial Genome/genetics , Bacterial Genes , Genomic Islands , Genomics , Humans , Bacterial RNA/genetics , Virulence/genetics , Virulence Factors/genetics
4.
Bioinformatics ; 25(14): 1843-5, 2009 Jul 15.
Article in English | MEDLINE | ID: mdl-19389734

ABSTRACT

SUMMARY: We propose a tool named MetaTISA that aims to improve the TIS prediction of current gene-finders for metagenomes. The method employs a two-step strategy to predict translation initiation sites (TISs): it first clusters metagenomic fragments into phylogenetic groups and then predicts TISs independently for each group in an unsupervised manner. As evaluated on experimentally verified TISs, MetaTISA greatly improves the TIS prediction accuracy of current gene-finders. AVAILABILITY: The C++ source code is freely available under the GNU GPL via http://mech.ctb.pku.edu.cn/MetaTISA/.


Subjects
Genes , Genomics/methods , Translational Peptide Chain Initiation , Software , Phylogeny , DNA Sequence Analysis/methods
5.
Bioinformatics ; 25(1): 123-5, 2009 Jan 01.
Article in English | MEDLINE | ID: mdl-19015130

ABSTRACT

We report a new and simple method, TriTISA, for accurate prediction of the translation initiation sites (TISs) of microbial genomes. TriTISA classifies all candidate TISs into three categories based on evolutionary properties, and characterizes them in terms of Markov models. It then employs a Bayesian methodology to select the true TIS with an unsupervised, iterative procedure. Assessment on experimentally verified TIS data shows that TriTISA is overall better than all other state-of-the-art methods for microbial genome TIS prediction. In particular, TriTISA is shown to have a robust accuracy independent of the quality of the initial annotation. AVAILABILITY: The C++ source code is freely available under the GNU GPL via http://mech.ctb.pku.edu.cn/protisa/TriTISA.
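The combination of per-category Markov models with a Bayesian choice among categories can be sketched as follows. This is a minimal, assumption-laden illustration rather than TriTISA itself: the first-order model, the pseudocount, the category labels (`true`, `upstream_false`, `downstream_false`), the toy training sequences and all function names are hypothetical, and the iterative re-estimation step is omitted.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train_markov(seqs, pseudo=1.0):
    """Estimate first-order transition probabilities with pseudocounts."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1.0
    model = {}
    for a in BASES:
        total = sum(counts[a][b] + pseudo for b in BASES)
        model[a] = {b: (counts[a][b] + pseudo) / total for b in BASES}
    return model


def log_likelihood(model, s):
    """Log-probability of the transitions in s under a trained model."""
    return sum(math.log(model[a][b]) for a, b in zip(s, s[1:]))


def posterior(models, priors, s):
    """Bayes' rule over categories, computed stably in log space."""
    logs = {k: math.log(priors[k]) + log_likelihood(models[k], s)
            for k in models}
    top = max(logs.values())
    norm = sum(math.exp(v - top) for v in logs.values())
    return {k: math.exp(v - top) / norm for k, v in logs.items()}


# Toy training windows (invented for illustration only).
models = {
    "true": train_markov(["AGGAGGAGG", "GGAGGAGGA"]),
    "upstream_false": train_markov(["TTTTTTTTT", "ATATATATA"]),
    "downstream_false": train_markov(["CCCCCCCCC", "ACACACACA"]),
}
priors = {name: 1.0 / len(models) for name in models}
scores = posterior(models, priors, "AGGAGG")
```

In an iterative, unsupervised setting one would presumably re-train the models on the currently selected candidates and repeat until the selections stop changing; that loop is left out here.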


Subjects
Computational Biology/methods , Bacterial Genome/genetics , Protein Biosynthesis/genetics , Software , Base Sequence
6.
BMC Bioinformatics ; 9: 160, 2008 Mar 25.
Article in English | MEDLINE | ID: mdl-18366730

ABSTRACT

BACKGROUND: Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. RESULTS: Based on a homogeneity assumption that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for large-scale quantitative assessment of the reliability of TIS annotations for any prokaryotic genome. The method models a positional weight matrix (PWM) of aligned sequences around predicted TISs as a linear combination of three elementary PWMs, one for true TISs and the other two for false TISs. The three elementary PWMs are obtained using a reference set with highly reliable TIS predictions. A generalized least squares estimator determines the weight of the true TIS in the observed PWM, from which the accuracy of the prediction is derived. The validity of the method and the limitations of its assumptions are explicitly addressed by testing on experimentally verified TISs with reference sets of varying accuracy. The method is applied to estimate the accuracy of TIS annotations provided by public databases such as RefSeq and ProTISA and by programs such as EasyGene, GeneMarkS, Glimmer 3 and TiCo. It is shown that RefSeq's TIS prediction is significantly less accurate than two recent predictors, TiCo and ProTISA. We show convincing evidence of two general biases in the RefSeq annotation, i.e. over-annotating the longest open reading frame (LORF) and under-annotating the ATG start codon. Finally, we have established a new TIS database, SupTISA, based on the best prediction of all the predictors; SupTISA has achieved an average accuracy of 92% over all 532 complete genomes. CONCLUSION: Large-scale computational evaluation of TIS annotation has been achieved. A new TIS database much better than RefSeq has been constructed, and it provides a valuable resource for further TIS studies.
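The idea of expressing an observed PWM as a weighted sum of three elementary PWMs can be illustrated with a small pure-Python sketch. It substitutes ordinary least squares (solved through the normal equations) for the generalized least squares estimator named in the abstract, flattens each PWM into a plain list, and uses invented toy matrices; the function names and all numbers are assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mixture_weights(observed, components):
    """Ordinary least-squares weights w minimising
    ||observed - sum_k w[k] * components[k]||^2 (normal equations)."""
    k = len(components)
    gram = [[sum(x * y for x, y in zip(components[i], components[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(components[i], observed))
           for i in range(k)]
    return solve_linear(gram, rhs)


# Toy elementary PWMs: 2 positions x 4 bases (A, C, G, T), flattened.
pwm_true = [0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7, 0.1]
pwm_false_a = [0.25, 0.25, 0.25, 0.25, 0.4, 0.2, 0.2, 0.2]
pwm_false_b = [0.1, 0.4, 0.4, 0.1, 0.25, 0.25, 0.25, 0.25]

# Synthetic "observed" PWM: 80% true TISs, 15% and 5% false TISs.
observed = [0.8 * t + 0.15 * a + 0.05 * b
            for t, a, b in zip(pwm_true, pwm_false_a, pwm_false_b)]

weights = mixture_weights(observed, [pwm_true, pwm_false_a, pwm_false_b])
```

Under this reading, the first weight plays the role of the estimated annotation accuracy. A generalized least squares fit would additionally weight the residuals by an error covariance, which this sketch ignores.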


Subjects
Algorithms , Chromosome Mapping/methods , Initiator Codon/genetics , Genetic Databases , Prokaryotic Cells/physiology , Sequence Alignment/methods , DNA Sequence Analysis/methods , Base Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity
7.
Nucleic Acids Res ; 36(Database issue): D114-9, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17942412

ABSTRACT

Correct annotation of translation initiation sites (TISs) is essential for both experimental and bioinformatics studies of the prokaryotic translation initiation mechanism, as well as for understanding gene regulation and gene structure. Here we describe a comprehensive database, ProTISA, which collects TISs confirmed through a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, literature, conserved domain hits and sequence alignment between orthologous genes. Moreover, by combining the predictions from our recently developed TIS post-processor, ProTISA provides a refined annotation for the public database RefSeq. Furthermore, the database annotates the potential regulatory signals associated with translation initiation in the region upstream of the TIS. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at http://mech.ctb.pku.edu.cn/protisa.


Subjects
Initiator Codon , Nucleic Acid Databases , Archaeal Genome , Bacterial Genome , Translational Peptide Chain Initiation , Nucleic Acid Regulatory Sequences , Nucleic Acid Databases/statistics & numerical data , Genomics , Internet , User-Computer Interface
8.
BMC Bioinformatics ; 8: 97, 2007 Mar 16.
Article in English | MEDLINE | ID: mdl-17367537

ABSTRACT

BACKGROUND: Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It remains worthwhile to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. RESULTS: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in an iterative program. The iterations enable an unsupervised learning process that obtains a set of genome-specific parameters for the gene structure before the genes are predicted. CONCLUSION: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while predicting TISs more accurately than the existing gene finders and the current GenBank annotation.


Subjects
Algorithms , Chromosome Mapping/methods , Archaeal Genome/genetics , Bacterial Genome/genetics , Open Reading Frames/genetics , Prokaryotic Initiation Factors/genetics , Proteome/genetics , Artificial Intelligence , Automated Pattern Recognition/methods , Software
9.
Bioinformatics ; 20(18): 3308-17, 2004 Dec 12.
Article in English | MEDLINE | ID: mdl-15247104

ABSTRACT

MOTIVATION: At present, computational gene identification methods for microbial genomes predict the verified translation termination site (3' end) with high accuracy, but the translation initiation site (TIS, 5' end) with much lower accuracy. The latter is important for analyzing and understanding the putative protein of a gene and the regulatory machinery of translation. Improving the accuracy of TIS prediction is one of the remaining open problems. RESULTS: In this paper, we develop a four-component statistical model to describe the TIS of prokaryotic genes. The model incorporates several features with biological meaning, including the correlation between the translation termination site and the TIS of genes, the sequence content around the start codon, the sequence content of the consensus signal related to ribosomal binding sites (RBSs), and the correlation between the TIS and the upstream consensus signal. An entirely unsupervised training system is constructed, which takes as input a set of coding open reading frames (ORFs) annotated by any gene finder, and gives as output a set of organism-specific parameters (without any prior knowledge or empirical constants and formulas). The novel algorithm is tested on a set of reliable datasets of genes from Escherichia coli and Bacillus subtilis. MED-Start correctly predicts 95.4% of the start sites of 195 experimentally confirmed E. coli genes and 96.6% of 58 reliable B. subtilis genes. Moreover, the test results indicate that the algorithm gives higher accuracy for more reliable datasets, and is robust to variation in gene length. MED-Start may be used as a postprocessor for a gene finder. After processing by our program, the improvement in the gene start prediction of a gene finder is remarkable, e.g. the accuracy of TISs predicted by MED 1.0 increases from 61.7 to 91.5% for 854 verified E. coli genes, while that of GLIMMER 2.02 increases from 63.2 to 92.0% for the same dataset. These results show that our algorithm is one of the most accurate methods to identify TISs of prokaryotic genomes. AVAILABILITY: The program MED-Start can be accessed through the website of CTB at Peking University: http://ctb.pku.edu.cn/main/SheGroup/MED_Start.htm.
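The accuracy figures quoted in this abstract compare predicted start sites with verified ones. A minimal sketch of such an evaluation is shown below; it assumes that genes are matched by their strand and stop coordinate (the 3' end) and that a prediction counts as correct only when its start coordinate is identical to the verified one. The data layout, the function name `tis_accuracy` and the coordinates are hypothetical.

```python
def tis_accuracy(predicted, verified):
    """Fraction of verified genes whose predicted start is exactly right.

    Both arguments map a gene key, here (strand, stop_position), to the
    start coordinate. Verified genes that the gene finder did not
    predict at all (no matching 3' end) are excluded from the ratio.
    """
    shared = [key for key in verified if key in predicted]
    if not shared:
        return 0.0
    correct = sum(1 for key in shared if predicted[key] == verified[key])
    return correct / len(shared)


# Invented coordinates for illustration only.
verified = {("+", 1500): 300, ("+", 4200): 3100, ("-", 900): 1800}
before = {("+", 1500): 240, ("+", 4200): 3100, ("-", 900): 1830}
after = {("+", 1500): 300, ("+", 4200): 3100, ("-", 900): 1830}

accuracy_before = tis_accuracy(before, verified)
accuracy_after = tis_accuracy(after, verified)
```

Comparing `accuracy_before` with `accuracy_after` mirrors the way a postprocessor is judged: the same verified gene set is scored against the gene finder's original starts and against the relocated ones.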


Subjects
Algorithms , Chromosome Mapping/methods , Bacterial Genome , Genetic Models , Statistical Models , Protein Biosynthesis/genetics , Transcription Initiation Site , Base Sequence , Initiator Codon/genetics , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , DNA Sequence Analysis/methods , Software