1.
Genet Mol Res ; 10(2): 1188-99, 2011 Jun 21.
Article in English | MEDLINE | ID: mdl-21732283

ABSTRACT

The current intense production of biological data by sequencing techniques has created an ever-growing volume of unanalyzed data. We reevaluated data produced by the guarana (Paullinia cupana) transcriptome sequencing project to identify cDNA clones with complete coding sequences (full-length clones) and complete sequences of genes of biotechnological interest, contributing to the knowledge of the biological characteristics of this organism. We analyzed 15,490 ESTs of guarana in search of clones with complete coding regions. A total of 12,402 sequences were analyzed using BLAST, and 4,697 full-length clones were identified, responsible for the production of 2,297 different proteins. Eighty-four clones were identified as full-length for N-methyltransferase, and 18 were sequenced in both directions to obtain the complete sequence and confirm the in silico search for full-length clones. Phylogenetic analyses were made with the complete sequences of three clones, which showed only 0.017% dissimilarity; these are phylogenetically close to the caffeine synthase of Theobroma cacao. The search for full-length clones allowed the identification of numerous clones that had the complete coding region, demonstrating that it is an efficient and useful tool for biological data mining. Sequencing the complete coding region of the identified full-length clones corroborated the data from the in silico search, supporting its efficiency and utility.


Subjects
Expressed Sequence Tags , Paullinia/genetics , Amino Acid Sequence , Base Sequence , Codon , DNA Primers , Complementary DNA/genetics , Plant Genes , Molecular Sequence Data , Amino Acid Sequence Homology , Nucleic Acid Sequence Homology
2.
Genet Mol Res ; 7(3): 910-24, 2008 Sep 30.
Article in English | MEDLINE | ID: mdl-18949709

ABSTRACT

A procedure to recruit members to enlarge protein family databases is described here. The procedure makes use of UniRef50 clusters produced by UniProt. Current family entries are used to recruit additional members based on the UniRef50 clusters to which they belong. Only those additional UniRef50 members that are not fragments and whose length is within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to contain only genomes from selected clades. We used the COG database, which is used for genome annotation and for studies of phylogenetics and gene evolution, as a model. To validate the method, a UniRef-Enriched COG0151 (UECOG) was tested with distinct procedures to compare recruited members with the recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage. We observed that the first four agree in their validations. Presently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria and its subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and also considerably increases the number of species.
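
The recruitment rule described in this abstract (same UniRef50 cluster, not a fragment, length within a restricted range of the recruiting entry) can be illustrated with a minimal sketch. The `Member` record, the `recruit` function, and the 0.8 to 1.2 length-ratio bounds are assumptions made for illustration; the abstract does not state the actual range, and the final clade-filtering step is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical record for one protein sequence."""
    accession: str
    cluster_id: str   # UniRef50 cluster the sequence belongs to
    length: int       # sequence length in residues
    is_fragment: bool


def recruit(seeds, candidates, min_ratio=0.8, max_ratio=1.2):
    """Return candidates that share a cluster with a seed entry, are not
    fragments, and whose length lies within [min_ratio, max_ratio] times
    the length of at least one seed in that cluster.

    The ratio bounds are illustrative assumptions, not published values.
    """
    seeds_by_cluster = {}
    for seed in seeds:
        seeds_by_cluster.setdefault(seed.cluster_id, []).append(seed)
    seed_accessions = {seed.accession for seed in seeds}

    recruited = []
    for cand in candidates:
        # Skip entries already in the family and any fragments.
        if cand.accession in seed_accessions or cand.is_fragment:
            continue
        for seed in seeds_by_cluster.get(cand.cluster_id, []):
            if min_ratio * seed.length <= cand.length <= max_ratio * seed.length:
                recruited.append(cand)
                break  # one matching recruiter is enough
    return recruited


if __name__ == "__main__":
    seeds = [Member("S1", "UniRef50_A", 400, False)]
    candidates = [
        Member("C1", "UniRef50_A", 410, False),  # same cluster, similar length
        Member("C2", "UniRef50_A", 150, False),  # too short
        Member("C3", "UniRef50_A", 405, True),   # fragment
        Member("C4", "UniRef50_B", 400, False),  # different cluster
    ]
    print([m.accession for m in recruit(seeds, candidates)])
```

In this toy input only `C1` passes all three checks; the accessions and cluster identifiers are invented for the example.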


Subjects
Computational Biology/methods , Protein Databases , Reproducibility of Results
3.
Genet Mol Res ; 6(4): 937-45, 2007 Oct 05.
Article in English | MEDLINE | ID: mdl-18058714

ABSTRACT

Proteomics corresponds to the identification and quantitative analysis of proteins expressed in different conditions or life stages of a cell or organism. Methods used in proteomics analysis include mainly chromatography, two-dimensional electrophoresis and mass spectrometry. Data generated in proteomics analysis vary significantly, and to identify a protein it is often necessary to perform a series of experiments and compare their results to those found in proteomics databases. Existing proteomics databases are usually related to only one type of experiment or represent processed results, not raw data. Therefore, proteomics researchers frequently have to resort to several data repositories in order to perform the identification. In this paper, we propose an integrated proteomics and transcriptomics database that stores raw and processed data, which are indexed so that they can be retrieved together or individually. The proposed database, dubbed BNDb for Biomolecules Nucleus Database, is implemented using a MySQL server and is being used to store data from the parasite Schistosoma mansoni, the scorpion Tityus serrulatus and the spider Phoneutria nigriventer. The database construction uses a relational approach and data indexes. The proposed data model uses groups of tables for each data subtype, which store details regarding the experimental procedure as well as raw data, analysis results and associated publications. BNDb also stores publicly available transcriptomics data, which are associated with identifications performed on new samples. By using BNDb, we expect not only to contribute to proteomics research but also to provide a useful service for the scientific community.
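
The relational layout outlined in this abstract (a group of tables holding experimental procedure, raw data, analysis results and publications, indexed so that raw and processed data can be retrieved together or individually) might look roughly like the sketch below. Every table and column name is hypothetical, since the abstract does not give the BNDb schema, and Python's built-in `sqlite3` stands in for the MySQL server purely to keep the example self-contained.

```python
import sqlite3

# Hypothetical table group for one data subtype; not the actual BNDb schema.
SCHEMA = """
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    organism TEXT NOT NULL,
    method TEXT NOT NULL,
    procedure_notes TEXT
);
CREATE TABLE raw_data (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    payload BLOB NOT NULL
);
CREATE TABLE analysis_result (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    protein_name TEXT NOT NULL
);
CREATE TABLE publication (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    citation TEXT NOT NULL
);
-- Indexes on the shared key support retrieval by experiment.
CREATE INDEX idx_raw_experiment ON raw_data(experiment_id);
CREATE INDEX idx_result_experiment ON analysis_result(experiment_id);
"""


def build_db():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def fetch_together(conn, experiment_id):
    """Return (raw payloads, identified protein names) for one experiment."""
    raw = conn.execute(
        "SELECT payload FROM raw_data WHERE experiment_id = ?",
        (experiment_id,),
    ).fetchall()
    results = conn.execute(
        "SELECT protein_name FROM analysis_result WHERE experiment_id = ?",
        (experiment_id,),
    ).fetchall()
    return [row[0] for row in raw], [row[0] for row in results]


if __name__ == "__main__":
    conn = build_db()
    cur = conn.execute(
        "INSERT INTO experiment (organism, method) VALUES (?, ?)",
        ("Schistosoma mansoni", "mass spectrometry"),
    )
    exp_id = cur.lastrowid
    conn.execute(
        "INSERT INTO raw_data (experiment_id, payload) VALUES (?, ?)",
        (exp_id, b"placeholder-spectrum-bytes"),
    )
    conn.execute(
        "INSERT INTO analysis_result (experiment_id, protein_name) VALUES (?, ?)",
        (exp_id, "placeholder protein"),
    )
    print(fetch_together(conn, exp_id))
```

Keying both the raw and the processed tables on the experiment identifier is one simple way to let either kind of record be queried alone or jointly; the inserted rows are placeholders, not real data.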


Subjects
Nucleic Acid Databases , Protein Databases , Proteomics/methods , Genetic Transcription , Animals , Database Management Systems , User-Computer Interface
4.
Genet. mol. res. (Online) ; 6(4): 937-945, 2007. illus., tab.
Article in English | LILACS | ID: lil-520055

ABSTRACT

Proteomics corresponds to the identification and quantitative analysis of proteins expressed in different conditions or life stages of a cell or organism. Methods used in proteomics analysis include mainly chromatography, two-dimensional electrophoresis and mass spectrometry. Data generated in proteomics analysis vary significantly, and to identify a protein it is often necessary to perform a series of experiments and compare their results to those found in proteomics databases. Existing proteomics databases are usually related to only one type of experiment or represent processed results, not raw data. Therefore, proteomics researchers frequently have to resort to several data repositories in order to perform the identification. In this paper, we propose an integrated proteomics and transcriptomics database that stores raw and processed data, which are indexed so that they can be retrieved together or individually. The proposed database, dubbed BNDb for Biomolecules Nucleus Database, is implemented using a MySQL server and is being used to store data from the parasite Schistosoma mansoni, the scorpion Tityus serrulatus and the spider Phoneutria nigriventer. The database construction uses a relational approach and data indexes. The proposed data model uses groups of tables for each data subtype, which store details regarding the experimental procedure as well as raw data, analysis results and associated publications. BNDb also stores publicly available transcriptomics data, which are associated with identifications performed on new samples. By using BNDb, we expect not only to contribute to proteomics research but also to provide a useful service for the scientific community.


Subjects
Animals , Nucleic Acid Databases , Protein Databases , Proteomics/methods , Genetic Transcription , Database Management Systems , User-Computer Interface
5.
Genet. mol. res. (Online) ; 2(1): 169-177, Mar. 2003.
Article in English | LILACS | ID: lil-417613

ABSTRACT

Microorganisms with large genomes are commonly the subjects of single-round partial sequencing of cDNA, generating expressed sequence tags (ESTs). Usually there is a long delay between gene discovery by EST projects and submission of amino acid sequences to public databases. We analyzed the relationship between available ESTs and protein sequences and used the sequences available in the secondary database Clusters of Orthologous Groups (COG) to investigate ESTs from eight microorganisms of medical and/or economic relevance, selecting candidate ESTs that may be further pursued for protein characterization. The organisms chosen were Paracoccidioides brasiliensis, Dictyostelium discoideum, Fusarium graminearum, Plasmodium yoelii, Magnaporthe grisea, Emericella nidulans, Chlamydomonas reinhardtii and Eimeria tenella, which have more than 10,000 ESTs available in dbEST. A total of 77,114 protein sequences from COG were used, corresponding to 3,201 distinct genes. At least 212 of these were capable of identifying candidate ESTs for further studies (E. tenella). This number was extended to over 700 candidate ESTs (C. reinhardtii, F. graminearum). Remarkably, even the organism that presents the highest number of ESTs corresponding to known proteins, P. yoelii, showed a considerable number of candidate ESTs for protein characterization (477). For some organisms, such as P. brasiliensis, M. grisea and F. graminearum, bioinformatics has allowed for automatic annotation of up to about 20 of the ESTs that did not correspond to proteins already characterized in the organism. In conclusion, 4,093 ESTs from these eight organisms that are homologous to COG genes were selected as candidates for protein characterization.


Subjects
Animals , Protein Databases , Expressed Sequence Tags , Protein Sequence Analysis , Chlamydomonas reinhardtii/genetics , Dictyostelium/genetics , Eimeria tenella/genetics , Emericella/genetics , Fusarium/genetics , Genome , Magnaporthe/genetics , Paracoccidioides/genetics , Plasmodium yoelii/genetics , Proteins/genetics , Amino Acid Sequence Homology