1.
BioTech (Basel) ; 11(3)2022 Jul 30.
Article in English | MEDLINE | ID: mdl-35997339

ABSTRACT

DNA sequencers output a large set of very long biological data strings that we should persist in databases rather than basic text file systems. Many different data models and database management systems (DBMS) may deal with both storage and efficiency issues regarding genomic datasets. Specifically, there is a need for handling strings with variable sizes while keeping their biological meaning. Relational database management systems (RDBMS) provide several data types that could be further explored for the genomics context. In addition, they enforce integrity and consistency and enable good abstractions for more conventional data. We propose the relational text data type to represent and manipulate biological sequences and their derivatives. We present a logical schema for representing the core biological information, which may be inferred from a given biological conceptual data schema and the corresponding function manipulations. We implement and evaluate these stored functions in an actual RDBMS for both efficacy and efficiency. We show that it is possible to enforce basic and complex requirements for the genomic domain. We claim that the well-established relational text data type in RDBMS may appropriately handle the representation and persistence of biological sequences. We base our approach on the idea of domain-specific abstract data types that can store data with semantically defined functions while hiding those details from non-technical end-users.
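The idea of a text column paired with domain-specific functions can be illustrated with a minimal sketch. It is not the authors' schema or RDBMS: it assumes SQLite through Python's standard `sqlite3` module, and the table name `dna_sequence`, the column `residues`, and the functions `reverse_complement` and `gc_content` are hypothetical names chosen for illustration.

```python
import sqlite3

# Translation table used to complement DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a TEXT sequence column.

    The CHECK constraint stands in for a domain integrity rule:
    only non-empty strings over the alphabet A, C, G, T, N are accepted.
    """
    conn = sqlite3.connect(":memory:")
    # Register the Python functions so they can be called from SQL.
    conn.create_function("reverse_complement", 1, reverse_complement)
    conn.create_function("gc_content", 1, gc_content)
    conn.execute(
        "CREATE TABLE dna_sequence ("
        " id INTEGER PRIMARY KEY,"
        " residues TEXT NOT NULL"
        " CHECK (residues <> '' AND residues NOT GLOB '*[^ACGTN]*'))"
    )
    return conn
```

With this in place, a query such as `SELECT reverse_complement(residues) FROM dna_sequence` keeps the sequence logic behind a named function, which is the kind of abstraction the abstract describes for non-technical end-users.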

2.
Evol Bioinform Online ; 14: 1176934318797351, 2018.
Article in English | MEDLINE | ID: mdl-30210232

ABSTRACT

β-lactamases, the enzymes responsible for resistance to β-lactam antibiotics, are widespread among prokaryotic genera. However, current β-lactamase classification schemes do not represent their present diversity. Here, we propose a workflow to identify and classify β-lactamases. Initially, a set of curated sequences was used as a model for the construction of profile Hidden Markov Models (HMMs), specific for each β-lactamase class. An extensive, nonredundant set of β-lactamase sequences was constructed from 7 different resistance protein databases to test the methodology. The profile HMMs were improved for their specificity and sensitivity and then applied to fully assembled genomes. Five hierarchical classification levels are described, and a new class of β-lactamases with fused domains is proposed. Our profile HMMs provide a better annotation of β-lactamases, with classes and subclasses defined by objective criteria such as sequence similarity. This classification offers a solid basis for studies on the diversity, dispersion, prevalence, and evolution of the different classes and subclasses of this critical enzymatic activity.

3.
BMC Genomics ; 11: 610, 2010 Oct 29.
Article in English | MEDLINE | ID: mdl-21034488

ABSTRACT

BACKGROUND: Trypanosoma cruzi is the etiological agent of Chagas' disease, an endemic infection that causes thousands of deaths every year in Latin America. Therapeutic options remain inefficient, demanding the search for new drugs and/or new molecular targets. Such efforts can focus on proteins that are specific to the parasite, but analogous enzymes and enzymes with a three-dimensional (3D) structure sufficiently different from the corresponding host proteins may represent equally interesting targets. In order to find these targets we used the workflows MHOLline and AnEnΠ obtaining 3D models from homologous, analogous and specific proteins of Trypanosoma cruzi versus Homo sapiens. RESULTS: We applied genome wide comparative modelling techniques to obtain 3D models for 3,286 predicted proteins of T. cruzi. In combination with comparative genome analysis to Homo sapiens, we were able to identify a subset of 397 enzyme sequences, of which 356 are homologous, 3 analogous and 38 specific to the parasite. CONCLUSIONS: In this work, we present a set of 397 enzyme models of T. cruzi that can constitute potential structure-based drug targets to be investigated for the development of new strategies to fight Chagas' disease. The strategies presented here support the concept of structural analysis in conjunction with protein functional analysis as an interesting computational methodology to detect potential targets for structure-based rational drug design. For example, 2,4-dienoyl-CoA reductase (EC 1.3.1.34) and triacylglycerol lipase (EC 3.1.1.3), classified as analogous proteins in relation to H. sapiens enzymes, were identified as new potential molecular targets.


Subject(s)
Antiparasitic Agents/therapeutic use , Chagas Disease/drug therapy , Models, Molecular , Protozoan Proteins/chemistry , Sequence Homology, Amino Acid , Structural Homology, Protein , Trypanosoma cruzi/metabolism , 3-Hydroxyacyl CoA Dehydrogenases/metabolism , Amino Acid Sequence , Antiparasitic Agents/pharmacology , Chagas Disease/parasitology , Databases, Protein , Humans , Molecular Sequence Data , Protozoan Proteins/classification , Protozoan Proteins/metabolism , Species Specificity , Trypanosoma cruzi/drug effects , Trypanosoma cruzi/enzymology
4.
Genomics Insights ; 3: 29-56, 2010.
Article in English | MEDLINE | ID: mdl-26217103

ABSTRACT

We report here on the characterization of a cDNA library from seeds of Jatropha curcas L. at three stages of fruit maturation before yellowing. We sequenced a total of 2200 clones and obtained a set of 931 non-redundant sequences (unigenes) after trimming and quality control, i.e., 140 contigs and 791 singlets with PHRED quality ≥10. We found low levels of sequence redundancy and extensive metabolic coverage by homology comparison to GO. After comparison of 5841 non-redundant ESTs from a total of 13193 reads from GenBank with KEGG, we identified tags with nucleotide variations among J. curcas accessions for genes of fatty acid, terpene, alkaloid, quinone and hormone pathways of biosynthesis. More specifically, the expression level of four genes (palmitoyl-acyl carrier protein thioesterase, 3-ketoacyl-CoA thiolase B, lysophosphatidic acid acyltransferase and geranyl pyrophosphate synthase) measured by real-time PCR proved to be significantly different between leaves and fruits. Since the nucleotide polymorphism of these tags is associated with a higher level of gene expression in fruits compared to leaves, we propose this approach to speed up the search for quantitative traits in selective breeding of J. curcas. We also discuss its potential utility for the selective breeding of economically important traits in J. curcas.

5.
Mutat Res ; 683(1-2): 43-9, 2010 Jan 05.
Article in English | MEDLINE | ID: mdl-19909761

ABSTRACT

N-Acetyltransferase 2 (NAT2) metabolizes a variety of xenobiotics that includes many drugs, chemicals and carcinogens. This enzyme is genetically variable in human populations and polymorphisms in the NAT2 gene have been associated with drug toxicity and efficacy as well as cancer susceptibility. Here, we have focused on the identification of NAT2 variants in Brazilian individuals from two different regions, Rio de Janeiro and Goiás, by direct sequencing, and on the characterization of new haplotypes after cloning and re-sequencing. Upon analysis of DNA samples from 404 individuals, six new SNPs (c.29T>C, c.152G>T, c.203G>A, c.228C>T, c.458C>T and c.600A>G) and seven new NAT2 alleles were identified with different frequencies in Rio de Janeiro and Goiás. All new SNPs were found as singletons (observed only once in 808 genes) and were confirmed by three independent technical replicates. Molecular modeling and structural analysis suggested that the p.Gly51Val variant may have an important effect on substrate recognition by NAT2. We also observed that the amino acid change p.Cys68Tyr would affect acetylating activity due to the resulting geometric restrictions and incompatibility of the functional group in the Tyr side chain with the admitted chemical mechanism for catalysis by NATs. Moreover, other variants, such as p.Thr153Ile, p.Thr193Met, p.Pro228Leu and p.Val280Met, may lead to the presence of hydrophobic residues on the NAT2 surface involved in protein aggregation and/or targeted degradation. Finally, the new alleles NAT2*6H and NAT2*5N, which showed the highest frequency in the Brazilian populations considered in this study, may code for slow activity. Functional studies are needed to clarify the mechanisms by which new SNPs interfere with acetylation.
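The relation between the coding-sequence positions and the protein-level changes quoted above can be checked with a little arithmetic: nucleotide position n of the coding sequence falls in codon (n - 1) // 3 + 1, and 404 diploid individuals carry 808 gene copies. The sketch below only illustrates that bookkeeping; the function names are hypothetical and it handles simple substitutions only.

```python
import re

# Matches simple coding substitutions such as "c.152G>T".
_HGVS_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_of(variant: str) -> tuple[int, int]:
    """Return (codon number, position within the codon, 1 to 3)."""
    match = _HGVS_SUBSTITUTION.match(variant)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {variant!r}")
    position = int(match.group(1))
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


def singleton_frequency(individuals: int) -> float:
    """Frequency of an allele seen once among diploid individuals."""
    return 1 / (2 * individuals)


# c.152G>T lands in codon 51 (p.Gly51Val), c.203G>A in codon 68
# (p.Cys68Tyr) and c.458C>T in codon 153 (p.Thr153Ile).
for variant in ("c.152G>T", "c.203G>A", "c.458C>T"):
    print(variant, codon_of(variant))
```

A singleton among 404 individuals therefore has an observed frequency of 1/808, which is what "observed only once in 808 genes" expresses.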


Subject(s)
Arylamine N-Acetyltransferase/chemistry , Arylamine N-Acetyltransferase/genetics , Haplotypes/genetics , Models, Molecular , Polymorphism, Single Nucleotide/genetics , Tuberculosis, Pulmonary/genetics , Acetylation , Brazil , Case-Control Studies , Humans , Molecular Structure , Sequence Analysis , Tuberculosis, Pulmonary/enzymology
6.
BMC Bioinformatics ; 9: 544, 2008 Dec 17.
Article in English | MEDLINE | ID: mdl-19091081

ABSTRACT

BACKGROUND: Enzymes are responsible for the catalysis of the biochemical reactions in metabolic pathways. Analogous enzymes are able to catalyze the same reactions, but they present no significant sequence similarity at the primary level, and possibly different tertiary structures as well. They are thought to have arisen as the result of independent evolutionary events. A detailed study of analogous enzymes may reveal new catalytic mechanisms, add information about the origin and evolution of biochemical pathways and disclose potential targets for drug development. RESULTS: In this work, we have constructed and implemented a new approach, AnEnPi (the Analogous Enzyme Pipeline), using a combination of bioinformatics tools like BLAST, HMMer, and in-house scripts, to assist in the identification, annotation, comparison and study of analogous and homologous enzymes. The algorithm for the detection of analogy is based i) on the construction of groups of homologous enzymes and ii) on the identification of cases where a given enzymatic activity is performed by two or more proteins without significant similarity between their primary structures. We applied this approach to a dataset obtained from KEGG comprising all annotated enzymes, which resulted in the identification of 986 EC classes where putative analogy was detected (40.5% of all EC classes). AnEnPi is of considerable value in the construction of initial datasets that can be further curated, particularly in gene and genome annotation, in studies involving molecular evolution and metabolism and in the identification of new potential drug targets. CONCLUSION: AnEnPi is an efficient tool for detection and annotation of analogous enzymes and other enzymes in whole genomes. It is available for academic use at: http://bioinfo.pdtis.fiocruz.br/AnEnPi/
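The two-step logic described in the RESULTS (group homologous enzymes, then flag activities carried out by more than one group) can be sketched as follows. This is not the AnEnPi code: it assumes that pairs of proteins with significant similarity have already been computed elsewhere (for instance from BLAST hits above some score cutoff), and it uses a plain union-find clustering as a stand-in for the pipeline's own grouping step. All function and variable names are hypothetical.

```python
from collections import defaultdict


def homolog_groups(proteins, similar_pairs):
    """Cluster proteins linked by significant similarity (union-find)."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return list(groups.values())


def putative_analogy(ec_to_proteins, similar_pairs):
    """Return the EC classes whose proteins form two or more groups."""
    flagged = {}
    for ec, proteins in ec_to_proteins.items():
        members = set(proteins)
        # Only similarity links between members of this EC class count.
        pairs = [(a, b) for a, b in similar_pairs
                 if a in members and b in members]
        groups = homolog_groups(members, pairs)
        if len(groups) >= 2:
            flagged[ec] = groups
    return flagged
```

In a toy input where class "EC-A" holds proteins p1, p2 and p3 and only p1 and p2 are similar, "EC-A" is flagged because p3 forms a second, unrelated group performing the same activity.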


Subject(s)
Computational Biology/methods , Enzymes/chemistry , Algorithms , Animals , Catalysis , Cluster Analysis , Data Interpretation, Statistical , Databases, Protein , Drug Design , Genome , Humans , Leishmania major , Models, Biological , Protein Conformation , Software
7.
BMC Bioinformatics ; 9: 366, 2008 Sep 09.
Article in English | MEDLINE | ID: mdl-18782453

ABSTRACT

BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. CONCLUSION: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at http://bioinfo.pdtis.fiocruz.br/ReRep/.


Subject(s)
Algorithms , Chromosome Mapping/methods , Genome/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Software , Base Sequence , Molecular Sequence Data
8.
Mutat Res ; 624(1-2): 31-40, 2007 Nov 01.
Article in English | MEDLINE | ID: mdl-17509624

ABSTRACT

Arylamine N-acetyltransferase 2 is the main enzyme responsible for the metabolism of isoniazid into hepatotoxic intermediates, and the severity of hepatotoxicity has been attributed to genetic variability in the NAT2 gene. The main goal of this study was to describe the genetic profile of the NAT2 gene in individuals from two different regions of Brazil: Rio de Janeiro and Goiás States. Therefore, after preparation of DNA samples from 404 individuals, genotyping of the coding region of NAT2 was performed by direct PCR sequencing. Thirteen previously described SNPs were detected in these Brazilian populations, of which seven (191 G>A, 282 C>T, 341 T>C, 481 C>T, 590 G>A, 803 A>G and 857 G>A) are the most frequent in other populations. The presence of so-called ethnic-specific SNPs in our population is in accordance with the Brazilians' multiple ancestry. Upon allele and genotype analysis, the most frequent NAT2 alleles were NAT2*5B (33%), NAT2*6A (26%) and NAT2*4 (20%), with NAT2*5/*5 being the most prevalent genotype (31.7%). These results clearly demonstrate the predominance in the studied Brazilian groups of NAT2 alleles associated with slow acetylator genotypes over those associated with fast and intermediate ones. Additionally, in Rio de Janeiro, a significantly higher frequency of intermediate acetylation status was found when compared to Goiás (42.5% versus 25%) (p=0.05), demonstrating that different regions of a country with a population characterized by a multi-ethnic ancestry may present a large degree of variability in NAT2 allelic frequencies. This finding has implications in the determination of nationwide policies for use of appropriate anti-TB drugs.


Subject(s)
Arylamine N-Acetyltransferase/genetics , Polymorphism, Single Nucleotide , Alleles , Antitubercular Agents/adverse effects , Antitubercular Agents/metabolism , Arylamine N-Acetyltransferase/metabolism , Base Sequence , Brazil , DNA Primers/genetics , Ethnicity/genetics , Gene Frequency , Genetics, Population , Genotype , Humans , Isoniazid/adverse effects , Isoniazid/metabolism , Pharmacogenetics
9.
BMC Bioinformatics ; 6: 197, 2005 Aug 03.
Article in English | MEDLINE | ID: mdl-16078998

ABSTRACT

BACKGROUND: BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need for high-end computers. RESULTS: Most distributed computing / grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. CONCLUSION: Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" installation containing a pre-configured example.
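The near-linear speedup reported for N nodes rests on the fact that BLAST queries are independent, so a large query file can be split into batches and a failed node's batch can be handed to the survivors. The sketch below illustrates that general idea only; it is not Squid's scheduler or protocol, the round-robin policy is an assumption, and every function name is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition_queries(records, n_nodes):
    """Assign query records to n_nodes batches in round-robin order."""
    if n_nodes < 1:
        raise ValueError("at least one node is required")
    batches = [[] for _ in range(n_nodes)]
    for index, record in enumerate(records):
        batches[index % n_nodes].append(record)
    return batches


def reroute(batches, failed_node):
    """Return new batches with the failed node's jobs spread over the rest."""
    survivors = [list(b) for i, b in enumerate(batches) if i != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node to take over the jobs")
    for index, record in enumerate(batches[failed_node]):
        survivors[index % len(survivors)].append(record)
    return survivors
```

Each batch would then be written back to a FASTA file and searched on its own node; because the batches do not depend on one another, total wall time is roughly that of the largest batch plus transfer overhead.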


Subject(s)
Computational Biology , Databases, Protein , Sequence Analysis, Protein/methods , Software , Base Sequence , Computer Systems , Internet , Online Systems , User-Computer Interface
10.
J Clin Microbiol ; 42(6): 2558-65, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15184434

ABSTRACT

It has not been possible to distinguish different strains of Mycobacterium leprae according to their genetic sequence. However, the genome contains several variable-number tandem repeats (VNTR), which have been used effectively in strain typing of other bacteria. To determine their suitability for differentiating M. leprae, we developed PCR systems to amplify 5 different VNTR loci and examined a battery of 12 M. leprae strains derived from patients in different regions of the United States, Brazil, Mexico, and the Philippines, as well as from wild armadillos and a sooty mangabey monkey. We found diversity at four VNTR (D = 0.74), but one system (C(16)G(8)) failed to yield reproducible results. Alleles for the GAA VNTR varied in length from 10 to 16 copies, those for AT(17) varied in length from 10 to 15 copies, those for GTA varied in length from 9 to 12 copies, and those for TA(18) varied in length from 13 to 20 copies. Relatively little variation was seen with interspecies transfer of bacilli or during short-term passage of strains in nude mice or armadillos. The TA(18) locus was more polymorphic than other VNTR, and genotypic variation was more common after long-term expansion in armadillos. Most strain genotypes remained fairly stable in passage, but strain Thai-53 showed remarkable variability. Statistical cluster analysis segregated strains and passage samples appropriately but did not reveal any particular genotype associable with different regions or hosts of origin. VNTR polymorphisms can be used effectively to discriminate M. leprae strains. Inclusion of additional loci and other elements will likely lead to a robust typing system that can be used in community-based epidemiological studies and select clinical applications.
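A VNTR allele is named by its copy number, that is, how many times the repeat unit occurs back to back at the locus. In the study the alleles were sized from PCR products; the sketch below shows only the counting step on a sequence string, under the assumption that the amplified region is already available as text. The function name is hypothetical.

```python
import re


def max_tandem_copies(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif."""
    if not motif:
        raise ValueError("motif must not be empty")
    # Find every maximal run of back-to-back motif copies.
    pattern = f"(?:{re.escape(motif.upper())})+"
    runs = re.findall(pattern, sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# A toy locus with 12 uninterrupted GAA copies and a shorter run nearby.
locus = "CC" + "GAA" * 12 + "TT" + "GAA" * 3
print(max_tandem_copies(locus, "GAA"))
```

Comparing the copy numbers obtained at several loci gives each strain a multi-locus genotype, which is what the cluster analysis in the abstract operates on.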


Subject(s)
Minisatellite Repeats , Mycobacterium leprae/genetics , Genetic Variation , Genotype , Mycobacterium leprae/classification , Polymerase Chain Reaction
11.
s.l; s.n; 2004. 8 p. tab.
Non-conventional in English | Sec. Est. Saúde SP, HANSEN, Hanseníase Leprosy, SESSP-ILSLACERVO, Sec. Est. Saúde SP | ID: biblio-1242754

ABSTRACT

It has not been possible to distinguish different strains of Mycobacterium leprae according to their genetic sequence. However, the genome contains several variable-number tandem repeats (VNTR), which have been used effectively in strain typing of other bacteria. To determine their suitability for differentiating M. leprae, we developed PCR systems to amplify 5 different VNTR loci and examined a battery of 12 M. leprae strains derived from patients in different regions of the United States, Brazil, Mexico, and the Philippines, as well as from wild armadillos and a sooty mangabey monkey. We found diversity at four VNTR (D = 0.74), but one system (C16G8) failed to yield reproducible results. Alleles for the GAA VNTR varied in length from 10 to 16 copies, those for AT17 varied in length from 10 to 15 copies, those for GTA varied in length from 9 to 12 copies, and those for TA18 varied in length from 13 to 20 copies. Relatively little variation was seen with interspecies transfer of bacilli or during short-term passage of strains in nude mice or armadillos. The TA18 locus was more polymorphic than other VNTR, and genotypic variation was more common after long-term expansion in armadillos. Most strain genotypes remained fairly stable in passage, but strain Thai-53 showed remarkable variability. Statistical cluster analysis segregated strains and passage samples appropriately but did not reveal any particular genotype associable with different regions or hosts of origin. VNTR polymorphisms can be used effectively to discriminate M. leprae strains. Inclusion of additional loci and other elements will likely lead to a robust typing system that can be used in community-based epidemiological studies and select clinical applications.


Subject(s)
Humans , Leprosy/immunology , Leprosy/virology , Mycobacterium leprae/physiology , Mycobacterium leprae/genetics , Mycobacterium leprae/immunology , Mycobacterium leprae/metabolism , Mycobacterium leprae/pathogenicity
12.
Mem. Inst. Oswaldo Cruz ; 92(6): 805-9, Nov.-Dec. 1997. ilus, tab
Article in English | LILACS | ID: lil-197220

ABSTRACT

Data analysis, presentation and distribution are of utmost importance to a genome project. A public domain software, AceDB, has been chosen as the common basis for parasite genome databases, and a first release of TcruziDB, the Trypanosoma cruzi genome database, is available by ftp from ftp://irisdbbm.fiocruz.br/pub/genomedb/TcruziDB as well as versions of the software for different operating systems (ftp://iris.dbbm.fiocruz.br/pub/unixsoft/). Moreover, data originated from the project are available from the WWW server at http://www.dbbm.fiocruz.br. It contains biological and parasitological data on CL Brener, its karyotype, all available T. cruzi sequences from Genbank, data on the EST-sequencing project and on available libraries, a T. cruzi codon table and a listing of activities and participating groups in the genome project, as well as meeting reports. T. cruzi discussion lists (tcruzi-l@iris.dbbm.fiocruz.br and tcgenics@iris.dbbm.fiocruz.br) are being maintained for communication and to promote collaboration in the genome project.


Subject(s)
Animals , Genome, Protozoan , Information Systems , Trypanosoma cruzi/genetics , Computer Communication Networks , Information Services , Information Storage and Retrieval
13.
Mem. Inst. Oswaldo Cruz ; 92(6): 863-6, Nov.-Dec. 1997.
Article in English | LILACS | ID: lil-197229

ABSTRACT

Random single pass sequencing of cDNA fragments, also known as generation of Expressed Sequence Tags (ESTs), has been highly successful in the study of the gene content of higher organisms, and forms an integral part of most genome projects, with the objective to identify new genes and targets for disease control and prevention and to generate mapping probes. In the Trypanosoma cruzi genome project, EST sequencing has also been a starting point, and here we report data on the first 797 sequences obtained, partly from a CL Brener epimastigote non-normalized library, partly from a normalized library. Only around 30 per cent of the sequences obtained showed similarity with Genbank and dbEST databases, half of which with sequences already reported for T. cruzi.


Subject(s)
Animals , Gene Library , Genome, Protozoan , Trypanosoma cruzi/genetics , Clone Cells
14.
Mem. Inst. Oswaldo Cruz ; 91(3): 279-284, May-Jun. 1996.
Article in English | LILACS | ID: lil-319872

ABSTRACT

Sequence analysis of Leishmania (Viannia) kDNA minicircles and analysis of multiple sequence alignments of the conserved region (minirepeats) of five distinct minicircles from L. (V.) braziliensis species with corresponding sequences derived from other dermotropic leishmanias indicated the presence of a sub-genus specific sequence. An oligonucleotide bearing this sequence was designed and used as a molecular probe, being able to recognize solely the sub-genus Viannia species in hybridization experiments. A dendrogram reflecting the homologies among the minirepeat sequences was constructed. Sequence clustering was obtained corresponding to the traditional classification based on similarity of biochemical, biological and parasitological characteristics of these Leishmania species, distinguishing the Old World dermotropic leishmanias, the New World dermotropic leishmanias of the sub-genus Leishmania and of the sub-genus Viannia.


Subject(s)
Animals , DNA, Kinetoplast , Leishmania , Oligonucleotides , Base Sequence , DNA, Kinetoplast , Hybridization, Genetic , Leishmania , Leishmania braziliensis , Leishmania guyanensis , Molecular Sequence Data , Polymerase Chain Reaction , Sequence Analysis, DNA
15.
Mem. Inst. Oswaldo Cruz ; 88(2): 309-12, Apr.-Jun. 1993.
Article in English | LILACS | ID: lil-119495

ABSTRACT

The ΔF508 mutation in the cystic fibrosis (CF) gene was studied in a population of 18 Brazilian CF patients and their 17 families by use of PCR and differential hybridization with oligonucleotides. In a total of 34 chromosomes considered, 12 (35%) carried the ΔF508 deletion, a frequency much lower than that reported in most other populations. As a consequence, CF in Brazil would be predominantly caused by mutations different from the ΔF508 deletion.


Subject(s)
Cystic Fibrosis , Genetic Engineering , Brazil