Results 1 - 7 of 7
1.
BMC Genomics ; 8: 391, 2007 Oct 26.
Article in English | MEDLINE | ID: mdl-17963481

ABSTRACT

BACKGROUND: Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes, and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, due to collapse of nearly identical repeats. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred. RESULTS: We have developed a pipeline to estimate the genomic repeat content, where shotgun reads are aligned to the genomic sequence and the gene copy number is estimated using the average shotgun coverage. This method was applied to the genome of T. cruzi and copy numbers of all protein coding sequences and pseudogenes were estimated. The 22,640 results were stored in a database available online. 18% of all protein coding sequences and pseudogenes were estimated to exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicates a total gene copy number, including allelic gene variants, of over 40,000. CONCLUSION: Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to closely examine and resolve these regions themselves using all available shotgun data, instead of having to rely on annotated consensus sequences that are often erroneous and possibly misleading.
Five repetitive genes were studied in more detail, in order to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.


Subjects
Databases, Genetic; Genetic Variation; Repetitive Sequences, Nucleic Acid; Trypanosoma cruzi/genetics; Amino Acid Sequence; Animals; Antigens, Surface/genetics; Conserved Sequence; DNA, Protozoan; Gene Amplification; Gene Dosage; Genes, Protozoan/physiology; Genome, Protozoan; Membrane Proteins/genetics; Models, Biological; Molecular Sequence Data; Sequence Homology, Amino Acid
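
The coverage-based copy-number estimate described in the abstract for entry 1 can be illustrated with a minimal sketch. The abstract does not give the pipeline's normalization, so the code below assumes the simplest reading: a gene's mean shotgun read depth divided by the depth expected for a single-copy sequence. The function names and the depth values are hypothetical.

```python
def mean_depth(depths):
    """Average per-base read depth over an annotated gene."""
    return sum(depths) / len(depths)


def estimate_copy_number(gene_coverage, single_copy_coverage):
    """Ratio of a gene's mean depth to the single-copy depth (assumed model)."""
    if single_copy_coverage <= 0:
        raise ValueError("single_copy_coverage must be positive")
    return gene_coverage / single_copy_coverage


# Hypothetical numbers: a collapsed repeat covered about four times deeper
# than a single-copy region of the same assembly.
single_copy_coverage = 10.0
gene_depths = [38, 42, 40, 40]
copies = estimate_copy_number(mean_depth(gene_depths), single_copy_coverage)
print(round(copies))  # 4
```

In this reading, a collapsed repeat shows up as a gene whose depth is a multiple of the single-copy depth; a real pipeline would also need to handle partial alignments and uneven coverage, which this sketch ignores.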
2.
Comput Methods Programs Biomed ; 86(1): 87-92, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17292508

ABSTRACT

Modern alignment methods designed to work rapidly and efficiently with large datasets often do so at the cost of method sensitivity. To overcome this, we have developed a novel alignment program, GRAT, built to accurately align short, highly similar DNA sequences. The program runs rapidly and requires no more memory and CPU power than a desktop computer provides. In addition, specificity is ensured by statistically separating the true alignments from spurious matches using phred quality values. An efficient separation is especially important when searching large datasets and whenever there are repeats present in the dataset. Results are superior in comparison to widely used existing software, and analysis of two large genomic datasets shows the usefulness and scalability of the algorithm.


Subjects
Sequence Alignment/instrumentation; Sequence Analysis, DNA; Software Design; Algorithms; Animals; Chickens
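
The abstract for entry 2 does not say how GRAT uses phred quality values, so the following is only a sketch of one plausible idea, not the program's method. It relies on the standard phred definition, Q = -10 log10(P), and asks how likely it is that every mismatch in an alignment is a sequencing error, assuming independent errors. The function names and the example read are hypothetical.

```python
def phred_to_error_prob(q):
    """Standard phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def prob_mismatches_are_errors(read, reference, quals):
    """Probability that every mismatch is a base-calling error.

    Assumes independent errors; a very small value suggests the read
    differs from the reference for real, i.e. the match may be spurious.
    """
    p = 1.0
    for read_base, ref_base, q in zip(read, reference, quals):
        if read_base != ref_base:
            p *= phred_to_error_prob(q)
    return p


# The same single mismatch, once at a low-quality base and once at a
# high-quality base (hypothetical quality values).
low_q = prob_mismatches_are_errors("ACGT", "ACGA", [40, 40, 40, 10])
high_q = prob_mismatches_are_errors("ACGT", "ACGA", [40, 40, 40, 40])
print(low_q > high_q)  # True
```

A mismatch at a high-quality base is harder to explain as a sequencing error, which is the kind of evidence that can separate a true alignment from a match to a near-identical repeat copy.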
3.
BMC Bioinformatics ; 7: 155, 2006 Mar 20.
Article in English | MEDLINE | ID: mdl-16549006

ABSTRACT

BACKGROUND: Many genome projects are left unfinished due to complex, repeated regions. Finishing is the most time-consuming step in sequencing and current finishing tools are not designed with particular attention to the repeat problem. RESULTS: We have developed DNPTrapper, a shotgun sequence finishing tool, specifically designed to address the problems posed by the presence of repeated regions in the target sequence. The program detects and visualizes single base differences between nearly identical repeat copies, and offers the overview and flexibility needed to rapidly resolve complex regions within a working session. The use of a database allows large amounts of data to be stored and handled, and allows viewing of mammalian-size genomes. The program is available under an Open Source license. CONCLUSION: With DNPTrapper, it is possible to separate repeated regions that previously were considered impossible to resolve, and finishing tasks that previously took days or weeks can be resolved within hours or even minutes.


Subjects
Algorithms; DNA/genetics; Documentation/methods; Repetitive Sequences, Nucleic Acid/genetics; Sequence Analysis, DNA/methods; Software; User-Computer Interface; Base Sequence; DNA/analysis; DNA/chemistry; Molecular Sequence Data
4.
Science ; 309(5733): 409-15, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020725

ABSTRACT

Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.


Subjects
Genome, Protozoan; Protozoan Proteins/genetics; Sequence Analysis, DNA; Trypanosoma cruzi/genetics; Animals; Chagas Disease/drug therapy; Chagas Disease/parasitology; DNA Repair; DNA Replication; DNA, Mitochondrial/genetics; DNA, Protozoan/genetics; Genes, Protozoan; Humans; Meiosis; Membrane Proteins/chemistry; Membrane Proteins/genetics; Membrane Proteins/physiology; Multigene Family; Protozoan Proteins/chemistry; Protozoan Proteins/physiology; Recombination, Genetic; Repetitive Sequences, Nucleic Acid; Retroelements; Signal Transduction; Telomere/genetics; Trypanocidal Agents/pharmacology; Trypanocidal Agents/therapeutic use; Trypanosoma cruzi/chemistry; Trypanosoma cruzi/physiology
5.
Gene ; 341: 149-65, 2004 Oct 27.
Article in English | MEDLINE | ID: mdl-15474298

ABSTRACT

Although microsatellites with functional effects have been described, these repeats are generally considered "junk" DNA in the same way as other repetitive sequences. Our aim was to investigate whether certain microsatellites can have a functional role as cis-regulatory elements. A database was created of all short tandem repeats, from 2 to 10 bases, located in the first 10-kb 5' of the transcription start sites of all annotated genes of the human genome. Of 114 microsatellites selected based on their size and location in the promoter, 51 were found to be polymorphic. Using electrophoretic mobility shift assay (EMSA), we studied five repetitive motifs, and three displayed specific protein binding; these motifs were found in 12 of the polymorphic microsatellites. An interesting microsatellite is the CTC/GAG repeat which, as double-stranded (DS) DNA, bound specificity protein 1 (SP1) with high affinity, formed triplexes in vitro and displayed differences in SP1 binding and triplex formation capacity for repeats with distinct numbers of repeat units. Interestingly, the polypyrimidine strand of the repeat (CTC) bound other proteins such as polypyrimidine tract-binding protein 1 (PTBP1) as single-stranded (SS) DNA, and a model with two alternative DNA conformations is proposed for these repeats. Distinct protein binding to DS DNA was also observed for different numbers of AAACA and AAAAT repeats. Our results suggest that certain microsatellites may act as cis-regulatory elements, controlling gene expression through transcription factor binding and/or secondary DNA structure formation. Due to their high polymorphism and abundance, they might represent an important source of quantitative genetic variation.


Subjects
Microsatellite Repeats/genetics; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism; Base Sequence; Binding Sites/genetics; Chromatography, High Pressure Liquid/methods; Competitive Bidding; DNA/chemistry; DNA/genetics; DNA/metabolism; DNA-Binding Proteins/metabolism; Databases, Nucleic Acid; Electrophoretic Mobility Shift Assay; Genotype; HeLa Cells; Humans; Molecular Sequence Data; Oligonucleotides/genetics; Oligonucleotides/metabolism; Polymorphism, Genetic; Promoter Regions, Genetic/genetics; Protein Binding; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid; Sp1 Transcription Factor/metabolism
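
The abstract for entry 5 describes collecting all short tandem repeats with units of 2 to 10 bases. The sketch below shows one way such a scan could be written with a backreference regular expression; it is an illustration, not the authors' method. The function name, the minimum repeat count of three, and the example sequence are assumptions.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=10, min_repeats=3):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    The lazy quantifier prefers the shortest unit that repeats, so a run
    such as CTCCTCCTC is reported with unit CTC. Note that a homopolymer
    run like AAAAAA is also matched, as unit AA.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical promoter fragment containing four CTC units.
print(find_microsatellites("TTGACTCCTCCTCCTCAGG"))  # [(4, 'CTC', 4)]
```

Restricting the scan to the 10 kb upstream of each transcription start site, as the abstract describes, would require gene annotations that are outside the scope of this sketch.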
6.
Bioinformatics ; 20(5): 803-4, 2004 Mar 22.
Article in English | MEDLINE | ID: mdl-14751967

ABSTRACT

Finishing, i.e., gap closure and editing, is the most time-consuming part of genome sequencing. Repeated sequences together with sequencing errors complicate the assembly and often result in misassemblies that are difficult to correct. Repeat Discrepancy Tagger (ReDiT) is a tool designed to aid in the finishing step. This software processes assembly results produced by any fragment assembly program that outputs ace files. The input sequences are analyzed to determine possible differences between repeated sequences. The output is written as tags in an ace file that can be viewed with, e.g., the Consed sequence editor. AVAILABILITY: The ReDiT program is freely available at http://web.cgb.ki.se/redit


Subjects
Chromosome Mapping/methods; Documentation/methods; Expressed Sequence Tags; Repetitive Sequences, Nucleic Acid/genetics; Sequence Analysis, DNA/methods; Software; User-Computer Interface; Algorithms; Base Sequence; Computer Graphics; Gene Expression Profiling; Genome; Molecular Sequence Data; Sequence Alignment/methods; Word Processing/methods
7.
Nucleic Acids Res ; 31(15): 4663-72, 2003 Aug 01.
Article in English | MEDLINE | ID: mdl-12888528

ABSTRACT

Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.


Subjects
Sequence Analysis, DNA/methods; Algorithms; Genome; Repetitive Sequences, Nucleic Acid; Sequence Alignment/methods; Software; Time Factors
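
The abstract for entry 7 describes separating single base differences between repeat copies from sequencing errors by examining multiple alignments of overlapping reads. The sketch below illustrates the general idea with a simple column-count rule; it is not the published DNP statistic, which the abstract does not specify. The rule (a minority base seen in at least `min_support` reads is a candidate difference, otherwise a likely error), the function name, and the example reads are all assumptions.

```python
from collections import Counter


def classify_columns(alignment, min_support=2):
    """Label each variable column of a gap-free multiple alignment.

    alignment: list of equal-length read strings covering the same region.
    Returns {column_index: "candidate_dnp" or "likely_error"}.
    """
    calls = {}
    for col in range(len(alignment[0])):
        counts = Counter(read[col] for read in alignment)
        if len(counts) < 2:
            continue  # every read agrees at this position
        second_count = counts.most_common()[1][1]
        calls[col] = (
            "candidate_dnp" if second_count >= min_support else "likely_error"
        )
    return calls


# Hypothetical overlapping reads: column 2 splits 3 to 2 (two repeat
# copies), column 4 has a single deviating base (probable miscall).
reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTT"]
print(classify_columns(reads))  # {2: 'candidate_dnp', 4: 'likely_error'}
```

A base difference supported by several reads is unlikely to arise from independent miscalls at the same position, which is why recurring minority bases are kept and isolated ones are treated as correctable errors.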