Results 1 - 4 of 4
1.
Proc Natl Acad Sci U S A ; 109(19): E1183-91, 2012 May 08.
Article in English | MEDLINE | ID: mdl-22496592

ABSTRACT

Ultraconserved elements (UCEs) are DNA sequences that are 100% identical (no base substitutions, insertions, or deletions) and located in syntenic positions in at least two genomes. Although hundreds of UCEs have been found in animal genomes, little is known about the incidence of ultraconservation in plant genomes. Using an alignment-free information-retrieval approach, we have comprehensively identified all long identical multispecies elements (LIMEs), which include both syntenic and nonsyntenic regions, of at least 100 identical base pairs shared by at least two genomes. Among six animal genomes, we found the previously known syntenic UCEs as well as previously undescribed nonsyntenic elements. In contrast, among six plant genomes, we only found nonsyntenic LIMEs. LIMEs can also be classified as either simple (repetitive) or complex (nonrepetitive), they may occur in multiple copies in a genome, and they are often spread across multiple chromosomes. Although complex LIMEs were found in both animal and plant genomes, they differed significantly in their composition and copy number. Further analyses of plant LIMEs revealed their functional diversity, encompassing elements found near rRNA and enzyme-coding genes, as well as those found in transposons and noncoding DNA. We conclude that despite the common presence of LIMEs in both animal and plant lineages, the evolutionary processes involved in the creation and maintenance of these elements differ in the two groups and are likely attributable to several mechanisms, including transfer of genetic material from organellar to nuclear genomes, de novo sequence manufacturing, and purifying selection.
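The abstract does not spell out the alignment-free retrieval method, so the following is only a minimal sketch of the underlying idea: finding maximal runs of identical bases shared by two sequences, regardless of where they sit in each genome. The function name `shared_identical_segments`, the seed-and-extend strategy, and the toy sequences are illustrative assumptions, not the authors' implementation. The sketch compares forward strands only and keeps the whole seed index in memory, which would not scale to full genomes.

```python
from collections import defaultdict


def shared_identical_segments(seq_a, seq_b, min_len=100):
    """Return maximal exact matches as sorted (start_a, start_b, length) tuples.

    Hypothetical helper: every window of min_len bases in seq_a is indexed,
    then seq_b is scanned for identical windows, which are extended rightward.
    """
    # Index every window of min_len bases in the first sequence.
    index = defaultdict(list)
    for i in range(len(seq_a) - min_len + 1):
        index[seq_a[i:i + min_len]].append(i)

    matches = set()
    for j in range(len(seq_b) - min_len + 1):
        for i in index.get(seq_b[j:j + min_len], ()):
            # A seed whose preceding bases also match is part of a longer
            # match that was already reported from an earlier seed.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            length = min_len
            # Extend to the right while the bases stay identical.
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            matches.add((i, j, length))
    return sorted(matches)


# Toy sequences (made up for illustration) sharing an 11-base segment.
toy_a = "TTTACGTACGGTCAGG"
toy_b = "CCACGTACGGTCAAA"
toy_matches = shared_identical_segments(toy_a, toy_b, min_len=8)
```

Because matches are keyed by content rather than by position, the same routine reports both syntenic and nonsyntenic elements; deciding which is which would need a separate comparison of the flanking genes.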


Subjects
Conserved Sequence/genetics , Molecular Evolution , Plant Genome/genetics , Genome/genetics , Amino Acid Sequence , Animals , Arabidopsis/genetics , Base Sequence , Cell Nucleus/genetics , Chromosome Mapping , Mammalian Chromosomes/genetics , Plant Chromosomes/genetics , Gene Regulatory Networks , Mitochondrial Genome/genetics , Humans , Mice , Genetic Models , Molecular Sequence Data , Rats , Species Specificity , Synteny
2.
Nucleic Acids Res ; 32(Web Server issue): W649-53, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215469

ABSTRACT

We have developed a web server that lets the life sciences community search for short repeats of DNA sequence, between 3 and 10,000 bases long, within multiple species. This search employs a unique and fast hash function approach. Our system also applies information retrieval algorithms to discover knowledge of cross-species conservation of repeat sequences. Furthermore, we have incorporated a part of the Gene Ontology database into our information retrieval algorithms to broaden the coverage of the search. Our web server and tutorial can be found at http://acmes.rnet.missouri.edu.
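The abstract does not describe the hash function itself. As a rough illustration of how short DNA sequences can be hashed for fast lookup, the sketch below packs each base into two bits and updates the value with a rolling shift; this is a common technique and is assumed here, not taken from the paper. The names `encode_kmer`, `build_index`, and `find_repeat` are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value


def build_index(sequence, k):
    """Map each k-mer code to the list of its start positions.

    The code is updated base by base, so each position costs O(1) work.
    Windows containing an ambiguous base (for example N) are skipped.
    """
    index = {}
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    value = 0
    valid = 0  # consecutive unambiguous bases seen so far
    for pos, base in enumerate(sequence.upper()):
        code = BASE_CODE.get(base)
        if code is None:
            value = 0
            valid = 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            index.setdefault(value, []).append(pos - k + 1)
    return index


def find_repeat(index, query):
    """Return start positions of query; its length must equal the index k."""
    return index.get(encode_kmer(query.upper()), [])


# Toy sequence, made up for illustration.
toy_index = build_index("ACGTACGTNACG", 3)
acg_positions = find_repeat(toy_index, "ACG")
```

Python integers are unbounded, so the same packing works for long queries; a fixed-width implementation would need a different scheme once 2k exceeds the word size.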


Subjects
Genomics , Nucleic Acid Repetitive Sequences , DNA Sequence Analysis , Software , Algorithms , Conserved Sequence , Genome , Bacterial Genome , Fungal Genome , Plant Genome , Internet , Time Factors , User-Computer Interface
3.
BMC Bioinformatics ; 6: 111, 2005 May 03.
Article in English | MEDLINE | ID: mdl-15869708

ABSTRACT

BACKGROUND: Searching for small tandem/dispersed repetitive DNA sequences streamlines many biomedical research processes. For instance, whole genomic array analysis in yeast has revealed 22 PHO-regulated genes. The promoter regions of all but one of them contain at least one of the two core Pho4p binding sites, CACGTG and CACGTT. In humans, microsatellites play a role in a number of rare neurodegenerative diseases such as spinocerebellar ataxia type 1 (SCA1). SCA1 is a hereditary neurodegenerative disease caused by an expanded CAG repeat in the coding sequence of the gene. In bacterial pathogens, microsatellites are proposed to regulate expression of some virulence factors. For example, bacteria commonly generate intra-strain diversity through phase variation which is strongly associated with virulence determinants. A recent analysis of the complete sequences of the Helicobacter pylori strains 26695 and J99 has identified 46 putative phase-variable genes among the two genomes through their association with homopolymeric tracts and dinucleotide repeats. Life scientists are increasingly interested in studying the function of small sequences of DNA. However, current search algorithms often generate thousands of matches -- most of which are irrelevant to the researcher.

RESULTS: We present our hash function as well as our search algorithm to locate small sequences of DNA within multiple genomes. Our system applies information retrieval algorithms to discover knowledge of cross-species conservation of repeat sequences. We discuss our incorporation of the Gene Ontology (GO) database into these algorithms. We conduct an exhaustive time analysis of our system for various repetitive sequence lengths. For instance, a search for eight bases of sequence within 3.224 GBases on 49 different chromosomes takes 1.147 seconds on average. To illustrate the relevance of the search results, we conduct a search with and without added annotation terms for the yeast Pho4p binding sites, CACGTG and CACGTT. Also, a cross-species search is presented to illustrate how potential hidden correlations in genomic data can be quickly discerned. The findings in one species are used as a catalyst to discover something new in another species. These experiments also demonstrate that our system performs well while searching multiple genomes -- without the main memory constraints present in other systems.

CONCLUSION: We present a time-efficient algorithm to locate small segments of DNA and concurrently to search the annotation data accompanying the sequence. Genome-wide searches for short sequences often return hundreds of hits. Our experiments show that subsequently searching the annotation data can refine and focus the results for the user. Our algorithms are also space-efficient in terms of main memory requirements. Source code is available upon request.
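The two-stage idea in this abstract, locating a short motif and then narrowing the hits with annotation terms, can be sketched as follows. The `Region` record, both function names, and the toy promoter sequences and annotations are made up for illustration; the paper's actual data model and its use of Gene Ontology terms are not described in enough detail here to reproduce. Only the forward strand is scanned.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical record: a named sequence with free-text annotation."""
    name: str
    sequence: str
    annotation: str


def find_motif_hits(regions, motifs):
    """Return (region name, motif, start) for every forward-strand occurrence."""
    hits = []
    for region in regions:
        for motif in motifs:
            start = region.sequence.find(motif)
            while start != -1:
                hits.append((region.name, motif, start))
                # Step by one base so overlapping occurrences are found too.
                start = region.sequence.find(motif, start + 1)
    return hits


def filter_by_annotation(hits, regions, terms):
    """Keep hits whose region annotation mentions at least one term."""
    by_name = {region.name: region for region in regions}
    wanted = [term.lower() for term in terms]
    return [
        hit for hit in hits
        if any(term in by_name[hit[0]].annotation.lower() for term in wanted)
    ]


# Toy data, invented for this example.
toy_regions = [
    Region("promoter_1", "TTCACGTGAA", "phosphate metabolism"),
    Region("promoter_2", "GGCACGTTCC", "ribosome biogenesis"),
    Region("promoter_3", "AAAAAAAA", "phosphate transport"),
]
all_hits = find_motif_hits(toy_regions, ["CACGTG", "CACGTT"])
focused_hits = filter_by_annotation(all_hits, toy_regions, ["phosphate"])
```

On the toy data the sequence search alone returns two hits, and the annotation filter keeps only the one in a phosphate-related region, which mirrors the refinement step the conclusion describes.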


Subjects
Computational Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Nucleic Acid Repetitive Sequences , Algorithms , Animals , Base Sequence , Database Management Systems , Factual Databases , Genetic Databases , Gene Expression Profiling , Genome , Haemophilus influenzae/genetics , Helicobacter pylori/genetics , Humans , Information Storage and Retrieval , Programming Languages , Genetic Promoter Regions , Sequence Alignment , DNA Sequence Analysis , Software , Species Specificity , User-Computer Interface
4.
AMIA Annu Symp Proc ; : 1072, 2006.
Article in English | MEDLINE | ID: mdl-17238691

ABSTRACT

Accurate structural models of target proteins are an essential component of structure-based drug design methods. To aid in this effort, the sequential forward floating selection (SFFS) algorithm is being applied to protein structure prediction. This algorithm demonstrates a reasonable balance between optimality of feature selection and efficiency while using large datasets such as the Protein Data Bank (PDB). The resulting protein structure predictions will be used in a study of HIV-1 protease (PR).
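For readers unfamiliar with SFFS, the sketch below shows the general shape of the algorithm: a greedy forward step adds the most helpful feature, then a conditional backward ("floating") step drops features for as long as doing so beats the best subset previously seen at the smaller size. The abstract does not say which features or scoring criterion were used, so the `score` callback, the function name `sffs`, and the toy scoring function are all assumptions.

```python
def sffs(features, score, target_size):
    """Sequential forward floating selection (generic sketch).

    Returns a dict mapping subset size to (best score, best subset) seen
    during the search. `score` is any function of a list of features where
    larger is better.
    """
    if target_size > len(features):
        raise ValueError("target_size exceeds the number of features")
    selected = []
    best = {}
    while len(selected) < target_size:
        # Forward step: add the feature that helps the current subset most.
        candidates = [f for f in features if f not in selected]
        added = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [added]
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, list(selected))
        # Floating step: drop features while that strictly improves on the
        # best subset recorded at the smaller size.
        while len(selected) > 2:
            dropped = max(
                selected,
                key=lambda f: score([x for x in selected if x != f]),
            )
            reduced = [x for x in selected if x != dropped]
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
            else:
                break
    return best


# Toy criterion (invented): features b and c are only valuable together.
TOY_VALUES = {"a": 3, "b": 2, "c": 2, "d": 1}


def toy_score(subset):
    total = sum(TOY_VALUES[f] for f in subset)
    if "b" in subset and "c" in subset:
        total += 5
    return total


toy_best = sffs(["a", "b", "c", "d"], toy_score, target_size=3)
```

In the toy run, plain forward selection would settle on `a` and `b` as the best pair, while the floating step later discovers that `b` and `c` score higher together. That backtracking is what gives SFFS its balance between search quality and cost compared with exhaustive subset search.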


Subjects
Algorithms , HIV Protease/chemistry , Protein Conformation , Protein Databases