1.
Dev Biol ; 378(2): 154-69, 2013 Jun 15.
Article in English | MEDLINE | ID: mdl-23545328

ABSTRACT

Epithelial tubes are the infrastructure for organs and tissues, and tube morphogenesis requires precise orchestration of cell signaling, shape, migration, and adhesion. Follicle cells in the Drosophila ovary form a pair of epithelial tubes whose lumens act as molds for the eggshell respiratory filaments, or dorsal appendages (DAs). DA formation is a robust and accessible model for studying the patterning, formation, and expansion of epithelial tubes. Tramtrack69 (TTK69), a transcription factor that exhibits a variable embryonic DNA-binding preference, controls DA lumen volume and shape by promoting tube expansion; the tramtrack mutation twin peaks (ttk(twk)) reduces TTK69 levels late in oogenesis, inhibiting this expansion. Microarray analysis of wild-type and ttk(twk) ovaries, followed by in situ hybridization and RNAi of candidate genes, identified the Phospholipase B-like protein Lamina ancestor (LAMA), the scaffold protein Paxillin, the endocytotic regulator Shibire (Dynamin), and the homeodomain transcription factor Mirror, as TTK69 effectors of DA-tube expansion. These genes displayed enriched expression in DA-tube cells, except lama, which was expressed in all follicle cells. All four genes showed reduced expression in ttk(twk) mutants and exhibited RNAi phenotypes that were enhanced in a ttk(twk)/+ background, indicating ttk(twk) genetic interactions. Although previous studies show that Mirror patterns the follicular epithelium prior to DA tubulogenesis, we show that Mirror has an independent, novel role in tube expansion, involving positive regulation of Paxillin. Thus, characterization of ttk(twk)-differentially expressed genes expands the network of TTK69 effectors, identifies novel epithelial tube-expansion regulators, and significantly advances our understanding of this vital developmental process.


Subjects
Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Epithelium/metabolism , Ovary/metabolism , Animals , Drosophila Proteins/metabolism , Drosophila melanogaster/embryology , Drosophila melanogaster/metabolism , Dynamins/genetics , Dynamins/metabolism , Epithelium/embryology , Eye Proteins/genetics , Eye Proteins/metabolism , Female , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Immunohistochemistry , In Situ Hybridization, Fluorescence , Male , Models, Genetic , Mutation , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Oligonucleotide Array Sequence Analysis , Ovary/embryology , Paxillin/genetics , Paxillin/metabolism , Protein Binding , RNA Interference , Repressor Proteins/genetics , Repressor Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
2.
J Bacteriol ; 195(4): 896-907, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23243302

ABSTRACT

Bacteria often respond to harmful environmental stimuli with the induction of extracytoplasmic function (ECF) sigma (σ) factors that in turn direct RNA polymerase to transcribe specific groups of response genes (or regulons) to minimize cellular damage and favor adaptation to the changed extracellular milieu. In Treponema pallidum subsp. pallidum, the agent of syphilis, the TP0092 gene is predicted to code for the pathogen's only annotated ECF σ factor, homologous to RpoE, known in Escherichia coli to control a key transduction pathway for maintenance of envelope homeostasis in response to external stress and cell growth. Here we have shown that TP0092 is highly transcribed during experimental syphilis. Furthermore, TP0092 transcription levels significantly increase as infection progresses toward immune clearance of the pathogen, suggesting a role for TP0092 in helping T. pallidum respond to harmful stimuli in the host environment. To investigate this hypothesis, we determined the TP0092 regulon at two different time points during infection using chromatin immunoprecipitation followed by high-throughput sequencing. A total of 22 chromosomal regions, all containing putative TP0092-binding sites and corresponding to as many T. pallidum genes, were identified. Noteworthy among them are the genes encoding desulfoferrodoxin and thioredoxin, involved in detoxification of reactive oxygen species (ROS). Because T. pallidum does not possess other enzymes for ROS detoxification, such as superoxide dismutase, catalase, or glutathione peroxidase, our results suggest that the TP0092 regulon is important in protecting the syphilis spirochete from damage caused by ROS produced at the site of infection during the inflammatory response.


Subjects
Gene Expression Regulation, Bacterial/physiology , Regulon/physiology , Sigma Factor/genetics , Syphilis/microbiology , Syphilis/pathology , Treponema pallidum/physiology , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cell Membrane , Chromatin Immunoprecipitation , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Humans , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rabbits , Real-Time Polymerase Chain Reaction , Sigma Factor/metabolism
3.
Mol Microbiol ; 72(5): 1087-99, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19432808

ABSTRACT

Transcriptional regulation in Treponema pallidum ssp. pallidum is poorly understood, primarily because this organism cannot be cultivated in vitro or genetically manipulated. We have recently shown a phase variation mechanism controlling transcription initiation of Subfamily II tpr (T. pallidum repeat) genes (tprE, tprG and tprJ), a group of virulence factor candidates. Furthermore, the same study suggested that additional mechanisms might influence the level of transcription of these tprs. The T. pallidum genome sequence has revealed a few open reading frames with similarity to known bacterial transcription factors, including four catabolite activator protein homologues. In this work, sequences matching the Escherichia coli cAMP receptor protein (CRP) binding motif were identified in silico upstream of tprE, tprG and tprJ. Using electrophoretic mobility shift assay and DNase I footprinting assay, recombinant TP0262, a T. pallidum CRP homologue, was shown to bind specifically to amplicons obtained from the tpr promoters containing putative CRP binding motifs. Using a heterologous reporter system, binding of TP0262 to these promoters was shown to either increase (tprE and tprJ) or decrease (tprG) tpr promoter activity. This is the first characterization of a T. pallidum transcriptional modulator that influences tpr promoter activity.


Subjects
Bacterial Proteins/genetics , Cyclic AMP Receptor Protein/genetics , Promoter Regions, Genetic , Treponema pallidum/genetics , Bacterial Proteins/metabolism , Base Sequence , Cloning, Molecular , Cyclic AMP Receptor Protein/metabolism , DNA Footprinting , DNA, Bacterial/genetics , Electrophoretic Mobility Shift Assay , Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Molecular Sequence Data , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Transcription, Genetic , Treponema pallidum/metabolism
4.
PLoS Comput Biol ; 5(12): e1000616, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20019805

ABSTRACT

Co-expression networks are routinely used to study human diseases like obesity and diabetes. Systematic comparison of these networks between species has the potential to elucidate common mechanisms that are conserved between human and rodent species, as well as those that are species-specific characterizing evolutionary plasticity. We developed a semi-parametric meta-analysis approach for combining gene-gene co-expression relationships across expression profile datasets from multiple species. The simulation results showed that the semi-parametric method is robust against noise. When applied to human, mouse, and rat liver co-expression networks, our method out-performed existing methods in identifying gene pairs with coherent biological functions. We identified a network conserved across species that highlighted cell-cell signaling, cell-adhesion and sterol biosynthesis as main biological processes represented in genome-wide association study candidate gene sets for blood lipid levels. We further developed a heterogeneity statistic to test for network differences among multiple datasets, and demonstrated that genes with species-specific interactions tend to be under positive selection throughout evolution. Finally, we identified a human-specific sub-network regulated by RXRG, which has been validated to play a different role in hyperlipidemia and Type 2 diabetes between human and mouse. Taken together, our approach represents a novel step forward in integrating gene co-expression networks from multiple large scale datasets to leverage not only common information but also differences that are dataset-specific.


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Liver/physiology , Models, Biological , Systems Biology/methods , Animals , Computer Simulation , Diabetes Mellitus/metabolism , Gene Regulatory Networks , Humans , Lipid Metabolism , Liver/metabolism , Mice , Mice, Transgenic , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Rats , Regression Analysis , Signal Transduction , Species Specificity
5.
BMC Bioinformatics ; 10: 432, 2009 Dec 18.
Article in English | MEDLINE | ID: mdl-20021665

ABSTRACT

BACKGROUND: In 2004, Bejerano et al. announced the startling discovery of hundreds of "ultraconserved elements", long genomic sequences perfectly conserved across human, mouse, and rat. Their announcement stimulated a flurry of subsequent research. RESULTS: We generalize the notion of ultraconserved element in a natural way from extraordinary human-rodent conservation to extraordinary conservation over an arbitrary set of species. We call these "Extremely Conserved Elements". There is a linear time algorithm to find all such Extremely Conserved Elements in any multiple sequence alignment, provided that the conservation is required to be across all the aligned species. For the general case of conservation across an arbitrary subset of the aligned species, we show that the question of whether there exists an Extremely Conserved Element is NP-complete. We illustrate the linear time algorithm by cataloguing all 177 Extremely Conserved Elements in the currently available 44-vertebrate whole-genome alignment, and point out some of the characteristics of these elements. CONCLUSIONS: The NP-completeness in the case of conservation across an arbitrary subset of the aligned species implies that it is unlikely an efficient algorithm exists for this general case. Despite this fact, for the interesting case of conservation across all or most of the aligned species, our algorithm is efficient enough to be practical. The 177 Extremely Conserved Elements that we catalog demonstrate many of the characteristics of the original ultraconserved elements of Bejerano et al.
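
The all-species case described in this abstract can be illustrated with a single left-to-right pass over the alignment columns. The sketch below is not the authors' implementation. It assumes that "extremely conserved" means every row carries the same non-gap base in a column, and the function name `conserved_runs` and the `min_len` threshold are hypothetical choices made for illustration.

```python
def conserved_runs(alignment, min_len):
    """Return half-open (start, end) column ranges of maximal runs, at least
    min_len columns long, in which all rows share the same non-gap base.

    Assumption: rows are equal-length strings and '-' marks a gap.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all alignment rows must have the same length")

    runs = []
    run_start = None  # first column of the run currently being extended
    for col in range(width):
        first = alignment[0][col]
        conserved = first != "-" and all(row[col] == first for row in alignment)
        if conserved:
            if run_start is None:
                run_start = col
        else:
            # A non-conserved column closes any open run.
            if run_start is not None and col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    # Close a run that extends to the last column.
    if run_start is not None and width - run_start >= min_len:
        runs.append((run_start, width))
    return runs


alignment = [
    "ACGTACGTTTGA",
    "ACGTACCTTTGA",
    "ACGTAC-TTTGA",
]
print(conserved_runs(alignment, 4))
```

Each column is inspected once, so the running time is proportional to the number of alignment cells (rows times columns). Requiring conservation in only a subset of the rows is the harder variant that the abstract reports as NP-complete, and this sketch does not attempt it.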


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Animals , Base Sequence , Conserved Sequence , Genome , Humans , Mice , Rats , Sequence Analysis, DNA , Vertebrates
6.
Nucleic Acids Res ; 35(14): 4809-19, 2007.
Article in English | MEDLINE | ID: mdl-17621584

ABSTRACT

We applied a computational pipeline based on comparative genomics to bacteria, and identified 22 novel candidate RNA motifs. We predicted six to be riboswitches, which are mRNA elements that regulate gene expression on binding a specific metabolite. In separate studies, we confirmed that two of these are novel riboswitches. Three other riboswitch candidates are upstream of either a putative transporter gene in the order Lactobacillales, citric acid cycle genes in Burkholderiales or molybdenum cofactor biosynthesis genes in several phyla. The remaining riboswitch candidate, the widespread Genes for the Environment, for Membranes and for Motility (GEMM) motif, is associated with genes important for natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. Among the other motifs, one has a genetic distribution similar to a previously published candidate riboswitch, ykkC/yxkD, but has a different structure. We identified possible non-coding RNAs in five phyla, and several additional cis-regulatory RNAs, including one in epsilon-proteobacteria (upstream of purD, involved in purine biosynthesis), and one in Cyanobacteria (within an ATP synthase operon). These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered.


Subjects
Genomics/methods , RNA, Bacterial/chemistry , Regulatory Sequences, Ribonucleic Acid , Sequence Analysis, RNA/methods , Base Sequence , Computational Biology , Consensus Sequence , Genome, Bacterial , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Messenger/chemistry , RNA, Untranslated/chemistry
7.
PLoS Comput Biol ; 3(7): e126, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17616982

ABSTRACT

Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair-level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.


Subjects
Computational Biology/methods , RNA, Untranslated/analysis , Sequence Homology, Nucleic Acid , Artificial Intelligence , Base Sequence , Conserved Sequence , Databases, Nucleic Acid , Genes, Regulator , Genome, Bacterial , Molecular Sequence Data , Nucleic Acid Conformation , Pattern Recognition, Automated , RNA, Bacterial/analysis
8.
Nat Biotechnol ; 23(10): 1249-56, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16211068

ABSTRACT

We have analyzed issues of reliability in studies in which comparative genomic approaches have been applied to the discovery of regulatory elements at a genome-wide level in vertebrates. We point out some potential problems with such studies, including difficulties in accurately identifying orthologous promoter regions. Many of these subtle analytical problems have become apparent only when studying the more complex vertebrate genomes. By determining motif reliability, we compared existing tools when applied to the discovery of vertebrate regulatory elements. We then used a statistical clustering method to produce a computational catalog of high quality putative regulatory elements from vertebrates, some of which are widely conserved among vertebrates and many of which are novel regulatory elements. The results provide a glimpse into the wealth of information that comparative genomics can yield and suggest the need for further improvement of genome-wide comparative computational techniques.


Subjects
Algorithms , Chromosome Mapping/methods , Genomics/methods , Regulatory Elements, Transcriptional/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Vertebrates/genetics , Animals , Base Sequence , Conserved Sequence , DNA Footprinting/methods , Evolution, Molecular , Models, Genetic , Models, Statistical , Molecular Sequence Data , Sequence Homology, Nucleic Acid , Species Specificity
9.
Nat Biotechnol ; 23(1): 137-44, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15637633

ABSTRACT

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Subjects
Computational Biology/methods , Gene Expression , Transcription, Genetic , Amino Acid Motifs , Animals , Binding Sites , Databases, Protein , Drosophila , Fungal Proteins/chemistry , Humans , Internet , Mice , Reproducibility of Results , Software
10.
Nucleic Acids Res ; 34(Web Server issue): W366-8, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845027

ABSTRACT

Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the most conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose in prokaryotic genomes, making use of the phylogenetic relationships among the homologous sequences in order to make more accurate predictions. The software is called MicroFootPrinter and is available at http://bio.cs.washington.edu/software.html.


Subjects
Genome, Archaeal , Genome, Bacterial , Genomics/methods , Regulatory Elements, Transcriptional , Software , Internet , Phylogeny , Sequence Homology, Amino Acid , User-Computer Interface
11.
BMC Bioinformatics ; 8: 417, 2007 Oct 26.
Article in English | MEDLINE | ID: mdl-17963514

ABSTRACT

BACKGROUND: Multiple alignment of homologous DNA sequences is of great interest to biologists since it provides a window into evolutionary processes. At present, the accuracy of whole-genome multiple alignments, particularly in noncoding regions, has not been thoroughly evaluated. RESULTS: We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment. CONCLUSION: MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions.


Subjects
Decision Support Techniques , RNA, Untranslated/analysis , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software Validation , Base Sequence , Computer Simulation , Databases, Nucleic Acid , Evaluation Studies as Topic , Genomics/methods , Humans , Multigene Family , Quality Control , Sequence Alignment/standards , Sequence Analysis, RNA/standards
12.
J Mol Biol ; 362(5): 1004-24, 2006 Oct 06.
Article in English | MEDLINE | ID: mdl-16949611

ABSTRACT

We recently used computational protein design to create an extremely stable, globular protein, Top7, with a sequence and fold not observed previously in nature. Since Top7 was created in the absence of genetic selection, it provides a rare opportunity to investigate aspects of the cellular protein production and surveillance machinery that are subject to natural selection. Here we show that a portion of the Top7 protein corresponding to the final 49 C-terminal residues is efficiently mis-translated and accumulates at high levels in Escherichia coli. We used circular dichroism, size-exclusion chromatography, small-angle X-ray scattering, analytical ultra-centrifugation, and NMR spectroscopy to show that the resulting C-terminal fragment (CFr) protein adopts a compact, extremely stable, homo-dimeric structure. Based on the solution structure, we engineered an even more stable variant of CFr by disulfide-induced covalent circularisation that should be an excellent platform for design of novel functions. The accumulation of high levels of CFr exposes the high error rate of the protein translation machinery. The rarity of correspondingly stable fragments in natural proteins coupled with the observation that high quality ribosome binding sites are found to occur within E. coli protein-coding regions significantly less often than expected by random chance implies a stringent evolutionary pressure against protein sub-fragments that can independently fold into stable structures. The symmetric self-association between two identical mis-translated CFr sub-domains to generate an extremely stable structure parallels a mechanism for natural protein-fold evolution by modular recombination of protein sub-structures.


Subjects
Evolution, Molecular , Protein Engineering , Amino Acid Sequence , Chromatography, Gel , Circular Dichroism , Computational Biology , Crystallography/methods , Dimerization , Disulfides/chemistry , Hydrogen-Ion Concentration , Models, Molecular , Molecular Sequence Data , Molecular Weight , Nuclear Magnetic Resonance, Biomolecular , Protein Biosynthesis , Protein Conformation , Protein Denaturation , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary , Ultracentrifugation
13.
Nucleic Acids Res ; 31(13): 3586-8, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824371

ABSTRACT

A fundamental challenge facing biologists is to identify DNA binding sites for unknown regulatory factors, given a collection of genes believed to be coregulated. The program YMF identifies good candidates for such binding sites by searching for statistically overrepresented motifs. More specifically, YMF enumerates all motifs in the search space and is guaranteed to produce those motifs with greatest z-scores. This note describes the YMF web software, available at http://bio.cs.washington.edu/software.html.


Subjects
Sequence Analysis, DNA/methods , Software , Transcription Factors/metabolism , Algorithms , Binding Sites , Data Interpretation, Statistical , Internet , Mycobacterium tuberculosis/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Saccharomyces cerevisiae/genetics , User-Computer Interface
14.
Nucleic Acids Res ; 31(13): 3840-2, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824433

ABSTRACT

Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the best conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose, making use of the phylogenetic relationships among the homologous sequences in order to make more accurate predictions. The software is called FootPrinter and is available at http://bio.cs.washington.edu/software.html.


Subjects
Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Software , Algorithms , Conserved Sequence , Internet , Phylogeny , Sequence Homology, Nucleic Acid , User-Computer Interface
15.
Nucleic Acids Res ; 30(24): 5549-60, 2002 Dec 15.
Article in English | MEDLINE | ID: mdl-12490723

ABSTRACT

Understanding the complex and varied mechanisms that regulate gene expression is an important and challenging problem. A fundamental sub-problem is to identify DNA binding sites for unknown regulatory factors, given a collection of genes believed to be co-regulated. We discuss a computational method that identifies good candidates for such binding sites. Unlike local search techniques such as expectation maximization and Gibbs samplers that may not reach a global optimum, the method discussed enumerates all motifs in the search space, and is guaranteed to produce the motifs with greatest z-scores. We discuss the results of validation experiments in which this algorithm was used to identify candidate binding sites in several well studied regulons of Saccharomyces cerevisiae, where the most prominent transcription factor binding sites are largely known. We then discuss the results on gene families in the functional and mutant phenotype catalogs of S. cerevisiae, where the algorithm suggests many promising novel transcription factor binding sites. The program is available at http://bio.cs.washington.edu/software.html.
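
The idea of enumerating every motif and ranking by z-score can be sketched in a few lines. This is a deliberately simplified illustration, not the published program. It assumes an independent-base background estimated from the input itself and a binomial approximation of the variance that ignores overlapping occurrences, neither of which is stated in the abstract. The function name `kmer_zscores` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_zscores(sequences, k):
    """Score every possible k-mer over ACGT by z = (observed - expected) / sd.

    Assumptions (not taken from the abstract): bases are independent with
    frequencies estimated from the input, and the variance uses a binomial
    approximation that ignores overlaps between occurrences.
    """
    base_counts = Counter()
    kmer_counts = Counter()
    windows = 0
    for seq in sequences:
        base_counts.update(seq)
        for i in range(len(seq) - k + 1):
            kmer_counts[seq[i:i + k]] += 1
            windows += 1

    total_bases = sum(base_counts.values())
    freq = {b: base_counts[b] / total_bases for b in "ACGT"}

    scores = {}
    # Exhaustive enumeration: every k-mer is scored, so no motif is missed.
    for letters in product("ACGT", repeat=k):
        p = 1.0
        for b in letters:
            p *= freq[b]
        expected = windows * p
        variance = windows * p * (1 - p)
        if variance > 0:
            motif = "".join(letters)
            scores[motif] = (kmer_counts[motif] - expected) / sqrt(variance)
    return scores


# Toy upstream regions sharing the planted site GACGTCA (illustrative data).
sequences = ["TTGACGTCATT", "AAGACGTCAGG", "CCGACGTCACC", "GTGACGTCATA"]
scores = kmer_zscores(sequences, 6)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

Because the loop visits all 4^k candidates, the highest-scoring motifs under this scoring function are found exactly rather than approximated, which is the contrast with expectation maximization and Gibbs sampling drawn in the abstract. The cost grows exponentially in k, so the approach suits short motifs.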


Subjects
Computational Biology/methods , Transcription Factors/metabolism , Algorithms , Binding Sites/genetics , Computational Biology/statistics & numerical data , DNA, Fungal/genetics , DNA, Fungal/metabolism , Genes, Fungal/genetics , Phenotype , Promoter Regions, Genetic/genetics , Protein Binding , Regulon/genetics , Reproducibility of Results , Saccharomyces cerevisiae/genetics
16.
BMC Bioinformatics ; 5: 170, 2004 Oct 28.
Article in English | MEDLINE | ID: mdl-15511292

ABSTRACT

BACKGROUND: This paper addresses the problem of discovering transcription factor binding sites in heterogeneous sequence data, which includes regulatory sequences of one or more genes, as well as their orthologs in other species. RESULTS: We propose an algorithm that integrates two important aspects of a motif's significance - overrepresentation and cross-species conservation - into one probabilistic score. The algorithm allows the input orthologous sequences to be related by any user-specified phylogenetic tree. It is based on the Expectation-Maximization technique, and scales well with the number of species and the length of input sequences. We evaluate the algorithm on synthetic data, and also present results for data sets from yeast, fly, and human. CONCLUSIONS: The results demonstrate that the new approach improves motif discovery by exploiting multiple species information.


Subjects
Algorithms , DNA, Fungal/genetics , DNA/genetics , Evolution, Molecular , Animals , Base Composition/genetics , Drosophila/genetics , Drosophila melanogaster/genetics , Enhancer Elements, Genetic/genetics , Humans , Models, Genetic , Phylogeny , Saccharomyces/genetics , Saccharomyces cerevisiae/genetics , Software , Species Specificity
17.
J Comput Biol ; 9(2): 225-42, 2002.
Article in English | MEDLINE | ID: mdl-12015879

ABSTRACT

The DNA motif discovery problem abstracts the task of discovering short, conserved sites in genomic DNA. Pevzner and Sze recently described a precise combinatorial formulation of motif discovery that motivates the following algorithmic challenge: find twenty planted occurrences of a motif of length fifteen in roughly twelve kilobases of genomic sequence, where each occurrence of the motif differs from its consensus in four randomly chosen positions. Such "subtle" motifs, though statistically highly significant, expose a weakness in existing motif-finding algorithms, which typically fail to discover them. Pevzner and Sze introduced new algorithms to solve their (15,4)-motif challenge, but these methods do not scale efficiently to more difficult problems in the same family, such as the (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif-discovery algorithm, PROJECTION, designed to enhance the performance of existing motif finders using random projections of the input's substrings. Experiments on synthetic data demonstrate that PROJECTION remedies the weakness observed in existing algorithms, typically solving the difficult (14,4)-, (16,5)-, and (18,6)-motif problems. Our algorithm is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge. A probabilistic estimate suggests that related motif-finding problems that PROJECTION fails to solve are in all likelihood inherently intractable. We also test the performance of our algorithm on realistic biological examples, including transcription factor binding sites in eukaryotes and ribosome binding sites in prokaryotes.
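
The random-projection step named in this abstract can be sketched as hashing every length-l substring by the bases it carries at a small set of positions. Occurrences of a planted motif whose mutations all fall outside those positions land in the same bucket. The sketch below covers only this bucketing step, not the refinement of promising buckets. The function names are hypothetical, and the positions are fixed by hand for reproducibility, whereas the abstract describes choosing projections at random.

```python
from collections import defaultdict


def project(lmer, positions):
    """Keep only the bases of lmer at the given positions."""
    return "".join(lmer[i] for i in positions)


def bucket_by_projection(sequences, l, positions):
    """Group every length-l window by its projection.

    Returns a dict mapping projection key -> list of (sequence index, start).
    """
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            key = project(seq[start:start + l], positions)
            buckets[key].append((seq_index, start))
    return buckets


# Illustrative data: the motif GATTACAG planted once per sequence, each copy
# carrying one substitution (at motif position 1 or 6).
sequences = [
    "CCCGCTTACAGTTT",  # GCTTACAG at offset 3
    "AGATTACCGCCAAA",  # GATTACCG at offset 1
    "TTTTTGTTTACAGC",  # GTTTACAG at offset 5
]
positions = (0, 2, 3, 4, 5, 7)  # happens to skip the mutated positions 1 and 6

buckets = bucket_by_projection(sequences, 8, positions)
largest_key = max(buckets, key=lambda key: len(buckets[key]))
print(largest_key, buckets[largest_key])
```

In practice a single projection may hit a mutated position, so the bucketing would be repeated over many random choices of `positions` (for example with `random.sample(range(l), k)`), keeping buckets that are unexpectedly full as starting points for a conventional motif finder.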


Subjects
Algorithms , DNA/genetics , Base Composition , Base Sequence , Binding Sites/genetics , Computational Biology , Conserved Sequence , DNA/chemistry , DNA/metabolism , Ribosomes/metabolism , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/metabolism
18.
J Comput Biol ; 9(1): 1-22, 2002.
Article in English | MEDLINE | ID: mdl-11911792

ABSTRACT

The advent of the DNA microarray technology has brought with it the exciting possibility of simultaneously observing the expression levels of all genes in an organism. One such microarray technology, called "oligo arrays," manufactures short single strands of DNA (called probes) onto a glass surface using photolithography. An altered or missed step in such a manufacturing protocol can adversely affect all probes using this failed step and is in general impossible to disentangle from experimental variation when using such a defective array. The idea of designing special quality control probes to detect a failed step was first formulated by Hubbell and Pevzner (1999). We consider an alternative formulation of this problem and use a combinatorial design approach to solve it. Our results improve over prior work in guaranteeing coverage of all protocol steps and in being able to tolerate a greater number of unreliable probe intensities.


Subjects
Oligonucleotide Array Sequence Analysis/standards , Combinatorial Chemistry Techniques/standards , Drug Design , Gene Expression Profiling/standards , Models, Statistical , Quality Control
19.
J Comput Biol ; 9(2): 211-23, 2002.
Article in English | MEDLINE | ID: mdl-12015878

ABSTRACT

Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous noncoding DNA sequences from multiple species. We introduce a new motif-finding problem, the Substring Parsimony Problem, which is a formalization of the ideas behind phylogenetic footprinting, and we present an exact dynamic programming algorithm to solve it. We then present a number of algorithmic optimizations that allow our program to run quickly on most biologically interesting datasets. We show how to handle data sets in which only an unknown subset of the sequences contains the regulatory element. Finally, we describe how to empirically assess the statistical significance of the motifs found. Each technique is implemented and successfully identifies a number of known binding sites, as well as several highly conserved but uncharacterized regions. The program is available at http://bio.cs.washington.edu/software.html.
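
A small dynamic program in the spirit of the Substring Parsimony Problem can be written as a bottom-up pass over the phylogenetic tree. The sketch below is a generic Sankoff-style formulation and makes several assumptions that the abstract does not spell out: each leaf contributes one length-k substring of its sequence, each internal node is labeled with an arbitrary k-mer, and the score is the sum of Hamming distances along the tree edges. The nested-tuple tree encoding and the function names are hypothetical, and none of the optimizations mentioned in the abstract are included.

```python
from itertools import product

INF = float("inf")


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def subtree_costs(node, k, kmers):
    """Map each k-mer s to the minimum cost of node's subtree if node is labeled s.

    A leaf is a sequence string (assumed to have length >= k); an internal
    node is a tuple of child nodes.
    """
    if isinstance(node, str):
        present = {node[i:i + k] for i in range(len(node) - k + 1)}
        return {s: 0 if s in present else INF for s in kmers}

    costs = {s: 0 for s in kmers}
    for child in node:
        child_costs = subtree_costs(child, k, kmers)
        # Only labels the child can actually take are worth considering.
        finite = [(t, c) for t, c in child_costs.items() if c != INF]
        for s in kmers:
            costs[s] += min(c + hamming(s, t) for t, c in finite)
    return costs


def best_parsimony(tree, k):
    """Return (root label, parsimony score) minimizing the total score."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    costs = subtree_costs(tree, k, kmers)
    best = min(costs, key=costs.get)
    return best, costs[best]


# Illustrative three-species tree: two close relatives and an outgroup.
tree = (("AACGTGA", "TTCGTGC"), "GGCGAGT")
label, score = best_parsimony(tree, 4)
print(label, score)
```

Here the two sister sequences share CGTG and the outgroup carries CGAG, so the best motif costs a single substitution over the whole tree. The table has 4^k entries per node, which keeps this naive version practical only for short motifs.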


Subjects
Algorithms , DNA Footprinting/statistics & numerical data , Phylogeny , Computational Biology , Databases, Nucleic Acid , Genes, Regulator , Software
20.
Nat Biotechnol ; 28(6): 567-72, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20495551

ABSTRACT

Multiple sequence alignment is a difficult computational problem. There have been compelling pleas for methods to assess whole-genome multiple sequence alignments and compare the alignments produced by different tools. We assess the four ENCODE alignments, each of which aligns 28 vertebrates on 554 Mbp of total input sequence. We measure the level of agreement among the alignments and compare their coverage and accuracy. We find a disturbing lack of agreement among the alignments not only in species distant from human, but even in mouse, a well-studied model organism. Overall, the assessment shows that Pecan produces the most accurate or nearly most accurate alignment in all species and genomic location categories, while still providing coverage comparable to or better than that of the other alignments in the placental mammals. Our assessment reveals that constructing accurate whole-genome multiple sequence alignments remains a significant challenge, particularly for noncoding regions and distantly related species.


Subjects
Genome/genetics , Sequence Alignment/methods , Animals , Base Sequence , Humans , Software