Search | BVS (Virtual Health Library) Search Portal

1.

Human-mouse genome comparisons to locate regulatory sites.

Wasserman, W W; Palumbo, M; Thompson, W; Fickett, J W; Lawrence, C E.

Nat Genet ; 26(2): 225-8, 2000 Oct.

Article in English | MEDLINE | ID: mdl-11017083

ABSTRACT

Elucidating the human transcriptional regulatory network is a challenge of the post-genomic era. Technical progress so far is impressive, including detailed understanding of regulatory mechanisms for at least a few genes in multicellular organisms, rapid and precise localization of regulatory regions within extensive regions of DNA by means of cross-species comparison, and de novo determination of transcription-factor binding specificities from large-scale yeast expression data. Here we address two problems involved in extending these results to the human genome: first, it has been unclear how many model organism genomes will be needed to delineate most regulatory regions; and second, the discovery of transcription-factor binding sites (response elements) from expression data has not yet been generalized from single-celled organisms to multicellular organisms. We found that 98% (74/75) of experimentally defined sequence-specific binding sites of skeletal-muscle-specific transcription factors are confined to the 19% of human sequences that are most conserved in the orthologous rodent sequences. Also we found that in using this restriction, the binding specificities of all three major muscle-specific transcription factors (MYF, SRF and MEF2) can be computationally identified.

Subjects

Genome, Human; Mice/genetics; Regulatory Sequences, Nucleic Acid; Algorithms; Animals; Base Sequence; Consensus Sequence; Gene Expression Regulation; Humans; Models, Genetic; Sequence Alignment; Transcription, Genetic

2.

Electronic data publishing and GenBank.

Cinkosky, M J; Fickett, J W; Gilna, P; Burks, C.

Science ; 252(5010): 1273-7, 1991 May 31.

Article in English | MEDLINE | ID: mdl-1925538

ABSTRACT

GenBank, the national repository for nucleotide sequence data, has implemented a new model of scientific data management, which we term electronic data publishing. In traditional publishing, both scientific conclusions and supporting data are communicated via the printed page, and in electronic journal publishing, both types of information are communicated via electronic media. In electronic data publishing, by contrast, conclusions are published in a journal while data are published via a network-accessible, electronic database.

Subjects

Databases, Factual; Electronics; Publishing; Base Sequence; DNA/genetics; Data Collection/methods; Human Genome Project; Humans; Software

3.

Finding genes by computer: the state of the art.

Fickett, J W.

Trends Genet ; 12(8): 316-20, 1996 Aug.

Article in English | MEDLINE | ID: mdl-8783942

ABSTRACT

Discovering new genes, and their functions, can be aided not only by special purpose gene (and coding region) finding software, but also by searches in key databases, and by programs for finding particular sites relevant to gene expression, such as promoters and splice sites. No one software package includes all the necessary tools. I describe here the main kinds of tools; their working principles, strengths and limitations; and how combined evidence from multiple tools can aid in optimum gene identification.

Subjects

Computational Biology; Databases, Factual; Genes; Amino Acid Sequence; Animals; Base Sequence; Codon; DNA/chemistry; Exons; Humans; Molecular Sequence Data; Repetitive Sequences, Nucleic Acid; Software

4.

Quantitative discrimination of MEF2 sites.

Fickett, J W.

Mol Cell Biol ; 16(1): 437-41, 1996 Jan.

Article in English | MEDLINE | ID: mdl-8524326

ABSTRACT

Myocyte-specific enhancer factor 2 (MEF2) is a family of closely related transcription factors that play a key role in the differentiation of muscle tissues and are important in the muscle-specific expression of a number of genes. Given the centrality of MEF2 in muscle differentiation, regulatory regions newly determined to be muscle specific are often studied for potential MEF2 binding sites. Possible sites are often located by comparison to a homologous gene or by matching to the consensus MEF2 sequence. Enough data have accumulated that a richer description of the MEF2 binding site, a position weight matrix, can be reliably constructed and its usefulness can be assessed. It was shown that scores from such a matrix approximate MEF2 binding energy and enable recognition of naturally occurring MEF2 sites with high sensitivity and specificity. Regulation of genes via MEF2-like sites is complicated by the fact that a number of transcription factors are involved. Not only is MEF2 itself a family of proteins, but several other, nonhomologous, transcription factors overlap MEF2 in DNA-binding specificity. Thus, more quantitative methods for recognizing potential sites may help with the lengthy process of disentangling the complex regulatory circuits of muscle-specific expression.
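The position weight matrix idea in this abstract can be illustrated with a minimal sketch. The aligned sites below are invented for illustration and are not the MEF2 sites used in the paper; the uniform background and the pseudocount are likewise assumptions, and the function names are hypothetical.

```python
import math

# Hypothetical aligned binding sites, invented for illustration only
# (NOT the MEF2 data set from the paper).
SITES = ["CTAAAAATAG", "CTATAAATAG", "CTAAAAATAA", "TTATTAATAG"]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def build_pwm(sites, pseudocount=1.0):
    """Return one dict of log-odds scores (base 2) per alignment column."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / BACKGROUND[b])
            for b in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the column scores for a window as wide as the matrix."""
    return sum(col[b] for col, b in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, offset) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        (score(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )
```

Under these assumptions, a higher score means a closer match to the training sites. Whether scores track binding energy, as the abstract reports for MEF2, depends on the real data and is not something this toy example can show.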

Subjects

DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism; Amino Acid Sequence; Animals; Binding Sites/genetics; Biometry; DNA/metabolism; Humans; MEF2 Transcription Factors; Molecular Sequence Data; Muscles/metabolism; Mutagenesis, Site-Directed; Myogenic Regulatory Factors

5.

Bacterial start site prediction.

Hannenhalli, S S; Hayes, W S; Hatzigeorgiou, A G; Fickett, J W.

Nucleic Acids Res ; 27(17): 3577-82, 1999 Sep 01.

Article in English | MEDLINE | ID: mdl-10446249

ABSTRACT

With the growing number of completely sequenced bacterial genes, accurate gene prediction in bacterial genomes remains an important problem. Although the existing tools predict genes in bacterial genomes with high overall accuracy, their ability to pinpoint the translation start site remains unsatisfactory. In this paper, we present a novel approach to bacterial start site prediction that takes into account multiple features of a potential start site, viz., ribosome binding site (RBS) binding energy, distance of the RBS from the start codon, distance from the beginning of the maximal ORF to the start codon, the start codon itself and the coding/non-coding potential around the start site. Mixed integer programming was used to optimize the discriminatory system. The accuracy of this approach is up to 90%, compared to 70%, using the most common tools in fully automated mode (that is, without expert human post-processing of results). The approach is evaluated using Bacillus subtilis, Escherichia coli and Pyrococcus furiosus. These three genomes cover a broad spectrum of bacterial genomes, since B. subtilis is a Gram-positive bacterium, E. coli is a Gram-negative bacterium and P. furiosus is an archaebacterium. A significant problem is generating a set of 'true' start sites for algorithm training, in the absence of experimental work. We found that sequence conservation between P. furiosus and the related Pyrococcus horikoshii clearly delimited the gene start in many cases, providing a sufficient training set.

Subjects

Codon, Initiator; Genome, Bacterial; Protein Biosynthesis; Algorithms; Amino Acid Sequence; Bacillus subtilis/genetics; Conserved Sequence; Escherichia coli/genetics; Molecular Sequence Data; Pyrococcus furiosus/genetics; Sequence Homology, Amino Acid

6.

Distinctive sequence features in protein coding, genic non-coding, and intergenic human DNA.

Guigó, R; Fickett, J W.

J Mol Biol ; 253(1): 51-60, 1995 Oct 13.

Article in English | MEDLINE | ID: mdl-7473716

ABSTRACT

We have studied the behavior of a number of sequence statistics, mostly indicative of protein coding function, in a large set of human clone sequences randomly selected in the course of genome mapping (randomly selected clone sequences), and compared this with the behavior in known sequences containing genes (which we term genic sequences). As expected, given the higher coding density of the genic sequences, the sequence statistics studied behave in a substantially different manner in the randomly selected clone sequences (mostly intergenic DNA) and in the genic sequences. Strong differences in behavior of a number of such statistics are also observed, however, when the randomly selected clone sequences are compared with only the non-coding fraction of the genic sequences, suggesting that intergenic and genic non-coding DNA constitute two different classes of non-coding DNA. By studying the behavior of the sequence statistics in simulated DNA of different C+G content, we have observed that a number of them are strongly dependent on C+G content. Thus, most differences between intergenic and genic non-coding DNA can be explained by differences in C+G content. A+T-rich intergenic DNA appears to be at the compositional equilibrium expected under random mutation, while C+G richer non-coding genic DNA is far from this equilibrium. The results obtained in simulated DNA indicate, on the other hand, that a very large fraction of the variation in the coding statistics that underlie gene identification algorithms is due simply to C+G content, and is not directly related to protein coding function. It appears, thus, that the performance of gene-finding algorithms should be improved by carefully distinguishing the effects of protein coding function from those of mere base compositional variation on such coding statistics.

Subjects

Base Sequence/genetics; DNA/genetics; Genes/genetics; Algorithms; Base Composition; Databases, Factual; Discriminant Analysis; Humans; Open Reading Frames/genetics; Proteins/genetics

7.

Identification of regulatory regions which confer muscle-specific gene expression.

Wasserman, W W; Fickett, J W.

J Mol Biol ; 278(1): 167-81, 1998 Apr 24.

Article in English | MEDLINE | ID: mdl-9571041

ABSTRACT

For many newly sequenced genes, sequence analysis of the putative protein yields no clue on function. It would be beneficial to be able to identify in the genome the regulatory regions that confer temporal and spatial expression patterns for the uncharacterized genes. Additionally, it would be advantageous to identify regulatory regions within genes of known expression pattern without performing the costly and time consuming laboratory studies now required. To achieve these goals, the wealth of case studies performed over the past 15 years will have to be collected into predictive models of expression. Extensive studies of genes expressed in skeletal muscle have identified specific transcription factors which bind to regulatory elements to control gene expression. However, potential binding sites for these factors occur with sufficient frequency that it is rare for a gene to be found without one. Analysis of experimentally determined muscle regulatory sequences indicates that muscle expression requires multiple elements in close proximity. A model is generated with predictive capability for identifying these muscle-specific regulatory modules. Phylogenetic footprinting, the identification of sequences conserved between distantly related species, complements the statistical predictions. Through the use of logistic regression analysis, the model promises to be easily modified to take advantage of the elucidation of additional factors, cooperation rules, and spacing constraints.

Subjects

Gene Expression Regulation; Muscle, Skeletal/metabolism; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism; Binding Sites; DNA Footprinting; Genetic Complementation Test; Genome; Mathematical Computing; Models, Molecular; Phylogeny; Transcription Factors/genetics

8.

Discovery and modeling of transcriptional regulatory regions.

Fickett, J W; Wasserman, W W.

Curr Opin Biotechnol ; 11(1): 19-24, 2000 Feb.

Article in English | MEDLINE | ID: mdl-10679343

ABSTRACT

A complex network of regulatory controls governs the patterns of gene expression. Enabled by the tools of molecular cloning, initial experimental queries into the gene regulatory network elucidated a wide array of transcription factors and their cognate binding sites from hundreds of genes. The recent fusion of genome-scale experimental tools, a more comprehensive gene catalog, and concomitant advances in computational methodology, has extended the range of questions being posed. The potential to further our understanding of the biochemical mechanisms of transcriptional regulation and to accelerate the delineation of regulatory control regions in the human genome is enormous.

Subjects

Computational Biology; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism; Transcription, Genetic/genetics; Animals; Base Sequence; Binding Sites; DNA Footprinting; DNA-Binding Proteins/metabolism; Humans; Phylogeny; Promoter Regions, Genetic/genetics

9.

Coordinate positioning of MEF2 and myogenin binding sites.

Fickett, J W.

Gene ; 172(1): GC19-32, 1996 Jun 12.

Article in English | MEDLINE | ID: mdl-8654964

ABSTRACT

The MEF2 and MyoD families of transcriptional regulatory factors both play central roles in the terminal differentiation of skeletal muscle. Further, binding sites for the two families often occur nearby, and there have been a number of indications that members of the two families may bind coordinately. The present study provides evidence that known binding sites for the two occur with precise geometric restrictions related to the DNA helical repeat unit, that pairs of putative sites following these restrictions are indicative of skeletal muscle-specific transcriptional regulatory regions, and that the geometric relationship can help provide a consistent interpretation for data that has until now been difficult to explain.

Subjects

DNA-Binding Proteins/metabolism; Myogenin/metabolism; Transcription Factors/metabolism; Animals; Base Sequence; Binding Sites; Biological Evolution; Conserved Sequence; DNA-Binding Proteins/genetics; Enhancer Elements, Genetic; Humans; MEF2 Transcription Factors; Molecular Sequence Data; Myogenic Regulatory Factors; Myogenin/genetics; Oligodeoxyribonucleotides; Transcription Factors/genetics; Transcription, Genetic

10.

ORFs and genes: how strong a connection?

Fickett, J W.

J Comput Biol ; 2(1): 117-23, 1995.

Article in English | MEDLINE | ID: mdl-7497114

ABSTRACT

The length of an open reading frame (ORF) is one important piece of evidence often used in locating new genes, particularly in organisms where splicing is rare. However, there have been no systematic studies quantifying the degree of correlation between length of ORF, on the one hand, and likelihood of gene function, on the other. In this paper, techniques are derived to estimate the conditional probability of gene function, given ORF length, based on evidence both from the databases and from simulation. Several complete chromosomes of Saccharomyces cerevisiae have now been sequenced, and considerable effort is being expended on locating and characterizing the genes in these sequences. Thus, we illustrate the techniques for this organism.

Subjects

Chromosomes, Fungal; Databases, Factual; Genes; Open Reading Frames; Saccharomyces cerevisiae/genetics; Amino Acid Sequence; Base Sequence; Fungal Proteins/chemistry; Fungal Proteins/genetics; Protein Biosynthesis; RNA Splicing

11.

A program for computer-assisted scoring of Southern blots.

Cannon, T M; Koskela, R J; Burks, C; Stallings, R L; Ford, A A; Hempfner, P E; Brown, H T; Fickett, J W.

Biotechniques ; 10(6): 764-7, 1991 Jun.

Article in English | MEDLINE | ID: mdl-1878210

ABSTRACT

SCORE, a program for computer-assisted scoring of Southern blots of clone DNA, retains the use of expert human judgment while taking over much of the drudgery of the scoring task. The primary functions of the program are to help make an aligned overlay of the fluorescence gel image and the autoradiogram blot image, to keep track of band and lane locations and to store the resulting data directly into a database. Use of SCORE has resulted in greatly increased efficiency and accuracy.

Subjects

Blotting, Southern; Software; Autoradiography; Chromosome Mapping/methods; DNA Fingerprinting/methods; Electrophoresis, Agar Gel; Humans; Image Processing, Computer-Assisted/methods

12.

Predictive methods using nucleotide sequences.

Fickett, J W.

Methods Biochem Anal ; 39: 231-45, 1998.

Article in English | MEDLINE | ID: mdl-9707933

Subjects

DNA/genetics; Sequence Analysis, DNA/methods; Base Sequence; Codon/genetics; Computational Biology; Computer Communication Networks; Databases, Factual; Molecular Sequence Data; RNA, Transfer/genetics; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA/statistics & numerical data; Software

13.

Fast optimal alignment.

Fickett, J W.

Nucleic Acids Res ; 12(1 Pt 1): 175-9, 1984 Jan 11.

Article in English | MEDLINE | ID: mdl-6694900

ABSTRACT

We show how to speed up sequence alignment algorithms of the type introduced by Needleman and Wunsch (and generalized by Sellers and others). Faster alignment algorithms have been introduced, but always at the cost of possibly getting sub-optimal alignments. Our modification results in the optimal alignment still being found, often in 1/10 the usual time. What we do is reorder the computation of the usual alignment matrix so that the optimal alignment is ordinarily found when only a small fraction of the matrix is filled. The number of matrix elements which have to be computed is related to the distance between the sequences being aligned; the better the optimal alignment, the faster the algorithm runs.
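For orientation, here is a minimal sketch of the baseline that the abstract sets out to speed up: a Needleman-Wunsch / Sellers style dynamic program that computes the distance of the optimal global alignment. It fills the whole matrix row by row, so it does not implement the reordering described in the paper. The unit mismatch and gap costs and the function name are assumptions.

```python
def alignment_distance(a, b, mismatch=1, gap=1):
    """Distance of the optimal global alignment of a and b.

    Baseline dynamic program that fills every cell of the alignment
    matrix; only two rows are kept in memory at a time.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j - 1] + (0 if ca == cb else mismatch),  # (mis)match
                prev[j] + gap,                                # gap in b
                cur[j - 1] + gap,                             # gap in a
            ))
        prev = cur
    return prev[-1]
```

This baseline always computes len(a) × len(b) cells. The abstract's point is that, for similar sequences, most of those cells never need to be computed to find the same optimum.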

Subjects

Base Sequence; Nucleic Acids; Computers; Information Systems

14.

The gene identification problem: an overview for developers.

Fickett, J W.

Comput Chem ; 20(1): 103-18, 1996 Mar.

Article in English | MEDLINE | ID: mdl-16749184

ABSTRACT

The gene identification problem is the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes. This problem is of self-evident importance, and is far from being fully solved, particularly for higher eukaryotes. Thus it is not surprising that the number of algorithm and software developers working in the area is rapidly increasing. The present paper is an overview of the field, with an emphasis on eukaryotes, for such developers.

Subjects

Genes/genetics; Base Sequence/genetics; Codon/genetics; Exons/genetics; Gene Expression/genetics; Models, Genetic; Sequence Homology

15.

Inferring genes from open reading frames.

Fickett, J W.

Comput Chem ; 18(3): 203-5, 1994 Sep.

Article in English | MEDLINE | ID: mdl-7952890

ABSTRACT

One expects that in DNA without protein coding function, stop codons (which constitute three of the 64 possible codons) should occur frequently in all reading frames, and that a long open reading frame (ORF) can be interpreted as a sign for the existence of a gene. We make a beginning on introducing quantitative measures of confidence into this inference--taking Saccharomyces cerevisiae as a sample case--and show that some common assumptions can reasonably be questioned. In particular we show that statistical support for the biological function of shorter ORFs listed as putative genes in recent papers is in fact very weak. This is an issue of practical as well as theoretical interest, since researching the function of a putative gene is difficult and expensive.
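The intuition in the first sentence can be made concrete with a small sketch. It assumes the simplest possible null model: independent codons, each equally likely, so that a codon is a stop with probability 3/64. The paper's own measures of confidence are not reproduced here, and real base composition would shift these numbers. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def prob_open(n_codons, p_stop=3 / 64):
    """Chance that n consecutive random codons contain no stop codon,
    assuming independent, uniformly distributed codons."""
    return (1 - p_stop) ** n_codons


def longest_open_run(seq, frame=0):
    """Length, in codons, of the longest stop-free run in one forward frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best
```

Under this null model, `prob_open(100)` is a little under 1%, so a 100-codon open stretch is unusual at any single position but is still expected to turn up by chance many times across a whole genome. That is in line with the abstract's caution about shorter ORFs.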

Subjects

Genes; Open Reading Frames; Base Composition; Chromosomes, Artificial, Yeast; DNA, Fungal/genetics; Genes, Fungal; Models, Genetic; Saccharomyces cerevisiae/genetics

16.

Recognition of protein coding regions in DNA sequences.

Fickett, J W.

Nucleic Acids Res ; 10(17): 5303-18, 1982 Sep 11.

Article in English | MEDLINE | ID: mdl-7145702

ABSTRACT

We give a test for protein coding regions which is based on simple and universal differences between protein-coding and noncoding DNA. The test is simple enough to use without a computer and is completely objective. The test has been thoroughly proven on 400,000 bases of sequence data: it misclassifies 5% of the regions tested and gives an answer of "No Opinion" one fifth of the time. We predict some new coding and noncoding regions in published sequences.

Subjects

DNA/genetics; Genes; Proteins/genetics; Computers; Models, Genetic; Probability

17.

Assessment of protein coding measures.

Fickett, J W; Tung, C S.

Nucleic Acids Res ; 20(24): 6441-50, 1992 Dec 25.

Article in English | MEDLINE | ID: mdl-1480466

ABSTRACT

A number of methods for recognizing protein coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms is one or more coding measures--functions which produce, given any sample window of sequence, a number or vector intended to measure the degree to which a sample sequence resembles a window of 'typical' exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure--counting oligomers--is more effective than any of the more sophisticated measures. Different measures contain different information. However there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
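To show what a coding measure based on counting oligomers could look like, the sketch below builds k-mer frequency tables from example coding and noncoding sequences and scores a window by its summed log-likelihood ratio. It illustrates the general idea only and is not the benchmark implementation from the paper. The value of k, the pseudocount, and the function names are assumptions, and counting is not frame-aware.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def make_scorer(coding, noncoding, k=3, pseudocount=1.0):
    """Return score(window): positive values favour the coding model."""
    c, n = kmer_counts(coding, k), kmer_counts(noncoding, k)
    # Pseudocounts spread over all 4**k possible k-mers avoid log(0).
    c_total = sum(c.values()) + pseudocount * 4 ** k
    n_total = sum(n.values()) + pseudocount * 4 ** k

    def score(window):
        kmers = (window[i:i + k] for i in range(len(window) - k + 1))
        return sum(
            math.log(((c[w] + pseudocount) / c_total)
                     / ((n[w] + pseudocount) / n_total))
            for w in kmers
        )

    return score
```

In practice the training sets would be large collections of annotated exons and noncoding DNA. Since the score also picks up plain base-composition differences, the training data should be chosen with that in mind.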

Subjects

Base Sequence; DNA/genetics; Genes; Genetic Techniques; Proteins/genetics; Algorithms; Base Composition; Codon/genetics; Exons; Fourier Analysis; Humans

18.

Estimation of protein coding density in a corpus of DNA sequence data.

Fickett, J W; Guigó, R.

Nucleic Acids Res ; 21(12): 2837-44, 1993 Jun 25.

Article in English | MEDLINE | ID: mdl-8332493

ABSTRACT

A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a 'coding statistic' is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
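The decomposition step can be sketched with a standard expectation-maximization (EM) fit of a two-component normal mixture to per-window values of a coding statistic. EM is a stand-in chosen for illustration; the abstract does not say which fitting procedure the authors used. The initialization, iteration count, and function names are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_normals(xs, iterations=200):
    """Fit a two-component 1-D normal mixture by EM.

    Returns (w, mu1, s1, mu2, s2), where w is the weight of component 1,
    which starts at the lower half of the data.
    """
    xs = sorted(xs)
    half = len(xs) // 2
    # Crude initialization: means of the lower and upper halves.
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (len(xs) - half)
    s1 = s2 = (xs[-1] - xs[0]) / 4 or 1.0
    w = 0.5
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each point.
        r = []
        for x in xs:
            p1 = w * normal_pdf(x, mu1, s1)
            p2 = (1 - w) * normal_pdf(x, mu2, s2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate weights, means, and standard deviations.
        n1 = sum(r)
        n2 = len(xs) - n1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = max(math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2), 1e-6)
        w = n1 / len(xs)
    return w, mu1, s1, mu2, s2
```

If the coding windows form the higher-valued component, the estimated coding fraction of windows is `1 - w`. The sketch has no safeguards for extreme outliers or strongly overlapping components, and it gives no error bounds, which are the abstract's main concern.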

Subjects

Base Composition; Codon; DNA/chemistry; Proteins/genetics; Animals; Caenorhabditis elegans/genetics; Cosmids; DNA/analysis; Genes, Fungal; Humans; Sequence Analysis, DNA; Statistics as Topic

19.

A relational database system for the maintenance and verification of the Los Alamos sequence library.

Kanehisa, M; Fickett, J W; Goad, W B.

Nucleic Acids Res ; 12(1 Pt 1): 149-58, 1984 Jan 11.

Article in English | MEDLINE | ID: mdl-6694899

ABSTRACT

The nucleic acid sequence databases of Los Alamos National Laboratory, European Molecular Biology Laboratory, and others are organized in a single relational database. This organization with a suitable relational database management program facilitates the tasks of reporting statistics, making cross-references, and double-checking of the original databases.

Subjects

Base Sequence; Information Systems; Nucleic Acids

20.

Base compositional structure of genomes.

Fickett, J W; Torney, D C; Wolf, D R.

Genomics ; 13(4): 1056-64, 1992 Aug.

Article in English | MEDLINE | ID: mdl-1505943

ABSTRACT

We model the base compositional structure of the human and Escherichia coli genomes. Three particular properties are first quantified: (1) There is a significant tendency for any region of either genome to have a strand-symmetric base composition. (2) The variation in base composition from region to region, within each genome, is very much larger than expected from common homogeneous stochastic models. (3) A given local base composition tends to persist over a scale of at least kilobases (E. coli) or tens of kilobases (human). Multidomain stochastic models from the literature are reviewed and sharpened. In particular, quantitative measurements of the third property lead us to suggest a significant shift in the style of domain models, in which the variation of A+T content with position is modeled by a random walk with frequent small steps rather than with large quantum jumps. As an application, we suggest a way to reduce the amount of computation in the assembly of large sequences from sequences of randomly chosen fragments.

Subjects

Escherichia coli/genetics; Genome, Bacterial; Genome, Human; Humans
