Results 1 - 20 of 40
1.
Science ; 291(5507): 1304-51, 2001 02 16.
Article in English | MEDLINE | ID: mdl-11181995

ABSTRACT

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.


Subjects
Human Genome , Human Genome Project , DNA Sequence Analysis , Algorithms , Animals , Chromosome Banding , Chromosome Mapping , Bacterial Artificial Chromosomes , Computational Biology , Consensus Sequence , CpG Islands , Intergenic DNA , Factual Databases , Molecular Evolution , Exons , Female , Gene Duplication , Genes , Genetic Variation , Humans , Introns , Male , Phenotype , Physical Chromosome Mapping , Single Nucleotide Polymorphism , Proteins/genetics , Proteins/physiology , Pseudogenes , Nucleic Acid Repetitive Sequences , Retroelements , DNA Sequence Analysis/methods , Species Specificity
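
The headline figures quoted in the abstract of record 1 can be cross-checked with a little arithmetic. The Python sketch below uses only numbers that appear in that abstract; the variable names are illustrative, and the genome length implied by the 5.11-fold figure is an inference, since the abstract does not say which length was used as the denominator.

```python
# Figures quoted in the abstract (record 1).
total_read_bp = 14.8e9      # bp of raw sequence generated
num_reads = 27_271_853      # high-quality sequence reads
stated_coverage = 5.11      # fold coverage quoted for the read set
consensus_bp = 2.91e9       # length of the euchromatic consensus sequence
snp_spacing_bp = 1250       # one difference per 1250 bp between two haploid genomes

# Average read length implied by the totals.
mean_read_len = total_read_bp / num_reads

# Genome length implied by "14.8 billion bp = 5.11-fold coverage"
# (an inference; the abstract does not state this denominator).
implied_genome_bp = total_read_bp / stated_coverage

# Expected number of single-base differences between two haploid
# genomes over the consensus length, at the quoted average rate.
expected_differences = consensus_bp / snp_spacing_bp

print(f"mean read length:     {mean_read_len:.0f} bp")
print(f"implied genome size:  {implied_genome_bp / 1e9:.2f} billion bp")
print(f"expected differences: {expected_differences:,.0f}")
```

The implied genome size comes out close to the 2.91-billion bp consensus length, which suggests the coverage figure was computed against roughly that length.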
2.
J Mol Biol ; 183(2): 195-202, 1985 May 25.
Article in English | MEDLINE | ID: mdl-4009724

ABSTRACT

Replication-deficient mutants of the unit-copy miniplasmid lambda-P1:5R were isolated after hydroxylamine mutagenesis. Complementation tests showed that the majority of these mutants are defective in the production of the repA protein product. Two of these mutants have suppressible nonsense (amber) mutations. The DNA sequence of one of these, repA103, has been determined. The lesion lies within the repA open reading frame, showing that the repA product is essential for plasmid replication. Complementation of deletion mutants of lambda-P1:5R by repA protein showed that the origin of replication lies to the left of repA and that this 300-base-pair origin region is the only portion of the DNA essential for plasmid replication if repA protein is supplied in trans. Six of the 21 hydroxylamine-induced mutants were not complemented by repA. Replication of three of these could be restored by introduction into the plasmid of a wild-type origin region, suggesting that they were origin-defective. The DNA sequence of two mutants was determined. Mutant rep-11 has a 43-base-pair deletion within the incC sequence (incC is a series of five direct repeats of a 19-base-pair sequence known to be involved in the regulation of plasmid replication). The deletion appears to have been generated by homologous recombination between two repeats. Mutant rep-30 has a single base substitution in a region just to the left of incC that destroys one of five G-A-T-C (dam methylation) sites in this region. As lambda-P1:5R is unable to establish itself as a plasmid in a methylase-defective (dam-) strain, it seems probable that methylation of the G-A-T-C sequences is important for origin function. The incC region and the sequences to its left appear to constitute an essential part of the origin of replication.


Subjects
Bacteriophages/physiology , DNA Replication , Plasmids , Virus Replication , Bacteriophages/genetics , Base Sequence , Recombinant DNA , Viral DNA , Mutation , Protein Biosynthesis
3.
Trends Biotechnol ; 10(1-2): 66-9, 1992.
Article in English | MEDLINE | ID: mdl-1367939

ABSTRACT

The ultimate goal of the Human Genome Project is to extract the biologically relevant information recorded in the estimated 100,000 genes encoded by the 3 × 10^9 bases of the human genome. This necessitates development of reliable computer-based methods capable of analysing and correctly identifying genes in the vast amounts of DNA-sequence data generated. Such tools may save time and labour by simplifying, for example, screening of cDNA libraries. They may also facilitate the localization of human disease genes by identifying candidate genes in promising regions of anonymous DNA sequence.


Subjects
Artificial Intelligence , Base Sequence , DNA/genetics , Factual Databases , Human Genome Project , Molecular Sequence Data
4.
Gene ; 66(1): 55-63, 1988 Jun 15.
Article in English | MEDLINE | ID: mdl-2843430

ABSTRACT

Phosphoribulokinase (PRK) is a key enzyme in the Calvin cycle of autotrophic organisms. We have constructed a spinach leaf cDNA library in the phage expression vector, lambda gt11, and used a rabbit polyclonal antibody raised against spinach PRK to identify PRK clones. Analyses of the nucleotide sequences of two antibody-positive clones, 1.47 and 1.35 kb in length, showed that they encode a protein which contains the N-terminal amino acid (aa) sequence [Porter et al., Arch. Biochem. Biophys. 245 (1986) 14-23] of mature spinach PRK. The codon for the N-terminal serine of the mature protein occurs 170 bp from the 5' end of the open reading frame (ORF), suggesting that PRK is synthesized with a rather long transit peptide which is removed from the mature enzyme. The ORF, ending with an amber (TAG) codon at position 1054, predicts a mature enzyme of 351 aa with a calculated Mr of 39232.


Subjects
Molecular Cloning , DNA , Phosphotransferases (Alcohol Group Acceptor) , Phosphotransferases/genetics , Plants/genetics , Base Sequence , Chromosome Mapping , Clone Cells/metabolism , Enzyme Precursors/genetics , Immunologic Techniques , Information Systems , Molecular Sequence Data , Phosphotransferases/biosynthesis , Plants/enzymology , Messenger RNA/biosynthesis , Software
5.
J Comput Biol ; 3(3): 333-44, 1996.
Article in English | MEDLINE | ID: mdl-8891953

ABSTRACT

Insertion and deletion (indel) sequencing errors in DNA coding regions disrupt DNA-to-protein translation frames, and hence make most frame-sensitive coding recognition approaches fail. This paper extends the authors' previous work on indel detection and "correction" algorithms, and presents a more effective algorithm for localizing indels that appear in DNA coding regions and "correcting" the located indels by inserting or deleting DNA bases. The algorithm localizes indels by discovering changes of the preferred translation frames within presumed coding regions, and then "corrects" them to restore a consistent translation frame within each coding region. An iterative strategy is exploited to repeatedly localize and "correct" indels until no more indels can be found. Test results have shown that this improved algorithm can detect and "correct" more indels while not worsening the rate of introduction of false indels when compared to the authors' previous work.


Subjects
Algorithms , DNA Sequence Analysis/methods , DNA Transposable Elements , Humans , Sequence Deletion
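
The abstract of record 5 describes localizing indels by finding places where the preferred translation frame changes inside a presumed coding region. The Python sketch below illustrates only that idea and is not the authors' algorithm: it assumes a toy frame score (the frame with the fewest stop codons in a window wins), non-overlapping windows, and hypothetical function names `preferred_frame` and `frame_transitions`. The published method relies on statistical coding measures that the abstract does not spell out.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def preferred_frame(seq, start, window):
    """Return the absolute frame (0, 1 or 2) with the fewest stop codons
    inside seq[start:start + window]. Toy stand-in for a coding score."""
    best_frame, best_stops = 0, None
    for frame in range(3):
        # First codon start inside the window whose position is in this frame.
        first = start + (frame - start) % 3
        stops = sum(seq[i:i + 3] in STOP_CODONS
                    for i in range(first, start + window - 2, 3))
        if best_stops is None or stops < best_stops:
            best_frame, best_stops = frame, stops
    return best_frame

def frame_transitions(seq, window=45):
    """Scan non-overlapping windows and report (position, old_frame,
    new_frame) wherever the preferred frame changes between windows."""
    transitions = []
    previous = None
    for start in range(0, len(seq) - window + 1, window):
        frame = preferred_frame(seq, start, window)
        if previous is not None and frame != previous:
            transitions.append((start, previous, frame))
        previous = frame
    return transitions

# Synthetic "coding" sequence: frame 0 has no stops, frames 1 and 2 do.
unit = "GTAACTGAC"
clean = unit * 21
# Simulate a sequencing deletion of one base at index 90.
mutated = clean[:90] + clean[91:]

print(frame_transitions(clean))
print(frame_transitions(mutated))
```

In this toy setup the resolution is limited to the window size; an indel in the middle of a window would only be localized to the nearest window boundary.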
13.
Proc Natl Acad Sci U S A ; 88(24): 11261-5, 1991 Dec 15.
Article in English | MEDLINE | ID: mdl-1763041

ABSTRACT

Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.


Subjects
Chromosome Mapping , Human Chromosomes , DNA/genetics , Hominidae/genetics , Genetic Models , Computer Neural Networks , Proteins/genetics , Animals , Base Sequence , Factual Databases , Enzymes/genetics , ras Genes , Humans , Molecular Sequence Data
14.
J Virol ; 14(5): 1288-92, 1974 Nov.
Article in English | MEDLINE | ID: mdl-4547800

ABSTRACT

The isolation of a temperate phage specific for the photosynthetic bacterium Rhodopseudomonas spheroides is reported. This phage, Rphi-1, establishes a state of lysogeny and can be induced from the prophage state by exposure to mitomycin C or UV irradiation. Mutants of Rphi-1 which grow on a standard laboratory strain (2.4.1) of Rhodopseudomonas spheroides were isolated. Although the original Rphi-1 isolated was chloroform sensitive, the mutant which plates on strain 2.4.1 is chloroform resistant. Rphi-1 does not grow on closely related bacteria, such as Rhodopseudomonas palustris or Rhodopseudomonas capsulata. Rphi-1 mutants form plaques with the same efficiency whether the plates are incubated under aerobic conditions in the dark or under anaerobic conditions in the light (phototrophic conditions).


Subjects
Bacteriophages , Mutation , Rhodobacter sphaeroides , Anaerobiosis , Bacteriophages/drug effects , Bacteriophages/isolation & purification , Bacteriophages/ultrastructure , Chloroform/pharmacology , DNA Viruses , Microbial Drug Resistance , Light , Lysogeny , Electron Microscopy , Mitomycins/pharmacology , Ultraviolet Rays , Viral Plaque Assay
15.
J Biol Chem ; 266(16): 10694-9, 1991 Jun 05.
Article in English | MEDLINE | ID: mdl-1645355

ABSTRACT

The Calvin Cycle enzyme phosphoribulokinase is activated in higher plants by the reversible reduction of a disulfide bond, which is located at the active site. To determine the possible contribution of the two regulatory residues (Cys16 and Cys55) to catalysis, site-directed mutagenesis has been used to replace each of them in the spinach enzyme with serine or alanine. The only other cysteinyl residues of the kinase, Cys244 and Cys250, were also replaced individually by serine or alanine. A comparison of specific activities of native and mutant enzymes reveals that substitutions at positions 244 or 250 are inconsequential. The position 16 mutants retain 45-90% of the wild-type activity and display normal Km values for both ATP and ribulose 5-phosphate. In contrast, substitution at position 55 results in 85-95% loss of wild-type activity, with less than a 2-fold increase in the Km for ATP and a 4-8-fold increase in the Km for ribulose 5-phosphate. These results are consistent with moderate facilitation of catalysis by Cys55 and demonstrate that the other three cysteinyl residues do not contribute significantly either to structure or catalysis. The enhanced stability, relative to wild-type enzyme, of the Ser16 mutant protein to a sulfhydryl reagent supports earlier suggestions that Cys16 is the initial target of the oxidative deactivation process.


Subjects
Cysteine/genetics , Site-Directed Mutagenesis , Phosphotransferases (Alcohol Group Acceptor) , Phosphotransferases/genetics , Base Sequence , Western Blotting , DNA/genetics , Polyacrylamide Gel Electrophoresis , Ethylmaleimide/pharmacology , Kinetics , Molecular Sequence Data , Mutation , Phosphotransferases/antagonists & inhibitors , Plants/enzymology
16.
Article in English | MEDLINE | ID: mdl-9322060

ABSTRACT

Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.


Subjects
Algorithms , Gene Expression , Genetic Techniques , Human Genome , DNA/genetics , Factual Databases , Exons , Humans , Genetic Models , Software
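
The abstract of record 16 says the algorithm first collects the ESTs related to each predicted exon and then infers gene boundaries from them. One simple way to picture that first step, offered here only as an assumption and not as the published procedure, is to put exons that match a common EST into the same gene group. The Python sketch below does this with a small union-find; the function name `group_exons_by_est` and the exon and EST identifiers are hypothetical.

```python
def group_exons_by_est(exon_ests):
    """Group predicted exons that share at least one matching EST.

    exon_ests maps an exon id to the set of EST ids that match it.
    Returns a sorted list of sorted exon-id groups (connected components).
    """
    parent = {exon: exon for exon in exon_ests}

    def find(x):
        # Find the group representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_exon_for_est = {}
    for exon, ests in exon_ests.items():
        for est in ests:
            if est in first_exon_for_est:
                # Same EST seen on another exon: merge the two groups.
                parent[find(exon)] = find(first_exon_for_est[est])
            else:
                first_exon_for_est[est] = exon

    groups = {}
    for exon in exon_ests:
        groups.setdefault(find(exon), []).append(exon)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical exon-to-EST matches, as a database search might return them.
hits = {
    "e1": {"EST_A"},
    "e2": {"EST_A", "EST_B"},
    "e3": {"EST_B"},
    "e4": {"EST_C"},
    "e5": set(),          # no EST support
}
print(group_exons_by_est(hits))
```

Exons without EST support end up in singleton groups, which fits the abstract's remark that accuracy depends on related ESTs being available.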
17.
Comput Appl Biosci ; 11(2): 117-24, 1995 Apr.
Article in English | MEDLINE | ID: mdl-7620982

ABSTRACT

This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant, and we also intend to use it as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and also exhibited some of its weaknesses, providing possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false messages on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using the standard GRAIL II method (version 1.2). (ABSTRACT TRUNCATED AT 250 WORDS)


Subjects
DNA Sequence Analysis/standards , Software , Algorithms , Exons , Humans , Protein Biosynthesis , DNA Sequence Analysis/methods
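
The abstract of record 17 describes the 'correction' step as inserting a number of 'neutral' bases at a reading frame transition point so that the exon candidate becomes frame consistent. A minimal Python sketch of that step follows. It assumes the transition position and the frames on either side are already known, uses `N` as the neutral base, and derives the insert count as `(old_frame - new_frame) % 3`. The function name and these choices are illustrative assumptions, not details taken from the paper.

```python
def restore_frame(seq, pos, old_frame, new_frame, neutral="N"):
    """Insert 0, 1 or 2 neutral bases at pos so that codons downstream of
    pos fall back into old_frame (frames are absolute positions mod 3)."""
    # A downstream codon starting at index i (i % 3 == new_frame) moves to
    # i + k after the insertion; we need (i + k) % 3 == old_frame.
    k = (old_frame - new_frame) % 3
    return seq[:pos] + neutral * k + seq[pos:]

# Original reading frame: ATG GCT GCA GCT
original = "ATGGCTGCAGCT"
# Sequencing error: the base at index 6 is dropped, so downstream codons
# shift from frame 0 to frame 2.
corrupted = original[:6] + original[7:]

repaired = restore_frame(corrupted, pos=6, old_frame=0, new_frame=2)
print(repaired)  # codons read ATG GCT NCA GCT
```

Because only insertions are used, a single-base deletion is repaired with one neutral base and a single-base insertion with two; the sequence length grows in both cases, but the downstream frame is restored.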
18.
J Protein Chem ; 12(2): 207-13, 1993 Apr.
Article in English | MEDLINE | ID: mdl-8387794

ABSTRACT

Based on selective labeling by ATP analogues, Lys68 of the Calvin Cycle enzyme phosphoribulokinase (PRK) from spinach has been assigned to the active-site region [Miziorko et al. (1990), J. Biol. Chem. 265, 3642-3647]. The equivalent position is occupied by lysyl or arginyl residues in the PRK from both prokaryotic and eukaryotic sources, suggesting a requirement for a basic residue at this location. To examine this possibility, we have replaced Lys68 of the spinach enzyme with arginyl, glutaminyl, alanyl, or glutamyl residues by site-directed mutagenesis. All of the mutant enzymes retain substantial kinase activity; and even in the case of the radical substitution by glutamate, the Km values for ATP and ribulose 5-phosphate are not perturbed significantly. Glutamate at position-68 may destabilize tertiary structure, because the yield of this mutant protein from transformed E. coli is quite low compared to that of the other proteins in this series. Despite the active-site proximity of Lys68, our results show that this residue does not play a key role in catalysis or substrate binding.


Subjects
Lysine/metabolism , Phosphotransferases (Alcohol Group Acceptor) , Phosphotransferases/metabolism , Plants/enzymology , Adenosine Triphosphate/chemistry , Amino Acid Sequence , Base Sequence , Binding Sites , Molecular Cloning , Escherichia coli , Lysine/chemistry , Molecular Sequence Data , Site-Directed Mutagenesis , Oligonucleotides , Phosphotransferases/chemistry , Phosphotransferases/genetics , Ribulosephosphates/chemistry , Bacterial Transformation
19.
Comput Appl Biosci ; 10(6): 613-23, 1994 Dec.
Article in English | MEDLINE | ID: mdl-7704660

ABSTRACT

This paper presents a computationally efficient algorithm, the Gene Assembly Program III (GAP III), for constructing gene models from a set of accurately-predicted 'exons'. The input to the algorithm is a set of clusters of exon candidates, generated by a new version of the GRAIL coding region recognition system. The exon candidates of a cluster differ in their presumed edges and occasionally in their reading frames. Each exon candidate has a numerical score representing its 'probability' of being an actual exon. GAP III uses a dynamic programming algorithm to construct a gene model, complete or partial, by optimizing a predefined objective function. The optimal gene models constructed by GAP III correspond very well with the structures of genes which have been determined experimentally and reported in the Genome Sequence Database (GSDB). On a test set of 137 human and mouse DNA sequences consisting of 954 true exons, GAP III constructed 137 gene models using 892 exons, among which 859 (859/954 = 90%) are true exons and 33 (33/892 = 3%) are false positive. Among the 859 true positives, 635 (74%) match the actual exons exactly, and 838 (98%) have at least one edge correct. GAP III is computationally efficient. If we use E and C to represent the total number of exon candidates in all clusters and the number of clusters, respectively, the running time of GAP III is proportional to (E x C).


Subjects
Algorithms , Exons , Genetic Models , Software , Animals , Humans , Mice , Software Design
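
The abstract of record 19 describes building a gene model by dynamic programming over clusters of scored exon candidates. The Python sketch below shows a generic version of that idea under stated assumptions: clusters are numbered left to right, at most one candidate is taken per cluster, chosen exons may not overlap, and the objective is simply the sum of candidate scores. It is a plain quadratic chaining DP, not GAP III itself, whose objective function and E x C running time are not reproduced here; the `Exon` type and `best_gene_model` name are hypothetical.

```python
from typing import NamedTuple

class Exon(NamedTuple):
    cluster: int   # index of the cluster this candidate belongs to
    start: int     # inclusive start coordinate
    end: int       # exclusive end coordinate
    score: float   # candidate score (higher means more exon-like)

def best_gene_model(candidates):
    """Return (total_score, chain) for the highest-scoring chain of
    non-overlapping candidates with strictly increasing cluster index."""
    ordered = sorted(candidates, key=lambda e: (e.end, e.start))
    best = []  # best[i] = (score of best chain ending at ordered[i], previous index)
    for i, exon in enumerate(ordered):
        top_score, top_prev = exon.score, None
        for j in range(i):
            prev = ordered[j]
            compatible = prev.end <= exon.start and prev.cluster < exon.cluster
            if compatible and best[j][0] + exon.score > top_score:
                top_score, top_prev = best[j][0] + exon.score, j
        best.append((top_score, top_prev))
    if not best:
        return 0.0, []
    # Trace back from the best-scoring chain end.
    i = max(range(len(best)), key=lambda k: best[k][0])
    total = best[i][0]
    chain = []
    while i is not None:
        chain.append(ordered[i])
        i = best[i][1]
    return total, chain[::-1]

# Three clusters; candidates within a cluster differ in their edges.
candidates = [
    Exon(0, 10, 50, 0.9), Exon(0, 12, 50, 0.6),
    Exon(1, 100, 160, 0.8), Exon(1, 95, 160, 0.5),
    Exon(2, 300, 340, 0.7),
]
total, model = best_gene_model(candidates)
print(round(total, 2), [(e.start, e.end) for e in model])
```

A fuller model would also have to check reading-frame compatibility between adjacent exons, which the abstract mentions as something candidates can differ in.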
20.
J Bacteriol ; 171(3): 1535-43, 1989 Mar.
Article in English | MEDLINE | ID: mdl-2493448

ABSTRACT

Oligodeoxynucleotide-mediated mutagenesis of the ada gene of Escherichia coli was used to produce two mutant Ada proteins. In mutant I the methyl acceptor Cys-321 for O6-methylguanine was replaced by histidine; and in mutant II the positions of Cys-321 and His-322 of the wild-type protein were inverted. Neither mutant protein had O6-methylguanine-DNA methyltransferase activity, but both retained the phosphotriester-DNA methyltransferase activity involving methyl group transfer to Cys-69. Under the control of the endogenous promoter, synthesis of mutant I protein was undetectable before or after adaptation treatment with N-methyl-N'-nitro-N-nitrosoguanidine. This appeared to be due to both inhibition of transcription of the mutant gene and degradation of the synthesized protein. On the other hand, mutant II protein was inducible by N-methyl-N'-nitro-N-nitrosoguanidine, although to a smaller extent than the wild-type protein was, and the phosphotriester-DNA methyltransferase activity appeared to reside in 24- to 30-kilodalton cleavage products. Mutant I protein could be produced under lac promoter control, and its cleavage products, unlike those of mutant II protein, tended to aggregate. These results indicate that (i) Cys-321 cannot be replaced or transposed with the nucleophilic amino acid histidine for O6-methylguanine-DNA methyltransferase function, (ii) single amino acid replacement or transposition at the O6-methylguanine methyl acceptor site can have a profound effect on the in vivo stability and regulatory function of the Ada protein, and (iii) the integrity of the protein may not be absolutely needed for its transcription-activation function.


Subjects
Bacterial Proteins/genetics , Cysteine , Escherichia coli Proteins , Escherichia coli/genetics , Bacterial Genes , Genes , Histidine , Mutation , Amino Acid Sequence , Bacterial Proteins/metabolism , Base Sequence , Western Blotting , Recombinant DNA/metabolism , Escherichia coli/metabolism , Methyltransferases/genetics , Methyltransferases/metabolism , Molecular Sequence Data , O(6)-Methylguanine-DNA Methyltransferase , Plasmids , Restriction Mapping , Transcription Factors , Genetic Transcription , beta-Galactosidase/biosynthesis