Results 1 - 12 of 12
1.
Osteoarthritis Cartilage ; 14(8): 830-8, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16580849

ABSTRACT

OBJECTIVE: To better understand transcription regulation of osteoarthritis (OA) by examining common promoter motifs in canine osteoarthritic genes, to identify other genes containing these motifs and to assess the conservation of these motifs between canine, human, mouse and rat. DESIGN: Differentially expressed transcripts in canine OA were mapped to the human genome. We thus identified 20 orthologous human transcripts representing 19 up-regulated genes and 62 orthologous transcripts representing 60 down-regulated genes. The 5 kbp upstream regions of these transcripts were used to identify binding sites and build promoter models based on those sites. The human genome was subsequently searched for other transcripts likely to be regulated by the same promoter models. Orthologous transcripts were then identified in canine, rat and mouse for determination of potential cross-species conservation of binding sites comprising the promoter model. RESULTS: Four promoter models containing 5-6 transcripts and 5-8 common transcription factor binding sites were developed. They include binding sites for AP-4, AP-2alpha and gamma, and E2F. Several hundred other human genes were found to contain these promoter motifs. Furthermore, these motifs were significantly overrepresented in the orthologous genes in canine, rat and mouse genomes. CONCLUSIONS: We have developed and applied a computational methodology to identify common promoter elements implicated in OA and shared amongst four higher vertebrates. The transcription factors associated with these binding sites and other genes driven by these promoter motifs have been implicated in OA, chondrocyte development and with other biological factors involved in the disease.


Subjects
Conserved Sequence , Gene Expression Regulation , Osteoarthritis/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Transcription Factors/genetics , Animals , Base Sequence , Binding Sites , Computational Biology , Dogs , Gene Expression Profiling , Genome, Human , Humans , Mice , Molecular Sequence Data , Rats
2.
J Virol ; 79(11): 6610-9, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15890899

ABSTRACT

We have investigated regulatory sequences in noncoding human DNA that are associated with repression of an integrated human immunodeficiency virus type 1 (HIV-1) promoter. HIV-1 integration results in the formation of precise and homogeneous junctions between viral and host DNA, but integration takes place at many locations. Thus, the variation in HIV-1 gene expression at different integration sites reports the activity of regulatory sequences at nearby chromosomal positions. Negative regulation of HIV transcription is of particular interest because of its association with maintaining HIV in a latent state in cells from infected patients. To identify chromosomal regulators of HIV transcription, we infected Jurkat T cells with an HIV-based vector transducing green fluorescent protein (GFP) and separated cells into populations containing well-expressed (GFP-positive) or poorly expressed (GFP-negative) proviruses. We then determined the chromosomal locations of the two classes by sequencing 971 junctions between viral and cellular DNA. Possible effects of endogenous cellular transcription were characterized by transcriptional profiling. Low-level GFP expression correlated with integration in (i) gene deserts, (ii) centromeric heterochromatin, and (iii) very highly expressed cellular genes. These data provide a genome-wide picture of chromosomal features that repress transcription and suggest models for transcriptional latency in cells from HIV-infected patients.


Subjects
HIV-1/genetics , Base Sequence , Chromosomes, Human/genetics , Chromosomes, Human/virology , DNA/genetics , DNA, Viral/genetics , Gene Silencing , Genes, Regulator , Genes, Viral , Genome, Human , HIV Infections/genetics , HIV Infections/virology , HIV-1/pathogenicity , HIV-1/physiology , Humans , Jurkat Cells , Models, Genetic , Molecular Sequence Data , Promoter Regions, Genetic , Proviruses/genetics , Proviruses/physiology , Transcription, Genetic , Virus Integration/genetics
3.
Bioinformatics ; 17(10): 871-7, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11673231

ABSTRACT

MOTIVATION: Whole genome shotgun sequencing strategies generate sequence data prior to the application of assembly methodologies that result in contiguous sequence. Sequence reads can be employed to indicate regions of conservation between closely related species for which only one genome has been assembled. Consequently, by using pairwise sequence alignment methods it is possible to identify novel, non-repetitive, conserved segments in non-coding sequence that exist between the assembled human genome and mouse whole genome shotgun sequencing fragments. Conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation. RESULTS: Local sequence alignment methods were applied employing mouse fragments and the assembled human genome. In addition, transcription factor binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of different human diseases taken from the Online Mendelian Inheritance in Man database. Using statistical arguments we have shown that conserved non-coding segments contain an enrichment of transcription factor binding sites when compared to the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome and therefore their identification provides a rapid means of finding genes with related conserved regions, and consequently potentially related regulatory mechanisms. Conserved segments in upstream regions are found to contain binding sites that are co-localized in a manner consistent with experimentally known transcription factor pairwise co-occurrences and afford the identification of novel co-occurring transcription factor (TF) pairs. This study provides a methodology and more evidence to suggest that conserved non-coding regions are biologically significant since they contain a statistical enrichment of regulatory signals and pairs of signals that enable the construction of regulatory models for human genes. CONTACT: samuel.levy@celera.com.
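
The abstract above mentions detecting transcription factor binding sites by aligning positional weight matrices to sequence. A minimal sketch of that general idea follows; the count matrix, pseudocount, uniform background and threshold are all hypothetical values chosen for illustration, not the matrices or parameters used in the study.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency (an illustrative simplification).
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_matrix(COUNTS)
for start, window, score in scan("TTACGTGG", pwm, threshold=4.0):
    print(start, window, round(score, 2))
```

Testing for enrichment, as the study describes, would additionally require comparing hit densities in conserved segments against the surrounding sequence; that step is not shown here.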


Subjects
Genome, Human , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Animals , Binding Sites/genetics , Computational Biology , Conserved Sequence , DNA/genetics , DNA/metabolism , Genes, Regulator , Humans , Mice , Models, Genetic , Sequence Alignment/statistics & numerical data , Transcription Factors/metabolism
4.
Bioinformatics ; 17 Suppl 1: S90-6, 2001.
Article in English | MEDLINE | ID: mdl-11472997

ABSTRACT

Computational prediction of eukaryotic Pol II promoters has been one of the most elusive problems despite considerable effort devoted to the study. Researchers have looked for various types of signals around the transcriptional start site (TSS), viz. oligonucleotide statistics, potential binding sites for core factors, clusters of binding sites, proximity to CpG islands, etc. The proximity of CpG islands to gene starts is now a well-established fact, although until recently it was based on very little genomic data. In this work we explore the possibility of enhancing promoter prediction accuracy by combining CpG island information with a few other biologically motivated, seemingly independent signals that cover most of the current knowledge. We benchmarked the method on a much larger genomic dataset than those used in previous studies. We were able to improve slightly upon current prediction accuracy. Furthermore, we observe that CpG islands are the most dominant signals and the other signals do not improve the prediction. This suggests that the computational prediction of promoters for genes with no associated CpG island (typically having tissue-specific expression), looking only at the immediate neighborhood of the TSS, may not even be possible. We suggest some biological experiments and studies to better understand the biology of transcription.
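
Since the abstract above identifies CpG islands as the dominant promoter signal, a small sketch of how a window of sequence is commonly tested for CpG-island character may help. The thresholds below (length of at least 200 bp, G+C content above 50%, observed/expected CpG ratio above 0.6) are the widely used Gardiner-Garden and Frommer style criteria; they are an assumption here, since the abstract does not state which definition the authors applied.

```python
def cpg_stats(window):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(window, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, G+C and observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(window)
    return len(window) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CCGG" * 50))  # CpG-rich toy window
print(looks_like_cpg_island("AT" * 100))   # AT-only toy window
```

In practice such a test is slid along the genome in overlapping windows; the toy sequences here only exercise the arithmetic.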


Subjects
Genome, Human , Promoter Regions, Genetic , Binding Sites/genetics , Computational Biology , CpG Islands , DNA/genetics , DNA/metabolism , Genetic Techniques/statistics & numerical data , Humans
5.
Science ; 291(5507): 1304-51, 2001 Feb 16.
Article in English | MEDLINE | ID: mdl-11181995

ABSTRACT

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.


Subjects
Genome, Human , Human Genome Project , Sequence Analysis, DNA , Algorithms , Animals , Chromosome Banding , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Computational Biology , Consensus Sequence , CpG Islands , DNA, Intergenic , Databases, Factual , Evolution, Molecular , Exons , Female , Gene Duplication , Genes , Genetic Variation , Humans , Introns , Male , Phenotype , Physical Chromosome Mapping , Polymorphism, Single Nucleotide , Proteins/genetics , Proteins/physiology , Pseudogenes , Repetitive Sequences, Nucleic Acid , Retroelements , Sequence Analysis, DNA/methods , Species Specificity
6.
J Mol Biol ; 303(1): 61-76, 2000 Oct 13.
Article in English | MEDLINE | ID: mdl-11021970

ABSTRACT

The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30 %, our best method gives an accuracy of 96 % compared to 80 % obtained for sequence similarity and 74 % for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94 % compared to 68 % for sequence similarity and 79 % for BLAST. We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances.
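
The idea of ranking alignment positions by relative entropy between sub-type specific profiles, described in the abstract above, can be sketched as follows. This is an illustrative simplification only: the flat pseudocount, the symmetric sum of the two Kullback-Leibler divergences, and the toy alignments are assumptions, not the scoring scheme reported by the authors.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(residues, pseudocount=1.0):
    """Smoothed amino acid frequencies for one alignment column."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)  # gaps ignored
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def discriminating_scores(group_a, group_b):
    """Score each column by how differently the two sub-types use it."""
    length = len(group_a[0])
    scores = []
    for i in range(length):
        p = column_profile([seq[i] for seq in group_a])
        q = column_profile([seq[i] for seq in group_b])
        scores.append(relative_entropy(p, q) + relative_entropy(q, p))
    return scores


# Toy aligned sequences: the middle column separates the two sub-types.
sub_type_1 = ["ADK", "ADR", "ADK"]
sub_type_2 = ["AEK", "AER", "AEK"]
print([round(s, 3) for s in discriminating_scores(sub_type_1, sub_type_2)])
```

Columns where both groups share the same residue distribution score zero, while the column that differs between groups scores highest.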


Subjects
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Sequence Alignment/methods , Adenylyl Cyclases/chemistry , Adenylyl Cyclases/metabolism , Algorithms , Amino Acid Sequence , Animals , Databases as Topic , Entropy , Guanylate Cyclase/chemistry , Guanylate Cyclase/metabolism , Humans , L-Lactate Dehydrogenase/chemistry , L-Lactate Dehydrogenase/metabolism , Malate Dehydrogenase/chemistry , Malate Dehydrogenase/metabolism , Models, Molecular , Molecular Sequence Data , Protein Conformation , Protein Kinases/chemistry , Protein Kinases/metabolism , Proteins/classification , Sensitivity and Specificity , Serine Endopeptidases/chemistry , Serine Endopeptidases/metabolism , Software , Structure-Activity Relationship , Substrate Specificity
7.
J Comput Biol ; 7(1-2): 59-70, 2000.
Article in English | MEDLINE | ID: mdl-10890388

ABSTRACT

This paper introduces a novel class of tree comparison problems strongly motivated by an important and cost-intensive step in the drug discovery pipeline, viz., mapping cell-bound receptors to the ligands they bind to and vice versa. Tree comparison studies motivated by problems such as virus-host tree comparison, gene-species tree comparison and the consensus tree problem have been reported. None of these studies are applicable in our context because in all these problems, there is a well-defined mapping of the nodes the trees are built on across the set of trees being compared. A new class of tree comparison problems arises in cases where finding the correspondence among the nodes of the trees being compared is itself the problem. The problem arises while trying to find the interclass correspondence between the members of a pair of coevolving classes, e.g., cell-bound receptors and their ligands. Given the evolution of the two classes, the combinatorial problem is to find a mapping among the leaves of the two trees that optimizes a given cost function. In this work we formulate, for the first time, various combinatorial optimization problems motivated by the aforementioned biological problem. We present hardness results, give an efficient algorithm for a restriction of the problem and demonstrate its applicability.


Subjects
Receptors, Chemokine/metabolism , Algorithms , Biological Evolution , Biometry , Chemokines/genetics , Chemokines/metabolism , Drug Design , Ligands , Receptors, Chemokine/genetics
8.
Nucleic Acids Res ; 27(17): 3577-82, 1999 Sep 01.
Article in English | MEDLINE | ID: mdl-10446249

ABSTRACT

With the growing number of completely sequenced bacterial genes, accurate gene prediction in bacterial genomes remains an important problem. Although the existing tools predict genes in bacterial genomes with high overall accuracy, their ability to pinpoint the translation start site remains unsatisfactory. In this paper, we present a novel approach to bacterial start site prediction that takes into account multiple features of a potential start site, viz., ribosome binding site (RBS) binding energy, distance of the RBS from the start codon, distance from the beginning of the maximal ORF to the start codon, the start codon itself and the coding/non-coding potential around the start site. Mixed integer programming was used to optimize the discriminatory system. The accuracy of this approach is up to 90%, compared to 70% using the most common tools in fully automated mode (that is, without expert human post-processing of results). The approach is evaluated using Bacillus subtilis, Escherichia coli and Pyrococcus furiosus. These three genomes cover a broad spectrum of bacterial genomes, since B. subtilis is a Gram-positive bacterium, E. coli is a Gram-negative bacterium and P. furiosus is an archaebacterium. A significant problem is generating a set of 'true' start sites for algorithm training, in the absence of experimental work. We found that sequence conservation between P. furiosus and the related Pyrococcus horikoshii clearly delimited the gene start in many cases, providing a sufficient training set.
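
Two of the features listed in the abstract above, the start codon itself and its distance from the beginning of the maximal ORF, are simple to extract. The sketch below enumerates candidate starts for one ORF; the function name, the dictionary layout and the choice of ATG, GTG and TTG as the start codon set are assumptions for illustration, and the remaining features (RBS binding energy, coding potential) and the mixed integer programming step are not shown.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(sequence, stop_index):
    """List in-frame candidate start codons upstream of a stop codon.

    stop_index is the position of the first base of the stop codon that
    ends the ORF. The maximal ORF begins just after the nearest upstream
    in-frame stop codon, or at the start of the sequence in that frame.
    """
    sequence = sequence.upper()
    orf_begin = stop_index
    # Walk upstream one codon at a time until an in-frame stop is met.
    while orf_begin - 3 >= 0 and sequence[orf_begin - 3:orf_begin] not in STOP_CODONS:
        orf_begin -= 3
    candidates = []
    for pos in range(orf_begin, stop_index, 3):
        codon = sequence[pos:pos + 3]
        if codon in START_CODONS:
            candidates.append({
                "position": pos,
                "codon": codon,
                "distance_from_orf_begin": pos - orf_begin,
            })
    return candidates


toy = "TAA" + "CCC" + "GTG" + "AAA" + "ATG" + "GGG" + "TGA"
for candidate in candidate_starts(toy, stop_index=18):
    print(candidate)
```

Each candidate would then be scored on all features together; a discriminant of the kind the abstract describes chooses among them.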


Subjects
Codon, Initiator , Genome, Bacterial , Protein Biosynthesis , Algorithms , Amino Acid Sequence , Bacillus subtilis/genetics , Conserved Sequence , Escherichia coli/genetics , Molecular Sequence Data , Pyrococcus furiosus/genetics , Sequence Homology, Amino Acid
9.
J Comput Biol ; 4(3): 275-96, 1997.
Article in English | MEDLINE | ID: mdl-9278060

ABSTRACT

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill (1985), which models a protein as a chain of amino acid residues that are either hydrophobic or polar and treats hydrophobic interactions as the dominant initial driving force for protein folding. Hart and Istrail (1996a) gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that the triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we are able to achieve similar constant factor approximation guarantees on the triangular lattice as were achieved in the standard HP model. While the structures derived from our folding rules are probably still far from biological reality, we hope that having a set of folding rules with different properties will yield more interesting folds when combined.
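
In the HP model described above, the quality of a fold is usually measured by the number of hydrophobic-hydrophobic (H-H) contacts between residues that are lattice neighbors but not adjacent in the chain. The sketch below counts such contacts for a given fold. It is a two-dimensional illustration only: the axial-coordinate encoding of the triangular lattice and the function name are assumptions, and the paper's folding rules and approximation analysis are not reproduced.

```python
SQUARE_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# A 2D triangular lattice in axial coordinates has six neighbors per site.
TRIANGULAR_NEIGHBORS = SQUARE_NEIGHBORS + [(1, -1), (-1, 1)]


def hh_contacts(sequence, coords, neighbors=SQUARE_NEIGHBORS):
    """Count H-H contacts between non-consecutive residues of a lattice fold.

    sequence is a string over {"H", "P"}; coords lists one (x, y) lattice
    site per residue, in chain order.
    """
    if len(sequence) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("fold must be a self-avoiding walk with one site per residue")
    for a, b in zip(coords, coords[1:]):
        if (b[0] - a[0], b[1] - a[1]) not in neighbors:
            raise ValueError("consecutive residues must occupy adjacent lattice sites")
    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in neighbors:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbors.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


fold = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hh_contacts("HHHH", fold))                        # square lattice
print(hh_contacts("HHHH", fold, TRIANGULAR_NEIGHBORS))  # triangular lattice
```

The same chain and coordinates yield more contacts on the triangular lattice because each site has six neighbors rather than four, which hints at why the choice of lattice matters for the model.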


Subjects
Models, Chemical , Protein Conformation , Protein Folding , Algorithms , Models, Molecular
10.
J Comput Biol ; 4(2): 119-25, 1997.
Article in English | MEDLINE | ID: mdl-9228611

ABSTRACT

Optical mapping is a new technology for constructing restriction maps. Associated computational problems include aligning multiple partial restriction maps into a single "consensus" restriction map, and determining the correct orientation of each molecule, which was formalized as the Exclusive Binary Flip Cut (EBFC) Problem by Muthukrishnan and Parida (1997). Here we prove that the EBFC problem, as well as a number of its variants, are NP-complete. Therefore, they do not have efficient, that is, polynomial time solutions unless P = NP.


Subjects
Algorithms , Restriction Mapping/methods , Image Processing, Computer-Assisted , Models, Theoretical , Optics and Photonics
11.
Comput Appl Biosci ; 12(1): 19-24, 1996 Feb.
Article in English | MEDLINE | ID: mdl-8670615

ABSTRACT

Sequencing by hybridization (SBH) is a promising alternative to the classical DNA sequencing approaches. However, the resolving power of SBH is rather low: with 64kb sequencing chips, unknown DNA fragments only as long as 200 bp can be reconstructed in a single SBH experiment. To improve the resolving power of SBH, positional SBH (PSBH) has recently been suggested; this allows (with additional experimental work) approximate positions of every l-tuple in a target DNA fragment to be measured. We study the positional Eulerian path problem motivated by PSBH. The input to the positional Eulerian path problem is an Eulerian graph G(V, E) in which every edge has an associated range of integers, and the problem is to find an Eulerian path e_1, ..., e_|E| in G such that the range of e_i contains i. We show that the positional Eulerian path problem is NP-complete even when the maximum out-degree (in-degree) of any vertex in the graph is 2. On a positive note, we present polynomial algorithms to solve a special case of PSBH (bounded PSBH), where the range of the allowed positions for any edge is bounded by a constant (it corresponds to accurate experimental measurements of positions in PSBH). Moreover, if the positions of every l-tuple in an unknown DNA fragment of length n are measured with O(log n) error, then our algorithm runs in polynomial time. We also present an estimate of the resolving power of PSBH for a more realistic case when positions are measured with Θ(n) error.
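
To make the problem statement above concrete, the following sketch searches for a positional Eulerian path by plain backtracking on a small directed graph. It is a brute-force illustration whose running time is exponential in the worst case, in keeping with the NP-completeness result; it is not the polynomial algorithm the authors give for the bounded case. The edge encoding `(tail, head, lo, hi)` is an assumption made for this example.

```python
def positional_eulerian_path(edges):
    """Find an ordering of all edges that forms a path and respects ranges.

    edges is a list of (tail, head, lo, hi) tuples for a directed graph;
    the edge placed at 1-based position i must satisfy lo <= i <= hi.
    Returns a list of edge indices in path order, or None if none exists.
    """
    used = [False] * len(edges)
    path = []

    def extend(vertex):
        position = len(path) + 1
        if position > len(edges):
            return True  # every edge has been placed
        for idx, (tail, head, lo, hi) in enumerate(edges):
            if used[idx] or not (lo <= position <= hi):
                continue
            if vertex is not None and tail != vertex:
                continue  # the path must continue from the current vertex
            used[idx] = True
            path.append(idx)
            if extend(head):
                return True
            path.pop()
            used[idx] = False
        return False

    return list(path) if extend(None) else None


toy_edges = [
    ("b", "a", 4, 4),
    ("c", "b", 3, 3),
    ("a", "b", 1, 2),
    ("b", "c", 1, 2),
]
print(positional_eulerian_path(toy_edges))
```

In the PSBH setting, vertices would correspond to (l-1)-tuples and edges to l-tuples, with each range derived from the measured approximate position of the l-tuple.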


Subjects
Algorithms , Nucleic Acid Hybridization , Sequence Analysis, DNA/methods , Mathematics , Sequence Analysis, DNA/statistics & numerical data
12.
Genomics ; 30(2): 299-311, 1995 Nov 20.
Article in English | MEDLINE | ID: mdl-8586431

ABSTRACT

As large portions of related genomes are being sequenced, methods for comparing complete or nearly complete genomes, as opposed to comparing individual genes, are becoming progressively more important. A major, widespread phenomenon in genome evolution is the rearrangement of genes and gene blocks. There is, however, no consistent method for genome sequence comparison combined with the reconstruction of the evolutionary history of highly rearranged genomes. We developed a schema for genome sequence comparison that includes three successive steps: (i) comparison of all proteins encoded in different genomes and generation of genomic similarity plots; (ii) construction of an alphabet of conserved genes and gene blocks; and (iii) generation of most parsimonious genome rearrangement scenarios. The approach is illustrated by a comparison of the herpesvirus genomes that constitute the largest set of relatively long, complete genome sequences available to date. Herpesviruses have from 70 to about 200 genes; comparison of the amino acid sequences encoded in these genes results in an alphabet of about 30 conserved genes comprising 7 conserved blocks that are rearranged in the genomes of different herpesviruses. Algorithms to analyze rearrangements of multiple genomes were developed and applied to the derivation of most parsimonious scenarios of herpesvirus evolution under different evolutionary models. The developed approaches to genome comparison will be applicable to the comparative analysis of bacterial and eukaryotic genomes as soon as their sequences become available.
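
Once genomes are written over an alphabet of conserved genes or gene blocks, as in step (ii) above, a simple measure of how rearranged one genome is relative to another is the number of breakpoints. The sketch below counts breakpoints for a signed gene order in which the reference genome is numbered 1..n; this is a standard quantity in rearrangement analysis, offered here as an illustration rather than as the authors' algorithm for most parsimonious scenarios.

```python
def breakpoints(signed_order):
    """Count breakpoints of a signed gene order relative to 1, 2, ..., n.

    Each entry is a gene number whose sign gives its orientation. The order
    is framed by 0 and n + 1; a consecutive pair (a, b) is a conserved
    adjacency exactly when b - a == 1.
    """
    n = len(signed_order)
    if sorted(abs(g) for g in signed_order) != list(range(1, n + 1)):
        raise ValueError("expected each gene 1..n exactly once")
    framed = [0] + list(signed_order) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


# Genes 2 and 3 inverted as a block relative to the reference order.
print(breakpoints([1, -3, -2, 4]))
```

Because a single reversal can remove at most two breakpoints, half the breakpoint count (rounded up) is a lower bound on the number of reversals separating the two gene orders.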


Subjects
Gene Rearrangement , Herpesviridae/genetics , DNA, Viral , Genome, Viral , Molecular Sequence Data , Phylogeny