Results 1 - 20 of 44
1.
Int J Cancer ; 137(1): 86-95, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25422082

ABSTRACT

Gastric cancer is one of the most prevalent and aggressive cancers worldwide, and its molecular mechanism remains largely elusive. Here we report the genomic landscape of human primary gastric adenocarcinoma, based on the complete genome sequences of five pairs of cancer and matching normal samples. In total, 103,464 somatic point mutations, including 407 nonsynonymous ones, were identified, and the most recurrent mutations were harbored by mucins (MUC3A and MUC12) and transcription factors (ZNF717, ZNF595 and TP53). We detected 679 genomic rearrangements, which affect 355 protein-coding genes, and 76 genes show copy number changes. By mapping the boundaries of the rearranged regions to the folded three-dimensional structure of human chromosomes, we determined that 79.6% of the chromosomal rearrangements occur between DNA fragments in close spatial proximity, especially when the two endpoints are in a similar replication phase. We present evidence that microhomology-mediated break-induced replication was a mechanism inducing ∼40.9% of the identified genomic changes in gastric tumors. Our data analyses revealed potential integrations of Helicobacter pylori DNA into the gastric cancer genomes. Overall, a large set of novel genomic variations was detected in these gastric cancer genomes, which may be essential to the study of the genetic basis and molecular mechanism of gastric tumorigenesis.


Subjects
Adenocarcinoma/genetics, Chromosome Aberrations, Genetic Variation, Helicobacter Infections/genetics, Helicobacter pylori/physiology, Stomach Neoplasms/genetics, Adenocarcinoma/pathology, Adenocarcinoma/virology, Aged, DNA Copy Number Variations, Viral DNA/analysis, Human Genome, Humans, Male, Middle Aged, Point Mutation, Single Nucleotide Polymorphism, Stomach Neoplasms/pathology, Stomach Neoplasms/virology
2.
Nucleic Acids Res ; 39(4): 1197-207, 2011 Mar.
Article in English | MEDLINE | ID: mdl-20965966

ABSTRACT

This report describes an integrated study on identification of potential markers for gastric cancer in patients' cancer tissues and sera based on: (i) genome-scale transcriptomic analyses of 80 paired gastric cancer/reference tissues and (ii) computational prediction of blood-secretory proteins supported by experimental validation. Our findings show that: (i) 715 and 150 genes exhibit significantly differential expressions in all cancers and early-stage cancers versus reference tissues, respectively; and a substantial percentage of the alteration is found to be influenced by age and/or by gender; (ii) 21 co-expressed gene clusters have been identified, some of which are specific to certain subtypes or stages of the cancer; (iii) the top-ranked gene signatures give better than 94% classification accuracy between cancer and the reference tissues, some of which are gender-specific; and (iv) 136 of the differentially expressed genes were predicted to have their proteins secreted into blood, 81 of which were detected experimentally in the sera of 13 validation samples and 29 found to have differential abundances in the sera of cancer patients versus controls. Overall, the novel information obtained in this study has led to identification of promising diagnostic markers for gastric cancer and can benefit further analyses of the key (early) abnormalities during its development.


Subjects
Tumor Biomarkers/blood, Stomach Neoplasms/genetics, Adult, Age Factors, Aged, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Computational Biology, Gene Expression Profiling, Humans, Male, Middle Aged, Sex Factors, Stomach Neoplasms/blood, Stomach Neoplasms/classification
3.
Proc Natl Acad Sci U S A ; 107(14): 6310-5, 2010 Apr 06.
Article in English | MEDLINE | ID: mdl-20308592

ABSTRACT

It is generally known that bacterial genes working in the same biological pathways tend to group into operons, possibly to facilitate cotranscription and to provide stoichiometry. However, very little is understood about what may determine the global arrangement of bacterial genes in a genome beyond the operon level. Here we present evidence that the global arrangement of operons in a bacterial genome is largely influenced by the tendency of a bacterium to keep operons encoding the same biological pathway in nearby genomic locations, and by the tendency to keep operons involved in multiple pathways in locations close to the other members of their participating pathways. We also observed that the activation frequencies of pathways influence the genomic locations of their encoding operons: operons of the more frequently activated pathways tend to be more tightly clustered together. We have quantitatively assessed the influences of different factors on the global genomic arrangement of operons. We found that the current arrangements of operons in most of the bacterial genomes we studied tend to minimize the overall distance between consecutive operons of the same pathway across all pathways encoded in the genome.


Subjects
Bacillus subtilis/genetics, Escherichia coli/genetics, Bacterial Genome, Operon, Multigene Family
4.
Nucleic Acids Res ; 37(Database issue): D459-63, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18988623

ABSTRACT

We present a database DOOR (Database for prOkaryotic OpeRons) containing computationally predicted operons of all the sequenced prokaryotic genomes. All the operons in DOOR are predicted using our own prediction program, which was ranked to be the best among 14 operon prediction programs by a recent independent review. Currently, the DOOR database contains operons for 675 prokaryotic genomes, and supports a number of search capabilities to facilitate easy access and utilization of the information stored in it. (1) Querying the database: the database provides a search capability for a user to find desired operons and associated information through multiple querying methods. (2) Searching for similar operons: the database provides a search capability for a user to find operons that have similar composition and structure to a query operon. (3) Prediction of cis-regulatory motifs: the database provides a capability for motif identification in the promoter regions of a user-specified group of possibly coregulated operons, using motif-finding tools. (4) Operons for RNA genes: the database includes operons for RNA genes. (5) OperonWiki: the database provides a wiki page (OperonWiki) to facilitate interactions between users and the developer of the database. We believe that DOOR provides a useful resource to many biologists working on bacteria and archaea, which can be accessed at http://csbl1.bmb.uga.edu/OperonDB.


Subjects
Genetic Databases, Archaeal Genome, Bacterial Genome, Operon, Genomics, Software
5.
BMC Genomics ; 11: 291, 2010 May 10.
Article in English | MEDLINE | ID: mdl-20459751

ABSTRACT

BACKGROUND: Osmotic stress is caused by sudden changes in the impermeable solute concentration around a cell, which induce instantaneous water flow into or out of the cell to balance the concentration. Very little is known about the detailed response mechanism to osmotic stress in marine Synechococcus, one of the major oxygenic phototrophic cyanobacterial genera that contribute greatly to global CO2 fixation. RESULTS: We present here a computational study of the osmoregulation network in response to hyperosmotic stress of Synechococcus sp. strain WH8102 using comparative genome analyses and computational prediction. In this study, we identified the key transporters, synthetases, signal sensor proteins and transcriptional regulator proteins, and found experimentally that 15 of the genes encoding these proteins showed significantly changed expression levels under a mild hyperosmotic stress. CONCLUSIONS: From the predicted network model, we have made a number of interesting observations about WH8102. Specifically, we found that (i) the organism likely uses glycine betaine as the major osmolyte, and others such as glucosylglycerol, glucosylglycerate, trehalose, sucrose and arginine as the minor osmolytes, making it efficient and adaptable to its changing environment; and (ii) sigma38, one of the seven types of sigma factors, probably serves as a global regulator coordinating the osmoregulation network and the other relevant networks.


Subjects
Polysaccharides/metabolism, Synechococcus/chemistry, Synechococcus/metabolism, Water-Electrolyte Balance, Arginine/metabolism, Betaine/metabolism, Synechococcus/enzymology
6.
Nucleic Acids Res ; 35(7): 2125-40, 2007.
Article in English | MEDLINE | ID: mdl-17353185

ABSTRACT

Functional classification of genes represents a fundamental problem to many biological studies. Most of the existing classification schemes are based on the concepts of homology and orthology, which were originally introduced to study gene evolution but might not be the most appropriate for gene function prediction, particularly at high resolution level. We have recently developed a scheme for hierarchical classification of genes (HCGs) in prokaryotes. In the HCG scheme, the functional equivalence relationships among genes are first assessed through a careful application of both sequence similarity and genomic neighborhood information; and genes are then classified into a hierarchical structure of clusters, where genes in each cluster are functionally equivalent at some resolution level, and the level of resolution goes higher as the clusters become increasingly smaller traveling down the hierarchy. The HCG scheme is validated through comparisons with the taxonomy of the prokaryotic genomes, Clusters of Orthologous Groups (COGs) of genes and the Pfam system. We have applied the HCG scheme to 224 complete prokaryotic genomes, and constructed a HCG database consisting of a forest of 5339 multi-level and 15 770 single-level trees of gene clusters covering approximately 93% of the genes of these 224 genomes. The validation results indicate that the HCG scheme not only captures the key features of the existing classification schemes but also provides a much richer organization of genes which can be used for functional prediction of genes at higher resolution and to help reveal evolutionary trace of the genes.


Subjects
Computational Biology/methods, Bacterial Genes, Genomics/methods, Bacteria/classification, Cluster Analysis, DNA-Binding Proteins/classification, DNA-Binding Proteins/genetics, Bacterial Genome, Ribonucleotide Reductases/classification, Ribonucleotide Reductases/genetics
7.
Nucleic Acids Res ; 35(1): 288-98, 2007.
Article in English | MEDLINE | ID: mdl-17170009

ABSTRACT

We have carried out a systematic analysis of the contribution of a set of selected features, including three new ones, to the accuracy of operon prediction. Our analyses have led to a number of new insights about operon prediction, including that (i) different features have different levels of discerning power when used on adjacent gene pairs with different ranges of intergenic distance, (ii) certain features are universally useful for operon prediction while others are more genome-specific and (iii) the prediction reliability of operons is dependent on intergenic distances. Based on these new insights, our newly developed operon-prediction program achieves more accurate operon prediction than the previous ones, and it uses features that are most readily available from genomic sequences. Our prediction results indicate that our (non-linear) decision tree-based classifier can predict operons in a prokaryotic genome very accurately when a substantial number of operons in the genome are already known. For example, the prediction accuracy of our program can reach 90.2% and 93.7% on the Bacillus subtilis and Escherichia coli genomes, respectively. When no such information is available, our (linear) logistic function-based classifier can reach prediction accuracies of 84.6% and 83.3% for E. coli and B. subtilis, respectively.


Subjects
Bacterial Genome, Genomics/methods, Operon, Bacillus subtilis/genetics, Classification/methods, Escherichia coli/genetics
8.
BMC Bioinformatics ; 9: 546, 2008 Dec 17.
Article in English | MEDLINE | ID: mdl-19091119

ABSTRACT

BACKGROUND: Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement measured in sequence fragments as short as 1000 bps across the whole genome, for 1

Subjects
Algorithms, Base Sequence/genetics, Computational Biology/methods, Genome/genetics, Genomics/methods, Species Specificity
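
The quantity named in the abstract of entry 8, the combined frequency of each k-mer and its reverse complement within a sequence fragment, can be illustrated with a minimal Python sketch. The function names, the choice of the lexicographically smaller string as the pooled key, and the skipping of windows containing non-ACGT characters are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def combined_kmer_frequencies(fragment: str, k: int) -> dict[str, float]:
    """Relative frequency of each k-mer pooled with its reverse complement.

    Each k-mer and its reverse complement are counted under one key
    (the lexicographically smaller of the two; an illustrative choice).
    Windows containing characters other than A, C, G, T are skipped.
    """
    fragment = fragment.upper()
    counts: Counter[str] = Counter()
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, reverse_complement(kmer))] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}
```

Applied to fragments of about 1000 bp taken from different positions of one genome, the resulting dictionaries could be compared to see how stable the distribution is, which is the property the abstract states.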
9.
BMC Genomics ; 9: 36, 2008 Jan 24.
Article in English | MEDLINE | ID: mdl-18218090

ABSTRACT

BACKGROUND: Mobile genetic elements (MGEs) play an essential role in genome rearrangement and evolution, and are widely used as an important genetic tool. RESULTS: In this article, we present genetic maps of recently active Insertion Sequence (IS) elements, the simplest form of MGEs, for all sequenced cyanobacteria and archaea, predicted based on the previously identified ~1,500 IS elements. Our predicted IS maps are consistent with the NCBI annotations of the IS elements. By linking the predicted IS elements to various characteristics of the organisms under study and the organisms' living conditions, we found that (a) the activities of IS elements heavily depend on the environments where the host organisms live; (b) the number of recently active IS elements in a genome tends to increase with the genome size; (c) the flanking regions of the recently active IS elements are significantly enriched with genes encoding DNA binding factors, transporters and enzymes; and (d) IS movements show no tendency to disrupt operonic structures. CONCLUSION: These are the first genome-scale maps of IS elements with detailed structural information at the sequence level. These genetic maps of recently active IS elements and the several interesting observations would help to improve our understanding of how IS elements proliferate and how they are involved in the evolution of the host genomes.


Subjects
Archaea/genetics, Archaea/metabolism, Cyanobacteria/genetics, Cyanobacteria/metabolism, DNA Transposable Elements, Insertional Mutagenesis, Base Sequence, Chromosome Mapping, Bacterial Chromosomes, Archaeal Genome, Bacterial Genome, Genetic Models, Nucleic Acid Conformation, Open Reading Frames, Phylogeny, Repetitive Nucleic Acid Sequences, Genetic Templates, Terminal Repeat Sequences
10.
J Bioinform Comput Biol ; 6(3): 585-602, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18574864

ABSTRACT

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or for allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as insertion/deletion (indel) frequency arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity. We have also demonstrated that the application of this information can lead to an improvement in fold recognition.


Subjects
Computational Biology, INDEL Mutation, Sequence Alignment/methods, Protein Sequence Analysis/methods, Molecular Sequence Data, Insertional Mutagenesis/methods, Protein Conformation, Sequence Deletion, Software, Structure-Activity Relationship
11.
Comput Biol Chem ; 32(3): 176-84, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18440870

ABSTRACT

Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene functional annotations. We have developed a method to integrate genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) the genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be utilized to improve functional classification of genes; (ii) while our method is consistent with the existing popular schemes as much as they are among themselves, it does provide functional classification at higher resolution and hence allows functional assignments of (new) genes at a more specific level; and (iii) our method is fairly stable when being applied to different genomes.


Subjects
Algorithms, Classification/methods, Computational Biology/methods, Bacterial Genes/physiology, Genomics/methods, Prokaryotic Cells/physiology, Cluster Analysis, Computer Simulation, Bacterial Genes/genetics, Sensitivity and Specificity
12.
Nucleic Acids Res ; 34(3): 1050-65, 2006.
Article in English | MEDLINE | ID: mdl-16473855

ABSTRACT

Deciphering the regulatory networks encoded in the genome of an organism represents one of the most interesting and challenging tasks in the post-genome sequencing era. As an example of this problem, we have predicted a detailed model for the nitrogen assimilation network in cyanobacterium Synechococcus sp. WH 8102 (WH8102) using a computational protocol based on comparative genomics analysis and mining experimental data from related organisms that are relatively well studied. This computational model is in excellent agreement with the microarray gene expression data collected under ammonium-rich versus nitrate-rich growth conditions, suggesting that our computational protocol is capable of predicting biological pathways/networks with high accuracy. We then refined the computational model using the microarray data, and proposed a new model for the nitrogen assimilation network in WH8102. An intriguing discovery from this study is that nitrogen assimilation affects the expression of many genes involved in photosynthesis, suggesting a tight coordination between nitrogen assimilation and photosynthesis processes. Moreover, for some of these genes, this coordination is probably mediated by NtcA through the canonical NtcA promoters in their regulatory regions.


Subjects
Computational Biology/methods, Bacterial Gene Expression Regulation, Genomics/methods, Genetic Models, Nitrogen/metabolism, Synechococcus/genetics, Bacterial Proteins/classification, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, DNA-Binding Proteins/genetics, Gene Expression Profiling, Bacterial Genome, Operon, Photosynthesis/genetics, Phylogeny, Genetic Promoter Regions, Synechococcus/metabolism, Transcription Factors/genetics
13.
BMC Genomics ; 8: 156, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17559671

ABSTRACT

BACKGROUND: Phosphorus is an essential element for all life forms. However, it is limiting in most ecological environments that cyanobacteria inhabit. Elucidation of the phosphorus assimilation pathways in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. However, a systematic study of the Pho regulon, the core of the phosphorus assimilation pathway in a cyanobacterium, is hitherto lacking. RESULTS: We have predicted and analyzed the Pho regulons in 19 sequenced cyanobacterial genomes using a highly effective scanning algorithm that we have previously developed. Our results show that different cyanobacterial species/ecotypes may encode diverse sets of genes responsible for the utilization of various sources of phosphorus, ranging from inorganic phosphate and phosphodiesters to phosphonates. Unlike in E. coli, some cyanobacterial genes that are directly involved in phosphorus assimilation seem not to be under the regulation of the regulator SphR (orthologue of PhoB in E. coli) in some species/ecotypes. On the other hand, SphR binding sites are found for genes known to play important roles in other biological processes. These genes might serve as bridging points to coordinate phosphorus assimilation and other biological processes. More interestingly, in three cyanobacterial genomes where no sphR gene is encoded, our results show that there is virtually no functional SphR binding site, suggesting that transcription regulators probably play an important role in retaining their binding sites. CONCLUSION: The Pho regulons in cyanobacteria are highly diversified to accommodate their respective living environments. The phosphorus assimilation pathways in cyanobacteria are probably tightly coupled to a number of other important biological processes. The loss of a regulator may lead to the rapid loss of its binding sites in a genome.


Subjects
Cyanobacteria/genetics, Bacterial Genes/genetics, Genetic Variation, Phosphorus/metabolism, Phylogeny, Regulon/genetics, Binding Sites/genetics, Computational Biology/methods, Conserved Sequence/genetics, Cyanobacteria/metabolism, Phosphoric Monoester Hydrolases/genetics, Phosphotransferases/genetics
14.
J Bioinform Comput Biol ; 5(4): 817-38, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17787058

ABSTRACT

One popular approach to the prediction of binding motifs of transcription factors is to model the problem as a search for a group of l-mers (motifs), for some l > 0, one from each of the provided promoter regions of a group of co-expressed genes, that exhibit high information content when aligned without gaps. In our current work, we assume that these desired l-mers have evolved from a common ancestor, each of which has mutations in at most k positions from the common ancestor, where k is substantially smaller than l. This implies that these l-mers should belong to the k-neighborhood of their common ancestor, measured in terms of Hamming distance. If the ancestor is given, then the problem of finding these l-mers becomes trivial. Unfortunately, the problem of identifying the unknown ancestor is probably as hard as the problem of predicting the motifs themselves. Our goal is to identify a set of l-mers that slightly violate the k-neighborhood of a putative ancestor, but capture all the desired motifs, which will lead to an efficient way of identifying the desired motifs. The main contributions of this paper are fourfold: (a) we have derived nontrivial lower and upper bounds of information content for a set of l-mers that differ from an unknown ancestor in no more than k positions; (b) we have defined a new distance between two sequences and a k-pseudo-neighborhood, based on the new distance, that contains the k-neighborhood, defined by Hamming distance, of the to-be-defined ancestor; (c) we have developed an algorithm to minimize the sum of all the distances between a predicted ancestor motif and a group of l-mers from the provided promoter regions, using the new distance; and (d) we have tested PROMOCO and compared its prediction performance with that of two other prediction programs. The algorithm, implemented as a computer software program PROMOCO, has been used to find all conserved motifs in a set of provided promoter sequences. Our preliminary application of PROMOCO shows that it achieves better or comparable prediction results when compared to popular programs for identification of cis regulatory binding motifs. A limitation of the algorithm is that it does not work well when the set of provided promoter sequences is too small or when the desired motifs appear in only a small portion of the given sequences.
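
The "trivial" case mentioned in the abstract, where the ancestor is already given and one simply collects the l-mers within Hamming distance k of it, can be sketched in Python as follows. This is an illustration of the k-neighborhood concept only: it is not PROMOCO, it does not implement the paper's new distance or the k-pseudo-neighborhood, and all function names are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lmers(sequence: str, l: int) -> list[str]:
    """All overlapping substrings of length l, in order of occurrence."""
    return [sequence[i:i + l] for i in range(len(sequence) - l + 1)]


def k_neighborhood_hits(promoters: list[str], ancestor: str, k: int) -> list[list[str]]:
    """For each promoter, the l-mers within Hamming distance k of the ancestor.

    Assumes the ancestor motif is known, which is exactly the case the
    abstract calls trivial; l is taken to be len(ancestor).
    """
    l = len(ancestor)
    return [
        [m for m in lmers(p, l) if hamming_distance(m, ancestor) <= k]
        for p in promoters
    ]
```

The hard part described in the abstract is that the ancestor is unknown, so a brute-force use of this sketch would have to try every candidate ancestor; the abstract states only that the authors' new distance leads to an efficient way of identifying the motifs.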


Subjects
Algorithms, Genetic Models, Nucleic Acid Regulatory Sequences, Transcription Factors, Binding Sites/genetics, Cluster Analysis, Computer Simulation, Consensus Sequence, Statistical Models, Automated Pattern Recognition/methods, Predictive Value of Tests, Genetic Promoter Regions, Sequence Alignment, Transcription Factors/genetics
15.
Nucleic Acids Res ; 33(16): 5156-71, 2005.
Article in English | MEDLINE | ID: mdl-16157864

ABSTRACT

We have developed a new method for prediction of cis-regulatory binding sites and applied it to predicting NtcA regulated genes in cyanobacteria. The algorithm rigorously utilizes concurrence information of multiple binding sites in the upstream region of a gene and that in the upstream regions of its orthologues in related genomes. A probabilistic model was developed for the evaluation of prediction reliability so that the prediction false positive rate could be well controlled. Using this method, we have predicted multiple new members of the NtcA regulons in nine sequenced cyanobacterial genomes, and showed that the false positive rates of the predictions have been reduced by an average of 40-fold compared to the conventional methods. A detailed analysis of the predictions in each genome showed that a significant portion of our predictions are consistent with previously published results about individual genes. Intriguingly, NtcA promoters are found for many genes involved in various stages of photosynthesis. Although photosynthesis is known to be tightly coordinated with nitrogen assimilation, very little is known about the underlying mechanism. We postulate for the first time that these genes serve as the regulatory points to orchestrate these two important processes in a cyanobacterial cell.


Subjects
Bacterial Proteins/metabolism, Cyanobacteria/genetics, Genomics/methods, Nitrogen/metabolism, Photosynthesis/genetics, Genetic Promoter Regions, Regulon, Transcription Factors/metabolism, Algorithms, Amino Acid Sequence, Bacterial Proteins/chemistry, Binding Sites, Consensus Sequence, Conserved Sequence, Cyanobacteria/metabolism, DNA Footprinting, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/metabolism, Bacterial Gene Expression Regulation, Molecular Sequence Data, Tertiary Protein Structure, Transcription Factors/chemistry
16.
Nucleic Acids Res ; 33(9): 2822-37, 2005.
Article in English | MEDLINE | ID: mdl-15901854

ABSTRACT

We present a computational method for the prediction of functional modules encoded in microbial genomes. In this work, we have also developed a formal measure to quantify the degree of consistency between the predicted and the known modules, and have carried out statistical significance analysis of consistency measures. We first evaluate the functional relationship between two genes from three different perspectives: phylogenetic profile analysis, gene neighborhood analysis and Gene Ontology assignments. We then combine the three different sources of information in the framework of Bayesian inference, and we use the combined information to measure the strength of gene functional relationship. Finally, we apply a threshold-based method to predict functional modules. By applying this method to Escherichia coli K12, we have predicted 185 functional modules. Our predictions are highly consistent with the previously known functional modules in E. coli. The application results have demonstrated that our approach is highly promising for the prediction of functional modules encoded in a microbial genome.


Subjects
Computational Biology/methods, Bacterial Genome, Genomics/methods, Bayes Theorem, Statistical Data Interpretation, Escherichia coli/genetics, Phylogeny, Reproducibility of Results
17.
J Bioinform Comput Biol ; 4(5): 999-1014, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17099938

ABSTRACT

Many studies have used microarray technology to identify the molecular signatures of human cancer, yet the critical features of these often unmanageably large sets of signatures remain elusive. We have investigated co-expression patterns in four subtypes of ovarian cancer from 104 cancer patients using covariance analysis, treating each subtype of ovarian cancer as a distinct disease entity. We sought gene pairs that were transcriptionally co-expressed in one or multiple subtypes of ovarian cancer, establishing a high confidence network of 87 genes interconnected by significantly high co-expression links that were observed in at least two subtypes of ovarian cancer. We have shown that certain groups of co-expressed gene pairs are cancer subtype specific, by demonstrating significant differences in co-expression patterns of gene pairs between subtypes of ovarian cancer. In addition, we identified a set of 24 genes that classified patients into specific cancer subtypes with a misclassification error rate of less than 5%. Our findings illustrate how large public microarray gene expression datasets could be exploited for identification of cancer subtype specific molecular signatures, and how to classify cancer patients into specific subtypes of cancer using gene expression profiles.


Subjects
Tumor Biomarkers/metabolism, Computer-Assisted Diagnosis/methods, Gene Expression Profiling/methods, Neoplasm Proteins/metabolism, Oligonucleotide Array Sequence Analysis/methods, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/metabolism, Female, Humans, Multigene Family/genetics, Regression Analysis, Reproducibility of Results, Sensitivity and Specificity
18.
Genome Inform ; 17(2): 248-58, 2006.
Article in English | MEDLINE | ID: mdl-17503397

ABSTRACT

Integer programming is a combinatorial optimization method that has been successfully applied to the protein threading problem. We seek to expand the model optimized by this technique to allow for a more accurate description of protein threading. We have developed and implemented an expanded integer programming model that can represent secondary structure element deletion, which was not possible in previous versions of integer-programming-based optimization.


Subjects
Chemical Models, Linear Programming, Secondary Protein Structure, Proteins/chemistry, Algorithms, Amino Acid Sequence, Computer Simulation, Molecular Models, Protein Folding, Proteins/genetics, Sequence Alignment/methods, Protein Sequence Analysis, Software, Genetic Templates
19.
Nucleic Acids Res ; 32(2): 551-61, 2004.
Article in English | MEDLINE | ID: mdl-14744980

ABSTRACT

Residual dipolar coupling (RDC) represents one of the most exciting emerging NMR techniques for protein structure studies. However, solving a protein structure using RDC data alone is still a highly challenging problem. We report here a computer program, RDC-PROSPECT, for protein structure prediction based on a structural homolog or analog of the target protein in the Protein Data Bank (PDB), which best aligns with the (15)N-(1)H RDC data of the protein recorded in a single ordering medium. Since RDC-PROSPECT uses only RDC data and predicted secondary structure information, its performance is virtually independent of sequence similarity between a target protein and its structural homolog/analog, making it applicable to protein targets beyond the scope of current protein threading techniques. We have tested RDC-PROSPECT on all (15)N-(1)H RDC data (representing 43 proteins) deposited in the BioMagResBank (BMRB) database. The program correctly identified structural folds for 83.7% of the target proteins, and achieved an average alignment accuracy of 98.1% residues within a four-residue shift.


Subjects
Computational Biology/methods, Molecular Models, Protein Conformation, Software, Computer Simulation, Protein Databases, Protein Folding, Secondary Protein Structure, Sensitivity and Specificity
20.
Nucleic Acids Res ; 31(19): 5582-9, 2003 Oct 01.
Article in English | MEDLINE | ID: mdl-14500821

ABSTRACT

Massive amounts of gene expression data are generated using microarrays for functional studies of genes, and gene expression data clustering is a useful tool for studying the functional relationship among genes in a biological process. We have developed a computer package, EXCAVATOR, for clustering gene expression profiles based on our new framework for representing gene expression data as a minimum spanning tree. EXCAVATOR uses a number of rigorous and efficient clustering algorithms. This program has a number of unique features, including capabilities for: (i) data-constrained clustering; (ii) identification of genes with similar expression profiles to pre-specified seed genes; (iii) cluster identification from a noisy background; and (iv) computational comparison between different clustering results of the same data set. EXCAVATOR can be run from a Unix/Linux/DOS shell, from a Java interface or from a Web server. The clustering results can be visualized as colored figures and 2-dimensional plots. Moreover, EXCAVATOR provides a wide range of options for data formats, distance measures, objective functions, clustering algorithms, methods to choose the number of clusters, etc. The effectiveness of EXCAVATOR has been demonstrated on several experimental data sets. Its performance compares favorably against the popular K-means clustering method in terms of clustering quality and computing time.
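
The general idea of clustering through a minimum spanning tree can be sketched in Python as below. This is a textbook variant offered as an assumption-laden illustration, not EXCAVATOR's algorithms: it builds the tree with Prim's algorithm over Euclidean distances between expression profiles and then drops the longest tree edges until the requested number of clusters remains. The function names and the edge-cutting criterion are hypothetical choices; the abstract does not specify the package's objective functions.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over the points.

    Returns a list of (weight, i, j) tree edges; O(n^2) time.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, num_clusters):
    """Cluster labels obtained by removing the longest MST edges."""
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[: max(len(points) - num_clusters, 0)]
    root = list(range(len(points)))   # union-find forest

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]
```

For example, `mst_clusters([(0, 0), (0, 1), (10, 0), (10, 1)], 2)` groups the first two points and the last two points into separate clusters, because the single long edge joining the two pairs is the one removed.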


Subjects
Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Software, Algorithms, Animals, Cluster Analysis, cdc Genes, Rats, Software Design, Time Factors