Results 1 - 12 of 12
1.
PLoS Comput Biol ; 8(4): e1002464, 2012.
Article in English | MEDLINE | ID: mdl-22496636

ABSTRACT

High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1 and another involving RPS6KB1 (Ribosomal protein S6 kinase beta-1) were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.


Subjects
Algorithms , Gene Fusion/genetics , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, RNA/methods , Software , Base Sequence , Molecular Sequence Data
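The abstract does not spell out how fusions are called from the LifeScope and SASR alignments. As a rough, hedged illustration of the general idea only, a common generic approach counts read pairs whose two mates are assigned to different genes. The sketch below assumes each mate has already been mapped and annotated with a gene name (or None); `fusion_candidates` and `min_support` are hypothetical names, not part of LifeScope or the SASR aligner.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

GenePair = Tuple[str, str]


def fusion_candidates(
    mate_genes: Iterable[Tuple[Optional[str], Optional[str]]],
    min_support: int = 3,
) -> Dict[GenePair, int]:
    """Count read pairs whose mates fall in two different genes (hypothetical helper).

    Each item is (gene of mate 1, gene of mate 2); None marks an unannotated mate.
    Gene pairs are stored in sorted order, so this sketch ignores fusion orientation.
    """
    support: Counter = Counter()
    for gene1, gene2 in mate_genes:
        # Skip unannotated mates and ordinary same-gene pairs.
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue
        support[tuple(sorted((gene1, gene2)))] += 1
    # Keep only gene pairs supported by enough discordant read pairs.
    return {pair: n for pair, n in support.items() if n >= min_support}
```

For example, two `("GENE_A", "GENE_B")` pairs plus one `("GENE_B", "GENE_A")` pair give `{("GENE_A", "GENE_B"): 3}` with the default threshold. A real pipeline would also need split-read evidence at the junction and filters against mapping artifacts, which this sketch leaves out.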
2.
PLoS One ; 5(2): e9317, 2010 Feb 19.
Article in English | MEDLINE | ID: mdl-20174472

ABSTRACT

Due to growing throughput and shrinking cost, massively parallel sequencing is rapidly becoming an attractive alternative to microarrays for the genome-wide study of gene expression and copy number alterations in primary tumors. The sequencing of transcripts (RNA-Seq) should offer several advantages over microarray-based methods, including the ability to detect somatic mutations and accurately measure allele-specific expression. To investigate these advantages we have applied a novel, strand-specific RNA-Seq method to tumors and matched normal tissue from three patients with oral squamous cell carcinomas. Additionally, to better understand the genomic determinants of the gene expression changes observed, we have sequenced the tumor and normal genomes of one of these patients. We demonstrate here that our RNA-Seq method accurately measures allelic imbalance and that measurement on the genome-wide scale yields novel insights into cancer etiology. As expected, the set of genes differentially expressed in the tumors is enriched for cell adhesion and differentiation functions, but, unexpectedly, the set of allelically imbalanced genes is also enriched for these same cancer-related functions. By comparing the transcriptomic perturbations observed in one patient to his underlying normal and tumor genomes, we find that allelic imbalance in the tumor is associated with copy number mutations and that copy number mutations are, in turn, strongly associated with changes in transcript abundance. These results support a model in which allele-specific deletions and duplications drive allele-specific changes in gene expression in the developing tumor.


Subjects
Carcinoma, Squamous Cell/genetics , Gene Expression Profiling , Mouth Neoplasms/genetics , Sequence Analysis, DNA/methods , Allelic Imbalance , Cluster Analysis , Gene Deletion , Gene Dosage , Gene Duplication , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study/methods , Humans , Mutation , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Reverse Transcriptase Polymerase Chain Reaction
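Allelic imbalance at a heterozygous site is commonly assessed by asking whether the reads carrying each allele depart from a 50:50 split. The sketch below applies a two-sided exact binomial test for that. It is a generic stand-in and an assumption on our part, not necessarily the statistic used in the study, and `allelic_imbalance_pvalue` is a hypothetical name.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of ref vs. alt read counts against a 0.5 allele ratio."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    k = min(ref_reads, alt_reads)
    # P(X <= k) for X ~ Binomial(n, 0.5). The null distribution is symmetric,
    # so the two-sided p-value is twice the smaller tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)
```

A balanced site such as 10 reference and 10 alternate reads returns 1.0, while 0 versus 10 returns 2/1024. With SciPy available, `scipy.stats.binomtest(k, n, p=0.5).pvalue` should agree; in practice, p-values across many sites would also need multiple-testing correction.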
3.
Cancer Inform ; 5: 45-65, 2007.
Article in English | MEDLINE | ID: mdl-19390668

ABSTRACT

We present a computational approach for studying the effect of potential drug combinations on the protein networks associated with tumor cells. The majority of therapeutics are designed to target single proteins, yet most diseased states are characterized by a combination of many interacting genes and proteins. Using the topology of protein-protein interaction networks, our methods can explicitly model the possible synergistic effect of targeting multiple proteins using drug combinations in different cancer types. The methodology can be conceptually split into two distinct stages. Firstly, we integrate protein interaction and gene expression data to develop network representations of different tissue types and cancer types. Secondly, we model network perturbations to search for target combinations which cause significant damage to a relevant cancer network but only minimal damage to an equivalent normal network. We have developed sets of predicted target and drug combinations for multiple cancer types, which are validated using known cancer and drug associations, and are currently in experimental testing for prostate cancer. Our methods also revealed significant bias in curated interaction data sources towards targets with associations compared with high-throughput data sources from model organisms. The approach developed can potentially be applied to many other diseased cell types.
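The second stage above scores target combinations by how much they damage a cancer network relative to a normal one. The abstract does not define "damage", so the sketch below assumes, purely for illustration, that damage is the fractional shrinkage of the largest connected component after the targeted proteins are removed. Networks are plain adjacency dictionaries, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected: every node is a key, neighbours listed both ways


def largest_component_size(graph: Graph, removed: Set[str]) -> int:
    """Size of the largest connected component once the `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search over the remaining nodes.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph.get(node, set()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def damage(graph: Graph, targets: Iterable[str]) -> float:
    """Fraction of the intact largest component lost when `targets` are removed."""
    intact = largest_component_size(graph, set())
    return 1.0 - largest_component_size(graph, set(targets)) / intact


def rank_target_pairs(
    cancer: Graph, normal: Graph, candidates: Iterable[str]
) -> List[Tuple[Tuple[str, str], float]]:
    """Rank two-target combinations by cancer damage minus normal damage, highest first."""
    scores = {
        pair: damage(cancer, pair) - damage(normal, pair)
        for pair in combinations(sorted(candidates), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A protein that is a hub in the cancer network but peripheral in the normal network will tend to appear in top-ranked pairs, which is the selectivity the abstract describes. Other damage measures (for example, changes in shortest-path lengths) would slot into `damage` without changing the ranking loop.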

4.
Proc Natl Acad Sci U S A ; 103(42): 15582-7, 2006 Oct 17.
Article in English | MEDLINE | ID: mdl-17030794

ABSTRACT

Rhodococcus sp. RHA1 (RHA1) is a potent polychlorinated biphenyl-degrading soil actinomycete that catabolizes a wide range of compounds and represents a genus of considerable industrial interest. RHA1 has one of the largest bacterial genomes sequenced to date, comprising 9,702,737 bp (67% G+C) arranged in a linear chromosome and three linear plasmids. A targeted insertion methodology was developed to determine the telomeric sequences. RHA1's 9,145 predicted protein-encoding genes are exceptionally rich in oxygenases (203) and ligases (192). Many of the oxygenases occur in the numerous pathways predicted to degrade aromatic compounds (30) or steroids (4). RHA1 also contains 24 nonribosomal peptide synthase genes, six of which exceed 25 kbp, and seven polyketide synthase genes, providing evidence that rhodococci harbor an extensive secondary metabolism. Among sequenced genomes, RHA1 is most similar to those of nocardial and mycobacterial strains. The genome contains few recent gene duplications. Moreover, three different analyses indicate that RHA1 has acquired fewer genes by recent horizontal transfer than most bacteria characterized to date and far fewer than Burkholderia xenovorans LB400, whose genome size and catabolic versatility rival those of RHA1. RHA1 and LB400 thus appear to demonstrate that ecologically similar bacteria can evolve large genomes by different means. Overall, RHA1 appears to have evolved to simultaneously catabolize a diverse range of plant-derived compounds in an O₂-rich environment. In addition to establishing RHA1 as an important model for studying actinomycete physiology, this study provides critical insights that facilitate the exploitation of these industrially important microorganisms.


Subjects
Bacterial Proteins , Genome, Bacterial , Metabolism , Rhodococcus , Bacterial Proteins/classification , Bacterial Proteins/genetics , Biological Evolution , Chromosome Mapping , Molecular Sequence Data , Phylogeny , Rhodococcus/genetics , Rhodococcus/metabolism
5.
BMC Genomics ; 7: 246, 2006 Sep 29.
Article in English | MEDLINE | ID: mdl-17010196

ABSTRACT

BACKGROUND: High throughput sequencing-by-synthesis is an emerging technology that allows the rapid production of millions of bases of data. Although the sequence reads are short, they can readily be used for re-sequencing. By re-sequencing the mRNA products of a cell, one may rapidly discover polymorphisms and splice variants particular to that cell. RESULTS: We present the utility of massively parallel sequencing by synthesis for profiling the transcriptome of a human prostate cancer cell line, LNCaP, that has been treated with the synthetic androgen, R1881. Through the generation of approximately 20 megabases (Mb) of EST data, we detect transcription from over 10,000 gene loci, 25 previously undescribed alternative splicing events involving known exons, and over 1,500 high quality single nucleotide discrepancies with the reference human sequence. Further, we map nearly 10,000 ESTs to positions on the genome where no transcription is currently predicted to occur. We also characterize various obstacles to using sequencing by synthesis for transcriptome analysis and propose solutions to these problems. CONCLUSION: The use of high-throughput sequencing-by-synthesis methods for transcript profiling allows the specific and sensitive detection of many of a cell's transcripts, and also allows the discovery of high quality base discrepancies and alternative splice variants. Thus, this technology may provide an effective means of understanding various disease states, discovering novel targets for disease treatment, and discovering novel transcripts.


Subjects
Adenocarcinoma/genetics , Androgens , DNA, Complementary/genetics , Expressed Sequence Tags , Gene Expression Profiling/methods , Neoplasms, Hormone-Dependent/genetics , Prostatic Neoplasms/genetics , RNA, Messenger/genetics , RNA, Neoplasm/genetics , Sequence Analysis, DNA/methods , Transcription, Genetic , Adenocarcinoma/pathology , Alternative Splicing , Cell Line, Tumor/chemistry , Cell Line, Tumor/drug effects , Chromosome Mapping , Chromosomes, Human/genetics , Exons/genetics , Gene Expression Regulation, Neoplastic/drug effects , Humans , Male , Metribolone/pharmacology , Neoplasms, Hormone-Dependent/pathology , Polymorphism, Single Nucleotide , Prostatic Neoplasms/pathology , Sequence Alignment , Sequence Homology, Nucleic Acid
6.
Nucleic Acids Res ; 34(12): e83, 2006 Jul 13.
Article in English | MEDLINE | ID: mdl-16840527

ABSTRACT

We present the results of a simple statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, 'Classic' Massively Parallel Signature Sequencing (MPSS) and 'Signature' MPSS. We demonstrate that the methods have systematic and random errors leading to differing G+C content sensitivities. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C-rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).


Subjects
Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Animals , Base Composition , Cytosine/analysis , DNA/chemistry , Genes , Guanine/analysis , Humans , Mice , Nucleic Acid Probes/chemistry
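The abstract does not give the formula behind the study's own statistical assay. As a simpler, hedged illustration of what G+C content sensitivity means in practice, the sketch below computes each tag's G+C fraction and reports, per G+C bin, the fraction of tags detected in one experiment. A strong trend across bins would hint at (but not prove) G+C-dependent sensitivity. The function names and the equal-width binning are assumptions.

```python
from typing import Iterable, List, Optional, Set


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a tag or probe sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def detection_rate_by_gc(
    tags: Iterable[str], detected: Set[str], n_bins: int = 4
) -> List[Optional[float]]:
    """Per equal-width G+C bin on [0, 1], the fraction of tags that were detected."""
    totals = [0] * n_bins
    hits = [0] * n_bins
    for tag in tags:
        # A G+C fraction of exactly 1.0 is placed in the top bin.
        b = min(int(gc_fraction(tag) * n_bins), n_bins - 1)
        totals[b] += 1
        if tag in detected:
            hits[b] += 1
    # None marks bins that contain no tags at all.
    return [h / t if t else None for h, t in zip(hits, totals)]
```

For instance, with tags `["AAAA", "AAAT", "GGCC", "GCGC"]` and only the two G+C-rich tags detected, the four bins come out as `[0.0, None, None, 1.0]`, the kind of skew the abstract reports for Signature MPSS.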
7.
Genome Res ; 16(6): 768-75, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16741162

ABSTRACT

We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used for the calculation of length constraints between any two BAC neighbors sharing 40% of their size. These constraints are used to promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assemblies. This process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms able to use length constraints for individual clones. The FASSI method is simple to implement, potentially cost-effective, and has resulted in the increase of scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints and resulted in a 26.1% increase in scaffold contiguity.


Subjects
Cryptococcus/genetics , Drosophila melanogaster/genetics , Genome , Pan troglodytes/genetics , Physical Chromosome Mapping , Sequence Analysis, DNA/methods , Animals , Chromosomes, Artificial, Bacterial/genetics , Databases, Nucleic Acid , Software
8.
Genome Res ; 16(6): 796-803, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16672307

ABSTRACT

Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full ORF verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each consisting of an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization.


Subjects
Base Sequence , Gene Library , Polyploidy , Xenopus laevis/genetics , Xenopus/genetics , Animals , Evolution, Molecular , Gene Expression , Genes, Duplicate , Genome , Molecular Sequence Data , Open Reading Frames/genetics , Phylogeny , Sequence Homology, Nucleic Acid
9.
Proc Natl Acad Sci U S A ; 102(51): 18485-90, 2005 Dec 20.
Article in English | MEDLINE | ID: mdl-16352711

ABSTRACT

We analyzed 8.55 million LongSAGE tags generated from 72 libraries. Each LongSAGE library was prepared from a different mouse tissue. Analysis of the data revealed extensive overlap with existing gene data sets and evidence for the existence of approximately 24,000 previously undescribed genomic loci. The visual cortex, pancreas, mammary gland, preimplantation embryo, and placenta contain the largest number of differentially expressed transcripts, 25% of which are previously undescribed loci.


Subjects
Gene Expression Profiling , Gene Expression Regulation, Developmental/genetics , Mice, Inbred C57BL/genetics , Mice/genetics , Alternative Splicing/genetics , Animals , Multigene Family/genetics , RNA, Untranslated/genetics , Reproducibility of Results , Transcription, Genetic/genetics
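LongSAGE tags such as the 8.55 million analyzed above are usually matched to genes by extracting "virtual" tags from known transcript sequences. The sketch below assumes the standard LongSAGE design, in which the anchoring enzyme NlaIII recognizes CATG and the tag is the 17 bases immediately 3' of the 3'-most CATG site. The function name is hypothetical, and real tag-to-gene mapping also has to handle alternative (non-3'-most) sites and sequencing errors.

```python
from typing import Optional

ANCHOR = "CATG"   # NlaIII recognition site (anchoring enzyme)
TAG_LENGTH = 17   # LongSAGE tag length, excluding the CATG anchor


def longsage_virtual_tag(transcript: str) -> Optional[str]:
    """Return the 17-base tag following the 3'-most CATG, or None if there is none."""
    seq = transcript.upper()
    site = seq.rfind(ANCHOR)
    if site == -1:
        return None  # no NlaIII site, so the transcript yields no tag
    start = site + len(ANCHOR)
    tag = seq[start:start + TAG_LENGTH]
    # Too few bases after the site to form a full-length tag.
    return tag if len(tag) == TAG_LENGTH else None
```

Building a dictionary from virtual tag to transcript over a reference set, then looking up observed tags, gives a simple way to separate tags that match known genes from the kind of previously undescribed loci the abstract reports.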
10.
Genomics ; 86(4): 476-88, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16098712

ABSTRACT

Large amounts of gene expression data from several different technologies are becoming available to the scientific community. A common practice is to use these data to calculate global gene coexpression for validation or integration of other "omic" data. To assess the utility of publicly available datasets for this purpose we have analyzed Homo sapiens data from 1202 cDNA microarray experiments, 242 SAGE libraries, and 667 Affymetrix oligonucleotide microarray experiments. The three datasets compared demonstrate significant but low levels of global concordance (rc < 0.11). Assessment against Gene Ontology (GO) revealed that all three platforms identify more coexpressed gene pairs with common biological processes than expected by chance. As the Pearson correlation for a gene pair increased, it was more likely to be confirmed by GO. The Affymetrix dataset performed best individually, with gene pairs of correlation 0.9-1.0 confirmed by GO in 74% of cases. However, in all cases, gene pairs confirmed by multiple platforms were more likely to be confirmed by GO. We show that combining results from different expression platforms increases the reliability of coexpression. A comparison with other recently published coexpression studies found similar results in terms of performance against GO but with each method producing distinctly different gene pair lists.


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Humans , Oligonucleotide Array Sequence Analysis/standards , Statistics as Topic
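The coexpression comparison above rests on Pearson correlations between gene pairs, with pairs supported by several platforms treated as more reliable. A minimal sketch, assuming each platform's data is a dictionary from gene to expression values over that platform's own samples, and using an illustrative cutoff of 0.9 (the top correlation bin mentioned in the abstract). The function names are hypothetical, and the sketch does not reproduce the study's GO assessment.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple

Profiles = Dict[str, List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (NaN if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(profiles: Profiles, threshold: float = 0.9) -> Set[Tuple[str, str]]:
    """Gene pairs (names in sorted order) whose correlation on one platform reaches `threshold`."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold  # NaN never passes
    }


def confirmed_by_all(*platforms: Profiles, threshold: float = 0.9) -> Set[Tuple[str, str]]:
    """Pairs called coexpressed on every platform supplied."""
    pair_sets = [coexpressed_pairs(p, threshold) for p in platforms]
    return set.intersection(*pair_sets) if pair_sets else set()
```

Correlations are computed within each platform and only the resulting pair sets are intersected, so the platforms never need to share samples. Each platform is a separate dictionary, e.g. `confirmed_by_all(cdna, sage, affy)`.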
11.
Biotechniques ; 38(5): 715-6, 718, 720, 2005 May.
Article in English | MEDLINE | ID: mdl-15945370

ABSTRACT

We have designed and implemented a system to manage whole genome shotgun sequences and whole genome sequence assembly data flow. The Sequence Assembly Manager (SAM) consists primarily of a MySQL relational database and Perl applications designed to easily manipulate and coordinate the analysis of sequence information and to view and report genome assembly progress through its Common Gateway Interface (CGI) web interface. The application includes a tool to compare sequence assemblies to fingerprint maps that has been used successfully to improve and validate both maps and sequence assemblies of the Rhodococcus sp. RHA1 and Cryptococcus neoformans WM276 genomes.


Subjects
Algorithms , Chromosome Mapping/methods , DNA Fingerprinting/methods , Databases, Nucleic Acid , Sequence Analysis, DNA/methods , Software , User-Computer Interface
12.
Genome Res ; 14(5): 956-62, 2004 May.
Article in English | MEDLINE | ID: mdl-15123592

ABSTRACT

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization.


Subjects
Genomics/methods , Imaging, Three-Dimensional/trends , Software/trends , Animals , Computational Biology/methods , Computer Graphics/trends , Database Management Systems/trends , Humans , Mice , Rats , Software Design