ABSTRACT
Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyses.
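As a rough check on the complexity reduction quoted above, the ratio of the approximately 5 Gb genome to the 61.6 Mb capture target can be computed directly. This is a back-of-the-envelope sketch only: the function and variable names are illustrative assumptions, and the two sizes are simply the figures stated in the abstract.

```python
# Sizes taken from the abstract; all names here are illustrative, not from the paper.
GENOME_SIZE_BP = 5_000_000_000  # approximately 5 Gb barley genome
TARGET_SIZE_BP = 61_600_000     # 61.6 Mb coding sequence capture target


def complexity_reduction(genome_bp: int, target_bp: int) -> float:
    """Return how many times smaller the capture target is than the genome."""
    if target_bp <= 0:
        raise ValueError("target size must be positive")
    return genome_bp / target_bp


fold = complexity_reduction(GENOME_SIZE_BP, TARGET_SIZE_BP)
print(f"{fold:.1f}-fold reduction")
```

With these inputs the ratio comes out at roughly 81-fold, which is consistent with the "more than 50-fold" statement.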
Subjects
Exome , Plant Genome , Genomics/methods , Hordeum/genetics , Genomics/trends , Ploidies , Single Nucleotide Polymorphism , Triticum/genetics
ABSTRACT
Sequence capture technologies, pioneered in mammalian genomes, enable the resequencing of targeted genomic regions. Most capture protocols require blocking DNA, the production of which in large quantities can prove challenging. A blocker-free, two-stage capture protocol was developed using NimbleGen arrays. The first capture depletes the library of repetitive sequences, while the second enriches for target loci. This strategy was used to resequence non-repetitive portions of an approximately 2.2 Mb chromosomal interval and a set of 43 genes dispersed in the 2.3 Gb maize genome. This approach achieved approximately 1800-3000-fold enrichment and 80-98% coverage of targeted bases. More than 2500 SNPs were identified in target genes. Low rates of false-positive SNP predictions were obtained, even in the presence of captured paralogous sequences. Importantly, it was possible to recover novel sequences from non-reference alleles. The ability to design novel repeat-subtraction and target capture arrays makes this technology accessible in any species.
Subjects
Plant Genome , Oligonucleotide Array Sequence Analysis/methods , DNA Sequence Analysis/methods , Comparative Genomic Hybridization , Plant DNA/genetics , Plant Genes , Single Nucleotide Polymorphism , Zea mays/genetics
ABSTRACT
Forward genetics (phenotype-driven approaches) remain the primary source for allelic variants in the mouse. Unfortunately, the gap between observable phenotype and causative genotype limits the widespread use of spontaneous and induced mouse mutants. As alternatives to traditional positional cloning and mutation detection approaches, sequence capture and next-generation sequencing technologies can be used to rapidly sequence subsets of the genome. Application of these technologies to mutation detection efforts in the mouse has the potential to significantly reduce the time and resources required for mutation identification by abrogating the need for high-resolution genetic mapping, long-range PCR, and sequencing of individual PCR amplimers. As proof of principle, we used array-based sequence capture and pyrosequencing to sequence an allelic series from the classically defined Kit locus (approximately 200 kb) from each of five noncomplementing Kit mutants (one known allele and four unknown alleles) and have successfully identified and validated a nonsynonymous coding mutation for each allele. These data represent the first documentation and validation that these new technologies can be used to efficiently discover causative mutations. Importantly, these data also provide a specific methodological foundation for the development of large-scale mutation detection efforts in the laboratory mouse.
Subjects
DNA Mutational Analysis/methods , Mice/genetics , Mutation , Oligonucleotide Array Sequence Analysis/methods , Alleles , Amino Acid Sequence , Animals , Base Sequence , Female , Male , Inbred C57BL Mice , Inbred DBA Mice , Molecular Sequence Data , Sequence Alignment
ABSTRACT
The field of high throughput proteomics has spawned a number of mass spectrometry-based technologies, which enable the quantitative analysis of protein expression. One of these technologies is iTRAQ (trademarked by Applied Biosystems), which through the use of isobaric tags, enables the quantitation of up to eight complex protein samples in a single multiplexed analysis. Isobaric tagging methods are emerging as an important tool to study protein expression dynamics. In this report, we describe iTRAQPak, a free software package developed in the R statistical and visualization environment that can be applied to the analysis of 8-plex expression data. The utility of this package is demonstrated through its application to the analysis of 8-plex iTRAQ protein expression data obtained from cerebrospinal fluid samples from Alzheimer's disease subjects involved in a Phase I drug trial.
Subjects
Alzheimer Disease/cerebrospinal fluid , Gene Expression Profiling/methods , Proteomics , Software , Humans , Peptides/cerebrospinal fluid
ABSTRACT
PeerGAD is a web-based database-driven application that allows community-wide peer-reviewed annotation of prokaryotic genome sequences. The application was developed to support the annotation of the Pseudomonas syringae pv. tomato strain DC3000 genome sequence and is easily portable to other genome sequence annotation projects. PeerGAD incorporates several innovative design and operation features and accepts annotations pertaining to gene naming, role classification, gene translation and annotation derivation. The annotator tool in PeerGAD is built around a genome browser that offers users the ability to search and navigate the genome sequence. Because the application encourages annotation of the genome sequence directly by researchers and relies on peer review, it circumvents the need for an annotation curator while providing added value to the annotation data. Support for the Gene Ontology vocabulary, a structured and controlled vocabulary used in classification of gene roles, is emphasized throughout the system. Here we present the underlying concepts integral to the functionality of PeerGAD.
Subjects
Computational Biology/methods , Bacterial Genome , Genomics/methods , Internet , Research Peer Review , Prokaryotic Cells , Genome
ABSTRACT
There is significant interest in the identification of effective biomarkers for Alzheimer's disease. Such biomarkers could aid in the clinical diagnosis of the disease and may be useful in assessing the efficacy of various treatment strategies. The search for biomarkers often includes the analysis of changes in cerebrospinal fluid protein expression that correlate with disease. These changes can be measured using a variety of technologies for protein expression profiling. Although there is great promise in the application of these methods to biomarker discovery based on some preliminary observations, there are significant issues in the capabilities of most of these technologies that have limited their effective application. The most recent literature involving proteomic discovery of new cerebrospinal fluid biomarkers for Alzheimer's disease is reviewed.
Subjects
Alzheimer Disease/cerebrospinal fluid , Biomarkers/cerebrospinal fluid , Proteomics/methods , Cerebrospinal Fluid/chemistry , Capillary Electrophoresis , Humans
ABSTRACT
The major histocompatibility complex (MHC) is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC, and associated regions, focus on minor variants and HLA typing, many of which have been demonstrated to be associated with human disease susceptibility and metabolic pathways. However, the detection of variants in the MHC region, and diagnostic HLA typing, still lacks a coherent, standardized, cost-effective and high-coverage protocol of clinical quality and reliability. In this paper, we present such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed to template upon the 8 annotated human MHC haplotypes, and to encompass the 5 megabases (Mb) of the extended MHC region. We deployed our probes upon three genetically diverse human samples for probe set evaluation, and sequencing data show that ~97% of the MHC region, and over 99% of the genes in the MHC region, are covered with sufficient depth and good evenness. 98% of genotypes called by this capture sequencing prove consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy when deployed at 4-digit resolution. This cost-effective and highly accurate approach for variant detection and HLA typing in the MHC region may lend further insight into immune-mediated disease studies, and may find clinical utility in transplantation medicine research. This one-step pipeline is released for general evaluation and use by the scientific community.
Subjects
High-Throughput Nucleotide Sequencing/methods , Major Histocompatibility Complex/genetics , Genotype , Haplotypes/genetics , Histocompatibility Testing , Humans
ABSTRACT
We report the development and optimization of reagents for in-solution, hybridization-based capture of the mouse exome. By validating this approach in multiple inbred strains and in novel mutant strains, we show that whole exome sequencing is a robust approach for discovery of putative mutations, irrespective of strain background. We found strong candidate mutations for the majority of mutant exomes sequenced, including new models of orofacial clefting, urogenital dysmorphology, kyphosis and autoimmune hepatitis.
Subjects
DNA Mutational Analysis/methods , Exome , Genomics/methods , Mutation , Animals , Chromosome Mapping , Mammalian Chromosomes/genetics , Collagen Type II/genetics , Exons , Gene Frequency , Genotype , INDEL Mutation , Indicators and Reagents/standards , MAP Kinase Kinase Kinases/genetics , Mice , Inbred Strain Mice , Phenotype , Mitogen-Activated Protein Kinase Kinase Kinase 11
ABSTRACT
We have developed a solution-based method for targeted DNA capture-sequencing that is directed to the complete human exome. Using this approach allows the discovery of greater than 95% of all expected heterozygous single base variants, requires as little as 3 Gbp of raw sequence data and constitutes an effective tool for identifying rare coding alleles in large-scale genomic studies.
Subjects
Base Pairing/genetics , Nucleic Acid Databases , Exons/genetics , DNA Sequence Analysis/methods , Gene Library , Haplotypes/genetics , Humans , Single Nucleotide Polymorphism/genetics , Reproducibility of Results , Sequence Alignment , Solutions
ABSTRACT
An 8-plex version of an isobaric reagent for the quantitation of proteins using shotgun methods is presented. The 8-plex version of the reagent relies on amine-labeling chemistry of peptides similar to 4-plex reagents. MS/MS reporter ions at 113, 114, 115, 116, 117, 118, 119, and 121 m/z are used to quantify protein expression. This technology, which was first applied to a test mixture consisting of eight proteins and resulted in accurate quantitation, has the potential to increase throughput of analysis for quantitative shotgun proteomics experiments when compared to 2- and 4-plex methods. The technology was subsequently applied to a longitudinal study of cerebrospinal fluid (CSF) proteins from subjects undergoing intravenous Ig treatment for Alzheimer's disease. Results from this study identify a number of protein expression changes that occur in CSF after 3 and 6 months of treatment compared to a baseline and compared to a drug washout period. A visualization tool was developed for this dataset and is presented. The tool can aid in the identification of key peptides and measurements. One conclusion aided by the visualization tool is that there are differences in considering peptide-based observations versus protein-based observations from quantitative shotgun proteomics studies.
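To illustrate how the eight reporter-ion channels listed above might be turned into relative quantities, the sketch below divides each channel's intensity by that of a reference channel. It is a generic illustration, not the method used in the study: the function name, the choice of 113 as reference, and the intensity values are all hypothetical.

```python
# Nominal reporter-ion m/z values listed in the abstract.
REPORTER_MZ = (113, 114, 115, 116, 117, 118, 119, 121)


def reporter_ratios(intensities: dict, reference: int = 113) -> dict:
    """Divide each channel's intensity by the reference channel's intensity.

    The reference channel (113 by default) is an assumption made for this
    sketch; any of the eight channels could serve as the baseline.
    """
    missing = [mz for mz in REPORTER_MZ if mz not in intensities]
    if missing:
        raise ValueError(f"missing reporter channels: {missing}")
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {mz: intensities[mz] / ref for mz in REPORTER_MZ}


# Hypothetical intensities for a single peptide spectrum (not real data).
example = {113: 1000.0, 114: 2000.0, 115: 500.0, 116: 1000.0,
           117: 1500.0, 118: 250.0, 119: 1000.0, 121: 4000.0}
ratios = reporter_ratios(example)
print(ratios)
```

Because ratios are computed per peptide spectrum, summarising them at the protein level requires a further aggregation step, which is where peptide-based and protein-based views of the same data can diverge.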
Subjects
Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/therapy , Cerebrospinal Fluid Proteins/analysis , Cerebrospinal Fluid Proteins/biosynthesis , Intravenous Immunoglobulins/therapeutic use , Proteomics , Alzheimer Disease/immunology , Amino Acid Sequence , Carbonic Anhydrases/biosynthesis , Carbonic Anhydrases/cerebrospinal fluid , Carbonic Anhydrases/genetics , Cerebrospinal Fluid Proteins/genetics , Gene Expression Regulation/immunology , Humans , Intravenous Immunoglobulins/administration & dosage , Indicators and Reagents , Intravenous Infusions , Mass Spectrometry , Molecular Sequence Data , Proteomics/instrumentation , Proteomics/methods
ABSTRACT
The Pto gene encodes a serine/threonine protein kinase that confers resistance in tomato (Lycopersicon esculentum) to Pseudomonas syringae pv tomato strains that express the type III effector protein AvrPto. Constitutive overexpression of Pto in tomato, in the absence of AvrPto, activates defense responses and confers resistance to several diverse bacterial and fungal plant pathogens. We have used a series of gene discovery and expression profiling methods to examine the effect of Pto overexpression in tomato leaves. Analysis of the tomato expressed sequence tag database and suppression subtractive hybridization identified 600 genes that were potentially differentially expressed in Pto-overexpressing tomato plants compared with a sibling line lacking Pto. By using cDNA microarrays, we verified changes in expression of many of these genes at various time points after inoculation with P. syringae pv tomato (avrPto) of the resistant Pto-overexpressing line and the susceptible sibling line. The combination of these three approaches led to the identification of 223 POR (Pto overexpression responsive) genes. Strikingly, 40% of the genes induced in the Pto-overexpressing plants previously have been shown to be differentially expressed during the human (Homo sapiens) and/or fruitfly (Drosophila melanogaster) immune responses.
Subjects
Drosophila melanogaster/immunology , Gene Expression Profiling , Plant Gene Expression Regulation , Plant Proteins , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/metabolism , Solanum lycopersicum/genetics , Solanum lycopersicum/microbiology , Animals , Drosophila melanogaster/genetics , Expressed Sequence Tags , Plant Genes/genetics , Humans , Solanum lycopersicum/immunology , Oligonucleotide Array Sequence Analysis , Plant Leaves/genetics , Plant Leaves/microbiology , Genetic Promoter Regions/genetics , Signal Transduction
ABSTRACT
The tomato transcription factor Pti4, an ethylene-responsive factor (ERF), interacts physically with the disease resistance protein Pto and binds the GCC box cis element that is present in the promoters of many pathogenesis-related (PR) genes. We reported previously that Arabidopsis plants expressing Pti4 constitutively express several GCC box-containing PR genes and show reduced disease symptoms compared with wild-type plants after inoculation with Pseudomonas syringae pv tomato or Erysiphe orontii. To gain insight into how genome-wide gene expression is affected by Pti4, we used serial analysis of gene expression (SAGE) to compare transcripts in wild-type and Pti4-expressing Arabidopsis plants. SAGE provided quantitative measurements of >20,000 transcripts and identified the 50 most highly expressed genes in Arabidopsis vegetative tissues. Comparison of the profiles from wild-type and Pti4-expressing Arabidopsis plants revealed 78 differentially abundant transcripts encoding defense-related proteins, protein kinases, ribosomal proteins, transporters, and two transcription factors (TFs). Many of the genes identified were expressed differentially in wild-type Arabidopsis during infection by Pseudomonas syringae pv tomato, supporting a role for them in defense-related processes. Unexpectedly, the promoters of most Pti4-regulated genes did not have a GCC box. Chromatin immunoprecipitation experiments confirmed that Pti4 binds in vivo to promoters lacking this cis element. Potential binding sites for ERF, MYB, and GBF TFs were present in statistically significantly increased numbers in promoters regulated by Pti4. Thus, Pti4 appears to regulate gene expression directly by binding the GCC box and possibly a non-GCC box element and indirectly by either activating the expression of TF genes or interacting physically with other TFs.
Subjects
DNA-Binding Proteins/genetics , Gene Expression Profiling/methods , Plant Diseases/genetics , Plant Proteins/genetics , Solanum lycopersicum/genetics , Transcription Factors/genetics , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis/microbiology , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Binding Sites/genetics , DNA-Binding Proteins/metabolism , G-Box Binding Factors , Plant Gene Expression Regulation , Innate Immunity/genetics , Solanum lycopersicum/metabolism , Solanum lycopersicum/microbiology , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Plant Diseases/microbiology , Plant Proteins/metabolism , Proto-Oncogene Proteins c-myb/genetics , Proto-Oncogene Proteins c-myb/metabolism , Pseudomonas/growth & development , Response Elements/genetics , Transcription Factors/metabolism
ABSTRACT
The ability of Pseudomonas syringae pv. tomato DC3000 to parasitize tomato and Arabidopsis thaliana depends on genes activated by the HrpL alternative sigma factor. To support various functional genomic analyses of DC3000, and specifically, to identify genes involved in pathogenesis, we developed a draft sequence of DC3000 and used an iterative process involving computational and gene expression techniques to identify virulence-implicated genes downstream of HrpL-responsive promoters. Hypersensitive response and pathogenicity (Hrp) promoters are known to control genes encoding the Hrp (type III protein secretion) machinery and a few type III effector proteins in DC3000. This process involved (i) identification of 9 new virulence-implicated genes in the Hrp regulon by miniTn5gus mutagenesis, (ii) development of a hidden Markov model (HMM) trained with known and transposon-identified Hrp promoter sequences, (iii) HMM identification of promoters upstream of 12 additional virulence-implicated genes, and (iv) microarray and RNA blot analyses of the HrpL-dependent expression of a representative subset of these DC3000 genes. We found that the Hrp regulon encodes candidates for 4 additional type III secretion machinery accessory factors, homologs of the effector proteins HopPsyA, AvrPpiB1 (2 copies), AvrPpiC2, AvrPphD (2 copies), AvrPphE, AvrPphF, and AvrXv3, and genes associated with the production or metabolism of virulence factors unrelated to the Hrp type III secretion system, including syringomycin synthetase (SyrE), N(epsilon)-(indole-3-acetyl)-l-lysine synthetase (IaaL), and a subsidiary regulon controlling coronatine production. Additional candidate effector genes, hopPtoA2, hopPtoB2, and an avrRps4 homolog, were preceded by Hrp promoter-like sequences, but these had HMM expectation values of relatively low significance and were not detectably activated by HrpL.
Subjects
Bacterial Proteins/genetics , DNA-Binding Proteins , Bacterial Genome , Genetic Promoter Regions , Pseudomonas/genetics , Pseudomonas/pathogenicity , Sigma Factor/genetics , DNA Transposable Elements , Reporter Genes , Solanum lycopersicum/microbiology , Markov Chains , Genetic Models , Molecular Sequence Data , Site-Directed Mutagenesis , Oligonucleotide Array Sequence Analysis , Open Reading Frames , RNA/metabolism , Virulence/genetics
ABSTRACT
Gene expression profiling holds tremendous promise for dissecting the regulatory mechanisms and transcriptional networks that underlie biological processes. Here we provide details of approaches used by others and ourselves for gene expression profiling in plants with emphasis on cDNA microarrays and discussion of both experimental design and downstream analysis. We focus on methods and techniques emphasizing fabrication of cDNA microarrays, fluorescent labeling, cDNA hybridization, experimental design, and data processing. We include specific examples that demonstrate how this technology can be used to further our understanding of plant physiology and development (specifically fruit development and ripening) and for comparative genomics by comparing transcriptome activity in tomato and pepper fruit.
Subjects
Complementary DNA/genetics , Expressed Sequence Tags , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Fluorescent Dyes , Nucleic Acid Hybridization
ABSTRACT
We report the complete genome sequence of the model bacterial pathogen Pseudomonas syringae pathovar tomato DC3000 (DC3000), which is pathogenic on tomato and Arabidopsis thaliana. The DC3000 genome (6.5 megabases) contains a circular chromosome and two plasmids, which collectively encode 5,763 ORFs. We identified 298 established and putative virulence genes, including several clusters of genes encoding 31 confirmed and 19 predicted type III secretion system effector proteins. Many of the virulence genes were members of paralogous families and also were proximal to mobile elements, which collectively comprise 7% of the DC3000 genome. The bacterium possesses a large repertoire of transporters for the acquisition of nutrients, particularly sugars, as well as genes implicated in attachment to plant surfaces. Over 12% of the genes are dedicated to regulation, which may reflect the need for rapid adaptation to the diverse environments encountered during epiphytic growth and pathogenesis. Comparative analyses confirmed a high degree of similarity with two sequenced pseudomonads, Pseudomonas putida and Pseudomonas aeruginosa, yet revealed 1,159 genes unique to DC3000, of which 811 lack a known function.