ABSTRACT
Cholera, an infectious disease with global impact, is caused by pathogenic strains of the bacterium Vibrio cholerae. High-throughput functional proteomics technologies now offer the opportunity to investigate all aspects of the proteome, which has led to an increased demand for comprehensive protein expression clone resources. Genome-scale reagents for cholera would encourage comprehensive analyses of immune responses and systems-wide functional studies that could lead to improved vaccine and therapeutic strategies. Here, we report the production of the FLEXGene clone set for V. cholerae O1 biovar eltor str. N16961: a complete-genome collection of ORF clones. This collection includes 3,761 sequence-verified clones from 3,887 targeted ORFs (97%). The ORFs were captured in a recombinational cloning vector to facilitate high-throughput transfer of ORF inserts into suitable expression vectors. To demonstrate its application, approximately 15% of the collection was transferred into the relevant expression vector and used to produce a protein microarray by transcribing, translating, and capturing the proteins in situ on the array surface with 92% success. In a second application, a method to screen for protein triggers of Toll-like receptors (TLRs) was developed. We tested in vitro-synthesized proteins for their ability to stimulate TLR5 in A549 cells. This approach correctly identified FlaC as well as previously uncharacterized TLR5 agonist activities. These data suggest that the genome-scale, fully sequenced ORF collection reported here will be useful for high-throughput functional proteomic assays, immune response studies, structural biology, and other applications.
Subject(s)
Open Reading Frames/genetics , Vibrio cholerae/genetics , Vibrio cholerae/pathogenicity , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Base Sequence , Gene Amplification/genetics , Genome, Bacterial/genetics , Molecular Sequence Data , Protein Array Analysis , Vibrio cholerae/metabolism
ABSTRACT
We report the production and availability of over 7000 fully sequence-verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native protein or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes in approximately 15 million publications recorded in PubMed at the time of analysis. This analysis revealed that, relative to the genome and the MGC collection, this collection is enriched for genes with published associations with a wide range of diseases and biomedical terms, without displaying a particular bias towards any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance.
Subject(s)
Open Reading Frames , Cloning, Molecular , Genetic Predisposition to Disease , Humans , Sequence Analysis, DNA
ABSTRACT
The rapid development of new technologies for the high-throughput (HT) study of proteins has increased the demand for comprehensive plasmid clone resources that support protein expression. These clones must be full-length, sequence-verified, and in a flexible format. The generation of these resources requires automated pipelines supported by software management systems. Although the availability of clone resources is growing, current collections are either not complete or not fully sequence-verified. We report an automated pipeline, supported by several software applications, that enabled the construction of the first comprehensive sequence-verified plasmid clone resource for more than 96% of the protein-coding sequences of the genome of Francisella tularensis, a highly virulent human pathogen and the causative agent of tularemia. This clone resource was applied to an HT protein purification pipeline, successfully producing recombinant proteins for 72% of the genes. These methods and resources represent significant technological steps towards exploiting the genomic information of F. tularensis in discovery applications.
Subject(s)
Bacterial Proteins/genetics , DNA, Complementary/genetics , Francisella tularensis/genetics , Genes/genetics , Genome, Bacterial , Open Reading Frames/genetics , Tularemia/genetics , Bacterial Proteins/isolation & purification , Bacterial Proteins/metabolism , Cloning, Molecular , Francisella tularensis/growth & development , Humans , Tularemia/microbiology , Tularemia/pathology
ABSTRACT
The availability of an annotated genome sequence for the yeast Saccharomyces cerevisiae has made possible the proteome-scale study of protein function and protein-protein interactions. These studies rely on the availability of cloned open reading frame (ORF) collections that can be used for cell-free or cell-based protein expression. Several yeast ORF collections are available, but their use and data interpretation can be hindered by reliance on now out-of-date annotations, the inflexible presence of N- or C-terminal tags, and/or the unknown presence of mutations introduced during the cloning process. High-throughput biochemical and genetic analyses would benefit from a "gold standard" (fully sequence-verified, high-quality) ORF collection, which allows for high confidence in and reproducibility of experimental results. Here, we describe Yeast FLEXGene, an S. cerevisiae protein-coding clone collection that covers over 5000 predicted protein-coding sequences. The clone set covers 87% of the current S. cerevisiae genome annotation and includes full sequencing of each ORF insert. Availability of this collection makes possible a wide variety of studies, from purified proteins to mutation suppression analysis, which should contribute to a global understanding of yeast protein function.
Subject(s)
Genomics/methods , Proteomics/methods , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Base Composition , Base Sequence , Blotting, Western , Cloning, Molecular , DNA, Fungal/chemistry , DNA, Fungal/genetics , Gene Expression Profiling , Gene Expression Regulation, Fungal , Genome, Fungal , Open Reading Frames/genetics , Polymerase Chain Reaction , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, DNA
ABSTRACT
Functional proteomics approaches that comprehensively evaluate the biological activities of human cDNAs may provide novel insights into disease pathogenesis. To systematically investigate the functional activity of cDNAs that have been implicated in breast carcinogenesis, we generated a collection of cDNAs relevant to breast cancer, the Breast Cancer 1000 (BC1000), and conducted screens to identify proteins that induce phenotypic changes resembling events that occur during tumor initiation and progression. Genes were selected for this set using bioinformatics and data mining tools that identify genes associated with breast cancer. More than 1000 cDNAs were assembled and sequence-verified using high-throughput recombination-based cloning. To our knowledge, the BC1000 represents the first publicly available sequence-validated human disease gene collection. The functional activity of a subset of the BC1000 collection was evaluated in cell-based assays that monitor changes in cell proliferation, migration, and morphogenesis in MCF-10A mammary epithelial cells expressing a variant of ErbB2 that can be inducibly activated through dimerization. Using this approach, we identified many cDNAs, encoding diverse classes of cellular proteins, that displayed activity in one or more of the assays, thus providing insights into a large set of cellular proteins capable of inducing functional alterations associated with breast cancer development.