ABSTRACT
The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.
Subjects
Gene Silencing/physiology , Genetic Privacy , Molecular Sequence Annotation , Genetic Privacy/trends , Genetic Variation , Human Genome/genetics , Humans , Single Nucleotide Polymorphism
ABSTRACT
For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.
Subjects
Genetic Databases , Genomics , Animals , Disease/genetics , Genes , Genome , Humans , Mice , Molecular Sequence Annotation , Software
ABSTRACT
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Subjects
Nucleic Acid Databases , Genomics , Animals , Cricetinae , Dogs , Ebolavirus/genetics , Gene Expression , Genome , Internet , Mice , Molecular Sequence Annotation , Phenotype , Rats , Software
ABSTRACT
Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (~15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.
Subjects
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Phylogeny , Pseudogenes/genetics , Animals , Molecular Evolution , Genetic Association Studies , Humans , Molecular Sequence Annotation , Genetic Promoter Regions/genetics , Nucleic Acid Sequence Homology
ABSTRACT
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ~90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.
Subjects
Genetic Databases , Genome , Genomics , Alleles , Animals , Human Genome , Humans , Internet , Mice , Molecular Sequence Annotation , Single Nucleotide Polymorphism , Sequence Alignment , Software
ABSTRACT
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
Subjects
Genetic Databases , Proteins/genetics , Animals , Exons , Genomics , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis
ABSTRACT
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Subjects
Genetic Databases , Human Genome , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , Complementary DNA/chemistry , Complementary DNA/genetics , Molecular Evolution , Exons , Genetic Loci , Humans , Internet , Molecular Models , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , Long Noncoding RNA , Reproducibility of Results , Untranslated Regions
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE), http://encodeproject.org, has completed its fifth year of scientific collaboration to create a comprehensive catalog of functional elements in the human genome, and its third year of investigations in the mouse genome. Since the last report in this journal, the ENCODE human data repertoire has grown by 898 new experiments (totaling 2886), accompanied by a major integrative analysis. In the mouse genome, results from 404 new experiments became available this year, increasing the total collected during the course of the project to 583. The University of California, Santa Cruz, makes these data available on the public Genome Browser http://genome.ucsc.edu for visual browsing and data mining. Downloads of raw and processed data files are both supported. The ENCODE portal provides specialized tools and information about the ENCODE data sets.
Subjects
Genetic Databases , Human Genome , Genomics , Animals , Humans , Internet , Mice , Software
ABSTRACT
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.
Subjects
Genetic Databases , Genomics , Animals , Human Genome , Humans , Internet , Mice , Molecular Sequence Annotation , Software
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets.
Subjects
Nucleic Acid Databases , Human Genome , Genome , Mice/genetics , Animals , Humans , Internet , Molecular Sequence Annotation , Software
ABSTRACT
The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced 'track data hubs', which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.
Subjects
Nucleic Acid Databases , Genome , Animals , Disease/genetics , Human Genome , Genomics , Humans , Internet , Molecular Sequence Annotation , Phenotype
ABSTRACT
The University of California, Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a 'mean+whiskers' windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.
Subjects
Genetic Databases , Genomics , Animals , Disease/genetics , Genes , Human Genome , Hominidae/genetics , Humans , Internet , Molecular Sequence Annotation , Phenotype , RNA Editing , Software
ABSTRACT
Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
Subjects
Molecular Cloning/methods , Computational Biology/methods , Complementary DNA/genetics , Gene Library , Genes/genetics , Mammals/genetics , Animals , DNA/biosynthesis , Humans , Mice , National Institutes of Health (U.S.) , Rats , Reverse Transcriptase Polymerase Chain Reaction , United States
ABSTRACT
The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users' own annotation data. As of September 2009, genomic sequence and a basic set of annotation 'tracks' are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.
Subjects
Computational Biology/methods , Genetic Databases , Nucleic Acid Databases , Genome , Animals , Computational Biology/trends , Genetic Variation , Fungal Genome , Genomics , Humans , Information Storage and Retrieval/methods , Internet , Invertebrates , Molecular Models , Phenotype , Software
ABSTRACT
The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.
Subjects
Nucleic Acid Databases , Human Genome , Genomics , Base Sequence , Humans , Internet , Sequence Alignment , Software , User-Computer Interface
ABSTRACT
RACK1 is one of a group of PKC-interacting proteins collectively called RACKs (Receptors for Activated C-Kinases). Previously, we showed that RACK1 also interacts with the Src tyrosine kinase, and is an inhibitor of Src activity and cell growth. PKC activation induces the intracellular movement and co-localization of RACK1 and Src, and the tyrosine phosphorylation of RACK1. To determine whether RACK1 is a Src substrate, we assessed phosphorylation of RACK1 by various tyrosine kinases in vitro, and by kinase-active and inactive mutants of Src in vivo. We found that RACK1 is a Src substrate. Moreover, Src activity is necessary for both the tyrosine phosphorylation of RACK1 and the binding of RACK1 to Src's SH2 domain that occur following PKC activation. To identify the tyrosine(s) on RACK1 that is phosphorylated by Src, we generated and tested a series of RACK1 mutants. We found that Src phosphorylates RACK1 on Tyr 228 and/or Tyr 246, highly-conserved tyrosines located in the sixth WD repeat that interact with Src's SH2 domain. We think that RACK1 is an important Src substrate that signals downstream of growth factor receptor tyrosine kinases and is involved in the regulation of Src function and cell growth.
Subjects
Cell Surface Receptors/metabolism , src-Family Kinases/metabolism , 3T3 Cells , Amino Acid Sequence , Amino Acid Substitution , Animals , Binding Sites , CHO Cells , Cricetinae , ErbB Receptors/metabolism , Mice , Molecular Sequence Data , Oncogene Proteins v-abl/metabolism , Phosphorylation , Protein Kinase C/metabolism , Receptors for Activated C Kinase , Cell Surface Receptors/genetics , Platelet-Derived Growth Factor Receptors/metabolism , Amino Acid Repetitive Sequences , Amino Acid Sequence Homology , Substrate Specificity , Tyrosine/metabolism , src Homology Domains
ABSTRACT
Survival factors play critical roles in regulating cell growth in normal and cancer cells. We designed a genetic screen to identify survival factors that protect tumor cells from apoptosis. A retroviral expression library of random cDNA fragments was constructed from cancer cells and used to transduce the colon carcinoma cell line HCT116. Recipient cells were functionally selected for induction of caspase 3-mediated apoptosis. Analyses of over 10,000 putative genetic suppression element (GSE) sequences revealed cognate gene candidates that are implicated in apoptosis. We further analysed 26 genes encoding cell surface and secreted proteins that can potentially serve as targets for therapeutic antibodies. Tetracycline-inducible GSEs from several gene candidates induced apoptosis in stable HCT116 cell lines. Similar phenotypes were caused by RNAi derived from the same genes. Our data suggest that the cell surface targets IGF2R, L1CAM and SLC31A1 are required for tumor cell growth in vitro, and that IGF2R is required for xenograft tumor growth in a mouse model.
Subjects
Apoptosis , Colonic Neoplasms/pathology , IGF Type 2 Receptor/physiology , Animals , Caspase 3 , Caspases/physiology , Cell Division , Tumor Cell Line , Cell Survival , Humans , Mice , Neoplasm Transplantation , Small Interfering RNA/pharmacology , IGF Type 2 Receptor/genetics , Genetic Transduction , Heterologous Transplantation
ABSTRACT
BACKGROUND: Since the early stages of tumorigenesis involve adhesion, escape from immune surveillance, vascularization and angiogenesis, we devised a strategy to study the expression profiles of all publicly known and putative secreted and cell surface genes. We designed a custom oligonucleotide microarray containing probes for 3531 secreted and cell surface genes to study 5 diverse human transformed cell lines and their derivative xenograft tumors. The origins of these human cell lines were lung (A549), breast (MDA MB-231), colon (HCT-116), ovarian (SK-OV-3) and prostate (PC3) carcinomas. RESULTS: Three different analyses were performed: (1) a PCA-based linear discriminant analysis identified a 54-gene profile characteristic of all tumors; (2) application of MANOVA (Pcorr < .05) to tumor data revealed a larger set of 149 differentially expressed genes; and (3) after MANOVA was performed on data from individual tumors, a comparison of differential genes amongst all tumor types revealed 12 common differential genes. Seven of the 12 genes were identified by all three analytical methods. These included late angiogenic, morphogenic and extracellular matrix genes such as ANGPTL4, COL1A1, GP2, GPR57, LAMB3, PCDHB9 and PTGER3. The differential expression of ANGPTL4, COL1A1 and other genes was confirmed by quantitative PCR. CONCLUSION: Overall, a comparison of the three analyses revealed an expression pattern indicative of late angiogenic processes. These results show that a xenograft model using multiple cell lines of diverse tissue origin can identify common tumorigenic cell surface or secreted molecules that may be important biomarker and therapeutic discoveries.
Subjects
Tumor Biomarkers/genetics , Cell Membrane/metabolism , Gene Expression Profiling/methods , Neoplastic Gene Expression Regulation , Membrane Proteins/chemistry , Pathologic Neovascularization , Analysis of Variance , Animals , Transformed Cell Line , Tumor Cell Line , Complementary DNA/metabolism , Female , Genetic Markers , Genetic Techniques , Genomics/methods , Humans , Male , Membrane Proteins/genetics , Mice , Inbred BALB C Mice , Multivariate Analysis , Neoplasm Transplantation , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction , Principal Component Analysis , RNA/metabolism , Signal Transduction
ABSTRACT
Ion channels represent an important class of molecules that can be classified into 13 distinct groups. We present a strategy using a "learning set" of well-annotated ion channel sequences to detect homologues in 32 entire genome sequences from Archaea, Bacteria and Eukarya. A total of 299 putative ion channel protein sequences were detected, with significant variations across species. The clustering of these sequences reveals complex relationships between the different ion channel families.
Subjects
Genome , Ion Channels/genetics , Multigene Family/genetics , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Computational Biology , Eukaryotic Cells , Ion Channels/classification , Phylogeny , Potassium Channels/genetics , Tertiary Protein Structure/genetics , Amino Acid Sequence Homology
ABSTRACT
Cancer cells are capable of serum- and anchorage-independent growth, and focus formation on monolayers of normal cells. Previously, we showed that RACK1 inhibits c-Src kinase activity and NIH3T3 cell growth. Here, we show that RACK1 partially inhibits v-Src kinase activity, and the serum- and anchorage-independent growth of v-Src transformed cells, but has no effect on focus formation. RACK1-overexpressing v-Src cells show disassembly of podosomes, which are actin-rich structures that are distinctive to fully transformed cells. Together, our results demonstrate that RACK1 overexpression in v-Src cells partially reverses the transformed phenotype of the cells. Our results identify an endogenous inhibitor of the oncogenic Src tyrosine kinase and of cell transformation.