ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) assigns unique symbols and names to human genes. The HGNC database (www.genenames.org) currently contains over 43 000 approved gene symbols, over 19 200 of which are assigned to protein-coding genes, 14 000 to pseudogenes and nearly 9000 to non-coding RNA genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC nomenclature advisors and links to related genomic, clinical, and proteomic information. Here, we describe updates to our resource, including improvements to our search facility and new download features.
Subject(s)
Databases, Genetic; Humans; Genome; Genomics; Proteomics; Pseudogenes; Terminology as Topic
ABSTRACT
Multiple resources currently exist that predict orthologous relationships between genes. These resources differ both in the methodologies used and in the species they make predictions for. The HGNC Comparison of Orthology Predictions (HCOP) search tool integrates and displays data from multiple ortholog prediction resources for a specified human gene or set of genes. An indication of the reliability of a prediction is provided by the number of resources that support it. HCOP was originally designed to show orthology predictions between human and mouse but has since been expanded and now includes data from 20 selected vertebrate and model organism species. The HCOP pipeline used to fetch and integrate the information from the disparate ortholog and nomenclature data resources has recently been rewritten, both to enable the inclusion of new data and to take advantage of modern web technologies. Data from HCOP are used extensively in our work naming genes as the Vertebrate Gene Nomenclature Committee (https://vertebrate.genenames.org).
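The abstract states that the number of resources supporting a prediction indicates its reliability. A minimal Python sketch of that counting idea follows, assuming each resource's output has already been reduced to (resource, human gene, ortholog) triples. The resource and gene names are hypothetical placeholders, and this is an illustration of the principle, not HCOP's actual pipeline code.

```python
from collections import defaultdict

# Each prediction is (resource, human_gene, ortholog_gene).
# All names used here are hypothetical placeholders, not real HCOP data.
Prediction = tuple[str, str, str]


def support_counts(predictions: list[Prediction]) -> dict[tuple[str, str], int]:
    """Count how many distinct resources predict each (human, ortholog) pair."""
    supporters: dict[tuple[str, str], set[str]] = defaultdict(set)
    for resource, human_gene, ortholog_gene in predictions:
        # A set ensures a resource reporting the same pair twice counts once.
        supporters[(human_gene, ortholog_gene)].add(resource)
    return {pair: len(resources) for pair, resources in supporters.items()}


def ranked(predictions: list[Prediction]) -> list[tuple[tuple[str, str], int]]:
    """Return pairs ordered from most to least supported (ties broken by name)."""
    counts = support_counts(predictions)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    example = [
        ("resource_A", "GENEX", "GeneX"),
        ("resource_B", "GENEX", "GeneX"),
        ("resource_C", "GENEX", "GeneX"),
        ("resource_A", "GENEX", "GeneX-like"),
        ("resource_B", "GENEX", "GeneX"),  # duplicate report, counted once
    ]
    for (human, ortholog), n in ranked(example):
        print(f"{human} -> {ortholog}: supported by {n} resource(s)")
```

Counting distinct resources rather than raw rows keeps a single resource from inflating its own support by reporting a pair more than once.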
Subject(s)
Computational Biology/methods; Genomics/methods; Sequence Homology; Software; Animals; Databases, Genetic; Humans; Vertebrates; Web Browser; Workflow
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) based at EMBL's European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. There are over 42 000 approved gene symbols in our current database of which over 19 000 are for protein-coding genes. While we still update placeholder and problematic symbols, we are working towards stabilizing symbols where possible; over 2000 symbols for disease-associated genes are now marked as stable in our symbol reports. All of our data is available at the HGNC website https://www.genenames.org. The Vertebrate Gene Nomenclature Committee (VGNC) was established to assign standardized nomenclature in line with human for vertebrate species lacking their own nomenclature committee. In addition to the previous VGNC core species of chimpanzee, cow, horse and dog, we now name genes in cat, macaque and pig. Gene groups have been added to VGNC and currently include two complex families: olfactory receptors (ORs) and cytochrome P450s (CYPs). In collaboration with specialists we have also named CYPs in species beyond our core set. All VGNC data is available at https://vertebrate.genenames.org/. This article provides an overview of our online data and resources, focusing on updates over the last two years.
Subject(s)
Computational Biology/methods; Databases, Genetic; Genes/genetics; Genomics/methods; Terminology as Topic; Vertebrates/genetics; Animals; Humans; Internet; Proteins/genetics; Species Specificity; User-Computer Interface; Vertebrates/classification
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) based at EMBL's European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. There are over 40 000 approved gene symbols in our current database of which over 19 000 are for protein-coding genes. The Vertebrate Gene Nomenclature Committee (VGNC) was established in 2016 to assign standardized nomenclature in line with human for vertebrate species that lack their own nomenclature committees. The VGNC initially assigned nomenclature for over 15 000 protein-coding genes in chimpanzee. We have extended this process to other vertebrate species, naming over 14 000 protein-coding genes in cow and dog and over 13 000 in horse to date. Our HGNC website https://www.genenames.org has undergone a major design update, simplifying the homepage to provide easy access to our search tools and making the site more mobile friendly. Our gene families pages are now known as 'gene groups' and have increased in number to over 1200, with nearly half of all named genes currently assigned to at least one gene group. This article provides an overview of our online data and resources, focusing on our work over the last two years.
Subject(s)
Computational Biology/standards; Databases, Genetic/standards; Genomics/standards; Terminology as Topic; Animals; Cattle; Dogs; Horses/genetics; Humans; Pan troglodytes/genetics; Search Engine
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as homology, function or phenotype. We have recently updated our gene family resources and introduced new improved visualizations which can be seen alongside our gene symbol reports on our primary website http://www.genenames.org. In 2016 we expanded our remit and formed the Vertebrate Gene Nomenclature Committee (VGNC), which is responsible for assigning gene names in vertebrate species lacking a dedicated nomenclature group. Using the chimpanzee genome as a pilot project we have approved symbols and names for over 14 500 protein-coding genes in chimpanzee, and have developed a new website http://vertebrate.genenames.org to distribute these data. Here, we review our online data and resources, focusing particularly on the improvements and new developments made during the last two years.
Subject(s)
Databases, Genetic; Genes; Genome; Genomics/methods; Terminology as Topic; Vertebrates; Web Browser; Animals; Humans; Multigene Family; Search Engine
ABSTRACT
RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and has begun importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/.
Subject(s)
Databases, Nucleic Acid; RNA, Untranslated/chemistry; Animals; Genomics; Humans; Nucleotides/chemistry; Sequence Analysis, RNA; Species Specificity
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) approves unique gene symbols and names for human loci. As well as naming genomic loci, we manually curate genes into family sets based on shared characteristics such as function, homology or phenotype. Each HGNC gene family has its own dedicated gene family report on our website, www.genenames.org. We have recently redesigned these reports to support the visualisation and browsing of complex relationships between families and to provide extra curated information such as family descriptions, protein domain graphics and gene family aliases. Here, we review how our gene families are curated and explain how to view, search and download the gene family data.
Subject(s)
Databases, Genetic; Genomics; Neoplasm Proteins/genetics; Humans; Internet; Neoplasm Proteins/classification
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. To date, the HGNC has assigned over 39,000 gene names, representing an increase of over 5000 entries in the past two years. As well as increasing the size of our database, we have continued redesigning our website http://www.genenames.org and have modified, updated and improved many aspects of the site, including a faster and more powerful search, a vastly improved HCOP tool and a REST service to increase the number of ways users can retrieve our data. This article provides an overview of our current online data and resources, and highlights the changes we have made in recent years.
Subject(s)
Databases, Genetic; Genes; Terminology as Topic; Genome, Human; Humans; Internet
ABSTRACT
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.
Subject(s)
Database Management Systems; Genomics; Humans; Internet; Neoplasms/genetics; Proteomics
ABSTRACT
The HUGO Gene Nomenclature Committee situated at the European Bioinformatics Institute assigns unique symbols and names to human genes. Since 2011, the data within our database has expanded largely owing to an increase in naming pseudogenes and non-coding RNA genes, and we now have >33,500 approved symbols. Our gene families and groups have also increased to nearly 500, with ∼45% of our gene entries associated to at least one family or group. We have also redesigned the HUGO Gene Nomenclature Committee website http://www.genenames.org, creating a consistent look and feel across the site and improving usability and readability for our users. The site provides a public access portal to our database with no restrictions imposed on access or the use of the data. Within this article, we review our online resources and data with particular emphasis on the updates to our website.
Subject(s)
Databases, Genetic; Genes; Terminology as Topic; Humans; Internet; Proteins/genetics
ABSTRACT
The HUGO Gene Nomenclature Committee has approved gene symbols for the majority of protein-coding genes on the human reference genome. To adequately represent regions of complex structural variation, the Genome Reference Consortium now includes alternative representations of some of these regions as part of the reference genome. Here, we describe examples of how we name novel genes in these regions and how this nomenclature is displayed on our website, http://genenames.org.
Subject(s)
Genome, Human/genetics; Mutation/genetics; Terminology as Topic; Haplotypes; Humans; Reference Standards
ABSTRACT
We examined the coding sequence of 518 protein kinases, approximately 1.3 Mb of DNA per sample, in 25 breast cancers. In many tumors, we detected no somatic mutations. But a few had numerous somatic mutations with distinctive patterns indicative of either a mutator phenotype or a past exposure.
Subject(s)
Breast Neoplasms/genetics; Carcinoma, Ductal, Breast/genetics; Mutation; Protein Kinases/genetics; Aged; DNA Mutational Analysis; Female; Humans; Multigene Family
ABSTRACT
The Vertebrate Gene Nomenclature Committee (VGNC) was established in 2016 as a sister project to the HUGO Gene Nomenclature Committee, to approve gene nomenclature in vertebrate species without an existing dedicated nomenclature committee. The VGNC aims to harmonize gene nomenclature across selected vertebrate species in line with human gene nomenclature, with orthologs assigned the same nomenclature where possible. This article presents an overview of the VGNC project and discussion of key findings resulting from this work to date. VGNC-approved nomenclature is accessible at https://vertebrate.genenames.org and is additionally displayed by the NCBI, Ensembl, and UniProt databases.
Subject(s)
Databases, Genetic; Vertebrates; Animals; Humans; Vertebrates/genetics
ABSTRACT
The protein-kinase family is the most frequently mutated gene family found in human cancer, and faulty kinase enzymes are being investigated as promising targets for the design of antitumour therapies. We have sequenced the gene encoding the transmembrane protein tyrosine kinase ERBB2 (also known as HER2 or Neu) from 120 primary lung tumours and identified 4% that have mutations within the kinase domain; in the adenocarcinoma subtype of lung cancer, 10% of cases had mutations. ERBB2 inhibitors, which have so far proved to be ineffective in treating lung cancer, should now be clinically re-evaluated in the specific subset of patients with lung cancer whose tumours carry ERBB2 mutations.
Subject(s)
Lung Neoplasms/genetics; Mutation/genetics; Receptor, ErbB-2/genetics; Carcinoma, Non-Small-Cell Lung/drug therapy; Carcinoma, Non-Small-Cell Lung/genetics; DNA Mutational Analysis; Enzyme Activation; ErbB Receptors/chemistry; ErbB Receptors/genetics; Gefitinib; Humans; Lung Neoplasms/drug therapy; Models, Molecular; Neoplasms/drug therapy; Neoplasms/genetics; Protein Structure, Tertiary; Quinazolines/therapeutic use; Receptor, ErbB-2/chemistry; Receptor, ErbB-2/metabolism
ABSTRACT
Malignant gliomas have a very poor prognosis. The current standard of care for these cancers consists of extended adjuvant treatment with the alkylating agent temozolomide after surgical resection and radiotherapy. Although a statistically significant increase in survival has been reported with this regimen, nearly all gliomas recur and become insensitive to further treatment with this class of agents. We sequenced 500 kb of genomic DNA corresponding to the kinase domains of 518 protein kinases in each of nine gliomas. Large numbers of somatic mutations were observed in two gliomas recurrent after alkylating agent treatment. The pattern of mutations in these cases showed strong similarity to that induced by alkylating agents in experimental systems. Further investigation revealed inactivating somatic mutations of the mismatch repair gene MSH6 in each case. We propose that inactivating somatic mutations of MSH6 confer resistance to alkylating agents in gliomas in vivo and concurrently unleash accelerated mutagenesis in resistant clones as a consequence of continued exposure to alkylating agents in the presence of defective mismatch repair. The evidence therefore suggests that when MSH6 is inactivated in gliomas, alkylating agents convert from induction of tumor cell death to promotion of neoplastic progression. These observations highlight the potential of large scale sequencing for revealing and elucidating mutagenic processes operative in individual human cancers.
Subject(s)
Antineoplastic Agents, Alkylating/therapeutic use; Brain Neoplasms/genetics; DNA-Binding Proteins/genetics; Dacarbazine/analogs & derivatives; Glioma/genetics; Mutation; Neoplasm Recurrence, Local/genetics; Aged; Brain Neoplasms/drug therapy; Brain Neoplasms/enzymology; Dacarbazine/therapeutic use; Female; Glioma/drug therapy; Glioma/enzymology; Humans; Male; Middle Aged; Neoplasm Recurrence, Local/enzymology; Protein Kinases/genetics; Temozolomide
ABSTRACT
Protein kinases are frequently mutated in human cancer and inhibitors of mutant protein kinases have proven to be effective anticancer drugs. We screened the coding sequences of 518 protein kinases (approximately 1.3 Mb of DNA per sample) for somatic mutations in 26 primary lung neoplasms and seven lung cancer cell lines. One hundred eighty-eight somatic mutations were detected in 141 genes. Of these, 35 were synonymous (silent) changes. This result indicates that most of the 188 mutations were "passenger" mutations that are not causally implicated in oncogenesis. However, an excess of approximately 40 nonsynonymous substitutions compared with that expected by chance (P = 0.07) suggests that some nonsynonymous mutations have been selected and are contributing to oncogenesis. There was considerable variation between individual lung cancers in the number of mutations observed and no mutations were found in lung carcinoids. The mutational spectra of most lung cancers were characterized by a high proportion of C:G > A:T transversions, compatible with the mutagenic effects of tobacco carcinogens. However, one neuroendocrine cancer cell line had a distinctive mutational spectrum reminiscent of UV-induced DNA damage. The results suggest that several mutated protein kinases may be contributing to lung cancer development, but that mutations in each one are infrequent.
Subject(s)
Lung Neoplasms/enzymology; Lung Neoplasms/genetics; Mutation; Protein Kinases/genetics; Adenocarcinoma/enzymology; Adenocarcinoma/genetics; Carcinoid Tumor/enzymology; Carcinoid Tumor/genetics; Carcinoma, Large Cell/enzymology; Carcinoma, Large Cell/genetics; Carcinoma, Squamous Cell/enzymology; Carcinoma, Squamous Cell/genetics; Cell Line, Tumor; DNA Mutational Analysis; Humans
ABSTRACT
The panel of 60 human cancer cell lines (the NCI-60) assembled by the National Cancer Institute for anticancer drug discovery is a widely used resource. The NCI-60 has been characterized pharmacologically and at the molecular level more extensively than any other set of cell lines. However, no systematic mutation analysis of genes causally implicated in oncogenesis has been reported. This study reports the sequence analysis of 24 known cancer genes in the NCI-60 and an assessment of 4 of the 24 genes for homozygous deletions. One hundred thirty-seven oncogenic mutations were identified in 14 (APC, BRAF, CDKN2, CTNNB1, HRAS, KRAS, NRAS, SMAD4, PIK3CA, PTEN, RB1, STK11, TP53, and VHL) of the 24 genes. All lines have at least one mutation among the cancer genes examined, with most lines (73%) having more than one. Identification of those cancer genes mutated in the NCI-60, in combination with pharmacologic and molecular profiles of the cells, will allow for more informed interpretation of anticancer agent screening and will enhance the use of the NCI-60 cell lines for molecularly targeted screens.
Subject(s)
Cell Line, Tumor; Genes, Neoplasm; Mutation; DNA Mutational Analysis; Exons; Gene Deletion; Gene Expression Profiling; Homozygote; Humans; RNA Splice Sites
ABSTRACT
Large-scale systematic resequencing has been proposed as the key future strategy for the discovery of rare, disease-causing sequence variants across the spectrum of human complex disease. We have sequenced the coding exons of the X chromosome in 208 families with X-linked mental retardation (XLMR), the largest direct screen for constitutional disease-causing mutations thus far reported. The screen has discovered nine genes implicated in XLMR, including SYP, ZNF711 and CASK reported here, confirming the power of this strategy. The study has, however, also highlighted issues confronting whole-genome sequencing screens, including the observation that loss of function of 1% or more of X-chromosome genes is compatible with apparently normal existence.
Subject(s)
Chromosomes, Human, X/genetics; Exons/genetics; Mental Retardation, X-Linked/genetics; Sequence Analysis, DNA/methods; Chromosome Mapping; Female; Genetic Variation; Humans; Male; Pedigree
ABSTRACT
We have identified one frameshift mutation, one splice-site mutation, and two missense mutations in highly conserved residues in ZDHHC9 at Xq26.1 in 4 of 250 families with X-linked mental retardation (XLMR). In three of the families, the mental retardation phenotype is associated with a Marfanoid habitus, although none of the affected individuals meets the Ghent criteria for Marfan syndrome. ZDHHC9 is a palmitoyltransferase that catalyzes the posttranslational modification of NRAS and HRAS. The degree of palmitoylation determines the temporal and spatial location of these proteins in the plasma membrane and Golgi complex. The finding of mutations in ZDHHC9 suggests that alterations in the concentrations and cellular distribution of target proteins are sufficient to cause disease. This is the first XLMR gene to be reported that encodes a posttranslational modification enzyme, palmitoyltransferase. Furthermore, now that the first palmitoyltransferase that causes mental retardation has been identified, defects in other palmitoyltransferases become good candidates for causing other mental retardation syndromes.
Subject(s)
Acyltransferases/genetics; Marfan Syndrome/complications; Marfan Syndrome/genetics; Mental Retardation, X-Linked/complications; Mental Retardation, X-Linked/genetics; Mutation; Acyltransferases/metabolism; Amino Acid Sequence; Base Sequence; DNA/genetics; Female; Humans; Male; Marfan Syndrome/enzymology; Mental Retardation, X-Linked/enzymology; Molecular Sequence Data; Pedigree; Phenotype; Sequence Homology, Amino Acid; ras Proteins/metabolism
ABSTRACT
We have identified three truncating, two splice-site, and three missense variants at conserved amino acids in the CUL4B gene on Xq24 in 8 of 250 families with X-linked mental retardation (XLMR). During affected subjects' adolescence, a syndrome emerged with delayed puberty, hypogonadism, relative macrocephaly, moderate short stature, central obesity, unprovoked aggressive outbursts, fine intention tremor, pes cavus, and abnormalities of the toes. This syndrome was first described by Cabezas et al., in a family that was included in our study and that carried a CUL4B missense variant. CUL4B is a ubiquitin E3 ligase subunit implicated in the regulation of several biological processes, and CUL4B is the first XLMR gene that encodes an E3 ubiquitin ligase. The relatively high frequency of CUL4B mutations in this series indicates that it is one of the most commonly mutated genes underlying XLMR and suggests that its introduction into clinical diagnostics should be a high priority.