ABSTRACT
The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: (1) acquisition of REA data via clinical laboratory requisition forms, and (2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites ascertained from annotations in ClinVar. Our requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories. There was also striking disparity across REA populations in the amount of information available about clinically relevant variants in gnomAD. European ancestral populations constituted the majority of observations (55.8%), allele counts (59.7%), and private alleles (56.1%) in gnomAD at 550 loci with "pathogenic" and "likely pathogenic" expert-reviewed variants in ClinVar. Our findings highlight the importance of implementing and supporting programs to increase diversity in genome sequencing and clinical genomics, as well as measuring uncertainty around population-level datasets that are used in variant interpretation. Finally, we suggest the need for a standardized REA data collection framework to be developed through partnerships and collaborations and adopted across clinical genomics.
Subjects
Genetic Variation/genetics, Alleles, Ethnicity, Genetic Testing/methods, Genomics/methods, Humans, Mutation, Prohibitins
ABSTRACT
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
Subjects
Genetic Databases, Genomics, Animals, Cattle, Gene Expression Profiling, Fungal Genome, Human Genome, Microbial Genome, Plant Genome, Viral Genome, Genomics/standards, Humans, Invertebrates/genetics, Mice, Molecular Sequence Annotation, Nematoda/genetics, Phylogeny, Long Noncoding RNA/genetics, Rats, Reference Standards, Protein Sequence Analysis, RNA Sequence Analysis, Vertebrates/genetics
ABSTRACT
The Adhesion family forms a large branch of the pharmacologically important superfamily of G protein-coupled receptors (GPCRs). As Adhesion GPCRs increasingly receive attention from a wide spectrum of biomedical fields, the Adhesion GPCR Consortium, together with the International Union of Basic and Clinical Pharmacology Committee on Receptor Nomenclature and Drug Classification, proposes a unified nomenclature for Adhesion GPCRs. The new names have ADGR as common denominator followed by a letter and a number to denote each subfamily and subtype, respectively. The new names, with old and alternative names within parentheses, are: ADGRA1 (GPR123), ADGRA2 (GPR124), ADGRA3 (GPR125), ADGRB1 (BAI1), ADGRB2 (BAI2), ADGRB3 (BAI3), ADGRC1 (CELSR1), ADGRC2 (CELSR2), ADGRC3 (CELSR3), ADGRD1 (GPR133), ADGRD2 (GPR144), ADGRE1 (EMR1, F4/80), ADGRE2 (EMR2), ADGRE3 (EMR3), ADGRE4 (EMR4), ADGRE5 (CD97), ADGRF1 (GPR110), ADGRF2 (GPR111), ADGRF3 (GPR113), ADGRF4 (GPR115), ADGRF5 (GPR116, Ig-Hepta), ADGRG1 (GPR56), ADGRG2 (GPR64, HE6), ADGRG3 (GPR97), ADGRG4 (GPR112), ADGRG5 (GPR114), ADGRG6 (GPR126), ADGRG7 (GPR128), ADGRL1 (latrophilin-1, CIRL-1, CL1), ADGRL2 (latrophilin-2, CIRL-2, CL2), ADGRL3 (latrophilin-3, CIRL-3, CL3), ADGRL4 (ELTD1, ETL), and ADGRV1 (VLGR1, GPR98). This review covers all major biologic aspects of Adhesion GPCRs, including evolutionary origins, interaction partners, signaling, expression, physiologic functions, and therapeutic potential.
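The naming scheme described above (the prefix ADGR, then a subfamily letter, then a subtype number) lends itself to a simple lookup table. The following is a minimal Python sketch, not part of the published nomenclature: the alias table is copied from the list in the abstract, while the names `ADGR_ALIASES`, `parse_adgr`, and `to_new_symbol` are hypothetical helpers introduced here for illustration only.

```python
import re

# New symbol -> old or alternative names, copied from the list in the abstract.
ADGR_ALIASES = {
    "ADGRA1": ["GPR123"], "ADGRA2": ["GPR124"], "ADGRA3": ["GPR125"],
    "ADGRB1": ["BAI1"], "ADGRB2": ["BAI2"], "ADGRB3": ["BAI3"],
    "ADGRC1": ["CELSR1"], "ADGRC2": ["CELSR2"], "ADGRC3": ["CELSR3"],
    "ADGRD1": ["GPR133"], "ADGRD2": ["GPR144"],
    "ADGRE1": ["EMR1", "F4/80"], "ADGRE2": ["EMR2"], "ADGRE3": ["EMR3"],
    "ADGRE4": ["EMR4"], "ADGRE5": ["CD97"],
    "ADGRF1": ["GPR110"], "ADGRF2": ["GPR111"], "ADGRF3": ["GPR113"],
    "ADGRF4": ["GPR115"], "ADGRF5": ["GPR116", "Ig-Hepta"],
    "ADGRG1": ["GPR56"], "ADGRG2": ["GPR64", "HE6"], "ADGRG3": ["GPR97"],
    "ADGRG4": ["GPR112"], "ADGRG5": ["GPR114"], "ADGRG6": ["GPR126"],
    "ADGRG7": ["GPR128"],
    "ADGRL1": ["latrophilin-1", "CIRL-1", "CL1"],
    "ADGRL2": ["latrophilin-2", "CIRL-2", "CL2"],
    "ADGRL3": ["latrophilin-3", "CIRL-3", "CL3"],
    "ADGRL4": ["ELTD1", "ETL"],
    "ADGRV1": ["VLGR1", "GPR98"],
}

# Pattern for the scheme: "ADGR" + one subfamily letter + a subtype number.
ADGR_PATTERN = re.compile(r"^ADGR([A-Z])(\d+)$")

# Reverse index (hypothetical helper): upper-cased old name -> new symbol.
OLD_TO_NEW = {
    alias.upper(): new
    for new, aliases in ADGR_ALIASES.items()
    for alias in aliases
}


def parse_adgr(symbol):
    """Return (subfamily letter, subtype number) for a new-style symbol, else None."""
    match = ADGR_PATTERN.match(symbol.strip().upper())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def to_new_symbol(name):
    """Map an old, alternative, or new name to the new ADGR symbol, else None."""
    key = name.strip().upper()
    if key in ADGR_ALIASES:
        return key
    return OLD_TO_NEW.get(key)
```

For instance, `to_new_symbol("CD97")` resolves to `ADGRE5`, which `parse_adgr` then splits into subfamily `E` and subtype `5`. Matching is case-insensitive here purely as a convenience; that choice is an assumption of this sketch, not something the abstract specifies.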
Subjects
Cell Adhesion Molecules/metabolism, Cyclic AMP/physiology, Molecular Models, G-Protein-Coupled Receptors/metabolism, Second Messenger Systems, Animals, Cell Adhesion, Cell Adhesion Molecules/chemistry, Cell Membrane/enzymology, Cell Membrane/metabolism, Cell Movement, Humans, International Agencies, Ligands, Pharmacology/trends, Clinical Pharmacology/trends, Protein Isoforms/agonists, Protein Isoforms/chemistry, Protein Isoforms/classification, Protein Isoforms/metabolism, G-Protein-Coupled Receptors/agonists, G-Protein-Coupled Receptors/chemistry, G-Protein-Coupled Receptors/classification, Signal Transduction, Scientific Societies, Terminology as Topic
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) approves unique gene symbols and names for human loci. As well as naming genomic loci, we manually curate genes into family sets based on shared characteristics such as function, homology or phenotype. Each HGNC gene family has its own dedicated gene family report on our website, www.genenames.org. We have recently redesigned these reports to support the visualisation and browsing of complex relationships between families and to provide extra curated information such as family descriptions, protein domain graphics and gene family aliases. Here, we review how our gene families are curated and explain how to view, search and download the gene family data.
Subjects
Genetic Databases, Genomics, Neoplasm Proteins/genetics, Humans, Internet, Neoplasm Proteins/classification
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. To date the HGNC have assigned over 39,000 gene names, representing an increase of over 5000 entries in the past two years. As well as increasing the size of our database, we have continued redesigning our website http://www.genenames.org and have modified, updated and improved many aspects of the site, including a faster and more powerful search, a vastly improved HCOP tool and a REST service to increase the number of ways users can retrieve our data. This article provides an overview of our current online data and resources, and highlights the changes we have made in recent years.
Subjects
Genetic Databases, Genes, Terminology as Topic, Human Genome, Humans, Internet
ABSTRACT
The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.
Subjects
Nucleic Acid Databases, Untranslated RNA/chemistry, Chromosome Mapping, Humans, Internet, Untranslated RNA/genetics, RNA Sequence Analysis
ABSTRACT
Complete and accurate annotation of the mouse genome is critical to the advancement of research conducted on this important model organism. The National Center for Biotechnology Information (NCBI) develops and maintains many useful resources to assist the mouse research community. In particular, the reference sequence (RefSeq) database provides high-quality annotation of multiple mouse genome assemblies using a combinatorial approach that leverages computation, manual curation, and collaboration. Implementation of this conservative and rigorous approach, which focuses on representation of only full-length and non-redundant data, produces high-quality annotation products. RefSeq records explicitly link sequences to current knowledge in a timely manner, updating public records regularly and rapidly in response to nomenclature updates, addition of new relevant publications, collaborator discussion, and user feedback. Whole genome re-annotation is also conducted at least every 12-18 months, and often more frequently in response to assembly updates or availability of informative data. This article highlights key features and advantages of RefSeq genome annotation products and presents an overview of NCBI processes to generate these data. Further discussion of NCBI's resources highlights useful features and the best methods for accessing our data.
Subjects
Amino Acid Sequence/genetics, Genetic Databases, Nucleic Acid Databases, Genome, Animals, Internet, Mice
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 38,000 approved gene symbols in our database (http://www.genenames.org), the majority represent protein-coding (pc) genes; however, we also name pseudogenes, phenotypic loci, some genomic features, and to date have named more than 8,500 human non-protein coding RNA (ncRNA) genes and ncRNA pseudogenes. We have already established unique names for most of the small ncRNA genes by working with experts for each class. Small ncRNAs can be grouped into their respective classes by their shared homology and common function. In contrast, long non-coding RNA (lncRNA) genes represent a disparate set of loci related only by their size (more than 200 bases in length); they share no conserved sequence homology and have variable functions. As with pc genes, wherever possible, lncRNAs are named based on the known function of their product; a short guide is presented herein to help authors when developing novel gene symbols for lncRNAs with characterised function. Researchers must contact the HGNC with their suggestions prior to publication, to check whether the proposed gene symbol can be approved. Although thousands of lncRNAs have been predicted in the human genome, for the vast majority their function remains unresolved. lncRNA genes with no known function are named based on their genomic context. Working with lncRNA researchers, the HGNC aims to provide unique and, wherever possible, meaningful gene symbols to all lncRNA genes.
Subjects
Genetic Databases, Proteins/genetics, Long Noncoding RNA, Human Genome, Genomics, Humans, Internet, Long Noncoding RNA/classification, Long Noncoding RNA/genetics
ABSTRACT
The HUGO Gene Nomenclature Committee situated at the European Bioinformatics Institute assigns unique symbols and names to human genes. Since 2011, the data within our database has expanded largely owing to an increase in naming pseudogenes and non-coding RNA genes, and we now have >33,500 approved symbols. Our gene families and groups have also increased to nearly 500, with ~45% of our gene entries associated to at least one family or group. We have also redesigned the HUGO Gene Nomenclature Committee website http://www.genenames.org creating a constant look and feel across the site and improving usability and readability for our users. The site provides a public access portal to our database with no restrictions imposed on access or the use of the data. Within this article, we review our online resources and data with particular emphasis on the updates to our website.
Subjects
Genetic Databases, Genes, Terminology as Topic, Humans, Internet, Proteins/genetics
ABSTRACT
The HUGO Gene Nomenclature Committee has approved gene symbols for the majority of protein-coding genes on the human reference genome. To adequately represent regions of complex structural variation, the Genome Reference Consortium now includes alternative representations of some of these regions as part of the reference genome. Here, we describe examples of how we name novel genes in these regions and how this nomenclature is displayed on our website, http://genenames.org.
Subjects
Human Genome/genetics, Mutation/genetics, Terminology as Topic, Haplotypes, Humans, Reference Standards
ABSTRACT
The serpin family comprises a structurally similar, yet functionally diverse, set of proteins. Named originally for their function as serine proteinase inhibitors, many of its members are not inhibitors but rather chaperones, involved in storage, transport, and other roles. Serpins are found in genomes of all kingdoms, with 36 human protein-coding genes and five pseudogenes. The mouse has 60 functional Serpin genes, many of which are orthologous to human SERPIN genes and some of which have expanded into multiple paralogous genes. Serpins are found in tissues throughout the body; whereas most are extracellular, there is a class of intracellular serpins. Serpins appear to have roles in inflammation, immune function, tumorigenesis, blood clotting, dementia, and cancer metastasis. Further characterization of these proteins will likely reveal potential biomarkers and therapeutic targets for disease.
Subjects
Serpins/genetics, Serpins/physiology, Animals, Animal Disease Models, Molecular Evolution, Genetic Variation, Humans, Mice, Multigene Family, Phylogeny, Protein Conformation
ABSTRACT
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.
Subjects
Nucleic Acid Databases, RNA/chemistry, Animals, Base Sequence, Humans
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) assigns approved gene symbols to human loci. There are currently over 33,000 approved gene symbols, the majority of which represent protein-coding genes, but we also name other locus types such as non-coding RNAs, pseudogenes and phenotypic loci. Where relevant, the HGNC organise these genes into gene families and groups. The HGNC website http://www.genenames.org/ is an online repository of HGNC-approved gene nomenclature and associated resources for human genes, and includes links to genomic, proteomic and phenotypic information. In addition to this, we also have dedicated gene family web pages and are currently expanding and generating more of these pages using data curated by the HGNC and from information derived from external resources that focus on particular gene families. Here, we review our current online resources with a particular focus on our gene family data, using it to highlight our new Gene Symbol Report and gene family data downloads.
Subjects
Genetic Databases, Genetic Loci/genetics, Multigene Family/genetics, Proteins/genetics, Terminology as Topic, Genetic Variation, Genomics/methods, Humans, Proteins/classification, Proteins/metabolism, Proteomics/methods, Web Browser
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique gene symbol and name to every human gene. The HGNC database currently contains almost 30,000 approved gene symbols, over 19,000 of which represent protein-coding genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC editors and links to related genomic, phenotypic and proteomic information. Here we describe improvements to our resources, including a new Quick Gene Search, a new List Search, an integrated HGNC BioMart and a new Statistics and Downloads facility.
Subjects
Genetic Databases, Genes, Terminology as Topic, Animals, Genomics, Humans, Phenotype, Proteomics, Software
ABSTRACT
Previously, the majority of the human genome was thought to be 'junk' DNA with no functional purpose. Over the past decade, the field of RNA research has rapidly expanded, with a concomitant increase in the number of non-protein coding RNA (ncRNA) genes identified in this 'junk'. Many of the encoded ncRNAs have already been shown to be essential for a variety of vital functions, and this wealth of annotated human ncRNAs requires standardised naming in order to aid effective communication. The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 30,000 approved gene symbols currently listed in the HGNC database (http://www.genenames.org/search), the majority represent protein-coding genes; however, they also include pseudogenes, phenotypic loci and some genomic features. In recent years the list has also increased to include almost 3,000 named human ncRNA genes. HGNC is actively engaging with the RNA research community in order to provide unique symbols and names for each sequence that encodes an ncRNA. Most of the classical small ncRNA genes have now been provided with a unique nomenclature, and work on naming the long (>200 nucleotides) non-coding RNAs (lncRNAs) is ongoing.
Subjects
Untranslated RNA/classification, Terminology as Topic, Endoribonucleases/classification, Humans, Messenger RNA/classification, Ribosomal RNA/classification, Transfer RNA/classification, Ribonuclease P/classification
ABSTRACT
The secretoglobins (SCGBs) comprise a family of small, secreted proteins found in animals exclusively of mammalian lineage. There are 11 human SCGB genes and five pseudogenes. Interestingly, mice have 68 Scgb genes, four of which are highly orthologous to human SCGB genes; the remainder represent an 'evolutionary bloom' and make up a large gene family represented by only six counterparts in humans. SCGBs are found in high concentrations in many mammalian secretions, including fluids of the lung, lacrimal gland, salivary gland, prostate and uterus. Whereas the biological activities of most individual SCGBs have not been fully characterised, what already has been discovered suggests that this family has an important role in the modulation of inflammation, tissue repair and tumorigenesis. In mice, the large Scgb1b and Scgb2b gene families encode the androgen-binding proteins, which have been shown to play a role in mate selection. Although much has been learned about SCGBs in recent years, clearly more research remains to be done to allow a better understanding of the roles of these proteins in human health and disease. Such information is predicted to reveal valuable novel drug targets for the treatment of inflammation, as well as designing biomarkers that might identify tissue damage or cancer.
Subjects
Androgen-Binding Protein/genetics, Molecular Evolution, Secretoglobins/genetics, Animals, Humans, Mice
ABSTRACT
After the completion of a draft human genome sequence, the International Human Genome Sequencing Consortium has proceeded to finish and annotate each of the 24 chromosomes comprising the human genome. Here we describe the sequencing and analysis of human chromosome 3, one of the largest human chromosomes. Chromosome 3 comprises just four contigs, one of which currently represents the longest unbroken stretch of finished DNA sequence known so far. The chromosome is remarkable in having the lowest rate of segmental duplication in the genome. It also includes a chemokine receptor gene cluster as well as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion that occurred some time after the split of Homininae from Ponginae, and propose an evolutionary history of the inversion.
Subjects
Human Chromosome Pair 3/genetics, Animals, Base Sequence, Chromosome Breakage/genetics, Chromosome Inversion/genetics, Contig Mapping, CpG Islands/genetics, Complementary DNA/genetics, Molecular Evolution, Expressed Sequence Tags, Human Genome Project, Humans, Macaca mulatta/genetics, Molecular Sequence Data, Pan troglodytes/genetics, DNA Sequence Analysis, Synteny/genetics
ABSTRACT
Keratins are intermediate filament-forming proteins that provide mechanical support and fulfill a variety of additional functions in epithelial cells. In 1982, a nomenclature was devised to name the keratin proteins that were known at that point. The systematic sequencing of the human genome in recent years uncovered the existence of several novel keratin genes and their encoded proteins. Their naming could not be adequately handled in the context of the original system. We propose a new consensus nomenclature for keratin genes and proteins that relies upon and extends the 1982 system and adheres to the guidelines issued by the Human and Mouse Genome Nomenclature Committees. This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratins from other mammalian species.
Subjects
Keratins/classification, Terminology as Topic, Animals, Humans, Keratins/chemistry, Keratins/genetics, Mammals, Pseudogenes/genetics
ABSTRACT
The IUPHAR database (IUPHAR-DB) integrates peer-reviewed pharmacological, chemical, genetic, functional and anatomical information on the 354 nonsensory G protein-coupled receptors (GPCRs), 71 ligand-gated ion channel subunits and 141 voltage-gated-like ion channel subunits encoded by the human, rat and mouse genomes. These genes represent the targets of approximately one-third of currently approved drugs and are a major focus of drug discovery and development programs in the pharmaceutical industry. IUPHAR-DB provides a comprehensive description of the genes and their functions, with information on protein structure and interactions, ligands, expression patterns, signaling mechanisms, functional assays and biologically important receptor variants (e.g. single nucleotide polymorphisms and splice variants). In addition, the phenotypes resulting from altered gene expression (e.g. in genetically altered animals or in human genetic disorders) are described. The content of the database is peer reviewed by members of the International Union of Basic and Clinical Pharmacology Committee on Receptor Nomenclature and Drug Classification (NC-IUPHAR); the data are provided through manual curation of the primary literature by a network of over 60 subcommittees of NC-IUPHAR. Links to other bioinformatics resources, such as NCBI, Uniprot, HGNC and the rat and mouse genome databases are provided. IUPHAR-DB is freely available at http://www.iuphar-db.org.
Subjects
Protein Databases, Ion Channels/genetics, Ion Channels/physiology, G-Protein-Coupled Receptors/genetics, G-Protein-Coupled Receptors/physiology, Animals, Drug Discovery, Humans, Ion Channels/chemistry, Ligands, Mice, Protein Subunits/chemistry, Protein Subunits/genetics, Protein Subunits/physiology, Rats, G-Protein-Coupled Receptors/chemistry
ABSTRACT
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally meaningful name and symbol to every human gene. The HGNC database currently comprises over 24 000 public records containing approved human gene nomenclature and associated gene information. Following our recent relocation to the European Bioinformatics Institute our homepage can now be found at http://www.genenames.org, with direct links to the searchable HGNC database and other related database resources, such as the HCOP orthology search tool and manually curated gene family webpages.