Results 1 - 20 of 74
1.
Blood ; 142(24): 2055-2068, 2023 12 14.
Article in English | MEDLINE | ID: mdl-37647632

ABSTRACT

Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.


Subject(s)
Genome-Wide Association Study , Thrombosis , Humans , Biological Specimen Banks , Hemostasis , Hemorrhage/genetics , Rare Diseases
2.
Nucleic Acids Res ; 51(D1): D1003-D1009, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36243972

ABSTRACT

The HUGO Gene Nomenclature Committee (HGNC) assigns unique symbols and names to human genes. The HGNC database (www.genenames.org) currently contains over 43 000 approved gene symbols, over 19 200 of which are assigned to protein-coding genes, 14 000 to pseudogenes and nearly 9000 to non-coding RNA genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC nomenclature advisors and links to related genomic, clinical, and proteomic information. Here, we describe updates to our resource, including improvements to our search facility and new download features.


Subject(s)
Databases, Genetic , Humans , Genome , Genomics , Proteomics , Pseudogenes , Terminology as Topic
3.
EMBO J ; 39(6): e103777, 2020 03 16.
Article in English | MEDLINE | ID: mdl-32090359

ABSTRACT

Research on non-coding RNA (ncRNA) is a rapidly expanding field. Providing an official gene symbol and name to ncRNA genes brings order to otherwise potential chaos as it allows unambiguous communication about each gene. The HUGO Gene Nomenclature Committee (HGNC, www.genenames.org) is the only group with the authority to approve symbols for human genes. The HGNC works with specialist advisors for different classes of ncRNA to ensure that ncRNA nomenclature is accurate and informative, where possible. Here, we review each major class of ncRNA that is currently annotated in the human genome and describe how each class is assigned a standardised nomenclature.


Subject(s)
Genome, Human/genetics , RNA, Untranslated/classification , Terminology as Topic , Humans , RNA, Untranslated/genetics
4.
Am J Hum Genet ; 108(10): 1813-1816, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34626580

ABSTRACT

The use of approved nomenclature in publications is vital to enable effective scientific communication and is particularly crucial when discussing genes of clinical relevance. Here, we discuss several examples of cases where the failure of researchers to use a HUGO Gene Nomenclature Committee (HGNC)-approved symbol in publications has led to confusion between unrelated human genes in the literature. We also inform authors of the steps they can take to ensure that they use approved nomenclature in their manuscripts and discuss how referencing HGNC IDs can remove ambiguity when referring to genes that have previously been published with confusing alias symbols.


Subject(s)
Databases, Genetic/standards , Genes/genetics , Genome, Human , Research Personnel/standards , Terminology as Topic , Genomics , Humans
6.
Nucleic Acids Res ; 50(W1): W623-W632, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35552456

ABSTRACT

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.


Subject(s)
Benchmarking , Genomics , Phylogeny , Genomics/methods , Proteome
7.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-33959747

ABSTRACT

Multiple resources currently exist that predict orthologous relationships between genes. These resources differ both in the methodologies used and in the species they make predictions for. The HGNC Comparison of Orthology Predictions (HCOP) search tool integrates and displays data from multiple ortholog prediction resources for a specified human gene or set of genes. An indication of the reliability of a prediction is provided by the number of resources that support it. HCOP was originally designed to show orthology predictions between human and mouse but has been expanded to include data from a current total of 20 selected vertebrate and model organism species. The HCOP pipeline used to fetch and integrate the information from the disparate ortholog and nomenclature data resources has recently been rewritten, both to enable the inclusion of new data and to take advantage of modern web technologies. Data from HCOP are used extensively in our work naming genes as the Vertebrate Gene Nomenclature Committee (https://vertebrate.genenames.org).


Subject(s)
Computational Biology/methods , Genomics/methods , Sequence Homology , Software , Animals , Databases, Genetic , Humans , Vertebrates , Web Browser , Workflow
8.
IUBMB Life ; 75(5): 380-389, 2023 05.
Article in English | MEDLINE | ID: mdl-35880706

ABSTRACT

The HUGO Gene Nomenclature Committee (HGNC) is the sole group with the authority to approve symbols for human genes, including long non-coding RNA (lncRNA) genes. Use of approved symbols ensures that publications and biomedical databases are easily searchable and reduces the risks of confusion that can be caused by using the same symbol to refer to different genes or using many different symbols for the same gene. Here, we describe how the HGNC names lncRNA genes and review the nomenclature of the seven lncRNA genes most mentioned in the scientific literature.


Subject(s)
RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Databases, Genetic
9.
Hum Genomics ; 16(1): 66, 2022 12 02.
Article in English | MEDLINE | ID: mdl-36461115

ABSTRACT

The HUGO Gene Nomenclature Committee assigns unique symbols and names to human genes. The use of approved nomenclature enables effective communication between researchers, and there are multiple examples of how the usage of unapproved alias symbols can lead to confusion. We discuss here a recent nomenclature update (May 2022) for a set of genes that encode proteins with a shared repeating β-groove domain. Some of the proteins encoded by genes in this group have already been shown to function as lipid transporters. By working with researchers in the field, we have been able to introduce a new root symbol (BLTP, which stands for "bridge-like lipid transfer protein") for this domain-based gene group. This new nomenclature not only reflects the shared domain in these proteins, but also takes into consideration the mounting evidence of a shared lipid transport function.


Subject(s)
Lipids , Humans
10.
Hum Genomics ; 16(1): 58, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36380364

ABSTRACT

The HUGO Gene Nomenclature Committee (HGNC) has been providing standardized symbols and names for human genes since the late 1970s. As funding agencies change their priorities, finding financial support for critical biomedical resources such as the HGNC becomes ever more challenging. In this article, we outline the key roles the HGNC currently plays in aiding communication and the need for these activities to be maintained.


Subject(s)
Databases, Genetic , Genomics , Humans
11.
Hum Genomics ; 16(1): 1, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34991727

ABSTRACT

Intermediate filament (IntFil) genes arose during early metazoan evolution, to provide mechanical support for plasma membranes contacting/interacting with other cells and the extracellular matrix. Keratin genes comprise the largest subset of IntFil genes. Whereas the first keratin gene appeared in sponge, and three genes in arthropods, more rapid increases in keratin genes occurred in lungfish and amphibian genomes, concomitant with land animal-sea animal divergence (~440 to 410 million years ago). Human, mouse and zebrafish genomes contain 18, 17 and 24 non-keratin IntFil genes, respectively. Human has 27 of 28 type I "acidic" keratin genes clustered at chromosome (Chr) 17q21.2, and all 26 type II "basic" keratin genes clustered at Chr 12q13.13. Mouse has 27 of 28 type I keratin genes clustered on Chr 11, and all 26 type II clustered on Chr 15. Zebrafish has 18 type I keratin genes scattered on five chromosomes, and 3 type II keratin genes on two chromosomes. Types I and II keratin clusters, reflecting evolutionary blooms of keratin genes along one chromosomal segment, are found in all land animal genomes examined, but not fishes; such rapid gene expansions likely reflect sudden requirements for many novel paralogous proteins having divergent functions to enhance species survival following sea-to-land transition. Using data from the Genotype-Tissue Expression (GTEx) project, tissue-specific keratin expression throughout the human body was reconstructed. Clustering of gene expression patterns revealed similarities in tissue-specific expression patterns for previously described "keratin pairs" (i.e., KRT1/KRT10, KRT8/KRT18, KRT5/KRT14, KRT6/KRT16 and KRT6/KRT17 proteins). The ClinVar database currently lists 26 human disease-causing variants within the various domains of keratin proteins.


Subject(s)
Keratins , Zebrafish , Animals , Genome , Keratins/genetics , Keratins, Type I/genetics , Mice
12.
Hum Genomics ; 16(1): 56, 2022 11 11.
Article in English | MEDLINE | ID: mdl-36369063

ABSTRACT

Following the draft sequence of the first human genome over 20 years ago, we have achieved unprecedented insights into the rules governing its evolution, often with direct translational relevance to specific diseases. However, staggering sequence complexity has also challenged the development of a more comprehensive understanding of human genome biology. In this context, interspecific genomic studies between humans and other animals have played a critical role in our efforts to decode human gene families. In this review, we focus on how the rapid surge of genome sequencing of both model and non-model organisms now provides a broader comparative framework poised to empower novel discoveries. We begin with a general overview of how comparative approaches are essential for understanding gene family evolution in the human genome, followed by a discussion of analyses of gene expression. We show how homology can provide insights into the genes and gene families associated with immune response, cancer biology, vision, chemosensation, and metabolism, by revealing similarity in processes among distant species. We then explain methodological tools that provide critical advances and show the limitations of common approaches. We conclude with a discussion of how these investigations position us to gain fundamental insights into the evolution of gene families among living organisms in general. We hope that our review catalyzes additional excitement and research on the emerging field of comparative genomics, while aiding the placement of the human genome into its existentially evolutionary context.


Subject(s)
Evolution, Molecular , Genomics , Animals , Humans , Genome , Base Sequence , Phylogeny
13.
Nucleic Acids Res ; 49(D1): D939-D946, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33152070

ABSTRACT

The HUGO Gene Nomenclature Committee (HGNC) based at EMBL's European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. There are over 42 000 approved gene symbols in our current database of which over 19 000 are for protein-coding genes. While we still update placeholder and problematic symbols, we are working towards stabilizing symbols where possible; over 2000 symbols for disease associated genes are now marked as stable in our symbol reports. All of our data is available at the HGNC website https://www.genenames.org. The Vertebrate Gene Nomenclature Committee (VGNC) was established to assign standardized nomenclature in line with human for vertebrate species lacking their own nomenclature committee. In addition to the previous VGNC core species of chimpanzee, cow, horse and dog, we now name genes in cat, macaque and pig. Gene groups have been added to VGNC and currently include two complex families: olfactory receptors (ORs) and cytochrome P450s (CYPs). In collaboration with specialists we have also named CYPs in species beyond our core set. All VGNC data is available at https://vertebrate.genenames.org/. This article provides an overview of our online data and resources, focusing on updates over the last two years.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genes/genetics , Genomics/methods , Terminology as Topic , Vertebrates/genetics , Animals , Humans , Internet , Proteins/genetics , Species Specificity , User-Computer Interface , Vertebrates/classification
14.
Genome Res ; 29(12): 2073-2087, 2019 12.
Article in English | MEDLINE | ID: mdl-31537640

ABSTRACT

The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.


Subject(s)
Exons , Genome, Human , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Open Reading Frames , Sequence Analysis, DNA , Animals , Humans , Pseudogenes
15.
Genet Med ; 24(8): 1732-1742, 2022 08.
Article in English | MEDLINE | ID: mdl-35507016

ABSTRACT

PURPOSE: Several groups and resources provide information that pertains to the validity of gene-disease relationships used in genomic medicine and research; however, universal standards and terminologies to define the evidence base for the role of a gene in disease and a single harmonized resource were lacking. To tackle this issue, the Gene Curation Coalition (GenCC) was formed. METHODS: The GenCC drafted harmonized definitions for differing levels of gene-disease validity on the basis of existing resources, and performed a modified Delphi survey with 3 rounds to narrow the list of terms. The GenCC also developed a unified database to display curated gene-disease validity assertions from its members. RESULTS: On the basis of 241 survey responses from the genetics community, a consensus term set was chosen for grading gene-disease validity and database submissions. As of December 2021, the database contained 15,241 gene-disease assertions on 4569 unique genes from 12 submitters. When comparing submissions to the database from distinct sources, conflicts in assertions of gene-disease validity ranged from 5.3% to 13.4%. CONCLUSION: Terminology standardization, sharing of gene-disease validity classifications, and resolution of curation conflicts will facilitate collaborations across international curation efforts and in turn, improve consistency in genetic testing and variant interpretation.


Subject(s)
Databases, Genetic , Genomics , Genetic Testing , Genetic Variation , Humans
16.
Hum Genet ; 140(3): 381-400, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32728807

ABSTRACT

Paired-box (PAX) genes encode a family of highly conserved transcription factors found in vertebrates and invertebrates. PAX proteins are defined by the presence of a paired domain that is evolutionarily conserved across phylogenies. Inclusion of a homeodomain and/or an octapeptide linker subdivides PAX proteins into four groups. Often termed "master regulators", PAX proteins orchestrate tissue and organ development throughout cell differentiation and lineage determination, and are essential for tissue structure and function through maintenance of cell identity. Mutations in PAX genes are associated with myriad human diseases (e.g., microphthalmia, anophthalmia, coloboma, hypothyroidism, acute lymphoblastic leukemia). Transcriptional regulation by PAX proteins is, in part, modulated by expression of alternatively spliced transcripts. Herein, we provide a genomics update on the nine human PAX family members and PAX homologs in 16 additional species. We also present a comprehensive summary of human tissue-specific PAX transcript variant expression and describe potential functional significance of PAX isoforms. While the functional roles of PAX proteins in developmental diseases and cancer are well characterized, much remains to be understood regarding the functional roles of PAX isoforms in human health. We anticipate the analysis of tissue-specific PAX transcript variant expression presented herein can serve as a starting point for such research endeavors.


Subject(s)
Genetic Predisposition to Disease , Paired Box Transcription Factors/genetics , Alternative Splicing , Animals , Chromosome Mapping , Evolution, Molecular , Humans , Phylogeny , RNA, Messenger/genetics , Transcription, Genetic
17.
Nucleic Acids Res ; 47(D1): D786-D792, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30304474

ABSTRACT

The HUGO Gene Nomenclature Committee (HGNC) based at EMBL's European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. There are over 40 000 approved gene symbols in our current database of which over 19 000 are for protein-coding genes. The Vertebrate Gene Nomenclature Committee (VGNC) was established in 2016 to assign standardized nomenclature in line with human for vertebrate species that lack their own nomenclature committees. The VGNC initially assigned nomenclature for over 15 000 protein-coding genes in chimpanzee. We have extended this process to other vertebrate species, naming over 14 000 protein-coding genes in cow and dog and over 13 000 in horse to date. Our HGNC website https://www.genenames.org has undergone a major design update, simplifying the homepage to provide easy access to our search tools and making the site more mobile friendly. Our gene families pages are now known as 'gene groups' and have increased in number to over 1200, with nearly half of all named genes currently assigned to at least one gene group. This article provides an overview of our online data and resources, focusing on our work over the last two years.


Subject(s)
Computational Biology/standards , Databases, Genetic/standards , Genomics/standards , Terminology as Topic , Animals , Cattle , Dogs , Horses/genetics , Humans , Pan troglodytes/genetics , Search Engine
18.
BMC Evol Biol ; 20(1): 42, 2020 04 15.
Article in English | MEDLINE | ID: mdl-32295537

ABSTRACT

BACKGROUND: Olfactory receptors (ORs) are G protein-coupled receptors with a crucial role in odor detection. A typical mammalian genome harbors ~1000 OR genes and pseudogenes; however, different gene duplication/deletion events have occurred in each species, resulting in complex orthology relationships. While the human OR nomenclature is widely accepted and based on phylogenetic classification into 18 families and further into subfamilies, for other mammals different and multiple nomenclature systems are currently in use, thus concealing important evolutionary and functional insights. RESULTS: Here, we describe the Mutual Maximum Similarity (MMS) algorithm, a systematic classifier for assigning a human-centric nomenclature to any OR gene based on inter-species hierarchical pairwise similarities. MMS was applied to the OR repertoires of seven mammals and zebrafish. Altogether, we assigned symbols to 10,249 ORs. This nomenclature is supported by both phylogenetic and synteny analyses. The availability of a unified nomenclature provides a framework for diverse studies, where textual symbol comparison allows immediate identification of potential ortholog groups as well as species-specific expansions/deletions; for example, Or52e5 and Or52e5b represent a rat-specific duplication of OR52E5. Another example is the complete absence of OR subfamily OR6Z among primate OR symbols. In other mammals, OR6Z members are located in one genomic cluster, suggesting a large deletion in the great ape lineage. An additional 14 mammalian OR subfamilies are missing from the primate genomes. While in chimpanzee 87% of the symbols were identical to human symbols, this number decreased to ~50% in dog and cow and to ~30% in rodents, reflecting the adaptive changes of the OR gene superfamily across diverse ecological niches. Application of the proposed nomenclature to zebrafish revealed similarity to mammalian ORs that could not be detected from the current zebrafish olfactory receptor gene nomenclature. CONCLUSIONS: We have consolidated a unified standard nomenclature system for the vertebrate OR superfamily. The new nomenclature system will be applied to cow, horse, dog and chimpanzee by the Vertebrate Gene Nomenclature Committee and its implementation is currently under consideration by other relevant species-specific nomenclature committees.


Subject(s)
Algorithms , Receptors, Odorant , Terminology as Topic , Vertebrates , Animals , Cattle , Dogs , Genome , Horses , Humans , Pan troglodytes , Phylogeny , Rats , Receptors, Odorant/genetics , Species Specificity , Synteny , Vertebrates/genetics , Zebrafish
19.
Hum Genomics ; 13(1): 11, 2019 02 19.
Article in English | MEDLINE | ID: mdl-30782214

ABSTRACT

Lipocalins (LCNs) are members of a family of evolutionarily conserved genes present in all kingdoms of life. There are 19 LCN-like genes in the human genome, and 45 Lcn-like genes in the mouse genome, which include 22 major urinary protein (Mup) genes. The Mup genes, plus 29 of 30 Mup-ps pseudogenes, are all located together on chromosome (Chr) 4; evidence points to an "evolutionary bloom" that resulted in this Mup cluster in mouse, syntenic to the human Chr 9q32 locus at which a single MUPP pseudogene is located. LCNs play important roles in physiological processes by binding and transporting small hydrophobic molecules (such as steroid hormones, odorants, retinoids, and lipids) in plasma and other body fluids. LCNs are extensively used in clinical practice as biochemical markers. LCN-like proteins (18-40 kDa) have the characteristic eight β-strands creating a barrel structure that houses the binding site; LCNs are synthesized in the liver as well as various secretory tissues. In rodents, MUPs are involved in communication of information in urine-derived scent marks, serving as signatures of individual identity, or as kairomones (to elicit fear behavior). MUPs also participate in regulation of glucose and lipid metabolism via a mechanism not well understood. Although much has been learned about LCNs and MUPs in recent years, more research is necessary to allow better understanding of their physiological functions, as well as their involvement in clinical disorders.


Subject(s)
Evolution, Molecular , Lipocalins/genetics , Animals , Genome, Human , Humans , Lipocalins/metabolism , Mice , Multigene Family
20.
Nucleic Acids Res ; 46(D1): D221-D228, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.


Subject(s)
Consensus Sequence , Databases, Genetic , Open Reading Frames , Animals , Data Curation/methods , Data Curation/standards , Databases, Genetic/standards , Guidelines as Topic , Humans , Mice , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States , User-Computer Interface