ABSTRACT
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Subject(s)
Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/geneticsABSTRACT
DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.
Subject(s)
Genomics , Genomics/methods , Humans , Rare Diseases/genetics , Alleles , Practice Guidelines as Topic , DNA Copy Number Variations , Databases, GeneticABSTRACT
BACKGROUND: Genomic variant prioritisation is one of the most significant bottlenecks to mainstream genomic testing in healthcare. Tools to improve precision while ensuring high recall are critical to successful mainstream clinical genomic testing, in particular for whole genome sequencing where millions of variants must be considered for each patient. METHODS: We developed EyeG2P, a publicly available database and web application using the Ensembl Variant Effect Predictor. EyeG2P is tailored for efficient variant prioritisation for individuals with inherited ophthalmic conditions. We assessed the sensitivity of EyeG2P in 1234 individuals with a broad range of eye conditions who had previously received a confirmed molecular diagnosis through routine genomic diagnostic approaches. For a prospective cohort of 83 individuals, we assessed the precision of EyeG2P in comparison with routine diagnostic approaches. For 10 additional individuals, we assessed the utility of EyeG2P for whole genome analysis. RESULTS: EyeG2P had 99.5% sensitivity for genomic variants previously identified as clinically relevant through routine diagnostic analysis (n=1234 individuals). Prospectively, EyeG2P enabled a significant increase in precision (35% on average) in comparison with routine testing strategies (p<0.001). We demonstrate that incorporation of EyeG2P into whole genome sequencing analysis strategies can reduce the number of variants for analysis to six variants, on average, while maintaining high diagnostic yield. CONCLUSION: Automated filtering of genomic variants through EyeG2P can increase the efficiency of diagnostic testing for individuals with a broad range of inherited ophthalmic disorders.
Subject(s)
Databases, Genetic , Eye Diseases , Genetic Testing , Genome, Human , Genomics , Eye Diseases/genetics , Humans , Genetic VariationABSTRACT
The European Variation Archive (EVA; https://www.ebi.ac.uk/eva/) is a resource for sharing all types of genetic variation data (SNPs, indels, and structural variants) for all species. The EVA was created in 2014 to provide FAIR access to genetic variation data and has since grown to be a primary resource for genomic variants hosting >3 billion records. The EVA and dbSNP have established a compatible global system to assign unique identifiers to all submitted genetic variants. The EVA is active within the Global Alliance of Genomics and Health (GA4GH), maintaining, contributing and implementing standards such as VCF, Refget and Variant Representation Specification (VRS). In this article, we describe the submission and permanent accessioning services along with the different ways the data can be retrieved by the scientific community.
Subject(s)
Computational Biology , Databases, Genetic , Genetic Variation/genetics , Software , Animals , Genomic Structural Variation/genetics , Genomics , Humans , INDEL Mutation/genetics , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/geneticsABSTRACT
The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser; the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.
Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence AnnotationABSTRACT
Our incomplete knowledge of the human transcriptome impairs the detection of disease-causing variants, in particular if they affect transcripts only expressed under certain conditions. These transcripts are often lacking from reference transcript sets, such as Ensembl/GENCODE and RefSeq, and could be relevant for establishing genetic diagnoses. We present SUsPECT (Solving Unsolved Patient Exomes/gEnomes using Custom Transcriptomes), a pipeline based on the Ensembl Variant Effect Predictor (VEP) to predict variant impact on custom transcript sets, such as those generated by long-read RNA-sequencing, for downstream prioritization. Our pipeline predicts the functional consequence and likely deleteriousness scores for missense variants in the context of novel open reading frames predicted from any transcriptome. We demonstrate the utility of SUsPECT by uncovering potential mutational mechanisms of pathogenic variants in ClinVar that are not predicted to be pathogenic using the reference transcript annotation. In further support of SUsPECT's utility, we identified an enrichment of immune-related variants predicted to have a more severe molecular consequence when annotating with a newly generated transcriptome from stimulated immune cells instead of the reference transcriptome. Our pipeline outputs crucial information for further prioritization of potentially disease-causing variants for any disease and will become increasingly useful as more long-read RNA sequencing datasets become available.
Subject(s)
Software , Transcriptome , Humans , Molecular Sequence Annotation , Sequence Analysis, RNA/methods , Exome , High-Throughput Nucleotide SequencingABSTRACT
DECIPHER (https://www.deciphergenomics.org) is a free web platform for sharing anonymized phenotype-linked variant data from rare disease patients. Its dynamic interpretation interfaces contextualize genomic and phenotypic data to enable more informed variant interpretation, incorporating international standards for variant classification. DECIPHER supports almost all types of germline and mosaic variation in the nuclear and mitochondrial genome: sequence variants, short tandem repeats, copy-number variants, and large structural variants. Patient phenotypes are deposited using Human Phenotype Ontology (HPO) terms, supplemented by quantitative data, which is aggregated to derive gene-specific phenotypic summaries. It hosts data from >250 projects from ~40 countries, openly sharing >40,000 patient records containing >51,000 variants and >172,000 phenotype terms. The rich phenotype-linked variant data in DECIPHER drives rare disease research and diagnosis by enabling patient matching within DECIPHER and with other resources, and has been cited in >2,600 publications. In this study, we describe the types of data deposited to DECIPHER, the variant interpretation tools, and patient matching interfaces which make DECIPHER an invaluable rare disease resource.
Subject(s)
Databases, Genetic , Rare Diseases , Genomics , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/genetics , SoftwareABSTRACT
The Ensembl Variant Effect Predictor (VEP) is a freely available, open-source tool for the annotation and filtering of genomic variants. It predicts variant molecular consequences using the Ensembl/GENCODE or RefSeq gene sets. It also reports phenotype associations from databases such as ClinVar, allele frequencies from studies including gnomAD, and predictions of deleteriousness from tools such as Sorting Intolerant From Tolerant and Combined Annotation Dependent Depletion. Ensembl VEP includes filtering options to customize variant prioritization. It is well supported and updated roughly quarterly to incorporate the latest gene, variant, and phenotype association information. Ensembl VEP analysis can be performed using a highly configurable, extensible command-line tool, a Representational State Transfer application programming interface, and a user-friendly web interface. These access methods are designed to suit different levels of bioinformatics experience and meet different needs in terms of data size, visualization, and flexibility. In this tutorial, we will describe performing variant annotation using the Ensembl VEP web tool, which enables sophisticated analysis through a simple interface.
Subject(s)
Genomics , Software , Computational Biology , Databases, Genetic , Gene Frequency , Humans , Molecular Sequence Annotation , PhenotypeABSTRACT
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Bacterial , Genome, Fungal , Genome, Plant , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer InterfaceABSTRACT
SUMMARY: Assessing the pathogenicity of genetic variants can be a complex and challenging task. Spliceogenic variants, which alter mRNA splicing, may yield mature transcripts that encode non-functional protein products, an important predictor of Mendelian disease risk. However, most variant annotation tools do not adequately assess spliceogenicity outside the native splice site and thus the disease-causing potential of variants in other intronic and exonic regions is often overlooked. Here, we present a plugin for the Ensembl Variant Effect Predictor that packages MaxEntScan and extends its functionality to provide splice site predictions using a maximum entropy model. The plugin incorporates a sliding window algorithm to predict splice site loss or gain for any variant that overlaps a transcript feature. We also demonstrate the utility of the plugin by comparing our predictions to two mRNA splicing datasets containing several cancer-susceptibility genes. AVAILABILITY AND IMPLEMENTATION: Source code is freely available under the Apache License, Version 2.0: https://github.com/Ensembl/VEP_plugins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
RNA Splicing , Software , Algorithms , Exons , IntronsABSTRACT
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
Subject(s)
Databases, Genetic , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Genome, Human , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web BrowserABSTRACT
Bacteremia (bacterial bloodstream infection) is a major cause of illness and death in sub-Saharan Africa but little is known about the role of human genetics in susceptibility. We conducted a genome-wide association study of bacteremia susceptibility in more than 5,000 Kenyan children as part of the Wellcome Trust Case Control Consortium 2 (WTCCC2). Both the blood-culture-proven bacteremia case subjects and healthy infants as controls were recruited from Kilifi, on the east coast of Kenya. Streptococcus pneumoniae is the most common cause of bacteremia in Kilifi and was thus the focus of this study. We identified an association between polymorphisms in a long intergenic non-coding RNA (lincRNA) gene (AC011288.2) and pneumococcal bacteremia and replicated the results in the same population (p combined = 1.69 × 10(-9); OR = 2.47, 95% CI = 1.84-3.31). The susceptibility allele is African specific, derived rather than ancestral, and occurs at low frequency (2.7% in control subjects and 6.4% in case subjects). Our further studies showed AC011288.2 expression only in neutrophils, a cell type that is known to play a major role in pneumococcal clearance. Identification of this novel association will further focus research on the role of lincRNAs in human infectious disease.
Subject(s)
Bacteremia/genetics , Pneumonia, Pneumococcal/genetics , Polymorphism, Genetic/genetics , RNA, Long Noncoding/genetics , Streptococcus pneumoniae/genetics , Adolescent , Bacteremia/microbiology , Bacteremia/pathology , Case-Control Studies , Child , Child, Preschool , Genome-Wide Association Study , Humans , Infant , Infant, Newborn , Kenya/epidemiology , Pneumonia, Pneumococcal/microbiology , Pneumonia, Pneumococcal/pathology , Risk FactorsABSTRACT
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Subject(s)
Computational Biology/methods , Databases, Genetic , Genomics/methods , Search Engine , Software , Web Browser , Animals , Data Mining , Evolution, Molecular , Gene Expression Regulation , Genetic Variation , Genome, Human , Humans , Molecular Sequence Annotation , Species Specificity , VertebratesABSTRACT
The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
Subject(s)
Databases, Genetic , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Regulatory Sequences, Nucleic Acid , SoftwareABSTRACT
Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.
Subject(s)
Genetic Predisposition to Disease/genetics , Immunity, Cellular/immunology , Multiple Sclerosis/genetics , Multiple Sclerosis/immunology , Alleles , Cell Differentiation/immunology , Europe/ethnology , Genome, Human/genetics , Genome-Wide Association Study , HLA-A Antigens/genetics , HLA-DR Antigens/genetics , HLA-DRB1 Chains , Humans , Immunity, Cellular/genetics , Major Histocompatibility Complex/genetics , Polymorphism, Single Nucleotide/genetics , Sample Size , T-Lymphocytes, Helper-Inducer/cytology , T-Lymphocytes, Helper-Inducer/immunologyABSTRACT
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
Subject(s)
Databases, Nucleic Acid , Genomics , Animals , Epigenesis, Genetic , Genetic Variation , Genome, Human , Humans , Internet , Mice , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , SoftwareABSTRACT
A variety of macromolecules accumulate in the glomerular mesangium in many different diseases, but the physics of the transport of these molecules within the mesangial matrix has not been extensively studied. We present a computational model of convection and diffusion within the porous mesangial matrix and apply this model to the specific instance of immunoglobulin A (IgA) transport in IgA nephropathy. We examine the influence of physiological factors including glomerular basement membrane (GBM) thickness and mesangial matrix density on the total accumulation of IgA. Our results suggest that IgA accumulation can be understood by relating convection and diffusion, thus demonstrating the importance of intrinsic glomerular factors.
Subject(s)
Computer Simulation , Glomerular Mesangium/metabolism , Glomerulonephritis, IGA/metabolism , Immunoglobulin A/metabolism , Models, Biological , Animals , Biological Transport , Diffusion , Glomerular Basement Membrane/metabolism , Glomerular Basement Membrane/pathology , Glomerular Mesangium/blood supply , Glomerular Mesangium/pathology , Glomerulonephritis, IGA/pathology , Humans , Motion , Osmotic Pressure , Particle Size , Porosity , Pressure , Renal Circulation , Time FactorsABSTRACT
Abdominal aortic aneurysm (AAA) is a common cause of morbidity and mortality and has a significant heritability. We carried out a genome-wide association discovery study of 1866 patients with AAA and 5435 controls and replication of promising signals (lead SNP with a p value < 1 × 10(-5)) in 2871 additional cases and 32,687 controls and performed further follow-up in 1491 AAA and 11,060 controls. In the discovery study, nine loci demonstrated association with AAA (p < 1 × 10(-5)). In the replication sample, the lead SNP at one of these loci, rs1466535, located within intron 1 of low-density-lipoprotein receptor-related protein 1 (LRP1) demonstrated significant association (p = 0.0042). We confirmed the association of rs1466535 and AAA in our follow-up study (p = 0.035). In a combined analysis (6228 AAA and 49182 controls), rs1466535 had a consistent effect size and direction in all sample sets (combined p = 4.52 × 10(-10), odds ratio 1.15 [1.10-1.21]). No associations were seen for either rs1466535 or the 12q13.3 locus in independent association studies of coronary artery disease, blood pressure, diabetes, or hyperlipidaemia, suggesting that this locus is specific to AAA. Gene-expression studies demonstrated a trend toward increased LRP1 expression for the rs1466535 CC genotype in arterial tissues; there was a significant (p = 0.029) 1.19-fold (1.04-1.36) increase in LRP1 expression in CC homozygotes compared to TT homozygotes in aortic adventitia. Functional studies demonstrated that rs1466535 might alter a SREBP-1 binding site and influence enhancer activity at the locus. In conclusion, this study has identified a biologically plausible genetic variant associated specifically with AAA, and we suggest that this variant has a possible functional role in LRP1 expression.
Subject(s)
Aorta/metabolism , Aortic Aneurysm, Abdominal/genetics , Genetic Loci/genetics , Low Density Lipoprotein Receptor-Related Protein-1/genetics , Polymorphism, Single Nucleotide , Aged , Case-Control Studies , Cell Line, Tumor , Data Interpretation, Statistical , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Homozygote , Humans , Male , Odds Ratio , Organ Specificity , Risk Factors , Sterol Regulatory Element Binding Protein 1/geneticsABSTRACT
Genetically determined disorders are highly heterogenous in clinical presentation and underlying molecular mechanism. The evidence underpinning these conditions in the peer-reviewed literature requires robust critical evaluation for diagnostic use. Here, we present a structured curation process for Gene2Phenotype (G2P). This draws on multiple lines of clinical, bioinformatic and functional evidence. The process utilises and extends existing terminologies, allows for precise definition of the molecular basis of disease, and confidence levels to be attributed to a given gene-disease assertion. In-depth disease curation using this process will prove useful in applications including in diagnostics, research and development of targeted therapeutics. G2P: www.ebi.ac.uk/gene2phenotype .
Subject(s)
Phenotype , Humans , Genomics/methods , Genetic Predisposition to Disease , Computational Biology/methods , Databases, Genetic , Genetic Association Studies/methods , Genetic Diseases, Inborn/geneticsABSTRACT
The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.