ABSTRACT
Decades of genetic association testing in human cohorts have provided important insights into the genetic architecture and biological underpinnings of complex traits and diseases. However, for certain traits, genome-wide association studies (GWAS) for common SNPs are approaching signal saturation, which underscores the need to explore other types of genetic variation to understand the genetic basis of traits and diseases. Copy number variation (CNV) is an important source of heritability that is well known to functionally affect human traits. Recent technological and computational advances enable the large-scale, genome-wide evaluation of CNVs, with implications for downstream applications such as polygenic risk scoring and drug target identification. Here, we review the current state of CNV-GWAS, discuss current limitations in resource infrastructure that need to be overcome to enable the wider uptake of CNV-GWAS results, highlight emerging opportunities and suggest guidelines and standards for future GWAS for genetic variation beyond SNPs at scale.
ABSTRACT
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.
Subjects
Academies and Institutes , Computational Biology , Computational Biology/organization & administration , Computational Biology/trends , Academies and Institutes/organization & administration , Academies and Institutes/trends , Databases, Nucleic Acid , Europe
ABSTRACT
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.
Subjects
Artificial Intelligence , Computational Biology , Data Management , Databases, Factual , Genome , Internet
ABSTRACT
The Open Targets Platform (https://platform.opentargets.org/) is an open source resource to systematically assist drug target identification and prioritisation using publicly available data. Since our last update, we have reimagined, redesigned, and rebuilt the Platform in order to streamline data integration and harmonisation, expand the ways in which users can explore the data, and improve the user experience. The gene-disease causal evidence has been enhanced and expanded to better capture disease causality across rare, common, and somatic diseases. For target and drug annotations, we have incorporated new features that help assess target safety and tractability, including genetic constraint, PROTACtability assessments, and AlphaFold structure predictions. We have also introduced new machine learning applications for knowledge extraction from the published literature, clinical trial information, and drug labels. The new technologies and frameworks introduced since the last update will ease the introduction of new features and the creation of separate instances of the Platform adapted to user requirements. Our new Community forum, expanded training materials, and outreach programme support our users in a range of use cases.
ABSTRACT
Clinical validity assessments of gene-disease associations underpin analysis and reporting in diagnostic genomics, and yet wide variability exists in practice, particularly in use of these assessments for virtual gene panel design and maintenance. Harmonization efforts are hampered by the lack of agreed terminology, agreed gene curation standards, and platforms that can be used to identify and resolve discrepancies at scale. We undertook a systematic comparison of the content of 80 virtual gene panels used in two healthcare systems by multiple diagnostic providers in the United Kingdom and Australia. The process was enabled by a shared curation platform, PanelApp, and resulted in the identification and review of 2,144 discordant gene ratings, demonstrating the utility of sharing structured gene-disease validity assessments and collaborative discordance resolution in establishing national and international consensus.
Subjects
Consensus , Data Curation/standards , Genetic Diseases, Inborn/genetics , Genomics/standards , Molecular Sequence Annotation/standards , Australia , Biomarkers/metabolism , Data Curation/methods , Delivery of Health Care , Gene Expression , Gene Ontology , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , Genomics/methods , Humans , Mobile Applications/supply & distribution , Terminology as Topic , United Kingdom
ABSTRACT
BACKGROUND: The U.K. 100,000 Genomes Project is in the process of investigating the role of genome sequencing in patients with undiagnosed rare diseases after usual care and the alignment of this research with health care implementation in the U.K. National Health Service. Other parts of this project focus on patients with cancer and infection. METHODS: We conducted a pilot study involving 4660 participants from 2183 families, among whom 161 disorders covering a broad spectrum of rare diseases were present. We collected data on clinical features with the use of Human Phenotype Ontology terms, undertook genome sequencing, applied automated variant prioritization on the basis of applied virtual gene panels and phenotypes, and identified novel pathogenic variants through research analysis. RESULTS: Diagnostic yields varied among family structures and were highest in family trios (both parents and a proband) and families with larger pedigrees. Diagnostic yields were much higher for disorders likely to have a monogenic cause (35%) than for disorders likely to have a complex cause (11%). Diagnostic yields for intellectual disability, hearing disorders, and vision disorders ranged from 40 to 55%. We made genetic diagnoses in 25% of the probands. A total of 14% of the diagnoses were made by means of the combination of research and automated approaches, which was critical for cases in which we found etiologic noncoding, structural, and mitochondrial genome variants and coding variants poorly covered by exome sequencing. Cohortwide burden testing across 57,000 genomes enabled the discovery of three new disease genes and 19 new associations. Of the genetic diagnoses that we made, 25% had immediate ramifications for clinical decision making for the patients or their relatives. CONCLUSIONS: Our pilot study of genome sequencing in a national health care system showed an increase in diagnostic yield across a range of rare diseases. 
(Funded by the National Institute for Health Research and others.).
Subjects
Genome, Human , Rare Diseases/genetics , Adolescent , Adult , Child , Child, Preschool , Family Characteristics , Female , Genetic Variation , Humans , Male , Middle Aged , Pilot Projects , Polymerase Chain Reaction , Rare Diseases/diagnosis , Sensitivity and Specificity , State Medicine , United Kingdom , Whole Genome Sequencing , Young Adult
ABSTRACT
PURPOSE: The terminology used for gene-disease curation and variant annotation to describe inheritance, allelic requirement, and both sequence and functional consequences of a variant is currently not standardized. There is considerable discrepancy in the literature and across clinical variant reporting in the derivation and application of terms. Here, we standardize the terminology for the characterization of disease-gene relationships to facilitate harmonized global curation and to support variant classification within the ACMG/AMP framework. METHODS: Terminology for inheritance, allelic requirement, and both structural and functional consequences of a variant used by Gene Curation Coalition members and partner organizations was collated and reviewed. Harmonized terminology with definitions and use examples was created, reviewed, and validated. RESULTS: We present a standardized terminology to describe gene-disease relationships, and to support variant annotation. We demonstrate application of the terminology for classification of variation in the ACMG SF 2.0 genes recommended for reporting of secondary findings. Consensus terms were agreed and formalized in both Sequence Ontology (SO) and Human Phenotype Ontology (HPO) ontologies. Gene Curation Coalition member groups intend to use or map to these terms in their respective resources. CONCLUSION: The terminology standardization presented here will improve harmonization, facilitate the pooling of curation datasets across international curation efforts and, in turn, improve consistency in variant classification and genetic test interpretation.
Subjects
Genetic Testing , Genetic Variation , Humans , Alleles , Databases, Genetic
ABSTRACT
The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.
Subjects
Antineoplastic Agents/therapeutic use , Drugs, Investigational/therapeutic use , Knowledge Bases , Molecular Targeted Therapy/methods , Neoplasms/drug therapy , Software , Antineoplastic Agents/chemistry , Databases, Factual , Datasets as Topic , Drug Discovery/methods , Drugs, Investigational/chemistry , Humans , Internet , Neoplasms/classification , Neoplasms/genetics , Neoplasms/pathology
ABSTRACT
Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.
Subjects
Databases, Genetic , Genome, Human , Inflammatory Bowel Diseases/genetics , Molecular Targeted Therapy/methods , Quantitative Trait Loci , Software , Chromatin/chemistry , Chromatin/metabolism , Datasets as Topic , Drug Discovery/methods , Drug Repositioning/methods , Genome-Wide Association Study , Genotype , Humans , Inflammatory Bowel Diseases/drug therapy , Inflammatory Bowel Diseases/metabolism , Inflammatory Bowel Diseases/pathology , Internet , Phenotype , Quantitative Trait, Heritable
ABSTRACT
PURPOSE: Several groups and resources provide information that pertains to the validity of gene-disease relationships used in genomic medicine and research; however, universal standards and terminologies to define the evidence base for the role of a gene in disease and a single harmonized resource were lacking. To tackle this issue, the Gene Curation Coalition (GenCC) was formed. METHODS: The GenCC drafted harmonized definitions for differing levels of gene-disease validity on the basis of existing resources, and performed a modified Delphi survey with 3 rounds to narrow the list of terms. The GenCC also developed a unified database to display curated gene-disease validity assertions from its members. RESULTS: On the basis of 241 survey responses from the genetics community, a consensus term set was chosen for grading gene-disease validity and database submissions. As of December 2021, the database contained 15,241 gene-disease assertions on 4569 unique genes from 12 submitters. When comparing submissions to the database from distinct sources, conflicts in assertions of gene-disease validity ranged from 5.3% to 13.4%. CONCLUSION: Terminology standardization, sharing of gene-disease validity classifications, and resolution of curation conflicts will facilitate collaborations across international curation efforts and in turn, improve consistency in genetic testing and variant interpretation.
Subjects
Databases, Genetic , Genomics , Genetic Testing , Genetic Variation , Humans
ABSTRACT
Although over 150 unique mutations affecting the coding sequence of CHM have been identified in patients with the X-linked chorioretinal disease choroideremia (CHM), no regulatory mutations have been reported, and indeed the promoter has not been defined. Here, we describe two independent families affected by CHM bearing a mutation outside the gene's coding region at position c.-98: C>A and C>T, which segregated with the disease. The male proband of family 1 was found to lack CHM mRNA and its gene product Rab escort protein 1, whereas whole-genome sequencing of an affected male in family 2 excluded the involvement of any other known retinal genes. Both mutations abrogated luciferase activity when inserted into a reporter construct, and by further employing the luciferase reporter system to assay sequences 5' to the gene, we identified the CHM promoter as the region encompassing nucleotides c.-119 to c.-76. These findings suggest that the CHM promoter region should be examined in patients with CHM who lack coding sequence mutations, and reveal, for the first time, features of the gene's regulation.
Subjects
Adaptor Proteins, Signal Transducing/genetics , Choroideremia/genetics , Genetic Diseases, X-Linked , Retinal Degeneration/genetics , Choroideremia/complications , Choroideremia/pathology , Female , Genetic Predisposition to Disease , Humans , Male , Mutation , Pedigree , Promoter Regions, Genetic/genetics , Retina/metabolism , Retina/pathology , Retinal Degeneration/complications , Retinal Degeneration/pathology
ABSTRACT
Open Targets, a consortium among academic and industry partners, focuses on using human genetics and genomics to provide insights to key questions that build therapeutic hypotheses. Large-scale experiments generate foundational data, and open-source informatic platforms systematically integrate evidence for target-disease relationships and provide dynamic tooling for target prioritization. A locus-to-gene machine learning model uses evidence from genome-wide association studies (GWAS Catalog, UK BioBank, and FinnGen), functional genomic studies, epigenetic studies, and variant effect prediction to predict potential drug targets for complex diseases. These predictions are combined with genetic evidence from gene burden analyses, rare disease genetics, somatic mutations, perturbation assays, pathway analyses, scientific literature, differential expression, and mouse models to systematically build target-disease associations (https://platform.opentargets.org). Scored target attributes such as clinical precedence, tractability, and safety guide target prioritization. Here we provide our perspective on the value and impact of human genetics and genomics for generating therapeutic hypotheses.
Subjects
Genomics , Humans , Genomics/methods , Genome-Wide Association Study , Human Genetics , Animals , Machine Learning , Molecular Targeted Therapy
ABSTRACT
PURPOSE: As part of the 100,000 Genomes Project, we set out to assess the potential viability and clinical impact of reporting genetic variants associated with drug-induced toxicity for patients with cancer recruited for whole-genome sequencing (WGS) as part of a genomic medicine service. METHODS: Germline WGS from 76,805 participants was analyzed for pharmacogenetic (PGx) variants in four genes (DPYD, NUDT15, TPMT, UGT1A1) associated with toxicity induced by five drugs used in cancer treatment (capecitabine, fluorouracil, mercaptopurine, thioguanine, irinotecan). Linking genomic data with prescribing and hospital incidence records, a phenome-wide association study (PheWAS) was performed to identify whether phenotypes indicative of adverse drug reactions (ADRs) were enriched in drug-exposed individuals with the relevant PGx variants. In a subset of 7,081 patients with cancer, DPYD variants were reported back to clinicians and outcomes were collected. RESULTS: We identified clinically relevant PGx variants across the four genes in 62.7% of participants in our cohort. Extending this to annual prescription numbers in England for the drugs affected by these PGx variants, approximately 14,540 patients per year could potentially benefit from a reduced dose or alternative drug to reduce the risk of ADRs. Validating PGx associations in a real-world data set, we found a significant association between PGx variants in DPYD and toxicity-related phenotypes in patients treated with capecitabine or fluorouracil. Reported DPYD variants were deemed informative for clinical decision making in a majority of cases. CONCLUSION: Reporting PGx variants from germline WGS relevant to patients with cancer alongside primary findings related to their cancer can be clinically informative, informing prescribing to reduce the risk of ADRs. Extending the range of actionable variants to those found in patients of non-European ancestry is important and will extend the potential clinical impact.
ABSTRACT
The drug-metabolizing enzyme thiopurine methyltransferase (TPMT) has become one of the best examples of pharmacogenomics to be translated into routine clinical practice. TPMT metabolizes the thiopurines 6-mercaptopurine, 6-thioguanine, and azathioprine, drugs that are widely used for treatment of acute leukemias, inflammatory bowel diseases, and other disorders of immune regulation. Since the discovery of genetic polymorphisms in the TPMT gene, many sequence variants that cause a decreased enzyme activity have been identified and characterized. Increasingly, to optimize dose, pretreatment determination of TPMT status before commencing thiopurine therapy is now routine in many countries. Novel TPMT sequence variants are currently numbered sequentially using PubMed as a source of information; however, this has caused some problems as exemplified by two instances in which authors' articles appeared on PubMed at the same time, resulting in the same allele numbers given to different polymorphisms. Hence, there is an urgent need to establish an order and consensus to the numbering of known and novel TPMT sequence variants. To address this problem, a TPMT nomenclature committee was formed in 2010, to define the nomenclature and numbering of novel variants for the TPMT gene. A website (http://www.imh.liu.se/tpmtalleles) serves as a platform for this work. Researchers are encouraged to submit novel TPMT alleles to the committee for designation and reservation of unique allele numbers. The committee has decided to renumber two alleles: nucleotide position 106 (G>A) from TPMT*24 to TPMT*30 and position 611 (T>C, rs79901429) from TPMT*28 to TPMT*31. Nomenclature for all other known alleles remains unchanged.
Subjects
Inflammatory Bowel Diseases/enzymology , Methyltransferases/classification , Methyltransferases/genetics , Polymorphism, Genetic , Alleles , Azathioprine/metabolism , Genotype , Humans , Mercaptopurine/metabolism , Methyltransferases/metabolism , Pharmacogenetics , Thioguanine/metabolism
ABSTRACT
PURPOSE: The terminology used for gene-disease curation and variant annotation to describe inheritance, allelic requirement, and both sequence and functional consequences of a variant is currently not standardized. There is considerable discrepancy in the literature and across clinical variant reporting in the derivation and application of terms. Here we standardize the terminology for the characterization of disease-gene relationships to facilitate harmonized global curation, and to support variant classification within the ACMG/AMP framework. METHODS: Terminology for inheritance, allelic requirement, and both structural and functional consequences of a variant used by Gene Curation Coalition (GenCC) members and partner organizations was collated and reviewed. Harmonized terminology with definitions and use examples was created, reviewed, and validated. RESULTS: We present a standardized terminology to describe gene-disease relationships, and to support variant annotation. We demonstrate application of the terminology for classification of variation in the ACMG SF 2.0 genes recommended for reporting of secondary findings. Consensus terms were agreed and formalized in both sequence ontology (SO) and human phenotype ontology (HPO) ontologies. GenCC member groups intend to use or map to these terms in their respective resources. CONCLUSION: The terminology standardization presented here will improve harmonization, facilitate the pooling of curation datasets across international curation efforts and, in turn, improve consistency in variant classification and genetic test interpretation.
Subjects
Aminophenols/therapeutic use , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Cystic Fibrosis/drug therapy , Quinolones/therapeutic use , Aminophenols/pharmacokinetics , Cystic Fibrosis/genetics , Cystic Fibrosis/pathology , Cystic Fibrosis Transmembrane Conductance Regulator/agonists , Humans , Quinolones/pharmacokinetics
ABSTRACT
Background: The virus SARS-CoV-2 can exploit biological vulnerabilities (e.g. host proteins) in susceptible hosts that predispose to the development of severe COVID-19. Methods: To identify host proteins that may contribute to the risk of severe COVID-19, we undertook proteome-wide genetic colocalisation tests, and polygenic (pan) and cis-Mendelian randomisation analyses leveraging publicly available protein and COVID-19 datasets. Results: Our analytic approach identified several known targets (e.g. ABO, OAS1), but also nominated new proteins such as soluble Fas (colocalisation probability >0.9, p = 1 × 10⁻⁴), implicating Fas-mediated apoptosis as a potential target for COVID-19 risk. The polygenic (pan) and cis-Mendelian randomisation analyses showed consistent associations of genetically predicted ABO protein with several COVID-19 phenotypes. The ABO signal is highly pleiotropic, and a look-up of proteins associated with the ABO signal revealed that the strongest association was with soluble CD209. We demonstrated experimentally that CD209 directly interacts with the spike protein of SARS-CoV-2, suggesting a mechanism that could explain the ABO association with COVID-19. Conclusions: Our work provides a prioritised list of host targets potentially exploited by SARS-CoV-2 and is a precursor for further research on CD209 and FAS as therapeutically tractable targets for COVID-19. Funding: MAK, JSc, JH, AB, DO, MC, EMM, MG, ID were funded by Open Targets. J.Z. and T.R.G were funded by the UK Medical Research Council Integrative Epidemiology Unit (MC_UU_00011/4). JSh and GJW were funded by the Wellcome Trust Grant 206194. This research was funded in part by the Wellcome Trust [Grant 206194]. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Individuals who become infected with the virus that causes COVID-19 can experience a wide variety of symptoms. These can range from no symptoms or minor symptoms to severe illness and death. Key demographic factors, such as age, gender and race, are known to affect how susceptible an individual is to infection. However, molecular factors, such as unique gene mutations and gene expression levels can also have a major impact on patient responses by affecting the levels of proteins in the body. Proteins that are too abundant or too scarce may mean the difference between dying from or surviving COVID-19. Identifying the molecular factors in a host that affect how viruses can infect individuals, evade immune defences or trigger severe illness, could provide new ways to treat patients with COVID-19. Such factors are likely to remain constant, even when the virus mutates into new strains. Hence, insights would likely apply across all virus strains, including current strains, such as alpha and delta, and any new strains that may emerge in the future. Using such a 'natural experiment' approach, Karim et al. compared the genetic profiles of over 30,000 COVID-19 patients and a million healthy individuals. Nine proteins were found to have an impact on COVID-19 infection and disease severity. Four proteins were ranked as top priorities for potential treatment targets. One protein, called CD209 (also known as DC-SIGN), is involved in how the virus enters the host cells, and had one of the strongest associations with COVID-19. Two proteins, called IL-6R and FAS, were involved in the immune response and could be responsible for the immune over-activation often seen in severe COVID-19. Finally, one protein, called OAS1, formed part of the body's innate antiviral defence system and appeared to reduce susceptibility to COVID-19. 
Knowing more about the proteins that influence the severity of COVID-19 opens up new ways to predict, protect and treat patients who may have severe or fatal reactions to infection. Indeed, one of the identified proteins (IL-6R) had already been targeted in recent clinical trials with some encouraging results. Considering CD209 as a potential receptor for the virus could provide another avenue for therapeutics, similar to previously successful approaches to block the virus' known interaction with a receptor protein. Ultimately, this research could supply an entirely new set of treatment options to help combat the COVID-19 pandemic.