ABSTRACT
Genomic data from millions of individuals have been generated worldwide to drive discovery and clinical impact in precision medicine. Lowering the barriers to using these data collectively is needed to equitably realize the benefits of the diversity and scale of population data. We examine the current landscape of global genomic data sharing, including the evolution of data sharing models from data aggregation through to data visiting, and for certain use cases, cross-cohort analysis using federated approaches across multiple environments. We highlight emerging examples of best practice relating to participant, patient and community engagement; evolution of technical standards, tools and infrastructure; and impact of research and health-care policy. We outline 12 actions we can all take together to scale up efforts to enable safe global data sharing and move beyond projects demonstrating feasibility to routinely cross-analysing research and clinical data sets, optimizing benefit.
ABSTRACT
PURPOSE: As part of the 100,000 Genomes Project, we set out to assess the potential viability and clinical impact of reporting genetic variants associated with drug-induced toxicity for patients with cancer recruited for whole-genome sequencing (WGS) as part of a genomic medicine service. METHODS: Germline WGS from 76,805 participants was analyzed for pharmacogenetic (PGx) variants in four genes (DPYD, NUDT15, TPMT, UGT1A1) associated with toxicity induced by five drugs used in cancer treatment (capecitabine, fluorouracil, mercaptopurine, thioguanine, irinotecan). Linking genomic data with prescribing and hospital incidence records, a phenome-wide association study (PheWAS) was performed to identify whether phenotypes indicative of adverse drug reactions (ADRs) were enriched in drug-exposed individuals with the relevant PGx variants. In a subset of 7,081 patients with cancer, DPYD variants were reported back to clinicians and outcomes were collected. RESULTS: We identified clinically relevant PGx variants across the four genes in 62.7% of participants in our cohort. Extending this to annual prescription numbers in England for the drugs affected by these PGx variants, approximately 14,540 patients per year could potentially benefit from a reduced dose or alternative drug to reduce the risk of ADRs. Validating PGx associations in a real-world data set, we found a significant association between PGx variants in DPYD and toxicity-related phenotypes in patients treated with capecitabine or fluorouracil. Reported DPYD variants were deemed informative for clinical decision making in a majority of cases. CONCLUSION: Reporting PGx variants from germline WGS relevant to patients with cancer alongside primary findings related to their cancer can be clinically informative, informing prescribing to reduce the risk of ADRs. Extending the range of actionable variants to those found in patients of non-European ancestry is important and will extend the potential clinical impact.
ABSTRACT
The Cancer Programme of the 100,000 Genomes Project was an initiative to provide whole-genome sequencing (WGS) for patients with cancer, evaluating opportunities for precision cancer care within the UK National Health Service (NHS). Genomics England, alongside NHS England, analyzed WGS data from 13,880 solid tumors spanning 33 cancer types, integrating genomic data with real-world treatment and outcome data, within a secure Research Environment. Incidence of somatic mutations in genes recommended for standard-of-care testing varied across cancer types. For instance, in glioblastoma multiforme, small variants were present in 94% of cases and copy number aberrations in at least one gene in 58% of cases, while sarcoma demonstrated the highest occurrence of actionable structural variants (13%). Homologous recombination deficiency was identified in 40% of high-grade serous ovarian cancer cases with 30% linked to pathogenic germline variants, highlighting the value of combined somatic and germline analysis. The linkage of WGS and longitudinal life course clinical data allowed the assessment of treatment outcomes for patients stratified according to pangenomic markers. Our findings demonstrate the utility of linking genomic and real-world clinical data to enable survival analysis to identify cancer genes that affect prognosis and advance our understanding of how cancer genomics impacts patient outcomes.
Subjects
Glioblastoma , Precision Medicine , Humans , Genomics , Oncogenes , Germ-Line Mutation/genetics
ABSTRACT
Newborn screening for treatable disorders is one of the great public health success stories of the twentieth century worldwide. This commentary examines the potential use of a new technology, next generation sequencing, in newborn screening through the lens of the Wilson and Jungner criteria. Each of the ten criteria is examined to show how it might be applied by programmes using genomic sequencing as a screening tool. While there are obvious advantages to a method that can examine all disease-causing genes in a single assay at an ever-diminishing cost, implementation of genomic sequencing at scale presents numerous challenges, some of which are intrinsic to screening for rare disease and some of which are specifically linked to genomics-led screening. In addition to questions specific to routine screening considerations, the ethical, communication, data management, legal, and social implications of genomic screening programmes require consideration.
ABSTRACT
Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care [1] or hospitalization [2-4] after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes in critical disease, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1). Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication; or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.
Subjects
COVID-19 , Critical Illness , Genome, Human , Host-Pathogen Interactions , Whole Genome Sequencing , ATP-Binding Cassette Transporters , COVID-19/genetics , COVID-19/mortality , COVID-19/pathology , COVID-19/virology , Cell Adhesion Molecules , Critical Care , Critical Illness/mortality , E-Selectin , Factor VIII , Fucosyltransferases , Genome, Human/genetics , Genome-Wide Association Study , Host-Pathogen Interactions/genetics , Humans , Interleukin-10 Receptor beta Subunit , Lectins, C-Type , Mucin-1 , Nerve Tissue Proteins , Phospholipid Transfer Proteins , Receptors, Cell Surface , Repressor Proteins , SARS-CoV-2/pathogenicity , Galactoside 2-alpha-L-fucosyltransferase
ABSTRACT
BACKGROUND: Repeat expansion disorders affect about 1 in 3000 individuals and are clinically heterogeneous diseases caused by expansions of short tandem DNA repeats. Genetic testing is often locus-specific, resulting in underdiagnosis of people who have atypical clinical presentations, especially in paediatric patients without a previous positive family history. Whole genome sequencing is increasingly used as a first-line test for other rare genetic disorders, and we aimed to assess its performance in the diagnosis of patients with neurological repeat expansion disorders. METHODS: We retrospectively assessed the diagnostic accuracy of whole genome sequencing to detect the most common repeat expansion loci associated with neurological outcomes (AR, ATN1, ATXN1, ATXN2, ATXN3, ATXN7, C9orf72, CACNA1A, DMPK, FMR1, FXN, HTT, and TBP) using samples obtained within the National Health Service in England from patients who were suspected of having neurological disorders; previous PCR test results were used as the reference standard. The clinical accuracy of whole genome sequencing to detect repeat expansions was prospectively examined in previously genetically tested and undiagnosed patients recruited in 2013-17 to the 100 000 Genomes Project in the UK, who were suspected of having a genetic neurological disorder (familial or early-onset forms of ataxia, neuropathy, spastic paraplegia, dementia, motor neuron disease, parkinsonian movement disorders, intellectual disability, or neuromuscular disorders). If a repeat expansion call was made using whole genome sequencing, PCR was used to confirm the result. FINDINGS: The diagnostic accuracy of whole genome sequencing to detect repeat expansions was evaluated against 793 PCR tests previously performed within the NHS from 404 patients. Whole genome sequencing correctly classified 215 of 221 expanded alleles and 1316 of 1321 non-expanded alleles, showing 97·3% sensitivity (95% CI 94·2-99·0) and 99·6% specificity (99·1-99·9) across the 13 disease-associated loci when compared with PCR test results. In samples from 11 631 patients in the 100 000 Genomes Project, whole genome sequencing identified 81 repeat expansions, which were also tested by PCR: 68 were confirmed as repeat expansions in the full pathogenic range, 11 were non-pathogenic intermediate expansions or premutations, and two were non-expanded repeats (16% false discovery rate). INTERPRETATION: In our study, whole genome sequencing for the detection of repeat expansions showed high sensitivity and specificity, and it led to identification of neurological repeat expansion disorders in previously undiagnosed patients. These findings support implementation of whole genome sequencing in clinical laboratories for diagnosis of patients who have a neurological presentation consistent with a repeat expansion disorder. FUNDING: Medical Research Council, Department of Health and Social Care, National Health Service England, National Institute for Health Research, and Illumina.
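The headline accuracy figures in the abstract above follow directly from the reported allele counts. A minimal sketch in Python, using only the counts stated in the abstract; the function and variable names are illustrative, and the confidence intervals are not recomputed here:

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Retrospective comparison against PCR (counts taken from the abstract).
sensitivity = proportion(215, 221)    # expanded alleles correctly classified
specificity = proportion(1316, 1321)  # non-expanded alleles correctly classified

# Prospective cohort: 81 WGS calls, 68 confirmed in the full pathogenic range.
calls, confirmed = 81, 68
false_discovery_rate = proportion(calls - confirmed, calls)

print(f"sensitivity = {sensitivity:.1f}%")
print(f"specificity = {specificity:.1f}%")
print(f"false discovery rate = {false_discovery_rate:.0f}%")
```

The false discovery rate counts both the 11 intermediate expansions or premutations and the two non-expanded repeats as unconfirmed calls, which is how 13 of 81 yields the reported 16%.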
Subjects
DNA Repeat Expansion , State Medicine , Child , Fragile X Mental Retardation Protein/genetics , Humans , Prospective Studies , Retrospective Studies , United Kingdom , Whole Genome Sequencing/methods
ABSTRACT
Clinical validity assessments of gene-disease associations underpin analysis and reporting in diagnostic genomics, and yet wide variability exists in practice, particularly in use of these assessments for virtual gene panel design and maintenance. Harmonization efforts are hampered by the lack of agreed terminology, agreed gene curation standards, and platforms that can be used to identify and resolve discrepancies at scale. We undertook a systematic comparison of the content of 80 virtual gene panels used in two healthcare systems by multiple diagnostic providers in the United Kingdom and Australia. The process was enabled by a shared curation platform, PanelApp, and resulted in the identification and review of 2,144 discordant gene ratings, demonstrating the utility of sharing structured gene-disease validity assessments and collaborative discordance resolution in establishing national and international consensus.
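The panel comparison described above amounts to finding genes that appear on two panels with different ratings. A minimal sketch in Python, assuming each panel can be represented as a mapping from gene symbol to a rating label; the dictionary layout, the `discordant_ratings` helper, the rating labels and the placeholder gene names are all hypothetical, and this is not PanelApp's actual data model or API:

```python
from typing import Dict, List, Tuple


def discordant_ratings(
    panel_a: Dict[str, str], panel_b: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List genes present on both panels whose ratings differ."""
    shared = sorted(panel_a.keys() & panel_b.keys())
    return [(g, panel_a[g], panel_b[g]) for g in shared if panel_a[g] != panel_b[g]]


# Toy panels with placeholder gene names; not real curation data.
uk_panel = {"GENE1": "green", "GENE2": "amber", "GENE3": "red"}
au_panel = {"GENE1": "green", "GENE2": "green", "GENE4": "amber"}

for gene, uk_rating, au_rating in discordant_ratings(uk_panel, au_panel):
    print(f"{gene}: UK={uk_rating}, AU={au_rating}")
```

Genes found on only one panel are ignored in this sketch; a fuller comparison would likely report them separately, since absence from a panel is itself a curation decision.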
Subjects
Consensus , Data Curation/standards , Genetic Diseases, Inborn/genetics , Genomics/standards , Molecular Sequence Annotation/standards , Australia , Biomarkers/metabolism , Data Curation/methods , Delivery of Health Care , Gene Expression , Gene Ontology , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , Genomics/methods , Humans , Mobile Applications/supply & distribution , Terminology as Topic , United Kingdom
ABSTRACT
Each year, blood transfusions save millions of lives. However, under current blood-matching practices, sensitization to non-self-antigens is an unavoidable adverse side effect of transfusion. We describe a universal donor typing platform that could be adopted by blood services worldwide to facilitate a universal extended blood-matching policy and reduce sensitization rates. This DNA-based test is capable of simultaneously typing most clinically relevant red blood cell (RBC), human platelet (HPA), and human leukocyte (HLA) antigens. Validation was performed, using samples from 7927 European, 27 South Asian, 21 East Asian, and 9 African blood donors enrolled in 2 national biobanks. We illustrated the usefulness of the platform by analyzing antibody data from patients sensitized with multiple RBC alloantibodies. Genotyping results demonstrated concordance of 99.91%, 99.97%, and 99.03% with RBC, HPA, and HLA clinically validated typing results in 89 371, 3016, and 9289 comparisons, respectively. Genotyping increased the total number of antigen typing results available from 110 980 to >1 200 000. Dense donor typing allowed identification of 2 to 6 times more compatible donors to serve 3146 patients with multiple RBC alloantibodies, providing at least 1 match for 176 individuals for whom previously no blood could be found among the same donors. This genotyping technology is already being used to type thousands of donors taking part in national genotyping studies. Extraction of dense antigen-typing data from these cohorts provides blood supply organizations with the opportunity to implement a policy of genomics-based precision matching of blood.
Subjects
Blood Donors , Blood Transfusion , Genotype , Humans , Isoantibodies , Prospective Studies
ABSTRACT
Most patients with rare diseases do not receive a molecular diagnosis and the aetiological variants and causative genes for more than half such disorders remain to be discovered [1]. Here we used whole-genome sequencing (WGS) in a national health system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. We generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,138 of the 7,065 extensively phenotyped participants. We identified 95 Mendelian associations between genes and rare diseases, of which 11 have been discovered since 2015 and at least 79 are confirmed to be aetiological. By generating WGS data of UK Biobank participants [2], we found that rare alleles can explain the presence of some individuals in the tails of a quantitative trait for red blood cells. Finally, we identified four novel non-coding variants that cause disease through the disruption of transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates a synergy by using WGS for diagnosis and aetiological discovery in routine healthcare.
Subjects
Internationality , National Health Programs , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing , Actin-Related Protein 2-3 Complex/genetics , Adaptor Proteins, Signal Transducing/genetics , Alleles , Databases, Factual , Erythrocytes/metabolism , GATA1 Transcription Factor/genetics , Humans , Phenotype , Quantitative Trait Loci , Receptors, Thrombopoietin/genetics , State Medicine , United Kingdom
ABSTRACT
Approximately 2.4% of the human mitochondrial DNA (mtDNA) genome exhibits common homoplasmic genetic variation. We analyzed 12,975 whole-genome sequences to show that 45.1% of individuals from 1526 mother-offspring pairs harbor a mixed population of mtDNA (heteroplasmy), but the propensity for maternal transmission differs across the mitochondrial genome. Over one generation, we observed selection both for and against variants in specific genomic regions; known variants were more likely to be transmitted than previously unknown variants. However, new heteroplasmies were more likely to match the nuclear genetic ancestry as opposed to the ancestry of the mitochondrial genome on which the mutations occurred, validating our findings in 40,325 individuals. Thus, human mtDNA at the population level is shaped by selective forces within the female germ line under nuclear genetic control, which ensures consistency between the two independent genetic lineages.
Subjects
DNA, Mitochondrial/genetics , Genome, Mitochondrial , Maternal Inheritance , Ovum/growth & development , Selection, Genetic , Female , Genetic Variation , Humans
ABSTRACT
BACKGROUND: Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data. RESULTS: In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTful web service API offered by CellBase. This package offers fast and user-friendly access to biological information without the need to install any local database. In addition, a series of command-line tools are provided to perform common bioinformatic tasks, such as variant annotation. CellBase data are always available through a high-availability cluster, and queries have been tuned to ensure real-time performance. CONCLUSION: PyCellBase is an open-source Python package that provides efficient access to heterogeneous biological information. It allows users to perform tasks that require a comprehensive set of knowledge resources, such as variant annotation. Queries can be easily fine-tuned to retrieve the desired information for particular biological features. PyCellBase offers the convenience of an object-oriented scripting language and provides the ability to integrate the obtained results into other Python applications and pipelines.
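To make the idea of programmatic access to a RESTful annotation service concrete, the sketch below assembles a query URL of the kind a client package might send on the user's behalf. It is an illustration under stated assumptions, not PyCellBase's implementation: the host is a placeholder, and the path layout (version, species, category, subcategory, identifiers, resource), the `include` parameter and the `build_query_url` helper are assumptions introduced here. No request is actually sent.

```python
from urllib.parse import quote, urlencode


def build_query_url(host, version, species, category, subcategory, ids, resource, **params):
    """Assemble a REST query URL from its path components and optional filters.

    The path layout is an assumption made for illustration only.
    """
    # Several identifiers are sent in one call, comma-separated.
    id_part = ",".join(quote(identifier, safe="") for identifier in ids)
    path = "/".join(
        [host.rstrip("/"), version, species, category, subcategory, id_part, resource]
    )
    return f"{path}?{urlencode(params)}" if params else path


# Placeholder host; replace with the address of a real service instance.
url = build_query_url(
    "https://cellbase.example.org/webservices/rest",
    "v4",
    "hsapiens",
    "feature",
    "gene",
    ["BRCA1", "BRCA2"],
    "info",
    include="name,biotype",  # hypothetical filter restricting the returned fields
)
print(url)
```

Wrapping URL construction in one function is what lets a client expose short, object-style calls while keeping the web-service details out of user scripts.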
Subjects
Databases, Factual , Software , Computational Biology , User-Computer Interface
ABSTRACT
Retinal dystrophies are a heterogeneous group of disorders of visual function leading to partial or complete blindness. We report the genetic basis of an unusual retinal dystrophy in five families with affected females and no affected males. Heterozygous missense variants were identified in the X-linked phosphoribosyl pyrophosphate synthetase 1 (PRPS1) gene: c.47C > T, p.(Ser16Phe); c.586C > T, p.(Arg196Trp); c.641G > C, p.(Arg214Pro); and c.640C > T, p.(Arg214Trp). Missense variants in PRPS1 are usually associated with disease in male patients, including Arts syndrome, Charcot-Marie-Tooth, and nonsyndromic sensorineural deafness. In our study families, affected females manifested a retinal dystrophy with interocular asymmetry. Three unrelated females from these families had hearing loss leading to a diagnosis of Usher syndrome. Other neurological manifestations were also observed in three individuals. Our data highlight the unexpected X-linked inheritance of retinal degeneration in females caused by variants in PRPS1 and suggest that tissue-specific skewed X-inactivation or variable levels of pyrophosphate synthetase-1 deficiency are the underlying mechanism(s). We speculate that the absence of affected males in the study families suggests that some variants may be male embryonic lethal when inherited in the hemizygous state. The unbiased nature of next-generation sequencing enables all possible modes of inheritance to be considered for association of gene variants with novel phenotypic presentation.
Subjects
Genes, X-Linked , Mutation, Missense , Retinal Degeneration/diagnosis , Retinal Degeneration/genetics , Ribose-Phosphate Pyrophosphokinase/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Alleles , Amino Acid Sequence , Amino Acid Substitution , Female , Genetic Association Studies , Genotype , Humans , Models, Molecular , Pedigree , Phenotype , Protein Conformation , Ribose-Phosphate Pyrophosphokinase/chemistry , Young Adult
ABSTRACT
A recurrent de novo missense variant within the C-terminal Sin3-like domain of ZSWIM6 was previously reported to cause acromelic frontonasal dysostosis (AFND), an autosomal-dominant severe frontonasal and limb malformation syndrome, associated with neurocognitive and motor delay, via a proposed gain-of-function effect. We present detailed phenotypic information on seven unrelated individuals with a recurrent de novo nonsense variant (c.2737C>T [p.Arg913Ter]) in the penultimate exon of ZSWIM6 who have severe-profound intellectual disability and additional central and peripheral nervous system symptoms but an absence of frontonasal or limb malformations. We show that the c.2737C>T variant does not trigger nonsense-mediated decay of the ZSWIM6 mRNA in affected individual-derived cells. This finding supports the existence of a truncated ZSWIM6 protein lacking the Sin3-like domain, which could have a dominant-negative effect. This study builds support for a key role for ZSWIM6 in neuronal development and function, in addition to its putative roles in limb and craniofacial development, and provides a striking example of different variants in the same gene leading to distinct phenotypes.
Subjects
DNA-Binding Proteins/genetics , Intellectual Disability/genetics , Neurocognitive Disorders/genetics , Central Nervous System/abnormalities , Central Nervous System/embryology , Codon, Nonsense/genetics , High-Throughput Nucleotide Sequencing , Humans , Limb Deformities, Congenital/genetics , Mandibulofacial Dysostosis/genetics , Peripheral Nervous System/abnormalities , Peripheral Nervous System/enzymology
ABSTRACT
Linking non-coding genetic variants associated with the risk of diseases or disease-relevant traits to target genes is a crucial step to realize GWAS potential in the introduction of precision medicine. Here we set out to determine the mechanisms underpinning variant association with platelet quantitative traits using cell type-matched epigenomic data and promoter long-range interactions. We identify potential regulatory functions for 423 of 565 (75%) non-coding variants associated with platelet traits and we demonstrate, through ex vivo and proof of principle genome editing validation, that variants in super enhancers play an important role in controlling archetypical platelet functions.
Subjects
Blood Platelets/physiology , Enhancer Elements, Genetic , Erythroblasts/chemistry , Genetic Variation , Megakaryocytes/chemistry , Chromatin , Humans , Promoter Regions, Genetic
ABSTRACT
High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium are generating a wealth of human genomic variation knowledge, which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic data for key reference projects in a clean, fast and integrated fashion. HGVA provides an efficient and intuitive web-interface for easy data mining, a comprehensive RESTful API and client libraries in Python, Java and JavaScript for fast programmatic access to its knowledge base. HGVA calculates population frequencies for these projects and enriches their data with variant annotation provided by CellBase, a rich and fast annotation solution. HGVA serves as a proof-of-concept of the genome analysis developments being carried out by the University of Cambridge together with the UK's 100 000 Genomes Project and the National Institute for Health Research BioResource Rare-Diseases, in particular, deploying the open-source for Computational Biology (OpenCB) software platform for storing and analyzing massive genomic datasets.
Subjects
Genetic Variation , Genome, Human , Software , Humans , Internet , User-Computer Interface
ABSTRACT
The Src family kinase (SFK) member SRC is a major target in drug development because it is activated in many human cancers, yet deleterious SRC germline mutations have not been reported. We used genome sequencing and Human Phenotype Ontology patient coding to identify a gain-of-function mutation in SRC causing thrombocytopenia, myelofibrosis, bleeding, and bone pathologies in nine cases. Modeling of the E527K substitution predicts loss of SRC's self-inhibitory capacity, which we confirmed with in vitro studies showing increased SRC kinase activity and enhanced Tyr(419) phosphorylation in COS-7 cells overexpressing E527K SRC. The active form of SRC predominates in patients' platelets, resulting in enhanced overall tyrosine phosphorylation. Patients with myelofibrosis have hypercellular bone marrow with trilineage dysplasia, and their stem cells grown in vitro form more myeloid and megakaryocyte (MK) colonies than control cells. These MKs generate platelets that are dysmorphic, low in number, highly variable in size, and have a paucity of α-granules. Overactive SRC in patient-derived MKs causes a reduction in proplatelet formation, which can be rescued by SRC kinase inhibition. Stem cells transduced with lentiviral E527K SRC form MKs with a similar defect and enhanced tyrosine phosphorylation levels. Patient-derived and E527K-transduced MKs show Y419 SRC-positive stained podosomes that induce altered actin organization. Expression of mutated src in zebrafish recapitulates patients' blood and bone phenotypes. Similar studies of platelets and MKs may reveal the mechanism underlying the severe bleeding frequently observed in cancer patients treated with next-generation SFK inhibitors.
Subjects
Bone and Bones/pathology , Hemorrhage/genetics , Mutation/genetics , Primary Myelofibrosis/genetics , Thrombocytopenia/genetics , src-Family Kinases/genetics , Animals , Blood Platelets/pathology , COS Cells , Chlorocebus aethiops , Female , Hematopoiesis , Hemorrhage/complications , Humans , Male , Pedigree , Phenotype , Primary Myelofibrosis/complications , Thrombocytopenia/complications , Transfection , Zebrafish
ABSTRACT
Macrothrombocytopenia (MTP) is a heterogeneous group of disorders characterized by enlarged and reduced numbers of circulating platelets, sometimes resulting in abnormal bleeding. In most MTP, this phenotype arises because of altered regulation of platelet formation from megakaryocytes (MKs). We report the identification of DIAPH1, which encodes the Rho-effector diaphanous-related formin 1 (DIAPH1), as a candidate gene for MTP using exome sequencing, ontological phenotyping, and similarity regression. We describe 2 unrelated pedigrees with MTP and sensorineural hearing loss that segregate with a DIAPH1 R1213* variant predicting partial truncation of the DIAPH1 diaphanous autoregulatory domain. The R1213* variant was linked to reduced proplatelet formation from cultured MKs, cell clustering, and abnormal cortical filamentous actin. Similarly, in platelets, there was increased filamentous actin and stable microtubules, indicating constitutive activation of DIAPH1. Overexpression of DIAPH1 R1213* in cells reproduced the cytoskeletal alterations found in platelets. Our description of a novel disorder of platelet formation and hearing loss extends the repertoire of DIAPH1-related disease and provides new insight into the autoregulation of DIAPH1 activity.