RESUMO
BACKGROUND: Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when considering novel candidate genes with supporting functional data into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving. CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.
Assuntos
Variação Genética , Doenças Raras , Humanos , Doenças Raras/diagnóstico , Doenças Raras/genética , Sequenciamento Completo do Genoma , Testes Genéticos , Mutação , Proteínas de Ciclo CelularRESUMO
The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetic proxied effects of protein targets, such as PCSK9, on additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics1.
Assuntos
Bancos de Espécimes Biológicos , Proteínas Sanguíneas , Bases de Dados Factuais , Genômica , Saúde , Proteoma , Proteômica , Humanos , Sistema ABO de Grupos Sanguíneos/genética , Proteínas Sanguíneas/análise , Proteínas Sanguíneas/genética , COVID-19/genética , Descoberta de Drogas , Epistasia Genética , Fucosiltransferases/metabolismo , Predisposição Genética para Doença , Plasma/química , Pró-Proteína Convertase 9/metabolismo , Proteoma/análise , Proteoma/genética , Parcerias Público-Privadas , Locos de Características Quantitativas , Reino Unido , Galactosídeo 2-alfa-L-FucosiltransferaseRESUMO
PURPOSE: The translation of genome sequencing into routine health care has been slow, partly because of concerns about affordability. The aspirational cost of sequencing a genome is $1000, but there is little evidence to support this estimate. We estimate the cost of using genome sequencing in routine clinical care in patients with cancer or rare diseases. METHODS: We performed a microcosting study of Illumina-based genome sequencing in a UK National Health Service laboratory processing 399 samples/year. Cost data were collected for all steps in the sequencing pathway, including bioinformatics analysis and reporting of results. Sensitivity analysis identified key cost drivers. RESULTS: Genome sequencing costs £6841 per cancer case (comprising matched tumor and germline samples) and £7050 per rare disease case (three samples). The consumables used during sequencing are the most expensive component of testing (68-72% of the total cost). Equipment costs are higher for rare disease cases, whereas consumable and staff costs are slightly higher for cancer cases. CONCLUSION: The cost of genome sequencing is underestimated if only sequencing costs are considered, and likely surpasses $1000/genome in a single laboratory. This aspirational sequencing cost will likely only be achieved if consumable costs are considerably reduced and sequencing is performed at scale.
Assuntos
Neoplasias/genética , Doenças Raras/genética , Sequenciamento Completo do Genoma/economia , Sequenciamento de Nucleotídeos em Larga Escala/economia , Sequenciamento de Nucleotídeos em Larga Escala/instrumentação , Humanos , Neoplasias/economia , Doenças Raras/economia , Medicina Estatal , Pesquisa Translacional Biomédica , Reino Unido , Sequenciamento Completo do Genoma/instrumentaçãoRESUMO
Next-generation sequencing (NGS) efforts have established catalogs of mutations relevant to cancer development. However, the clinical utility of this information remains largely unexplored. Here, we present the results of the first eight patients recruited into a clinical whole-genome sequencing (WGS) program in the United Kingdom. We performed PCR-free WGS of fresh frozen tumors and germline DNA at 75× and 30×, respectively, using the HiSeq2500 HTv4. Subtracted tumor VCFs and paired germlines were subjected to comprehensive analysis of coding and noncoding regions, integration of germline with somatically acquired variants, and global mutation signatures and pathway analyses. Results were classified into tiers and presented to a multidisciplinary tumor board. WGS results helped to clarify an uncertain histopathological diagnosis in one case, led to informed or supported prognosis in two cases, leading to de-escalation of therapy in one, and indicated potential treatments in all eight. Overall 26 different tier 1 potentially clinically actionable findings were identified using WGS compared with six SNVs/indels using routine targeted NGS. These initial results demonstrate the potential of WGS to inform future diagnosis, prognosis, and treatment choice in cancer and justify the systematic evaluation of the clinical utility of WGS in larger cohorts of patients with cancer.
Assuntos
Biomarcadores Tumorais , Mutação , Neoplasias/diagnóstico , Neoplasias/genética , Sequenciamento Completo do Genoma , Adolescente , Adulto , Idoso , Biópsia , Criança , Análise Mutacional de DNA , Feminino , Humanos , Imuno-Histoquímica , Masculino , Pessoa de Meia-Idade , Reino Unido , Adulto JovemRESUMO
BACKGROUND: Transposable elements (TEs) are mobile genetic sequences that randomly propagate within their host's genome. This mobility has the potential to affect gene transcription and cause disease. However, TEs are technically challenging to identify, which complicates efforts to assess the impact of TE insertions on disease. Here we present a targeted sequencing protocol and computational pipeline to identify polymorphic and novel TE insertions using next-generation sequencing: TE-NGS. The method simultaneously targets the three subfamilies that are responsible for the majority of recent TE activity (L1HS, AluYa5/8, and AluYb8/9) thereby obviating the need for multiple experiments and reducing the amount of input material required. RESULTS: Here we describe the laboratory protocol and detection algorithm, and a benchmark experiment for the reference genome NA12878. We demonstrate a substantial enrichment for on-target fragments, and high sensitivity and precision to both reference and NA12878-specific insertions. We report 17 previously unreported loci for this individual which are supported by orthogonal long-read evidence, and we identify 1470 polymorphic and novel TEs in 12 additional samples that were previously undocumented in databases of insertion polymorphisms. CONCLUSIONS: We anticipate that future applications of TE-NGS alongside exome sequencing of patients with sporadic disease will reduce the number of unresolved cases, and improve estimates of the contribution of TEs to human genetic disease.
Assuntos
Algoritmos , Elementos de DNA Transponíveis , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Polimorfismo de Nucleotídeo Único , Biblioteca Gênica , HumanosRESUMO
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
Assuntos
Genoma Humano , Mutação INDEL , Taxa de Mutação , Seleção Genética , Evolução Molecular , Conversão Gênica , Loci Gênicos , Humanos , Modelos Genéticos , Filogenia , Polimorfismo de Nucleotídeo ÚnicoRESUMO
The densities of transposable elements (TEs) in the human genome display substantial variation both within individual chromosomes and among chromosome types (autosomes and the two sex chromosomes). Finding an explanation for this variability has been challenging, especially in light of genome landscapes unique to the sex chromosomes. Here, using a multiple regression framework, we investigate primate Alu and L1 densities shaped by regional genome features and location on a particular chromosome type. As a result of our analysis, first, we build statistical models explaining up to 79% and 44% of variation in Alu and L1 element density, respectively. Second, we analyze sex chromosome versus autosome TE densities corrected for regional genomic effects. We discover that sex-chromosome bias in Alu and L1 distributions not only persists after accounting for these effects, but even presents differences in patterns, confirming preferential Alu integration in the male germline, yet likely integration of L1s in both male and female germlines or in early embryogenesis. Additionally, our models reveal that local base composition (measured by GC content and density of L1 target sites) and natural selection (inferred via density of most conserved elements) are significant to predicting densities of L1s. Interestingly, measurements of local double-stranded breaks (a 13-mer associated with genome instability) strongly correlate with densities of Alu elements; little evidence was found for the role of recombination-driven deletion in driving TE distributions over evolutionary time. Thus, Alu and L1 densities have been influenced by the combination of distinct local genome landscapes and the unique evolutionary dynamics of sex chromosomes.
Assuntos
Elementos Alu/genética , Genoma/genética , Elementos Nucleotídeos Longos e Dispersos/genética , Primatas/genética , Cromossomos Sexuais/genética , Elementos Nucleotídeos Curtos e Dispersos/genética , Animais , Mapeamento Cromossômico , Cromossomos Humanos X/genética , Cromossomos Humanos Y/genética , Evolução Molecular , Feminino , Humanos , Masculino , Modelos GenéticosRESUMO
Recent studies have revealed that insertions and deletions (indels) are more different in their formation than previously assumed. What remains enigmatic is how the local DNA sequence context contributes to these differences. To investigate the relative impact of various molecular mechanisms to indel formation, we analyzed sequence contexts of indels in the non protein- or RNA-coding, nonrepetitive (NCNR) portion of the human genome. We considered small (Assuntos
Genoma Humano
, Mutação INDEL/genética
, Deleção de Genes
, Humanos
, Mutagênese Insercional
, Sequências Repetitivas de Ácido Nucleico
RESUMO
Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions, thus the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.