ABSTRACT
SUMMARY: StructuralVariantAnnotation is an R/Bioconductor package that provides a framework for decoupling downstream analysis of structural variant breakpoints from upstream variant calling methods. It standardizes the representational format from BEDPE, or any of the three different notations supported by VCF, into a breakpoint GRanges data structure suitable for use by the wider Bioconductor ecosystem. It handles both transitive breakpoints and duplication/insertion notational differences of identical variants, both common scenarios when comparing short/long read-based call sets that confound downstream analysis. StructuralVariantAnnotation provides the caller-agnostic foundation needed for an R/Bioconductor ecosystem of structural variant annotation, classification and interpretation tools able to handle both simple and complex genomic rearrangements. AVAILABILITY AND IMPLEMENTATION: StructuralVariantAnnotation is implemented in R and available for download as the Bioconductor StructuralVariantAnnotation package. Details can be found at https://www.bioconductor.org/packages/release/bioc/html/StructuralVariantAnnotation.html. It has been released under a GPL license.
Subjects
Ecosystem, Software, Genomics/methods, Genome
ABSTRACT
The precise role of CD4 T cell turnover in maintaining HIV persistence during antiretroviral therapy (ART) has not yet been well characterized. In resting CD4 T cell subpopulations from 24 HIV-infected ART-suppressed and 6 HIV-uninfected individuals, we directly measured cellular turnover by heavy water labeling, HIV reservoir size by integrated HIV-DNA (intDNA) and cell-associated HIV-RNA (caRNA), and HIV reservoir clonality by proviral integration site sequencing. Compared to HIV-negatives, ART-suppressed individuals had similar fractional replacement rates in all subpopulations, but lower absolute proliferation rates of all subpopulations other than effector memory (TEM) cells, and lower plasma IL-7 levels (p = 0.0004). Median CD4 T cell half-lives decreased with cell differentiation from naïve to TEM cells (3 years to 3 months, p<0.001). TEM had the fastest replacement rates, were most highly enriched for intDNA and caRNA, and contained the most clonal proviral expansion. Clonal proviruses detected in less mature subpopulations were more expanded in TEM, suggesting that they were maintained through cell differentiation. Earlier ART initiation was associated with lower levels of intDNA, caRNA and fractional replacement rates. In conclusion, circulating integrated HIV proviruses appear to be maintained both by slow turnover of immature CD4 subpopulations, and by clonal expansion as well as cell differentiation into effector cells with faster replacement rates.
Subjects
Anti-Retroviral Agents/therapeutic use, CD4-Positive T-Lymphocytes/pathology, Cell Differentiation, HIV Infections/virology, HIV-1/immunology, Viral Load, Virus Replication, Adult, CD4-Positive T-Lymphocytes/drug effects, CD4-Positive T-Lymphocytes/virology, Case-Control Studies, Viral DNA/analysis, HIV Infections/drug therapy, HIV Infections/immunology, HIV Infections/pathology, HIV-1/drug effects, HIV-1/genetics, Humans, Male, Middle Aged
ABSTRACT
MOTIVATION: Integration of viruses into infected host cell DNA can cause DNA damage and disrupt genes. Recent cost reductions and growth of whole genome sequencing have produced a wealth of data in which viral presence and integration detection is possible. While key research and clinically relevant insights can be uncovered, existing software has not achieved widespread adoption, limited in part by high computational costs, the inability to detect a wide range of viruses, and insufficient precision and sensitivity. RESULTS: Here, we describe VIRUSBreakend, a high-speed tool that identifies viral DNA presence and genomic integration. It utilizes single breakends, breakpoints in which only one side can be unambiguously placed, in a novel virus-centric variant calling and assembly approach to identify viral integrations with high sensitivity and a near-zero false discovery rate. VIRUSBreakend detects viral integrations anywhere in the host genome, including regions such as centromeres and telomeres that existing tools are unable to call. Applying VIRUSBreakend to a large metastatic cancer cohort, we demonstrate that it can reliably detect clinically relevant viral presence and integration including HPV, HBV, MCPyV, EBV and HHV-8. AVAILABILITY AND IMPLEMENTATION: VIRUSBreakend is part of the Genomic Rearrangement IDentification Software Suite (GRIDSS). It is available under a GPLv3 license from https://github.com/PapenfussLab/VIRUSBreakend. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Genomic rearrangements are common in cancer, with demonstrated links to disease progression and treatment response. These rearrangements can be complex, resulting in fusions of multiple chromosomal fragments and generation of derivative chromosomes. Although methods exist for detecting individual fusions, they are generally unable to reconstruct complex chained events. To overcome these limitations, we adopted a new optical mapping approach, allowing megabase-length genome maps to be reconstructed and rearranged genomes to be visualized without loss of integrity. Whole-genome mapping (Bionano Genomics) of a well-studied highly rearranged liposarcoma cell line resulted in 3338 assembled consensus genome maps, including 72 fusion maps. These fusion maps represent 112.3 Mb of highly rearranged genomic regions, illuminating the complex architecture of chained fusions, including content, order, orientation, and size. Spanning the junction of 147 chromosomal translocations, we found a total of 28 Mb of interspersed sequences that could not be aligned to the reference genome. Traversing these interspersed sequences using short-read sequencing breakpoint calls, we were able to identify and place 399 sequencing fragments within the optical mapping gaps, thus illustrating the complementary nature of optical mapping and short-read sequencing. We demonstrate that optical mapping provides a powerful new approach for capturing a higher level of complex genomic architecture, creating a scaffold for renewed interpretation of sequencing data of particular relevance to human cancer.
Subjects
Chromosome Mapping/methods, Genetic Variation, Human Genome/genetics, Neoplasms/genetics, Tumor Cell Line, Chromosome Aberrations, Gene Fusion, Gene Rearrangement, Haplotypes, Humans, Genetic Models, DNA Sequence Analysis/methods
ABSTRACT
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
Subjects
Gene Rearrangement, Genomics/methods, Software, Cell Line, Computer Simulation, Genome, Genomic Structural Variation, Humans, Neoplasms/genetics, Plasmodium falciparum/genetics, Sensitivity and Specificity
ABSTRACT
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN, which uses novel methods based on multigenomes, hardware acceleration, and machine learning-based variant detection to provide novel insights into individual genomes with ~30 min of computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
ABSTRACT
Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.
Subjects
Circulating Tumor DNA/genetics, High-Throughput Nucleotide Sequencing/methods, Lymphoproliferative Disorders/diagnosis, Adult, Aged, B-Lymphocytes/pathology, Burkitt Lymphoma/diagnosis, Burkitt Lymphoma/genetics, Female, Genomics, Humans, Lymphoproliferative Disorders/genetics, Male, Multiple Myeloma/diagnosis, Multiple Myeloma/genetics, Nucleic Acid Hybridization
ABSTRACT
Complex somatic genomic rearrangements and copy number alterations are hallmarks of nearly all cancers. We have developed an algorithm, LINX, to aid interpretation of structural variant and copy number data derived from short-read, whole-genome sequencing. LINX classifies raw structural variant calls into distinct events and predicts their effect on the local structure of the derivative chromosome and the functional impact on affected genes. Visualizations facilitate further investigation of complex rearrangements. LINX allows insights into a diverse range of structural variation events and can reliably detect pathogenic rearrangements, including gene fusions, immunoglobulin enhancer rearrangements, intragenic deletions, and duplications. Uniquely, LINX also predicts chained fusions that we demonstrate account for 13% of clinically relevant oncogenic fusions. LINX also reports a class of inactivation events that we term homozygous disruptions that may be a driver mutation in up to 9% of tumors and may frequently affect PTEN, TP53, and RB1.
ABSTRACT
Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality, gold-standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines. Here, we performed somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different sequencing technologies. Based on the evidence from multiple technologies combined with extensive experimental validation, we compiled a comprehensive set of carefully curated and validated somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects. The truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.
ABSTRACT
GRIDSS2 is the first structural variant caller to explicitly report single breakends, that is, breakpoints in which only one side can be unambiguously determined. By treating single breakends as a fundamental genomic rearrangement signal on par with breakpoints, GRIDSS2 can explain 47% of somatic centromere copy number changes using single breakends to non-centromere sequence. On a cohort of 3782 deeply sequenced metastatic cancers, GRIDSS2 achieves an unprecedented 3.1% false negative rate and 3.3% false discovery rate and identifies a novel 32-100 bp duplication signature. GRIDSS2 simplifies complex rearrangement interpretation through phasing of structural variants, with 16% of somatic calls phasable using paired-end sequencing.
Subjects
Chromosome Breakpoints, DNA Copy Number Variations, Neoplasms/genetics, Software, Contig Mapping, Genetic Databases, Datasets as Topic, Human Genome, Genomics, Humans, Neoplasm Metastasis, Neoplasms/pathology
ABSTRACT
Maximizing the personal, public, research, and clinical value of genomic information will require the reliable exchange of genetic variation data. We report here the Variation Representation Specification (VRS, pronounced "verse"), an extensible framework for the computable representation of variation that complements contemporary human-readable and flat file standards for genomic variation representation. VRS provides semantically precise representations of variation and leverages this design to enable federated identification of biomolecular variation with globally consistent and unique computed identifiers. The VRS framework includes a terminology and information model, machine-readable schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. VRS was developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH).
ABSTRACT
Although melanoma is initiated by acquisition of point mutations and limited focal copy number alterations in melanocytes-of-origin, the nature of genetic changes that characterise lethal metastatic disease is poorly understood. Here, we analyze the evolution of human melanoma progressing from early to late disease in 13 patients by sampling their tumours at multiple sites and times. Whole exome and genome sequencing data from 88 tumour samples reveals only limited gain of point mutations generally, with net mutational loss in some metastases. In contrast, melanoma evolution is dominated by whole genome doubling and large-scale aneuploidy, in which widespread loss of heterozygosity sculpts the burden of point mutations, neoantigens and structural variants even in treatment-naïve and primary cutaneous melanomas in some patients. These results imply that dysregulation of genomic integrity is a key driver of selective clonal advantage during melanoma progression.
Subjects
Aneuploidy, DNA Copy Number Variations/genetics, Human Genome/genetics, Melanoma/genetics, Skin Neoplasms/genetics, Disease Progression, Exome/genetics, Humans, INDEL Mutation/genetics, Melanocytes/pathology, Point Mutation/genetics, Single Nucleotide Polymorphism/genetics, Exome Sequencing, Whole Genome Sequencing, Cutaneous Malignant Melanoma
ABSTRACT
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
ABSTRACT
In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but this tends to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report the comprehensive evaluation of 10 SV callers, selected following a rigorous process and spanning the breadth of detection approaches, using high-quality reference cell lines, as well as simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers.
Subjects
Computational Biology/methods, Human Genome/genetics, Genomic Structural Variation/genetics, High-Throughput Nucleotide Sequencing/methods, Software, Whole Genome Sequencing/methods, Cell Line, Diploidy, Genomics/methods, Humans, Reproducibility of Results
ABSTRACT
Pheochromocytomas (PC) and paragangliomas (PGL) are endocrine tumors for which the genetic and clinicopathological features of metastatic progression remain incompletely understood. As a result, the risk of metastasis from a primary tumor cannot be predicted. Early diagnosis of individuals at high risk of developing metastases is clinically important and the identification of new biomarkers that are predictive of metastatic potential is of high value. Activation of TERT has been associated with a number of malignant tumors, including PC/PGL. However, the mechanism of TERT activation in the majority of PC/PGL remains unclear. As TERT promoter mutations occur rarely in PC/PGL, we hypothesized that other mechanisms, such as structural variations, may underlie TERT activation in these tumors. From 35 PC and four PGL, we identified three primary PCs that developed metastases with elevated TERT expression, each of which lacked TERT promoter mutations and promoter DNA methylation. Using whole genome sequencing, we identified somatic structural alterations proximal to the TERT locus in two of these tumors. In both tumors, the genomic rearrangements led to the positioning of super-enhancers proximal to the TERT promoter, which are likely responsible for the activation of the normally tightly repressed TERT expression in chromaffin cells.