ABSTRACT
The advent of high-throughput sequencing technologies has revolutionized the genomic sciences by cutting the cost and time associated with standard sequencing methods. This advancement has provided the research community with an abundance of data, but it has also presented the challenge of analyzing it. The main challenge in analyzing such large amounts of data is making optimal use of the available tools. To address this gap, we propose "Kuura-An automated workflow for analyzing WES and WGS data", which is optimized for both whole-exome sequencing (WES) and whole-genome sequencing (WGS) data. The workflow is written in the Nextflow pipeline scripting language and uses Docker to manage and deploy the workflow. It consists of four analysis stages: quality control; mapping to the reference genome and quality score recalibration; variant calling and variant recalibration; and variant consensus and annotation. An important feature of this DNA-seq workflow is that it combines multiple variant callers (GATK HaplotypeCaller, DeepVariant, VarScan2, FreeBayes, and Strelka2) to generate a list of high-confidence variants in a consensus call file. The workflow is flexible: it integrates otherwise fragmented tools and can easily be extended by adding or updating tools or by amending the parameter list. The use of a single parameters file enhances the reproducibility of results. The ease of deployment and use further increases computational reproducibility, providing researchers with a standardized tool for the variant calling step across projects. The source code and instructions for installation and use are publicly available in our GitHub repository: https://github.com/dhanaprakashj/kuura_pipeline.
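The idea of a consensus call set can be illustrated with a minimal Python sketch. It is not taken from the Kuura source code: the function name `consensus_variants`, the `min_callers` threshold, and the toy variant calls are all hypothetical, and the abstract does not state how many callers must agree for a variant to enter the consensus file.

```python
from collections import defaultdict

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_variants(
    calls_by_caller: dict[str, set[Variant]],
    min_callers: int = 2,  # hypothetical threshold, not stated in the abstract
) -> dict[Variant, list[str]]:
    """Return variants reported by at least `min_callers` callers.

    Each retained variant is mapped to the sorted names of the callers that support it.
    """
    support: dict[Variant, list[str]] = defaultdict(list)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].append(caller)
    return {
        variant: sorted(callers)
        for variant, callers in support.items()
        if len(callers) >= min_callers
    }


# Toy input (invented for illustration only).
toy_calls = {
    "HaplotypeCaller": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    "DeepVariant": {("chr1", 1000, "A", "G")},
    "FreeBayes": {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "GA")},
}
high_confidence = consensus_variants(toy_calls, min_callers=2)
```

With this toy input, only the chr1 variant is kept, because it is the only one supported by two or more callers. A real implementation would also need to normalize variant representation (for example, left-aligning indels) before comparing calls across tools.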
Subject(s)
Computational Biology, Software, Computational Biology/methods, Workflow, Reproducibility of Results, Whole Genome Sequencing
ABSTRACT
BACKGROUND: Breast cancer is the most common malignancy, with a mean age of onset of approximately 60 years. Only a minority of breast cancer patients present with an early onset at or before 40 years of age. An exceptionally young age at diagnosis hints at a possible genetic etiology. Currently, known pathogenic genetic variants only partially explain the disease burden of younger patients. Thus, new knowledge is warranted regarding additional risk variants. In this study, we analyzed DNA repair genes to identify additional variants to shed light on the etiology of early-onset breast cancer. METHODS: Germline whole-exome sequencing was conducted in a cohort of 63 patients diagnosed with breast cancer at or before 40 years of age (median 33, mean 33.02, range 23-40 years) with no known pathogenic variants in BRCA genes. After filtering, all detected rare variants were sorted by pathogenicity prediction scores (CADD score and REVEL) to identify the most damaging genetic changes. The remaining variants were then validated by comparison to a validation cohort of 121 breast cancer patients with no preselected age at cancer diagnosis (mean 51.4 years, range 28-80 years). Analysis of novel exonic variants was based on protein structure modeling. RESULTS: Five novel, deleterious variants in the genes WRN, RNF8, TOP3A, ERCC2, and TREX2 were found in addition to a splice acceptor variant in RNF4 and two frameshift variants in EXO1 and POLE genes, respectively. There were also multiple previously reported putative risk variants in other DNA repair genes. CONCLUSIONS: Taken together, whole-exome sequencing yielded 72 deleterious variants, including 8 novel variants that may play a pivotal role in the development of early-onset breast cancer. Although more studies are warranted, we demonstrate that young breast cancer patients tend to carry multiple deleterious variants in one or more DNA repair genes.
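The filter-then-rank step described in METHODS can be sketched in Python under stated assumptions. The allele-frequency cutoff `max_af`, the tie-breaking order (CADD first, then REVEL), the class and function names, and the placeholder records are all hypothetical. The abstract states only that rare variants were sorted by CADD and REVEL scores, not how the two scores were combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredVariant:
    label: str               # placeholder identifier, not a real gene or variant
    allele_frequency: float  # population allele frequency
    cadd: float              # CADD PHRED-scaled score (higher = more likely deleterious)
    revel: float             # REVEL score between 0 and 1 (higher = more likely pathogenic)


def rank_rare_variants(variants, max_af=0.01):
    """Keep variants at or below `max_af`, then sort most-damaging first.

    The 0.01 cutoff and the (CADD, REVEL) sort key are assumptions for illustration.
    """
    rare = [v for v in variants if v.allele_frequency <= max_af]
    return sorted(rare, key=lambda v: (v.cadd, v.revel), reverse=True)


# Invented records for illustration only.
toy_variants = [
    ScoredVariant("variant_a", 0.0002, 28.5, 0.81),
    ScoredVariant("variant_b", 0.1500, 33.0, 0.92),  # common, so filtered out
    ScoredVariant("variant_c", 0.0010, 31.0, 0.40),
    ScoredVariant("variant_d", 0.0005, 28.5, 0.90),
]
ranked = rank_rare_variants(toy_variants)
ranked_labels = [v.label for v in ranked]
```

Here `variant_b` is dropped despite its high scores because it is not rare, and the two variants with equal CADD scores are ordered by REVEL.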
ABSTRACT
BACKGROUND: Germline ATM mutations are suggested to contribute to predisposition to prostate cancer (PrCa). Previous studies have had inadequate power to estimate variant effect sizes. OBJECTIVE: To precisely estimate the contribution of germline ATM mutations to PrCa risk. DESIGN, SETTING, AND PARTICIPANTS: We analysed next-generation sequencing data from 13 PRACTICAL study groups comprising 5560 cases and 3353 controls of European ancestry. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: Variant Call Format files were harmonised, annotated for rare ATM variants, and classified as tier 1 (likely pathogenic) or tier 2 (potentially deleterious). Associations with overall PrCa risk and clinical subtypes were estimated. RESULTS AND LIMITATIONS: PrCa risk was higher in carriers of a tier 1 germline ATM variant, with an overall odds ratio (OR) of 4.4 (95% confidence interval [CI]: 2.0-9.5). There was also evidence that PrCa cases with younger age at diagnosis (<65 yr) had elevated tier 1 variant frequencies (p_difference = 0.04). Tier 2 variants were also associated with PrCa risk, with an OR of 1.4 (95% CI: 1.1-1.7). CONCLUSIONS: Carriers of pathogenic ATM variants have an elevated risk of developing PrCa and are at an increased risk for earlier-onset disease presentation. These results provide information for counselling of men and their families. PATIENT SUMMARY: In this study, we estimated that men who inherit a likely pathogenic mutation in the ATM gene had an approximately fourfold risk of developing prostate cancer. In addition, they are likely to develop the disease earlier.
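For readers unfamiliar with how an odds ratio and its confidence interval are obtained from carrier counts, the following Python sketch shows one common approach: the unadjusted odds ratio from a 2x2 table with a Wald interval on the log scale. This is an assumption for illustration and not necessarily the method used in the study, which may have used adjusted or stratified models. The abstract does not report carrier counts, so the counts below are invented and are not meant to reproduce the reported estimates.

```python
import math


def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Wald interval on the log odds ratio; z=1.96 gives an
    approximate 95% confidence interval.
    """
    counts = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive for the Wald interval")
    odds_ratio = (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts for illustration only (not study data).
toy_or, toy_lower, toy_upper = odds_ratio_wald_ci(20, 80, 10, 90)
```

With rare variants the cell counts for carriers are small, which is why intervals for such estimates tend to be wide and why pooling many study groups improves precision.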
Subject(s)
Genetic Predisposition to Disease, Prostatic Neoplasms, Ataxia Telangiectasia Mutated Proteins/genetics, Germ-Line Mutation, Humans, Male, Prostatic Neoplasms/epidemiology, Prostatic Neoplasms/genetics
ABSTRACT
MOTIVATION: Annotating large amounts of sequencing data is a demanding task. Most of the robust annotation tools currently available, such as ANNOVAR, are command-line tools that require a certain degree of programming skill. User-friendly variant annotation tools with a graphical interface are under-represented. RESULTS: We have developed an interactive application that combines the ease of use of R Shiny with the versatile annotation features of ANNOVAR. The application is easy to use and provides comprehensive annotations for user-supplied VCF files using multiple databases. The output table, presented within the graphical interface, lists the variants and their corresponding annotations. In addition, the annotation results can be downloaded as a text file.
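Conceptually, annotating variants against several databases amounts to looking up each variant in each source and collecting the results in one table row. The Python sketch below illustrates only that idea. It does not reproduce ANNOVAR or the R Shiny application, and the function name, database names, annotation values, and the "." placeholder for missing entries are all hypothetical.

```python
def annotate_variants(variants, databases, missing="."):
    """Build one output row per variant with one column per annotation database.

    `variants` is a list of (chrom, pos, ref, alt) tuples.
    `databases` maps a database name to a dict keyed by the same tuples.
    Variants absent from a database get the `missing` placeholder.
    """
    rows = []
    for chrom, pos, ref, alt in variants:
        row = {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt}
        for db_name, records in databases.items():
            row[db_name] = records.get((chrom, pos, ref, alt), missing)
        rows.append(row)
    return rows


# Invented inputs for illustration only.
toy_variants = [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")]
toy_databases = {
    "gene_db": {("chr1", 1000, "A", "G"): "GENE_X"},
    "frequency_db": {
        ("chr1", 1000, "A", "G"): "0.0002",
        ("chr2", 500, "C", "T"): "0.0310",
    },
}
annotation_table = annotate_variants(toy_variants, toy_databases)
```

Each row in `annotation_table` can then be written out as one tab-delimited line, which corresponds to the downloadable text output mentioned in the abstract.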