ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) technologies have improved the study of hereditary diseases. Because evaluating bioinformatics pipelines is not straightforward, NGS demands effective strategies for analyzing data that are of paramount relevance to clinical decision making. Following the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new, simple, and user-friendly set-theory-based method to assess variant callers using a gold standard variant set and high-confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality. RESULTS: We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics calculated with our set-theory approach indicated high-performing pipelines and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, when compared with the reference material, and (3) a ROC curve analysis with AUC > 0.94 in all cases. Moreover, significant differences were found among the three pipelines. In general, the results indicate that the three pipelines were able to recognize the expected variants in the gold standard data set. CONCLUSIONS: With our set-theory approach to calculating metrics, all three selected pipelines identified the expected ICC-related variants, but the results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines using gold standard materials to achieve the most reliable results for clinical application.
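As an illustration of the kind of set-theory comparison described above, the following is a minimal sketch, not the authors' implementation. It assumes variants are represented as (chromosome, position, reference allele, alternate allele) tuples and that high-confidence regions are 1-based inclusive intervals per chromosome; the names `restrict` and `precision_recall` and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representations, assumed for this sketch only.
Variant = Tuple[str, int, str, str]          # (chrom, pos, ref, alt)
Regions = Dict[str, List[Tuple[int, int]]]   # chrom -> [(start, end)], 1-based inclusive


def restrict(variants: Set[Variant], regions: Regions) -> Set[Variant]:
    """Keep only variants whose position falls inside a high-confidence region."""
    return {
        v for v in variants
        if any(start <= v[1] <= end for start, end in regions.get(v[0], []))
    }


def precision_recall(truth: Set[Variant], calls: Set[Variant],
                     regions: Regions) -> Tuple[float, float]:
    """Compute precision and recall from set intersection and differences."""
    truth = restrict(truth, regions)
    calls = restrict(calls, regions)
    tp = len(truth & calls)   # called and in the gold standard
    fp = len(calls - truth)   # called but absent from the gold standard
    fn = len(truth - calls)   # in the gold standard but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented purely to exercise the functions.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
calls = truth | {("chr1", 150, "T", "C"), ("chr3", 10, "A", "T")}
regions = {"chr1": [(1, 1000)], "chr2": [(1, 1000)]}

print(precision_recall(truth, calls, regions))
```

In this toy case the call on chr3 lies outside the high-confidence regions and is ignored, so one false positive remains alongside three true positives. Exact tuple matching is a simplification: it does not normalize differing representations of the same variant, which real benchmarking tools have to handle.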
Subject(s)
Computational Biology , High-Throughput Nucleotide Sequencing , Benchmarking , Computational Biology/methods , Computational Biology/standards , Genetic Databases , Genetic Predisposition to Disease/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Software
ABSTRACT
OBJECTIVE: To determine whether a monogenic basis explains sudden infant death syndrome (SIDS) using an exome-wide focus. STUDY DESIGN: A cohort of 427 unrelated cases of SIDS (257 male; average age = 2.7 ± 1.9 months) underwent whole-exome sequencing. Exome-wide rare variant analyses were carried out with 278 SIDS cases of European ancestry (173 male; average age = 2.7 ± 1.98 months) and 973 ethnic-matched controls based on 6 genetic models. Ingenuity Pathway Analysis also was performed. The cohort was collected in collaboration with coroners, medical examiners, and pathologists by St George's University of London, United Kingdom, and Mayo Clinic, Rochester, Minnesota. Whole-exome sequencing was performed at the Genomic Laboratory, King's College London, United Kingdom, or Mayo Clinic's Medical Genome Facility, Rochester, Minnesota. RESULTS: Although no exome-wide significant (P < 2.5 × 10⁻⁶) difference in burden of ultra-rare variants was detected for any gene, 405 genes had a greater prevalence (P < .05) of ultra-rare nonsynonymous variants among cases, with 17 genes at P < .005. Some of these potentially overrepresented genes may represent biologically plausible novel candidate genes for a monogenic basis for a portion of patients with SIDS. The top canonical pathway identified was glucocorticoid biosynthesis (P = .01). CONCLUSIONS: The lack of exome-wide significant genetic associations indicates an extreme heterogeneity of etiologies underlying SIDS. Our approach to understanding the genetic mechanisms of SIDS has far-reaching implications for the SIDS research community as a whole and may catalyze new evidence-based SIDS research across multiple disciplines. Perturbations in glucocorticoid biosynthesis may represent a novel SIDS-associated biological pathway for future SIDS investigative research.