ABSTRACT
Complications of preterm birth (PTB) are the leading cause of long-term morbidity and mortality in children. Using whole blood samples, we integrated whole-genome sequencing (WGS), RNA sequencing (RNA-seq), and DNA methylation data for 270 PTB and 521 control families. We analyzed this combined dataset to identify genomic variants associated with PTB, and performed secondary analyses to identify variants associated with very early PTB (VEPTB) as well as other subcategories of disease that may contribute to PTB. We identified differentially expressed genes (DEGs) and differentially methylated genomic loci, and performed expression and methylation quantitative trait loci analyses to link genomic variants to these expression and methylation changes. We performed enrichment tests to identify overlaps between new and known PTB candidate gene systems. We identified 160 significant genomic variants associated with PTB-related phenotypes. The most significant variants, DEGs, and differentially methylated loci were associated with VEPTB. Integration of all data types identified a set of 72 candidate biomarker genes for VEPTB, encompassing both new genes and those previously associated with PTB. Notably, the PTB-associated genes RAB31 and RBPJ were identified by all three data types (WGS, RNA-seq, and methylation). Pathways associated with VEPTB include EGFR and prolactin signaling pathways, inflammation- and immunity-related pathways, chemokine signaling, IFN-γ signaling, and Notch1 signaling. Progress in identifying molecular components of a complex disease is aided by integrated analyses of multiple molecular data types and clinical data. With these data, and by stratifying PTB by subphenotype, we have identified associations between VEPTB and the underlying biology.
Subject(s)
Genetic Predisposition to Disease/genetics , Premature Birth/genetics , DNA Methylation/genetics , Female , Genomics/methods , Humans , Newborn Infant , Male , Phenotype , Single Nucleotide Polymorphism/genetics , Signal Transduction/genetics , Whole Genome Sequencing/methods
ABSTRACT
In the originally published version of this article, the P values for the enrichment of single mutation categories were inadvertently not corrected for multiple testing. After multiple-testing correction, only two of the six mutation categories mentioned remain statistically significant. To reflect this, the text "More specifically, paternally derived DNMs are enriched in transitions in A[.]G contexts, especially ACG>ATG and ATG>ACG (Bonferroni-corrected P = 1.3 × 10⁻² and P = 1 × 10⁻³, respectively). Additionally, we observed overrepresentation of ATA>ACA mutations (Bonferroni-corrected P = 4.28 × 10⁻²) for DNMs of paternal origin. Among maternally derived DNMs, CCA>CTA, GCA>GTA and TCT>TGT mutations were significantly overrepresented (Bonferroni-corrected P = 4 × 10⁻⁴, P = 5 × 10⁻⁴, P = 1 × 10⁻³, respectively)" should read "More specifically, CCA>CTA and GCA>GTA mutations were significantly overenriched on the maternal allele (Bonferroni-corrected P = 0.0192 and P = 0.048, respectively)." Additionally, the last sentence of the legend for Fig. 3b should read "Green boxes highlight the mutation categories that differ significantly" instead of "Green boxes highlight the mutation categories that differ more than 1% of mutation load with a bootstrapping P value <0.05." Corrected versions of Fig. 3b and Supplementary Table 25 appear with the Author Correction.
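The multiple-testing issue described in this correction can be illustrated with a minimal Bonferroni sketch. The raw P values, the category names, and the number of tests below are assumptions chosen for illustration only (96 is the usual count of trinucleotide substitution classes; the correction notice does not state how many tests were performed), so this is not a reproduction of the authors' analysis.

```python
def bonferroni(p_values, n_tests=None):
    """Return Bonferroni-adjusted p-values (raw p times number of tests, capped at 1)."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three mutation categories (not taken from the article).
raw = {"category_A": 0.0002, "category_B": 0.0005, "category_C": 0.004}

# Assumption: one test per trinucleotide substitution class.
N_TESTS = 96

adjusted = dict(zip(raw, bonferroni(list(raw.values()), N_TESTS)))

# Categories that remain significant at the 0.05 level after correction.
significant = [name for name, p in adjusted.items() if p < 0.05]
print(adjusted, significant)
```

A category that looks convincing on its raw P value (category_C here) can lose significance once the number of categories tested is taken into account, which is the effect the correction describes.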
ABSTRACT
De novo mutations (DNMs) originating in gametogenesis are an important source of genetic variation. We use a data set of 7,216 autosomal DNMs with resolved parent of origin from whole-genome sequencing of 816 parent-offspring trios to investigate differences between maternally and paternally derived DNMs and study the underlying mutational mechanisms. Our results show that the number of DNMs in offspring increases not only with paternal age, but also with maternal age, and that some genome regions show enrichment for maternally derived DNMs. We identify parent-of-origin-specific mutation signatures that become more pronounced with increased parental age, pointing to different mutational mechanisms in spermatogenesis and oogenesis. Moreover, we find that spatially clustered DNMs have a unique mutational signature with no significant differences between parental alleles, suggesting a different mutational mechanism. Our findings provide insights into the molecular mechanisms that underlie mutagenesis and are relevant to disease and evolution in humans.
Subject(s)
Gene Expression Regulation , Human Genome , Germ-Line Mutation/genetics , Maternal Age , Mutagenesis/genetics , Paternal Age , Female , High-Throughput Nucleotide Sequencing , Humans , Male
Subject(s)
Agenesis of Corpus Callosum/genetics , Cataract/genetics , Alternative Splicing , Humans , Mutation
ABSTRACT
Germline mutations are the source of evolution and contribute substantially to many health-related processes. Here we use whole-genome deep sequencing data from 693 parent-offspring trios to examine the de novo point mutations (DNMs) in the offspring. Our estimate for the mutation rate per base pair per generation is 1.05 × 10⁻⁸, well within the range of previous studies. We show that maternal age has a small but significant correlation with the total number of DNMs in the offspring after controlling for paternal age (0.51 additional mutations per year, 95% CI: 0.29, 0.73), which was not detectable in the smaller and younger parental cohorts of earlier studies. Furthermore, while the total number of DNMs increases at a constant rate with paternal age, the contribution from the mother increases at an accelerated rate with age. These observations have implications for how the incidence of de novo mutations relates to maternal age.
Subject(s)
Germ-Line Mutation , Maternal Age , Adolescent , Adult , DNA Mutational Analysis , Female , Humans , Male , Middle Aged , Mutation Rate , Paternal Age , Young Adult
ABSTRACT
PURPOSE: To assess the potential of whole-genome sequencing (WGS) to replicate and augment results from conventional blood-based newborn screening (NBS). METHODS: Research-generated WGS data from an ancestrally diverse cohort of 1,696 infants and both parents of each infant were analyzed for variants in 163 genes involved in disorders included or under discussion for inclusion in US NBS programs. WGS results were compared with results from state NBS and related follow-up testing. RESULTS: NBS genes are generally well covered by WGS. There is a median of one (range: 0-6) database-annotated pathogenic variant in the NBS genes per infant. Results of WGS and NBS in detecting 28 state-screened disorders and four hemoglobin traits were concordant for 88.6% of true positives (n = 35) and 98.9% of true negatives (n = 45,757). Of the five infants affected with a state-screened disorder, WGS identified two whereas NBS detected four. WGS yielded fewer false positives than NBS (0.037 vs. 0.17%) but more results of uncertain significance (0.90 vs. 0.013%). CONCLUSION: WGS may help rule in and rule out NBS disorders, pinpoint molecular diagnoses, and detect conditions not amenable to current NBS assays.
Subject(s)
Genetic Predisposition to Disease , Human Genome , Neonatal Screening/methods , DNA Sequence Analysis/methods , Cohort Studies , Female , Genetic Variation , Humans , Newborn Infant , Male , Sensitivity and Specificity
ABSTRACT
Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high-performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as handling of unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimization for different classes of features, and efficient multi-threading. CloudForest is available at https://github.com/ilyalab/CloudForest.
Subject(s)
Computational Biology/methods , Classification , Statistical Data Interpretation , Programming Languages , Regression Analysis , Software
ABSTRACT
Risk assessment for prostate cancer is challenging due to its genetic heterogeneity. In this study, our goal was to develop an operational framework to select and evaluate gene variants that may contribute to familial prostate cancer risk. Drawing on orthogonal sources, we developed a candidate list of genes relevant to prostate cancer, then analyzed germline exomes from 12 case-only prostate cancer patients from high-risk families to identify patterns of protein-damaging gene variants. We described an average of 5 potentially disruptive variants in each individual and annotated them in the context of public databases representing human variation. Novel damaging variants were found in several genes of relevance to prostate cancer. Almost all patients had variants associated with defects in DNA damage response. Many also had variants linked to androgen signaling. Treatment of primary T-lymphocytes from these prostate cancer patients versus controls with DNA damaging agents showed elevated levels of the DNA double strand break (DSB) marker γH2AX (p < 0.05), supporting the idea of an underlying defect in DNA repair. This work suggests the value of focusing on underlying defects in DNA damage in familial prostate cancer risk assessment and demonstrates an operational framework for exome sequencing in case-only prostate cancer genetic evaluation.
Subject(s)
DNA Repair/genetics , Genetic Predisposition to Disease/genetics , Mutation , Prostatic Neoplasms/genetics , Adult , Aged , Phytogenic Antineoplastic Agents/pharmacology , Cultured Cells , Double-Stranded DNA Breaks/drug effects , Etoposide/pharmacology , Exome/genetics , Family Health , Histones/metabolism , Humans , INDEL Mutation , Male , Middle Aged , Single Nucleotide Polymorphism , Prostatic Neoplasms/pathology , Risk Assessment , Risk Factors , DNA Sequence Analysis , T-Lymphocytes/drug effects , T-Lymphocytes/metabolism
ABSTRACT
BACKGROUND & AIMS: DNA structural lesions are prevalent in sporadic colorectal cancer. Therefore, we proposed that gene variants that predispose to DNA double-strand breaks (DSBs) would be found in patients with familial colorectal carcinomas of an undefined genetic basis (UFCRC). METHODS: We collected primary T cells from 25 patients with UFCRC and matched patients without colorectal cancer (controls) and assayed for DSBs. We performed exome sequence analyses of germline DNA from 20 patients with UFCRC and 5 undiagnosed patients with polyposis. The prevalence of identified variants in genes linked to DNA integrity was compared with that of individuals without a family history of cancer. The effects of representative variants found to be associated with UFCRC were confirmed in functional assays with HCT116 cells. RESULTS: Primary T cells from most patients with UFCRC had increased levels of the DSB marker phosphorylated histone H2AX (γH2AX) after treatment with DNA-damaging agents, compared with T cells from controls (P < .001). Exome sequence analysis identified a mean of 1.4 rare variants per patient that were predicted to disrupt functions of genes relevant to DSBs. Controls (from public databases) had a much lower frequency of variants in the same genes (P < .001). Knockdown of representative variant genes in HCT116 CRC cells increased γH2AX. A detailed analysis of immortalized patient-derived B cells that contained variants in the Werner syndrome, RecQ helicase-like gene (WRN, encoding T705I), and excision repair cross-complementation group 6 (ERCC6, encoding N180Y) showed reduced levels of these proteins and increased DSBs, compared with B cells from controls. This phenotype was rescued by exogenous expression of WRN or ERCC6. Direct analysis of the recombinant variant proteins confirmed defective enzymatic activities. CONCLUSIONS: These results provide evidence that defects in suppression of DSBs underlie some cases of UFCRC; these can be identified by assays of circulating lymphocytes. We specifically associated UFCRC with variants in WRN and ERCC6 that reduce the capacity for repair of DNA DSBs. These observations could lead to a simple screening strategy for UFCRC, and provide insight into the pathogenic mechanisms of colorectal carcinogenesis.
Subject(s)
Tumor Biomarkers/genetics , Colorectal Neoplasms/genetics , Double-Stranded DNA Breaks , Genetic Variation , T-Lymphocytes/pathology , Adult , Aged , Aged 80 and over , Tumor Biomarkers/metabolism , Case-Control Studies , Colorectal Neoplasms/immunology , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , Computational Biology , DNA Helicases/genetics , DNA Helicases/metabolism , DNA Repair , DNA Repair Enzymes/genetics , DNA Repair Enzymes/metabolism , Genetic Databases , Exodeoxyribonucleases/genetics , Exodeoxyribonucleases/metabolism , Exome , Female , Gene Frequency , Gene Knockdown Techniques , Genetic Predisposition to Disease , Genomic Instability , HCT116 Cells , Heredity , Histones/metabolism , Humans , Male , Middle Aged , Mutagens/pharmacology , Phenotype , Phosphorylation , Poly-ADP-Ribose Binding Proteins , RecQ Helicases/genetics , RecQ Helicases/metabolism , DNA Sequence Analysis , T-Lymphocytes/drug effects , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Transfection , Up-Regulation , Werner Syndrome Helicase
ABSTRACT
The field of cancer diagnostics is in constant flux as a result of the rapid discovery of new genes associated with cancer, improvements in laboratory techniques for identifying disease-causing events, and novel analytic methods that enable the integration of many different types of data. These advances have helped in the identification of novel, informative biomarkers. As more whole genome sequence data are generated and analyzed, emerging information on the baseline variability of the human genome has shown the importance of the ancestral genomic background in patients with a potential disease-causing variant. The recent discovery of many novel DNA sequence variants, advances in sequencing and genomic technology, and improved analytic methods enable the impact of germline and somatic genome variation on tumorigenesis and metastasis to be determined. New molecular targets and companion diagnostics are changing the way geneticists and oncologists think about the causes, diagnosis, and treatment of cancer.
Subject(s)
Tumor Biomarkers/genetics , Genomics , Neoplasms/genetics , Carcinogenesis/genetics , Human Genome , Humans , Neoplasm Metastasis/genetics , Neoplasms/diagnosis , Neoplasms/therapy , DNA Sequence Analysis
ABSTRACT
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
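The normalize-then-segment idea in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' tool: the three states, their expected coverage ratios, the Gaussian emission width `sigma`, the `stay` probability, the median-based scaling, and the example profile are all hypothetical choices made here for clarity.

```python
import math
import statistics

# Assumed copy-number states and their expected coverage ratio relative to diploid.
STATES = {"het_deletion": 0.5, "diploid": 1.0, "duplication": 1.5}

def normalize(depth, reference_profile):
    """Divide per-bin depth by the reference profile, then scale to a median of 1."""
    raw = [d / r for d, r in zip(depth, reference_profile)]
    scale = statistics.median(raw)
    return [x / scale for x in raw]

def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def viterbi(ratios, sigma=0.15, stay=0.99):
    """Most likely state path under a Gaussian-emission HMM (Viterbi decoding)."""
    names = list(STATES)
    switch = (1.0 - stay) / (len(names) - 1)
    log_t = {a: {b: math.log(stay if a == b else switch) for b in names} for a in names}
    score = {s: -math.log(len(names)) + log_gauss(ratios[0], STATES[s], sigma)
             for s in names}
    back = []
    for x in ratios[1:]:
        prev, score, pointers = score, {}, {}
        for s in names:
            best = max(names, key=lambda p: prev[p] + log_t[p][s])
            score[s] = prev[best] + log_t[best][s] + log_gauss(x, STATES[s], sigma)
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical reference profile with sequence-specific fluctuations (20 bins).
profile = [1.0, 1.2, 0.8, 1.1, 0.9] * 4
# Simulated 40x sample that follows the profile, with bins 8-11 at half depth.
depth = [40.0 * r for r in profile]
for i in range(8, 12):
    depth[i] *= 0.5

path = viterbi(normalize(depth, profile))
print(path)
```

Dividing by the reference profile removes the position-specific fluctuations, so the half-depth run stands out against a flat baseline and the segmentation step only has to separate a few well-spaced ratio levels.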
ABSTRACT
Rubinstein-Taybi syndrome (RSTS) can be caused by heterozygous mutations or deletions involving CREBBP or, less commonly, EP300. To date, only 15 patients with EP300 mutations have been clinically described. Frequently reported manifestations in these patients include characteristic facial and limb features, varying degrees of neurocognitive dysfunction, and maternal preeclampsia. Other congenital anomalies are less frequently reported. We describe a child found to have a de novo EP300 mutation (c.4933C>T, predicted to result in p.Arg1645X) through research-based whole-genome sequencing of the family trio. The child's presentation involved dysmorphic features as well as unilateral renal agenesis, a myelomeningocele, and minor genitourinary anomalies. The involvement of congenital anomalies in all 16 clinically described patients with EP300 mutations (25% of which have been identified by "hypothesis free" methods, including microarray, exome, and whole-genome sequencing) is reviewed. In summary, genitourinary anomalies have been identified in 38%, cardiovascular anomalies in 25%, spinal/vertebral anomalies in 19%, other skeletal anomalies in 19%, brain anomalies in 13%, and renal anomalies in 6%. Our patient expands the phenotypic spectrum in EP300-related RSTS; this case demonstrates the evolving practice of clinical genomics related to increasing availability of genomic sequencing methods.
Subject(s)
E1A-Associated p300 Protein/genetics , Mutation , Rubinstein-Taybi Syndrome/genetics , Urogenital Abnormalities/genetics , Base Sequence , Chromosome Mapping , Exome/genetics , Female , Humans , Infant , Magnetic Resonance Imaging , Pregnancy , Radiography , Rubinstein-Taybi Syndrome/diagnostic imaging , Rubinstein-Taybi Syndrome/etiology , Rubinstein-Taybi Syndrome/physiopathology , Sequence Deletion , Spine/diagnostic imaging , Spine/physiopathology , Urogenital Abnormalities/physiopathology
ABSTRACT
D-Bifunctional protein deficiency, caused by recessive mutations in HSD17B4, is a severe disorder of peroxisomal fatty acid oxidation. Nonspecific clinical features may contribute to diagnostic challenges. We describe a newborn female with infantile-onset seizures and nonspecific mild dysmorphisms who underwent extensive genetic workup that resulted in the detection of a novel homozygous mutation (c.302+1_4delGTGA) in the HSD17B4 gene, consistent with a diagnosis of D-bifunctional protein deficiency. By comparing the standard clinical workup to diagnostic analysis performed through research-based whole-genome sequencing (WGS), which independently identified the causative mutation, we demonstrated the ability of genomic sequencing to serve as a timely and cost-effective diagnostic tool for the molecular diagnosis of apparent and occult newborn diseases. As genomic sequencing becomes more available and affordable, we anticipate that WGS and related omics technologies will eventually replace the traditional tiered approach to newborn diagnostic workup.
ABSTRACT
Notch signaling determines and reinforces cell fate in bilaterally symmetric multicellular eukaryotes. Despite the involvement of Notch in many key developmental systems, human mutations in Notch signaling components have mainly been described in disorders with vascular and bone effects. Here, we report five heterozygous NOTCH1 variants in unrelated individuals with Adams-Oliver syndrome (AOS), a rare disease with major features of aplasia cutis of the scalp and terminal transverse limb defects. Using whole-genome sequencing in a cohort of 11 families lacking mutations in the four genes with known roles in AOS pathology (ARHGAP31, RBPJ, DOCK6, and EOGT), we found a heterozygous de novo 85 kb deletion spanning the NOTCH1 5' region and three coding variants (c.1285T>C [p.Cys429Arg], c.4487G>A [p.Cys1496Tyr], and c.5965G>A [p.Asp1989Asn]), two of which are de novo, in four unrelated probands. In a fifth family, we identified a heterozygous canonical splice-site variant (c.743-1 G>T) in an affected father and daughter. These variants were not present in 5,077 in-house control genomes or in public databases. In keeping with the prominent developmental role described for Notch1 in mouse vasculature, we observed cardiac and multiple vascular defects in four of the five families. We propose that the limb and scalp defects might also be due to a vasculopathy in NOTCH1-related AOS. Our results suggest that mutations in NOTCH1 are the most common cause of AOS and add to a growing list of human diseases that have a vascular and/or bony component and are caused by alterations in the Notch signaling pathway.
Subject(s)
Multiple Abnormalities/genetics , Ectodermal Dysplasia/genetics , Ectodermal Dysplasia/pathology , Congenital Limb Deformities/genetics , Congenital Limb Deformities/pathology , Mutation/genetics , Notch1 Receptor/genetics , Scalp Dermatoses/congenital , Adolescent , Adult , Animals , Preschool Child , Female , Humans , Infant , Male , Mice , Pedigree , Scalp Dermatoses/genetics , Scalp Dermatoses/pathology , Young Adult
ABSTRACT
Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.
Subject(s)
Genetic Predisposition to Disease , Human Genome/genetics , Germ-Line Mutation/genetics , Health , Neoplasms/genetics , DNA Sequence Analysis , Adolescent , Adult , Alleles , Cohort Studies , Female , Gene Frequency/genetics , Gene Pool , Neoplasm Genes , Humans , Male , Middle Aged , Molecular Models , Open Reading Frames/genetics , Phylogeny , Young Adult
ABSTRACT
Whole-genome sequencing and whole-exome sequencing are becoming more widely applied in clinical medicine to help diagnose rare genetic diseases. Identification of the underlying causative mutations by genome-wide sequencing is greatly facilitated by concurrent analysis of multiple family members, most often the mother-father-proband trio, using bioinformatics pipelines that filter genetic variants by mode of inheritance. However, current pipelines are limited to Mendelian inheritance patterns and do not specifically address disorders caused by mutations in imprinted genes, such as forms of Angelman syndrome and Beckwith-Wiedemann syndrome. Using publicly available tools, we implemented a genetic inheritance search mode to identify imprinted-gene mutations. Application of this search mode to whole-genome sequences from a family trio led to a diagnosis for a proband for whom extensive clinical testing and Mendelian inheritance-based sequence analysis were nondiagnostic. The condition in this patient, IMAGe syndrome, is likely caused by the heterozygous mutation c.832A>G (p.Lys278Glu) in the imprinted gene CDKN1C. The genotypes and disease status of six members of the family are consistent with maternal expression of the gene, and allele-biased expression was confirmed by RNA-Seq for the heterozygotes. This analysis demonstrates that an imprinted-gene search mode is a valuable addition to genome sequence analysis pipelines for identifying disease-causative variants.
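An imprinted-gene search mode of the kind described above might be sketched as a trio filter that keeps heterozygous proband variants inherited through the parent whose allele is expressed. Everything in this sketch is an assumption made for illustration: the `TrioVariant` record, the `EXPRESSED_ALLELE` lookup, the genotype encoding, the example genotypes, and the placeholder gene and variant names are hypothetical, and the abstract does not describe the authors' actual implementation.

```python
from dataclasses import dataclass

# Assumed lookup: imprinted gene -> parent whose allele is expressed.
# Only CDKN1C (maternally expressed, per the abstract) is listed here.
EXPRESSED_ALLELE = {"CDKN1C": "mother"}

@dataclass
class TrioVariant:
    gene: str
    hgvs: str
    proband: str  # genotypes encoded as "0/0", "0/1" or "1/1" (assumed encoding)
    mother: str
    father: str

def carries_alt(genotype):
    """True if the genotype contains at least one alternate allele."""
    return "1" in genotype.split("/")

def imprinted_candidates(variants):
    """Keep heterozygous proband variants in imprinted genes that were
    transmitted by the expressed-allele parent and are absent in the other parent."""
    hits = []
    for v in variants:
        expressed = EXPRESSED_ALLELE.get(v.gene)
        if expressed is None or v.proband != "0/1":
            continue
        silent = "father" if expressed == "mother" else "mother"
        if carries_alt(getattr(v, expressed)) and not carries_alt(getattr(v, silent)):
            hits.append(v)
    return hits

# Hypothetical trio genotypes; only the gene and variant name in the first
# record come from the abstract.
variants = [
    TrioVariant("CDKN1C", "c.832A>G", "0/1", "0/1", "0/0"),       # maternal transmission
    TrioVariant("CDKN1C", "c.000X>Y", "0/1", "0/0", "0/1"),       # paternal transmission
    TrioVariant("NOT_IMPRINTED", "c.000X>Y", "0/1", "0/1", "0/0"),  # gene not in lookup
]

hits = imprinted_candidates(variants)
print([v.hgvs for v in hits])
```

A standard dominant filter would typically discard the first variant because the transmitting parent may be an unaffected carrier, which is why a separate search mode for imprinted genes is useful.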
ABSTRACT
The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.