ABSTRACT
The differential performance of polygenic risk scores (PRSs) by group is one of the major ethical barriers to their clinical use. It is also one of the main practical challenges for any implementation effort. The social repercussions of how people are grouped in PRS research must be considered in communications with research participants, including return of results. Here, we outline the decisions faced and choices made by a large multi-site clinical implementation study returning PRSs to diverse participants in handling this issue of differential performance. Our approach to managing the complexities associated with the differential performance of PRSs serves as a case study that can help future implementers of PRSs to plot an anticipatory course in response to this issue.
Subject(s)
Genetic Predisposition to Disease , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Risk Factors , Genome-Wide Association Study , Risk Assessment , Genetic Testing/methods , Genetic Risk Score
ABSTRACT
Two major goals of the Electronic Medical Record and Genomics (eMERGE) Network are to learn how best to return research results to patient/participants and the clinicians who care for them and also to assess the impact of placing these results in clinical care. Yet since its inception, the Network has confronted a host of challenges in achieving these goals, many of which had ethical, legal, or social implications (ELSIs) that required consideration. Here, we share impediments we encountered in recruiting participants, returning results, and assessing their impact, all of which affected our ability to achieve the goals of eMERGE, as well as the steps we took to attempt to address these obstacles. We divide the domains in which we experienced challenges into four broad categories: (1) study design, including recruitment of more diverse groups; (2) consent; (3) returning results to participants and their health care providers (HCPs); and (4) assessment of follow-up care of participants and measuring the impact of research on participants and their families. Since most phases of eMERGE have included children as well as adults, we also address the particular ELSI posed by including pediatric populations in this research. We make specific suggestions for improving translational genomic research to ensure that future projects can effectively return results and assess their impact on patient/participants and providers if the goals of genomic-informed medicine are to be achieved.
Subject(s)
Electronic Health Records , Genomics , Child , Adult , Humans , Genome , Translational Research, Biomedical , Population Groups
ABSTRACT
As large-scale genomic screening becomes increasingly prevalent, understanding the influence of actionable results on healthcare utilization is key to estimating the potential long-term clinical impact. The eMERGE network sequenced individuals for actionable genes in multiple genetic conditions and returned results to individuals, providers, and the electronic health record. Differences in recommended health services (laboratory, imaging, and procedural testing) delivered within 12 months of return were compared among individuals with pathogenic or likely pathogenic (P/LP) findings to matched individuals with negative findings before and after return of results. Of 16,218 adults, 477 unselected individuals were found to have a monogenic risk for arrhythmia (n = 95), breast cancer (n = 96), cardiomyopathy (n = 95), colorectal cancer (n = 105), or familial hypercholesterolemia (n = 86). Individuals with P/LP results more frequently received services after return (43.8%) compared to before return (25.6%) of results and compared to individuals with negative findings (24.9%; p < 0.0001). The annual cost of qualifying healthcare services increased from an average of $162 before return to $343 after return of results among the P/LP group (p < 0.0001); differences in the negative group were non-significant. The mean difference-in-differences was $149 (p < 0.0001), which describes the increased cost within the P/LP group corrected for cost changes in the negative group. When stratified by individual conditions, significant cost differences were observed for arrhythmia, breast cancer, and cardiomyopathy. In conclusion, less than half of individuals received billed health services after monogenic return, which modestly increased healthcare costs for payors in the year following return.
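The difference-in-differences estimate mentioned above can be illustrated with a minimal sketch. The P/LP group means ($162 before, $343 after) come from the abstract; the negative-group means are hypothetical placeholders, because the abstract does not report them, so the result below is not the study's $149 estimate. The function name is also an assumption made for illustration.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# P/LP group means as reported in the abstract (annual cost in dollars).
plp_before, plp_after = 162.0, 343.0

# Hypothetical negative-group means; the abstract does not report these.
neg_before, neg_after = 150.0, 160.0

did = difference_in_differences(plp_before, plp_after, neg_before, neg_after)
print(f"Difference-in-differences: ${did:.0f}")
```

Subtracting the change in the negative group removes cost trends that affect everyone, which is why the corrected figure is smaller than the raw before/after increase in the P/LP group.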
Subject(s)
Breast Neoplasms , Cardiomyopathies , Adult , Humans , Female , Prospective Studies , Patient Acceptance of Health Care , Arrhythmias, Cardiac , Breast Neoplasms/genetics , Cardiomyopathies/genetics
ABSTRACT
Polygenic risk scores (PRS) have potential to improve health care by identifying individuals that have elevated risk for common complex conditions. Use of PRS in clinical practice, however, requires careful assessment of the needs and capabilities of patients, providers, and health care systems. The electronic Medical Records and Genomics (eMERGE) network is conducting a collaborative study which will return PRS to 25,000 pediatric and adult participants. All participants will receive a risk report, potentially classifying them as high risk (~2-10% per condition) for 1 or more of 10 conditions based on PRS. The study population is enriched by participants from racial and ethnic minority populations, underserved populations, and populations who experience poorer medical outcomes. All 10 eMERGE clinical sites conducted focus groups, interviews, and/or surveys to understand educational needs among key stakeholders: participants, providers, and/or study staff. Together, these studies highlighted the need for tools that address the perceived benefit/value of PRS, types of education/support needed, accessibility, and PRS-related knowledge and understanding. Based on findings from these preliminary studies, the network harmonized training initiatives and formal/informal educational resources. This paper summarizes eMERGE's collective approach to assessing educational needs and developing educational approaches for primary stakeholders. It discusses challenges encountered and solutions provided.
Subject(s)
Electronic Health Records , Ethnicity , Adult , Humans , Child , Minority Groups , Risk Factors , Genomics
ABSTRACT
BACKGROUND: As a collaboration model between the International HundredK+ Cohorts Consortium (IHCC) and the Davos Alzheimer's Collaborative (DAC), our aim was to develop a trans-ethnic genomic informed risk assessment (GIRA) algorithm for Alzheimer's disease (AD). METHODS: The GIRA model was created to include polygenic risk score calculated from the AD genome-wide association study loci, the apolipoprotein E haplotypes, and non-genetic covariates including age, sex, and the first three principal components of population substructure. RESULTS: We validated the performance of the GIRA model in different populations. The proteomic study in the participant sites identified proteins related to female infertility and autoimmune thyroiditis and associated with the risk scores of AD. CONCLUSIONS: As the initial effort by the IHCC to leverage existing large-scale datasets in a collaborative setting with DAC, we developed a trans-ethnic GIRA for AD with the potential of identifying individuals at high risk of developing AD for future clinical applications.
Subject(s)
Alzheimer Disease , Humans , Female , Alzheimer Disease/genetics , Alzheimer Disease/epidemiology , Genome-Wide Association Study , Proteomics , Genomics , Risk Assessment
ABSTRACT
BACKGROUND: As genomic sequencing moves closer to clinical implementation, there has been an increasing acceptance of returning incidental findings to research participants and patients for mutations in highly penetrant, medically actionable genes. A curated list of genes has been recommended by the American College of Medical Genetics and Genomics (ACMG) for return of incidental findings. However, the pleiotropic effects of these genes are not fully known. Such effects could complicate genetic counseling when returning incidental findings. In particular, there has been no systematic evaluation of psychiatric manifestations associated with rare variation in these genes. RESULTS: Here, we leveraged a targeted sequence panel and real-world electronic health records from the eMERGE network to assess the burden of rare variation in the ACMG-56 genes and two psychiatric-associated genes (CACNA1C and TCF4) across common mental health conditions in 15,181 individuals of European descent. As a positive control, we showed that this approach replicated the established association between rare mutations in LDLR and hypercholesterolemia with no visible inflation from population stratification. However, we did not identify any genes significantly enriched with rare deleterious variants that confer risk for common psychiatric disorders after correction for multiple testing. Suggestive associations were observed between depression and rare coding variation in PTEN (P = 1.5 × 10⁻⁴), LDLR (P = 3.6 × 10⁻⁴), and CACNA1S (P = 5.8 × 10⁻⁴). We also observed nominal associations between rare variants in KCNQ1 and substance use disorders (P = 2.4 × 10⁻⁴), and APOB and tobacco use disorder (P = 1.1 × 10⁻³). CONCLUSIONS: Our results do not support an association between psychiatric disorders and incidental findings in medically actionable gene mutations, but power was limited with the available sample sizes.
Given the phenotypic and genetic complexity of psychiatric phenotypes, future work will require a much larger sequencing dataset to determine whether incidental findings in these genes have implications for risk of psychopathology.
Subject(s)
Exome , Genetic Testing , Genetic Testing/methods , Genetic Variation , Genomics/methods , Humans , Mutation , Phenotype
ABSTRACT
PURPOSE: We estimated the penetrance of pathogenic/likely pathogenic (P/LP) variants in arteriopathy-related genes and assessed near-term outcomes following return of results. METHODS: Participants (N = 24,520) in phase III of the Electronic Medical Records and Genomics network underwent targeted sequencing of 68 actionable genes, including 9 genes associated with arterial aneurysmal diseases. Penetrance was estimated on the basis of the presence of relevant clinical traits. Outcomes occurring within 1 year of return of results included new diagnoses, referral to a specialist, new tests ordered, surveillance initiated, and new medications started. RESULTS: P/LP variants were present in 34 participants. The average penetrance across genes was 59%, ranging from 86% for FBN1 variants to 25% for SMAD3. Of 16 participants in whom results were returned, 1-year outcomes occurred in 63%. A new diagnosis was made in 44% of the participants, 56% were referred to a specialist, a new test was ordered in 44%, surveillance was initiated in 31%, and a new medication was started in 31%. CONCLUSION: Penetrance of P/LP variants in arteriopathy-related genes, identified in a large, targeted sequencing study, was variable and overall lower than that reported in clinical cohorts. Meaningful outcomes within the first year were noted in 63% of participants who received results.
Subject(s)
Genomics , Humans , Penetrance , Phenotype
ABSTRACT
BACKGROUND: Previous studies have shown that dyslipidemia is common in patients with sickle cell disease (SCD) and is associated with more serious SCD complications. METHODS: This study systematically investigated dyslipidemia in SCD using a state-of-the-art nuclear magnetic resonance (NMR) metabolomics platform, including 147 pediatric cases with SCD and 1234 controls without SCD. We examined 249 metabolomic biomarkers, including 98 biomarkers for lipoprotein subclasses, 70 biomarkers for relative lipoprotein lipid concentrations, plus biomarkers for fatty acids and phospholipids. RESULTS: Specific patterns of hypolipoproteinemia and hypocholesterolemia in pediatric SCD were observed in lipoprotein subclasses other than larger VLDL subclasses. Triglycerides were not significantly changed in SCD, except for increased relative concentrations in lipoprotein subclasses. Decreased plasma FFAs (including total-FA, SFA, PUFA, Omega-6, and linoleic acid) and decreased plasma phospholipids were observed in SCD. CONCLUSION: This study scrutinized, for the first time, lipoprotein subclasses in pediatric patients with SCD, and identified SCD-specific dyslipidemia from altered lipoprotein metabolism. The findings of this study depict a broad panorama of lipid metabolism and nutrition in SCD, suggesting the potential of specific dietary supplementation of the deficient nutrients for the management of SCD.
Subject(s)
Anemia, Sickle Cell , Dyslipidemias , Humans , Child , Metabolomics , Anemia, Sickle Cell/complications , Plasma , Triglycerides
ABSTRACT
BACKGROUND: Precise risk prediction of type 1 diabetes (T1D) facilitates early intervention and identification of risk factors prior to irreversible beta-islet cell destruction, and can significantly improve T1D prevention and clinical care. Sharp et al. developed a genetic risk scoring (GRS) system for T1D (T1D-GRS2) capable of predicting T1D risk in children of European ancestry. The T1D-GRS2 was developed on the basis of causal genetic variants and thus may be applicable to minority populations, while a trans-ethnic GRS for T1D may avoid the exacerbation of health disparities due to the lack of genomic information in minorities. METHODS: Here, we describe a T1D-GRS2 calculator validated in two independent cohorts, including African American (AA) children and European American children. Participants were recruited by the Center for Applied Genomics at the Children's Hospital of Philadelphia. RESULTS: The validation demonstrates that GRS2 is applicable to T1D risk prediction in the AA cohort, although population-specific thresholds are needed for different populations. CONCLUSIONS: The study highlights the potential to further improve T1D-GRS2 performance with the inclusion of additional genetic markers.
Subject(s)
Diabetes Mellitus, Type 1 , Algorithms , Child , Diabetes Mellitus, Type 1/diagnosis , Diabetes Mellitus, Type 1/epidemiology , Diabetes Mellitus, Type 1/genetics , Genetic Markers , Genetic Predisposition to Disease , Humans , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
Structured representation of clinical genetic results is necessary for advancing precision medicine. The Electronic Medical Records and Genomics (eMERGE) Network's Phase III program initially used a commercially developed XML message format for standardized and structured representation of genetic results for electronic health record (EHR) integration. In a desire to move towards a standard representation, the network created a new standardized format based upon Health Level Seven Fast Healthcare Interoperability Resources (HL7® FHIR®), to represent clinical genomics results. These new standards improve the utility of HL7® FHIR® as an international healthcare interoperability standard for management of genetic data from patients. This work advances the establishment of standards that are being designed for broad adoption in the current health information technology landscape.
Subject(s)
Electronic Health Records , Medical Informatics , Genomics , Health Level Seven , Humans , Precision Medicine
ABSTRACT
PURPOSE: Secondary findings are typically offered in an all or none fashion when sequencing is used for clinical purposes. This study aims to describe the process of offering categorical and granular choices for results in a large research consortium. METHODS: Within the third phase of the electronic MEdical Records and GEnomics (eMERGE) Network, several sites implemented studies that allowed participants to choose the type of results they wanted to receive from a multigene sequencing panel. Sites were surveyed to capture the details of the implementation protocols and results of these choices. RESULTS: Across the ten eMERGE sites, 4664 participants including adolescents and adults were offered some type of choice. Categories of choices offered and methods for selecting categories varied. Most participants (94.5%) chose to learn all genetic results, while 5.5% chose subsets of results. Several sites allowed participants to change their choices at various time points, and 0.5% of participants made changes. CONCLUSION: Offering choices that include learning some results is important and should be a dynamic process to allow for changes in scientific knowledge, participant age group, and individual preference.
Subject(s)
Electronic Health Records , Genome , Adolescent , Adult , Genomics , Humans , Population Groups , Surveys and Questionnaires
ABSTRACT
BACKGROUND: The extent to which obesity and genetics determine postoperative complications is incompletely understood. METHODS: We performed a retrospective study using two population cohorts with electronic health record (EHR) data. The first included 736,726 adults with body mass index (BMI) recorded between 1990 and 2017 at Vanderbilt University Medical Center. The second cohort consisted of 65,174 individuals from 12 institutions contributing EHR and genome-wide genotyping data to the Electronic Medical Records and Genomics (eMERGE) Network. Pairwise logistic regression analyses were used to measure the association of BMI categories with postoperative complications derived from International Classification of Disease-9 codes, including postoperative infection, incisional hernia, and intestinal obstruction. A genetic risk score was constructed from 97 obesity-risk single-nucleotide polymorphisms for a Mendelian randomization study to determine the association of genetic risk of obesity with postoperative complications. Logistic regression analyses were adjusted for sex, age, site, and race/principal components. RESULTS: Individuals with overweight or obese BMI (≥25 kg/m²) had increased risk of incisional hernia (odds ratio [OR] 1.7-5.5, p < 3.1 × 10⁻²⁰), and people with obesity (BMI ≥ 30 kg/m²) had increased risk of postoperative infection (OR 1.2-2.3, p < 2.5 × 10⁻⁵). In the eMERGE cohort, genetically predicted BMI was associated with incisional hernia (OR 2.1 [95% CI 1.8-2.5], p = 1.4 × 10⁻⁶) and postoperative infection (OR 1.6 [95% CI 1.4-1.9], p = 3.1 × 10⁻⁶). Association findings were similar after limitation of the cohorts to those who underwent abdominal procedures. CONCLUSIONS: Clinical and Mendelian randomization studies suggest that obesity, as measured by BMI, is associated with the development of postoperative incisional hernia and infection.
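As a rough illustration of how a genetic risk score of this kind is commonly built, the sketch below sums risk-allele dosages weighted by per-SNP effect sizes. The abstract does not say whether the 97-SNP score was weighted, so the weighting is an assumption; the SNP identifiers, weights, dosages, and function name are all hypothetical.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per SNP).

    dosages: dict mapping SNP id -> number of risk alleles carried
    weights: dict mapping SNP id -> per-allele effect size
    """
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise KeyError(f"No genotype for SNPs: {missing}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical three-SNP example; a real score would use all 97 SNPs.
weights = {"snp_a": 0.08, "snp_b": 0.05, "snp_c": 0.03}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = genetic_risk_score(dosages, weights)
print(f"Genetic risk score: {score:.2f}")
```

In a Mendelian randomization setting, a score like this serves as a proxy for genetically predicted BMI, which can then be entered into the outcome regression in place of measured BMI.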
Subject(s)
Mendelian Randomization Analysis/methods , Obesity/complications , Postoperative Complications/genetics , Adult , Body Mass Index , Female , Humans , Logistic Models , Male , Middle Aged , Polymorphism, Single Nucleotide , Postoperative Complications/etiology , Retrospective Studies , Risk Factors
ABSTRACT
BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD is complex with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition. METHODS: First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls and histological data from liver tissue in 235 available participants. These include 1242 pediatric participants (396 cases, 846 controls). The algorithm included billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls and case-only analyses using histologic scores and liver function tests adjusting for age, sex, site, ancestry, PC, and body mass index (BMI). RESULTS: Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed strongest association (best SNP rs738409 p = 1.70 × 10⁻²⁰). This effect was consistent in both pediatric (p = 9.92 × 10⁻⁶) and adult (p = 9.73 × 10⁻¹⁵) cohorts. Additionally, this variant was also associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10⁻⁸, beta = 0.85). PheWAS analysis links this locus to a spectrum of liver diseases beyond NAFLD with a novel negative correlation with gout (p = 1.09 × 10⁻⁴). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS score near IL17RA (rs5748926, p = 3.80 × 10⁻⁸), and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10⁻¹¹).
Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses. CONCLUSIONS: In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.
Subject(s)
Non-alcoholic Fatty Liver Disease/genetics , Adult , Aged , Body Mass Index , Case-Control Studies , Community Networks/organization & administration , Community Networks/statistics & numerical data , Disease Progression , Electronic Health Records/organization & administration , Electronic Health Records/statistics & numerical data , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomics/organization & administration , Genomics/statistics & numerical data , Humans , Lipase/genetics , Male , Membrane Proteins/genetics , Middle Aged , Morbidity , Non-alcoholic Fatty Liver Disease/epidemiology , Phenotype , Polymorphism, Single Nucleotide , Signal Transduction/genetics
ABSTRACT
PURPOSE: To provide a validated method to confidently identify exon-containing copy-number variants (CNVs), with a low false discovery rate (FDR), in targeted sequencing data from a clinical laboratory with particular focus on single-exon CNVs. METHODS: DNA sequence coverage data are normalized within each sample and subsequently exonic CNVs are identified in a batch of samples, when the target log2 ratio of the sample to the batch median exceeds defined thresholds. The quality of exonic CNV calls is assessed by C-scores (Z-like scores) using thresholds derived from gold standard samples and simulation studies. We integrate an ExonQC threshold to lower FDR and compare performance with alternate software (VisCap). RESULTS: Thirteen CNVs were used as a truth set to validate Atlas-CNV and compared with VisCap. We demonstrated FDR reduction in validation, simulation, and 10,926 eMERGESeq samples without sensitivity loss. Sixty-four multiexon and 29 single-exon CNVs with high C-scores were assessed by Multiplex Ligation-dependent Probe Amplification (MLPA). CONCLUSION: Atlas-CNV is validated as a method to identify exonic CNVs in targeted sequencing data generated in the clinical laboratory. The ExonQC and C-score assignment can reduce FDR (identification of targets with high variance) and improve calling accuracy of single-exon CNVs respectively. We propose guidelines and criteria to identify high confidence single-exon CNVs.
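The calling step described in METHODS (within-sample normalization, then a log2 ratio of each target to the batch median, compared against thresholds) can be sketched as follows. This is not the Atlas-CNV implementation: the function names, the thresholds of -0.6 and 0.4, and the toy coverage values are assumptions for illustration, and the C-score and ExonQC steps are omitted. The sketch also assumes every target has nonzero coverage.

```python
import math
from statistics import median


def normalize(coverage):
    """Scale one sample's per-target coverage so the values sum to 1."""
    total = sum(coverage)
    return [c / total for c in coverage]


def log2_ratios(batch):
    """batch: dict of sample id -> per-target coverage list (same targets)."""
    norm = {sample: normalize(cov) for sample, cov in batch.items()}
    n_targets = len(next(iter(norm.values())))
    # Batch median of normalized coverage for each target.
    medians = [median(v[i] for v in norm.values()) for i in range(n_targets)]
    return {
        sample: [math.log2(v[i] / medians[i]) for i in range(n_targets)]
        for sample, v in norm.items()
    }


def call_cnvs(ratios, loss=-0.6, gain=0.4):
    """Flag targets whose log2 ratio passes the (hypothetical) thresholds."""
    calls = {}
    for sample, values in ratios.items():
        for index, ratio in enumerate(values):
            if ratio <= loss:
                calls.setdefault(sample, []).append((index, "loss"))
            elif ratio >= gain:
                calls.setdefault(sample, []).append((index, "gain"))
    return calls


# Toy batch: sample s3 has half the expected coverage at target index 2.
batch = {
    "s1": [100, 100, 100, 100],
    "s2": [100, 100, 100, 100],
    "s3": [100, 100, 50, 100],
    "s4": [100, 100, 100, 100],
    "s5": [100, 100, 100, 100],
}
calls = call_cnvs(log2_ratios(batch))
print(calls)
```

Comparing each sample to the batch median makes the call relative to samples processed together, which is why a batch needs enough copy-neutral samples at each target for the median to be a reliable baseline.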
Subject(s)
DNA Copy Number Variations/genetics , Exons/genetics , Genome, Human/genetics , Software , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
ABSTRACT
Asthma is the leading chronic disease in children. Several studies have identified genetic biomarkers associated with susceptibility and severity in both adult and pediatric cases. In this study, we evaluated outcomes in 400 African American and European American pediatric cases all of whom were regular users of inhaled corticosteroids. Patients were stratified by genotype using two single nucleotide polymorphisms in the β-2-adrenergic receptor (ADRB2) gene - rs1042713 and rs1042714, previously associated with asthma outcome. These correspond to nonsynonymous single nucleotide polymorphisms at positions 16 [arginine to glycine (Arg16Gly); rs1042713] and 27 [glutamic acid to glutamine (Glu27Gln); rs1042714], which are relatively common (minor allele frequencies ~40-50%), and have been well characterized in asthma pharmacogenetics. We controlled for adherence to the National Heart, Lung and Blood Institute guidelines using deep mining of electronic health record data to determine treatment course. We found no significant effect for rs1042713 (Arg16Gly) but did identify an effect for rs1042714, where participants homozygous for Gln27 had increased exacerbations while taking inhaled corticosteroids in comparison with those who were either heterozygous or homozygous for Glu27. This is consistent with previous studies and demonstrates for the first time that the Glu27 variant in the ADRB2 gene is associated with increased frequencies of asthma exacerbations. Moreover, this study also lends an important proof-of-principle on how electronic health records linked to genotype can be efficiently and systematically mined to delineate health outcomes.
Subject(s)
Adrenal Cortex Hormones/adverse effects , Asthma/genetics , Genetic Predisposition to Disease , Receptors, Adrenergic, beta-2/genetics , Adolescent , Adrenal Cortex Hormones/therapeutic use , Adult , Alleles , Asthma/pathology , Child , Electronic Health Records , Female , Gene Frequency , Genotype , Heterozygote , Humans , Male , Pharmacogenetics , Polymorphism, Single Nucleotide/genetics , Young Adult
ABSTRACT
The Philadelphia Neurodevelopmental Cohort (PNC) is a large-scale study of child development that combines neuroimaging, diverse clinical and cognitive phenotypes, and genomics. Data from this rich resource is now publicly available through the Database of Genotypes and Phenotypes (dbGaP). Here we focus on the data from the PNC that is available through dbGaP and describe how users can access this data, which is evolving to be a significant resource for the broader neuroscience community for studies of normal and abnormal neurodevelopment.
Subject(s)
Brain/abnormalities , Brain/growth & development , Developmental Disabilities/pathology , Developmental Disabilities/psychology , Information Dissemination , Nervous System/growth & development , Adolescent , Child , Child Development , Cognition , Female , Genomics , Humans , Internet , Male , Neuroimaging
ABSTRACT
BACKGROUND: As biobanks play an increasing role in the genomic research that will lead to precision medicine, input from diverse and large populations of patients in a variety of health care settings will be important in order to successfully carry out such studies. One important topic is participants' views towards consent and data sharing, especially since the 2011 Advance Notice of Proposed Rulemaking (ANPRM), and subsequently the 2015 Notice of Proposed Rulemaking (NPRM) were issued by the Department of Health and Human Services (HHS) and Office of Science and Technology Policy (OSTP). These notices required that participants consent to research uses of their de-identified tissue samples and most clinical data, and allowed such consent to be obtained in a one-time, open-ended or "broad" fashion. Conducting a survey across multiple sites provides clear advantages over either a single-site survey or using a large online database, and is a potentially powerful way of understanding the views of diverse populations on this topic. METHODS: A workgroup of the Electronic Medical Records and Genomics (eMERGE) Network, a national consortium of 9 sites (13 separate institutions, 11 clinical centers) supported by the National Human Genome Research Institute (NHGRI) that combines DNA biorepositories with electronic medical record (EMR) systems for large-scale genetic research, conducted a survey to understand patients' views on consent, sample and data sharing for future research, biobank governance, data protection, and return of research results. RESULTS: Working across 9 sites to design and conduct a national survey presented challenges in organization, meeting human subjects guidelines at each institution, and survey development and implementation. The challenges were met through a committee structure to address each aspect of the project with representatives from all sites. Each committee's output was integrated into the overall survey plan.
A number of site-specific issues were successfully managed allowing the survey to be developed and implemented uniformly across 11 clinical centers. CONCLUSIONS: Conducting a survey across a number of institutions with different cultures and practices is a methodological and logistical challenge. With a clear infrastructure, collaborative attitudes, excellent lines of communication, and the right expertise, this can be accomplished successfully.
Subject(s)
Confidentiality , Electronic Health Records/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Information Dissemination/methods , Surveys and Questionnaires , Humans , Informed Consent , National Human Genome Research Institute (U.S.) , Patient Participation , Patient Rights , United States
ABSTRACT
BACKGROUND: Ehlers-Danlos syndrome (EDS) is a rare inherited connective tissue disorder that primarily affects skin, joints, muscle, and blood cells. The current study aimed to find the mutation causing EDS type VII C, also known as "dermatosparaxis", in this family. METHODS: Through systematic data querying of the electronic medical records (EMRs) of over 80,000 individuals, we recently identified an EDS family whose pedigree indicates autosomal dominant inheritance. The family was consented for genomic analysis of their de-identified data. After a negative screen for known mutations, we performed whole genome sequencing on the male proband, his affected father, and unaffected mother. We filtered the list of non-synonymous variants that are common between the affected individuals. RESULTS: The analysis of non-synonymous variants led to the identification of a novel mutation (p.Gly421Ser) in the ADAMTSL2 gene in the affected individuals. Sanger sequencing confirmed the mutation. CONCLUSION: Our work is significant not only because it sheds new light on the pathophysiology of EDS for the affected family and the field at large, but also because it demonstrates the utility of unbiased large-scale clinical recruitment in deciphering the genetic etiology of rare Mendelian diseases. With unbiased large-scale clinical recruitment we strive to sequence as many rare Mendelian diseases as possible, and this work in EDS serves as a successful proof of concept to that effect.
Subject(s)
ADAM Proteins/genetics , Data Mining/methods , Databases, Genetic , Ehlers-Danlos Syndrome/genetics , Genetic Variation/genetics , ADAMTS Proteins , Child , Ehlers-Danlos Syndrome/diagnosis , Female , Humans , Male , Pedigree
ABSTRACT
BACKGROUND: Systemic sclerosis (SSc) is a rheumatologic disease with a multifactorial etiology. Genome-wide association studies imply a polygenic, complex mode of inheritance with contributions from variation at the human leukocyte antigen locus and non-coding variation at a locus on chromosome 6p21, among other modestly impactful loci. Here we describe an 8-year-old female proband presenting with diffuse cutaneous SSc/scleroderma and a family history of SSc in a grandfather and maternal aunt. METHODS: We employed whole exome sequencing (WES) of three members of this family. We examined rare missense, nonsense, splice-altering, and coding indels matching an autosomal dominant inheritance model. We selected one missense variant for Sanger sequencing confirmation based on its predicted impact on gene function and location in a known SSc genetic locus. RESULTS: Bioinformatic analysis found eight candidate variants meeting our criteria. We identified a very rare missense variant in the regulatory NODP domain of NOTCH4 located at the 6p21 locus, c.4245G>A:p.Met1415Ile, segregating with the phenotype. This allele has a frequency of 1.83 × 10⁻⁵ by the data of the Exome Aggregation Consortium. CONCLUSION: This family suggests a novel mechanism of SSc pathogenesis in which a rare and penetrant coding variation can substantially elevate disease risk in contrast to the more modest non-coding variation typically found at this locus. These results suggest that modulation of the NOTCH4 gene might be responsible for the association signal at chromosome 6p21 in SSc.