1.
Genome Res ; 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358015

ABSTRACT

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
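The read N50 quoted in this abstract (54 kbp) is a length-weighted summary: reads at least that long together contain at least half of all sequenced bases. A minimal sketch of the calculation follows, assuming read lengths are already available as a list of integers. The function name `read_n50` and the toy lengths are illustrative and do not come from the consortium's pipeline.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (not real data): 30 bases in total, so the
# cumulative sum first reaches 15 bases at the read of length 6.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(read_n50(toy_lengths))
```

Because long reads are weighted by the bases they contribute, N50 is usually much larger than the median read length for nanopore data.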

2.
Am J Hum Genet ; 111(10): 2129-2138, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39270648

ABSTRACT

Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries and the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on the genotype. This method provides a convenient tool for estimating HWE in the presence of population structure and missing self-reported race and ethnicities in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method on the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.


Subject(s)
Supervised Machine Learning , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genome, Human , Genetics, Population/methods , Ethnicity/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genotype
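The reason pooled HWE testing fails on ancestrally mixed samples can be shown with a small sketch. It uses a one-degree-of-freedom chi-square test on genotype counts, which is an assumption made for illustration: the abstract does not say which HWE test the authors used, nor how per-group results are aggregated, so the code only returns one p-value per group. The group labels and counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_by_group(counts_by_group):
    """Test HWE separately within each (predicted) ancestry group."""
    return {g: hwe_chi2_pvalue(*c) for g, c in counts_by_group.items()}


# Hypothetical counts (AA, AB, BB): each group is in equilibrium on
# its own, but the allele frequencies differ between groups.
groups = {"group_1": (81, 18, 1), "group_2": (1, 18, 81)}
pooled = tuple(sum(c) for c in zip(*groups.values()))

print(hwe_by_group(groups))      # large p-values within each group
print(hwe_chi2_pvalue(*pooled))  # tiny p-value when groups are pooled
```

Each group passes on its own, yet the pooled sample shows a strong heterozygote deficit. A variant like this would be wrongly flagged as low quality if ancestry were ignored.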
3.
bioRxiv ; 2024 Sep 22.
Article in English | MEDLINE | ID: mdl-39345378

ABSTRACT

The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line (HG008-T) and matched normal cells from duodenal tissue (HG008-N-D) and pancreatic tissue (HG008-N-P). The data come from thirteen whole genome measurement technologies: Illumina paired-end, Element standard and long insert, Ultima UG100, PacBio (HiFi and Onso), Oxford Nanopore (standard and ultra-long), Bionano Optical Mapping, Arima and Phase Genomics Hi-C, G-banded karyotyping, directional genomic hybridization, and BioSkryb Genomics single-cell ResolveDNA. Most tumor data is from a large homogenous batch of non-viable cells after 23 passages of the primary tumor cells, along with some data from different passages to enable an initial understanding of genomic instability. These data will be used by the GIAB Consortium to develop matched tumor-normal benchmarks for somatic variant detection. In addition, extensive data from two different normal tissues from the same individual can enable understanding of mosaicism. Long reads also contain methylation tags for epigenetic analyses. We expect these data to facilitate innovation for whole genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic mutations. This first-of-its-kind broadly consented open-access resource will facilitate further understanding of sequencing methods used for cancer biology.

4.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496498

ABSTRACT

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

5.
Commun Biol ; 4(1): 1026, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34471188

ABSTRACT

Autism arises in high and low-risk families. De novo mutation contributes to autism incidence in low-risk families as there is a higher incidence in the affected of the simplex families than in their unaffected siblings. But the extent of contribution in low-risk families cannot be determined solely from simplex families as they are a mixture of low and high-risk. The rate of de novo mutation in nearly pure populations of high-risk families, the multiplex families, has not previously been rigorously determined. Moreover, rates of de novo mutation have been underestimated from studies based on low resolution microarrays and whole exome sequencing. Here we report on findings from whole genome sequence (WGS) of both simplex families from the Simons Simplex Collection (SSC) and multiplex families from the Autism Genetic Resource Exchange (AGRE). After removing the multiplex samples with excessive cell-line genetic drift, we find that the contribution of de novo mutation in multiplex is significantly smaller than the contribution in simplex. We use WGS to provide high resolution CNV profiles and to analyze more than coding regions, and revise upward the rate in simplex autism due to an excess of de novo events targeting introns. Based on this study, we now estimate that de novo events contribute to 52-67% of cases of autism arising from low risk families, and 30-39% of cases of all autism.


Subject(s)
Autistic Disorder/epidemiology , Genetic Predisposition to Disease/genetics , Mutation , Adult , Autism Spectrum Disorder , Autistic Disorder/genetics , Female , Humans , Incidence , Male , Middle Aged , New York/epidemiology , Risk Factors , Young Adult
7.
Nature ; 583(7814): 83-89, 2020 07.
Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Subject(s)
Genetic Variation , Genome, Human/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Epigenesis, Genetic , Female , Gene Dosage/genetics , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software
8.
Genetics ; 215(3): 869-886, 2020 07.
Article in English | MEDLINE | ID: mdl-32327564

ABSTRACT

Baseline lung function, quantified as forced expiratory volume in the first second of exhalation (FEV1), is a standard diagnostic criterion used by clinicians to identify and classify lung diseases. Using whole-genome sequencing data from the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine project, we identified a novel genetic association with FEV1 on chromosome 12 in 867 African American children with asthma (P = 1.26 × 10⁻⁸, β = 0.302). Conditional analysis within 1 Mb of the tag signal (rs73429450) yielded one major and two other weaker independent signals within this peak. We explored statistical and functional evidence for all variants in linkage disequilibrium with the three independent signals and yielded nine variants as the most likely candidates responsible for the association with FEV1. Hi-C data and expression QTL analysis demonstrated that these variants physically interacted with KITLG (KIT ligand, also known as SCF), and their minor alleles were associated with increased expression of the KITLG gene in nasal epithelial cells. Gene-by-air-pollution interaction analysis found that the candidate variant rs58475486 interacted with past-year ambient sulfur dioxide exposure (P = 0.003, β = 0.32). This study identified a novel protective genetic association with FEV1, possibly mediated through KITLG, in African American children with asthma. This is the first study that has identified a genetic association between lung function and KITLG, which has established a role in orchestrating allergic inflammation in asthma.


Subject(s)
Air Pollution , Asthma/genetics , Forced Expiratory Volume , Gene-Environment Interaction , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Stem Cell Factor/genetics , Adolescent , Black or African American/genetics , Asthma/epidemiology , Asthma/physiopathology , Child , Chromosomes, Human, Pair 12/genetics , Female , Humans , Linkage Disequilibrium , Male , Nasal Mucosa/metabolism , Stem Cell Factor/metabolism , Young Adult
10.
BMC Med Genomics ; 12(1): 56, 2019 04 25.
Article in English | MEDLINE | ID: mdl-31023376

ABSTRACT

BACKGROUND: Prompted by the revolution in high-throughput sequencing and its potential impact for treating cancer patients, we initiated a clinical research study to compare the ability of different sequencing assays and analysis methods to analyze glioblastoma tumors and generate real-time potential treatment options for physicians. METHODS: A consortium of seven institutions in New York City enrolled 30 patients with glioblastoma and performed tumor whole genome sequencing (WGS) and RNA sequencing (RNA-seq; collectively WGS/RNA-seq); 20 of these patients were also analyzed with independent targeted panel sequencing. We also compared results of expert manual annotations with those from an automated annotation system, Watson Genomic Analysis (WGA), to assess the reliability and time required to identify potentially relevant pharmacologic interventions. RESULTS: WGS/RNA-seq identified more potentially actionable clinical results than targeted panels in 90% of cases, with an average of 16-fold more unique potentially actionable variants identified per individual; 84 clinically actionable calls were made using WGS/RNA-seq that were not identified by panels. Expert annotation and WGA had good agreement on identifying variants [mean sensitivity = 0.71, SD = 0.18 and positive predictive value (PPV) = 0.80, SD = 0.20] and drug targets when the same variants were called (mean sensitivity = 0.74, SD = 0.34 and PPV = 0.79, SD = 0.23) across patients. Clinicians used the information to modify their treatment plan 10% of the time. CONCLUSION: These results present the first comprehensive comparison of technical and machine augmented analysis of targeted panel and WGS/RNA-seq to identify potential cancer treatments.


Subject(s)
Glioblastoma/drug therapy , Glioblastoma/genetics , Whole Genome Sequencing , Adult , Aged , Aged, 80 and over , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Molecular Targeted Therapy , Ploidies , Reproducibility of Results
12.
Genome Res ; 27(11): 1895-1903, 2017 11.
Article in English | MEDLINE | ID: mdl-28887402

ABSTRACT

Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , DNA Repeat Expansion , Whole Genome Sequencing/methods , Algorithms , C9orf72 Protein/genetics , Databases, Genetic , Humans , Precision Medicine , Sensitivity and Specificity , Software
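The interval reported for the expanded samples (212/212, 95% CI [0.98, 1.00]) has a lower limit below 1 even though every sample was classified correctly. One way such a limit can arise is the exact (Clopper-Pearson) binomial interval, which has a closed form when all trials succeed. This is an assumption for illustration only: the abstract does not state which interval method the authors used, and the function name is hypothetical.

```python
def exact_lower_bound_all_successes(n, confidence=0.95):
    """Lower Clopper-Pearson limit when all n trials succeed.

    With x = n successes the two-sided exact interval is
    [(alpha / 2) ** (1 / n), 1], where alpha = 1 - confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1 - confidence
    return (alpha / 2) ** (1 / n)


# 212 of 212 expanded samples were flagged in the abstract above.
print(round(exact_lower_bound_all_successes(212), 2))
```

Under this assumption the lower limit rounds to 0.98, which agrees with the interval quoted in the abstract. The wild-type interval (2786/2789) has no such shortcut and would need a beta quantile function.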
13.
Nat Genet ; 49(7): 1005-1014, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28504702

ABSTRACT

Genomic rearrangements are a hallmark of human cancers. Here, we identify the piggyBac transposable element derived 5 (PGBD5) gene as encoding an active DNA transposase expressed in the majority of childhood solid tumors, including lethal rhabdoid tumors. Using assembly-based whole-genome DNA sequencing, we found previously undefined genomic rearrangements in human rhabdoid tumors. These rearrangements involved PGBD5-specific signal (PSS) sequences at their breakpoints and recurrently inactivated tumor-suppressor genes. PGBD5 was physically associated with genomic PSS sequences that were also sufficient to mediate PGBD5-induced DNA rearrangements in rhabdoid tumor cells. Ectopic expression of PGBD5 in primary immortalized human cells was sufficient to promote cell transformation in vivo. This activity required specific catalytic residues in the PGBD5 transposase domain as well as end-joining DNA repair and induced structural rearrangements with PSS breakpoints. These results define PGBD5 as an oncogenic mutator and provide a plausible mechanism for site-specific DNA rearrangements in childhood and adult solid tumors.


Subject(s)
Cell Transformation, Neoplastic/genetics , Rhabdoid Tumor/genetics , Transposases/physiology , Adult , Animals , Catalytic Domain , Cell Line , Child , Child, Preschool , Chromosome Aberrations , Chromosome Breakpoints , DNA End-Joining Repair/genetics , DNA, Neoplasm/genetics , Gene Rearrangement/genetics , Genes, Tumor Suppressor , Humans , Infant , Mice , Mice, Nude , Mutagenesis, Site-Directed , RNA Interference , Recombinant Proteins/metabolism , Regulatory Sequences, Nucleic Acid , Terminal Repeat Sequences/genetics , Transposases/chemistry , Transposases/genetics
14.
Nature ; 539(7627): 112-117, 2016 11 03.
Article in English | MEDLINE | ID: mdl-27595394

ABSTRACT

Clear cell renal cell carcinoma (ccRCC) is characterized by inactivation of the von Hippel-Lindau tumour suppressor gene (VHL). Because no other gene is mutated as frequently in ccRCC and VHL mutations are truncal, VHL inactivation is regarded as the governing event. VHL loss activates the HIF-2 transcription factor, and constitutive HIF-2 activity restores tumorigenesis in VHL-reconstituted ccRCC cells. HIF-2 has been implicated in angiogenesis and multiple other processes, but angiogenesis is the main target of drugs such as the tyrosine kinase inhibitor sunitinib. HIF-2 has been regarded as undruggable. Here we use a tumourgraft/patient-derived xenograft platform to evaluate PT2399, a selective HIF-2 antagonist that was identified using a structure-based design approach. PT2399 dissociated HIF-2 (an obligatory heterodimer of HIF-2α-HIF-1β) in human ccRCC cells and suppressed tumorigenesis in 56% (10 out of 18) of such lines. PT2399 had greater activity than sunitinib, was active in sunitinib-progressing tumours, and was better tolerated. Unexpectedly, some VHL-mutant ccRCCs were resistant to PT2399. Resistance occurred despite HIF-2 dissociation in tumours and evidence of Hif-2 inhibition in the mouse, as determined by suppression of circulating erythropoietin, a HIF-2 target and possible pharmacodynamic marker. We identified a HIF-2-dependent gene signature in sensitive tumours. Gene expression was largely unaffected by PT2399 in resistant tumours, illustrating the specificity of the drug. Sensitive tumours exhibited a distinguishing gene expression signature and generally higher levels of HIF-2α. Prolonged PT2399 treatment led to resistance. We identified binding site and second site suppressor mutations in HIF-2α and HIF-1β, respectively. Both mutations preserved HIF-2 dimers despite treatment with PT2399. Finally, an extensively pretreated patient whose tumour had given rise to a sensitive tumourgraft showed disease control for more than 11 months when treated with a close analogue of PT2399, PT2385. We validate HIF-2 as a target in ccRCC, show that some ccRCCs are HIF-2 independent, and set the stage for biomarker-driven clinical trials.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/antagonists & inhibitors , Carcinoma, Renal Cell/drug therapy , Carcinoma, Renal Cell/metabolism , Indans/pharmacology , Indans/therapeutic use , Kidney Neoplasms/drug therapy , Kidney Neoplasms/metabolism , Sulfones/pharmacology , Sulfones/therapeutic use , Animals , Aryl Hydrocarbon Receptor Nuclear Translocator/genetics , Aryl Hydrocarbon Receptor Nuclear Translocator/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/metabolism , Binding Sites , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Cell Line, Tumor , Cell Transformation, Neoplastic , Drug Resistance, Neoplasm/drug effects , Erythropoietin/antagonists & inhibitors , Erythropoietin/blood , Female , Gene Expression Regulation, Neoplastic , Humans , Indans/administration & dosage , Indoles/pharmacology , Indoles/therapeutic use , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Male , Mice , Mice, Inbred NOD , Mice, SCID , Molecular Targeted Therapy , Mutation , Pyrroles/pharmacology , Pyrroles/therapeutic use , Reproducibility of Results , Sulfones/administration & dosage , Sunitinib , Xenograft Model Antitumor Assays
15.
J Nurs Adm ; 45(2): 74-83, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25621749

ABSTRACT

An academic hospital used Transforming Care at the Bedside (TCAB) principles as the framework for generating evidence-based recommendations for the design of an expansion of the current hospital. The interdisciplinary team used the table of evidence-based data to advocate for a patient- and family-centered, safe, and positive work environment. A nurse project manager acted as liaison between the TCAB design team, architects, and facilities and design consultants. Part 2 of this series describes project evaluation outcomes.


Subject(s)
Evidence-Based Medicine , Health Facility Environment/standards , Hospital Design and Construction/standards , Nursing Staff, Hospital/organization & administration , Occupational Health/standards , Patient Safety/standards , Quality Assurance, Health Care/standards , Academic Medical Centers , Health Facility Environment/economics , Hospital Design and Construction/economics , Humans , Interdisciplinary Communication , Interinstitutional Relations , Interprofessional Relations , Leadership , Nursing Staff, Hospital/standards , Patient Handoff/organization & administration , Patient Handoff/standards
16.
Mol Genet Metab ; 104(1-2): 160-6, 2011.
Article in English | MEDLINE | ID: mdl-21700483

ABSTRACT

X-linked adrenoleukodystrophy (X-ALD) is a progressive peroxisomal disorder affecting adrenal glands, testes and myelin stability that is caused by mutations in the ABCD1 (NM_000033) gene. Males with X-ALD may be diagnosed by the demonstration of elevated very long chain fatty acid (VLCFA) levels in plasma. In contrast, only 80% of female carriers have elevated plasma VLCFA; therefore targeted mutation analysis is the most effective means for carrier detection. Amongst 489 X-ALD families tested at Kennedy Krieger Institute, we identified 20 cases in which the ABCD1 mutation was de novo in the index case, indicating that the mutation arose in the maternal germ line and supporting a new mutation rate of at least 4.1% for this group. In addition, we identified 10 cases in which a de novo mutation arose in the mother or the grandmother of the index case. In two of these cases studies indicated that the mothers were low level gonosomal mosaics. In a third case biochemical, molecular and pedigree analysis indicated the mother was a gonadal mosaic. To the best of our knowledge mosaicism has not been previously reported in X-ALD. In addition, we identified one pedigree in which the maternal grandfather was mosaic for the familial ABCD1 mutation. Less than 1% of our patient population had evidence of gonadal or gonosomal mosaicism, suggesting it is a rare occurrence for this gene and its associated disorders. However, the residual maternal risk for having additional ovum carrying the mutant allele identified in an index case that appears to have a de novo mutation is at least 13%.


Subject(s)
ATP-Binding Cassette Transporters/genetics , Adrenoleukodystrophy/genetics , Mosaicism , Mutation/genetics , ATP Binding Cassette Transporter, Subfamily D, Member 1 , Base Sequence , Child , Child, Preschool , DNA Mutational Analysis , Exons/genetics , Family , Fatal Outcome , Female , Gonads/pathology , Heterozygote , Humans , Male , Molecular Sequence Data