1.
bioRxiv ; 2024 Jan 28.
Article in English | MEDLINE | ID: mdl-38293151

ABSTRACT

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge, under the MIT license on GitHub (https://github.com/nf-core/airrflow). 
Detailed documentation and example results are available on the nf-core website (https://nf-co.re/airrflow).

2.
Appl Environ Microbiol ; 89(6): e0007923, 2023 06 28.
Article in English | MEDLINE | ID: mdl-37191555

ABSTRACT

Bacteriophages have received recent attention for their therapeutic potential to treat antibiotic-resistant bacterial infections. One particular idea in phage therapy is to use phages that not only directly kill their bacterial hosts but also rely on particular bacterial receptors, such as proteins involved in virulence or antibiotic resistance. In such cases, the evolution of phage resistance would correspond to the loss of those receptors, an approach termed evolutionary steering. We previously found that during experimental evolution, phage U136B can exert selection pressure on Escherichia coli to lose or modify its receptor, the antibiotic efflux protein TolC, often resulting in reduced antibiotic resistance. However, for TolC-reliant phages like U136B to be used therapeutically, we also need to study their own evolutionary potential. Understanding phage evolution is critical for the development of improved phage therapies as well as the tracking of phage populations during infection. Here, we characterized phage U136B evolution in 10 replicate experimental populations. We quantified phage dynamics that resulted in five surviving phage populations at the end of the 10-day experiment. We found that phages from all five surviving populations had evolved higher rates of adsorption on either ancestral or coevolved E. coli hosts. Using whole-genome and whole-population sequencing, we established that these higher rates of adsorption were associated with parallel molecular evolution in phage tail protein genes. These findings will be useful in future studies to predict how key phage genotypes and phenotypes influence phage efficacy and survival despite the evolution of host resistance. IMPORTANCE Antibiotic resistance is a persistent problem in health care and a factor that may help maintain bacterial diversity in natural environments. Bacteriophages ("phages") are viruses that specifically infect bacteria. 
We previously discovered and characterized a phage called U136B, which infects bacteria through TolC. TolC is an antibiotic resistance protein that helps bacteria pump antibiotics out of the cell. Over short timescales, phage U136B can be used to evolutionarily "steer" bacterial populations to lose or modify the TolC protein, sometimes reducing antibiotic resistance. In this study, we investigate whether U136B itself evolves to better infect bacterial cells. We discovered that the phage can readily evolve specific mutations that increase its infection rate. This work will be useful for understanding how phages can be used to treat bacterial infections.


Subject(s)
Bacteriophages , Bacteriophages/genetics , Escherichia coli/genetics , Adsorption , Mutation , Anti-Bacterial Agents/pharmacology
3.
BMC Med Genomics ; 13(Suppl 7): 78, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32693796

ABSTRACT

BACKGROUND: Genomic variants are considered sensitive information, revealing potentially private facts about individuals. Therefore, it is important to control access to such data. A key aspect of controlled access is secure storage and efficient query of access logs, so that potential misuse can be detected. However, there are challenges to securing logs, such as designing against the consequences of "single points of failure". A potential approach to circumvent these challenges is blockchain technology, which is currently popular in cryptocurrency due to its properties of security, immutability, and decentralization. One of the tasks of the iDASH (Integrating Data for Analysis, Anonymization, and Sharing) Secure Genome Analysis Competition in 2018 was to develop time- and space-efficient blockchain-based ledgering solutions to log and query user activity accessing genomic datasets across multiple sites, using MultiChain. METHODS: MultiChain is a specific blockchain platform that offers "data streams" embedded in the chain for rapid and secure data storage. We devised a storage protocol taking advantage of the keys in the MultiChain data streams and created a data frame from the chain allowing efficient query. Our solution to the iDASH competition was selected as the winner at a workshop held in San Diego, CA in October 2018. Although our solution worked well in the challenge, it has the drawback that it requires downloading all the data from the chain and keeping it locally in memory for fast query. To address this, we provide an alternate "bigmem" solution that uses indices rather than local storage for rapid queries. RESULTS: We profiled the performance of both of our solutions using logs with 100,000 to 600,000 entries, both for querying the chain and inserting data into it. The challenge solution requires 12 seconds and 120 MB of memory for querying from 100,000 entries. The memory requirement increases linearly and reaches 470 MB for a chain with 600,000 entries.
Although our alternate bigmem solution is slower and requires more memory (408 seconds and 250 MB, respectively, for 100,000 entries), the memory requirement increases at a slower rate and reaches only 360 MB for 600,000 entries. CONCLUSION: Overall, we demonstrate that genomic access log files can be stored and queried efficiently with blockchain. Beyond this, our protocol potentially could be applied to other types of health data such as electronic health records.
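The reported memory figures allow a rough comparison of how the two solutions scale. The sketch below is a back-of-the-envelope calculation, not part of the authors' software. It assumes memory grows linearly between the two reported data points for each solution (the abstract states linear growth only for the challenge solution), and the function names are illustrative.

```python
# Reported memory use (MB) at 100,000 and 600,000 log entries, from the abstract.
CHALLENGE = {100_000: 120.0, 600_000: 470.0}
BIGMEM = {100_000: 250.0, 600_000: 360.0}


def slope_mb_per_entry(points):
    """Memory growth per entry, assuming a straight line through two points."""
    (n0, m0), (n1, m1) = sorted(points.items())
    return (m1 - m0) / (n1 - n0)


def crossover_entries(a, b):
    """Entry count at which two linear memory models use equal memory."""
    sa, sb = slope_mb_per_entry(a), slope_mb_per_entry(b)
    n0 = min(a)  # both models are anchored at the same smallest entry count
    return n0 + (b[n0] - a[n0]) / (sa - sb)


challenge_slope = slope_mb_per_entry(CHALLENGE)
bigmem_slope = slope_mb_per_entry(BIGMEM)
crossover = crossover_entries(CHALLENGE, BIGMEM)

print(f"challenge: {challenge_slope * 1000:.2f} MB per 1,000 entries")
print(f"bigmem:    {bigmem_slope * 1000:.2f} MB per 1,000 entries")
print(f"estimated crossover: about {crossover:,.0f} entries")
```

Under these assumptions, the challenge solution grows by about 0.70 MB per 1,000 entries and the bigmem solution by about 0.22 MB, so bigmem would use less memory beyond roughly 370,000 entries. This is consistent with the reported 360 MB versus 470 MB at 600,000 entries.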


Subject(s)
Blockchain , Datasets as Topic , Genomics , Information Storage and Retrieval , Humans
4.
Neuron ; 99(2): 302-314.e4, 2018 07 25.
Article in English | MEDLINE | ID: mdl-29983323

ABSTRACT

Congenital hydrocephalus (CH), featuring markedly enlarged brain ventricles, is thought to arise from failed cerebrospinal fluid (CSF) homeostasis and is treated with lifelong surgical CSF shunting with substantial morbidity. CH pathogenesis is poorly understood. Exome sequencing of 125 CH trios and 52 additional probands identified three genes with significant burden of rare damaging de novo or transmitted mutations: TRIM71 (p = 2.15 × 10⁻⁷), SMARCC1 (p = 8.15 × 10⁻¹⁰), and PTCH1 (p = 1.06 × 10⁻⁶). Additionally, two de novo duplications were identified at the SHH locus, encoding the PTCH1 ligand (p = 1.2 × 10⁻⁴). Together, these probands account for ∼10% of studied cases. Strikingly, all four genes are required for neural tube development and regulate ventricular zone neural stem cell fate. These results implicate impaired neurogenesis (rather than active CSF accumulation) in the pathogenesis of a subset of CH patients, with potential diagnostic, prognostic, and therapeutic ramifications.


Subject(s)
Hydrocephalus/diagnosis , Hydrocephalus/genetics , Mutation/genetics , Neural Stem Cells/physiology , Cohort Studies , Exome/genetics , Female , Humans , Male , Neural Stem Cells/pathology , Patched-1 Receptor/genetics , Pedigree , Transcription Factors/genetics , Exome Sequencing/methods
5.
Nat Genet ; 49(11): 1593-1601, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28991257

ABSTRACT

Congenital heart disease (CHD) is the leading cause of mortality from birth defects. Here, exome sequencing of a single cohort of 2,871 CHD probands, including 2,645 parent-offspring trios, implicated rare inherited mutations in 1.8%, including a recessive founder mutation in GDF1 accounting for ∼5% of severe CHD in Ashkenazim, recessive genotypes in MYH6 accounting for ∼11% of Shone complex, and dominant FLT4 mutations accounting for 2.3% of Tetralogy of Fallot. De novo mutations (DNMs) accounted for 8% of cases, including ∼3% of isolated CHD patients and ∼28% with both neurodevelopmental and extra-cardiac congenital anomalies. Seven genes surpassed thresholds for genome-wide significance, and 12 genes not previously implicated in CHD had >70% probability of being disease related. DNMs in ∼440 genes were inferred to contribute to CHD. Striking overlap between genes with damaging DNMs in probands with CHD and autism was also found.


Subject(s)
Autistic Disorder/genetics , Cardiac Myosins/genetics , Genetic Predisposition to Disease , Growth Differentiation Factor 1/genetics , Heart Defects, Congenital/genetics , Myosin Heavy Chains/genetics , Vascular Endothelial Growth Factor Receptor-3/genetics , Adult , Autistic Disorder/pathology , Case-Control Studies , Child , Exome , Female , Gene Expression , Genome-Wide Association Study , Heart Defects, Congenital/pathology , Heterozygote , High-Throughput Nucleotide Sequencing , Homozygote , Humans , Male , Mutation , Pedigree , Risk
6.
Sci Rep ; 7(1): 4287, 2017 06 27.
Article in English | MEDLINE | ID: mdl-28655895

ABSTRACT

Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descendants has resulted in unintended reduction of power in studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred 22.8% increased variant filtering power in addition to Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variant supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.


Subject(s)
Asian People/genetics , Databases, Genetic , Genetic Variation , Genetics, Population , DNA Copy Number Variations , Exome , Genetic Predisposition to Disease , Germ-Line Mutation , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide , Republic of Korea
7.
Am J Hum Genet ; 100(4): 581-591, 2017 Apr 06.
Article in English | MEDLINE | ID: mdl-28285767

ABSTRACT

Efforts to decipher the causal relationships between differences in gene regulation and corresponding differences in phenotype have been stymied by several basic technical challenges. Although detecting local, cis-eQTLs is now routine, trans-eQTLs, which are distant from the genes of origin, are far more difficult to find because millions of SNPs must currently be compared to thousands of transcripts. Here, we demonstrate an alternative approach: we looked for SNPs associated with the expression of many genes simultaneously and found that hundreds of trans-eQTLs each affect hundreds of transcripts in lymphoblastoid cell lines across three African populations. These trans-eQTLs target the same genes across the three populations and show the same direction of effect. We discovered that target transcripts of a high-confidence set of trans-eQTLs encode proteins that interact more frequently than expected by chance, are bound by the same transcription factors, and are enriched for pathway annotations indicative of roles in basic cell homeostasis. We thus demonstrate that our approach can uncover trans-acting transcriptional control circuits that affect co-regulated groups of genes: a key to understanding how cellular pathways and processes are orchestrated.


Subject(s)
Gene Expression Regulation , Quantitative Trait Loci , Transcription, Genetic , Algorithms , Black People/genetics , Cell Line , Gene Expression Profiling , HapMap Project , Humans , Polymorphism, Single Nucleotide , Protein Interaction Maps
8.
Nat Genet ; 47(9): 1011-9, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26192916

ABSTRACT

Cutaneous T cell lymphoma (CTCL) is a non-Hodgkin lymphoma of skin-homing T lymphocytes. We performed exome and whole-genome DNA sequencing and RNA sequencing on purified CTCL and matched normal cells. The results implicate mutations in 17 genes in CTCL pathogenesis, including genes involved in T cell activation and apoptosis, NF-κB signaling, chromatin remodeling and DNA damage response. CTCL is distinctive in that somatic copy number variants (SCNVs) comprise 92% of all driver mutations (mean of 11.8 pathogenic SCNVs versus 1.0 somatic single-nucleotide variant per CTCL). These findings have implications for new therapeutics.
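The 92% figure can be checked against the reported per-tumor means. This is a minimal arithmetic sketch, assuming the percentage is simply the SCNV share of the mean driver count per tumor; the abstract does not state how the figure was computed.

```python
# Mean driver events per CTCL, as reported in the abstract.
mean_scnv = 11.8  # pathogenic somatic copy number variants
mean_ssnv = 1.0   # somatic single-nucleotide variants

# Assumed definition: SCNV share = SCNVs / (SCNVs + SSNVs).
scnv_share = mean_scnv / (mean_scnv + mean_ssnv)
print(f"SCNV share of driver mutations: {scnv_share:.0%}")
```

Under that assumption the share is 11.8 / 12.8, which rounds to the reported 92%.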


Subject(s)
Lymphoma, T-Cell, Cutaneous/genetics , Skin Neoplasms/genetics , DNA Copy Number Variations , DNA Mutational Analysis , Exome , Gene Expression , Gene Frequency , Genetic Association Studies , Genomics , Humans , Lymphoma, T-Cell, Cutaneous/metabolism , Mutation, Missense , Polymorphism, Single Nucleotide , Tumor Cells, Cultured
9.
Genomics Proteomics Bioinformatics ; 13(1): 25-35, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25712262

ABSTRACT

We report a significantly enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research at scales ranging from a single laboratory, to a group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.


Subject(s)
Chromatography, Liquid/methods , Computational Biology/methods , Databases, Protein , Peptide Fragments/analysis , Proteome/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Humans
10.
DNA Repair (Amst) ; 26: 44-53, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25547252

ABSTRACT

Efficient DNA double-strand break (DSB) repair is a critical determinant of cell survival in response to DNA damaging agents, and it plays a key role in the maintenance of genomic integrity. Homologous recombination (HR) and non-homologous end-joining (NHEJ) represent the two major pathways by which DSBs are repaired in mammalian cells. We now understand that HR and NHEJ repair are composed of multiple sub-pathways, some of which still remain poorly understood. As such, there is great interest in the development of novel assays to interrogate these key pathways, which could lead to the development of novel therapeutics, and a better understanding of how DSBs are repaired. Furthermore, assays which can measure repair specifically at endogenous chromosomal loci are of particular interest, because of an emerging understanding that chromatin interactions heavily influence DSB repair pathway choice. Here, we present the design and validation of a novel, next-generation sequencing-based approach to study DSB repair at chromosomal loci in cells. We demonstrate that NHEJ repair "fingerprints" can be identified using our assay, which are dependent on the status of key DSB repair proteins. In addition, we have validated that our system can be used to detect dynamic shifts in DSB repair activity in response to specific perturbations. This approach represents a unique alternative to many currently available DSB repair assays, which typically rely on the expression of reporter genes as an indirect read-out for repair. As such, we believe this tool will be useful for DNA repair researchers to study NHEJ repair in a high-throughput and sensitive manner, with the capacity to detect subtle changes in DSB repair patterns that was not possible previously.


Subject(s)
DNA Breaks, Double-Stranded , DNA End-Joining Repair , DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Animals , Chromatin/metabolism , DNA/metabolism , DNA-Binding Proteins/metabolism , Genetic Loci , Humans , INDEL Mutation , Mammals , Recombinational DNA Repair
11.
Genome Biol Evol ; 6(10): 2811-9, 2014 Oct 05.
Article in English | MEDLINE | ID: mdl-25287146

ABSTRACT

The Trypanosoma brucei complex contains a number of subspecies with exceptionally variable life histories, including zoonotic subspecies, which are causative agents of human African trypanosomiasis (HAT) in sub-Saharan Africa. Paradoxically, genomic variation between taxa is extremely low. We analyzed the whole-genome sequences of 39 isolates across the T. brucei complex from diverse hosts and regions, identifying 608,501 single nucleotide polymorphisms that represent 2.33% of the nuclear genome. We show that human pathogenicity occurs across a wide range of parasite genotypes, and taxonomic designation does not reflect genetic variation across the group, as previous studies have suggested based on a small number of genes. This genome-wide study allowed the identification of significant host and geographic location associations. Strong purifying selection was detected in genomic regions associated with cytoskeleton structure, and regulatory genes associated with antigenic variation, suggesting conservation of these regions in African trypanosomes. In agreement with expectations drawn from meiotic reciprocal recombination, differences in average linkage disequilibrium between chromosomes in T. brucei correlate positively with chromosome size. In addition to insights into the life history of a diverse group of eukaryotic parasites, the documentation of genomic variation across the T. brucei complex and its association with specific hosts and geographic localities will aid in the development of comprehensive monitoring tools crucial to the proposed elimination of HAT by 2020, and on a shorter term, for monitoring the feared merger between the two human infective parasites, T. brucei rhodesiense and T. b. gambiense, in northern Uganda.


Subject(s)
Genomics/methods , Trypanosoma brucei brucei/pathogenicity , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Trypanosoma brucei brucei/genetics , Virulence/genetics
12.
BMC Bioinformatics ; 15: 231, 2014 Jul 03.
Article in English | MEDLINE | ID: mdl-24990767

ABSTRACT

BACKGROUND: Current research suggests that a small set of "driver" mutations are responsible for tumorigenesis while a larger body of "passenger" mutations occur in the tumor but do not progress the disease. Due to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical. RESULTS: We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html. CONCLUSION: SpacePAC adds a valuable tool to the identification of mutational clusters while considering protein tertiary structure.
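The idea of clustering mutations in 3D space rather than along the linear sequence can be illustrated with a toy example. The sketch below is not the SpacePAC algorithm or its R API. It is a simplified Python illustration that assumes each residue has a known 3D coordinate (as would come from a PDB structure) and a mutation count (as would come from COSMIC), and it finds the residue-centered sphere of a fixed radius that captures the most mutations. All coordinates, counts, and function names are hypothetical.

```python
import math

# Hypothetical data: residue number -> (x, y, z) coordinate in angstroms.
coords = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    50: (1.0, 2.0, 1.0),   # far in sequence, close in space to residue 10
    90: (30.0, 30.0, 30.0),
}
# Hypothetical mutation counts per residue.
mutations = {10: 4, 11: 1, 50: 5, 90: 2}


def best_sphere(coords, mutations, radius):
    """Return (center_residue, captured_count) for the residue-centered
    sphere of the given radius that contains the most mutations.
    Ties are broken in favor of the lowest residue number."""
    best = (None, -1)
    for center, c_xyz in sorted(coords.items()):
        captured = sum(
            mutations.get(res, 0)
            for res, xyz in coords.items()
            if math.dist(c_xyz, xyz) <= radius
        )
        if captured > best[1]:
            best = (center, captured)
    return best


center, captured = best_sphere(coords, mutations, radius=5.0)
print(center, captured)
```

In this toy data, residues 10 and 50 are 40 positions apart in sequence but fall inside the same sphere, which a purely linear clustering method could miss. A real analysis would also need a significance test for the captured count, which this sketch omits.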


Subject(s)
Computational Biology/methods , Mutation , Proteins/chemistry , Proteins/genetics , Algorithms , Cluster Analysis , Databases, Protein , Genes, Neoplasm/genetics , Humans , Neoplasms/genetics , Protein Structure, Tertiary
13.
Pigment Cell Melanoma Res ; 27(2): 253-62, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24283590

ABSTRACT

BRAF inhibitors improve melanoma patient survival, but resistance invariably develops. Here we report the discovery of a novel BRAF mutation that confers resistance to PLX4032 employing whole-exome sequencing of drug-resistant BRAF(V600K) melanoma cells. We further describe a new screening approach, a genome-wide piggyBac mutagenesis screen that revealed clinically relevant aberrations (N-terminal BRAF truncations and CRAF overexpression). The novel BRAF mutation, a Leu505 to His substitution (BRAF(L505H)), is the first resistance-conferring second-site mutation identified in BRAF mutant cells. The mutation replaces a small nonpolar amino acid at the BRAF-PLX4032 interface with a larger polar residue. Moreover, we show that BRAF(L505H), found in human prostate cancer, is itself a MAPK-activating, PLX4032-resistant oncogenic mutation. Lastly, we demonstrate that the PLX4032-resistant melanoma cells are sensitive to novel, next-generation BRAF inhibitors, especially the 'paradox-blocker' PLX8394, supporting its use in clinical trials for treatment of melanoma patients with BRAF mutations.


Subject(s)
Drug Resistance, Neoplasm/drug effects , Indoles/pharmacology , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins B-raf/antagonists & inhibitors , Sulfonamides/pharmacology , Amino Acid Sequence , Cell Line, Tumor , Cell Proliferation/drug effects , DNA Transposable Elements/genetics , Humans , MAP Kinase Signaling System/drug effects , Melanoma/enzymology , Melanoma/pathology , Models, Molecular , Molecular Sequence Data , Mutagenesis, Insertional/genetics , Mutant Proteins/metabolism , Mutation/genetics , Proto-Oncogene Proteins B-raf/metabolism , Vemurafenib
14.
Proc Natl Acad Sci U S A ; 110(38): E3640-9, 2013 Sep 17.
Article in English | MEDLINE | ID: mdl-24003131

ABSTRACT

Despite considerable efforts to sequence hypermutated cancers such as melanoma, distinguishing cancer-driving genes from thousands of recurrently mutated genes remains a significant challenge. To circumvent the problematic background mutation rates and identify new melanoma driver genes, we carried out a low-copy piggyBac transposon mutagenesis screen in mice. We induced eleven melanomas with mutation burdens that were 100-fold lower relative to human melanomas. Thirty-eight implicated genes, including two known drivers of human melanoma, were classified into three groups based on high, low, or background-level mutation frequencies in human melanomas, and we further explored the functional significance of genes in each group. For two genes overlooked by prevailing discovery methods, we found that loss of membrane associated guanylate kinase, WW and PDZ domain containing 2 and protein tyrosine phosphatase, receptor type, O cooperated with the v-raf murine sarcoma viral oncogene homolog B (BRAF) recurrent V600E mutation to promote cellular transformation. Moreover, for infrequently mutated genes often disregarded by current methods, we discovered recurrent mitogen-activated protein kinase kinase kinase 1 (Map3k1)-activating insertions in our screen, mirroring recurrent MAP3K1 up-regulation in human melanomas. Aberrant expression of Map3k1 enabled growth factor-autonomous proliferation and drove BRAF-independent ERK signaling, thus shedding light on alternative means of activating this prominent signaling pathway in melanoma. In summary, our study contributes several previously undescribed genes involved in melanoma and establishes an important proof-of-principle for the utility of the low-copy transposon mutagenesis approach for identifying cancer-driving genes, especially those masked by hypermutation.


Subject(s)
DNA Transposable Elements/genetics , Gene Expression Regulation, Neoplastic/physiology , MAP Kinase Kinase Kinase 1/metabolism , Melanoma/genetics , Mutagenesis, Insertional/genetics , Signal Transduction/physiology , Animals , Blotting, Western , DNA Primers/genetics , Gene Expression Regulation, Neoplastic/genetics , Genetic Testing , HEK293 Cells , Humans , Immunohistochemistry , Mice , Mice, Transgenic , Reverse Transcriptase Polymerase Chain Reaction , Signal Transduction/genetics , Species Specificity
15.
Nature ; 498(7453): 220-3, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23665959

ABSTRACT

Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo mutations. Here we compare the incidence of de novo mutations in 362 severe CHD cases and 264 controls by analysing exome sequencing of parent-offspring trios. CHD cases show a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging (premature termination, frameshift, splice site) mutations. Similar odds ratios are seen across the main classes of severe CHD. We find a marked excess of de novo mutations in genes involved in the production, removal or reading of histone 3 lysine 4 (H3K4) methylation, or ubiquitination of H2BK120, which is required for H3K4 methylation. There are also two de novo mutations in SMAD2, which regulates H3K27 methylation in the embryonic left-right organizer. The combination of both activating (H3K4 methylation) and inactivating (H3K27 methylation) chromatin marks characterizes 'poised' promoters and enhancers, which regulate expression of key developmental genes. These findings implicate de novo point mutations in several hundreds of genes that collectively contribute to approximately 10% of severe CHD.


Subject(s)
Heart Diseases/congenital , Heart Diseases/genetics , Histones/metabolism , Adult , Case-Control Studies , Child , Chromatin/chemistry , Chromatin/metabolism , DNA Mutational Analysis , Enhancer Elements, Genetic/genetics , Exome/genetics , Female , Genes, Developmental/genetics , Heart Diseases/metabolism , Histones/chemistry , Humans , Lysine/chemistry , Lysine/metabolism , Male , Methylation , Mutation , Odds Ratio , Promoter Regions, Genetic/genetics
16.
Nature ; 485(7397): 237-41, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495306

ABSTRACT

Multiple studies have confirmed the contribution of rare de novo copy number variations to the risk for autism spectrum disorders. But whereas de novo single nucleotide variants have been identified in affected individuals, their contribution to risk has yet to be clarified. Specifically, the frequency and distribution of these mutations have not been well characterized in matched unaffected controls, and such data are vital to the interpretation of de novo coding mutations observed in probands. Here we show, using whole-exome sequencing of 928 individuals, including 200 phenotypically discordant sibling pairs, that highly disruptive (nonsense and splice-site) de novo mutations in brain-expressed genes are associated with autism spectrum disorders and carry large effects. On the basis of mutation rates in unaffected individuals, we demonstrate that multiple independent de novo single nucleotide variants in the same gene among unrelated probands reliably identifies risk alleles, providing a clear path forward for gene discovery. Among a total of 279 identified de novo coding mutations, there is a single instance in probands, and none in siblings, in which two independent nonsense variants disrupt the same gene, SCN2A (sodium channel, voltage-gated, type II, α subunit), a result that is highly unlikely by chance.


Subject(s)
Autistic Disorder/genetics , Exome/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Nerve Tissue Proteins/genetics , Sodium Channels/genetics , Alleles , Codon, Nonsense/genetics , Genetic Heterogeneity , Humans , NAV1.2 Voltage-Gated Sodium Channel , RNA Splice Sites/genetics , Siblings
17.
mBio ; 3(1), 2012.
Article in English | MEDLINE | ID: mdl-22334516

ABSTRACT

Ancient endosymbionts have been associated with extreme genome structural stability with little differentiation in gene inventory between sister species. Tsetse flies (Diptera: Glossinidae) harbor an obligate endosymbiont, Wigglesworthia, which has coevolved with the Glossina radiation. We report on the ~720-kb Wigglesworthia genome and its associated plasmid from Glossina morsitans morsitans and compare them to those of the symbiont from Glossina brevipalpis. While there was overall high synteny between the two genomes, a large inversion was noted. Furthermore, symbiont transcriptional analyses demonstrated host tissue and development-specific gene expression supporting robust transcriptional regulation in Wigglesworthia, an unprecedented observation in other obligate mutualist endosymbionts. Expression and immunohistochemistry confirmed the role of flagella during the vertical transmission process from mother to intrauterine progeny. The expression of nutrient provisioning genes (thiC and hemH) suggests that Wigglesworthia may function in dietary supplementation tailored toward host development. Furthermore, despite extensive conservation, unique genes were identified within both symbiont genomes that may result in distinct metabolomes impacting host physiology. One of these differences involves the chorismate, phenylalanine, and folate biosynthetic pathways, which are uniquely present in Wigglesworthia morsitans. Interestingly, African trypanosomes are auxotrophs for phenylalanine and folate and salvage both exogenously. It is possible that W. morsitans contributes to the higher parasite susceptibility of its host species. IMPORTANCE: Genomic stasis has historically been associated with obligate endosymbionts and their sister species. Here we characterize the Wigglesworthia genome of the tsetse fly species Glossina morsitans and compare it to its sister genome within G. brevipalpis.
The similarity and variation between the genomes enabled specific hypotheses regarding functional biology. Expression analyses indicate significant levels of transcriptional regulation and support development- and tissue-specific functional roles for the symbiosis previously not observed in obligate mutualist symbionts. Retention of the genetically expensive flagella within these small genomes was demonstrated to be significant in symbiont transmission and tailored to the unique tsetse fly reproductive biology. Distinctions in metabolomes were also observed. We speculate an additional role for Wigglesworthia symbiosis where infections with pathogenic trypanosomes may depend upon symbiont species-specific metabolic products and thus influence the vector competence traits of different tsetse fly host species.


Subject(s)
Genome, Bacterial , Genome, Insect , Symbiosis , Tsetse Flies/microbiology , Wigglesworthia/physiology , Amino Acid Sequence , Animals , Chorismic Acid/biosynthesis , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Evolution, Molecular , Flagella/genetics , Flagella/metabolism , Folic Acid/biosynthesis , Gene Expression Regulation, Bacterial , Immunohistochemistry , Inheritance Patterns , Molecular Sequence Data , Phenylalanine/biosynthesis , Plasmids/genetics , Plasmids/metabolism , Species Specificity , Synteny , Transcription, Genetic , Tsetse Flies/genetics , Tsetse Flies/metabolism , Wigglesworthia/genetics , Wigglesworthia/metabolism
18.
Nature ; 482(7383): 98-102, 2012 Jan 22.
Article in English | MEDLINE | ID: mdl-22266938

ABSTRACT

Hypertension affects one billion people and is a principal reversible risk factor for cardiovascular disease. Pseudohypoaldosteronism type II (PHAII), a rare Mendelian syndrome featuring hypertension, hyperkalaemia and metabolic acidosis, has revealed previously unrecognized physiology orchestrating the balance between renal salt reabsorption and K(+) and H(+) excretion. Here we used exome sequencing to identify mutations in kelch-like 3 (KLHL3) or cullin 3 (CUL3) in PHAII patients from 41 unrelated families. KLHL3 mutations are either recessive or dominant, whereas CUL3 mutations are dominant and predominantly de novo. CUL3 and BTB-domain-containing kelch proteins such as KLHL3 are components of cullin-RING E3 ligase complexes that ubiquitinate substrates bound to kelch propeller domains. Dominant KLHL3 mutations are clustered in short segments within the kelch propeller and BTB domains implicated in substrate and cullin binding, respectively. Diverse CUL3 mutations all result in skipping of exon 9, producing an in-frame deletion. Because dominant KLHL3 and CUL3 mutations both phenocopy recessive loss-of-function KLHL3 mutations, they may abrogate ubiquitination of KLHL3 substrates. Disease features are reversed by thiazide diuretics, which inhibit the Na-Cl cotransporter in the distal nephron of the kidney; KLHL3 and CUL3 are expressed in this location, suggesting a mechanistic link between KLHL3 and CUL3 mutations, increased Na-Cl reabsorption, and disease pathogenesis. These findings demonstrate the utility of exome sequencing in disease gene identification despite the combined complexities of locus heterogeneity, mixed models of transmission and frequent de novo mutation, and establish a fundamental role for KLHL3 and CUL3 in blood pressure, K(+) and pH homeostasis.


Subject(s)
Carrier Proteins/genetics , Cullin Proteins/genetics , Hypertension/genetics , Mutation/genetics , Pseudohypoaldosteronism/genetics , Water-Electrolyte Imbalance/genetics , Adaptor Proteins, Signal Transducing , Amino Acid Sequence , Animals , Base Sequence , Blood Pressure/genetics , Carrier Proteins/chemistry , Cohort Studies , Cullin Proteins/chemistry , Electrolytes , Exons/genetics , Female , Gene Expression Profiling , Genes, Dominant/genetics , Genes, Recessive/genetics , Genotype , Homeostasis/genetics , Humans , Hydrogen-Ion Concentration , Hypertension/complications , Hypertension/physiopathology , Male , Mice , Microfilament Proteins , Models, Molecular , Molecular Sequence Data , Phenotype , Potassium/metabolism , Pseudohypoaldosteronism/complications , Pseudohypoaldosteronism/physiopathology , Sodium Chloride/metabolism , Water-Electrolyte Imbalance/complications , Water-Electrolyte Imbalance/physiopathology
19.
Hum Hered ; 72(2): 85-97, 2011.
Article in English | MEDLINE | ID: mdl-21934324

ABSTRACT

BACKGROUND: Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR). METHODS: We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma. RESULTS: The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest. CONCLUSIONS: Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
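
The permutation step described under METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the importance statistic below (the absolute difference in case rate between carriers and non-carriers of a variant) is a deliberately simple stand-in for an algorithm-specific VIM from RF, MCLR, or MDR. The function names, the choice to shuffle outcome labels, and the default number of permutations are assumptions made for illustration.

```python
import random


def importance(genotypes, outcomes):
    """Stand-in importance statistic (NOT an RF/MCLR/MDR VIM).

    Returns the absolute difference in case rate between carriers
    (genotype > 0) and non-carriers (genotype == 0).
    """
    carriers = [y for g, y in zip(genotypes, outcomes) if g > 0]
    non_carriers = [y for g, y in zip(genotypes, outcomes) if g == 0]
    if not carriers or not non_carriers:
        return 0.0
    rate_carriers = sum(carriers) / len(carriers)
    rate_non_carriers = sum(non_carriers) / len(non_carriers)
    return abs(rate_carriers - rate_non_carriers)


def permutation_p_value(genotypes, outcomes, n_perm=1000, seed=0):
    """Estimate a p value by permuting outcome labels (assumed scheme).

    Shuffling the outcomes breaks any genotype-outcome association, so the
    statistics computed on shuffled data approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = importance(genotypes, outcomes)
    shuffled = list(outcomes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if importance(genotypes, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 0 = non-carrier/control, 1 = carrier/case.
    genotypes = [0] * 20 + [1] * 20
    outcomes = [0] * 20 + [1] * 20
    print(permutation_p_value(genotypes, outcomes))
```

In a real analysis the stand-in statistic would be replaced by the VIM of the fitted model, which means refitting the model on every permuted dataset. That refitting is the main computational cost of this approach.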


Subject(s)
Data Mining/methods , Epistasis, Genetic , Genetic Association Studies , Algorithms , Computer Simulation , Genetic Loci , Genome, Human , Haplotypes , Humans , Lymphoma, Non-Hodgkin/genetics , Monte Carlo Method , Polymorphism, Single Nucleotide
20.
Appl Environ Microbiol ; 77(23): 8400-8, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21948847

ABSTRACT

Vertical transmission of obligate symbionts generates a predictable evolutionary history of symbionts that reflects that of their hosts. In insects, evolutionary associations between symbionts and their hosts have been investigated primarily among species, leaving population-level processes largely unknown. In this study, we investigated the tsetse (Diptera: Glossinidae) bacterial symbiont, Wigglesworthia glossinidia, to determine whether observed codiversification of symbiont and tsetse host species extends to a single host species (Glossina fuscipes fuscipes) in Uganda. To explore symbiont genetic variation in G. f. fuscipes populations, we screened two variable loci (lon and lepA) from the Wigglesworthia glossinidia bacterium in the host species Glossina fuscipes fuscipes (W. g. fuscipes) and examined phylogeographic and demographic characteristics in multiple host populations. Symbiont genetic variation was apparent within and among populations. We identified two distinct symbiont lineages, in northern and southern Uganda. Incongruence length difference (ILD) tests indicated that the two lineages corresponded exactly to northern and southern G. f. fuscipes mitochondrial DNA (mtDNA) haplogroups (P = 1.0). Analysis of molecular variance (AMOVA) confirmed that most variation was partitioned between the northern and southern lineages defined by host mtDNA (85.44%). However, ILD tests rejected finer-scale congruence within the northern and southern populations (P = 0.009). This incongruence was potentially due to incomplete lineage sorting that resulted in novel combinations of symbiont genetic variants and host background. Identifying these novel combinations may have public health significance, since tsetse is the sole vector of sleeping sickness and Wigglesworthia is known to influence host vector competence. Thus, understanding the adaptive value of these host-symbiont combinations may afford opportunities to develop vector control methods.


Subject(s)
Genetic Variation , Phylogeography , Symbiosis , Tsetse Flies/microbiology , Wigglesworthia/classification , Wigglesworthia/isolation & purification , Animals , DNA, Mitochondrial/chemistry , DNA, Mitochondrial/genetics , Molecular Sequence Data , Protease La/genetics , Sequence Analysis, DNA , Transcriptional Elongation Factors/genetics , Tsetse Flies/genetics , Uganda , Wigglesworthia/genetics , Wigglesworthia/physiology