ABSTRACT
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
ABSTRACT
Next-generation sequencing studies have revealed genome-wide structural variation patterns in cancer, such as chromothripsis and chromoplexy, that do not engage a single discernable driver mutation, and whose clinical relevance is unclear. We devised a robust genomic metric able to identify cancers with a chromotype called tandem duplicator phenotype (TDP) characterized by frequent and distributed tandem duplications (TDs). Enriched only in triple-negative breast cancer (TNBC) and in ovarian, endometrial, and liver cancers, TDP tumors conjointly exhibit tumor protein p53 (TP53) mutations, disruption of breast cancer 1 (BRCA1), and increased expression of DNA replication genes pointing at rereplication in a defective checkpoint environment as a plausible causal mechanism. The resultant TDs in TDP augment global oncogene expression and disrupt tumor suppressor genes. Importantly, the TDP strongly correlates with cisplatin sensitivity in both TNBC cell lines and primary patient-derived xenografts. We conclude that the TDP is a common cancer chromotype that coordinately alters oncogene/tumor suppressor expression with potential as a marker for chemotherapeutic response.
Subject(s)
Endometrial Neoplasms/genetics , Ovarian Neoplasms/genetics , Segmental Duplications, Genomic/genetics , Triple Negative Breast Neoplasms/genetics , Antineoplastic Agents/pharmacology , Female , Genes, Neoplasm/genetics , Genetic Markers/genetics , Humans , Phenotype
ABSTRACT
Chromosomal structural variations play an important role in determining the transcriptional landscape of human breast cancers. To assess the nature of these structural variations, we analyzed eight breast tumor samples with a focus on regions of gene amplification using mate-pair sequencing of long-insert genomic DNA with matched transcriptome profiling. We found that tandem duplications appear to be early events in tumor evolution, especially in the genesis of amplicons. In a detailed reconstruction of events on chromosome 17, we found that large unpaired inversions and deletions connect a tandemly duplicated ERBB2 with neighboring 17q21.3 amplicons while simultaneously deleting the intervening BRCA1 tumor suppressor locus. This series of events appeared to be unusually common when examined in larger genomic data sets of breast cancers, albeit using approaches with lesser resolution. Using siRNAs in breast cancer cell lines, we showed that the 17q21.3 amplicon harbored a significant number of weak oncogenes that appeared consistently coamplified in primary tumors. Down-regulation of BRCA1 expression augmented cell proliferation in ERBB2-transfected human normal mammary epithelial cells. Coamplification of other functionally tested oncogenic elements in other breast tumors examined, such as RIPK2 and MYC on chromosome 8, also parallels these findings. Our analyses suggest that structural variations efficiently orchestrate the gain and loss of cancer gene cassettes that engage many oncogenic pathways simultaneously and that such oncogenic cassettes are favored during the evolution of a cancer.
Subject(s)
BRCA1 Protein/genetics , Breast Neoplasms/genetics , Chromosome Aberrations , Chromosomes, Human, Pair 17/genetics , Receptor, ErbB-2/genetics , Base Sequence , Cell Line, Tumor , Female , Gene Amplification , Gene Duplication , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , MCF-7 Cells , Molecular Sequence Data , Sequence Analysis, DNA
ABSTRACT
New types of small RNAs distinct from microRNAs (miRNAs) are progressively being discovered in various organisms. In order to discover such novel small RNAs, a library of 17- to 26-base-long RNAs was created from prostate cancer cell lines and sequenced by ultra-high-throughput sequencing. A significant number of the sequences are derived from precise processing at the 5' or 3' end of mature or precursor tRNAs to form three series of tRFs (tRNA-derived RNA fragments): the tRF-5, tRF-3, and tRF-1 series. These sequences constitute a class of short RNAs that are second most abundant to miRNAs. Northern hybridization, quantitative RT-PCR, and splinted ligation assays independently measured the levels of at least 17 tRFs. To demonstrate the biological importance of tRFs, we further investigated tRF-1001, derived from the 3' end of a Ser-TGA tRNA precursor transcript that is not retained in the mature tRNA. tRF-1001 is expressed highly in a wide range of cancer cell lines but much less in tissues, and its expression in cell lines was tightly correlated with cell proliferation. siRNA-mediated knockdown of tRF-1001 impaired cell proliferation with the specific accumulation of cells in G2, phenotypes that were reversed specifically by cointroducing a synthetic 2'-O-methyl tRF-1001 oligoribonucleotide resistant to the siRNA. tRF-1001 is generated in the cytoplasm by tRNA 3'-endonuclease ELAC2, a prostate cancer susceptibility gene. Our data suggest that tRFs are not random by-products of tRNA degradation or biogenesis, but an abundant and novel class of short RNAs with precise sequence structure that have specific expression patterns and specific biological roles.
Subject(s)
MicroRNAs/chemistry , MicroRNAs/genetics , RNA, Transfer/chemistry , RNA, Transfer/genetics , Cell Line, Tumor , Cell Proliferation , Cloning, Molecular , Cytoplasm/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Molecular Sequence Data , Neoplasm Proteins/metabolism
ABSTRACT
Tumor genomes are generally thought to evolve through a gradual accumulation of mutations, but the observation that extraordinarily complex rearrangements can arise through single mutational events suggests that evolution may be accelerated by punctuated changes in genome architecture. To assess the prevalence and origins of complex genomic rearrangements (CGRs), we mapped 6179 somatic structural variation breakpoints in 64 cancer genomes from seven tumor types and screened for clusters of three or more interconnected breakpoints. We find that complex breakpoint clusters are extremely common: 154 clusters comprise 25% of all somatic breakpoints, and 75% of tumors exhibit at least one complex cluster. Based on copy number state profiling, 63% of breakpoint clusters are consistent with being CGRs that arose through a single mutational event. CGRs have diverse architectures including focal breakpoint clusters, large-scale rearrangements joining clusters from one or more chromosomes, and staggeringly complex chromothripsis events. Notably, chromothripsis has a significantly higher incidence in glioblastoma samples (39%) relative to other tumor types (9%). Chromothripsis breakpoints also show significantly elevated intra-tumor allele frequencies relative to simple SVs, which indicates that they arise early during tumorigenesis or confer selective advantage. Finally, assembly and analysis of 4002 somatic and 6982 germline breakpoint sequences reveal that somatic breakpoints show significantly less microhomology and fewer templated insertions than germline breakpoints, and this effect is stronger at CGRs than at simple variants. These results are inconsistent with replication-based models of CGR genesis and strongly argue that nonhomologous repair of concurrently arising DNA double-strand breaks is the predominant mechanism underlying complex cancer genome rearrangements.
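The screen for clusters of three or more interconnected breakpoints can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not define "interconnected", so the sketch assumes that two breakpoints are linked when they are the two ends of the same structural variant or lie within a hypothetical `window` of each other on the same chromosome. The function name, the input layout and the 100 kb default are all illustrative assumptions.

```python
from collections import defaultdict


def cluster_breakpoints(svs, window=100_000, min_size=3):
    """Group breakpoints into connected clusters (illustrative only).

    svs: list of junctions, each a pair of (chrom, pos) breakpoints.
    window: assumed maximum distance (bp) linking neighbouring breakpoints.
    min_size: smallest cluster reported (three, as in the abstract).
    """
    # Every distinct breakpoint, sorted by chromosome and then position.
    points = sorted({bp for sv in svs for bp in sv})
    parent = {bp: bp for bp in points}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Assumption 1: the two ends of one junction are connected.
    for a, b in svs:
        union(a, b)

    # Assumption 2: adjacent breakpoints on one chromosome are connected
    # when they are no more than `window` bp apart.
    for prev, cur in zip(points, points[1:]):
        if prev[0] == cur[0] and cur[1] - prev[1] <= window:
            union(prev, cur)

    groups = defaultdict(list)
    for bp in points:
        groups[find(bp)].append(bp)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]
```

With three hypothetical junctions, two of which share a neighbourhood on chr1, only the four-breakpoint group passes the size filter; the isolated chr3 junction contributes two breakpoints and is dropped.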
Subject(s)
Chromosome Aberrations , Chromosome Breakpoints , Mutation/genetics , Neoplasms/genetics , Base Sequence , DNA Breaks, Double-Stranded , DNA Replication/genetics , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/pathology
ABSTRACT
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Subject(s)
Genome, Human/genetics , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation , Conserved Sequence/genetics , DNA Replication , Evolution, Molecular , Exons/genetics , Genetic Variation/genetics , Heterozygote , Histones/metabolism , Humans , Pilot Projects , Protein Binding , RNA, Messenger/genetics , RNA, Untranslated/genetics , Transcription Factors/metabolism , Transcription Initiation Site
ABSTRACT
OBJECTIVE: The objective was to develop and operate a cloud-based federated system for managing, analyzing, and sharing patient data for research purposes, while allowing each resource sharing patient data to operate their component based upon their own governance rules. The federated system is called the Biomedical Research Hub (BRH). MATERIALS AND METHODS: The BRH is a cloud-based federated system built over a core set of software services called framework services. BRH framework services include authentication and authorization, services for generating and assessing findable, accessible, interoperable, and reusable (FAIR) data, and services for importing and exporting bulk clinical data. The BRH includes data resources providing data operated by different entities and workspaces that can access and analyze data from one or more of the data resources in the BRH. RESULTS: The BRH contains multiple data commons that in aggregate provide access to over 6 PB of research data from over 400 000 research participants. DISCUSSION AND CONCLUSION: With the growing acceptance of using public cloud computing platforms for biomedical research, and the growing use of opaque persistent digital identifiers for datasets, data objects, and other entities, there is now a foundation for systems that federate data from multiple independently operated data resources that expose FAIR application programming interfaces, each using a separate data model. Applications can be built that access data from one or more of the data resources.
Subject(s)
Biomedical Research , Cloud Computing , Humans , Software
ABSTRACT
Three muscle-specific microRNAs, miR-206, -1, and -133, are induced during differentiation of C2C12 myoblasts in vitro. Transfection of miR-206 promotes differentiation despite the presence of serum, whereas inhibition of the microRNA by antisense oligonucleotide inhibits cell cycle withdrawal and differentiation, which are normally induced by serum deprivation. Among the many mRNAs that are down-regulated by miR-206, the p180 subunit of DNA polymerase alpha and three other genes are shown to be direct targets. Down-regulation of the polymerase inhibits DNA synthesis, an important component of the differentiation program. The direct targets are decreased by mRNA cleavage that is dependent on predicted microRNA target sites. Unlike small interfering RNA-directed cleavage, however, the 5' ends of the cleavage fragments are distributed and not confined to the target sites, suggesting involvement of exonucleases in the degradation process. In addition, inhibitors of myogenic transcription factors, Id1-3 and MyoR, are decreased upon miR-206 introduction, suggesting the presence of additional mechanisms by which microRNAs enforce the differentiation program.
Subject(s)
Cell Differentiation , MicroRNAs/metabolism , Myoblasts, Skeletal/cytology , Transcription Factors/metabolism , Animals , Basic Helix-Loop-Helix Transcription Factors , Cell Line , Cell Proliferation , Connexin 43/genetics , Connexin 43/metabolism , DNA Polymerase I/genetics , DNA Polymerase I/metabolism , Down-Regulation , Lymphokines/genetics , Lymphokines/metabolism , Mice , MicroRNAs/biosynthesis , MicroRNAs/genetics , Muscle Development , Myoblasts, Skeletal/metabolism , Oligonucleotide Array Sequence Analysis , Oligonucleotides, Antisense/genetics , Oligonucleotides, Antisense/metabolism , Proteins/genetics , Proteins/metabolism , RNA, Messenger/metabolism , Time Factors , Transfection
ABSTRACT
Paired end mapping of chromosomal fragments has been used in human cells to identify numerous structural variations in chromosomes of individuals and of cancer cell lines; however, the molecular, biological and bioinformatics methods for this technology are still in development. Here, we present a parallel bioinformatics approach to analyze chromosomal paired-end tag (ChromPET) sequence data and demonstrate its application in identifying gene rearrangements in the model organism Saccharomyces cerevisiae. We detected several expected events, including a chromosomal rearrangement of the nonessential arm of chromosome V induced by selective pressure, rearrangements introduced during strain construction and gene conversion at the MAT locus. In addition, we discovered several unannotated Ty element insertions that are present in the reference yeast strain, but not in the reference genome sequence, suggesting a few revisions are necessary in the latter. These data demonstrate that application of the chromPET technique to a genetically tractable organism like yeast provides an easy screen for studying the mechanisms of chromosomal rearrangements during the propagation of a species.
Subject(s)
DNA Transposable Elements , Gene Conversion , Genome, Fungal , Genomics/methods , Saccharomyces cerevisiae/genetics , Translocation, Genetic , Amino Acid Transport Systems, Basic/genetics , Chromosomes, Fungal , Genes, Fungal , Genetic Linkage , Genomic Library , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
A 16-year-old boy presented with a tumor located in the fourth ventricle, which showed histological features of an ependymoma replete with perivascular pseudorosettes and true ependymal rosettes. Interestingly, many of the tumor cells exhibited abundant cytoplasm stuffed with a grayish brown pigment. Histochemical stains showed the pigment to be acid fast and periodic acid-Schiff positive and negative for Masson-Fontana melanin stain. Additionally, the pigment displayed brilliant autofluorescence under the ultraviolet light of a fluorescence microscope. Ultrastructural examination of the pigment revealed a non-membrane-bound biphasic structure with an electron-dense core and electron-lucent periphery. Only a few similar case reports describe such pigmented ependymomas as containing a mixture of neuromelanin and lipofuscin, while others identify the pigment as melanin itself. Our workup suggests that the pigment represents lipofuscin or its derivative. Lipofuscin is generally known as a pigment of wear and tear, and the significance of finding it in such abundance in a tumor remains to be understood and explored.
Subject(s)
Ependymoma/diagnosis , Fourth Ventricle/pathology , Adolescent , Craniotomy , Ependymoma/pathology , Ependymoma/surgery , Fourth Ventricle/diagnostic imaging , Fourth Ventricle/surgery , Humans , Lipofuscin/analysis , Magnetic Resonance Imaging , Male , Melanins/analysis , Silver Nitrate
ABSTRACT
Angioimmunoblastic T-cell lymphoma (AITL) is an aggressive variant of peripheral T-cell lymphoma, occurring in elderly patients without any gender predisposition. It accounts for 1-2% of all non-Hodgkin lymphoma. Although characterized by some peculiar histological features, diagnosis of AITL can sometimes be challenging and a definite diagnosis requires a complete immunophenotypic and molecular workup. Peripheral blood (PB) involvement in AITL has not been studied in detail and there is a paucity of published data about leukemic presentation of AITL. We present a case of a 38-year-old female diagnosed with AITL with PB involvement. Flow cytometric (FCM) examination of PB showed 40% abnormal lymphoid cells which were CD45+, CD4+, CD2+, cCD3+, CD5+, CD10+, CD16+ and TCRγδ restricted. PB involvement by AITL appears to be more common than recognized and is under-reported. Nevertheless, detection of these tumoral T lymphocytes needs to be assessed in large case studies to determine the true incidence of PB involvement. FCM analysis is an effective and reliable approach for identifying the leukemic phase of AITL and can lead to timely and effective intervention.
ABSTRACT
INTRODUCTION: CD123 is overexpressed in many hematologic malignancies and found to be useful in characterizing leukemic blasts of both acute myeloid leukemia (AML) and B-acute lymphoblastic leukemia (B-ALL). CD123 has been recently found to be a marker of leukemic stem cells, and its utility to measure residual disease and potential role in disease relapse is under evaluation. MATERIALS AND METHODS: Herein, we have evaluated the expression of CD123 in 757 samples of acute leukemia including 479 treatment-naive and 278 follow-up samples and compared with post-induction morphologic complete remission and measurable residual disease (MRD) status. Multiparametric flow cytometry was used for assessment of CD123 expression and immunophenotypic characterization of leukemic blasts at diagnostic and MRD assessment time points. RESULTS: Using variable cutoffs of 5%, 10%, and 20% to define a case as CD123-positive, expression of CD123 was observed in 75.6%, 66.2%, and 50% of AML and 88.6%, 81.8%, and 75% of B-ALL, respectively. Of 11 patients with mixed phenotype acute leukemia, 7 (63.63%) were positive for CD123, but none of the 12 patients with T-acute lymphoblastic leukemia showed positivity for CD123. CD123 expression at diagnosis was associated with post-induction MRD-positive status in both B-ALL (P < .001) and AML (P = .001). We also evaluated the utility of CD123 as a leukemia-associated aberrant immunophenotype and found it to be useful in both patients with AML (baseline, 50.6%; follow-up, 53%) and B-ALL (baseline, 75%; follow-up, 73.07%). CONCLUSIONS: In conclusion, CD123 may be considered as a cardinal marker for residual disease assessment and response evaluation in AML and B-ALL.
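The dependence of the reported positivity rate on the chosen cutoff can be illustrated with a small sketch. It assumes that each case is summarized by the percentage of blasts expressing CD123 and that a case counts as positive when this percentage is at or above the cutoff; the abstract does not say whether the comparison is inclusive, and the function name and input values are hypothetical, not study data.

```python
def positivity_rates(percent_positive_blasts, cutoffs=(5, 10, 20)):
    """Percentage of cases called CD123-positive at each cutoff.

    percent_positive_blasts: one value per case, the percentage of blasts
    expressing CD123 (hypothetical summary measure).
    cutoffs: percentage thresholds; a case is positive when its value is
    greater than or equal to the cutoff (an assumption).
    """
    n = len(percent_positive_blasts)
    if n == 0:
        raise ValueError("at least one case is required")
    return {
        cutoff: 100 * sum(p >= cutoff for p in percent_positive_blasts) / n
        for cutoff in cutoffs
    }


# Four made-up cases: raising the cutoff can only lower the positivity rate.
example = positivity_rates([2, 7, 15, 40])
```

Because every case that passes a higher cutoff also passes a lower one, the rates are non-increasing across 5%, 10% and 20%, which matches the pattern of the reported AML and B-ALL figures.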
Subject(s)
Biomarkers, Tumor/metabolism , Interleukin-3 Receptor alpha Subunit/metabolism , Leukemia, Myeloid, Acute/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Adolescent , Adult , Aged , Child , Child, Preschool , Female , Humans , Infant , Male , Middle Aged , Treatment Outcome , Young Adult
ABSTRACT
Differences in the genetic and epigenetic makeup of cell lines have been very useful for dissecting the roles of specific genes in the biology of a cell. Targeted comparative RNAi (TARCOR) analysis uses high-throughput RNA interference (RNAi) against a targeted gene set and rigorous quantitation of the phenotype to identify genes with a differential requirement for proliferation between cell lines of different genetic backgrounds. To demonstrate the utility of such an analysis, we examined 257 growth-regulated genes in parallel in a breast epithelial cell line, MCF10A, and a prostate cancer cell line, PC3. Depletion of an unexpectedly high number of genes (25%) differentially affected proliferation of the two cell lines. Knockdown of many genes that spare PC3 (p53-) but inhibit MCF10A (p53+) proliferation induces p53 in MCF10A cells. EBNA1BP2, involved in ribosome biogenesis, is an example of such a gene, with its depletion arresting MCF10A at G1/S in a p53-dependent manner. TARCOR is thus useful for identifying cell type-specific genes and pathways involved in proliferation and also for exploring the heterogeneity of cell lines. In particular, our data emphasize the importance of considering the genetic status when performing siRNA screens in mammalian cells.
Subject(s)
Cell Proliferation , Genes, Essential/genetics , RNA Interference , Cell Line, Tumor , Female , Gene Expression Regulation, Neoplastic , Genes, Neoplasm/genetics , Humans , Male , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomes/metabolism , Tumor Suppressor Protein p53/metabolism
ABSTRACT
Magnetic Particle Spectroscopy (MPS) is a measurement technique to determine the magnetic properties of superparamagnetic iron oxide nanoparticles (SPIONs) in an oscillating magnetic field as applied in Magnetic Particle Imaging (MPI). State-of-the-art MPS devices can measure the magnetization response of the SPIONs to an oscillatory magnetic excitation only retrospectively, i.e. after the synthesis process. In this contribution, a novel in-situ magnetic particle spectrometer (INSPECT) is presented, which can be used to monitor the entire synthesis process from particle genesis via growth to the stable colloidal suspension of the nanoparticles in real time. The device is suitable for use in a biochemistry environment. It has a chamber size of 72 mm such that a 100 ml reaction flask can be used for synthesis. For an alkaline-based precipitation, the change of magnetic properties of SPIONs during the nucleation and growth phase of the synthesis is demonstrated. The device is able to record the changes in the amplitude and phase spectra, and, in turn, the hysteresis. Hence, it is a powerful tool for an in-depth understanding of the nanoparticle formation dynamics during the synthesis process.
ABSTRACT
Pediatric small round cell tumors (PSRCTs) constitute a large proportion of childhood malignancies with overlapping diagnostic and clinical features but radically different therapies. Here, we report the case of a 16-year-old male presenting with a diffuse abdominal and mediastinal mass, axillary lymphadenopathy, and pleural effusion. Bone marrow aspirate showed near total replacement by small round malignant cells. The bone marrow biopsy showed interstitial infiltration by malignant cells, which were CD45- CD3- CD20- MIC2+ FLI1+, and a diagnosis of Ewing's sarcoma was established. In contrast, flow cytometric immunophenotyping of the bone marrow aspirate showed CD45- cells, which were CD19+ cytCD79a+ CD10+ CD81+ CD38+ HLA-DR+ CD22+ CD20-, consistent with B-cell acute lymphoblastic leukemia (B-ALL). The extended immunostaining panel on bone marrow biopsy also showed positivity for cytCD79a, CD10, CD19, and BCL-2, whereas fluorescent in-situ hybridization for EWSR1 gene rearrangement was negative. Thus, a final diagnosis of CD45- FLI1+ MIC2+ B-ALL was established. Rare cases of CD45- B-ALL with immunoreactivity for MIC2 and Friend leukemia virus integration 1 (FLI1) have posed a diagnostic challenge for PSRCTs in the recent past. This case report highlights the role of a multimodality approach in establishing a correct diagnosis in CD45- PSRCTs to ensure definitive therapy and a better clinical outcome.
Subject(s)
12E7 Antigen/genetics , Burkitt Lymphoma/pathology , Desmoplastic Small Round Cell Tumor/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Proto-Oncogene Protein c-fli-1/genetics , Adolescent , Biopsy , Bone Marrow/pathology , Bone Neoplasms , Desmoplastic Small Round Cell Tumor/genetics , Flow Cytometry , Humans , Male , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology , Sarcoma, Ewing
ABSTRACT
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, 58 of which intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three- to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community, allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.
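The size convention used above (indels below 50 bp, SVs at 50 bp or more) can be written as a short sketch. It assumes variants are given as reference and alternate allele strings and measures size as the difference in allele length; this is an illustrative simplification, not the study's calling procedure, and balanced events such as inversions change no length and would need an explicit type annotation instead.

```python
def classify_by_size(ref, alt, sv_min_bp=50):
    """Label a variant by the absolute difference in allele length.

    ref, alt: reference and alternate allele sequences (hypothetical input).
    sv_min_bp: size at or above which a variant is called an SV
    (50 bp, following the thresholds quoted in the abstract).
    """
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same-length alleles (e.g. substitutions, or balanced events that
        # this length-based rule cannot see).
        return "no_length_change"
    return "SV" if size >= sv_min_bp else "indel"


labels = [
    classify_by_size("A", "ATG"),             # 2 bp insertion
    classify_by_size("A" + "G" * 49, "A"),    # 49 bp deletion
    classify_by_size("A", "A" + "C" * 50),    # 50 bp insertion
]
```

The 49 bp deletion and the 50 bp insertion fall on opposite sides of the boundary, which shows that the threshold is inclusive on the SV side.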
Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Haplotypes/genetics , Algorithms , Chromosome Mapping/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Whole Genome Sequencing/methods
ABSTRACT
Recent technical and methodological advances have greatly enhanced genome-wide association studies (GWAS). The advent of low-cost, whole-genome sequencing facilitates high-resolution variant identification, and the development of linear mixed models (LMM) allows improved identification of putatively causal variants. While essential for correcting false positive associations due to sample relatedness and population stratification, LMMs have commonly been restricted to quantitative variables. However, phenotypic traits in association studies are often categorical, coded as binary case-control or ordered variables describing disease stages. To address these issues, we have devised a method for genomic association studies that implements a generalized LMM (GLMM) in a Bayesian framework, called Bayes-GLMM. Bayes-GLMM has four major features: (1) support of categorical, binary, and quantitative variables; (2) cohesive integration of previous GWAS results for related traits; (3) correction for sample relatedness by mixed modeling; and (4) model estimation by both Markov chain Monte Carlo sampling and maximum likelihood estimation. We applied Bayes-GLMM to the whole-genome sequencing cohort of the Alzheimer's Disease Sequencing Project. This study contains 570 individuals from 111 families, each with Alzheimer's disease diagnosed at one of four confidence levels. Using Bayes-GLMM we identified four variants in three loci significantly associated with Alzheimer's disease. Two variants, rs140233081 and rs149372995, lie between PRKAR1B and PDGFA. The coded proteins are localized to the glial-vascular unit, and PDGFA transcript levels are associated with Alzheimer's disease-related neuropathology. In summary, this work provides implementation of a flexible, generalized mixed-model approach in a Bayesian framework for association studies.
Subject(s)
Alzheimer Disease/genetics , Bayes Theorem , Genetic Predisposition to Disease , Linear Models , Quantitative Trait Loci , Age of Onset , Algorithms , Animals , Genome-Wide Association Study , Humans , Markov Chains , Mice , Models, Biological , Monte Carlo Method , Whole Genome Sequencing
ABSTRACT
Comprehensive and accurate identification of structural variations (SVs) from next generation sequencing data remains a major challenge. We develop FusorSV, which uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. It includes a fusion model built using analysis of 27 deep-coverage human genomes from the 1000 Genomes Project. We identify 843 novel SV calls that were not reported by the 1000 Genomes Project for these 27 samples. Experimental validation of a subset of these calls yields a validation rate of 86.7%. FusorSV is available at https://github.com/TheJacksonLaboratory/SVE .
Subject(s)
Algorithms , Genome, Human , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software
ABSTRACT
Recent analyses of next-generation sequencing datasets have shown that cell-specific regulatory elements in stem cells are marked with distinguishable patterns of transcription factor (TF) binding and epigenetic marks. For example, we recently demonstrated that promoters of cell-specific genes are covered with expanded trimethylation of histone H3 at lysine 4 (H3K4me3) marks (i.e., broad H3K4me3 domains). Moreover, binding of specific TFs, such as OCT4, NANOG, and SOX2, has been shown to play a critical role in maintaining the pluripotency of stem cells. Despite these observations, a systematic exploration of genomic and epigenomic features of stem-cell-specific gene promoters has not been conducted. Advanced machine-learning models can capture distinguishable genomic and epigenomic characteristics of stem-cell-specific promoters by taking advantage of the wealth of publicly available datasets. Here, we propose a three-step framework to discover novel data characteristics of high-throughput next-generation sequencing datasets that distinguish pluripotency genes in human and mouse embryonic stem cells (ESCs). Our framework involves: i) feature extraction to identify novel features of genomic datasets; ii) feature selection using a logistic regression model combined with the Least Absolute Shrinkage and Selection Operator (LASSO) method to find the most critical datasets and features; and iii) cross-validation with features selected using the LASSO method to assess the predictive power of selected data features in distinguishing pluripotency genes. We show that specific epigenetic marks, and specific features of these marks, are enriched at pluripotency gene promoters. Moreover, we also assess both the individual and combined effects of TF binding, epigenetic mark deposition, and gene expression datasets for marking pluripotency genes.
Our findings are consistent with the existence of a conserved, complex and integrative genomic signature in ESCs that can be exploited to flag important candidate pluripotency genes. They also validate our computational framework for fostering a deeper understanding of genomic datasets in stem cells, which, in the future, could be extended to study cell-type-specific genomic landscapes in other cell types. REVIEWERS: This article was reviewed by Zoltan Gaspari and Piotr Zielenkiewicz.