Search | VHL Regional Portal

1.

Resolving the 22q11.2 deletion using CTLR-Seq reveals chromosomal rearrangement mechanisms and individual variance in breakpoints.

Zhou, Bo; Purmann, Carolin; Guo, Hanmin; Shin, GiWon; Huang, Yiling; Pattni, Reenal; Meng, Qingxi; Greer, Stephanie U; Roychowdhury, Tanmoy; Wood, Raegan N; Ho, Marcus; Dohna, Heinrich Zu; Abyzov, Alexej; Hallmayer, Joachim F; Wong, Wing H; Ji, Hanlee P; Urban, Alexander E.

Proc Natl Acad Sci U S A ; 121(31): e2322834121, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39042694

ABSTRACT

We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.

Subject(s)

DiGeorge Syndrome , Humans , DiGeorge Syndrome/genetics , CRISPR-Cas Systems , Chromosome Breakpoints , Chromosomes, Human, Pair 22/genetics , Genome, Human , Gene Rearrangement , Sequence Analysis, DNA/methods , Chromosome Deletion

2.

Data integration and inference of gene regulation using single-cell temporal multimodal data with scTIE.

Lin, Yingxin; Wu, Tung-Yu; Chen, Xi; Wan, Sheng; Chao, Brian; Xin, Jingxue; Yang, Jean Y H; Wong, Wing H; Wang, Y X Rachel.

Genome Res ; 34(1): 119-133, 2024 02 07.

Article in English | MEDLINE | ID: mdl-38190633

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
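The optimal-transport step mentioned in this abstract can be illustrated with a toy example. The sketch below is not the scTIE implementation. It is a generic entropic-regularised (Sinkhorn) solver, and the one-dimensional "embeddings", the function name `sinkhorn`, and the parameters `eps` and `n_iter` are all assumptions made for illustration. It only shows how a soft coupling between cells at two time points could be computed from pairwise distances in a shared space.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iter=500):
    """Return an entropic-regularised transport plan between histograms a and b.

    cost[i][j] is the (hypothetical) cost of matching source cell i to target cell j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical 1-D embeddings of cells at an early and a late time point
early = [0.0, 1.0, 2.0]
late = [0.1, 1.9]
cost = [[(x - y) ** 2 for y in late] for x in early]
a = [1.0 / len(early)] * len(early)   # uniform mass on early cells
b = [1.0 / len(late)] * len(late)     # uniform mass on late cells

plan = sinkhorn(cost, a, b)
for row in plan:
    print([round(p, 4) for p in row])
```

Each row of `plan` can be read as how the mass of one early cell is distributed over the late cells. In this toy setup the middle cell is equidistant from both late cells, so its mass is split between them.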

Subject(s)

Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Gene Expression Regulation

3.

scTIE: data integration and inference of gene regulation using single-cell temporal multimodal data.

Lin, Yingxin; Wu, Tung-Yu; Chen, Xi; Wan, Sheng; Chao, Brian; Xin, Jingxue; Yang, Jean Y H; Wong, Wing H; Wang, Y X Rachel.

bioRxiv ; 2023 May 22.

Article in English | MEDLINE | ID: mdl-37292801

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal datasets, we demonstrate scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome dataset we generated from differentiating mouse embryonic stem cells over time, we demonstrate scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.

4.

Deterministic evolution and stringent selection during preneoplasia.

Karlsson, Kasper; Przybilla, Moritz J; Kotler, Eran; Khan, Aziz; Xu, Hang; Karagyozova, Kremena; Sockell, Alexandra; Wong, Wing H; Liu, Katherine; Mah, Amanda; Lo, Yuan-Hung; Lu, Bingxin; Houlahan, Kathleen E; Ma, Zhicheng; Suarez, Carlos J; Barnes, Chris P; Kuo, Calvin J; Curtis, Christina.

Nature ; 618(7964): 383-393, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37258665

ABSTRACT

The earliest events during human tumour initiation, although poorly characterized, may hold clues to malignancy detection and prevention. Here we model occult preneoplasia by biallelic inactivation of TP53, a common early event in gastric cancer, in human gastric organoids. Causal relationships between this initiating genetic lesion and resulting phenotypes were established using experimental evolution in multiple clonally derived cultures over 2 years. TP53 loss elicited progressive aneuploidy, including copy number alterations and structural variants prevalent in gastric cancers, with evident preferred orders. Longitudinal single-cell sequencing of TP53-deficient gastric organoids similarly indicates progression towards malignant transcriptional programmes. Moreover, high-throughput lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programmes repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a marked degree of phenotypic convergence in premalignant epithelial organoids. These data imply predictability in the earliest stages of tumorigenesis and show evolutionary constraints and barriers to malignant transformation, with implications for earlier detection and interception of aggressive, genome-instable tumours.

Subject(s)

Cell Transformation, Neoplastic , Clonal Evolution , Precancerous Conditions , Selection, Genetic , Stomach Neoplasms , Humans , Cell Transformation, Neoplastic/genetics , Cell Transformation, Neoplastic/pathology , Clonal Evolution/genetics , Genomic Instability , Mutation , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Precancerous Conditions/genetics , Precancerous Conditions/pathology , Organoids/metabolism , Organoids/pathology , Aneuploidy , DNA Copy Number Variations , Single-Cell Analysis , Tumor Suppressor Protein p53/deficiency , Tumor Suppressor Protein p53/genetics , Disease Progression , Cell Lineage

5.

NeuronMotif: Deciphering cis-regulatory codes by layer-wise demixing of deep neural networks.

Wei, Zheng; Hua, Kui; Wei, Lei; Ma, Shining; Jiang, Rui; Zhang, Xuegong; Li, Yanda; Wong, Wing H; Wang, Xiaowo.

Proc Natl Acad Sci U S A ; 120(15): e2216698120, 2023 04 11.

Article in English | MEDLINE | ID: mdl-37023129

ABSTRACT

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.

Subject(s)

Algorithms , Neural Networks, Computer , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Databases, Factual

6.

The origins and functional effects of postzygotic mutations throughout the human life span.

Rockweiler, Nicole B; Ramu, Avinash; Nagirnaja, Liina; Wong, Wing H; Noordam, Michiel J; Drubin, Casey W; Huang, Ni; Miller, Brian; Todres, Ellen Z; Vigh-Conrad, Katinka A; Zito, Antonino; Small, Kerrin S; Ardlie, Kristin G; Cohen, Barak A; Conrad, Donald F.

Science ; 380(6641): eabn7113, 2023 04 14.

Article in English | MEDLINE | ID: mdl-37053313

ABSTRACT

Postzygotic mutations (PZMs) begin to accrue in the human genome immediately after fertilization, but how and when PZMs affect development and lifetime health remain unclear. To study the origins and functional consequences of PZMs, we generated a multitissue atlas of PZMs spanning 54 tissue and cell types from 948 donors. Nearly half the variation in mutation burden among tissue samples can be explained by measured technical and biological effects, and 9% can be attributed to donor-specific effects. Through phylogenetic reconstruction of PZMs, we found that their type and predicted functional impact vary during prenatal development, across tissues, and through the germ cell life cycle. Thus, methods for interpreting effects across the body and the life span are needed to fully understand the consequences of genetic variants.

Subject(s)

DNA Mutational Analysis , Longevity , Zygote , Female , Humans , Longevity/genetics , Mutation , Phylogeny , RNA-Seq

7.

scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning.

Lin, Yingxin; Wu, Tung-Yu; Wan, Sheng; Yang, Jean Y H; Wong, Wing H; Wang, Y X Rachel.

Nat Biotechnol ; 40(5): 703-710, 2022 05.

Article in English | MEDLINE | ID: mdl-35058621

ABSTRACT

Single-cell multiomics data continues to grow at an unprecedented pace. Although several methods have demonstrated promising results in integrating several data modalities from the same tissue, the complexity and scale of data compositions present in cell atlases still pose a challenge. Here, we present scJoint, a transfer learning method to integrate atlas-scale, heterogeneous collections of scRNA-seq and scATAC-seq data. scJoint leverages information from annotated scRNA-seq data in a semisupervised framework and uses a neural network to simultaneously train labeled and unlabeled data, allowing label transfer and joint visualization in an integrative framework. Using atlas data as well as multimodal datasets generated with ASAP-seq and CITE-seq, we demonstrate that scJoint is computationally efficient and consistently achieves substantially higher cell-type label accuracy than existing methods while providing meaningful joint visualizations. Thus, scJoint overcomes the heterogeneity of different data modalities to enable a more comprehensive understanding of cellular phenotypes.

Subject(s)

Chromatin Immunoprecipitation Sequencing , Single-Cell Analysis , Machine Learning , RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis/methods , Exome Sequencing

8.

On the identifiability of the isoform deconvolution problem: application to select the proper fragment length in an RNA-seq library.

Ferrer-Bonsoms, Juan A; Morales, Xabier; Afshar, Pegah T; Wong, Wing H; Rubio, Angel.

Bioinformatics ; 38(6): 1491-1496, 2022 03 04.

Article in English | MEDLINE | ID: mdl-34978563

ABSTRACT

MOTIVATION: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known if gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. RESULTS: A gene is considered to be identifiable if it is possible to determine both the structure and concentration of its transcripts unambiguously. Here, we present a method to assess the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method determines whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, we find that the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability. AVAILABILITY AND IMPLEMENTATION: Code is available in GitHub (https://github.com/JFerrer-B/transcriptome-identifiability). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
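One way to build intuition for identifiability is a linear-algebra toy model. The sketch below is not the method from the paper or its GitHub repository. It assumes the transcript structures are already known and asks only whether their abundances are uniquely determined, treating each observable feature (an exon or a junction) as a row of a hypothetical 0/1 design matrix and checking for full column rank. The gene, the feature rows, and the function names are invented for illustration.

```python
from fractions import Fraction

def column_rank(matrix):
    """Rank of a matrix, computed by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a non-zero entry
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def abundances_identifiable(design):
    """Abundances are unique only if the design matrix has full column rank."""
    return column_rank(design) == len(design[0])

# Hypothetical gene with exons E1-E3 and four transcripts (columns):
# T1 = E1+E2+E3, T2 = E1+E3, T3 = E1+E2, T4 = E1
exon_features = [
    [1, 1, 1, 1],  # E1 is present in every transcript
    [1, 0, 1, 0],  # E2
    [1, 1, 0, 0],  # E3
]
junction_features = [
    [1, 0, 1, 0],  # junction E1-E2
    [1, 0, 0, 0],  # junction E2-E3
    [0, 1, 0, 0],  # junction E1-E3
]

print(abundances_identifiable(exon_features))                      # exon coverage only
print(abundances_identifiable(exon_features + junction_features))  # with junction evidence
```

With exon coverage alone there are three equations for four unknowns, so the abundances cannot be unique. Adding junction evidence, which in practice depends on reads and fragments long enough to span the junctions, makes the toy system full rank.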

Subject(s)

Genome , Transcriptome , Humans , RNA-Seq , Gene Library , Protein Isoforms/genetics , Software

9.

Loss of KMT2C reprograms the epigenomic landscape in hPSCs resulting in NODAL overexpression and a failure of hemogenic endothelium specification.

Maurya, Shailendra; Yang, Wei; Tamai, Minori; Zhang, Qiang; Erdmann-Gilmore, Petra; Bystry, Amelia; Martins Rodrigues, Fernanda; Valentine, Mark C; Wong, Wing H; Townsend, Reid; Druley, Todd E.

Epigenetics ; 17(2): 220-238, 2022.

Article in English | MEDLINE | ID: mdl-34304711

ABSTRACT

Germline or somatic variation in the family of KMT2 lysine methyltransferases has been associated with a variety of congenital disorders and cancers. Notably, KMT2A-fusions are prevalent in 70% of infant leukaemias but fail to phenocopy short latency leukaemogenesis in mammalian models, suggesting additional factors are necessary for transformation. Given the lack of additional somatic mutation, the role of epigenetic regulation in cell specification, and our prior results of germline KMT2C variation in infant leukaemia patients, we hypothesized that germline dysfunction of KMT2C altered haematopoietic specification. In isogenic KMT2C KO hPSCs, we found genome-wide differences in histone modifications at active and poised enhancers, leading to gene expression profiles akin to mesendoderm rather than mesoderm highlighted by a significant increase in NODAL expression and WNT inhibition, ultimately resulting in a lack of in vitro hemogenic endothelium specification. These unbiased multi-omic results provide new evidence for germline mechanisms increasing risk of early leukaemogenesis.

Subject(s)

Epigenesis, Genetic , Hemangioblasts , Animals , DNA Methylation , Epigenomics , Humans , Mammals , Mutation

10.

Breast cancer-derived GM-CSF regulates arginase 1 in myeloid cells to promote an immunosuppressive microenvironment.

Su, Xinming; Xu, Yalin; Fox, Gregory C; Xiang, Jingyu; Kwakwa, Kristin A; Davis, Jennifer L; Belle, Jad I; Lee, Wen-Chih; Wong, Wing H; Fontana, Francesca; Hernandez-Aya, Leonel F; Kobayashi, Takayuki; Tomasson, Helen M; Su, Junyi; Bakewell, Suzanne J; Stewart, Sheila A; Egbulefu, Christopher; Karmakar, Partha; Meyer, Melisa A; Veis, Deborah J; DeNardo, David G; Lanza, Gregory M; Achilefu, Samuel; Weilbaecher, Katherine N.

J Clin Invest ; 131(20), 2021 10 15.

Article in English | MEDLINE | ID: mdl-34520398

ABSTRACT

Tumor-infiltrating myeloid cells contribute to the development of the immunosuppressive tumor microenvironment. Myeloid cell expression of arginase 1 (ARG1) promotes a protumor phenotype by inhibiting T cell function and depleting extracellular l-arginine, but the mechanism underlying this expression, especially in breast cancer, is poorly understood. In breast cancer clinical samples and in our mouse models, we identified tumor-derived GM-CSF as the primary regulator of myeloid cell ARG1 expression and local immune suppression through a gene-KO screen of breast tumor cell-produced factors. The induction of myeloid cell ARG1 required GM-CSF and a low pH environment. GM-CSF signaling through STAT3 and p38 MAPK and acid signaling through cAMP were required to activate myeloid cell ARG1 expression in a STAT6-independent manner. Importantly, breast tumor cell-derived GM-CSF promoted tumor progression by inhibiting host antitumor immunity, driving a significant accumulation of ARG1-expressing myeloid cells compared with lung and melanoma tumors with minimal GM-CSF expression. Blockade of tumoral GM-CSF enhanced the efficacy of tumor-specific adoptive T cell therapy and immune checkpoint blockade. Taken together, we show that breast tumor cell-derived GM-CSF contributes to the development of the immunosuppressive breast cancer microenvironment by regulating myeloid cell ARG1 expression and can be targeted to enhance breast cancer immunotherapy.

Subject(s)

Arginase/physiology , Breast Neoplasms/immunology , Granulocyte-Macrophage Colony-Stimulating Factor/physiology , Immune Tolerance , Myeloid Cells/enzymology , Tumor Microenvironment , Animals , Breast Neoplasms/pathology , Cell Line, Tumor , Cyclic AMP/physiology , Female , Humans , Mice , Mice, Inbred C57BL

11.

Dynamic chromatin regulatory landscape of human CAR T cell exhaustion.

Gennert, David G; Lynn, Rachel C; Granja, Jeff M; Weber, Evan W; Mumbach, Maxwell R; Zhao, Yang; Duren, Zhana; Sotillo, Elena; Greenleaf, William J; Wong, Wing H; Satpathy, Ansuman T; Mackall, Crystal L; Chang, Howard Y.

Proc Natl Acad Sci U S A ; 118(30), 2021 07 27.

Article in English | MEDLINE | ID: mdl-34285077

ABSTRACT

Dysfunction in T cells limits the efficacy of cancer immunotherapy. We profiled the epigenome, transcriptome, and enhancer connectome of exhaustion-prone GD2-targeting HA-28z chimeric antigen receptor (CAR) T cells and control CD19-targeting CAR T cells, which present less exhaustion-inducing tonic signaling, at multiple points during their ex vivo expansion. We found widespread, dynamic changes in chromatin accessibility and three-dimensional (3D) chromosome conformation preceding changes in gene expression, notably at loci proximal to exhaustion-associated genes such as PDCD1, CTLA4, and HAVCR2, and increased DNA motif access for AP-1 family transcription factors, which are known to promote exhaustion. Although T cell exhaustion has been studied in detail in mice, we find that the regulatory networks of T cell exhaustion differ between species and involve distinct loci of accessible chromatin and cis-regulated target genes in human CAR T cell exhaustion. Deletion of exhaustion-specific candidate enhancers of PDCD1 suppresses the expression of PD-1 in an in vitro model of T cell dysfunction and in HA-28z CAR T cells, suggesting enhancer editing as a path forward in improving cancer immunotherapy.

Subject(s)

Chromatin/metabolism , Neoplasms/therapy , Programmed Cell Death 1 Receptor/metabolism , Receptors, Chimeric Antigen , T-Lymphocytes/physiology , Animals , Antigens, CD19 , Cell Line , Chromatin/genetics , Gene Expression Regulation, Neoplastic , Humans , Mice , Programmed Cell Death 1 Receptor/genetics

12.

A Non-stop identity complex (NIC) supervises enterocyte identity and protects from premature aging.

Erez, Neta; Israitel, Lena; Bitman-Lotan, Eliya; Wong, Wing H; Raz, Gal; Cornelio-Parra, Dayanne V; Danial, Salwa; Flint Brodsly, Na'ama; Belova, Elena; Maksimenko, Oksana; Georgiev, Pavel; Druley, Todd; Mohan, Ryan D; Orian, Amir.

Elife ; 10, 2021 02 25.

Article in English | MEDLINE | ID: mdl-33629655

ABSTRACT

A hallmark of aging is loss of differentiated cell identity. Aged Drosophila midgut differentiated enterocytes (ECs) lose their identity, impairing tissue homeostasis. To discover identity regulators, we performed an RNAi screen targeting ubiquitin-related genes in ECs. Seventeen genes were identified, including the deubiquitinase Non-stop (CG4166). Lineage tracing established that acute loss of Non-stop in young ECs phenocopies aged ECs at cellular and tissue levels. Proteomic analysis unveiled that Non-stop maintains identity as part of a Non-stop identity complex (NIC) containing E(y)2, Sgf11, Cp190, Mod(mdg4), and Nup98. Non-stop ensured chromatin accessibility, maintaining the EC-gene signature, and protected NIC subunit stability. Upon aging, the levels of Non-stop and NIC subunits declined, distorting the unique organization of the EC nucleus. Maintaining youthful levels of Non-stop in wildtype aged ECs safeguarded NIC subunits and nuclear organization, and suppressed aging phenotypes. Thus, Non-stop and NIC supervise EC identity and protect from premature aging.

Subject(s)

Aging, Premature/genetics , Aging/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/physiology , Enterocytes/physiology , Animals , Disease Models, Animal , Drosophila Proteins/metabolism , Female , Male , Phenotype , Proteome

13.

The evolutionary dynamics and fitness landscape of clonal hematopoiesis.

Watson, Caroline J; Papula, A L; Poon, Gladys Y P; Wong, Wing H; Young, Andrew L; Druley, Todd E; Fisher, Daniel S; Blundell, Jamie R.

Science ; 367(6485): 1449-1454, 2020 03 27.

Article in English | MEDLINE | ID: mdl-32217721

ABSTRACT

Somatic mutations acquired in healthy tissues as we age are major determinants of cancer risk. Whether variants confer a fitness advantage or rise to detectable frequencies by chance remains largely unknown. Blood sequencing data from ~50,000 individuals reveal how mutation, genetic drift, and fitness shape the genetic diversity of healthy blood (clonal hematopoiesis). We show that positive selection, not drift, is the major force shaping clonal hematopoiesis, provide bounds on the number of hematopoietic stem cells, and quantify the fitness advantages of key pathogenic variants, at single-nucleotide resolution, as well as the distribution of fitness effects (fitness landscape) within commonly mutated driver genes. These data are consistent with clonal hematopoiesis being driven by a continuing risk of mutations and clonal expansions that become increasingly detectable with age.

Subject(s)

Aging , Biological Evolution , Genetic Drift , Genetic Fitness , Hematopoiesis/genetics , Selection, Genetic , Gene Frequency , Genetics, Population , Hematopoietic Stem Cells/cytology , Humans , Models, Genetic , Mutation , Mutation Rate

14.

NF1 glioblastoma clonal profiling reveals KMT2B mutations as potential somatic oncogenic events.

Wong, Wing H; Junck, Larry; Druley, Todd E; Gutmann, David H.

Neurology ; 93(24): 1067-1069, 2019 12 10.

Article in English | MEDLINE | ID: mdl-31690684

Subject(s)

Brain Neoplasms/genetics , Glioblastoma/genetics , Histone-Lysine N-Methyltransferase/genetics , Neurofibromatosis 1/complications , Neurofibromatosis 1/genetics , Adult , Autopsy , Humans , Male , Mutation

15.

Haplotype-resolved and integrated genome analysis of the cancer cell line HepG2.

Zhou, Bo; Ho, Steve S; Greer, Stephanie U; Spies, Noah; Bell, John M; Zhang, Xianglong; Zhu, Xiaowei; Arthur, Joseph G; Byeon, Seunggyu; Pattni, Reenal; Saha, Ishan; Huang, Yiling; Song, Giltae; Perrin, Dimitri; Wong, Wing H; Ji, Hanlee P; Abyzov, Alexej; Urban, Alexander E.

Nucleic Acids Res ; 47(8): 3846-3861, 2019 05 07.

Article in English | MEDLINE | ID: mdl-30864654

ABSTRACT

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.

Subject(s)

Chromosome Mapping/methods , Genome, Human , Genomics/methods , Haplotypes , Sequence Analysis, DNA/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Polymorphism, Single Nucleotide , Retroelements

16.

Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562.

Zhou, Bo; Ho, Steve S; Greer, Stephanie U; Zhu, Xiaowei; Bell, John M; Arthur, Joseph G; Spies, Noah; Zhang, Xianglong; Byeon, Seunggyu; Pattni, Reenal; Ben-Efraim, Noa; Haney, Michael S; Haraksingh, Rajini R; Song, Giltae; Ji, Hanlee P; Perrin, Dimitri; Wong, Wing H; Abyzov, Alexej; Urban, Alexander E.

Genome Res ; 29(3): 472-484, 2019 03.

Article in English | MEDLINE | ID: mdl-30737237

ABSTRACT

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high-resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.

Subject(s)

Genome, Human , Humans , K562 Cells , Karyotype , Polymorphism, Genetic , Whole Genome Sequencing

17.

Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis.

Li, Qigang; Zhao, Keyan; Bustamante, Carlos D; Ma, Xin; Wong, Wing H.

Genet Med ; 21(9): 2126-2134, 2019 09.

Article in English | MEDLINE | ID: mdl-30675030

ABSTRACT

PURPOSE: Despite the successful progress next-generation sequencing technologies have achieved in diagnosing the genetic cause of rare Mendelian diseases, the current diagnostic rate is still far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient utilization of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis. METHODS: We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We utilize medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: ERIC score ranked consistently higher for disease genes than other phenotypic similarity scores in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10-40% in various genetic diagnosis scenarios. CONCLUSION: The Xrare model is learned from a large database of clinical variants, and derives its strength from the tight integration of medical genetics features and phenotypic similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.

Subject(s)

Genomics/methods , Machine Learning , Rare Diseases/diagnosis , Software , Computational Biology , Exome/genetics , Genetic Testing , Genetic Variation/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Mutation , Phenotype , Rare Diseases/genetics

18.

TFAP2C- and p63-Dependent Networks Sequentially Rearrange Chromatin Landscapes to Drive Human Epidermal Lineage Commitment.

Li, Lingjie; Wang, Yong; Torkelson, Jessica L; Shankar, Gautam; Pattison, Jillian M; Zhen, Hanson H; Fang, Fengqin; Duren, Zhana; Xin, Jingxue; Gaddam, Sadhana; Melo, Sandra P; Piekos, Samantha N; Li, Jiang; Liaw, Eric J; Chen, Lang; Li, Rui; Wernig, Marius; Wong, Wing H; Chang, Howard Y; Oro, Anthony E.

Cell Stem Cell ; 24(2): 271-284.e8, 2019 02 07.

Article in English | MEDLINE | ID: mdl-30686763

ABSTRACT

Tissue development results from lineage-specific transcription factors (TFs) programming a dynamic chromatin landscape through progressive cell fate transitions. Here, we define the epigenomic landscape during epidermal differentiation of human pluripotent stem cells (PSCs) and create inference networks that integrate gene expression, chromatin accessibility, and TF binding to define regulatory mechanisms during keratinocyte specification. We found two critical chromatin networks during surface ectoderm initiation and keratinocyte maturation, which are driven by TFAP2C and p63, respectively. Consistently, TFAP2C, but not p63, is sufficient to initiate surface ectoderm differentiation, and TFAP2C-initiated progenitor cells are capable of maturing into functional keratinocytes. Mechanistically, TFAP2C primes the surface ectoderm chromatin landscape and induces p63 expression and binding sites, thus allowing maturation factor p63 to positively autoregulate its own expression and close a subset of the TFAP2C-initiated surface ectoderm program. Our work provides a general framework to infer TF networks controlling chromatin transitions that will facilitate future regenerative medicine advances.

Subject(s)

Cell Lineage , Chromatin/metabolism , Epidermis/metabolism , Gene Regulatory Networks , Transcription Factor AP-2/metabolism , Transcription Factors/metabolism , Tumor Suppressor Proteins/metabolism , Cell Differentiation , Ectoderm/cytology , Epigenesis, Genetic , Feedback, Physiological , Humans , Keratinocytes/cytology , Transcriptome/genetics

19.

Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools.

Zhou, Bo; Arthur, Joseph G; Ho, Steve S; Pattni, Reenal; Huang, Yiling; Wong, Wing H; Urban, Alexander E.

Sci Data ; 5: 180261, 2018 12 18.

Article in English | MEDLINE | ID: mdl-30561434

ABSTRACT

We produced an extensive collection of deep re-sequencing datasets for the Venter/HuRef genome using the Illumina massively-parallel DNA sequencing platform. The original Venter genome sequence is a very-high quality phased assembly based on Sanger sequencing. Therefore, researchers developing novel computational tools for the analysis of human genome sequence variation for the dominant Illumina sequencing technology can test and hone their algorithms by making variant calls from these Venter/HuRef datasets and then immediately confirm the detected variants in the Sanger assembly, freeing them of the need for further experimental validation. This process also applies to implementing and benchmarking existing genome analysis pipelines. We prepared and sequenced 200 bp and 350 bp short-insert whole-genome sequencing libraries (sequenced to 100x and 40x genomic coverages respectively) as well as 2 kb, 5 kb, and 12 kb mate-pair libraries (49x, 122x, and 145x physical coverages respectively). Lastly, we produced a linked-read library (128x physical coverage) from which we also performed haplotype phasing.

Subject(s)

Benchmarking/methods , Genome, Human , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/standards , Algorithms , Gene Library , Genetic Variation , Humans

20.

Rare Event Detection Using Error-corrected DNA and RNA Sequencing.

Wong, Wing H; Tong, R Spencer; Young, Andrew L; Druley, Todd E.

J Vis Exp ; (138)2018 08 03.

Article in English | MEDLINE | ID: mdl-30124656

ABSTRACT

Conventional next-generation sequencing techniques (NGS) have allowed for immense genomic characterization for over a decade. Specifically, NGS has been used to analyze the spectrum of clonal mutations in malignancy. Though far more efficient than traditional Sanger methods, NGS struggles with identifying rare clonal and subclonal mutations due to its high error rate of ~0.5-2.0%. Thus, standard NGS has a limit of detection for mutations that are >0.02 variant allele fraction (VAF). While the clinical significance for mutations this rare in patients without known disease remains unclear, patients treated for leukemia have significantly improved outcomes when residual disease is <0.0001 by flow cytometry. In order to mitigate this artefactual background of NGS, numerous methods have been developed. Here we describe a method for Error-corrected DNA and RNA Sequencing (ECS), which involves tagging individual molecules with both a 16 bp random index for error-correction and an 8 bp patient-specific index for multiplexing. Our method can detect and track clonal mutations at variant allele fractions (VAFs) two orders of magnitude lower than the detection limit of NGS and as rare as 0.0001 VAF.

Subject(s)

Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Humans
