Results 1 - 20 of 78
1.
Nature ; 618(7964): 383-393, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37258665

ABSTRACT

The earliest events during human tumour initiation, although poorly characterized, may hold clues to malignancy detection and prevention1. Here we model occult preneoplasia by biallelic inactivation of TP53, a common early event in gastric cancer, in human gastric organoids. Causal relationships between this initiating genetic lesion and resulting phenotypes were established using experimental evolution in multiple clonally derived cultures over 2 years. TP53 loss elicited progressive aneuploidy, including copy number alterations and structural variants prevalent in gastric cancers, with evident preferred orders. Longitudinal single-cell sequencing of TP53-deficient gastric organoids similarly indicates progression towards malignant transcriptional programmes. Moreover, high-throughput lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programmes repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a marked degree of phenotypic convergence in premalignant epithelial organoids. These data imply predictability in the earliest stages of tumorigenesis and show evolutionary constraints and barriers to malignant transformation, with implications for earlier detection and interception of aggressive, genome-instable tumours.


Subject(s)
Cell Transformation, Neoplastic , Clonal Evolution , Precancerous Conditions , Selection, Genetic , Stomach Neoplasms , Humans , Cell Transformation, Neoplastic/genetics , Cell Transformation, Neoplastic/pathology , Clonal Evolution/genetics , Genomic Instability , Mutation , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Precancerous Conditions/genetics , Precancerous Conditions/pathology , Organoids/metabolism , Organoids/pathology , Aneuploidy , DNA Copy Number Variations , Single-Cell Analysis , Tumor Suppressor Protein p53/deficiency , Tumor Suppressor Protein p53/genetics , Disease Progression , Cell Lineage
2.
Genome Res ; 34(1): 119-133, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38190633

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Gene Expression Regulation
3.
Cell ; 151(3): 547-58, 2012 Oct 26.
Article in English | MEDLINE | ID: mdl-23101625

ABSTRACT

Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not CPP approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.


Subject(s)
Cell-Penetrating Peptides/metabolism , Cellular Reprogramming , Immunity, Innate , Induced Pluripotent Stem Cells/metabolism , Signal Transduction , Cell Line , Fibroblasts/metabolism , Humans , Inflammation/metabolism , Kruppel-Like Factor 4 , NF-kappa B/metabolism , Octamer Transcription Factor-3/metabolism , Retroviridae/metabolism , Toll-Like Receptor 3/metabolism
4.
Proc Natl Acad Sci U S A ; 120(15): e2216698120, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37023129

ABSTRACT

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.
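The abstract above reports motifs as position weight matrices (PWMs). As a minimal sketch of that representation only, and not of the NeuronMotif demixing procedure itself, the following builds a log-odds PWM from a few aligned toy sequences. The function names, the pseudocount, the uniform 0.25 background, and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def position_weight_matrix(seqs, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM: one dict per position, mapping base -> score.

    Assumes equal-length, already aligned sequences and a uniform
    background frequency (both are illustrative simplifications).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs) + len(BASES) * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        pwm.append({
            b: math.log2(((counts.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds scores of a candidate sequence."""
    return sum(column[base] for column, base in zip(pwm, seq))

def consensus(pwm):
    """Return the highest-scoring base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)

# Toy aligned sequences (hypothetical, not taken from the paper).
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
toy_pwm = position_weight_matrix(toy_sites)
```

A sequence that resembles the aligned sites scores higher under `score` than an unrelated one, which is the property motif-matching tools rely on when comparing against databases such as JASPAR.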


Subject(s)
Algorithms , Neural Networks, Computer , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Databases, Factual
5.
Proc Natl Acad Sci U S A ; 118(30), 2021 07 27.
Article in English | MEDLINE | ID: mdl-34285077

ABSTRACT

Dysfunction in T cells limits the efficacy of cancer immunotherapy. We profiled the epigenome, transcriptome, and enhancer connectome of exhaustion-prone GD2-targeting HA-28z chimeric antigen receptor (CAR) T cells and control CD19-targeting CAR T cells, which present less exhaustion-inducing tonic signaling, at multiple points during their ex vivo expansion. We found widespread, dynamic changes in chromatin accessibility and three-dimensional (3D) chromosome conformation preceding changes in gene expression, notably at loci proximal to exhaustion-associated genes such as PDCD1, CTLA4, and HAVCR2, and increased DNA motif access for AP-1 family transcription factors, which are known to promote exhaustion. Although T cell exhaustion has been studied in detail in mice, we find that the regulatory networks of T cell exhaustion differ between species and involve distinct loci of accessible chromatin and cis-regulated target genes in human CAR T cell exhaustion. Deletion of exhaustion-specific candidate enhancers of PDCD1 suppresses the expression of PD-1 in an in vitro model of T cell dysfunction and in HA-28z CAR T cells, suggesting enhancer editing as a path forward in improving cancer immunotherapy.


Subject(s)
Chromatin/metabolism , Neoplasms/therapy , Programmed Cell Death 1 Receptor/metabolism , Receptors, Chimeric Antigen , T-Lymphocytes/physiology , Animals , Antigens, CD19 , Cell Line , Chromatin/genetics , Gene Expression Regulation, Neoplastic , Humans , Mice , Programmed Cell Death 1 Receptor/genetics
6.
Bioinformatics ; 38(6): 1491-1496, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34978563

ABSTRACT

MOTIVATION: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known if gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. RESULTS: A gene is considered to be identifiable if it is possible to determine both the structure and concentration of its transcripts uniquely. Here, we present a method to assess the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method determines whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability. AVAILABILITY AND IMPLEMENTATION: Code is available on GitHub (https://github.com/JFerrer-B/transcriptome-identifiability). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
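To make the notion of identifiability concrete, here is a deliberately simplified sketch. It is not the method from the paper. It assumes the transcript structures are already known and asks only whether their concentrations are uniquely determined by the coverage of observable features (exons or junctions). Under that assumption, the concentrations are unique exactly when the feature-by-transcript indicator matrix has full column rank. The toy gene and all function names are hypothetical.

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Rank of a small matrix via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def concentrations_identifiable(feature_by_transcript):
    """True if every transcript concentration is uniquely determined.

    Rows are observable features, columns are transcripts; an entry is 1
    when the transcript contains the feature (simplified linear model).
    """
    n_transcripts = len(feature_by_transcript[0])
    return matrix_rank(feature_by_transcript) == n_transcripts

# Toy gene: four transcripts observed only through three exons.
exons_only = [
    [1, 1, 1, 1],  # exon 1
    [1, 0, 1, 0],  # exon 2
    [1, 1, 0, 0],  # exon 3
]
# Same gene with one extra junction feature seen only in transcript 2.
with_junction = exons_only + [[0, 1, 0, 0]]
```

In this toy case the three exon features cannot separate four transcripts, while adding a single junction feature makes the system full rank. This loosely mirrors the abstract's point that what the reads can span affects identifiability.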


Subject(s)
Genome , Transcriptome , Humans , RNA-Seq , Gene Library , Protein Isoforms/genetics , Software
7.
Genome Res ; 29(3): 472-484, 2019 03.
Article in English | MEDLINE | ID: mdl-30737237

ABSTRACT

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and is also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.


Subject(s)
Genome, Human , Humans , K562 Cells , Karyotype , Polymorphism, Genetic , Whole Genome Sequencing
8.
Nucleic Acids Res ; 47(8): 3846-3861, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30864654

ABSTRACT

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.


Subject(s)
Chromosome Mapping/methods , Genome, Human , Genomics/methods , Haplotypes , Sequence Analysis, DNA/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Polymorphism, Single Nucleotide , Retroelements
9.
Genes Dev ; 26(24): 2802-16, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23249739

ABSTRACT

In the vertebrate neural tube, regional Sonic hedgehog (Shh) signaling invokes a time- and concentration-dependent induction of six different cell populations mediated through Gli transcriptional regulators. Elsewhere in the embryo, Shh/Gli responses invoke different tissue-appropriate regulatory programs. A genome-scale analysis of DNA binding by Gli1 and Sox2, a pan-neural determinant, identified a set of shared regulatory regions associated with key factors central to cell fate determination and neural tube patterning. Functional analysis in transgenic mice validates core enhancers for each of these factors and demonstrates the dual requirement for Gli1 and Sox2 inputs for neural enhancer activity. Furthermore, through an unbiased determination of Gli-binding site preferences and analysis of binding site variants in the developing mammalian CNS, we demonstrate that differential Gli-binding affinity underlies threshold-level activator responses to Shh input. In summary, our results highlight Sox2 input as a context-specific determinant of the neural-specific Shh response and differential Gli-binding site affinity as an important cis-regulatory property critical for interpreting Shh morphogen action in the mammalian neural tube.


Subject(s)
Body Patterning/physiology , Hedgehog Proteins/metabolism , Kruppel-Like Transcription Factors/metabolism , SOXB1 Transcription Factors/metabolism , Animals , Body Patterning/genetics , Mice , Mice, Transgenic , Neural Tube/embryology , Neural Tube/metabolism , Protein Binding , Zinc Finger Protein GLI1
10.
Genet Med ; 21(9): 2126-2134, 2019 09.
Article in English | MEDLINE | ID: mdl-30675030

ABSTRACT

PURPOSE: Despite the successful progress next-generation sequencing technologies have achieved in diagnosing the genetic cause of rare Mendelian diseases, the current diagnostic rate is still far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient utilization of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis. METHODS: We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We utilize medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: ERIC score ranked consistently higher for disease genes than other phenotypic similarity scores in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10-40% in various genetic diagnosis scenarios. CONCLUSION: The Xrare model is learned from a large database of clinical variants, and derives its strength from the tight integration of medical genetics features and phenotypic feature similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.


Subject(s)
Genomics/methods , Machine Learning , Rare Diseases/diagnosis , Software , Computational Biology , Exome/genetics , Genetic Testing , Genetic Variation/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Mutation , Phenotype , Rare Diseases/genetics
11.
Nature ; 470(7333): 269-73, 2011 Feb 10.
Article in English | MEDLINE | ID: mdl-21289624

ABSTRACT

Effective clinical management of prostate cancer (PCA) has been challenged by significant intratumoural heterogeneity on the genomic and pathological levels and limited understanding of the genetic elements governing disease progression. Here, we exploited the experimental merits of the mouse to test the hypothesis that pathways constraining progression might be activated in indolent Pten-null mouse prostate tumours and that inactivation of such progression barriers in mice would engender a metastasis-prone condition. Comparative transcriptomic and canonical pathway analyses, followed by biochemical confirmation, of normal prostate epithelium versus poorly progressive Pten-null prostate cancers revealed robust activation of the TGFβ/BMP-SMAD4 signalling axis. The functional relevance of SMAD4 was further supported by emergence of invasive, metastatic and lethal prostate cancers with 100% penetrance upon genetic deletion of Smad4 in the Pten-null mouse prostate. Pathological and molecular analysis as well as transcriptomic knowledge-based pathway profiling of emerging tumours identified cell proliferation and invasion as two cardinal tumour biological features in the metastatic Smad4/Pten-null PCA model. Follow-on pathological and functional assessment confirmed cyclin D1 and SPP1 as key mediators of these biological processes, which together with PTEN and SMAD4, form a four-gene signature that is prognostic of prostate-specific antigen (PSA) biochemical recurrence and lethal metastasis in human PCA. This model-informed progression analysis, together with genetic, functional and translational studies, establishes SMAD4 as a key regulator of PCA progression in mice and humans.


Subject(s)
Disease Progression , Neoplasm Metastasis/pathology , Prostatic Neoplasms/pathology , Smad4 Protein/metabolism , Animals , Bone Morphogenetic Proteins/metabolism , Cell Proliferation , Cyclin D1/genetics , Cyclin D1/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor/physiology , Humans , Lung Neoplasms/secondary , Lymphatic Metastasis , Male , Mice , Mice, Transgenic , Models, Biological , Neoplasm Invasiveness/genetics , Neoplasm Invasiveness/pathology , Neoplasm Metastasis/genetics , Osteopontin/genetics , Osteopontin/metabolism , PTEN Phosphohydrolase/deficiency , PTEN Phosphohydrolase/genetics , Penetrance , Prognosis , Prostate/metabolism , Prostate-Specific Antigen/metabolism , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Smad4 Protein/deficiency , Smad4 Protein/genetics , Transforming Growth Factor beta
12.
Genome Res ; 23(1): 129-41, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23093720

ABSTRACT

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.


Subject(s)
Sequence Analysis, DNA/methods , DNA, Bacterial/chemistry , DNA, Mitochondrial/chemistry , Escherichia coli/chemistry , Guanosine/analogs & derivatives , Guanosine/chemistry , Humans , Kinetics , Oxidation-Reduction
13.
Bioinformatics ; 31(16): 2741-4, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25861968

ABSTRACT

Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
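The merging step mentioned in the abstract can be illustrated with a generic sketch. This is not MetaSV's actual code or API. It assumes calls are merged when they share a chromosome and SV type and meet a reciprocal-overlap threshold. The `SVCall` type, the 0.5 threshold, and the tool names are hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int
    end: int
    sv_type: str
    tool: str  # name of the caller that reported it (hypothetical labels)

def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions, or 0.0 if intervals are disjoint."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        # Also covers zero-length calls such as insertions, which would
        # need separate handling in a real merger.
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))

def merge_calls(calls, min_overlap=0.5):
    """Greedily cluster calls against the first call of each cluster."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.sv_type, c.start)):
        for cluster in clusters:
            rep = cluster[0]
            if (rep.chrom == call.chrom and rep.sv_type == call.sv_type
                    and reciprocal_overlap(rep, call) >= min_overlap):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [
        {
            "chrom": c[0].chrom,
            "sv_type": c[0].sv_type,
            "start": min(x.start for x in c),
            "end": max(x.end for x in c),
            "tools": sorted({x.tool for x in c}),
        }
        for c in clusters
    ]

example_calls = [
    SVCall("chr1", 1000, 2000, "DEL", "toolA"),
    SVCall("chr1", 1050, 2100, "DEL", "toolB"),
    SVCall("chr1", 5000, 5300, "DUP", "toolA"),
    SVCall("chr2", 1000, 2000, "DEL", "toolB"),
]
merged = merge_calls(example_calls)
```

Recording which tools support each merged call is one simple way to carry multi-tool evidence forward. A real pipeline would also refine breakpoints, which this sketch does not attempt.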


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Mutagenesis, Insertional , Sequence Deletion
14.
Bioinformatics ; 31(9): 1469-71, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25524895

ABSTRACT

SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
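The idea of comparing variants binned in size ranges can be sketched as follows. This is an illustration only, not VarSim's implementation. It assumes variants are `(chrom, pos, size)` tuples matched by exact equality, and the bin edges are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical lower edges of the size bins, in base pairs.
BIN_EDGES = (1, 50, 1000)

def bin_label(size, edges=BIN_EDGES):
    """Return a label such as '[50,1000)' for the bin containing size."""
    i = bisect_right(edges, size) - 1
    if i < 0:
        raise ValueError(f"size {size} is below the smallest bin edge")
    lo = edges[i]
    if i + 1 < len(edges):
        return f"[{lo},{edges[i + 1]})"
    return f"[{lo},inf)"

def recall_by_size(truth, called):
    """Fraction of true variants recovered, reported per size bin.

    Exact tuple matching is a simplification; real validators usually
    tolerate small breakpoint differences.
    """
    called_set = set(called)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for variant in truth:
        label = bin_label(variant[2])
        totals[label] += 1
        if variant in called_set:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}

truth_set = [
    ("chr1", 100, 1),
    ("chr1", 200, 10),
    ("chr1", 300, 120),
    ("chr1", 400, 5000),
]
call_set = [("chr1", 100, 1), ("chr1", 300, 120)]
per_bin_recall = recall_by_size(truth_set, call_set)
```

Reporting accuracy per bin keeps a caller's weakness on large structural variants from being hidden by the far more numerous small variants.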


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment
15.
Biostatistics ; 15(1): 182-95, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23902636

ABSTRACT

Analyzing the failure times of multiple events is of interest in many fields. Estimating the joint distribution of the failure times in a non-parametric way is not straightforward because some failure times are often right-censored and only known to be greater than observed follow-up times. Although it has been studied, there is no universally optimal solution for this problem. It is still challenging and important to provide alternatives that may be more suitable than existing ones in specific settings. Related problems of the existing methods are not only limited to infeasible computations, but also include the lack of optimality and possible non-monotonicity of the estimated survival function. In this paper, we proposed a non-parametric Bayesian approach for directly estimating the density function of multivariate survival times, where the prior is constructed based on the optional Pólya tree. We investigated several theoretical aspects of the procedure and derived an efficient iterative algorithm for implementing the Bayesian procedure. The empirical performance of the method was examined via extensive simulation studies. Finally, we presented a detailed analysis using the proposed method on the relationship among organ recovery times in severely injured patients. From the analysis, we suggested interesting medical information that can be further pursued in clinics.


Subject(s)
Algorithms , Bayes Theorem , Data Interpretation, Statistical , Multivariate Analysis , Survival Analysis , Cardiovascular System/pathology , Central Nervous System/pathology , Computer Simulation , Humans , Wounds and Injuries/pathology
16.
Mol Syst Biol ; 9: 632, 2013.
Article in English | MEDLINE | ID: mdl-23295861

ABSTRACT

Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.


Subject(s)
Blastocyst/physiology , DNA-Binding Proteins/genetics , Embryonic Stem Cells/physiology , Gene Regulatory Networks , Homeodomain Proteins/genetics , Octamer Transcription Factor-3/genetics , Transcription Factors/genetics , Animals , Blastocyst/metabolism , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA-Binding Proteins/metabolism , Embryo Culture Techniques , Embryo, Mammalian/metabolism , Embryonic Development , Female , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Knockdown Techniques , Homeodomain Proteins/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Inbred DBA , MicroRNAs/genetics , Nanog Homeobox Protein , Octamer Transcription Factor-3/metabolism , Oligonucleotide Array Sequence Analysis , Transcription Factors/metabolism , DNA Methyltransferase 3B
17.
Nature ; 455(7216): 1129-33, 2008 Oct 23.
Article in English | MEDLINE | ID: mdl-18948956

ABSTRACT

Glioblastoma (GBM) is a highly lethal brain tumour presenting as one of two subtypes with distinct clinical histories and molecular profiles. The primary GBM subtype presents acutely as a high-grade disease that typically harbours mutations in EGFR, PTEN and INK4A/ARF (also known as CDKN2A), and the secondary GBM subtype evolves from the slow progression of a low-grade disease that classically possesses PDGF and TP53 events. Here we show that concomitant central nervous system (CNS)-specific deletion of p53 and Pten in the mouse CNS generates a penetrant acute-onset high-grade malignant glioma phenotype with notable clinical, pathological and molecular resemblance to primary GBM in humans. This genetic observation prompted TP53 and PTEN mutational analysis in human primary GBM, demonstrating unexpectedly frequent inactivating mutations of TP53 as well as the expected PTEN mutations. Integrated transcriptomic profiling, in silico promoter analysis and functional studies of murine neural stem cells (NSCs) established that dual, but not singular, inactivation of p53 and Pten promotes an undifferentiated state with high renewal potential and drives increased Myc protein levels and its associated signature. Functional studies validated increased Myc activity as a potent contributor to the impaired differentiation and enhanced renewal of NSCs doubly null for p53 and Pten (p53(-/-) Pten(-/-)) as well as tumour neurospheres (TNSs) derived from this model. Myc also serves to maintain robust tumorigenic potential of p53(-/-) Pten(-/-) TNSs. These murine modelling studies, together with confirmatory transcriptomic/promoter studies in human primary GBM, validate a pathogenetic role of a common tumour suppressor mutation profile in human primary GBM and establish Myc as an important target for cooperative actions of p53 and Pten in the regulation of normal and malignant stem/progenitor cell differentiation, self-renewal and tumorigenic potential.


Subject(s)
Brain Neoplasms/pathology , Cell Differentiation , Glioma/pathology , Neoplastic Stem Cells/pathology , Neurons/pathology , PTEN Phosphohydrolase/metabolism , Tumor Suppressor Protein p53/metabolism , Animals , Brain Neoplasms/genetics , Cell Proliferation , Gene Expression Regulation , Glioblastoma/genetics , Glioblastoma/pathology , Glioma/genetics , Humans , Immunohistochemistry , Mice , Neoplastic Stem Cells/metabolism , Neurons/metabolism , PTEN Phosphohydrolase/genetics , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , Tumor Suppressor Protein p53/genetics
18.
Bioinformatics ; 28(18): 2366-73, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22811546

ABSTRACT

MOTIVATION: Next-generation sequence analysis has become an important task both in laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. RESULTS: We introduce SeqAlto as a new algorithm for read alignment. For reads longer than or equal to 100 bp, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. AVAILABILITY: Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto CONTACT: whwong@stanford.edu.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Alignment/methods , Sequence Analysis, DNA , Software , Algorithms , Genome, Human , Genomics , Humans , INDEL Mutation
19.
Proc Natl Acad Sci U S A ; 107(21): 9736-41, 2010 May 25.
Article in English | MEDLINE | ID: mdl-20460306

ABSTRACT

Many genes initially identified for their roles in cell fate determination or signaling during development can have a significant impact on tumorigenesis. In the developing cerebellum, Sonic hedgehog (Shh) stimulates the proliferation of granule neuron precursor cells (GNPs) by activating the Gli transcription factors. Inappropriate activation of Shh target genes results in unrestrained cell division and eventually medulloblastoma, the most common pediatric brain malignancy. We find dramatic differences in the gene networks that are directly driven by the Gli1 transcription factor in GNPs and medulloblastoma. Gli1 binding location analysis revealed hundreds of genomic loci bound by Gli1 in normal and cancer cells. Only one third of the genes bound by Gli1 in GNPs were also bound in tumor cells. Correlation with gene expression levels indicated that 116 genes were preferentially transcribed in tumors, whereas 132 genes were target genes in both GNPs and medulloblastoma. Quantitative PCR and in situ hybridization for some putative target genes support their direct regulation by Gli. The results indicate that transformation of normal GNPs into deadly tumor cells is accompanied by a distinct set of Gli-regulated genes and may provide candidates for targeted therapies.


Subject(s)
Cell Transformation, Neoplastic/genetics , Cerebellum/growth & development , Cerebellum/metabolism , Gene Expression Regulation, Developmental , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Signal Transduction , Animals , Cell Transformation, Neoplastic/metabolism , Cell Transformation, Neoplastic/pathology , Hedgehog Proteins/metabolism , Kruppel-Like Transcription Factors/genetics , Kruppel-Like Transcription Factors/metabolism , Mice , Protein Binding , Transcriptional Activation , Zinc Finger Protein GLI1
20.
Proc Natl Acad Sci U S A ; 107(31): 13570-5, 2010 Aug 03.
Article in English | MEDLINE | ID: mdl-20643955

ABSTRACT

Nearly 75% of in vitro fertilization (IVF) treatments do not result in live births and patients are largely guided by a generalized age-based prognostic stratification. We sought to provide personalized and validated prognosis by using available clinical and embryo data from prior, failed treatments to predict live birth probabilities in the subsequent treatment. We generated a boosted tree model, IVFBT, by training it with IVF outcomes data from 1,676 first cycles (C1s) from 2003-2006, followed by external validation with 634 cycles from 2007-2008. We tested whether this model could predict the probability of having a live birth in the subsequent treatment (C2). By using nondeterministic methods to identify prognostic factors and their relative nonredundant contribution, we generated a prediction model, IVFBT, that was superior to the age-based control by providing over 1,000-fold improvement in fit to new data (p<0.05), and increased discrimination by receiver operating characteristic analysis (area-under-the-curve, 0.80 vs. 0.68 for C1, 0.68 vs. 0.58 for C2). IVFBT provided predictions that were more accurate for approximately 83% of C1 and approximately 60% of C2 cycles that were out of the range predicted by age. Over half of those patients were reclassified to have higher live birth probabilities. We showed that data from a prior cycle could be used effectively to provide personalized and validated live birth probabilities in a subsequent cycle. Our approach may be replicated and further validated in other IVF clinics.


Subject(s)
Fertilization in Vitro/statistics & numerical data , Live Birth , Pregnancy Outcome , Adult , Age Distribution , Biometry , Cryopreservation , Female , Humans , Male , Phenotype , Pregnancy , Pregnancy Rate , Probability