ABSTRACT
Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.
ABSTRACT
The earliest events during human tumour initiation, although poorly characterized, may hold clues to malignancy detection and prevention [1]. Here we model occult preneoplasia by biallelic inactivation of TP53, a common early event in gastric cancer, in human gastric organoids. Causal relationships between this initiating genetic lesion and resulting phenotypes were established using experimental evolution in multiple clonally derived cultures over 2 years. TP53 loss elicited progressive aneuploidy, including copy number alterations and structural variants prevalent in gastric cancers, with evident preferred orders. Longitudinal single-cell sequencing of TP53-deficient gastric organoids similarly indicates progression towards malignant transcriptional programmes. Moreover, high-throughput lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programmes repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a marked degree of phenotypic convergence in premalignant epithelial organoids. These data imply predictability in the earliest stages of tumorigenesis and show evolutionary constraints and barriers to malignant transformation, with implications for earlier detection and interception of aggressive, genome-instable tumours.
Subject(s)
Cell Transformation, Neoplastic , Clonal Evolution , Precancerous Conditions , Selection, Genetic , Stomach Neoplasms , Humans , Cell Transformation, Neoplastic/genetics , Cell Transformation, Neoplastic/pathology , Clonal Evolution/genetics , Genomic Instability , Mutation , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Precancerous Conditions/genetics , Precancerous Conditions/pathology , Organoids/metabolism , Organoids/pathology , Aneuploidy , DNA Copy Number Variations , Single-Cell Analysis , Tumor Suppressor Protein p53/deficiency , Tumor Suppressor Protein p53/genetics , Disease Progression , Cell Lineage
ABSTRACT
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, offering new opportunities to understand the regulatory landscape driving developmental processes.
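As a rough illustration of the optimal transport idea mentioned above, the following sketch couples cells from two time points with entropy-regularized optimal transport (Sinkhorn iterations). It is not scTIE's implementation: the function name `sinkhorn`, the one-dimensional toy embeddings, the squared-distance cost, and the values of `eps` and `n_iter` are all assumptions chosen for the example.

```python
import math

def sinkhorn(cost, a, b, eps=0.5, n_iter=200):
    """Entropy-regularized optimal transport between two discrete distributions.

    cost[i][j] is the cost of moving mass from source point i to target point j;
    a and b are the source and target weights (each summing to 1).
    Returns a transport plan whose rows sum to a and columns sum to b.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: low-cost pairs get weights close to 1.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical 1-D embeddings of three cells at an early and a later time point.
early = [0.0, 0.5, 1.0]
late = [0.1, 0.6, 0.9]
cost = [[(x - y) ** 2 for y in late] for x in early]
weights = [1.0 / 3] * 3
plan = sinkhorn(cost, weights, weights)
```

Each row of `plan` can be read as a soft assignment of an early cell to later cells; in a real setting the embeddings would come from a learned latent space rather than hand-picked numbers.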
Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Gene Expression Regulation
ABSTRACT
Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not the CPP, approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.
Subject(s)
Cell-Penetrating Peptides/metabolism , Cellular Reprogramming , Immunity, Innate , Induced Pluripotent Stem Cells/metabolism , Signal Transduction , Cell Line , Fibroblasts/metabolism , Humans , Inflammation/metabolism , Kruppel-Like Factor 4 , NF-kappa B/metabolism , Octamer Transcription Factor-3/metabolism , Retroviridae/metabolism , Toll-Like Receptor 3/metabolism
ABSTRACT
We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.
Subject(s)
DiGeorge Syndrome , Humans , DiGeorge Syndrome/genetics , CRISPR-Cas Systems , Chromosome Breakpoints , Chromosomes, Human, Pair 22/genetics , Genome, Human , Gene Rearrangement , Sequence Analysis, DNA/methods , Chromosome Deletion
ABSTRACT
Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.
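The abstract above reports motifs as position weight matrices (PWMs). For readers unfamiliar with that representation, here is a minimal sketch of how a PWM can be built from a set of already aligned, equal-length sequences. It does not reproduce NeuronMotif's demixing or backward clustering; the pseudocount of 1, the uniform background of 0.25, and all function names are assumptions for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def count_matrix(seqs):
    """Count each base at each position of equal-length, aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(s[pos] for s in seqs) for pos in range(length)]

def position_weight_matrix(seqs, pseudocount=1.0, background=0.25):
    """Log2 odds of each base per position against a uniform background."""
    counts = count_matrix(seqs)
    total = len(seqs) + pseudocount * len(BASES)
    return [
        {b: math.log2(((col[b] + pseudocount) / total) / background) for b in BASES}
        for col in counts
    ]

def consensus(pwm):
    """Highest-scoring base at each position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pwm)

def score(pwm, seq):
    """Sum of per-position log-odds for a candidate sequence."""
    return sum(col[base] for col, base in zip(pwm, seq))

# Toy sample of sequences assumed to activate the same pattern.
activating = ["ACGT", "ACGA", "ACGT", "TCGT"]
pwm = position_weight_matrix(activating)
```

In this toy case `consensus(pwm)` recovers the dominant pattern, and `score` ranks sequences that match it above sequences that do not. A mixture of several patterns in the input would blur the matrix, which is the difficulty the abstract attributes to multifaceted neurons.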
Subject(s)
Algorithms , Neural Networks, Computer , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Databases, Factual
ABSTRACT
Dysfunction in T cells limits the efficacy of cancer immunotherapy. We profiled the epigenome, transcriptome, and enhancer connectome of exhaustion-prone GD2-targeting HA-28z chimeric antigen receptor (CAR) T cells and control CD19-targeting CAR T cells, which present less exhaustion-inducing tonic signaling, at multiple points during their ex vivo expansion. We found widespread, dynamic changes in chromatin accessibility and three-dimensional (3D) chromosome conformation preceding changes in gene expression, notably at loci proximal to exhaustion-associated genes such as PDCD1, CTLA4, and HAVCR2, and increased DNA motif access for AP-1 family transcription factors, which are known to promote exhaustion. Although T cell exhaustion has been studied in detail in mice, we find that the regulatory networks of T cell exhaustion differ between species and involve distinct loci of accessible chromatin and cis-regulated target genes in human CAR T cell exhaustion. Deletion of exhaustion-specific candidate enhancers of PDCD1 suppresses the expression of PD-1 in an in vitro model of T cell dysfunction and in HA-28z CAR T cells, suggesting enhancer editing as a path forward in improving cancer immunotherapy.
Subject(s)
Chromatin/metabolism , Neoplasms/therapy , Programmed Cell Death 1 Receptor/metabolism , Receptors, Chimeric Antigen , T-Lymphocytes/physiology , Animals , Antigens, CD19 , Cell Line , Chromatin/genetics , Gene Expression Regulation, Neoplastic , Humans , Mice , Programmed Cell Death 1 Receptor/genetics
ABSTRACT
MOTIVATION: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known if gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. RESULTS: A gene is considered to be identifiable if it is possible to get both the structure and concentration of its transcripts univocally. Here, we present a method to state the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability. AVAILABILITY AND IMPLEMENTATION: Code is available in GitHub (https://github.com/JFerrer-B/transcriptome-identifiability). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
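One simplified way to picture identifiability of isoform concentrations is as a linear algebra question: if each observable read class (for example an exon or a junction) is compatible with a known subset of isoforms, concentrations can be recovered uniquely only when the compatibility matrix has full column rank. The sketch below checks exactly that and nothing more. It is not the authors' method, which also addresses transcript structure and depends on read and fragment length; the matrix layout and the names `matrix_rank` and `is_identifiable` are assumptions.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_identifiable(compat):
    """compat[f][i] is 1 if read class f can originate from isoform i, else 0."""
    n_isoforms = len(compat[0])
    return matrix_rank(compat) == n_isoforms

# Three hypothetical isoforms: {E1}, {E2}, {E1, E2}.
# Observing only exon-level coverage gives two equations for three unknowns.
exons_only = [
    [1, 0, 1],  # reads within exon E1
    [0, 1, 1],  # reads within exon E2
]
# Reads spanning the E1-E2 junction add a third, independent equation.
with_junction = exons_only + [[0, 0, 1]]
```

Here `is_identifiable(exons_only)` is false and `is_identifiable(with_junction)` is true, which mirrors the intuition that longer fragments or additional libraries can expose read classes that separate otherwise confounded isoforms.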
Subject(s)
Genome , Transcriptome , Humans , RNA-Seq , Gene Library , Protein Isoforms/genetics , Software
ABSTRACT
K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and is also the most commonly used cell line for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
Subject(s)
Genome, Human , Humans , K562 Cells , Karyotype , Polymorphism, Genetic , Whole Genome Sequencing
ABSTRACT
HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.
Subject(s)
Chromosome Mapping/methods , Genome, Human , Genomics/methods , Haplotypes , Sequence Analysis, DNA/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Polymorphism, Single Nucleotide , Retroelements
ABSTRACT
In the vertebrate neural tube, regional Sonic hedgehog (Shh) signaling invokes a time- and concentration-dependent induction of six different cell populations mediated through Gli transcriptional regulators. Elsewhere in the embryo, Shh/Gli responses invoke different tissue-appropriate regulatory programs. A genome-scale analysis of DNA binding by Gli1 and Sox2, a pan-neural determinant, identified a set of shared regulatory regions associated with key factors central to cell fate determination and neural tube patterning. Functional analysis in transgenic mice validates core enhancers for each of these factors and demonstrates the dual requirement for Gli1 and Sox2 inputs for neural enhancer activity. Furthermore, through an unbiased determination of Gli-binding site preferences and analysis of binding site variants in the developing mammalian CNS, we demonstrate that differential Gli-binding affinity underlies threshold-level activator responses to Shh input. In summary, our results highlight Sox2 input as a context-specific determinant of the neural-specific Shh response and differential Gli-binding site affinity as an important cis-regulatory property critical for interpreting Shh morphogen action in the mammalian neural tube.
Subject(s)
Body Patterning/physiology , Hedgehog Proteins/metabolism , Kruppel-Like Transcription Factors/metabolism , SOXB1 Transcription Factors/metabolism , Animals , Body Patterning/genetics , Mice , Mice, Transgenic , Neural Tube/embryology , Neural Tube/metabolism , Protein Binding , Zinc Finger Protein GLI1
ABSTRACT
PURPOSE: Despite the progress that next-generation sequencing technologies have achieved in diagnosing the genetic cause of rare Mendelian diseases, the current diagnostic rate is still far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient utilization of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis. METHODS: We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We utilize medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: The ERIC score ranked disease genes consistently higher than other phenotypic similarity scores did in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10-40% across various genetic diagnosis scenarios. CONCLUSION: The Xrare model is learned from a large database of clinical variants, and derives its strength from the tight integration of medical genetics features and phenotypic similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.
Subject(s)
Genomics/methods , Machine Learning , Rare Diseases/diagnosis , Software , Computational Biology , Exome/genetics , Genetic Testing , Genetic Variation/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Mutation , Phenotype , Rare Diseases/genetics
ABSTRACT
Effective clinical management of prostate cancer (PCA) has been challenged by significant intratumoural heterogeneity on the genomic and pathological levels and limited understanding of the genetic elements governing disease progression. Here, we exploited the experimental merits of the mouse to test the hypothesis that pathways constraining progression might be activated in indolent Pten-null mouse prostate tumours and that inactivation of such progression barriers in mice would engender a metastasis-prone condition. Comparative transcriptomic and canonical pathway analyses, followed by biochemical confirmation, of normal prostate epithelium versus poorly progressive Pten-null prostate cancers revealed robust activation of the TGFβ/BMP-SMAD4 signalling axis. The functional relevance of SMAD4 was further supported by emergence of invasive, metastatic and lethal prostate cancers with 100% penetrance upon genetic deletion of Smad4 in the Pten-null mouse prostate. Pathological and molecular analysis as well as transcriptomic knowledge-based pathway profiling of emerging tumours identified cell proliferation and invasion as two cardinal tumour biological features in the metastatic Smad4/Pten-null PCA model. Follow-on pathological and functional assessment confirmed cyclin D1 and SPP1 as key mediators of these biological processes, which together with PTEN and SMAD4, form a four-gene signature that is prognostic of prostate-specific antigen (PSA) biochemical recurrence and lethal metastasis in human PCA. This model-informed progression analysis, together with genetic, functional and translational studies, establishes SMAD4 as a key regulator of PCA progression in mice and humans.
Subject(s)
Disease Progression , Neoplasm Metastasis/pathology , Prostatic Neoplasms/pathology , Smad4 Protein/metabolism , Animals , Bone Morphogenetic Proteins/metabolism , Cell Proliferation , Cyclin D1/genetics , Cyclin D1/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor/physiology , Humans , Lung Neoplasms/secondary , Lymphatic Metastasis , Male , Mice , Mice, Transgenic , Models, Biological , Neoplasm Invasiveness/genetics , Neoplasm Invasiveness/pathology , Neoplasm Metastasis/genetics , Osteopontin/genetics , Osteopontin/metabolism , PTEN Phosphohydrolase/deficiency , PTEN Phosphohydrolase/genetics , Penetrance , Prognosis , Prostate/metabolism , Prostate-Specific Antigen/metabolism , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Smad4 Protein/deficiency , Smad4 Protein/genetics , Transforming Growth Factor beta
ABSTRACT
Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
Subject(s)
Sequence Analysis, DNA/methods , DNA, Bacterial/chemistry , DNA, Mitochondrial/chemistry , Escherichia coli/chemistry , Guanosine/analogs & derivatives , Guanosine/chemistry , Humans , Kinetics , Oxidation-Reduction
ABSTRACT
SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment
ABSTRACT
Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
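The merging step described above can be pictured with a small sketch that clusters calls from several tools when they share a chromosome and SV type and their intervals overlap strongly. This is an illustration only and is not taken from the MetaSV code base: the `SVCall` record, the greedy single-pass strategy, and the 50% reciprocal overlap threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str
    tool: str

def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two fractions of each interval covered by the other."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))

def merge_calls(calls, min_ro=0.5):
    """Greedily merge sorted calls of the same chromosome and SV type."""
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["chrom"] == call.chrom
            and last["svtype"] == call.svtype
            and reciprocal_overlap(last["start"], last["end"], call.start, call.end) >= min_ro
        ):
            # Extend the cluster to the union interval and record the tool.
            last["start"] = min(last["start"], call.start)
            last["end"] = max(last["end"], call.end)
            last["tools"].add(call.tool)
        else:
            merged.append({
                "chrom": call.chrom,
                "start": call.start,
                "end": call.end,
                "svtype": call.svtype,
                "tools": {call.tool},
            })
    return merged

# Hypothetical calls from three tools named A, B and C.
calls = [
    SVCall("chr1", 100, 200, "DEL", "A"),
    SVCall("chr1", 110, 210, "DEL", "B"),
    SVCall("chr1", 1000, 1100, "DEL", "A"),
    SVCall("chr1", 105, 195, "DUP", "C"),
]
merged = merge_calls(calls)
```

The number of tools supporting each merged record is one simple confidence signal. Zero-length events such as insertions have no interval to overlap, so they would need a separate rule (for example a breakpoint distance window) rather than reciprocal overlap.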
Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Mutagenesis, Insertional , Sequence Deletion
ABSTRACT
Analyzing the failure times of multiple events is of interest in many fields. Estimating the joint distribution of the failure times in a non-parametric way is not straightforward because some failure times are often right-censored and only known to be greater than observed follow-up times. Although it has been studied, there is no universally optimal solution for this problem. It is still challenging and important to provide alternatives that may be more suitable than existing ones in specific settings. Problems with the existing methods are not limited to infeasible computations but also include the lack of optimality and possible non-monotonicity of the estimated survival function. In this paper, we propose a non-parametric Bayesian approach for directly estimating the density function of multivariate survival times, where the prior is constructed based on the optional Pólya tree. We investigated several theoretical aspects of the procedure and derived an efficient iterative algorithm for implementing the Bayesian procedure. The empirical performance of the method was examined via extensive simulation studies. Finally, we present a detailed analysis using the proposed method on the relationship among organ recovery times in severely injured patients. The analysis suggests medical information of interest that can be further pursued in clinics.
Subject(s)
Algorithms , Bayes Theorem , Data Interpretation, Statistical , Multivariate Analysis , Survival Analysis , Cardiovascular System/pathology , Central Nervous System/pathology , Computer Simulation , Humans , Wounds and Injuries/pathology
ABSTRACT
Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.
Subject(s)
Blastocyst/physiology , DNA-Binding Proteins/genetics , Embryonic Stem Cells/physiology , Gene Regulatory Networks , Homeodomain Proteins/genetics , Octamer Transcription Factor-3/genetics , Transcription Factors/genetics , Animals , Blastocyst/metabolism , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA-Binding Proteins/metabolism , Embryo Culture Techniques , Embryo, Mammalian/metabolism , Embryonic Development , Female , Gene Expression Profiling , Gene Expression Regulation, Developmental , Gene Knockdown Techniques , Homeodomain Proteins/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Inbred DBA , MicroRNAs/genetics , Nanog Homeobox Protein , Octamer Transcription Factor-3/metabolism , Oligonucleotide Array Sequence Analysis , Transcription Factors/metabolism , DNA Methyltransferase 3B
ABSTRACT
Glioblastoma (GBM) is a highly lethal brain tumour presenting as one of two subtypes with distinct clinical histories and molecular profiles. The primary GBM subtype presents acutely as a high-grade disease that typically harbours mutations in EGFR, PTEN and INK4A/ARF (also known as CDKN2A), and the secondary GBM subtype evolves from the slow progression of a low-grade disease that classically possesses PDGF and TP53 events. Here we show that concomitant central nervous system (CNS)-specific deletion of p53 and Pten in the mouse CNS generates a penetrant acute-onset high-grade malignant glioma phenotype with notable clinical, pathological and molecular resemblance to primary GBM in humans. This genetic observation prompted TP53 and PTEN mutational analysis in human primary GBM, demonstrating unexpectedly frequent inactivating mutations of TP53 as well as the expected PTEN mutations. Integrated transcriptomic profiling, in silico promoter analysis and functional studies of murine neural stem cells (NSCs) established that dual, but not singular, inactivation of p53 and Pten promotes an undifferentiated state with high renewal potential and drives increased Myc protein levels and its associated signature. Functional studies validated increased Myc activity as a potent contributor to the impaired differentiation and enhanced renewal of NSCs doubly null for p53 and Pten (p53(-/-) Pten(-/-)) as well as tumour neurospheres (TNSs) derived from this model. Myc also serves to maintain robust tumorigenic potential of p53(-/-) Pten(-/-) TNSs. These murine modelling studies, together with confirmatory transcriptomic/promoter studies in human primary GBM, validate a pathogenetic role of a common tumour suppressor mutation profile in human primary GBM and establish Myc as an important target for cooperative actions of p53 and Pten in the regulation of normal and malignant stem/progenitor cell differentiation, self-renewal and tumorigenic potential.
Subject(s)
Brain Neoplasms/pathology , Cell Differentiation , Glioma/pathology , Neoplastic Stem Cells/pathology , Neurons/pathology , PTEN Phosphohydrolase/metabolism , Tumor Suppressor Protein p53/metabolism , Animals , Brain Neoplasms/genetics , Cell Proliferation , Gene Expression Regulation , Glioblastoma/genetics , Glioblastoma/pathology , Glioma/genetics , Humans , Immunohistochemistry , Mice , Neoplastic Stem Cells/metabolism , Neurons/metabolism , PTEN Phosphohydrolase/genetics , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , Tumor Suppressor Protein p53/genetics
ABSTRACT
MOTIVATION: Next-generation sequence analysis has become an important task both in laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. RESULTS: We introduce SeqAlto as a new algorithm for read alignment. For reads longer than or equal to 100 bp, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. AVAILABILITY: Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto CONTACT: whwong@stanford.edu.