ABSTRACT
Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.
ABSTRACT
The earliest events during human tumour initiation, although poorly characterized, may hold clues to malignancy detection and prevention1. Here we model occult preneoplasia by biallelic inactivation of TP53, a common early event in gastric cancer, in human gastric organoids. Causal relationships between this initiating genetic lesion and resulting phenotypes were established using experimental evolution in multiple clonally derived cultures over 2 years. TP53 loss elicited progressive aneuploidy, including copy number alterations and structural variants prevalent in gastric cancers, with evident preferred orders. Longitudinal single-cell sequencing of TP53-deficient gastric organoids similarly indicates progression towards malignant transcriptional programmes. Moreover, high-throughput lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programmes repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a marked degree of phenotypic convergence in premalignant epithelial organoids. These data imply predictability in the earliest stages of tumorigenesis and show evolutionary constraints and barriers to malignant transformation, with implications for earlier detection and interception of aggressive, genome-instable tumours.
Subjects
Neoplastic Cell Transformation , Clonal Evolution , Precancerous Conditions , Genetic Selection , Stomach Neoplasms , Humans , Neoplastic Cell Transformation/genetics , Neoplastic Cell Transformation/pathology , Clonal Evolution/genetics , Genomic Instability , Mutation , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Precancerous Conditions/genetics , Precancerous Conditions/pathology , Organoids/metabolism , Organoids/pathology , Aneuploidy , DNA Copy Number Variations , Single-Cell Analysis , Tumor Suppressor Protein p53/deficiency , Tumor Suppressor Protein p53/genetics , Disease Progression , Cell Lineage
ABSTRACT
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
Subjects
Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Gene Expression Regulation
ABSTRACT
Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not the CPP, approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.
Subjects
Cell-Penetrating Peptides/metabolism , Cellular Reprogramming , Innate Immunity , Induced Pluripotent Stem Cells/metabolism , Signal Transduction , Cell Line , Fibroblasts/metabolism , Humans , Inflammation/metabolism , Kruppel-Like Factor 4 , NF-kappa B/metabolism , Octamer Transcription Factor-3/metabolism , Retroviridae/metabolism , Toll-Like Receptor 3/metabolism
ABSTRACT
We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.
Subjects
DiGeorge Syndrome , Humans , DiGeorge Syndrome/genetics , CRISPR-Cas Systems , Chromosome Breakpoints , Human Chromosome Pair 22/genetics , Human Genome , Gene Rearrangement , DNA Sequence Analysis/methods , Chromosome Deletion
ABSTRACT
Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.
Subjects
Algorithms , Computer Neural Networks , Nucleotide Motifs/genetics , Nucleic Acid Regulatory Sequences/genetics , Factual Databases
ABSTRACT
Dysfunction in T cells limits the efficacy of cancer immunotherapy. We profiled the epigenome, transcriptome, and enhancer connectome of exhaustion-prone GD2-targeting HA-28z chimeric antigen receptor (CAR) T cells and control CD19-targeting CAR T cells, which present less exhaustion-inducing tonic signaling, at multiple points during their ex vivo expansion. We found widespread, dynamic changes in chromatin accessibility and three-dimensional (3D) chromosome conformation preceding changes in gene expression, notably at loci proximal to exhaustion-associated genes such as PDCD1, CTLA4, and HAVCR2, and increased DNA motif access for AP-1 family transcription factors, which are known to promote exhaustion. Although T cell exhaustion has been studied in detail in mice, we find that the regulatory networks of T cell exhaustion differ between species and involve distinct loci of accessible chromatin and cis-regulated target genes in human CAR T cell exhaustion. Deletion of exhaustion-specific candidate enhancers of PDCD1 suppresses the expression of PD-1 in an in vitro model of T cell dysfunction and in HA-28z CAR T cells, suggesting enhancer editing as a path forward in improving cancer immunotherapy.
Subjects
Chromatin/metabolism , Neoplasms/therapy , Programmed Cell Death 1 Receptor/metabolism , Chimeric Antigen Receptors , T-Lymphocytes/physiology , Animals , CD19 Antigens , Cell Line , Chromatin/genetics , Neoplastic Gene Expression Regulation , Humans , Mice , Programmed Cell Death 1 Receptor/genetics
ABSTRACT
MOTIVATION: Isoform deconvolution is an NP-hard problem. The accuracy of the proposed solutions is far from perfect. At present, it is not known if gene structure and isoform concentration can be uniquely inferred given paired-end reads, and there is no objective method to select the fragment length to improve the number of identifiable genes. Different pieces of evidence suggest that the optimal fragment length is gene-dependent, stressing the need for a method that selects the fragment length according to a reasonable trade-off across all the genes in the whole genome. RESULTS: A gene is considered to be identifiable if it is possible to get both the structure and concentration of its transcripts univocally. Here, we present a method to state the identifiability of this deconvolution problem. Assuming a given transcriptome and that the coverage is sufficient to interrogate all junction reads of the transcripts, this method states whether or not a gene is identifiable given the read length and fragment length distribution. Applying this method using different read and fragment length combinations, the optimal average fragment length for the human transcriptome is around 400-600 nt for coding genes and 150-200 nt for long non-coding RNAs. The optimal read length is the largest one that fits in the fragment length. The potential benefit of combining several libraries to reconstruct the transcriptome is also discussed. Combining two libraries of very different fragment lengths results in a significant improvement in gene identifiability. AVAILABILITY AND IMPLEMENTATION: Code is available on GitHub (https://github.com/JFerrer-B/transcriptome-identifiability). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
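Illustrative note: the idea of identifiability in the abstract above can be sketched with a deliberately simplified linear model. The sketch assumes isoform structures are already known and asks only whether their concentrations are uniquely determined, which holds when the matrix mapping isoforms to observable features (exons, junctions) has full column rank. This is an assumption made for illustration and not the authors' method, which also covers gene structure and fragment length distributions. The function names `column_rank` and `concentrations_identifiable` are hypothetical.

```python
from fractions import Fraction


def column_rank(matrix):
    """Rank of a matrix via exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def concentrations_identifiable(feature_by_isoform):
    """True if each isoform's concentration is uniquely determined (simplified model).

    Rows are observable features, columns are isoforms; an entry is 1 when
    the isoform contributes reads to that feature.
    """
    n_isoforms = len(feature_by_isoform[0])
    return column_rank(feature_by_isoform) == n_isoforms


# Hypothetical gene with isoforms A = {exon1}, B = {exon2}, C = {exon1, exon2}.
exon_coverage_only = [
    [1, 0, 1],  # exon1 coverage
    [0, 1, 1],  # exon2 coverage
]
with_junction_reads = exon_coverage_only + [
    [0, 0, 1],  # exon1-exon2 junction, seen only in isoform C
]

print(concentrations_identifiable(exon_coverage_only))
print(concentrations_identifiable(with_junction_reads))
```

In this toy gene, exon coverage alone leaves the three concentrations ambiguous, while adding a junction observation resolves them. That is consistent in spirit with the abstract's point that read and fragment lengths affect identifiability.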
Subjects
Genome , Transcriptome , Humans , RNA-Seq , Gene Library , Protein Isoforms/genetics , Software
ABSTRACT
K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and is also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, and structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
Subjects
Human Genome , Humans , K562 Cells , Karyotype , Genetic Polymorphism , Whole Genome Sequencing
ABSTRACT
HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.
Subjects
Chromosome Mapping/methods , Human Genome , Genomics/methods , Haplotypes , DNA Sequence Analysis/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Single Nucleotide Polymorphism , Retroelements
ABSTRACT
In the vertebrate neural tube, regional Sonic hedgehog (Shh) signaling invokes a time- and concentration-dependent induction of six different cell populations mediated through Gli transcriptional regulators. Elsewhere in the embryo, Shh/Gli responses invoke different tissue-appropriate regulatory programs. A genome-scale analysis of DNA binding by Gli1 and Sox2, a pan-neural determinant, identified a set of shared regulatory regions associated with key factors central to cell fate determination and neural tube patterning. Functional analysis in transgenic mice validates core enhancers for each of these factors and demonstrates the dual requirement for Gli1 and Sox2 inputs for neural enhancer activity. Furthermore, through an unbiased determination of Gli-binding site preferences and analysis of binding site variants in the developing mammalian CNS, we demonstrate that differential Gli-binding affinity underlies threshold-level activator responses to Shh input. In summary, our results highlight Sox2 input as a context-specific determinant of the neural-specific Shh response and differential Gli-binding site affinity as an important cis-regulatory property critical for interpreting Shh morphogen action in the mammalian neural tube.
Subjects
Body Patterning/physiology , Hedgehog Proteins/metabolism , Kruppel-Like Transcription Factors/metabolism , SOXB1 Transcription Factors/metabolism , Animals , Body Patterning/genetics , Mice , Transgenic Mice , Neural Tube/embryology , Neural Tube/metabolism , Protein Binding , Zinc Finger Protein GLI1
ABSTRACT
PURPOSE: Despite the progress that next-generation sequencing technologies have achieved in diagnosing the genetic cause of rare Mendelian diseases, the current diagnostic rate is still far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient utilization of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis. METHODS: We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We utilize medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: The ERIC score ranked disease genes consistently higher than other phenotypic similarity scores did in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10-40% in various genetic diagnosis scenarios. CONCLUSION: The Xrare model is learned from a large database of clinical variants, and derives its strength from the tight integration of medical genetics features and phenotypic similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.
Subjects
Genomics/methods , Machine Learning , Rare Diseases/diagnosis , Software , Computational Biology , Exome/genetics , Genetic Testing , Genetic Variation/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Mutation , Phenotype , Rare Diseases/genetics
ABSTRACT
Effective clinical management of prostate cancer (PCA) has been challenged by significant intratumoural heterogeneity on the genomic and pathological levels and limited understanding of the genetic elements governing disease progression. Here, we exploited the experimental merits of the mouse to test the hypothesis that pathways constraining progression might be activated in indolent Pten-null mouse prostate tumours and that inactivation of such progression barriers in mice would engender a metastasis-prone condition. Comparative transcriptomic and canonical pathway analyses, followed by biochemical confirmation, of normal prostate epithelium versus poorly progressive Pten-null prostate cancers revealed robust activation of the TGFβ/BMP-SMAD4 signalling axis. The functional relevance of SMAD4 was further supported by emergence of invasive, metastatic and lethal prostate cancers with 100% penetrance upon genetic deletion of Smad4 in the Pten-null mouse prostate. Pathological and molecular analysis as well as transcriptomic knowledge-based pathway profiling of emerging tumours identified cell proliferation and invasion as two cardinal tumour biological features in the metastatic Smad4/Pten-null PCA model. Follow-on pathological and functional assessment confirmed cyclin D1 and SPP1 as key mediators of these biological processes, which together with PTEN and SMAD4, form a four-gene signature that is prognostic of prostate-specific antigen (PSA) biochemical recurrence and lethal metastasis in human PCA. This model-informed progression analysis, together with genetic, functional and translational studies, establishes SMAD4 as a key regulator of PCA progression in mice and humans.
Subjects
Disease Progression , Neoplasm Metastasis/pathology , Prostatic Neoplasms/pathology , Smad4 Protein/metabolism , Animals , Bone Morphogenetic Proteins/metabolism , Cell Proliferation , Cyclin D1/genetics , Cyclin D1/metabolism , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Tumor Suppressor Genes/physiology , Humans , Lung Neoplasms/secondary , Lymphatic Metastasis , Male , Mice , Transgenic Mice , Biological Models , Neoplasm Invasiveness/genetics , Neoplasm Invasiveness/pathology , Neoplasm Metastasis/genetics , Osteopontin/genetics , Osteopontin/metabolism , PTEN Phosphohydrolase/deficiency , PTEN Phosphohydrolase/genetics , Penetrance , Prognosis , Prostate/metabolism , Prostate-Specific Antigen/metabolism , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Smad4 Protein/deficiency , Smad4 Protein/genetics , Transforming Growth Factor beta
ABSTRACT
Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
Subjects
DNA Sequence Analysis/methods , Bacterial DNA/chemistry , Mitochondrial DNA/chemistry , Escherichia coli/chemistry , Guanosine/analogs & derivatives , Guanosine/chemistry , Humans , Kinetics , Oxidation-Reduction
ABSTRACT
SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
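Illustrative note: the abstract above mentions comparing variants binned in size ranges. A minimal sketch of that general idea follows, assuming variants are matched only when chromosome, position, type and length agree exactly, and using made-up bin edges. Neither the matching rule nor the bins are taken from VarSim; `BIN_EDGES`, `size_bin` and `binned_accuracy` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical lower edges of the size bins, in base pairs.
BIN_EDGES = [1, 50, 1000, 100000]


def size_bin(length, edges=BIN_EDGES):
    """Return the (low, high) bin containing `length`; high is None for the last bin."""
    i = max(bisect_right(edges, length) - 1, 0)
    high = edges[i + 1] if i + 1 < len(edges) else None
    return (edges[i], high)


def binned_accuracy(truth, calls):
    """Count true positives, false positives and false negatives per size bin.

    `truth` and `calls` are sets of (chrom, pos, svtype, length) tuples.
    A call counts as correct only if it matches a truth variant exactly,
    which is a simplifying assumption for this sketch.
    """
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for variant in calls:
        outcome = "tp" if variant in truth else "fp"
        stats[size_bin(variant[3])][outcome] += 1
    for variant in truth - calls:
        stats[size_bin(variant[3])]["fn"] += 1
    return dict(stats)


truth = {("chr1", 100, "DEL", 10), ("chr1", 500, "DEL", 200), ("chr2", 50, "INS", 5000)}
calls = {("chr1", 100, "DEL", 10), ("chr2", 50, "INS", 5000), ("chr3", 1, "DEL", 20)}

for bin_range, counts in sorted(binned_accuracy(truth, calls).items()):
    print(bin_range, counts)
```

Reporting counts per bin rather than one global figure makes it visible when a caller performs well on small indels but misses larger events, which a pooled accuracy number would hide.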
Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment
ABSTRACT
Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
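Illustrative note: the abstract above describes merging SV calls from multiple tools. One common way to decide that two calls describe the same event is reciprocal overlap, sketched below. The abstract does not state which merging criterion MetaSV uses, so the 50% reciprocal-overlap threshold, the greedy grouping, and the names `SVCall`, `reciprocal_overlap` and `merge_calls` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str
    tool: str


def reciprocal_overlap(a, b):
    """Fraction of overlap relative to the longer of the two intervals."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_ro=0.5):
    """Greedily group calls of the same type that overlap a group's first call.

    Returns a list of groups, each with a representative call and the set
    of tools that support it.
    """
    groups = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for group in groups:
            rep = group["rep"]
            same_kind = rep.chrom == call.chrom and rep.svtype == call.svtype
            if same_kind and reciprocal_overlap(rep, call) >= min_ro:
                group["tools"].add(call.tool)
                break
        else:
            # No existing group matched, so this call starts a new one.
            groups.append({"rep": call, "tools": {call.tool}})
    return groups


calls = [
    SVCall("chr1", 100, 200, "DEL", "toolA"),
    SVCall("chr1", 110, 210, "DEL", "toolB"),
    SVCall("chr1", 1000, 1100, "DEL", "toolA"),
    SVCall("chr1", 100, 200, "DUP", "toolC"),
]

for group in merge_calls(calls):
    print(group["rep"], sorted(group["tools"]))
```

Zero-length intervals, such as insertions reported with equal start and end, never satisfy the overlap test in this sketch, so they would need a separate distance-based rule.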
Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Insertional Mutagenesis , Sequence Deletion
ABSTRACT
Analyzing the failure times of multiple events is of interest in many fields. Estimating the joint distribution of the failure times in a non-parametric way is not straightforward because some failure times are often right-censored and only known to be greater than observed follow-up times. Although it has been studied, there is no universally optimal solution for this problem. It is still challenging and important to provide alternatives that may be more suitable than existing ones in specific settings. Related problems of the existing methods are not only limited to infeasible computations, but also include the lack of optimality and possible non-monotonicity of the estimated survival function. In this paper, we proposed a non-parametric Bayesian approach for directly estimating the density function of multivariate survival times, where the prior is constructed based on the optional Pólya tree. We investigated several theoretical aspects of the procedure and derived an efficient iterative algorithm for implementing the Bayesian procedure. The empirical performance of the method was examined via extensive simulation studies. Finally, we presented a detailed analysis using the proposed method on the relationship among organ recovery times in severely injured patients. From the analysis, we suggested interesting medical information that can be further pursued in clinics.
Subjects
Algorithms , Bayes Theorem , Statistical Data Interpretation , Multivariate Analysis , Survival Analysis , Cardiovascular System/pathology , Central Nervous System/pathology , Computer Simulation , Humans , Wounds and Injuries/pathology
ABSTRACT
Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.
Subjects
Blastocyst/physiology , DNA-Binding Proteins/genetics , Embryonic Stem Cells/physiology , Gene Regulatory Networks , Homeodomain Proteins/genetics , Octamer Transcription Factor-3/genetics , Transcription Factors/genetics , Animals , Blastocyst/metabolism , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA-Binding Proteins/metabolism , Embryo Culture Techniques , Mammalian Embryo/metabolism , Embryonic Development , Female , Gene Expression Profiling , Developmental Gene Expression Regulation , Gene Knockdown Techniques , Homeodomain Proteins/metabolism , Male , Mice , Inbred C57BL Mice , Inbred DBA Mice , MicroRNAs/genetics , Nanog Homeobox Protein , Octamer Transcription Factor-3/metabolism , Oligonucleotide Array Sequence Analysis , Transcription Factors/metabolism , DNA Methyltransferase 3B
ABSTRACT
Glioblastoma (GBM) is a highly lethal brain tumour presenting as one of two subtypes with distinct clinical histories and molecular profiles. The primary GBM subtype presents acutely as a high-grade disease that typically harbours mutations in EGFR, PTEN and INK4A/ARF (also known as CDKN2A), and the secondary GBM subtype evolves from the slow progression of a low-grade disease that classically possesses PDGF and TP53 events. Here we show that concomitant central nervous system (CNS)-specific deletion of p53 and Pten in the mouse CNS generates a penetrant acute-onset high-grade malignant glioma phenotype with notable clinical, pathological and molecular resemblance to primary GBM in humans. This genetic observation prompted TP53 and PTEN mutational analysis in human primary GBM, demonstrating unexpectedly frequent inactivating mutations of TP53 as well as the expected PTEN mutations. Integrated transcriptomic profiling, in silico promoter analysis and functional studies of murine neural stem cells (NSCs) established that dual, but not singular, inactivation of p53 and Pten promotes an undifferentiated state with high renewal potential and drives increased Myc protein levels and its associated signature. Functional studies validated increased Myc activity as a potent contributor to the impaired differentiation and enhanced renewal of NSCs doubly null for p53 and Pten (p53(-/-) Pten(-/-)) as well as tumour neurospheres (TNSs) derived from this model. Myc also serves to maintain robust tumorigenic potential of p53(-/-) Pten(-/-) TNSs. These murine modelling studies, together with confirmatory transcriptomic/promoter studies in human primary GBM, validate a pathogenetic role of a common tumour suppressor mutation profile in human primary GBM and establish Myc as an important target for cooperative actions of p53 and Pten in the regulation of normal and malignant stem/progenitor cell differentiation, self-renewal and tumorigenic potential.
Subjects
Brain Neoplasms/pathology , Cell Differentiation , Glioma/pathology , Neoplastic Stem Cells/pathology , Neurons/pathology , PTEN Phosphohydrolase/metabolism , Tumor Suppressor Protein p53/metabolism , Animals , Brain Neoplasms/genetics , Cell Proliferation , Gene Expression Regulation , Glioblastoma/genetics , Glioblastoma/pathology , Glioma/genetics , Humans , Immunohistochemistry , Mice , Neoplastic Stem Cells/metabolism , Neurons/metabolism , PTEN Phosphohydrolase/genetics , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , Tumor Suppressor Protein p53/genetics
Abstract
MOTIVATION: Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. RESULTS: We introduce SeqAlto as a new algorithm for read alignment. For reads longer than or equal to 100 bp, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. AVAILABILITY: Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto CONTACT: whwong@stanford.edu.