ABSTRACT
DNA methylation is a key regulator of embryonic stem cell (ESC) biology, dynamically changing between naïve, primed, and differentiated states. The p53 tumor suppressor is a pivotal guardian of genomic stability, but its contributions to epigenetic regulation and stem cell biology are less explored. We report that, in naïve mouse ESCs (mESCs), p53 restricts the expression of the de novo DNA methyltransferases Dnmt3a and Dnmt3b while up-regulating Tet1 and Tet2, which promote DNA demethylation. The DNA methylation imbalance in p53-deficient (p53-/-) mESCs is the result of augmented overall DNA methylation as well as increased methylation landscape heterogeneity. In differentiating p53-/- mESCs, elevated methylation persists, albeit more mildly. Importantly, concomitant with DNA methylation heterogeneity, p53-/- mESCs display increased cellular heterogeneity both in the "naïve" state and upon induced differentiation. This impact of p53 loss on 5-methylcytosine (5mC) heterogeneity was also evident in human ESCs and mouse embryos in vivo. Hence, p53 helps maintain DNA methylation homeostasis and clonal homogeneity, a function that may contribute to its tumor suppressor activity.
Subjects
DNA Methylation/genetics , Gene Expression Regulation/genetics , Genetic Heterogeneity , Homeostasis/genetics , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Animals , Cell Differentiation/genetics , Clone Cells , DNA (Cytosine-5-)-Methyltransferases/genetics , Embryonic Stem Cells , Gene Deletion , Humans , Mice , Proto-Oncogene Proteins/genetics
ABSTRACT
DNA methylation patterns are set up in a relatively fixed, programmed manner during normal embryonic development and are then stably maintained. Using genome-wide analysis, we discovered a postnatal pathway involving gender-specific demethylation that occurs exclusively in the male liver. This demethylation is programmed to take place at tissue-specific enhancer sequences, and our data show that the methylation state at these loci is associated with, and appears to play a role in, the transcriptional regulation of nearby genes. This process is mediated by the secretion of testosterone at the time of sexual maturity, but the resulting methylation profile is stable and can therefore serve as an epigenetic memory even in the absence of this inducer. These findings add a new dimension to our understanding of the role of DNA methylation in vivo and provide the foundations for deciphering how the environment can affect the epigenetic regulation of genes in general.
Subjects
DNA Methylation , Epigenesis, Genetic/genetics , Liver/metabolism , Androgens/pharmacology , Animals , Castration , DNA Methylation/drug effects , Enhancer Elements, Genetic/genetics , Epigenesis, Genetic/drug effects , Female , Gene Expression Regulation, Developmental , Genome-Wide Association Study , Histones/genetics , Histones/metabolism , Humans , Liver/drug effects , Male , Mice , Mice, Inbred C57BL , Sex Characteristics , Testosterone/metabolism , Testosterone/pharmacology
ABSTRACT
Short tandem repeats (STRs) are polymorphic genomic loci valuable for various applications such as research, diagnostics and forensics. However, their polymorphic nature also introduces noise during in vitro amplification, making them difficult to analyze. Although stutter noise can be avoided by using amplification-free library preparation, such protocols are presently incompatible with single-cell analysis and with targeted-enrichment protocols. To address this challenge, we designed a method for direct measurement of in vitro noise. Using a synthetic STR sequencing library, we calibrated a Markov model that predicts stutter patterns at any amplification cycle. Using this model, we accurately genotyped cases of severe amplification bias and biallelic STR signals, and validated the model for several high-fidelity PCR enzymes. Finally, we compared a naïve STR genotyping strategy based on this model against the state of the art on a benchmark of single cells, demonstrating superior accuracy.
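To make the idea of a per-cycle stutter model concrete, here is a minimal sketch of a Markov-style slippage process, assuming that each PCR copy independently loses one repeat unit with probability `p_down`, gains one with probability `p_up`, and is otherwise exact. The function name, the parameter values and the length-independent slip rates are hypothetical illustrations, not the calibrated model described above; the sketch tracks expected fractions rather than sampling individual molecules.

```python
import numpy as np

def stutter_profile(true_len, cycles, p_down=0.02, p_up=0.005,
                    efficiency=1.0, max_len=60):
    """Expected repeat-length distribution after `cycles` PCR cycles.

    Hypothetical per-cycle slippage model: every molecule is copied with
    probability `efficiency`; a copy loses one repeat unit with probability
    `p_down`, gains one with probability `p_up`, and is exact otherwise.
    Rates are illustrative, not calibrated values.
    """
    counts = np.zeros(max_len + 1)
    counts[true_len] = 1.0
    for _ in range(cycles):
        copies = efficiency * counts
        new = copies * (1.0 - p_down - p_up)   # faithful copies
        new[:-1] += copies[1:] * p_down        # slip down: n -> n - 1
        new[1:] += copies[:-1] * p_up          # slip up:   n -> n + 1
        counts = counts + new                  # templates persist into the next cycle
    return counts / counts.sum()
```

For example, `stutter_profile(12, 30)` returns the expected fraction of reads at each repeat length for a 12-unit allele after 30 cycles. A genotyper could then compare an observed read-length histogram against mixtures of such profiles (one or two alleles) and pick the best fit; that fitting step is likewise an assumption, not the published procedure.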
Subjects
Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing , Microsatellite Repeats/genetics , Alleles , Genotype , Humans
ABSTRACT
Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we present an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that takes individual cells as input, produces targeted single-cell sequencing data, and uses these data to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery.
Subjects
Cell Lineage , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Algorithms , Cell Line, Tumor , Cells, Cultured , Humans , Male , Microfluidics/methods , Middle Aged , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/standards , Single-Cell Analysis/economics , Single-Cell Analysis/standards
ABSTRACT
Background Computational models based on deep neural networks are increasingly used to analyze health care data. However, the efficacy of traditional computational models in radiology is a matter of debate. Purpose To evaluate the accuracy and efficiency of a combined machine and deep learning approach for early breast cancer detection applied to a linked set of digital mammography images and electronic health records. Materials and Methods In this retrospective study, 52 936 images were collected from 13 234 women who underwent at least one mammogram between 2013 and 2017 and who had health records for at least 1 year before undergoing mammography. The algorithm was trained on 9611 mammograms and health records of women to make two breast cancer predictions: to predict biopsy malignancy and to differentiate normal from abnormal screening examinations. The association of features with outcomes was estimated by using the t test and the Fisher exact test. Models were compared by using 95% confidence intervals (CIs) or the DeLong test. Results The resulting algorithm was validated in 1055 women and tested in 2548 women (mean age, 55 years ± 10 [standard deviation]). In the test set, the algorithm identified 34 of 71 (48%) false-negative findings on mammograms. For the malignancy prediction objective, the algorithm obtained an area under the receiver operating characteristic curve (AUC) of 0.91 (95% CI: 0.89, 0.93), with specificity of 77.3% (95% CI: 69.2%, 85.4%) at a sensitivity of 87%. When trained on clinical data alone, the model performed significantly better than the Gail model (AUC, 0.78 vs 0.54, respectively; P < .004). Conclusion The algorithm, which combined machine-learning and deep-learning approaches, can be applied to assess breast cancer at a level comparable to radiologists and has the potential to substantially reduce missed diagnoses of breast cancer. © RSNA, 2019. Online supplemental material is available for this article.
Subjects
Breast Neoplasms/diagnostic imaging , Deep Learning , Electronic Health Records , Mammography/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Breast/diagnostic imaging , Female , Humans , Middle Aged , Predictive Value of Tests , Reproducibility of Results , Retrospective Studies , Sensitivity and Specificity
ABSTRACT
Gestational age determination by traditional tools (last menstrual period, ultrasonography measurements and Ballard Maturational Assessment in newborns) has major limitations, and therefore new approaches are needed. In this study, we looked for a molecular marker that can be used to accurately determine the gestational age of the newborn. To this end, we performed reduced representation bisulfite sequencing (RRBS) on 41 cord blood and matching placenta samples from women between 25 and 40 weeks of gestation and generated an epigenetic clock based on the methylation level at different loci in the genome. We identified a set of 332 differentially methylated regions (DMRs) that undergo demethylation in late gestation in cord blood cells and can predict gestational age (r = -0.7, P = 2E-05). When this set was combined with a second set of 411 DMRs that undergo de novo methylation in late gestation, the resulting clock was more accurate (R = 0.77, P = 1.87E-05). We compared gestational age determined by Ballard score assessment with our epigenetic clock and found high concordance. Taken together, this study demonstrates that DNA methylation can accurately predict gestational age and thus may serve as a good clinical predictor.
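The clock construction can be illustrated with a minimal sketch, assuming a samples × regions matrix of per-region methylation fractions, a summary score equal to the mean methylation of the demethylated DMRs (or, for the combined clock, the mean of the de novo DMRs minus that of the demethylated DMRs), and a simple least-squares line from score to weeks of gestation. The score definitions, the linear model and all function names are hypothetical; the authors' actual fitting procedure is not described here.

```python
import numpy as np

def demethylation_score(meth, demeth_idx):
    """Mean methylation over DMRs that lose methylation late in gestation."""
    return meth[:, demeth_idx].mean(axis=1)

def combined_score(meth, demeth_idx, denovo_idx):
    """Rises with age: de novo DMRs gain methylation while demethylated DMRs lose it."""
    return meth[:, denovo_idx].mean(axis=1) - meth[:, demeth_idx].mean(axis=1)

def fit_clock(score, weeks):
    """Least-squares line weeks ~ score; returns (slope, intercept, Pearson r)."""
    slope, intercept = np.polyfit(score, weeks, 1)
    r = np.corrcoef(score, weeks)[0, 1]
    return slope, intercept, r

def predict_weeks(score, slope, intercept):
    """Apply a fitted clock to new scores."""
    return slope * np.asarray(score) + intercept
```

With the demethylation-only score, a negative correlation with gestational age is expected (methylation falls as gestation advances), matching the sign of the r reported above; the combined score is oriented so that it increases with age.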
Subjects
DNA Methylation , Gestational Age , Biomarkers , Female , Genome, Human , Humans , Pregnancy
ABSTRACT
There is ample evidence that somatic cell differentiation during development is accompanied by extensive DNA demethylation of specific sites that vary between cell types. Although the mechanism of this process has not yet been elucidated, it is likely to involve the conversion of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) by Tet enzymes. We show that a Tet2/Tet3 conditional knockout at early stages of B-cell development largely prevents lineage-specific programmed demethylation events. This lack of demethylation affects the expression of nearby B-cell lineage genes by impairing enhancer activity, thus causing defects in B-cell differentiation and function. Thus, tissue-specific DNA demethylation appears to be necessary for proper somatic cell development in vivo.
Subjects
B-Lymphocytes/cytology , B-Lymphocytes/physiology , DNA Methylation/genetics , DNA-Binding Proteins/genetics , Epigenesis, Genetic/genetics , Animals , Cell Differentiation/genetics , Cells, Cultured , Mice , Mice, Inbred C57BL , Organ Specificity/genetics
ABSTRACT
Advances in single-cell (SC) genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells, as determined by phylogenetic analysis of the somatic mutations harbored by each cell. Theoretically, complete and accurate knowledge of the genome of each cell of an individual can produce an extremely accurate cell lineage tree of that individual. In practice, however, SC genomics will fall short of such complete and accurate knowledge, in both quality and quantity, for the foreseeable future. In this paper we offer a framework for systematically exploring the feasibility of answering cell lineage questions based on SC somatic mutational analysis, as a function of SC genomics data quality and quantity. We take into consideration the current limitations of SC genomics in terms of mutation data quality, most notably amplification bias and allele dropouts (ADO), as well as cost, which puts practical limits on the quantity of mutation data obtained from each cell and on cell sample density. We do so by generating in silico cell lineage trees using a dedicated formal language, eSTG, and show how the ability to correctly answer a cell lineage question depends on the quality and quantity of the SC mutation data. The presented framework can serve as a baseline for assessing the potential of current SC genomics to unravel cell lineage dynamics, as well as the potential contributions of future biochemical and computational advances to this task.
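Allele dropout, one of the data-quality limitations named above, can be injected into simulated genotypes with a few lines. This is a minimal sketch assuming that each allele at each locus drops out independently with a fixed probability `p_ado`; the function name and the independence assumption are illustrative and are not taken from the framework itself.

```python
import random

def apply_allele_dropout(genotypes, p_ado, rng):
    """Simulate ADO on diploid genotypes.

    genotypes: list of (allele1, allele2) repeat lengths, one per locus.
    Each allele is lost independently with probability p_ado (an assumption).
    """
    observed = []
    for a1, a2 in genotypes:
        kept = [a for a in (a1, a2) if rng.random() >= p_ado]
        if not kept:
            observed.append(None)                # both alleles lost: locus missing
        elif len(kept) == 1:
            observed.append((kept[0], kept[0]))  # one allele lost: looks homozygous
        else:
            observed.append((kept[0], kept[1]))  # both alleles observed
    return observed
```

Running a lineage-reconstruction method on genotypes degraded this way, across a range of `p_ado` values, is one way to probe how answer accuracy falls as data quality drops.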
Subjects
Cell Lineage/genetics , Genomics/methods , Models, Genetic , Single-Cell Analysis/methods , Algorithms , Computer Simulation , Databases, Genetic , Humans , Mutation/genetics
ABSTRACT
BACKGROUND: We have previously presented a formal language for describing population dynamics based on environment-dependent Stochastic Tree Grammars (eSTG). The language captures in broad terms the effect of the changing environment while abstracting away details of interactions among individuals. An eSTG program consists of a set of context-free stochastic tree grammar transition rules. Transition rule probabilities and rates, however, can depend on global parameters such as population size, generation count and elapsed time. In addition, each individual may have an internal state, which can change during transitions. RESULTS: This paper presents eSTGt (eSTG tool), an eSTG programming and simulation environment. When executing a program, the tool generates the corresponding lineage trees as well as the values of the internal states, which can then be analyzed either through the tool's GUI or in MATLAB's command-line environment. CONCLUSIONS: The presented tool allows researchers to use existing biological knowledge to model the dynamics of a developmental process and analyze its behavior throughout its history. Simulated lineage trees can be used to validate various hypotheses in silico and to predict the behavior of dynamical systems under various conditions. Because the tool is written in MATLAB, its output data can easily be integrated into the user's downstream analysis.
Subjects
Computational Biology/methods , Computer Simulation , Models, Theoretical , Population Dynamics , Software , Biological Evolution , Humans
ABSTRACT
BACKGROUND: Precise description of the dynamics of biological processes would enable the mathematical analysis and computational simulation of complex biological phenomena. Languages such as Chemical Reaction Networks and Process Algebras cater for the detailed description of interactions among individuals and for the simulation and analysis of the resulting population behaviors. However, knowledge of such interactions is often lacking or unavailable. Yet complete oblivion to the environment would make the description of any biological process vacuous. Here we present a language for describing population dynamics, based on environment-dependent Stochastic Tree Grammars (eSTG), that abstracts away detailed interactions among individuals yet captures in broad terms the effect of the changing environment. It comprises a set of stochastic tree grammar transition rules, which are context-free and as such abstract away specific interactions among individuals. Transition rule probabilities and rates, however, can depend on global parameters such as population size, generation count, and elapsed time. RESULTS: We show that eSTGs conveniently describe population dynamics at multiple levels, including cellular dynamics, tissue development and niches of organisms. Notably, we show the use of eSTG in cases in which the dynamics are regulated by environmental factors that affect the fate and rate of decisions of the different species. eSTGs are lineage grammars, in the sense that executing an eSTG program generates the corresponding lineage trees, which can be used to analyze the evolutionary and developmental history of the biological system under investigation. These lineage trees represent the entire event history of the system, including the dynamics that led to both extant and extinct individuals.
CONCLUSIONS: We conclude that our suggested formalism can be used to easily specify, simulate and analyze complex biological systems, and supports modular description of local biological dynamics that can later be used as "black boxes" in a larger scope, thus enabling a gradual and hierarchical definition and simulation of complex biological systems. The simple yet robust formalism makes it possible to target a broad class of stochastic dynamic behaviors, especially those that can be modeled using global environmental feedback regulation rather than direct interaction between individuals.
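The core eSTG idea (context-free rules whose probabilities depend only on a global parameter, with every execution producing a lineage tree) can be sketched in a few lines of Python. This is a minimal, generation-synchronous illustration only; the actual eSTGt tool is written in MATLAB and also supports rates and elapsed time, which are omitted here. The stem/differentiated states, the carrying capacity `K` and the rule shapes are hypothetical examples of global feedback regulation.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Node:
    id: int
    parent: Optional[int]
    generation: int
    state: str

# A rule maps the current global population size to (probability, child states) options.
Rule = Callable[[int], List[Tuple[float, List[str]]]]

def simulate(rules: Dict[str, Rule], root_state: str = "S",
             generations: int = 10, seed: int = 0):
    """Apply context-free stochastic rules, one generation at a time.
    Returns every node ever created (the lineage tree) and the ids alive at the end."""
    rng = random.Random(seed)
    nodes = [Node(0, None, 0, root_state)]
    alive = [0]
    for g in range(1, generations + 1):
        pop = len(alive)                      # global parameter shared by all rules
        next_alive = []
        for idx in alive:
            options = rules[nodes[idx].state](pop)
            u, acc = rng.random(), 0.0
            children = options[-1][1]         # fallback guards against rounding
            for prob, kids in options:
                acc += prob
                if u < acc:
                    children = kids
                    break
            for state in children:            # an empty list means the individual dies
                nodes.append(Node(len(nodes), idx, g, state))
                next_alive.append(nodes[-1].id)
        alive = next_alive
        if not alive:
            break
    return nodes, alive

K = 100  # hypothetical carrying capacity

def stem_rule(pop):
    p_sym = max(0.0, 1.0 - pop / K)           # symmetric division slows as the population grows
    return [(p_sym, ["S", "S"]), (1.0 - p_sym, ["S", "D"])]

def diff_rule(pop):
    return [(1.0, [])]                        # differentiated cells die after one generation

RULES = {"S": stem_rule, "D": diff_rule}
```

Calling `simulate(RULES, generations=15)` returns the full tree, including dead `D` cells, so extinct branches remain available for analysis as the abstract describes. Because the rules see only the population size, individuals never interact directly; the environment enters solely through that global feedback.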
Subjects
Computational Biology/methods , Models, Biological , Population Dynamics , Software , Biological Evolution , Environment , Extinction, Biological
ABSTRACT
Human cancers display substantial intratumoral genetic heterogeneity, which facilitates tumor survival under changing microenvironmental conditions. Tumor substructure and its effect on disease progression and relapse are incompletely understood. In the present study, a high-throughput method that uses neutral somatic mutations accumulated in individual cells to reconstruct cell lineage trees was applied to hundreds of human acute leukemia cells harvested from multiple patients at diagnosis and at relapse. The reconstructed cell lineage trees of patients with acute myeloid leukemia showed that leukemia cells at relapse were shallow (had divided rarely) compared with cells at diagnosis and were closely related to their stem cell subpopulation, implying that in these instances relapse might have originated from rarely dividing stem cells. In contrast, among patients with acute lymphoid leukemia, no differences in cell depth were observed between diagnosis and relapse. In one case of chronic myeloid leukemia at blast crisis, most of the cells at relapse were mismatch-repair deficient. In almost all leukemia cases, more than one lineage was observed at relapse, indicating that diverse mechanisms can promote relapse in the same patient. In conclusion, diverse relapse mechanisms can be observed by systematic reconstruction of cell lineage trees of patients with leukemia.
Subjects
Genetic Heterogeneity , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , Microsatellite Instability , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology , Antineoplastic Agents/therapeutic use , Biopsy , Blast Crisis/drug therapy , Blast Crisis/genetics , Blast Crisis/pathology , Cell Division/drug effects , Cell Division/genetics , Cell Lineage/genetics , Drug Resistance, Neoplasm/genetics , Flow Cytometry , Humans , Leukemia, Myeloid, Acute/drug therapy , Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Recurrence , Tumor Microenvironment/genetics
ABSTRACT
The cells of an organism proliferate and die to build, maintain, renew and repair it. The cellular history of an organism up to any point in time can be captured by a cell lineage tree in which vertices represent all of the organism's cells, past and present, and directed edges represent progeny relations among them. The root represents the fertilized egg, and the leaves represent extant and dead cells. Somatic mutations accumulated during cell division endow each cell with a genomic signature that is unique with very high probability. Distances between such genomic signatures can be used to reconstruct an organism's cell lineage tree. Cell populations possess unique features that are absent or rare in organism populations (e.g., the presence of stem cells and a small number of generations since the zygote) and do not undergo sexual reproduction; hence, reconstructing cell lineage trees calls for careful examination and adaptation of the standard tools of population genetics. Our lab developed a method for reconstructing cell lineage trees by examining only mutations in highly variable microsatellite loci (MS, also called short tandem repeats, STR). In this study we use experimental data on somatic mutations in MS of individual cells in humans and mice to validate and quantify the utility of known lineage tree reconstruction algorithms in this context. We employed extensive measurements of somatic mutations in individual cells isolated from healthy and diseased tissues of mice and humans. Validation was done by analyzing the ability of the algorithms to infer known and clear biological scenarios. In general, we found that if the biological scenario is simple, almost all algorithms tested can infer it. Another somewhat surprising conclusion is that the best algorithm among those tested is Neighbor Joining with normalized absolute distance as the distance measure.
We include our full dataset in Tables S1, S2, S3, S4, S5 to enable further analysis of this data by others.
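The best-performing combination reported above can be sketched as follows. We assume here that "normalized absolute distance" means the mean absolute difference in repeat length over the microsatellite loci called in both cells (missing calls are `None`); that definition, the function names and the Newick output format are assumptions for illustration. The joining step is the standard Saitou–Nei neighbor-joining algorithm.

```python
from typing import List, Optional, Sequence

def normalized_abs_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Mean absolute repeat-length difference over loci called in both cells (assumed definition)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no loci called in both cells")
    return sum(abs(x - y) for x, y in shared) / len(shared)

def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Standard Saitou-Nei neighbor joining; returns an unrooted tree in Newick format."""
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]           # net divergence of each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))   # branch length to i
        lj = d[i][j] - li                                     # branch length to j
        joined = f"({nodes[i]}:{li:.3g},{nodes[j]}:{lj:.3g})"
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    return f"({nodes[0]}:{d[0][1]:.3g},{nodes[1]}:0);"
```

In use, one would fill a cell-by-cell matrix with `normalized_abs_distance` over each pair of cells' microsatellite calls and pass it, together with the cell labels, to `neighbor_joining`. Normalizing by the number of shared loci keeps distances comparable between cell pairs with different amounts of missing data, which is common in single-cell genotyping.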
Subjects
Algorithms , Cell Lineage/genetics , Microsatellite Repeats/genetics , Mutation/genetics , Phylogeny , Animals , Bone Marrow Cells , Cells, Cultured , Cluster Analysis , Computational Biology/methods , Computer Simulation , Female , Humans , Male , Mice , Mice, Transgenic , Models, Genetic
ABSTRACT
Breast cancer (BC) risk models based on electronic health records (EHR) can assist physicians in estimating the probability that an individual with certain risk factors will develop BC in the future. In this retrospective study, we used clinical data combined with machine learning tools to assess the utility of a personalized BC risk model on 13,786 Israeli and 1,695 American women who underwent screening mammography in the years 2012-2018 and 2008-2018, respectively. Clinical features were extracted from EHR, personal questionnaires, and past radiologists' reports. Using a set of 1,547 features, the predictive ability for BC within 12 months was measured in both datasets and in sub-cohorts of interest. Our results highlight the improved performance of our model over previously established BC risk models, its potential to support risk-based screening policies for first-time patients, and novel clinically relevant risk factors that can compensate for the absence of imaging history information.
Subjects
Breast Neoplasms , Humans , Female , Mammography , Retrospective Studies , Early Detection of Cancer , Breast , Risk Assessment
ABSTRACT
OBJECTIVES: Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic feature vector representation that leverages co-occurrences of medical terms linked with PubMed citations. MATERIALS AND METHODS: We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect, and a drug treats an indication. To predict these relations and assess the representation's effectiveness, we applied two modeling approaches: multi-task modeling using neural networks, and single-task modeling based on gradient boosting machines and logistic regression. RESULTS: These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation. DISCUSSION: Selecting the appropriate representation has a major impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types. CONCLUSION: The discovery of new relations between various medical entities can be translated into meaningful insights, for example related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.
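The contrast between the baseline (one direct co-occurrence count) and a fuller co-occurrence representation can be sketched as follows. This assumes that each citation is reduced to the set of medical terms annotated on it, that an entity's vector is its co-occurrence count with every term in a fixed vocabulary, and that a candidate pair is represented by concatenating both entities' vectors plus the direct count. All names, and this particular construction, are hypothetical illustrations rather than the published representation.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set

def cooccurrence_counts(citations: Iterable[Set[str]]) -> Counter:
    """Count, for every unordered term pair, the citations annotated with both terms."""
    counts: Counter = Counter()
    for terms in citations:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts

def pair_count(counts: Counter, a: str, b: str) -> int:
    return counts[tuple(sorted((a, b)))]      # missing pairs count as 0

def entity_vector(entity: str, vocabulary: List[str], counts: Counter) -> List[int]:
    """Co-occurrence profile of one entity against every vocabulary term."""
    return [0 if t == entity else pair_count(counts, entity, t) for t in vocabulary]

def pair_features(drug: str, other: str, vocabulary: List[str], counts: Counter) -> List[int]:
    """Features for a candidate (drug, side effect or indication) pair:
    both entities' profiles plus the single direct co-occurrence count."""
    return (entity_vector(drug, vocabulary, counts)
            + entity_vector(other, vocabulary, counts)
            + [pair_count(counts, drug, other)])
```

The last element of `pair_features` alone corresponds to a single-feature baseline; the profile vectors let a model relate a pair through shared neighbors even when the two entities rarely co-occur directly. Such vectors could then be fed to any classifier, for example gradient boosting or logistic regression.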
ABSTRACT
Development in mammals is accompanied by specific de novo methylation and demethylation events that are thought to stabilize differentiated cell phenotypes. We demonstrate that a large percentage of the tissue-specific methylation pattern is generated postnatally. Demethylation in the liver is observed in thousands of enhancer-like sequences associated with genes that undergo activation during the first few weeks of life. Using a conditional gene ablation strategy, we show that the removal of these methyl groups is stable and necessary for assuring proper hepatocyte gene expression and function through its effect on chromatin accessibility. These postnatal changes in methylation come about through exposure to hormone signaling. These results define the molecular rules of 5-methylcytosine regulation as an epigenetic mechanism underlying cellular responses to a changing environment.
Subjects
DNA Demethylation , Epigenesis, Genetic/physiology , Gene Expression Regulation, Developmental/physiology , Liver/growth & development , Signal Transduction/physiology , 5-Methylcytosine/metabolism , Animals , Animals, Newborn , Cells, Cultured , DNA-Binding Proteins/genetics , Dioxygenases , Female , Hepatocytes/metabolism , High-Throughput Nucleotide Sequencing , Liver/cytology , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , Primary Cell Culture , Proto-Oncogene Proteins/genetics , Sequence Analysis, RNA
ABSTRACT
Spike sorting involves clustering spikes recorded by a micro-electrode according to their source neurons. It is a complex task that requires considerable manual labor, in part due to the non-stationary nature of the data. We propose to automate the clustering process in a Bayesian framework, with the source neurons modeled as a non-stationary mixture of Gaussians. In a first search stage, the data are divided into short time frames, and candidate descriptions of the data as mixtures of Gaussians are computed for each frame separately. In a second stage, transition probabilities between candidate mixtures are computed, and a globally optimal clustering solution is found as the maximum a posteriori solution of the resulting probabilistic model. The transition probabilities are computed using local stationarity assumptions and are based on a Gaussian version of the Jensen-Shannon divergence. We use synthetically generated spike data to illustrate the method and show that it outperforms other spike sorting methods in a non-stationary scenario. We then use real spike data and find high agreement between the method and expert human sorters in two modes of operation: a fully unsupervised mode and a semi-supervised mode. Thus, this method differs from other methods in two respects: its ability to account for non-stationary data, and its near-human performance.
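One natural reading of a "Gaussian version of the Jensen-Shannon divergence" is sketched below: the ordinary JS divergence H(M) − ½H(P) − ½H(Q), with the equal-weight mixture M replaced by the single Gaussian that matches its mean and covariance. This moment-matched definition is our assumption, as is the function name; how the divergence is then mapped to transition probabilities is not specified here and is left out.

```python
import numpy as np

def gaussian_js(mu1, cov1, mu2, cov2):
    """Jensen-Shannon-style divergence between N(mu1, cov1) and N(mu2, cov2),
    with the 50/50 mixture replaced by its moment-matched Gaussian (assumed definition).

    Uses H(N(mu, S)) = 0.5 * log det(2*pi*e*S); the 2*pi*e terms cancel, leaving
    0.5*logdet(S_mix) - 0.25*logdet(S1) - 0.25*logdet(S2).
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    diff = (mu1 - mu2).reshape(-1, 1)
    # Covariance of the equal-weight mixture: average covariance plus spread of the means.
    cov_mix = 0.5 * cov1 + 0.5 * cov2 + 0.25 * diff @ diff.T
    _, logdet_mix = np.linalg.slogdet(cov_mix)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * logdet_mix - 0.25 * logdet1 - 0.25 * logdet2
```

Unlike the exact JS divergence between two Gaussians, which has no closed form because their mixture is not Gaussian, this quantity is computed directly from the cluster means and covariances. It is symmetric and non-negative (log det is concave), and it is zero only for identical Gaussians, which makes it a convenient measure of how much a cluster changed between adjacent time frames.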