Results 1 - 20 of 16,383
1.
Cell ; 184(19): 4874-4885.e16, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34433011

ABSTRACT

Only five species of the once-diverse Rhinocerotidae remain, making the reconstruction of their evolutionary history a challenge to biologists since Darwin. We sequenced genomes from five rhinoceros species (three extinct and two living), which we compared to existing data from the remaining three living species and a range of outgroups. We identify an early divergence between extant African and Eurasian lineages, resolving a key debate regarding the phylogeny of extant rhinoceroses. This early Miocene (∼16 million years ago [mya]) split post-dates the land bridge formation between the Afro-Arabian and Eurasian landmasses. Our analyses also show that while rhinoceros genomes in general exhibit low levels of genome-wide diversity, heterozygosity is lowest and inbreeding is highest in the modern species. These results suggest that while low genetic diversity is a long-term feature of the family, it has been particularly exacerbated recently, likely reflecting recent anthropogenic-driven population declines.


Subjects
Evolution, Molecular , Genome , Perissodactyla/genetics , Animals , Demography , Gene Flow , Genetic Variation , Geography , Heterozygote , Homozygote , Host Specificity , Markov Chains , Mutation/genetics , Phylogeny , Species Specificity , Time Factors
2.
Cell ; 174(6): 1424-1435.e15, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30078708

ABSTRACT

FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.


Subjects
Forkhead Transcription Factors/genetics , Brain/cytology , Brain/metabolism , Cell Line , Databases, Genetic , Exons , Female , Genome, Human , Haplotypes , Humans , Introns , Male , Markov Chains , Polymorphism, Single Nucleotide , Prefrontal Cortex/metabolism
3.
Cell ; 174(3): 716-729.e27, 2018 07 26.
Article in English | MEDLINE | ID: mdl-29961576

ABSTRACT

Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed "dropout," which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial-to-mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.


Subjects
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cell Line , Epistasis, Genetic/genetics , Gene Regulatory Networks/genetics , Humans , Markov Chains , MicroRNAs/genetics , RNA, Messenger/genetics , Software
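
The data-diffusion idea in the abstract above can be illustrated with a minimal sketch. It is not the MAGIC implementation: the fixed-width Gaussian kernel, the parameters `sigma` and `t`, the function names, and the toy expression values are all assumptions chosen for illustration. The sketch builds a row-normalized (Markov) affinity matrix between cells and applies it `t` times to the expression matrix, so that a zero caused by dropout is replaced by a weighted average over similar cells.

```python
import math

def markov_matrix(cells, sigma):
    """Row-normalized Gaussian affinities between cells (each row sums to 1)."""
    affinities = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(ci, cj)) / (2 * sigma ** 2))
         for cj in cells]
        for ci in cells
    ]
    return [[v / sum(row) for v in row] for row in affinities]

def diffuse(cells, sigma=3.0, t=3):
    """Apply the Markov matrix t times to the expression matrix (cells x genes)."""
    m = markov_matrix(cells, sigma)
    n_cells, n_genes = len(cells), len(cells[0])
    smoothed = [row[:] for row in cells]
    for _ in range(t):
        smoothed = [
            [sum(m[i][k] * smoothed[k][g] for k in range(n_cells))
             for g in range(n_genes)]
            for i in range(n_cells)
        ]
    return smoothed

# Hypothetical toy data: three cells of one type (the third has a dropout
# zero in gene 0) and two cells of a second type.
cells = [
    [5.0, 5.0, 1.0],
    [5.1, 4.9, 1.1],
    [0.0, 5.0, 1.0],  # gene 0 lost to dropout
    [1.0, 1.0, 6.0],
    [1.1, 0.9, 6.1],
]
imputed = diffuse(cells)
```

Because every row of the Markov matrix is a set of positive weights summing to one, each imputed value is a weighted average of observed values, so the dropout entry becomes positive while staying within the observed range for that gene.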
4.
Cell ; 167(3): 803-815.e21, 2016 Oct 20.
Article in English | MEDLINE | ID: mdl-27720452

ABSTRACT

Do young and old protein molecules have the same probability to be degraded? We addressed this question using metabolic pulse-chase labeling and quantitative mass spectrometry to obtain degradation profiles for thousands of proteins. We find that >10% of proteins are degraded non-exponentially. Specifically, proteins are less stable in the first few hours of their life and stabilize with age. Degradation profiles are conserved and similar in two cell types. Many non-exponentially degraded (NED) proteins are subunits of complexes that are produced in super-stoichiometric amounts relative to their exponentially degraded (ED) counterparts. Within complexes, NED proteins have larger interaction interfaces and assemble earlier than ED subunits. Amplifying genes encoding NED proteins increases their initial degradation. Consistently, decay profiles can predict protein level attenuation in aneuploid cells. Together, our data show that non-exponential degradation is common, conserved, and has important consequences for complex formation and regulation of protein abundance.


Subjects
Protein Stability , Proteins/metabolism , Proteolysis , Alanine/analogs & derivatives , Alanine/chemistry , Aneuploidy , Cell Line , Click Chemistry , Gene Amplification , Humans , Kinetics , Markov Chains , Proteasome Endopeptidase Complex/chemistry , Protein Biosynthesis , Proteins/chemistry , Proteins/genetics , Proteome , Ubiquitin/chemistry
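
One simple way to picture non-exponential degradation is a two-state Markov chain in which a newly made protein (state A) is either degraded at rate `k_a` or matures at rate `k_ab` into a stable state B that is degraded at rate `k_b`. The sketch below is an illustration under that assumption; the rate names and values are hypothetical and are not taken from the paper. Solving dA/dt = -(k_a + k_ab) A and dB/dt = k_ab A - k_b B with A(0) = 1, B(0) = 0 gives the remaining fraction A(t) + B(t).

```python
import math

def remaining_fraction(t, k_a, k_ab, k_b):
    """Fraction of a labeled cohort still present at time t in the two-state model."""
    out = k_a + k_ab                      # total rate of leaving state A
    p_a = math.exp(-out * t)              # still in the young state A
    if math.isclose(out, k_b):
        p_b = k_ab * t * math.exp(-k_b * t)   # degenerate case out == k_b
    else:
        p_b = k_ab / (out - k_b) * (math.exp(-k_b * t) - math.exp(-out * t))
    return p_a + p_b

def apparent_rate(t0, t1, k_a, k_ab, k_b):
    """Average decay rate over [t0, t1], i.e. minus the slope of ln(fraction)."""
    f0 = remaining_fraction(t0, k_a, k_ab, k_b)
    f1 = remaining_fraction(t1, k_a, k_ab, k_b)
    return -math.log(f1 / f0) / (t1 - t0)

# Hypothetical rates (per hour): young molecules are ten times less stable.
rates = dict(k_a=0.5, k_ab=0.5, k_b=0.05)
early = apparent_rate(0.0, 1.0, **rates)
late = apparent_rate(10.0, 11.0, **rates)
```

With these rates the apparent decay rate is high in the first hour and falls toward `k_b` later, which is the "less stable when young, stabilizes with age" pattern. Setting `k_a == k_b` recovers plain exponential decay.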
5.
Nature ; 628(8007): 450-457, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38408488

ABSTRACT

Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs [1,2]. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.


Subjects
Cryoelectron Microscopy , Machine Learning , Models, Molecular , Proteins , Amino Acid Sequence , Cryoelectron Microscopy/methods , Cryoelectron Microscopy/standards , Markov Chains , Neural Networks, Computer , Protein Conformation , Proteins/chemistry , Proteins/ultrastructure , Computer Graphics
6.
Cell ; 159(2): 333-45, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25284152

ABSTRACT

In the thymus, high-affinity, self-reactive thymocytes are eliminated from the pool of developing T cells, generating central tolerance. Here, we investigate how developing T cells measure self-antigen affinity. We show that very few CD4 or CD8 coreceptor molecules are coupled with the signal-initiating kinase, Lck. To initiate signaling, an antigen-engaged T cell receptor (TCR) scans multiple coreceptor molecules to find one that is coupled to Lck; this is the first and rate-limiting step in a kinetic proofreading chain of events that eventually leads to TCR triggering and negative selection. MHCII-restricted TCRs require a shorter antigen dwell time (0.2 s) to initiate negative selection compared to MHCI-restricted TCRs (0.9 s) because more CD4 coreceptors are Lck-loaded compared to CD8. We generated a model (Lck come&stay/signal duration) that accurately predicts the observed differences in antigen dwell-time thresholds used by MHCI- and MHCII-restricted thymocytes to initiate negative selection and generate self-tolerance.


Subjects
Autoantigens/immunology , Immune Tolerance , Receptors, Antigen, T-Cell/immunology , Animals , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class II/immunology , Kinetics , Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/metabolism , Markov Chains , Mice, Inbred C57BL , Receptors, Antigen, T-Cell/metabolism , Thymocytes/cytology , Thymocytes/immunology
7.
Cell ; 152(1-2): 327-39, 2013 Jan 17.
Article in English | MEDLINE | ID: mdl-23332764

ABSTRACT

Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.


Subjects
Chromatin Immunoprecipitation , Models, Biological , SELEX Aptamer Technique , Transcription Factors/metabolism , Animals , DNA/chemistry , Humans , Markov Chains , Mice , Phylogeny , Transcription Factors/genetics
8.
Cell ; 153(7): 1589-601, 2013 Jun 20.
Article in English | MEDLINE | ID: mdl-23791185

ABSTRACT

Deep sequencing now provides detailed snapshots of ribosome occupancy on mRNAs. We leverage these data to parameterize a computational model of translation, keeping track of every ribosome, tRNA, and mRNA molecule in a yeast cell. We determine the parameter regimes in which fast initiation or high codon bias in a transgene increases protein yield and infer the initiation rates of endogenous Saccharomyces cerevisiae genes, which vary by several orders of magnitude and correlate with 5' mRNA folding energies. Our model recapitulates the previously reported 5'-to-3' ramp of decreasing ribosome densities, although our analysis shows that this ramp is caused by rapid initiation of short genes rather than slow codons at the start of transcripts. We conclude that protein production in healthy yeast cells is typically limited by the availability of free ribosomes, whereas protein production under periods of stress can sometimes be rescued by reducing initiation or elongation rates.


Subjects
Models, Genetic , Protein Biosynthesis , Saccharomyces cerevisiae/genetics , Codon/genetics , Markov Chains , RNA, Messenger/metabolism , RNA, Transfer/metabolism , Ribosomes/metabolism
9.
Am J Hum Genet ; 111(5): 966-978, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subjects
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
10.
Nature ; 591(7849): 265-269, 2021 03.
Article in English | MEDLINE | ID: mdl-33597750

ABSTRACT

Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale [1], the oldest genomic data recovered so far are from a horse specimen dated to 780-560 thousand years ago [2]. Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We find that two distinct mammoth lineages were present in eastern Siberia during the Early Pleistocene. One of these lineages gave rise to the woolly mammoth and the other represents a previously unrecognized lineage that was ancestral to the first mammoths to colonize North America. Our analyses reveal that the Columbian mammoth of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These findings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.


Subjects
DNA, Ancient/analysis , Evolution, Molecular , Genome, Mitochondrial/genetics , Genomics , Mammoths/genetics , Phylogeny , Acclimatization/genetics , Alleles , Animals , Bayes Theorem , DNA, Ancient/isolation & purification , Elephants/genetics , Europe , Female , Fossils , Genetic Variation/genetics , Markov Chains , Molar , North America , Radiometric Dating , Siberia , Time Factors
11.
Proc Natl Acad Sci U S A ; 121(22): e2318329121, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38787881

ABSTRACT

The Hill functions, [Formula: see text], have been widely used in biology for over a century but, with the exception of [Formula: see text], they have had no justification other than as a convenient fit to empirical data. Here, we show that they are the universal limit for the sharpness of any input-output response arising from a Markov process model at thermodynamic equilibrium. Models may represent arbitrary molecular complexity, with multiple ligands, internal states, conformations, coregulators, etc, under core assumptions that are detailed in the paper. The model output may be any linear combination of steady-state probabilities, with components other than the chosen input ligand held constant. This formulation generalizes most of the responses in the literature. We use a coarse-graining method in the graph-theoretic linear framework to show that two sharpness measures for input-output responses fall within an effectively bounded region of the positive quadrant, [Formula: see text], for any equilibrium model with [Formula: see text] input binding sites. [Formula: see text] exhibits a cusp which approaches, but never exceeds, the sharpness of [Formula: see text], but the region and the cusp can be exceeded when models are taken away from thermodynamic equilibrium. Such fundamental thermodynamic limits are called Hopfield barriers, and our results provide a biophysical justification for the Hill functions as the universal Hopfield barriers for sharpness. Our results also introduce an object, [Formula: see text], whose structure may be of mathematical interest, and suggest the importance of characterizing Hopfield barriers for other forms of cellular information processing.


Subjects
Markov Chains , Thermodynamics , Models, Biological , Ligands
12.
Proc Natl Acad Sci U S A ; 121(32): e2318805121, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39083417

ABSTRACT

How do we capture the breadth of behavior in animal movement, from rapid body twitches to aging? Using high-resolution videos of the nematode worm Caenorhabditis elegans, we show that a single dynamics connects posture-scale fluctuations with trajectory diffusion and longer-lived behavioral states. We take short posture sequences as an instantaneous behavioral measure, fixing the sequence length for maximal prediction. Within the space of posture sequences, we construct a fine-scale, maximum entropy partition so that transitions among microstates define a high-fidelity Markov model, which we also use as a means of principled coarse-graining. We translate these dynamics into movement using resistive force theory, capturing the statistical properties of foraging trajectories. Predictive across scales, we leverage the longest-lived eigenvectors of the inferred Markov chain to perform a top-down subdivision of the worm's foraging behavior, revealing both "runs-and-pirouettes" as well as previously uncharacterized finer-scale behaviors. We use our model to investigate the relevance of these fine-scale behaviors for foraging success, recovering a trade-off between local and global search strategies.


Subjects
Behavior, Animal , Caenorhabditis elegans , Markov Chains , Animals , Caenorhabditis elegans/physiology , Behavior, Animal/physiology , Models, Biological , Movement/physiology
13.
Proc Natl Acad Sci U S A ; 121(3): e2318989121, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38215186

ABSTRACT

The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.


Subjects
COVID-19 , SARS-CoV-2 , Humans , Algorithms , COVID-19/epidemiology , Markov Chains
14.
Genome Res ; 33(12): 2156-2173, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38097386

ABSTRACT

Single nucleotide polymorphisms (SNPs) from omics data create a reidentification risk for individuals and their relatives. Although the ability of thousands of SNPs (especially rare ones) to identify individuals has been repeatedly shown, the availability of small sets of noisy genotypes, from environmental DNA samples or functional genomics data, motivated us to quantify their informativeness. We present a computational tool suite, termed Privacy Leakage by Inference across Genotypic HMM Trajectories (PLIGHT), using population-genetics-based hidden Markov models (HMMs) of recombination and mutation to find piecewise alignment of small, noisy SNP sets to reference haplotype databases. We explore cases in which query individuals are either known to be in the database, or not, and consider several genotype queries, including those from environmental sample swabs from known individuals and from simulated "mosaics" (two-individual composites). Using PLIGHT on a database with ∼5000 haplotypes, we find for common, noise-free SNPs that only ten are sufficient to identify individuals, ∼20 can identify both components in two-individual mosaics, and 20-30 can identify first-order relatives. Using noisy environmental-sample-derived SNPs, PLIGHT identifies individuals in a database using ∼30 SNPs. Even when the individuals are not in the database, local genotype matches allow for some phenotypic information leakage based on coarse-grained SNP imputation. Finally, by quantifying privacy leakage from sparse SNP sets, PLIGHT helps determine the value of selectively sanitizing released SNPs without explicit assumptions about population membership or allele frequency. To make this practical, we provide a sanitization tool to remove the most identifying SNPs from genomic data.


Subjects
Genotype , Haplotypes , Polymorphism, Single Nucleotide , Humans , Databases, Genetic , Markov Chains , Software , Genetic Privacy , Algorithms , Sequence Alignment , Genetics, Population/methods
15.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38628114

ABSTRACT

Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolutions at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundance in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model, capturing gene expression abundance relationships in scRNA-seq data through two Markov processes: one introducing noise to transcriptomics data and the other denoising to recover them. The missing portion of ST is predicted by incorporating the original ST data into the denoising process. In our comprehensive performance evaluation across 16 datasets, utilizing multiple clustering and similarity metrics, stDiff stands out for its exceptional ability to preserve topological structures among cells, positioning itself as a robust solution for cell population identification. Moreover, stDiff's enhancement outcomes closely mirror the actual ST data within the batch space. Across diverse spatial expression patterns, our model accurately reconstructs them, delineating distinct spatial boundaries. This highlights stDiff's capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff, with its innovative approach, will contribute to advancing ST imputation methodologies.


Subjects
Benchmarking , Gene Expression Profiling , Cluster Analysis , Diffusion , Markov Chains , Sequence Analysis, RNA , Transcriptome
16.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39003531

ABSTRACT

Profile hidden Markov models (pHMMs) are able to achieve high sensitivity in remote homology search, making them popular choices for detecting novel or highly diverged viruses in metagenomic data. However, many existing pHMM databases have different design focuses, making it difficult for users to decide the proper one to use. In this review, we provide a thorough evaluation and comparison for multiple commonly used profile HMM databases for viral sequence discovery in metagenomic data. We characterized the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. Subsequently, we assessed their performance in virus identification across multiple application scenarios, utilizing both simulated and real metagenomic data. We aim to offer researchers a thorough and critical assessment of the strengths and limitations of different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provided practical suggestions for users to optimize their use of pHMM databases, thus enhancing the quality and reliability of their findings in the field of viral metagenomics.


Subjects
Markov Chains , Metagenomics , Viruses , Metagenomics/methods , Viruses/genetics , Viruses/classification , Databases, Genetic , Humans , Computational Biology/methods , Algorithms
17.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39133097

ABSTRACT

Constructing gene regulatory networks is a widely adopted approach for investigating gene regulation, offering diverse applications in biology and medicine. A great deal of research focuses on using time series data or single-cell RNA-sequencing data to infer gene regulatory networks. However, such gene expression data lack either cellular or temporal information. Fortunately, the advent of time-lapse confocal laser microscopy enables biologists to obtain tree-shaped gene expression data of Caenorhabditis elegans, achieving both cellular and temporal resolution. Although such tree-shaped data provide abundant knowledge, they pose challenges like non-pairwise time series, leading to inaccuracy in downstream analysis. To address this issue, a comprehensive framework for data integration and a novel Bayesian approach based on Boolean network with time delay are proposed. The pre-screening process and Markov Chain Monte Carlo algorithm are applied to obtain the parameter estimates. Simulation studies show that our method outperforms existing Boolean network inference algorithms. Leveraging the proposed approach, gene regulatory networks for five subtrees are reconstructed based on the real tree-shaped datasets of Caenorhabditis elegans, where some gene regulatory relationships confirmed in previous genetic studies are recovered. Also, heterogeneity of regulatory relationships in different cell lineage subtrees is detected. Furthermore, the exploration of potential gene regulatory relationships that bear importance in human diseases is undertaken. All source code is available at the GitHub repository https://github.com/edawu11/BBTD.git.


Subjects
Algorithms , Caenorhabditis elegans , Gene Regulatory Networks , Caenorhabditis elegans/genetics , Animals , Bayes Theorem , Computational Biology/methods , Markov Chains , Gene Expression Profiling/methods
18.
Cell ; 146(4): 633-44, 2011 Aug 19.
Article in English | MEDLINE | ID: mdl-21854987

ABSTRACT

Cancer cells within individual tumors often exist in distinct phenotypic states that differ in functional attributes. While cancer cell populations typically display distinctive equilibria in the proportion of cells in various states, the mechanisms by which this occurs are poorly understood. Here, we study the dynamics of phenotypic proportions in human breast cancer cell lines. We show that subpopulations of cells purified for a given phenotypic state return towards equilibrium proportions over time. These observations can be explained by a Markov model in which cells transition stochastically between states. A prediction of this model is that, given certain conditions, any subpopulation of cells will return to equilibrium phenotypic proportions over time. A second prediction is that breast cancer stem-like cells arise de novo from non-stem-like cells. These findings contribute to our understanding of cancer heterogeneity and reveal how stochasticity in single-cell behaviors promotes phenotypic equilibrium in populations of cancer cells.


Subjects
Breast Neoplasms/pathology , Markov Chains , Animals , Female , Flow Cytometry , Gene Expression Profiling , Humans , Mice , Mice, Inbred NOD , Mice, SCID , Neoplasm Transplantation , Neoplastic Stem Cells/pathology , Stochastic Processes , Transplantation, Heterologous
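
The two predictions in the abstract above follow from basic Markov chain behavior, which a short sketch can make concrete. The three states and the transition probabilities below are hypothetical placeholders, not values from the paper. Each row gives the per-step probabilities that a cell in one phenotypic state stays put or switches, and the population proportions are updated by repeated multiplication with this matrix.

```python
def step(proportions, transition):
    """One update of state proportions: new[j] = sum_i old[i] * P[i][j]."""
    n = len(proportions)
    return [sum(proportions[i] * transition[i][j] for i in range(n))
            for j in range(n)]

def equilibrate(proportions, transition, n_steps=500):
    """Iterate the chain for n_steps and return the resulting proportions."""
    for _ in range(n_steps):
        proportions = step(proportions, transition)
    return proportions

# Hypothetical per-step transition probabilities between three phenotypic
# states; state 0 plays the role of the stem-like state. Rows sum to 1.
P = [
    [0.60, 0.10, 0.30],
    [0.05, 0.90, 0.05],
    [0.01, 0.04, 0.95],
]

from_pure_state_0 = equilibrate([1.0, 0.0, 0.0], P)
from_pure_state_2 = equilibrate([0.0, 0.0, 1.0], P)
```

Because every entry of this matrix is positive, any starting mixture converges to the same stationary proportions, so a purified subpopulation drifts back to the equilibrium mix. A population started with no cells in state 0 also ends up with a nonzero state-0 fraction, which mirrors the de novo appearance of stem-like cells.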
19.
Proc Natl Acad Sci U S A ; 120(12): e2221048120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36920924

ABSTRACT

The ability to predict and understand complex molecular motions occurring over diverse timescales ranging from picoseconds to seconds and even hours in biological systems remains one of the largest challenges to chemical theory. Markov state models (MSMs), which provide a memoryless description of the transitions between different states of a biochemical system, have provided numerous important physically transparent insights into biological function. However, constructing these models often necessitates performing extremely long molecular simulations to converge the rates. Here, we show that by incorporating memory via the time-convolutionless generalized master equation (TCL-GME) one can build a theoretically transparent and physically intuitive memory-enriched model of biochemical processes with up to a three order of magnitude reduction in the simulation data required while also providing a higher temporal resolution. We derive the conditions under which the TCL-GME provides a more efficient means to capture slow dynamics than MSMs and rigorously prove when the two provide equally valid and efficient descriptions of the slow configurational dynamics. We further introduce a simple averaging procedure that enables our TCL-GME approach to quickly converge and accurately predict long-time dynamics even when parameterized with noisy reference data arising from short trajectories. We illustrate the advantages of the TCL-GME using alanine dipeptide, the human argonaute complex, and FiP35 WW domain.


Subjects
Dipeptides , Molecular Dynamics Simulation , Humans , Markov Chains
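
For readers unfamiliar with Markov state models, the memoryless baseline that the abstract contrasts with the TCL-GME can be sketched as a simple counting estimate. This is a minimal illustration, not the authors' method or any particular library's API: the function name, the toy trajectory, and the plain row normalization are assumptions, and practical MSM estimators usually add further constraints such as detailed balance.

```python
def estimate_transition_matrix(trajectory, n_states, lag=1):
    """Count transitions observed at the given lag and row-normalize the counts."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(trajectory[:-lag], trajectory[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that was never left (no outgoing counts) gets a zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical discretized trajectory: each entry is the state index of one
# simulation frame after clustering conformations into two states.
trajectory = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1]
T = estimate_transition_matrix(trajectory, n_states=2, lag=1)
```

Entry `T[i][j]` estimates the probability of being in state `j` one lag time after being in state `i`. The estimate only depends on the current state, which is exactly the memoryless assumption that a generalized master equation relaxes.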
20.
PLoS Genet ; 19(7): e1010807, 2023 07.
Article in English | MEDLINE | ID: mdl-37418489

ABSTRACT

Germline mutation is the mechanism by which genetic variation in a population is created. Inferences derived from mutation rate models are fundamental to many population genetics methods. Previous models have demonstrated that nucleotides flanking polymorphic sites (the local sequence context) explain variation in the probability that a site is polymorphic. However, limitations to these models exist as the size of the local sequence context window expands. These include a lack of robustness to data sparsity at typical sample sizes, lack of regularization to generate parsimonious models and lack of quantified uncertainty in estimated rates to facilitate comparison between models. To address these limitations, we developed Baymer, a regularized Bayesian hierarchical tree model that captures the heterogeneous effect of sequence contexts on polymorphism probabilities. Baymer implements an adaptive Metropolis-within-Gibbs Markov Chain Monte Carlo sampling scheme to estimate the posterior distributions of sequence-context based probabilities that a site is polymorphic. We show that Baymer accurately infers polymorphism probabilities and well-calibrated posterior distributions, robustly handles data sparsity, appropriately regularizes to return parsimonious models, and scales computationally at least up to 9-mer context windows. We demonstrate application of Baymer in three ways: first, identifying differences in polymorphism probabilities between continental populations in the 1000 Genomes Phase 3 dataset; second, in a sparse data setting to examine the use of polymorphism models as a proxy for de novo mutation probabilities as a function of variant age, sequence context window size, and demographic history; and third, comparing model concordance between different great ape species. We find a shared context-dependent mutation rate architecture underlying our models, enabling a transfer-learning inspired strategy for modeling germline mutations.
In summary, Baymer is an accurate polymorphism probability estimation algorithm that automatically adapts to data sparsity at different sequence context levels, thereby making efficient use of the available data.


Subjects
Genome, Human , Mutation Rate , Humans , Genome, Human/genetics , Bayes Theorem , Mutation , Polymorphism, Genetic , Markov Chains , Monte Carlo Method