Results 1 - 20 of 16,431
1.
Cell ; 184(19): 4874-4885.e16, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34433011

ABSTRACT

Only five species of the once-diverse Rhinocerotidae remain, making the reconstruction of their evolutionary history a challenge to biologists since Darwin. We sequenced genomes from five rhinoceros species (three extinct and two living), which we compared to existing data from the remaining three living species and a range of outgroups. We identify an early divergence between extant African and Eurasian lineages, resolving a key debate regarding the phylogeny of extant rhinoceroses. This early Miocene (∼16 million years ago [mya]) split post-dates the land bridge formation between the Afro-Arabian and Eurasian landmasses. Our analyses also show that while rhinoceros genomes in general exhibit low levels of genome-wide diversity, heterozygosity is lowest and inbreeding is highest in the modern species. These results suggest that while low genetic diversity is a long-term feature of the family, it has been particularly exacerbated recently, likely reflecting recent anthropogenic-driven population declines.


Subject(s)
Evolution, Molecular , Genome , Perissodactyla/genetics , Animals , Demography , Gene Flow , Genetic Variation , Geography , Heterozygote , Homozygote , Host Specificity , Markov Chains , Mutation/genetics , Phylogeny , Species Specificity , Time Factors
2.
Cell ; 174(6): 1424-1435.e15, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30078708

ABSTRACT

FOXP2, initially identified for its role in human speech, contains two nonsynonymous substitutions derived in the human lineage. Evidence for a recent selective sweep in Homo sapiens, however, is at odds with the presence of these substitutions in archaic hominins. Here, we comprehensively reanalyze FOXP2 in hundreds of globally distributed genomes to test for recent selection. We do not find evidence of recent positive or balancing selection at FOXP2. Instead, the original signal appears to have been due to sample composition. Our tests do identify an intronic region that is enriched for highly conserved sites that are polymorphic among humans, compatible with a loss of function in humans. This region is lowly expressed in relevant tissue types that were tested via RNA-seq in human prefrontal cortex and RT-PCR in immortalized human brain cells. Our results represent a substantial revision to the adaptive history of FOXP2, a gene regarded as vital to human evolution.


Subject(s)
Forkhead Transcription Factors/genetics , Brain/cytology , Brain/metabolism , Cell Line , Databases, Genetic , Exons , Female , Genome, Human , Haplotypes , Humans , Introns , Male , Markov Chains , Polymorphism, Single Nucleotide , Prefrontal Cortex/metabolism
3.
Cell ; 174(3): 716-729.e27, 2018 07 26.
Article in English | MEDLINE | ID: mdl-29961576

ABSTRACT

Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed "dropout," which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial-to-mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cell Line , Epistasis, Genetic/genetics , Gene Regulatory Networks/genetics , Humans , Markov Chains , MicroRNAs/genetics , RNA, Messenger/genetics , Software
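
The data-diffusion idea in this abstract can be illustrated with a minimal sketch. It is not the published MAGIC implementation: the fixed Gaussian bandwidth `sigma`, the dense distance matrix, the function name `diffuse`, and the toy data are all assumptions made for illustration. The sketch builds a cell-cell affinity matrix, row-normalizes it into a Markov transition matrix, and averages each cell's expression over a t-step random walk.

```python
import numpy as np

def diffuse(data, sigma=1.0, t=3):
    """Smooth a cells x genes matrix by diffusing values over a cell-cell Markov matrix."""
    # Pairwise squared Euclidean distances between cells (rows).
    sq = np.sum(data**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * data @ data.T, 0.0)
    # Gaussian affinity; sigma is a hypothetical fixed bandwidth.
    affinity = np.exp(-d2 / (2.0 * sigma**2))
    # Row-normalize so every row is a probability distribution (Markov transition matrix).
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    # Take t random-walk steps, then average expression over each cell's neighborhood.
    return np.linalg.matrix_power(markov, t) @ data

# Toy data (hypothetical): two groups of 10 cells, 3 genes, with random dropout zeros.
rng = np.random.default_rng(0)
clean = np.vstack([np.tile([5.0, 0.0, 5.0], (10, 1)),
                   np.tile([0.0, 5.0, 0.0], (10, 1))])
noisy = clean * (rng.random(clean.shape) > 0.3)
imputed = diffuse(noisy, sigma=3.0, t=3)
```

Because each row of the Markov matrix sums to one, every imputed value is a weighted average of observed values for the same gene; larger `t` smooths more aggressively and can blur distinct cell groups.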
4.
Cell ; 167(3): 803-815.e21, 2016 Oct 20.
Article in English | MEDLINE | ID: mdl-27720452

ABSTRACT

Do young and old protein molecules have the same probability to be degraded? We addressed this question using metabolic pulse-chase labeling and quantitative mass spectrometry to obtain degradation profiles for thousands of proteins. We find that >10% of proteins are degraded non-exponentially. Specifically, proteins are less stable in the first few hours of their life and stabilize with age. Degradation profiles are conserved and similar in two cell types. Many non-exponentially degraded (NED) proteins are subunits of complexes that are produced in super-stoichiometric amounts relative to their exponentially degraded (ED) counterparts. Within complexes, NED proteins have larger interaction interfaces and assemble earlier than ED subunits. Amplifying genes encoding NED proteins increases their initial degradation. Consistently, decay profiles can predict protein level attenuation in aneuploid cells. Together, our data show that non-exponential degradation is common, conserved, and has important consequences for complex formation and regulation of protein abundance.


Subject(s)
Protein Stability , Proteins/metabolism , Proteolysis , Alanine/analogs & derivatives , Alanine/chemistry , Aneuploidy , Cell Line , Click Chemistry , Gene Amplification , Humans , Kinetics , Markov Chains , Proteasome Endopeptidase Complex/chemistry , Protein Biosynthesis , Proteins/chemistry , Proteins/genetics , Proteome , Ubiquitin/chemistry
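
One simple way to obtain the age-dependent stabilization described in this abstract is a two-state Markov model, sketched below under stated assumptions. A newly made protein starts in a "young" state A, is degraded at rate `k_a` or matures to state B at rate `k_ab`, and state B is degraded at the slower rate `k_b`. The state names, rate names, and numerical values are hypothetical, and this is not claimed to be the authors' fitted model.

```python
import math

def remaining_fraction(t, k_a, k_ab, k_b):
    """Fraction of a labeled protein cohort still present at time t."""
    out_a = k_a + k_ab  # total exit rate from the young state A
    if math.isclose(out_a, k_b):
        raise ValueError("closed form requires k_a + k_ab != k_b")
    # Molecules still in state A.
    in_a = math.exp(-out_a * t)
    # Molecules that matured into state B and have not yet been degraded.
    in_b = k_ab / (out_a - k_b) * (math.exp(-k_b * t) - math.exp(-out_a * t))
    return in_a + in_b

# Hypothetical rates (per hour): fast early loss, slow loss after maturation.
for hours in (0, 2, 8, 24):
    print(hours, round(remaining_fraction(hours, 0.3, 0.2, 0.02), 3))
```

With `k_ab = 0` the expression reduces to a single exponential, which corresponds to the exponentially degraded (ED) case; a nonzero `k_ab` with `k_b < k_a` gives a decay curve that is steep at first and flattens with age.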
5.
Nature ; 628(8007): 450-457, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38408488

ABSTRACT

Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs [1,2]. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.


Subject(s)
Cryoelectron Microscopy , Machine Learning , Models, Molecular , Proteins , Amino Acid Sequence , Cryoelectron Microscopy/methods , Cryoelectron Microscopy/standards , Markov Chains , Neural Networks, Computer , Protein Conformation , Proteins/chemistry , Proteins/ultrastructure , Computer Graphics
6.
Cell ; 159(2): 333-45, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25284152

ABSTRACT

In the thymus, high-affinity, self-reactive thymocytes are eliminated from the pool of developing T cells, generating central tolerance. Here, we investigate how developing T cells measure self-antigen affinity. We show that very few CD4 or CD8 coreceptor molecules are coupled with the signal-initiating kinase, Lck. To initiate signaling, an antigen-engaged T cell receptor (TCR) scans multiple coreceptor molecules to find one that is coupled to Lck; this is the first and rate-limiting step in a kinetic proofreading chain of events that eventually leads to TCR triggering and negative selection. MHCII-restricted TCRs require a shorter antigen dwell time (0.2 s) to initiate negative selection compared to MHCI-restricted TCRs (0.9 s) because more CD4 coreceptors are Lck-loaded compared to CD8. We generated a model (Lck come&stay/signal duration) that accurately predicts the observed differences in antigen dwell-time thresholds used by MHCI- and MHCII-restricted thymocytes to initiate negative selection and generate self-tolerance.


Subject(s)
Autoantigens/immunology , Immune Tolerance , Receptors, Antigen, T-Cell/immunology , Animals , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class II/immunology , Kinetics , Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/metabolism , Markov Chains , Mice, Inbred C57BL , Receptors, Antigen, T-Cell/metabolism , Thymocytes/cytology , Thymocytes/immunology
7.
Cell ; 152(1-2): 327-39, 2013 Jan 17.
Article in English | MEDLINE | ID: mdl-23332764

ABSTRACT

Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.


Subject(s)
Chromatin Immunoprecipitation , Models, Biological , SELEX Aptamer Technique , Transcription Factors/metabolism , Animals , DNA/chemistry , Humans , Markov Chains , Mice , Phylogeny , Transcription Factors/genetics
8.
Cell ; 153(7): 1589-601, 2013 Jun 20.
Article in English | MEDLINE | ID: mdl-23791185

ABSTRACT

Deep sequencing now provides detailed snapshots of ribosome occupancy on mRNAs. We leverage these data to parameterize a computational model of translation, keeping track of every ribosome, tRNA, and mRNA molecule in a yeast cell. We determine the parameter regimes in which fast initiation or high codon bias in a transgene increases protein yield and infer the initiation rates of endogenous Saccharomyces cerevisiae genes, which vary by several orders of magnitude and correlate with 5' mRNA folding energies. Our model recapitulates the previously reported 5'-to-3' ramp of decreasing ribosome densities, although our analysis shows that this ramp is caused by rapid initiation of short genes rather than slow codons at the start of transcripts. We conclude that protein production in healthy yeast cells is typically limited by the availability of free ribosomes, whereas protein production under periods of stress can sometimes be rescued by reducing initiation or elongation rates.


Subject(s)
Models, Genetic , Protein Biosynthesis , Saccharomyces cerevisiae/genetics , Codon/genetics , Markov Chains , RNA, Messenger/metabolism , RNA, Transfer/metabolism , Ribosomes/metabolism
9.
Am J Hum Genet ; 111(5): 966-978, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subject(s)
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
10.
Nature ; 591(7849): 265-269, 2021 03.
Article in English | MEDLINE | ID: mdl-33597750

ABSTRACT

Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale [1], the oldest genomic data recovered so far are from a horse specimen dated to 780-560 thousand years ago [2]. Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We find that two distinct mammoth lineages were present in eastern Siberia during the Early Pleistocene. One of these lineages gave rise to the woolly mammoth and the other represents a previously unrecognized lineage that was ancestral to the first mammoths to colonize North America. Our analyses reveal that the Columbian mammoth of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These findings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.


Subject(s)
DNA, Ancient/analysis , Evolution, Molecular , Genome, Mitochondrial/genetics , Genomics , Mammoths/genetics , Phylogeny , Acclimatization/genetics , Alleles , Animals , Bayes Theorem , DNA, Ancient/isolation & purification , Elephants/genetics , Europe , Female , Fossils , Genetic Variation/genetics , Markov Chains , Molar , North America , Radiometric Dating , Siberia , Time Factors
11.
Proc Natl Acad Sci U S A ; 121(22): e2318329121, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38787881

ABSTRACT

The Hill functions, [Formula: see text], have been widely used in biology for over a century but, with the exception of [Formula: see text], they have had no justification other than as a convenient fit to empirical data. Here, we show that they are the universal limit for the sharpness of any input-output response arising from a Markov process model at thermodynamic equilibrium. Models may represent arbitrary molecular complexity, with multiple ligands, internal states, conformations, coregulators, etc, under core assumptions that are detailed in the paper. The model output may be any linear combination of steady-state probabilities, with components other than the chosen input ligand held constant. This formulation generalizes most of the responses in the literature. We use a coarse-graining method in the graph-theoretic linear framework to show that two sharpness measures for input-output responses fall within an effectively bounded region of the positive quadrant, [Formula: see text], for any equilibrium model with [Formula: see text] input binding sites. [Formula: see text] exhibits a cusp which approaches, but never exceeds, the sharpness of [Formula: see text], but the region and the cusp can be exceeded when models are taken away from thermodynamic equilibrium. Such fundamental thermodynamic limits are called Hopfield barriers, and our results provide a biophysical justification for the Hill functions as the universal Hopfield barriers for sharpness. Our results also introduce an object, [Formula: see text], whose structure may be of mathematical interest, and suggest the importance of characterizing Hopfield barriers for other forms of cellular information processing.


Subject(s)
Markov Chains , Thermodynamics , Models, Biological , Ligands
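
For orientation, the sketch below uses the standard textbook form of the Hill function, x^n / (1 + x^n), with the half-maximal input normalized to 1. The formulas in the abstract above are elided, so this notation is an assumption and may differ from the paper's. The steepness measure shown (slope with respect to ln x at the half-maximal point) is likewise an illustrative choice, not necessarily one of the paper's two sharpness measures.

```python
import math

def hill(x, n):
    """Textbook Hill function with half-maximal input normalized to x = 1 (assumed form)."""
    return x**n / (1.0 + x**n)

def log_slope_at_half(n, eps=1e-5):
    """Central-difference estimate of d(hill)/d(ln x) at x = 1."""
    return (hill(math.exp(eps), n) - hill(math.exp(-eps), n)) / (2.0 * eps)

# The slope at the half-maximal point grows linearly with the Hill coefficient (n / 4).
for n in (1, 2, 4, 8):
    print(n, round(log_slope_at_half(n), 4))
```

Differentiating by hand gives n x^n / (1 + x^n)^2 with respect to ln x, which equals n/4 at x = 1, so a larger Hill coefficient means a sharper switch around the half-maximal input.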
12.
Proc Natl Acad Sci U S A ; 121(32): e2318805121, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39083417

ABSTRACT

How do we capture the breadth of behavior in animal movement, from rapid body twitches to aging? Using high-resolution videos of the nematode worm Caenorhabditis elegans, we show that a single dynamics connects posture-scale fluctuations with trajectory diffusion and longer-lived behavioral states. We take short posture sequences as an instantaneous behavioral measure, fixing the sequence length for maximal prediction. Within the space of posture sequences, we construct a fine-scale, maximum entropy partition so that transitions among microstates define a high-fidelity Markov model, which we also use as a means of principled coarse-graining. We translate these dynamics into movement using resistive force theory, capturing the statistical properties of foraging trajectories. Predictive across scales, we leverage the longest-lived eigenvectors of the inferred Markov chain to perform a top-down subdivision of the worm's foraging behavior, revealing both "runs-and-pirouettes" as well as previously uncharacterized finer-scale behaviors. We use our model to investigate the relevance of these fine-scale behaviors for foraging success, recovering a trade-off between local and global search strategies.


Subject(s)
Behavior, Animal , Caenorhabditis elegans , Markov Chains , Animals , Caenorhabditis elegans/physiology , Behavior, Animal/physiology , Models, Biological , Movement/physiology
13.
Proc Natl Acad Sci U S A ; 121(3): e2318989121, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38215186

ABSTRACT

The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Algorithms , COVID-19/epidemiology , Markov Chains
14.
Proc Natl Acad Sci U S A ; 121(35): e2322077121, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39172779

ABSTRACT

2'-deoxy-ATP (dATP) improves cardiac function by increasing the rate of crossbridge cycling and Ca2+ transient decay. However, the mechanisms of these effects and how therapeutic responses to dATP are achieved when dATP is only a small fraction of the total ATP pool remain poorly understood. Here, we used a multiscale computational modeling approach to analyze the mechanisms by which dATP improves ventricular function. We integrated atomistic simulations of prepowerstroke myosin and actomyosin association, filament-scale Markov state modeling of sarcomere mechanics, cell-scale analysis of myocyte Ca2+ dynamics and contraction, organ-scale modeling of biventricular mechanoenergetics, and systems level modeling of circulatory dynamics. Molecular and Brownian dynamics simulations showed that dATP increases the actomyosin association rate by 1.9 fold. Markov state models predicted that dATP increases the pool of myosin heads available for crossbridge cycling, increasing steady-state force development at low dATP fractions by 1.3 fold due to mechanosensing and nearest-neighbor cooperativity. This was found to be the dominant mechanism by which small amounts of dATP can improve contractile function at myofilament to organ scales. Together with faster myocyte Ca2+ handling, this led to improved ventricular contractility, especially in a failing heart model in which dATP increased ejection fraction by 16% and the energy efficiency of cardiac contraction by 1%. This work represents a complete multiscale model analysis of a small molecule myosin modulator from single molecule to organ system biophysics and elucidates how the molecular mechanisms of dATP may improve cardiovascular function in heart failure with reduced ejection fraction.


Subject(s)
Deoxyadenine Nucleotides , Heart Failure , Heart Failure/drug therapy , Heart Failure/physiopathology , Deoxyadenine Nucleotides/metabolism , Animals , Humans , Ventricular Function , Models, Cardiovascular , Myocardial Contraction/drug effects , Myosins/metabolism , Sarcomeres/metabolism , Actomyosin/metabolism , Myocytes, Cardiac/metabolism , Myocytes, Cardiac/drug effects , Calcium/metabolism , Markov Chains
15.
Genome Res ; 33(12): 2156-2173, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38097386

ABSTRACT

Single nucleotide polymorphisms (SNPs) from omics data create a reidentification risk for individuals and their relatives. Although the ability of thousands of SNPs (especially rare ones) to identify individuals has been repeatedly shown, the availability of small sets of noisy genotypes, from environmental DNA samples or functional genomics data, motivated us to quantify their informativeness. We present a computational tool suite, termed Privacy Leakage by Inference across Genotypic HMM Trajectories (PLIGHT), using population-genetics-based hidden Markov models (HMMs) of recombination and mutation to find piecewise alignment of small, noisy SNP sets to reference haplotype databases. We explore cases in which query individuals are either known to be in the database, or not, and consider several genotype queries, including those from environmental sample swabs from known individuals and from simulated "mosaics" (two-individual composites). Using PLIGHT on a database with ∼5000 haplotypes, we find for common, noise-free SNPs that only ten are sufficient to identify individuals, ∼20 can identify both components in two-individual mosaics, and 20-30 can identify first-order relatives. Using noisy environmental-sample-derived SNPs, PLIGHT identifies individuals in a database using ∼30 SNPs. Even when the individuals are not in the database, local genotype matches allow for some phenotypic information leakage based on coarse-grained SNP imputation. Finally, by quantifying privacy leakage from sparse SNP sets, PLIGHT helps determine the value of selectively sanitizing released SNPs without explicit assumptions about population membership or allele frequency. To make this practical, we provide a sanitization tool to remove the most identifying SNPs from genomic data.


Subject(s)
Genotype , Haplotypes , Polymorphism, Single Nucleotide , Humans , Databases, Genetic , Markov Chains , Software , Genetic Privacy , Algorithms , Sequence Alignment , Genetics, Population/methods
16.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38628114

ABSTRACT

Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolutions at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundance in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model, capturing gene expression abundance relationships in scRNA-seq data through two Markov processes: one introducing noise to transcriptomics data and the other denoising to recover them. The missing portion of ST is predicted by incorporating the original ST data into the denoising process. In our comprehensive performance evaluation across 16 datasets, utilizing multiple clustering and similarity metrics, stDiff stands out for its exceptional ability to preserve topological structures among cells, positioning itself as a robust solution for cell population identification. Moreover, stDiff's enhancement outcomes closely mirror the actual ST data within the batch space. Across diverse spatial expression patterns, our model accurately reconstructs them, delineating distinct spatial boundaries. This highlights stDiff's capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff, with its innovative approach, will contribute to advancing ST imputation methodologies.


Subject(s)
Benchmarking , Gene Expression Profiling , Cluster Analysis , Diffusion , Markov Chains , Sequence Analysis, RNA , Transcriptome
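
The noise-adding Markov process mentioned in this abstract can be sketched with the standard Gaussian forward process used by denoising diffusion models. This is an assumption about the general model family, not a reproduction of stDiff: the linear `betas` schedule, the function name `forward_noise`, and the toy expression vector are hypothetical, and the learned denoising network is omitted entirely.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t from x_0 in closed form for a Gaussian forward (noising) chain."""
    # Cumulative product of (1 - beta) gives the fraction of signal kept after t + 1 steps.
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # hypothetical linear noise schedule
x0 = np.array([2.0, 0.0, 1.5, 3.0])        # toy normalized expression vector
x_early, eps_early = forward_noise(x0, 10, betas, rng)
x_late, eps_late = forward_noise(x0, 999, betas, rng)
```

Early steps leave the expression vector almost intact, while the final step is close to pure Gaussian noise; a denoising model is trained to predict `eps` so that the chain can be run in reverse.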
17.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39133097

ABSTRACT

Constructing gene regulatory networks is a widely adopted approach for investigating gene regulation, offering diverse applications in biology and medicine. A great deal of research focuses on using time series data or single-cell RNA-sequencing data to infer gene regulatory networks. However, such gene expression data lack either cellular or temporal information. Fortunately, the advent of time-lapse confocal laser microscopy enables biologists to obtain tree-shaped gene expression data of Caenorhabditis elegans, achieving both cellular and temporal resolution. Although such tree-shaped data provide abundant knowledge, they pose challenges such as non-pairwise time series, which can make downstream analysis inaccurate. To address this issue, a comprehensive framework for data integration and a novel Bayesian approach based on a Boolean network with time delay are proposed. A pre-screening process and a Markov chain Monte Carlo algorithm are applied to obtain the parameter estimates. Simulation studies show that our method outperforms existing Boolean network inference algorithms. Leveraging the proposed approach, gene regulatory networks for five subtrees are reconstructed based on the real tree-shaped datasets of Caenorhabditis elegans, where some gene regulatory relationships confirmed in previous genetic studies are recovered. Also, heterogeneity of regulatory relationships in different cell lineage subtrees is detected. Furthermore, potential gene regulatory relationships that may be important in human diseases are explored. All source code is available at the GitHub repository https://github.com/edawu11/BBTD.git.


Subject(s)
Algorithms , Caenorhabditis elegans , Gene Regulatory Networks , Caenorhabditis elegans/genetics , Animals , Bayes Theorem , Computational Biology/methods , Markov Chains , Gene Expression Profiling/methods
18.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39003531

ABSTRACT

Profile hidden Markov models (pHMMs) are able to achieve high sensitivity in remote homology search, making them popular choices for detecting novel or highly diverged viruses in metagenomic data. However, many existing pHMM databases have different design focuses, making it difficult for users to decide the proper one to use. In this review, we provide a thorough evaluation and comparison for multiple commonly used profile HMM databases for viral sequence discovery in metagenomic data. We characterized the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. Subsequently, we assessed their performance in virus identification across multiple application scenarios, utilizing both simulated and real metagenomic data. We aim to offer researchers a thorough and critical assessment of the strengths and limitations of different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provided practical suggestions for users to optimize their use of pHMM databases, thus enhancing the quality and reliability of their findings in the field of viral metagenomics.


Subject(s)
Markov Chains , Metagenomics , Viruses , Metagenomics/methods , Viruses/genetics , Viruses/classification , Databases, Genetic , Humans , Computational Biology/methods , Algorithms
19.
Cell ; 146(4): 633-44, 2011 Aug 19.
Article in English | MEDLINE | ID: mdl-21854987

ABSTRACT

Cancer cells within individual tumors often exist in distinct phenotypic states that differ in functional attributes. While cancer cell populations typically display distinctive equilibria in the proportion of cells in various states, the mechanisms by which this occurs are poorly understood. Here, we study the dynamics of phenotypic proportions in human breast cancer cell lines. We show that subpopulations of cells purified for a given phenotypic state return towards equilibrium proportions over time. These observations can be explained by a Markov model in which cells transition stochastically between states. A prediction of this model is that, given certain conditions, any subpopulation of cells will return to equilibrium phenotypic proportions over time. A second prediction is that breast cancer stem-like cells arise de novo from non-stem-like cells. These findings contribute to our understanding of cancer heterogeneity and reveal how stochasticity in single-cell behaviors promotes phenotypic equilibrium in populations of cancer cells.


Subject(s)
Breast Neoplasms/pathology , Markov Chains , Animals , Female , Flow Cytometry , Gene Expression Profiling , Humans , Mice , Mice, Inbred NOD , Mice, SCID , Neoplasm Transplantation , Neoplastic Stem Cells/pathology , Stochastic Processes , Transplantation, Heterologous
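
The return to equilibrium proportions described in this abstract follows from basic Markov chain behavior, which the sketch below illustrates. The three unnamed states and every transition probability are hypothetical values chosen for illustration, not estimates from the study.

```python
import numpy as np

# Hypothetical per-step transition probabilities between three phenotypic states.
# Row i gives the probabilities that a cell in state i is in each state one step later.
P = np.array([
    [0.90, 0.05, 0.05],
    [0.02, 0.95, 0.03],
    [0.04, 0.06, 0.90],
])

def evolve(proportions, transition, steps):
    """Propagate state proportions through `steps` rounds of stochastic transitions."""
    return proportions @ np.linalg.matrix_power(transition, steps)

# Start from three "purified" subpopulations, each consisting of a single state.
finals = [evolve(np.eye(3)[i], P, 500) for i in range(3)]
for start, final in enumerate(finals):
    print(f"purified state {start} ->", np.round(final, 3))
```

Because every entry of this matrix is positive, the chain has a single stationary distribution, and each purified population converges to it regardless of its starting state. This is the "given certain conditions" caveat in the abstract expressed as a property of the transition matrix.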
20.
Proc Natl Acad Sci U S A ; 120(12): e2221048120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36920924

ABSTRACT

The ability to predict and understand complex molecular motions occurring over diverse timescales ranging from picoseconds to seconds and even hours in biological systems remains one of the largest challenges to chemical theory. Markov state models (MSMs), which provide a memoryless description of the transitions between different states of a biochemical system, have provided numerous important physically transparent insights into biological function. However, constructing these models often necessitates performing extremely long molecular simulations to converge the rates. Here, we show that by incorporating memory via the time-convolutionless generalized master equation (TCL-GME) one can build a theoretically transparent and physically intuitive memory-enriched model of biochemical processes with up to a three order of magnitude reduction in the simulation data required while also providing a higher temporal resolution. We derive the conditions under which the TCL-GME provides a more efficient means to capture slow dynamics than MSMs and rigorously prove when the two provide equally valid and efficient descriptions of the slow configurational dynamics. We further introduce a simple averaging procedure that enables our TCL-GME approach to quickly converge and accurately predict long-time dynamics even when parameterized with noisy reference data arising from short trajectories. We illustrate the advantages of the TCL-GME using alanine dipeptide, the human argonaute complex, and FiP35 WW domain.


Subject(s)
Dipeptides , Molecular Dynamics Simulation , Humans , Markov Chains
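
The memoryless Markov state model that this abstract takes as its baseline can be sketched as a count-and-normalize estimator on a discretized trajectory. The sketch is a simplified illustration: it does not enforce detailed balance, it does not implement the TCL-GME approach, and the two-state toy chain, lag time, and function names are hypothetical.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition counts at a given lag time (simple maximum-likelihood MSM)."""
    counts = np.zeros((n_states, n_states))
    # Count every observed transition from the state at time s to the state at time s + lag.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some state has no outgoing transitions at this lag")
    return counts / row_sums

def implied_timescales(transition, lag):
    """Relaxation timescales -lag / ln(lambda_i) from the non-unit eigenvalues."""
    eig = np.sort(np.abs(np.linalg.eigvals(transition)))[::-1]
    return -lag / np.log(eig[1:])

# Toy discrete trajectory sampled from a known two-state chain (hypothetical rates).
true_T = np.array([[0.95, 0.05], [0.10, 0.90]])
rng = np.random.default_rng(0)
uniforms = rng.random(20000)
dtraj = np.zeros(20000, dtype=int)
for step in range(1, len(dtraj)):
    dtraj[step] = int(uniforms[step] < true_T[dtraj[step - 1], 1])

T_est = estimate_msm(dtraj, n_states=2, lag=1)
timescales = implied_timescales(T_est, lag=1)
```

In practice the lag time must be long enough for the memoryless assumption to hold, which is one reason MSMs can require long simulations; the abstract's generalized master equation approach is motivated by relaxing that requirement.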