
1.

Definition of metafounders based on population structure analysis.

Anglhuber, Christine; Edel, Christian; Pimentel, Eduardo C G; Emmerling, Reiner; Götz, Kay-Uwe; Thaller, Georg.

Genet Sel Evol ; 56(1): 43, 2024 Jun 06.

Article En | MEDLINE | ID: mdl-38844876

BACKGROUND: Limitations of the concept of identity by descent in the presence of stratification within a breeding population may lead to an incomplete formulation of the conventional numerator relationship matrix (A). Combining A with the genomic relationship matrix (G) in a single-step approach for genetic evaluation may cause inconsistencies that can be a source of bias in the resulting predictions. The objective of this study was to identify stratification using genomic data and to transfer this information to matrix A, to improve the compatibility of A and G. METHODS: Using software to detect population stratification (ADMIXTURE), we developed an iterative approach. First, we identified 2 to 40 strata (k) with ADMIXTURE, which we then introduced in a stepwise manner into matrix A, to generate matrix A^Γ using the metafounder methodology. Improvements in consistency between matrix G and A^Γ were evaluated by regression analysis and through the comparison of the overall mean and mean diagonal values of both matrices. The approach was tested on genotype and pedigree information of European and North American Brown Swiss animals (85,249). Analyses with ADMIXTURE were initially performed on the full set of genotypes (S1). In addition, we used an alternative dataset where we avoided sampling of closely related animals (S2). RESULTS: Results of the regression analyses of standard A on G were -0.489, 0.780 and 0.647 for intercept, slope and fit of the regression. When analysing S1 data, the corresponding values for the regression of A^Γ on G were -0.028, 1.087 and 0.807 for k = 7, while there was no clear optimum k. Analyses of S2 gave a clear optimal k = 24, with -0.020, 0.998 and 0.817 as results of the regression. For this k, differences in mean and mean diagonal values between both matrices were negligible.
CONCLUSIONS: The derivation of hidden stratification information based on genotyped animals and its integration into A improved compatibility of the resulting A^Γ and G considerably compared to the initial situation. In dairy breeding populations with large half-sib families as sub-structures it is necessary to balance the data when applying population structure analysis to obtain meaningful results.
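
The consistency check described in this abstract (regressing elements of a pedigree-based matrix on the matching elements of G, and comparing overall means and mean diagonals) can be sketched as follows. This is only an illustration under stated assumptions: the function name `compare_relationship_matrices`, the use of the upper triangle so that each pair of animals is counted once, and the reading of "fit of the regression" as R² are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def compare_relationship_matrices(A, G):
    """Regress elements of A on the matching elements of G (hypothetical helper).

    A and G are square relationship matrices for the same animals in the same order.
    """
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    # Assumption: use each pair once (upper triangle, diagonal included).
    iu = np.triu_indices_from(A)
    a, g = A[iu], G[iu]
    # Simple linear regression a = intercept + slope * g.
    slope, intercept = np.polyfit(g, a, 1)
    # Assumption: "fit" is read as the squared correlation (R^2).
    r2 = np.corrcoef(g, a)[0, 1] ** 2
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "mean_diff": A.mean() - G.mean(),
        "mean_diag_diff": np.diag(A).mean() - np.diag(G).mean(),
    }
```

Under this reading, two fully compatible matrices would give an intercept near 0, a slope near 1, and mean differences near 0.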

Genetics, Population , Models, Genetic , Pedigree , Animals , Genetics, Population/methods , Cattle/genetics , Breeding/methods , Genotype , Software , Male

2.

The speed of neutral evolution on graphs.

Gao, Shun; Liu, Yuan; Wu, Bin.

J R Soc Interface ; 21(215): 20230594, 2024 Jun.

Article En | MEDLINE | ID: mdl-38835245

The speed of evolution in structured populations is crucial for biological and social systems. The likelihood of invasion is key for evolutionary stability, but it means little if the invasion takes a long time. It remains largely unknown which population structures slow down evolution. We investigate the absorption time of a single neutral mutant for all the 112 non-isomorphic undirected graphs of size 6. We find that about three-quarters of the graphs have an absorption time close to that of the complete graph, less than one-third are accelerators, and more than two-thirds are decelerators. Surprisingly, determining whether a graph has a long absorption time is too complicated to be captured by the joint degree distribution. Via the largest sojourn time, we find that echo-chamber-like graphs, which consist of two homogeneous graphs connected by few sparse links, are likely to slow down absorption. These results are robust for large graphs, mutation patterns as well as evolutionary processes. This work serves as a benchmark for timing evolution with complex interactions, and fosters the understanding of polarization in opinion formation.
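
For graphs as small as those in this abstract, the absorption time of a single neutral mutant can be computed exactly by solving a linear system over all mutant configurations. The sketch below assumes a birth-death update (a uniformly chosen node reproduces and its offspring replaces a uniformly chosen neighbour); the abstract does not state the update rule, so this choice and the names `mean_absorption_time` and `complete_graph` are assumptions for illustration only.

```python
import numpy as np

def complete_graph(n):
    """Adjacency matrix of the complete graph on n nodes."""
    return np.ones((n, n), dtype=int) - np.eye(n, dtype=int)

def mean_absorption_time(adj):
    """Expected steps until fixation or extinction of one neutral mutant.

    Assumes a connected undirected graph and a neutral birth-death update.
    States are bitmasks of mutant-occupied nodes; the result is averaged
    over the initial position of the single mutant.
    """
    adj = np.asarray(adj)
    n = adj.shape[0]
    n_states = 1 << n
    full = n_states - 1
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        for i in range(n):                      # node chosen to reproduce
            nbrs = np.flatnonzero(adj[i])
            for j in nbrs:                      # neighbour that is replaced
                prob = 1.0 / (n * len(nbrs))
                if (s >> i) & 1:
                    t = s | (1 << j)            # mutant offspring
                else:
                    t = s & ~(1 << j)           # wild-type offspring
                P[s, t] += prob
    # Transient states: everything except all-wild-type (0) and all-mutant (full).
    transient = [s for s in range(n_states) if s not in (0, full)]
    Q = P[np.ix_(transient, transient)]
    # Expected absorption times solve (I - Q) t = 1.
    t = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
    times = dict(zip(transient, t))
    return float(np.mean([times[1 << i] for i in range(n)]))
```

Calling `mean_absorption_time(complete_graph(6))` gives a complete-graph baseline against which any other connected 6-node adjacency matrix could be compared under the same assumed update rule.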

Biological Evolution , Mutation , Models, Genetic

3.

Dynamics of karyotype evolution.

Kuzmin, Elena; Baker, Toby M; Van Loo, Peter; Glass, Leon.

Chaos ; 34(5), 2024 May 01.

Article En | MEDLINE | ID: mdl-38717409

In the evolution of species, the karyotype changes with a timescale of tens to hundreds of thousands of years. In the development of cancer, the karyotype is often modified in cancerous cells over the lifetime of an individual. Characterizing these changes and understanding the mechanisms leading to them has been of interest in a broad range of disciplines including evolution, cytogenetics, and cancer genetics. A central issue relates to the relative roles of random vs deterministic mechanisms in shaping the changes. Although it is possible that all changes result from random events followed by selection, many results point to other non-random factors that play a role in karyotype evolution. In cancer, chromosomal instability leads to characteristic changes in the karyotype, in which different individuals with a specific type of cancer display similar changes in karyotype structure over time. Statistical analyses of chromosome lengths in different species indicate that the length distribution of chromosomes is not consistent with models in which the lengths of chromosomes are random or evolve solely by simple random processes. A better understanding of the mechanisms underlying karyotype evolution should enable the development of quantitative theoretical models that combine the random and deterministic processes and that can be compared to experimental determinations of the karyotype in diverse settings.

Karyotype , Humans , Animals , Evolution, Molecular , Models, Genetic , Neoplasms/genetics , Biological Evolution

4.

An interpretable model of pre-mRNA splicing for animal and plant genes.

McCue, Kayla; Burge, Christopher B.

Sci Adv ; 10(19): eadn1547, 2024 May 10.

Article En | MEDLINE | ID: mdl-38718117

Pre-mRNA splicing is a fundamental step in gene expression, conserved across eukaryotes, in which the spliceosome recognizes motifs at the 3' and 5' splice sites (SSs), excises introns, and ligates exons. SS recognition and pairing is often influenced by protein splicing factors (SFs) that bind to splicing regulatory elements (SREs). Here, we describe SMsplice, a fully interpretable model of pre-mRNA splicing that combines models of core SS motifs, SREs, and exonic and intronic length preferences. We learn models that predict SS locations with 83 to 86% accuracy in fish, insects, and plants and about 70% in mammals. Learned SRE motifs include both known SF binding motifs and unfamiliar motifs, and both motif classes are supported by genetic analyses. Our comparisons across species highlight similarities between non-mammals, increased reliance on intronic SREs in plant splicing, and a greater reliance on SREs in mammalian splicing.

Exons , Introns , RNA Precursors , RNA Splice Sites , RNA Splicing , RNA Precursors/genetics , RNA Precursors/metabolism , Animals , Introns/genetics , Exons/genetics , Genes, Plant , Models, Genetic , Spliceosomes/metabolism , Spliceosomes/genetics , Plants/genetics , Humans , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism

5.

Equivalence of variance components between standard and recursive genetic models using LDL' transformations.

Varona, Luis; López-Carbonell, David; Srihi, Houssemeddine; Hervás-Rivero, Carlos; González-Recio, Óscar; Altarriba, Juan.

Genet Sel Evol ; 56(1): 33, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698321

BACKGROUND: Recursive models are a category of structural equation models that propose a causal relationship between traits. These models are more parameterized than multiple trait models, and they require imposing restrictions on the parameter space to ensure statistical identification. Nevertheless, in certain situations, the likelihoods of recursive models and multiple trait models are equivalent. Consequently, the estimates of variance components derived from the multiple trait mixed model can be converted into estimates under several recursive models through LDL' or block-LDL' transformations. RESULTS: The procedure was employed on a dataset comprising five traits (birth weight, BW; weight at 90 days, W90; weight at 210 days, W210; cold carcass weight, CCW; and conformation, CON) from the Pirenaica beef cattle breed. These phenotypic records were unequally distributed among 149,029 individuals and had a high percentage of missing data. The pedigree used consisted of 343,753 individuals. A Bayesian approach involving a multiple-trait mixed model was applied using a Gibbs sampler. The variance components obtained at each iteration of the Gibbs sampler were subsequently used to estimate the variance components within three distinct recursive models. CONCLUSIONS: The LDL' or block-LDL' transformations applied to the variance component estimates achieved from a multiple trait mixed model enabled inference across multiple sets of recursive models, with the sole prerequisite of being likelihood equivalent. Furthermore, the aforementioned transformations simplify the handling of missing data when conducting inference within the realm of recursive models.
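
A minimal sketch of the LDL' idea mentioned in this abstract, assuming a fully recursive model in which each trait depends only on the traits ordered before it. Under that assumption a covariance matrix G factors as G = L D L' with L unit lower triangular and D diagonal, and the matrix of recursive coefficients can be read off as Λ = I − L⁻¹. The function name `ldl_recursive` and the trait ordering are illustrative choices, not the authors' code, and the block-LDL' variant is not covered.

```python
import numpy as np

def ldl_recursive(G):
    """LDL' factorisation of a positive-definite covariance matrix (hypothetical helper).

    Returns L (unit lower triangular), D (diagonal) and Lambda = I - inv(L),
    the structural coefficients of a fully recursive model under the
    assumed trait ordering.
    """
    G = np.asarray(G, dtype=float)
    C = np.linalg.cholesky(G)        # G = C C', C lower triangular
    d = np.diag(C)
    L = C / d                        # scale column j by 1 / C[j, j]
    D = np.diag(d ** 2)              # variances under the recursive model
    Lambda = np.eye(G.shape[0]) - np.linalg.inv(L)
    return L, D, Lambda
```

Applied to each sampled covariance matrix from a multiple-trait analysis, a transformation of this kind would yield a corresponding draw of the recursive-model parameters, which is the spirit of the approach the abstract describes.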

Models, Genetic , Animals , Cattle/genetics , Bayes Theorem , Phenotype , Breeding/methods , Breeding/standards , Birth Weight/genetics , Pedigree , Quantitative Trait, Heritable

6.

Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation-maximization maximum likelihood and increase of relationships.

Legarra, Andres; Bermann, Matias; Mei, Quanshun; Christensen, Ole F.

Genet Sel Evol ; 56(1): 35, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698347

BACKGROUND: The theory of "metafounders" proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups), and base populations across breeds (crosses) together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate relationships across metafounders Γ are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth). METHODS: We derive likelihood methods to estimate Γ. For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation with the real root being the maximum likelihood (ML) estimate of Γ. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood in a term related to Γ, and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of Γ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of Γ using estimates of the rate of increase of inbreeding (ΔF), resulting in an expanded Γ and in a pseudo-EM+ΔF algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions; and in two breeds, with 10 groups per year of birth within breed. We simulate genotyping in all generations or in the last ones. RESULTS: For a single metafounder, the ML estimates of the Lacaune data corresponded to the maximum. For simulated data, when genotypes were spread across all generations, both GLS and pseudo-EM(+ΔF) methods were accurate.
With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+ΔF) approach yielded more accurate and unbiased estimates. CONCLUSIONS: We derived ML, pseudo-EM and pseudo-EM+ΔF methods to estimate Γ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.

Breeding , Models, Genetic , Pedigree , Animals , Likelihood Functions , Breeding/methods , Algorithms , Sheep/genetics , Genomics/methods , Computer Simulation , Male , Female , Genotype

7.

Redefining and interpreting genomic relationships of metafounders.

Legarra, Andres; Bermann, Matias; Mei, Quanshun; Christensen, Ole F.

Genet Sel Evol ; 56(1): 34, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698373

Metafounders are a useful concept to characterize relationships within and across populations, and to help genetic evaluations because they help modelling the means and variances of unknown base population animals. Current definitions of metafounder relationships are sensitive to the choice of reference alleles and have not been compared to their counterparts in population genetics, namely heterozygosities, FST coefficients, and genetic distances. We redefine the relationships across populations with an arbitrary base of a maximum heterozygosity population in Hardy-Weinberg equilibrium. Then, the relationship between or within populations is a cross-product of the form $\Gamma_{b,b'} = \frac{2}{n}(2p_b - 1)(2p_{b'} - 1)'$ with p being vectors of allele frequencies at n markers in populations b and b'. This is simply the genomic relationship of two pseudo-individuals whose genotypes are equal to twice the allele frequencies. We also show that this coding is invariant to the choice of reference alleles. In addition, standard population genetics metrics (inbreeding coefficients of various forms; FST differentiation coefficients; segregation variance; and Nei's genetic distance) can be obtained from elements of matrix Γ.
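
The cross-product in this abstract can be illustrated with a short NumPy sketch. It assumes allele frequencies are stored as a populations-by-markers array, one row per population; the function name `gamma_matrix` and the toy frequencies are hypothetical.

```python
import numpy as np

def gamma_matrix(P):
    """Gamma[b, b'] = (2 / n) * (2 p_b - 1)(2 p_b' - 1)' for all population pairs.

    P has one row of allele frequencies per population and one column per marker.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[1]                 # number of markers
    Z = 2.0 * P - 1.0              # pseudo-genotypes (2p) centred at 1
    return (2.0 / n) * Z @ Z.T

# Toy example with two populations and four markers (made-up frequencies).
P_toy = np.array([[0.5, 0.5, 0.5, 0.5],    # maximum-heterozygosity base
                  [0.9, 0.2, 0.6, 0.4]])
Gamma_toy = gamma_matrix(P_toy)
```

In this toy case the row and column of the first population are zero, matching the abstract's choice of a maximum-heterozygosity base. Recoding the reference allele at any marker (replacing p by 1 − p in every population) flips the sign of one column of the centred matrix and so leaves Γ unchanged.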

Gene Frequency , Genetics, Population , Models, Genetic , Animals , Genetics, Population/methods , Heterozygote , Alleles , Genomics/methods , Genotype , Genome

8.

How mutation accumulation depends on the structure of the cell lineage tree.

Derényi, Imre; Demeter, Márton C; Pérez-Jiménez, Mario; Grajzel, Dániel; Szöllosi, Gergely J.

Phys Rev E ; 109(4-1): 044407, 2024 Apr.

Article En | MEDLINE | ID: mdl-38755817

All the cells of a multicellular organism are the product of cell divisions that trace out a single binary tree, the so-called cell lineage tree. Because cell divisions are accompanied by replication errors, the shape of the cell lineage tree is a key determinant of how somatic evolution, which can potentially lead to cancer, proceeds. Carcinogenesis requires the accumulation of a certain number of driver mutations. By mapping the accumulation of mutations into a graph theoretical problem, we present an exact numerical method to calculate the probability of collecting a given number of mutations and show that for low mutation rates it can be approximated with a simple analytical formula, which depends only on the distribution of the lineage lengths, and is dominated by the longest lineages. Our results are crucial in understanding how natural selection can shape the cell lineage trees of multicellular organisms and curtail somatic evolution.

Cell Lineage , Models, Genetic , Mutation Accumulation , Mutation

9.

Flexible model-based non-negative matrix factorization with application to mutational signatures.

Laursen, Ragnhild; Maretty, Lasse; Hobolth, Asger.

Stat Appl Genet Mol Biol ; 23(1), 2024 Jan 01.

Article En | MEDLINE | ID: mdl-38753402

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.

Algorithms , Mutation , Neoplasms , Humans , Neoplasms/genetics , Models, Genetic , Computer Simulation , Models, Statistical

10.

Accuracy of genomic prediction using multiple Atlantic salmon populations.

Ajasa, Afees A; Boison, Solomon A; Gjøen, Hans M; Lillehammer, Marie.

Genet Sel Evol ; 56(1): 38, 2024 May 15.

Article En | MEDLINE | ID: mdl-38750427

BACKGROUND: The accuracy of genomic prediction is partly determined by the size of the reference population. In Atlantic salmon breeding programs, four parallel populations often exist, thus offering the opportunity to increase the size of the reference set by combining these populations. By allowing a reduction in the number of records per population, multi-population prediction can potentially reduce cost and welfare issues related to the recording of traits, particularly for diseases. In this study, we evaluated the accuracy of multi- and across-population prediction of breeding values for resistance to amoebic gill disease (AGD) using all single nucleotide polymorphisms (SNPs) on a 55K chip or a selected subset of SNPs based on the signs of allele substitution effect estimates across populations, using both linear and nonlinear genomic prediction (GP) models in Atlantic salmon populations. In addition, we investigated genetic distance, genetic correlation estimated based on genomic relationships, and persistency of linkage disequilibrium (LD) phase across these populations. RESULTS: The genetic distance between populations ranged from 0.03 to 0.07, while the genetic correlation ranged from 0.19 to 0.99. Nonetheless, compared to within-population prediction, there was limited or no impact of combining populations for multi-population prediction across the various models used or when using the selected subset of SNPs. The estimates of across-population prediction accuracy were low and to some extent proportional to the genetic correlation estimates. The persistency of LD phase between adjacent markers across populations using all SNP data ranged from 0.51 to 0.65, indicating that LD is poorly conserved across the studied populations. CONCLUSIONS: Our results show that a high genetic correlation and a high genetic relationship between populations do not guarantee a higher prediction accuracy from multi-population genomic prediction in Atlantic salmon.

Linkage Disequilibrium , Polymorphism, Single Nucleotide , Salmo salar , Animals , Salmo salar/genetics , Genomics/methods , Fish Diseases/genetics , Genetics, Population/methods , Models, Genetic , Breeding/methods , Genome , Disease Resistance/genetics

11.

Hierarchical modelling of variance components makes analysis of resolvable incomplete block designs more efficient.

Studnicki, Marcin; Piepho, Hans Peter.

Theor Appl Genet ; 137(6): 134, 2024 May 16.

Article En | MEDLINE | ID: mdl-38753078

The standard approach to variance component estimation in linear mixed models for alpha designs is the residual maximum likelihood (REML) method. One drawback of the REML method in the context of incomplete block designs is that the block variance may be estimated as zero, which can compromise the recovery of inter-block information and hence reduce the accuracy of treatment effects estimation. Due to the development of statistical and computational methods, there is an increasing interest in adopting hierarchical approaches to analysis. In order to increase the precision of the analysis of individual trials laid out as alpha designs, we here make a proposal to create an objectively informed prior distribution for variance components for replicates, blocks and plots, based on the results of previous (historical) trials. We propose different modelling approaches for the prior distributions and evaluate the effectiveness of the hierarchical approach compared to the REML method, which is classically used for analysing individual trials in two-stage approaches for multi-environment trials.

Models, Genetic , Likelihood Functions , Linear Models , Computer Simulation , Models, Statistical

12.

Robust genetic codes enhance protein evolvability.

Rozhonová, Hana; Martí-Gómez, Carlos; McCandlish, David M; Payne, Joshua L.

PLoS Biol ; 22(5): e3002594, 2024 May.

Article En | MEDLINE | ID: mdl-38754362

The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability, the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.

Evolution, Molecular , Genetic Code , Proteins , Proteins/genetics , Proteins/metabolism , Mutation/genetics , Codon/genetics , Models, Genetic , Synthetic Biology/methods , Protein Biosynthesis , Protein Engineering/methods

13.

Generalizability of polygenic prediction models: how is the R² defined on test data?

Staerk, Christian; Klinkhammer, Hannah; Wistuba, Tobias; Maj, Carlo; Mayr, Andreas.

BMC Med Genomics ; 17(1): 132, 2024 May 16.

Article En | MEDLINE | ID: mdl-38755654

BACKGROUND: Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the R² is a commonly used measure to evaluate prediction accuracy. While the R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates interpretation and comparison of results. METHODS: Based on large-scale genotype data from the UK Biobank, we compare three definitions of the R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived based on training data with European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. RESULTS: Our analysis shows that the choice of the R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes solely addresses the discriminative performance and always yields values between 0 and 1, definitions of the R² based on the mean squared prediction error (MSPE) with reference to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values indicating miscalibrated predictions for out-of-target populations. We argue that the choice of the most appropriate definition depends on the aim of PRS analysis: whether it primarily serves for risk stratification or also for individual phenotype prediction.
Moreover, both correlation-based and MSPE-based definitions of R² can provide valuable complementary information. CONCLUSIONS: Awareness of the different definitions of the R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. It is recommended to explicitly state which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
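
The contrast between correlation-based and MSPE-based definitions in this abstract can be made concrete with a small sketch. The function names and the toy data are hypothetical. Letting the intercept-only reference use either the test-set mean or the training-set mean is an assumption about how the MSPE-based variants differ, since the abstract does not spell this out.

```python
import numpy as np

def r2_squared_correlation(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

def r2_mspe(y, y_hat, reference_mean):
    """1 - MSPE(model) / MSPE(intercept-only model predicting reference_mean)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mspe_model = np.mean((y - y_hat) ** 2)
    mspe_reference = np.mean((y - reference_mean) ** 2)
    return float(1.0 - mspe_model / mspe_reference)

# Toy test set: predictions rank individuals perfectly but are shifted by +10.
y_test = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = y_test + 10.0

r2_corr = r2_squared_correlation(y_test, y_pred)        # discrimination only
r2_test_ref = r2_mspe(y_test, y_pred, y_test.mean())    # also penalises miscalibration
```

In this toy case the squared correlation is 1 while the MSPE-based value is strongly negative, which is the kind of divergence between discrimination and calibration that the abstract describes.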

Models, Genetic , Multifactorial Inheritance , Humans , Phenotype , Genetic Predisposition to Disease

14.

Genetic code robustness and protein evolvability are correlated and protein-specific.

Metzger, Brian P H.

PLoS Biol ; 22(5): e3002627, 2024 May.

Article En | MEDLINE | ID: mdl-38758732

The relationship between genetic code robustness and protein evolvability is unknown. A new study in PLOS Biology using in silico rewiring of genetic codes and functional protein data identified a positive correlation between code robustness and protein evolvability that is protein-specific.

Evolution, Molecular , Genetic Code , Proteins , Proteins/genetics , Proteins/metabolism , Models, Genetic

15.

Improving selection decisions with mating information by accounting for Mendelian sampling variances looking two generations ahead.

Niehoff, Tobias A M; Ten Napel, Jan; Bijma, Piter; Pook, Torsten; Wientjes, Yvonne C J; Hegedus, Bernadett; Calus, Mario P L.

Genet Sel Evol ; 56(1): 41, 2024 May 21.

Article En | MEDLINE | ID: mdl-38773363

BACKGROUND: Breeding programs are judged by the genetic level of animals that are used to disseminate genetic progress. These animals are typically the best ones of the population. To maximise the genetic level of very good animals in the next generation, parents that are more likely to produce top performing offspring need to be selected. The ability of individuals to produce high-performing progeny differs because of differences in their breeding values and gametic variances. Differences in gametic variances among individuals are caused by differences in heterozygosity and linkage. The use of the gametic Mendelian sampling variance has been proposed before, for use in the usefulness criterion or Index5, and in this work, we extend existing approaches by not only considering the gametic Mendelian sampling variance of individuals, but also of their potential offspring. Thus, the criteria developed in this study plan one additional generation ahead. For simplicity, we assumed that the true quantitative trait loci (QTL) effects, genetic map and the haplotypes of all animals are known. RESULTS: In this study, we propose a new selection criterion, ExpBVSelGrOff, which describes the genetic level of selected grand-offspring that are produced by selected offspring of a particular mating. We compare our criterion with other published criteria in a stochastic simulation of an ongoing breeding program for 21 generations for proof of concept. ExpBVSelGrOff performed better than all other tested criteria, like the usefulness criterion or Index5 which have been proposed in the literature, without compromising short-term gains. After only five generations, when selection is strong (1%), selection based on ExpBVSelGrOff achieved 5.8% more commercial genetic gain and retained 25% more genetic variance without compromising inbreeding rate compared to selection based only on breeding values. 
CONCLUSIONS: Our proposed selection criterion offers a new tool to accelerate genetic progress for contemporary genomic breeding programs. It retains more genetic variance than previously published criteria that plan less far ahead. Considering future gametic Mendelian sampling variances in the selection process also seems promising for maintaining more genetic variance.

Models, Genetic , Quantitative Trait Loci , Selection, Genetic , Animals , Breeding/methods , Female , Male , Selective Breeding

16.

Wright's Hierarchical F-Statistics.

Uyenoyama, Marcy K.

Mol Biol Evol ; 41(5), 2024 May 03.

Article En | MEDLINE | ID: mdl-38696269

This perspective article offers a meditation on FST and other quantities developed by Sewall Wright to describe the population structure, defined as any departure from reproduction through random union of gametes. Concepts related to the F-statistics draw from studies of the partitioning of variation, identity coefficients, and diversity measures. Relationships between the first two approaches have recently been clarified and unified. This essay addresses the third pillar of the discussion: Nei's GST and related measures. A hierarchy of probabilities of identity-by-state provides a description of the relationships among levels of a structured population with respect to genetic diversity. Explicit expressions for the identity-by-state probabilities are determined for models of structured populations undergoing regular inbreeding and recurrent mutation. Levels of genetic diversity within and between subpopulations reflect mutation as well as migration. Accordingly, indices of the population structure are inherently locus-specific, contrary to the intentions of Wright. Some implications of this locus-specificity are explored.

Genetic Variation , Genetics, Population , Models, Genetic , Genetics, Population/methods , Mutation , Inbreeding

17.

What can we learn when fitting a simple telegraph model to a complex gene expression model?

Jiao, Feng; Li, Jing; Liu, Ting; Zhu, Yifeng; Che, Wenhao; Bleris, Leonidas; Jia, Chen.

PLoS Comput Biol ; 20(5): e1012118, 2024 May.

Article En | MEDLINE | ID: mdl-38743803

In experiments, the distributions of mRNA or protein numbers in single cells are often fitted to the random telegraph model which includes synthesis and decay of mRNA or protein, and switching of the gene between active and inactive states. While commonly used, this model does not describe how fluctuations are influenced by crucial biological mechanisms such as feedback regulation, non-exponential gene inactivation durations, and multiple gene activation pathways. Here we investigate the dynamical properties of four relatively complex gene expression models by fitting their steady-state mRNA or protein number distributions to the simple telegraph model. We show that despite the underlying complex biological mechanisms, the telegraph model with three effective parameters can accurately capture the steady-state gene product distributions, as well as the conditional distributions in the active gene state, of the complex models. Some effective parameters are reliable and can reflect realistic dynamic behaviors of the complex models, while others may deviate significantly from their real values in the complex models. The effective parameters can also be applied to characterize the capability for a complex model to exhibit multimodality. Using additional information such as single-cell data at multiple time points, we provide an effective method of distinguishing the complex models from the telegraph model. Furthermore, using measurements under varying experimental conditions, we show that fitting the mRNA or protein number distributions to the telegraph model may even reveal the underlying gene regulation mechanisms of the complex models. The effectiveness of these methods is confirmed by analysis of single-cell data for E. coli and mammalian cells. All these results are robust with respect to cooperative transcriptional regulation and extrinsic noise. In particular, we find that faster relaxation speed to the steady state results in more precise parameter inference under large extrinsic noise.
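The telegraph model described in this abstract (gene switching between active and inactive states, synthesis in the active state, first-order decay) can be illustrated with a minimal Gillespie simulation. This is only a sketch under stated assumptions: the function name and the parameter names `k_on`, `k_off`, `rho` and `d` are hypothetical labels chosen here, not the authors' notation, and the code is not their fitting method. It returns the time-averaged copy number, which for this model should approach the known steady-state mean rho/d × k_on/(k_on + k_off).

```python
import random


def simulate_telegraph(k_on, k_off, rho, d, t_end, seed=0):
    """Gillespie simulation of the telegraph model (illustrative sketch).

    k_on / k_off: gene activation / inactivation rates (assumed names)
    rho: synthesis rate while the gene is active
    d: per-molecule decay rate
    Returns the time-averaged molecule count over [0, t_end].
    """
    rng = random.Random(seed)
    t, gene_on, n = 0.0, False, 0
    weighted_sum = 0.0  # integral of n(t) dt
    while True:
        # Propensities of the four reactions in the current state.
        a_on = 0.0 if gene_on else k_on
        a_off = k_off if gene_on else 0.0
        a_syn = rho if gene_on else 0.0
        a_deg = d * n
        total = a_on + a_off + a_syn + a_deg
        dt = rng.expovariate(total)  # waiting time to the next reaction
        if t + dt >= t_end:
            weighted_sum += n * (t_end - t)
            break
        weighted_sum += n * dt
        t += dt
        # Choose which reaction fires, proportionally to its propensity.
        r = rng.random() * total
        if r < a_on:
            gene_on = True
        elif r < a_on + a_off:
            gene_on = False
        elif r < a_on + a_off + a_syn:
            n += 1
        else:
            n -= 1
    return weighted_sum / t_end


def telegraph_mean(k_on, k_off, rho, d):
    """Analytic steady-state mean of the telegraph model."""
    return (rho / d) * k_on / (k_on + k_off)


if __name__ == "__main__":
    sim = simulate_telegraph(1.0, 2.0, 30.0, 1.0, t_end=5000.0)
    print(sim, telegraph_mean(1.0, 2.0, 30.0, 1.0))
```

Comparing the simulated average with `telegraph_mean` is a quick sanity check; the full copy-number distribution, which is what is fitted in practice, would require recording a histogram of `n` weighted by `dt`.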

Models, Genetic , Computational Biology/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism , Gene Expression Regulation/genetics , Humans , Single-Cell Analysis/methods , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression/genetics , Computer Simulation

18.

Unpredictability of the Fitness Effects of Antimicrobial Resistance Mutations Across Environments in Escherichia coli.

Hinz, Aaron; Amado, André; Kassen, Rees; Bank, Claudia; Wong, Alex.

Mol Biol Evol ; 41(5), 2024 May 03.

Article En | MEDLINE | ID: mdl-38709811

The evolution of antimicrobial resistance (AMR) in bacteria is a major public health concern, and antibiotic restriction is often implemented to reduce the spread of resistance. These measures rely on the existence of deleterious fitness effects (i.e. costs) imposed by AMR mutations during growth in the absence of antibiotics. According to this assumption, resistant strains will be outcompeted by susceptible strains that do not pay the cost during the period of restriction. The fitness effects of AMR mutations are generally studied in laboratory reference strains grown in standard growth environments; however, the genetic and environmental context can influence the magnitude and direction of a mutation's fitness effects. In this study, we measure how three sources of variation impact the fitness effects of Escherichia coli AMR mutations: the type of resistance mutation, the genetic background of the host, and the growth environment. We demonstrate that while AMR mutations are generally costly in antibiotic-free environments, their fitness effects vary widely and depend on complex interactions between the mutation, genetic background, and environment. We test the ability of the Rough Mount Fuji fitness landscape model to reproduce the empirical data in simulation. We identify model parameters that reasonably capture the variation in fitness effects due to genetic variation. However, the model fails to accommodate the observed variation when considering multiple growth environments. Overall, this study reveals a wealth of variation in the fitness effects of resistance mutations owing to genetic background and environmental conditions, which will ultimately impact their persistence in natural populations.
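For readers unfamiliar with the Rough Mount Fuji model mentioned above, one common formulation assigns each genotype a fitness equal to an additive term, which declines linearly with Hamming distance from a reference genotype, plus an independent random term. The sketch below follows that textbook formulation; the function name, the parameters `slope` and `noise_sd`, and the choice of Gaussian noise are assumptions made for illustration and may differ from the parameterisation used in the study.

```python
import itertools
import random


def rough_mount_fuji(n_loci, slope, noise_sd, seed=0):
    """Build a Rough Mount Fuji landscape over binary genotypes (sketch).

    Fitness = -slope * (Hamming distance to the all-zero reference)
              + Gaussian noise with standard deviation noise_sd.
    All names and the Gaussian noise are illustrative assumptions.
    """
    rng = random.Random(seed)
    reference = (0,) * n_loci
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=n_loci):
        distance = sum(a != b for a, b in zip(genotype, reference))
        landscape[genotype] = -slope * distance + rng.gauss(0.0, noise_sd)
    return landscape


if __name__ == "__main__":
    # noise_sd = 0 gives a smooth, purely additive landscape;
    # larger noise_sd relative to slope gives a more rugged one.
    for genotype, fitness in sorted(rough_mount_fuji(3, 0.1, 0.05).items()):
        print(genotype, round(fitness, 3))
```

The ratio of `noise_sd` to `slope` controls ruggedness, which is the kind of parameter one would tune when trying to reproduce observed variation in fitness effects.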

Drug Resistance, Bacterial , Escherichia coli , Genetic Fitness , Mutation , Escherichia coli/genetics , Escherichia coli/drug effects , Drug Resistance, Bacterial/genetics , Anti-Bacterial Agents/pharmacology , Models, Genetic , Environment

19.

Evolving Improved Sampling Protocols for Dose-Response Modelling Using Genetic Algorithms with a Profile-Likelihood Metric.

Lam, Nicholas N; Murray, Rua; Docherty, Paul D.

Bull Math Biol ; 86(6): 70, 2024 May 08.

Article En | MEDLINE | ID: mdl-38717656

Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample numbers (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrates the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow experimenters to either improve parameter certainty given a fixed number of samples, or reduce sample quantity while retaining the same level of parameter certainty.

Algorithms , Computer Simulation , Mathematical Concepts , Models, Biological , Monte Carlo Method , Likelihood Functions , Humans , Dose-Response Relationship, Drug , Research Design/statistics & numerical data , Models, Genetic , Uncertainty

20.

Exploring the effects of weighting against homoplasy in genealogies of palaeontological phylogenetic matrices.

Ezcurra, Martín D.

Cladistics ; 40(3): 242-281, 2024 Jun.

Article En | MEDLINE | ID: mdl-38728134

Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because it is assumed that it is the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson-Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Implied weighting and EIW do not outperform each other unambiguously. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. This could be employed to inform about the range of k-values to be used in phylogenetic analyses based on matrix size but with the caveat that this emergent relationship still relies on a low sample size of genealogies.
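The role of the concavity constant k can be made concrete with the standard implied-weighting cost, in which a character with h extra (homoplastic) steps contributes h / (h + k) to the tree score instead of h. The sketch below assumes that standard formulation; the function names are hypothetical, and extended implied weighting, which additionally adjusts for missing data, is not covered.

```python
def implied_weight_cost(extra_steps, k):
    """Cost of one character under implied weighting: h / (h + k).

    extra_steps: homoplastic steps h beyond the character's minimum
    k: concavity constant (the abstract explores values from 3 to 30)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return extra_steps / (extra_steps + k)


def tree_cost(extra_steps_per_character, k):
    """Total implied-weighting cost of a tree (lower is better)."""
    return sum(implied_weight_cost(h, k) for h in extra_steps_per_character)


if __name__ == "__main__":
    homoplasy = [0, 1, 3, 10]  # hypothetical extra steps per character
    for k in (3, 10, 30):
        print(k, round(tree_cost(homoplasy, k), 3))
```

Small k penalises the first extra steps heavily and flattens quickly, so highly homoplastic characters are strongly down-weighted; large k behaves more like equal weighting over the same range of h.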

Paleontology , Phylogeny , Animals , Models, Genetic , Computer Simulation , Fossils