1.
PLoS Comput Biol ; 20(5): e1011408, 2024 May.
Article En | MEDLINE | ID: mdl-38768228

An important application of CRISPR interference (CRISPRi) technology is for identifying chemical-genetic interactions (CGIs). Discovery of genes that interact with exposure to antibiotics can yield insights into drug targets and mechanisms of action or resistance. The objective is to identify CRISPRi mutants whose relative abundance is suppressed (or enriched) in the presence of a drug when the target protein is depleted, reflecting synergistic behavior. Different sgRNAs for a given target can induce a wide range of protein depletion and differential effects on growth rate. The effect of sgRNA strength can be partially predicted based on sequence features. However, the actual growth phenotype depends on the sensitivity of cells to depletion of the target protein. For essential genes, sgRNA efficiency can be empirically measured by quantifying effects on growth rate. We observe that the most efficient sgRNAs are not always optimal for detecting synergies with drugs. sgRNA efficiency interacts in a non-linear way with drug sensitivity, producing an effect where the concentration-dependence is maximized for sgRNAs of intermediate strength (and less so for sgRNAs that induce too much or too little target depletion). To capture this interaction, we propose a novel statistical method called CRISPRi-DR (for Dose-Response model) that incorporates both sgRNA efficiencies and drug concentrations in a modified dose-response equation. We use CRISPRi-DR to re-analyze data from a recent CGI experiment in Mycobacterium tuberculosis to identify genes that interact with antibiotics. This approach can be generalized to non-CGI datasets, which we show via a CRISPRi dataset for E. coli growth on different carbon sources. The performance is competitive with the best of several related analytical methods. 
However, for noisier datasets, some of these methods generate far more significant interactions, likely including many false positives, whereas CRISPRi-DR maintains higher precision, which we observed in both empirical and simulated data.
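
The interaction between sgRNA strength and drug concentration described above can be illustrated with a toy model. The sketch below assumes a logistic (Hill-type) response that is linear in log concentration and log sgRNA strength; the function names, coefficients, and concentrations are hypothetical and are not taken from CRISPRi-DR itself. It shows why, under such a curve, the drop in abundance between a low and a high dose tends to be largest for an sgRNA of intermediate strength.

```python
import math


def relative_abundance(conc, sgrna_strength, b0=0.0, b_conc=-1.0, b_sgrna=-1.0):
    """Hypothetical two-factor logistic dose-response (illustrative only).

    log-odds(abundance) = b0 + b_conc*log(conc) + b_sgrna*log(sgrna_strength)
    Both conc and sgrna_strength must be positive.
    """
    logit = b0 + b_conc * math.log(conc) + b_sgrna * math.log(sgrna_strength)
    return 1.0 / (1.0 + math.exp(-logit))


def concentration_effect(sgrna_strength, low=0.5, high=2.0):
    """Drop in relative abundance when the dose rises from `low` to `high`."""
    return (relative_abundance(low, sgrna_strength)
            - relative_abundance(high, sgrna_strength))


# Assumed strengths spanning weak, intermediate, and strong depletion.
effects = {
    label: concentration_effect(strength)
    for label, strength in [("weak", 0.01), ("intermediate", 1.0), ("strong", 100.0)]
}

for label, effect in effects.items():
    print(f"{label:>12}: abundance drop = {effect:.4f}")
```

With these assumed coefficients, the weak and strong sgRNAs sit on the flat ends of the curve, so the dose change barely moves their abundance, while the intermediate sgRNA sits on the steep middle section.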


Anti-Bacterial Agents , Anti-Bacterial Agents/pharmacology , CRISPR-Cas Systems/genetics , Escherichia coli/genetics , Escherichia coli/drug effects , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Computational Biology/methods , Dose-Response Relationship, Drug , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/drug effects , RNA, Guide, CRISPR-Cas Systems/genetics , Models, Statistical , Models, Genetic
2.
Chaos ; 34(5), 2024 May 01.
Article En | MEDLINE | ID: mdl-38717409

In the evolution of species, the karyotype changes on a timescale of tens to hundreds of thousands of years. In the development of cancer, the karyotype often is modified in cancerous cells over the lifetime of an individual. Characterizing these changes and understanding the mechanisms leading to them has been of interest in a broad range of disciplines including evolution, cytogenetics, and cancer genetics. A central issue relates to the relative roles of random vs deterministic mechanisms in shaping the changes. Although it is possible that all changes result from random events followed by selection, many results point to other non-random factors that play a role in karyotype evolution. In cancer, chromosomal instability leads to characteristic changes in the karyotype, in which different individuals with a specific type of cancer display similar changes in karyotype structure over time. Statistical analyses of chromosome lengths in different species indicate that the length distribution of chromosomes is not consistent with models in which the lengths of chromosomes are random or evolve solely by simple random processes. A better understanding of the mechanisms underlying karyotype evolution should enable the development of quantitative theoretical models that combine the random and deterministic processes and that can be compared to experimental determinations of the karyotype in diverse settings.


Karyotype , Humans , Animals , Evolution, Molecular , Models, Genetic , Neoplasms/genetics , Biological Evolution
3.
Sci Adv ; 10(19): eadn1547, 2024 May 10.
Article En | MEDLINE | ID: mdl-38718117

Pre-mRNA splicing is a fundamental step in gene expression, conserved across eukaryotes, in which the spliceosome recognizes motifs at the 3' and 5' splice sites (SSs), excises introns, and ligates exons. SS recognition and pairing is often influenced by protein splicing factors (SFs) that bind to splicing regulatory elements (SREs). Here, we describe SMsplice, a fully interpretable model of pre-mRNA splicing that combines models of core SS motifs, SREs, and exonic and intronic length preferences. We learn models that predict SS locations with 83 to 86% accuracy in fish, insects, and plants and about 70% in mammals. Learned SRE motifs include both known SF binding motifs and unfamiliar motifs, and both motif classes are supported by genetic analyses. Our comparisons across species highlight similarities between non-mammals, increased reliance on intronic SREs in plant splicing, and a greater reliance on SREs in mammalian splicing.


Exons , Introns , RNA Precursors , RNA Splice Sites , RNA Splicing , RNA Precursors/genetics , RNA Precursors/metabolism , Animals , Introns/genetics , Exons/genetics , Genes, Plant , Models, Genetic , Spliceosomes/metabolism , Spliceosomes/genetics , Plants/genetics , Humans , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism
4.
Phys Rev E ; 109(4-1): 044407, 2024 Apr.
Article En | MEDLINE | ID: mdl-38755817

All the cells of a multicellular organism are the product of cell divisions that trace out a single binary tree, the so-called cell lineage tree. Because cell divisions are accompanied by replication errors, the shape of the cell lineage tree is a key determinant of how somatic evolution, which can potentially lead to cancer, proceeds. Carcinogenesis requires the accumulation of a certain number of driver mutations. By mapping the accumulation of mutations into a graph theoretical problem, we present an exact numerical method to calculate the probability of collecting a given number of mutations and show that for low mutation rates it can be approximated with a simple analytical formula, which depends only on the distribution of the lineage lengths, and is dominated by the longest lineages. Our results are crucial in understanding how natural selection can shape the cell lineage trees of multicellular organisms and curtail somatic evolution.
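
To make the low-mutation-rate behaviour concrete, here is a minimal single-lineage sketch. It is not the graph-theoretical method of the paper: it assumes one lineage of `length` divisions, an independent per-division driver mutation probability `mu`, and hypothetical function names. It compares the exact binomial tail with the leading-order term, and shows how strongly the probability grows with lineage length.

```python
from math import comb


def prob_at_least(m, length, mu):
    """Exact P(at least m mutations) along one lineage of `length` divisions,
    assuming each division mutates independently with probability mu."""
    return sum(
        comb(length, k) * mu**k * (1.0 - mu) ** (length - k)
        for k in range(m, length + 1)
    )


def low_rate_approx(m, length, mu):
    """Leading-order term for small mu: choose which m divisions mutate."""
    return comb(length, m) * mu**m


mu = 1e-6          # assumed per-division mutation probability
needed = 2         # assumed number of driver mutations required

for length in (50, 100):
    exact = prob_at_least(needed, length, mu)
    approx = low_rate_approx(needed, length, mu)
    print(f"length={length}: exact={exact:.4e}, approx={approx:.4e}")
```

Because the leading term scales roughly as the lineage length raised to the number of required mutations, the longest lineages contribute most of the total probability in this simplified picture.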


Cell Lineage , Models, Genetic , Mutation Accumulation , Mutation
5.
Stat Appl Genet Mol Biol ; 23(1), 2024 Jan 01.
Article En | MEDLINE | ID: mdl-38753402

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
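
The mixture view of mutation counts can be sketched in a few lines. The example below is a simplification, not the paper's framework: it assumes the signatures are already known and fixed (each a probability vector over mutation types), and estimates only the per-signature exposures with the standard EM update for a Poisson mixture, which coincides with the multiplicative NMF update for the Kullback-Leibler objective. All names and numbers are hypothetical.

```python
def fit_exposures(counts, signatures, n_iter=2000):
    """Estimate signature exposures for one sample by EM.

    counts:     observed mutation counts per mutation type
    signatures: list of probability vectors (each sums to 1, all entries > 0)
    """
    n_sig = len(signatures)
    n_types = len(counts)
    # Start from an even split of the total mutation load.
    exposures = [sum(counts) / n_sig] * n_sig
    for _ in range(n_iter):
        # Expected count per mutation type under the current exposures.
        expected = [
            sum(exposures[k] * signatures[k][t] for k in range(n_sig))
            for t in range(n_types)
        ]
        # Reassign observed counts to signatures in proportion to their share.
        exposures = [
            exposures[k]
            * sum(signatures[k][t] * counts[t] / expected[t] for t in range(n_types))
            for k in range(n_sig)
        ]
    return exposures


# Two made-up signatures over three mutation types.
signatures = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
# Counts generated as 30 * signature 0 + 70 * signature 1.
counts = [28, 27, 45]

estimated = fit_exposures(counts, signatures)
print([round(e, 3) for e in estimated])
```

Full NMF would alternate this step with an analogous update for the signatures; the parametrized signatures discussed in the abstract replace that second step with a regression.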


Algorithms , Mutation , Neoplasms , Humans , Neoplasms/genetics , Models, Genetic , Computer Simulation , Models, Statistical
6.
Elife ; 12, 2024 May 08.
Article En | MEDLINE | ID: mdl-38717010

Interacting molecules create regulatory architectures that can persist despite turnover of molecules. Although epigenetic changes occur within the context of such architectures, there is limited understanding of how they can influence the heritability of changes. Here, I develop criteria for the heritability of regulatory architectures and use quantitative simulations of interacting regulators parsed as entities, their sensors, and the sensed properties to analyze how architectures influence heritable epigenetic changes. Information contained in regulatory architectures grows rapidly with the number of interacting molecules and its transmission requires positive feedback loops. While these architectures can recover after many epigenetic perturbations, some resulting changes can become permanently heritable. Architectures that are otherwise unstable can become heritable through periodic interactions with external regulators, which suggests that mortal somatic lineages with cells that reproducibly interact with the immortal germ lineage could make a wider variety of architectures heritable. Differential inhibition of the positive feedback loops that transmit regulatory architectures across generations can explain the gene-specific differences in heritable RNA silencing observed in the nematode Caenorhabditis elegans. More broadly, these results provide a foundation for analyzing the inheritance of epigenetic changes within the context of the regulatory architectures implemented using diverse molecules in different living systems.


Caenorhabditis elegans , Epigenesis, Genetic , Caenorhabditis elegans/genetics , Animals , Models, Genetic , Gene Regulatory Networks , Inheritance Patterns
7.
Genome Biol Evol ; 16(5), 2024 May 02.
Article En | MEDLINE | ID: mdl-38742287

De novo evolved genes emerge from random parts of noncoding sequences and have, therefore, no homologs from which a function could be inferred. While expression analysis and knockout experiments can provide insights into the function, they do not directly test whether the gene is beneficial for its carrier. Here, we have used a seminatural environment experiment to test the fitness of the previously identified de novo evolved mouse gene Pldi, which has been implicated in sperm differentiation. We used a knockout mouse strain for this gene and competed it against its parental wildtype strain for several generations of free reproduction. We found that the knockout (ko) allele frequency decreased consistently across three replicates of the experiment. Using an approximate Bayesian computation framework that simulated the data under a demographic scenario mimicking the experiment's demography, we could estimate a selection coefficient ranging between 0.21 and 0.61 for the wildtype allele compared to the ko allele in males, under various models. This implies a relatively strong selective advantage, which would fix the new gene in fewer than a hundred generations after its emergence.
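
As a rough plausibility check on the time scale, the following sketch iterates the textbook deterministic recursion for a favoured allele under haploid selection, p' = p(1 + s) / (1 + sp). This is a deliberate simplification and not the approximate Bayesian computation model of the study: it ignores drift, diploidy, and male-limited selection, and the starting and target frequencies are assumptions chosen for illustration.

```python
def generations_to_reach(s, start_freq=0.01, target_freq=0.99):
    """Generations for a favoured allele to rise from start_freq to target_freq
    under deterministic haploid selection with coefficient s > 0."""
    if s <= 0:
        raise ValueError("s must be positive for the allele to increase")
    p = start_freq
    generations = 0
    while p < target_freq:
        # Favoured allele has relative fitness 1 + s, the alternative has 1.
        p = p * (1.0 + s) / (1.0 + s * p)
        generations += 1
    return generations


# The two ends of the selection-coefficient range quoted in the abstract.
slow = generations_to_reach(0.21)
fast = generations_to_reach(0.61)
print(f"s=0.21: {slow} generations; s=0.61: {fast} generations")
```

Under these assumptions both ends of the estimated range give sweep times well below one hundred generations; a stochastic model starting from a single copy would behave differently.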


Genetic Fitness , Mice, Knockout , Animals , Mice , Male , Evolution, Molecular , Gene Frequency , Selection, Genetic , Bayes Theorem , Female , Models, Genetic , Alleles
8.
Animal ; 18(5): 101152, 2024 May.
Article En | MEDLINE | ID: mdl-38701710

Traditional genetic evaluation methods generally consider additive genetic effects only and often ignore non-additive (dominance and epistasis) effects that may contribute to the genetic variation of complex traits in livestock species. The available dense single nucleotide polymorphism (SNP) panels offer an opportunity to investigate the potential benefits of including non-additive genetic effects in genomic evaluation models. Data from 16 971 genotyped (Illumina Bovine 50 K SNP chip) Korean Hanwoo cattle were used to estimate genetic variance components and the prediction accuracy of genomic breeding values (GEBVs) for four carcass and meat quality traits: carcass weight (CWT), eye muscle area (EMA), back fat thickness (BFT) and marbling score (MS). Five different genetic models were evaluated by successively including additive, dominance and epistatic interactions (additive by additive, A × A; additive by dominance, A × D; and dominance by dominance, D × D) in the models. The estimates of additive genetic variance and narrow-sense heritability ($h_a^2$) were similar across the evaluated models and traits except when the additive-by-additive interaction (A × A) was included. The dominance variance estimates relative to phenotypic variance ranged from 1.7 to 3.4% for the CWT and MS traits, whereas they were close to zero for the EMA and BFT traits. The magnitude of A × A epistatic heritability ($h_{aa}^2$) ranged between 14.8 and 27.7% in all traits. However, heritability estimates for A × D and D × D epistatic interactions ($h_{ad}^2$ and $h_{dd}^2$) were quite low compared to $h_{aa}^2$ and contributed only 0.0-9.7% of the total phenotypic variation. In general, broad-sense heritability ($h_G^2$) estimates were almost twice (ranging between 0.54 and 0.68) the $h_a^2$ for all of the investigated traits. The inclusion of dominance effects did not improve the prediction accuracy of GEBVs, but accuracy improved by 2.0-3.0% when epistatic effects were included in the model. 
More importantly, rank correlation revealed that partitioning the variance components with dominance and epistatic effects in the model would enable re-ranking of the top animals with better prediction of GEBVs. The present results suggest that dominance and epistatic effects could be included in the genomic evaluation model for better estimates of variance components and more accurate prediction of GEBVs for carcass and meat quality traits in Korean Hanwoo cattle.
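
The narrow-sense, dominance, epistatic, and broad-sense heritabilities reported in this abstract follow the usual variance-component decomposition. As a sketch in standard quantitative-genetics notation (the $\sigma^2$ symbols are assumptions, not taken from the abstract), each ratio is a component variance over the phenotypic variance, and in the full model the broad-sense heritability is their sum:

```latex
h_x^2 = \frac{\sigma_x^2}{\sigma_p^2}, \quad x \in \{a,\, d,\, aa,\, ad,\, dd\},
\qquad
\sigma_p^2 = \sigma_a^2 + \sigma_d^2 + \sigma_{aa}^2 + \sigma_{ad}^2 + \sigma_{dd}^2 + \sigma_e^2,
\qquad
h_G^2 = h_a^2 + h_d^2 + h_{aa}^2 + h_{ad}^2 + h_{dd}^2
```

Read this way, a broad-sense value close to twice the narrow-sense value means the non-additive components together explain about as much phenotypic variance as the additive component alone.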


Breeding , Meat , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Animals , Cattle/genetics , Meat/analysis , Male , Female , Genotype , Republic of Korea , Genomics , Epistasis, Genetic , Genetic Variation
9.
Mol Biol Evol ; 41(5), 2024 May 03.
Article En | MEDLINE | ID: mdl-38696269

This perspective article offers a meditation on FST and other quantities developed by Sewall Wright to describe the population structure, defined as any departure from reproduction through random union of gametes. Concepts related to the F-statistics draw from studies of the partitioning of variation, identity coefficients, and diversity measures. Relationships between the first two approaches have recently been clarified and unified. This essay addresses the third pillar of the discussion: Nei's GST and related measures. A hierarchy of probabilities of identity-by-state provides a description of the relationships among levels of a structured population with respect to genetic diversity. Explicit expressions for the identity-by-state probabilities are determined for models of structured populations undergoing regular inbreeding and recurrent mutation. Levels of genetic diversity within and between subpopulations reflect mutation as well as migration. Accordingly, indices of the population structure are inherently locus-specific, contrary to the intentions of Wright. Some implications of this locus-specificity are explored.
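
For readers unfamiliar with Nei's GST, the sketch below computes it from allele frequencies using the standard definition GST = (HT - HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. It assumes equally weighted subpopulations and a single locus; the function names and frequencies are hypothetical and the essay's identity-by-state derivations are not reproduced here.

```python
def heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def g_st(subpop_freqs):
    """Nei's G_ST for one locus from per-subpopulation allele frequencies.

    subpop_freqs: list of frequency vectors, one per subpopulation, all over
    the same alleles. Subpopulations are weighted equally (an assumption).
    """
    n_pops = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    h_s = sum(heterozygosity(f) for f in subpop_freqs) / n_pops
    mean_freqs = [
        sum(f[a] for f in subpop_freqs) / n_pops for a in range(n_alleles)
    ]
    h_t = heterozygosity(mean_freqs)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic everywhere, so there is no structure
    return (h_t - h_s) / h_t


# Two subpopulations with mirrored frequencies at a biallelic locus.
print(g_st([[0.2, 0.8], [0.8, 0.2]]))
```

Because HS and HT both depend on the diversity at the locus, loci with different mutation rates can give different values for the same population history, which is the locus-specificity the essay discusses.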


Genetic Variation , Genetics, Population , Models, Genetic , Genetics, Population/methods , Mutation , Inbreeding
10.
Mol Biol Evol ; 41(5), 2024 May 03.
Article En | MEDLINE | ID: mdl-38709811

The evolution of antimicrobial resistance (AMR) in bacteria is a major public health concern, and antibiotic restriction is often implemented to reduce the spread of resistance. These measures rely on the existence of deleterious fitness effects (i.e. costs) imposed by AMR mutations during growth in the absence of antibiotics. According to this assumption, resistant strains will be outcompeted by susceptible strains that do not pay the cost during the period of restriction. The fitness effects of AMR mutations are generally studied in laboratory reference strains grown in standard growth environments; however, the genetic and environmental context can influence the magnitude and direction of a mutation's fitness effects. In this study, we measure how three sources of variation impact the fitness effects of Escherichia coli AMR mutations: the type of resistance mutation, the genetic background of the host, and the growth environment. We demonstrate that while AMR mutations are generally costly in antibiotic-free environments, their fitness effects vary widely and depend on complex interactions between the mutation, genetic background, and environment. We test the ability of the Rough Mount Fuji fitness landscape model to reproduce the empirical data in simulation. We identify model parameters that reasonably capture the variation in fitness effects due to genetic variation. However, the model fails to accommodate the observed variation when considering multiple growth environments. Overall, this study reveals a wealth of variation in the fitness effects of resistance mutations owing to genetic background and environmental conditions, which will ultimately impact their persistence in natural populations.
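
The Rough Mount Fuji model mentioned above combines a smooth additive trend with random noise. The sketch below uses the common textbook form, in which fitness falls linearly with Hamming distance from a reference genotype and each genotype receives an independent Gaussian perturbation. The parameter names and values are assumptions for illustration and are not the parameters fitted in the study.

```python
import itertools
import random


def rough_mount_fuji(n_loci, slope, roughness_sd, seed=0):
    """Build a Rough Mount Fuji landscape over binary genotypes.

    fitness(g) = -slope * HammingDistance(g, reference) + Gaussian noise
    The reference genotype is all zeros; noise is independent per genotype.
    """
    rng = random.Random(seed)
    reference = (0,) * n_loci
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=n_loci):
        distance = sum(g != r for g, r in zip(genotype, reference))
        landscape[genotype] = -slope * distance + rng.gauss(0.0, roughness_sd)
    return landscape


# roughness_sd = 0 gives a purely additive ("Mount Fuji") landscape;
# a larger roughness_sd adds genotype-specific, epistasis-like variation.
smooth = rough_mount_fuji(n_loci=3, slope=1.0, roughness_sd=0.0)
rugged = rough_mount_fuji(n_loci=3, slope=1.0, roughness_sd=0.5)

for genotype in sorted(smooth):
    print(genotype, round(smooth[genotype], 3), round(rugged[genotype], 3))
```

The ratio of slope to roughness controls how predictable a mutation's effect is across genetic backgrounds. A single landscape of this kind has no environment term, which may be one way to read the reported difficulty with multiple growth environments.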


Drug Resistance, Bacterial , Escherichia coli , Genetic Fitness , Mutation , Escherichia coli/genetics , Escherichia coli/drug effects , Drug Resistance, Bacterial/genetics , Anti-Bacterial Agents/pharmacology , Models, Genetic , Environment
11.
Bull Math Biol ; 86(6): 70, 2024 May 08.
Article En | MEDLINE | ID: mdl-38717656

Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample numbers (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrates the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow experimenters to either improve parameter certainty given a fixed number of samples, or reduce sample quantity while retaining the same level of parameter certainty.


Algorithms , Computer Simulation , Mathematical Concepts , Models, Biological , Monte Carlo Method , Likelihood Functions , Humans , Dose-Response Relationship, Drug , Research Design/statistics & numerical data , Models, Genetic , Uncertainty
12.
Cladistics ; 40(3): 242-281, 2024 Jun.
Article En | MEDLINE | ID: mdl-38728134

Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because it is assumed that it is the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson-Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Implied weighting and EIW do not outperform each other unambiguously. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. 
This could be employed to inform about the range of k-values to be used in phylogenetic analyses based on matrix size but with the caveat that this emergent relationship still relies on a low sample size of genealogies.
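
One of the tree-comparison measures named above, the Robinson-Foulds distance normalized by the total number of groups, can be sketched as follows. This is a simplified rooted version that compares clades rather than unrooted bipartitions, with trees written as nested tuples of taxon names; it is an illustration under those assumptions, not the software or exact normalization used in the study.

```python
def clades(tree):
    """Return (leaf_set, set_of_clades) for a rooted tree of nested tuples.

    Leaves are strings; every internal node contributes the set of leaves
    below it as one clade (group).
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    groups = set()
    for child in tree:
        child_leaves, child_groups = clades(child)
        leaves |= child_leaves
        groups |= child_groups
    groups.add(leaves)
    return leaves, groups


def normalized_rf(tree_a, tree_b):
    """Groups found in only one tree, divided by the total number of groups."""
    leaves_a, groups_a = clades(tree_a)
    leaves_b, groups_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    # The root clade holds every taxon and is shared trivially, so drop it.
    groups_a.discard(leaves_a)
    groups_b.discard(leaves_b)
    total = len(groups_a) + len(groups_b)
    return len(groups_a ^ groups_b) / total if total else 0.0


older = ((("A", "B"), "C"), "D")
latest = ((("A", "C"), "B"), "D")
print(normalized_rf(older, latest))
```

A value of 0 means the two trees share every group and 1 means they share none, so comparing each earlier matrix against the latest one gives a direct stability score per weighting scheme.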


Paleontology , Phylogeny , Animals , Models, Genetic , Computer Simulation , Fossils
13.
PLoS Genet ; 20(5): e1011245, 2024 May.
Article En | MEDLINE | ID: mdl-38728360

Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci, which are essential to understanding pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes offers a new way to analyze multiple phenotypes and to explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed from a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on the GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple-phenotype association tests based on network modules detected by GPN are much more powerful than those that do not consider network modules. The newly proposed GPN offers a new way to investigate the genetic architecture among different types of phenotypes. Multiple-phenotype association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of the genetic architecture that exists between diagnoses, genes, and pleiotropy.


Genome-Wide Association Study , Genotype , Phenotype , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Models, Genetic , Genetic Pleiotropy , Genetic Association Studies/methods , Quantitative Trait Loci/genetics
14.
PLoS Comput Biol ; 20(5): e1011416, 2024 May.
Article En | MEDLINE | ID: mdl-38739641

During meiosis, pairing of homologous chromosomes (homologs) ensures the formation of haploid gametes from diploid precursor cells, a prerequisite for sexual reproduction. Pairing during meiotic prophase I facilitates crossover recombination and homolog segregation during the ensuing reductional cell division. Mechanisms that ensure stable homolog alignment in the presence of an excess of non-homologous chromosomes have remained elusive, but rapid chromosome movements appear to play a role in the process. Apart from homolog attraction, provided by early intermediates of homologous recombination, dissociation of non-homologous associations also appears to contribute to homolog pairing, as suggested by the detection of stable non-homologous chromosome associations in pairing-defective mutants. Here, we have developed an agent-based model for homolog pairing derived from the dynamics of a naturally occurring chromosome ensemble. The model simulates unidirectional chromosome movements, as well as collision dynamics determined by attractive and repulsive forces arising from close-range physical interactions. Chromosome number and size as well as movement velocity and repulsive forces are identified as key factors in the kinetics and efficiency of homologous pairing in addition to homolog attraction. Dissociation of interactions between non-homologous chromosomes may contribute to pairing by crowding homologs into a limited nuclear area thus creating preconditions for close-range homolog attraction. Incorporating natural chromosome lengths, the model accurately recapitulates efficiency and kinetics of homolog pairing observed for wild-type and mutant meiosis in budding yeast, and can be adapted to nuclear dimensions and chromosome sets of other organisms.


Chromosome Pairing , Meiosis , Meiosis/genetics , Chromosome Pairing/genetics , Models, Genetic , Saccharomyces cerevisiae/genetics , Chromosomes, Fungal/genetics , Cell Nucleus/genetics , Cell Nucleus/metabolism , Computer Simulation , Computational Biology
15.
Genet Sel Evol ; 56(1): 41, 2024 May 21.
Article En | MEDLINE | ID: mdl-38773363

BACKGROUND: Breeding programs are judged by the genetic level of animals that are used to disseminate genetic progress. These animals are typically the best ones of the population. To maximise the genetic level of very good animals in the next generation, parents that are more likely to produce top performing offspring need to be selected. The ability of individuals to produce high-performing progeny differs because of differences in their breeding values and gametic variances. Differences in gametic variances among individuals are caused by differences in heterozygosity and linkage. The use of the gametic Mendelian sampling variance has been proposed before, for use in the usefulness criterion or Index5, and in this work, we extend existing approaches by not only considering the gametic Mendelian sampling variance of individuals, but also of their potential offspring. Thus, the criteria developed in this study plan one additional generation ahead. For simplicity, we assumed that the true quantitative trait loci (QTL) effects, genetic map and the haplotypes of all animals are known. RESULTS: In this study, we propose a new selection criterion, ExpBVSelGrOff, which describes the genetic level of selected grand-offspring that are produced by selected offspring of a particular mating. We compare our criterion with other published criteria in a stochastic simulation of an ongoing breeding program for 21 generations for proof of concept. ExpBVSelGrOff performed better than all other tested criteria, like the usefulness criterion or Index5 which have been proposed in the literature, without compromising short-term gains. After only five generations, when selection is strong (1%), selection based on ExpBVSelGrOff achieved 5.8% more commercial genetic gain and retained 25% more genetic variance without compromising inbreeding rate compared to selection based only on breeding values. 
CONCLUSIONS: Our proposed selection criterion offers a new tool to accelerate genetic progress for contemporary genomic breeding programs. It retains more genetic variance than previously published criteria that plan less far ahead. Considering future gametic Mendelian sampling variances in the selection process also seems promising for maintaining more genetic variance.


Models, Genetic , Quantitative Trait Loci , Selection, Genetic , Animals , Breeding/methods , Female , Male , Selective Breeding
16.
Genome Biol ; 25(1): 127, 2024 May 21.
Article En | MEDLINE | ID: mdl-38773638

BACKGROUND: Gene regulatory network (GRN) models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such gene regulatory ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the underlying GRN governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede either scalability, explainability, or both. RESULTS: We developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that overcomes limitations of other methods by flexibly incorporating prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of GRN ODEs. We tested the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools. We demonstrated PHOENIX's flexibility by modeling regulation of oscillating expression profiles obtained from synchronized yeast cells. We also assessed the scalability of PHOENIX by modeling genome-scale GRNs for breast cancer samples ordered in pseudotime and for B cells treated with Rituximab. CONCLUSIONS: PHOENIX uses a combination of user-defined prior knowledge and functional forms from systems biology to encode biological "first principles" as soft constraints on the GRN allowing us to predict subsequent gene expression patterns in a biologically explainable manner.
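
Hill-Langmuir kinetics, one of the building blocks named above, can be shown with a one-gene toy ODE. The sketch below is not PHOENIX and contains no neural network: it assumes a single target gene driven by a constant activator level, with production following a Hill function and first-order decay, integrated with a simple Euler scheme. All parameter names and values are hypothetical.

```python
def hill_activation(activator, k_half, n):
    """Hill-Langmuir activation: fraction of maximal production at a given
    activator level (equals 0.5 when activator == k_half)."""
    return activator**n / (k_half**n + activator**n)


def simulate_expression(activator, beta=2.0, gamma=1.0, k_half=1.0, n=2,
                        dt=0.01, steps=2000):
    """Euler integration of dx/dt = beta * hill(activator) - gamma * x,
    starting from x = 0 with the activator held constant."""
    x = 0.0
    for _ in range(steps):
        production = beta * hill_activation(activator, k_half, n)
        x += dt * (production - gamma * x)
    return x


# Expression approaches the steady state beta * hill(activator) / gamma.
for level in (0.5, 1.0, 2.0):
    print(level, round(simulate_expression(level), 4))
```

In a network-scale model each gene would have its own equation of this kind with several regulators; the saturating Hill term is what keeps predicted expression bounded and biologically interpretable.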


Gene Regulatory Networks , Humans , Neural Networks, Computer , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Models, Genetic
17.
Int J Mol Sci ; 25(9), 2024 May 02.
Article En | MEDLINE | ID: mdl-38732192

RNA transcripts play a crucial role as witnesses of gene expression health. Identifying disruptive short sequences in RNA transcription and regulation is essential for potentially treating diseases. Let us delve into the mathematical intricacies of these sequences. We have previously devised a mathematical approach for defining a "healthy" sequence. This sequence is characterized by having at most four distinct nucleotides (denoted as nt ≤ 4). It serves as the generator of a group denoted as fp. The desired properties of this sequence are as follows: fp should be close to a free group of rank nt - 1, it must be aperiodic, and fp should not have isolated singularities within its SL2(C) character variety (specifically within the corresponding Groebner basis). Now, let us explore the concept of singularities. There are cubic surfaces associated with the character variety of a four-punctured sphere denoted as S24. When we encounter these singularities, we find ourselves dealing with some algebraic solutions of a dynamical second-order differential (and transcendental) equation known as the Painlevé VI Equation. In certain cases, S24 degenerates, in the sense that two punctures collapse, resulting in "wild" dynamics governed by the Painlevé equations of an index lower than VI. In our paper, we provide examples of these fascinating mathematical structures within the context of miRNAs. Specifically, we find a clear relationship between decorated character varieties of Painlevé equations and the character variety calculated from the seed of oncomirs. These findings should find many applications, including in cancer research and the investigation of neurodegenerative diseases.


Transcriptome , Transcriptome/genetics , Humans , Gene Expression Regulation , Algorithms , Models, Genetic , MicroRNAs/genetics
18.
BMC Genomics ; 25(1): 462, 2024 May 13.
Article En | MEDLINE | ID: mdl-38735952

BACKGROUND: Detecting epistatic interactions (EIs) involves the exploration of associations among single nucleotide polymorphisms (SNPs) and complex diseases, which is an important task in genome-wide association studies. The EI detection problem is dependent on epistasis models and corresponding optimization methods. Although various models and methods have been proposed to detect EIs, identifying EIs efficiently and accurately is still a challenge. RESULTS: Here, we propose a linear mixed statistical epistasis model (LMSE) and a spherical evolution approach with a feedback mechanism (named SEEI). The LMSE model expands the existing single epistasis models such as LR-Score, K2-Score, Mutual information, and Gini index. The SEEI includes an adaptive spherical search strategy and population updating strategy, which ensures that the algorithm is not easily trapped in local optima. We analyzed the performances of 8 random disease models, 12 disease models with marginal effects, 30 disease models without marginal effects, and 10 high-order disease models. The 60 simulated disease models and a real breast cancer dataset were used to evaluate eight algorithms (SEEI, EACO, EpiACO, FDHEIW, MP-HS-DHSI, NHSA-DHSC, SNPHarvester, CSE). Three evaluation criteria (pow1, pow2, pow3), a t-test, and a Friedman test were used to compare the performances of these algorithms. The results show that the SEEI algorithm (order 1, average rank = 13.125) outperformed the other algorithms in detecting EIs. CONCLUSIONS: Here, we propose an LMSE model and an evolutionary computing method (SEEI) to solve the optimization problem of the LMSE model. The proposed method performed better than the other seven algorithms tested in its ability to identify EIs in genome-wide association datasets. We identified new SNP-SNP combinations in the real breast cancer dataset and verified the results. Our findings provide new insights for the diagnosis and treatment of breast cancer.
AVAILABILITY AND IMPLEMENTATION: https://github.com/scutdy/SSO/blob/master/SEEI.zip .


Algorithms , Breast Neoplasms , Epistasis, Genetic , Models, Genetic , Polymorphism, Single Nucleotide , Humans , Breast Neoplasms/genetics , Genome-Wide Association Study
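
Two of the single-score epistasis measures named in the abstract above, mutual information and the Gini index, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the toy XOR-style disease model, and the exact score definitions (plain mutual information in bits, sample-weighted Gini impurity) are choices made here, and the sketch does not reproduce the LMSE model or the SEEI search.

```python
from collections import Counter
from math import log2


def mutual_information(genotypes, phenotypes):
    """Mutual information (bits) between a genotype label and the phenotype."""
    n = len(phenotypes)
    joint = Counter(zip(genotypes, phenotypes))
    geno_counts = Counter(genotypes)
    pheno_counts = Counter(phenotypes)
    return sum(
        (c / n) * log2(c * n / (geno_counts[g] * pheno_counts[p]))
        for (g, p), c in joint.items()
    )


def gini_score(genotypes, phenotypes):
    """Sample-weighted Gini impurity of the phenotype within each genotype cell.

    Lower values mean the genotype partitions cases and controls more cleanly.
    """
    n = len(phenotypes)
    cells = {}
    for g, p in zip(genotypes, phenotypes):
        cells.setdefault(g, Counter())[p] += 1
    total = 0.0
    for counts in cells.values():
        m = sum(counts.values())
        impurity = 1.0 - sum((c / m) ** 2 for c in counts.values())
        total += (m / n) * impurity
    return total


# Hypothetical disease model without marginal effects: phenotype = snp1 XOR snp2.
snp1 = [0, 0, 1, 1] * 25
snp2 = [0, 1, 0, 1] * 25
phenotype = [a ^ b for a, b in zip(snp1, snp2)]
pair = list(zip(snp1, snp2))  # joint genotype of the SNP pair

if __name__ == "__main__":
    print("MI single SNP:", mutual_information(snp1, phenotype))
    print("MI SNP pair:  ", mutual_information(pair, phenotype))
    print("Gini single:  ", gini_score(snp1, phenotype))
    print("Gini pair:    ", gini_score(pair, phenotype))
```

In this toy model each SNP alone carries no information about the phenotype, while the pair determines it completely, which is the kind of interaction without marginal effects that single-locus tests miss.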
19.
Theor Appl Genet ; 137(6): 138, 2024 May 21.
Article En | MEDLINE | ID: mdl-38771334

KEY MESSAGE: Residual neural network genomic selection is the first GS algorithm to reach 35 layers, and its prediction accuracy surpasses previous algorithms. With the decrease in DNA sequencing costs and the development of deep learning, phenotype prediction accuracy by genomic selection (GS) continues to improve. Residual networks, a widely validated deep learning technique, are introduced to deep learning for GS. Since each locus has a different weighted impact on the phenotype, strided convolutions are more suitable for GS problems than pooling layers. Through the above technological innovations, we propose a GS deep learning algorithm, residual neural network for genomic selection (ResGS). ResGS is the first neural network to reach 35 layers in GS. In most of 15 cases from four public datasets, the prediction accuracy of ResGS is higher than that of ridge-regression best linear unbiased prediction, support vector regression, random forest, gradient boosting regressor, and deep neural network genomic prediction. ResGS performs well in dealing with gene-environment interaction. Phenotypes from other environments are imported into ResGS along with genetic data. The prediction results are much better than just providing genetic data as input, which demonstrates the effectiveness of GS multi-modal learning. Standard deviation is recommended as an auxiliary GS evaluation metric, which could improve the distribution of predicted results. Deep learning for GS, such as ResGS, is becoming more accurate in phenotype prediction.


Algorithms , Genomics , Neural Networks, Computer , Phenotype , Genomics/methods , Models, Genetic , Deep Learning , Gene-Environment Interaction , Selection, Genetic
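
The contrast the abstract above draws between strided convolutions and pooling, and the residual (skip) connection it relies on, can be sketched in plain Python. This is only a toy illustration: the function names, kernel values, and genotype vector are assumptions, and nothing here reproduces the ResGS architecture or its 35 layers.

```python
def strided_conv1d(x, kernel, stride):
    """Valid 1-D convolution (cross-correlation) evaluated every `stride` positions.

    Each window position is weighted by the kernel, so different loci can
    contribute with different weights while the sequence is downsampled.
    """
    k = len(kernel)
    return [
        sum(w * x[i + j] for j, w in enumerate(kernel))
        for i in range(0, len(x) - k + 1, stride)
    ]


def max_pool1d(x, size, stride):
    """Max pooling: downsamples by keeping only the largest value per window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def residual_block(x, kernel):
    """ReLU(x + conv(x)) with zero padding; assumes an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    conv = strided_conv1d(padded, kernel, 1)  # same length as x
    return [max(0.0, a + b) for a, b in zip(x, conv)]


if __name__ == "__main__":
    genotypes = [0, 1, 2, 1, 0, 2]  # hypothetical allele counts at six loci
    print(strided_conv1d(genotypes, [0.5, -1.0], 2))  # [-1.0, 0.0, -2.0]
    print(max_pool1d(genotypes, 2, 2))                # [1, 2, 2]
    print(residual_block(genotypes, [0.0, 0.0, 0.0]))  # input passes through unchanged
```

Both operations halve the sequence length here, but pooling discards which locus in the window produced the value, whereas the strided convolution keeps a separate weight per position.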
20.
Bull Math Biol ; 86(6): 69, 2024 May 07.
Article En | MEDLINE | ID: mdl-38714590

We unify evolutionary dynamics on graphs in strategic uncertainty through a decaying Bayesian update. Our analysis focuses on the Price theorem of selection, which governs replicator(-mutator) dynamics, based on a stratified interaction mechanism and a composite strategy update rule. Our findings suggest that the replication of a certain mutation in a strategy, leading to a shift from competition to cooperation in a well-mixed population, is equivalent to the replication of a strategy in a Bayesian-structured population without any mutation. Likewise, the replication of a strategy in a Bayesian-structured population with a certain mutation, resulting in a move from competition to cooperation, is equivalent to the replication of a strategy in a well-mixed population without any mutation. This equivalence holds when the transition rate from competition to cooperation is equal to the relative strength of selection acting on either competition or cooperation in relation to the selection differential between cooperators and competitors. Our research allows us to identify situations where cooperation is more likely, irrespective of the specific payoff levels. This approach provides new perspectives into the intended purpose of Price's equation, which was initially not designed for this type of analysis.


Bayes Theorem , Biological Evolution , Game Theory , Mathematical Concepts , Models, Genetic , Mutation , Selection, Genetic , Computer Simulation , Cooperative Behavior , Competitive Behavior , Population Dynamics/statistics & numerical data , Models, Biological , Humans
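
The Price theorem of selection referenced in the abstract above can be checked numerically for the simplest case. The Python sketch below assumes a well-mixed population with faithful transmission (no mutation), so the change in the mean trait after one round of replication should equal Cov(w, z) / mean(w). The function names and the two-strategy example are assumptions for illustration; the Bayesian-structured update from the paper is not implemented.

```python
def mean(freqs, values):
    """Frequency-weighted mean of a per-type quantity."""
    return sum(q * v for q, v in zip(freqs, values))


def price_selection_term(freqs, fitness, trait):
    """Selection term of the Price equation: Cov(w, z) / mean(w)."""
    w_bar = mean(freqs, fitness)
    z_bar = mean(freqs, trait)
    cov = sum(q * (w - w_bar) * (z - z_bar)
              for q, w, z in zip(freqs, fitness, trait))
    return cov / w_bar


def replicate(freqs, fitness):
    """One discrete replicator step: each type grows in proportion to its fitness."""
    w_bar = mean(freqs, fitness)
    return [q * w / w_bar for q, w in zip(freqs, fitness)]


if __name__ == "__main__":
    freqs = [0.5, 0.5]     # competitors, cooperators
    fitness = [1.0, 2.0]   # hypothetical payoffs
    trait = [0.0, 1.0]     # indicator for "is a cooperator"

    new_freqs = replicate(freqs, fitness)
    observed = mean(new_freqs, trait) - mean(freqs, trait)
    predicted = price_selection_term(freqs, fitness, trait)
    print(observed, predicted)  # both are 1/6 up to rounding
```

With mutation or another transmission bias, a second term, E(w Δz) / mean(w), would need to be added to the prediction.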
...