Results 1 - 20 of 76,955
1.
Genome Biol ; 25(1): 127, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773638

ABSTRACT

BACKGROUND: Gene regulatory network (GRN) models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such gene regulatory ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the underlying GRN governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede either scalability, explainability, or both. RESULTS: We developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that overcomes limitations of other methods by flexibly incorporating prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of GRN ODEs. We tested the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools. We demonstrated PHOENIX's flexibility by modeling regulation of oscillating expression profiles obtained from synchronized yeast cells. We also assessed the scalability of PHOENIX by modeling genome-scale GRNs for breast cancer samples ordered in pseudotime and for B cells treated with Rituximab. CONCLUSIONS: PHOENIX uses a combination of user-defined prior knowledge and functional forms from systems biology to encode biological "first principles" as soft constraints on the GRN allowing us to predict subsequent gene expression patterns in a biologically explainable manner.
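As a rough illustration of the Hill-Langmuir kinetics mentioned in this abstract, the sketch below integrates a one-gene ODE whose production term is an activating Hill function. The function names, the parameter values and the forward-Euler integrator are assumptions chosen for illustration; this is not the PHOENIX implementation.

```python
def hill_activation(x, k=1.0, n=2.0):
    """Hill-Langmuir activation: fraction of maximal response at regulator level x.

    k is the half-saturation constant and n the Hill coefficient (both hypothetical values).
    """
    return x**n / (k**n + x**n)


def simulate(regulator, x0=0.0, vmax=1.0, decay=0.5, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = vmax * hill(regulator) - decay * x."""
    x = x0
    for _ in range(steps):
        x += dt * (vmax * hill_activation(regulator) - decay * x)
    return x


# With a constant regulator level, expression relaxes towards
# vmax * hill(regulator) / decay.
steady = simulate(regulator=1.0)
```

A NeuralODE framework would learn the right-hand side from data; the fixed functional form here only shows what a Hill-type constraint looks like.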


Subjects
Gene Regulatory Networks , Humans , Neural Networks, Computer , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Models, Genetic
2.
Genet Sel Evol ; 56(1): 41, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773363

ABSTRACT

BACKGROUND: Breeding programs are judged by the genetic level of animals that are used to disseminate genetic progress. These animals are typically the best ones of the population. To maximise the genetic level of very good animals in the next generation, parents that are more likely to produce top performing offspring need to be selected. The ability of individuals to produce high-performing progeny differs because of differences in their breeding values and gametic variances. Differences in gametic variances among individuals are caused by differences in heterozygosity and linkage. The use of the gametic Mendelian sampling variance has been proposed before, for use in the usefulness criterion or Index5, and in this work, we extend existing approaches by not only considering the gametic Mendelian sampling variance of individuals, but also of their potential offspring. Thus, the criteria developed in this study plan one additional generation ahead. For simplicity, we assumed that the true quantitative trait loci (QTL) effects, genetic map and the haplotypes of all animals are known. RESULTS: In this study, we propose a new selection criterion, ExpBVSelGrOff, which describes the genetic level of selected grand-offspring that are produced by selected offspring of a particular mating. We compare our criterion with other published criteria in a stochastic simulation of an ongoing breeding program for 21 generations for proof of concept. ExpBVSelGrOff performed better than all other tested criteria, like the usefulness criterion or Index5 which have been proposed in the literature, without compromising short-term gains. After only five generations, when selection is strong (1%), selection based on ExpBVSelGrOff achieved 5.8% more commercial genetic gain and retained 25% more genetic variance without compromising inbreeding rate compared to selection based only on breeding values. 
CONCLUSIONS: Our proposed selection criterion offers a new tool to accelerate genetic progress for contemporary genomic breeding programs. It retains more genetic variance than previously published criteria that plan less far ahead. Considering future gametic Mendelian sampling variances in the selection process also seems promising for maintaining more genetic variance.
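The usefulness criterion referred to in this abstract is commonly written as a mean plus a selection intensity times a standard deviation. The sketch below computes the selection intensity for truncation selection under a normal distribution and ranks two hypothetical candidates. The candidate values and function names are assumptions, and the sketch does not implement ExpBVSelGrOff or Index5.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Selection intensity i for truncation selection of the top fraction p
    of a normally distributed trait: i = pdf(z) / p, with z the truncation point."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)
    return std.pdf(z) / p


def usefulness(mean_bv, gametic_sd, p):
    """Generic usefulness-type criterion: expected mean of the selected top fraction p."""
    return mean_bv + selection_intensity(p) * gametic_sd


# Two hypothetical candidates: B has a lower mean but a larger gametic SD.
uc_a = usefulness(mean_bv=10.0, gametic_sd=1.0, p=0.01)
uc_b = usefulness(mean_bv=9.5, gametic_sd=1.5, p=0.01)
```

Under strong selection the candidate with the larger gametic standard deviation can rank first despite a lower mean, which is the motivation for variance-aware criteria.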


Subjects
Models, Genetic , Quantitative Trait Loci , Selection, Genetic , Animals , Breeding/methods , Female , Male , Selective Breeding
3.
PLoS Comput Biol ; 20(5): e1011408, 2024 May.
Article in English | MEDLINE | ID: mdl-38768228

ABSTRACT

An important application of CRISPR interference (CRISPRi) technology is for identifying chemical-genetic interactions (CGIs). Discovery of genes that interact with exposure to antibiotics can yield insights into drug targets and mechanisms of action or resistance. The objective is to identify CRISPRi mutants whose relative abundance is suppressed (or enriched) in the presence of a drug when the target protein is depleted, reflecting synergistic behavior. Different sgRNAs for a given target can induce a wide range of protein depletion and differential effects on growth rate. The effect of sgRNA strength can be partially predicted based on sequence features. However, the actual growth phenotype depends on the sensitivity of cells to depletion of the target protein. For essential genes, sgRNA efficiency can be empirically measured by quantifying effects on growth rate. We observe that the most efficient sgRNAs are not always optimal for detecting synergies with drugs. sgRNA efficiency interacts in a non-linear way with drug sensitivity, producing an effect where the concentration-dependence is maximized for sgRNAs of intermediate strength (and less so for sgRNAs that induce too much or too little target depletion). To capture this interaction, we propose a novel statistical method called CRISPRi-DR (for Dose-Response model) that incorporates both sgRNA efficiencies and drug concentrations in a modified dose-response equation. We use CRISPRi-DR to re-analyze data from a recent CGI experiment in Mycobacterium tuberculosis to identify genes that interact with antibiotics. This approach can be generalized to non-CGI datasets, which we show via a CRISPRi dataset for E. coli growth on different carbon sources. The performance is competitive with the best of several related analytical methods.
However, for noisier datasets, some of these methods generate far more significant interactions, likely including many false positives, whereas CRISPRi-DR maintains higher precision, which we observed in both empirical and simulated data.


Subjects
Anti-Bacterial Agents , Anti-Bacterial Agents/pharmacology , CRISPR-Cas Systems/genetics , Escherichia coli/genetics , Escherichia coli/drug effects , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Computational Biology/methods , Dose-Response Relationship, Drug , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/drug effects , RNA, Guide, CRISPR-Cas Systems/genetics , Models, Statistical , Models, Genetic
4.
Yi Chuan ; 46(5): 421-430, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38763776

ABSTRACT

The Inner Mongolia cashmere goat is an excellent livestock breed formed through long-term natural selection and artificial breeding, and is currently a world-class dual-purpose breed producing cashmere and meat. The multi-trait animal model is considered to significantly improve the accuracy of genetic evaluation in livestock and poultry, enabling indirect selection between traits. In this study, the pedigree, genotype, environment, and phenotypic records of early growth traits of Inner Mongolia cashmere goats were used to build a multi-trait animal model. Then three methods, ABLUP, GBLUP, and ssGBLUP, were used to estimate the genetic parameters and genomic breeding values of early growth traits (birth weight, weaning weight, average daily weight gain before weaning, and yearling weight). The accuracy and reliability of the genomic estimated breeding values were further evaluated using five-fold cross-validation. The results showed that the heritability of birth weight estimated by the three methods was 0.13-0.15, the heritability of weaning weight was 0.13-0.20, the heritability of daily weight gain before weaning was 0.11-0.14, and the heritability of yearling weight was 0.09-0.14, all of which are moderate to low. There was a strong positive genetic correlation between weaning weight and daily weight gain before weaning, and between daily weight gain before weaning and yearling weight, with correlation coefficients of 0.77-0.79 and 0.56-0.67, respectively. The same pattern was found in the phenotypic correlations among traits. The accuracy of the breeding values estimated by the ABLUP, GBLUP, and ssGBLUP methods was 0.5047, 0.6694, and 0.7156, respectively, for birth weight; 0.6207, 0.6456, and 0.7254 for weaning weight; 0.6110, 0.6855, and 0.7357 for daily weight gain before weaning; and 0.6209, 0.7155, and 0.7756 for yearling weight.
In summary, the early growth traits of Inner Mongolia cashmere goats belong to moderate to low heritability, and the speed of genetic improvement is relatively slow. The genetic improvement of other growth traits can be achieved through the selection of weaning weight. The ssGBLUP method has the highest accuracy and reliability in estimating genomic breeding value of early growth traits in Inner Mongolia cashmere goats, and is significantly higher than that from ABLUP method, indicating that it is the best method for genomic breeding of early growth weight in Inner Mongolia cashmere goats.


Subjects
Breeding , Goats , Animals , Goats/genetics , Goats/growth & development , Phenotype , Genomics/methods , Female , Male , Birth Weight/genetics , Models, Genetic
5.
Chaos ; 34(5), 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717409

ABSTRACT

In the evolution of species, the karyotype changes with a timescale of tens to hundreds of thousands of years. In the development of cancer, the karyotype is often modified in cancerous cells over the lifetime of an individual. Characterizing these changes and understanding the mechanisms leading to them has been of interest in a broad range of disciplines including evolution, cytogenetics, and cancer genetics. A central issue relates to the relative roles of random vs deterministic mechanisms in shaping the changes. Although it is possible that all changes result from random events followed by selection, many results point to other non-random factors that play a role in karyotype evolution. In cancer, chromosomal instability leads to characteristic changes in the karyotype, in which different individuals with a specific type of cancer display similar changes in karyotype structure over time. Statistical analyses of chromosome lengths in different species indicate that the length distribution of chromosomes is not consistent with models in which the lengths of chromosomes are random or evolve solely by simple random processes. A better understanding of the mechanisms underlying karyotype evolution should enable the development of quantitative theoretical models that combine the random and deterministic processes and that can be compared to experimental determinations of the karyotype in diverse settings.


Subjects
Karyotype , Humans , Animals , Evolution, Molecular , Models, Genetic , Neoplasms/genetics , Biological Evolution
6.
Sci Adv ; 10(19): eadn1547, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38718117

ABSTRACT

Pre-mRNA splicing is a fundamental step in gene expression, conserved across eukaryotes, in which the spliceosome recognizes motifs at the 3' and 5' splice sites (SSs), excises introns, and ligates exons. SS recognition and pairing is often influenced by protein splicing factors (SFs) that bind to splicing regulatory elements (SREs). Here, we describe SMsplice, a fully interpretable model of pre-mRNA splicing that combines models of core SS motifs, SREs, and exonic and intronic length preferences. We learn models that predict SS locations with 83 to 86% accuracy in fish, insects, and plants and about 70% in mammals. Learned SRE motifs include both known SF binding motifs and unfamiliar motifs, and both motif classes are supported by genetic analyses. Our comparisons across species highlight similarities between non-mammals, increased reliance on intronic SREs in plant splicing, and a greater reliance on SREs in mammalian splicing.


Subjects
Exons , Introns , RNA Precursors , RNA Splice Sites , RNA Splicing , RNA Precursors/genetics , RNA Precursors/metabolism , Animals , Introns/genetics , Exons/genetics , Genes, Plant , Models, Genetic , Spliceosomes/metabolism , Spliceosomes/genetics , Plants/genetics , Humans , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism
7.
Int J Mol Sci ; 25(9), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38732192

ABSTRACT

RNA transcripts play a crucial role as witnesses of gene expression health. Identifying disruptive short sequences in RNA transcription and regulation is essential for potentially treating diseases. Let us delve into the mathematical intricacies of these sequences. We have previously devised a mathematical approach for defining a "healthy" sequence. This sequence is characterized by having at most four distinct nucleotides (denoted as $nt \leq 4$). It serves as the generator of a group denoted as $f_p$. The desired properties of this sequence are as follows: $f_p$ should be close to a free group of rank $nt - 1$, it must be aperiodic, and $f_p$ should not have isolated singularities within its $SL_2(\mathbb{C})$ character variety (specifically within the corresponding Groebner basis). Now, let us explore the concept of singularities. There are cubic surfaces associated with the character variety of a four-punctured sphere denoted as $S_2^{(4)}$. When we encounter these singularities, we find ourselves dealing with some algebraic solutions of a dynamical second-order differential (and transcendental) equation known as the Painlevé VI Equation. In certain cases, $S_2^{(4)}$ degenerates, in the sense that two punctures collapse, resulting in a "wild" dynamics governed by the Painlevé equations of an index lower than VI. In our paper, we provide examples of these fascinating mathematical structures within the context of miRNAs. Specifically, we find a clear relationship between decorated character varieties of Painlevé equations and the character variety calculated from the seed of oncomirs. These findings should find many applications including cancer research and the investigation of neurodegenerative diseases.


Subjects
Transcriptome , Transcriptome/genetics , Humans , Gene Expression Regulation , Algorithms , Models, Genetic , MicroRNAs/genetics
8.
BMC Genomics ; 25(1): 462, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38735952

ABSTRACT

BACKGROUND: Detecting epistatic interactions (EIs) involves the exploration of associations among single nucleotide polymorphisms (SNPs) and complex diseases, which is an important task in genome-wide association studies. The EI detection problem is dependent on epistasis models and corresponding optimization methods. Although various models and methods have been proposed to detect EIs, identifying EIs efficiently and accurately is still a challenge. RESULTS: Here, we propose a linear mixed statistical epistasis model (LMSE) and a spherical evolution approach with a feedback mechanism (named SEEI). The LMSE model expands the existing single epistasis models such as LR-Score, K2-Score, Mutual information, and Gini index. The SEEI includes an adaptive spherical search strategy and population updating strategy, which ensures that the algorithm is not easily trapped in local optima. We analyzed the performances of 8 random disease models, 12 disease models with marginal effects, 30 disease models without marginal effects, and 10 high-order disease models. The 60 simulated disease models and a real breast cancer dataset were used to evaluate eight algorithms (SEEI, EACO, EpiACO, FDHEIW, MP-HS-DHSI, NHSA-DHSC, SNPHarvester, CSE). Three evaluation criteria (pow1, pow2, pow3), a T-test, and a Friedman test were used to compare the performances of these algorithms. The results show that the SEEI algorithm (order 1, average rank = 13.125) outperformed the other algorithms in detecting EIs. CONCLUSIONS: Here, we propose an LMSE model and an evolutionary computing method (SEEI) to solve the optimization problem of the LMSE model. The proposed method performed better than the other seven algorithms tested in its ability to identify EIs in genome-wide association datasets. We identified new SNP-SNP combinations in the real breast cancer dataset and verified the results. Our findings provide new insights for the diagnosis and treatment of breast cancer.
AVAILABILITY AND IMPLEMENTATION: https://github.com/scutdy/SSO/blob/master/SEEI.zip .


Subjects
Algorithms , Breast Neoplasms , Epistasis, Genetic , Models, Genetic , Polymorphism, Single Nucleotide , Humans , Breast Neoplasms/genetics , Genome-Wide Association Study
9.
Genet Sel Evol ; 56(1): 33, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698321

ABSTRACT

BACKGROUND: Recursive models are a category of structural equation models that propose a causal relationship between traits. These models are more parameterized than multiple trait models, and they require imposing restrictions on the parameter space to ensure statistical identification. Nevertheless, in certain situations, the likelihood of recursive models and multiple trait models are equivalent. Consequently, the estimates of variance components derived from the multiple trait mixed model can be converted into estimates under several recursive models through LDL' or block-LDL' transformations. RESULTS: The procedure was employed on a dataset comprising five traits (birth weight-BW, weight at 90 days-W90, weight at 210 days-W210, cold carcass weight-CCW and conformation-CON) from the Pirenaica beef cattle breed. These phenotypic records were unequally distributed among 149,029 individuals and had a high percentage of missing data. The pedigree used consisted of 343,753 individuals. A Bayesian approach involving a multiple-trait mixed model was applied using a Gibbs sampler. The variance components obtained at each iteration of the Gibbs sampler were subsequently used to estimate the variance components within three distinct recursive models. CONCLUSIONS: The LDL' or block-LDL' transformations applied to the variance component estimates achieved from a multiple trait mixed model enabled inference across multiple sets of recursive models, with the sole prerequisite of being likelihood equivalent. Furthermore, the aforementioned transformations simplify the handling of missing data when conducting inference within the realm of recursive models.
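To make the LDL' idea in this abstract concrete, the sketch below factors a small covariance matrix as G = L D L' (L unit lower triangular, D diagonal) and reads off the coefficients of a fully recursive ordering of the traits. The 3 x 3 matrix and all names are hypothetical, and this is a minimal illustration of the decomposition rather than the authors' procedure or its block-LDL' variant.

```python
import numpy as np


def ldl_from_cholesky(G):
    """Return (L, D) with G = L @ D @ L.T, L unit lower triangular, D diagonal."""
    C = np.linalg.cholesky(G)   # G = C @ C.T, C lower triangular
    d = np.diag(C)
    L = C / d                   # scale column j by 1 / C[j, j] -> unit diagonal
    D = np.diag(d**2)
    return L, D


# Hypothetical genetic covariance matrix for three traits (e.g. BW, W90, W210).
G = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

L, D = ldl_from_cholesky(G)

# In a fully recursive model Lambda @ u = u_star, where u_star has diagonal
# covariance D. Lambda = inv(L); the structural coefficients of trait i on
# earlier traits j < i are the negated off-diagonal entries of Lambda.
Lam = np.linalg.inv(L)
structural = -(Lam - np.eye(3))
```

Applying the same transformation to each Gibbs sample of G would give posterior samples of the recursive-model parameters, which is the kind of conversion the abstract describes.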


Subjects
Models, Genetic , Animals , Cattle/genetics , Bayes Theorem , Phenotype , Breeding/methods , Breeding/standards , Birth Weight/genetics , Pedigree , Quantitative Trait, Heritable
10.
Genet Sel Evol ; 56(1): 35, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698347

ABSTRACT

BACKGROUND: The theory of "metafounders" proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups), and base populations across breeds (crosses) together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate relationships across metafounders Γ are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth). METHODS: We derive likelihood methods to estimate Γ. For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation with the real root being the maximum likelihood (ML) estimate of Γ. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood in a term related to Γ, and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of Γ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of Γ using estimates of the rate of increase of inbreeding (ΔF), resulting in an expanded Γ and in a pseudo-EM+ΔF algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions; and in two breeds, with 10 groups per year of birth within breed. We simulate genotyping in all generations or in the last ones. RESULTS: For a single metafounder, the ML estimates of the Lacaune data corresponded to the maximum. For simulated data, when genotypes were spread across all generations, both GLS and pseudo-EM(+ΔF) methods were accurate.
With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+ΔF) approach yielded more accurate and unbiased estimates. CONCLUSIONS: We derived ML, pseudo-EM and pseudo-EM+ΔF methods to estimate Γ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.


Subjects
Breeding , Models, Genetic , Pedigree , Animals , Likelihood Functions , Breeding/methods , Algorithms , Sheep/genetics , Genomics/methods , Computer Simulation , Male , Female , Genotype
11.
Genet Sel Evol ; 56(1): 34, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698373

ABSTRACT

Metafounders are a useful concept to characterize relationships within and across populations, and to help genetic evaluations because they help modelling the means and variances of unknown base population animals. Current definitions of metafounder relationships are sensitive to the choice of reference alleles and have not been compared to their counterparts in population genetics, namely heterozygosities, $F_{ST}$ coefficients, and genetic distances. We redefine the relationships across populations with an arbitrary base of a maximum heterozygosity population in Hardy-Weinberg equilibrium. Then, the relationship between or within populations is a cross-product of the form $\Gamma_{b,b'} = \frac{2}{n}(2\mathbf{p}_b - \mathbf{1})(2\mathbf{p}_{b'} - \mathbf{1})'$, with $\mathbf{p}$ being vectors of allele frequencies at $n$ markers in populations $b$ and $b'$. This is simply the genomic relationship of two pseudo-individuals whose genotypes are equal to twice the allele frequencies. We also show that this coding is invariant to the choice of reference alleles. In addition, standard population genetics metrics (inbreeding coefficients of various forms; $F_{ST}$ differentiation coefficients; segregation variance; and Nei's genetic distance) can be obtained from elements of matrix $\Gamma$.
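The cross-product in this abstract can be evaluated directly from a table of allele frequencies. The sketch below computes the matrix of relationships as (2/n) times the cross-product of the centred frequencies (2p - 1) and checks the stated invariance to the choice of reference allele. The frequency values and function name are made up for illustration.

```python
import numpy as np


def gamma_matrix(P):
    """Relationships across populations from allele frequencies.

    P has one row per population and one column per marker.
    Entry [b, b'] is (2 / n) * (2 p_b - 1) . (2 p_b' - 1), with n markers.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[1]
    Z = 2.0 * P - 1.0
    return (2.0 / n) * (Z @ Z.T)


# Hypothetical allele frequencies: 2 populations, 4 markers.
P = np.array([[0.5, 0.5, 0.5, 0.5],    # maximum-heterozygosity population
              [0.9, 0.2, 0.6, 1.0]])
gamma = gamma_matrix(P)

# Switching the reference allele at one marker replaces p by 1 - p in every
# population; the resulting matrix is unchanged.
P_flipped = P.copy()
P_flipped[:, 1] = 1.0 - P_flipped[:, 1]
gamma_flipped = gamma_matrix(P_flipped)
```

A population with all frequencies at 0.5 gets a self-relationship of 0, matching the maximum-heterozygosity base, while a population fixed at every marker gets 2.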


Subjects
Gene Frequency , Genetics, Population , Models, Genetic , Animals , Genetics, Population/methods , Heterozygote , Alleles , Genomics/methods , Genotype , Genome
12.
Elife ; 12, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38717010

ABSTRACT

Interacting molecules create regulatory architectures that can persist despite turnover of molecules. Although epigenetic changes occur within the context of such architectures, there is limited understanding of how they can influence the heritability of changes. Here, I develop criteria for the heritability of regulatory architectures and use quantitative simulations of interacting regulators parsed as entities, their sensors, and the sensed properties to analyze how architectures influence heritable epigenetic changes. Information contained in regulatory architectures grows rapidly with the number of interacting molecules and its transmission requires positive feedback loops. While these architectures can recover after many epigenetic perturbations, some resulting changes can become permanently heritable. Architectures that are otherwise unstable can become heritable through periodic interactions with external regulators, which suggests that mortal somatic lineages with cells that reproducibly interact with the immortal germ lineage could make a wider variety of architectures heritable. Differential inhibition of the positive feedback loops that transmit regulatory architectures across generations can explain the gene-specific differences in heritable RNA silencing observed in the nematode Caenorhabditis elegans. More broadly, these results provide a foundation for analyzing the inheritance of epigenetic changes within the context of the regulatory architectures implemented using diverse molecules in different living systems.


Subjects
Caenorhabditis elegans , Epigenesis, Genetic , Caenorhabditis elegans/genetics , Animals , Models, Genetic , Gene Regulatory Networks , Inheritance Patterns
13.
Stat Appl Genet Mol Biol ; 23(1), 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38753402

ABSTRACT

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
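For readers unfamiliar with the NMF step this abstract builds on, the sketch below factors a small count matrix with the classical multiplicative updates for the generalized Kullback-Leibler (Poisson) objective. This is plain, unparametrized NMF on made-up counts; it does not implement the di-nucleotide interaction signatures or the EM and quasi-Poisson regression procedure described by the authors, and all names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_divergence(V, WH):
    """Generalized Kullback-Leibler divergence between counts V and fit WH."""
    return float(np.sum(V * np.log((V + EPS) / (WH + EPS)) - V + WH))


def nmf_kl(V, rank, iters=200, seed=0):
    """Multiplicative-update NMF minimising the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1   # e.g. exposures per sample
    H = rng.random((rank, m)) + 0.1   # e.g. signatures over mutation types
    for _ in range(iters):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)
        WH = W @ H + EPS
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


# Made-up mutation counts: 4 samples by 4 mutation types.
counts = np.array([[5.0, 0.0, 3.0, 1.0],
                   [4.0, 1.0, 2.0, 0.0],
                   [0.0, 6.0, 1.0, 7.0],
                   [1.0, 5.0, 0.0, 6.0]])
W, H = nmf_kl(counts, rank=2)
```

A parametrized signature model would replace the free update of H with a constrained fit, trading flexibility for fewer parameters.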


Subjects
Algorithms , Mutation , Neoplasms , Humans , Neoplasms/genetics , Models, Genetic , Computer Simulation , Models, Statistical
14.
Bull Math Biol ; 86(6): 69, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714590

ABSTRACT

We unify evolutionary dynamics on graphs in strategic uncertainty through a decaying Bayesian update. Our analysis focuses on the Price theorem of selection, which governs replicator(-mutator) dynamics, based on a stratified interaction mechanism and a composite strategy update rule. Our findings suggest that the replication of a certain mutation in a strategy, leading to a shift from competition to cooperation in a well-mixed population, is equivalent to the replication of a strategy in a Bayesian-structured population without any mutation. Likewise, the replication of a strategy in a Bayesian-structured population with a certain mutation, resulting in a move from competition to cooperation, is equivalent to the replication of a strategy in a well-mixed population without any mutation. This equivalence holds when the transition rate from competition to cooperation is equal to the relative strength of selection acting on either competition or cooperation in relation to the selection differential between cooperators and competitors. Our research allows us to identify situations where cooperation is more likely, irrespective of the specific payoff levels. This approach provides new perspectives into the intended purpose of Price's equation, which was initially not designed for this type of analysis.
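As background for the Price theorem this abstract relies on, the sketch below checks the standard Price equation numerically: the change in the mean trait equals a selection term, Cov(w, z) divided by mean fitness, plus a transmission term, E[w (z' - z)] divided by mean fitness. The trait and fitness values are invented, and the sketch covers only the textbook identity, not the Bayesian-structured population model of the paper.

```python
def price_terms(z, w, z_offspring):
    """Return (selection, transmission) terms of the Price equation.

    z: parental trait values, w: fitness (offspring number) per parent,
    z_offspring: mean trait value of each parent's offspring.
    """
    n = len(z)
    w_bar = sum(w) / n
    z_bar = sum(z) / n
    cov = sum((wi - w_bar) * (zi - z_bar) for wi, zi in zip(w, z)) / n
    trans = sum(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring)) / n
    return cov / w_bar, trans / w_bar


def mean_change(z, w, z_offspring):
    """Directly computed change in the population mean trait across one generation."""
    total_w = sum(w)
    z_bar_next = sum(wi * zo for wi, zo in zip(w, z_offspring)) / total_w
    return z_bar_next - sum(z) / len(z)


# Invented example: 1 = cooperator, 0 = competitor, with imperfect transmission.
z = [1.0, 1.0, 0.0, 0.0]
w = [3.0, 2.0, 1.0, 2.0]
z_offspring = [0.9, 1.0, 0.1, 0.0]

selection, transmission = price_terms(z, w, z_offspring)
delta = mean_change(z, w, z_offspring)
```

The two terms sum to the directly computed change for any choice of values, which is what makes the equation an identity rather than a model.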


Subjects
Bayes Theorem , Biological Evolution , Game Theory , Mathematical Concepts , Models, Genetic , Mutation , Selection, Genetic , Computer Simulation , Cooperative Behavior , Competitive Behavior , Population Dynamics/statistics & numerical data , Models, Biological , Humans
15.
Bull Math Biol ; 86(6): 70, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38717656

ABSTRACT

Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample numbers (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrates the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow experimenters either to improve parameter certainty given a fixed number of samples, or to reduce sample quantity while retaining the same level of parameter certainty.


Subjects
Algorithms , Computer Simulation , Mathematical Concepts , Models, Biological , Monte Carlo Method , Likelihood Functions , Humans , Dose-Response Relationship, Drug , Research Design/statistics & numerical data , Models, Genetic , Uncertainty
16.
J Comput Biol ; 31(5): 445-457, 2024 May.
Article in English | MEDLINE | ID: mdl-38752891

ABSTRACT

An alternative transcription start site (ATSS) is a major driving force for increasing the complexity of transcripts in human tissues. As a transcriptional regulatory mechanism, ATSS has biological significance. Many studies have confirmed that ATSS plays an important role in diseases and cell development and differentiation. However, exploration of its dynamic mechanisms remains insufficient. Identifying ATSS change points during cell differentiation is critical for elucidating potential dynamic mechanisms. For relative ATSS usage as percentage data, the existing methods lack sensitivity to detect the change point for ATSS longitudinal data. In addition, some methods have strict requirements for data distribution and cannot be applied to deal with this problem. In this study, the Bayesian change point detection model was first constructed using reparameterization techniques for two parameters of a beta distribution for the percentage data type, and the posterior distributions of parameters and change points were obtained using Markov Chain Monte Carlo (MCMC) sampling. With comprehensive simulation studies, the performance of the Bayesian change point detection model is found to be consistently powerful and robust across most scenarios with different sample sizes and beta distributions. Second, differential ATSS events in the real data, whose change points were identified using our method, were clustered according to their change points. Last, for each change point, pathway and transcription factor motif analyses were performed on its differential ATSS events. The results of our analyses demonstrated the effectiveness of the Bayesian change point detection model and provided biological insights into cell differentiation.
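The abstract does not spell out its reparameterization, so the sketch below assumes the common mean-precision form of the beta distribution (alpha = mu * phi, beta = (1 - mu) * phi) and scans candidate change points by maximum likelihood with a fixed precision. This is a deliberately simplified stand-in: the paper obtains posterior distributions by MCMC, which is not reproduced here, and the data and names are invented.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of a beta distribution with mean mu and precision phi
    (assumed parameterization: alpha = mu * phi, beta = (1 - mu) * phi)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def best_change_point(ys, phi=50.0):
    """Return the split index k maximising the two-segment log-likelihood,
    with each segment mean set to its sample mean and phi held fixed."""
    best_k, best_ll = None, -math.inf
    for k in range(1, len(ys)):
        left, right = ys[:k], ys[k:]
        mu_l = sum(left) / len(left)
        mu_r = sum(right) / len(right)
        ll = (sum(beta_logpdf(y, mu_l, phi) for y in left)
              + sum(beta_logpdf(y, mu_r, phi) for y in right))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


# Invented relative ATSS usage over 10 time points, shifting after the fifth.
usage = [0.20, 0.22, 0.18, 0.21, 0.19, 0.70, 0.72, 0.68, 0.71, 0.69]
k_hat = best_change_point(usage)
```

Working with a mean and a precision keeps the change point interpretable as a shift in average usage, which is one reason this parameterization is popular for percentage data.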


Subjects
Bayes Theorem, Cell Differentiation, Transcription Initiation Site, Cell Differentiation/genetics, Humans, Markov Chains, Monte Carlo Method, Genetic Models, Algorithms, Computer Simulation
17.
Theor Appl Genet ; 137(6): 138, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38771334

ABSTRACT

KEY MESSAGE: Residual neural network genomic selection is the first GS algorithm to reach 35 layers, and its prediction accuracy surpasses previous algorithms. With the decrease in DNA sequencing costs and the development of deep learning, phenotype prediction accuracy by genomic selection (GS) continues to improve. Residual networks, a widely validated deep learning technique, are introduced to deep learning for GS. Since each locus has a different weighted impact on the phenotype, strided convolutions are more suitable for GS problems than pooling layers. Building on these innovations, we propose a GS deep learning algorithm, residual neural network for genomic selection (ResGS). ResGS is the first neural network to reach 35 layers in GS. In 15 cases from four public datasets, the prediction accuracy of ResGS is higher than that of ridge-regression best linear unbiased prediction, support vector regression, random forest, gradient boosting regressor, and deep neural network genomic prediction in most cases. ResGS performs well in dealing with gene-environment interaction. Phenotypes from other environments are imported into ResGS along with genetic data. The prediction results are much better than those obtained with genetic data alone as input, which demonstrates the effectiveness of GS multi-modal learning. Standard deviation is recommended as an auxiliary GS evaluation metric, which could improve the distribution of predicted results. Deep learning for GS, such as ResGS, is becoming more accurate in phenotype prediction.
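The contrast between strided convolutions and pooling can be illustrated with a small pure-Python sketch. It is not ResGS itself: the kernel weights, the toy genotype vector, and the function names are hypothetical, and a real network would learn the weights. The point is only that a strided convolution downsamples while weighting each position in the window differently, whereas max pooling discards which locus produced the value.

```python
def strided_conv1d(x, kernel, stride):
    """1D convolution (no padding) that also downsamples by `stride`."""
    k = len(kernel)
    return [sum(w * x[i + j] for j, w in enumerate(kernel))
            for i in range(0, len(x) - k + 1, stride)]

def max_pool1d(x, size):
    """Non-overlapping max pooling with window `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

# Hypothetical allele counts (0, 1, 2) at eight loci.
genotypes = [0, 2, 1, 0, 2, 2, 0, 1]
# Hypothetical weights: the second locus in each window matters more.
kernel = [0.1, 0.9]

conv = strided_conv1d(genotypes, kernel, stride=2)
pool = max_pool1d(genotypes, size=2)
```

Both outputs have four entries, but pooling maps the windows (1, 0) and (0, 1) to the same value, while the strided convolution keeps them apart because each locus carries its own weight.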


Subjects
Algorithms, Genomics, Computer Neural Networks, Phenotype, Genomics/methods, Genetic Models, Deep Learning, Gene-Environment Interaction, Genetic Selection
18.
Sci Rep ; 14(1): 11314, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760507

ABSTRACT

This paper focuses on the maximum speed at which biological evolution can occur. I derive inequalities that limit the rate of evolutionary processes driven by natural selection, mutations, or genetic drift. These rate limits link the variability in a population to evolutionary rates. In particular, high variances in the fitness of a population and of a quantitative trait allow for fast changes in the trait's average. In contrast, low variability makes a trait less susceptible to random changes due to genetic drift. The results in this article generalize Fisher's fundamental theorem of natural selection to dynamics that allow for mutations and genetic drift, via trade-off relations that constrain the evolutionary rates of arbitrary traits. The rate limits can be used to probe questions in various evolutionary biology and ecology settings. They apply, for instance, to trait dynamics within or across species or to the evolution of bacterial strains. They apply to any quantitative trait, e.g., from species' weights to the lengths of DNA strands.
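The link between variance and evolutionary speed can be checked numerically in the simplest setting. The sketch below assumes selection only, with replicator dynamics dp_i/dt = p_i (f_i - mean fitness) and fixed trait values per type. Under that assumption the textbook Price equation gives d<A>/dt = Cov(A, f), and the Cauchy-Schwarz inequality bounds its magnitude by the product of the standard deviations of trait and fitness. This is a simplified stand-in, not the paper's inequalities, which also cover mutations and drift; all names and numbers are hypothetical.

```python
import math

def selection_rate_and_bound(freqs, fitness, trait):
    """Return (Cov(trait, fitness), sd(trait) * sd(fitness)).

    Under selection-only replicator dynamics the covariance is the rate
    of change of the mean trait; the second value is its upper bound.
    """
    mean_f = sum(p * f for p, f in zip(freqs, fitness))
    mean_a = sum(p * a for p, a in zip(freqs, trait))
    cov = sum(p * (f - mean_f) * (a - mean_a)
              for p, f, a in zip(freqs, fitness, trait))
    var_f = sum(p * (f - mean_f) ** 2 for p, f in zip(freqs, fitness))
    var_a = sum(p * (a - mean_a) ** 2 for p, a in zip(freqs, trait))
    return cov, math.sqrt(var_f * var_a)

# Hypothetical population with three types.
freqs = [0.5, 0.3, 0.2]
fitness = [1.0, 1.2, 0.8]
trait = [2.0, 3.0, 2.5]

rate, bound = selection_rate_and_bound(freqs, fitness, trait)

# Cross-check: differentiate the mean trait directly with dp_i/dt = p_i (f_i - mean_f).
mean_fitness = sum(p * f for p, f in zip(freqs, fitness))
direct = sum(a * p * (f - mean_fitness) for p, f, a in zip(freqs, fitness, trait))
```

If either the trait or the fitness has zero variance, the bound is zero and the mean trait cannot move under selection, which is the intuition behind the variance-based limits described in the abstract.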


Subjects
Biological Evolution, Genetic Drift, Genetic Selection, Mutation, Genetic Models, Phenotype, Heritable Quantitative Trait, Molecular Evolution
19.
Phys Rev E ; 109(4-1): 044407, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38755817

ABSTRACT

All the cells of a multicellular organism are the product of cell divisions that trace out a single binary tree, the so-called cell lineage tree. Because cell divisions are accompanied by replication errors, the shape of the cell lineage tree is a key determinant of how somatic evolution, which can potentially lead to cancer, proceeds. Carcinogenesis requires the accumulation of a certain number of driver mutations. By mapping the accumulation of mutations into a graph theoretical problem, we present an exact numerical method to calculate the probability of collecting a given number of mutations and show that for low mutation rates it can be approximated with a simple analytical formula, which depends only on the distribution of the lineage lengths, and is dominated by the longest lineages. Our results are crucial in understanding how natural selection can shape the cell lineage trees of multicellular organisms and curtail somatic evolution.
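The claim that the longest lineages dominate at low mutation rates can be illustrated with a deliberately simplified model. The sketch assumes that each division along a single lineage independently adds one driver mutation with probability u, so the number of mutations on a lineage of a given length is binomial. It ignores the shared ancestry between lineages that the paper's exact graph-based method accounts for, and the function names and parameter values are hypothetical.

```python
from math import comb

def prob_at_least(m, length, u):
    """Exact binomial probability of at least m mutations in `length` divisions."""
    return sum(comb(length, k) * u ** k * (1.0 - u) ** (length - k)
               for k in range(m, length + 1))

def low_rate_approx(m, length, u):
    """Leading-order term for small u: C(length, m) * u**m."""
    return comb(length, m) * u ** m

u = 1e-4   # hypothetical per-division driver mutation probability
m = 3      # hypothetical number of drivers required

exact_short = prob_at_least(m, 50, u)
approx_short = low_rate_approx(m, 50, u)
exact_long = prob_at_least(m, 100, u)
```

Because the leading term grows like the lineage length raised to the power m, doubling the length from 50 to 100 divisions raises the probability roughly eightfold for m = 3 in this toy model.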


Subjects
Cell Lineage, Genetic Models, Mutation Accumulation, Mutation
20.
BMC Med Genomics ; 17(1): 132, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755654

ABSTRACT

BACKGROUND: Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the R² is a commonly used measure to evaluate prediction accuracy. While the R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates interpretation and comparison of results. METHODS: Based on large-scale genotype data from the UK Biobank, we compare three definitions of the R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived based on training data with European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. RESULTS: Our analysis shows that the choice of the R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes solely addresses the discriminative performance and always yields values between 0 and 1, definitions of the R² based on the mean squared prediction error (MSPE) with reference to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values indicating miscalibrated predictions for out-of-target populations. We argue that the choice of the most appropriate definition depends on the aim of PRS analysis - whether it primarily serves for risk stratification or also for individual phenotype prediction. Moreover, both correlation-based and MSPE-based definitions of R² can provide valuable complementary information. CONCLUSIONS: Awareness of the different definitions of the R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. It is recommended to explicitly state which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
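The difference between the correlation-based and MSPE-based definitions can be seen on a toy example. The sketch below implements the squared correlation and an MSPE-based R² relative to an intercept-only reference. Which mean the reference uses (test-set or training-set) is an assumption made here to show how variants of the MSPE-based definition can differ; the abstract does not spell out the exact three definitions, and the function names and data are hypothetical.

```python
from statistics import fmean

def r2_squared_correlation(observed, predicted):
    """Squared Pearson correlation: measures discrimination only."""
    mo, mp = fmean(observed), fmean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)

def r2_mspe(observed, predicted, reference_mean):
    """1 - MSPE(model) / MSPE(intercept-only model predicting reference_mean)."""
    mspe_model = fmean((o - p) ** 2 for o, p in zip(observed, predicted))
    mspe_ref = fmean((o - reference_mean) ** 2 for o in observed)
    return 1.0 - mspe_model / mspe_ref

# Hypothetical test population: predictions rank individuals perfectly
# but are shifted by +10 units (miscalibrated).
observed = [160.0, 165.0, 170.0, 175.0, 180.0]
predicted = [170.0, 175.0, 180.0, 185.0, 190.0]
training_mean = 172.0  # hypothetical mean phenotype in the training data

r2_corr = r2_squared_correlation(observed, predicted)
r2_test_ref = r2_mspe(observed, predicted, fmean(observed))
r2_train_ref = r2_mspe(observed, predicted, training_mean)
```

In this toy case the squared correlation is 1 because the ranking is perfect, while both MSPE-based values are negative because the shifted predictions are worse than simply predicting a constant mean. This matches the abstract's point that the two families answer different questions.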


Subjects
Genetic Models, Multifactorial Inheritance, Humans, Phenotype, Genetic Predisposition to Disease