Search | Secretaria de Estado da Saúde

1.

Maximum likelihood estimation and natural pairwise estimating equations are identical for three sequences and a symmetric 2-state substitution model.

Hobolth, Asger; Wiuf, Carsten.

Theor Popul Biol ; 156: 1-4, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38184209

ABSTRACT

Consider the problem of estimating the branch lengths in a symmetric 2-state substitution model with a known topology and a general, clock-like or star-shaped tree with three leaves. We show that the maximum likelihood estimates are analytically tractable and can be obtained from pairwise sequence comparisons. Furthermore, we demonstrate that this property does not generalize to larger state spaces, more complex models or larger trees. Our arguments are based on an enumeration of the free parameters of the model and the dimension of the minimal sufficient data vector. Our interest in this problem arose from discussions with our former colleague Freddy Bugge Christiansen.

Subjects

Evolution, Molecular; Models, Genetic; Likelihood Functions; Phylogeny
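The pairwise estimating equations mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard symmetric 2-state model, in which two sequences separated by a path of length d (expected substitutions per site) differ at a site with probability (1 - exp(-2d))/2, and a general unrooted tree with three leaves, so that each pairwise distance is the sum of two branch lengths. The function names are hypothetical, the clock-like and star-shaped cases are not covered, and whether this matches the exact form used in the paper is an assumption.

```python
import math


def proportion_different(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distance(p):
    """Invert p = (1 - exp(-2d)) / 2 to obtain the pairwise distance d."""
    if not 0.0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5) for a finite distance")
    return -0.5 * math.log(1.0 - 2.0 * p)


def branch_lengths_from_proportions(p12, p13, p23):
    """Solve d_ij = t_i + t_j for the three branch lengths (t1, t2, t3).

    No non-negativity constraint is imposed, so noisy data can yield
    negative values; handling that case is outside this sketch.
    """
    d12 = pairwise_distance(p12)
    d13 = pairwise_distance(p13)
    d23 = pairwise_distance(p23)
    t1 = 0.5 * (d12 + d13 - d23)
    t2 = 0.5 * (d12 + d23 - d13)
    t3 = 0.5 * (d13 + d23 - d12)
    return t1, t2, t3
```

In use, the three proportions would come from `proportion_different` applied to each pair of aligned binary sequences, and the result would be read as the branch lengths leading to leaves 1, 2 and 3.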

2.

Coalescent models derived from birth-death processes.

Crespo, Fausto F; Posada, David; Wiuf, Carsten.

Theor Popul Biol ; 142: 1-11, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34563554

ABSTRACT

A coalescent model of a sample of size n is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size n is taken. The parameters and time of the birth-death process are scaled in N0, the size of the present-day population, while letting N0 → ∞, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size n are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model assuming the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model that combines the exactness of the LBD coalescent model with the speed of the approximations.

Subjects

Genetics, Population; Models, Genetic; Population Density; Sample Size; Stochastic Processes

3.

General theory for stochastic admixture graphs and F-statistics.

Soraggi, Samuele; Wiuf, Carsten.

Theor Popul Biol ; 125: 56-66, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30562538

ABSTRACT

We provide a general mathematical framework based on the theory of graphical models to study admixture graphs. Admixture graphs are used to describe the ancestral relationships between past and present populations, allowing for population merges and migration events, by means of gene flow. We give various mathematical properties of admixture graphs with particular focus on properties of the so-called F-statistics. Also the Wright-Fisher model is studied and a general expression for the loss of heterozygosity is derived.

Subjects

Genetic Drift; Genetics, Population; Stochastic Processes; Genetics, Population/statistics & numerical data; Heterozygote; Humans; Models, Theoretical

4.

Some properties of the conditioned reconstructed process with Bernoulli sampling.

Wiuf, Carsten.

Theor Popul Biol ; 122: 36-45, 2018 Jul.

Article in English | MEDLINE | ID: mdl-29452133

ABSTRACT

In many areas of genetics it is of relevance to consider a population of individuals that is founded by a single individual in the past. One model for such a scenario is the conditioned reconstructed process with Bernoulli sampling that describes the evolution of a population of individuals that originates from a single individual. Several aspects of this reconstructed process are studied, in particular the Markov structure of the process. It is shown that at any given time in the past, the conditioned reconstructed process behaves as the original conditioned reconstructed process after a suitable time-dependent change of the sampling probability. Additionally, it is discussed how mutations accumulate in a sample of particles. It is shown that random sampling of particles at the present time has the effect of making the mutation rate look time-dependent. Conditions are given under which this sampling effect is negligible. A possible extension of the reconstructed process that allows for multiple founding particles is discussed.

Subjects

Binomial Distribution; Genetics, Population; Models, Genetic; Probability; Algorithms; Birth Rate; Genealogy and Heraldry; Humans; Markov Chains; Mortality; Mutation

5.

Identifying parameter regions for multistationarity.

Conradi, Carsten; Feliu, Elisenda; Mincheva, Maya; Wiuf, Carsten.

PLoS Comput Biol ; 13(10): e1005751, 2017 Oct.

Article in English | MEDLINE | ID: mdl-28972969

ABSTRACT

Mathematical modelling has become an established tool for studying the dynamics of biological systems. Current applications range from building models that reproduce quantitative data to identifying systems with predefined qualitative features, such as switching behaviour, bistability or oscillations. Mathematically, the latter question amounts to identifying parameter values associated with a given qualitative feature. We introduce a procedure to partition the parameter space of a parameterized system of ordinary differential equations into regions for which the system has a unique or multiple equilibria. The procedure is based on the computation of the Brouwer degree, and it creates a multivariate polynomial with parameter depending coefficients. The signs of the coefficients determine parameter regions with and without multistationarity. A particular strength of the procedure is the avoidance of numerical analysis and parameter sampling. The procedure consists of a number of steps. Each of these steps might be addressed algorithmically using various computer programs and available software, or manually. We demonstrate our procedure on several models of gene transcription and cell signalling, and show that in many cases we obtain a complete partitioning of the parameter space with respect to multistationarity.

Subjects

Algorithms; Data Interpretation, Statistical; Models, Biological; Models, Statistical; Multivariate Analysis; Computer Simulation

6.

Estimation of the covariance structure from SNP allele frequencies.

van Waaij, Jan; Li, Zilong; Wiuf, Carsten.

Stat Appl Genet Mol Biol ; 21(1), 2022 May 26.

Article in English | MEDLINE | ID: mdl-35634906

Subjects

Gene Frequency; Sequence Analysis, DNA

7.

A bimodular mechanism of calcium control in eukaryotes.

Tidow, Henning; Poulsen, Lisbeth R; Andreeva, Antonina; Knudsen, Michael; Hein, Kim L; Wiuf, Carsten; Palmgren, Michael G; Nissen, Poul.

Nature ; 491(7424): 468-72, 2012 Nov 15.

Article in English | MEDLINE | ID: mdl-23086147

ABSTRACT

Calcium ions (Ca²⁺) have an important role as secondary messengers in numerous signal transduction processes, and cells invest much energy in controlling and maintaining a steep gradient between intracellular (~0.1-micromolar) and extracellular (~2-millimolar) Ca²⁺ concentrations. Calmodulin-stimulated calcium pumps, which include the plasma-membrane Ca²⁺-ATPases (PMCAs), are key regulators of intracellular Ca²⁺ in eukaryotes. They contain a unique amino- or carboxy-terminal regulatory domain responsible for autoinhibition, and binding of calcium-loaded calmodulin to this domain releases autoinhibition and activates the pump. However, the structural basis for the activation mechanism is unknown and a key remaining question is how calmodulin-mediated PMCA regulation can cover both basal Ca²⁺ levels in the nanomolar range as well as micromolar-range Ca²⁺ transients generated by cell stimulation. Here we present an integrated study combining the determination of the high-resolution crystal structure of a PMCA regulatory-domain/calmodulin complex with in vivo characterization and biochemical, biophysical and bioinformatics data that provide mechanistic insights into a two-step PMCA activation mechanism mediated by calcium-loaded calmodulin. The structure shows the entire PMCA regulatory domain and reveals an unexpected 2:1 stoichiometry with two calcium-loaded calmodulin molecules binding to different sites on a long helix. A multifaceted characterization of the role of both sites leads to a general structural model for calmodulin-mediated regulation of PMCAs that allows stringent, highly responsive control of intracellular calcium in eukaryotes, making it possible to maintain a stable, basal level at a threshold Ca²⁺ concentration, where steep activation occurs.

Subjects

Arabidopsis Proteins/chemistry; Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Calcium-Transporting ATPases/chemistry; Calcium-Transporting ATPases/metabolism; Calcium/metabolism; Calmodulin/chemistry; Eukaryota/metabolism; Amino Acid Sequence; Arabidopsis/chemistry; Arabidopsis/enzymology; Arabidopsis Proteins/genetics; Binding Sites; Calcium-Transporting ATPases/genetics; Calmodulin/metabolism; Enzyme Activation; Intracellular Space/chemistry; Intracellular Space/metabolism; Models, Molecular; Molecular Sequence Data; Protein Binding; Protein Structure, Tertiary; Sequence Alignment

8.

LandScape: a simple method to aggregate p-values and other stochastic variables without a priori grouping.

Wiuf, Carsten; Schaumburg-Müller Pallesen, Jonatan; Foldager, Leslie; Grove, Jakob.

Stat Appl Genet Mol Biol ; 15(4): 349-61, 2016 Aug 01.

Article in English | MEDLINE | ID: mdl-27269897

ABSTRACT

In many areas of science it is customary to perform many, potentially millions, of tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or by sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the FWER is controlled in the strong sense. Validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregation of p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).

Subjects

Models, Theoretical; Software; Algorithms; Computer Simulation; Data Interpretation, Statistical; Internet; Reproducibility of Results

9.

Intermediates and Generic Convergence to Equilibria.

de Freitas, Michael Marcondes; Wiuf, Carsten; Feliu, Elisenda.

Bull Math Biol ; 79(7): 1662-1686, 2017 Jul.

Article in English | MEDLINE | ID: mdl-28620882

ABSTRACT

Known graphical conditions for the generic and global convergence to equilibria of the dynamical system arising from a reaction network are shown to be invariant under the so-called successive removal of intermediates, a systematic procedure to simplify the network, making the graphical conditions considerably easier to check.

Subjects

Genetic Variation; Models, Theoretical; Humans

10.

Graphical reduction of reaction networks by linear elimination of species.

Sáez, Meritxell; Wiuf, Carsten; Feliu, Elisenda.

J Math Biol ; 74(1-2): 195-237, 2017 Jan.

Article in English | MEDLINE | ID: mdl-27221101

ABSTRACT

The quasi-steady state approximation and time-scale separation are commonly applied methods to simplify models of biochemical reaction networks based on ordinary differential equations (ODEs). The concentrations of the "fast" species are assumed effectively to be at steady state with respect to the "slow" species. Under this assumption the steady state equations can be used to eliminate the "fast" variables and a new ODE system with only the slow species can be obtained. We interpret a reduced system obtained by time-scale separation as the ODE system arising from a unique reaction network, by identification of a set of reactions and the corresponding rate functions. The procedure is graphically based and can easily be worked out by hand for small networks. For larger networks, we provide a pseudo-algorithm. We study properties of the reduced network, its kinetics and conservation laws, and show that the kinetics of the reduced network fulfil realistic assumptions, provided the original network does. We illustrate our results using biological examples such as substrate mechanisms, post-translational modification systems and networks with intermediate (transient) steps.

Subjects

Biochemical Phenomena/physiology; Models, Biological; Algorithms; Kinetics; Protein Processing, Post-Translational/physiology

11.

Intermediates, catalysts, persistence, and boundary steady states.

Marcondes de Freitas, Michael; Feliu, Elisenda; Wiuf, Carsten.

J Math Biol ; 74(4): 887-932, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27480320

ABSTRACT

For dynamical systems arising from chemical reaction networks, persistence is the property that each species concentration remains positively bounded away from zero, as long as species concentrations were all positive in the beginning. We describe two graphical procedures for simplifying reaction networks without breaking known necessary or sufficient conditions for persistence, by iteratively removing so-called intermediates and catalysts from the network. The procedures are easy to apply and, in many cases, lead to highly simplified network structures, such as monomolecular networks. For specific classes of reaction networks, we show that these conditions for persistence are equivalent to one another. Furthermore, they can also be characterized by easily checkable strong connectivity properties of a related graph. In particular, this is the case for (conservative) monomolecular networks, as well as cascades of a large class of post-translational modification systems (of which the MAPK cascade and the n-site futile cycle are prominent examples). Since one of the aforementioned sufficient conditions for persistence precludes the existence of boundary steady states, our method also provides a graphical tool to check for that.

Subjects

Biochemical Phenomena/physiology; Chemistry Techniques, Analytical/methods; MAP Kinase Signaling System/physiology; Protein Processing, Post-Translational/physiology

12.

Lyapunov Functions, Stationary Distributions, and Non-equilibrium Potential for Reaction Networks.

Anderson, David F; Craciun, Gheorghe; Gopalkrishnan, Manoj; Wiuf, Carsten.

Bull Math Biol ; 77(9): 1744-67, 2015 Sep.

Article in English | MEDLINE | ID: mdl-26376889

ABSTRACT

We consider the relationship between stationary distributions for stochastic models of reaction systems and Lyapunov functions for their deterministic counterparts. Specifically, we derive the well-known Lyapunov function of reaction network theory as a scaling limit of the non-equilibrium potential of the stationary distribution of stochastically modeled complex balanced systems. We extend this result to general birth-death models and demonstrate via example that similar scaling limits can yield Lyapunov functions even for models that are not complex or detailed balanced, and may even have multiple equilibria.

Subjects

Models, Biological; Kinetics; Markov Chains; Mathematical Concepts; Metabolic Networks and Pathways; Population Dynamics/statistics & numerical data; Stochastic Processes
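The scaling limit described in the abstract above can be sketched as a short worked calculation. It assumes the standard setting in which a complex balanced system with positive deterministic equilibrium c = (c_1, ..., c_n) has a product-form Poisson stationary distribution under volume scaling V. The symbols V, c and x̃ are notation chosen here rather than taken from the abstract, and restrictions of the state space by conservation relations are ignored.

```latex
% Assumed product-form Poisson stationary distribution at volume V
\pi_V(x) \;=\; \prod_{i=1}^{n} e^{-V c_i}\,\frac{(V c_i)^{x_i}}{x_i!},
\qquad x \in \mathbb{Z}_{\ge 0}^{n}.

% Evaluate the non-equilibrium potential at x = V\tilde{x}
-\frac{1}{V}\ln \pi_V(V\tilde{x})
 \;=\; \sum_{i=1}^{n}\Big[\, c_i \;-\; \tilde{x}_i \ln(V c_i)
        \;+\; \frac{1}{V}\ln\big((V\tilde{x}_i)!\big) \Big].

% Stirling: \ln(m!) = m\ln m - m + O(\ln m), with m = V\tilde{x}_i
\frac{1}{V}\ln\big((V\tilde{x}_i)!\big)
 \;=\; \tilde{x}_i \ln V + \tilde{x}_i \ln \tilde{x}_i - \tilde{x}_i
       + O\!\Big(\frac{\ln V}{V}\Big).

% The \tilde{x}_i \ln V terms cancel, leaving the limit V \to \infty
\lim_{V\to\infty} -\frac{1}{V}\ln \pi_V(V\tilde{x})
 \;=\; \sum_{i=1}^{n}\Big[\, \tilde{x}_i\big(\ln \tilde{x}_i - \ln c_i - 1\big) + c_i \Big].
```

The right-hand side has the form of the classical Lyapunov function of reaction network theory; it is non-negative and vanishes only at x̃ = c.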

13.

A computational method to preclude multistationarity in networks of interacting species.

Feliu, Elisenda; Wiuf, Carsten.

Bioinformatics ; 29(18): 2327-34, 2013 Sep 15.

Article in English | MEDLINE | ID: mdl-23842805

ABSTRACT

MOTIVATION: Modeling and analysis of complex systems are important aspects of understanding systemic behavior. In the absence of detailed knowledge about a system, we often choose modeling equations out of convenience and search the (high-dimensional) parameter space randomly to learn about model properties. Qualitative modeling sidesteps the issue of choosing specific modeling equations and frees the inference from specific properties of the equations. We consider classes of ordinary differential equation (ODE) models arising from interactions of species/entities, such as (bio)chemical reaction networks or ecosystems. A class is defined by imposing mild assumptions on the interaction rates. In this framework, we investigate whether there can be multiple positive steady states in some ODE models in a given class. RESULTS: We have developed and implemented a method to decide whether any ODE model in a given class cannot have multiple steady states. The method runs efficiently on models of moderate size. We tested the method on a large set of models for gene silencing by sRNA interference and on two publicly available databases of biological models, KEGG and Biomodels. We recommend that this method be used as (i) a pre-screening step for selecting an appropriate model and (ii) for investigating the robustness of non-existence of multiple steady states for a given ODE model with respect to variation in interaction rates. AVAILABILITY AND IMPLEMENTATION: Scripts and examples in Maple are available in the Supplementary Information. CONTACT: wiuf@math.ku.dk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Models, Biological; Systems Biology/methods; Algorithms; Computational Biology/methods; Phosphorylation; RNA Interference

14.

Phosphorelays provide tunable signal processing capabilities for the cell.

Kothamachu, Varun B; Feliu, Elisenda; Wiuf, Carsten; Cardelli, Luca; Soyer, Orkun S.

PLoS Comput Biol ; 9(11): e1003322, 2013.

Article in English | MEDLINE | ID: mdl-24244132

ABSTRACT

Achieving a complete understanding of cellular signal transduction requires deciphering the relation between structural and biochemical features of a signaling system and the shape of the signal-response relationship it embeds. Using explicit analytical expressions and numerical simulations, we present here this relation for four-layered phosphorelays, which are signaling systems that are ubiquitous in prokaryotes and also found in lower eukaryotes and plants. We derive an analytical expression that relates the shape of the signal-response relationship in a relay to the kinetic rates of forward, reverse phosphorylation and hydrolysis reactions. This reveals a set of mathematical conditions which, when satisfied, dictate the shape of the signal-response relationship. We find that a specific topology also observed in nature can satisfy these conditions in such a way as to allow plasticity among hyperbolic and sigmoidal signal-response relationships. Particularly, the shape of the signal-response relationship of this relay topology can be tuned by altering kinetic rates and total protein levels at different parts of the relay. These findings provide an important step towards predicting response dynamics of phosphorelays, and the nature of subsequent physiological responses that they mediate, solely from topological features and a few composite measurements; measuring the ratio of reverse and forward phosphorylation rate constants could be sufficient to determine the shape of the signal-response relationship the relay exhibits. Furthermore, they highlight the potential ways in which selective pressures on signal processing could have played a role in the evolution of the observed structural and biochemical characteristics in phosphorelays.

Subjects

Models, Biological; Phosphorylation/physiology; Signal Transduction/physiology; Computational Biology; Hydrolysis

15.

Cellular compartments cause multistability and allow cells to process more information.

Harrington, Heather A; Feliu, Elisenda; Wiuf, Carsten; Stumpf, Michael P H.

Biophys J ; 104(8): 1824-31, 2013 Apr 16.

Article in English | MEDLINE | ID: mdl-23601329

ABSTRACT

Many biological, physical, and social interactions have a particular dependence on where they take place; e.g., in living cells, protein movement between the nucleus and cytoplasm affects cellular responses (i.e., proteins must be present in the nucleus to regulate their target genes). Here we use recent developments from dynamical systems and chemical reaction network theory to identify and characterize the key role of the spatial organization of eukaryotic cells in cellular information processing. In particular, the existence of distinct compartments plays a pivotal role in whether a system is capable of multistationarity (multiple response states), and is thus directly linked to the amount of information that the signaling molecules can represent in the nucleus. Multistationarity provides a mechanism for switching between different response states in cell signaling systems and enables multiple outcomes for cellular decision-making. We combine different mathematical techniques to provide a heuristic procedure to determine if a system has the capacity for multiple steady states, and find conditions that ensure that multiple steady states cannot occur. Notably, we find that introducing species localization can alter the capacity for multistationarity, and we mathematically demonstrate that shuttling confers flexibility for and greater control of the emergence of an all-or-nothing response of a cell.

Subjects

Cell Compartmentation; Signal Transduction; Cellular Structures/metabolism; Information Theory; Models, Biological

16.

Common variants at VRK2 and TCF4 conferring risk of schizophrenia.

Steinberg, Stacy; de Jong, Simone; Andreassen, Ole A; Werge, Thomas; Børglum, Anders D; Mors, Ole; Mortensen, Preben B; Gustafsson, Omar; Costas, Javier; Pietiläinen, Olli P H; Demontis, Ditte; Papiol, Sergi; Huttenlocher, Johanna; Mattheisen, Manuel; Breuer, René; Vassos, Evangelos; Giegling, Ina; Fraser, Gillian; Walker, Nicholas; Tuulio-Henriksson, Annamari; Suvisaari, Jaana; Lönnqvist, Jouko; Paunio, Tiina; Agartz, Ingrid; Melle, Ingrid; Djurovic, Srdjan; Strengman, Eric; Jürgens, Gesche; Glenthøj, Birte; Terenius, Lars; Hougaard, David M; Ørntoft, Torben; Wiuf, Carsten; Didriksen, Michael; Hollegaard, Mads V; Nordentoft, Merete; van Winkel, Ruud; Kenis, Gunter; Abramova, Lilia; Kaleda, Vasily; Arrojo, Manuel; Sanjuán, Julio; Arango, Celso; Sperling, Swetlana; Rossner, Moritz; Ribolsi, Michele; Magni, Valentina; Siracusano, Alberto; Christiansen, Claus; Kiemeney, Lambertus A.

Hum Mol Genet ; 20(20): 4076-81, 2011 Oct 15.

Article in English | MEDLINE | ID: mdl-21791550

ABSTRACT

Common sequence variants have recently joined rare structural polymorphisms as genetic factors with strong evidence for association with schizophrenia. Here we extend our previous genome-wide association study and meta-analysis (totalling 7 946 cases and 19 036 controls) by examining an expanded set of variants using an enlarged follow-up sample (up to 10 260 cases and 23 500 controls). In addition to previously reported alleles in the major histocompatibility complex region, near neurogranin (NRGN) and in an intron of transcription factor 4 (TCF4), we find two novel variants showing genome-wide significant association: rs2312147[C], upstream of vaccinia-related kinase 2 (VRK2) [odds ratio (OR) = 1.09, P = 1.9 × 10⁻⁹] and rs4309482[A], between coiled-coil domain containing 68 (CCDC68) and TCF4, about 400 kb from the previously described risk allele, but not accounted for by its association (OR = 1.09, P = 7.8 × 10⁻⁹).

Subjects

Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics; Polymorphism, Single Nucleotide; Protein Serine-Threonine Kinases/genetics; Schizophrenia/genetics; Transcription Factors/genetics; Alleles; Genetic Predisposition to Disease; Genome-Wide Association Study; Genotype; Humans; Risk; Transcription Factor 4

17.

Variable elimination in post-translational modification reaction networks with mass-action kinetics.

Feliu, Elisenda; Wiuf, Carsten.

J Math Biol ; 66(1-2): 281-310, 2013 Jan.

Article in English | MEDLINE | ID: mdl-22311196

ABSTRACT

We define a subclass of chemical reaction networks called post-translational modification systems. Important biological examples of such systems include MAPK cascades and two-component systems which are well-studied experimentally as well as theoretically. The steady states of such a system are solutions to a system of polynomial equations. Even for small systems the task of finding the solutions is daunting. We develop a mathematical framework based on the notion of a cut (a particular subset of species in the system), which provides a linear elimination procedure to reduce the number of variables in the system to a set of core variables. The steady states are parameterized algebraically by the core variables, and graphical conditions for when steady states with positive core variables imply positivity of all variables are given. Further, minimal cuts are the connected components of the species graph and provide conservation laws. A criterion for when a (maximal) set of independent conservation laws can be derived from cuts is given.

Subjects

Models, Biological; Protein Processing, Post-Translational; Signal Transduction; Kinetics; Linear Models; MAP Kinase Signaling System; Mathematical Concepts; Metabolic Networks and Pathways

18.

A reaction network scheme for hidden Markov model parameter learning.

Wiuf, Carsten; Behera, Abhishek; Singh, Abhinav; Gopalkrishnan, Manoj.

J R Soc Interface ; 20(203): 20220877, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37340782

ABSTRACT

With a view towards artificial cells, molecular communication systems, molecular multiagent systems and federated learning, we propose a novel reaction network scheme (termed the Baum-Welch (BW) reaction network) that learns parameters for hidden Markov models (HMMs). All variables including inputs and outputs are encoded by separate species. Each reaction in the scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every positive fixed point of the BW algorithm for HMMs is a fixed point of the reaction network scheme, and vice versa. Furthermore, we prove that the 'expectation' step and the 'maximization' step of the reaction network separately converge exponentially fast and compute the same values as the E-step and the M-step of the BW algorithm. We simulate example sequences, and show that our reaction network learns the same parameters for the HMM as the BW algorithm, and that the log-likelihood increases continuously along the trajectory of the reaction network.

Subjects

Algorithms; Markov Chains

19.

Evaluation of population structure inferred by principal component analysis or the admixture model.

van Waaij, Jan; Li, Song; Garcia-Erill, Genís; Albrechtsen, Anders; Wiuf, Carsten.

Genetics ; 225(2), 2023 Oct 04.

Article in English | MEDLINE | ID: mdl-37611212

ABSTRACT

Principal component analysis (PCA) is commonly used in genetics to infer and visualize population structure and admixture between populations. PCA is often interpreted in a way similar to inferred admixture proportions, where it is assumed that individuals belong to one of several possible populations or are admixed between these populations. We propose a new method to assess the statistical fit of PCA (interpreted as a model spanned by the top principal components) and to show that violations of the PCA assumptions affect the fit. Our method uses the chosen top principal components to predict the genotypes. By assessing the covariance (and the correlation) of the residuals (the differences between observed and predicted genotypes), we are able to detect violation of the model assumptions. Based on simulations and genome-wide human data, we show that our assessment of fit can be used to guide the interpretation of the data and to pinpoint individuals that are not well represented by the chosen principal components. Our method works equally on other similar models, such as the admixture model, where the mean of the data is represented by linear matrix decomposition.

20.

Estimating admixture pedigrees of recent hybrids without a contiguous reference genome.

Garcia-Erill, Genís; Hanghøj, Kristian; Heller, Rasmus; Wiuf, Carsten; Albrechtsen, Anders.

Mol Ecol Resour ; 23(7): 1604-1619, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37400991

ABSTRACT

The genome of recently admixed individuals or hybrids has characteristic genetic patterns that can be used to learn about their recent admixture history. One of these are patterns of interancestry heterozygosity, which can be inferred from SNP data from either called genotypes or genotype likelihoods, without the need for information on genomic location. This makes them applicable to a wide range of data that are often used in evolutionary and conservation genomic studies, such as low-depth sequencing mapped to scaffolds and reduced representation sequencing. Here we implement maximum likelihood estimation of interancestry heterozygosity patterns using two complementary models. We furthermore develop apoh (Admixture Pedigrees of Hybrids), a software that uses estimates of paired ancestry proportions to detect recently admixed individuals or hybrids, and to suggest possible admixture pedigrees. It furthermore calculates several hybrid indices that make it easier to identify and rank possible admixture pedigrees that could give rise to the estimated patterns. We implemented apoh both as a command line tool and as a Graphical User Interface that allows the user to automatically and interactively explore, rank and visualize compatible recent admixture pedigrees, and calculate the different summary indices. We validate the performance of the method using admixed family trios from the 1000 Genomes Project. In addition, we show its applicability on identifying recent hybrids from RAD-seq data of Grant's gazelle (Nanger granti and Nanger petersii) and whole genome low-depth data of waterbuck (Kobus ellipsiprymnus) which shows complex admixture of up to four populations.

Subjects

Genetics, Population; Genome; Humans; Pedigree; Genome/genetics; Genotype; Software
