Results 1 - 9 of 9
1.
BMC Bioinformatics ; 25(1): 151, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627634

ABSTRACT

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). They identify the number of comparably homogeneous regions within autocorrelated observed sequences. These are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. In regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along the human chromosome 1 and their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.
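
The ordering constraint described in this abstract (a hidden state may only be followed by itself or by a state adjacent in the ordering of the emission means) corresponds to a tridiagonal transition matrix. The following is a minimal Python sketch of that idea only; the function name `ordered_transition_matrix` and the single shared switching probability `p_switch` are illustrative assumptions, not the parameterisation or API of the oHMMed R package.

```python
def ordered_transition_matrix(n_states, p_switch):
    """Build a tridiagonal transition matrix for an ordered HMM.

    Hypothetical helper: every state moves to each adjacent state with the
    same probability p_switch (an assumption made for illustration) and
    otherwise stays where it is. Jumps over a neighbour get probability 0.
    """
    if n_states < 2:
        raise ValueError("need at least two hidden states")
    if not 0.0 <= p_switch <= 0.5:
        raise ValueError("p_switch must lie in [0, 0.5]")

    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # Interior states have two neighbours, boundary states have one.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        for j in neighbours:
            matrix[i][j] = p_switch
        matrix[i][i] = 1.0 - p_switch * len(neighbours)
    return matrix


if __name__ == "__main__":
    for row in ordered_transition_matrix(4, 0.1):
        print(row)
```

With states sorted by emission mean, such a matrix forces the labelled sequence to pass through intermediate states rather than jump between distant ones.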


Subjects
Genome, Genomics, Animals, Humans, Mice, Markov Chains, Base Composition, Probability, Algorithms
2.
Theor Popul Biol ; 157: 55-85, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38552964

ABSTRACT

In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by Hidden-Markov-Model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele spectra is explored, and sufficient statistics for stepwise changes in population size are found. Further, convergence behaviours of the polymorphic sample spectra for population size changes on different time scales are examined and discussed within the context of inference of the effective population size. Joint visual assessment of the sample spectra and the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.


Subjects
Algorithms, Gene Frequency, Population Density, Stochastic Processes, Population Genetics, Genetic Models, Markov Chains, Humans
3.
Animals (Basel) ; 11(7), 2021 Jun 23.
Article in English | MEDLINE | ID: mdl-34201584

ABSTRACT

Housing and management conditions strongly influence the health, welfare and behaviour of horses. Consequently, objective and quantifiable comparisons between domestic environments and their influence on different equine demographics are needed to establish evidence-based criteria to assess and optimize horse welfare. Therefore, the present study aimed to measure and compare the time budgets (=percentage of time spent on specific activities) of horses with chronic orthopaedic disease and geriatric (≥20 years) horses living in different husbandry systems using an automated tracking device. Horses spent 42% (range 38.3-44.8%) of their day eating, 39% (range 36.87-44.9%) resting, and 19% (range 17-20.4%) in movement, demonstrating that geriatric horses and horses suffering from chronic orthopaedic disease can exhibit behaviour time budgets equivalent to healthy controls. Time budget analysis revealed significant differences between farms, turn-out conditions and time of day, and could identify potential areas for improvement. Horses living in open-air group housing on a paddock had a more uniform temporal distribution of feeding and movement activities with less pronounced peaks compared to horses living in more restricted husbandry systems.

4.
J Theor Biol ; 439: 166-180, 2018 Feb 14.
Article in English | MEDLINE | ID: mdl-29229523

ABSTRACT

A central aim of population genetics is the inference of the evolutionary history of a population. To this end, the underlying process can be represented by a model of the evolution of allele frequencies parametrized by e.g., the population size, mutation rates and selection coefficients. A large class of models use forward-in-time models, such as the discrete Wright-Fisher and Moran models and the continuous forward diffusion, to obtain distributions of population allele frequencies, conditional on an ancestral initial allele frequency distribution. Backward-in-time diffusion processes have been rarely used in the context of parameter inference. Here, we demonstrate how forward and backward diffusion processes can be combined to efficiently calculate the exact joint probability distribution of sample and population allele frequencies at all times in the past, for both discrete and continuous population genetics models. This procedure is analogous to the forward-backward algorithm of hidden Markov models. While the efficiency of discrete models is limited by the population size, for continuous models it suffices to expand the transition density in orthogonal polynomials of the order of the sample size to infer marginal likelihoods of population genetic parameters. Additionally, conditional allele trajectories and marginal likelihoods of samples from single populations or from multiple populations that split in the past can be obtained. The described approaches allow for efficient maximum likelihood inference of population genetic parameters in a wide variety of demographic scenarios.


Subjects
Population Genetics/methods, Genetic Models, Algorithms, Biological Evolution, Gene Frequency, Likelihood Functions, Markov Chains, Methods, Population Density, Time
5.
Theor Popul Biol ; 98: 19-27, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25453604

ABSTRACT

The distribution of allele frequencies of a large number of biallelic sites is known as "allele-frequency spectrum" or "site-frequency spectrum" (SFS). Without selection and in regions of relatively high recombination rates, sites may be assumed to be independently and identically distributed. With a beta equilibrium distribution of allelic proportions and binomial sampling, a beta-binomial compound likelihood for each site results. The likelihood of the data and the posterior distribution of two parameters, scaled mutation rate θ and mutation bias α, is investigated in the general case and for small scaled mutation rates θ. In the general case, an expectation-maximization (EM) algorithm is derived to obtain maximum likelihood estimates of both parameters. With an appropriate prior distribution, a Markov chain Monte Carlo sampler to integrate the posterior distribution is also derived. As far as I am aware, previous maximum likelihood or Bayesian estimators of θ, explicitly or implicitly assume small scaled mutation rates, i.e., θ≪1. For θ≪1, maximum-likelihood estimators are also derived for both parameters using a Taylor series expansion of the beta-binomial distribution. The estimator of θ is a variant of the Ewens-Watterson estimator and of the maximum likelihood estimator derived with the Poisson Random Field approach. With a conjugate prior distribution, marginal and conditional beta posterior distributions are also derived for both parameters.
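
The beta-binomial compound likelihood mentioned in this abstract can be sketched in a few lines of Python. The abstract does not state how θ and α map onto the shape parameters of the beta distribution; the sketch below assumes shapes αθ and (1 − α)θ, and the function names are hypothetical. It is an illustration of the likelihood only, not the EM algorithm or the MCMC sampler derived in the article.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k, n, a, b):
    """Log-probability of k focal alleles in a sample of size n."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def sfs_log_likelihood(counts, n, theta, alpha):
    """Log-likelihood of a site-frequency spectrum under independent sites.

    counts[k] is the number of sites with k focal alleles among n sampled.
    ASSUMPTION: the beta equilibrium distribution has shape parameters
    alpha * theta and (1 - alpha) * theta; the abstract does not say so.
    """
    a = alpha * theta
    b = (1.0 - alpha) * theta
    return sum(
        c * beta_binomial_logpmf(k, n, a, b)
        for k, c in enumerate(counts)
        if c
    )
```

Because sites are treated as independent and identically distributed, the spectrum's log-likelihood is a count-weighted sum of per-site log-probabilities.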


Subjects
Genetic Models, Mutation Rate, Algorithms, Alleles, Bayes Theorem, Population Genetics, Humans, Likelihood Functions, Markov Chains, Monte Carlo Method
6.
New Phytol ; 199(2): 609-621, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23574432

ABSTRACT

Allotetraploid speciation, that is, the generation of a hybrid tetraploid species from two diploid species, and the long-term evolution of tetraploid populations and species are important in plants. We developed a population genetic model to infer population genetic parameters of tetraploid populations from data of the progenitor and descendant species. Two yarrow species, Achillea alpina-4x and A. wilsoniana-4x, arose by allotetraploidization from the diploid progenitors, A. acuminata-2x and A. asiatica-2x. Yet, the population genetic process has not been studied in detail. We applied the model to sequences of three nuclear genes in populations of the four yarrow species and compared their pattern of variability with that in four plastid regions. The plastid data indicated that the two tetraploid species probably originated from multiple independent allopolyploidization events and have accumulated many mutations since. With the nuclear data, we found a low rate of homeologous recombination or gene conversion and a reduction in diversity relative to the level of both diploid species combined. The present analysis with a novel probabilistic model suggests a genetic bottleneck during tetraploid speciation, that the two tetraploid species have a long evolutionary history, and that they have a small amount of genetic exchange between the homeologous genomes.


Subjects
Achillea/genetics, Biological Evolution, Genetic Speciation, Genetic Models, Tetraploidy, Alleles, Cell Nucleus/genetics, Computer Simulation, Cytosol/enzymology, Chloroplast DNA/genetics, Plant Genes/genetics, Population Genetics, Glucose-6-Phosphate Isomerase/genetics, Haplotypes/genetics, Markov Chains, Monte Carlo Method, Plastids/genetics, Genetic Polymorphism
7.
Methods Mol Biol ; 609: 241-53, 2010.
Article in English | MEDLINE | ID: mdl-20221923

ABSTRACT

Markov and Hidden Markov models (HMMs) are introduced using examples from linkage mapping and sequence analysis. In the course, the forward-backward, the Viterbi, the Baum-Welch (EM) algorithm, and a Metropolis sampling scheme are presented.
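
As an illustration of the first algorithm named in this abstract, here is a minimal Python sketch of the forward recursion for a discrete HMM, with rescaling at each step to avoid numerical underflow. The function name, argument layout, and toy parameters are assumptions made for this example and are not taken from the chapter.

```python
from math import log


def forward_log_likelihood(obs, init, trans, emit):
    """Log-likelihood of an observation sequence under a discrete HMM.

    obs   : list of observed symbol indices
    init  : init[i] is the probability of starting in state i
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[i][s] is the probability that state i emits symbol s
    """
    n_states = len(init)

    # Initialisation: joint probability of the first symbol and each state.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    scale = sum(alpha)
    log_lik = log(scale)
    alpha = [a / scale for a in alpha]

    # Recursion: propagate one step, weight by the emission, then rescale.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
        scale = sum(alpha)
        log_lik += log(scale)
        alpha = [a / scale for a in alpha]

    return log_lik


if __name__ == "__main__":
    # Toy two-state, two-symbol model (hypothetical numbers).
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [[0.7, 0.3], [0.1, 0.9]]
    print(forward_log_likelihood([0, 1, 1, 0], init, trans, emit))
```

The product of the per-step scaling factors equals the likelihood of the sequence, which is why their logarithms are accumulated.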


Subjects
Algorithms, Artificial Intelligence, Computational Biology, Data Mining, Genetic Databases, Markov Chains, Statistical Models, Animals, Chromosome Mapping, Genetic Linkage, Humans, Likelihood Functions, DNA Sequence Analysis
8.
Genetics ; 166(3): 1405-18, 2004 Mar.
Article in English | MEDLINE | ID: mdl-15082559

ABSTRACT

The genetic architecture of hybrid fitness characters can provide valuable insights into the nature and evolution of postzygotic reproductive barriers in diverged species. We determined the genome-wide distribution of barriers to introgression in an F(1) hybrid of two Eucalyptus tree species, Eucalyptus grandis (W. Hill ex Maiden.) and E. globulus (Labill.). Two interspecific backcross families (N = 186) were used to construct comparative, single-tree, genetic linkage maps of an F(1) hybrid individual and two backcross parents. A total of 1354 testcross AFLP marker loci were evaluated in the three parental maps and a substantial proportion (27.7% average) exhibited transmission ratio distortion (alpha = 0.05). The distorted markers were located in distinct regions of the parental maps and marker alleles within each region were all biased toward either of the two parental species. We used a Bayesian approach to estimate the position and effect of transmission ratio distorting loci (TRDLs) in the distorted regions of each parental linkage map. The relative viability of TRDL alleles ranged from 0.20 to 0.72. Contrary to expectation, heterospecific (donor) alleles of TRDLs were favored as often as recurrent alleles in both backcrosses, suggesting that positive and negative heterospecific interactions affect introgression rates in this wide interspecific pedigree.
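
Transmission ratio distortion at a testcross marker is commonly screened with a chi-square goodness-of-fit test against the expected 1:1 segregation. The abstract reports a threshold of alpha = 0.05 but does not name the test used, so the Python sketch below is an assumption about one standard way to do it; the function names and the example counts are hypothetical.

```python
# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def segregation_chi_square(n_a, n_b):
    """Chi-square statistic for a 1:1 segregation of two marker classes."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = total / 2.0
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def is_distorted(n_a, n_b, critical=CHI2_CRITICAL_DF1_ALPHA_05):
    """True if the marker departs from 1:1 at the given critical value."""
    return segregation_chi_square(n_a, n_b) > critical


if __name__ == "__main__":
    # Hypothetical counts for two markers scored in 186 individuals.
    print(is_distorted(120, 66))
    print(is_distorted(100, 86))
```

This screens single markers only; it does not reproduce the Bayesian estimation of TRDL position and effect described in the abstract.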


Subjects
Genetic Crosses, Eucalyptus/genetics, Plant Genome, Alleles, Bayes Theorem, Chi-Square Distribution, Chromosome Mapping, Plant Chromosomes, Genetic Linkage, Genetic Markers, Genetic Hybridization, Markov Chains, Monte Carlo Method, Pedigree, Restriction Fragment Length Polymorphism, Species Specificity
9.
Genetics ; 165(3): 1385-95, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14668389

ABSTRACT

Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.


Subjects
Drosophila/genetics, Genetic Variation, Animals, Drosophila/classification, Markov Chains, Monte Carlo Method