Pesquisa | BVS Economia da Saúde

Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed).

Vogl, Claus; Karapetiants, Mariia; Yildirim, Burçin; Kjartansdóttir, Hrönn; Kosiol, Carolin; Bergman, Juraj; Majka, Michal; Mikula, Lynette Caitlin.

BMC Bioinformatics ; 25(1): 151, 2024 Apr 16.

Artigo em Inglês | MEDLINE | ID: mdl-38627634

RESUMO

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities, the corresponding R package of the same name is available on CRAN): They identify the number of comparably homogeneous regions within autocorrelated observed sequences. These are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. In regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based in isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of poisson-gamma distributions) along the human chromosome 1 and their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.

Assuntos

Genoma , Genômica , Animais , Humanos , Camundongos , Cadeias de Markov , Composição de Bases , Probabilidade , Algoritmos

Inference in population genetics using forward and backward, discrete and continuous time processes.

Bergman, Juraj; Schrempf, Dominik; Kosiol, Carolin; Vogl, Claus.

J Theor Biol ; 439: 166-180, 2018 02 14.

Artigo em Inglês | MEDLINE | ID: mdl-29229523

RESUMO

A central aim of population genetics is the inference of the evolutionary history of a population. To this end, the underlying process can be represented by a model of the evolution of allele frequencies parametrized by e.g., the population size, mutation rates and selection coefficients. A large class of models use forward-in-time models, such as the discrete Wright-Fisher and Moran models and the continuous forward diffusion, to obtain distributions of population allele frequencies, conditional on an ancestral initial allele frequency distribution. Backward-in-time diffusion processes have been rarely used in the context of parameter inference. Here, we demonstrate how forward and backward diffusion processes can be combined to efficiently calculate the exact joint probability distribution of sample and population allele frequencies at all times in the past, for both discrete and continuous population genetics models. This procedure is analogous to the forward-backward algorithm of hidden Markov models. While the efficiency of discrete models is limited by the population size, for continuous models it suffices to expand the transition density in orthogonal polynomials of the order of the sample size to infer marginal likelihoods of population genetic parameters. Additionally, conditional allele trajectories and marginal likelihoods of samples from single populations or from multiple populations that split in the past can be obtained. The described approaches allow for efficient maximum likelihood inference of population genetic parameters in a wide variety of demographic scenarios.

Assuntos

Genética Populacional/métodos , Modelos Genéticos , Algoritmos , Evolução Biológica , Frequência do Gene , Funções Verossimilhança , Cadeias de Markov , Métodos , Densidade Demográfica , Tempo

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA