Pesquisa | Secretaria de Estado da Saúde

Grouping of genomic markers in populations with family structure.

Wittenburg, Dörte; Doschoris, Michael; Klosa, Jan.

BMC Bioinformatics ; 22(1): 79, 2021 Feb 19.

Artigo em Inglês | MEDLINE | ID: mdl-33607943

RESUMO

BACKGROUND: Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as [Formula: see text] may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. RESULTS: Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the derivation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. CONCLUSIONS: Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach.

Assuntos

Genoma , Ligação Genética , Genômica , Humanos , Desequilíbrio de Ligação , Polimorfismo de Nucleotídeo Único

Design of experiments for fine-mapping quantitative trait loci in livestock populations.

Wittenburg, Dörte; Bonk, Sarah; Doschoris, Michael; Reyer, Henry.

BMC Genet ; 21(1): 66, 2020 06 29.

Artigo em Inglês | MEDLINE | ID: mdl-32600319

RESUMO

BACKGROUND: Single nucleotide polymorphisms (SNPs) which capture a significant impact on a trait can be identified with genome-wide association studies. High linkage disequilibrium (LD) among SNPs makes it difficult to identify causative variants correctly. Thus, often target regions instead of single SNPs are reported. Sample size has not only a crucial impact on the precision of parameter estimates, it also ensures that a desired level of statistical power can be reached. We study the design of experiments for fine-mapping of signals of a quantitative trait locus in such a target region. METHODS: A multi-locus model allows to identify causative variants simultaneously, to state their positions more precisely and to account for existing dependencies. Based on the commonly applied SNP-BLUP approach, we determine the z-score statistic for locally testing non-zero SNP effects and investigate its distribution under the alternative hypothesis. This quantity employs the theoretical instead of observed dependence between SNPs; it can be set up as a function of paternal and maternal LD for any given population structure. RESULTS: We simulated multiple paternal half-sib families and considered a target region of 1 Mbp. A bimodal distribution of estimated sample size was observed, particularly if more than two causative variants were assumed. The median of estimates constituted the final proposal of optimal sample size; it was consistently less than sample size estimated from single-SNP investigation which was used as a baseline approach. The second mode pointed to inflated sample sizes and could be explained by blocks of varying linkage phases leading to negative correlations between SNPs. Optimal sample size increased almost linearly with number of signals to be identified but depended much stronger on the assumption on heritability. For instance, three times as many samples were required if heritability was 0.1 compared to 0.3. An R package is provided that comprises all required tools. CONCLUSIONS: Our approach incorporates information about the population structure into the design of experiments. Compared to a conventional method, this leads to a reduced estimate of sample size enabling the resource-saving design of future experiments for fine-mapping of candidate variants.

Assuntos

Mapeamento Cromossômico/veterinária , Gado/genética , Modelos Genéticos , Locos de Características Quantitativas , Animais , Feminino , Ligação Genética , Masculino , Polimorfismo de Nucleotídeo Único

Estimation of Recombination Rate and Maternal Linkage Disequilibrium in Half-Sibs.

Hampel, Alexander; Teuscher, Friedrich; Gomez-Raya, Luis; Doschoris, Michael; Wittenburg, Dörte.

Front Genet ; 9: 186, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-29922330

RESUMO

A livestock population can be characterized by different population genetic parameters, such as linkage disequilibrium and recombination rate between pairs of genetic markers. The population structure, which may be caused by family stratification, has an influence on the estimates of these parameters. An expectation maximization algorithm has been proposed for estimating these parameters in half-sibs without phasing the progeny. It, however, overlooks the fact that the underlying likelihood function may have two maxima. The magnitudes of the maxima depend on the maternal allele frequencies at the investigated marker pair. Which maximum the algorithm converges to depends on the chosen start values. We present a stepwise procedure in which the relationship between the two modes is exploited. The expectation maximization algorithm for the parameter estimation is applied twice using different start values, followed by a decision process to assess the most likely estimate. This approach was validated using simulated genotypes of half-sibs. It was also applied to a dairy cattle dataset consisting of multiple half-sib families and 39,780 marker genotypes, leading to estimates for 12,759,713 intrachromosomal marker pairs. Furthermore, the proper order of markers was verified by studying the mean of estimated recombination rates in a window adjacent to the investigated locus as well as in a window at its most distant chromosome end. Putatively misplaced markers or marker clusters were detected by comparing the results with the revised bovine genome assembly UMD 3.1.1. In total, 40 markers were identified as candidates of misplacement. This outcome may help improving the physical order of markers which is also required for refining the bovine genetic map.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa