1.
Anim Genet ; 53(4): 498-505, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35490362

ABSTRACT

Creation of the bovine reference assembly paved the way for high-throughput single nucleotide polymorphism (SNP) genotyping arrays based on the available map coordinates, which facilitated major advances in gene mapping and selection programs. Assembly flaws, however, may cause false results in downstream gene mapping studies. The most recent bovine reference genome (ARS-UCD1.2) is built on long-read sequences, which provide improved quality and continuity. In this study, we applied population genetic metrics to evaluate the map coordinates to which SNP markers were assigned. We employed a three-step approach combining recombination and linkage disequilibrium analyses to test whether the markers fit their assigned physical map coordinates. We applied the method to the bovine 50k array in a large pedigree of Holstein cattle and revealed a panel of 65 candidate markers, most of which were re-located either on a different chromosome or re-mapped as far as several million base pairs away on the same chromosome. This list of candidates accounts for 0.1% of the SNPs in the widely used 50k genotyping array and we foresee a reasonably larger set of markers being misplaced in the BovineHD 700K BeadChip. We suggest pre-removal of the candidate misplaced markers to reduce false signals in association mapping studies.


Subjects
Genetics, Population; Polymorphism, Single Nucleotide; Animals; Cattle/genetics; Chromosome Mapping; Genetic Linkage; Pedigree
2.
BMC Bioinformatics ; 22(1): 79, 2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33607943

ABSTRACT

BACKGROUND: Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as [Formula: see text] may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. RESULTS: Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the derivation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. CONCLUSIONS: Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach.


Subjects
Genome; Genetic Linkage; Genomics; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide
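
The greedy grouping step described in this abstract can be illustrated with a minimal sketch. It is not the hscovar implementation: the function name `greedy_groups`, the `threshold` cut-off, and the choice of the seed marker as the group representative are all assumptions made for illustration. Given a symmetric association matrix, it repeatedly picks the unassigned marker with the most unassigned dependent partners, so the largest groups are formed first.

```python
def greedy_groups(assoc, threshold):
    """Greedily group markers around seed markers (illustrative only).

    assoc: square symmetric list of lists; assoc[i][j] is the strength of
           association between markers i and j.
    threshold: hypothetical cut-off; |assoc[i][j]| >= threshold counts as dependent.
    Returns a list of (representative, members) tuples in the order built.
    """
    unassigned = set(range(len(assoc)))
    groups = []
    while unassigned:
        best_seed, best_members = None, []
        # Try every unassigned marker as a seed and keep the largest group.
        for i in sorted(unassigned):
            members = [j for j in sorted(unassigned)
                       if j == i or abs(assoc[i][j]) >= threshold]
            if len(members) > len(best_members):
                best_seed, best_members = i, members
        # The seed serves as the representative marker (an assumption).
        groups.append((best_seed, best_members))
        unassigned -= set(best_members)
    return groups


# Toy matrix: markers 0-2 are strongly associated, as are markers 3-4.
A = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.85, 0.1, 0.1],
    [0.8, 0.85, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]
result = greedy_groups(A, 0.5)
print(result)
```

With the toy matrix, the three-marker block is formed before the two-marker block. Ties between equally large candidate groups are broken by the lowest marker index, which is a simplification.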
3.
BMC Bioinformatics ; 21(1): 407, 2020 Sep 15.
Article in English | MEDLINE | ID: mdl-32933477

ABSTRACT

BACKGROUND: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths. RESULTS: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. CONCLUSIONS: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.


Subjects
Linear Models; Machine Learning/standards; Algorithms; Humans
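
The three ingredients named in this abstract (proximal gradient descent, backtracking line search, warm starts along the penalty grid) can be sketched for the plain lasso as follows. This is a pure-Python illustration, not the seagull code: every function name, the objective scaling (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1, the fixed iteration count, and the shrink factor of 0.5 are assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def loss(X, y, b):
    """Smooth part of the objective: residual sum of squares / (2n)."""
    n = len(y)
    return sum((yi - sum(x * bj for x, bj in zip(row, b))) ** 2
               for row, yi in zip(X, y)) / (2 * n)


def gradient(X, y, b):
    """Gradient of the smooth part with respect to b."""
    n, p = len(y), len(b)
    resid = [sum(x * bj for x, bj in zip(row, b)) - yi for row, yi in zip(X, y)]
    return [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]


def lasso_prox_grad(X, y, lam, b0=None, step=1.0, shrink=0.5, iters=500):
    """Proximal gradient descent for the lasso with backtracking line search."""
    b = list(b0) if b0 is not None else [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(X, y, b)
        f_old = loss(X, y, b)
        t = step
        while True:
            cand = [soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(b, g)]
            diff = [c - bj for c, bj in zip(cand, b)]
            # Accept the step once the quadratic upper bound holds.
            bound = (f_old + sum(gj * d for gj, d in zip(g, diff))
                     + sum(d * d for d in diff) / (2 * t))
            if loss(X, y, cand) <= bound + 1e-12:
                break
            t *= shrink
        b = cand
    return b


def lasso_path(X, y, lambdas):
    """Regularization path with warm starts: reuse the previous solution."""
    path, b = [], None
    for lam in sorted(lambdas, reverse=True):
        b = lasso_prox_grad(X, y, lam, b0=b)
        path.append((lam, b))
    return path


# Toy problem with an identity design, where the solution is known in closed form.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
b_hat = lasso_prox_grad(X, y, lam=0.5)
path = lasso_path(X, y, [0.5, 2.0])
```

For this identity design with n = 2, the first coefficient solves (b - 3)/2 + lam = 0, giving b = 2 at lam = 0.5, while the second coefficient is shrunk to exactly zero. Group and sparse-group penalties would replace `soft_threshold` with the corresponding proximal operator.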
4.
BMC Genet ; 21(1): 66, 2020 Jun 29.
Article in English | MEDLINE | ID: mdl-32600319

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) which capture a significant impact on a trait can be identified with genome-wide association studies. High linkage disequilibrium (LD) among SNPs makes it difficult to identify causative variants correctly. Thus, often target regions instead of single SNPs are reported. Sample size has not only a crucial impact on the precision of parameter estimates, it also ensures that a desired level of statistical power can be reached. We study the design of experiments for fine-mapping of signals of a quantitative trait locus in such a target region. METHODS: A multi-locus model makes it possible to identify causative variants simultaneously, to state their positions more precisely and to account for existing dependencies. Based on the commonly applied SNP-BLUP approach, we determine the z-score statistic for locally testing non-zero SNP effects and investigate its distribution under the alternative hypothesis. This quantity employs the theoretical instead of observed dependence between SNPs; it can be set up as a function of paternal and maternal LD for any given population structure. RESULTS: We simulated multiple paternal half-sib families and considered a target region of 1 Mbp. A bimodal distribution of estimated sample size was observed, particularly if more than two causative variants were assumed. The median of estimates constituted the final proposal of optimal sample size; it was consistently less than the sample size estimated from single-SNP investigation, which was used as a baseline approach. The second mode pointed to inflated sample sizes and could be explained by blocks of varying linkage phases leading to negative correlations between SNPs. Optimal sample size increased almost linearly with the number of signals to be identified but depended much more strongly on the assumed heritability. For instance, three times as many samples were required if heritability was 0.1 compared to 0.3. An R package is provided that comprises all required tools. CONCLUSIONS: Our approach incorporates information about the population structure into the design of experiments. Compared to a conventional method, this leads to a reduced estimate of sample size enabling the resource-saving design of future experiments for fine-mapping of candidate variants.


Subjects
Chromosome Mapping/veterinary; Livestock/genetics; Models, Genetic; Quantitative Trait Loci; Animals; Female; Genetic Linkage; Male; Polymorphism, Single Nucleotide
5.
Genet Sel Evol ; 52(1): 73, 2020 Dec 14.
Article in English | MEDLINE | ID: mdl-33317445

ABSTRACT

BACKGROUND: Recombination is a process by which chromosomes are broken and recombine to generate new combinations of alleles, therefore playing a major role in shaping genome variation. Recombination frequencies ([Formula: see text]) between markers are used to construct genetic maps, which have important implications in genomic studies. Here, we report a recombination map for 44,696 autosomal single nucleotide polymorphisms (SNPs) according to the coordinates of the most recent bovine reference assembly. The recombination frequencies were estimated across 876 half-sib families with a minimum number of 39 and maximum number of 4236 progeny, comprising over 367 K genotyped German Holstein animals. RESULTS: Genome-wide, over 8.9 million paternal recombination events were identified by investigating adjacent markers. The recombination map spans 24.43 Morgan (M) for a chromosomal length of 2486 Mbp and an average of ~0.98 cM/Mbp, which concords with the available pedigree-based linkage maps. Furthermore, we identified 971 putative recombination hotspot intervals (defined as [Formula: see text] > 2.5 standard deviations greater than the mean). The hotspot regions were non-uniformly distributed as sharp and narrow peaks, corresponding to ~5.8% of the recombination that has taken place in only ~2.4% of the genome. We verified genetic map length by applying a likelihood-based approach for the estimation of recombination rate between all intra-chromosomal marker pairs. This resulted in a longer autosomal genetic length for male cattle (25.35 M) and in the localization of 51 putatively misplaced SNPs in the genome assembly. CONCLUSIONS: Given the fact that this map is built on the coordinates of the ARS-UCD1.2 assembly, our results provide the most updated genetic map yet available for the cattle genome.


Subjects
Cattle/genetics; Chromosome Mapping/methods; Chromosomes/genetics; Recombination, Genetic; Animals; Chromosome Mapping/standards; Genetic Linkage; Pedigree; Polymorphism, Single Nucleotide; Reference Standards
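
The arithmetic behind the average recombination rate, and the "mean plus 2.5 standard deviations" hotspot rule stated in this abstract, can be checked with a short sketch. The function names are hypothetical, and using the population standard deviation (`pstdev`) is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev


def average_rate_cm_per_mbp(genetic_length_morgan, physical_length_mbp):
    """Average recombination rate in cM/Mbp (1 Morgan = 100 cM)."""
    return genetic_length_morgan * 100.0 / physical_length_mbp


def hotspot_intervals(theta, n_sd=2.5):
    """Indices of intervals whose recombination frequency exceeds
    mean + n_sd * SD over all intervals (SD estimator is an assumption)."""
    cutoff = mean(theta) + n_sd * pstdev(theta)
    return [i for i, t in enumerate(theta) if t > cutoff]


# Figures quoted in the abstract: 24.43 M over 2486 Mbp.
rate = average_rate_cm_per_mbp(24.43, 2486)

# Toy data: 19 background intervals and one pronounced peak.
toy_theta = [0.001] * 19 + [0.02]
hot = hotspot_intervals(toy_theta)
```

The quoted map length and chromosomal length reproduce the reported average of about 0.98 cM/Mbp. In the toy data only the last interval lies above the cut-off.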
6.
Biom J ; 60(6): 1096-1109, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30101421

ABSTRACT

Genomic information can be used to study the genetic architecture of some trait. Not only the size of the genetic effect captured by molecular markers and their position on the genome but also the mode of inheritance, which might be additive or dominant, and the presence of interactions are interesting parameters. When searching for interacting loci, estimating the effect size and determining the significant marker pairs increases the computational burden in terms of speed and memory allocation dramatically. This study revisits a rapid Bayesian approach (fastbayes). As a novel contribution, a measure of evidence is derived to select markers with effect significantly different from zero. It is based on the credibility of the highest posterior density interval next to zero in a marginalized manner. This methodology is applied to simulated data resembling a dairy cattle population in order to verify the sensitivity of testing for a given range of type-I error levels. A real data application complements this study. Sensitivity and specificity of fastbayes were similar to a variational Bayesian method, and a further reduction of computing time could be achieved. More than 50% of the simulated causative variants were identified. The most complex model containing different kinds of genetic effects and their pairwise interactions yielded the best outcome over a range of type-I error levels. The validation study showed that fastbayes is a dual-purpose tool for genomic inferences - it is applicable to predict future outcome of not-yet phenotyped individuals with high precision as well as to estimate and test single-marker effects. Furthermore, it allows the estimation of billions of interaction effects.


Subjects
Biometry/methods; Genomics; Animals; Bayes Theorem; Mice; Polymorphism, Single Nucleotide; Software
8.
Mol Genet Genomics ; 288(11): 615-25, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23996144

ABSTRACT

Porcine adrenergic receptor beta 2 (ADRB2) gene exhibits differential allelic expression in skeletal muscle, and its genetic variation has been associated with muscle pH. Exploring the molecular-genetic background of expression variation for porcine ADRB2 will provide insight into the mechanisms driving its regulatory divergence and may also contribute to unraveling the genetic basis of muscle-related traits in pigs. In the present study, we therefore examined haplotype effects on the expression of porcine ADRB2 in four tissues: longissimus dorsi muscle, liver, subcutaneous fat, and spleen. The diversity and structure of haplotypes of the proximal gene region segregating in German commercial breeds were characterized. Seven haplotypes falling into three clades were identified. Two clades including five haplotypes most likely originated from introgression of Asian genetics during formation of modern breeds. Expression analyses revealed that the Asian-derived haplotypes increase expression of the porcine ADRB2 compared to the major, wild-type haplotype independently of tissue type. In addition, several tissue-specific differences in the expression of the Asian-derived haplotypes were found. Inspection of haplotype sequences showed that differentially expressed haplotypes exhibit polymorphisms in a polyguanine tract located in the core promoter region. These findings demonstrate that expression variation of the porcine ADRB2 has a complex genetic basis and suggest that the promoter polyguanine tract is causally involved. This study highlights the challenges of finding causal genetic variants underlying complex traits.


Subjects
Genetic Variation; Haplotypes/genetics; Receptors, Adrenergic, beta-2/genetics; Swine/genetics; Alleles; Animals; Base Sequence; Gene Expression Regulation; Gene Frequency; Genetic Structures; Genotype; Liver; Muscle, Skeletal; Organ Specificity; Phenotype; Poly G; Polymorphism, Single Nucleotide; Promoter Regions, Genetic; Spleen; Subcutaneous Fat
9.
Article in English | MEDLINE | ID: mdl-35254989

ABSTRACT

In life sciences, high-throughput techniques typically lead to high-dimensional data and often the number of covariates is much larger than the number of observations. This inherently comes with multicollinearity challenging a statistical analysis in a linear regression framework. Penalization methods such as the lasso, ridge regression, the group lasso, and convex combinations thereof, which introduce additional conditions on regression variables, have proven themselves effective. In this study, we introduce a novel approach by combining the lasso and the standardized group lasso leading to meaningful weighting of the predicted ("fitted") outcome which is of primary importance, e.g., in breeding populations. This "fitted" sparse-group lasso was implemented as a proximal-averaged gradient descent method and is part of the R package "seagull" available at CRAN. For the evaluation of the novel method, we executed an extensive simulation study. We simulated genotypes and phenotypes which resemble data of a dairy cattle population. Genotypes at thousands of genomic markers were used as covariates to fit a quantitative response. The proximity of markers on a chromosome determined grouping. In the majority of simulated scenarios, the new method revealed improved prediction abilities compared to other penalization approaches and was able to localize the signals of simulated features.


Subjects
Genome; Animals; Cattle; Genome/genetics; Genotype; Computer Simulation; Linear Models; Phenotype
10.
Front Genet ; 14: 1082782, 2023.
Article in English | MEDLINE | ID: mdl-37323679

ABSTRACT

The arrangement of markers on the genome can be defined in either physical or linkage terms. While a physical map represents the inter-marker distances in base pairs, a genetic (or linkage) map pictures the recombination rate between pairs of markers. High-resolution genetic maps are key elements for genomic research, such as fine-mapping of quantitative trait loci, but they are also needed for creating and updating chromosome-level assemblies of whole-genome sequences. Based on published results on a large pedigree of German Holstein cattle and newly obtained results with German/Austrian Fleckvieh cattle, we aim at providing a platform that allows users to interactively explore the bovine genetic and physical map. We developed the R Shiny app CLARITY available online at https://nmelzer.shinyapps.io/clarity and as R package at https://github.com/nmelzer/CLARITY that provides access to the genetic maps built on the Illumina Bovine SNP50 genotyping array with markers ordered according to the physical coordinates of the most recent bovine genome assembly ARS-UCD1.2. The user is able to interconnect the physical and genetic map for a whole chromosome or a specific chromosomal region and can inspect a landscape of recombination hotspots. Moreover, the user can investigate which of the frequently used genetic-map functions locally fits best. We further provide auxiliary information about markers being putatively misplaced in the ARS-UCD1.2 release. The corresponding output tables and figures can be downloaded in various formats. By ongoing data integration from different breeds, the app also facilitates comparison of different genome features, providing a valuable tool for education and research purposes.
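
The abstract mentions comparing frequently used genetic-map functions without naming them. As an illustration, the two classical choices, Haldane (no crossover interference) and Kosambi (with interference), are sketched below. That these are among the functions offered by CLARITY is an assumption; the formulas themselves are the standard textbook definitions, with map distance `d_morgan` in Morgan and recombination rate `r` below 0.5.

```python
import math


def haldane(d_morgan):
    """Recombination rate from map distance, assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgan))


def kosambi(d_morgan):
    """Recombination rate from map distance, allowing for interference."""
    return 0.5 * math.tanh(2.0 * d_morgan)


def haldane_inverse(r):
    """Map distance in Morgan from recombination rate r (0 <= r < 0.5)."""
    return -0.5 * math.log(1.0 - 2.0 * r)


def kosambi_inverse(r):
    """Map distance in Morgan from recombination rate r (0 <= r < 0.5)."""
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# For the same 10 cM distance the two functions give different rates.
r_haldane = haldane(0.10)
r_kosambi = kosambi(0.10)
```

At short distances both functions are close to r = d; they diverge as distance grows, which is why the locally best-fitting function can differ between chromosomal regions.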

11.
BMC Genet ; 12: 74, 2011 Aug 25.
Article in English | MEDLINE | ID: mdl-21867519

ABSTRACT

BACKGROUND: Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. METHODS: We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. RESULTS: If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. CONCLUSIONS: This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source.


Subjects
Bayes Theorem; Genetic Markers; Genetic Variation; Models, Statistical; Phenotype; Quantitative Trait Loci
12.
Front Genet ; 12: 786934, 2021.
Article in English | MEDLINE | ID: mdl-35111201

ABSTRACT

Pikeperch (Sander lucioperca) has emerged as a high value species to the aquaculture industry. However, its farming techniques are at an early stage and its production is often performed without a selective breeding program, potentially leading to high levels of inbreeding. In this study, we identified and characterized autozygosity based on genome-wide runs of homozygosity (ROH) on a sample of parental and offspring individuals, determined effective population size (Ne), and assessed relatedness among parental individuals. A mean of 2,235 ± 526 and 1,841 ± 363 ROH segments per individual, resulting in a mean inbreeding coefficient of 0.33 ± 0.06 and 0.25 ± 0.06, were estimated for the progeny and parents, respectively. Ne was about 12 until four generations ago and at most 106 for 63 generations in the past, with varying genetic relatedness amongst the parents. This study shows the importance of genomic information when family relationships are unknown and the need for selective breeding programs for reproductive management decisions in the aquaculture industry.
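
The abstract does not spell out how the inbreeding coefficient was derived from the ROH segments. A common definition, assumed here, is the fraction of the genome covered by ROH (often written F_ROH). The sketch below uses a hypothetical function name and made-up coordinates, and it assumes the segments do not overlap.

```python
def f_roh(roh_segments_bp, genome_length_bp):
    """ROH-based inbreeding coefficient: covered length / genome length.

    roh_segments_bp: list of (start, end) positions in base pairs,
                     assumed non-overlapping.
    genome_length_bp: length of the genome covered by markers, in base pairs.
    """
    covered = sum(end - start for start, end in roh_segments_bp)
    return covered / genome_length_bp


# Made-up example: two segments covering 250 of 1000 bp.
example = f_roh([(0, 100), (500, 650)], 1000)
```

In practice the denominator is usually the autosomal length spanned by the genotyped markers rather than the full assembly size.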

13.
Genes (Basel) ; 11(8), 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32722051

ABSTRACT

Selective breeding can significantly improve the establishment of sustainable and profitable aquaculture fish farming. For rainbow trout (Oncorhynchus mykiss), one of the main aquaculture coldwater species in Europe, a variety of selected hatchery strains are commercially available. In this study, we investigated the genetic variation between the local Born strain, selected for survival, and the commercially available Silver Steelhead strain, selected for growth. We sequenced the transcriptome of six tissues (gills, head kidney, heart, liver, spleen, and white muscle) from eight healthy individuals per strain, using RNA-seq technology to identify strain-specific gene-expression patterns and single nucleotide polymorphisms (SNPs). In total, 1760 annotated genes were differentially expressed across all tissues. Pathway analysis assigned them to different gene networks. We also identified a set of SNPs, which are heterozygous for one of the two breeding strains: 1229 of which represent polymorphisms over all tissues and individuals. Our data indicate a strong genetic differentiation between Born and Silver Steelhead trout, despite the relatively short time of evolutionary separation of the two breeding strains. The results most likely reflect their specifically adapted genotypes and might contribute to the understanding of differences regarding their robustness toward high stress and pathogenic challenge described in former studies.


Subjects
Gene Regulatory Networks; Genetic Markers; High-Throughput Nucleotide Sequencing/methods; Oncorhynchus mykiss/genetics; Polymorphism, Single Nucleotide; Transcriptome; Animals; Molecular Sequence Annotation; Oncorhynchus mykiss/classification; Oncorhynchus mykiss/growth & development; Species Specificity
14.
Sci Rep ; 10(1): 22335, 2020 Dec 18.
Article in English | MEDLINE | ID: mdl-33339898

ABSTRACT

Pikeperch (Sander lucioperca) is a fish species with growing economic significance in the aquaculture industry. However, successful positioning of pikeperch in large-scale aquaculture requires advances in our understanding of its genome organization. In this study, an ultra-high density linkage map for pikeperch comprising 24 linkage groups and 1,023,625 single nucleotide polymorphism markers was constructed after genotyping whole-genome sequencing data from 11 broodstock and 363 progeny, belonging to 6 full-sib families. The sex-specific linkage maps spanned a total of 2985.16 cM in females and 2540.47 cM in males with an average inter-marker distance of 0.0030 and 0.0026 cM, respectively. The sex-averaged map spanned a total of 2725.53 cM with an average inter-marker distance of 0.0028 cM. Furthermore, the sex-averaged map was used for improving the contiguity and accuracy of the current pikeperch genome assembly. Based on 723,360 markers, 706 contigs were anchored and oriented into 24 pseudomolecules, covering a total of 896.48 Mb and accounting for 99.47% of the assembled genome size. The overall contiguity of the assembly improved with a scaffold N50 length of 41.06 Mb. Finally, an updated annotation of protein-coding genes and repetitive elements of the enhanced genome assembly is provided at NCBI.


Subjects
Genetic Linkage/genetics; Genome/genetics; Perches/genetics; Quantitative Trait Loci/genetics; Animals; Chromosome Mapping; Microsatellite Repeats/genetics; Polymorphism, Single Nucleotide/genetics; Recombination, Genetic/genetics
15.
Front Genet ; 10: 590, 2019.
Article in English | MEDLINE | ID: mdl-31316547

ABSTRACT

Blood values of calcium (Ca), inorganic phosphorus (IP), and alkaline phosphatase activity (ALP) are valuable indicators for mineral status and bone mineralization. The mineral homeostasis is maintained by absorption, retention, and excretion processes employing a number of known and unknown sensing and regulating factors with implications on immunity. Due to the high inter-individual variation of Ca and P levels in the blood of pigs and to clarify molecular contributions to this variation, the genetics of hematological traits related to the Ca and P balance were investigated in a German Landrace population, integrating both single-locus and multi-locus genome-wide association study (GWAS) approaches. Genomic heritability estimates suggest a moderate genetic contribution to the variation of hematological Ca (N = 456), IP (N = 1049), ALP (N = 439), and the Ca/P ratio (N = 455), with values ranging from 0.27 to 0.54. The genome-wide analysis of markers adds a number of genomic regions to the list of quantitative trait loci, some of which overlap with previous results. Despite the gaps in knowledge of genes involved in Ca and P metabolism, genes like THBS2, SHH, PTPRT, PTGS1, and FRAS1 with reported connections to bone metabolism were derived from the significantly associated genomic regions. Additionally, genomic regions included TRAFD1 and genes coding for phosphate transporters (SLC17A1-SLC17A4), which are linked to Ca and P homeostasis. The study calls for improved functional annotation of the proposed candidate genes to derive features involved in maintaining Ca and P balance. This gene information can be exploited to diagnose and predict characteristics of micronutrient utilization, bone development, and a well-functioning musculoskeletal system in pig husbandry and breeding.

16.
Genes (Basel) ; 10(9), 2019 Sep 13.
Article in English | MEDLINE | ID: mdl-31540274

ABSTRACT

The pikeperch (Sander lucioperca) is a fresh and brackish water Percid fish natively inhabiting the northern hemisphere. This species is emerging as a promising candidate for intensive aquaculture production in Europe. Specific traits like cannibalism, growth rate and meat quality require genomics-based understanding for an optimal husbandry and domestication process. Still, the aquaculture community is lacking an annotated genome sequence to facilitate genome-wide studies on pikeperch. Here, we report the first highly contiguous draft genome assembly of Sander lucioperca. In total, 413 and 66 giga base pairs of DNA sequencing raw data were generated with the Illumina platform and PacBio Sequel System, respectively. The PacBio data were assembled into a final assembly size of ~900 Mb covering 89% of the 1,014 Mb estimated genome size. The draft genome consisted of 1,966 contigs ordered into 1,313 scaffolds. The contig and scaffold N50 lengths are 3.0 Mb and 4.9 Mb, respectively. The identified repetitive structures accounted for 39% of the genome. We utilized homologies to other ray-finned fishes and ab initio gene prediction methods to predict 21,249 protein-coding genes in the Sander lucioperca genome, of which 88% were functionally annotated by either sequence homology or protein domains and signatures search. The assembled genome spans 97.6% and 96.3% of Vertebrate and Actinopterygii single-copy orthologs, respectively. The outstanding mapping rate (99.9%) of genomic PE-reads on the assembly suggests an accurate and nearly complete genome reconstruction. This draft genome sequence is the first genomic resource for this promising aquaculture species. It will provide an impetus for genomic-based breeding studies targeting phenotypic and performance traits of captive pikeperch.


Subjects
Genome; Perches/genetics; Animals; Fish Proteins/genetics; Molecular Sequence Annotation; Perches/classification; Phylogeny; Whole Genome Sequencing
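
The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch with a hypothetical function name and toy lengths (not the pikeperch contigs):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings coverage to at least 50%.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])

# Assembly coverage quoted in the abstract: ~900 Mb of an estimated 1,014 Mb.
coverage_percent = round(900 / 1014 * 100)
```

In the toy example the two longest sequences (8 + 8 = 16) already reach half of the total length of 32, so the N50 is 8. The coverage figure reproduces the 89% stated in the abstract.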
17.
Front Genet ; 9: 185, 2018.
Article in English | MEDLINE | ID: mdl-29896217

ABSTRACT

Quantifying the population stratification in genotype samples has become a standard procedure for data manipulation before conducting genome-wide association studies, as well as for tracing patterns of migration in humans and animals, and for inference about extinct founder populations. The most widely used approach capable of providing biologically interpretable results is a likelihood formulation which allows for estimation of founder genome proportions and founder allele frequency conditional on the observed genotypes. However, if founder allele frequencies are known and samples are dominated by admixed genotypes this approach may lead to biased inference. In addition, processing time increases drastically with the number of genetic markers. This article describes a simplified approach for obtaining biologically meaningful measures of population stratification at the genotype level conditional on known founder allele frequencies. It was tested on cattle and human data sets with 4,022 and 150,000 genetic markers, respectively, and proved to be very accurate in situations where founder populations were correctly specified, or under-, over-, and misspecified. Moreover, processing time was only marginally affected by an increase in the number of markers.

18.
Front Genet ; 9: 186, 2018.
Article in English | MEDLINE | ID: mdl-29922330

ABSTRACT

A livestock population can be characterized by different population genetic parameters, such as linkage disequilibrium and recombination rate between pairs of genetic markers. The population structure, which may be caused by family stratification, has an influence on the estimates of these parameters. An expectation maximization algorithm has been proposed for estimating these parameters in half-sibs without phasing the progeny. It, however, overlooks the fact that the underlying likelihood function may have two maxima. The magnitudes of the maxima depend on the maternal allele frequencies at the investigated marker pair. Which maximum the algorithm converges to depends on the chosen start values. We present a stepwise procedure in which the relationship between the two modes is exploited. The expectation maximization algorithm for the parameter estimation is applied twice using different start values, followed by a decision process to assess the most likely estimate. This approach was validated using simulated genotypes of half-sibs. It was also applied to a dairy cattle dataset consisting of multiple half-sib families and 39,780 marker genotypes, leading to estimates for 12,759,713 intrachromosomal marker pairs. Furthermore, the proper order of markers was verified by studying the mean of estimated recombination rates in a window adjacent to the investigated locus as well as in a window at its most distant chromosome end. Putatively misplaced markers or marker clusters were detected by comparing the results with the revised bovine genome assembly UMD 3.1.1. In total, 40 markers were identified as candidates of misplacement. This outcome may help improve the physical order of markers, which is also required for refining the bovine genetic map.

19.
G3 (Bethesda) ; 6(9): 2761-72, 2016 09 08.
Article in English | MEDLINE | ID: mdl-27402363

ABSTRACT

In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, where it was used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and to semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationships among predictor variables, the accuracy of genetic value prediction was improved by 10-22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes.
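
As a hedged illustration of how linkage enters such a covariance, the sketch below computes only the sire-derived part for a single half-sib family. It rests on the standard single-meiosis result that, for a sire heterozygous at two loci with recombination rate theta between them, the covariance of the transmitted alleles is plus or minus (1 - 2 theta)/4 depending on phase, and zero if the sire is homozygous at either locus. The dam-derived part, which the abstract bases on population haplotype frequencies, is omitted here, and the function name `paternal_covariance` is hypothetical.

```python
import numpy as np

def paternal_covariance(hap1, hap2, theta):
    """Covariance of sire-transmitted alleles across loci in a half-sib family.

    hap1, hap2 : (n_loci,) sire haplotypes coded 0/1
    theta      : (n_loci, n_loci) recombination rates, zero on the diagonal
    """
    h1 = np.asarray(hap1, dtype=float)
    h2 = np.asarray(hap2, dtype=float)
    # +1 or -1 where the sire is heterozygous (sign encodes phase), 0 otherwise.
    d = h1 - h2
    theta = np.asarray(theta, dtype=float)
    return 0.25 * (1.0 - 2.0 * theta) * np.outer(d, d)
```

The covariance shrinks toward zero as theta approaches 0.5, so only markers linked on the sire's haplotypes contribute to this part of the dependence structure.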


Subjects
Genetic Linkage , Genome/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Animals , Bayes Theorem , Cattle , Genetics, Population , Genomics , Genotype , Haplotypes , Linkage Disequilibrium , Models, Genetic , Pedigree , Siblings

20.
PLoS One ; 8(8): e70256, 2013.
Article in English | MEDLINE | ID: mdl-23990900

ABSTRACT

In this study, the benefit of metabolome-level analysis for the prediction of genetic value of three traditional milk traits was investigated. Our proposed approach consists of three steps. First, milk metabolite profiles are used to predict three traditional milk traits of 1,305 Holstein cows. Two regression methods, both enabling variable selection, are applied to identify important milk metabolites in this step. Second, the prediction of these important milk metabolites from single nucleotide polymorphisms (SNPs) enables the detection of SNPs with significant genetic effects. Finally, these SNPs are used to predict milk traits. The observed precision of predicted genetic values was compared to the results observed for the classical genotype-phenotype prediction using all SNPs or a reduced SNP subset (reduced classical approach). To enable a comparison between SNP subsets, a special invariable evaluation design was implemented. SNPs close to or within known quantitative trait loci (QTL) were determined. This enabled us to determine whether the detected important SNP subsets were enriched in these regions. The results show that our approach can lead to genetic value prediction but requires less than 1% of the total number of SNPs (40,317). Moreover, significantly more important SNPs in known QTL regions were detected using our approach compared to the reduced classical approach. In conclusion, our approach allows a deeper insight into the associations between the different levels of the genotype-phenotype map (genotype-metabolome, metabolome-phenotype, genotype-phenotype).


Subjects
Cattle/genetics , Milk/chemistry , Polymorphism, Single Nucleotide , Animals , Dairying , Female , Genetic Association Studies , Genotype , Metabolome , Metabolomics , Quantitative Trait Loci , Regression Analysis