1.
J Dairy Sci ; 107(1): 398-411, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37641298

ABSTRACT

This study aimed to evaluate imputation accuracy (IA) by marker (IAm) and by individual (IAi) in US crossbred dairy cattle. Holstein × Jersey crossbreds were used to evaluate IA from a low- (7K) to a medium-density (50K) SNP chip. Crossbred animals, as well as their sires (53), dams (77), and maternal grandsires (63), were all genotyped with a 78K SNP chip. Seven different scenarios of reference populations were tested, in which some scenarios used different family relationships and others added random unrelated purebred and crossbred individuals to those different family relationship scenarios. The same scenarios were tested on Holstein and Jersey purebred animals to compare these outcomes against those attained in crossbred animals. The genotype imputation was performed with findhap (version 4) software (VanRaden, 2015). There were no significant differences in IA results depending on whether the sire of imputed individuals was Holstein and the dam was Jersey, or vice versa. The IA increased significantly with the addition of related individuals in the reference population, from 86.70 ± 0.06% when only sires or dams were included in the reference population to 90.09 ± 0.06% when sire (S), dam (D), and maternal grandsire genomic data were combined in the reference population. In all scenarios including related individuals in the reference population, IAm and IAi were significantly higher in purebred Jersey and Holstein animals than in crossbreds, ranging from 90.75 ± 0.06 to 94.02 ± 0.06%, and from 90.88 ± 0.11 to 94.04 ± 0.10%, respectively. Additionally, a scenario called SPB+DLD (where PB indicates purebred and LD indicates low density), similar to the genomic evaluations performed on US crossbred dairy cattle, was tested. In this scenario, the information from the 5 evaluated breeds (Ayrshire, Brown Swiss, Guernsey, Holstein, and Jersey) genotyped with a 50K SNP chip and genomic information from the dams genotyped with a 7K SNP chip were combined in the reference population, and the IAm and IAi were 80.87 ± 0.06% and 80.85 ± 0.08%, respectively. Adding random unrelated genotyped individuals to the reference population reduced IA for both purebred and crossbred cows, except for scenario SPB+DLD, where adding crossbreds to the reference population increased IA values. Our findings demonstrate that IA for US Holstein × Jersey crossbreds ranged from 85 to 90%, and emphasize the significance of designing and defining the reference population for improved IA.


Subjects
Genome , Single Nucleotide Polymorphism , Humans , Female , Cattle/genetics , Animals , Genotype , Genomics/methods , Genetic Hybridization
2.
BMC Genomics ; 24(1): 271, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208589

ABSTRACT

BACKGROUND: To reduce the cost of genomic selection, a low-density (LD) single nucleotide polymorphism (SNP) chip can be used in combination with imputation for genotyping selection candidates instead of using a high-density (HD) SNP chip. Next-generation sequencing (NGS) techniques have been increasingly used in livestock species but remain expensive for routine use for genomic selection. An alternative and cost-efficient solution is to use restriction site-associated DNA sequencing (RADseq) techniques to sequence only a fraction of the genome using restriction enzymes. From this perspective, use of RADseq techniques followed by an imputation step on HD chip as alternatives to LD chips for genomic selection was studied in a pure layer line. RESULTS: Genome reduction and sequencing fragments were identified on reference genome using four restriction enzymes (EcoRI, TaqI, AvaII and PstI) and a double-digest RADseq (ddRADseq) method (TaqI-PstI). The SNPs contained in these fragments were detected from the 20X sequence data of the individuals in our population. Imputation accuracy on HD chip with these genotypes was assessed as the mean correlation between true and imputed genotypes. Several production traits were evaluated using single-step GBLUP methodology. The impact of imputation errors on the ranking of the selection candidates was assessed by comparing a genomic evaluation based on ancestry using true HD or imputed HD genotyping. The relative accuracy of genomic estimated breeding values (GEBVs) was investigated by considering the GEBVs estimated on offspring as a reference. With AvaII or PstI and ddRADseq with TaqI and PstI, more than 10 K SNPs were detected in common with the HD SNP chip, resulting in an imputation accuracy greater than 0.97. The impact of imputation errors on genomic evaluation of the breeders was reduced, with a Spearman correlation greater than 0.99. Finally, the relative accuracy of GEBVs was equivalent. 
CONCLUSIONS: RADseq approaches can be interesting alternatives to low-density SNP chips for genomic selection. With more than 10 K SNPs in common with the SNPs of the HD SNP chip, good imputation and genomic evaluation results can be obtained. However, with real data, heterogeneity between individuals with missing data must be considered.


Subjects
Chickens , Single Nucleotide Polymorphism , Animals , Chickens/genetics , Genome , Genomics/methods , Genotype , DNA Sequence Analysis
3.
Entropy (Basel) ; 24(3)2022 Mar 09.
Article in English | MEDLINE | ID: mdl-35327897

ABSTRACT

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research shows an increasing trend towards the use of modern Machine-Learning algorithms for imputation. This trend originates from their favorable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine-Learning-based methods for both imputation and prediction are used. We see that even a slight decrease in imputation accuracy can seriously affect the prediction accuracy. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.

4.
Plant J ; 102(4): 872-882, 2020 05.
Article in English | MEDLINE | ID: mdl-31856318

ABSTRACT

Natural variation has become a prime resource to identify genetic variants that contribute to phenotypic variation. The regional mapping (RegMap) population is one of the most important populations for studying natural variation in Arabidopsis thaliana, and has been used in a large number of association studies and in studies on climatic adaptation. However, only 413 RegMap accessions have been completely sequenced, as part of the 1001 Genomes (1001G) Project, while the remaining 894 accessions have only been genotyped with the Affymetrix 250k chip. As a consequence, most association studies involving the RegMap are either restricted to the sequenced accessions, reducing power, or rely on a limited set of SNPs. Here we impute millions of SNPs to the 894 accessions that are exclusive to the RegMap, using the 1135 accessions of the 1001G Project as the reference panel. We assess imputation accuracy using a novel cross-validation scheme, which we show provides a more reliable measure of accuracy than existing methods. After filtering out low accuracy SNPs, we obtain high-quality genotypic information for 2029 accessions and 3 million markers. To illustrate the benefits of these imputed data, we reconducted genome-wide association studies on five stress-related traits and could identify novel candidate genes.


Subjects
Arabidopsis/genetics , Plant Genome/genetics , Single Nucleotide Polymorphism/genetics , Arabidopsis/physiology , Genome-Wide Association Study , Genotype , Oligonucleotide Array Sequence Analysis , Phenotype , Physiological Stress
5.
Genome ; 64(10): 893-899, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34057850

ABSTRACT

The aim of this study was to evaluate the accuracy of imputation in a Gyr population using two medium-density panels (Bos taurus - Bos indicus) and to test whether the inclusion of the Nellore breed increases the imputation accuracy in the Gyr population. The database consisted of 289 Gyr females from Brazil genotyped with the GGP Bovine LDv4 chip containing 30 000 SNPs and 158 Gyr females from Colombia genotyped with the GGP indicus chip containing 35 000 SNPs. A customized chip was created that contained the information of 9109 SNPs (9K) to test the imputation accuracy in Gyr populations; 604 Nellore animals with information of LD SNPs tested in the scenarios were included in the reference population. Four scenarios were tested: LD9K_30KGIR, LD9K_35INDGIR, LD9K_30KGIR_NEL, and LD9K_35INDGIR_NEL. Principal component analysis (PCA) was computed for the genomic matrix and sample-specific imputation accuracies were calculated using Pearson's correlation (CS) and the concordance rate (CR) for imputed genotypes. The results of PCA of the Colombian and Brazilian Gyr populations demonstrated the genomic relationship between the two populations. The CS and CR ranged from 0.88 to 0.94 and from 0.93 to 0.96, respectively. Among the scenarios tested, the highest CS (0.94) was observed for the LD9K_30KGIR scenario. The present results highlight the importance of the choice of chip for imputation in the Gyr breed. However, the variation in SNPs may reduce the imputation accuracy even when the chip of the Bos indicus subspecies is used.


Subjects
Cattle , Genomics , Single Nucleotide Polymorphism , Animals , Breeding , Cattle/genetics , Female , Genome , Genotype , Oligonucleotide Array Sequence Analysis/veterinary
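
The two sample-specific accuracy measures named in the abstract above, Pearson's correlation (CS) and the concordance rate (CR), can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the 0/1/2 allele-count coding, the function names, and the toy genotypes are all assumptions.

```python
from math import sqrt


def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotype calls where the imputed value equals the true value."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one vector has no variation
    return sxy / sqrt(sxx * syy)


# Hypothetical genotypes for one animal, coded as allele counts (0, 1, 2).
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_genotypes = [0, 1, 2, 2, 0, 2, 1, 0]

cr = concordance_rate(true_genotypes, imputed_genotypes)  # 6 of 8 calls match
cs = pearson_correlation(true_genotypes, imputed_genotypes)
print(f"CR = {cr:.2f}, CS = {cs:.2f}")
```

The correlation is less forgiving than the concordance rate for markers with a rare allele, which is one reason studies often report both.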
6.
Anim Genet ; 52(5): 703-713, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34252218

ABSTRACT

Genotype imputation using a reference panel that combines high-density array data and publicly available whole genome sequence consortium variant data is potentially a cost-effective method to increase the density of extant lower-density array datasets. In this study, three datasets (two Border Collie; one Italian Spinone) generated using a legacy array (Illumina CanineHD, 173 662 SNPs) were utilised to assess the feasibility and accuracy of this approach and to gather additional evidence for the efficacy of canine genotype imputation. The cosmopolitan reference panels used to impute genotypes comprised dogs of 158 breeds, mixed breed dogs, wolves and Chinese indigenous dogs, as well as breed-specific individuals genotyped using the Axiom Canine HD array. The two Border Collie reference panels comprised 808 individuals including 79 Border Collies and 426 326 or 426 332 SNPs; and the Italian Spinone reference panel comprised 807 individuals including 38 Italian Spinoni and 476 313 SNPs. A high accuracy for imputation was observed, with the lowest accuracy observed for one of the Border Collie datasets (mean R² = 0.94) and the highest for the Italian Spinone dataset (mean R² = 0.97). This study's findings demonstrate that imputation of a legacy array study set using a reference panel comprising both breed-specific array data and multi-breed variant data derived from whole genomes is effective and accurate. The process of canine genotype imputation, using the valuable growing resource of publicly available canine genome variant datasets alongside breed-specific data, is described in detail to facilitate and encourage use of this technique in canine genetics.


Subjects
Dogs/genetics , Genetic Association Studies/veterinary , Genomics/methods , Genotype , Animals , Breeding , Single Nucleotide Polymorphism
7.
BMC Med Res Methodol ; 20(1): 199, 2020 07 25.
Article in English | MEDLINE | ID: mdl-32711455

ABSTRACT

BACKGROUND: Missing data are common in statistical analyses, and imputation methods based on random forests (RF) are becoming popular for handling missing data especially in biomedical research. Unlike standard imputation approaches, RF-based imputation methods do not assume normality or require specification of parametric models. However, it is still inconclusive how they perform for non-normally distributed data or when there are non-linear relationships or interactions. METHODS: To examine the effects of these three factors, a variety of datasets were simulated with outcome-dependent missing at random (MAR) covariates, and the performances of the RF-based imputation methods missForest and CALIBERrfimpute were evaluated in comparison with predictive mean matching (PMM). RESULTS: Both missForest and CALIBERrfimpute have high predictive accuracy but missForest can produce severely biased regression coefficient estimates and downward biased confidence interval coverages, especially for highly skewed variables in nonlinear models. CALIBERrfimpute typically outperforms missForest when estimating regression coefficients, although its biases are still substantial and can be worse than PMM for logistic regression relationships with interaction. CONCLUSIONS: RF-based imputation, in particular missForest, should not be indiscriminately recommended as a panacea for imputing missing data, especially when data are highly skewed and/or outcome-dependent MAR. A correct analysis requires a careful critique of the missing data mechanism and the inter-relationships between the variables in the data.


Subjects
Biomedical Research , Research Design , Bias , Humans , Nonlinear Dynamics
8.
BMC Genet ; 19(1): 108, 2018 12 04.
Article in English | MEDLINE | ID: mdl-30514201

ABSTRACT

BACKGROUND: The main goal of selection is to achieve genetic gain for a population by choosing the best breeders among a set of selection candidates. Since 2013, the use of a high density genotyping chip (600K Affymetrix® Axiom® HD genotyping array) for chicken has enabled the implementation of genomic selection in layer and broiler breeding, but the genotyping costs remain high for routine use on a large number of selection candidates. It has thus been deemed worthwhile to develop a low density genotyping chip that would lower costs. With this aim, various simulation studies have been conducted to find the best way to select a set of SNPs for low density genotyping of two laying hen lines. RESULTS: To design low density SNP chips, two methodologies, based on equidistance (EQ) or on linkage disequilibrium (LD), were compared. Imputation accuracy was assessed as the mean correlation between true and imputed genotypes. The results showed that correlations were more sensitive to false imputation of SNPs having low Minor Allele Frequency (MAF) when the EQ methodology was used. An increase in imputation accuracy was obtained when SNP density was increased, either through an increase in the number of selected windows on a chromosome or through a rise in the LD threshold. Moreover, the results varied depending on the type of chromosome (macro- or micro-chromosome). The LD methodology made it possible to optimize the number of SNPs, by reducing the SNP density on macro-chromosomes and by increasing it on micro-chromosomes. Imputation accuracy also increased when the size of the reference population was increased. Conversely, imputation accuracy decreased when the degree of kinship between reference and candidate populations was reduced. Finally, adding the selection candidates' dams to the reference population, in addition to their sires, gave better imputation results. CONCLUSIONS: Regardless of the SNP chip, the methodology, and the scenario studied, highly accurate imputations were obtained, with mean correlations higher than 0.83. The key point for achieving good imputation results is to take into account the chicken lines' LD when designing a low density SNP chip, and to include the candidates' direct parents in the reference population.


Subjects
Chickens/genetics , Oligonucleotide Array Sequence Analysis/methods , Single Nucleotide Polymorphism , Animals , Chickens/growth & development , Chromosomes , Gene Frequency , Genotype , Linkage Disequilibrium
9.
Anim Genet ; 49(4): 303-311, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29974966

ABSTRACT

The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high density (HighD) array; the low-density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome-wide association analysis and genomic prediction in Labrador Retrievers.


Subjects
Genotype , Single Nucleotide Polymorphism , Animals , Breeding , Dogs , Female , Genetic Association Studies/veterinary , Genomics , Male , Oligonucleotide Array Sequence Analysis/veterinary , Pedigree
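
The masking step described in the abstract above, where a low-density array is simulated by hiding a proportion of the high-density SNPs, might look like the sketch below. It is a minimal illustration under stated assumptions: the abstract does not say how masked SNPs were chosen, so uniform random selection is assumed here, and the function name and seed are hypothetical.

```python
import random


def simulate_low_density(n_snps, masked_fraction, seed=0):
    """Split SNP indices into a kept (LowD) set and a masked set.

    Assumption: SNPs to mask are drawn uniformly at random; a real study
    might instead pick evenly spaced or LD-informed markers.
    """
    if not 0.0 <= masked_fraction <= 1.0:
        raise ValueError("masked_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_masked = round(n_snps * masked_fraction)
    masked = set(rng.sample(range(n_snps), n_masked))
    kept = [i for i in range(n_snps) if i not in masked]
    return kept, sorted(masked)


# Figures taken from the abstract: 5826 HighD SNPs, 87.5% of them masked.
kept, masked = simulate_low_density(5826, 0.875)
print(len(kept), "SNPs kept;", len(masked), "SNPs masked")
```

The masked genotypes are then imputed from the kept ones and compared with their true values to obtain the accuracy.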
10.
BMC Genomics ; 18(1): 798, 2017 Oct 17.
Article in English | MEDLINE | ID: mdl-29041903

ABSTRACT

BACKGROUND: Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and makes it possible to account explicitly for population demographic factors. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with the analysis of real data. Genotype concordance rates were used to compare the accuracies of coalescent-based and standard (IMPUTE2) imputation. RESULTS: Simulations revealed that, in LD-blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even if present, did not practically improve accuracy. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination in coalescence inference, in particular for studies with genetically diverse and admixed individuals. CONCLUSIONS: To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computer time, to take into account recombination, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate further developments in the field.


Subjects
Genotype , Linkage Disequilibrium/genetics , Population Growth , Genomics , Haplotypes , Humans , Genetic Models
11.
J Theor Biol ; 399: 148-58, 2016 06 21.
Article in English | MEDLINE | ID: mdl-27049046

ABSTRACT

Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available, and they can either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. Running times differed between methods. The Extreme Learning Machine (ELM) was always the fastest algorithm. As the sample size increased, the Radial Basis Function (RBF) method required a long imputation time. The methods tested in this research can be an alternative for imputation of un-typed SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods be used for imputation.


Subjects
Genotyping Techniques , Machine Learning , Parents , Algorithms , Computer Simulation , Genetic Databases , Female , Gene Frequency/genetics , Genotype , Humans , Male , Single Nucleotide Polymorphism/genetics
12.
Asian-Australas J Anim Sci ; 29(4): 464-70, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26949946

ABSTRACT

The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
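
The accuracy definition used in the abstract above, the ratio of correctly imputed SNP markers to all imputed SNP markers, computed within and across chromosomes, can be sketched as follows. This is a hedged illustration rather than the study's pipeline: the record layout `(chromosome, true_genotype, imputed_genotype)`, the function name, and the toy data are assumptions.

```python
from collections import defaultdict


def accuracy_by_chromosome(records):
    """Return (per-chromosome accuracy dict, overall accuracy).

    Each record is a hypothetical (chromosome, true_genotype, imputed_genotype)
    tuple for one imputed SNP call.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for chrom, true_gt, imputed_gt in records:
        total[chrom] += 1
        correct[chrom] += int(true_gt == imputed_gt)
    if not total:
        raise ValueError("no imputed calls supplied")
    per_chrom = {chrom: correct[chrom] / total[chrom] for chrom in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_chrom, overall


# Toy data: four imputed calls on chromosome 1, two on chromosome 2.
calls = [
    (1, 0, 0), (1, 1, 1), (1, 2, 1), (1, 0, 0),
    (2, 1, 1), (2, 2, 2),
]
per_chrom, overall = accuracy_by_chromosome(calls)
for chrom, acc in sorted(per_chrom.items()):
    print(f"chromosome {chrom}: {acc:.2%}")
print(f"overall: {overall:.2%}")
```

Note that the overall figure pools calls across chromosomes, so it is weighted by the number of imputed markers on each chromosome rather than being a simple mean of the per-chromosome values.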

13.
bioRxiv ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38915501

ABSTRACT

Existing genotype imputation reference panels are mainly derived from European populations, limiting their accuracy in non-European populations. To improve imputation accuracy for Indonesians, whose country is the world's fourth most populous, we combined Whole Genome Sequencing (WGS) data from 227 West Javanese individuals with East Asian data from the 1000 Genomes Project. This created three reference panels: EAS 1KGP3 (EASp), Indonesian (INDp), and a combined panel (EASp+INDp). We also used ten West Javanese samples with WGS and SNP-typing data for benchmarking. We identified 1.8 million novel single nucleotide variants (SNVs) in the West Javanese population, which, while similar to the East Asians, are distinct from the Central Indonesian Flores population. Adding INDp to the EASp reference panel improved imputation accuracy (R²) from 0.85 to 0.90, and concordance from 87.88% to 91.13%. These findings underscore the importance of including Indonesian genetic data in reference panels, advocating for broader WGS of diverse Indonesian populations to enhance genomic studies.

14.
Front Genet ; 15: 1381333, 2024.
Article in English | MEDLINE | ID: mdl-38706794

ABSTRACT

Sea louse (Lepeophtheirus salmonis) infestation of Atlantic salmon (Salmo salar) is a significant challenge in aquaculture. Over the years, this parasite has developed immunity to medicinal control compounds, and non-medicinal control methods have been proven to be stressful, hence the need to study the genomic architecture of salmon resistance to sea lice. Thus, this research used whole-genome sequence (WGS) data to study the genetic basis of the trait since most research using fewer SNPs did not identify significant quantitative trait loci. Mowi Genetics AS provided the genotype (50 k SNPs) and phenotype data for this research after conducting a sea lice challenge test on 3,185 salmon smolts belonging to 191 full-sib families. The 50 k SNP genotype was imputed to WGS using the information from 197 closely related individuals with sequence data. The WGS and 50 k SNPs of the challenged population were then used to estimate genetic parameters, perform a genome-wide association study (GWAS), predict genomic breeding values, and estimate its accuracy for host resistance to sea lice. The heritability of host resistance to sea lice was estimated to be 0.21 and 0.22, while the accuracy of genomic prediction was estimated to be 0.65 and 0.64 for array and WGS data, respectively. In addition, the association test using both array and WGS data did not identify any marker associated with sea lice resistance at the genome-wide level. We conclude that sea lice resistance is a polygenic trait that is moderately heritable. The genomic predictions using medium-density SNP genotyping array were equally good or better than those based on WGS data.

15.
Animal ; 18(3): 101087, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38364656

ABSTRACT

Genotype imputation is a standard approach used in the field of genetics. It can be used to fill in missing genotypes or to increase genotype density. Accurate imputed genotypes are required for downstream analyses. In this study, the accuracy of whole-genome sequence imputation for Angus beef cattle was examined using two different ways to form the reference panel, a within-breed reference population and a multi breed reference population. A stepwise imputation was conducted by imputing medium-density (50k) genotypes to high-density, and then to the whole genome sequence (WGS). The reference population consisted of animals with WGS information from the 1 000 Bull Genomes project. The within-breed reference panel comprised 396 Angus cattle, while an additional 2 380 Taurine cattle were added to the reference population for the multi breed reference scenario. Imputation accuracies were variant-wise average accuracies from a 10-fold cross-validation and expressed as concordance rates (CR) and Pearson's correlations (PR). The two imputation scenarios achieved moderate to high imputation accuracies ranging from 0.896 to 0.966 for CR and from 0.779 to 0.834 for PR. The accuracies from two different scenarios were similar, except for PR from WGS imputation, where the within-breed scenario outperformed the multi breed scenario. The result indicated that including a large number of animals from other breeds in the reference panel to impute purebred Angus did not improve the accuracy and may negatively impact the results. In conclusion, the imputed WGS in Angus cattle can be obtained with high accuracy using a within-breed reference panel.


Subjects
Genome , Single Nucleotide Polymorphism , Cattle/genetics , Animals , Male , Genotype
16.
Cell Genom ; 3(6): 100332, 2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37388906

ABSTRACT

Based on evaluations of imputation performed on a genotype dataset consisting of about 11,000 sub-Saharan African (SSA) participants, we show Trans-Omics for Precision Medicine (TOPMed) and the African Genome Resource (AGR) to be currently the best panels for imputing SSA datasets. We report notable differences in the number of single-nucleotide polymorphisms (SNPs) that are imputed by different panels in datasets from East, West, and South Africa. Comparisons with a subset of 95 SSA high-coverage whole-genome sequences (WGSs) show that despite being about 20-fold smaller, the AGR imputed dataset has higher concordance with the WGSs. Moreover, the level of concordance between imputed and WGS datasets was strongly influenced by the extent of Khoe-San ancestry in a genome, highlighting the need for integration of not only geographically but also ancestrally diverse WGS data in reference panels for further improvement in imputation of SSA datasets. Approaches that integrate imputed data from different panels could also lead to better imputation.

17.
J Anim Sci ; 100(5)2022 May 01.
Article in English | MEDLINE | ID: mdl-35451025

ABSTRACT

This study investigated using imputed genotypes from non-genotyped animals which were not in the pedigree for the purpose of genetic selection and improving genetic gain for economically relevant traits. Simulations were used to mimic a 3-breed crossbreeding system that resembled a modern swine breeding scheme. The simulation consisted of three purebred (PB) breeds A, B, and C each with 25 and 425 mating males and females, respectively. Males from A and females from B were crossed to produce AB females (n = 1,000), which were crossed with males from C to produce crossbreds (CB; n = 10,000). The genome consisted of three chromosomes with 300 quantitative trait loci and ~9,000 markers. Lowly heritable reproductive traits were simulated for A, B, and AB (h2 = 0.2, 0.2, and 0.15, respectively), whereas a moderately heritable carcass trait was simulated for C (h2 = 0.4). Genetic correlations between reproductive traits in A, B, and AB were moderate (rg = 0.65). The goal trait of the breeding program was AB performance. Selection was practiced for four generations where AB and CB animals were first produced in generations 1 and 2, respectively. Non-genotyped AB dams were imputed using FImpute beginning in generation 2. Genotypes of PB and CB were used for imputation. Imputation strategies differed by three factors: 1) AB progeny genotyped per generation (2, 3, 4, or 6), 2) known or unknown mates of AB dams, and 3) genotyping rate of females from breeds A and B (0% or 100%). PB selection candidates from A and B were selected using estimated breeding values for AB performance, whereas candidates from C were selected by phenotype. Response to selection using imputed genotypes of non-genotyped animals was then compared to the scenarios where true AB genotypes (trueGeno) or no AB genotypes/phenotypes (noGeno) were used in genetic evaluations. The simulation was replicated 20 times. 
The average increase in genotype concordance between unknown and known sire imputation strategies was 0.22. Genotype concordance increased as the number of genotyped CB increased with little additional gain beyond 9 progeny. When mates of AB were known and more than 4 progeny were genotyped per generation, the phenotypic response in AB did not differ (P > 0.05) from trueGeno yet was greater (P < 0.05) than noGeno. Imputed genotypes of non-genotyped animals can be used to increase performance when 4 or more progeny are genotyped and sire pedigrees of CB animals are known.


In swine breeding, phenotypic information is often gathered from elite purebred (PB) breeding stock and occasionally from terminal crossbred (CB) animals. Using economically relevant traits expressed by dams of CB (F1) in genetic evaluations is not common due to the lack of pedigree and/or genomic relationships to relate phenotypes of F1 to PB selection candidates. Since swine often have large litters, this study aimed to develop strategies to incorporate phenotypes of F1 into genetic evaluations by imputing F1 genotypes. Using simulation, we investigated the impact of CB pedigree completeness, the number of CB genotyped progeny, the number of parities (and thus mates) an F1 had, and genomic diversity in PB breeds on imputation accuracy and the response to selection in F1 performance. When mates of F1 were in the pedigree and 4 or more CB progeny were genotyped per generation, imputation accuracy was high and the phenotypic response in F1 did not differ compared to when true F1 genotypes were used. Our results show that imputed genotypes can be used to increase performance in swine breeding programs, but the magnitude depends upon the number of CB progeny genotyped, the number of F1 mates, and the completeness of the pedigree.


Subjects
Genetic Hybridization , Quantitative Trait Loci , Animals , Female , Genotype , Male , Genetic Models , Pedigree , Phenotype , Single Nucleotide Polymorphism , Swine/genetics
18.
Animals (Basel) ; 12(17)2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36077985

ABSTRACT

This study evaluated the accuracy of sequence imputation in Hanwoo beef cattle using different reference panels: a large multi-breed reference with no Hanwoo (n = 6269), a much smaller Hanwoo purebred reference (n = 88), and both datasets combined (n = 6357). The target animals were 136 cattle both sequenced and genotyped with the Illumina BovineSNP50 v2 (50K). The average imputation accuracy measured by the Pearson correlation (R) was 0.695 with the multi-breed reference, 0.876 with the purebred Hanwoo, and 0.887 with the combined data; the average concordance rates (CR) were 88.16%, 94.49%, and 94.84%, respectively. The accuracy gains from adding a large multi-breed reference of 6269 samples to only 88 Hanwoo were marginal; however, the concordance rate for the heterozygotes decreased from 85% to 82%, and the concordance rate for fixed SNPs in Hanwoo also decreased from 99.98% to 98.73%. Although the multi-breed panel was large, it was not sufficiently representative of the breed for accurate imputation without the Hanwoo animals. Additionally, we evaluated the value of high-density 700K genotypes (n = 991) as an intermediary step in the imputation process. The imputation accuracy differences were negligible between a single-step imputation strategy from 50K directly to sequence and a two-step imputation approach (50K-700K-sequence). We also observed that imputed sequence data can be used as a reference panel for imputation (mean R = 0.9650, mean CR = 98.35%). Finally, we identified 31 poorly imputed genomic regions in the Hanwoo genome and demonstrated that imputation accuracies were particularly low at the chromosomal ends.
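
The two accuracy measures named in this abstract, the Pearson correlation (R) and the concordance rate (CR), can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosages (0, 1, 2) and that true (sequenced) and imputed calls are available for the same positions; the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson_r(true_g, imputed_g):
    """Pearson correlation between true and imputed allele dosages (0/1/2)."""
    n = len(true_g)
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / sqrt(var_t * var_i)

def concordance_rate(true_g, imputed_g, only=None):
    """Fraction of matching calls; `only` restricts to one true genotype class."""
    pairs = [(t, i) for t, i in zip(true_g, imputed_g) if only is None or t == only]
    if not pairs:
        return float("nan")
    return sum(t == i for t, i in pairs) / len(pairs)

# Toy data (hypothetical): eight positions for one animal.
true_g = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_g = [0, 1, 2, 0, 0, 2, 1, 2]

r = pearson_r(true_g, imputed_g)
cr_all = concordance_rate(true_g, imputed_g)
cr_het = concordance_rate(true_g, imputed_g, only=1)  # heterozygotes only
```

Restricting the concordance rate to one genotype class, as in `cr_het`, is one way to report heterozygote concordance separately from the overall rate.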

19.
Methods Mol Biol ; 2467: 113-138, 2022.
Article in English | MEDLINE | ID: mdl-35451774

ABSTRACT

Imputation has become standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection and genome-wide association studies: a large number of samples can be genotyped at lower density (and lower cost) and then imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software packages were developed originally for human genetics and, more recently, for animal and plant genetics, where they take into account pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures of farmed species and their limited effective sizes allow accurate imputation of high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, the imputation accuracy, measured by the correct imputation rate or the correlation between true and imputed genotypes, increases with the relatedness of the individual to be imputed to its more densely genotyped ancestors and with its own genotype density. Increasing the imputation accuracy raises the genomic selection accuracy whatever the genomic evaluation method. For given marker densities, the most important factors affecting imputation accuracy are the size of the reference population and the relationship between individuals in the reference and target populations.


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Animals , Genome , Genotype , Linkage Disequilibrium
20.
J Anim Sci ; 99(7)2021 Jul 01.
Article in English | MEDLINE | ID: mdl-33860324

ABSTRACT

A major obstacle in applying genomic selection (GS) to uniquely adapted local breeds in less-developed countries has been the cost of genotyping at high densities of single-nucleotide polymorphisms (SNP). Cost reduction can be achieved by imputing genotypes from lower to higher densities. Locally adapted breeds tend to be admixed and exhibit a high degree of genomic heterogeneity thus necessitating the optimization of SNP selection for downstream imputation. The aim of this study was to quantify the achievable imputation accuracy for a sample of 1,135 South African (SA) Drakensberger cattle using several custom-derived lower-density panels varying in both SNP density and how the SNP were selected. From a pool of 120,608 genotyped SNP, subsets of SNP were chosen (1) at random, (2) with even genomic dispersion, (3) by maximizing the mean minor allele frequency (MAF), (4) using a combined score of MAF and linkage disequilibrium (LD), (5) using a partitioning-around-medoids (PAM) algorithm, and finally (6) using a hierarchical LD-based clustering algorithm. Imputation accuracy to higher density improved as SNP density increased; animal-wise imputation accuracy defined as the within-animal correlation between the imputed and actual alleles ranged from 0.625 to 0.990 when 2,500 randomly selected SNP were chosen vs. a range of 0.918 to 0.999 when 50,000 randomly selected SNP were used. At a panel density of 10,000 SNP, the mean (standard deviation) animal-wise allele concordance rate was 0.976 (0.018) vs. 0.982 (0.014) when the worst (i.e., random) as opposed to the best (i.e., combination of MAF and LD) SNP selection strategy was employed. A difference of 0.071 units was observed between the mean correlation-based accuracy of imputed SNP categorized as low (0.01 < MAF ≤ 0.1) vs. high MAF (0.4 < MAF ≤ 0.5). Greater mean imputation accuracy was achieved for SNP located on autosomal extremes when these regions were populated with more SNP. 
The presented results suggested that genotype imputation can be a practical cost-saving strategy for indigenous breeds such as the SA Drakensberger. Based on the results, a genotyping panel consisting of ~10,000 SNP selected based on a combination of MAF and LD would suffice in achieving a <3% imputation error rate for a breed characterized by genomic admixture on the condition that these SNP are selected based on breed-specific selection criteria.
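
One of the SNP selection strategies listed in this abstract, choosing SNP by maximizing the mean minor allele frequency (MAF), can be sketched as below. This is a simplified illustration only: it assumes genotypes coded as allele dosages (0, 1, 2) with no missing calls, it ignores genomic position and linkage disequilibrium, and the function names and toy data are hypothetical rather than the study's actual procedure.

```python
def minor_allele_frequency(dosages):
    """MAF of one SNP from allele dosages (0/1/2) across animals."""
    freq = sum(dosages) / (2 * len(dosages))  # frequency of the counted allele
    return min(freq, 1 - freq)

def select_by_maf(genotypes, n_select):
    """Return the names of the n_select SNP with the highest MAF."""
    ranked = sorted(
        genotypes,
        key=lambda snp: minor_allele_frequency(genotypes[snp]),
        reverse=True,
    )
    return ranked[:n_select]

# Toy data (hypothetical): four SNP genotyped in four animals.
genotypes = {
    "snp_a": [0, 0, 0, 1],  # MAF 0.125
    "snp_b": [1, 1, 2, 0],  # MAF 0.5
    "snp_c": [2, 2, 2, 1],  # MAF 0.125 (counted allele is the major one)
    "snp_d": [0, 1, 1, 1],  # MAF 0.375
}

panel = select_by_maf(genotypes, 2)
```

Taking the top-ranked SNP maximizes the mean MAF of the panel for a fixed panel size; the combined MAF and LD score that the abstract reports as the best strategy would need an additional LD term that is not described here.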


Subjects
Single Nucleotide Polymorphism , Animals , Cattle/genetics , Gene Frequency , Genotype , Linkage Disequilibrium