ABSTRACT
Direct measurements of methane (CH4) from individual animals are difficult and expensive. Predictions based on proxies for CH4 are a viable alternative. Most prediction models are based on multiple linear regressions (MLR) and predictor variables that are not routinely available in commercial farms, such as dry matter intake (DMI) and diet composition. The use of machine learning (ML) algorithms to predict CH4 emissions from across-country heterogeneous data sets has not been reported. The objectives were to compare the performance of the ML ensemble algorithm random forest (RF) and MLR models in predicting CH4 emissions from proxies in dairy cows, and to assess the effects of imputing missing data points on prediction accuracy. Data on CH4 emissions and proxies for CH4 from 20 herds were provided by 10 countries. The integrated data set contained 43,519 records from 3,483 cows, with 18.7% missing data points imputed using k-nearest neighbor imputation. Three data sets were created: 3k (no missing records), 21k (missing DMI imputed from milk, fat, protein, body weight), and 41k (missing DMI, milk fat, and protein records imputed). These data sets were used to test scenarios (with or without DMI, imputed vs. nonimputed DMI, milk fat, and protein) and prediction models (RF vs. MLR). Model predictive ability was evaluated within and between herds through 10-fold cross-validation. Prediction accuracy was measured as the correlation between observed and predicted CH4, root mean squared error (RMSE), and mean normalized discounted cumulative gain (NDCG). Compared with models that did not include DMI, including DMI improved within- and between-herd prediction accuracy to 0.77 (RMSE = 23.3%) and 0.58 (RMSE = 31.9%), respectively, in RF, and to 0.50 (RMSE = 0.327) and 0.13 (RMSE = 42.71), respectively, in MLR. When missing DMI records were imputed, within- and between-herd accuracy increased to 0.84 (RMSE = 18.5%) and 0.63 (RMSE = 29.9%), respectively. In all scenarios, RF models outperformed MLR models. Results suggest routinely measured variables from dairy farms can be used in developing globally robust prediction models for CH4 if coupled with state-of-the-art techniques for imputation and advanced ML algorithms for predictive modeling.
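The abstract states that missing data points were imputed using k-nearest neighbor imputation. The following is a minimal sketch of that general idea, not the study's implementation: it assumes Euclidean distance on the observed predictors and an unweighted mean over the k nearest complete records. The function name `knn_impute`, the toy records, and `k=2` are hypothetical.

```python
import math

def knn_impute(records, target, k=2):
    """Fill records where `target` is None with the mean of that field over
    the k nearest complete records (Euclidean distance on the other fields)."""
    complete = [r for r in records if r[target] is not None]
    filled = []
    for r in records:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        features = [f for f in r if f != target]
        nearest = sorted(
            complete,
            key=lambda c: math.dist([r[f] for f in features],
                                    [c[f] for f in features]),
        )[:k]
        filled.append({**r, target: sum(c[target] for c in nearest) / len(nearest)})
    return filled

# Hypothetical toy records: milk yield (kg/d), body weight (kg), DMI (kg/d).
cows = [
    {"milk": 30.0, "bw": 650.0, "dmi": 21.0},
    {"milk": 31.0, "bw": 655.0, "dmi": 22.0},
    {"milk": 20.0, "bw": 550.0, "dmi": 16.0},
    {"milk": 30.5, "bw": 652.0, "dmi": None},  # DMI missing
]
imputed = knn_impute(cows, "dmi", k=2)
print(imputed[3]["dmi"])  # mean DMI of the two most similar cows
```

In practice the predictors would likely be standardized before computing distances so that body weight does not dominate; that step is omitted here for brevity.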
Subjects
Lactation, Methane, Animals, Cattle, Diet/veterinary, Female, Small Intestine/metabolism, Methane/metabolism, Milk/chemistry
ABSTRACT
Dairy bull fertility is traditionally evaluated using semen production and quality traits; however, these attributes explain only part of the differences observed in fertility among bulls. Alternatively, bull fertility can be directly evaluated using cow field data. The main objective of this study was to investigate bull fertility in the Italian Brown Swiss dairy cattle population using confirmed pregnancy records. The data set included a total of 397,926 breeding records from 1,228 bulls and 129,858 lactating cows between first and fifth lactation from 2000 to 2019. We first evaluated cow pregnancy success, including factors related to the bull under evaluation, such as bull age, bull inbreeding, and AI organization, and factors associated with the cow that receives the dose of semen, including herd-year-season, cow age, parity, and milk yield. We then estimated sire conception rate using only factors related to the bull. Model predictive ability was evaluated using 10-fold cross-validation with 10 replicates. Interestingly, our analyses revealed that there is a substantial variation in conception rate among Brown Swiss bulls, with more than 20% conception rate difference between high-fertility and low-fertility bulls. We also showed that the prediction of bull fertility is feasible as our cross-validation analyses achieved predictive correlations equal to 0.30 for sire conception rate. Improving reproduction performance is one of the major challenges of the dairy industry worldwide, and for this, it is essential to have accurate predictions of service sire fertility. This study represents the foundation for the development of novel tools that will allow dairy producers, breeders, and artificial insemination companies to make enhanced management and selection decisions on Brown Swiss male fertility.
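The evaluation scheme mentioned above (10-fold cross-validation with 10 replicates) can be sketched as a split generator. This is an illustrative sketch only: it assumes bulls are assigned to folds at random, and the function name `repeated_kfold`, the seed, and the bull IDs are hypothetical.

```python
import random

def repeated_kfold(ids, k=10, replicates=10, seed=1):
    """Yield (replicate, fold, train_ids, test_ids) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(replicates):
        shuffled = list(ids)
        rng.shuffle(shuffled)  # new random partition in every replicate
        folds = [shuffled[i::k] for i in range(k)]
        for f, test in enumerate(folds):
            train = [x for g, fold in enumerate(folds) if g != f for x in fold]
            yield rep, f, train, test

bulls = [f"bull_{i}" for i in range(50)]  # hypothetical bull IDs
splits = list(repeated_kfold(bulls))
print(len(splits))  # one entry per replicate-by-fold combination
```

A predictive correlation would then be computed on each test fold and averaged over folds and replicates.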
Subjects
Fertility, Lactation, Animals, Cattle, Dairying, Female, Artificial Insemination/veterinary, Italy, Male, Pregnancy
ABSTRACT
In this work, we performed simulations to develop and test a strategy for exploiting surrogate sire technology in animal breeding programs. Surrogate sire technology allows the creation of males that lack their own germline cells, but have transplanted spermatogonial stem cells from donor males. With this technology, a single elite male donor could give rise to huge numbers of progeny, potentially as many as all the production animals in a particular time period. One hundred replicates of various scenarios were performed. Scenarios followed a common overall structure but differed in the strategy used to identify elite donors and how these donors were used in the product development part. The results of this study showed that using surrogate sire technology would significantly increase the genetic merit of commercial sires, by as much as 6.5 to 9.2 years' worth of genetic gain compared to a conventional breeding program. The simulations suggested that a strategy involving three stages (an initial genomic test followed by two subsequent progeny tests) was the most effective of all the strategies tested. The use of one or a handful of elite donors to generate the production animals would be very different from current practice. While the results demonstrate the great potential of surrogate sire technology, there are considerable risks as well as other opportunities. Practical implementation of surrogate sire technology would need to account for these.
Subjects
Adult Germline Stem Cells, Domestic Animals/genetics, Livestock/genetics, Genetic Selection, Animals, Domestic Animals/growth & development, Breeding, Female, Genome/genetics, Lactation/genetics, Livestock/growth & development, Male
ABSTRACT
BACKGROUND: Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. RESULTS: We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. CONCLUSIONS: We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.
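The reference-allele bias described above can be illustrated with a toy example. This is neither a real variant caller nor the authors' correction pipeline: it assumes a deliberately naive best-guess caller, and the function name `call_genotype`, the `min_alt_reads` threshold, and the read counts are hypothetical.

```python
def call_genotype(ref_reads, alt_reads, min_alt_reads=1):
    """Naive best-guess genotype as alternative-allele dosage (0, 1 or 2),
    or None when no reads remain. Alternative alleles supported by fewer
    than `min_alt_reads` reads are pruned."""
    if alt_reads < min_alt_reads:
        alt_reads = 0  # pruned as a presumed sequencing error
    if ref_reads + alt_reads == 0:
        return None
    if alt_reads == 0:
        return 0
    if ref_reads == 0:
        return 2
    return 1

# Hypothetical low-coverage (ref, alt) read counts at sites where the
# animal truly carries the alternative allele.
low_coverage_sites = [(1, 1), (0, 2), (2, 1), (0, 1)]

no_pruning = [call_genotype(r, a, min_alt_reads=1) for r, a in low_coverage_sites]
pruning = [call_genotype(r, a, min_alt_reads=2) for r, a in low_coverage_sites]
print(no_pruning)
print(pruning)
```

With a threshold of two supporting reads, heterozygous sites covered by a single alternative read are called homozygous reference, which is the direction of bias the abstract reports for low-coverage data.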
Subjects
Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Alleles, Animals, Bias, Gene Frequency/genetics, Genetic Variation/genetics, Genotype, Haplotypes, Single Nucleotide Polymorphism/genetics, Research Design/statistics & numerical data, DNA Sequence Analysis/statistics & numerical data, Swine/genetics
ABSTRACT
BACKGROUND: This study uses simulation to explore and quantify the potential effect of shifting recombination hotspots on genetic gain in livestock breeding programs. METHODS: We simulated three scenarios that differed in the locations of quantitative trait nucleotides (QTN) and recombination hotspots in the genome. In scenario 1, QTN were randomly distributed along the chromosomes and recombination was restricted to occur within specific genomic regions (i.e. recombination hotspots). In the other two scenarios, both QTN and recombination hotspots were located in specific regions, but differed in whether the QTN occurred outside of (scenario 2) or inside (scenario 3) recombination hotspots. We split each chromosome into 250, 500 or 1000 regions per chromosome of which 10% were recombination hotspots and/or contained QTN. The breeding program was run for 21 generations of selection, after which recombination hotspot regions were kept the same or were shifted to adjacent regions for a further 80 generations of selection. We evaluated the effect of shifting recombination hotspots on genetic gain, genetic variance and genic variance. RESULTS: Our results show that shifting recombination hotspots reduced the decline of genetic and genic variance by releasing standing allelic variation in the form of new allele combinations. This in turn resulted in larger increases in genetic gain. However, the benefit of shifting recombination hotspots for increased genetic gain was only observed when QTN were initially outside recombination hotspots. If QTN were initially inside recombination hotspots then shifting them decreased genetic gain. DISCUSSION: Shifting recombination hotspots to regions of the genome where recombination had not occurred for 21 generations of selection (i.e. recombination deserts) released more of the standing allelic variation available in each generation and thus increased genetic gain. 
However, whether and how much increase in genetic gain was achieved by shifting recombination hotspots depended on the distribution of QTN in the genome, the number of recombination hotspots and whether QTN were initially inside or outside recombination hotspots. CONCLUSIONS: Our findings show future scope for targeted modification of recombination hotspots e.g. through changes in zinc-finger motifs of the PRDM9 protein to increase genetic gain in production species.
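The region layout in the METHODS can be sketched as follows: a chromosome is split into equal regions, 10% of which are hotspots, and the hotspots are later shifted to adjacent regions. This is a simplified sketch; the random choice of hotspot regions, the wrap-around at the chromosome end, and the function names are assumptions not stated in the abstract.

```python
import random

def make_hotspots(n_regions, fraction=0.10, seed=1):
    """Pick a random `fraction` of region indices as recombination hotspots."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_regions), round(n_regions * fraction)))

def shift_hotspots(hotspots, n_regions):
    """Move every hotspot to the next region (wrapping at the chromosome end)."""
    return {(h + 1) % n_regions for h in hotspots}

for n_regions in (250, 500, 1000):
    before = make_hotspots(n_regions)
    after = shift_hotspots(before, n_regions)
    # The number of hotspots is preserved; only their positions change.
    print(n_regions, len(before), len(after))
```

Because neighbouring hotspots shift together, some regions can be hotspots both before and after the shift in this sketch; whether the study avoided that is not stated in the abstract.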
Subjects
Breeding, Genetic Variation/genetics, Livestock/genetics, Quantitative Trait Loci/genetics, Genetic Recombination/genetics, Alleles, Animals, Computer Simulation
ABSTRACT
BACKGROUND: This paper describes a method, called AlphaSeqOpt, for the allocation of sequencing resources in livestock populations with existing phased genomic data to maximise the ability to phase and impute sequenced haplotypes into the whole population. METHODS: We present two algorithms. The first selects focal individuals that collectively represent the maximum possible portion of the haplotype diversity in the population. The second allocates a fixed sequencing budget among the families of focal individuals to enable phasing of their haplotypes at the sequence level. We tested the performance of the two algorithms in simulated pedigrees. For each pedigree, we evaluated the proportion of population haplotypes that are carried by the focal individuals and compared our results to a variant of the widely-used key ancestors approach and to two haplotype-based approaches. We calculated the expected phasing accuracy of the haplotypes of a focal individual at the sequence level given the proportion of the fixed sequencing budget allocated to its family. RESULTS: AlphaSeqOpt maximises the ability to capture and phase the most frequent haplotypes in a population in three ways. First, it selects focal individuals that collectively represent a larger portion of the population haplotype diversity than existing methods. Second, it selects focal individuals from across the pedigree whose haplotypes can be easily phased using family-based phasing and imputation algorithms, and thus maximises the ability to impute sequence into the rest of the population. Third, it allocates more of the fixed sequencing budget to focal individuals whose haplotypes are more frequent in the population than to focal individuals whose haplotypes are less frequent. Unlike existing methods, we additionally present an algorithm to allocate part of the sequencing budget to the families (i.e. immediate ancestors) of focal individuals to ensure that their haplotypes can be phased at the sequence level, which is essential for enabling and maximising subsequent sequence imputation. CONCLUSIONS: We present a new method for the allocation of a fixed sequencing budget to focal individuals and their families such that the final sequenced haplotypes, when phased at the sequence level, represent the maximum possible portion of the haplotype diversity in the population that can be sequenced and phased at that budget.
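The goal of the first algorithm (choosing focal individuals that together carry as much of the population's haplotype diversity as possible) can be illustrated with a greedy coverage heuristic. This is not the AlphaSeqOpt algorithm, whose details are not given in the abstract; the greedy rule, the function name `select_focal`, and the toy haplotype IDs are assumptions made for illustration.

```python
def select_focal(carriers, n_focal):
    """Greedily pick individuals that add the most not-yet-covered haplotype
    frequency. `carriers` maps individual -> set of haplotype IDs carried;
    a haplotype's frequency is its number of carriers in `carriers`."""
    freq = {}
    for haps in carriers.values():
        for h in haps:
            freq[h] = freq.get(h, 0) + 1
    covered, focal = set(), []
    for _ in range(n_focal):
        best = max(
            (ind for ind in carriers if ind not in focal),
            key=lambda ind: sum(freq[h] for h in carriers[ind] - covered),
            default=None,
        )
        if best is None:  # fewer candidates than requested
            break
        focal.append(best)
        covered |= carriers[best]
    return focal, covered

# Hypothetical phased haplotypes carried by four animals.
carriers = {
    "A": {"h1", "h2", "h3"},
    "B": {"h3", "h4"},
    "C": {"h1"},
    "D": {"h4", "h5"},
}
focal, covered = select_focal(carriers, n_focal=2)
print(focal, sorted(covered))
```

Weighting by haplotype frequency reflects the abstract's emphasis on capturing the most frequent haplotypes first; allocating the sequencing budget among families is a separate step not sketched here.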
Subjects
Algorithms, Genotyping Techniques/veterinary, Haplotypes, Livestock/genetics, Selective Breeding, DNA Sequence Analysis/methods, Animals, Genotyping Techniques/methods, Genotyping Techniques/standards, Genetic Models, Pedigree, Genetic Polymorphism
ABSTRACT
This paper describes AlphaSim, a software package for simulating plant and animal breeding programs. AlphaSim enables the simulation of multiple aspects of breeding programs with a high degree of flexibility. AlphaSim simulates breeding programs in a series of steps: (i) simulate haplotype sequences and pedigree; (ii) drop haplotypes into the base generation of the pedigree and select single-nucleotide polymorphism (SNP) and quantitative trait nucleotide (QTN); (iii) assign QTN effects, calculate genetic values, and simulate phenotypes; (iv) drop haplotypes into the burn-in generations; and (v) perform selection and simulate new generations. The program is flexible in terms of historical population structure and diversity, recent pedigree structure, trait architecture, and selection strategy. It integrates biotechnologies such as doubled-haploids (DHs) and gene editing and allows the user to simulate multiple traits and multiple environments, specify recombination hot spots and cold spots, specify gene jungles and deserts, perform genomic predictions, and apply optimal contribution selection. AlphaSim also includes restart functionalities, which increase its flexibility by allowing the simulation process to be paused so that the parameters can be changed or to import an externally created pedigree, trial design, or results of an analysis of previously simulated data. By combining the options, a user can simulate simple or complex breeding programs with several generations, variable population structures and variable breeding decisions over time. In conclusion, AlphaSim is a flexible and computationally efficient software package to simulate biotechnology enhanced breeding programs with the aim of performing rapid, low-cost, and objective in silico comparison of breeding technologies.
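Step (iii) above (assign QTN effects, calculate genetic values, and simulate phenotypes) can be sketched in a few lines. This is not AlphaSim code and does not use its interface: it assumes a purely additive model with normally distributed QTN effects and independent normal residuals, and all names and parameter values are hypothetical.

```python
import random

def simulate_phenotypes(genotypes, qtn_effects, error_sd, seed=1):
    """genotypes: per-individual lists of QTN allele dosages (0, 1 or 2).
    Returns (genetic_values, phenotypes) under an additive model."""
    rng = random.Random(seed)
    genetic_values = [
        sum(dosage * effect for dosage, effect in zip(ind, qtn_effects))
        for ind in genotypes
    ]
    # Phenotype = genetic value + independent normal residual.
    phenotypes = [g + rng.gauss(0.0, error_sd) for g in genetic_values]
    return genetic_values, phenotypes

rng = random.Random(42)
n_qtn = 5
qtn_effects = [rng.gauss(0.0, 1.0) for _ in range(n_qtn)]  # assumed normal effects
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_qtn)] for _ in range(4)]
genetic_values, phenotypes = simulate_phenotypes(genotypes, qtn_effects, error_sd=1.0)
for g, p in zip(genetic_values, phenotypes):
    print(round(g, 3), round(p, 3))
```

The residual standard deviation controls heritability in this sketch: a larger `error_sd` relative to the spread of genetic values gives a less heritable trait.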
Subjects
Computer Simulation, Plant Breeding, Software, Animals, Genetic Models, Pedigree, Phenotype, Single Nucleotide Polymorphism, Quantitative Trait Loci/genetics
ABSTRACT
BACKGROUND: In this work, we performed simulations to explore the potential of manipulating recombination rates to increase response to selection in livestock breeding programs. METHODS: We carried out ten replicates of several scenarios that followed a common overall structure but differed in the average rate of recombination along the genome (expressed as the length of a chromosome in Morgan), the genetic architecture of the trait under selection, and the selection intensity under truncation selection (expressed as the proportion of males selected). Recombination rates were defined by simulating nine different chromosome lengths: 0.10, 0.25, 0.50, 1, 2, 5, 10, 15 and 20 Morgan, respectively. One Morgan was considered to be the typical chromosome length for current livestock species. The genetic architecture was defined by the number of quantitative trait variants (QTV) that affected the trait under selection. Either a large (10,000) or a small (1000 or 500) number of QTV was simulated. Finally, the proportions of males selected under truncation selection as sires for the next generation were equal to 1.2, 2.4, 5, or 10 %. RESULTS: Increasing recombination rate increased the overall response to selection and decreased the loss of genetic variance. The difference in cumulative response between low and high recombination rates increased over generations. At low recombination rates, cumulative response to selection tended to asymptote sooner and the genetic variance was completely eroded. If the trait under selection was affected by few QTV, differences between low and high recombination rates still existed, but the selection limit was reached at all rates of recombination. CONCLUSIONS: Higher recombination rates can enhance the efficiency of breeding programs to turn genetic variation into response to selection. However, to increase response to selection significantly, the recombination rate would need to be increased 10- or 20-fold. 
The biological feasibility and consequences of such large increases in recombination rates are unknown.
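The selection intensities in the METHODS (1.2, 2.4, 5, or 10% of males selected under truncation selection) can be illustrated with a short sketch. It assumes candidates are ranked on a single value and the top proportion is kept; the function name `truncation_select`, the candidate count, and the simulated values are hypothetical.

```python
import random

def truncation_select(values, proportion):
    """Return indices of the top `proportion` of candidates ranked by value
    (at least one candidate is always selected)."""
    n_selected = max(1, round(len(values) * proportion))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return ranked[:n_selected]

# Hypothetical selection criterion for 1000 male candidates.
rng = random.Random(7)
males = [rng.gauss(0.0, 1.0) for _ in range(1000)]

for proportion in (0.012, 0.024, 0.05, 0.10):
    sires = truncation_select(males, proportion)
    mean_selected = sum(males[i] for i in sires) / len(sires)
    print(proportion, len(sires), round(mean_selected, 2))
```

Selecting a smaller proportion raises the mean of the selected sires, which is why the proportion selected serves as the measure of selection intensity in the scenarios.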