Results 1 - 20 of 68
1.
PLoS One ; 15(2): e0228957, 2020.
Article in English | MEDLINE | ID: mdl-32078659

ABSTRACT

Breast cancer is the leading cause of cancer-related disease in women. Cumulative evidence supports a causal role of alcohol intake in breast cancer incidence. In this study, we explore changes in the expression of genes involved in the biological pathways through which alcohol has been hypothesized to affect breast cancer risk, to shed new light on possible mechanisms affecting the survival of breast cancer patients. We performed differential expression analysis at both the individual-gene and gene-set levels across survival and breast cancer subtype data. Information about post-diagnosis breast cancer survival was obtained from 1,977 Caucasian female participants in the Molecular Taxonomy of Breast Cancer International Consortium. We examined the expression of 16 genes that have been linked in the literature to the hypothesized alcohol-breast cancer pathways. The expression of 9 of the 16 genes was associated with survival within the first 4 years after diagnosis, and gene set analysis confirmed a significant differential expression of these genes as a whole. Although alcohol consumption was neither analyzed nor available for this dataset, we believe that further study of these genes could provide important information for clinical recommendations about the potential impact of alcohol drinking on breast cancer survival.
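The two levels of analysis can be sketched as follows, assuming expression values have already been split into two hypothetical groups (patients who died within 4 years of diagnosis vs. those who did not). The per-gene statistic here is Welch's t, and the set-level summary (mean absolute t across the genes in the set) is an illustrative choice; neither is stated to be the test used in the study, and the gene names and values are placeholders.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for a difference in mean expression (a minus b)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def gene_level_and_set_level(expr_died, expr_survived):
    """Per-gene Welch t-statistics plus an illustrative set-level summary (mean |t|).

    expr_died, expr_survived: dict mapping gene name -> list of expression values.
    """
    per_gene = {g: welch_t(expr_died[g], expr_survived[g]) for g in expr_died}
    set_score = mean(abs(t) for t in per_gene.values())
    return per_gene, set_score


# Hypothetical toy data for two placeholder genes
died = {"GENE_A": [5.1, 5.4, 5.8, 6.0], "GENE_B": [2.0, 2.2, 1.9, 2.1]}
survived = {"GENE_A": [4.0, 4.2, 4.1, 4.5], "GENE_B": [2.1, 2.0, 2.2, 1.9]}
per_gene, set_score = gene_level_and_set_level(died, survived)
print(per_gene, set_score)
```

A real analysis would also need a reference distribution for the set-level score (for example, by permuting group labels) and a multiple-testing correction across genes.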


Subjects
Alcohol Drinking/genetics , Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic/genetics , Adult , Aged , Alcohol Drinking/epidemiology , Alcohol Drinking/mortality , Breast/pathology , Breast Neoplasms/diagnosis , Breast Neoplasms/mortality , Ethanol , Female , Humans , Incidence , Middle Aged , Risk Assessment/methods , Risk Factors
2.
Plant Cell ; 32(1): 139-151, 2020 01.
Article in English | MEDLINE | ID: mdl-31641024

ABSTRACT

The ability to predict traits from genome-wide sequence information (i.e., genomic prediction) has improved our understanding of the genetic basis of complex traits and transformed breeding practices. Transcriptome data may also be useful for genomic prediction. However, it remains unclear how well transcript levels can predict traits, particularly when traits are scored at different development stages. Using maize (Zea mays) genetic markers and transcript levels from seedlings to predict mature plant traits, we found that transcript and genetic marker models have similar performance. When the transcripts and genetic markers with the greatest weights (i.e., the most important) in those models were used in one joint model, performance increased. Furthermore, genetic markers important for predictions were not close to or identified as regulatory variants for important transcripts. These findings demonstrate that transcript levels are useful for predicting traits and that their predictive power is not simply due to genetic variation in the transcribed genomic regions. Finally, genetic marker models identified only 1 of 14 benchmark flowering-time genes, while transcript models identified 5. These data highlight that, in addition to being useful for genomic prediction, transcriptome data can provide a link between traits and variation that cannot be readily captured at the sequence level.
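The joint-model step can be sketched as below, assuming each separate model exposes one weight per feature (for example, regression coefficients) and that "greatest weights" means largest absolute value. The helper names, the value of k, and the toy weights are hypothetical.

```python
def top_features(weights, k):
    """Return the k feature names with the largest absolute weights."""
    return sorted(weights, key=lambda f: abs(weights[f]), reverse=True)[:k]


def joint_feature_set(marker_weights, transcript_weights, k):
    """Combine the top-k genetic markers and top-k transcripts into one feature list."""
    return top_features(marker_weights, k) + top_features(transcript_weights, k)


# Hypothetical weights from a marker model and a transcript model
markers = {"snp1": 0.02, "snp2": -0.40, "snp3": 0.15}
transcripts = {"tx1": 0.30, "tx2": -0.05, "tx3": -0.22}
print(joint_feature_set(markers, transcripts, k=2))
# ['snp2', 'snp3', 'tx1', 'tx3']
```

The combined feature list would then be used to fit a single joint model.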

3.
Front Immunol ; 10: 2616, 2019.
Article in English | MEDLINE | ID: mdl-31787983

ABSTRACT

Influenza, a communicable disease, affects thousands of people worldwide. Young children, the elderly, immunocompromised individuals and pregnant women are at higher risk of infection by the influenza virus. Our study aims to highlight genes differentially expressed in influenza disease compared with influenza vaccination, including variability due to age and sex. To accomplish these goals, we conducted a meta-analysis of publicly available microarray expression data. Our inclusion criteria covered subjects with influenza, subjects who received the influenza vaccine, and healthy controls. We curated 18 microarray datasets for a total of 3,481 samples (1,277 controls, 297 influenza infection, 1,907 influenza vaccination). We pre-processed the raw microarray expression data in R using packages available for the Affymetrix and Illumina microarray platforms, and applied a Box-Cox power transformation to the data prior to the downstream analysis of differentially expressed genes. Statistical analyses were based on a linear mixed effects model with all study factors and successive likelihood ratio tests (LRT) to identify differentially expressed genes. We filtered LRT results by disease (Bonferroni-adjusted p < 0.05) and used a two-tailed 10% quantile cutoff to identify biologically significant genes. Furthermore, we assessed age and sex effects on the disease genes by filtering for genes with a statistically significant (Bonferroni-adjusted p < 0.05) interaction between disease and age, and between disease and sex. Filtering the LRT results by the disease factor yielded 4,889 statistically significant genes, and gene enrichment analysis (gene ontology and pathways) included innate immune response, viral process, defense response to virus, hematopoietic cell lineage and the NF-kappa B signaling pathway. Our quantile-filtered gene lists for influenza infection and vaccination each comprised 978 genes.
We also identified 907 and 48 genes with statistically significant (Bonferroni-adjusted p < 0.05) disease-age and disease-sex interactions, respectively. Our meta-analysis approach highlights key gene signatures and their associated pathways for both influenza infection and vaccination, and identifies genes with age and sex effects. These findings could help improve current vaccines and identify genes that are expressed equally across ages, which is relevant when considering universal influenza vaccines.
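Three of the filtering steps named above (the Box-Cox transform, the Bonferroni filter, and the two-tailed quantile cutoff) can be sketched in plain Python. The abstract does not say whether the 10% applies to each tail or in total; this sketch assumes 10% per tail via the hypothetical `tail` parameter. The per-gene "effects" (for example, fold changes) and p-values in the example are made up, and the linear mixed model and LRT themselves are not shown.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def bonferroni_pass(pvalues, alpha=0.05):
    """Indices whose Bonferroni-adjusted p-value (p * m, capped at 1) is below alpha."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if min(1.0, p * m) < alpha]


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def two_tailed_cutoff(effects, tail=0.10):
    """Indices whose effect lies in the lower or upper `tail` fraction (assumed per tail)."""
    s = sorted(effects)
    low, high = quantile(s, tail), quantile(s, 1 - tail)
    return [i for i, e in enumerate(effects) if e <= low or e >= high]


print(bonferroni_pass([1e-6, 0.001, 0.02, 0.5]))         # [0, 1]
print(two_tailed_cutoff([float(x) for x in range(11)]))  # [0, 1, 9, 10]
```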

4.
G3 (Bethesda) ; 9(11): 3691-3702, 2019 11 05.
Article in English | MEDLINE | ID: mdl-31533955

ABSTRACT

The usefulness of genomic prediction in crop and livestock breeding programs has prompted efforts to develop new and improved genomic prediction algorithms, such as artificial neural networks and gradient tree boosting. However, the performance of these algorithms has not been compared systematically across a wide range of datasets and models. Using data on 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and six non-linear algorithms. First, we found that hyperparameter selection was necessary for all non-linear algorithms and that feature selection prior to model training was critical for artificial neural networks when markers greatly outnumbered training lines. Across all species and trait combinations, no single algorithm performed best; however, predictions based on a combination of results from multiple algorithms (i.e., ensemble predictions) performed consistently well. While linear and non-linear algorithms each performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits. Although artificial neural networks did not perform best for any trait, we identified strategies (i.e., feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. Our results highlight the importance of algorithm selection for the prediction of trait values.
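An ensemble prediction of the kind described above can be sketched as an element-wise average of the predictions from several algorithms, scored with the Pearson correlation between predicted and observed values (a common accuracy measure in genomic prediction). Equal weighting and the toy predictions are assumptions for illustration; the abstract does not say how results were combined.

```python
from statistics import mean


def ensemble_predict(predictions_by_model):
    """Average predictions element-wise across models (equal weights assumed).

    predictions_by_model: list of equal-length lists, one per algorithm.
    """
    return [mean(preds) for preds in zip(*predictions_by_model)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predictions from three algorithms for three lines
ensemble = ensemble_predict([[1.0, 2.0, 3.0], [1.2, 1.8, 3.4], [0.8, 2.4, 2.6]])
observed = [1.1, 2.1, 2.9]
print(ensemble, pearson_r(ensemble, observed))
```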


Subjects
Genomics/methods , Machine Learning , Plants/genetics , Benchmarking , Genotype , Neural Networks, Computer , Phenotype
5.
Genetics ; 212(4): 1045-1061, 2019 08.
Article in English | MEDLINE | ID: mdl-31152070

ABSTRACT

The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach ("HaploBlocker") for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population, and only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks, we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R-package HaploBlocker, and provides flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of single nucleotide polymorphisms (SNPs), local epistatic interactions can be naturally modeled, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines in a European maize landrace genotyped at 501,124 SNPs. With the suggested approach, we identified 2991 haplotype blocks with an average length of 2685 SNPs that together represent 94% of the dataset.
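The block definition above (a marker sequence that reaches a predefined minimum frequency in the population) can be illustrated for a single fixed window. This is a deliberately simplified sketch: it counts only exact matches, whereas HaploBlocker also treats haplotypes with a *similar* marker sequence as carrying the block and searches over window boundaries. The function name and toy haplotypes are hypothetical and do not reflect the R package's API.

```python
from collections import Counter


def frequent_block_variants(haplotypes, start, end, min_freq):
    """Marker sequences in window [start, end) carried by at least min_freq haplotypes.

    haplotypes: list of equal-length strings of allele codes, e.g. "0110".
    Returns a dict: window sequence -> indices of haplotypes carrying it.
    Exact matches only (a simplification of the similarity-based method).
    """
    windows = [h[start:end] for h in haplotypes]
    counts = Counter(windows)
    return {
        seq: [i for i, w in enumerate(windows) if w == seq]
        for seq, n in counts.items()
        if n >= min_freq
    }


haps = ["0011010", "0011011", "1011010", "0011110", "1100001"]
print(frequent_block_variants(haps, 0, 4, min_freq=2))
# {'0011': [0, 1, 3]}
```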


Subjects
Gene Library , Haplotypes , Algorithms , Animals , Computational Biology , Datasets as Topic , Genetic Linkage , Genetic Markers , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Zea mays/genetics
6.
G3 (Bethesda) ; 9(5): 1377-1383, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30894453

ABSTRACT

We created a suite of packages that enables analysis of extremely large genomic data sets (potentially millions of individuals and millions of molecular markers) within the R environment. The suite offers: a matrix-like interface for .bed files (PLINK's binary format for genotype data); a novel class of linked arrays that links data stored in multiple files into a single array accessible from the R computing environment; methods for parallel computing that can carry out computations on very large data sets without loading all of the data into memory; and a basic set of methods for statistical genetic analyses. The packages are available through CRAN and GitHub. In this note, we describe the classes and methods implemented in each of the packages that make up the suite and illustrate their use with data from the UK Biobank.
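To show what a matrix-like interface to .bed files has to decode, here is a minimal Python sketch of the PLINK 1 binary genotype layout (variant-major mode): a 3-byte header, then 2 bits per sample with the first sample in the lowest-order bits, and each variant padded to whole bytes. Reporting genotypes as counts of the first (.bim A1) allele is a convention chosen here. This is not the suite's R implementation. Because every variant occupies a fixed number of bytes, a reader can seek to a single variant without loading the whole file.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header (variant-major mode)

# 2-bit genotype code -> number of copies of the first (.bim A1) allele; None = missing
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_variant(block, n_samples):
    """Decode one variant's packed bytes into per-sample A1 allele counts."""
    genotypes = []
    for byte in block:
        for shift in (0, 2, 4, 6):  # the lowest-order bit pair holds the first sample
            if len(genotypes) == n_samples:
                break  # the last byte of each variant is padded
            genotypes.append(CODE_TO_A1_COUNT[(byte >> shift) & 0b11])
    return genotypes


def read_bed(data, n_samples, n_variants):
    """Decode an in-memory .bed byte string into one genotype list per variant."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed file")
    per_variant = (n_samples + 3) // 4  # each variant occupies whole bytes
    body = data[3:]
    return [
        decode_variant(body[v * per_variant:(v + 1) * per_variant], n_samples)
        for v in range(n_variants)
    ]


# One variant, five samples: hom. A1, het., hom. A2, missing, hom. A1
print(read_bed(BED_MAGIC + bytes([0x78, 0x00]), n_samples=5, n_variants=1))
# [[2, 1, 0, None, 2]]
```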


Subjects
Big Data , Computational Biology/methods , Genomics/methods , Software , Algorithms
7.
G3 (Bethesda) ; 9(5): 1429-1436, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30877081

ABSTRACT

The genetic architecture of complex human traits and diseases is affected by a large number of possibly interacting genes, but detecting epistatic interactions can be challenging. In the last decade, several studies have alluded to problems that linkage disequilibrium (LD) can create when testing for epistatic interactions between DNA markers. However, these problems have been neither formalized nor precisely quantified. Here we use a conceptually simple three-locus model involving a causal locus and two markers to show that imperfect LD can generate the illusion of epistasis, even when the underlying genetic architecture is purely additive. We describe necessary conditions for such "phantom epistasis" to emerge and quantify its relevance using simulations. Our empirical results demonstrate that phantom epistasis can be a very serious problem in GWAS (with rejection rates against the additive model greater than 0.28 for nominal p-values of 0.05, even when the model is purely additive). Some studies have sought to avoid this problem by only testing interactions between SNPs with R² < 0.1. We show that this threshold is not appropriate and that the magnitude of the problem is even greater with large sample sizes, intermediate allele frequencies, and when the causal locus explains a large proportion of phenotypic variance. We conclude that caution must be exercised when interpreting GWAS results from very large data sets that show strong evidence of epistatic interactions between markers.
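The core argument can be illustrated with a deliberately simplified toy version of the three-locus model. The assumptions are ours, not the paper's simulation design: haploid biallelic loci, a purely additive trait y = Q that depends on the causal locus alone, and two markers that each copy Q but are flipped independently with probability e. The conditional mean E[y | M1, M2] = P(Q = 1 | M1, M2) is then generally not additive in the two markers, so the interaction contrast below is non-zero under imperfect LD.

```python
def posterior_causal(p, e, m1, m2):
    """P(Q = 1 | M1 = m1, M2 = m2) in a toy haploid model.

    Q ~ Bernoulli(p) is the causal allele; each marker equals Q except that it is
    flipped independently with probability e (imperfect LD when 0 < e < 0.5).
    """
    def likelihood(q):
        # probability of the observed marker pair given Q = q
        return ((1 - e) if m1 == q else e) * ((1 - e) if m2 == q else e)

    numerator = p * likelihood(1)
    return numerator / (numerator + (1 - p) * likelihood(0))


def interaction_contrast(p, e):
    """Marker-by-marker interaction in E[y | M1, M2] when y = Q (purely additive).

    A value of zero would mean the two markers combine additively.
    """
    def f(a, b):
        return posterior_causal(p, e, a, b)

    return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)


print(round(interaction_contrast(0.3, 0.1), 4))  # 0.3773
print(interaction_contrast(0.3, 0.5))            # 0.0 (markers carry no LD)
```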


Subjects
Epistasis, Genetic , Linkage Disequilibrium , Quantitative Trait, Heritable , Software , Algorithms , Big Data , Genome-Wide Association Study , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Quantitative Trait Loci
8.
Forensic Sci Int Genet ; 40: 192-200, 2019 05.
Article in English | MEDLINE | ID: mdl-30884346

ABSTRACT

Forensic DNA phenotyping (FDP) has recently provided important advances in forensic investigations by predicting the physical appearance of a subject from a biological sample using SNP markers. Most operable prediction models have been developed for iris color; however, replication studies assessing their applicability on a worldwide scale are still limited for many of them. In this work, 4 models for eye color prediction (IrisPlex, Ruiz, Allwood and Hart models) were systematically evaluated in a sample of 296 subjects of Italian origin. Genotypes were determined with a custom NGS-based panel targeting all the predictive SNPs included in the 4 tested models. Overall, 60-69% of the Italian sample was correctly predicted by the IrisPlex, Ruiz and Allwood models when the recommended threshold was applied. The IrisPlex model showed the lowest frequency of errors (17%), but also the highest proportion of inconclusive results (18%). Without the threshold, the highest proportion of correct predictions was again obtained with the IrisPlex model (76%), followed by the Allwood (73%) and Ruiz (65%) models. Lastly, the Hart predictive algorithm had the lowest error rate (2%), but most of its predictions (87%) were restricted to the less informative categories "not-blue" and "not-brown", and correct color predictions were obtained for only 11% of the sample. As observed in previous studies, most incorrect and undefined predictions were attributable to the intermediate category, which represented 25% of the Italian sample. Adjusting the IrisPlex (multinomial logistic regression) and Ruiz (Snipper Bayesian classifier) models with Italian allele frequencies gave only minor improvements in predicting intermediate eye color and no remarkable overall change in performance. This suggests that knowledge of the genetic basis of intermediate colors is incomplete.
Considering the impact of this phenotype in the Italian sample as well as in other admixed populations, future improvements of eye color prediction methods should include a better genetic and phenotypic characterization of this category.
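The correct / error / inconclusive breakdown reported above follows from applying a probability threshold to a model's category probabilities (for IrisPlex, the output of a multinomial logistic regression). The sketch below assumes a threshold of 0.7 purely for illustration; the abstract does not state the recommended value. The probabilities and observed colors are made up. Passing `threshold=0.0` reproduces the "no threshold" setting, in which every subject receives a call.

```python
def classify(probs, threshold=0.7):
    """Most probable category, or None (inconclusive) if its probability is below threshold.

    probs: dict category -> predicted probability.
    """
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None


def summarize(predicted_probs, observed, threshold=0.7):
    """Fractions of correct, incorrect and inconclusive calls."""
    calls = [classify(p, threshold) for p in predicted_probs]
    n = len(calls)
    correct = sum(c == o for c, o in zip(calls, observed) if c is not None)
    inconclusive = sum(c is None for c in calls)
    return {"correct": correct / n,
            "error": (n - correct - inconclusive) / n,
            "inconclusive": inconclusive / n}


probs = [{"blue": 0.90, "intermediate": 0.05, "brown": 0.05},
         {"blue": 0.40, "intermediate": 0.35, "brown": 0.25},
         {"blue": 0.10, "intermediate": 0.10, "brown": 0.80},
         {"blue": 0.75, "intermediate": 0.20, "brown": 0.05}]
observed = ["blue", "intermediate", "brown", "intermediate"]
print(summarize(probs, observed))
# {'correct': 0.5, 'error': 0.25, 'inconclusive': 0.25}
```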


Subjects
Eye Color/genetics , Models, Genetic , Algorithms , Decision Trees , Female , Genotype , Genotyping Techniques/instrumentation , Humans , Italy , Male , Phenotype , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods
9.
Plant Methods ; 15: 14, 2019.
Article in English | MEDLINE | ID: mdl-30774704

ABSTRACT

Background: The selection of hybrids is an essential step in maize breeding. However, evaluating a large number of hybrids in field trials can be extremely costly. Genomic models can instead be used to predict the expected performance of untested genotypes. Bayesian models offer a very flexible framework for hybrid prediction: the Bayesian methodology can be used with parametric and semi-parametric assumptions for additive and non-additive effects. Furthermore, samples from the posterior distribution of Bayesian models can be used to estimate the variance due to general and specific combining abilities even when additive and non-additive effects are not mutually orthogonal. Nevertheless, the use of Bayesian models for the analysis and prediction of hybrid performance has remained fairly limited. Results: We provide an overview of Bayesian parametric and semi-parametric genomic models for prediction of agronomic traits in maize hybrids and discuss how these models can be used to decompose the genotypic variance into components due to general and specific combining ability. We applied the methodology to data from 906 single-cross tropical maize hybrids derived from a convergent population. Our results show that: (1) non-additive effects make a sizable contribution to the genetic variance of grain yield, whereas their relative importance was much smaller for ear and plant height; (2) genomic prediction can achieve relatively high accuracy in predicting phenotypes of untested hybrids and in pre-screening. Conclusions: Genomic prediction can be a useful tool in pre-screening of hybrids and could improve the efficiency and efficacy of maize hybrid breeding programs. The Bayesian framework offers a great deal of flexibility in modeling hybrid performance.
The methodology can be used to estimate important genetic parameters and to render predictions of expected hybrid performance as well as measures of uncertainty about such predictions.
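The split into general (GCA) and specific (SCA) combining ability can be illustrated with the classical least-squares decomposition for a complete, balanced factorial of hybrids, where every parent from one heterotic group is crossed with every parent from the other. This is a swapped-in, non-Bayesian analogue for illustration only. The Bayesian genomic models described above also handle incomplete designs and non-orthogonal effects, which this sketch does not. The yields are hypothetical.

```python
from statistics import mean, pvariance


def gca_sca(table):
    """Split hybrid values y[i][j] into mu + gca1[i] + gca2[j] + sca[i][j].

    table: rows are parents from one heterotic group, columns are parents from
    the other; every cross must be present (complete, balanced factorial).
    """
    mu = mean(v for row in table for v in row)
    gca1 = [mean(row) - mu for row in table]
    gca2 = [mean(col) - mu for col in zip(*table)]
    sca = [[y - mu - gca1[i] - gca2[j] for j, y in enumerate(row)]
           for i, row in enumerate(table)]
    return mu, gca1, gca2, sca


# Hypothetical yields of a 2 x 3 factorial of single crosses
yields = [[8.0, 9.0, 7.0],
          [6.0, 8.0, 7.0]]
mu, gca1, gca2, sca = gca_sca(yields)
sca_share = (pvariance([v for row in sca for v in row])
             / pvariance([v for row in yields for v in row]))
print(mu, gca1, gca2)       # 7.5 [0.5, -0.5] [-0.5, 1.0, -0.5]
print(round(sca_share, 3))  # 0.182
```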

10.
Genetics ; 211(4): 1395-1407, 2019 04.
Article in English | MEDLINE | ID: mdl-30796011

ABSTRACT

In humans, most genome-wide association studies have been conducted using data from Caucasians, and many of the reported findings have not replicated in other populations. This lack of replication may be due to statistical issues (small sample sizes or confounding) or, more fundamentally, to differences in the genetic architecture of traits between ethnically diverse subpopulations. What aspects of the genetic architecture of traits vary between subpopulations, and how can this be quantified? We propose studying effect heterogeneity using Bayesian random effect interaction models. The proposed methodology can be applied using shrinkage and variable selection methods, and produces useful information about effect heterogeneity in the form of whole-genome summaries (e.g., the proportion of variance of a complex trait explained by a set of SNPs and the average correlation of effects) as well as SNP-specific attributes. Using simulations, we show that the proposed methodology yields (nearly) unbiased estimates when the sample size is not too small relative to the number of SNPs used. We then applied the methodology to four complex human traits (standing height, high-density lipoprotein (HDL), low-density lipoprotein, and serum urate levels) in European-Americans (EAs) and African-Americans (AAs). The estimated correlations of effects between the two subpopulations were well below unity for all traits, ranging from 0.50 to 0.73. The extent of effect heterogeneity varied between traits and SNP sets. Height showed smaller differences in SNP effects between AAs and EAs, whereas HDL, a trait highly influenced by lifestyle, exhibited a greater extent of effect heterogeneity. For all traits, we observed substantial variability in effect heterogeneity across SNPs, suggesting that effect heterogeneity varies between regions of the genome.
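One common way to parameterize an interaction model like this, assumed here because the abstract does not give the formulas, writes each SNP's effect in the two subpopulations as a shared main effect plus an independent subpopulation-specific deviation: b1 = b0 + d1 and b2 = b0 + d2. The correlation of effects then follows from the variance components alone, which is how a whole-genome summary such as an effect correlation of 0.50 to 0.73 can be read.

```python
from math import sqrt


def effect_correlation(var_main, var_dev_1, var_dev_2):
    """Correlation of SNP effects between two subpopulations.

    Assumes b1 = b0 + d1 and b2 = b0 + d2, with b0, d1, d2 independent, zero-mean,
    and variances var_main, var_dev_1, var_dev_2, so Cov(b1, b2) = var_main.
    """
    return var_main / sqrt((var_main + var_dev_1) * (var_main + var_dev_2))


print(effect_correlation(1.0, 0.0, 0.0))  # 1.0: identical effects in both groups
print(effect_correlation(1.0, 1.0, 1.0))  # 0.5: deviations as large as the shared part
```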


Subjects
Ethnic Groups/genetics , Genetic Heterogeneity , Models, Genetic , Population/genetics , Quantitative Trait, Heritable , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Humans , Polymorphism, Single Nucleotide
11.
G3 (Bethesda) ; 9(1): 13-19, 2019 01 09.
Article in English | MEDLINE | ID: mdl-30482799

ABSTRACT

Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative, which computes only a smaller number of top singular values directly from genotype matrices) is a necessary step for many analyses, including estimation of SNP-heritability, Principal Component Analysis (PCA), and genomic prediction. However, the GRMs and genotype matrices provided by modern biobanks are too large to be held in memory. To accommodate current and future "bigger data", we developed a disk-based tool, Out-of-Core Matrices Analyzer (OCMA), that uses state-of-the-art computational techniques to perform eigen and Singular Value Decomposition (SVD) analyses efficiently. By integrating memory mapping (mmap) and the latest matrix factorization libraries, our tool is fast and memory-efficient. To demonstrate OCMA's performance, we tested it on a personal computer. For full eigen-decomposition, it solves an ordinary GRM (N = 10,000) in 55 sec. For SVD, a commonly used faster alternative to full eigen-decomposition in genomic analyses, OCMA computes the top 200 singular values (SVs) in half an hour, the top 2,000 SVs in 0.95 hr, and all 5,000 SVs in 1.77 hr for a very large genotype matrix (N = 1,000,000, M = 5,000) on the same personal computer. OCMA also supports multi-threading when running on a desktop or HPC cluster. OCMA can thus alleviate the computing bottleneck of classical analyses on large genomic matrices and make it possible to scale current and emerging analytical methods to big genomic data using lightweight computing resources.
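The out-of-core idea rests on the fact that a GRM can be accumulated from blocks of markers, so the full genotype matrix never needs to be in memory at once. The sketch below uses the common standardized-genotype definition G = ZZ'/M, which is an assumption since the abstract does not give OCMA's formula. It is pure Python for clarity rather than speed; OCMA itself relies on memory mapping and optimized matrix libraries. Monomorphic markers are skipped because they cannot be standardized.

```python
from math import sqrt


def standardize(column):
    """Center and scale one marker's 0/1/2 genotype codes; None if monomorphic."""
    n = len(column)
    p = sum(column) / (2 * n)  # allele frequency estimated from the sample
    if p in (0.0, 1.0):
        return None
    sd = sqrt(2 * p * (1 - p))
    return [(x - 2 * p) / sd for x in column]


def grm_by_chunks(marker_chunks, n):
    """Accumulate G = Z Z' / M one block of markers at a time.

    marker_chunks: iterable of lists of marker columns (each of length n). In an
    out-of-core setting each chunk would be read from disk, used, and discarded.
    """
    g = [[0.0] * n for _ in range(n)]
    m_used = 0
    for chunk in marker_chunks:
        for column in chunk:
            z = standardize(column)
            if z is None:
                continue  # skip monomorphic markers
            for i in range(n):
                for j in range(n):
                    g[i][j] += z[i] * z[j]
            m_used += 1
    return [[v / m_used for v in row] for row in g]


markers = [[0, 1, 2], [2, 1, 0], [1, 1, 0], [0, 0, 0]]
g_one_pass = grm_by_chunks([markers], n=3)
g_chunked = grm_by_chunks([markers[:2], markers[2:]], n=3)
print(g_one_pass == g_chunked)  # True: chunking does not change the result
```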


Subjects
Genome/genetics , Genomics , Models, Genetic , Algorithms , Animals , Biological Specimen Banks/trends , Breeding , Computer Simulation , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Software
12.
Genetics ; 210(3): 809-819, 2018 11.
Article in English | MEDLINE | ID: mdl-30171033

ABSTRACT

The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in "deep learning" (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To evaluate MLPs and CNNs, we used data from distantly related white Caucasian individuals (n ∼100k individuals, m ∼500k SNPs, and k = 1000) from the interim release of the UK Biobank. We analyzed five phenotypes: height, heel bone mineral density, body mass index, systolic blood pressure, and waist-hip ratio, with genomic heritabilities ranging from ∼0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the remaining phenotypes, the performance of some CNNs was comparable to or slightly better than that of linear methods. The performance of MLPs was highly dependent on SNP set and phenotype. Overall, across the traits evaluated in this study, CNN performance was competitive with linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic problems in order for CNNs to be competitive with linear models.
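The SNP preselection step mentioned above (keeping the top SNPs from single-marker regressions) can be sketched as follows. With a fixed sample size, the single-marker regression t-statistic is a monotone function of the absolute correlation |r| between SNP and phenotype (t = r√(n−2)/√(1−r²)), so ranking by |r| gives the same order as ranking by p-value. The genotypes, phenotypes, and the value of k are hypothetical; in the study, k ranged from 10k to 50k.

```python
from statistics import mean


def abs_corr(x, y):
    """|Pearson r| between one SNP and the phenotype (0 if either has no variance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def preselect_snps(snps, y, k):
    """Indices of the k SNPs with the strongest single-marker association."""
    scores = [abs_corr(x, y) for x in snps]
    return sorted(range(len(snps)), key=lambda i: scores[i], reverse=True)[:k]


y = [1.0, 2.0, 3.0, 4.0]
snps = [[0, 0, 1, 1], [2, 1, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
print(preselect_snps(snps, y, k=2))  # [1, 0]
```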


Subjects
Deep Learning , Genomics/methods , Multifactorial Inheritance/genetics , Genotype , Humans , Linear Models , Software
13.
G3 (Bethesda) ; 8(11): 3627-3636, 2018 11 06.
Article in English | MEDLINE | ID: mdl-30228192

ABSTRACT

Glioblastoma multiforme (GBM) is recognized as the most lethal type of malignant brain tumor. Despite the efforts of the medical and research communities, patient survival remains extremely low. Multi-omic profiles (including DNA sequence, methylation and gene expression) provide rich information about the tumor and are likely to reveal processes that may be predictive of patient survival. However, integrating multi-omic profiles, which are high-dimensional and heterogeneous, poses great challenges. The goal of this work was to develop models for predicting survival of GBM patients that integrate clinical information and multi-omic profiles, using multi-layered Bayesian regressions. We applied the methodology to data from GBM patients in The Cancer Genome Atlas (TCGA, n = 501) to evaluate whether integrating multi-omic profiles (SNP genotypes, methylation, copy number variants and gene expression) with clinical information (demographics as well as treatments) improves the ability to predict patient survival. The proposed Bayesian models were used to estimate the proportion of variance explained by clinical covariates and omics and to evaluate prediction accuracy in cross-validation (using the area under the Receiver Operating Characteristic curve, AUC). Among clinical and demographic covariates, age (AUC = 0.664) and the use of temozolomide (AUC = 0.606) were the most predictive of survival. Among omics, methylation (AUC = 0.623) and gene expression (AUC = 0.593) were more predictive than either SNPs (AUC = 0.539) or CNVs (AUC = 0.547). While there was a clear association between age and methylation, integrating age, the use of temozolomide, and either gene expression or methylation led to a substantial increase in AUC in cross-validation (AUC = 0.718). Finally, among the genes whose methylation was higher in aging brains, we observed a higher enrichment of genes that were also differentially methylated in cancer.
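The AUC values above can be read with the standard Mann-Whitney interpretation: the probability that a randomly chosen positive case (for example, a patient who died within some follow-up window) receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half, so 0.5 is chance level. How survival was dichotomized is not stated in the abstract; the scores and labels below are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted risk (higher = more likely positive);
    labels: 1 for the positive class, 0 otherwise. Ties count as 1/2.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0: perfect ranking
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in sample size; a rank-based formula gives the same value faster for large cohorts.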


Subjects
Brain Neoplasms/genetics , Glioblastoma/genetics , Aged , Antineoplastic Agents, Alkylating/therapeutic use , Brain Neoplasms/drug therapy , DNA Copy Number Variations , DNA Methylation , Female , Genomics , Glioblastoma/drug therapy , Humans , Male , Middle Aged , Prognosis , Survival Analysis , Temozolomide/therapeutic use
14.
Genetics ; 210(2): 477-497, 2018 10.
Article in English | MEDLINE | ID: mdl-30150289

ABSTRACT

We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high-dimensional statistics (i.e., machine learning). The constructed predictors explain ∼40, 20, and 9% of total variance, respectively, for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA) and appears to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, which comprises genotypes of almost 500k individuals with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
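The two accuracy figures quoted for height are consistent with each other: if predictions are on the right scale (for example, linearly calibrated, which is an assumption here), the out-of-sample variance explained is roughly the squared correlation, and 0.65² ≈ 0.42, in line with the ∼40% reported. The helpers below compute both quantities on held-out data; the example values are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between observed and predicted values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def holdout_r_squared(y_true, y_pred):
    """1 - SSE/SST on data not used for training (negative if worse than the mean)."""
    my = mean(y_true)
    sse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    sst = sum((a - my) ** 2 for a in y_true)
    return 1 - sse / sst


# A correlation of about 0.65 corresponds to roughly 42% of variance explained
print(round(0.65 ** 2, 2))  # 0.42
```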


Subjects
Body Height/genetics , Models, Genetic , Genome, Human , Humans , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable
15.
Trends Genet ; 34(10): 746-754, 2018 10.
Article in English | MEDLINE | ID: mdl-30139641

ABSTRACT

Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.


Subjects
Big Data , Genome-Wide Association Study/trends , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Genomics , Genotype , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics
17.
J Dairy Sci ; 101(10): 9135-9153, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30055916

ABSTRACT

The relationship of the estrous cycle to milk composition and milk physical properties was assessed in Holstein (n = 10,696), Brown Swiss (n = 20,501), Simmental (n = 17,837), and Alpine Grey (n = 8,595) cows reared in northeastern Italy. For each cow, the day of the first insemination after calving was taken as the day of estrus and insemination. Test days surrounding the insemination date (from 10 d before to 10 d after the day of estrus) were selected and categorized into phases relative to estrus: diestrus high-progesterone, proestrus, estrus, metestrus, and diestrus increasing-progesterone. Milk components and physical properties were predicted from Fourier-transform infrared spectra of milk samples and analyzed using a linear mixed model that included the random effect of herd; the fixed classification effects of year-month, parity number, breed, estrous cycle phase, day nested within estrous cycle phase, and conception; partial regressions on linear and quadratic effects of days in milk nested within parity number; and the interactions of conception outcome with estrous cycle phase and of breed with estrous cycle phase. Milk composition, particularly fat, protein, and lactose, showed clear differences among the estrous cycle phases. Fat increased by 0.14% from the diestrus high-progesterone phase to the estrous phase, whereas protein concomitantly decreased by 0.03%. Lactose appeared to remain relatively constant during diestrus high-progesterone, rising 1 d before the day of estrus and then gradually decreasing over the subsequent phases. Specific fatty acids were also affected across the estrous cycle phases: C14:0 and C16:0 decreased (-0.34 and -0.48%) from proestrus to estrus, with a concomitant increase in C18:0 and C18:1 cis-9 (0.40 and 0.73%).
More general categories of fatty acids showed similar behavior: unsaturated, monounsaturated, polyunsaturated, trans, and long-chain fatty acids increased, whereas saturated, medium-chain, and short-chain fatty acids decreased during the estrous phase. Finally, urea, somatic cell score, freezing point, pH, and homogenization index were also affected, indicating variation associated with the hormonal and behavioral changes of cows in standing estrus. Hence, the variation in milk profiles of cows showing estrus should potentially be taken into account in precision dairy farming management.


Subjects
Cattle/physiology , Estrous Cycle/metabolism , Fatty Acids/analysis , Milk/chemistry , Animals , Female , Italy , Lactation , Pregnancy
18.
Plant Methods ; 14: 46, 2018.
Article in English | MEDLINE | ID: mdl-29991959

ABSTRACT

Background: Modern agriculture uses hyperspectral cameras that record hundreds of reflectance measurements at discrete narrow bands in several environments. Recently, Montesinos-López et al. (Plant Methods 13(4):1-23, 2017a. 10.1186/s13007-016-0154-2; Plant Methods 13(62):1-29, 2017b. 10.1186/s13007-017-0212-4) proposed using functional regression analysis (a form of functional data analysis) to reduce the dimensionality of the bands and thus the computational cost. The purpose of this paper is to discuss the advantages and disadvantages of functional regression analysis for hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) the type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate regression coefficients. We also show how functional data analyses can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis. Results: We used seven model-methods: one conventional model (M1), three methods using a B-spline basis (M2, M4, and M6), and three methods using a Fourier basis (M3, M5, and M7). The data set comprises 976 wheat lines evaluated under irrigated environments with 250 wavelengths. Under Bayesian Ridge Regression (BRR), we compared the prediction accuracy of the proposed model-methods for different numbers of basis functions, as well as their implementation time (in seconds). Our results, together with previously analyzed data (Montesinos-López et al. 2017a, 2017b), indicate that around 23 basis functions are enough. Regarding the degree of the polynomial for B-splines, degree 3 approximates most of the curves very well. Two satisfactory types of basis are the Fourier basis for periodic curves and B-splines for non-periodic curves. Across nine different numbers of basis functions, the seven model-methods showed similar prediction accuracy. Regarding implementation time, the fewer the basis functions, the shorter the implementation time. Methods M2, M3, M6, and M7 were around 3.4 times faster than methods M1, M4, and M5. Conclusions: In this study, we promote the use of functional regression modeling for analyzing high-throughput phenotypic data and indicate the advantages and disadvantages of its implementation. In addition, using a real data set, we present many key elements needed to understand and implement this statistical technique appropriately. We provide details for implementing Bayesian functional regression using the developed genomic functional regression (GFR) package. In summary, we believe this paper can serve as a guide for breeders and scientists interested in using functional regression models to build prediction models when their data are curves.
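The dimension-reduction idea in this abstract (replacing 250 reflectance values per line with about 23 basis coefficients) can be illustrated with a minimal Python sketch. It assumes equally spaced bands mapped onto [0, 1), uses a Fourier basis only, and fits a plain closed-form ridge regression as a frequentist stand-in for the Bayesian Ridge Regression and the GFR package used by the authors; the function names and simulated data are hypothetical.

```python
import numpy as np


def fourier_basis(n_points: int, n_basis: int) -> np.ndarray:
    """Constant term followed by sine/cosine pairs, evaluated at n_points
    equally spaced positions on [0, 1). Shape: (n_points, n_basis)."""
    t = np.arange(n_points) / n_points
    cols = [np.ones(n_points)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)


def project_curves(X: np.ndarray, n_basis: int) -> np.ndarray:
    """Least-squares basis coefficients for each row of X (one reflectance
    curve per wheat line). Shape: (n_lines, n_basis)."""
    phi = fourier_basis(X.shape[1], n_basis)
    coef, *_ = np.linalg.lstsq(phi, X.T, rcond=None)  # (n_basis, n_lines)
    return coef.T


def ridge_fit(Z: np.ndarray, y: np.ndarray, lam: float):
    """Closed-form ridge regression on centred data.
    Returns (intercept, coefficient vector)."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ yc)
    return y_mean - z_mean @ beta, beta


# Usage on simulated data shaped like the wheat set (976 lines x 250 bands).
rng = np.random.default_rng(0)
X = rng.normal(size=(976, 250))
y = rng.normal(size=976)
Z = project_curves(X, n_basis=23)   # 250 reflectances -> 23 coefficients per line
intercept, beta = ridge_fit(Z, y, lam=1.0)
y_hat = intercept + Z @ beta
```

The regression then works on a 976 x 23 matrix instead of 976 x 250, which is where the savings in computation come from; swapping `fourier_basis` for a cubic B-spline basis would follow the same pattern for non-periodic curves.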

19.
G3 (Bethesda) ; 8(7): 2471-2481, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29794167

ABSTRACT

Potato (Solanum tuberosum) is a staple food crop and one of the main sources of carbohydrates worldwide. Late blight (Phytophthora infestans) and common scab (Streptomyces scabies) are two of the primary production constraints in potato farming. Previous studies have identified a few resistance genes for both late blight and common scab; however, these genes explain only a limited fraction of the heritability of these diseases. Genomic selection has been shown to be an effective methodology for breeding value prediction in many major crops (e.g., maize and wheat), but it has received little attention in potato breeding. We present the first genomic selection study of late blight and common scab in tetraploid potato. Our data comprise 4,110 single nucleotide polymorphisms (SNPs) and phenotypic field evaluations for late blight (n = 1,763) and common scab (n = 3,885) collected over seven and nine years, respectively. We report moderately high genomic heritability estimates (0.46 ± 0.04 and 0.45 ± 0.017 for late blight and common scab, respectively). The extent of genotype-by-year interaction was high for late blight and low for common scab. Our assessment of prediction accuracy demonstrates the applicability of genomic prediction to tetraploid potato breeding. For both traits, more than 90% of the genetic variance could be captured with an additive model. For common scab, the highest prediction accuracy was achieved with an additive model. For late blight, small but statistically significant gains in prediction accuracy were achieved with a model accounting for both additive and dominance effects. Using whole-genome regression models, we identified SNPs located in previously reported hotspot regions for late blight and on genes associated with systemic disease resistance responses, as well as a new locus for common scab located in a WRKY transcription factor.
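The additive genomic prediction discussed in this abstract can be sketched in Python as GBLUP. This is an illustration under stated assumptions, not the authors' pipeline: SNPs are assumed to be coded as allele dosages 0 to 4, the relationship matrix uses a VanRaden-style formula generalised to ploidy 4, the overall mean is approximated by the training-set mean, and the shrinkage parameter is set from the reported late-blight heritability via lambda = (1 - h2) / h2. Function names and the simulated data are hypothetical.

```python
import numpy as np


def additive_grm(dosage: np.ndarray, ploidy: int = 4) -> np.ndarray:
    """Additive genomic relationship matrix from allele dosages (0..ploidy),
    following a VanRaden-style formula generalised to polyploids."""
    p = dosage.mean(axis=0) / ploidy          # estimated allele frequency per SNP
    W = dosage - ploidy * p                   # centre each SNP on its mean dosage
    scale = ploidy * np.sum(p * (1 - p))      # mean diagonal ~1 if dosages are binomial
    return W @ W.T / scale


def gblup_predict(G: np.ndarray, y: np.ndarray, train: np.ndarray, h2: float) -> np.ndarray:
    """Genomic values for every individual, fitted on phenotypes of `train`.
    Shrinkage lambda = sigma_e^2 / sigma_g^2 = (1 - h2) / h2."""
    lam = (1 - h2) / h2
    mu = y[train].mean()                      # simple stand-in for the GLS mean
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[:, train] @ alpha


# Usage on simulated tetraploid data (not the study's clones or phenotypes).
rng = np.random.default_rng(1)
n, m = 200, 1000
freqs = rng.uniform(0.1, 0.9, size=m)
dosage = rng.binomial(4, freqs, size=(n, m)).astype(float)
g = (dosage - dosage.mean(axis=0)) @ rng.normal(0, 0.05, size=m)
y = g + rng.normal(0, g.std(), size=n)        # true h2 close to 0.5
G = additive_grm(dosage)
train, test = np.arange(150), np.arange(150, n)
pred = gblup_predict(G, y, train, h2=0.46)    # h2 as reported for late blight
accuracy = np.corrcoef(pred[test], y[test])[0, 1]
```

Prediction accuracy is computed here as the correlation between predictions and observed phenotypes in the held-out clones; a model with dominance effects, as used for late blight, would add a second relationship matrix and is not shown.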


Subjects
Disease Resistance/genetics , Genome, Plant , Genomics , Plant Diseases/genetics , Selection, Genetic , Solanum tuberosum/genetics , Tetraploidy , Algorithms , Genomics/methods , Genotype , Models, Genetic , Plant Diseases/microbiology , Polymorphism, Single Nucleotide , Reproducibility of Results , Solanum tuberosum/microbiology , Streptomyces
20.
J Dairy Sci ; 101(3): 2496-2505, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29290427

ABSTRACT

Data on Holstein (16,890), Brown Swiss (31,441), Simmental (25,845), and Alpine Grey (12,535) cows reared in northeastern Italy were used to assess the ability of milk components (fat, protein, casein, and lactose) and Fourier transform infrared (FTIR) spectral data to diagnose pregnancy. A cow was considered pregnant when the pregnancy was confirmed by a subsequent calving and no further insemination occurred within 90 d of the breeding of interest. Milk samples were analyzed for components and full-spectrum FTIR data using a MilkoScan FT+ 6000 (Foss Electric, Hillerød, Denmark). The spectrum covered 1,060 wavenumbers (wn) from 5,010 to 925 cm⁻¹. Pregnancy status was predicted using generalized linear models with fat, protein, lactose, casein, and individual FTIR spectral bands or wavelengths as predictors. We also fitted a generalized linear model as a simultaneous function of all wavelengths (1,060 wn) with a Bayesian variable selection model using the BGLR R-package (https://r-forge.r-project.org/projects/bglr/). Prediction accuracy was assessed as the area under the receiver operating characteristic curve, computed from the sensitivities and specificities of the phenotypic predictions in a 10-fold cross-validation (CV-AUC). Overall, the model including the complete FTIR spectral data gave the best prediction accuracy. Patterns were similar across breeds, with small differences in prediction accuracy. The highest CV-AUC was obtained for Alpine Grey cows (CV-AUC = 0.645), Brown Swiss and Simmental cows performed similarly (CV-AUC = 0.630 and 0.628, respectively), and Holsteins had the lowest value (CV-AUC = 0.607). For single-wavelength analyses, important peaks were detected at wn 2,973 to 2,872 cm⁻¹, where Fat-B (C-H stretch) is usually filtered; wn 1,773 cm⁻¹, where Fat-A (C=O stretch) is filtered; wn 1,546 cm⁻¹, where protein is filtered; wn 1,468 cm⁻¹, associated with urea and fat; wn 1,399 and 1,245 cm⁻¹, associated with acetone; and wn 1,025 to 1,013 cm⁻¹, where lactose is filtered. In conclusion, this research provides new insight into alternative strategies for pregnancy screening of dairy cows.
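The CV-AUC metric described in this abstract can be illustrated with a minimal Python sketch. The AUC is computed with the Mann-Whitney formulation, which equals the area under the ROC curve built from sensitivities and specificities. The random fold assignment, the choice to pool out-of-fold scores before computing a single AUC (averaging per-fold AUCs is a common alternative), and the hypothetical `linear_scorer` model are assumptions; the study itself fitted a Bayesian generalized linear model with BGLR in R, which this sketch does not reproduce.

```python
import numpy as np


def auc(scores, labels) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive (pregnant) cow scores above a randomly chosen negative one;
    ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def cv_auc(X, y, fit, k=10, seed=0) -> float:
    """k-fold cross-validated AUC. Out-of-fold scores are pooled and a single
    AUC is computed. `fit(X_train, y_train)` must return a scoring function."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k       # balanced random fold labels 0..k-1
    oof = np.empty(len(y))
    for f in range(k):
        held_out = folds == f
        score = fit(X[~held_out], y[~held_out])
        oof[held_out] = score(X[held_out])
    return auc(oof, y)


def linear_scorer(X_train, y_train):
    """Hypothetical stand-in model: least-squares linear score (AUC only needs a ranking)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    w, *_ = np.linalg.lstsq(A, y_train.astype(float), rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ w


# Usage on simulated data: 500 cows, 20 spectral features, one informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + rng.normal(size=500)) > 0
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(cv_auc(X, y, linear_scorer))
```

An AUC of 0.5 corresponds to random ranking, which puts the reported CV-AUC values of roughly 0.61 to 0.65 in context.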


Subjects
Milk/chemistry , Pregnancy, Animal , Spectroscopy, Fourier Transform Infrared/veterinary , Animals , Caseins/analysis , Cattle , Female , Glycolipids/analysis , Glycoproteins/analysis , Italy , Lactose/analysis , Milk Proteins/analysis , Pregnancy