Results 1 - 7 of 7
1.
Int J Biometeorol ; 2024 Aug 31.
Article in English | MEDLINE | ID: mdl-39215818

ABSTRACT

Crop yield prediction is gaining importance for all stakeholders in agriculture. Since the growth and development of crops are closely linked to many weather factors, it is essential to incorporate meteorological information into the yield prediction mechanism. The changes in the climate-yield relationship are more pronounced at a local level than across relatively large regions. Hence, district or sub-region-level modeling may be an appropriate approach. To obtain a location- and crop-specific model, different models with different functional forms have to be explored. This systematic review aims to discuss research papers related to statistical and machine-learning models commonly used to predict crop yield using weather factors. It was found that Artificial Neural Network (ANN) and Multiple Linear Regression were the most applied models. The Support Vector Regression (SVR) model has a high success ratio, as it performed well in most of the cases. The optimization options in ANN and SVR models allow us to tune models to specific patterns of association between weather conditions of a location and crop yield. The ANN model can be trained using different activation functions with an optimized learning rate and number of hidden layer neurons. Similarly, the SVR model can be trained with different kernel functions and various combinations of hyperparameters. Penalized regression models, namely LASSO and Elastic Net, are better alternatives to simple linear regression. The nonlinear machine learning models, namely SVR and ANN, were found to perform better in most of the cases, which indicates that a complex nonlinear association exists between crop yield and weather factors.
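
To make the penalized-regression idea in this abstract concrete, the following is a minimal sketch of elastic net fitted by cyclic coordinate descent, with LASSO as the special case alpha = 1. It is not taken from the review: the objective (1/(2n))·||y − Xβ||² + λ(α||β||₁ + ((1 − α)/2)||β||²), the function names, and the toy "weather" data are all illustrative assumptions, and the predictors are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha=1.0, n_iter=200):
    """Cyclic coordinate descent for the elastic net (alpha=1.0 gives LASSO).

    X is a list of rows with centered predictors, y a list of centered responses.
    lam is the overall penalty strength, alpha the L1/L2 mixing weight.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + lam * (1.0 - alpha)
            # An all-zero column with a pure L1 penalty carries no information.
            beta[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
    return beta


# Hypothetical toy data: two standardized weather indices, yield driven by the first only.
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.0, 2.0, -2.0, -2.0]

print(elastic_net(X_toy, y_toy, lam=0.0))             # ordinary least squares fit
print(elastic_net(X_toy, y_toy, lam=0.5, alpha=1.0))  # LASSO: shrinks and selects
print(elastic_net(X_toy, y_toy, lam=0.5, alpha=0.5))  # elastic net: L1 plus L2 shrinkage
```

The L1 part of the penalty can set the coefficient of an uninformative weather variable exactly to zero, which is the property that makes these models attractive when many correlated weather indices are available for few seasons of yield data.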

2.
Int J Biometeorol ; 66(8): 1627-1638, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35641796

ABSTRACT

Cashew is an important cash crop which is ecologically sensitive, making it vulnerable to climate change. The present study therefore compares the performance of stepwise linear regression (SLR), least absolute shrinkage and selection operator (LASSO), elastic net (ELNET), and artificial neural network (ANN) individually against the ANN model combined with SLR, LASSO, elastic net, and principal components analysis (PCA) for prediction of cashew yield based on weather parameters. The model performances were evaluated using three approaches: (1) Taylor plot; (2) statistical metrics like coefficient of determination (R²), root mean square error (RMSE), and normalized RMSE (nRMSE); and (3) ranking followed by Kruskal-Wallis and Dunn's post hoc test. The results revealed that during calibration, the R² and RMSE ranged from 0.486 to 0.999 and 2.184 to 88.040 kg ha⁻¹, respectively, while RMSE and nRMSE varied from 3.561 to 242.704 kg ha⁻¹ and 0.799 to 89.949%, respectively, during validation. Kruskal-Wallis and Dunn's post hoc test revealed LASSO as the best model, which was on par with ELNET, SLR, and ELNET-ANN. These models can therefore be used for cashew yield prediction for the study area well in advance.


Subject(s)
Anacardium, Calibration, Linear Models, Neural Networks, Computer, Weather
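
The three statistical metrics named in the abstract above (R², RMSE, nRMSE) can be sketched as follows. The abstract does not give the formulas, so this assumes common conventions: R² as 1 − SS_res/SS_tot, and nRMSE as RMSE divided by the mean observed yield, expressed in percent. The function names and the yield values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the yields (e.g. kg per ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmse_percent(observed, predicted):
    """RMSE normalized by the mean observed value, in percent (assumed convention)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed convention)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted yields (kg per ha) for four seasons.
observed_yield = [100.0, 200.0, 300.0, 400.0]
predicted_yield = [110.0, 190.0, 310.0, 390.0]

print(rmse(observed_yield, predicted_yield))
print(nrmse_percent(observed_yield, predicted_yield))
print(r_squared(observed_yield, predicted_yield))
```

Because nRMSE is scale-free, it allows validation errors to be compared across districts or crops whose yields differ by an order of magnitude, which RMSE alone does not.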
3.
Pharm Stat ; 20(4): 898-915, 2021 07.
Article in English | MEDLINE | ID: mdl-33768736

ABSTRACT

One of the main problems that the drug discovery research field confronts is identifying small molecules, modulators of protein function, which are likely to be therapeutically useful. Common practices rely on the screening of vast libraries of small molecules (often 1-2 million molecules) in order to identify a molecule, known as a lead molecule, which specifically inhibits or activates the protein function. To search for the lead molecule, we investigate the molecular structure, which generally consists of an extremely large number of fragments. Presence or absence of particular fragments, or groups of fragments, can strongly affect molecular properties. We study the relationship between a molecule's properties and its fragment composition by building a regression model, in which predictors, represented by binary variables indicating the presence or absence of fragments, are grouped in subsets and a bi-level penalization term is introduced to handle the high dimensionality of the problem. We evaluate the performance of this model in two simulation studies, comparing different penalization terms and different clustering techniques to derive the best predictor subsets structure. Both studies are characterized by small sets of data relative to the number of predictors under consideration. From the results of these simulation studies, we show that our approach can generate models able to identify key features and provide accurate predictions. The good performance of these models is then exhibited with real data about the MMP-12 enzyme.


Subject(s)
Drug Discovery, Cluster Analysis, Computer Simulation, Humans
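
The abstract above does not state which bi-level penalization terms were compared. As an illustration only, one widely used bi-level penalty is the sparse group lasso, which selects whole groups of predictors and individual predictors within the selected groups. The notation below is an assumption, not the authors': the binary fragment indicators form the columns of X, they are partitioned into groups g = 1, ..., G of sizes p_g, and λ ≥ 0 and α ∈ [0, 1] are tuning parameters.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \Bigl[ (1-\alpha) \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
  \;+\; \alpha\,\lVert \beta \rVert_1 \Bigr]
```

Setting α = 0 recovers the group lasso (selection at the group level only) and α = 1 recovers the ordinary LASSO (selection at the individual-predictor level only); intermediate values penalize at both levels.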
4.
Genomics Inform ; 14(4): 138-148, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28154504

ABSTRACT

The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the "large p and small n" problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
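
The AUC used above to assess risk prediction can be computed directly from its rank interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below is a generic illustration with hypothetical labels and scores, not the evaluation code of the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for cases (e.g. type 2 diabetes), 0 for controls.
    scores: predicted risk from any model; higher means higher predicted risk.
    """
    case_scores = [s for lab, s in zip(labels, scores) if lab == 1]
    control_scores = [s for lab, s in zip(labels, scores) if lab == 0]
    if not case_scores or not control_scores:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case correctly ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical validation set: three cases and three controls.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.7, 0.2]

print(auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; a sort-based rank computation gives the same value more efficiently on large validation cohorts.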

5.
Genomics Inform ; 14(4): 149-159, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28154505

ABSTRACT

With the success of the genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods were proposed using candidate causal variants from significant single-nucleotide polymorphisms of GWASs. However, there have been only a few systematic studies comparing existing methods. In this study, we first constructed risk prediction models, such as stepwise linear regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN), using a GWAS chip and GWAS catalog. We then compared the prediction accuracy by calculating the mean square error (MSE) value on data from the Korea Association Resource (KARE) with body mass index. Our results show that SLR provides a smaller MSE value than the other methods, while the numbers of selected variables in each model were similar.

6.
Genomics & Informatics ; : 149-159, 2016.
Article in English | WPRIM (Western Pacific) | ID: wpr-172206

ABSTRACT

With the success of the genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods were proposed using candidate causal variants from significant single-nucleotide polymorphisms of GWASs. However, there have been only a few systematic studies comparing existing methods. In this study, we first constructed risk prediction models, such as stepwise linear regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN), using a GWAS chip and GWAS catalog. We then compared the prediction accuracy by calculating the mean square error (MSE) value on data from the Korea Association Resource (KARE) with body mass index. Our results show that SLR provides a smaller MSE value than the other methods, while the numbers of selected variables in each model were similar.


Subject(s)
Humans, Body Mass Index, Decision Support Techniques, Genome-Wide Association Study, Korea, Learning, Linear Models
7.
Genomics & Informatics ; : 138-148, 2016.
Article in English | WPRIM (Western Pacific) | ID: wpr-172207

ABSTRACT

The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.


Subject(s)
Humans, Area Under Curve, Dataset, Decision Support Techniques, Diabetes Mellitus, Type 2, Diagnosis, Genome-Wide Association Study, Logistic Models, Risk Assessment, ROC Curve