1.
Article in English | MEDLINE | ID: mdl-38117627

ABSTRACT

Next-generation sequencing (NGS) genomic data offer valuable high-throughput genomic information for computational applications in medicine. Using genomic data to identify disease-associated genes to estimate cancer mortality risk remains challenging with respect to computational efficiency and risk integration. To determine mortality-related genes, we propose an information fusion system based on a fuzzy system that fuses numerous deep-learning-based risk scores, considers the significance of features related to time-varying effects and risk stratifications, and interprets the directional relationship and interaction between outcome and predictors. Fuzzy rules were implemented to integrate these considerations by merging all the risk score models to achieve advanced risk estimation. The genomic data of head and neck squamous cell carcinoma (HNSCC) were used to evaluate the performance of the proposed computational approach. The results indicated that the proposed computational approach exhibited optimal ability to identify mortality risk-related genes in HNSCC patients. The results also suggest that HNSCC mortality is associated with cancer inflammatory response, the interleukin-17A signaling pathway, stellate cell activation, and the extracellular-regulated protein kinase five signaling pathway, which might offer new therapeutic targets for HNSCC through immunologic or antiangiogenic mechanisms. The proposed information fusion system can promote the determination of high-risk genes related to cancer mortality. This study contributes a valid cancer mortality risk estimate that can identify mortality-related genes.

2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36458451

ABSTRACT

In epistasis analysis, single-nucleotide polymorphism-single-nucleotide polymorphism interactions (SSIs) among genes may, alongside other environmental factors, influence the risk of multifactorial diseases. To identify SSIs between cases and controls (i.e. binary traits), the score for model quality is affected by different objective functions (i.e. measurements) because of potential disease model preferences and disease complexities. Our previous study proposed a multiobjective approach-based multifactor dimensionality reduction (MOMDR), with the results indicating that two objective functions could enhance the identification of SSIs with weak marginal effects. However, SSI identification using MOMDR remains a challenge because the optimal measure combination of objective functions has yet to be investigated. This study extended MOMDR to the many-objective version (i.e. many-objective MDR, MaODR) by integrating various disease probability measures based on a two-way contingency table to improve the identification of SSIs between cases and controls. We introduced an objective function selection approach to determine the optimal measure combination in MaODR among 10 well-known measures. In total, 6 disease models with and 40 disease models without marginal effects were used to evaluate the general algorithms, namely those based on multifactor dimensionality reduction (MDR), MOMDR and MaODR. Our results revealed that, through the application of the objective function selection approach, the MaODR-based three-objective-function model combining correct classification rate, likelihood ratio and normalized mutual information (MaODR-CLN) exhibited detection success rates (accuracy) 6.47% higher than those of MOMDR and 17.23% higher than those of MDR. In a Wellcome Trust Case Control Consortium dataset, MaODR-CLN successfully identified the significant SSIs (P < 0.001) associated with coronary artery disease.
We performed a systematic analysis to identify the optimal measure combination in MaODR among 10 objective functions. Our combination detected SSIs for binary traits with weak marginal effects and thus reduced spurious variables in the score model. MaODR is freely available at https://sites.google.com/view/maodr/home.


Subject(s)
Epistasis, Genetic , Models, Genetic , Algorithms , Phenotype , Multifactor Dimensionality Reduction/methods , Polymorphism, Single Nucleotide
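The contingency-table measures mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual MDR pooling step (a genotype cell is high-risk when its case/control ratio reaches the overall ratio), takes the correct classification rate as balanced accuracy, and uses the standard G statistic as the likelihood ratio. Whether MaODR uses exactly these forms is an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter

def mdr_confusion(genotypes, labels):
    """Pool multilocus genotype cells into high-/low-risk groups and
    return the resulting 2x2 table as (tp, fp, fn, tn).

    genotypes: list of tuples, one multilocus genotype per subject
    labels:    list of 1 (case) or 0 (control)
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    # Assumed threshold: the overall case/control ratio of the sample.
    threshold = sum(cases.values()) / sum(controls.values())
    tp = fp = fn = tn = 0
    for cell in set(cases) | set(controls):
        a, b = cases[cell], controls[cell]
        if a >= threshold * b:      # high-risk cell (avoids dividing by b)
            tp, fp = tp + a, fp + b
        else:                       # low-risk cell
            fn, tn = fn + a, tn + b
    return tp, fp, fn, tn

def correct_classification_rate(tp, fp, fn, tn):
    """Balanced accuracy: mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))

def likelihood_ratio(tp, fp, fn, tn):
    """G statistic, 2 * sum(O * ln(O / E)), over the four cells."""
    n = tp + fp + fn + tn
    rows = {"H": tp + fp, "L": fn + tn}
    cols = {"case": tp + fn, "ctrl": fp + tn}
    observed = {("H", "case"): tp, ("H", "ctrl"): fp,
                ("L", "case"): fn, ("L", "ctrl"): tn}
    g = 0.0
    for (r, c), o in observed.items():
        if o > 0:                   # a zero cell contributes nothing
            g += o * math.log(o * n / (rows[r] * cols[c]))
    return 2.0 * g

# Toy data (illustrative only): 4 cases followed by 4 controls, two SNPs.
genotypes = [(0, 0)] * 3 + [(1, 1)] + [(0, 0)] + [(1, 1)] * 3
labels = [1, 1, 1, 1, 0, 0, 0, 0]
table = mdr_confusion(genotypes, labels)
```

Because both measures are computed from the same 2x2 table, a many-objective method can score each SNP combination once and then compare the measures side by side.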
3.
Article in English | MEDLINE | ID: mdl-35061588

ABSTRACT

Epistasis detection is vital for understanding disease susceptibility in genetics. Multiobjective multifactor dimensionality reduction (MOMDR) was previously proposed to detect epistasis. MOMDR uses binary classification to distinguish the high-risk (H) and low-risk (L) groups and thereby reduce multifactor dimensionality. However, binary classification does not reflect the uncertainty of the H and L classification. In this study, we proposed an empirical fuzzy MOMDR (EFMOMDR) to address the limitations of binary classification by using the degree of membership obtained through an empirical fuzzy approach. EFMOMDR can simultaneously consider two incorporated fuzzy-based measures, namely the correct classification rate and the likelihood rate, and does not require parameter tuning. Simulation studies revealed that EFMOMDR has 7.14% higher detection success rates than MOMDR, indicating that the empirical fuzzy approach successfully addresses the limitations of binary classification in MOMDR. Moreover, EFMOMDR was used to analyze coronary artery disease in the Wellcome Trust Case Control Consortium dataset.


Subject(s)
Coronary Artery Disease , Epistasis, Genetic , Humans , Epistasis, Genetic/genetics , Multifactor Dimensionality Reduction , Models, Genetic , Computer Simulation , Coronary Artery Disease/genetics , Polymorphism, Single Nucleotide , Algorithms
4.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35397164

ABSTRACT

Primers are critical for polymerase chain reaction (PCR) and influence PCR experimental outcomes. Designing numerous combinations of forward and reverse primers involves various primer constraints, posing a computational challenge. Most PCR primer design methods limit parameters because the available algorithms use general fitness functions. This study designed new fitness functions based on user-specified parameters and used the functions in a primer design approach based on the multiobjective particle swarm optimization (MOPSO) algorithm to address the challenge of primer design with user-specified parameters. Multicriteria evaluation was conducted simultaneously based on primer constraints. The fitness functions were evaluated using 7425 DNA sequences and compared with a predominant primer design approach based on optimization algorithms. Each DNA sequence was run 100 times to calculate the difference between the user-specified parameters and primer constraint values. The algorithms based on fitness functions with user-specified parameters outperformed the algorithms based on general fitness functions for 11 primer constraints. Moreover, MOPSO exhibited superior implementation in all experiments. Practical gel electrophoresis was conducted to verify the PCR experiments and established that MOPSO effectively designs primers based on user-specified parameters.


Subject(s)
Algorithms , Software , Base Sequence , DNA Primers/genetics , Polymerase Chain Reaction/methods
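As a rough illustration of a fitness function built from user-specified parameters, the sketch below scores a primer pair by its distance from target values. The three constraints shown (length, GC content, melting temperature), the Wallace rule for Tm, the unweighted sum, and all names are assumptions made for illustration. The abstract does not list the 11 constraints or the actual fitness formula.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Wallace rule estimate of melting temperature: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def primer_pair_fitness(forward, reverse, target):
    """Sum of absolute deviations from the user-specified targets.
    Lower is better; 0 means every target is met exactly."""
    cost = 0.0
    for primer in (forward, reverse):
        cost += abs(len(primer) - target["length"])
        cost += abs(gc_percent(primer) - target["gc_percent"])
        cost += abs(wallace_tm(primer) - target["tm"])
    # Penalize a melting-temperature mismatch between the two primers.
    cost += abs(wallace_tm(forward) - wallace_tm(reverse))
    return cost

# Hypothetical user-specified parameters and toy primers.
target = {"length": 20, "gc_percent": 50.0, "tm": 60}
forward = "ATGCATGCATGCATGCATGC"
reverse = "GCATGCATGCATGCATGCAT"
```

In a multiobjective setting each deviation would be kept as a separate objective rather than summed. The single sum here only keeps the example short.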
5.
Front Neurosci ; 16: 1018005, 2022.
Article in English | MEDLINE | ID: mdl-36620438

ABSTRACT

To understand students' learning behaviors, this study uses machine learning technologies to analyze the data of interactive learning environments and then predicts students' learning outcomes. Using a variety of machine learning classification methods together with quizzes and programming system logs, we found that students' learning characteristics were correlated with their learning performance when they encountered similar programming practice. We used random forest (RF), support vector machine (SVM), logistic regression (LR), and neural network (NN) algorithms to predict whether students would submit on time for the course. Among them, the NN algorithm showed the best prediction results. Education-related data can be predicted by machine learning techniques, and different machine learning models with different hyperparameters can be used to obtain better results.

6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34661627

ABSTRACT

Identifying and characterizing the interaction between risk factors for multiple outcomes (multi-outcome interaction) has been one of the greatest challenges in the study of complex multifactorial diseases. However, the existing approaches have several limitations in identifying multi-outcome interactions. To address this issue, we proposed a multi-outcome interaction identification approach called MOAI. MOAI was motivated by the limitations of estimating interactions that occur simultaneously in multiple outcomes and by the success of the Pareto set filter operator in identifying multi-outcome interactions. MOAI permits the identification of interactions across multiple outcomes and is applicable in population-based study designs. Our experimental results showed that the existing approaches cannot effectively identify multi-outcome interactions, whereas MOAI clearly exhibited superior performance. We applied MOAI to identify the interaction between risk factors for colorectal cancer (CRC) in both metastases and mortality prognostic outcomes. An interaction between vaspin and carcinoembryonic antigen (CEA) was found, and the interaction indicated that patients with CRC characterized by higher vaspin (≥30%) and CEA (≥5) levels had a simultaneously increased risk of both metastases and mortality. The immunostaining evidence revealed that the identified multi-outcome interaction could effectively distinguish non-metastases/survived patients from metastases/deceased patients, which offers multi-prognostic outcome risk estimation for CRC. To our knowledge, this is the first report of a multi-outcome interaction associated with a complex multifactorial disease. MOAI is freely available at https://sites.google.com/view/moaitool/home.


Subject(s)
Carcinoembryonic Antigen , Colorectal Neoplasms , Biomarkers, Tumor , Humans
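The Pareto set filter named in this abstract can be sketched as a plain non-dominated filter over per-outcome scores. This is a minimal illustration rather than the MOAI implementation: the scoring convention (higher is better for every outcome), the function names, and the toy values are all assumptions.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every outcome
    and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_set(scores):
    """Return the names of candidates that no other candidate dominates.

    scores: dict mapping a candidate interaction to a tuple with one
            score per outcome (for example metastasis and mortality).
    """
    return {
        name
        for name, s in scores.items()
        if not any(dominates(t, s)
                   for other, t in scores.items() if other != name)
    }

# Toy scores (illustrative only): (metastasis score, mortality score).
scores = {
    "pair_A": (0.9, 0.2),
    "pair_B": (0.5, 0.5),
    "pair_C": (0.4, 0.4),   # dominated by pair_B on both outcomes
    "pair_D": (0.2, 0.9),
}
front = pareto_set(scores)
```

A candidate survives the filter when no other candidate is at least as good on every outcome and better on one, which avoids collapsing several outcomes into a single weighted score.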
7.
Diagnostics (Basel) ; 10(10)2020 Oct 09.
Article in English | MEDLINE | ID: mdl-33050209

ABSTRACT

Colorectal cancer is a highly heterogeneous malignancy in the Asian population, and baseline characteristics, tumor burden, and tumor markers are considered important prognostic factors. This study investigated the effect of baseline characteristics and tumor burden on tumor marker expression and progressive disease in colorectal cancer by using partial least squares variance-based path modeling (PLS-PM). PLS-PM can be used to evaluate the complex relationship between prognostic variables and progressive disease status with a small sample of measurements and structural models. A total of 89 tissue samples of colorectal cancer were analyzed. Our results suggested that the expression of visceral adipose tissue-derived serpin (vaspin) is a potential indicator of colorectal cancer progression and may be affected by baseline characteristics such as age, sex, body mass index, and diabetes mellitus. Moreover, according to the characteristics of tumor burden, the expression of vaspin was generally higher in patients with progressive disease. The overall findings suggest that vaspin is a potential indicator of progressive disease and may be affected by the baseline characteristics of patients.

8.
Artif Intell Med ; 102: 101768, 2020 01.
Article in English | MEDLINE | ID: mdl-31980105

ABSTRACT

OBJECTIVE: Epistasis identification is critical for determining susceptibility to human genetic diseases. The rapid development of technology has enabled scalability to make multifactor dimensionality reduction (MDR) measurements an effective calculation tool that achieves superior detection. However, the classification of high-risk (H) or low-risk (L) groups in MDR operations calls for extensive research. METHODS AND MATERIAL: In this study, an improved fuzzy sigmoid (FS) method using the membership degree in MDR (FSMDR) was proposed for solving the limitations of binary classification. The FS method combined with MDR measurements yielded an improved ability to distinguish similar frequencies of potential multifactor genotypes. RESULTS: We compared our results with those of other MDR-based methods, and FSMDR achieved superior detection rates on simulated data sets. The results indicated that the fuzzy classifications can provide insight into the uncertainty of H/L classification in MDR operation. CONCLUSION: FSMDR successfully detected significant epistasis of coronary artery disease in the Wellcome Trust Case Control Consortium data set.


Subject(s)
Epistasis, Genetic , Fuzzy Logic , Multifactor Dimensionality Reduction/methods , Algorithms , Artificial Intelligence , Case-Control Studies , Drug Resistance/genetics , Drug Resistance, Multiple/genetics , Genotype , Humans , Models, Genetic
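To make the idea of a membership degree concrete, the sketch below replaces the hard H/L label of a genotype cell with a sigmoid of how far the cell's case fraction lies from the overall case fraction. The exact sigmoid form, the `steepness` parameter, and the function name are hypothetical, since the abstract does not give the FSMDR formula.

```python
import math

def sigmoid_membership(n_cases, n_controls, overall_case_fraction,
                       steepness=10.0):
    """Return (mu_high, mu_low), the degrees to which a genotype cell
    belongs to the high-risk and low-risk groups.

    steepness is an assumed tuning parameter: larger values approach
    the hard binary H/L split of standard MDR.
    """
    total = n_cases + n_controls
    if total == 0:
        return 0.5, 0.5             # empty cell: no evidence either way
    cell_fraction = n_cases / total
    mu_high = 1.0 / (1.0 + math.exp(
        -steepness * (cell_fraction - overall_case_fraction)))
    return mu_high, 1.0 - mu_high

# A cell with 51 cases and 49 controls is only weakly high-risk,
# whereas a cell with 9 cases and 1 control is strongly high-risk.
weak = sigmoid_membership(51, 49, overall_case_fraction=0.5)
strong = sigmoid_membership(9, 1, overall_case_fraction=0.5)
```

Under a binary rule both cells above would be labeled H. The graded memberships keep the difference between them, which is the uncertainty the abstract refers to.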
9.
Article in English | MEDLINE | ID: mdl-30040653

ABSTRACT

Detecting gene-gene interactions in single-nucleotide polymorphism data is vital for understanding disease susceptibility. However, existing approaches may be limited by the sample size in case-control studies. Herein, we propose a balance approach for the multifactor dimensionality reduction (BMDR) method to increase the accuracy of estimates of the prediction error rate in small samples. BMDR explicitly selects the best model by evaluating the average of prediction error rates over k-fold cross-validation without cross-validation consistency selection. In this study, we used several epistatic models with and without marginal effects under different parameter settings (heritability and minor allele frequencies) to evaluate the performance of existing approaches. Using simulated data sets, BMDR successfully detected gene-gene interactions, particularly for data sets with small sample sizes. A large data set was obtained from the Wellcome Trust Case Control Consortium, and results indicated that BMDR could effectively detect significant gene-gene interactions.


Subject(s)
Computational Biology/methods , Epistasis, Genetic/genetics , Models, Genetic , Multifactor Dimensionality Reduction/methods , Algorithms , Polymorphism, Single Nucleotide/genetics
10.
Comput Biol Med ; 113: 103397, 2019 10.
Article in English | MEDLINE | ID: mdl-31494431

ABSTRACT

Hydrophobic-polar (HP) models are widely used to predict protein folding and hydrophobic interactions. Numerous optimization algorithms have been proposed to predict protein folding using the two-dimensional (2D) HP model. However, obtaining an optimal protein structure from the 2D HP model remains challenging. In this study, an algorithm integrating particle swarm optimization (PSO) and Tabu search (TS), named PSO-TS, was proposed to predict protein structures based on the 2D HP model. TS can help PSO avoid becoming trapped in a local optimum and thus removes the limitation of PSO in predicting protein folding with the 2D HP model. A total of 28 protein sequences were used to evaluate the accuracy of PSO-TS in protein folding prediction. The proposed PSO-TS method was compared with 15 other approaches for predicting short and long protein sequences. Experimental results demonstrated that PSO-TS provides highly accurate, reproducible, and stable prediction of protein folding with the 2D HP model.


Subject(s)
Algorithms , Models, Molecular , Protein Folding , Proteins , Sequence Analysis, Protein , Amino Acid Sequence , Hydrophobic and Hydrophilic Interactions , Protein Domains , Proteins/chemistry , Proteins/genetics
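The objective that an optimizer minimizes in the 2D HP model can be sketched as follows. The sketch assumes the standard square-lattice convention: each pair of H residues that are lattice neighbors but not consecutive in the chain contributes -1. The function name and the coordinate representation are illustrative choices, not taken from the paper.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP lattice conformation.

    sequence: string of 'H' and 'P' residues
    coords:   list of (x, y) integer lattice positions, one per residue
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate is required per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):   # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    contacts += 1
    return -contacts

# Folding HPPH into a unit square brings the two H residues together.
folded = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
straight = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
```

A search method such as PSO or Tabu search would propose conformations and keep those with the lowest energy. The validity checks matter because most random moves break self-avoidance.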
11.
IEEE J Biomed Health Inform ; 23(1): 416-426, 2019 01.
Article in English | MEDLINE | ID: mdl-29993963

ABSTRACT

Gene-gene interactions (GGIs) are important markers for determining susceptibility to a disease. Multifactor dimensionality reduction (MDR) is a popular algorithm for detecting GGIs and primarily adopts the correct classification rate (CCR) to assess the quality of a GGI. However, CCR measurement alone may not successfully detect certain GGIs because of potential model preferences and disease complexities. In this study, multiple-criteria decision analysis (MCDA) based on MDR was named MCDA-MDR and proposed for detecting GGIs. MCDA facilitates MDR to simultaneously adopt multiple measures within the two-way contingency table of MDR to assess GGIs; the CCR and rule utility measure were employed. Cross-validation consistency was adopted to determine the most favorable GGIs among the Pareto sets. Simulation studies were conducted to compare the detection success rates of the MDR-only-based measure and MCDA-MDR, revealing that MCDA-MDR had superior detection success rates. The Wellcome Trust Case Control Consortium dataset was analyzed using MCDA-MDR to detect GGIs associated with coronary artery disease, and MCDA-MDR successfully detected numerous significant GGIs (p < 0.001). MCDA-MDR performance assessment revealed that the applied MCDA successfully enhanced the GGI detection success rate of the MDR-based method compared with MDR alone.


Subject(s)
Algorithms , Computational Biology/methods , Epistasis, Genetic/genetics , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Computer Simulation , Humans
12.
IEEE Trans Nanobioscience ; 17(3): 291-299, 2018 07.
Article in English | MEDLINE | ID: mdl-29994217

ABSTRACT

Single-nucleotide polymorphism (SNP)-SNP interactions are crucial for understanding the association between disease-related multifactorials for disease analysis. Existing statistical methods for determining such interactions are limited by the considerable computation required for evaluating all potential associations between disease-related multifactorials. Identifying SNP-SNP interactions is thus a major challenge in genetic association studies. This paper proposes a catfish Taguchi-based binary differential evolution (CT-BDE) algorithm for identifying SNP-SNP interactions. In the search space, the catfish effect prevents the premature convergence of the population, and the Taguchi method improves the search ability of the BDE algorithm. Hence, the proposed algorithm enables obtaining a favorable solution regarding the identification of high-order SNP-SNP interactions. Additionally, the proposed algorithm applies an effective fitness function derived from a multifactor dimensionality reduction (MDR) operation to evaluate the solutions from BDE-based algorithms. Simulated and real data sets were used to evaluate the ability of several BDE-based algorithms in identifying specific SNP-SNP interactions. We compared the fitness function derived from the MDR operation with that derived according to the difference between cases and controls, by using the different BDE-based algorithms. The results showed that the proposed CT-BDE algorithm applying the fitness function derived from the MDR operation exhibited a superior ability in identifying SNP-SNP interactions compared with the other BDE-based algorithms.


Subject(s)
Algorithms , Computational Biology/methods , Genetic Association Studies/methods , Polymorphism, Single Nucleotide/genetics , Databases, Factual , Humans , Multifactor Dimensionality Reduction/methods , Renal Dialysis
13.
Bioinformatics ; 34(13): 2228-2236, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29471406

ABSTRACT

Motivation: Single-nucleotide polymorphism (SNP)-SNP interactions (SSIs) are popular markers for understanding disease susceptibility. Multifactor dimensionality reduction (MDR) can successfully detect considerable SSIs. Currently, MDR-based methods mainly adopt a single-objective function (a single measure based on contingency tables) to detect SSIs. However, generally, a single-measure function might not yield favorable results due to potential model preferences and disease complexities. Approach: This study proposes a multiobjective MDR (MOMDR) method that is based on a contingency table of MDR as an objective function. MOMDR considers the incorporated measures, including correct classification and likelihood rates, to detect SSIs and adopts set theory to predict the most favorable SSIs with cross-validation consistency. MOMDR enables simultaneously using multiple measures to determine potential SSIs. Results: Three simulation studies were conducted to compare the detection success rates of MOMDR and single-objective MDR (SOMDR), revealing that MOMDR had higher detection success rates than SOMDR. Furthermore, the Wellcome Trust Case Control Consortium dataset was analyzed by MOMDR to detect SSIs associated with coronary artery disease. Availability and implementation: MOMDR is freely available at https://goo.gl/M8dpDg. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Epistasis, Genetic , Models, Genetic , Multifactor Dimensionality Reduction/methods , Polymorphism, Single Nucleotide , Case-Control Studies , Coronary Artery Disease/genetics , Genetic Predisposition to Disease , Humans
14.
J Comput Biol ; 25(2): 158-169, 2018 02.
Article in English | MEDLINE | ID: mdl-29048940

ABSTRACT

Many CpG island detection methods have been proposed based on sliding window and clustering technology, but the accuracy of these methods is proportional to the time required. Therefore, an accurate and rapid method for identifying CpG islands remains an important challenge in the complete human genome. We propose a hybrid method CpGTLBO to detect the CpG islands in the human genome. The method uses the clustering approach and the teaching-learning-based optimization (TLBO) algorithm. The clustering approach is used to detect CpG island candidates, and it can effectively reduce the huge volume of unnecessary DNA fragments. TLBO was used to accurately predict CpG islands among promising CpG island candidates. A comparison based on six contig data sets and a whole human genome analysis showed that the identifying stability of CpGTLBO outperformed eight existing methods in terms of sensitivity (SN), specificity (SP), accuracy (ACC), performance coefficient (PC), and correlation coefficient (CC) and processing time. Results indicated that ClusterTLBO can effectively overcome the drawbacks and maintain the advantages in both the CpGcluster and TLBO.


Subject(s)
Algorithms , Computational Biology/methods , CpG Islands , Genome, Human , Whole Genome Sequencing/methods , Cluster Analysis , Humans
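For orientation, the sliding-window style of CpG island detection mentioned at the start of this abstract can be sketched with two per-window statistics: GC content and the observed/expected CpG ratio. The thresholds used below (window of 200 bp, GC content of at least 0.5, ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are an assumption here. The abstract does not state which criteria CpGTLBO applies, and the function names are hypothetical.

```python
def window_stats(window):
    """Return (gc_content, observed/expected CpG ratio) for a DNA window."""
    window = window.upper()
    n = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")
    gc_content = (c + g) / n
    # Expected CpG count under independence is c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

def candidate_windows(seq, size=200, step=1, min_gc=0.5, min_oe=0.6):
    """Start positions of windows that satisfy both thresholds."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, oe = window_stats(seq[start:start + size])
        if gc >= min_gc and oe >= min_oe:
            hits.append(start)
    return hits

# Toy sequence: a CpG-rich stretch followed by an AT-only stretch.
toy = "CG" * 100 + "AT" * 100
hits = candidate_windows(toy, size=200, step=100)
```

Scanning every window position is what makes this approach slow on a whole genome, which is the trade-off between accuracy and time that the abstract describes.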
15.
Eur J Med Res ; 22(1): 54, 2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29282123

ABSTRACT

OBJECTIVES: Measuring patients' functional status is crucial when end-stage renal disease patients begin a dialysis program. The influence of the disease on patients can be examined by the measurement of Karnofsky Performance Status (KPS) scores, together with a quality of life survey and clinical variables. METHODS: Data were collected retrospectively from 1166 patients receiving regular hemodialysis (HD) in one hospital during the 5-year study period. KPS scores were applied for quantifying functional status. To identify risk factors for functional status, clinical factors including demographics, laboratory data, and HD vintage were selected. This study applied a classification and regression tree (CART) approach and logistic regression to determine risk factors for functional impairment among HD patients. RESULTS: Ten risk factors were identified by CART and the regression model (age, primary kidney disease subclass, treatment years, hemoglobin, albumin, creatinine, phosphorus, intact parathyroid hormone, ferritin, and cardiothoracic ratio). The results of logistic regression with selected interaction models showed that older age or higher hematocrit, blood urea nitrogen, and glucose levels could significantly increase the log-odds of obtaining low KPS scores at in-person visits. CONCLUSIONS: In the interaction results, the combination of older age with higher albumin level and of higher creatinine level with longer HD treatment years could significantly decrease the log-odds of a low KPS score assessment during in-person visits. Age, hemoglobin, albumin, urea, and creatinine levels, primary kidney disease subclass, and HD duration are the major determinants of functional status in HD patients.


Subject(s)
Karnofsky Performance Status , Kidney Failure, Chronic/therapy , Quality of Life , Renal Dialysis , Aged , Algorithms , Female , Humans , Male , Middle Aged , Risk Factors
16.
Sci Rep ; 7(1): 16520, 2017 11 23.
Article in English | MEDLINE | ID: mdl-29170430

ABSTRACT

A correction to this article has been published and is linked from the HTML version of this paper. The error has been fixed in the paper.

17.
Sci Rep ; 7(1): 12869, 2017 10 09.
Article in English | MEDLINE | ID: mdl-28993686

ABSTRACT

Epistasis within disease-related genes (gene-gene interactions) was determined through contingency table measures based on multifactor dimensionality reduction (MDR) using single-nucleotide polymorphisms (SNPs). Most MDR-based methods use the single contingency table measure to detect gene-gene interactions; however, some gene-gene interactions may require identification through multiple contingency table measures. In this study, a multiobjective differential evolution method (called MODEMDR) was proposed to merge the various contingency table measures based on MDR to detect significant gene-gene interactions. Two contingency table measures, namely the correct classification rate and normalized mutual information, were selected to design the fitness functions in MODEMDR. The characteristics of multiobjective optimization enable MODEMDR to use multiple measures to efficiently and synchronously detect significant gene-gene interactions within a reasonable time frame. Epistatic models with and without marginal effects under various parameter settings (heritability and minor allele frequencies) were used to assess existing methods by comparing the detection success rates of gene-gene interactions. The results of the simulation datasets show that MODEMDR is superior to existing methods. Moreover, a large dataset obtained from the Wellcome Trust Case Control Consortium was used to assess MODEMDR. MODEMDR exhibited efficiency in identifying significant gene-gene interactions in genome-wide association studies.


Subject(s)
Epistasis, Genetic , Models, Genetic , Multifactor Dimensionality Reduction , Computer Simulation , Coronary Artery Disease/genetics , Gene Frequency/genetics , Genetic Loci , Humans , Polymorphism, Single Nucleotide/genetics
18.
J Comput Biol ; 24(12): 1212-1225, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28876085

ABSTRACT

In previous studies, both single-nucleotide polymorphism (SNP)-SNP or gene-gene (G × G) interactions and SNP-environmental factor (G × E) interactions were reported to partially account for "missing" heritability. However, (G × G) × E interactions were less commonly addressed. The purpose of this study was to develop a novel strategy to evaluate possible (G × G) × E interactions in D-loop-based chronic dialysis association. Using values from our previously published data set (704 controls and 193 cases) of 77 D-loop SNPs and 7 environmental factors (coronary heart disease, hypertension, diabetes mellitus, triglyceride, cholesterol, blood thiol, and TBARS levels), we compared the performances of G, G × G, G × E, and (G × G) × E. We found that the interactions of four individual SNPs previously associated with a significantly high risk of chronic dialysis [odds ratio (OR) = 1.56-4.93] with environmental factors (G × E) increased the risk of chronic dialysis (maximum OR = 35.43). We then used an improved branch and bound algorithm to identify combinations of two to four SNPs that were most highly associated with chronic dialysis (OR = 9.27-34.39). When the interactions of the two- and three-SNP combinations with environmental factors were evaluated, we found that the (G × G) × E effects increased the risk of chronic dialysis (maximum OR = 8.32-57.54 and OR = 12.52-57.81, respectively; adjusted OR = 8.67-81.81 and OR = 12.29-81.95, respectively). Taken together, the (G × G) × E interactions identified chronic dialysis-associated SNPs that would not have been found using G × G or G × E interactions, suggesting that (G × G) × E interactions may be helpful to solve the problems of missing heritability in association studies.


Subject(s)
Algorithms , Gene-Environment Interaction , Polymorphism, Single Nucleotide , Renal Dialysis/methods , Renal Insufficiency, Chronic/genetics , Case-Control Studies , Chronic Disease , Genetic Predisposition to Disease , Humans , Models, Genetic
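The odds ratios reported in this abstract are computed from 2x2 tables of carriers and non-carriers among cases and controls. A minimal sketch of the unadjusted calculation is shown below, with a Woolf (log-based) 95% confidence interval. The function name and the toy counts are illustrative and are not taken from the study, and the adjusted ORs in the abstract would require a regression model that is not shown here.

```python
import math

def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval.

    Returns (or_value, lower, upper). All four counts must be nonzero.
    """
    a, b = exposed_cases, exposed_controls
    c, d = unexposed_cases, unexposed_controls
    or_value = (a * d) / (b * c)
    # Standard error of ln(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se)
    upper = math.exp(math.log(or_value) + z * se)
    return or_value, lower, upper

# Toy counts (illustrative only): carriers of a SNP combination
# among cases and controls.
or_value, lower, upper = odds_ratio(20, 10, 10, 20)
```

Evaluating a (G × G) × E interaction amounts to defining "exposed" as carrying the SNP combination together with the environmental factor and then building the same table.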
19.
Bioinformatics ; 33(15): 2354-2362, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28379338

ABSTRACT

MOTIVATION: Detecting epistatic interactions in genome-wide association studies (GWAS) is a computational challenge. The huge number of single-nucleotide polymorphism (SNP) combinations prevents some powerful algorithms from being applied to detect potential epistasis in large-scale SNP datasets. APPROACH: We propose a new algorithm, termed DECMDR, which combines the differential evolution (DE) algorithm with classification-based multifactor dimensionality reduction (CMDR). DECMDR uses CMDR as a fitness measure to evaluate solutions in the DE process when scanning for potential statistical epistasis in GWAS. RESULTS: The results indicated that DECMDR outperforms the existing algorithms in terms of detection success rate on large simulated datasets and on real data obtained from the Wellcome Trust Case Control Consortium. In the running time comparison, DECMDR can efficiently apply CMDR to detect significant associations between cases and controls amongst all possible SNP combinations in GWAS. AVAILABILITY AND IMPLEMENTATION: DECMDR is freely available at https://goo.gl/p9sLuJ . CONTACT: chuang@isu.edu.tw or e0955767257@yahoo.com.tw. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Epistasis, Genetic , Genome-Wide Association Study/methods , Multifactor Dimensionality Reduction/methods , Polymorphism, Single Nucleotide , Humans
20.
Artif Intell Med ; 73: 23-33, 2016 10.
Article in English | MEDLINE | ID: mdl-27926379

ABSTRACT

OBJECTIVE: Evolutionary algorithms could overcome the computational limitations for the statistical evaluation of large datasets for high-order single nucleotide polymorphism (SNP) barcodes. Previous studies have proposed several chaotic particle swarm optimization (CPSO) methods to detect SNP barcodes for disease analysis (e.g., for breast cancer and chronic diseases). This work evaluated additional chaotic maps combined with the particle swarm optimization (PSO) method to detect SNP barcodes using a high-dimensional dataset. METHODS AND MATERIAL: Nine chaotic maps were used to improve the results of the PSO method, and the searching ability of all CPSO methods was compared. The XOR and ZZ disease models were used to compare all chaotic maps combined with the PSO method. Efficacy evaluations of CPSO methods were based on statistical values from the chi-square test (χ²). RESULTS: The results showed that chaotic maps could improve the searching ability of the PSO method when the population is trapped in a local optimum. Across the minor allele frequencies (MAFs), numbers of SNPs, and sample sizes tested, the highest χ² values in all datasets were obtained by the Sinai chaotic map combined with the PSO method. We used simple linear regression of the gbest values over all generations to compare all the methods. The Sinai chaotic map combined with the PSO method provided the highest β values (β≥0.32 in the XOR disease model and β≥0.04 in the ZZ disease model) and significant p-values (p-value<0.001 in both the XOR and ZZ disease models). CONCLUSION: The Sinai chaotic map was found to effectively enhance the fitness values (χ²) of the PSO method, indicating that the Sinai chaotic map combined with the PSO method is more effective at detecting potential SNP barcodes in both the XOR and ZZ disease models.


Subject(s)
Algorithms , Breast Neoplasms/genetics , Neural Networks, Computer , Chi-Square Distribution , Humans , Models, Genetic , Pattern Recognition, Automated , Polymorphism, Single Nucleotide
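The general mechanism of chaotic PSO, replacing the uniform random numbers in the velocity update with values from a chaotic sequence, can be sketched as follows. The logistic map is used here only because it is a simple, widely known chaotic map. Whether it is among the nine maps evaluated in the study is not stated in the abstract, and the Sinai map itself is not reproduced. The parameter values `w`, `c1`, and `c2` and all names are assumptions.

```python
def logistic_map(seed=0.3):
    """Endless chaotic sequence in [0, 1]: x <- 4 * x * (1 - x)."""
    x = seed
    while True:
        x = 4.0 * x * (1.0 - x)
        yield x

def chaotic_velocity(v, x, pbest, gbest, chaos, w=0.7, c1=2.0, c2=2.0):
    """One-dimensional PSO velocity update in which the usual random
    factors r1 and r2 are drawn from a chaotic sequence instead."""
    r1, r2 = next(chaos), next(chaos)
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)

# One update for a particle at x = 0 with both best positions at 1.
chaos = logistic_map(seed=0.3)
new_v = chaotic_velocity(v=0.0, x=0.0, pbest=1.0, gbest=1.0, chaos=chaos)
```

Swapping in another chaotic map only requires a different generator, which is how several maps can be compared under an otherwise identical PSO loop.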