Search | BVS Bolivia

Hybrid mRMR and multi-objective particle swarm feature selection methods and application to metabolomics of traditional Chinese medicine.

Zhang, Mengting; Du, Jianqiang; Nie, Bin; Luo, Jigen; Liu, Ming; Yuan, Yang.

PeerJ Comput Sci ; 10: e2073, 2024.

Article in English | MEDLINE | ID: mdl-38855250

ABSTRACT

Metabolomics data combine a large number of features with a small number of samples, which is typical of high-dimensional small sample (HDSS) data. High dimensionality leads to the curse of dimensionality, and a small sample size tends to cause overfitting; both hinder deeper mining of metabolomics data. Feature selection is a valuable technique for handling the challenges that HDSS data pose. To address feature selection for HDSS data in metabolomics, a hybrid Max-Relevance and Min-Redundancy (mRMR) and multi-objective particle swarm feature selection method (MCMOPSO) is proposed. Experimental results on metabolomics data and several University of California, Irvine (UCI) public datasets show that MCMOPSO selects small feature subsets of high quality by efficiently eliminating irrelevant and redundant features. MCMOPSO is therefore an effective approach for selecting features from high-dimensional metabolomics data with limited sample sizes.
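The mRMR stage named in this abstract can be illustrated with a minimal sketch. It assumes discrete features, uses the common "relevance minus mean redundancy" greedy criterion, and covers only the filter step, not the multi-objective particle swarm stage of MCMOPSO. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: pick up to k feature names, each maximizing relevance
    to the target minus mean redundancy with the features already picked."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (hypothetical): the target is determined by "a" and "b";
# "a_copy" duplicates "a" and "noise" is unrelated to the target.
target = [0, 0, 2, 2, 1, 1, 3, 3]
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr(features, target, k=2))
```

On this toy input the duplicate column is passed over in favor of the independent one, which is the behavior the min-redundancy term is meant to produce.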

Discrimination of missing data types in metabolomics data based on particle swarm optimization algorithm and XGBoost model.

Yuan, Yang; Du, Jianqiang; Luo, Jigen; Zhu, Yanchen; Huang, Qiang; Zhang, Mengting.

Sci Rep ; 14(1): 152, 2024 01 02.

Article in English | MEDLINE | ID: mdl-38168582

ABSTRACT

Data analysis often has to cope with a large number of missing values, and the problem is especially prominent in metabolomics data. Imputation is a common way to handle missing metabolomics data, but traditional imputation methods usually ignore differences between missing types, so their results are unsatisfactory. To discriminate the missing types in metabolomics data, this paper proposes a missing data classification model (PX-MDC) based on the particle swarm algorithm and XGBoost. First, the largest complete subset of a given incomplete dataset is obtained by panning the missing values. The particle swarm algorithm is then used to search for the concentration threshold of the missing data and for the proportion of low-concentration missing values among all missing values. Next, missing data are simulated based on the search results. Finally, an XGBoost model is trained on the training data, using the feature set proposed in the paper, to build a classifier for the missing data. The experimental results show that the particle swarm algorithm matches the traditional enumeration method in accuracy while significantly reducing the time needed for the concentration threshold search. Compared with current mainstream methods, the PX-MDC model achieves higher accuracy and can distinguish different missing types for the same metabolite. The study is expected to advance metabolomics data imputation and to support research in related fields.

Subject(s)

Algorithms, Metabolomics, Metabolomics/methods
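The threshold search described in the abstract above relies on particle swarm optimization. The following is a minimal sketch of a basic global-best PSO in one dimension, not the PX-MDC implementation: the paper searches two quantities (a concentration threshold and a proportion), while this sketch searches one, and `toy_loss`, the parameter values, and the function name are all hypothetical placeholders.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize a 1-D objective on [lower, upper] with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                         # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]   # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity blends momentum, pull to personal best, pull to global best.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Move and clamp the particle to the search interval.
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


def toy_loss(threshold):
    """Hypothetical stand-in for a fit criterion with its optimum at 0.3."""
    return (threshold - 0.3) ** 2


best_threshold, best_loss = pso_minimize(toy_loss, 0.0, 1.0)
print(best_threshold, best_loss)
```

Compared with enumerating candidate thresholds on a grid, the swarm evaluates the objective only at particle positions, which is where the reduction in search time reported in the abstract would come from.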

Dose-effect relationship analysis of TCM based on deep Boltzmann machine and partial least squares.

Xiong, Wangping; Zhu, Yimin; Zeng, Qingxia; Du, Jianqiang; Wang, Kaiqi; Luo, Jigen; Yang, Ming; Zhou, Xian.

Math Biosci Eng ; 20(8): 14395-14413, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37679141

ABSTRACT

Dose-effect relationship analysis of traditional Chinese medicine (TCM) is crucial to the modernization of TCM. However, the complex and nonlinear nature of TCM data, together with problems such as multicollinearity, makes such analysis challenging. Partial least squares can be applied to multicollinear data, but the principal components it extracts internally cannot adequately express the nonlinear characteristics of TCM data. To address this issue, this paper proposes an analytical model based on a deep Boltzmann machine (DBM) and partial least squares. The model uses the DBM to extract nonlinear features from the feature space, uses them to replace the components in partial least squares, and then performs a multiple linear regression, which makes it suitable for analyzing the dose-effect relationship of TCM. The model was evaluated using experimental data from Ma Xing Shi Gan Decoction and datasets from the UCI Machine Learning Repository. The experimental results show that the prediction accuracy of the model based on the DBM and partial least squares is on average 10% higher than that of existing methods.

Subject(s)

Machine Learning, Traditional Chinese Medicine, Least-Squares Analysis, Linear Models
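The substitution described in the abstract above can be written out schematically. This is only a sketch under assumed notation: the symbols X, y, W, T, q, H, g_DBM and beta are introduced here for illustration and are not taken from the paper, and the ordinary least-squares estimate in the last line is one standard way to carry out the multiple linear regression step.

```latex
% Standard PLS regression: linear scores T are built from the predictors X
T = X W, \qquad y = T q + e

% Sketch of the described variant: nonlinear DBM features H take the place of T
H = g_{\mathrm{DBM}}(X), \qquad y = H \beta + \varepsilon, \qquad
\hat{\beta} = (H^{\top} H)^{-1} H^{\top} y
```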

Traditional Chinese Medicine Text Similarity Calculation Model Based on the Bidirectional Temporal Siamese Network.

Luo, Jigen; Xiong, Wangping; Du, Jianqiang; Liu, Yingfeng; Li, Jianwen; Hu, Dingxing.

Evid Based Complement Alternat Med ; 2021: 2337924, 2021.

Article in English | MEDLINE | ID: mdl-34880918

ABSTRACT

Text similarity calculation is a core task in commercial artificial intelligence applications such as traditional Chinese medicine (TCM) auxiliary diagnosis, intelligent question answering, and prescription recommendation. However, TCM texts suffer from short sentences, inaccurate word segmentation, strong semantic relevance, high feature dimensionality, and sparseness. This study takes the temporal information of sentence context into account and proposes a TCM text similarity calculation model based on the bidirectional temporal Siamese network (BTSN). We used the enhanced representation through knowledge integration (ERNIE) pretrained language model to train character vectors instead of word vectors, which solved the problem of inaccurate word segmentation in TCM. In the Siamese network, the traditional fully connected neural network was replaced by a deep bidirectional long short-term memory (BLSTM) network to capture the contextual semantics of the current word. The improved BLSTM was used to map the two sentences to be compared into two sets of low-dimensional numerical vectors, on which similarity calculation training was then performed. Experiments on two datasets, one financial and one TCM, show that the BTSN model performed better than other similarity calculation models. The accuracy of the model was highest when the BLSTM had 6 layers. These results indicate that the proposed text similarity calculation model has high engineering value.
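The Siamese pattern in this abstract, in which both sentences pass through one shared encoder before being compared, can be sketched in a few lines. This is a toy under stated assumptions: character-bigram counts stand in for the ERNIE character vectors and the BLSTM encoder, and cosine similarity is an assumed comparison function that the abstract does not specify. All names and the example strings are hypothetical.

```python
from collections import Counter
from math import sqrt


def encode(sentence):
    """Shared toy encoder: character-bigram counts.
    Works on characters, so no word segmentation is needed."""
    chars = [c for c in sentence if not c.isspace()]
    return Counter(zip(chars, chars[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def similarity(a, b):
    """Siamese pattern: both inputs go through the same encoder."""
    return cosine(encode(a), encode(b))


# Hypothetical example: two short symptom descriptions with reordered terms.
print(similarity("头痛发热", "发热头痛"))
```

In a trained model the encoder weights would be learned from labeled sentence pairs; here the encoder is fixed, so the sketch shows only the shared-encoder structure.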

Research on Hybrid Feature Selection Method Based on Iterative Approximation Markov Blanket.

Huang, Canyi; Li, Keding; Du, Jianqiang; Nie, Bin; Xu, Guoliang; Xiong, Wangping; Luo, Jigen.

Comput Math Methods Med ; 2020: 8308173, 2020.

Article in English | MEDLINE | ID: mdl-32328156

ABSTRACT

The basic experimental data of traditional Chinese medicine are generally obtained by high-performance liquid chromatography and mass spectrometry. The data are often high-dimensional with few samples and contain many irrelevant and redundant features, which makes in-depth exploration of Chinese medicine material information challenging. This paper proposes a hybrid feature selection method based on an iterative approximate Markov blanket (CI_AMB). The method first uses the maximum information coefficient to measure the correlation between features and the target variable and filters out irrelevant features according to the evaluation criteria. The iterative approximate Markov blanket strategy then analyzes the redundancy between features, eliminates redundant features, and finally selects an effective feature subset. Comparative experiments on basic experimental data for traditional Chinese medicine material and on multiple public UCI datasets show that the new method is better at selecting a small number of highly explanatory features than Lasso, XGBoost, and the classic approximate Markov blanket method.

Subject(s)

Pharmaceutical Databases/statistics & numerical data, Chinese Herbal Drugs/chemistry, Automated Pattern Recognition/statistics & numerical data, Algorithms, Artificial Intelligence, High-Pressure Liquid Chromatography, Computational Biology, Humans, Markov Chains, Mass Spectrometry, Traditional Chinese Medicine/statistics & numerical data
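The two-stage idea in the abstract above, a relevance filter followed by redundancy elimination with an approximate Markov blanket, can be sketched as follows. Two substitutions are assumed and should be read as such: symmetric uncertainty replaces the maximum information coefficient, and the redundancy test is the classic single-pass approximate Markov blanket check rather than the iterative CI_AMB strategy. The threshold, function names, and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * mi / (hx + hy)


def select_features(features, target, threshold=0.1):
    """Stage 1: drop features whose relevance is at or below the threshold.
    Stage 2: drop a feature when an already kept, more relevant feature is
    at least as informative about it as it is about the target."""
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in features.items()}
    ranked = sorted((n for n in features if relevance[n] > threshold),
                    key=relevance.get, reverse=True)
    kept = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data (hypothetical): "a_copy" duplicates "a"; "noise" is irrelevant.
target = [0, 0, 2, 2, 1, 1, 3, 3]
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
}
print(select_features(features, target))
```

The irrelevant column is removed by the filter stage and the duplicate column by the blanket check, leaving the two informative, mutually independent features.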
