Results 1 - 13 of 13
1.
Technol Health Care ; 32(1): 75-87, 2024.
Article in English | MEDLINE | ID: mdl-37248924

ABSTRACT

BACKGROUND: In practice, the datasets collected for data analysis are usually incomplete, as some data contain missing attribute values. Many related works focus on constructing specific models to produce estimates that replace the missing values, so that the original incomplete datasets become complete. Another type of solution is to handle the incomplete datasets directly, without missing value imputation, with decision trees being the major technique for this purpose. OBJECTIVE: To introduce a novel approach, namely Deep Learning-based Decision Tree Ensembles (DLDTE), which borrows the bounding box and sliding window strategies used in deep learning techniques to divide an incomplete dataset into a number of subsets and learn from each subset with a decision tree, resulting in an ensemble of decision trees. METHOD: Two medical domain datasets containing several hundred feature dimensions, with missing rates of 10% to 50%, are used for performance comparison. RESULTS: The proposed DLDTE provides the highest classification accuracy when compared with the baseline decision tree method, two missing value imputation methods (mean and k-nearest neighbor), and the case deletion method. CONCLUSION: The results demonstrate the effectiveness of DLDTE for handling incomplete medical datasets with different missing rates.
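
As a rough illustration of the sliding-window idea described above, the sketch below trains one decision tree per window of feature columns, using only the rows that are complete within that window, and combines the trees by majority vote. The window size, stride, minimum-row cutoff, and voting scheme are illustrative assumptions, not the published DLDTE design.

```python
# Illustrative sketch: sliding-window decision-tree ensemble over an incomplete dataset.
# Window/stride/voting choices are assumptions, not the authors' exact DLDTE method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_window_ensemble(X, y, window=50, stride=25):
    """Train one tree per feature window, using only rows complete in that window."""
    models, n_features = [], X.shape[1]
    for start in range(0, max(n_features - window, 0) + 1, stride):
        cols = np.arange(start, min(start + window, n_features))
        rows = ~np.isnan(X[:, cols]).any(axis=1)   # rows fully observed in this window
        if rows.sum() < 10:                        # skip windows with too little complete data
            continue
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        models.append((cols, tree))
    return models

def predict_window_ensemble(models, X):
    """Majority vote over trees whose window is fully observed for each sample.
    Class labels are assumed to be non-negative integers."""
    preds = []
    for x in X:
        votes = [tree.predict(x[cols].reshape(1, -1))[0]
                 for cols, tree in models if not np.isnan(x[cols]).any()]
        preds.append(np.bincount(votes).argmax() if votes else 0)  # fall back to class 0
    return np.array(preds)
```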


Subject(s)
Deep Learning; Humans; Cluster Analysis; Decision Trees
2.
PLoS One ; 18(11): e0295032, 2023.
Article in English | MEDLINE | ID: mdl-38033140

ABSTRACT

Data discretization aims to transform a set of continuous features into discrete features, thus simplifying the representation of information and making it easier to understand, use, and explain. In practice, users can take advantage of the discretization process to improve knowledge discovery and data analysis on medical domain datasets containing continuous features. However, certain feature values are frequently missing, and many data-mining algorithms cannot handle incomplete datasets. In this study, we considered the use of both discretization and missing-value imputation to process incomplete medical datasets, examining how the order in which discretization and missing-value imputation are combined influences performance. The experimental results were obtained using seven different medical domain datasets; two discretizers, the minimum description length principle (MDLP) and ChiMerge; three imputation methods, the mean/mode, classification and regression tree (CART), and k-nearest neighbor (KNN) methods; and two classifiers, support vector machines (SVM) and the C4.5 decision tree. The results show that better performance can be obtained by performing discretization first, followed by imputation, rather than vice versa. Furthermore, the highest classification accuracy rate was achieved by combining ChiMerge and KNN with SVM.
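
A minimal sketch of the "discretize first, then impute" order, using scikit-learn stand-ins: KBinsDiscretizer replaces the MDLP/ChiMerge discretizers (which scikit-learn does not provide), KNNImputer replaces the paper's KNN imputer, and an SVM gives the final accuracy. Bin and neighbor counts are illustrative.

```python
# Sketch of discretize-then-impute with scikit-learn stand-ins for MDLP/ChiMerge and KNNI.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.impute import KNNImputer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def discretize_then_impute_score(X_missing, y, n_bins=5, k=5):
    disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")
    X_disc = np.full_like(X_missing, np.nan, dtype=float)
    observed = ~np.isnan(X_missing)
    # Discretize each feature on its observed values only, leaving NaNs in place.
    for j in range(X_missing.shape[1]):
        obs = observed[:, j]
        X_disc[obs, j] = disc.fit_transform(X_missing[obs, j].reshape(-1, 1)).ravel()
    X_imputed = KNNImputer(n_neighbors=k).fit_transform(X_disc)  # impute the discrete codes
    return cross_val_score(SVC(), X_imputed, y, cv=5).mean()
```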


Subject(s)
Algorithms; Support Vector Machine; Data Mining; Cluster Analysis
3.
Medicine (Baltimore) ; 99(18): e20090, 2020 May.
Article in English | MEDLINE | ID: mdl-32358395

ABSTRACT

In traditional Chinese medicine (TCM) clinics, the pharmacists responsible for dispensing herbal medicine usually find the desired ingredients based on the positions of the shelves (racks, frames, stands). Generally, these containers are arranged in alphabetical order according to the herbal medicine they contain. However, certain related ingredients tend to be used together in many prescriptions, even though their containers may be stored far away from each other. This can cause problems, especially when there are many patients and/or a limited number of pharmacists. Longer dispensing times are likely to reduce patient satisfaction, and the pharmacists' stamina is consumed quickly. In this study, we investigate an association rule mining technique, based on the frequent pattern growth algorithm, to improve efficiency in TCM dispensing, and try to identify which two or three herbal medicines frequently appear together in prescriptions. Furthermore, three experimental studies are conducted on a dataset collected from a traditional Chinese medicine hospital. The dataset covers an entire year (2014), including all four seasons and the prescribing doctors. Afterward, a questionnaire on the usefulness of the extracted rules was administered to the pharmacists in the case hospital. The responses showed the mining results to be very valuable as a reference for the placement and ordering of the containers in TCM pharmacies and drug stores.
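
The co-prescription mining step could be prototyped roughly as follows with the third-party mlxtend implementation of FP-growth; the herb names and the support and confidence thresholds are purely illustrative.

```python
# Sketch: mine frequently co-prescribed herb pairs/triples with FP-growth (mlxtend library).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

prescriptions = [                      # each prescription is the set of dispensed herbs
    ["ginseng", "licorice", "jujube"],
    ["ginseng", "licorice", "ginger"],
    ["licorice", "ginger", "jujube"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(prescriptions).transform(prescriptions), columns=te.columns_)
itemsets = fpgrowth(onehot, min_support=0.3, use_colnames=True, max_len=3)  # 2- and 3-herb sets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```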


Asunto(s)
Almacenaje de Medicamentos/métodos , Medicamentos Herbarios Chinos , Eficiencia Organizacional , Aprendizaje Automático , Farmacias/organización & administración , Algoritmos , Humanos , Farmacias/normas
4.
J Healthc Eng ; 2018: 3948245, 2018.
Article in English | MEDLINE | ID: mdl-30210752

ABSTRACT

Digoxin is a high-alert medication because of its narrow therapeutic range and high potential for drug-drug interactions (DDIs). Approximately 50% of digoxin toxicity cases are preventable, which motivated us to improve the treatment outcomes of digoxin. The objective of this study is to apply machine learning techniques to predict the appropriateness of the initial digoxin dosage. Data on 307 inpatients treated with digoxin between 2004 and 2013 at a medical center in Taiwan were collected for the study. Ten independent variables, including demographic information, laboratory data, and whether the patients had congestive heart failure (CHF), were also recorded. A patient whose serum digoxin concentration was controlled at 0.5-0.9 ng/mL after the initial digoxin dosage was defined as an appropriate use of digoxin; otherwise, the use was defined as inappropriate. Weka 3.7.3, an open-source machine learning software package, was adopted to develop the prediction models. Six machine learning techniques were considered: decision tree (C4.5), k-nearest neighbors (kNN), classification and regression tree (CART), random forest (RF), multilayer perceptron (MLP), and logistic regression (LGR). In the non-DDI group, the area under the ROC curve (AUC) of RF (0.912) was excellent, followed by that of MLP (0.813), CART (0.791), and C4.5 (0.784); the remaining classifiers performed poorly. For the DDI group, the AUC of RF (0.892) was the best, followed by CART (0.795), MLP (0.777), and C4.5 (0.774); the other classifiers' performances were less than ideal. The decision tree-based approaches and MLP exhibited markedly superior accuracy, regardless of DDI status. Although digoxin is a high-alert medication, its initial dose can be accurately determined using data mining techniques such as decision tree-based and MLP approaches. Developing a dosage decision support system may serve as a supplementary tool for clinicians and also increase drug safety in clinical practice.
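
A hedged sketch of the labelling rule and one of the compared classifiers (random forest with AUC evaluation), written with scikit-learn rather than Weka; the file name and column names are hypothetical, and only the 0.5-0.9 ng/mL criterion comes from the abstract.

```python
# Sketch: label "appropriate" initial dosing by the 0.5-0.9 ng/mL criterion, then evaluate
# a random forest by cross-validated AUC. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("digoxin_inpatients.csv")                 # hypothetical table of the 10 predictors
df["appropriate"] = df["serum_digoxin_ng_ml"].between(0.5, 0.9).astype(int)

X = df.drop(columns=["appropriate", "serum_digoxin_ng_ml"])
y = df["appropriate"]
auc = cross_val_score(RandomForestClassifier(n_estimators=500, random_state=0),
                      X, y, cv=5, scoring="roc_auc").mean()
print(f"Random forest AUC: {auc:.3f}")
```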


Asunto(s)
Antiarrítmicos/administración & dosificación , Sistemas de Apoyo a Decisiones Clínicas , Digoxina/administración & dosificación , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos/prevención & control , Aprendizaje Automático , Adulto , Anciano , Anciano de 80 o más Años , Antiarrítmicos/efectos adversos , Digoxina/efectos adversos , Femenino , Humanos , Masculino , Persona de Mediana Edad , Adulto Joven
5.
J Healthc Eng ; 2018: 1817479, 2018.
Article in English | MEDLINE | ID: mdl-29599943

ABSTRACT

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem: it provides estimates for the missing values through a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates of the missing values may not be reliable, or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection applied to the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms (DROP3, GA, and IB3) and three imputation algorithms (KNNI, MLP, and SVM) are combined in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for medical datasets with numerical data, and that specific combinations of instance selection and imputation methods can improve the imputation results for medical datasets with mixed data types. However, instance selection does not have a consistently positive impact on the imputation results for categorical medical datasets.
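
One way the "instance selection before imputation" combination might look in code is sketched below, with a simple edited-nearest-neighbour filter standing in for DROP3/IB3/GA and scikit-learn's KNNImputer standing in for KNNI; all parameters are illustrative.

```python
# Sketch: clean the observed (complete) rows with an ENN-style filter, then impute the rest.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.neighbors import NearestNeighbors

def enn_filter(X, y, k=3):
    """Keep instances whose label agrees with at least half of their k nearest neighbours
    (self excluded). X, y are NumPy arrays."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbour_labels = y[idx[:, 1:]]                       # drop the first column (the point itself)
    keep = (neighbour_labels == y[:, None]).mean(axis=1) >= 0.5
    return X[keep], y[keep]

def select_then_impute(X, y, k_impute=5):
    complete = ~np.isnan(X).any(axis=1)
    X_sel, _ = enn_filter(X[complete], y[complete])        # clean the observed part first
    imputer = KNNImputer(n_neighbors=k_impute).fit(X_sel)  # neighbours come from the cleaned data
    X_filled = X.copy()
    X_filled[~complete] = imputer.transform(X[~complete])
    return X_filled
```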


Asunto(s)
Algoritmos , Bases de Datos Factuales , Aprendizaje Automático , Reconocimiento de Normas Patrones Automatizadas/métodos , Investigación Biomédica , Humanos , Registros Médicos
6.
PLoS One ; 12(1): e0161501, 2017.
Article in English | MEDLINE | ID: mdl-28060807

ABSTRACT

Breast cancer is an all too common disease in women, making its effective prediction an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct an SVM classifier, it is first necessary to choose the kernel function, and different kernel functions can result in different prediction performance. However, very few studies have focused on examining the prediction performance of SVM with different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational time of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles using the bagging method and RBF kernel based SVM ensembles using the boosting method are the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
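
The two ensemble settings highlighted in the abstract could be assembled in scikit-learn roughly as follows; the parameter name `estimator` assumes scikit-learn 1.2 or later, and the hyperparameters are illustrative rather than the paper's tuned values.

```python
# Sketch: bagged linear-kernel SVMs versus boosted RBF-kernel SVMs.
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

bagged_linear_svm = BaggingClassifier(estimator=SVC(kernel="linear"),
                                      n_estimators=10, random_state=0)
# probability=True gives the SVC a predict_proba, so boosting works across sklearn versions.
boosted_rbf_svm = AdaBoostClassifier(estimator=SVC(kernel="rbf", probability=True),
                                     n_estimators=10, random_state=0)

# Usage (X, y = a breast cancer feature matrix and labels):
# print(cross_val_score(bagged_linear_svm, X, y, cv=5, scoring="roc_auc").mean())
# print(cross_val_score(boosted_rbf_svm, X, y, cv=5, scoring="roc_auc").mean())
```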


Asunto(s)
Neoplasias de la Mama , Modelos Biológicos , Máquina de Vectores de Soporte , Algoritmos , Neoplasias de la Mama/epidemiología , Neoplasias de la Mama/etiología , Conjuntos de Datos como Asunto , Femenino , Humanos , Aprendizaje Automático , Curva ROC , Reproducibilidad de los Resultados
7.
Springerplus ; 5(1): 1285, 2016.
Article in English | MEDLINE | ID: mdl-27547660

ABSTRACT

INTRODUCTION: More and more universities are receiving accreditation from the Association to Advance Collegiate Schools of Business (AACSB), an international association for promoting quality teaching and learning at business schools. To be accredited, schools are required to meet a number of standards ensuring that certain levels of teaching quality and student learning are met. However, a variety of points of view have been espoused in the literature regarding the relationship between research and teaching: some studies have demonstrated that research and teaching are complementary elements of learning, while others disagree with these findings. CASE DESCRIPTION: Unlike past studies, we focus on analyzing the research performance of accredited schools during the periods before and after receiving accreditation. The objective is to answer the question of whether performance has improved, by comparing the same school's performance before and after accreditation. In this study, four AACSB-accredited universities in Taiwan are analyzed, including one teaching-oriented and three research-oriented universities. Research performance is evaluated by comparing seven citation statistics: the number of papers published, number of citations, average number of citations per paper, average citations per year, annual h-index, h-index, and g-index. DISCUSSION AND EVALUATION: The analysis results show that the business schools demonstrated enhanced research performance after AACSB accreditation, but in most accredited schools the proportion of faculty members not actively doing research is larger than that of active ones. CONCLUSION: This study shows that AACSB accreditation has a positive impact on research performance. The findings can serve as a reference for non-accredited schools aiming to improve their research productivity and quality.
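
For reference, the h-index and g-index among the seven citation statistics can be computed from a list of per-paper citation counts as in this small sketch; the example numbers are illustrative only.

```python
# Sketch: h-index and g-index from per-paper citation counts.
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

print(h_index([10, 8, 5, 4, 3]))  # 4
print(g_index([10, 8, 5, 4, 3]))  # 5
```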

8.
Springerplus ; 5(1): 1304, 2016.
Article in English | MEDLINE | ID: mdl-27547678

ABSTRACT

INTRODUCTION: K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data points to decide the final classification output. CASE DESCRIPTION: Although the Euclidean distance function is the most widely used distance metric in k-NN, few studies have examined the classification performance of k-NN with different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data, and four different distance functions (Euclidean, cosine, chi-square, and Minkowski) are used individually during k-NN classification. DISCUSSION AND EVALUATION: The experimental results show that using the chi-square distance function is the best choice for all three types of datasets. However, the cosine and Euclidean (and Minkowski) distance functions perform the worst on the mixed-type datasets. CONCLUSIONS: In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For medical domain datasets including categorical, numerical, and mixed types of data, k-NN based on the chi-square distance function performs the best.
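
A brief sketch of how the four distance functions might be plugged into a k-NN classifier in scikit-learn; the chi-square metric is supplied as a callable and assumes non-negative (for example, min-max scaled) features, and the neighbour count is illustrative.

```python
# Sketch: k-NN under different distance functions, including a custom chi-square metric.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def chi_square_distance(a, b, eps=1e-10):
    return np.sum((a - b) ** 2 / (a + b + eps))   # eps avoids division by zero

knn_chi2   = KNeighborsClassifier(n_neighbors=5, metric=chi_square_distance, algorithm="brute")
knn_euclid = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn_cosine = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn_minkow = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=3)

# Usage: fit each classifier on the same training split and compare accuracy on a held-out set.
```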

9.
Springerplus ; 5(1): 729, 2016.
Article in English | MEDLINE | ID: mdl-27375998

ABSTRACT

In the diagnosis of late-onset hypogonadism (LOH), the Androgen Deficiency in the Aging Male (ADAM) questionnaire or the Aging Males' Symptoms (AMS) scale can be used to assess related symptoms. Subsequently, blood tests are used to measure serum testosterone levels. However, results obtained using ADAM and AMS have revealed no significant correlations between ADAM and AMS scores and LOH, and the rate of misclassification is high. Recently, many studies have reported significant associations between LOH and clinical conditions such as metabolic syndrome, obesity, and lower urinary tract symptoms. In this study, we sampled 772 clinical cases of men who completed both a health checkup and the two questionnaires (ADAM and AMS). The data were obtained from the largest medical center in Taiwan. Two well-known classification techniques, the decision tree (DT) and logistic regression, were used to construct LOH prediction models on the basis of the aforementioned features. The results indicate that although the sensitivity of ADAM is the highest (0.878), it has the lowest specificity (0.099), which implies that ADAM overestimates LOH occurrence. In addition, DT combined with the AdaBoost technique (AdaBoost DT) has the second highest sensitivity (0.861) and specificity (0.842), resulting in the best accuracy (0.851) among all classifiers. AdaBoost DT can provide robust predictions that aid clinical decisions and can help medical staff accurately assess the likelihood of LOH occurrence.
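
The AdaBoost decision-tree model and the sensitivity/specificity calculation could be reproduced in outline as below; the base-tree depth, number of estimators, and train/test split are illustrative assumptions rather than the study's exact protocol, and `estimator` assumes scikit-learn 1.2 or later.

```python
# Sketch: AdaBoost over decision trees, evaluated by sensitivity and specificity.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def evaluate_adaboost_dt(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                               n_estimators=100, random_state=0).fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    sensitivity = tp / (tp + fn)     # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    return sensitivity, specificity
```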

10.
Springerplus ; 5: 528, 2016.
Article in English | MEDLINE | ID: mdl-27186492

ABSTRACT

Classification is one of the most important technologies used in data mining. Researchers have recently proposed several classification techniques based on the concept of association rules (also known as CBA-based methods). Experimental evaluations in these studies show that, on average, CBA-based approaches can yield higher accuracy than some conventional classification methods. However, conventional CBA-based methods adopt a single minimum support threshold for all items, resulting in the rare item problem: if the minimum support (minsup) is set too high, the classification rules contain only frequent items, while if it is set too low, too many combinations of items are discovered as frequent. To solve this problem, this paper proposes a novel CBA-based method called MMSCBA, which considers the concept of multiple minimum supports (MMSs). Based on MMSs, different classification rules are generated under their corresponding minsups. Several experiments were conducted with six real-world datasets selected from the UCI Machine Learning Repository. The results show that MMSCBA achieves higher accuracy than conventional CBA methods, especially when the dataset contains rare items.
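
A small sketch of the multiple-minimum-support idea: each item receives its own minsup derived from its frequency (here MIS(i) = max(beta * freq(i), LS), a common formulation), and an itemset is considered frequent if its support reaches the lowest MIS among its items. The exact rule-generation strategy of MMSCBA is not reproduced, and beta and LS are illustrative.

```python
# Sketch: per-item minimum supports and a frequency check under the lowest MIS of an itemset.
from collections import Counter

def item_min_supports(transactions, beta=0.5, ls=0.01):
    n = len(transactions)
    freq = Counter(item for t in transactions for item in set(t))
    return {item: max(beta * count / n, ls) for item, count in freq.items()}

def is_frequent(itemset, transactions, mis):
    support = sum(1 for t in transactions if set(itemset) <= set(t)) / len(transactions)
    return support >= min(mis[item] for item in itemset)

transactions = [["a", "b"], ["a", "b", "rare"], ["a", "c"], ["b", "c"]]
mis = item_min_supports(transactions)
print(is_frequent(["a", "rare"], transactions, mis))   # the rare item gets a lower threshold
```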

11.
Technol Health Care ; 23(5): 619-25, 2015.
Article in English | MEDLINE | ID: mdl-26410122

ABSTRACT

BACKGROUND: In collected medical datasets, it is usually the case that a number of data samples contain missing values. Performing the data mining task over such incomplete datasets is a difficult problem. In general, missing value imputation can be applied, which aims to provide estimates for the missing values by reasoning from the observed data. Consequently, the effectiveness of missing value imputation is heavily dependent on the observed (complete) data in the incomplete datasets. OBJECTIVE: In this paper, the research objective is to perform instance selection to filter out noisy data (or outliers) from a given (complete) dataset and examine its effect on the final imputation result. Specifically, four different processes combining instance selection and missing value imputation are proposed and compared in terms of data classification performance. METHODS: Experiments are conducted on 11 medical-related datasets containing categorical, numerical, and mixed attribute types of data. In addition, missing values are introduced into all attributes of each dataset at missing rates of 10%, 20%, 30%, 40%, and 50%. For instance selection and missing value imputation, the DROP3 and k-nearest neighbor imputation methods are employed, and the support vector machine (SVM) classifier is used to assess the final classification accuracy of the four different processes. RESULTS: The experimental results show that the second process, which performs instance selection first and imputation second, allows the SVM classifier to outperform the other processes. CONCLUSIONS: For incomplete medical datasets containing missing values, it is necessary to perform missing value imputation. In this paper, we demonstrate that instance selection can be used to filter out noisy data or outliers before the imputation process. In other words, the observed data used for missing value imputation may contain noisy information, which can degrade the quality of the imputation result as well as the classification performance.
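
The experimental harness for the five missing rates might be set up as below; the missingness is injected completely at random (an assumption, since the abstract does not state the mechanism), and a DROP3-style selection step, omitted here for brevity, would precede imputation in the best-performing process. KNNImputer and SVM accuracy scoring stand in for the paper's k-nearest neighbor imputation and SVM evaluation.

```python
# Sketch: inject missing values at a given rate, impute, and score with an SVM.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def inject_missing(X, rate, seed=0):
    rng = np.random.default_rng(seed)
    X_miss = X.astype(float).copy()
    mask = rng.random(X.shape) < rate        # rate in {0.1, 0.2, 0.3, 0.4, 0.5}
    X_miss[mask] = np.nan
    return X_miss

def score_impute_then_svm(X_miss, y, k=5):
    X_filled = KNNImputer(n_neighbors=k).fit_transform(X_miss)
    return cross_val_score(SVC(), X_filled, y, cv=5, scoring="accuracy").mean()

# for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
#     print(rate, score_impute_then_svm(inject_missing(X, rate), y))
```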


Asunto(s)
Exactitud de los Datos , Minería de Datos/métodos , Minería de Datos/normas , Máquina de Vectores de Soporte , Algoritmos , Interpretación Estadística de Datos , Humanos
12.
Technol Health Care ; 23(2): 153-60, 2015.
Article in English | MEDLINE | ID: mdl-25515050

ABSTRACT

BACKGROUND: The size of medical datasets is usually very large, which directly affects the computational cost of the data mining process. Instance selection is a data preprocessing step in the knowledge discovery process that can be employed to reduce storage requirements while maintaining mining quality. This process aims to filter out outliers (or noisy data) from a given (training) dataset. However, when the dataset is very large, more time is required to accomplish the instance selection task. OBJECTIVE: In this paper, we introduce an efficient data preprocessing approach (EDP), which is composed of two steps. The first step trains a model over a small amount of training data after performing instance selection. The model is then used to identify the rest of the large amount of training data. METHODS: Experiments are conducted on two medical datasets, for breast cancer and protein homology prediction problems, that contain over 100,000 data samples. In addition, three well-known instance selection algorithms are used: IB3, DROP3, and genetic algorithms. Three popular classification techniques are used to construct the learning models for comparison, namely the CART decision tree, k-nearest neighbor (k-NN), and support vector machine (SVM). RESULTS: The results show that our proposed approach not only reduces the computational cost by nearly a factor of two or three compared with three state-of-the-art algorithms, but also maintains the final classification accuracy. CONCLUSIONS: Directly executing existing instance selection algorithms over large-scale medical datasets requires a large computational cost. Our proposed EDP approach solves this problem by training a learning model to recognize good and noisy data. Considering both computational complexity and final classification accuracy, the proposed EDP demonstrates its efficiency and effectiveness for the large-scale instance selection problem.
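
One plausible reading of the two-step EDP idea is sketched below: run the (costly) instance selection only on a small random sample, train a classifier on the reduced sample, and keep the remaining instances whose given class the classifier predicts correctly, treating disagreements as noisy. The filter classifier, sample size, and agreement criterion are assumptions for illustration, not the published EDP algorithm.

```python
# Sketch: select instances on a small sample, then use a trained model to screen the rest.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def edp_select(X, y, instance_selection, sample_size=5000, seed=0):
    """instance_selection(X, y) is assumed to return a boolean keep-mask (e.g. from DROP3/IB3/GA)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    keep = instance_selection(X[idx], y[idx])                # costly step, run on the sample only
    model = DecisionTreeClassifier().fit(X[idx][keep], y[idx][keep])
    rest = np.setdiff1d(np.arange(len(X)), idx)
    agree = model.predict(X[rest]) == y[rest]                # treat disagreements as noisy
    selected = np.concatenate([idx[keep], rest[agree]])
    return X[selected], y[selected]
```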


Asunto(s)
Minería de Datos/métodos , Algoritmos , Conjuntos de Datos como Asunto , Árboles de Decisión , Humanos , Aprendizaje Automático , Modelos Teóricos
13.
Cogn Process ; 10(3): 233-42, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19083036

ABSTRACT

This paper describes the automatic assignment of images to classes described by individual keywords provided with the Corel data set. Automatic image annotation technology aims to provide an efficient and effective search environment for users to query their images more easily, but current image retrieval systems are still not very accurate when assigning images to a large number of keyword classes. Noisy features are the main problem, causing some keywords never to be assigned to their correct images. This paper focuses on improving image classification, first by selecting features to characterise each image, and then by selecting the most suitable feature vectors as training data. A Pixel Density filter (PDfilter) and Information Gain (IG) are proposed to perform these respective tasks. We filter out the noisy features so that groups of images can be represented by their most important values. The experiments use the hue, saturation and value (HSV) colour feature space to categorise images according to one of 190 concrete keywords or subsets of these. The study shows that feature selection through the PDfilter and IG can improve the problem of spurious similarity.
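
The feature pipeline could be approximated as follows: an HSV colour histogram per image, followed by an information-gain-style ranking. Here scikit-learn's mutual_info_classif stands in for the paper's IG measure, the PDfilter step is omitted, and the bin counts and the number of selected features are illustrative.

```python
# Sketch: HSV colour histograms per image, then feature selection by an information-based score.
import numpy as np
from matplotlib.colors import rgb_to_hsv
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def hsv_histogram(rgb_image, bins=8):
    """rgb_image: float array in [0, 1] with shape (H, W, 3); returns a 3*bins feature vector."""
    hsv = rgb_to_hsv(rgb_image).reshape(-1, 3)
    return np.concatenate([np.histogram(hsv[:, c], bins=bins, range=(0, 1), density=True)[0]
                           for c in range(3)])

def select_features(images, labels, k=12):
    X = np.stack([hsv_histogram(img) for img in images])
    selector = SelectKBest(mutual_info_classif, k=k).fit(X, labels)  # IG-style ranking stand-in
    return selector.transform(X), selector
```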


Asunto(s)
Almacenamiento y Recuperación de la Información/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Algoritmos , Inteligencia Artificial , Gestión de la Información , Modelos Teóricos , Vocabulario