Results 1 - 6 of 6
1.
Technol Health Care ; 32(1): 75-87, 2024.
Article in English | MEDLINE | ID: mdl-37248924

ABSTRACT

BACKGROUND: In practice, the datasets collected for data analysis are usually incomplete, as some data samples contain missing attribute values. Many related works focus on constructing specific models to produce estimations to replace the missing values, so that the original incomplete datasets become complete. Another type of solution is to handle the incomplete datasets directly, without missing value imputation, with decision trees being the major technique for this purpose. OBJECTIVE: To introduce a novel approach, namely Deep Learning-based Decision Tree Ensembles (DLDTE), which borrows the bounding box and sliding window strategies used in deep learning techniques to divide an incomplete dataset into a number of subsets and learn from each subset with a decision tree, resulting in decision tree ensembles. METHOD: Two medical domain datasets containing several hundred feature dimensions, with missing rates of 10% to 50%, are used for performance comparison. RESULTS: The proposed DLDTE provides the highest classification accuracy when compared with the baseline decision tree method, two missing value imputation methods (mean and k-nearest neighbor), and the case deletion method. CONCLUSION: The results demonstrate the effectiveness of DLDTE for handling incomplete medical datasets with different missing rates.


Subject(s)
Deep Learning , Humans , Cluster Analysis , Decision Trees
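The abstract does not spell out how the sliding window partitions the data, so the following is only a rough illustration of the idea of learning one decision tree per feature window over the rows that are complete within that window; the `window` and `stride` parameters and the majority-vote combination are assumptions, not the paper's exact design:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sliding_window_ensemble(X, y, window=5, stride=3):
    """Train one decision tree per sliding window of feature columns,
    using only the rows with no missing values inside that window."""
    trees = []
    for start in range(0, max(X.shape[1] - window, 0) + 1, stride):
        cols = slice(start, start + window)
        mask = ~np.isnan(X[:, cols]).any(axis=1)   # complete cases for this window
        if mask.sum() < 2 or len(np.unique(y[mask])) < 2:
            continue                               # too little data to learn from
        trees.append((cols, DecisionTreeClassifier().fit(X[mask][:, cols], y[mask])))
    return trees

def predict_majority(trees, X):
    """Majority vote over the trees whose window is fully observed in each row."""
    preds = np.zeros(len(X), dtype=int)
    for i, x in enumerate(X):
        votes = [t.predict(x[c].reshape(1, -1))[0]
                 for c, t in trees if not np.isnan(x[c]).any()]
        if votes:
            preds[i] = np.bincount(np.asarray(votes, dtype=int)).argmax()
    return preds
```

Because each tree only requires completeness within its own window, no imputation step is needed; rows with missing values simply abstain from the windows they cannot serve.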
2.
PLoS One ; 12(1): e0161501, 2017.
Article in English | MEDLINE | ID: mdl-28060807

ABSTRACT

Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct an SVM classifier, it is first necessary to choose the kernel function, and different kernel functions can result in different prediction performance. However, very few studies have focused on examining the prediction performance of SVM with different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles using the bagging method and RBF kernel based SVM ensembles using the boosting method are the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.


Subject(s)
Breast Neoplasms , Models, Biological , Support Vector Machine , Algorithms , Breast Neoplasms/epidemiology , Breast Neoplasms/etiology , Datasets as Topic , Female , Humans , Machine Learning , ROC Curve , Reproducibility of Results
3.
Springerplus ; 5(1): 1285, 2016.
Article in English | MEDLINE | ID: mdl-27547660

ABSTRACT

INTRODUCTION: More and more universities are receiving accreditation from the Association to Advance Collegiate Schools of Business (AACSB), an international association for promoting quality teaching and learning at business schools. To be accredited, schools are required to meet a number of standards ensuring that certain levels of teaching quality and student learning are met. However, there are a variety of points of view espoused in the literature regarding the relationship between research and teaching: some studies have demonstrated that research and teaching are complementary elements of learning, but others disagree with these findings. CASE DESCRIPTION: Unlike past studies, we focus on analyzing the research performance of accredited schools during the periods before and after receiving accreditation. The objective is to answer the question of whether performance has improved by comparing the same school's performance before and after accreditation. In this study, four AACSB accredited universities in Taiwan are analyzed, including one teaching oriented and three research oriented universities. Research performance is evaluated by comparing seven citation statistics: the number of papers published, number of citations, average number of citations per paper, average citations per year, h-index (annual), h-index, and g-index. DISCUSSION AND EVALUATION: The analysis results show that business schools demonstrated enhanced research performance after AACSB accreditation, but in most accredited schools the proportion of faculty members not actively doing research is larger than that of active ones. CONCLUSION: This study shows that AACSB accreditation has a positive impact on research performance. The findings can be used as a reference for non-accredited schools that aim to improve their research productivity and quality.
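The h-index and g-index mentioned above have simple operational definitions: h is the largest number such that h papers each have at least h citations, and g is the largest number such that the g most cited papers together have at least g squared citations. One common formulation (capped at the number of listed papers) can be sketched as:

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
    return h

def g_index(citations):
    """Largest g such that the g most cited papers total at least g**2 citations."""
    total, g = 0, 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c
        if total >= i * i:
            g = i
    return g
```

For the citation list [10, 8, 5, 4, 3] this gives h = 4 and g = 5; the g-index rewards a few highly cited papers more than the h-index does.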

4.
Springerplus ; 5(1): 1304, 2016.
Article in English | MEDLINE | ID: mdl-27547678

ABSTRACT

INTRODUCTION: K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data points to decide the final classification output. CASE DESCRIPTION: Although the Euclidean distance function is the most widely used distance metric in k-NN, few studies have examined the classification performance of k-NN under different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data, and four different distance functions, including Euclidean, cosine, Chi square, and Minkowski, are used during k-NN classification individually. DISCUSSION AND EVALUATION: The experimental results show that using the Chi square distance function is the best choice for the three different types of datasets. However, the cosine and Euclidean (and Minkowski) distance functions perform the worst over the mixed type of datasets. CONCLUSIONS: In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For medical domain datasets including the categorical, numerical, and mixed types of data, k-NN based on the Chi square distance function performs the best.
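Since scikit-learn's k-NN accepts both named metrics and a Python callable, the four distance functions can be compared directly. Chi square is not built in, so a small implementation (assuming non-negative feature values) is supplied; the iris data merely stands in for a medical dataset here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def chi_square(a, b, eps=1e-10):
    """Chi square distance; assumes non-negative feature values."""
    return float(np.sum((a - b) ** 2 / (a + b + eps)))

X, y = load_iris(return_X_y=True)   # all features non-negative

configs = {
    "euclidean":  dict(metric="euclidean"),
    "cosine":     dict(metric="cosine"),
    "minkowski":  dict(metric="minkowski", p=3),
    "chi square": dict(metric=chi_square),   # a callable metric requires brute force
}
scores = {}
for name, kw in configs.items():
    knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute", **kw)
    scores[name] = cross_val_score(knn, X, y, cv=5).mean()
    print(f"{name:12s} {scores[name]:.3f}")
```

Note that a callable metric is evaluated in Python for every pair of points, so it is far slower than the built-in metrics on larger datasets.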

5.
Technol Health Care ; 23(5): 619-25, 2015.
Article in English | MEDLINE | ID: mdl-26410122

ABSTRACT

BACKGROUND: When medical datasets are collected, it is usually the case that a number of data samples contain missing values. Performing the data mining task over such incomplete datasets is a difficult problem. In general, missing value imputation can be applied, which aims at providing estimations for missing values by reasoning from the observed data. Consequently, the effectiveness of missing value imputation is heavily dependent on the observed (or complete) data in the incomplete datasets. OBJECTIVE: In this paper, the research objective is to perform instance selection to filter out some noisy data (or outliers) from a given (complete) dataset and to see its effect on the final imputation result. Specifically, four different processes combining instance selection and missing value imputation are proposed and compared in terms of data classification. METHODS: Experiments are conducted on 11 medical related datasets containing categorical, numerical, and mixed attribute types of data. In addition, missing values for each dataset are introduced into all attributes (the missing data rates are 10%, 20%, 30%, 40%, and 50%). For instance selection and missing value imputation, the DROP3 and k-nearest neighbor imputation methods are employed. The support vector machine (SVM) classifier is used to assess the final classification accuracy of the four different processes. RESULTS: The experimental results show that the second process, performing instance selection first and imputation second, allows the SVM classifiers to outperform the other processes. CONCLUSIONS: For incomplete medical datasets containing missing values, it is necessary to perform missing value imputation. In this paper, we demonstrate that instance selection can be used to filter out some noisy data or outliers before the imputation process. In other words, the observed data used for missing value imputation may contain some noisy information, which can degrade the quality of the imputation result as well as the classification performance.


Subject(s)
Data Accuracy , Data Mining/methods , Data Mining/standards , Support Vector Machine , Algorithms , Data Interpretation, Statistical , Humans
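The "selection first, imputation second" process can be sketched with Wilson's edited nearest neighbour rule standing in for DROP3 (which has no common library implementation) and scikit-learn's KNNImputer for the k-NN imputation; the synthetic data and 20% missing rate are purely illustrative:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def wilson_editing(X, y, k=3):
    """Wilson's edited nearest neighbour rule: keep a sample only if its k
    nearest neighbours (self excluded) agree with its label."""
    nn = KNeighborsClassifier(n_neighbors=k + 1).fit(X, y)
    _, idx = nn.kneighbors(X)                  # idx[:, 0] is the sample itself
    majority = np.array([np.bincount(v).argmax() for v in y[idx[:, 1:]]])
    return majority == y

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan     # 20% missing rate

complete = ~np.isnan(X_miss).any(axis=1)
# Step 1: instance selection over the complete cases only
good = wilson_editing(X_miss[complete], y[complete])
keep = ~complete                               # incomplete rows pass through...
keep[np.flatnonzero(complete)[good]] = True    # ...plus the retained complete rows
# Step 2: impute the filtered dataset, then train the final classifier
X_imp = KNNImputer(n_neighbors=5).fit_transform(X_miss[keep])
clf = SVC().fit(X_imp, y[keep])
```

The key design point is the ordering: the noise filter runs before imputation, so outlying complete cases cannot serve as neighbours when the missing values are estimated.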
6.
Technol Health Care ; 23(2): 153-60, 2015.
Article in English | MEDLINE | ID: mdl-25515050

ABSTRACT

BACKGROUND: The size of medical datasets is usually very large, which directly affects the computational cost of the data mining process. Instance selection is a data preprocessing step in the knowledge discovery process, which can be employed to reduce storage requirements while also maintaining the mining quality. This process aims to filter out outliers (or noisy data) from a given (training) dataset. However, when the dataset is very large, more time is required to accomplish the instance selection task. OBJECTIVE: In this paper, we introduce an efficient data preprocessing approach (EDP), which is composed of two steps. The first step trains a model over a small amount of training data after performing instance selection. The model is then used to classify the rest of the large amount of training data. METHODS: Experiments are conducted on two medical datasets, for breast cancer and protein homology prediction problems, that contain over 100000 data samples. In addition, three well-known instance selection algorithms are used: IB3, DROP3, and genetic algorithms. Three popular classification techniques are used to construct the learning models for comparison, namely the CART decision tree, k-nearest neighbor (k-NN), and support vector machine (SVM). RESULTS: The results show that our proposed approach not only reduces the computational cost by nearly a factor of two or three compared with three state-of-the-art algorithms, but also maintains the final classification accuracy. CONCLUSIONS: Directly executing existing instance selection algorithms over large scale medical datasets incurs a large computational cost. Our proposed EDP approach solves this problem by training a learning model to recognize good and noisy data. Considering both computational complexity and final classification accuracy, the proposed EDP has demonstrated its efficiency and effectiveness on the large scale instance selection problem.


Subject(s)
Data Mining/methods , Algorithms , Datasets as Topic , Decision Trees , Humans , Machine Learning , Models, Theoretical
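A rough sketch of the two-step EDP idea: run a cheap instance selection pass on a small random sample, then train a model to recognise "good" versus "noisy" instances and let it screen the large remainder. The edited-NN filter stands in for IB3/DROP3, and the SVM screen, sample size, and parameters are assumptions, not the paper's exact design:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def edp_select(X, y, n_small=200, k=3, seed=0):
    """Two-step EDP-style selection: filter a small sample, then train a
    good-vs-noisy screening model to process the remaining data cheaply."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    small, rest = idx[:n_small], idx[n_small:]
    # Step 1: edited-NN instance selection on the small sample only
    nn = KNeighborsClassifier(n_neighbors=k + 1).fit(X[small], y[small])
    _, nb = nn.kneighbors(X[small])            # nb[:, 0] is the sample itself
    majority = np.array([np.bincount(v).argmax() for v in y[small][nb[:, 1:]]])
    good = majority == y[small]
    if good.all() or not good.any():           # nothing for the screen to learn
        return idx
    # Step 2: train the good-vs-noisy screen, apply it to the large remainder
    screen = SVC().fit(X[small], good.astype(int))
    keep_rest = screen.predict(X[rest]) == 1
    return np.sort(np.concatenate([small[good], rest[keep_rest]]))
```

The saving comes from replacing an expensive selection algorithm over the full dataset with a single model prediction per remaining sample.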