ABSTRACT
This work aims to establish a comparatively accurate classification model relating symptoms, constitutions, and regimens for traditional Chinese medicine (TCM) constitution analysis, in order to provide preliminary screening and decision support for clinical diagnosis. When massive distributed medical data are analyzed on a cloud platform, however, traditional data mining methods suffer from low mining efficiency, large memory consumption, and long tuning time. An association rules method for TCM constitution analysis (ARA-TCM) is therefore proposed, based on the FP-growth algorithm and the Hadoop Distributed File System (HDFS), the open-source distributed file system of the Hadoop framework, to make full use of Hadoop's parallel processing capability. First, the proposed method was used to explore the association rules between the nine kinds of TCM constitutions and symptoms, as well as the regimen treatment plans, so as to discover the typical clinical symptoms and treatment rules of different constitutions and to conduct an evidence-based medical evaluation of TCM effects in the health management of constitution-related chronic diseases. Second, experiments were conducted on a self-built TCM clinical records database with a total of 30,071 entries, and the three most frequent constitutions were found to be mid constitution (42.3%), hot and humid constitution (31.3%), and inherited special constitution (26.2%). Moreover, precision and recall improved clearly compared with the Apriori algorithm, which indicates that the proposed method is suitable for the classification of TCM constitutions. This work focuses mainly on uncovering the rules of "disease symptoms constitution regimen" in TCM medical records, but tongue images and pulse signals are also very important to TCM constitution analysis. This additional information should therefore be incorporated in further studies to better match actual clinical needs.
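To make the notion of an association rule between symptoms, constitutions, and regimens concrete, the following minimal Python sketch computes support and confidence for pairwise rules by brute-force enumeration. It is not the ARA-TCM implementation: it uses neither FP-growth nor HDFS, and the records, the symptom and regimen labels, the thresholds, and the function names are all hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy records (NOT from the paper's database): each record is a
# set of labels mixing symptoms, one constitution, and one regimen.
records = [
    {"fatigue", "mid", "regimen-A"},
    {"fatigue", "mid", "regimen-A"},
    {"fatigue", "hot-humid", "regimen-B"},
    {"oily-skin", "hot-humid", "regimen-B"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent together) / support(antecedent)."""
    joint = set(antecedent) | set(consequent)
    return support(joint, records) / support(antecedent, records)


def mine_pair_rules(records, min_support=0.5, min_confidence=0.6):
    """Enumerate single-item rules x -> y that pass both thresholds.

    Brute force over item pairs; FP-growth would find the same frequent
    itemsets without enumerating every candidate.
    """
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, records)
        if pair_support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = confidence({x}, {y}, records)
            if conf >= min_confidence:
                rules.append((x, y, pair_support, conf))
    return rules


if __name__ == "__main__":
    for x, y, s, c in mine_pair_rules(records):
        print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

In this toy data, the rule from the hypothetical "mid" label to "fatigue" has higher confidence than the reverse rule, because "fatigue" also occurs with another constitution; this asymmetry is what allows such rules to link symptoms to constitutions and regimens.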