ABSTRACT
Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damage. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, brokers, experts, and garages. Next, we establish fraud as a social phenomenon in the network and use the BiRank algorithm with a fraud-specific query vector to compute a fraud score for each claim. From the network, we extract features related to the fraud scores as well as the claims' neighborhood structure. Finally, we combine these network features with the claim-specific features and build a supervised model with fraud in motor insurance as the target variable. Although we build a model only for motor insurance, the network includes claims from all available lines of business. Our results show that models with features derived from the network perform well when detecting fraud and even outperform the models using only the classical claim-specific features. Combining network and claim-specific features further improves the performance of supervised learning models to detect fraud. The resulting model flags highly suspicious claims that need to be investigated further. Our approach provides a guided and intelligent selection of claims and contributes to a more effective fraud investigation process.
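Below is a minimal sketch of the kind of score propagation BiRank performs on the claims-parties network; the adjacency matrix `W`, the damping parameter `alpha`, and the normalization details are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def birank(W, query, alpha=0.85, n_iter=100, tol=1e-8):
    """BiRank-style propagation on a bipartite claims-parties network.

    W     : (n_claims, n_parties) adjacency matrix
    query : length-n_claims vector with 1 for known fraudulent claims
            and 0 otherwise (the fraud-specific query vector)
    Returns a fraud score per claim and per party.
    """
    # Symmetric degree normalization, as in the BiRank formulation.
    du = np.maximum(W.sum(axis=1), 1e-12)          # claim degrees
    dv = np.maximum(W.sum(axis=0), 1e-12)          # party degrees
    S = W / np.sqrt(du)[:, None] / np.sqrt(dv)[None, :]

    q = query / max(query.sum(), 1e-12)            # normalized query vector
    c = np.full(W.shape[0], 1.0 / W.shape[0])      # claim scores
    p = np.full(W.shape[1], 1.0 / W.shape[1])      # party scores
    for _ in range(n_iter):
        p_new = S.T @ c                            # parties inherit claim scores
        c_new = alpha * (S @ p_new) + (1 - alpha) * q
        if np.abs(c_new - c).sum() < tol:
            break
        c, p = c_new, p_new
    return c, p

# Toy network: 4 claims x 3 parties, claim 0 known fraudulent.
W = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)
claim_scores, party_scores = birank(W, query=np.array([1.0, 0, 0, 0]))
```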
Subjects
Fraud, Insurance, Algorithms, Social Networking, United States
ABSTRACT
Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. For a binary classifier, the AUC equals the concordance probability, a frequently used measure of the discriminatory power of the model. Unlike the AUC, the concordance probability can also be extended to the situation with a continuous response variable. Given the staggering size of modern data sets, determining this discriminatory measure requires a tremendous number of costly computations and is hence immensely time consuming, especially in the case of a continuous response variable. Therefore, we propose two estimation methods that calculate the concordance probability in a fast and accurate way and that can be applied to both the discrete and continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations.
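As a point of reference, the sketch below shows the quantity being estimated and why the exact computation scales quadratically in the number of observations; the pair-sampling shortcut is a generic illustration, not one of the paper's two estimators.

```python
import numpy as np

def concordance_probability(y, y_pred, n_pairs=200_000, seed=0):
    """Estimate C = P(pred_i > pred_j | y_i > y_j) by sampling random
    pairs instead of enumerating all O(n^2) of them; tied predictions
    count for one half."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(y), n_pairs)
    j = rng.integers(0, len(y), n_pairs)
    mask = y[i] > y[j]                        # keep only comparable pairs
    pi, pj = y_pred[i][mask], y_pred[j][mask]
    return (np.sum(pi > pj) + 0.5 * np.sum(pi == pj)) / max(mask.sum(), 1)

# Works for a continuous response as well as a binary one.
rng = np.random.default_rng(1)
y = rng.normal(size=10_000)
y_pred = y + rng.normal(scale=1.0, size=10_000)   # a noisy predictor
print(f"estimated concordance: {concordance_probability(y, y_pred):.3f}")
```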
ABSTRACT
PURPOSE: Since clinically significant upgrading of the biopsy Gleason score has an adverse clinical impact, ancillary tools besides the visual determination of primary Gleason pattern are essential to aid in better risk stratification. MATERIALS AND METHODS: A total of 61 prostate biopsies were selected from patients with a diagnosis of Gleason score 7 prostatic adenocarcinoma, including 41 with primary Gleason pattern 3 and 20 with primary Gleason pattern 4. Slides from these tissues were stained using Feulgen stain, a nuclear DNA stain. Gleason pattern areas in all cases were analyzed for 40 nuclear morphometric descriptors of size, shape, and chromatin using a CAS-200 system (BD). The primary outcome analyzed was the ability of morphometric features to identify visually determined primary Gleason pattern 4 on the biopsy. Data were analyzed using logistic regression as well as a C4.5 decision tree with and without feature preselection. RESULTS: Decision tree analysis yielded the best model. Automatic feature selection identified minimum nuclear diameter as the most discriminative feature in a 3-parameter model with 85% classification accuracy. Using a preselected 3-parameter model including minimum diameter, angularity, and sum optical density, the decision tree yielded a slightly lower accuracy of around 79%. Bootstrap validation of logistic regression results revealed that there was no unique model that could significantly explain the variance in primary Gleason pattern status, although minimum nuclear diameter was the most frequently selected parameter. CONCLUSIONS: In this small cohort of patients with Gleason score 7 disease we report that Gleason pattern 4 nuclei from those with primary Gleason pattern 4 are generally larger with coarser chromatin compared with Gleason pattern 4 nuclei in patients with primary Gleason pattern 3. These findings may aid in better risk stratification of the Gleason score 7 group by supplementing visual estimation of the percentages of Gleason patterns 3 and 4 in the biopsy.
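A sketch of the preselected 3-parameter decision tree analysis described above, using scikit-learn's CART learner as a stand-in for C4.5; the synthetic data and column names are placeholders for the study's 40 morphometric descriptors, not its actual measurements.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: in the study, each row would hold nuclear
# morphometric descriptors for one biopsy and y would flag visually
# determined primary Gleason pattern 4.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "min_diameter": rng.normal(7.0, 1.0, 61),
    "angularity": rng.normal(0.5, 0.1, 61),
    "sum_optical_density": rng.normal(1.0, 0.2, 61),
})
y = (X["min_diameter"] + rng.normal(0, 0.5, 61) > 7.0).astype(int)

# A shallow tree on the three preselected parameters.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
acc = cross_val_score(tree, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {acc.mean():.2f}")
```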
Subjects
Adenocarcinoma/pathology, Prostatic Neoplasms/pathology, Aged, Needle Biopsy, Cell Nucleus/pathology, Humans, Male, Regression Analysis
ABSTRACT
Various benchmarking studies have shown that artificial neural networks and support vector machines often outperform more traditional machine learning techniques. The main resistance against these newer techniques stems from their lack of interpretability: it is difficult for a human analyst to understand the reasoning behind these models' decisions. Various rule extraction (RE) techniques have been proposed to overcome this opacity restriction. These techniques represent the behavior of the complex model with a set of easily understandable rules. However, most existing RE techniques can only be applied under limited circumstances, e.g., they assume that all inputs are categorical or can only be applied if the black-box model is a neural network. In this paper, we present Minerva, a new RE algorithm. The main advantage of Minerva is its ability to extract a set of rules from any type of black-box model. Experiments show that the extracted models perform well in comparison with various other rule and decision tree learners.
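Minerva's internals are not detailed in this abstract, but the model-agnostic flavor of rule extraction it describes can be illustrated with a generic pedagogical baseline: relabel the data with the black box's predictions and fit an interpretable surrogate to those labels. The sketch below is that baseline, not Minerva itself.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit an opaque model on synthetic data.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                          random_state=0).fit(X, y)

# Pedagogical rule extraction: fit a small, readable surrogate tree
# to the black box's outputs rather than to the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"fidelity to black box: {fidelity:.2%}")
print(export_text(surrogate))   # the extracted rule set
```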
Subjects
Algorithms, Artificial Intelligence, Decision Support Techniques, Information Storage and Retrieval/methods, Automated Pattern Recognition/methods
ABSTRACT
The goal of customer retention campaigns, by design, is to add value and enhance the operational efficiency of businesses. For organizations that strive to retain their customers in saturated, and sometimes fast-moving, markets such as the telecommunication and banking industries, implementing customer churn prediction models that perform well and in accordance with the business goals is vital. The expected maximum profit (EMP) measure is tailored toward this problem by taking into account the costs and benefits of a retention campaign and estimating its worth for the organization. Unfortunately, the measure assumes a fixed and equal customer lifetime value (CLV) for all customers, which has been shown not to correspond well with reality. In this article, we extend the EMP measure to take into account the variability in the lifetime values of customers, thereby basing it on individual characteristics. We demonstrate how to incorporate the heterogeneity of CLVs when CLVs are known, when their prior distribution is known, and when neither is known. By taking into account individual CLVs, our proposed approach of measuring model performance gives novel insights when deciding on a customer retention campaign. The method depends on the characteristics of the customer base, is compliant with modern business analytics, and accommodates the data-driven culture that has manifested itself within organizations.
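The sketch below illustrates the core idea of profit-based evaluation with customer-specific CLVs: rank customers by churn score and find the contact fraction that maximizes campaign profit. The acceptance probability `gamma` and the offer and contact costs are assumed parameters, and this simplified profit function is not the paper's exact EMP formulation.

```python
import numpy as np

def max_profit(churn_scores, clv, churned, gamma=0.30,
               offer_cost=10.0, contact_cost=1.0):
    """Best achievable campaign profit when customers are contacted in
    decreasing order of predicted churn probability, using individual
    CLVs instead of one fixed value."""
    order = np.argsort(-churn_scores)
    clv, churned = clv[order], churned[order]
    # A retained churner yields gamma * CLV_i; everyone contacted
    # incurs the offer and contact costs.
    profit_i = gamma * clv * churned - (offer_cost + contact_cost)
    cumulative = np.cumsum(profit_i)
    best = cumulative.argmax()
    return cumulative[best], (best + 1) / len(clv)  # profit, fraction contacted

scores = np.array([0.9, 0.8, 0.3, 0.1])
clvs = np.array([200.0, 50.0, 500.0, 80.0])
churned = np.array([1, 0, 1, 0])
print(max_profit(scores, clvs, churned))
```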
Subjects
Commerce, Consumer Behavior, Economic Competition, Algorithms, Organizational Efficiency, Statistical Models, United States
ABSTRACT
The care processes of healthcare providers are typically considered human-centric, flexible, evolving, complex, and multi-disciplinary. Consequently, acquiring insight into the dynamics of these care processes can be an arduous task. This study presents a novel event-log-based approach for extracting valuable medical and organizational information on past executions of care processes. Care processes are analyzed with a preferred set of process mining techniques in order to discover recurring patterns, analyze and characterize process variants, and identify adverse medical events.
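For illustration, a minimal variant analysis over an event log might look like the following; the log format (case id, timestamp, activity) and the activity names are assumptions, and a real analysis would use a dedicated process mining toolkit.

```python
from collections import Counter

def trace_variants(events):
    """Group events by care episode and count recurring activity
    sequences (process variants) in the event log."""
    traces = {}
    for case_id, timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces.setdefault(case_id, []).append(activity)
    return Counter(tuple(t) for t in traces.values())

# Hypothetical event log: (case_id, timestamp, activity) triples.
log = [("p1", 1, "intake"), ("p1", 2, "surgery"), ("p1", 3, "discharge"),
       ("p2", 1, "intake"), ("p2", 2, "chemo"), ("p2", 3, "discharge")]
for variant, count in trace_variants(log).most_common():
    print(count, " -> ".join(variant))
```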
Subjects
Delivery of Health Care, Female Genital Diseases/therapy, Theoretical Models, Neoplasms/therapy, Female, Humans
ABSTRACT
This paper proposes the Clinical Pathway Analysis Method (CPAM), an approach that enables the extraction of valuable organisational and medical information on past clinical pathway executions from the event logs of healthcare information systems. The method deals with the complexity of real-world clinical pathways by introducing a perspective-based segmentation of the date-stamped event log. CPAM enables the clinical pathway analyst to effectively and efficiently acquire profound insight into the clinical pathways. By comparing the specific medical conditions of patients with the factors used for characterising the different clinical pathway variants, the medical expert can identify the best therapeutic option. Process-mining-based analytics enables the acquisition of valuable insights into clinical pathways, based on the complete audit traces of previous clinical pathway instances. Additionally, the methodology is suited to assessing guideline compliance and analysing adverse events. Finally, the methodology provides support for eliciting tacit knowledge and assisting treatment selection.
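Guideline-compliance checks of the kind mentioned above often reduce to precedence rules over a trace. The sketch below is a minimal, hypothetical example of such a rule, not CPAM's actual compliance machinery.

```python
def violates_precedence(trace, first, then):
    """Flag a pathway instance in which activity `then` occurs without
    activity `first` having occurred earlier in the trace - a simple
    guideline-compliance rule."""
    seen_first = False
    for activity in trace:
        if activity == first:
            seen_first = True
        elif activity == then and not seen_first:
            return True
    return False

# Hypothetical guideline: chemotherapy must be preceded by a staging exam.
trace = ["intake", "chemo", "staging", "discharge"]
print(violates_precedence(trace, first="staging", then="chemo"))  # True
```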
Subjects
Critical Pathways/standards, Data Mining, Hospital Information Systems/standards, Process Assessment (Health Care)/standards, Algorithms, Information Storage and Retrieval
ABSTRACT
BACKGROUND: Biopsy Gleason score (bGS) remains an important prognostic indicator for adverse outcomes in prostate cancer (PCA). In light of recent studies reporting differences in prognostic outcomes for the subgroups of the GS 7 group (primary Gleason pattern 4 vs. 3), upgrading of a bGS of 6 to a GS≥7 has serious implications. We sought to identify pre-operative factors associated with upgrading in a cohort of GS6 patients who underwent prostatectomy. DESIGN: We identified 281 cases of GS6 PCA on biopsy with subsequent prostatectomies. Using data on pre-operative variables (age, PSA, biopsy pathology parameters), logistic regression models (LRM) were developed to identify factors that could be used to predict upgrading to GS≥7 on subsequent prostatectomy. A decision tree (DT) was constructed. RESULTS: 92 of 281 cases (32.7%) were upgraded on subsequent prostatectomy. LRM identified a model with two variables with statistically significant ability to predict upgrading: pre-biopsy PSA (odds ratio 8.66; 95% CI 2.03-37.49) and highest percentage of cancer at any single biopsy site (odds ratio 1.03; 95% CI 1.01-1.05). This two-parameter model yielded an area under the curve (AUC) of 0.67. The decision tree was constructed using only 3 leaf nodes, with a test-set classification accuracy of 70%. CONCLUSIONS: A simple model using clinical and biopsy data can predict the likelihood of GS upgrading with an acceptable level of certainty. External validation of these findings, along with development of a nomogram, will aid in better stratifying the cohort of low-risk patients based on the GS.
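A sketch of the two-variable logistic regression model described above; the synthetic data, coefficient values, and column names are illustrative assumptions standing in for the study's pre-operative variables.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: in the study, the predictors were the
# pre-biopsy PSA and the highest percentage of cancer at any single
# biopsy site, and y flagged upgrading to GS >= 7 at prostatectomy.
rng = np.random.default_rng(0)
n = 281
X = pd.DataFrame({
    "psa": rng.lognormal(1.5, 0.5, n),
    "max_pct_core": rng.uniform(1, 100, n),
})
logit = -3.0 + 0.2 * X["psa"] + 0.02 * X["max_pct_core"]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.2f}")   # the study reports an AUC of about 0.67
```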
Subjects
Adenocarcinoma/pathology, Prostatectomy, Prostatic Intraepithelial Neoplasia/pathology, Prostatic Neoplasms/pathology, Adenocarcinoma/blood, Adult, Aged, Biopsy, Cohort Studies, Decision Trees, Humans, Logistic Models, Male, Middle Aged, Neoplasm Grading, Neoplasm Invasiveness, Preoperative Period, Prognosis, Prostate-Specific Antigen/blood, Prostatic Intraepithelial Neoplasia/blood, Prostatic Neoplasms/blood, Risk Assessment/methods
ABSTRACT
While feedforward neural networks have been widely accepted as effective tools for solving classification problems, the issue of finding the best network architecture remains unresolved, particularly in real-world problem settings. We address this issue in the context of credit card screening, where it is important not only to find a neural network with good predictive performance but also one that facilitates a clear explanation of how it produces its predictions. We show that minimal neural networks with as few as one hidden unit provide good predictive accuracy, while having the added advantage of making it easier to generate concise and comprehensible classification rules for the user. To further reduce model size, a novel approach is suggested in which network connections from the input units to this hidden unit are removed by a very straightforward pruning procedure. In terms of predictive accuracy, both the minimized neural networks and the rule sets generated from them are shown to compare favorably with other neural-network-based classifiers. The rules generated from the minimized neural networks are concise and thus easier to validate in a real-life setting.
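A toy sketch of the idea: fit a one-hidden-unit network, then remove input connections with a crude magnitude-based pruning step. The pruning criterion and the synthetic data are illustrative assumptions; the paper's exact pruning procedure is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def prune_smallest_inputs(net, keep):
    """Zero out all but the `keep` largest-magnitude input-to-hidden
    weights of a single-hidden-unit MLP (illustrative criterion only)."""
    w = net.coefs_[0][:, 0]                    # input -> hidden unit weights
    drop = np.argsort(np.abs(w))[:-keep]       # indices of smallest weights
    net.coefs_[0][drop, 0] = 0.0
    return net

# Toy screening data: only the first two inputs actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

net = MLPClassifier(hidden_layer_sizes=(1,), max_iter=1000,
                    random_state=0).fit(X, y)
print("accuracy before pruning:", net.score(X, y))
prune_smallest_inputs(net, keep=2)
print("accuracy after pruning :", net.score(X, y))
```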