Isfahan University of Medical Sciences

Science Communicator Platform

A Generalized Multi-Aspect Distance Metric for Mixed-Type Data Clustering



Authors: Mousavi E (1); Sehhati M (2)
Affiliations:
  1. Department of Bioelectrics and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
  2. Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

Source: Pattern Recognition. Published: 2023


Abstract

Distance calculation is straightforward for purely categorical or purely numerical data sets. Defining a unified distance that improves clustering performance on a mixed data set composed of nominal, ordinal, and numerical attributes is much harder, because the attributes differ in nature. In this study, we proposed a new distance measure for mixed-type data sets that accounts for inter-attribute and intra-attribute information, depending on the attribute types. Entropy and Jensen–Shannon divergence were used to exploit the inter-attribute information of categorical-categorical and categorical-numerical attribute pairs, respectively. A modified version of the Mahalanobis distance was proposed to capture the intra- and inter-attribute information of numerical attributes. We also introduced a unified framework based on mutual information to control each attribute's contribution to the distance measurement. The proposed distance, in conjunction with spectral clustering, was extensively evaluated on various categorical, numerical, and mixed-type benchmark data sets, and the results demonstrated the efficacy of the proposed method. © 2023 Elsevier Ltd
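
For readers unfamiliar with the information-theoretic quantities named in the abstract, the following is a minimal sketch of the standard Shannon entropy and Jensen–Shannon divergence for discrete distributions. It is not the authors' distance measure: the function names and the example distributions are hypothetical, and the sketch assumes a numerical attribute has already been binned so that its distribution can be compared across two values of a categorical attribute.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p (sums to 1)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Standard Jensen-Shannon divergence, in bits, between p and q.

    Computed as H(m) - (H(p) + H(q)) / 2, where m is the average of p and q.
    With log base 2 the result lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


# Hypothetical example: distribution of a binned numerical attribute
# among records with categorical value A versus categorical value B.
p_given_a = [0.7, 0.2, 0.1]
p_given_b = [0.1, 0.3, 0.6]

divergence = js_divergence(p_given_a, p_given_b)
print(f"JS divergence between the two conditional distributions: {divergence:.4f}")
```

A larger divergence suggests that the two categorical values induce more dissimilar distributions over the numerical attribute. How the paper combines such quantities into its final distance is not specified in the abstract.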