TY - GEN
T1 - A hierarchical VQSVM for imbalanced data sets
AU - Yu, Ting
AU - Jan, Tony
AU - Simoff, Simeon
AU - Debenham, John
PY - 2007
Y1 - 2007
N2 - First, a hierarchical modelling method, VQSVM, is introduced, and some remarks on it are discussed. Second, the proposed VQSVM is applied to a nonstandard learning environment: imbalanced data sets. On extremely imbalanced, high-dimensional data sets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM contains a set of local models, i.e. codevectors produced by Vector Quantization, and a global model, i.e. a Support Vector Machine, to rebalance data sets without significant information loss. Some issues, e.g. distortion and support vectors, are discussed to address the trade-off between information loss and undersampling rate. Experiments compare VQSVM with random resampling techniques on imbalanced data sets with varied imbalance ratios, and the results show that the performance of VQSVM is superior or equivalent to that of random resampling techniques, especially on extremely imbalanced large data sets.
AB - First, a hierarchical modelling method, VQSVM, is introduced, and some remarks on it are discussed. Second, the proposed VQSVM is applied to a nonstandard learning environment: imbalanced data sets. On extremely imbalanced, high-dimensional data sets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM contains a set of local models, i.e. codevectors produced by Vector Quantization, and a global model, i.e. a Support Vector Machine, to rebalance data sets without significant information loss. Some issues, e.g. distortion and support vectors, are discussed to address the trade-off between information loss and undersampling rate. Experiments compare VQSVM with random resampling techniques on imbalanced data sets with varied imbalance ratios, and the results show that the performance of VQSVM is superior or equivalent to that of random resampling techniques, especially on extremely imbalanced large data sets.
UR - http://www.scopus.com/inward/record.url?scp=51949101536&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2007.4371010
DO - 10.1109/IJCNN.2007.4371010
M3 - Conference contribution
AN - SCOPUS:51949101536
SN - 142441380X
SN - 9781424413805
T3 - IEEE International Conference on Neural Networks - Conference Proceedings
SP - 518
EP - 523
BT - The 2007 International Joint Conference on Neural Networks, IJCNN 2007 Conference Proceedings
T2 - 2007 International Joint Conference on Neural Networks, IJCNN 2007
Y2 - 12 August 2007 through 17 August 2007
ER -