Combine vector quantization and support vector machine for imbalanced datasets

Ting Yu, John Debenham, Tony Jan, Simeon Simoff

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

7 Citations (Scopus)


In extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large classes. This paper rebalances skewed datasets by compressing the majority class. The proposed approach, VQ-SVM, combines Vector Quantization with a Support Vector Machine to rebalance datasets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and undersampling. Experiments compare VQ-SVM with a standard SVM on several imbalanced datasets with varied imbalance ratios, and the results show that VQ-SVM outperforms the standard SVM, especially on extremely imbalanced large datasets.
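The rebalancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which quantizer is used, so Lloyd's algorithm (k-means) stands in as a simple vector quantizer, and the function names (`build_codebook`, `distortion`, `rebalance`), the default codebook size (equal to the minority class size), and the synthetic data are all assumptions made for illustration. The majority class is replaced by its codewords, the minority class is kept intact, and the resulting balanced set would then be passed to any SVM trainer, which is omitted here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def build_codebook(points, k, iterations=20, seed=0):
    """Lloyd's algorithm (k-means), used here as a stand-in vector quantizer."""
    rng = random.Random(seed)
    codebook = rng.sample(points, k)  # initial codewords drawn from the data
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, codebook[i]))
            cells[nearest].append(p)
        # Move each codeword to the mean of its cell; keep it if the cell is empty.
        codebook = [mean_point(cell) if cell else codebook[i]
                    for i, cell in enumerate(cells)]
    return codebook


def distortion(points, codebook):
    """Mean squared distance from each point to its nearest codeword."""
    return sum(min(squared_distance(p, c) for c in codebook)
               for p in points) / len(points)


def rebalance(majority, minority, k=None, seed=0):
    """Replace the majority class by k codewords; keep the minority class as is.

    Assumption: k defaults to the minority class size, giving a 1:1 ratio.
    """
    k = len(minority) if k is None else k
    codebook = build_codebook(majority, k, seed=seed)
    X = list(codebook) + list(minority)
    y = [0] * len(codebook) + [1] * len(minority)
    return X, y, codebook


# Synthetic 50:1 imbalanced data (hypothetical, for illustration only).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

X, y, codebook = rebalance(majority, minority)
print("balanced size:", len(X), "majority codewords:", y.count(0),
      "minority points:", y.count(1))
print("distortion of compressed majority class:",
      round(distortion(majority, codebook), 3))
```

In this sketch, a larger `k` lowers the distortion (less information lost from the majority class) but leaves the dataset more imbalanced, which is one way to read the trade-off between information loss and undersampling that the abstract mentions.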

Original language: English
Title of host publication: Artificial Intelligence in Theory and Practice
Subtitle of host publication: IFIP 19th World Computer Congress, TC 12: IFIP AI 2006 Stream, August 21-24, 2006, Santiago, Chile
Editors: Max Bramer
Number of pages: 8
Publication status: Published - 2006
Externally published: Yes

Publication series

Name: IFIP International Federation for Information Processing
ISSN (Print): 1571-5736

