Abstract
When dealing with real-world problems, there is a considerable amount of prior domain knowledge that can provide insights into various aspects of the problem. However, many machine learning methods rely solely on data sets for their learning phase and do not take into account any explicitly expressed domain knowledge. This paper proposes a framework that investigates and enables the incorporation of prior domain knowledge with respect to three key characteristics of inductive machine learning algorithms: consistency, generalization and convergence. The framework is used to review, classify and analyse key existing approaches to incorporating domain knowledge into inductive machine learning, as well as to consider the risks of doing so. The paper also demonstrates the design of a novel hierarchical semi-parametric machine learning method capable of incorporating prior domain knowledge. The method, VQSVM, extends the support vector machine (SVM) family of methods with vector quantization (VQ) techniques to address the problem of learning from imbalanced data sets. The paper presents the results of testing the method on a collection of imbalanced data sets with various imbalance ratios and various numbers of subclasses. The learning process of the VQSVM method uses some domain knowledge to address the problem of fitting imbalanced data. The experiments in the paper demonstrate that enabling the incorporation of prior domain knowledge into the SVM framework is an effective way to overcome the sensitivity of SVM to the imbalance ratio in a data set.
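The abstract does not spell out how VQ and SVM are combined, so the following is only a minimal sketch of one plausible reading: vector quantization compresses the majority class into a small codebook of prototypes, so that the training set handed to an SVM is roughly balanced. The use of Lloyd's algorithm (k-means) as the quantizer, the choice of codebook size equal to the minority class size, and the names `vector_quantize` and `rebalance` are all assumptions made for illustration and are not taken from the paper. The SVM training step itself is omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(points, k, iterations=20, seed=0):
    """Return k codebook vectors for `points` using Lloyd's algorithm.

    Assumption: plain k-means stands in for the VQ step; the paper may
    use a different quantizer.
    """
    rng = random.Random(seed)
    codebook = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        # Assign every point to the cell of its nearest code vector.
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, codebook[j]))
            cells[nearest].append(p)
        # Move each code vector to the mean of its cell; keep it if the cell is empty.
        codebook = [
            tuple(sum(coord) / len(cell) for coord in zip(*cell)) if cell else codebook[j]
            for j, cell in enumerate(cells)
        ]
    return codebook


def rebalance(majority, minority):
    """Replace the majority class by as many prototypes as there are minority samples."""
    k = min(len(minority), len(majority))
    prototypes = vector_quantize(majority, k)
    samples = prototypes + list(minority)
    labels = [-1] * len(prototypes) + [1] * len(minority)
    return samples, labels


# Hypothetical toy data with a 20:1 imbalance ratio.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

samples, labels = rebalance(majority, minority)
print(len(samples), labels.count(-1), labels.count(1))  # 20 10 10
```

The balanced `samples` and `labels` could then be passed to any SVM trainer. Choosing the codebook size is itself a place where domain knowledge could enter, for example an expected number of subclasses in the majority class, though how the paper does this is not stated in the abstract.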
Original language | English |
---|---|
Pages (from-to) | 2614-2623 |
Number of pages | 10 |
Journal | Neurocomputing |
Volume | 73 |
Issue number | 13-15 |
Publication status | Published - Aug 2010 |
Externally published | Yes |
Keywords
- Imbalance data
- Inductive machine learning
- Prior domain knowledge
- Support vector machine