TY - GEN
T1 - Bag-of-Visual Words for word-wise video script identification
T2 - International Joint Conference on Neural Networks, IJCNN 2015
AU - Sharma, Nabin
AU - Mandal, Ranju
AU - Sharma, Rabi
AU - Pal, Umapada
AU - Blumenstein, Michael
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/9/28
Y1 - 2015/9/28
N2 - Use of multiple scripts for information communication through various media is quite common in a multilingual country. Optical character recognition of such document images or videos assists in indexing them for effective information retrieval. Hence, script identification from multi-lingual documents/images is a necessary step for selecting the appropriate OCR, due to the absence of a single OCR system capable of handling multiple scripts. Script identification from printed as well as handwritten documents is a well-researched area, but script identification from video frames has not been explored much. Low resolution, blur, and noisy backgrounds, to mention a few, are the major bottlenecks when processing video frames, and make script identification from video images a challenging task. This paper examines the potential of Bag-of-Visual Words based techniques for word-wise script identification from video frames. Two different approaches, namely Bag-Of-Features (BoF) and Spatial Pyramid Matching (SPM), using patch-based SIFT descriptors were considered for the current study. An SVM classifier was used for analysing the three popular south Indian scripts, namely Tamil, Telugu and Kannada, in combination with English and Hindi. A comparative study of Bag-of-Visual Words with traditional script identification techniques involving gradient-based features (e.g. HoG) and texture-based features (e.g. LBP) is presented. Experimental results show that patch-based features along with SPM outperformed the traditional techniques, and promising accuracies were achieved on 2534 words from the five scripts. The study reveals that patch-based features can be used for script identification in order to overcome the inherent problems with video frames.
AB - Use of multiple scripts for information communication through various media is quite common in a multilingual country. Optical character recognition of such document images or videos assists in indexing them for effective information retrieval. Hence, script identification from multi-lingual documents/images is a necessary step for selecting the appropriate OCR, due to the absence of a single OCR system capable of handling multiple scripts. Script identification from printed as well as handwritten documents is a well-researched area, but script identification from video frames has not been explored much. Low resolution, blur, and noisy backgrounds, to mention a few, are the major bottlenecks when processing video frames, and make script identification from video images a challenging task. This paper examines the potential of Bag-of-Visual Words based techniques for word-wise script identification from video frames. Two different approaches, namely Bag-Of-Features (BoF) and Spatial Pyramid Matching (SPM), using patch-based SIFT descriptors were considered for the current study. An SVM classifier was used for analysing the three popular south Indian scripts, namely Tamil, Telugu and Kannada, in combination with English and Hindi. A comparative study of Bag-of-Visual Words with traditional script identification techniques involving gradient-based features (e.g. HoG) and texture-based features (e.g. LBP) is presented. Experimental results show that patch-based features along with SPM outperformed the traditional techniques, and promising accuracies were achieved on 2534 words from the five scripts. The study reveals that patch-based features can be used for script identification in order to overcome the inherent problems with video frames.
KW - Accuracy
KW - Computational modeling
KW - Feature extraction
KW - Image resolution
KW - Optical character recognition software
KW - Vector quantization
UR - https://www.scopus.com/pages/publications/84951169810
U2 - 10.1109/IJCNN.2015.7280631
DO - 10.1109/IJCNN.2015.7280631
M3 - Conference contribution
AN - SCOPUS:84951169810
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2015 International Joint Conference on Neural Networks, IJCNN 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 July 2015 through 17 July 2015
ER -