Please use this identifier to cite or link to this item: http://gukir.inflibnet.ac.in:8080/jspui/handle/123456789/3729
Title: Word-wise script identification from bilingual documents based on morphological reconstruction
Authors: Dhandra B.V.
Mallikarjun H.
Hegadi R.
Malemath V.S.
Issue Date: 2006
Citation: 2006 1st International Conference on Digital Information Management (ICDIM), pp. 389-394
Abstract: In a multilingual country like India, English has proven to be the binding language. Consequently, a line in a bilingual document page may contain words in a regional language together with English numerals (printed or handwritten). Before Optical Character Recognition (OCR) can be applied to such a page, the script of each word must be identified so that the appropriate script-specific OCR can be run. This paper proposes an automatic word-level script identification technique, based on morphological reconstruction, for printed Kannada and Devanagari bilingual documents that contain English numerals (printed and handwritten). The technique consists of a feature extractor and a classifier. The feature extractor has two stages. In the first stage, shape features (eccentricity and aspect ratio) and directional stroke features (horizontal and vertical) are extracted using morphological erosion and opening by reconstruction with a line structuring element. The average height of all connected components in the image is used to fix the length of the structuring element. In the second stage, the average pixel distribution is computed from the resulting images. New word images are classified with the k-nearest neighbour algorithm. The proposed algorithm was tested on 2250 sample words in various font styles and sizes, and the results obtained are encouraging. © 2006 IEEE.
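The feature-extraction and classification pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses NumPy and SciPy's `scipy.ndimage`, where `binary_erosion` followed by `binary_propagation` (iterated dilation constrained to a mask) gives opening by reconstruction. It assumes the structuring-element length equals the rounded mean connected-component height, and it reads "average pixel distribution" as the fraction of word pixels that survive each operation. The helper names (`line_length`, `opening_by_reconstruction`, `shape_features`, `word_features`, `knn_classify`) and the default `k=3` are hypothetical.

```python
import numpy as np
from collections import Counter
from scipy import ndimage

EIGHT = np.ones((3, 3), bool)  # 8-connectivity for labelling and reconstruction


def line_length(img):
    """ASSUMED rule: structuring-element length = mean connected-component height."""
    labels, n = ndimage.label(img, structure=EIGHT)
    if n == 0:
        return 1
    heights = [sl[0].stop - sl[0].start for sl in ndimage.find_objects(labels)]
    return max(1, int(round(float(np.mean(heights)))))


def opening_by_reconstruction(img, se):
    """Erode with the line SE, then regrow the surviving markers inside the original."""
    marker = ndimage.binary_erosion(img, structure=se)
    return ndimage.binary_propagation(marker, structure=EIGHT, mask=img)


def shape_features(img):
    """Aspect ratio (width / height) and ellipse eccentricity from second moments."""
    ys, xs = np.nonzero(img)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    lam = np.linalg.eigvalsh(np.cov(np.vstack([xs, ys]), bias=True))  # ascending
    ecc = np.sqrt(1.0 - np.clip(lam[0] / lam[1], 0.0, 1.0)) if lam[1] > 0 else 0.0
    return [width / height, float(ecc)]


def word_features(img):
    """Hypothetical feature vector for one binary word image (foreground = True)."""
    img = np.asarray(img, bool)
    L = line_length(img)
    area = img.sum()
    feats = shape_features(img)
    for se in (np.ones((1, L), bool), np.ones((L, 1), bool)):  # horizontal, vertical
        eroded = ndimage.binary_erosion(img, structure=se)
        rebuilt = opening_by_reconstruction(img, se)
        # ASSUMPTION: "average pixel distribution" = fraction of word pixels kept.
        feats += [eroded.sum() / area, rebuilt.sum() / area]
    return np.array(feats, float)


def knn_classify(train_X, train_y, x, k=3):
    """Plain k-NN: majority label among the k nearest (Euclidean) training vectors."""
    dists = np.linalg.norm(np.asarray(train_X) - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

Opening by reconstruction restores every connected component that contains at least one eroded pixel, so it keeps or discards whole strokes rather than trimming them. That is why the sketch records both the erosion fraction and the reconstruction fraction for each direction. Feature scaling is omitted here. Without normalization, the aspect ratio would tend to dominate the Euclidean distance used by k-NN.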
DOI: 10.1109/ICDIM.2007.369227
Appears in Collections: 2. Conference Papers