Please use this identifier to cite or link to this item:
http://gukir.inflibnet.ac.in:8080/jspui/handle/123456789/3701
Title: | Shape and morphological transformation based features for language identification in Indian document images |
Authors: | Hangarge, M.; Dhandra, B.V. |
Keywords: | Binary decision tree; Document images; Language identification; Morphological transformations |
Issue Date: | 2008 |
Citation: | Proceedings - 1st International Conference on Emerging Trends in Engineering and Technology, ICETET 2008, p. 1175-1180 |
Abstract: | This paper describes a technique for language identification in document images that discriminates five major Indian languages: Hindi, Marathi, Sanskrit, Assamese and Bengali, which belong to the Devnagari and Bangla scripts. A text block of each language containing at least two text lines is selected and characterized using global and local features. Morphological transformations decompose a text block in two directions at three levels to capture fine texture primitives. Shape features of connected components retain the local properties of the text block. A combination of these features is then used to classify 500 text blocks of the five languages with binary decision tree and KNN classifiers. The proposed method differs from reported methods for non-Indian languages, which are based on shape coding of characters and words and on document vectorization. It captures word shapes directly, without segmentation, and is tolerant to variations in font style and size. The language identification results are encouraging. © 2008 IEEE. |
URI: | 10.1109/ICETET.2008.177; http://gukir.inflibnet.ac.in:8080/jspui/handle/123456789/3701 |
Appears in Collections: | 2. Conference Papers |
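
The feature extraction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two directions, the three levels, or the shape features, so the horizontal and vertical line structuring elements, the lengths 3, 5 and 7, the retained-ink ratio, and the component count and mean aspect ratio used here are all assumptions, as are the function names `morph_features` and `component_features`. The input is a binarized text block given as a list of rows of 0/1 values.

```python
from collections import deque


def erode_line(img, length, horizontal):
    """Binary erosion with a centred line of odd `length`; pixels outside
    the image are treated as background."""
    h, w, r = len(img), len(img[0]), length // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            keep = True
            for k in range(-r, r + 1):
                yy, xx = (y, x + k) if horizontal else (y + k, x)
                if not (0 <= yy < h and 0 <= xx < w) or not img[yy][xx]:
                    keep = False
                    break
            out[y][x] = 1 if keep else 0
    return out


def dilate_line(img, length, horizontal):
    """Binary dilation with the same centred line structuring element."""
    h, w, r = len(img), len(img[0]), length // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for k in range(-r, r + 1):
                    yy, xx = (y, x + k) if horizontal else (y + k, x)
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] = 1
    return out


def morph_features(img, lengths=(3, 5, 7)):
    """Fraction of ink surviving a morphological opening (erosion followed
    by dilation) in two assumed directions at three assumed levels."""
    total = sum(map(sum, img)) or 1
    feats = []
    for horizontal in (True, False):
        for n in lengths:
            opened = dilate_line(erode_line(img, n, horizontal), n, horizontal)
            feats.append(sum(map(sum, opened)) / total)
    return feats


def component_features(img):
    """Count of 8-connected components and the mean width/height ratio of
    their bounding boxes (assumed stand-ins for the paper's shape features)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    aspects = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                y0 = y1 = y
                x0 = x1 = x
                while queue:
                    cy, cx = queue.popleft()
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                aspects.append((x1 - x0 + 1) / (y1 - y0 + 1))
    count = len(aspects)
    return [count, sum(aspects) / count if count else 0.0]
```

Concatenating `morph_features(block) + component_features(block)` gives one vector per text block, which could then be passed to any decision tree or k-nearest-neighbour classifier. The choice of classifier settings is left open here because the abstract does not specify them.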