Please use this identifier to cite or link to this item:
http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634
Title: | Language Identification of Text |
Authors: | Aishwary Mahajan, Ruhi [Guided by] |
Keywords: | Pluricentric languages |
Issue Date: | 2017 |
Publisher: | Jaypee University of Information Technology, Solan, H.P. |
Abstract: | Language identification is the process of detecting the language(s) of the text in a document, based on the script used for writing and on the diacritics particular to a language. This research area has attracted researchers since as early as 1970, owing to its varied applications and growing demand. In this work, I address the problem of detecting the language of textual documents. I introduce a method that detects the language of a text more efficiently and accurately by determining the respective proportion of each language and selecting the greatest, which represents the language of the text. I compare the performance of three approaches: a word-wise n-gram approach, a character-wise n-gram approach, and a combination of word search and stop-word detection. My project currently contains language models for 4 languages. On average, the accuracy of my program is about 96.5%. |
URI: | http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634 |
Appears in Collections: | B.Tech. Project Reports |
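
The abstract describes scoring each candidate language by its proportion in the text and choosing the greatest. A minimal Python sketch of that selection rule, applied to the stop-word part of the third approach, might look like the following. This record does not name the four languages or give the report's word lists, so the languages, the stop-word sets, and the function names `language_proportions` and `identify_language` are illustrative assumptions, not the author's implementation.

```python
import re

# Toy stop-word lists. The languages and words are illustrative assumptions,
# not the language models used in the report.
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "les", "des"},
    "german": {"der", "die", "und", "ist", "das", "nicht"},
    "spanish": {"el", "la", "y", "es", "los", "en"},
}


def language_proportions(text):
    """Return, per language, the fraction of tokens that are its stop words."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {lang: 0.0 for lang in STOP_WORDS}
    return {
        lang: sum(token in words for token in tokens) / len(tokens)
        for lang, words in STOP_WORDS.items()
    }


def identify_language(text):
    """Pick the language with the greatest proportion, or None if nothing matches."""
    proportions = language_proportions(text)
    best = max(proportions, key=proportions.get)
    return best if proportions[best] > 0 else None


if __name__ == "__main__":
    print(identify_language("The cat is in the garden and the dog is asleep"))
```

Words shared between lists (here `la`) count toward every language that contains them, so very short inputs can tie. A fuller system would presumably combine this score with the n-gram evidence the abstract mentions.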
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Language Identification of Text.pdf | | 1.27 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.