Language Identification of Text

Please use this identifier to cite or link to this item: http://www.ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634

Title:	Language Identification of Text
Authors:	Aishwary Mahajan, Ruhi [Guided by]
Keywords:	Pluricentric languages
Issue Date:	2017
Publisher:	Jaypee University of Information Technology, Solan, H.P.
Abstract:	Language identification is the process of detecting the language(s) of the text in a document, based on the script used for writing and on the diacritics particular to each language. Researchers have been drawn to this area since as early as 1970, owing to its varied applications and the growing demand for it. In this work, I address the problem of detecting the language of textual documents. I introduce a method that detects the language of a text more efficiently and accurately by determining the proportion of the text attributable to each language and selecting the language with the greatest proportion. I compare the performance of three approaches: a word-wise n-gram approach, a character-wise n-gram approach, and a combination of word search and stop-word detection. The project currently contains language models for four languages, and the program achieves an average accuracy of about 96.5%.
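The core idea in the abstract, scoring each language by the proportion of the text that matches it and picking the greatest, can be sketched in Python. This is a minimal illustration of a stop-word variant only, not the report's implementation: the four languages (English, French, German, Spanish), the tiny stop-word lists, the "unknown" fallback, and the function names `language_proportions` and `identify_language` are all hypothetical, since the abstract does not specify them.

```python
import re

# Hypothetical, deliberately tiny stop-word lists for illustration only.
# The abstract does not name the report's four languages or its models.
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "french": {"le", "la", "les", "et", "est", "de", "des", "un"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "spanish": {"el", "la", "los", "y", "es", "de", "que", "en"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (Unicode-aware \\w)."""
    return re.findall(r"\w+", text.lower())


def language_proportions(text: str) -> dict[str, float]:
    """Fraction of the text's tokens found in each language's stop-word list."""
    tokens = tokenize(text)
    if not tokens:
        return {lang: 0.0 for lang in STOP_WORDS}
    return {
        lang: sum(tok in words for tok in tokens) / len(tokens)
        for lang, words in STOP_WORDS.items()
    }


def identify_language(text: str) -> str:
    """Return the language with the greatest proportion, or "unknown" if none match."""
    proportions = language_proportions(text)
    best = max(proportions, key=proportions.get)
    # Assumed fallback: with no matching tokens at all, make no guess.
    return best if proportions[best] > 0 else "unknown"
```

Here `identify_language("Der Hund ist nicht hier und die Katze schläft")` returns `"german"`, because five of its nine tokens are German stop words. The word-wise and character-wise n-gram approaches could follow the same proportion-and-maximum pattern, with each stop-word set replaced by a profile of word or character n-grams built from training text.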
URI:	http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6634
Appears in Collections:	B.Tech. Project Reports

Files in This Item:

File	Description	Size	Format
Language Identification of Text.pdf		1.27 MB	Adobe PDF