Please use this identifier to cite or link to this item:
http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6742
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kumar, Manoj | - |
dc.contributor.author | Agnihotri, Radha | - |
dc.contributor.author | Mohana, Rajni [Guided by] | - |
dc.date.accessioned | 2022-09-26T06:56:07Z | - |
dc.date.available | 2022-09-26T06:56:07Z | - |
dc.date.issued | 2017 | - |
dc.identifier.uri | http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/6742 | - |
dc.description.abstract | Language identification (LI) is an essential and integral part of natural language processing, and several machine learning approaches have been proposed to address it. Language identification can be defined as the process of automatically determining the language(s) in which the content of a document (web page, text document) is written. With the widespread use of the internet, language identification has become a necessary pre-processing step for a variety of applications, such as machine translation, linguistic corpus creation, part-of-speech tagging, accessibility of social media and user-generated content, search engines, support for low-density languages, information extraction, and the processing of multilingual documents. In a multilingual country like India, language identification has wide scope to bridge the digital divide between users of different languages. This project presents a brief overview of the challenges involved in automatic language identification, the existing methodologies, and some of the tools available for identification. Text categorization is a fundamental task in document processing that allows large streams of electronic documents to be handled automatically. It must work reliably on all inputs, and must therefore tolerate some degree of error in automatic identification. Here, we describe an N-gram-based approach to text categorization that is capable of distinguishing between Hindi and Sanskrit words. The system is small, fast and robust. It has worked well for language classification, achieving an accuracy of 94.8%. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Jaypee University of Information Technology, Solan, H.P. | en_US |
dc.subject | Sanskrit text | en_US |
dc.subject | N-gram | en_US |
dc.title | N Gram Based Algorithm for Distinguishing Between Hindi and Sanskrit Texts | en_US |
dc.type | Project Report | en_US |
Appears in Collections: | B.Tech. Project Reports |
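
This record does not describe the report's algorithm beyond calling it an N-gram-based approach to text categorization. As a rough illustration only, the sketch below shows one widely used form of that idea: rank-ordered character n-gram profiles compared with an "out-of-place" distance. It is not taken from the report. The function names, the `n_max` and `top_k` parameters, and the two one-sentence training samples are all assumptions made for the example; a real system would build its profiles from large Hindi and Sanskrit corpora.

```python
from collections import Counter

# Toy training samples (assumptions for illustration, not data from the report).
HINDI_SAMPLE = "यह एक किताब है और वह मेरा घर है"
SANSKRIT_SAMPLE = "एतत् पुस्तकम् अस्ति सः मम गृहम् अस्ति"


def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..n_max),
    most frequent first. N-grams are taken over Unicode code points,
    so Devanagari vowel signs and the virama count as separate characters."""
    counts = Counter()
    for word in text.split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then by the n-gram itself so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles; n-grams missing from
    lang_profile receive a fixed maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def classify(text, profiles, n_max=3, top_k=300):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text, n_max, top_k)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


profiles = {
    "hindi": ngram_profile(HINDI_SAMPLE),
    "sanskrit": ngram_profile(SANSKRIT_SAMPLE),
}
print(classify(SANSKRIT_SAMPLE, profiles))
```

Because the profiles here come from a single sentence each, the sketch only demonstrates the mechanics; it says nothing about the 94.8% accuracy reported in the abstract.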
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
N Gram Based Algorithm for Distinguishing Between Hindi and Sanskrit Texts.pdf | | 1.1 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.