Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/5847
Full metadata record
DC Field                  Value                                                          Language
dc.contributor.author     Sharma, Himanshi                                               -
dc.contributor.author     Ghrera, Satya Prakash [Guided by]                              -
dc.date.accessioned       2022-08-18T05:04:08Z                                           -
dc.date.available         2022-08-18T05:04:08Z                                           -
dc.date.issued            2015                                                           -
dc.identifier.uri         http://ir.juit.ac.in:8080/jspui//xmlui/handle/123456789/5847   -
dc.description.abstract (en_US):

Big Data is, at its core, a way of describing problems that are not solvable using traditional tools. It is defined by its three Vs: Volume, Velocity and Variety. Data arriving from a wide variety of sources can inform transformational decisions, and making such decisions on data of any scale requires access to the right kind of tools. Apache Hadoop is such a framework, designed to store, process and manage data of this kind across clusters of computers. It is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it can scale without limits. With Hadoop, no data is too big. At its core, Apache Hadoop is a framework for storing data on large clusters of commodity hardware (everyday computer hardware that is affordable and easily available) and running applications against that data. A cluster is a group of interconnected computers, known as nodes, that can work together on the same problem. Using networks of affordable compute resources to acquire business insight is the key value proposition of Hadoop.

In practice, however, a business is generally split across several dimensions, and each dimension may run its own Hadoop cluster. This means deploying separate storage for each cluster, even though some data may be duplicated across them; and whenever two dimensions must be merged for an analysis, the combined data has to be loaded into yet another cluster, which is again a time-consuming and expensive task. This is where the OpenStack Swift container comes into the picture. Swift provides object storage for files or objects, and its architecture is built so that it can be used directly with Hadoop clusters. This lets it act as a 'centralized' store that any cluster can read from or write to directly, and projects used with Hadoop such as Pig and Hive can also access the Swift storage directly to process the data (a configuration sketch follows this record). This removes the need to store redundant data and encourages innovation through low cost and low complexity. Both Apache Hadoop and OpenStack Swift are open source projects; Swift serves as
dc.language.iso           en                                                             en_US
dc.publisher              Jaypee University of Information Technology, Solan, H.P.       en_US
dc.subject                Big data                                                       en_US
dc.subject                Apache hadoop                                                  en_US
dc.subject                OpenStack cloud                                                en_US
dc.subject                Cluster                                                        en_US
dc.subject                OpenStack swift                                                en_US
dc.subject                Container                                                      en_US
dc.title                  Apache Hadoop on Openstack Cloud                               en_US
dc.type                   Project Report                                                 en_US
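
The abstract's claim that any Hadoop cluster can address Swift storage directly rests on Hadoop's hadoop-openstack module, which exposes Swift containers through a swift:// filesystem scheme. The Java sketch below illustrates that wiring in minimal form, assuming the hadoop-openstack JAR is on the classpath; the service name "myprovider", the container "mycontainer", the Keystone URL and the credentials are hypothetical placeholders, not values from the report.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SwiftHadoopSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Map the swift:// scheme to the filesystem shipped in hadoop-openstack.
            conf.set("fs.swift.impl",
                     "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem");

            // Keystone authentication for a Swift endpoint here named "myprovider"
            // (all values below are hypothetical placeholders).
            conf.set("fs.swift.service.myprovider.auth.url",
                     "http://keystone.example.com:5000/v2.0/tokens");
            conf.set("fs.swift.service.myprovider.tenant",   "demo-tenant");
            conf.set("fs.swift.service.myprovider.username", "demo");
            conf.set("fs.swift.service.myprovider.password", "secret");

            // Paths take the form swift://<container>.<service>/<path>; every
            // cluster configured this way sees the same centralized object store,
            // so no per-cluster copy of the data is needed.
            Path input = new Path("swift://mycontainer.myprovider/data/");
            FileSystem fs = input.getFileSystem(conf);
            for (FileStatus status : fs.listStatus(input)) {
                System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
            }
        }
    }

The same fs.swift.* properties can instead be placed in core-site.xml, after which MapReduce jobs, Pig scripts and Hive queries can reference swift:// paths alongside hdfs:// ones; this is how the shared store described in the abstract avoids loading combined data into a new cluster.
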
Appears in Collections:B.Tech. Project Reports

Files in This Item:
File                                   Description   Size      Format
Apache Hadoop on Openstack Cloud.pdf                 1.71 MB   Adobe PDF

