Big Data- a changing trend ~ CODE TO LIVE

Wednesday, 12 November 2014

Big Data

a big hype topic

everything is big data

everyone wants to work with big data

Wikipedia: "... a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications ..."

Access to many different data sources (Internet)

storage is cheap - store everything

today, CIOs get interested in the power of all their data

a lot of different and complex data has to be stored

NoSQL

NoSQL: databases with less constrained consistency models

schema-less

MongoDB:

- open-source, cross-platform document-oriented database system

- among the most popular NoSQL database systems

- supported by MongoDB Inc.

- stores structured data as JSON-like documents with dynamic schemas

- MongoDB as a German / European service

http://www.mongodb.org http://www.mongosoup.de
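
To make "JSON-like documents with dynamic schemas" more concrete, here is a minimal sketch in plain Python. It does not use MongoDB or its drivers: the `users` list stands in for a collection, and the `find` helper is a hypothetical illustration, not MongoDB's query API. The point is only that two documents in the same collection can carry different fields.

```python
import json

# A hypothetical "users" collection: a plain list of JSON-like documents.
# There is no fixed schema; each document carries only the fields it needs.
users = [
    {"_id": 1, "name": "alice", "email": "alice@example.org"},
    {"_id": 2, "name": "bob", "languages": ["R", "Java"], "address": {"city": "Berlin"}},
]

def find(collection, **criteria):
    """Return the documents whose top-level fields equal all given criteria.

    Illustrative helper only; a missing field simply never matches.
    """
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]

# Each document serializes directly to JSON text.
for doc in users:
    print(json.dumps(doc, sort_keys=True))

# Querying on a field that only some documents have is not an error.
print(find(users, name="bob"))
print(find(users, email="alice@example.org"))
```

In a relational table, adding `languages` or `address` would require a schema change for every row; here the second document simply includes them.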

Hadoop

open-source software framework designed to support large-scale data processing

Map Reduce: a computational paradigm

- the application is divided into many small fragments of work

HDFS: Hadoop Distributed File System

- a distributed file system that stores data on the compute nodes

the Ecosystem: Hive, Pig, Flume, Mahout, ...

written in Java; its Streaming API opens it up to other languages

http://hadoop.apache.org
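
The Map Reduce paradigm mentioned above can be sketched in a few lines of plain Python. This is a single-process toy that assumes nothing about Hadoop itself: the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen here for illustration. On a real cluster the map and reduce fragments would run in parallel on many nodes, and the framework would do the grouping between them.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, which is what allows the work to be divided into many small independent fragments.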

HDFS and Hadoop cluster

HDFS is a block-structured file system

- blocks are stored across a cluster of one or more machines with data storage capacity: DataNode

- data is accessed in a write-once, read-many model

HDFS comes with its own utilities for file management

The HDFS file system stores its metadata reliably: NameNode
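
The division of labour between NameNode and DataNodes can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how HDFS is implemented: the `ToyCluster` class, the round-robin block placement, and the 4-byte block size are all invented for the example, and replication is left out entirely.

```python
BLOCK_SIZE = 4  # bytes; tiny on purpose, real HDFS blocks are many megabytes

class ToyCluster:
    """In-memory stand-in for one NameNode and several DataNodes."""

    def __init__(self, datanode_names):
        # DataNodes: each one maps a block id to the bytes it stores.
        self.datanodes = {name: {} for name in datanode_names}
        # NameNode metadata: file name -> ordered list of (block id, DataNode).
        self.namenode = {}

    def write(self, filename, data):
        # Write once: an existing file cannot be overwritten.
        if filename in self.namenode:
            raise FileExistsError(f"{filename} already exists")
        names = list(self.datanodes)
        locations = []
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{filename}#{index}"
            node = names[index % len(names)]  # round-robin placement (assumption)
            self.datanodes[node][block_id] = data[offset:offset + BLOCK_SIZE]
            locations.append((block_id, node))
        self.namenode[filename] = locations

    def read(self, filename):
        # Read many: look up block locations, then fetch the blocks in order.
        return b"".join(
            self.datanodes[node][block_id]
            for block_id, node in self.namenode[filename]
        )

cluster = ToyCluster(["datanode-1", "datanode-2"])
cluster.write("notes.txt", b"hello hdfs")
print(cluster.namenode["notes.txt"])
print(cluster.read("notes.txt"))
```

The NameNode dictionary holds no file contents at all, only the mapping from file name to block locations, which is why its reliability matters for the whole file system.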

Example: RStudio
