Test


Wednesday, 12 November 2014

Big Data: a changing trend

Big Data

      a big hype topic

      everything is big data

      everyone wants to work with big data

      Wikipedia: "... a collection of data sets so large and complex that it
      becomes difficult to process using on-hand database management
      tools or traditional data processing applications ..."

 

      access to many different data sources (Internet)

      storage is cheap, so store everything

      today, CIOs are interested in the power of all their data

      a lot of different and complex data has to be stored

      hence: NoSQL

 

NoSQL: databases with less constrained consistency models, typically schema-less

 

MongoDB:

      open-source, cross-platform, document-oriented database system

      one of the most popular NoSQL database systems

      supported by MongoDB Inc.

      stores structured data as JSON-like documents with dynamic schemas

      MongoDB as a German / European service: MongoSoup

http://www.mongodb.org http://www.mongosoup.de
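To make "JSON-like documents with dynamic schemas" concrete, here is a minimal sketch in plain Python. It does not use the MongoDB driver. The `collection` list, the `find` helper and all field values are hypothetical stand-ins, chosen only to show that documents in one collection need not share the same fields.

```python
import json

# A hypothetical "collection": each document is a JSON-like dict,
# and the documents do not have to share the same fields.
collection = [
    {"_id": 1, "name": "Ada", "languages": ["R", "Python"]},
    {"_id": 2, "name": "Linus", "employer": {"name": "Example GmbH", "city": "Berlin"}},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# A document with a brand-new field can be added without any schema change.
collection.append({"_id": 3, "name": "Grace", "rank": "Rear Admiral"})

matches = find(collection, name="Grace")
print(json.dumps(matches))
```

In a relational table the new `rank` field would first need an ALTER TABLE. Here the document simply carries the extra field, and a query for a field that some documents lack just does not match them.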

 

 

Hadoop

open-source software framework designed to support large-scale data processing

MapReduce: a computational paradigm

      the application is divided into many small fragments of work

HDFS: Hadoop Distributed File System

      a distributed file system that stores data on the compute nodes

the ecosystem: Hive, Pig, Flume, Mahout, ...

written in Java, opened up to other languages by its Streaming API

http://hadoop.apache.org
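The MapReduce idea of dividing an application into many small fragments of work can be sketched with the classic word count. This is a single-process illustration in plain Python, not Hadoop code. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this sketch. On a real cluster the map and reduce calls would run on many machines in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over the lines, one after the other."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big hype"])
print(counts)
```

Each call to `map_phase` only needs one line, and each call to `reduce_phase` only needs the values of one key. That independence is what allows the work to be spread across the nodes of a cluster.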

 

HDFS and Hadoop cluster

HDFS is a block-structured file system

      blocks are stored across a cluster of one or more machines with data storage capacity: the DataNodes

      data is accessed in a write-once, read-many model

HDFS comes with its own utilities for file management

the NameNode stores the file system metadata reliably
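The division of labour between NameNode and DataNodes can be shown with a toy model. This is a sketch in plain Python under simplifying assumptions, not the HDFS implementation. The class `ToyCluster`, the tiny `block_size` of 4 bytes and the round-robin placement are all invented for illustration, and real HDFS features such as replication are left out.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyCluster:
    """Toy model: DataNodes hold block contents, the NameNode holds metadata."""

    def __init__(self, datanode_names):
        # DataNode name -> {block id: block bytes}
        self.datanodes = {name: {} for name in datanode_names}
        # NameNode metadata: path -> ordered list of (block id, DataNode name)
        self.namenode = {}

    def write(self, path, data, block_size=4):
        if path in self.namenode:
            # write once: an existing file is never overwritten
            raise FileExistsError(path)
        names = list(self.datanodes)
        entries = []
        for index, block in enumerate(split_into_blocks(data, block_size)):
            block_id = f"{path}#{index}"
            node = names[index % len(names)]  # simple round-robin placement
            self.datanodes[node][block_id] = block
            entries.append((block_id, node))
        self.namenode[path] = entries

    def read(self, path):
        # read many: look up the block locations, then fetch the blocks in order
        return b"".join(self.datanodes[node][block_id]
                        for block_id, node in self.namenode[path])


cluster = ToyCluster(["dn1", "dn2"])
cluster.write("/demo.txt", b"hello world")
print(cluster.namenode["/demo.txt"])
print(cluster.read("/demo.txt"))
```

The NameNode dictionary never contains file contents, only the list of which block lives on which DataNode. Losing that metadata would make the blocks unusable, which is why it has to be stored reliably.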

 

Example: RStudio

 
