Wednesday, September 13, 2017

Big Data Analytics (001) - Kapil Sharma

Big Data deals with huge volumes of data. The core purpose of Big Data is to streamline both the quantity and the quality of that data, so that useful analytics can be derived from clusters of data sets.



The process used for that is known as "ETL" (Extract >> Transform >> Load), where data changes its shape and becomes more usable.
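To make the three ETL steps a bit more concrete, here is a minimal sketch in plain Python. It is not how Apache PIG does it; it only illustrates the idea. The sample records, the column names and the function names (extract, transform, load) are all made up for this example.

```python
import csv
import io

# Hypothetical raw input, invented for illustration only.
RAW_CSV = """name,city,amount
alice,Delhi,120
bob,,80
carol,Mumbai,not_a_number
dave,Delhi,200
"""

def extract(text):
    """Extract: read the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete or invalid rows and normalise the rest."""
    clean = []
    for row in rows:
        if not row["city"]:
            continue  # skip rows with a missing city
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append({
            "name": row["name"].title(),
            "city": row["city"],
            "amount": amount,
        })
    return clean

def load(rows, store):
    """Load: append the cleaned rows to the target store (a list here)."""
    store.extend(rows)
    return store

warehouse = load(transform(extract(RAW_CSV)), [])
for record in warehouse:
    print(record)
```

The raw rows go in messy and come out filtered and typed, which is the "change of shape and usability" mentioned above. In a real setup the store would be a file system or a warehouse table rather than a Python list.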

In the open-source arena, Apache PIG is used for the ETL process. Apache PIG does not come with a pretty GUI to make the ETL process easier.

After the PIG ETL process, Apache HIVE comes into the picture for data analytics.

Apache HIVE is used for data warehousing and for querying the big data sets derived from the ETL process in Apache PIG.
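To give a feel for what querying already-cleaned data looks like, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. This is not HIVE, and the sales table and its rows are hypothetical. HIVE queries are written in an SQL-like language, so the aggregate query below is similar in spirit.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a warehouse table.
# The "sales" table and its rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Alice", "Delhi", 120), ("Dave", "Delhi", 200), ("Erin", "Mumbai", 50)],
)

# Analytics step: total amount per city.
totals = conn.execute(
    "SELECT city, SUM(amount) AS total FROM sales GROUP BY city ORDER BY city"
).fetchall()

for city, total in totals:
    print(city, total)

conn.close()
```

The point is only the division of labour: the ETL step prepares the rows, and the warehouse step answers questions over them with queries.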

I will show you all how to set up Apache PIG and HIVE on your machine in the next write-up. Till then, happy coding :)