
Components of Hadoop

Apache Hadoop core components

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each providing computation and storage. Rather than relying on hardware to deliver high availability, the framework itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.

HDFS (storage) and MapReduce (processing) are the two core components of Apache Hadoop. An important aspect of Hadoop is that HDFS and MapReduce are designed with each other in mind and are co-deployed on a single cluster, which makes it possible to move computation to the data rather than the other way around. Thus, the storage system is not physically separate from the processing system.

Apache Hadoop architecture

Hadoop Distributed File System (HDFS)

The main components of HDFS are (a short client-side sketch follows the list):

• NameNode is the master of the system. It maintains the name system (directories and files) and manages the blocks that are present on the DataNodes.

• DataNodes are the slaves which are deployed on each machine and provide the actual storage. They are responsible for serving read and write requests from clients.

• Secondary NameNode is responsible for performing periodic checkpoints. So, in the event of NameNode failure, you can restart the NameNode using the checkpoint.
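
To make this concrete, here is a minimal sketch, assuming the standard Hadoop Java FileSystem API, of a client writing and then reading a file: the NameNode resolves the path and allocates blocks, while the bytes themselves are served by DataNodes. The cluster URI and file path below are placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS points at the NameNode; "hdfs://namenode:8020" is a placeholder URI.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write: the NameNode allocates blocks, the bytes themselves go to DataNodes.
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read: the client asks the NameNode for block locations, then streams from DataNodes.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}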

MapReduce

The main components of MapReduce are (a minimal job sketch follows the list):

• JobTracker is the master of the system, which manages the jobs and resources in the cluster (TaskTrackers). The JobTracker tries to schedule each map task as close as possible to the actual data being processed, i.e. on the TaskTracker that is running on the same DataNode as the underlying block.

• TaskTrackers are the slaves which are deployed on each machine. They are responsible for running the map and reduce tasks as instructed by the JobTracker.

• JobHistoryServer is a daemon that serves information about completed applications so that the JobTracker does not need to track them. It is typically part of the JobTracker itself, but it is recommended to run it as a separate daemon.
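
As an illustration of what the JobTracker schedules onto TaskTrackers, here is a minimal word-count sketch using the standard Hadoop MapReduce Java API; the input and output paths are supplied as placeholder arguments.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split, ideally local to the task.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The combiner reuses the reducer to pre-aggregate counts on the map side, which reduces the amount of data shuffled across the network to the reduce tasks.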

Hadoop ecosystem components

Apache Pig

Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
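
To keep the examples in a single language, here is an illustrative sketch that drives a small Pig Latin pipeline from Java via PigServer in local mode; the input file, field layout, and output directory are assumptions, and in practice Pig scripts are usually written and run directly as Pig Latin.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigSketch {
    public static void main(String[] args) throws Exception {
        // Local mode for illustration; ExecType.MAPREDUCE would run the plan on the cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Each registerQuery call adds one Pig Latin statement to the logical plan.
        pig.registerQuery("raw = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH raw GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");

        // Storing the final alias triggers execution of the whole (parallelizable) pipeline.
        pig.store("counts", "word_counts_out");
    }
}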

Apache Hive

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. It provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL.
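
A minimal sketch, assuming a running HiveServer2 instance and the Hive JDBC driver; the host, port, database, table, and credentials below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL; host, port and database are placeholders.
        String url = "jdbc:hive2://localhost:10000/default";
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection con = DriverManager.getConnection(url, "hive", "");
             Statement stmt = con.createStatement()) {

            // HiveQL looks like SQL but is compiled into distributed jobs over the data.
            ResultSet rs = stmt.executeQuery(
                "SELECT word, COUNT(*) AS cnt FROM word_counts GROUP BY word");
            while (rs.next()) {
                System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
            }
        }
    }
}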

Apache HCatalog

HCatalog is a metadata abstraction layer for referencing data without using the underlying filenames or formats. It insulates users and scripts from how and where the data is physically stored.
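
A rough sketch, assuming HCatalog's MapReduce integration (HCatInputFormat): the job identifies its input by database and table name, and HCatalog resolves the underlying files and formats. The database and table names are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

public class HCatalogSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "hcatalog read");

        // Refer to the input by database and table name ("default"/"word_counts" are
        // placeholders); HCatalog resolves the actual file locations and storage format.
        HCatInputFormat.setInput(job, "default", "word_counts");
        job.setInputFormatClass(HCatInputFormat.class);

        // A mapper attached to this job would receive HCatRecord values regardless of
        // whether the table is stored as text, RCFile, ORC, etc.
    }
}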

Apache HBase

The main components of HBase are (a short client sketch follows the list):

• HBase Master is responsible for negotiating load balancing across all RegionServers and for maintaining the state of the cluster. It is not part of the actual data storage or retrieval path.

• The RegionServer is deployed on each computer and hosts data and processes I/O requests.
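
A minimal sketch using the HBase Java client API; the table name, column family, and row key are placeholders. Reads and writes go to the RegionServer hosting the row's region, not through the HBase Master.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Write one cell: row "row1", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read it back; the request is served by the RegionServer hosting this region.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}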

Apache Sqoop

Apache Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases.
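
A rough sketch, assuming Sqoop 1's programmatic entry point (Sqoop.runTool); the JDBC URL, credentials, table, and target directory are placeholders, and the same import is normally run from the sqoop command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;

public class SqoopImportSketch {
    public static void main(String[] args) {
        // Equivalent to running "sqoop import ..." from the shell.
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost/sales",   // placeholder JDBC URL
            "--username", "etl",
            "--table", "orders",                        // source relational table
            "--target-dir", "/user/demo/orders",        // destination directory in HDFS
            "--num-mappers", "4"                        // parallel import tasks
        };
        int exitCode = Sqoop.runTool(importArgs, new Configuration());
        System.exit(exitCode);
    }
}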

 

References: http://wiki.apache.org/hadoop/ProjectDescription

 

 

