Posts by Category

Hadoop

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

HDFS - Components Overview

In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath to make use of them. Some of the auxiliary jars which I use most of the time, like se...

How to enable Log Aggregation in Yarn

Log aggregation is the centralized management of logs from all NodeManager nodes, provided by YARN. It will aggregate and upload finished container or task’s log ...

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...


Linux

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

How to enable debugfs on linux system.

Debugfs is the debug filesystem, a RAM-based filesystem which can be used for kernel debugging information. This makes kernel space information available in u...


Hive

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath to make use of them. Some of the auxiliary jars which I use most of the time, like se...

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...


MacOS

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

How to edit hosts file on MacOS

On MacOS, the hosts file is present in two places, i.e. /etc/hosts and /private/etc/hosts. But if you do a detailed listing of the /etc path, you will notice that its po...


Git


Protoc

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...


Architecture

CAP Theorem

CAP is an acronym that stands for Consistency, Availability and Partition Tolerance. According to the CAP theorem, any distributed system can only guarantee two ...


Strategy

DAMA Framework

In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.


Governance

DAMA Framework

In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.


Data Processing

Stream Data Processing Approaches

Stream processing applications take different approaches to handling the reprocessing of messages. Depending on the requirements of the solution, an arc...
