Posts by Year

2019

Stream Data Processing Approaches

Stream processing applications take different approaches to handling the reprocessing of messages. Depending on the requirements of the solution, an arc...

2018

DAMA Framework

In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.

CAP Theorem

CAP is an acronym that stands for Consistency, Availability and Partition Tolerance. According to the CAP theorem, any distributed system can only guarantee two ...
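
To make the trade-off concrete, here is a toy Python model of one key replicated on two nodes. It is not taken from the post, and the class TwoReplicaStore and its "CP"/"AP" modes are hypothetical names for illustration. When a partition cuts the nodes off, the CP variant rejects writes so the replicas stay consistent, while the AP variant keeps accepting writes and lets the replicas diverge.

```python
class PartitionError(Exception):
    """Raised when the CP-style store refuses a write during a partition."""


class TwoReplicaStore:
    """Toy model (hypothetical): one key replicated on nodes "A" and "B"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"A": None, "B": None}
        self.partitioned = False  # True means A and B cannot reach each other

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both nodes.
            for name in self.replicas:
                self.replicas[name] = value
        elif self.mode == "CP":
            # Cannot reach the other replica, so refuse instead of diverging.
            raise PartitionError(f"write to {node} rejected during partition")
        else:
            # AP: accept the write locally; the other replica becomes stale.
            self.replicas[node] = value

    def read(self, node):
        return self.replicas[node]


cp = TwoReplicaStore("CP")
cp.write("A", 1)
cp.partitioned = True
try:
    cp.write("A", 2)
except PartitionError as exc:
    print("CP:", exc)              # the write is refused
print(cp.read("A"), cp.read("B"))  # 1 1

ap = TwoReplicaStore("AP")
ap.write("A", 1)
ap.partitioned = True
ap.write("A", 2)                   # accepted on A only
print(ap.read("A"), ap.read("B"))  # 2 1
```

Neither mode gives up partition tolerance here; the model only shows which of the other two guarantees each choice sacrifices while the partition lasts.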

2017

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine: the Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

2016

How to edit hosts file on MacOS

On MacOS, the hosts file is present at two places, i.e. /etc/hosts and /private/etc/hosts. But if you do a detailed listing of the /etc path, you will notice that it's po...

How to enable debugfs on a Linux system

Debugfs (Debug Filesystem) is a RAM-based filesystem that can be used for kernel debugging information. This makes kernel-space information available in u...
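
Debugfs is conventionally mounted at /sys/kernel/debug (for example with `mount -t debugfs none /sys/kernel/debug`). As a minimal sketch, not taken from the post, the Python below checks whether it is already mounted by scanning the /proc/mounts table; the function names are hypothetical, and the check only applies to Linux.

```python
def debugfs_mount_points(mounts_text):
    """Return the mount point of every debugfs entry in /proc/mounts-style text.

    Each line has the form: <device> <mount point> <fstype> <options> <dump> <pass>.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def is_debugfs_mounted(path="/proc/mounts"):
    """Read the live mount table (Linux only) and report whether debugfs is mounted."""
    try:
        with open(path) as f:
            return bool(debugfs_mount_points(f.read()))
    except FileNotFoundError:
        # /proc/mounts does not exist on non-Linux systems such as macOS.
        return False


sample = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
"""
print(debugfs_mount_points(sample))  # ['/sys/kernel/debug']
print(is_debugfs_mounted())
```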

HDFS Metadata - Namenode

In this article I will explain how the namenode maintains metadata information in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the datanode maintains metadata information in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

HDFS - Components Overview

In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath to make use of them. Some of the auxiliary jars which I use most of the time, like se...

2015

How to enable Log Aggregation in Yarn

Log aggregation, provided by Yarn, is the centralized management of logs from all NodeManager nodes. It aggregates and uploads finished container or task’s log ...

Compressing Hive Data

To reduce the amount of disk space Hive queries use, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...
