Posts by Tag

hadoop

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

HDFS - Components Overview

In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath to make use of them. Some of the auxiliary jars which I use most of the time, like se...

How to enable Log Aggregation in Yarn

Log aggregation is centralized management of logs from all NodeManager nodes, provided by YARN. It will aggregate and upload finished container or task’s log ...

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

bash

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

How to enable debugfs on linux system.

Debugfs is the Debug Filesystem, a RAM-based filesystem which can be used for kernel debugging information. This makes kernel space information available in u...

hdfs

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

HDFS - Components Overview

In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

hive

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath to make use of them. Some of the auxiliary jars which I use most of the time, like se...

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

linux

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

metastore

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

warehouse

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

metadata

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

MacOS

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

How to edit hosts file on MacOS

On MacOS, the hosts file is present at two places, i.e. /etc/hosts and /private/etc/hosts. But if you do a detailed listing on the /etc path, you will notice that its po...

git

apt-get

data compression

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

codec

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

lz4

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

bzip2

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

gzip

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

fourmc

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

snappy

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

yarn

How to enable Log Aggregation in Yarn

Log aggregation is centralized management of logs from all NodeManager nodes, provided by YARN. It will aggregate and upload finished container or task’s log ...

datanode

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

namenode

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

Data-Processing

CAP Theorem

CAP is an acronym that stands for Consistency, Availability and Partition Tolerance. According to the CAP theorem, any distributed system can only guarantee two ...

Data-management

DAMA Framework

In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.

Streaming

Stream Data Processing Approaches

Stream processing applications take different approaches to handling the reprocessing of messages. Depending on the requirements of the solution an arc...
