Posts by Tag

hadoop

How to split a string on first occurrence of character in Hive.

In this article we will see how to split a string in Hive on the first occurrence of a character. Let's say you have strings like apl_finance_reporting or org_na...

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured by dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured by dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Configuration Properties

In this article, we will look at the configuration properties that control the behaviour of HDFS metadata directories. All these properties are part of hdfs-site.xml f...

HDFS - Components Overview

In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath in order to use them. Some of the auxiliary jars which I use most of the time, like se...

How to enable Log Aggregation in Yarn

Log aggregation is YARN's centralized management of logs from all NodeManager nodes. It will aggregate and upload finished container or task’s log ...

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

Hive Tables - Notes

Notes:

Hive Databases - Notes

This article covers some notes and how-tos about Hive databases.

HDFS - Quota Management

This post explains how to set space and name quotas in HDFS.

bash

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

How to install redis on MacOS using Homebrew

You can install Redis on MacOS using Homebrew. This article covers how to install and start Redis. Run the following command to install Redis:

lsblk - List block device information.

lsblk is a Linux utility that lists block device information. In this blog post, I’ll cover some useful lsblk commands.

How to enable debugfs on linux system.

Debugfs (the debug filesystem) is a RAM-based filesystem that exposes kernel debugging information. This makes kernel space information available in u...

Linux command for Base64 encode and decode

Linux has the base64 command to encode and decode data using the Base64 representation. Here is an example: to encode the string Chetna Chaudhari, you can use the following com...

How to setup cron job for last day of month

With the Linux utility cron, you can run any command or script at a given time or on a repeating schedule. To add a cron job, just run the command crontab -e; this will open a fi...

Moving already running process under nohup.

Have you ever run into a situation where you launched a long-running process and forgot to run it under nohup? Here is a workaround to move already...

Error during apt-get update - Can’t exec insserv

Today, while doing an apt-get update on a box, I faced the following issue:

Do mkdir and cd using a single command?

Most of the time, when you create a new directory, you cd into it to do some work.

How to enable date timestamp in bash history.

While debugging, I often wonder: when did I execute this command? Here is a way to enable date and timestamp display when listing your bash history.

hdfs

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured by dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured by dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Configuration Properties

In this article, we will look at the configuration properties that control the behaviour of HDFS metadata directories. All these properties are part of hdfs-site.xml f...

HDFS - Components Overview

In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

Hive Tables - Notes

Notes:

Hive Databases - Notes

This article covers some notes and how-tos about Hive databases.

HDFS - Quota Management

This post explains how to set space and name quotas in HDFS.

hive

How to split a string on first occurrence of character in Hive.

In this article we will see how to split a string in Hive on the first occurrence of a character. Let's say you have strings like apl_finance_reporting or org_na...

How to add auxiliary Jars in Hive

We often need to add auxiliary (third-party) jars to the Hive classpath in order to use them. Some of the auxiliary jars which I use most of the time, like se...

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

Hive Tables - Notes

Notes:

Hive Databases - Notes

This article covers some notes and how-tos about Hive databases.

linux

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

Hive Tables - Notes

Notes:

Hive Databases - Notes

This article covers some notes and how-tos about Hive databases.

Error during apt-get update - Can’t exec insserv

Today, while doing an apt-get update on a box, I faced the following issue:

metastore

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

Hive Tables - Notes

Notes:

Hive Databases - Notes

This article covers some notes and how-tos about Hive databases.

warehouse

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

Hive Tables - Notes

Notes:

Hive Databases - Notes

This article covers some notes and how-tos about Hive databases.

metadata

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured by dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured by dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

MacOS

How to install protoc 2.5.0 on MacOS

I recently faced this issue while building Hadoop on my MacOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...

How to edit hosts file on MacOS

On MacOS, the hosts file is present in two places, i.e. /etc/hosts and /private/etc/hosts. But if you do a detailed listing of the /etc path, you will notice that its po...

git

How to add pre-commit hook for JIRA tracking in git commits.

Having clean and useful commit messages always makes debugging easier. There are many different patterns people follow to maintain a neat git log history. He...

apt-get

Error during apt-get update - Can’t exec insserv

Today, while doing an apt-get update on a box, I faced the following issue:

data compression

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

codec

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

lz4

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

bzip2

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

gzip

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

fourmc

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

snappy

Compressing Hive Data

To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...

yarn

How to enable Log Aggregation in Yarn

Log aggregation is YARN's centralized management of logs from all NodeManager nodes. It will aggregate and upload finished container or task’s log ...

datanode

HDFS Metadata - Datanode

In this article I will explain how the DataNode maintains metadata in the directory configured by dfs.datanode.data.dir in the hdfs-site.xml file. In my c...

namenode

HDFS Metadata - Namenode

In this article I will explain how the NameNode maintains metadata in the directory configured by dfs.namenode.name.dir in the hdfs-site.xml file. In my c...

Data-Processing

CAP Theorem

CAP is an acronym that stands for Consistency, Availability and Partition Tolerance. According to the CAP theorem, any distributed system can only guarantee two ...

Data-management

DAMA Framework

In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.

Streaming

Stream Data Processing Approaches

Stream processing applications take different approaches to handling the reprocessing of messages. Depending on the requirements of the solution, an arc...
