How to split a string on first occurrence of character in Hive.
In this article we will see how to split a string in Hive on the first occurrence of a character. Let's say you have strings like apl_finance_reporting or org_na...
In this article I will explain how the NameNode maintains metadata in the directory configured using dfs.namenode.name.dir in the hdfs-site.xml file. In my c...
In this article I will explain how the DataNode maintains metadata in the directory configured using dfs.datanode.data.dir in the hdfs-site.xml file. In my c...
In this article, we will see the configuration properties that control the behaviour of HDFS metadata directories. All these properties are part of the hdfs-site.xml f...
In this article I will explain the main components of the Hadoop Distributed File System (HDFS) and their responsibilities.
We often need to add auxiliary (third-party) JARs to the Hive classpath to make use of them. Some of the auxiliary JARs I use most often, like se...
Log aggregation is YARN's centralized management of logs across all NodeManager nodes. It aggregates and uploads a finished container's or task’s log ...
To reduce the amount of disk space a Hive query uses, you should enable Hive compression codecs. There are two places where you can enable compression in Hive ...
Notes:
This article covers some notes and how-tos around Hive databases.
This post explains how to set space and name quotas in HDFS.
I recently hit this issue while building Hadoop on my macOS machine. The Hadoop trunk 3.0 snapshot build fails if compiled with a protoc newer than 2.5. Whil...
You can install Redis on macOS using Homebrew. This article covers how to install and start Redis. Run the following command to install Redis
lsblk is a Linux utility that lists block device information. In this blog post, I’ll cover some useful lsblk commands.
Debugfs is the debug filesystem, a RAM-based filesystem that can be used for kernel debugging information. It makes kernel-space information available in u...
Linux has a base64 command to encode and decode data using the Base64 representation. Here is an example: to encode the string Chetna Chaudhari you can use the following com...
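As a rough illustration of the base64 excerpt above (not necessarily the command the full post uses), here is a minimal round-trip sketch. It assumes a base64 binary that accepts the --decode flag, as GNU coreutils does; the variable names encoded and decoded are made up for this example.

```shell
# Encode the string. printf is used instead of echo so that no
# trailing newline ends up inside the encoded data.
encoded=$(printf '%s' 'Chetna Chaudhari' | base64)
echo "$encoded"

# Decode the result back to the original text.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

Using echo without -n would also encode the newline, which is a common reason two encodings of the "same" string differ.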
You can run any command or script at a given time or repeatedly with the help of the Linux utility cron. To add a cron job, just run the command crontab -e; this will open a fi...
Have you ever run into a situation where you launched a long-running process and forgot to run it under nohup? Here is a workaround to move already...
Today, while running apt-get update on a box, I faced the following issue:
Most of the time, when you create a new directory, you then cd into it to do some work.
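One common way to combine the two steps is a small shell function. The sketch below is an assumption about what such a helper could look like, not the approach from the full post; the function name mkcd is hypothetical.

```shell
# mkcd (hypothetical name): create the directory, including any
# missing parents, and change into it only if creation succeeded.
mkcd() {
  mkdir -p -- "$1" && cd -- "$1"
}
```

Defined in a shell startup file such as ~/.bashrc, it could then be called as mkcd projects/demo.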
Often while debugging I wonder: when did I execute this command? Here is a way to show the date and time when listing your bash history.
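In bash, timestamps in the history listing are controlled by the HISTTIMEFORMAT variable. The snippet below is a minimal sketch under that assumption; the particular format string is just one reasonable choice, not necessarily the one used in the full post.

```shell
# Ask bash to print a date and time before each entry shown by
# the history builtin. %F is YYYY-MM-DD and %T is HH:MM:SS;
# the trailing space separates the timestamp from the command.
export HISTTIMEFORMAT='%F %T '
```

Adding the line to ~/.bashrc makes the setting apply to new interactive sessions.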
On macOS, the hosts file appears in two places, i.e. /etc/hosts and /private/etc/hosts. But if you do a detailed listing of the /etc path, you will notice that its po...
Having clean and useful commit messages always makes debugging easier. People follow many different patterns to maintain a neat git log history. He...
CAP is an acronym that stands for Consistency, Availability and Partition Tolerance. According to the CAP theorem, a distributed system can guarantee only two ...
In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.
Stream processing applications take different approaches to handle reprocessing of messages. Depending on the requirements of the solution an arc...