<h1>Stream Data Processing Approaches</h1>
<p>There are different approaches which stream processing applications take to handle the processing and reprocessing of messages. Depending on the requirements of the solution, an architect or developer can choose one of the approaches below.</p>
<h3 id="at-least-once">At least once:</h3>
<ul>
<li>Each message is guaranteed to be processed</li>
<li>Message may get processed more than once</li>
<li>This guarantees no data loss, but can result in duplicate records passing through the system.</li>
</ul>
<h3 id="at-most-once">At most once:</h3>
<ul>
<li>Each message may or may not be processed</li>
<li>If a message is processed, it’s only processed once.</li>
<li>This can lead to missing data.</li>
</ul>
<h3 id="exactly-once">Exactly once:</h3>
<ul>
<li>Each message is guaranteed to be processed once and only once</li>
<li>An example is credit card transaction processing: if we process a message multiple times, we charge the card multiple times, and if we drop a message, a payment is never processed.</li>
</ul>
<h1>DAMA Framework</h1>
<p>In this post I’ll cover what the DAMA framework is, what its different pillars are, and how it can be used to implement a data strategy.</p>
<p>The term Data Management refers to the development, implementation, and supervision of policies, programs, and practices that deliver, control, protect, and improve the value of data and information assets.</p>
<p>According to the DAMA framework, there are 11 knowledge areas, or pillars, of data management. We’ll look at each of them below.</p>
<h2 id="1-data-governance">1. Data Governance</h2>
<ul>
<li>This pillar provides direction and oversight for data management by establishing a system of decision rights over data that accounts for the needs of the enterprise.</li>
<li>This pillar focuses on the vision, strategy and target operating model which enable the other 10 areas.</li>
<li>Think of it like the foundation of a building: poor data governance leads to failed or weak data management projects.</li>
</ul>
<h2 id="2-data-architecture">2. Data Architecture</h2>
<ul>
<li>This pillar defines the blueprint for managing data assets by aligning with organizational strategy to establish strategic data requirements and designs to meet these requirements.</li>
<li>This pillar focuses on Enterprise data models, tool standards, and system naming conventions</li>
</ul>
<h2 id="3-data-modeling-and-design">3. Data Modeling and Design</h2>
<ul>
<li>This is the process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model</li>
<li>This pillar focuses on data model management procedures, data modeling naming conventions, definition standards, standard domains, and standard abbreviations</li>
</ul>
<h2 id="4-data-storage-and-operations">4. Data Storage and Operations</h2>
<ul>
<li>This pillar includes the design, implementation, and support of stored data to maximize its value. Operations provide support throughout the data lifecycle, from planning for data through its disposal.</li>
<li>This pillar focuses on tool standards, standards for database recovery and business continuity, database performance, data retention, and external data acquisition.</li>
</ul>
<h2 id="5-data-security">5. Data Security</h2>
<ul>
<li>This pillar ensures that data privacy and confidentiality are maintained, that data is not breached, and that data is accessed appropriately.</li>
<li>This pillar focuses on data access security standards, monitoring and audit procedures, storage security standards, and training requirements.</li>
</ul>
<h2 id="6-data-integration-and-interoperability">6. Data Integration and Interoperability</h2>
<ul>
<li>This pillar includes processes related to the movement and consolidation of data within and between data stores, applications, and organizations</li>
<li>This pillar focuses on the standard methods and tools used for data integration and interoperability.</li>
</ul>
<h2 id="7-document-and-content-management">7. Document and Content Management</h2>
<ul>
<li>This pillar covers planning, implementation, and control activities used to manage the lifecycle of data and information found in a range of unstructured media, especially documents needed to support legal and regulatory compliance requirements</li>
<li>This pillar focuses on content management standards and procedures, including use of enterprise taxonomies, support for legal discovery, document and email retention periods, electronic signatures, and report distribution approaches.</li>
</ul>
<h2 id="8-reference-and-master-data">8. Reference and Master Data</h2>
<ul>
<li>This knowledge area covers ongoing reconciliation and maintenance of core critical shared data to enable consistent use across systems of the most accurate, timely, and relevant version of truth about essential business entities.</li>
<li>This pillar focuses on Reference Data Management control procedures, systems of data record, assertions establishing and mandating use, and standards for entity resolution.</li>
</ul>
<h2 id="9-data-warehousing-and-business-intelligence">9. Data Warehousing and Business Intelligence</h2>
<ul>
<li>This includes the planning, implementation, and control processes to manage decision support data and to enable knowledge workers to get value from data via analysis and reporting.</li>
<li>This pillar focuses on tool standards, processing standards and procedures, report and visualization formatting standards, and standards for Big Data handling.</li>
</ul>
<h2 id="10-metadata">10. Metadata</h2>
<ul>
<li>This pillar includes planning, implementation, and control activities to enable access to high-quality, integrated Metadata, including definitions, models, data flows, and other information critical to understanding data and the systems through which it is created, maintained, and accessed.</li>
<li>This pillar focuses on the standard business and technical Metadata to be captured, and on Metadata integration procedures and usage.</li>
</ul>
<h2 id="11-data-quality">11. Data Quality</h2>
<ul>
<li>This pillar covers the planning and implementation of quality management techniques to measure, assess, and improve the fitness of data for use within an organization.</li>
<li>This pillar focuses on data quality rules, standard measurement methodologies, and data remediation standards and procedures.</li>
</ul>
<p>I’ll try to cover each of these pillars in more detail in my coming posts.</p>
<h1>CAP Theorem</h1>
<p>CAP is an acronym that stands for Consistency, Availability and Partition Tolerance. According to the CAP theorem, any distributed system can only guarantee two of the three properties at any point in time. You can’t guarantee all three properties at once.</p>
<h3 id="consistency">Consistency</h3>
<ul>
<li>Consistency is where all nodes in our distributed system see the same data at the same time.</li>
<li>A read is guaranteed to return the most recent write for a given client.</li>
<li>This is achieved by updating multiple nodes before any reads are allowed.</li>
<li>When data is written to a single node, it is then replicated across the other nodes in the system.</li>
</ul>
<h3 id="availability">Availability</h3>
<ul>
<li>Availability means that every request gets a proper response, even if nodes have failed.</li>
<li>A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).</li>
<li>Every request will get a response regardless of the individual state of the nodes.</li>
<li>This is accomplished by replicating data across servers.</li>
</ul>
<h3 id="partition-tolerance">Partition Tolerance</h3>
<ul>
<li>The system will continue to function when network partitions occur.</li>
<li>In CAP theorem, a “partition” is a break in communication between two nodes.</li>
<li>
<p>If a partition occurs between a pair of nodes, say, in master-master replication, then there are two options:</p>
<ul>
<li>Mark these nodes as being down, meaning that they are no longer available</li>
<li>Allow the nodes to become out of sync, which means that we have given up consistency</li>
</ul>
</li>
</ul>
<h2 id="solutions">Solutions</h2>
<h3 id="consistency-and-availability-ca">Consistency and Availability (CA)</h3>
<ul>
<li>This one is problematic.</li>
<li>Many claim that systems that are both consistent and available are not possible.
Their reasoning lies in the idea that you do not choose to have partition tolerance; it is something that arises naturally.</li>
<li>For example, you could have a database that is not sharded but keeps an entire replica of the data to retain availability.</li>
<li>When a write comes in, you either choose to accept the write, knowing that the master and the replica will be out of sync, or you choose to refuse the write.
In the former case, you’ve chosen availability, and in the latter, you’ve chosen consistency.</li>
<li>Relational databases such as PostgreSQL use this principle.</li>
</ul>
<h3 id="consistencypartition-tolerance-cp">Consistency/Partition Tolerance (CP)</h3>
<ul>
<li>This method ensures that the data is consistent between all nodes and becomes unavailable in the case of a partition.</li>
<li>HBase, MongoDB and BigTable use this principle.</li>
</ul>
<h3 id="availabilitypartition-tolerance-ap">Availability/Partition Tolerance (AP)</h3>
<ul>
<li>This method ensures that all of the nodes remain available (through replication), and, in the case of a partition, will resync data between the partitioned nodes once the partition has been resolved. However, this means that the data between nodes might not be consistent.</li>
<li>Cassandra and CouchDB use this principle.</li>
</ul>
<h1>How to install protoc 2.5.0 on MacOS</h1>
<p>Recently I faced this issue while building Hadoop on my MacOS machine.
Hadoop trunk 3.0 Snapshot build fails if compiled with a protoc newer than 2.5. While building I got the following error:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.0-SNAPSHOT:protoc <span class="o">(</span>compile-protoc<span class="o">)</span> on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is <span class="s1">'libprotoc 3.4.0'</span>, expected version is <span class="s1">'2.5.0'</span> -> <span class="o">[</span>Help 1]
<span class="o">[</span>ERROR]
<span class="o">[</span>ERROR] To see the full stack trace of the errors, re-run Maven with the <span class="nt">-e</span> switch.
<span class="o">[</span>ERROR] Re-run Maven using the <span class="nt">-X</span> switch to <span class="nb">enable </span>full debug logging.
</code></pre></div></div>
<p>To fix this, install protoc 2.5.0 on your mac.</p>
<h3 id="steps">Steps:</h3>
<ol>
<li>Build from source.
Download protocol buffers 2.5.0 from <a href="https://github.com/google/protobuf/releases/download">https://github.com/google/protobuf/releases/download</a>.
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
</code></pre></div> </div>
</li>
<li>Untar the tar.bz2 file
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tar </span>xfvj protobuf-2.5.0.tar.bz2
</code></pre></div> </div>
</li>
<li>Configure the protobuf build.
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nb">cd </span>protobuf-2.5.0
./configure <span class="nv">CC</span><span class="o">=</span>clang <span class="nv">CXX</span><span class="o">=</span>clang++ <span class="nv">CXXFLAGS</span><span class="o">=</span><span class="s1">'-std=c++11 -stdlib=libc++ -O3 -g'</span> <span class="nv">LDFLAGS</span><span class="o">=</span><span class="s1">'-stdlib=libc++'</span> <span class="nv">LIBS</span><span class="o">=</span><span class="s2">"-lc++ -lc++abi"</span>
</code></pre></div> </div>
</li>
<li>Build and install the sources. You can use the <code class="language-plaintext highlighter-rouge">--prefix</code> parameter on the configure step to install to a location other than the default <code class="language-plaintext highlighter-rouge">/usr/local/bin</code>:
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nt">-j</span> 4
<span class="nb">sudo </span>make <span class="nb">install</span>
</code></pre></div> </div>
<p>You’ll need to unlink the previously installed (newer) version first; a quick check of the active protoc version is shown after these steps.</p>
</li>
</ol>
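<p>Once the build finishes and any newer protoc has been unlinked, you can verify that the expected version is picked up. This is a minimal check; the installed path may differ if you used <code class="language-plaintext highlighter-rouge">--prefix</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ which protoc
/usr/local/bin/protoc
$ protoc --version
libprotoc 2.5.0
</code></pre></div></div>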
<h1>How to install redis on MacOS using Homebrew</h1>
<p>Using Homebrew you can install redis on MacOS. This article will cover how to install and start redis.
Run the following command to install redis:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>brew <span class="nb">install </span>redis
</code></pre></div></div>
<h3 id="get-redis-package-information">Get redis package information:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>brew info redis
</code></pre></div></div>
<h3 id="launch-redis-on-computer-startup">Launch Redis on computer startup:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ln</span> <span class="nt">-sfv</span> /usr/local/opt/redis/<span class="k">*</span>.plist ~/Library/LaunchAgents
</code></pre></div></div>
<h3 id="start-redis-server-using-launchctl">Start redis server using <code class="language-plaintext highlighter-rouge">launchctl</code>:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>launchctl load ~/Library/LaunchAgents/homebrew.mxcl.redis.plist
</code></pre></div></div>
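<p>Alternatively, newer versions of Homebrew provide a <code class="language-plaintext highlighter-rouge">brew services</code> wrapper that manages the same launchd plist for you. A small sketch (behaviour may vary with your Homebrew version):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ brew services start redis
$ brew services list
$ brew services stop redis
</code></pre></div></div>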
<h3 id="start-redis-server-using-configuration-file">Start redis server using configuration file:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>redis-server /usr/local/etc/redis.conf
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">/usr/local/etc/redis.conf</code> is the location of the redis configuration file. You can pass a different path.</p>
<h3 id="stop-redis-on-auto-startup">Stop redis on auto startup:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.redis.plist
</code></pre></div></div>
<h3 id="to-uninstall-redis">To uninstall redis:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>brew uninstall redis
<span class="nv">$ </span><span class="nb">rm</span> ~/Library/LaunchAgents/homebrew.mxcl.redis.plist
</code></pre></div></div>
<h3 id="test-if-redis-server-is-up-or-not">Test if redis server is up or not:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>redis-cli ping
</code></pre></div></div>
<p>This command should return a <code class="language-plaintext highlighter-rouge">PONG</code> response.</p>
<h1>How to edit hosts file on MacOS</h1>
<p>On MacOS, the hosts file is present at two places, i.e. <code class="language-plaintext highlighter-rouge">/etc/hosts</code> and <code class="language-plaintext highlighter-rouge">/private/etc/hosts</code>. But if you do a detailed listing on the <code class="language-plaintext highlighter-rouge">/etc</code> path, you will notice that it is pointing to the <code class="language-plaintext highlighter-rouge">/private/etc/hosts</code> file.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chetnachaudhari@chetnas-MacBook-Pro:~<span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-lsa</span> /etc
8 lrwxr-xr-x@ 1 root wheel 11 Jan 12 2017 /etc -> private/etc
</code></pre></div></div>
<p>To add a new hosts entry on your machine, edit the <code class="language-plaintext highlighter-rouge">/private/etc/hosts</code> file. The following is a sample of how this file looks:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chetnachaudhari@chetnas-MacBook-Pro:~<span class="nv">$ </span><span class="nb">cat</span> /private/etc/hosts
<span class="c">##</span>
<span class="c"># Host Database</span>
<span class="c">#</span>
<span class="c"># localhost is used to configure the loopback interface</span>
<span class="c"># when the system is booting. Do not change this entry.</span>
<span class="c">##</span>
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
</code></pre></div></div>
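<p>As an example, you can append a custom mapping and then flush the DNS cache so the change takes effect immediately. The hostname and IP below are made up for illustration:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo sh -c 'echo "192.168.1.50   myapp.local" >> /private/etc/hosts'
$ sudo dscacheutil -flushcache
$ sudo killall -HUP mDNSResponder
</code></pre></div></div>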
<h1>How to enable debugfs on a Linux system</h1>
<p>Debugfs is the Debug Filesystem, a RAM-based filesystem which can be used to expose kernel debugging information. This makes kernel space information available in user space.</p>
<h3 id="how-to-enable-debugfs-">How to enable debugfs:</h3>
<p>To enable it for one time only, i.e. the mount will only be available until the next boot of the system:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mount <span class="nt">-t</span> debugfs none /sys/kernel/debug
</code></pre></div></div>
<p>To make the change permanent, add following line to <code class="language-plaintext highlighter-rouge">/etc/fstab</code> file.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>debugfs /sys/kernel/debug debugfs defaults 0 0
</code></pre></div></div>
<p>Once you enable debugfs, you can see multiple directories inside <code class="language-plaintext highlighter-rouge">/sys/kernel/debug</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# <span class="nb">ls</span> /sys/kernel/debug
bdi boot_params dynamic_debug gpio kprobes sched_features usb xen
block dma_buf extfrag hid mce tracing x86
</code></pre></div></div>
<p>These files hold information about kernel subsystems, which helps in debugging. A small example of reading one of these entries is shown below.</p>
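<p>For instance, once debugfs is mounted you can read its entries like regular files. Which entries exist depends on your kernel configuration, so the two below (taken from the directory listing above) are only illustrative:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@sandbox ~]# cat /sys/kernel/debug/sched_features
[root@sandbox ~]# ls /sys/kernel/debug/tracing
</code></pre></div></div>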
<h1>lsblk - List block device information</h1>
<p>Lsblk is a linux utility to list block device information. In this blog post, I’ll cover some useful <code class="language-plaintext highlighter-rouge">lsblk</code> commands.</p>
<h3 id="to-see-list-of-devices-">To see a list of devices:</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 48.8G 0 disk
|-sda1 8:1 0 500M 0 part /boot
<span class="sb">`</span><span class="nt">-sda2</span> 8:2 0 48.3G 0 part
|-vg_sandbox-lv_root <span class="o">(</span>dm-0<span class="o">)</span> 253:0 0 43.5G 0 lvm /
<span class="sb">`</span><span class="nt">-vg_sandbox-lv_swap</span> <span class="o">(</span>dm-1<span class="o">)</span> 253:1 0 4.9G 0 lvm <span class="o">[</span>SWAP]
</code></pre></div></div>
<p>By default <code class="language-plaintext highlighter-rouge">lsblk</code> prints information in a tree view; if you want to see the information in a list view, you can use the <code class="language-plaintext highlighter-rouge">-l</code> option.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# lsblk <span class="nt">-l</span>
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 48.8G 0 disk
sda1 8:1 0 500M 0 part /boot
sda2 8:2 0 48.3G 0 part
vg_sandbox-lv_root <span class="o">(</span>dm-0<span class="o">)</span> 253:0 0 43.5G 0 lvm /
vg_sandbox-lv_swap <span class="o">(</span>dm-1<span class="o">)</span> 253:1 0 4.9G 0 lvm <span class="o">[</span>SWAP]
</code></pre></div></div>
<p>Here,</p>
<blockquote>
<ul>
<li><strong>NAME</strong> is the name of the device,</li>
<li><strong>MAJ:MIN</strong> is the major:minor number of the device</li>
<li><strong>RM</strong> tells whether it is a removable device</li>
<li><strong>SIZE</strong> is the size of the device in human-readable format</li>
<li><strong>RO</strong> tells whether it is a read-only device</li>
<li><strong>TYPE</strong> is the device type</li>
<li><strong>MOUNTPOINT</strong> is the location where the device is mounted.</li>
</ul>
</blockquote>
<h3 id="to-see-device-size-in-bytes">To see device size in bytes</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# lsblk <span class="nt">-b</span>
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 52428800000 0 disk
|-sda1 8:1 0 524288000 0 part /boot
<span class="sb">`</span><span class="nt">-sda2</span> 8:2 0 51903463424 0 part
|-vg_sandbox-lv_root <span class="o">(</span>dm-0<span class="o">)</span> 253:0 0 46657437696 0 lvm /
<span class="sb">`</span><span class="nt">-vg_sandbox-lv_swap</span> <span class="o">(</span>dm-1<span class="o">)</span> 253:1 0 5242880000 0 lvm <span class="o">[</span>SWAP]
</code></pre></div></div>
<h3 id="to-see-filesystem-information">To see filesystem information</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# lsblk <span class="nt">-fl</span>
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
sda1 ext4 8ed32b8c-b23a-423b-b96f-29eaa1303ae1 /boot
sda2 LVM2_member 6CXjrD-6st6-olYP-BQAK-psA0-dS3T-8KeIRU
vg_sandbox-lv_root <span class="o">(</span>dm-0<span class="o">)</span> ext4 d6e7730a-608a-4e67-8814-131e23411619 /
vg_sandbox-lv_swap <span class="o">(</span>dm-1<span class="o">)</span> swap dc07cc2c-1b35-4b06-a52b-c0d162669afe <span class="o">[</span>SWAP]
</code></pre></div></div>
<p>Here</p>
<blockquote>
<ul>
<li><strong>FSTYPE</strong> is filesystem type</li>
<li><strong>LABEL</strong> is filesystem label</li>
<li><strong>UUID</strong> is filesystem UUID</li>
</ul>
</blockquote>
<h3 id="to-see-device-permissions">To see device permissions</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# lsblk <span class="nt">-m</span>
NAME SIZE OWNER GROUP MODE
sda 48.8G root disk brw-rw----
|-sda1 500M root disk brw-rw----
<span class="sb">`</span><span class="nt">-sda2</span> 48.3G root disk brw-rw----
|-vg_sandbox-lv_root <span class="o">(</span>dm-0<span class="o">)</span> 43.5G root disk brw-rw----
<span class="sb">`</span><span class="nt">-vg_sandbox-lv_swap</span> <span class="o">(</span>dm-1<span class="o">)</span> 4.9G root disk brw-rw----
</code></pre></div></div>
<p>Here,</p>
<blockquote>
<ul>
<li><strong>OWNER</strong> is the user that owns the device node</li>
<li><strong>GROUP</strong> is the group that owns the device node</li>
<li><strong>MODE</strong> is device permissions</li>
</ul>
</blockquote>
<h3 id="to-see-device-topology-information">To see device topology information</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@sandbox ~]# lsblk <span class="nt">-tl</span>
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA
sda 0 512 0 512 512 1 cfq 128 128
sda1 0 512 0 512 512 1 cfq 128 128
sda2 0 512 0 512 512 1 cfq 128 128
vg_sandbox-lv_root <span class="o">(</span>dm-0<span class="o">)</span> 0 512 0 512 512 1 128 128
vg_sandbox-lv_swap <span class="o">(</span>dm-1<span class="o">)</span> 0 512 0 512 512 1 128 128
</code></pre></div></div>
<p>Here,</p>
<blockquote>
<ul>
<li><strong>ALIGNMENT</strong> is alignment offset of device</li>
<li><strong>MIN-IO</strong> is minimum I/O size</li>
<li><strong>OPT-IO</strong> is optimal I/O size</li>
<li><strong>PHY-SEC</strong> is physical sector size</li>
<li><strong>LOG-SEC</strong> is logical sector size</li>
<li><strong>ROTA</strong> tells whether it is a rotational device</li>
<li><strong>SCHED</strong> is name of I/O scheduler</li>
<li><strong>RQ-SIZE</strong> is size of request queue</li>
<li><strong>RA</strong> is read ahead of device.</li>
</ul>
</blockquote>
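<p>One more flag worth knowing: you can select exactly which columns <code class="language-plaintext highlighter-rouge">lsblk</code> prints using the <code class="language-plaintext highlighter-rouge">-o</code> option followed by a comma-separated list of column names, for example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@sandbox ~]# lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
</code></pre></div></div>

<p>This prints only the requested columns, which is handy when scripting against the output.</p>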
<h1>How to split a string on first occurrence of character in Hive</h1>
<p>In this article we will see how to split a string in Hive on the first occurrence of a character. Let’s say you have strings like apl_finance_reporting or org_namespace, where you want to split out the org (i.e. the string before the first occurrence of ‘_’) or the namespace (the string after the first ‘_’).</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hive> create table testSplit<span class="o">(</span>namespace string<span class="o">)</span><span class="p">;</span>
hive> insert into table testSplit values <span class="o">(</span><span class="s2">"scp_apl_finance"</span><span class="o">)</span><span class="p">;</span>
hive> insert into table testSplit values <span class="o">(</span><span class="s2">"apl_finance_reporting"</span><span class="o">)</span><span class="p">;</span>
hive> <span class="k">select </span>namespace from testSplit<span class="p">;</span>
OK
scp_apl_finance
apl_finance_reporting
Time taken: 0.118 seconds, Fetched: 2 row<span class="o">(</span>s<span class="o">)</span>
hive> <span class="k">select </span>regexp_extract<span class="o">(</span>namespace, <span class="s1">'^(.*?)(?:_)(.*)$'</span>, 0<span class="o">)</span> from testSplit<span class="p">;</span>
OK
scp_apl_finance
apl_finance_reporting
Time taken: 0.064 seconds, Fetched: 2 row<span class="o">(</span>s<span class="o">)</span>
</code></pre></div></div>
<p>To get the list of all orgs, we can execute the following query:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hive> <span class="k">select </span>regexp_extract<span class="o">(</span>namespace, <span class="s1">'^(.*?)(?:_)(.*)$'</span>, 1<span class="o">)</span> from testSplit<span class="p">;</span>
OK
scp
apl
Time taken: 0.056 seconds, Fetched: 2 row<span class="o">(</span>s<span class="o">)</span>
</code></pre></div></div>
<p>And to get the list of all namespaces, use the following one:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hive> <span class="k">select </span>regexp_extract<span class="o">(</span>namespace, <span class="s1">'^(.*?)(?:_)(.*)$'</span>, 2<span class="o">)</span> from testSplit<span class="p">;</span>
OK
apl_finance
finance_reporting
Time taken: 0.066 seconds, Fetched: 2 row<span class="o">(</span>s<span class="o">)</span>
</code></pre></div></div>
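<p>If you prefer to avoid regular expressions, the same split can be done with Hive’s built-in string functions. Here is a sketch against the same testSplit table, using the standard <code class="language-plaintext highlighter-rouge">split</code>, <code class="language-plaintext highlighter-rouge">instr</code>, and <code class="language-plaintext highlighter-rouge">substr</code> UDFs; it should return the same org and namespace values as the regex approach above, but verify the behaviour on your Hive version:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hive> select split(namespace, '_')[0] from testSplit;
OK
scp
apl
hive> select substr(namespace, instr(namespace, '_') + 1) from testSplit;
OK
apl_finance
finance_reporting
</code></pre></div></div>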
<h1>Linux command for Base64 encode and decode</h1>
<p>Linux has a base64 command to encode and decode using the Base64 representation. Here is an example.
To encode the string <code class="language-plaintext highlighter-rouge">Chetna Chaudhari</code> you can use the following command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"Chetna Chaudhari"</span> | <span class="nb">base64
</span><span class="nv">Q2hldG5hIENoYXVkaGFyaQo</span><span class="o">=</span>
</code></pre></div></div>
<p>You can enable debug mode using the <code class="language-plaintext highlighter-rouge">-d</code> flag to see more details (note that the debug output shown here comes from the macOS base64 utility; on GNU/Linux, <code class="language-plaintext highlighter-rouge">-d</code> is short for <code class="language-plaintext highlighter-rouge">--decode</code>):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"Chetna Chaudhari"</span> | <span class="nb">base64</span> <span class="nt">-d</span>
May 16 10:56:35 Chetna.local <span class="nb">base64</span><span class="o">[</span>26454] <Info>: Read 17 bytes.
May 16 10:56:35 Chetna.local <span class="nb">base64</span><span class="o">[</span>26454] <Info>: Wrote 24 bytes.
<span class="nv">Q2hldG5hIENoYXVkaGFyaQo</span><span class="o">=</span>
</code></pre></div></div>
<p>To decode the encoded text,</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo </span><span class="nv">Q2hldG5hIENoYXVkaGFyaQo</span><span class="o">=</span> | <span class="nb">base64</span> <span class="nt">--decode</span>
Chetna Chaudhari
</code></pre></div></div>
<p>You can check more details using the following command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo </span><span class="nv">Q2hldG5hIENoYXVkaGFyaQo</span><span class="o">=</span> | <span class="nb">base64</span> <span class="nt">-d</span> <span class="nt">--decode</span>
May 16 10:56:37 Chetna.local <span class="nb">base64</span><span class="o">[</span>26431] <Info>: Read 25 bytes.
May 16 10:56:37 Chetna.local <span class="nb">base64</span><span class="o">[</span>26431] <Info>: Decoded to 17 bytes.
Chetna Chaudhari
May 16 10:56:37 Chetna.local <span class="nb">base64</span><span class="o">[</span>26431] <Info>: Wrote 17 bytes.
</code></pre></div></div>
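<p>On a GNU/Linux system, base64 can also work directly on files. A small sketch (the file paths are made up for illustration):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo "Chetna Chaudhari" > /tmp/name.txt
$ base64 /tmp/name.txt > /tmp/name.b64
$ cat /tmp/name.b64
Q2hldG5hIENoYXVkaGFyaQo=
$ base64 --decode /tmp/name.b64
Chetna Chaudhari
</code></pre></div></div>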