HDFS Metadata - Configuration Properties
In this article, we will see configuration properties used to decide behaviour of hdfs metadata directories. All these properties are part of hdfs-site.xml file. Description and default values are picked from hdfs-default.xml.
dfs.datanode.data.dir
Description:
Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.
Default Value:
file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.check.period
Description:
The SecondaryNameNode and CheckpointNode will poll the NameNode every dfs.namenode.checkpoint.check.period
seconds to query the number of uncheckpointed transactions.
Default Value:
60
dfs.namenode.checkpoint.period
Description:
The number of seconds between two periodic checkpoints.
Default Value:
3600
dfs.namenode.checkpoint.txns
Description:
The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every dfs.namenode.checkpoint.txns
transactions, regardless of whether dfs.namenode.checkpoint.period
has expired.
Default Value:
1000000
dfs.namenode.edit.log.autoroll.check.interval.ms
Description:
How often an active namenode will check if it needs to roll its edit log, in milliseconds.
Default Value:
300000
dfs.namenode.edit.log.autoroll.multiplier.threshold
Description:
Determines when an active namenode will roll its own edit log. The actual threshold (in number of edits) is determined by multiplying this value by dfs.namenode.checkpoint.txns
. This prevents extremely large edit files from accumulating on the active namenode, which can cause timeouts during namenode startup and pose an administrative hassle. This behavior is intended as a failsafe for when the standby or secondary namenode fail to roll the edit log by the normal checkpoint threshold.
Default Value:
2.0
dfs.namenode.edits.dir
Description:
Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.namenode.name.dir
Default Value:
${dfs.namenode.name.dir}
dfs.namenode.name.dir
Description:
Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
Default Value:
file://${hadoop.tmp.dir}/dfs/name
dfs.namenode.num.checkpoints.retained
Description:
The number of image checkpoint files (fsimage_*) that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs (stored on edits_* files) necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained.
Default Value:
2
dfs.namenode.num.extra.edits.retained
Description:
The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. It does not translate directly to file’s age, or the number of files kept, but to the number of transactions (here “edits” means transactions). One edit file may contain several transactions (edits). During checkpoint, NameNode will identify the total number of edits to retain as extra by checking the latest checkpoint transaction value, subtracted by the value of this property. Then, it scans edits files to identify the older ones that don’t include the computed range of retained transactions that are to be kept around, and purges them subsequently. The retainment can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs.
NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained
.
Default Value:
1000000
dfs.namenode.checkpoint.dir
Description:
Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
Default Value:
file://${hadoop.tmp.dir}/dfs/namesecondary
dfs.namenode.checkpoint.edits.dir
Description:
Determines where on the local filesystem the DFS secondary name node should store the temporary edits to merge. If this is a comma-delimited list of directories then the edits is replicated in all of the directories for redundancy. Default value is same as dfs.namenode.checkpoint.dir
Default Value:
${dfs.namenode.checkpoint.dir}
dfs.namenode.checkpoint.max-retries
Description:
The SecondaryNameNode retries failed check pointing. If the failure occurs while loading fsimage or replaying edits, the number of retries is limited by this variable.
Default Value:
3