Dealing with ‘under-replicated’ blocks in HDFS

One fine evening, I came across a notification in our HDP cluster: a number of blocks were under-replicated. This was the result of a planned node modification, which had caused the ephemeral disks to lose their data.

HDP would have handled this over time and re-replicated the affected blocks on its own. However, I wanted to trigger the process as soon as possible, so that I could take advantage of a period of lower traffic on our cluster.
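
Before touching individual files, it helps to get a rough sense of the scale of the problem. A minimal, read-only check, assuming you have access to HDFS admin commands, is to look at the under-replication counter in the dfsadmin report:

# The cluster-wide report includes an "Under replicated blocks" counter
hdfs dfsadmin -report | grep -i 'under replicated'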

The first step is to build a list of the affected files. hdfs fsck prints an ‘Under replicated’ line for every such file, and awk keeps only the path in front of the colon:

rm /tmp/under_replicated_files

hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

A quick sanity check that the list actually contains something before looping over it:

ls -lh /tmp/under_replicated_files

Then, for each file in the list, explicitly set the replication factor back to 3 so that the NameNode schedules the missing replicas:

for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done
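
As a side note, the backtick-and-cat loop splits on whitespace. A minimal alternative, assuming bash and that no path contains embedded newlines, reads the list line by line instead:

# Same idea, but robust against spaces in HDFS paths
while read -r hdfsfile; do
  echo "Fixing $hdfsfile :"
  hadoop fs -setrep 3 "$hdfsfile"
done < /tmp/under_replicated_files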

Either way, the loop prints one line per file as it works through the list:

Fixing /apps/hive/warehouse/database.db/tablename/000161_0 :
Replication 3 set: /apps/hive/warehouse/database.db/tablename/000161_0.
.
.
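
Once the loop has finished and the NameNode has had some time to schedule the new copies, re-running the original check should show the number of under-replicated files dropping towards zero:

hdfs fsck / | grep 'Under replicated' | wc -l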
