Dealing with ‘Blocks with no live replicas’ in HDFS

In a previous post, we dealt with ‘Under-replicated blocks in HDFS‘. However, while decommissioning a couple of worker nodes, I noticed there were also some blocks with no live replicas. That means that if a datanode is removed, the files containing those blocks will be corrupted.
Letting the nodes decommission on their own can take a long time, which may not be an ideal situation for everyone, given that we are paying by the hour in the cloud!


There are two methods to deal with this issue.

  • The forceful method is to fail over between the namenodes. The cluster re-replicates some of the blocks during the switch. While this may seem the fastest way, repeated failovers may not be possible in a production environment.
  • The graceful method is to use the ‘metasave’ option of the dfsadmin command and deal with the blocks one by one. Sample code is presented below.
# Start with a clean list; -f avoids an error if the file does not already exist
rm -f /tmp/single_replica
# The report is written to the namenode's log directory
hdfs dfsadmin -metasave metasave-report.txt
# 'l: 1' marks blocks with a single live replica; the field before the first ':' is the file path
grep "l: 1" /path/to/logs/hadoop/hdfs/metasave-report.txt | cut -d':' -f1 >> /tmp/single_replica
# Raise the replication factor back to 3 for each affected file
while read -r hdfsfile; do hadoop fs -setrep 3 "$hdfsfile"; done < /tmp/single_replica
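To see what the pipeline above actually matches, here is a self-contained sketch run against a couple of illustrative lines in the shape of a metasave block report (the paths and block IDs are made up; in a real report, `l:` is the live-replica count, `d:` decommissioned, `c:` corrupt, `e:` excess):

```shell
# Two made-up sample lines:
# "<path>: <block> (replicas: l: <live> d: <decom> c: <corrupt> e: <excess>) ..."
cat > /tmp/metasave-sample.txt <<'EOF'
/user/alice/part-00000: blk_1073741825_1001 (replicas: l: 1 d: 1 c: 0 e: 0)  10.0.0.5:50010 :
/user/alice/part-00001: blk_1073741826_1002 (replicas: l: 3 d: 0 c: 0 e: 0)  10.0.0.6:50010 :
EOF

# Keep only the single-live-replica line and extract the file path
grep "l: 1" /tmp/metasave-sample.txt | cut -d':' -f1
# -> /user/alice/part-00000
```

Only the first path would then get `hadoop fs -setrep 3` applied to it. Once the re-replication finishes, `hdfs fsck /` should report no blocks with a single live replica and the decommission can proceed quickly.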
