How to export complete Hive table to a local file?

Generally, solutions provided over the internet point towards:

hive -e ‘select * from dbname.tablename;’ > /path/to/datadump.csv

However, this will not work for Big Data!

For a table with multi million rows, I generally have to use a command similar to this:

INSERT OVERWRITE LOCAL DIRECTORY 
'/path/to/datadump_FOLDER' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
select * from dbname.tablename;

The important thing to notice here is that you are providing path to a folder and not just the filename. (Why? discussed in a bit!) Also, as you notice, it is easy to change the delimiter to ‘\t’ or ‘|’. The files in the folder location will have the delimiters of your choice.

Now, about the folder. Make sure your user have necessary permissions to write to that folder (duh!). Since we are dealing with Big Data here, the data will be written in parallel to this folder as multiple files. As, the output is from reducers, it is written as 000_001 … 000_00n.

You may then navigate to that folder and generate a single file with all the files combined.

cd /path/to/datadump_FOLDER 
cat 00* >> datadump.csv 

Following this blog post will be “How to exclude certain column(s) while exporting a Hive Table to local file?”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s