How to exclude certain column(s) while exporting a Hive Table to local file?

In a previous post, I documented How to export complete Hive table to a local file?

While being required to get a data dump, you may have been asked to exclude certain data eg. “creditcard” column.

You will have to use REGEX to tackle similar requirement. Verifying and setting the hive.support.quoted.identifiers to none is needed after Hive 0.13.0 releases.

SET hive.support.quoted.identifiers;
+-----------------------------------------+--+
|                   set                   |
+-----------------------------------------+--+
| hive.support.quoted.identifiers=column  |
+-----------------------------------------+--+


SET hive.support.quoted.identifiers=NONE;
No rows affected (0.011 seconds)

SET hive.support.quoted.identifiers;
+---------------------------------------+--+
|                  set                  |
+---------------------------------------+--+
| hive.support.quoted.identifiers=NONE  |
+---------------------------------------+--+

After that you may proceed to test the select statement(s) with the REGEX.

SELECT `(creditcard|cc_details)?+.+` FROM tablename;
would select all other columns except “creditcard” or “cc_details”. It will obviously be helpful to “describe” your table beforehand and use exact column names which you want to exclude.

To then, get a data dump, you will have to execute something similar to the following code:

INSERT OVERWRITE LOCAL DIRECTORY 
'/path/to/datadump_FOLDER' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
select `(creditcard|cc_details)?+.+` from dbname.tablename;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s