In a previous post, I documented How to export complete Hive table to a local file?
While being required to get a data dump, you may have been asked to exclude certain data eg. “creditcard” column.
You will have to use REGEX to tackle similar requirement. Verifying and setting the hive.support.quoted.identifiers
to none
is needed after Hive 0.13.0 releases.
SET hive.support.quoted.identifiers;
+-----------------------------------------+--+
| set |
+-----------------------------------------+--+
| hive.support.quoted.identifiers=column |
+-----------------------------------------+--+
SET hive.support.quoted.identifiers=NONE;
No rows affected (0.011 seconds)
SET hive.support.quoted.identifiers;
+---------------------------------------+--+
| set |
+---------------------------------------+--+
| hive.support.quoted.identifiers=NONE |
+---------------------------------------+--+
After that you may proceed to test the select statement(s) with the REGEX.
SELECT `(creditcard|cc_details)?+.+` FROM tablename;
would select all other columns except “creditcard” or “cc_details”. It will obviously be helpful to “describe” your table beforehand and use exact column names which you want to exclude.
To then, get a data dump, you will have to execute something similar to the following code:
INSERT OVERWRITE LOCAL DIRECTORY
'/path/to/datadump_FOLDER'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select `(creditcard|cc_details)?+.+` from dbname.tablename;