Given JSON data:
{
  "Person": {
    "FirstName": "Piyush",
    "LastName": "Routray",
    "date": [1990, 10, 28]
  },
  "Friends": [{
    "Name": "Vaibhav",
    "State": "UP"
  }, {
    "Name": "Salah",
    "State": "Kerala"
  }, {
    "Name": "Bilal",
    "State": "Odisha"
  }]
} {
  "Person": {
    "FirstName": "Ananya",
    "LastName": "Mohanty",
    "date": [1990, 9, 4]
  },
  "Friends": [{
    "Name": "Piyush",
    "State": "Odisha"
  }, {
    "Name": "Sarmi",
    "State": "Odisha"
  }, {
    "Name": "Shah Jahan",
    "State": "Odisha"
  }]
}
Start by validating each JSON document at http://jsonlint.com/.
Then, before importing the data into a Hive table, remove the line breaks inside each document. Hadoop creates input splits and records based on line breaks, so a JSON document that spans multiple lines gets cut across records. This hurts readability, but each document must sit on a single line for the import to work.
Here is a sample script I used for this:
import os
import re

directory = input('Enter the directory path (e.g. ./2016): ')
os.makedirs('NewFolder', exist_ok=True)

for root, dirs, files in os.walk(directory):
    for name in files:
        path = os.path.join(root, name)
        with open(path) as f:
            data = f.read()
        # Collapse each JSON document onto a single line, then put one
        # document per line. Note: stripping every space (replace(" ", ""))
        # would also corrupt values such as "Shah Jahan", so only newlines
        # and tabs are removed; a regex handles whitespace between documents.
        data = data.replace('\n', '').replace('\t', '')
        data = re.sub(r'}\s*{', '}\n{', data)
        with open(os.path.join('NewFolder', name), 'w') as f:
            f.write(data)
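As a quick sanity check, the one-document-per-line idea can be exercised on an in-memory string (a hypothetical two-record sample; the regex variant shown here also handles leftover whitespace between concatenated documents):

```python
import json
import re

# Two concatenated JSON documents, as in the raw data files.
raw = '{\n  "a": 1\n} {\n  "a": 2\n}'

# Flatten, then split back into one document per line.
flat = re.sub(r'}\s*{', '}\n{', raw.replace('\n', '').replace('\t', ''))

# Each resulting line should now parse as standalone JSON.
for line in flat.split('\n'):
    json.loads(line)

print(flat.count('\n') + 1)  # number of records: 2
```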
Upload the cleaned files to HDFS. In our case, assume they land in '/tmp/proutray/jsonfolder/'.
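The upload might look like this (a hypothetical transcript, assuming the cleaned files are in ./NewFolder and the hdfs user has write access to /tmp):

```
[hdfs@ip-10-0-3-67 ~]$ hdfs dfs -mkdir -p /tmp/proutray/jsonfolder
[hdfs@ip-10-0-3-67 ~]$ hdfs dfs -put NewFolder/* /tmp/proutray/jsonfolder/
```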
Now, download and save the JSON SerDe jar matching your Hive version from here.
For ease, I assume we are in the folder where the jar was downloaded. Here is the output from the terminal:
[hdfs@ip-10-0-3-67 ~]$ ls
json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar
[hdfs@ip-10-0-3-67 ~]$ hive
hive> add jar json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar;
Added [json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar] to class path
Added resources: [json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar]
hive>
CREATE EXTERNAL TABLE IF NOT EXISTS jsonTable (
  Person STRUCT<FirstName: STRING, LastName: STRING, date: ARRAY<INT>>,
  Friends ARRAY<STRUCT<Name: STRING, State: STRING>>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/tmp/proutray/jsonfolder/';
OK
Time taken: 0.193 seconds
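With the table in place, the nested fields can be queried directly. A sample query (field names taken from the JSON above; the exact rows returned depend on your data) might look like:

```
hive> SELECT Person.FirstName, Friends[0].Name FROM jsonTable;
```

Struct fields are addressed with dot notation (Person.FirstName) and array elements with zero-based indexing (Friends[0]), so no flattening is needed to reach the nested values.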