How to ingest JSON data to Hive?

Given JSON Data

{
“Person”: {
“FirstName”: “Piyush”,
“LastName”: “Routray”,
“date”: [1990, 10, 28]
},
“Friends”: [{
“Name”: “Vaibhav”,
“State”: “UP”
}, {
“Name”: “Salah”,
“State”: “Kerala”
}, {
“Name”: “Bilal”,
“State”: “Odisha”
}]
} {
“Person”: {
“FirstName”: “Ananya”,
“LastName”: “Mohanty”,
“date”: [1990, 09, 04]
},
“Friends”: [{
“Name”: “Piyush”,
“State”: “Odisha”
}, {
“Name”: “Sarmi”,
“State”: “Odisha”
}, {
“Name”: “Shah Jahan”,
“State”: “Odisha”
}]
}

Start with validating the JSON data at http://jsonlint.com/
Then, to import data to a hive table, first we should remove the line breaks. Unexpected newlines do not work well with Hadoop due to the input splits which are created according to the presence of line breaks. Sure, it decreases the readability but is important due to the reason mentioned above.

Sample script I used for the same.


import os
directory = raw_input('Enter the directory path (eg. ./2016)')

for root, dirs, files in os.walk(directory):
for name in files:
filename = os.path.join(root, name)
f = open(filename)
data = f.read().replace("\n", "").replace("\t", "").replace(" ", "").replace("}{", "}\n{")
f.close()
filename = "NewFolder/" + name
f = open(filename, 'w')
f.write(data)
f.close()

Upload the file to the hdfs. In our case, assume we are in ‘ /tmp/proutray/jsonfolder/‘.
Now, download and save the relevant jar for your Hive version from here.

For ease, I assume we are in the same folder as the jar was downloaded. Now, pasting the output from the terminal.

[hdfs@ip-10-0-3-67 ~]$ ls
json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar
[hdfs@ip-10-0-3-67 ~]$ hive

hive> add jar  json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar;
Added [json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar] to class path
Added resources: [json-serde-1.1.9.9-Hive1.2-jar-with-dependencies.jar]

hive>
CREATE EXTERNAL TABLE IF NOT EXISTS jsonTable (
Person STRUCT ,
Friends ARRAY
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/tmp/proutray/jsonfolder/';

OK
Time taken: 0.193 seconds

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s