Databricks offers Secret Management, which lets users store sensitive data such as JDBC connection or Snowflake …
Category: Software Playbooks
PySpark DataFrame ordering based on multiple columns
How do you order a DataFrame by more than one column at once? Also, consider some values …
Installing ‘s3cmd’ on CentOS 7+
AWS S3 is one of the oldest ‘object storage’ offerings from Amazon, and when working through a terminal, …
Change Java for a Cloudera cluster using Ambari
If you are using a Hortonworks (now Cloudera) cluster, the default Java is Oracle JDK 1.8. If you have …
How to encrypt files before sharing online?
As a Data Engineer, one needs to share files securely over the internet. An easy way of doing …
Optimized Import of Text Data for Analytics
Text data analysis is a staple use case in the Data Analytics world! There are multiple firms that enable capture …
How to ingest JSON data to Hive?
Given JSON Data { "Person": { "FirstName": "Piyush", "LastName": "Routray", "date": [1990, 10, 28] }, "Friends": [{ "Name": "Vaibhav", "State": …
How to use an NTFS disk with a Mac?
When I wanted to upgrade from my current MacBook Pro, I was faced with the challenge of moving all the …