Piyush Routray

PySpark DataFrame ordering based on multiple columns

How do you order a DataFrame by more than one column at once? Also, consider some values…

Data Engineering, Databricks, PySpark

How to retain the first row of each ‘group’ in a PySpark DataFrame?

For a given DataFrame with multiple occurrences of a particular column value, one may want to retain only one (or…

Databricks, PySpark

Installing ‘s3cmd’ on centOS 7+

AWS S3 is one of Amazon's oldest ‘object storage’ offerings, and when working through a terminal,…

AWS S3, centOS 7, s3cmd

How to split contents of a column into *only two* by a delimiter?

While working with PySpark, I came across a requirement where data in a column had to be split using delimiters…

PySpark

Change java for Cloudera cluster using ambari

If you are using a Hortonworks (now Cloudera) cluster, the default Java is Oracle JDK 1.8. If you have…

ambari, cloudera, Hortonworks, java, OpenJDK, OracleJDK

How to encrypt files before sharing online?

As a Data Engineer, one needs to share files securely over the internet. An easy way of doing…

Data Engineering, PGP, security

Optimized Import of Text Data for Analytics

Text data analysis is a staple use case in the Data Analytics world! Multiple firms enable capture…

AWS Firehose, AWS S3, Data Engineering, Snowflake

How to copy a file, recreating the directory structure, using python?

To copy “Folder1/Folder2/file1” to “Folder3”:

Source structure:
Folder1/FolderA (not to be copied)
Folder1/fileX (not to be copied)
Folder1/Folder2/file1

Desired destination structure:
Folder3/Folder1/Folder2/file1

P.S.…

Python, stackoverflow
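
The copy described in this excerpt could be sketched roughly as below. This is only an illustration built on the standard library's `pathlib` and `shutil`, not necessarily the approach the full post takes. The function name `copy_with_structure` and its parameters are hypothetical, and it assumes the source path is given relative to the current working directory.

```python
import shutil
from pathlib import Path


def copy_with_structure(src_file, dest_root):
    """Copy one file under dest_root, recreating the file's relative directory path.

    Hypothetical helper for illustration; assumes src_file is a relative path.
    """
    src = Path(src_file)
    if src.is_absolute():
        raise ValueError("src_file must be a relative path")

    # Folder3 + Folder1/Folder2/file1 -> Folder3/Folder1/Folder2/file1
    target = Path(dest_root) / src

    # Create only the directories on the path to the file; siblings are untouched.
    target.parent.mkdir(parents=True, exist_ok=True)

    # copy2 copies the file contents and also tries to preserve metadata.
    shutil.copy2(src, target)
    return target
```

Called as `copy_with_structure("Folder1/Folder2/file1", "Folder3")` from the directory that contains `Folder1`, this copies only `file1`, so `Folder1/FolderA` and `Folder1/fileX` are left out of `Folder3`.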
