Piyush Routray

Tag: Databricks

PySpark DataFrame ordering based on multiple columns

How do you order a DataFrame by more than one column at the same time? Also, consider some values …

Data Engineering, Databricks, PySpark

How to retain the first row of each ‘group’ in a PySpark DataFrame?

For a given DataFrame with multiple occurrences of a particular column value, one may want to retain only one (or …

Databricks, PySpark