How to deal with the requirement to order a DataFrame using more than one column simultaneously? Also, consider some values … More
Category: Apache Spark
How to retain the first row of each ‘group’ in a PySpark DataFrame?
For a given DataFrame with multiple occurrences of a particular column value, one may wish to retain only one (or … More
How to split contents of a column into *only two* by a delimiter?
While working with PySpark, I came across a requirement where data in a column had to be split using delimiters … More
Thoughts on attending the AWS Summit 2016.
It was fun attending the AWS Summit in Santa Clara today. Although it was a free-to-register event, I felt it … More