How do you sort a DataFrame by more than one column at once, while placing null values at the end? The example below sorts by name in ascending order and by height in descending order, with nulls last in both columns. (See the reference documentation at the end for other options.)
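from pyspark.sql import SparkSession

# Reuse the active session if there is one (e.g. in a notebook or shell).
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [('Tom', 80), (None, 60), ('Alice', None),
     ('Alice', 10), ('Alice', 11), ('Alice', 13)],
    ["name", "height"])
df.show()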
#+-----+------+
#| name|height|
#+-----+------+
#|  Tom|    80|
#| null|    60|
#|Alice|  null|
#|Alice|    10|
#|Alice|    11|
#|Alice|    13|
#+-----+------+
# Sort by name ascending, then by height descending; nulls go last in both columns.
df.orderBy(df.name.asc_nulls_last(),
           df.height.desc_nulls_last()).show()
#+-----+------+
#| name|height|
#+-----+------+
#|Alice|    13|
#|Alice|    11|
#|Alice|    10|
#|Alice|  null|
#|  Tom|    80|
#| null|    60|
#+-----+------+
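To see why the rows come out in this order, the same rule can be sketched in plain Python without a Spark session. This is only an illustration of the ordering logic, not how Spark implements it; the helper name nulls_last_key is made up for this sketch.

```python
# Same sample data as the DataFrame above, as plain tuples.
rows = [("Tom", 80), (None, 60), ("Alice", None),
        ("Alice", 10), ("Alice", 11), ("Alice", 13)]


def nulls_last_key(row):
    """Hypothetical helper: sort key for name ASC, height DESC, nulls last."""
    name, height = row
    # Each column contributes an (is_null, value) pair. False sorts before
    # True, so non-null values come first and nulls land at the end.
    name_key = (name is None, name if name is not None else "")
    # Negating the height gives descending order under an ascending sort.
    height_key = (height is None, -height if height is not None else 0)
    return (name_key, height_key)


ordered = sorted(rows, key=nulls_last_key)
for row in ordered:
    print(row)
```

The second key is only consulted when two rows tie on the first, which is why the null height ends up last among the Alice rows rather than last overall.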
The other options, such as asc_nulls_first() and desc_nulls_first(), are described in the PySpark Column API documentation.