While working on a project to implement an LLM in Databricks using Hugging Face, I faced the issue of being unable to download…
Weekdays cron job, but for a particular date range.
Cron is a very simple yet useful job scheduler on Linux. To schedule a job for weekdays, one can easily…
Managing Databricks Secrets using Windows.
Databricks has a secret management feature that allows users to store sensitive data such as JDBC connection or Snowflake…
PySpark DataFrame ordering based on multiple columns
How do you order a DataFrame by more than one column simultaneously? Also, consider some values…
How to retain the first row of each ‘group’ in a PySpark DataFrame?
For a given DataFrame with multiple occurrences of a particular column value, one may desire to retain only one (or…
Installing ‘s3cmd’ on CentOS 7+
AWS S3 is one of the oldest ‘object storage’ offerings from Amazon, and when working through a terminal,…
How to split contents of a column into *only two* by a delimiter?
While working with PySpark, I came across a requirement where data in a column had to be split using delimiters…
Change Java for Cloudera cluster using Ambari
If you are using a Hortonworks (presently Cloudera) cluster, the default value for Java is Oracle JDK 1.8. If you have…