While working on a project to implement an LLM in Databricks using Hugging Face, I ran into the issue of not being able to download the required libraries and model files directly from the workspace. I had to find a workaround: download everything on my MacBook and upload it to DBFS so the model could be used in my Databricks notebook.

This guide explains how to download the sentence-transformers/all-MiniLM-L6-v2 model locally (on macOS), prepare it for offline use, upload it to Databricks DBFS, and load it successfully in a notebook.
🧰 Step 1: Setup Local Environment
# Use a Python 3.10 interpreter to create the virtual environment (example path from the python.org installer)
/Library/Frameworks/Python.framework/Versions/3.10/bin/python3 -m venv venv
source venv/bin/activate
pip install sentence-transformers huggingface_hub
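Optionally, a quick sanity check inside the activated venv confirms both packages import cleanly; the versions printed are simply whatever pip resolved:
# Sanity check: both libraries should import without error inside the venv
import sentence_transformers, huggingface_hub
print(sentence_transformers.__version__, huggingface_hub.__version__)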
🧱 Step 2: Download Model Files (Manually via curl)
mkdir all-MiniLM-L6-v2 && cd all-MiniLM-L6-v2
# Required files (sudo is not needed; the files download into a directory you own)
curl -L -O https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config.json
curl -L -O https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/pytorch_model.bin
curl -L -O https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer_config.json
curl -L -O https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt
curl -L -O https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/sentence_bert_config.json
curl -L -O https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/modules.json
# Subfolder (the transformer config above lives at the repo root; only the pooling module has its own folder)
mkdir 1_Pooling
curl -L -o 1_Pooling/config.json https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/1_Pooling/config.json
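As an alternative to the manual curl calls: if your Mac has normal outbound access to huggingface.co, huggingface_hub (installed in Step 1) can mirror the whole repo in one call. A minimal sketch, assuming a reasonably recent huggingface_hub that supports local_dir:
# Pull every file in the repo into a local folder (alternative to the curl commands above)
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="sentence-transformers/all-MiniLM-L6-v2",
    local_dir="all-MiniLM-L6-v2",
)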
📦 Step 3: Zip & Upload to Databricks
cd all-MiniLM-L6-v2   # skip this if you are still inside the folder from Step 2
# Zip the contents so the model files sit at the top level of the archive
zip -r ../all-MiniLM-L6-v2.zip *
cd ..
databricks fs mkdirs dbfs:/FileStore/models/
databricks fs cp all-MiniLM-L6-v2.zip dbfs:/FileStore/models/all-MiniLM-L6-v2.zip --overwrite
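Before moving on, it is worth confirming the upload landed where expected. A quick check from any Databricks notebook cell (dbutils and display are notebook globals):
# List the uploaded zip and its size in DBFS
display(dbutils.fs.ls("dbfs:/FileStore/models/"))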
🧪 Step 4: Extract & Use in Databricks Notebook
# Copy the zip from DBFS to the driver's local disk, then extract it
dbutils.fs.cp("dbfs:/FileStore/models/all-MiniLM-L6-v2.zip", "file:/tmp/all-MiniLM-L6-v2.zip", True)

import zipfile
with zipfile.ZipFile("/tmp/all-MiniLM-L6-v2.zip", "r") as zip_ref:
    zip_ref.extractall("/tmp/all-MiniLM-L6-v2")

# Load the model from the local path and generate embeddings
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("/tmp/all-MiniLM-L6-v2")

sentences = ["Databricks is awesome.", "Transformers are powerful."]
embeddings = model.encode(sentences)
print(f"Embedding shape: {embeddings.shape}")
🧹 Tips
- Make sure to use curl -L to follow redirects.
- Verify that pytorch_model.bin is ~90 MB, not a small HTML error page.
- Adjust the extraction path if a nested folder appears inside the zip.
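The last two tips can be checked programmatically right after extraction; a small sketch assuming the /tmp paths used in Step 4:
# Inspect the extracted folder: look for an unexpected nested directory and confirm the weights size
import os
model_dir = "/tmp/all-MiniLM-L6-v2"
print(os.listdir(model_dir))
weights = os.path.join(model_dir, "pytorch_model.bin")
print(f"{os.path.getsize(weights) / 1e6:.1f} MB")  # expect roughly 90 MB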