The Ultimate Databricks dbutils Cheatsheet

You're switching between browser tabs again, hunting for that one dbutils command you used last month. I've distilled the most essential, high-impact commands into a single screen - the 20% you'll use 80% of the time.

Bookmark this for future reference


1. Filesystem Operations (dbutils.fs)

Essential File Operations

List files and directories:

# List contents of a directory
dbutils.fs.ls("/mnt/data/")
dbutils.fs.ls("s3://bucket-name/folder/")

# Note: dbutils.fs.ls is NOT recursive and takes no recursion flag;
# to walk a tree, recurse over the returned FileInfo entries yourself
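Since `dbutils.fs.ls` only lists a single level, a recursive walk needs a small helper. A minimal sketch (`list_recursive` is my own name, not a dbutils API; the `ls` parameter is injectable only so the helper can be exercised off-cluster — on Databricks just call `list_recursive("/mnt/data/")`):

```python
def list_recursive(path, ls=None):
    """Recursively collect all file paths under `path`.

    `ls` defaults to dbutils.fs.ls; it is a parameter only so the
    helper can be tested outside a Databricks cluster.
    """
    if ls is None:
        ls = dbutils.fs.ls  # available in Databricks notebooks
    files = []
    for info in ls(path):
        if info.isDir():
            # FileInfo exposes isDir() and path on Databricks
            files.extend(list_recursive(info.path, ls))
        else:
            files.append(info.path)
    return files
```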

File and directory management:

# Create directory
dbutils.fs.mkdirs("/mnt/data/new_folder/")

# Copy files/directories
dbutils.fs.cp("/mnt/source/file.csv", "/mnt/destination/file.csv")
dbutils.fs.cp("/mnt/source/", "/mnt/destination/", recurse=True)

# Move/rename files
dbutils.fs.mv("/mnt/old_location/file.csv", "/mnt/new_location/file.csv")

# Delete files/directories
dbutils.fs.rm("/mnt/data/unwanted_file.csv")
dbutils.fs.rm("/mnt/data/unwanted_folder/", recurse=True)

File content operations:

# Read the beginning of a file (returns up to 65536 bytes by default)
content = dbutils.fs.head("/mnt/data/config.txt")
print(content)

# Write content to file
dbutils.fs.put("/mnt/data/output.txt", "Hello World", overwrite=True)

Storage Mounting (Legacy)

Note: Unity Catalog External Locations are now the recommended best practice over manual mounting.

Mount Azure Data Lake Storage:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("key-vault", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("key-vault", "client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/{tenant-id}/oauth2/token"
}

dbutils.fs.mount(
    source="abfss://container@storage.dfs.core.windows.net/",
    mount_point="/mnt/storage",
    extra_configs=configs
)

Unmount storage:

dbutils.fs.unmount("/mnt/storage")
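Mounting a path that is already mounted raises an error, so it pays to check `dbutils.fs.mounts()` first. A hedged sketch (`is_mounted` is my own helper, not a dbutils API; the `mounts` parameter exists only to allow testing off-cluster):

```python
def is_mounted(mount_point, mounts=None):
    """Return True if `mount_point` already appears in the mount table.

    `mounts` defaults to dbutils.fs.mounts(); it is injectable only
    so the helper can be tested outside Databricks.
    """
    if mounts is None:
        mounts = dbutils.fs.mounts()  # list of MountInfo objects
    return any(m.mountPoint == mount_point for m in mounts)
```

On a cluster you would guard the mount call with it: `if not is_mounted("/mnt/storage"): dbutils.fs.mount(...)`.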

2. Secrets Management (dbutils.secrets)

Accessing Secrets

# List available secret scopes
dbutils.secrets.listScopes()

# List secrets in a scope
dbutils.secrets.list("my-secret-scope")

# Get secret value (will be redacted in output)
password = dbutils.secrets.get("my-secret-scope", "database-password")
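A common pattern is assembling connection options from secrets rather than literals. A sketch under assumptions: the scope, key names, host, and table below are illustrative placeholders, and `jdbc_options` is my own helper (the `get_secret` parameter is normally `dbutils.secrets.get`, injectable here only for off-cluster testing):

```python
def jdbc_options(get_secret, scope="db-secrets"):
    """Build Spark JDBC reader options from secrets.

    `get_secret` is normally dbutils.secrets.get; scope, key names,
    host, and table are hypothetical examples.
    """
    return {
        "url": "jdbc:postgresql://db-host:5432/analytics",  # hypothetical host
        "dbtable": "public.events",                          # hypothetical table
        "user": get_secret(scope, "database-user"),
        "password": get_secret(scope, "database-password"),
    }
```

On a cluster: `spark.read.format("jdbc").options(**jdbc_options(dbutils.secrets.get)).load()`.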

3. Notebook Workflows (dbutils.notebook)

Running Other Notebooks

# Run another notebook (60-second timeout) and get its exit value
result = dbutils.notebook.run("/Shared/data-processing", 60)

# Run with parameters
result = dbutils.notebook.run(
    "/Shared/etl-pipeline", 
    timeout_seconds=1800,
    arguments={"date": "2024-01-15", "env": "prod"}
)

Notebook Exit and Results

# Exit notebook with a custom result
dbutils.notebook.exit("Processing completed successfully")

# Exit with a JSON result for programmatic use
import json
result = {"status": "success", "records_processed": 1000}
dbutils.notebook.exit(json.dumps(result))
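When a child notebook exits with `json.dumps(...)` as above, the caller receives that JSON back as a plain string from `dbutils.notebook.run`. A small hedged helper for decoding it (`parse_exit_value` is my own name, not a dbutils API):

```python
import json

def parse_exit_value(raw):
    """Decode a JSON string returned by dbutils.notebook.run.

    Raises ValueError if the child notebook exited with something
    that is not valid JSON (e.g. a plain status string).
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Child notebook returned non-JSON exit value: {raw!r}") from e
```

In the calling notebook: `result = parse_exit_value(dbutils.notebook.run("/Shared/etl-pipeline", 1800))`, then read fields like `result["records_processed"]`.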

4. Library Management (dbutils.library)

Installing Libraries

Note: dbutils.library.installPyPI is deprecated and unavailable on newer Databricks Runtime versions; prefer the %pip install magic command there.

# Install PyPI package (older runtimes only)
dbutils.library.installPyPI("requests")
dbutils.library.installPyPI("pandas", version="1.5.0")

# Install and restart Python to make library available
dbutils.library.installPyPI("new-package")
dbutils.library.restartPython()

5. Widgets & Parameters (dbutils.widgets)

Creating and Using Widgets

# Create various types of input widgets
dbutils.widgets.text("environment", "dev", "Environment")
dbutils.widgets.dropdown("region", "us-east-1", ["us-east-1", "us-west-2"])
dbutils.widgets.multiselect("tables", "customers", ["customers", "orders"])

# Get widget values in your code
env = dbutils.widgets.get("environment")
region = dbutils.widgets.get("region")

# Remove widgets
dbutils.widgets.remove("environment")
dbutils.widgets.removeAll()

6. Job & Cluster Information

Getting Runtime Information

# Get context about the notebook's environment
context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()

# Get current user, notebook path, etc.
user = context.userName().get()
path = context.notebookPath().get()

# Get cluster ID from Spark config
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

7. Quick Reference Commands

# File operations
dbutils.fs.ls("/mnt/data/")
dbutils.fs.cp("/source/", "/destination/", recurse=True)
dbutils.fs.rm("/path/to/delete/", recurse=True)

# Secrets
secret = dbutils.secrets.get("scope-name", "secret-key")

# Run notebook
result = dbutils.notebook.run("/path/to/notebook", 3600, {"param": "value"})

# Install package & restart (older runtimes; prefer %pip install on newer ones)
dbutils.library.installPyPI("package-name")
dbutils.library.restartPython()

# Widgets
dbutils.widgets.text("param_name", "default_value", "Label")
param_value = dbutils.widgets.get("param_name")
dbutils.widgets.removeAll()

8. Common Pitfalls & Best Practices

❌ Common Mistakes

Two habits to avoid: letting file operations fail unhandled, and hardcoding secrets.

# BAD: an unhandled failure (permissions, invalid path, non-empty
# directory without recurse=True) aborts the whole notebook
dbutils.fs.rm("/path/that/might/not/exist")

# GOOD: Handle exceptions
try:
    dbutils.fs.rm("/path/that/might/not/exist")
except Exception as e:
    print(f"File deletion failed: {e}")

# BAD: Never hardcode secrets
password = "mypassword123"

# GOOD: Use secret management
password = dbutils.secrets.get("db-secrets", "password")

✅ Best Practices

Use descriptive widget labels, validate inputs, and use consistent naming conventions.

# Use clear, descriptive labels for widgets
dbutils.widgets.text("processing_date", "2024-01-01", "Data Processing Date (YYYY-MM-DD)")

# Validate inputs from widgets
import datetime
date_input = dbutils.widgets.get("processing_date")
try:
    datetime.datetime.strptime(date_input, "%Y-%m-%d")
except ValueError:
    raise ValueError("Date must be in YYYY-MM-DD format")