
Local env with spark+delta

Minimal example of a local Python setup with Spark and Delta Lake that allows unit-testing Spark/Delta code via pytest.

The setup is inspired by dbx by Databricks.

Delta

To include Delta in the Spark session created by pytest, the spark fixture in ./tests/conftest.py runs configure_spark_with_delta_pip and adds the following settings to the Spark config:

  • spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
  • spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog

See https://docs.delta.io/3.2.0/quick-start.html#python for more info.

Development

Requirements:

Setup Virtual environment

The following commands create and activate a virtual environment.

  • The [dev] extra also installs development tools.
  • The --editable flag makes the CLI script available.

Commands:

  • Makefile:
    make requirements
    source .venv/bin/activate
  • Windows:
    python -m venv .venv
    .venv\Scripts\activate
    python -m pip install --upgrade uv
    uv pip install --editable .[dev]

Updating locked dependencies

To lock dependencies from pyproject.toml into requirements.txt files:

  • Without dev dependencies:

    uv pip compile pyproject.toml -o requirements.txt
    
  • With dev dependencies:

    uv pip compile pyproject.toml --extra dev -o requirements-dev.txt
    
  • We use uv pip install instead of uv pip sync to also allow an editable install.

Windows

I recommend using WSL instead, as even with the additional Hadoop libraries Spark/Delta occasionally simply freezes on Windows.

To run this natively on Windows you need additional Hadoop libraries, see https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems.

"In particular, %HADOOP_HOME%\BIN\WINUTILS.EXE must be locatable."

  1. Download the bin directory https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin (required files: hadoop.dll and winutils.exe)
  2. Set the environment variable HADOOP_HOME to the directory containing that bin directory (so that %HADOOP_HOME%\bin\winutils.exe exists)

Run tests

  • Makefile:
    make test
  • Windows:
    pytest -vv
