This is a sample project for Databricks, generated via cookiecutter.
While using this project, you need Python 3.X and pip or conda for package management.
pip install -r unit-requirements.txt
pip install -e .
For local unit testing, please use pytest:
pytest tests/unit
For an integration test on an interactive cluster, use the following command:
dbx execute --cluster-name=<name of interactive cluster> --job=lendingclub_scoring_dbx-sample-integration-test
For a test on an automated job cluster, use launch instead of execute:
dbx launch --job=lendingclub_scoring_dbx-sample-integration-test
- dbx expects that the cluster used for interactive execution supports the %pip and %conda magic commands.
- Please configure your job in the conf/deployment.json file.
- To execute the code interactively, provide either --cluster-id or --cluster-name.
dbx execute \
--cluster-name="<some-cluster-name>" \
--job=job-name
Multiple users can also use the same cluster for development. Libraries are isolated per execution context.
The next step is to configure your deployment objects. To make this process easy and flexible, we're using JSON for configuration.
By default, deployment configuration is stored in conf/deployment.json.
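To check which jobs a configuration defines before deploying, a small script can read the file. The sketch below is not part of the project. It assumes a layout in which each top-level key is an environment name (here "default") holding a "jobs" list whose entries carry a "name" field; verify this against your own conf/deployment.json. The helper name job_names and the trimmed example configuration are hypothetical.

```python
import json

# Hypothetical, heavily trimmed configuration. A real file would also
# carry cluster and task settings for each job.
EXAMPLE_CONFIG = """
{
  "default": {
    "jobs": [
      {"name": "lendingclub_scoring_dbx-sample"},
      {"name": "lendingclub_scoring_dbx-sample-integration-test"}
    ]
  }
}
"""


def job_names(config_text, environment="default"):
    """Return the job names for one environment.

    Assumes the layout described above: environment -> "jobs" -> "name".
    An unknown environment yields an empty list.
    """
    config = json.loads(config_text)
    return [job["name"] for job in config.get(environment, {}).get("jobs", [])]
```

For a real file, pass the contents of conf/deployment.json (for example, read with pathlib.Path.read_text) instead of EXAMPLE_CONFIG.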
To start a new deployment, run the following command:
dbx deploy
You can optionally provide a requirements.txt file via the --requirements option; all requirements will be automatically added to the job definition.
After the deployment, launch the job with the following command:
dbx launch --job=lendingclub_scoring_dbx-sample
Please set the following secrets or environment variables. Follow the documentation for GitHub Actions or for Azure DevOps Pipelines:
DATABRICKS_HOST
DATABRICKS_TOKEN
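A pipeline step can fail fast when either variable is missing, rather than surfacing the problem later as an authentication error. The following is a minimal sketch, not part of the project; the helper name missing_databricks_vars is hypothetical.

```python
import os

# The two variables named above.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to test
    the check without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

In a CI script, call missing_databricks_vars() and exit with a non-zero status if the returned list is not empty.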