forked from apache/superset
feat(helm): Helm template for Celery beat (for reporting and alerting) (apache#13116)

* Custom superset_config.py + secret envs
* Helm template for celery beat
* Fix end of file
* Update helm/superset/values.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Rename pods
* Update helm/superset/values.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Update helm/superset/templates/deployment-beat.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Update helm/superset/templates/deployment-beat.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Update helm/superset/templates/deployment-beat.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Update helm/superset/templates/deployment-beat.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Update helm/superset/templates/deployment-beat.yaml
  Co-authored-by: Valentin Nourdin <[email protected]>
* Added Kubernetes documentation
* Data source declarations
  Co-authored-by: Valentin Nourdin <[email protected]>
Showing 6 changed files with 489 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,363 @@ | ||
--- | ||
name: Running on Kubernetes | ||
menu: Installation and Configuration | ||
route: /docs/installation/running-on-kubernetes | ||
index: 12 | ||
version: 1 | ||
--- | ||
|
||
## Running on Kubernetes | ||
|
||
Running on Kubernetes is supported with the [Helm](https://helm.sh/) chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset). | |
|
||
### Prerequisites | ||
|
||
* A Kubernetes cluster | ||
* Helm installed | ||
|
||
### Running | ||
|
||
1. Configure your setting overrides | ||
|
||
As with any Helm chart, you'll need to write a `values.yaml` file that overrides any of the values exposed in the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or in any of the charts it depends on: | |
|
||
* [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis) | ||
* [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql) | ||
|
||
See the Important settings section below for the overrides you are most likely to need. | |
|
||
1. Install and run | ||
|
||
```sh | ||
# From the root of the repository | ||
helm upgrade --install --values my-values.yaml my-superset helm/superset | ||
``` | ||
|
||
You should see various pods popping up, such as: | ||
|
||
```sh | ||
kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s | ||
superset-f5c9c667-dw9lp 1/1 Running 0 4m7s | ||
superset-f5c9c667-fk8bk 1/1 Running 0 4m11s | ||
superset-init-db-zlm9z 0/1 Completed 0 111s | ||
superset-postgresql-0 1/1 Running 0 6d20h | ||
superset-redis-master-0 1/1 Running 0 6d20h | ||
superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s | ||
superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s | ||
``` | ||
|
||
The exact list will depend on some of your specific configuration overrides but you should generally expect: | ||
|
||
* N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `replicaCount` value) | ||
* 1 `superset-postgresql-0` depending on your postgres settings | ||
* 1 `superset-redis-master-0` depending on your redis settings | ||
* 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides | ||
|
||
1. Access it | ||
|
||
The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either: | ||
|
||
* Configure the Service as a `LoadBalancer` or `NodePort` | ||
* Set up an `Ingress` for it - the chart includes a definition, but it will need to be tuned to your needs (hostname, TLS, annotations, etc.) | |
* Run `kubectl port-forward superset-xxxx-yyyy :8088` to tunnel one pod's port to a randomly chosen port on your localhost (kubectl prints the local port; use `8088:8088` to pin it) | |
|
||
Depending on how you configured external access, the URL will vary. Once you've identified the appropriate URL, you can log in with: | |
|
||
* user: `admin` | ||
* password: `admin` | ||
|
||
### Important settings | ||
|
||
#### Security settings | ||
|
||
Default security settings and passwords are included but you __SHOULD__ override those with your own, in particular: | ||
|
||
```yaml | |
postgresql: | |
  postgresqlPassword: superset | |
``` | |
#### Dependencies | ||
You can specify pip packages to be installed before startup, e.g. to install extra database drivers: | ||
```yaml | ||
additionalRequirements: | ||
- psycopg2 | ||
- redis | ||
- elasticsearch-dbapi | ||
- pymssql | ||
- gsheetsdb | ||
# Force version to work around https://github.com/betodealmeida/gsheets-db-api/issues/15 | |
- moz-sql-parser==4.9.21002 | ||
# For OAuth | ||
- Authlib | ||
# For webdriver / reports | ||
- gevent | ||
``` | ||
__WARNING__: This list replaces the one from the default `values.yaml` entirely; it is not _added_ to it. | |
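The replace-not-append behaviour follows from how Helm combines user values with chart defaults: nested maps are merged key by key, while lists and scalars from the override win outright. The following is a minimal Python sketch of that rule, not Helm's actual implementation. The `merge_values` helper, the sample defaults, and the `service.type` key are hypothetical and only there for illustration.

```python
# Sketch of Helm-style value merging: maps merge recursively,
# everything else (lists included) is replaced by the override.


def merge_values(defaults: dict, overrides: dict) -> dict:
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps are merged key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Lists and scalars from the override win outright.
            merged[key] = value
    return merged


# Hypothetical defaults, for illustration only (not the chart's real list).
defaults = {
    "additionalRequirements": ["psycopg2", "redis"],
    "service": {"type": "ClusterIP", "port": 8088},
}
overrides = {
    "additionalRequirements": ["pymssql"],
    "service": {"type": "NodePort"},
}

result = merge_values(defaults, overrides)
print(result["additionalRequirements"])  # ['pymssql']
print(result["service"])  # {'type': 'NodePort', 'port': 8088}
```

In practice this means that copying the default `additionalRequirements` entries into your own list is the simplest way to keep them.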
|
||
#### superset_config.py | ||
|
||
The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.: | ||
|
||
```yaml | ||
configOverrides: | ||
my_override: | | ||
# This will make sure the redirect_uri is properly computed, even with SSL offloading | ||
ENABLE_PROXY_FIX = True | ||
FEATURE_FLAGS = { | ||
"DYNAMIC_PLUGINS": True | ||
} | ||
``` | ||
|
||
These are evaluated as Helm templates and can therefore reference other `values.yaml` variables, e.g. `{{ index .Values.ingress.hosts 0 }}` will resolve to your ingress external domain. | |
|
||
The entire `superset_config.py` is installed as a secret, so it is safe to pass sensitive parameters directly. However, it may be more readable to use secret env variables for that. | |
|
||
Full Python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py my-superset helm/superset` | |
|
||
#### Environment Variables | ||
|
||
These can be passed as key/value pairs with `extraEnv`, or with `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`. | |
|
||
```yaml | |
extraEnv: | |
  SMTP_HOST: smtp.gmail.com | |
  SMTP_USER: [email protected] | |
  SMTP_PORT: "587" | |
  SMTP_MAIL_FROM: [email protected] | |
extraSecretEnv: | |
  SMTP_PASSWORD: xxxx | |
configOverrides: | |
  smtp: | | |
    import ast | |
    SMTP_HOST = os.getenv("SMTP_HOST","localhost") | |
    SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) | |
    SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) | |
    SMTP_USER = os.getenv("SMTP_USER","superset") | |
    # Environment values are strings, so convert the port explicitly | |
    SMTP_PORT = int(os.getenv("SMTP_PORT", "25")) | |
    SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") | |
``` | |
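Environment variables always arrive as strings, and `ast.literal_eval` only accepts Python literals (`True`, not `true`). The sketch below shows one way to read typed settings in an override. The `env_bool` and `env_int` helpers are hypothetical names introduced here, not part of Superset or the chart, and the two `os.environ` assignments only simulate what `extraEnv` would inject.

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Read a boolean flag such as SMTP_STARTTLS from the environment."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Accept the usual spellings regardless of case.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int) -> int:
    """Read an integer such as SMTP_PORT; raises ValueError on bad input."""
    raw = os.getenv(name)
    return default if raw is None else int(raw)


# Simulate what `extraEnv` would inject into the pod.
os.environ["SMTP_PORT"] = "587"
os.environ["SMTP_STARTTLS"] = "true"

SMTP_PORT = env_int("SMTP_PORT", 25)
SMTP_STARTTLS = env_bool("SMTP_STARTTLS", True)
```

Pasted into a `configOverrides` entry, the helpers would replace the `ast.literal_eval` calls. Whether the extra tolerance is worth the added lines is a judgment call.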
|
||
#### System packages | ||
|
||
If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.: | ||
|
||
```yaml | |
supersetWorker: | |
  command: | |
  - /bin/sh | |
  - -c | |
  - | | |
    apt update | |
    apt install -y somepackage | |
    apt autoremove -yqq --purge | |
    apt clean | |
    # Run celery worker | |
    . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker | |
``` | |
|
||
#### Data sources | ||
|
||
Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`: | ||
|
||
```yaml | |
extraConfigs: | |
  datasources-init.yaml: | | |
    databases: | |
    - allow_csv_upload: true | |
      allow_ctas: true | |
      allow_cvas: true | |
      database_name: example-db | |
      extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\ | |
        metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_csv_upload\": []\r\n\ | |
        }" | |
      sqlalchemy_uri: example://example-db.local | |
      tables: [] | |
``` | |
|
||
Those will also be mounted as secrets and can include sensitive parameters. | ||
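The `extra` field above is a JSON document escaped into a YAML string, which is easy to get wrong by hand. As a rough sketch, it can be generated with Python's standard `json` module instead. The keys are the ones from the example above; the variable names are illustrative.

```python
import json

# The settings to store in the database's `extra` field.
extra = {
    "metadata_params": {},
    "engine_params": {},
    "metadata_cache_timeout": {},
    "schemas_allowed_for_csv_upload": [],
}

# First dumps: the JSON document itself.
extra_json = json.dumps(extra, indent=2)
# Second dumps: quote that document as a JSON string. A JSON string is also
# a valid YAML double-quoted scalar, so it can be pasted after `extra:`.
yaml_scalar = json.dumps(extra_json)

print(f"extra: {yaml_scalar}")
```

The printed line can be pasted into the `databases` entry as a single-line replacement for the hand-escaped value.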
|
||
### Configuration Examples | ||
|
||
#### Setting up OAuth | ||
|
||
```yaml | |
extraEnv: | |
  AUTH_DOMAIN: example.com | |
extraSecretEnv: | |
  GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com | |
  GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx | |
configOverrides: | |
  enable_oauth: | | |
    # This will make sure the redirect_uri is properly computed, even with SSL offloading | |
    ENABLE_PROXY_FIX = True | |
    from flask_appbuilder.security.manager import (AUTH_OAUTH, AUTH_DB) | |
    AUTH_TYPE = AUTH_OAUTH | |
    OAUTH_PROVIDERS = [ | |
        { | |
            "name": "google", | |
            "icon": "fa-google", | |
            "token_key": "access_token", | |
            "remote_app": { | |
                "client_id": os.getenv("GOOGLE_KEY"), | |
                "client_secret": os.getenv("GOOGLE_SECRET"), | |
                "api_base_url": "https://www.googleapis.com/oauth2/v2/", | |
                "client_kwargs": {"scope": "email profile"}, | |
                "request_token_url": None, | |
                "access_token_url": "https://accounts.google.com/o/oauth2/token", | |
                "authorize_url": "https://accounts.google.com/o/oauth2/auth", | |
                "authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")} | |
            }, | |
        } | |
    ] | |
    # Map Authlib roles to superset roles | |
    AUTH_ROLE_ADMIN = 'Admin' | |
    AUTH_ROLE_PUBLIC = 'Public' | |
    # Allow user self registration, creating Flask users from authorized users | |
    AUTH_USER_REGISTRATION = True | |
    # The default user self registration role; every new user gets it, | |
    # so consider a less privileged role such as "Gamma" in production | |
    AUTH_USER_REGISTRATION_ROLE = "Admin" | |
``` | |
|
||
#### Enable Alerts and Reports | ||
|
||
For this, as per the [Alerts and Reports doc](/docs/installation/email-reports), you will need to: | ||
|
||
##### Install a supported webdriver in the Celery worker | ||
|
||
This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`: | ||
|
||
```yaml | |
supersetWorker: | |
  command: | |
  - /bin/sh | |
  - -c | |
  - | | |
    # Install chrome webdriver | |
    # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx | |
    apt update | |
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb | |
    apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb | |
    wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip | |
    unzip chromedriver_linux64.zip | |
    chmod +x chromedriver | |
    mv chromedriver /usr/bin | |
    apt autoremove -yqq --purge | |
    apt clean | |
    rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip | |
    # Run | |
    . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker | |
``` | |
|
||
##### Run the Celery beat | ||
|
||
This pod will trigger the scheduled tasks configured in the alerts and reports UI section: | ||
|
||
```yaml | |
supersetCeleryBeat: | |
  enabled: true | |
``` | |
|
||
##### Configure the appropriate Celery jobs and SMTP/Slack settings | ||
|
||
```yaml | |
extraEnv: | |
  SMTP_HOST: smtp.gmail.com | |
  SMTP_USER: [email protected] | |
  SMTP_PORT: "587" | |
  SMTP_MAIL_FROM: [email protected] | |
extraSecretEnv: | |
  SLACK_API_TOKEN: xoxb-xxxx-yyyy | |
  SMTP_PASSWORD: xxxx-yyyy | |
configOverrides: | |
  feature_flags: | | |
    import ast | |
    FEATURE_FLAGS = { | |
        "ALERT_REPORTS": True | |
    } | |
    SMTP_HOST = os.getenv("SMTP_HOST","localhost") | |
    SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) | |
    SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) | |
    SMTP_USER = os.getenv("SMTP_USER","superset") | |
    # Environment values are strings, so convert the port explicitly | |
    SMTP_PORT = int(os.getenv("SMTP_PORT", "25")) | |
    SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") | |
    SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","[email protected]") | |
    SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None) | |
  celery_conf: | | |
    from celery.schedules import crontab | |
    class CeleryConfig(object): | |
        BROKER_URL = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" | |
        CELERY_IMPORTS = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", ) | |
        CELERY_RESULT_BACKEND = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" | |
        CELERY_ANNOTATIONS = { | |
            'sql_lab.get_sql_results': { | |
                'rate_limit': '100/s', | |
            }, | |
            'email_reports.send': { | |
                'rate_limit': '1/s', | |
                'time_limit': 600, | |
                'soft_time_limit': 600, | |
                'ignore_result': True, | |
            }, | |
        } | |
        CELERYBEAT_SCHEDULE = { | |
            'reports.scheduler': { | |
                'task': 'reports.scheduler', | |
                'schedule': crontab(minute='*', hour='*'), | |
            }, | |
            'reports.prune_log': { | |
                'task': 'reports.prune_log', | |
                'schedule': crontab(minute=0, hour=0), | |
            }, | |
            'cache-warmup-hourly': { | |
                'task': 'cache-warmup', | |
                'schedule': crontab(minute='*/30', hour='*'), | |
                'kwargs': { | |
                    'strategy_name': 'top_n_dashboards', | |
                    'top_n': 10, | |
                    'since': '7 days ago', | |
                }, | |
            } | |
        } | |
    CELERY_CONFIG = CeleryConfig | |
  reports: | | |
    EMAIL_PAGE_RENDER_WAIT = 60 | |
    WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/" | |
    WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/" | |
    WEBDRIVER_TYPE = "chrome" | |
    WEBDRIVER_OPTION_ARGS = [ | |
        "--force-device-scale-factor=2.0", | |
        "--high-dpi-support=2.0", | |
        "--headless", | |
        "--disable-gpu", | |
        "--disable-dev-shm-usage", | |
        # This is required because our process runs as root (in order to install pip packages) | |
        "--no-sandbox", | |
        "--disable-setuid-sandbox", | |
        "--disable-extensions", | |
    ] | |
``` |
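The broker and result backend URLs above are built from `REDIS_HOST` and `REDIS_PORT`. If either variable is missing, an f-string silently produces a URL such as `redis://None:None/0`. The sketch below fails fast instead. It assumes the `env(...)` calls above simply wrap `os.getenv`; the `redis_url` helper and the host name are hypothetical.

```python
import os


def redis_url(db: int = 0) -> str:
    """Build the Redis URL, failing fast if the settings are missing."""
    host = os.getenv("REDIS_HOST")
    port = os.getenv("REDIS_PORT")
    if not host or not port:
        raise RuntimeError("REDIS_HOST and REDIS_PORT must both be set")
    # int() also rejects a non-numeric port early.
    return f"redis://{host}:{int(port)}/{db}"


# Hypothetical values standing in for what the chart injects into the pod.
os.environ["REDIS_HOST"] = "superset-redis-headless"
os.environ["REDIS_PORT"] = "6379"

print(redis_url())  # redis://superset-redis-headless:6379/0
```

Used inside `CeleryConfig`, the helper would replace the two f-strings, so that a misconfigured pod raises an error when the configuration is loaded rather than when Celery first tries to connect.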