diff --git a/docs/src/pages/docs/installation/installing_scratch.mdx b/docs/src/pages/docs/installation/installing_scratch.mdx index ba7bc241126ee..39e36ea23283a 100644 --- a/docs/src/pages/docs/installation/installing_scratch.mdx +++ b/docs/src/pages/docs/installation/installing_scratch.mdx @@ -119,14 +119,4 @@ locally by default at `localhost:8088`) and login using the username and passwor ### Installing Superset with Helm in Kubernetes -You can install Superset into Kubernetes with [Helm](https://helm.sh/). The chart is located in -`install/helm`. - -To install Superset in Kubernetes, run: - -``` -helm upgrade --install superset ./install/helm/superset -``` - -Note that the above command will install Superset into `default` namespace of your Kubernetes -cluster. +See the dedicated [Kubernetes installation](/docs/installation/running-on-kubernetes) page. diff --git a/docs/src/pages/docs/installation/kubernetes.mdx b/docs/src/pages/docs/installation/kubernetes.mdx new file mode 100644 index 0000000000000..edd8b1d26b096 --- /dev/null +++ b/docs/src/pages/docs/installation/kubernetes.mdx @@ -0,0 +1,363 @@ +--- +name: Running on Kubernetes +menu: Installation and Configuration +route: /docs/installation/running-on-kubernetes +index: 12 +version: 1 +--- + +## Running on Kubernetes + +Running on Kubernetes is supported with the provided [Helm](https://helm.sh/) chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset). + +### Prerequisites + +* A Kubernetes cluster +* Helm installed + +### Running + +1. 
Configure your settings overrides + +Just like any typical Helm chart, you'll need to craft a `values.yaml` file that defines/overrides any of the values exposed in the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or in any of the charts it depends on: + +* [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis) +* [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql) + +See below for more information on some important overrides you might need. + +1. Install and run + +```sh +# From the root of the repository +helm upgrade --install --values my-values.yaml my-superset helm/superset +``` + +You should see various pods popping up, such as: + +```sh +kubectl get pods +NAME READY STATUS RESTARTS AGE +superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s +superset-f5c9c667-dw9lp 1/1 Running 0 4m7s +superset-f5c9c667-fk8bk 1/1 Running 0 4m11s +superset-init-db-zlm9z 0/1 Completed 0 111s +superset-postgresql-0 1/1 Running 0 6d20h +superset-redis-master-0 1/1 Running 0 6d20h +superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s +superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s +``` + +The exact list will depend on some of your specific configuration overrides but you should generally expect: + +* N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `replicaCount` value) +* 1 `superset-postgresql-0` depending on your postgres settings +* 1 `superset-redis-master-0` depending on your redis settings +* 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides + +1. Access it + +The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. 
To access it externally you will have to either: + +* Configure the Service as a `LoadBalancer` or `NodePort` +* Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, TLS, annotations, etc.) +* Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost + +Depending on how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with: + +* user: `admin` +* password: `admin` + +### Important settings + +#### Security settings + +Default security settings and passwords are included but you __SHOULD__ override those with your own, in particular: + +```yaml +postgresql: + postgresqlPassword: superset +``` + +#### Dependencies + +You can specify pip packages to be installed before startup, e.g. to install extra database drivers: + +```yaml +additionalRequirements: + - psycopg2 + - redis + - elasticsearch-dbapi + - pymssql + - gsheetsdb + # Force version to work around https://github.com/betodealmeida/gsheets-db-api/issues/15 + - moz-sql-parser==4.9.21002 + # For OAuth + - Authlib + # For webdriver / reports + - gevent +``` + +__WARNING__: The list will replace the default one from the default `values.yaml` entirely, not _add_ to it... + +#### superset_config.py + +The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.: + +```yaml +configOverrides: + my_override: | + # This will make sure the redirect_uri is properly computed, even with SSL offloading + ENABLE_PROXY_FIX = True + FEATURE_FLAGS = { + "DYNAMIC_PLUGINS": True + } +``` + +Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ index .Values.ingress.hosts 0 }}` will resolve to your ingress external domain. 
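+
+As a minimal sketch of such a templated override (the `external_host` key and the `EXTERNAL_HOST` variable are hypothetical names used only for illustration, and this assumes `ingress.hosts` is a non-empty list of hostnames in your values):
+
+```yaml
+configOverrides:
+  external_host: |
+    # Rendered by Helm before it is written into superset_config.py
+    EXTERNAL_HOST = "{{ index .Values.ingress.hosts 0 }}"
+```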
+ +The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly. However, it might be more readable to use secret env variables for that. + +Full Python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py my-superset helm/superset` + +#### Environment Variables + +These can be passed as key/value pairs with `extraEnv`, or with `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`. + +```yaml +extraEnv: + SMTP_HOST: smtp.gmail.com + SMTP_USER: user@gmail.com + SMTP_PORT: "587" + SMTP_MAIL_FROM: user@gmail.com + +extraSecretEnv: + SMTP_PASSWORD: xxxx + +configOverrides: + smtp: | + import ast + SMTP_HOST = os.getenv("SMTP_HOST","localhost") + SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) + SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) + SMTP_USER = os.getenv("SMTP_USER","superset") + SMTP_PORT = os.getenv("SMTP_PORT",25) + SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") +``` + +#### System packages + +If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.: + +```yaml +supersetWorker: + command: + - /bin/sh + - -c + - | + apt update + apt install -y somepackage + apt autoremove -yqq --purge + apt clean + + # Run celery worker + . 
{{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker +``` + +#### Data sources + +Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`: + +```yaml +extraConfigs: + datasources-init.yaml: | + databases: + - allow_csv_upload: true + allow_ctas: true + allow_cvas: true + database_name: example-db + extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\ + metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_csv_upload\": []\r\n\ + }" + sqlalchemy_uri: example://example-db.local + tables: [] +``` + +Those will also be mounted as secrets and can include sensitive parameters. + +### Configuration Examples + +#### Setting up OAuth + +```yaml +extraEnv: + AUTH_DOMAIN: example.com + +extraSecretEnv: + GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com + GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx + +configOverrides: + enable_oauth: | + # This will make sure the redirect_uri is properly computed, even with SSL offloading + ENABLE_PROXY_FIX = True + + from flask_appbuilder.security.manager import (AUTH_OAUTH, AUTH_DB) + AUTH_TYPE = AUTH_OAUTH + OAUTH_PROVIDERS = [ + { + "name": "google", + "icon": "fa-google", + "token_key": "access_token", + "remote_app": { + "client_id": os.getenv("GOOGLE_KEY"), + "client_secret": os.getenv("GOOGLE_SECRET"), + "api_base_url": "https://www.googleapis.com/oauth2/v2/", + "client_kwargs": {"scope": "email profile"}, + "request_token_url": None, + "access_token_url": "https://accounts.google.com/o/oauth2/token", + "authorize_url": "https://accounts.google.com/o/oauth2/auth", + "authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")} + }, + } + ] + + # Map Authlib roles to superset roles + AUTH_ROLE_ADMIN = 'Admin' + AUTH_ROLE_PUBLIC = 'Public' + + # Will allow user self registration, allowing to create Flask users from Authorized User + AUTH_USER_REGISTRATION = True + + # The default user 
self registration role + AUTH_USER_REGISTRATION_ROLE = "Admin" +``` + +#### Enable Alerts and Reports + +For this, as per the [Alerts and Reports doc](/docs/installation/email-reports), you will need to: + +##### Install a supported webdriver in the Celery worker + +This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`: + +```yaml +supersetWorker: + command: + - /bin/sh + - -c + - | + # Install chrome webdriver + # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx + apt update + wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb + apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb + wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip + unzip chromedriver_linux64.zip + chmod +x chromedriver + mv chromedriver /usr/bin + apt autoremove -yqq --purge + apt clean + rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip + + # Run + . 
{{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker +``` + +##### Run the Celery beat + +This pod will trigger the scheduled tasks configured in the alerts and reports UI section: + +```yaml +supersetCeleryBeat: + enabled: true +``` + +##### Configure the appropriate Celery jobs and SMTP/Slack settings + +```yaml +extraEnv: + SMTP_HOST: smtp.gmail.com + SMTP_USER: user@gmail.com + SMTP_PORT: "587" + SMTP_MAIL_FROM: user@gmail.com + +extraSecretEnv: + SLACK_API_TOKEN: xoxb-xxxx-yyyy + SMTP_PASSWORD: xxxx-yyyy + +configOverrides: + feature_flags: | + import ast + + FEATURE_FLAGS = { + "ALERT_REPORTS": True + } + + SMTP_HOST = os.getenv("SMTP_HOST","localhost") + SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) + SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) + SMTP_USER = os.getenv("SMTP_USER","superset") + SMTP_PORT = os.getenv("SMTP_PORT",25) + SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") + SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com") + + SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None) + celery_conf: | + from celery.schedules import crontab + + class CeleryConfig(object): + BROKER_URL = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" + CELERY_IMPORTS = ('superset.sql_lab', ) + CELERY_RESULT_BACKEND = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" + CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}} + CELERY_IMPORTS = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", ) + CELERY_ANNOTATIONS = { + 'sql_lab.get_sql_results': { + 'rate_limit': '100/s', + }, + 'email_reports.send': { + 'rate_limit': '1/s', + 'time_limit': 600, + 'soft_time_limit': 600, + 'ignore_result': True, + }, + } + CELERYBEAT_SCHEDULE = { + 'reports.scheduler': { + 'task': 'reports.scheduler', + 'schedule': crontab(minute='*', hour='*'), + }, + 'reports.prune_log': { + 'task': 'reports.prune_log', + 'schedule': crontab(minute=0, hour=0), + }, 
+ 'cache-warmup-hourly': { + 'task': 'cache-warmup', + 'schedule': crontab(minute='*/30', hour='*'), + 'kwargs': { + 'strategy_name': 'top_n_dashboards', + 'top_n': 10, + 'since': '7 days ago', + }, + } + } + + CELERY_CONFIG = CeleryConfig + reports: | + EMAIL_PAGE_RENDER_WAIT = 60 + WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/" + WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/" + WEBDRIVER_TYPE= "chrome" + WEBDRIVER_OPTION_ARGS = [ + "--force-device-scale-factor=2.0", + "--high-dpi-support=2.0", + "--headless", + "--disable-gpu", + "--disable-dev-shm-usage", + # This is required because our process runs as root (in order to install pip packages) + "--no-sandbox", + "--disable-setuid-sandbox", + "--disable-extensions", + ] +``` diff --git a/helm/superset/templates/deployment-beat.yaml b/helm/superset/templates/deployment-beat.yaml new file mode 100644 index 0000000000000..dc147d9262699 --- /dev/null +++ b/helm/superset/templates/deployment-beat.yaml @@ -0,0 +1,95 @@ +{{- if .Values.supersetCeleryBeat.enabled -}} +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ template "superset.fullname" . 
}}-celerybeat + labels: + app: {{ template "superset.name" . }}-celerybeat + chart: {{ template "superset.chart" . }} + release: {{ .Release.Name }} + heritage: {{ .Release.Service }} +spec: + # This must be a singleton + replicas: 1 + selector: + matchLabels: + app: {{ template "superset.name" . }}-celerybeat + release: {{ .Release.Name }} + template: + metadata: + annotations: + checksum/superset_config.py: {{ include "superset-config" . | sha256sum }} + checksum/connections: {{ .Values.supersetNode.connections | toYaml | sha256sum }} + checksum/extraConfigs: {{ .Values.extraConfigs | toYaml | sha256sum }} + checksum/extraSecretEnv: {{ .Values.extraSecretEnv | toYaml | sha256sum }} + checksum/configOverrides: {{ .Values.configOverrides | toYaml | sha256sum }} + {{ if .Values.supersetCeleryBeat.forceReload }} + # Optionally force the thing to reload + force-reload: {{ randAlphaNum 5 | quote }} + {{ end }} + labels: + app: {{ template "superset.name" . }}-celerybeat + release: {{ .Release.Name }} + spec: + securityContext: + runAsUser: 0 # Needed in order to allow pip install to work in bootstrap + {{- if .Values.supersetCeleryBeat.initContainers }} + initContainers: + {{- tpl (toYaml .Values.supersetCeleryBeat.initContainers) . | nindent 6 }} + {{- end }} + containers: + - name: {{ .Chart.Name }} + image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + command: {{ tpl (toJson .Values.supersetCeleryBeat.command) . }} + env: + - name: "SUPERSET_PORT" + value: {{ .Values.service.port | quote}} + {{ if .Values.extraEnv }} + {{- range $key, $value := .Values.extraEnv }} + - name: {{ $key | quote}} + value: {{ $value | quote }} + {{- end }} + {{- end }} + envFrom: + - secretRef: + name: {{ tpl .Values.envFromSecret . 
| quote }} + volumeMounts: + - name: superset-config + mountPath: {{ .Values.configMountPath | quote }} + readOnly: true + resources: +{{ toYaml .Values.resources | indent 12 }} + {{- with .Values.nodeSelector }} + nodeSelector: +{{ toYaml . | indent 8 }} + {{- end }} + {{- with .Values.affinity }} + affinity: +{{ toYaml . | indent 8 }} + {{- end }} + {{- with .Values.tolerations }} + tolerations: +{{ toYaml . | indent 8 }} + {{- end }} + volumes: + - name: superset-config + secret: + secretName: {{ tpl .Values.configFromSecret . }} +{{- end -}} diff --git a/helm/superset/templates/deployment-worker.yaml b/helm/superset/templates/deployment-worker.yaml index 47a8af651810c..894fbb01d869a 100644 --- a/helm/superset/templates/deployment-worker.yaml +++ b/helm/superset/templates/deployment-worker.yaml @@ -31,11 +31,16 @@ spec: release: {{ .Release.Name }} template: metadata: - {{ if .Values.supersetWorker.forceReload }} annotations: + checksum/superset_config.py: {{ include "superset-config" . | sha256sum }} + checksum/connections: {{ .Values.supersetNode.connections | toYaml | sha256sum }} + checksum/extraConfigs: {{ .Values.extraConfigs | toYaml | sha256sum }} + checksum/extraSecretEnv: {{ .Values.extraSecretEnv | toYaml | sha256sum }} + checksum/configOverrides: {{ .Values.configOverrides | toYaml | sha256sum }} + {{ if .Values.supersetWorker.forceReload }} # Optionally force the thing to reload force-reload: {{ randAlphaNum 5 | quote }} - {{ end }} + {{ end }} labels: app: {{ template "superset.name" . }}-worker release: {{ .Release.Name }} diff --git a/helm/superset/templates/deployment.yaml b/helm/superset/templates/deployment.yaml index b1053341a2a9f..8fda0cd8aa663 100644 --- a/helm/superset/templates/deployment.yaml +++ b/helm/superset/templates/deployment.yaml @@ -38,10 +38,12 @@ spec: checksum/superset_bootstrap.sh: {{ include "superset-bootstrap" . 
| sha256sum }} checksum/connections: {{ .Values.supersetNode.connections | toYaml | sha256sum }} checksum/extraConfigs: {{ .Values.extraConfigs | toYaml | sha256sum }} + checksum/extraSecretEnv: {{ .Values.extraSecretEnv | toYaml | sha256sum }} + checksum/configOverrides: {{ .Values.configOverrides | toYaml | sha256sum }} {{- if .Values.supersetNode.forceReload }} - # Optionally force the thing to reload unconditionally + # Optionally force the thing to reload force-reload: {{ randAlphaNum 5 | quote }} - {{- end }} + {{- end }} labels: app: {{ template "superset.name" . }} release: {{ .Release.Name }} diff --git a/helm/superset/values.yaml b/helm/superset/values.yaml index 5991658ec0c5b..ab247f2d8abf7 100644 --- a/helm/superset/values.yaml +++ b/helm/superset/values.yaml @@ -168,6 +168,25 @@ supersetWorker: name: '{{ tpl .Values.envFromSecret . }}' command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ] +## +## Superset beat configuration (to trigger scheduled jobs like reports) +supersetCeleryBeat: + # This is only required if you intend to use alerts and reports + enabled: false + command: + - "/bin/sh" + - "-c" + - ". {{ .Values.configMountPath }}/superset_bootstrap.sh; celery beat --app=superset.tasks.celery_app:app --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule" + forceReload: false # If true, forces deployment to reload on each upgrade + initContainers: + - name: wait-for-postgres + image: busybox:latest + imagePullPolicy: IfNotPresent + envFrom: + - secretRef: + name: '{{ tpl .Values.envFromSecret . }}' + command: [ "/bin/sh", "-c", "until nc -zv $DB_HOST $DB_PORT -w1; do echo 'waiting for db'; sleep 1; done" ] + ## ## Init job configuration init: