Beam Playground facilitates trying out and adopting Apache Beam by giving prospective Beam users a quick way to see and run examples of Apache Beam pipelines in a web interface that requires no setup.
See playground/README.md for details on installing development dependencies.
This section describes what is needed to run the backend application.
- Go commands to run/test the backend locally
- Set up environment variables to run the backend locally
- Running the backend via Docker
Go to the backend directory:
cd backend
To run the backend server on a development machine without Docker, first prepare a working directory anywhere outside the Beam source tree:
mkdir -p ~/path/to/workdir
Then copy datasets/, configs/, and logging.properties from the playground/backend/ directory into it:
cp -r {logging.properties,datasets/,configs/} ~/path/to/workdir
To start the backend for the Go SDK, you also need to create a prepared mod directory and export an additional environment variable:
export PREPARED_MOD_DIR=~/path/to/workdir/prepared_folder
SDK_TAG=2.44.0 bash ./containers/go/setup_sdk.sh $PREPARED_MOD_DIR
The following command will build and serve the backend locally:
SERVER_PORT=<port> \
BEAM_SDK=<beam_sdk_type> \
APP_WORK_DIR=<path_to_workdir> \
DATASTORE_EMULATOR_HOST=127.0.0.1:8888 \
DATASTORE_PROJECT_ID=test \
SDK_CONFIG=../sdks-emulator.yaml \
go run ./cmd/server
where <port> is the port on which the backend server should be available; <beam_sdk_type> is the desired Beam SDK, one of SDK_UNSPECIFIED, SDK_JAVA, SDK_PYTHON, SDK_GO, or SDK_SCIO; and <path_to_workdir> is the path to your working directory, e.g. ~/path/to/workdir.
Run the following command to generate a release build file:
go build ./cmd/server/server.go
Playground tests may be run using this command:
go test ./... -v
The full list of commands can be found here.
These environment variables should be set to run the backend locally:
- BEAM_SDK: the SDK that the backend instance processes (SDK_GO, SDK_JAVA, SDK_PYTHON, SDK_SCIO, or SDK_UNSPECIFIED).
- APP_WORK_DIR: the directory in which the folders for each code processing request are created.
- PREPARED_MOD_DIR: the directory where the prepared go.mod and go.sum files are placed. It is used only for the Go SDK.
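The requirements above can be sketched as a small pre-flight check. This is only an illustration under stated assumptions: checkRequiredEnv is a hypothetical helper, not a function from the backend, and the rule that PREPARED_MOD_DIR is required only for SDK_GO is inferred from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// allowedSDKs lists the BEAM_SDK values named in this README.
var allowedSDKs = map[string]bool{
	"SDK_UNSPECIFIED": true,
	"SDK_JAVA":        true,
	"SDK_PYTHON":      true,
	"SDK_GO":          true,
	"SDK_SCIO":        true,
}

// checkRequiredEnv is a hypothetical helper (not part of the backend).
// lookup abstracts os.Getenv so the check can be exercised without
// touching the real process environment.
func checkRequiredEnv(lookup func(string) string) error {
	sdk := lookup("BEAM_SDK")
	if !allowedSDKs[sdk] {
		return fmt.Errorf("BEAM_SDK=%q is not a known SDK value", sdk)
	}
	if lookup("APP_WORK_DIR") == "" {
		return errors.New("APP_WORK_DIR must be set")
	}
	// Assumption: the prepared mod directory matters only for the Go SDK.
	if sdk == "SDK_GO" && lookup("PREPARED_MOD_DIR") == "" {
		return errors.New("PREPARED_MOD_DIR must be set for SDK_GO")
	}
	return nil
}

func main() {
	if err := checkRequiredEnv(os.Getenv); err != nil {
		fmt.Println("environment check failed:", err)
		return
	}
	fmt.Println("environment looks complete")
}
```

Running a check like this before go run ./cmd/server surfaces a missing variable early instead of partway through startup.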
There are also environment variables that are needed for the deployment of Apache Beam Playground. These variables have default values and do not need to be set for a local launch:
- SERVER_IP: the IP address of the backend server (default value = localhost).
- SERVER_PORT: the port of the backend server (default value = 8080).
- CACHE_TYPE: the type of cache service used by the backend server. If it is set to remote, the backend server uses Redis to keep all cache values (default value = local).
- CACHE_ADDRESS: the address of the Redis server. It is used only when CACHE_TYPE=remote (default value = localhost:6379).
- BEAM_PATH: the location of all libraries required for the Java SDK (default value = /opt/apache/beam/jars/*).
- KEY_EXPIRATION_TIME: the expiration time of keys in the cache (default value = 15 min).
- PIPELINE_EXPIRATION_TIMEOUT: the expiration time of code processing (default value = 15 min).
- PROTOCOL_TYPE: the type of the backend server protocol, either TCP or HTTP (default value = HTTP).
- NUM_PARALLEL_JOBS: the maximum number of code processing requests that the backend server can process at the same time (default value = 20). This value is used to check the readiness of the backend server. If the server reaches the maximum number of concurrent code processing requests, the load balancer routes all other incoming requests to other instances while this instance is not ready.
- LAUNCH_SITE: the value that configures logging (default value = local). To use the logging service on App Engine, change this value to app_engine.
- SDK_CONFIG: the path to the SDK configuration file, e.g. the default example for the corresponding SDK. It is saved to Cloud Datastore during application startup (default value = ../sdks.yaml).
- DATASTORE_EMULATOR_HOST: the Datastore emulator address. If it is set in the environment, the application connects to the Datastore emulator.
- PROPERTY_PATH: the application properties path (default value = .).
- CACHE_REQUEST_TIMEOUT: the timeout for requesting data from the cache (default value = 5 sec).
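The default-value behaviour and the readiness rule described for NUM_PARALLEL_JOBS can be sketched as follows. The helpers envOr, maxParallelJobs, and isReady are hypothetical names chosen for this illustration, and the comparison used in isReady is an assumption based on the description above; the backend's own implementation may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envOr returns the value of key, or def when the variable is unset or empty.
// Hypothetical helper, not taken from the backend source.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// maxParallelJobs parses a NUM_PARALLEL_JOBS value and falls back to the
// documented default of 20 when the value is not a positive integer.
func maxParallelJobs(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 20
	}
	return n
}

// isReady reports whether an instance can accept another request.
// Assumption: the instance is "not ready" once the number of active
// jobs reaches the configured maximum.
func isReady(activeJobs, maxJobs int) bool {
	return activeJobs < maxJobs
}

func main() {
	addr := envOr("SERVER_IP", "localhost") + ":" + envOr("SERVER_PORT", "8080")
	limit := maxParallelJobs(envOr("NUM_PARALLEL_JOBS", "20"))

	fmt.Println("listen address:", addr)
	fmt.Println("cache type:", envOr("CACHE_TYPE", "local"))
	fmt.Println("ready with 3 active jobs:", isReady(3, limit))
}
```

With none of the variables set, the sketch resolves to the documented defaults, which matches the statement that a local launch needs no extra configuration.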
These properties are stored in the backend/properties.yaml file:
- playground_salt: the salt used to generate the hash, to avoid the problems a collision may cause.
- max_snippet_size: the limit on file content size. Assuming 1 character occupies 1 byte of memory and 1 MB is approximately 1000000 bytes, the maximum snippet size is 1000000.
- id_length: the length of the identifier used to store data in Cloud Datastore. It is chosen to save storage space in Cloud Datastore while still providing good randomness.
- removing_unused_snippets_cron: the cron expression for the scheduled task that removes unused snippets.
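To make the roles of playground_salt, max_snippet_size, and id_length concrete, here is a minimal sketch of how a salted, fixed-length snippet identifier could be derived. Everything in it is an assumption for illustration: the snippetID function, the SHA-256 plus URL-safe base64 scheme, and the idLength value of 11 are hypothetical and are not taken from the backend.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
)

// Illustrative values only; the real ones live in backend/properties.yaml.
const (
	maxSnippetSize = 1000000 // bytes, as described for max_snippet_size
	idLength       = 11      // hypothetical id_length
)

// snippetID is a hypothetical helper. It rejects oversized content, hashes
// salt+content with SHA-256, and keeps the first idLength characters of the
// URL-safe base64 encoding as the identifier.
func snippetID(salt, content string) (string, error) {
	if len(content) > maxSnippetSize {
		return "", errors.New("snippet exceeds max_snippet_size")
	}
	sum := sha256.Sum256([]byte(salt + content))
	encoded := base64.URLEncoding.EncodeToString(sum[:])
	return encoded[:idLength], nil
}

func main() {
	id, err := snippetID("example-salt", "package main")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("snippet id:", id)
}
```

A shorter identifier saves storage but raises the chance of two snippets sharing an ID, which is the trade-off the id_length description refers to.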
To run the server using Docker images, there are Dockerfiles in the containers folder for the Java, Python, and Go languages. Each of them processes the corresponding SDK, so the backend with the Go SDK works with Go examples/katas/tests only.
Another way to run the server is to run it locally, as described above.
To call the server from another client, models and client code should be generated from the playground/api/v1/api.proto file. More information about generating models and client code from .proto files for each language can be found here.
The following diagram represents the execution of beam code at the server:
To clarify which validators and preparators are used with the code: