A research cluster manager built at ETH Zürich
Systems group page »
View Demo
·
Report Bug
Serverless computing promises efficient use of cloud resources, yet today's serverless cluster managers are retrofitted from legacy systems that were not built for serverless workloads. We examine Knative-on-K8s, a state-of-the-art serverless cluster manager, and find that it introduces delays that account for more than 65% of the latency of cold-start function invocations. These issues arise under high sandbox churn, which is common in production serverless deployments. We identify the root causes and propose new design principles that improve performance by rethinking the cluster manager architecture.
The cluster manager in question has been developed within the systems group at ETH Zürich. It has been designed and tuned specifically for the requirements of the Function-as-a-Service (FaaS) paradigm.
See the README.md
to get started with the code.
The folder structure is as follows:
cmd is the list of programs you can start
api represents the API handlers
internal/master_node corresponds to the source code of the master node
internal/data_plane corresponds to the source code of the data plane
internal/worker_node corresponds to the source code of the worker node
pkg contains shared packages used inside internal to perform various actions
scripts is a list of scripts you can use to measure or test the cluster manager
You can download a copy of all the files in this repository by cloning the git repository:
git clone https://github.com/eth-easl/cluster_manager
To run the cluster manager locally, the following setting must be enabled:
sudo sysctl -w net.ipv4.conf.all.route_localnet=1
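You can verify that the setting took effect:
sysctl net.ipv4.conf.all.route_localnet   # prints "net.ipv4.conf.all.route_localnet = 1" when enabled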
kubernetes-cni must be installed:
curl -L -o cni-plugins.tgz https://github.com/containernetworking/plugins/releases/download/v0.8.1/cni-plugins-linux-amd64-v0.8.1.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz
If you want to install it to a custom path:
INSTALL_PATH='your/path/here'
curl -L -o cni-plugins.tgz https://github.com/containernetworking/plugins/releases/download/v0.8.1/cni-plugins-linux-amd64-v0.8.1.tgz
sudo mkdir -p "$INSTALL_PATH"
sudo tar -C "$INSTALL_PATH" -xzf cni-plugins.tgz
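To confirm the plugins are in place, list the target directory (default path shown):
ls /opt/cni/bin   # should list bridge, host-local, loopback, and the other plugins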
To launch the code, perform the following actions.
First, install a copy of the master node on one machine, a data plane on a second machine, and a copy of Redis on a third machine, and finally copy the code base onto several other machines for the workers. You can do all of this simply by calling the script remote_install.sh with the SSH addresses of the machines. Before calling the script, make sure a GitHub token that can install SSH keys is available at the following path.
ACCESS_TOKEN="$(cat ~/.git_token_loader)"
./remote_install.sh ip1 ip2 ...
Once this has been done, we can move on to configuring the various programs. The most important fields are the IPs of the database and the control plane, which must be set correctly.
Config master node
port: "9090" # Port used for the GRPC server
portRegistration: "9091" # Port for registering a new service
verbosity: "trace" # Verbosity of the logs
traceOutputFolder: "data" # Output folder for measurements
placementPolicy: "kubernetes" # Placement policy
persistence: true # Persist cluster state - if false, the cluster can run without a database
reconstruct: false # Reconstruct values on start
profiler:
enable: true # Enable profiler support - it makes the program a bit slower
mutex: false # Enable mutex support in profiler
redis:
address: "127.0.0.1:6379" # Address of the database
password: "" # Password
db: 0 # Database index
Config data plane
controlPlaneIp: "localhost" # IP of the control plane (master node)
controlPlanePort: "9090" # GRPC port used in the control plane
portProxy: "8080" # Port used for requests
portGRPC: "8081" # Port used for the GRPC server
verbosity: "trace" # Verbosity of the logs
traceOutputFolder: "data" # Output folder for measurements
loadBalancingPolicy: "knative" # Load balancing policy
Config worker node
controlPlaneIp: "localhost" # IP of the control plane (master node)
controlPlanePort: "9090" # GRPC port used in the control plane
port: "10010" # Port used for the GRPC server
verbosity: "trace" # Verbosity of the logs
criPath: "/run/containerd/containerd.sock" # Path to the CRI socket
cniConfigPath: "configs/cni.conf" # Path to the CNI configuration
prefetchImage: true # If enabled, workers will prefetch an image (so image download is excluded from the measurements)
Once the configuration stage is complete, we can start the programs.
launch db
sudo docker-compose up
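If you prefer not to use docker-compose, a plain Docker one-liner that serves Redis on the default port (matching redis.address in the master node config) works as well; this is a sketch, assuming Docker is installed:
sudo docker run -d -p 6379:6379 redis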
launch master node
cd cmd/master_node; go run main.go --config config_cluster.yaml
launch data plane
cd cmd/data_plane; go run main.go --config config_cluster.yaml
launch worker nodes
cd scripts/francois; ./restart_workers.sh ip1 ip2 ip3 ....
This command will fire a single invocation.
cd scripts/francois; ./burst.sh 1
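To fire a larger burst, pass the desired count, assuming the argument controls the number of invocations:
cd scripts/francois; ./burst.sh 100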
In case you get a timeout, try running the following commands first:
# For local readiness probes
sudo sysctl -w net.ipv4.conf.all.route_localnet=1
# For reachability of sandboxes from other cluster nodes
sudo sysctl -w net.ipv4.ip_forward=1
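To make both settings persist across reboots, you can write them to a sysctl drop-in (file name arbitrary):
echo 'net.ipv4.conf.all.route_localnet = 1' | sudo tee /etc/sysctl.d/99-cluster-manager.conf
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-cluster-manager.conf
sudo sysctl --system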
- Install Firecracker
ARCH="$(uname -m)"
release_url="https://github.com/firecracker-microvm/firecracker/releases"
latest=$(basename $(curl -fsSLI -o /dev/null -w %{url_effective} ${release_url}/latest))
curl -L ${release_url}/download/${latest}/firecracker-${latest}-${ARCH}.tgz \
| tar -xz
sudo mv release-${latest}-$(uname -m) /usr/local/bin/firecracker
sudo mv /usr/local/bin/firecracker/firecracker-${latest}-${ARCH} /usr/local/bin/firecracker/firecracker
sudo sh -c "echo 'export PATH=\$PATH:/usr/local/bin/firecracker' >> /etc/profile"
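To check that the binary is reachable (you may need to log in again or source /etc/profile first):
firecracker --version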
- Install tc-redirect-tap (tun/tap CNI plugin)
git clone https://github.com/awslabs/tc-redirect-tap.git || true
make -C tc-redirect-tap
sudo cp tc-redirect-tap/tc-redirect-tap /opt/cni/bin
- Install ARP
sudo apt-get update && sudo apt-get install net-tools
- Download Kernel
sudo apt-get update && sudo apt-get install git-lfs
git lfs fetch
git lfs checkout
git lfs pull
- Run control plane and data plane processes. Run the worker daemon with sudo and with the PATH environment variable hardcoded to point to the directory where Firecracker is located.
sudo env "PATH=$PATH:/usr/local/bin/firecracker" /usr/local/go/bin/go run cmd/worker_node/main.go
If the networking setup gets into a bad state, flush the NAT table:
sudo iptables -t nat -F
Distributed under the MIT License. See LICENSE.txt
for more information.
Lazar Cvetković - [email protected]
François Costa - [email protected]
Ana Klimovic - [email protected]
First you have to install the protobuf compiler:
make install_golang_proto_compiler
Then you can compile the proto types using the following command:
make proto
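For reference, the make target presumably wraps a protoc invocation; a manual equivalent would look roughly like this (proto file paths are hypothetical):
protoc --go_out=. --go-grpc_out=. api/proto/*.proto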
First you have to install the mockgen library:
make install_mockgen
Then you can create the files with the following command:
make generate_mock_files
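Similarly, the make target presumably issues per-interface mockgen calls; a manual example (source and destination paths are hypothetical):
mockgen -source=pkg/example/example.go -destination=mocks/example_mock.go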
To run the tests:
sudo go test -v ./...
To lint the code base:
golangci-lint run --fix
or with verbose output:
golangci-lint run -v --timeout 5m0s
A nice tutorial that explains how to use the profiler:
https://teivah.medium.com/profiling-and-execution-tracing-in-go-a5e646970f5b
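For a quick start: once profiling is enabled in the config, a collected CPU profile can be opened in the browser with the standard pprof tooling (the profile file name here is hypothetical):
go tool pprof -http=:8000 cpu.prof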