gopherbot

This is a rewrite of the original Go Slack Workspace chat bot. The big difference between that version and this one is that gopherbot uses the Slack Events API instead of the antiquated RTM (WebSocket) API.

This unfortunately results in a more complicated, queue-based architecture, but it also makes for a more resilient chat bot. Because it is now a queue-based system, the bot will not respond to any message that is more than 30 seconds old. So if Slack experiences an issue that prevents us from replying in time, the message will be dropped.
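To make the 30-second rule concrete, here is a minimal sketch of a staleness check. It assumes the age is measured from the event's Unix timestamp (Slack's event envelope carries one as event_time); the names isStale and maxEventAge are made up for illustration and are not taken from the gopherbot code.

```go
package main

import (
	"fmt"
	"time"
)

// maxEventAge mirrors the 30-second cutoff described in this README.
const maxEventAge = 30 * time.Second

// isStale reports whether an event created at eventUnix (Unix seconds)
// is too old to act on at nowUnix. Hypothetical helper, for illustration only.
func isStale(eventUnix, nowUnix int64) bool {
	age := time.Duration(nowUnix-eventUnix) * time.Second
	return age > maxEventAge
}

func main() {
	now := time.Now().Unix()
	fmt.Println(isStale(now-5, now))  // false: 5 seconds old
	fmt.Println(isStale(now-45, now)) // true: 45 seconds old, would be dropped
}
```

A consumer would run a check like this before doing any work, so that a backlog built up during an outage is discarded rather than answered late.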

Contributing

Adding Responses / Reactions

Pretty much all responses and reactions should be configured in the cmd/consumer/ directory, with each thing being split out by file. How to configure each should be fairly straightforward based on existing examples, and the usage of the handler package is documented via GoDoc if you have any questions.

Adding Definitions to Glossary

There is also the define command that is powered by the glossary package. If you'd like to add definitions to the glossary, you can do it here and raise a PR against this repo.

The glossary is meant to contain common words and terms relevant to the Go community. It's not Urban Dictionary.

Architecture

Slack API

As mentioned above, the old version used the RTM API for interacting with Slack. That API is no longer recommended for building integrations; Slack now suggests the Events API instead.

It's an HTTP+JSON based subscription model, with strict requirements on how quickly each delivery must be acknowledged. Given that, the best way to accept events is to write them to a queue to be processed by workers later, so that a slow task doesn't violate the contract or introduce the risk of lossy message processing.

The Events API offers signing of requests, so that you can be confident the request originated from Slack.

Components

Gateway

The job for the gateway is to cryptographically validate the incoming event from Slack, confirm that it contains the metadata we expect, and then forward the message on to the work queue.
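For anyone curious what the cryptographic validation involves, the sketch below follows Slack's published v0 signing scheme: an HMAC-SHA256 over "v0:<timestamp>:<raw body>", keyed with the Signing Secret and compared against the X-Slack-Signature header. The function names and the sample secret are placeholders, and this is not necessarily how the gateway itself is written.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes the Slack v0 signature for a request timestamp and raw body.
func sign(secret, timestamp string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte("v0:" + timestamp + ":"))
	mac.Write(body)
	return "v0=" + hex.EncodeToString(mac.Sum(nil))
}

// validSignature compares the signature Slack sent against the one we
// compute ourselves, using a constant-time comparison.
func validSignature(secret, timestamp, signature string, body []byte) bool {
	expected := sign(secret, timestamp, body)
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	secret := "example-signing-secret" // placeholder, not a real secret
	ts := "1531420618"                 // value of X-Slack-Request-Timestamp
	body := []byte(`{"type":"event_callback"}`)

	sig := sign(secret, ts, body) // stands in for X-Slack-Signature
	fmt.Println(validSignature(secret, ts, sig, body))
	fmt.Println(validSignature(secret, ts, sig, []byte(`{"type":"tampered"}`)))
}
```

In a real handler the timestamp and signature come from the request headers, and Slack also recommends rejecting requests whose timestamp is more than a few minutes old to limit replay attacks.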

This is a pretty simple gateway, although it does use fastjson to avoid reflection when making queue routing decisions (based on the JSON event type). There is effectively one queue per event type:

  • messages (private vs public)
  • new users joining workspace
  • new users joining a channel
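The routing decision can be sketched roughly as follows. To keep the example standard-library only it uses encoding/json, which does rely on reflection, so it is a stand-in for the fastjson approach rather than a copy of it. The event type strings (message, team_join, member_joined_channel) and the channel_type field are Slack's; the queue names, and treating only "im" as private, are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope holds only the fields needed to pick a queue.
type envelope struct {
	Event struct {
		Type        string `json:"type"`
		ChannelType string `json:"channel_type"`
	} `json:"event"`
}

// queueFor maps an incoming Events API payload to a queue name.
// The queue names are hypothetical.
func queueFor(payload []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(payload, &env); err != nil {
		return "", fmt.Errorf("decoding event: %w", err)
	}
	switch env.Event.Type {
	case "message":
		if env.Event.ChannelType == "im" {
			return "message_private", nil
		}
		return "message_public", nil
	case "team_join":
		return "team_join", nil
	case "member_joined_channel":
		return "channel_join", nil
	default:
		return "", fmt.Errorf("unsupported event type %q", env.Event.Type)
	}
}

func main() {
	q, err := queueFor([]byte(`{"event":{"type":"message","channel_type":"channel"}}`))
	fmt.Println(q, err)
}
```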

The gateway is stateless and can be scaled horizontally.

Consumer

The consumer registers a handler for each of the queues, and those handlers process each message internally. They themselves may have sub-handlers that get executed, like reacting to messages with emoji versus responding to them.
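The handler and sub-handler idea can be pictured with the toy sketch below. Every type and function in it (Message, Action, SubHandler, MessageHandler, newExampleHandler) is invented for this example; the real API lives in the handler package and is documented via GoDoc, so treat this as an illustration of the shape only.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical, pared-down chat message.
type Message struct {
	Text string
}

// Action describes what a sub-handler wants done: a reaction or a reply.
type Action struct {
	Kind  string // "react" or "respond"
	Value string
}

// SubHandler inspects a message and optionally returns an action.
type SubHandler func(Message) (Action, bool)

// MessageHandler runs every registered sub-handler against each message.
type MessageHandler struct {
	subs []SubHandler
}

// Register adds a sub-handler to the handler.
func (h *MessageHandler) Register(s SubHandler) { h.subs = append(h.subs, s) }

// Handle collects the actions of all sub-handlers that matched the message.
func (h *MessageHandler) Handle(m Message) []Action {
	var out []Action
	for _, s := range h.subs {
		if a, ok := s(m); ok {
			out = append(out, a)
		}
	}
	return out
}

// newExampleHandler wires up one reacting and one responding sub-handler.
func newExampleHandler() *MessageHandler {
	h := &MessageHandler{}
	h.Register(func(m Message) (Action, bool) {
		if strings.Contains(strings.ToLower(m.Text), "gopher") {
			return Action{Kind: "react", Value: "gopher"}, true
		}
		return Action{}, false
	})
	h.Register(func(m Message) (Action, bool) {
		if strings.HasPrefix(m.Text, "define ") {
			term := strings.TrimPrefix(m.Text, "define ")
			return Action{Kind: "respond", Value: "looking up " + term}, true
		}
		return Action{}, false
	})
	return h
}

func main() {
	h := newExampleHandler()
	for _, a := range h.Handle(Message{Text: "define gopher"}) {
		fmt.Printf("%s: %s\n", a.Kind, a.Value)
	}
}
```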

If you're looking to add commands, reactions, a channel join message, or an update to the workspace join message, this is the component that handles those.

The consumer is stateless and can be scaled horizontally.

BGTasks

The bgtasks component is meant to be a place where regular background jobs are run, such as filling data caches, polling for Gerrit (Go CL) merges, or GoTime shows starting.

This currently has a channel cache poller, so that consumer handlers can look up channels by name without making many Slack API calls.
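A poller like that can be sketched as below. The bot keeps this cache in Redis; the example swaps in an in-memory map so it runs on its own, and the fetch function stands in for the Slack API call that lists channels. All names here are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// channelCache maps channel names to channel IDs.
// In-memory stand-in for the Redis-backed cache.
type channelCache struct {
	mu     sync.RWMutex
	byName map[string]string
}

// refresh replaces the cache contents with a fresh listing.
// If fetch fails, the previous contents are kept.
func (c *channelCache) refresh(fetch func() (map[string]string, error)) error {
	fresh, err := fetch()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.byName = fresh
	c.mu.Unlock()
	return nil
}

// lookup returns the ID for a channel name, if it is cached.
func (c *channelCache) lookup(name string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	id, ok := c.byName[name]
	return id, ok
}

// poll refreshes the cache immediately and then on every tick until ctx ends.
func poll(ctx context.Context, c *channelCache, every time.Duration, fetch func() (map[string]string, error)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		if err := c.refresh(fetch); err != nil {
			fmt.Println("channel cache refresh failed:", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	// fetch is a placeholder for a real channel listing call.
	fetch := func() (map[string]string, error) {
		return map[string]string{"general": "C0000000001"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	c := &channelCache{}
	poll(ctx, c, 20*time.Millisecond, fetch)
	fmt.Println(c.lookup("general"))
}
```

Running exactly one poller is what keeps this simple: with two, each would hit the Slack API on its own schedule and overwrite the other's results.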

Things here cannot be safely scaled horizontally, as it could cause double messages or excessive API calls / cache fills. These jobs are kept here so that we can avoid dealing with cluster locking, in addition to our work queue. :)

Redis

More specifically, Heroku Redis. We use Redis Streams to implement the bot's work queue. It's also where we cache some data for use in the handlers, such as the mapping of channel names to IDs.

Local Development

Let us get back to you on this one. :)

The most straightforward way is to run it in Heroku yourself, or simulate the environment with these environment variables:

Environment Var                Description
PORT                           The port to bind to, used for the gateway component.
REDIS_URL                      Expects this in the format that Heroku provides it: redis://u:pass@host:port.
GOPHER_REDIS_INSECURE          Set to 1 if Redis is over an insecure connection.
GOPHER_REDIS_SKIPVERIFY        Set to 1 if you want the Redis client to not verify the TLS connection. Heroku Redis's certificate cannot be validated, so this is required for production. :(
GOPHER_LOG_LEVEL               Any level as recognized by github.com/rs/zerolog.
GOPHER_SLACK_APP_ID            The App's unique ID. Starts with A.
GOPHER_SLACK_TEAM_ID           The installed workspace's unique ID. Starts with T.
GOPHER_SLACK_CLIENT_ID         The OAuth Client ID. Currently unused.
GOPHER_SLACK_CLIENT_SECRET     The OAuth Client secret. Currently unused.
GOPHER_SLACK_REQUEST_TOKEN     This is the static Verification Token in the App's configuration pane, sent with every request.
GOPHER_SLACK_REQUEST_SECRET    This is called the Signing Secret in the App's configuration pane, used to cryptographically validate the request.
GOPHER_SLACK_BOT_ACCESS_TOKEN  The Slack API token for the Bot App. Starts with xoxb-.
HEROKU_APP_ID                  The UUID Heroku has given to the application. This should be set.
HEROKU_APP_NAME                The human-readable name of the application. This is used for Redis key generation, and must be set.
HEROKU_DYNO_ID                 The UUID Heroku gives each Dyno (worker process). This is used for Redis key generation, and must be set.
HEROKU_SLUG_COMMIT             The commit of the code running. This is used in logging, and should be set.
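As a rough illustration of how the Redis-related variables might be consumed, the sketch below parses REDIS_URL with the standard net/url package and reads the two GOPHER_REDIS_* flags. The redisConfig type and parseRedisConfig function are hypothetical and may not match how the bot loads its configuration.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// redisConfig is a hypothetical holder for the Redis-related settings.
type redisConfig struct {
	Addr       string // host:port taken from REDIS_URL
	Password   string // password part of REDIS_URL, if any
	Insecure   bool   // GOPHER_REDIS_INSECURE=1
	SkipVerify bool   // GOPHER_REDIS_SKIPVERIFY=1
}

// parseRedisConfig reads the settings through getenv, which makes the
// function easy to exercise without touching the real environment.
func parseRedisConfig(getenv func(string) string) (redisConfig, error) {
	raw := getenv("REDIS_URL")
	if raw == "" {
		return redisConfig{}, fmt.Errorf("REDIS_URL is not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		// The parse error is not wrapped, since it may echo the password.
		return redisConfig{}, fmt.Errorf("REDIS_URL is not a valid URL")
	}
	if u.Host == "" {
		return redisConfig{}, fmt.Errorf("REDIS_URL has no host")
	}
	password, _ := u.User.Password()
	return redisConfig{
		Addr:       u.Host,
		Password:   password,
		Insecure:   getenv("GOPHER_REDIS_INSECURE") == "1",
		SkipVerify: getenv("GOPHER_REDIS_SKIPVERIFY") == "1",
	}, nil
}

func main() {
	cfg, err := parseRedisConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("redis address:", cfg.Addr)
}
```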

Deployment

The bot is currently running under the GoBridge Heroku organization, and merges to master are automatically deployed to the staging version (@glenda). If a merge to master seems to have deployed okay automatically, you then need to go into the Heroku UI and promote each running app to production.

Please Note: Because Bill had requested our repo be a monorepo, our Heroku deployment configuration is an operational landmine. When clicking the "Promote to Production" button, you need to deselect the unrelated apps so that you don't accidentally promote the wrong build to production. For example, if you're promoting the gateway component you need to make sure not to promote it to the bgtasks or consumer apps. This will break the bot, and require some manual action to fix the production deployment.