The Yari deployer does two things. First, it's used to upload document redirects, pre-built document pages, static files (e.g. JS, CSS, and image files), and sitemap files into an existing AWS S3 bucket. Since we serve MDN document pages from an S3 bucket via a CloudFront CDN, this is the way we upload a new version of the site.
Second, it is used to update and publish changes to existing AWS Lambda functions. For example, we use it to update and publish new versions of a Lambda function that we use to transform incoming document URLs into their corresponding S3 keys.
You can install it globally or in a virtualenv environment. Whichever you prefer.
```sh
cd deployer
poetry install
poetry run deployer --help
```
Please refer to the boto3 documentation for how to configure AWS access credentials.
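For example, one common way to configure credentials for boto3 is through the standard AWS environment variables (the values below are placeholders):

```sh
# Standard AWS credential environment variables that boto3 picks up.
# Replace the placeholder values with your own credentials.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_DEFAULT_REGION="us-west-2"
```

A shared credentials file (`~/.aws/credentials`) works as well.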
The `poetry run deployer upload DIRECTORY` command uploads files as well as redirects into an existing S3 bucket. Currently, we have three main S3 buckets that we upload into: `mdn-content-dev` (for variations or experimental versions of the site), `mdn-content-stage`, and `mdn-content-prod`.
As input, the `upload` command takes a directory which contains the files that should be uploaded, but it needs to know where to find any redirects that should be uploaded as well. By default, it searches for redirects within the content directories specified by `--content-root` (or `CONTENT_ROOT`) and `--content-translated-root` (or `CONTENT_TRANSLATED_ROOT`).
It does this by searching for `_redirects.txt` files within those directories, converting each line in a `_redirects.txt` file into an S3 redirect object.
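For illustration, each `_redirects.txt` line pairs a from-URL with a to-URL; the paths in this sketch are invented:

```
# Each line maps FROM-URL to TO-URL, separated by a tab
/en-US/docs/Old/Page	/en-US/docs/New/Page
```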
The files and redirects are uploaded into a sub-folder (a.k.a. prefix) of the S3 bucket's root. The prefix (`--prefix` option) defaults to `main`, which is most likely what you'll want for uploads to the `mdn-content-stage` and `mdn-content-prod` S3 buckets. However, for uploads to the `mdn-content-dev` bucket, the prefix is often used to specify a different folder for each variation of the site that is being reviewed/considered.
When uploading files (not redirects), the deployer is intelligent about what it uploads. It only uploads files whose content has changed, skipping the rest. However, since the `cache-control` attribute of a file is not considered part of its content, if you'd like to change the `cache-control` from what's in S3, it's important to use the `--force-refresh` option to ensure that all files are uploaded with fresh `cache-control` attributes. Redirects are always uploaded.
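For example, to force every file to be re-uploaded with fresh `cache-control` attributes (bucket and prefix taken from their defaults or environment variables):

```sh
# Re-upload all files, even unchanged ones, so every object
# gets a fresh cache-control header.
poetry run deployer upload --force-refresh ../client/build
```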
Some example uploads:

```sh
export CONTENT_ROOT=/path/to/content/files
export CONTENT_TRANSLATED_ROOT=/path/to/translated-content/files
cd deployer
poetry run deployer upload --bucket mdn-content-dev --prefix pr1234 ../client/build
```

Or, equivalently, using environment variables instead of command-line options:

```sh
export CONTENT_ROOT=/path/to/content/files
export CONTENT_TRANSLATED_ROOT=/path/to/translated-content/files
export DEPLOYER_BUCKET_NAME=mdn-content-dev
export DEPLOYER_BUCKET_PREFIX=pr1234
cd deployer
poetry run deployer upload ../client/build
```
The command:

```sh
cd deployer
poetry run deployer update-lambda-functions
```
will discover every folder that contains a Lambda function, create a deployment package (Zip file) for each one by running:

```sh
yarn make-package
```

and, if the deployment package is different from what is already in AWS, upload and publish a new version.
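For example, you can build a single function's deployment package by hand from inside its folder (the folder name here is illustrative; each Lambda function lives in its own folder):

```sh
# Build the Zip deployment package for one Lambda function.
# "content-origin-request" is an example folder name.
cd aws-lambda/content-origin-request
yarn make-package
```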
You just need a URL (or host name) for an Elasticsearch server and the root of the build directory. The command will trawl all `index.json` files and extract all metadata and blocks of prose, which get their HTML stripped. The command is:

```sh
cd deployer
poetry run deployer search-index --help
```
If you have built the whole site (or partially) you simply point to it with the first argument:

```sh
poetry run deployer search-index ../client/build
```
But by default, it does not specify the Elasticsearch URL/host. You can either use:

```sh
export DEPLOYER_ELASTICSEARCH_URL=http://localhost:9200
poetry run deployer search-index ../client/build
```

...or...

```sh
poetry run deployer search-index ../client/build --url http://localhost:9200
```
Note! If you don't specify either the environment variable or the `--url` option, the script will not fail (i.e. it will not exit non-zero). This makes it convenient in GitHub Actions to control the execution purely based on the presence of the environment variable.
The default behavior is that each day you get a different index name, e.g. `mdn_docs_20210331093714`. And then there's an alias with a more "generic" name, e.g. `mdn_docs`. It's the alias name that Kuma uses to send search queries to.
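To see which concrete index the alias currently points to, you can use Elasticsearch's cat API (assuming a local server on port 9200):

```sh
# List all aliases and the indexes they point to.
curl "http://localhost:9200/_cat/aliases?v"
```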
The way indexing works is that we leave the existing index and its alias in place, then we fill up a new index and once that works, we atomically "move the alias" and delete the old index. To demonstrate, consider this example timeline:
- Yesterday: index `mdn_docs_20210330093714` and alias `mdn_docs --> mdn_docs_20210330093714`
- Today:
  - create new index `mdn_docs_20210331094500`
  - populate `mdn_docs_20210331094500` (could take a long time)
  - atomically re-assign the alias `mdn_docs --> mdn_docs_20210331094500` and delete the old index `mdn_docs_20210330093714` (see the sketch after this list)
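The deployer performs that atomic re-assignment through its Elasticsearch client, but the equivalent raw API call looks roughly like this (index names taken from the timeline above):

```sh
# Move the mdn_docs alias from the old index to the new one in a
# single atomic request, so search queries never hit a missing alias.
curl -X POST "http://localhost:9200/_aliases" \
  -H 'Content-Type: application/json' \
  -d '{
    "actions": [
      { "remove": { "index": "mdn_docs_20210330093714", "alias": "mdn_docs" } },
      { "add": { "index": "mdn_docs_20210331094500", "alias": "mdn_docs" } }
    ]
  }'
```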
Note, this only applies if you don't use `--update`. If you use `--update`, it will just keep adding to the existing index whose name is based on today's date.
What this means is that there is zero downtime for the search queries. Nothing needs to be reconfigured on the Kuma side.
The default behavior is that it deletes the index first and immediately creates it again. You can switch this off by using the `--update` option. Then it will "cake on" the documents. So if something has been deleted since the last build, you would still have that "stuck" in Elasticsearch.
Deleting and re-creating the index is fast so it's relatively safe to use often. But the indexing can take many seconds and while indexing, Elasticsearch can only search what's been indexed so far.
An interesting pattern would be to use `--update` most of the time and only from time to time omit it for a fresh new start. But note, if you omit `--update` (i.e. recreate the index), search will still work; it just may find less than it will once it's fully indexed.
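For example, a quick incremental re-index during local development might look like this:

```sh
# Add to today's existing index instead of deleting and recreating it.
poetry run deployer search-index ../client/build --update
```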
When you've built files, you can analyze those built files to produce a Markdown comment that you can post as a PR issue comment. To do that, run:

```sh
poetry run deployer analyze-pr-build ../client/build
```
But the actions are controlled by various options. You can mix and match these:
`--analyze-flaws`: This will open each built `index.json`, look through the `.flaws`, and try to convert each flaw into a list.
`--analyze-dangerous-content`: This will analyze all the content and look for content that could be "dangerous". For example, it will list all external URLs found in the content.
`--prefix`: The prefix refers to a prefix in the Deployer upload, i.e. what you set when you run `poetry run deployer upload --prefix=THIS`. The prefix is used to specify the proper Dev subdomain (`{prefix}.content.dev.mdn.mozit.cloud`) for the URLs of the built documents. For example, if `--prefix experiment1` is specified, it will list:

```md
## Preview URLs

- <https://experiment1.content.dev.mdn.mozit.cloud/en-US/docs/MDN/Kitchensink>
```

...assuming the only page that was built was `build/en-us/docs/mdn/kitchensink`.
Note that this assumes the PR build has been deployed to the Dev server.
`--repo`: This is useful for debugging when the PR you made wasn't on `mdn/content`. For example:

```sh
poetry run deployer analyze-pr-build ../client/build --repo peterbe/content ...
```
`--github-token`: By default it will pick up the `$GITHUB_TOKEN` environment variable, but with this option you can override it.
`--pr-number`: This is needed to be able to find the PR (on https://github.com/mdn/content/pulls) to post the comment to.
`--dry-run`: This is mostly useful for local development or when debugging. It determines whether to print to `stdout` what it would post as a PR issue comment.
`--verbose`: This option, just like `--dry-run`, is technically part of the `deployer` command and not the `analyze-pr-build` sub-command, so put it before `analyze-pr-build`.
This example demonstrates all options:

```sh
poetry run deployer --verbose --dry-run analyze-pr-build ../client/build \
  --analyze-flaws --analyze-dangerous-content --github-token="xxx" \
  --repo=peterbe/content --pr-number=3
```
An important part of the `analyze-pr-build` command is that it must be easy to debug and develop further without having to rely on landing code in `main` and seeing how it worked.
The first thing you need to do is to download a build artifact, or simply run `yarn build` and use the `../client/build` directory. To download the artifact, go to a finished "PR Test" workflow, like https://github.com/mdn/content/pull/3381/checks?check_run_id=2169672013 for example. Near the upper right-hand corner of the content (near the "Re-run jobs" button) it says "Artifacts (1)". Download that `build.zip` file somewhere and unpack it. Now you can run:
```sh
poetry run deployer --verbose analyze-pr-build ~/Downloads/build ...
```
You can even go and get a personal access token and set `$GITHUB_TOKEN` (assuming it has the right scopes) and have it actually post the comment.
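For example, with a token exported (the value here is a placeholder), re-running the command above will actually post the comment:

```sh
# Token value is a placeholder; use a token with the right scopes.
export GITHUB_TOKEN="<your-personal-access-token>"
poetry run deployer --verbose analyze-pr-build ~/Downloads/build ...
```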
The following environment variables are supported.
- `DEPLOYER_BUCKET_NAME` is equivalent to using `--bucket` (the default is `mdn-content-dev`)
- `DEPLOYER_BUCKET_PREFIX` is equivalent to using `--prefix` (the default is `main`)
- `DEPLOYER_NO_PROGRESSBAR` is equivalent to using `--no-progressbar` (the default is `true` if not run from a terminal or the `CI` environment variable is `true` like it is for GitHub Actions, otherwise the default is `false`)
- `DEPLOYER_CACHE_CONTROL` can be used to specify the `cache-control` header for all non-hashed files that are uploaded (the default is `3600` or one hour)
- `DEPLOYER_HASHED_CACHE_CONTROL` can be used to specify the `cache-control` header for all hashed files (e.g., `main.3c12da89.chunk.js`) that are uploaded (the default is `31536000` or one year)
- `DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS` controls the number of worker threads used when uploading (the default is `50`)
- `DEPLOYER_LOG_EACH_SUCCESSFUL_UPLOAD` will print successful upload tasks to `stdout` (the default is `False`)
- `DEPLOYER_ELASTICSEARCH_URL` is used by the `search-index` command
- `CONTENT_ROOT` is equivalent to using `--content-root` (there is no default)
- `CONTENT_TRANSLATED_ROOT` is equivalent to using `--content-translated-root` (there is no default)
You need to have `poetry` installed on your system. Now run:

```sh
cd deployer
poetry install
```

That should have installed the CLI:

```sh
poetry run deployer
```
If you want to make a PR, make sure the code is formatted with `black` and passes `flake8`.

You can check that all files pass `flake8` by running:

```sh
flake8 deployer
```

And to check that all files are formatted according to `black`, run:

```sh
black --check deployer
```