Gentoo AMI Builder

Features

  • Single simple command line tool to create bootable Gentoo AMI images.
  • Uses spot instances by default to save up to 50% on the bill. One image build usually costs less than 20 cents (as of 2020-10-14).
  • Supports any customization and any kernel version (aws ec2 import-image supports only fixed predefined list of kernels).
  • Build time is around 50 minutes for amd64 and 90 minutes for arm64 with the default instance types (as of 2020-10-14).
  • Steals the kernel config from Amazon Linux, so all needed kernel modules are configured, including the block device drivers needed to boot the instance (NVMe etc.) and the network drivers needed for networking after boot (IXGBEVF, ENA etc.).
  • Should support all known HVM types of instances (including amd64 and arm64).
  • Minimalistic: only the packages required for a bootable system are installed. The system uses just ~50 MB of RAM after boot.
  • Uses a minimalistic ec2-init script that can bootstrap the hostname and ssh keys and run a shell script from EC2 user metadata, similar to how cloud-init does that.
  • Nice, not too verbose, progress reporting with advanced verbose error handling.
  • Supports OpenRC and Systemd init systems.
  • Supports profile switching, including upgrade to 17.1 from 17.0 amd64 profiles.
  • Highly customizable (well, it is Gentoo), open source and free :-)
  • Multi-region support.
  • Automatic fresh Amazon Linux 2 image detection.

How it works

The builder replaces Amazon Linux with Gentoo Linux, using a second volume as a temporary buffer (aux disk), in a few phases:

  • Phase 1: Prepare Instance - Spawn instance with Amazon Linux and two volumes
  • Phase 2: Prepare Root - Prepare second volume and install Gentoo stage3 to it
  • Phase 3: Build Root - Make Gentoo on second volume bootable
  • Phase 4: Switch Root - Reconfigure bootloader and reboot from second volume
  • Phase 5: Migrate Root - Clone second volume to first and reboot from first volume
  • Phase 6: Build AMI - Request AMI from first volume

The build process is orchestrated by the builder, so make sure that the network connection is stable; otherwise the process could crash.

"Build Root" has bottleneck on CPU.

"Migrate Root" has bottleneck on disk IO bandwidth (cloning volume to volume).

"Build AMI" has bottleneck on AWS, not controllable on our side.

Using a more powerful instance type makes Phase 3 faster; however, it has no noticeable effect on the other phases.

The builder is configured to use default instance types that are known to have a good build time / cost ratio. You can pick another instance type to speed up the build or to make the build process cheaper. Keep in mind that a build on an instance with less than 2GB of RAM will most likely fail during the kernel compilation phase.
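One way to check a candidate instance type against the 2GB guideline before starting a build is to ask EC2 for its memory size. This is a minimal sketch that assumes the AWS CLI is configured; `has_enough_ram` is a hypothetical helper name and is not part of the builder.

```bash
# Hypothetical helper, not part of the builder: succeed only if the given
# instance type reports at least 2048 MiB of RAM.
has_enough_ram() {
    mem_mib=$(aws ec2 describe-instance-types \
        --instance-types "$1" \
        --query 'InstanceTypes[0].MemoryInfo.SizeInMiB' \
        --output text) || return 2
    [ "$mem_mib" -ge 2048 ]
}

# Usage:
#   has_enough_ram t3.micro || echo "likely too small for a kernel build"
```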

Prerequisites

  • Locally installed and configured aws cli.
  • Linux or macOS with openssh, bash, curl, coreutils
  • AWS account
  • SSH key generated in AWS console or imported into AWS account (Key Pair)
  • AWS security group that allows incoming connections on port 22
  • AWS user with enabled programmatic access
    • Permissions to build on on-demand instances:
      • ec2:CreateTags
      • ec2:RunInstances
      • ec2:TerminateInstances
      • ec2:DescribeInstances
      • ec2:CreateImage
      • ec2:DeregisterImage
      • ec2:DescribeImages
      • ec2:DeleteSnapshot
      • sts:GetCallerIdentity
    • Additional permissions to build on spot instances:
      • ec2:DescribeSpotInstanceRequests
      • ec2:RequestSpotInstances
    • Additional permission needed the first time you launch a spot instance. You don't need this if you already have the AWSServiceRoleForEC2Spot Service-Linked Role in your account; it's automatically created by the console app the first time you create a spot instance.
      • iam:CreateServiceLinkedRole

Usually the easiest solution is to just temporarily add AWS managed policy "AdministratorAccess" to your user.

Alternatively, this policy can be used to grant AWS user all needed permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeInstances",
        "ec2:CreateImage",
        "ec2:DeregisterImage",
        "ec2:DescribeImages",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:DeleteSnapshot",
        "ec2:RequestSpotInstances",
        "iam:CreateServiceLinkedRole",
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    }
  ]
}
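If the policy above is saved to a local file, it could be attached to the build user as an inline policy with the AWS CLI. The sketch below is one possible way to do that; the helper name, the policy name `gentoo-ami-builder`, and the file name are assumptions chosen for illustration.

```bash
# Hypothetical helper: attach the JSON policy as an inline user policy.
# $1: IAM user name, $2: path to the JSON policy file
attach_builder_policy() {
    aws iam put-user-policy \
        --user-name "$1" \
        --policy-name gentoo-ami-builder \
        --policy-document "file://$2"
}

# Usage:
#   attach_builder_policy my-build-user policy.json
```

Running this requires IAM permissions of its own, so it would typically be done by an account administrator.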

Usage

Usually you just need to configure the aws cli and run the commands below to get a working default Gentoo AMI amd64 / OpenRC image:

git clone https://github.com/sormy/gentoo-ami-builder
cd gentoo-ami-builder
./gentoo-ami-builder.sh --key-pair "Your Key Pair Name"

You will find the AMI in the AWS console once the builder finishes the process. The image can be used to start any instance of the same platform.

NOTE: Spot instances are used by default to save on bill.

The most important options:

  • --region - custom AWS region (by default it is us-east-1)
  • --subnet-id - AWS VPC subnet for spawned instance
  • --security-group - custom security group to attach to the spawned instance
  • --key-pair - required to access EC2 builder instance over SSH
  • --gentoo-stage3 - pick what stage3 to use, usually, amd64 or arm64
  • --gentoo-image-name - what AMI name prefix to use
  • --user-phase - local script to sideload and execute to bootstrap additional tools into Gentoo AMI image
  • --update-world - setting this to no can significantly reduce build time, at the cost of using the stage3 prebuilt packages as they are, without any attempt to rebuild or update them
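As an illustration of combining these options, the sketch below wraps an arm64 build that skips the world update. The option names come from the list above; the wrapper function, the `BUILDER` variable, and the sample values are assumptions made so the example is easy to adapt.

```bash
# Path to the builder script; override BUILDER to point somewhere else.
BUILDER="${BUILDER:-./gentoo-ami-builder.sh}"

# Hypothetical wrapper: quick arm64 build without the world update.
# $1: key pair name, $2: AWS region
build_arm64_quick() {
    "$BUILDER" \
        --key-pair "$1" \
        --region "$2" \
        --gentoo-stage3 arm64 \
        --update-world no
}

# Usage:
#   build_arm64_quick "Your Key Pair Name" eu-west-1
```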

In addition, some environment variables affect the underlying subsystems:

  • AWS_PROFILE, used by the AWS CLI commands.

  • SSH_OPTS is passed to ssh. For example, -i myidentityfile.pem -o ServerAliveInterval=30

  • GENKERNEL_OPTS, passed to genkernel
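These variables can be set for a single run by prefixing the command. A minimal sketch, assuming a named AWS CLI profile and a local identity file; the function name and the `BUILDER` variable are hypothetical, and the SSH options are the ones given as an example above.

```bash
# Path to the builder script; override BUILDER to point somewhere else.
BUILDER="${BUILDER:-./gentoo-ami-builder.sh}"

# Hypothetical wrapper: run one build with a specific profile and SSH key.
# $1: AWS CLI profile, $2: SSH identity file, $3: key pair name
build_with_profile() {
    AWS_PROFILE="$1" \
    SSH_OPTS="-i $2 -o ServerAliveInterval=30" \
        "$BUILDER" --key-pair "$3"
}

# Usage:
#   build_with_profile dev myidentityfile.pem "Your Key Pair Name"
```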

Run ./gentoo-ami-builder.sh --help to see the full list of available options.

Doesn't work? Please file a bug and we will take care of it!

Troubleshooting

Can't connect over SSH during prepare instance phase

Check that the default security group "default" allows incoming access on port 22 from 0.0.0.0 or from your IP address.
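If the rule is missing, it can be added with the AWS CLI. The sketch below is one way to do it, assuming you know the security group ID; the helper name and the sample values are placeholders.

```bash
# Hypothetical helper: allow inbound SSH (TCP port 22) from one CIDR block.
# $1: security group ID, $2: CIDR block such as 203.0.113.7/32
allow_ssh_from() {
    aws ec2 authorize-security-group-ingress \
        --group-id "$1" \
        --protocol tcp \
        --port 22 \
        --cidr "$2"
}

# Usage, restricting access to the current public IP address:
#   allow_ssh_from sg-0123456789abcdef0 "$(curl -s https://checkip.amazonaws.com)/32"
```

Limiting the rule to a single address is usually preferable to opening port 22 to everyone.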

Timeout on "Waiting until AMI image will be available"

The time it takes to create an image depends on multiple factors, including region, time of day, day of the week, type of instance, size of volume etc.

Failing on the last step doesn't mean that image creation won't finish at all. Most likely it will finish, just a bit later. You can still monitor progress in the AWS console.

If you experience continuous failures because the default 30 minutes is not enough, then submit an issue on the tracker.
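Progress can also be followed from the command line instead of the console. The polling loop below is a sketch, not the builder's own logic; the function name, the default attempt count, and the delay are assumptions.

```bash
# Hypothetical helper: poll an AMI until it becomes available.
# $1: AMI ID, $2: max attempts (default 60), $3: delay in seconds (default 30)
# Returns 0 when available, 1 when failed, 2 on CLI error, 3 on timeout.
wait_for_ami() {
    image_id="$1"
    attempts="${2:-60}"
    delay="${3:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$(aws ec2 describe-images --image-ids "$image_id" \
            --query 'Images[0].State' --output text) || return 2
        case "$state" in
            available) return 0 ;;
            failed) return 1 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    return 3
}

# Usage:
#   wait_for_ami ami-0123456789abcdef0 && echo "image is ready"
```

The AWS CLI also ships a built-in waiter, `aws ec2 wait image-available --image-ids <id>`, which may be enough when a custom timeout is not needed.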

Customization

Custom user phase build script

Use the --user-phase option to pass a custom script that can do any kind of special configuration, install needed packages, or do anything else that is needed to make a base AMI for your use cases.
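A user phase script could look roughly like the sketch below. How the builder invokes the script (user, working directory, environment) is not described here, so treat everything in it as an assumption; the package atoms are only placeholders.

```bash
#!/bin/bash
# Hypothetical user phase script: install a few extra packages into the image.

install_extras() {
    # --quiet keeps the build log short; pass package atoms as arguments.
    emerge --quiet "$@"
}

# Run only where Portage is present, so the file can be checked locally.
if command -v emerge >/dev/null 2>&1; then
    install_extras app-admin/sudo app-editors/vim || exit 1
fi
```

It would then be passed to the builder as `--user-phase ./user-phase.sh`, where the file name is again just an example.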

Custom provisioning script

The produced image has an ec2-init service that automatically provisions the hostname and ssh keys and can also execute a custom provisioning shell script provided using EC2 metadata.

Read more here about ec2-init: https://github.com/sormy/ec2-init#module-exec

Read more about EC2 metadata: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-add-user-data.html
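For example, an instance could be started from the produced AMI with a provisioning script supplied as user data. This is a sketch using the standard AWS CLI; the helper name and all sample values are placeholders, and the exact script format expected by ec2-init is described in its own documentation.

```bash
# Hypothetical helper: launch an instance with a provisioning script.
# $1: AMI ID, $2: instance type, $3: key pair name, $4: provisioning script
launch_with_user_data() {
    aws ec2 run-instances \
        --image-id "$1" \
        --instance-type "$2" \
        --key-name "$3" \
        --user-data "file://$4"
}

# Usage:
#   launch_with_user_data ami-0123456789abcdef0 t3.small "Your Key Pair Name" provision.sh
```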

Stage3

Here are all available Gentoo stage3 tarballs that are theoretically compatible with EC2 hardware (as of 2022-02-13):

| Stage3 | Profile | Arch | Status | Last Verified |
|---|---|---|---|---|
| amd64-desktop-openrc | default | amd64 | | |
| amd64-desktop-systemd | default | amd64 | | |
| amd64-hardened-nomultilib-openrc | default | amd64 | 🆗 | v1.1.0 on 2020-10-14 |
| amd64-hardened-nomultilib-selinux-openrc | default | amd64 | 🆗 | v1.1.0 on 2020-10-14 |
| amd64-hardened-openrc | default | amd64 | 🆗 | v1.1.0 on 2020-10-14 |
| amd64-hardened-selinux-openrc | default | amd64 | 🆗 | v1.1.0 on 2020-10-14 |
| amd64-musl | default | amd64 | | v1.1.0 on 2020-10-14 |
| amd64-musl-hardened | default | amd64 | | v1.1.0 on 2020-10-14 |
| amd64-nomultilib-openrc | default | amd64 | 🆗 | v1.1.0 on 2020-10-14 |
| amd64-nomultilib-systemd | default | amd64 | | |
| amd64-openrc | default | amd64 | 🆗 | v1.1.7 on 2023-03-03 |
| amd64-systemd | default | amd64 | 🆗 | v1.1.7 on 2023-03-03 |
| arm64 | default | arm64 | | |
| arm64-desktop-openrc | default | arm64 | | |
| arm64-desktop-systemd | default | arm64 | | |
| arm64-musl | default | arm64 | | |
| arm64-musl-hardened | default | arm64 | | |
| arm64-openrc | default | arm64 | 🆗 | v1.1.7 on 2023-03-03 |
| arm64-systemd | default | arm64 | 🆗 | v1.1.7 on 2023-03-03 |
| i486-openrc | default | x86 | | |
| i686-hardened-openrc | default | x86 | | |
| i686-musl | default | x86 | | |
| i686-openrc | default | x86 | | |
| i686-systemd | default | x86 | | |
| x32-openrc | default | amd64 | 🆗 | v1.1.1 on 2020-10-20 |

Status:

  • 🆗 - it works, verified by maintainers
  • ❌ - it doesn't work, verified by maintainers (PRs are welcome!)
  • ❓ - not verified, could work or not, please submit a PR to update this table if you have tested the stage (PRs for fixes are also welcome!)

Problems:

  • x86 (stable) - needs x86 kernel config generated from amd64 config
  • musl (exp) - kernel compilation fails (dive deep)
  • uclibc (exp) - gettext compilation fails during world update (dive deep)

EC2 Instance Type

The build is tested to be working well on these instance types.

  • amd64 / c6a.2xlarge (network ENA, block NVMe, MBR boot)
  • amd64 / c6in.2xlarge (network ENA, block NVMe, MBR boot)
  • arm64 / t4g.xlarge (default CPU credits, network ENA, block NVMe, EFI boot)
  • arm64 / c7g.2xlarge (network ENA, block NVMe, EFI boot)

Build process on slow instances could fail (due to lack of RAM) or could take a lot of time (due to low CPU performance). For a default build (minimal compilation) all of the 8-CPU instances are about the same, whether amd64 or arm64, and take about an hour. The exception is c6in; it reduces elapsed time by 25% over the c6a or c6i, mostly due to reduced waiting for the AMI image to be available. The t4g.xlarge (4 cores) is about 15 minutes slower than the t4g.2xlarge.

Init System

This builder has been tested to work well with two init systems:

  • OpenRC (default)
  • Systemd

Kernel Config

This script uses the kernel config from Amazon Linux instances. This is the reason why the bootstrap is performed from the Amazon Linux distribution: to steal its kernel config :-)

By the way, there are some additional fixes performed by this script:

  • Some instances, like C4, have network only with the IXGBEVF driver. The stock config uses a different name for the driver, so without the fix it won't be enabled by default.
  • Some instances, like C5, have network only with the ENA driver. This driver needs to be compiled during installation from sources provided by Amazon.
  • Modern instances, C/M/R5 and above, and T3 and above, have NVMe block devices. The NVMe driver needs to be compiled into kernel to make sure that Gentoo will load it before mounting the root.
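To confirm that a given kernel config has these drivers enabled, the config file can be inspected directly. The helpers below are a sketch and not part of the builder; the option names `CONFIG_BLK_DEV_NVME` and `CONFIG_IXGBEVF` are the mainline kernel names, and the config path in the usage note is the conventional Gentoo location, assumed here.

```bash
# Succeed if the option is compiled into the kernel (=y) in the given config.
# $1: kernel config file, $2: option name such as CONFIG_BLK_DEV_NVME
is_builtin() {
    grep -q "^$2=y\$" "$1"
}

# Succeed if the option is built in or built as a module (=y or =m).
is_enabled() {
    grep -q "^$2=[ym]\$" "$1"
}

# Usage:
#   is_builtin /usr/src/linux/.config CONFIG_BLK_DEV_NVME || echo "NVMe is not built in"
#   is_enabled /usr/src/linux/.config CONFIG_IXGBEVF || echo "IXGBEVF is disabled"
```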

NOTE: EFA driver is not available yet. PRs are welcome!

FAQ

Downloading stage3 is slow

Sometimes the Gentoo distfile server can be slow, around 200Kb/sec, making the whole process much slower. You can terminate the AMI builder and restart it. The new request will most likely be served from another distfile server and will be fast. Another option is to change the distfile server in the settings to one that you trust.

NOTE: If the build process has been terminated, make sure that no builder instances are left running.
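Leftover instances can be found with the AWS CLI. The sketch below lists every instance in a region that is not terminated; the helper name is hypothetical, and it does not filter by any builder-specific tag, so review the output before terminating anything.

```bash
# Hypothetical helper: print IDs of all non-terminated instances in a region.
# $1: AWS region
list_live_instances() {
    aws ec2 describe-instances \
        --region "$1" \
        --filters "Name=instance-state-name,Values=pending,running,stopping,stopped" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text
}

# Usage:
#   list_live_instances us-east-1
#   aws ec2 terminate-instances --region us-east-1 --instance-ids <id>
```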

AMI image creation is slow

AMI image creation could be slow, usually it is up to 10-20 minutes for 20GB volume.

What about PVM instances?

PVM is used on old instance types C1, C3, HS1, M1, M3, M2, and T1 that are not highly available these days and will be all eventually replaced with modern HVM instances.

Read more: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html

PVM is not supported at this time but technically can be implemented. The bootloader and kernel would need to be configured a bit differently.

Feel free to submit a PR that adds PVM support.

What about x86 support?

x86 architecture is not supported at this time but technically can be implemented.

Please also consider using Gentoo x32 stage3 that has benefits of both amd64 and x86 worlds.

Feel free to submit a PR that adds x86 support.

Why can't we just use aws ec2 import-image?

The AWS cli has a command, aws ec2 import-image, that is designed to import existing disk images. However, there are a few reasons why it is not used in this builder:

  • It is picky about image content. It does STRICT validation of the image, including the kernel version, so you can easily get a message like this: "ClientError: Unsupported kernel version 5.4.66-gentoo-x86_64"
  • It is picky about image format. Only raw images are generally accepted without compatibility issues. For a vmdk created with qemu-img it produces this error: "ClientError: Disk validation failed [Unsupported VMDK File Format]"
  • It requires uploading the image file to S3 before the process can be executed. This also makes the process slower and adds cost for big images.
  • It is slower because the source image is converted from the source format to the format used by AWS.

This builder script doesn't have these limitations but the procedure it performs is more complex.

Examples

[Screenshots: Success p1, Success p2, Success p3, Failure]

Build log examples: amd64 amd64-systemd arm64 arm64-systemd x32

Reporting Issues

Gentoo is a rolling release system, and AWS also releases new instance types periodically, so a builder that worked yesterday could stop working today. This application requires periodic maintenance to ensure that it still works with the latest Gentoo and new AWS instance types. Please file a bug if you are experiencing an issue and we will take care of it.

Please use the GitHub issue tracker for any bugs or feature suggestions.

Contributing

Contributions are very welcome!

Please take a look at TODO to see what things could be improved.

Please submit fixes or improvements as GitHub pull requests!

For code changes please consider doing the 4 default builds to verify that there are no regressions: amd64, amd64-systemd, arm64 and arm64-systemd.

Contributions must be licensed under the MIT License.

Copyright

gentoo-ami-builder is licensed under the MIT License.

A copy of this license is included in the file LICENSE.txt