Skip to content

Latest commit

 

History

History
306 lines (248 loc) · 15.4 KB

aws-dms.md

File metadata and controls

306 lines (248 loc) · 15.4 KB
title summary toc docs_area
Migrate with AWS Database Migration Service (DMS)
Learn how to use AWS Database Migration Service (DMS) to migrate data to a CockroachDB target cluster.
true
develop

This page has instructions for setting up AWS DMS to migrate data to CockroachDB from an existing, publicly-hosted database containing application data such as MySQL, Oracle, or PostgreSQL.

For a detailed tutorial about using AWS DMS and information about specific migration tasks, see the AWS DMS documentation site.

We have tested AWS DMS with CockroachDB as a target enough to claim preview-level support. If you encounter problems in CockroachDB, please open an issue with details to help us make progress toward full support.

For any issues related to AWS DMS, aside from its interaction with CockroachDB as a migration target, please reach out to AWS Support.

{{site.data.alerts.callout_info}} Using CockroachDB as a source database within AWS DMS is unsupported. {{site.data.alerts.end}}

Before you begin

Ensure the following items are completed prior to starting this tutorial:

  • Configure a source endpoint in AWS pointing to your source database.
  • Configure a replication instance in AWS.
  • Ensure you have a secure, publicly available CockroachDB cluster running v22.1 GA or later.

As of publishing, AWS DMS supports migrations from these relational databases (for a more accurate view of what's currently supported, see Sources for AWS DMS):

  • Amazon Aurora
  • Amazon DocumentDB (with MongoDB compatibility)
  • Amazon S3
  • IBM Db2 (LUW edition only)
  • MariaDB
  • Microsoft Azure SQL
  • Microsoft SQL Server
  • MongoDB
  • MySQL
  • Oracle
  • PostgreSQL
  • SAP ASE

Step 1. Create a target endpoint pointing to CockroachDB

  1. In the AWS Console, open AWS DMS.
  2. Open Endpoints in the sidebar. A list of endpoints will display, if any exist.
  3. In the top-right portion of the window, select Create endpoint. AWS-DMS-Create-Endpoint
  4. In the Endpoint type section, select Target endpoint.
  5. Supply an Endpoint identifier to identify the new target endpoint.
  6. For the Target engine dropdown, select PostgreSQL.
  7. For the Access to endpoint database radio button, select the Provide access information manually.
  8. Enter the Server name, Port, User name, Password, and Database name of your CockroachDB cluster. {{site.data.alerts.callout_info}} To connect to a {{ site.data.products.serverless }} cluster, use {routing-id}.{database} for the Database name. For more information, see Connect to a {{ site.data.products.serverless }} Cluster. {{site.data.alerts.end}} AWS-DMS-Endpoint-Configuration
  9. You can test the connection if needed under Test endpoint connection (optional).
  10. Create the endpoint by selecting Create endpoint. AWS-DMS-Test-Endpoint

Step 2. Create a database migration task

A database migration task, also known as a replication task, controls what data are moved from the source database to the target database.

Step 2.1. Task configuration

  1. While in AWS DMS, select Database migration tasks in the sidebar. A list of database migration tasks will display, if any exist.
  2. In the top-right portion of the window, select Create task. AWS-DMS-Create-DB-Migration-Task
  3. Supply a Task identifier to identify the replication task.
  4. Select the Replication instance and Source database endpoint you created prior to starting this tutorial.
  5. For the Target database endpoint dropdown, select the CockroachDB endpoint created in the previous section.
  6. Select the appropriate Migration type based on your needs. AWS-DMS-Task-Configuration

Step 2.2. Task settings

  1. For the Editing mode radio button, keep Wizard selected.
  2. For the Target table preparation mode, select either Truncate or Do nothing. AWS-DMS-Task-Settings
  3. Manually create all schema objects in the target CockroachDB cluster. This step is required in order for the migration to populate data successfully. {{site.data.alerts.callout_info}} All tables must have a primary key associated with them. For more information, see Primary Key Constraint. {{site.data.alerts.end}}

{{site.data.alerts.callout_info}} Drop tables on target is unsupported at this time. {{site.data.alerts.end}}

{{site.data.alerts.callout_success}} When specifying a range of tables, you must ensure the following before data migration can successfully occur:

  • The column names within each table being migrated from the source database to CockroachDB are identical.
  • The column types for the columns within each table being migrated from the source database to CockroachDB are compatible. {{site.data.alerts.end}}

{% comment %}In order to use Drop tables on target, you must create a new dms user and ensure BatchApplyEnabled is set to False. {% endcomment %}

Step 2.3. Table mappings

  1. For the Editing mode radio button, keep Wizard selected.
  2. Select Add new selection rule.
  3. In the Schema drop down, select Enter a schema.
  4. Supply the appropriate Source name (schema name), Table name, and Action. AWS-DMS-Table-Mappings

{{site.data.alerts.callout_info}} Use % as an example of a wildcard for all schemas in a PostgreSQL database. However, in MySQL, using % as a schema name imports all the databases, including the metadata/system ones, as MySQL treats schemas and databases as the same. {{site.data.alerts.end}}

Step 3. Verify the migration

Data should now be moving from source to target. You can analyze the Table Statistics page for information about replication.

  1. In AWS DMS, open Database migration tasks in the sidebar.
  2. Select the task you created in Step 2.
  3. Select Table statistics below the Summary section.

If your migration failed for some reason, you can check the checkbox next to the table(s) you wish to re-migrate and select Reload table data.

AWS-DMS-Reload-Table-Data

Optional configurations

{% comment %}

Create a new dms user

Creating a dms user is necessary when using Drop tables on target as a target table preparation mode.

  1. Connect to your {{ site.data.products.serverless }} cluster.

  2. In the SQL client, create a new dms admin user to handle the migration. Replace '<password>' with a strong password:

    {% include_cached copy-clipboard.html %}

    > CREATE USER dms WITH PASSWORD '<password>';
    > GRANT admin TO dms;
    > ALTER USER dms SET expect_and_ignore_not_visible_columns_in_copy = true;

    {{site.data.alerts.callout_danger}} Do not use this user for normal SQL activity. The expect_and_ignore_not_visible_columns_in_copy session variable may make it behave unpredictably for normal usage. {{site.data.alerts.end}} {% endcomment %}

AWS PrivateLink

If using {{ site.data.products.dedicated }}, you can enable AWS PrivateLink to securely connect your AWS application with your {{ site.data.products.dedicated }} cluster using a private endpoint. To configure AWS PrivateLink with {{ site.data.products.dedicated }}, see Network Authorization.

CloudWatch logs

You can Enable CloudWatch logs for extra insight about the replication. To enable CloudWatch logs:

  1. Edit the existing replication task.
  2. Under Task settings, select Enable CloudWatch logs. From here, you can specify logging levels for each event type: AWS-DMS-CloudWatch-Logs

BatchApplyEnabled

The BatchApplyEnabled setting can improve replication performance and is recommended for larger workloads. {% comment %}If you enable this setting, then you must set your target table preparation mode to Truncate or Do nothing.{% endcomment %}

  1. Open the existing database migration task.
  2. Choose your task, and then choose Modify.
  3. From the Task settings section, switch the Editing mode from Wizard to JSON editor. Locate the BatchApplyEnabled setting and change its value to true. Information about the BatchApplyEnabled setting can be found here.

AWS-DMS-BatchApplyEnabled

{% comment %}

{{site.data.alerts.callout_info}} BatchApplyEnabled does not work when using Drop tables on target as a target table preparation mode. Thus, all schema-related changes must be manually copied over if using BatchApplyEnabled. {{site.data.alerts.end}} {% endcomment %}

Known limitations

  • When using Truncate or Do nothing as a target table preparation mode, you cannot include tables with any hidden columns. You can verify which tables contain hidden columns by executing the following SQL query:

{% include_cached copy-clipboard.html %}

> SELECT table_catalog, table_schema, table_name, column_name FROM information_schema.columns WHERE is_hidden = 'YES';
  • Drop tables on target is currently not supported and will error on import.

{% comment %}

  • Not all schema objects are migrated when using Drop tables on target as a target table preparation mode. Thus, it is not supported at this time. A list of supported DDL statements within AWS DMS are mentioned here. Any statements outside of that (creating or modifying foreign keys, secondary indexes, constraints, etc.) will have to be run manually in the target database. {% endcomment %}

Troubleshooting common issues

Run the following query from within the target CockroachDB cluster to identify common problems with any tables that may be migrated:

{% include_cached copy-clipboard.html %}

> WITH
    invalid_columns
        AS (
            SELECT
                'Table '
                || table_schema
                || '.'
                || table_name
                || ' has column '
                || column_name
                || ' which is hidden. Either drop the column or mark it as not hidden for DMS to work.'
                    AS fix_me
            FROM
                information_schema.columns
            WHERE
                is_hidden = 'YES'
                AND table_name NOT LIKE 'awsdms_%'
        ),
    invalid_version
        AS (
            SELECT
                'This cluster is on a version of CockroachDB which does not support AWS DMS. CockroachDB v21.2.13+ or v22.1+ is required.'
                    AS fix_me
            WHERE
                split_part(
                    substr(
                        substring(
                            version(),
                            e'v\\d+\\.\\d+.\\d+'
                        ),
                        2
                    ),
                    '.',
                    1
                )::INT8
                < 22
                AND NOT
                        (
                            split_part(
                                substr(
                                    substring(
                                        version(),
                                        e'v\\d+\\.\\d+.\\d+'
                                    ),
                                    2
                                ),
                                '.',
                                1
                            )::INT8
                            = 21
                            AND split_part(
                                    substr(
                                        substring(
                                            version(),
                                            e'v\\d+\\.\\d+.\\d+'
                                        ),
                                        2
                                    ),
                                    '.',
                                    2
                                )::INT8
                                = 2
                            AND split_part(
                                    substr(
                                        substring(
                                            version(),
                                            e'v\\d+\\.\\d+.\\d+'
                                        ),
                                        2
                                    ),
                                    '.',
                                    3
                                )::INT8
                                >= 13
                        )
        ),
    has_no_pk
        AS (
            SELECT
                'Table '
                || a.table_schema
                || '.'
                || a.table_name
                || ' has column '
                || a.column_name
                || ' has no explicit PRIMARY KEY. Ensure you are not using target mode "Drop tables on target" and that this table has a PRIMARY KEY.'
                    AS fix_me
            FROM
                information_schema.key_column_usage AS a
                JOIN information_schema.columns AS b ON
                        a.table_schema = b.table_schema
                        AND a.table_name = b.table_name
                        AND a.column_name = b.column_name
            WHERE
                b.is_hidden = 'YES'
                AND a.column_name = 'rowid'
                AND a.table_name NOT LIKE 'awsdms_%'
        )
SELECT fix_me FROM has_no_pk
UNION ALL SELECT fix_me FROM invalid_columns
UNION ALL SELECT fix_me FROM invalid_version;

See Also