Create Idempotent Teardown/Redeploy script for DB assets #10582

Open
JayFlexy opened this issue Jan 21, 2025 · 0 comments
Labels
4 (dx/ox) Medium-High Priority Devex/Opex Devex

Comments

@JayFlexy
Collaborator

Problem Description

1. DynamoDB Replication Dependencies

  • DynamoDB global tables create dependencies between replicas in multiple regions.
  • When destroying the infrastructure (e.g., with terraform destroy), the teardown can hang because the primary and secondary replica tables each depend on the other.
  • Neither replica can be dropped until the replication relationship is broken, and Terraform does not perform that dependency resolution on its own.
  • DynamoDB deployment relies on having an existing table to function properly with Terraform, creating additional challenges during a full teardown and rebuild.
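
One possible way to break the replication relationship outside Terraform is to remove the replicas from the table first and delete the table only afterwards. The sketch below assumes the tables use global tables version 2019.11.21, where replicas are removed through UpdateTable with ReplicaUpdates, and that `client` behaves like a boto3 DynamoDB client for the home region. The function name, the `wait` hook, and the `_error_code` helper are hypothetical and not part of the existing script.

```python
def _error_code(exc):
    """Return the AWS error code from a botocore-style ClientError, if present."""
    return getattr(exc, "response", {}).get("Error", {}).get("Code")


def remove_replicas_then_delete(client, table_name, home_region, wait=lambda: None):
    """Remove every replica outside home_region, then delete the table.

    Assumption: `client` behaves like a boto3 DynamoDB client in home_region.
    `wait` is a hypothetical hook that should block until the table is ACTIVE
    again; the default does nothing, which is only suitable for dry runs.
    Returns the regions whose replicas were removed. A table that is already
    gone counts as success, so the function is safe to run repeatedly.
    """
    try:
        table = client.describe_table(TableName=table_name)["Table"]
    except Exception as exc:
        if _error_code(exc) == "ResourceNotFoundException":
            return []  # nothing left to tear down
        raise

    removed = []
    for replica in table.get("Replicas", []):
        region = replica["RegionName"]
        if region == home_region:
            continue
        # One replica per call; each removal has to finish before the next.
        client.update_table(
            TableName=table_name,
            ReplicaUpdates=[{"Delete": {"RegionName": region}}],
        )
        wait()
        removed.append(region)

    client.delete_table(TableName=table_name)
    return removed
```

In a real run, `wait` would need to poll the table status, since replica removal is asynchronous.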

2. OpenSearch Challenges

  • OpenSearch domains can take a significant amount of time to delete and recreate during each migration.
  • Recreating OpenSearch instances for every migration incurs a sizable cost and requires extensive system knowledge, further complicating the process.
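
Because deletion is slow, a teardown script has to poll for completion instead of assuming the resource is gone when the delete call returns. The helper below is a generic sketch of that polling loop. The name `wait_until` and its parameters are hypothetical, and the predicate that checks whether the domain still exists is left to the caller.

```python
import time


def wait_until(predicate, timeout_s, interval_s=30.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse.

    Returns True if the predicate succeeded, False on timeout. `clock` and
    `sleep` are injectable so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass a predicate that returns True once the domain lookup reports it as missing, along with a timeout generous enough for the slowest observed deletion.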

3. Terraform's Idempotency Challenge

  • Removing the Terraform state does not remove the underlying resources, so a later apply fails: Terraform tries to create resources that already exist and cannot recreate them.
  • There are known issues with the latest AWS provider for Terraform, specifically:
    • Terraform can mistakenly demote RDS replicas to regional clusters and attempt to reattach new replicas, which isn’t possible in a global cluster setup.
    • RDS clusters cannot be modified to attach additional replicas after the initial creation.
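
The mismatch between state and reality can be made explicit before running Terraform. The sketch below compares three sets of resource names: what the configuration wants, what Terraform state tracks, and what actually exists in the account. How those sets are gathered is not shown and would depend on the environment; the function name and the category labels are hypothetical.

```python
def reconcile(desired, in_state, in_cloud):
    """Classify resources by what must happen before an apply can succeed.

    All three arguments are sets of resource names.
    - "import":      exists in the cloud but is missing from state, so it
                     needs `terraform import` rather than a create.
    - "stale_state": tracked in state but no longer exists, so it needs
                     `terraform state rm` before it can be recreated.
    - "create":      does not exist in the cloud and has to be created.
    - "managed":     exists and is tracked; nothing to do.
    """
    desired, in_state, in_cloud = set(desired), set(in_state), set(in_cloud)
    return {
        "import": sorted((desired & in_cloud) - in_state),
        "stale_state": sorted(in_state - in_cloud),
        "create": sorted(desired - in_cloud),
        "managed": sorted(desired & in_cloud & in_state),
    }
```

For example, after the state file has been removed while the tables are still present, every existing resource lands in the "import" bucket, which matches the failure described above.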

4. RDS-Specific Challenges

  • Deleting RDS instances requires a specific order and multiple steps:
    • Deletion protection must be disabled first, which may not be possible directly on the global cluster.
    • A script is needed to handle deletion protection and the deletion steps in the correct order.
  • Restoring from a snapshot always creates a new instance, which cannot simply be managed with Terraform.
    • Terraform often tries to remove the restored instance once it has been imported into the state, requiring multiple steps or manual intervention.
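
The deletion order can be written down as a plan before anything is executed. The sketch below assumes an Aurora global cluster and builds an ordered list of steps whose names match boto3 RDS client methods (`modify_db_cluster`, `remove_from_global_cluster`, `delete_db_instance`, `delete_db_cluster`, `delete_global_cluster`). The input shape, the function name, and the choice to skip final snapshots are assumptions made for illustration, not the project's actual procedure.

```python
def rds_global_teardown_plan(global_id, clusters):
    """Build an ordered list of (operation, kwargs) steps for a teardown.

    `clusters` is a list of dicts with keys "id", "arn", "primary" (bool),
    and "instances" (list of instance identifiers). Secondaries are handled
    before the primary. Nothing is executed here; the caller decides how to
    run each step and how long to wait between steps.
    """
    ordered = ([c for c in clusters if not c["primary"]]
               + [c for c in clusters if c["primary"]])
    steps = []
    # 1. Disable deletion protection on each member cluster.
    for c in ordered:
        steps.append(("modify_db_cluster", {
            "DBClusterIdentifier": c["id"],
            "DeletionProtection": False,
            "ApplyImmediately": True,
        }))
    # 2. Detach each cluster from the global cluster, then delete it.
    for c in ordered:
        steps.append(("remove_from_global_cluster", {
            "GlobalClusterIdentifier": global_id,
            "DbClusterIdentifier": c["arn"],
        }))
        for instance_id in c["instances"]:
            steps.append(("delete_db_instance",
                          {"DBInstanceIdentifier": instance_id}))
        # Assumption: no final snapshot is wanted for a disposable environment.
        steps.append(("delete_db_cluster", {
            "DBClusterIdentifier": c["id"],
            "SkipFinalSnapshot": True,
        }))
    # 3. The global cluster can only go once it has no members left.
    steps.append(("delete_global_cluster",
                  {"GlobalClusterIdentifier": global_id}))
    return steps
```

An executor could run each step with `getattr(client, operation)(**kwargs)`, waiting for the previous step to finish and treating "not found" errors as success so that a rerun resumes where the last attempt stopped.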

5. Outdated Create/Destroy Script

  • The current create/destroy script is outdated and does not work properly because it handles none of the issues above.
  • Rewriting the script from scratch is necessary to account for:
    • Dependencies between resources.
    • Issues with replication and restore processes.
    • Cost and time considerations for migrations.
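
A rewritten script could derive the teardown order from a declared dependency map instead of hard-coding it. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The resource names and the `keep` parameter, which leaves expensive resources such as OpenSearch in place, are hypothetical.

```python
from graphlib import TopologicalSorter


def teardown_order(depends_on, keep=frozenset()):
    """Return resources in the order they should be destroyed.

    `depends_on` maps each resource to the set of resources it depends on.
    Dependents are destroyed before their dependencies, i.e. the reverse of
    creation order. Resources in `keep` are skipped. A kept resource must not
    depend on a resource that would be destroyed.
    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    keep = set(keep)
    for resource in keep:
        doomed = set(depends_on.get(resource, ())) - keep
        if doomed:
            raise ValueError(
                f"{resource} is kept but depends on {sorted(doomed)}")
    creation_order = list(TopologicalSorter(depends_on).static_order())
    return [r for r in reversed(creation_order) if r not in keep]
```

A circular dependency, like the one described for DynamoDB replicas, surfaces here as a `CycleError` at planning time rather than as a teardown that hangs partway through.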

6. Additional Observations

  • The process is further complicated by the practice of recreating instances during every migration, adding cost and complexity.
  • Using RDS has its own pros and cons, as it solves some issues but introduces others, such as limitations with global clusters and multi-step deletion requirements.
  • While restoring from snapshots is theoretically possible, the required Terraform steps make it impractical without manual intervention.

Developer Notes

  • A simple teardown-and-recreate process is theoretically achievable but requires addressing many moving pieces.
  • The recommendation is to:
    1. Tackle immediate issues through dependency updates.
    2. Allocate additional time for scripting improvements.
    3. Focus on finishing the PostgreSQL implementation before rewriting the create/destroy script.
    4. Revisit all identified challenges with an updated approach.

Summary

The inability to reliably destroy and recreate the environment with Terraform stems from:

  1. Circular dependencies in DynamoDB replication.
  2. The lengthy and expensive OpenSearch recreation process.
  3. RDS limitations with snapshots and global clusters.
  4. Outdated scripts that do not account for these complexities.

These issues require a coordinated solution involving dependency updates, a rewrite of the scripts, and careful planning to address both immediate needs and long-term goals.

@JayFlexy JayFlexy converted this from a draft issue Jan 21, 2025
@JayFlexy JayFlexy added the Devex label Jan 21, 2025
@pixiwyn pixiwyn added the 4 (dx/ox) Medium-High Priority Devex/Opex label Feb 3, 2025
Projects
Status: Devex/Opex