Designing our Cloud Platform
At Astradot, a robust and scalable cloud infrastructure is what lets us move quickly and efficiently. That's why we recently decided to rebuild our AWS setup to be truly 'enterprise' grade. Along the way, we noticed that while many startups publish engineering blog posts about implementing specific product features, few provide a comprehensive breakdown of their cloud infrastructure. We are sharing our setup in the hope that it will inspire other companies to share their own experiences and best practices. This post is the first in a series in which we'll delve into how we architected various parts of our cloud platform.
Our cloud infrastructure has the following goals:
- Use a multi-account AWS setup, which AWS now recommends as best practice.
- Make high security the easy path, so that engineers are not tempted to resort to insecure workarounds.
- Enable easy experimentation without affecting production.
- Allow turning on new regions as needed.
- Use 100% Infrastructure as Code, avoiding click-ops entirely.
It also has the following non-goals:
- Create numerous accounts that mimic production.
- Create per-branch/developer AWS accounts.
We will explain why these are non-goals. Having a clear understanding of non-goals guided how we organized our Terraform code and how we thought about development practices.
Multi-Account AWS Setup
We use the root 'management' account solely for operating AWS Control Tower to create new AWS accounts. It is also the account where the billing information of all other accounts is consolidated.
The 'identity' account is where we manage users for all other accounts using AWS SSO (since renamed AWS IAM Identity Center). This is the only account in which users can be created. Anything that needs to access entities in another account must assume a role in that account. Keeping all users in a central place makes it easy to see who has access to what, and revoking access is straightforward.
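To make the assume-role idea concrete, here is a minimal Terraform sketch of a role in a target account that trusts only the identity account. It illustrates the generic IAM cross-account trust pattern rather than the roles that AWS SSO provisions automatically, and the account ID, role name, and attached policy are hypothetical placeholders.

```hcl
# Hypothetical 12-digit ID of the 'identity' account.
variable "identity_account_id" {
  type    = string
  default = "111111111111"
}

# Trust policy: only principals from the identity account may assume the role.
data "aws_iam_policy_document" "trust_identity" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.identity_account_id}:root"]
    }
  }
}

# Role created in the target account (for example, staging).
# The role name is illustrative only.
resource "aws_iam_role" "read_only" {
  name               = "example-read-only"
  assume_role_policy = data.aws_iam_policy_document.trust_identity.json
}

# Grant the role the AWS-managed ReadOnlyAccess policy as an example.
resource "aws_iam_role_policy_attachment" "read_only" {
  role       = aws_iam_role.read_only.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```

Because the trust policy names a single account, revoking a user in the identity account removes their path into every other account.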
The 'security' account enables AWS Security Hub, which aggregates security data from all other AWS accounts into a central location. Data from AWS security services such as Amazon GuardDuty, AWS Config, and others is consolidated in the Security Hub of this account. Any investigation or audit related to security only requires access to this single account.
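One way such central aggregation could be wired up is by designating the security account as the organization's delegated Security Hub administrator. The fragment below is a sketch under assumptions: it is applied from the management account, AWS Organizations trusted access for Security Hub is already enabled, and the variable name is hypothetical. Other prerequisites may apply depending on the setup.

```hcl
# Hypothetical variable holding the ID of the 'security' account.
variable "security_account_id" {
  type        = string
  description = "12-digit ID of the account that aggregates security findings"
}

# Applied from the management account (assumption): makes the security
# account the delegated Security Hub administrator for the organization,
# so findings from member accounts are visible there.
resource "aws_securityhub_organization_admin_account" "security" {
  admin_account_id = var.security_account_id
}
```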
The 'artifact' account contains all build artifacts that any other AWS account can use, including Docker images, custom AMIs, Node.js packages, and so on. Any CI jobs that need EC2 instances to generate artifacts also run in this account. Keeping build artifacts in a separate central account ensures that artifacts are environment-independent. Any environment account can choose to deploy any of the artifacts for whatever purpose it requires.
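For Docker images, sharing from a central account could look like the following Terraform sketch: an ECR repository in the artifact account with a repository policy that lets environment accounts pull images. The repository name and account IDs are hypothetical, and the source does not state which registry is actually used, so ECR is an assumption here.

```hcl
# Hypothetical IDs of the environment accounts allowed to pull images.
variable "environment_account_ids" {
  type    = list(string)
  default = ["222222222222", "333333333333"]
}

# Repository living in the 'artifact' account. The name is illustrative.
resource "aws_ecr_repository" "app" {
  name                 = "example-app"
  image_tag_mutability = "IMMUTABLE"
}

# Read-only pull permissions for the environment accounts.
data "aws_iam_policy_document" "pull_from_environments" {
  statement {
    sid    = "AllowEnvironmentAccountsToPull"
    effect = "Allow"

    actions = [
      "ecr:BatchCheckLayerAvailability",
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]

    principals {
      type        = "AWS"
      identifiers = [for id in var.environment_account_ids : "arn:aws:iam::${id}:root"]
    }
  }
}

resource "aws_ecr_repository_policy" "app" {
  repository = aws_ecr_repository.app.name
  policy     = data.aws_iam_policy_document.pull_from_environments.json
}
```

Since the policy grants only pull actions, environment accounts can deploy any image but cannot publish or overwrite one.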
The 'dns' account contains the Route53 setup for the root astradot.com domain. However, each environment account has its own DNS hosted zone with a subdomain specific to that environment.
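The split between the root zone and per-environment zones is typically achieved with NS delegation. The sketch below shows the idea in Terraform under several assumptions: the subdomain `staging.astradot.com`, the account IDs, and the role name are hypothetical, and both halves are shown in one configuration for readability even though they could live in separate workspaces.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Provider for the 'dns' account (hypothetical account ID and role name).
provider "aws" {
  alias  = "dns"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::444444444444:role/example-terraform"
  }
}

# Provider for an environment account (hypothetical account ID and role name).
provider "aws" {
  alias  = "staging"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::222222222222:role/example-terraform"
  }
}

# Existing root zone in the 'dns' account.
data "aws_route53_zone" "root" {
  provider = aws.dns
  name     = "astradot.com"
}

# Environment-specific hosted zone, owned by the environment account.
resource "aws_route53_zone" "staging" {
  provider = aws.staging
  name     = "staging.astradot.com"
}

# NS record in the root zone that delegates the subdomain to the
# environment account's hosted zone.
resource "aws_route53_record" "staging_delegation" {
  provider = aws.dns
  zone_id  = data.aws_route53_zone.root.zone_id
  name     = aws_route53_zone.staging.name
  type     = "NS"
  ttl      = 300
  records  = aws_route53_zone.staging.name_servers
}
```

After delegation, the environment account can manage every record under its subdomain without touching the root zone.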
The 'email' account has AWS SES configured and is used solely by other accounts to send emails.
We have separate accounts for each environment: Production, Staging, and Sandbox. The Production account requires no further explanation. The Staging account is meant for testing new application code or trying out new systems, services, or technologies that won't disrupt the existing application code. The Sandbox account, on the other hand, is reserved for testing changes to the bottom layers that could potentially disrupt all running application code. This typically involves major upgrades to core cluster components such as Kubernetes itself, Argo CD, Karpenter, and so on. By having a separate AWS account to experiment with foundational technologies, engineers can conduct tests without the risk of disrupting the workflow of the rest of the company. Sandbox accounts are created temporarily for the duration of an experiment, and more than one can exist at a time if needed.
Infrastructure as Code (IaC)
For IaC, we selected Terraform as our preferred tool. We particularly appreciate using HCL (HashiCorp Configuration Language) to describe infrastructure. While other tools use general-purpose programming languages for IaC, we find it more convenient to use a DSL that has been optimized for this specific purpose. We opted to use Terraform Cloud to manage our Terraform setup, with a dedicated workspace for each AWS account.
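As an illustration of the one-workspace-per-account layout, a root module for a single account might bind itself to its Terraform Cloud workspace as in the sketch below. The organization name, workspace name, provider version constraint, and region variable are all hypothetical, and the `cloud` block requires Terraform 1.1 or later.

```hcl
terraform {
  # Bind this configuration to one Terraform Cloud workspace,
  # here assumed to correspond to the staging AWS account.
  cloud {
    organization = "example-org"

    workspaces {
      name = "aws-staging"
    }
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Region is a variable so additional regions can be enabled later.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

provider "aws" {
  region = var.aws_region
}
```

Keeping state and credentials scoped to one workspace per account means a mistake in one account's configuration cannot modify another account's resources.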
We'll discuss our experience with Terraform in more detail in our next blog post.