# Deploying on AWS (ECS Fargate)

## Introduction

This guide covers deploying the Composable Agentic Platform (CAP) on Amazon Web Services using a fully managed container infrastructure. The CAP Agent runtime is packaged as a Docker container and deployed to AWS ECS Fargate, giving you a scalable, serverless hosting environment without managing EC2 instances.

Deployment is self-service via a provided AWS CloudFormation template that provisions the complete infrastructure stack in a single operation.

{% hint style="info" %}
**AWS Marketplace customers:** If you have not yet subscribed, visit the [TomorrowX listing on AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=13c74961-a005-4868-a927-d66b4e6ff19f) before proceeding. A valid subscription is required for the container image to start correctly.
{% endhint %}

## Architecture Overview

The CloudFormation template provisions a production-ready full-stack environment in your AWS account:

```text
┌─────────────────────────────────────────────────────────────┐
│ Your AWS Account                                            │
│                                                             │
│  Internet → Application Load Balancer (ALB)                 │
│                 │                                           │
│                 └── Port 80/443  → CAP Agent (web/solution) │
│                                                             │
│  ECS Fargate (CAP Agent container - private IP per task)    │
│       │             ↑                                       │
│       │      CAP Console → Task Private IP:20001 (direct)   │
│       │                                                     │
│       ├── RDS PostgreSQL (application data)                 │
│       ├── EFS (shared file storage / uploads)               │
│       └── CloudWatch Logs (agent logs)                      │
└─────────────────────────────────────────────────────────────┘
```

**Infrastructure provisioned by the CloudFormation template:**

| Component           | Service                   | Purpose                                                         |
| ------------------- | ------------------------- | --------------------------------------------------------------- |
| Container runtime   | ECS Fargate               | Runs the CAP Agent — no EC2 to manage                           |
| Load balancer       | Application Load Balancer | Routes web traffic (port 80/443) to the agent solution endpoint |
| Database            | RDS PostgreSQL            | Persistent application data storage                             |
| Shared file storage | EFS                       | File uploads, shared across container restarts                  |
| Secrets             | Secrets Manager           | Database credentials (never in plaintext)                       |
| Logs                | CloudWatch                | Agent log streaming and retention                               |
| Container image     | AWS Marketplace ECR       | TomorrowX-managed, subscription-gated                           |

## Prerequisites

Before deploying, you'll need:

* **An active AWS Marketplace subscription** to a TomorrowX CAP platform tier. Verify at `AWS Marketplace → Manage subscriptions`.
* **An AWS account** with permissions to create: CloudFormation stacks, ECS clusters, RDS instances, EFS file systems, ALBs, IAM roles, VPCs, and Secrets Manager entries.
* **AWS CLI** installed and configured, or access to the AWS Management Console.
* A nominated **AWS region** — the stack can be deployed to any region that supports ECS Fargate, RDS, and EFS.

{% hint style="warning" %}
Deploying this stack will create billable AWS resources. Review the CloudFormation template and AWS pricing for ECS Fargate, RDS, EFS, and ALB in your region before deploying.
{% endhint %}
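
If you plan to deploy with the CLI, a quick preflight check can confirm that prerequisite before any resources are created. The sketch below is illustrative only: the function name `cap_preflight` is hypothetical and not part of any TomorrowX tooling, and it assumes the AWS CLI is already on your `PATH`.

```bash
# Hypothetical helper: confirm the AWS CLI is installed and has working credentials.
cap_preflight() (
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH" >&2
    exit 1
  fi
  # sts get-caller-identity fails when no valid credentials are configured.
  account=$(aws sts get-caller-identity --query 'Account' --output text) || {
    echo "AWS credentials are missing or invalid" >&2
    exit 1
  }
  echo "Using AWS account ${account} in region $(aws configure get region)"
)

# Example:
#   cap_preflight
```

The check does not verify IAM permissions for individual services; it only confirms which account and default region the CLI will use.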

## Deployment

### Step 1: Obtain the CloudFormation Template

The CloudFormation template (`cap-fullstack.yaml`) is provided by your TomorrowX delivery partner or available from the TomorrowX support portal at [tomorrowx.dev](https://tomorrowx.dev/).

### Step 2: Deploy the Stack

**Via AWS Console:**

1. Open **CloudFormation → Stacks → Create stack → With new resources**
2. Upload `cap-fullstack.yaml`
3. Enter a **Stack name** (e.g. `cap-production`). The ECS service will be named `<stack-name>-svc`.
4. Complete the parameter fields:

| Parameter            | Description                                                                                                           | Example                  |
| -------------------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------ |
| `VpcId`              | Your VPC ID                                                                                                           | `vpc-xxxxxxxxxxxxxxxxx`  |
| `SubnetIds`          | At least 2 subnets (different AZs)                                                                                    | `subnet-xxx,subnet-yyy`  |
| `WebAccessCidr`      | CIDR range for web access (ports 80/8080). Set to `x.x.x.x/32` for one IP, `0.0.0.0/0` for all, or another CIDR range | `0.0.0.0/0`              |
| `ManagementCidr`     | CIDR range for Console management access (port 20001). Restrict to your Console server IP for production              | `10.0.0.5/32`            |
| `DatabaseAccessCidr` | CIDR range for direct PostgreSQL access (port 5432). Use `127.0.0.1/32` to disable external access                    | `127.0.0.1/32`           |
| `DBPassword`         | RDS database password                                                                                                 | Choose a strong password |
| `DBName`             | PostgreSQL database name                                                                                              | `aispark`                |
| `AgentName`          | CAP Agent identifier (must match Console definition)                                                                  | `Agent`                  |

5. Acknowledge IAM resource creation and deploy (see [AWS Resources Created](#aws-resources-created) below).
6. Wait for the stack status to reach **CREATE\_COMPLETE** (typically 10–15 minutes).

**Via AWS CLI:**

```bash
aws cloudformation deploy \
  --template-file cap-fullstack.yaml \
  --stack-name cap-production \
  --parameter-overrides \
      VpcId=vpc-xxxxxxxxxxxxxxxxx \
      SubnetIds="subnet-xxx,subnet-yyy" \
      WebAccessCidr=0.0.0.0/0 \
      ManagementCidr=10.0.0.5/32 \
      DatabaseAccessCidr=127.0.0.1/32 \
      DBPassword=YourSecurePassword \
      DBName=aispark \
      AgentName=Agent \
  --capabilities CAPABILITY_IAM
```
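
A typo in `WebAccessCidr`, `ManagementCidr`, or `DatabaseAccessCidr` will typically surface only when the stack creation fails or rolls back, so it can be worth sanity-checking the values first. The following is a minimal sketch of an IPv4 CIDR check in plain shell; `is_valid_cidr` is a hypothetical helper name, and the check covers syntax only (it does not confirm that the range is the one you intended).

```bash
# Hypothetical helper: return 0 if $1 looks like a valid IPv4 CIDR (a.b.c.d/n).
is_valid_cidr() (
  cidr=$1
  case "$cidr" in
    */*) ;;
    *) exit 1 ;;
  esac
  ip=${cidr%/*}
  prefix=${cidr#*/}
  # Prefix must be an integer between 0 and 32.
  case "$prefix" in
    ''|*[!0-9]*) exit 1 ;;
  esac
  [ "$prefix" -le 32 ] 2>/dev/null || exit 1
  # Split the address on dots (globbing disabled); expect four octets, each 0-255.
  set -f
  IFS=.
  set -- $ip
  [ $# -eq 4 ] || exit 1
  for octet in "$@"; do
    case "$octet" in
      ''|*[!0-9]*) exit 1 ;;
    esac
    [ "$octet" -le 255 ] 2>/dev/null || exit 1
  done
  exit 0
)

# Example:
#   is_valid_cidr "10.0.0.5/32" || echo "ManagementCidr is not a valid CIDR" >&2
```

The function body runs in a subshell, so the changes to `IFS` and globbing do not leak into the calling shell.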

### Step 3: Retrieve the Stack Outputs

Once the stack is deployed, retrieve the output values — these are the URLs and connection details you'll need:

```bash
aws cloudformation describe-stacks \
  --stack-name cap-production \
  --query 'Stacks[0].Outputs' \
  --output table
```

| Output Key                     | Description                                                      | Example                                                                                              |
| ------------------------------ | ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `AgentWebURL`                  | Public-facing solution URL (ALB)                                 | `http://cap-production-alb-xxxxx.eu-west-1.elb.amazonaws.com`                                        |
| `AgentManagementEndpoint`      | ALB DNS name — **not** used for Console registration (see below) | `cap-production-alb-xxxxx.eu-west-1.elb.amazonaws.com`                                               |
| `RDSEndpoint`                  | Database host and port                                           | `cap-db.xxxxx.eu-west-1.rds.amazonaws.com:5432`                                                      |
| `RDSJdbcUrl`                   | Full JDBC connection string                                      | `jdbc:postgresql://host:5432/aispark`                                                                |
| `ECSClusterName`               | ECS cluster name                                                 | `cap-production-cluster`                                                                             |
| `ECSServiceName`               | ECS service name                                                 | `cap-production-svc`                                                                                 |
| `SnapshotBucketName`           | S3 bucket for configuration snapshots                            | `cap-production-snapshots-xxxxxxxx`                                                                  |
| `LogGroupName`                 | CloudWatch log group                                             | `/ecs/cap-production`                                                                                |
| `DeployUserCredentialsArn`     | Secrets Manager ARN for the Console deploy user credentials      | `arn:aws:secretsmanager:eu-west-1:123456789012:secret:cap-production/deploy-user-credentials-AbCdEf` |
| `DeployUserCredentialsCommand` | CLI command to retrieve the deploy user access key and secret    | *(run in AWS CLI to get JSON with `AccessKeyId` and `SecretAccessKey`)*                              |
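
When scripting against the stack, it is often more convenient to read one output at a time than to parse the table. A small wrapper along these lines could do that; `stack_output` is a hypothetical helper name, and the output keys are the ones listed in the table above.

```bash
# Hypothetical helper: print the value of a single stack output.
# Usage: stack_output STACK_NAME OUTPUT_KEY
stack_output() (
  stack=$1
  key=$2
  aws cloudformation describe-stacks \
    --stack-name "$stack" \
    --query "Stacks[0].Outputs[?OutputKey=='${key}'].OutputValue" \
    --output text
)

# Example (requires a deployed stack):
#   stack_output cap-production AgentWebURL
```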

## First-Time Setup

### AWS Resources Created

The CloudFormation template creates the following IAM resources in your account:

| Resource                   | Type           | Purpose                                                                                                                                                                                               |
| -------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `TaskExecutionRole`        | IAM Role       | Allows ECS Fargate to pull container images from ECR and read database credentials from AWS Secrets Manager                                                                                           |
| `TaskRole`                 | IAM Role       | Grants running containers access to CloudWatch Logs, the stack's S3 snapshot bucket, ECS Exec for debugging, and AWS Marketplace metering                                                             |
| `SnapshotScriptLambdaRole` | IAM Role       | Allows a one-time Lambda function to deploy a helper script to the S3 bucket during stack creation                                                                                                    |
| `DeployUser`               | IAM User       | Scoped credentials for CAP Console's remote agent deployment feature (rolling updates). Policy is restricted to ECS task discovery (scoped to this stack's cluster) and ALB target group operations   |
| `DeployUserAccessKey`      | IAM Access Key | Long-term credentials for the deploy user, stored securely in AWS Secrets Manager (secret name: `<stack-name>/deploy-user-credentials`). Retrieve via the `DeployUserCredentialsCommand` stack output |

{% hint style="info" %}
All IAM resources are scoped to the minimum permissions required for their function. The deploy user credentials are never exposed in task definition environment variables — they are stored exclusively in AWS Secrets Manager.
{% endhint %}
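
The authoritative way to fetch the deploy user credentials is the command printed by the `DeployUserCredentialsCommand` stack output. As a rough sketch of what such a lookup might look like, assuming the secret name follows the `<stack-name>/deploy-user-credentials` pattern described above (the helper name `deploy_user_credentials` is hypothetical):

```bash
# Hypothetical helper: print the deploy user secret (JSON) for a given stack.
# Assumes the secret is named STACK_NAME/deploy-user-credentials.
deploy_user_credentials() (
  stack=$1
  aws secretsmanager get-secret-value \
    --secret-id "${stack}/deploy-user-credentials" \
    --query 'SecretString' \
    --output text
)

# Example (requires a deployed stack and secretsmanager:GetSecretValue permission):
#   deploy_user_credentials cap-production
```

Treat the output as sensitive: avoid writing it to log files or leaving it in terminal scrollback.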

{% hint style="info" %}
This guide assumes you already have a CAP Console running (for example, from the TomorrowX AMI on AWS Marketplace). The CloudFormation stack deploys a **CAP Agent (PDA) runtime only** — your existing Console is used to manage and deploy configurations to it.
{% endhint %}

### Register the Agent in Your CAP Console

The CAP Console connects **directly** to each ECS task on port `20001` using the task's **private IP address**. The ALB is for web (solution) traffic only — routing management commands through the ALB would distribute them randomly across tasks, so only one task would receive each configuration push while the others stay stale.

**Find the running task's private IP:**

```bash
aws ecs describe-tasks \
  --cluster cap-production-cluster \
  --tasks $(aws ecs list-tasks --cluster cap-production-cluster --output text --query 'taskArns[0]') \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' \
  --output text
```

Then in your CAP Console:

1. Navigate to **Administration → Agent Definitions → Add**
2. Enter the Agent ID matching the `AgentName` parameter (e.g. `Agent`)
3. Set **Host** to the ECS task **private IP** retrieved above (e.g. `10.0.2.47`)
4. Set **Port** to `20001`
5. Save — the agent should immediately show as **Online**

{% hint style="warning" %}
**Network connectivity required:** Your CAP Console must be able to reach the task's private IP on port `20001`. Ensure the ECS task security group allows inbound traffic on port `20001` from your Console's IP or security group. If your Console runs outside the VPC, connectivity via VPN or VPC peering is required.

**Task replacement:** If ECS replaces a task (deployment, restart, or scaling event), the new task gets a new private IP. Update the Console agent definition accordingly, or use [AWS Cloud Map service discovery](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html) to assign stable per-task DNS names.
{% endhint %}
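
If the agent does not come online, a TCP check run from the Console host can help separate network problems from agent problems. This is a sketch only: `check_agent_port` is a hypothetical helper, and it assumes a netcat (`nc`) build that supports the `-z` and `-w` options.

```bash
# Hypothetical helper: check that the agent management port accepts TCP connections.
# Assumes nc supports -z (connect without sending data) and -w (timeout in seconds).
check_agent_port() (
  host=$1
  port=${2:-20001}
  if nc -z -w 5 "$host" "$port"; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}" >&2
    exit 1
  fi
)

# Example (run on the CAP Console host):
#   check_agent_port 10.0.2.47
```

A failed check usually points at the security group, routing, or VPN/peering configuration rather than at the agent itself.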

### Verify the Agent Endpoint

The `AgentWebURL` stack output is the **public-facing URL of the solution deployed to the agent** — not a CAP Console login page. Once a configuration is deployed from your Console, this is where end users or downstream systems will reach it.
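
To check from a terminal that the ALB is answering at all, you can request the URL and look only at the HTTP status code. The helper below is an illustrative sketch (`http_status` is a hypothetical name); which status code to expect depends on the configuration deployed to the agent, so no particular value is assumed here.

```bash
# Hypothetical helper: print only the HTTP status code returned by a URL.
http_status() (
  url=$1
  # -s: silent, -o /dev/null: discard the body, -w: print just the status code.
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url"
)

# Example (substitute your own AgentWebURL value):
#   http_status "http://cap-production-alb-xxxxx.eu-west-1.elb.amazonaws.com"
```

A status of `000` from curl means no HTTP response was received, for example because of a timeout or a blocked connection.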

### Install Extensions

Extensions provide the rule libraries available in The Editor. TomorrowX provides a base extension package:

1. Navigate to **Administration → Extensions → Upload**
2. Upload the `RulesBaseFactory-EXTENSION.zip` (and any other extensions provided with your subscription)
3. Extensions activate automatically — no container restart is required

### Deploy Your First Agent Configuration

1. In The Editor, create a **Repository** and build a **Ruleset**
2. Navigate to **Repositories → Deploy to Agent → Select your agent**
3. The ruleset activates on the running agent within seconds
4. Test via the `AgentWebURL` in your browser

## Configuration Persistence (Gold Master Pattern)

The CAP Agent container is stateless by design — configuration is not baked into the container image. Instead, a **snapshot** of the live agent configuration can be captured to S3 at any time. When the container starts (or restarts), it automatically restores from this snapshot.

**How it works:**

| Container Start                  | Behaviour                                                                |
| -------------------------------- | ------------------------------------------------------------------------ |
| `S3_GOLD_MASTER` env var set     | Downloads configuration snapshot from S3, restores it, then starts Jetty |
| `S3_GOLD_MASTER` env var not set | Starts with a blank configuration (fresh deployment mode)                |

This means:

* Container replacements (deployments, scaling, restarts) automatically restore your configuration
* Multiple container tasks share the same configuration from S3
* Rolling back is as simple as restoring a previous snapshot

**Capturing a snapshot** is handled by your delivery partner using the TomorrowX snapshot tool. After a new snapshot is captured, the `update-stack` command is used to point the ECS service at the new `S3_GOLD_MASTER` path.

{% hint style="info" %}
For managed deployments, your TomorrowX delivery partner will handle Gold Master configuration and updates. Contact <help@tomorrowx.dev> for guidance.
{% endhint %}
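
To see which snapshot a service is currently configured to restore from, you can inspect the task definition it is running. The sketch below makes two assumptions that are not confirmed by this guide: that `S3_GOLD_MASTER` is set as a plain environment variable, and that it sits on the first container definition. The helper name `current_gold_master` is hypothetical.

```bash
# Hypothetical helper: print the S3_GOLD_MASTER value from the service's active task definition.
# Assumes the variable is a plain environment entry on the first container definition.
current_gold_master() (
  cluster=$1
  service=$2
  # Step 1: find the task definition the service currently uses.
  task_def=$(aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].taskDefinition' \
    --output text) || exit 1
  # Step 2: read the environment variable from that task definition.
  aws ecs describe-task-definition \
    --task-definition "$task_def" \
    --query "taskDefinition.containerDefinitions[0].environment[?name=='S3_GOLD_MASTER'].value" \
    --output text
)

# Example (requires a deployed stack):
#   current_gold_master cap-production-cluster cap-production-svc
```

Empty output suggests the variable is not set, which corresponds to the fresh deployment mode in the table above.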

## Viewing Agent Logs

Logs are streamed to CloudWatch automatically. To view them:

**AWS Console:**

1. Open **CloudWatch → Log groups**
2. Select the log group from the `LogGroupName` stack output
3. Select the latest log stream

**AWS CLI:**

```bash
LOG_GROUP=$(aws cloudformation describe-stacks \
  --stack-name cap-production \
  --query 'Stacks[0].Outputs[?OutputKey==`LogGroupName`].OutputValue' \
  --output text)

aws logs tail "$LOG_GROUP" --follow
```

Logs are also accessible via the CAP Console UI — navigate to **Agent → View Agent Logs** to browse date-stamped log files.
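
When investigating a problem, it can help to narrow the stream to a recent time window and a search term instead of following everything. The following sketch wraps `aws logs tail` with its `--since` and `--filter-pattern` options; the helper name `recent_agent_errors` is hypothetical, and the `ERROR` search term is an assumption about how the agent formats its log lines.

```bash
# Hypothetical helper: show recent log events that contain the term ERROR.
# Usage: recent_agent_errors LOG_GROUP [SINCE]   (SINCE defaults to 1h)
recent_agent_errors() (
  log_group=$1
  since=${2:-1h}
  aws logs tail "$log_group" --since "$since" --filter-pattern "ERROR"
)

# Example:
#   recent_agent_errors /ecs/cap-production 30m
```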

## Scaling

To run multiple agent tasks for high availability or load distribution, update the ECS service desired count:

```bash
aws ecs update-service \
  --cluster cap-production-cluster \
  --service cap-production-svc \
  --desired-count 2
```

All tasks share the same RDS database and EFS storage, and restore from the same S3 Gold Master snapshot on startup. The ALB distributes web traffic across healthy tasks automatically.

{% hint style="warning" %}
**Management and scaling:** Each ECS task has its own private IP. When running multiple tasks, each task must be registered as a separate agent definition in your CAP Console (with its own private IP and a unique Agent ID, e.g. `Agent-1`, `Agent-2`). Alternatively, use [AWS Cloud Map](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html) to assign stable per-task DNS names that update automatically when tasks are replaced.
{% endhint %}
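
With more than one task running, you need the private IP of every task, not just the first. One possible approach is sketched below; `list_task_ips` is a hypothetical helper, and it assumes each task's first attachment is its network interface, as in the single-task command shown earlier in this guide.

```bash
# Hypothetical helper: print the private IP of every running task in a service, one per line.
list_task_ips() (
  cluster=$1
  service=$2
  task_arns=$(aws ecs list-tasks \
    --cluster "$cluster" \
    --service-name "$service" \
    --query 'taskArns[]' \
    --output text) || exit 1
  if [ -z "$task_arns" ] || [ "$task_arns" = "None" ]; then
    echo "no running tasks found" >&2
    exit 1
  fi
  # describe-tasks accepts several task ARNs at once; word splitting of $task_arns is intended.
  aws ecs describe-tasks \
    --cluster "$cluster" \
    --tasks $task_arns \
    --query "tasks[].attachments[0].details[?name=='privateIPv4Address'].value[]" \
    --output text | tr '\t' '\n'
)

# Example:
#   list_task_ips cap-production-cluster cap-production-svc
```

Each line of output corresponds to one agent definition to create or update in the Console.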

{% hint style="info" %}
For session-stateful configurations, enable ALB sticky sessions (duration-based, 1 hour recommended) so that a client is consistently routed to the same container task.
{% endhint %}
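
Duration-based stickiness is a target group attribute, so it can be switched on with the `elbv2` CLI. This is a sketch under stated assumptions: the target group ARN is not among the stack outputs listed in this guide, so you would need to look it up yourself (for example with `aws elbv2 describe-target-groups`), and `enable_sticky_sessions` is a hypothetical helper name.

```bash
# Hypothetical helper: enable duration-based (load balancer cookie) stickiness.
# Usage: enable_sticky_sessions TARGET_GROUP_ARN [DURATION_SECONDS]
enable_sticky_sessions() (
  target_group_arn=$1
  duration_seconds=${2:-3600}   # default of 1 hour, matching the recommendation above
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$target_group_arn" \
    --attributes \
      Key=stickiness.enabled,Value=true \
      Key=stickiness.type,Value=lb_cookie \
      "Key=stickiness.lb_cookie.duration_seconds,Value=${duration_seconds}"
)

# Example (the ARN is a placeholder):
#   enable_sticky_sessions "arn:aws:elasticloadbalancing:REGION:ACCOUNT_ID:targetgroup/NAME/ID"
```

Changes made outside CloudFormation may be overwritten by a later stack update, so check with your delivery partner whether the template should carry this setting instead.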

## Updating the Container Image

When a new CAP version is released via AWS Marketplace, update the ECS service to pull the new image:

```bash
# Force a new deployment (pulls latest Marketplace image)
aws ecs update-service \
  --cluster cap-production-cluster \
  --service cap-production-svc \
  --force-new-deployment
```

ECS performs a rolling update: existing tasks continue serving traffic through the ALB while new tasks start and pass health checks.
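
In a script, you may want to block until the new deployment has settled before moving on. A minimal sketch using the ECS `services-stable` waiter follows; `redeploy_and_wait` is a hypothetical helper name.

```bash
# Hypothetical helper: force a new deployment, then wait for the service to stabilise.
redeploy_and_wait() (
  cluster=$1
  service=$2
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --force-new-deployment >/dev/null || exit 1
  # The waiter polls the service and gives up after roughly 10 minutes.
  aws ecs wait services-stable --cluster "$cluster" --services "$service" || exit 1
  echo "service ${service} is stable"
)

# Example:
#   redeploy_and_wait cap-production-cluster cap-production-svc
```

Because replacement tasks receive new private IPs, remember to update the agent definitions in your CAP Console once the service is stable.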

## Troubleshooting

### Agent Shows Offline in Console

* Confirm the agent definition **Host** is set to the task's **current private IP** — not the ALB DNS name
* Get the current private IP: ``aws ecs describe-tasks --cluster <cluster> --tasks <task-arn> --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text``
* Verify the ECS task security group allows inbound traffic on port `20001` from your CAP Console's IP or security group
* Confirm your CAP Console has network connectivity to the task's private IP (same VPC, VPC peering, or VPN)
* View ECS task logs in CloudWatch for startup errors

### Container Tasks Failing Health Checks

```bash
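# Check why a stopped task exited (list-tasks returns only running tasks unless asked otherwise)
aws ecs describe-tasks \
  --cluster cap-production-cluster \
  --tasks $(aws ecs list-tasks --cluster cap-production-cluster --desired-status STOPPED --query 'taskArns[0]' --output text) \
  --query 'tasks[0].{stoppedReason: stoppedReason, containerReasons: containers[].reason}'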
```

Common causes:

* RDS security group not allowing inbound from the ECS task security group
* Incorrect DB credentials (check Secrets Manager entry)
* EFS mount point not accessible (check EFS security group)

### S3 Snapshot Restore Fails on Startup

* Confirm the `S3_GOLD_MASTER` environment variable in the task definition points to an existing S3 object
* Verify the ECS task IAM role has `s3:GetObject` permission on the snapshot bucket
* Check the container logs in CloudWatch for the specific error during restore

### Cannot Access the Agent Web URL

* Confirm the ALB listener is configured on port `80` (or `443` if HTTPS is configured)
* Check ALB target group health — all targets should show `healthy`
* Verify the ECS task security group allows outbound traffic
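
Target health can also be checked from the CLI, which has the advantage of showing the reason code for each unhealthy target. The sketch below assumes you have already looked up the target group ARN (it is not among the stack outputs listed in this guide); `unhealthy_targets` is a hypothetical helper name.

```bash
# Hypothetical helper: list targets that are not healthy, with their state and reason code.
unhealthy_targets() (
  target_group_arn=$1
  aws elbv2 describe-target-health \
    --target-group-arn "$target_group_arn" \
    --query "TargetHealthDescriptions[?TargetHealth.State!='healthy'].[Target.Id,TargetHealth.State,TargetHealth.Reason]" \
    --output text
)

# Example (the ARN is a placeholder):
#   unhealthy_targets "arn:aws:elasticloadbalancing:REGION:ACCOUNT_ID:targetgroup/NAME/ID"
```

Empty output means every registered target is currently reported as healthy.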

