EcsDeploy

Helper script for deployment to Amazon ECS, designed to be compatible with capistrano.

This gem is experimental.

Installation

Add this line to your application's Gemfile:

gem "ecs_deploy", github: "reproio/ecs_deploy"

And then execute:

$ bundle

Configuration

# Capfile
require "ecs_deploy/capistrano"

# deploy.rb
set :ecs_default_cluster, "ecs-cluster-name"
set :ecs_region, %w(ap-northeast-1) # optional, if nil, use environment variable
set :ecs_service_role, "customEcsServiceRole" # default: ecsServiceRole
set :ecs_deploy_wait_timeout, 600 # default: 300
set :ecs_wait_until_services_stable_max_attempts, 40 # optional
set :ecs_wait_until_services_stable_delay, 15 # optional
set :ecs_client_params, { retry_mode: "standard", max_attempts: 10 } # default: {}
set :ecs_docker_buildx_env, { "DOCKER_CONFIG" => "/path/to/docker/config" } # optional. Env vars for the `docker buildx` command used by use_digest. Merged with each task's docker_buildx_env (the per-task one takes precedence). default: {}

set :ecs_tasks, [
  {
    name: "myapp-#{fetch(:rails_env)}",
    use_digest: true, # optional. When true, each container_definitions[].image is rewritten to a digest reference (registry/repo@sha256:...) resolved via `docker buildx imagetools inspect` at registration time. Requires the `docker` CLI with buildx on the deploy host; resolution failure aborts the deploy.
    docker_buildx_env: { "DOCKER_CONFIG" => "/path/to/docker/config" }, # optional. Env vars for the `docker buildx` command. Merged on top of :ecs_docker_buildx_env.
    container_definitions: [
      {
        name: "myapp",
        image: "#{fetch(:docker_registry_host_with_port)}/myapp:#{fetch(:sha1)}",
        cpu: 1024,
        memory: 512,
        port_mappings: [],
        essential: true,
        environment: [
          {name: "RAILS_ENV", value: fetch(:rails_env)},
        ],
        mount_points: [
          {
            source_volume: "sockets_path",
            container_path: "/app/tmp/sockets",
            read_only: false,
          },
        ],
        volumes_from: [],
        log_configuration: {
          log_driver: "fluentd",
          options: {
            "tag" => "docker.#{fetch(:rails_env)}.#{name}.{{.ID}}",
          },
        },
      },
      {
        name: "nginx",
        image: "#{fetch(:docker_registry_host_with_port)}/my-nginx",
        cpu: 256,
        memory: 256,
        links: [],
        port_mappings: [
          {container_port: 443, host_port: 443, protocol: "tcp"},
        ],
        essential: true,
        environment: {},
        mount_points: [],
        volumes_from: [
          {source_container: "myapp-#{fetch(:rails_env)}", read_only: false},
        ],
        log_configuration: {
          log_driver: "fluentd",
          options: {
            "tag" => "docker.#{fetch(:rails_env)}.#{name}.{{.ID}}",
          },
        },
      }
    ],
    volumes: [{name: "sockets_path", host: {}}],
  },
]

set :ecs_scheduled_tasks, [
  {
    cluster: "default", # Defaults to fetch(:ecs_default_cluster)
    rule_name: "schedule_name",
    schedule_expression: "cron(0 12 * * ? *)",
    description: "schedule_description", # Optional
    target_id: "task_name", # Defaults to the task_definition_name
    task_definition_name: "myapp-#{fetch(:rails_env)}",
    task_count: 2, # Default 1
    revision: 12, # Optional
    role_arn: "TaskRoleArn", # Optional
    container_overrides: [ # Optional
      name: "myapp-main",
      command: ["ls"],
    ]
  }
]

set :ecs_services, [
  {
    name: "myapp-#{fetch(:rails_env)}",
    load_balancers: [
      {
        load_balancer_name: "service-elb-name",
        container_port: 443,
        container_name: "nginx",
      },
      {
        target_group_arn: "alb_target_group_arn",
        container_port: 443,
        container_name: "nginx",
      }
    ],
    desired_count: 1,
    deployment_configuration: {maximum_percent: 200, minimum_healthy_percent: 50},
  },
]

Usage

bundle exec cap <stage> ecs:register_task_definition # register ecs_tasks as TaskDefinition
bundle exec cap <stage> ecs:deploy_scheduled_task # register ecs_scheduled_tasks to CloudWatchEvent
bundle exec cap <stage> ecs:deploy # create or update Service by ecs_services info

bundle exec cap <stage> ecs:rollback # deregister current task definition and update Service by previous revision of current task definition

Rollback example

sequence	taskdef	service	desc
1	myapp:12	myapp-service
2	myapp:13	myapp-service
3	myapp:14	myapp-service	current

After rollback

sequence	taskdef	service	desc
1	myapp:12	myapp-service
2	myapp:13	myapp-service
3	myapp:14	myapp-service	deregister
4	myapp:13	myapp-service	current

And rollback again

sequence	taskdef	service	desc
1	myapp:12	myapp-service
2	myapp:13	myapp-service	previous
3	myapp:14	myapp-service	deregister
4	myapp:13	myapp-service	deregister
5	myapp:12	myapp-service	current

And deploy new version

sequence	taskdef	service	desc
1	myapp:12	myapp-service
2	myapp:13	myapp-service
3	myapp:14	myapp-service	deregister
4	myapp:13	myapp-service	deregister
5	myapp:12	myapp-service
6	myapp:15	myapp-service	current

And rollback

sequence	taskdef	service	desc
1	myapp:12	myapp-service
2	myapp:13	myapp-service
3	myapp:14	myapp-service	deregister
4	myapp:13	myapp-service	deregister
5	myapp:12	myapp-service
6	myapp:15	myapp-service	deregister
7	myapp:12	myapp-service	current

Native Blue/Green Deployment (ECS-managed)

ecs_deploy supports ECS-managed blue/green deployments where ECS itself drives the rollout (no CodeDeploy required). Set deployment_controller, deployment_configuration, lifecycle hooks, and load_balancers[].advanced_configuration directly on the service entry. The Capistrano ecs:deploy task forwards the entire hash through to EcsDeploy::Service.new and then to Aws::ECS::Client#create_service / #update_service, so any new SDK field is supported automatically.

When updating an existing service, EcsDeploy::Service forwards every field accepted by Aws::ECS::Client#update_service, including deployment_configuration and load_balancers[].advanced_configuration. Options that ECS treats as create-only (launch_type, scheduling_strategy, role, client_token, deployment_controller) are skipped and, if their value differs from the current service, logged as a warning. Unknown keys are forwarded verbatim and will surface as SDK errors — this is intentional so new SDK fields work without gem updates.

set :ecs_services, [
  {
    name: "myapp-#{fetch(:rails_env)}",
    cluster: "myapp-cluster",
    task_definition_name: "myapp-#{fetch(:rails_env)}",
    launch_type: "FARGATE",
    platform_version: "LATEST",
    desired_count: 1,
    network_configuration: { awsvpc_configuration: { subnets: %w[subnet-...], security_groups: %w[sg-...], assign_public_ip: "DISABLED" } },
    deployment_controller: { type: "ECS" },
    deployment_configuration: {
      strategy: "LINEAR",
      linear_configuration: { step_percent: 50.0, step_bake_time_in_minutes: 60 },
      bake_time_in_minutes: 5,
      deployment_circuit_breaker: { enable: true, rollback: true },
      lifecycle_hooks: [
        {
          hook_target_arn: "arn:aws:lambda:ap-northeast-1:<account-id>:function:my-pause-hook",
          role_arn: "arn:aws:iam::<account-id>:role/ecsLifecycleHookRole",
          lifecycle_stages: ["POST_TEST_TRAFFIC_SHIFT"],
        },
      ],
    },
    load_balancers: [{
      target_group_arn: "arn:aws:elasticloadbalancing:...:targetgroup/blue/...",
      container_name: "app",
      container_port: 8080,
      advanced_configuration: {
        alternate_target_group_arn: "arn:aws:elasticloadbalancing:...:targetgroup/green/...",
        production_listener_rule: "arn:aws:elasticloadbalancing:...:listener-rule/...", # for NLB, pass the Listener ARN directly
        test_listener_rule: "arn:aws:elasticloadbalancing:...:listener-rule/...",
        role_arn: "arn:aws:iam::<account-id>:role/ecsInfrastructureRole",
      },
    }],
    health_check_grace_period_seconds: 300,

    # gem-internal option (not sent to the SDK):
    wait_strategy: :none, # default for ECS-managed deployments is auto-detected as :none
  },
]

option	values	purpose
`wait_strategy:`	`nil` (auto), `:legacy`, `:none`, `:service_deployment`	`nil` auto-detects ECS-managed deployments and skips waiting (multi-day Pause Hooks make blocking impractical). `:legacy` matches pre-1.1 behavior. `:service_deployment` polls `list_service_deployments` (not recommended for sessions).

Operational tasks

bundle exec cap <stage> ecs:describe_deployment              # list in-flight deployments with lifecycle hook details
bundle exec cap <stage> ecs:continue_deployment[hook-id]     # approve a paused lifecycle hook
bundle exec cap <stage> ecs:rollback_deployment[hook-id]     # reject a paused lifecycle hook (ECS rolls back)
bundle exec cap <stage> ecs:stop_deployment[arn]             # stop an in-progress deployment; STOP_TYPE=ABORT or ROLLBACK

Caveats

deployment_controller is immutable on an existing service. To switch an existing CODE_DEPLOY service to ECS, delete and re-create the service.
The PAUSE lifecycle hook stages TEST_TRAFFIC_SHIFT and PRODUCTION_TRAFFIC_SHIFT are rejected by AWS (those stages are also entered during rollback). Use PRE_*/POST_* variants instead.
For NLB, advanced_configuration.production_listener_rule should hold the Listener ARN directly (NLBs do not have Listener Rules).
Pre-1.1 versions filtered the set :ecs_services hash to an allow-list of keys before calling EcsDeploy::Service.new. Starting with 1.1, the entire hash is forwarded. Any custom non-SDK keys previously placed there must be removed or renamed; otherwise the SDK will raise.

Autoscaler

The autoscaler of ecs_deploy supports auto scaling of ECS services and clusters.

Prerequisits

An ECS cluster whose instances belong to either an Auto Scaling group or a Spot Fleet request
You have CloudWatch alarms and you want to scale services when their state changes

How to use autoscaler

First, write a configuration file (YAML format) like below:

# ポーリング時にupscale_triggersに指定した状態のalarmがあればstep分serviceとinstanceを増やす (max_task_countまで)
# ポーリング時にdownscale_triggersに指定した状態のalarmがあればstep分serviceとinstanceを減らす (min_task_countまで)
# max_task_countは段階的にリミットを設けられるようにする
# 一回リミットに到達するとcooldown_for_reach_maxを越えても状態が継続したら再開するようにする

polling_interval: 60

auto_scaling_groups:
  - name: ecs-cluster-nodes
    region: ap-northeast-1
    cluster: ecs-cluster
    # autoscaler will set the capacity to (buffer + desired_tasks * required_capacity).
    # Adjust this value if it takes much time to prepare ECS instances and launch new tasks.
    buffer: 1
    disable_draining: false # cf. spot_instance_intrp_warns_queue_urls
    services:
      - name: repro-api-production
        step: 1
        idle_time: 240
        max_task_count: [10, 25]
        scheduled_min_task_count:
          - {from: "1:45", to: "4:30", count: 8}
        cooldown_time_for_reach_max: 600
        min_task_count: 0
        # Required capacity per task (default: 1)
        # You should specify "binpack" as task placement strategy if the value is less than 1 and you use an auto scaling group.
        required_capacity: 0.5
        upscale_triggers:
          - alarm_name: "ECS [repro-api-production] CPUUtilization"
            state: ALARM
          - alarm_name: "ELB repro-api-a HTTPCode_Backend_5XX"
            state: ALARM
            step: 2
        downscale_triggers:
          - alarm_name: "ECS [repro-api-production] CPUUtilization (low)"
            state: OK

spot_fleet_requests:
  - id: sfr-354de735-2c17-4565-88c9-10ada5b957e5
    region: ap-northeast-1
    cluster: ecs-cluster-for-worker
    buffer: 1
    disable_draining: false # cf. spot_instance_intrp_warns_queue_urls
    services:
      - name: repro-worker-production
        step: 1
        idle_time: 240
        cooldown_time_for_reach_max: 600
        min_task_count: 0
        # Required capacity per task (default: 1)
        # The capacity assumes that WeightedCapacity is equal to the number of vCPUs.
        required_capacity: 2
        upscale_triggers:
          - alarm_name: "ECS [repro-worker-production] CPUUtilization"
            state: ALARM
        downscale_triggers:
          - alarm_name: "ECS [repro-worker-production] CPUUtilization (low)"
            state: OK
          - alarm_name: "Aurora DMLLatency is high"
            state: ALARM
            prioritized_over_upscale_triggers: true

# When you use spot instances, instances that receive interruption warnings should be drained.
# If you set URLs of SQS queues for spot instance interruption warnings to `spot_instance_intrp_warns_queue_urls`,
# autoscaler drains instances to interrupt and detaches the instances from the auto scaling groups with
# should_decrement_desired_capacity false.
# If you set ECS_ENABLE_SPOT_INSTANCE_DRAINING to true, we recommend that you opt out of the draining feature
# by setting disable_draining to true in the configurations of auto scaling groups and spot fleet requests.
# Otherwise, instances don't seem to be drained on rare occasions.
# Even if you opt out of the feature, you still have the advantage of setting `spot_instance_intrp_warns_queue_urls`
# because instances to interrupt are replaced with new instances as soon as possible.
spot_instance_intrp_warns_queue_urls:
  - https://sqs.ap-northeast-1.amazonaws.com/<account-id>/spot-instance-intrp-warns

Then, execute the following command:

ecs_auto_scaler <config yaml>

It is recommended to run the ecs_auto_scaler via a container on ECS.

Signals

Signal	Description
TERM, INT	Shutdown gracefully
CONT	Resume auto scaling
TSTP	Pause auto scaling (Run only container instance draining)

IAM policy for autoscaler

The following permissions are required for the preceding configuration of "repro-api-production" service:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "cloudwatch:DescribeAlarms",
        "ec2:DescribeInstances",
        "ec2:TerminateInstances",
        "ecs:ListTasks"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeServices",
        "ecs:UpdateService"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:service/ecs-cluster/repro-api-production"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeTasks"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:task/ecs-cluster/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DetachInstances",
        "autoscaling:UpdateAutoScalingGroup"
      ],
      "Resource": [
        "arn:aws:autoscaling:ap-northeast-1:<account-id>:autoScalingGroup:<group-id>:autoScalingGroupName/ecs-cluster-nodes"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeContainerInstances"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:container-instance/ecs-cluster/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DeregisterContainerInstance",
        "ecs:ListContainerInstances"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:cluster/ecs-cluster"
      ]
    }
  ]
}

If you use spot instances, additional permissions are required like below:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecs:UpdateContainerInstancesState",
      "Resource": "arn:aws:ecs:ap-northeast-1:<account-id>:container-instance/ecs-cluster/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:DeleteMessageBatch",
        "sqs:ReceiveMessage"
      ],
      "Resource": "arn:aws:sqs:ap-northeast-1:<account-id>:spot-instance-intrp-warns"
    }
  ]
}

The following permissions are required for the preceding configuration of "repro-worker-production" service:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:DeleteMessageBatch",
        "sqs:ReceiveMessage"
      ],
      "Resource": "arn:aws:sqs:ap-northeast-1:<account-id>:spot-instance-intrp-warns"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms",
        "ec2:DescribeInstances",
        "ec2:DescribeSpotFleetInstances",
        "ec2:DescribeSpotFleetRequests",
        "ec2:ModifySpotFleetRequest",
        "ec2:TerminateInstances",
        "ecs:ListTasks"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeServices",
        "ecs:UpdateService"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:service/ecs-cluster-for-worker/repro-worker-production"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeTasks"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:task/ecs-cluster-for-worker/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeContainerInstances",
        "ecs:UpdateContainerInstancesState"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:container-instance/ecs-cluster-for-worker/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:ListContainerInstances"
      ],
      "Resource": [
        "arn:aws:ecs:ap-northeast-1:<account-id>:cluster/ecs-cluster-for-worker"
      ]
    }
  ]
}

How to deploy faster with Auto Scaling Group

Add the following configuration and hooks to your config/deploy.rb:

# deploy.rb
set :ecs_instance_fluctuation_manager_configs, [
  {
    region: "ap-northeast-1",
    cluster: "CLUSTER_NAME",
    auto_scaling_group_name: "AUTO_SCALING_GROUP_NAME",
    desired_capacity: 20, # original capacity of auto scaling group
  }
]

This configuration enables tasks ecs:increase_instances_to_max_size and ecs:terminate_redundant_instances. If this configuration is not set, the above tasks do nothing. The task ecs:increase_instances_to_max_size will increase ECS instances. The task ecs:terminate_redundant_instances will decrease ECS instances considering AZ balance.

Hook configuration example:

after "deploy:updating", "ecs:increase_instances_to_max_size"
after "deploy:finished", "ecs:terminate_redundant_instances"
after "deploy:failed", "ecs:terminate_redundant_instances"

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/reproio/ecs_deploy.

ecs_deploy

Development

Runtime

EcsDeploy

Installation

Configuration

Usage

Rollback example

Native Blue/Green Deployment (ECS-managed)

Operational tasks

Caveats

Autoscaler

Prerequisits

How to use autoscaler

Signals

IAM policy for autoscaler

How to deploy faster with Auto Scaling Group

Development

Contributing