A Beginner's Guide to AWS Elastic Container Service (ECS)
Introduction
Running software in production is more than just writing code. It is also about managing the environment where that code runs, keeping it running efficiently (which may mean running multiple copies), handling failure, and rolling out the changes that will inevitably happen over time.
Containers solved part of this problem by packaging an application together with its runtime, libraries, and dependencies. They give us a unit that can be started the same way, everywhere. But containers do not solve the hard problem of managing large numbers of them across machines.
That is exactly the problem AWS Elastic Container Service (ECS) is designed to solve.
ECS is AWS’s managed container orchestration service. It provides the control plane that manages when, where, and how containers run on AWS infrastructure. This leaves engineers free to focus on their applications instead of manually managing the orchestration machinery. This is why it is called a “managed service”.
To understand what that means, we have to take a look at how containers are run.
From containers to container management
Containers are not virtual machines. They do not run by themselves.
At the most basic level, every container is just one or more Linux processes. It runs on a Linux kernel, is isolated using namespaces, and is resource-limited using cgroups.
This means that something must provide the machines, the kernel, the CPU, the memory, and the networking. In AWS, those machines are usually EC2 instances.
Before ECS, running containers on AWS typically looked like this:
You launched EC2 instances, installed Docker, and manually decided:
- Which containers should run
- Where they should run
- What happens when they crash
- How many copies should exist
- How updates should be rolled out without downtimes
As systems grew, this approach became a real pain. Engineers had to manually handle what computers are much better at doing consistently.
A container management service exists to solve this pain efficiently. ECS is AWS’s way of solving that problem. This is why we call it a managed service.
What “Managed” Actually Means in ECS
When we say ECS is managed, it does not mean that containers run by themselves. It means:
- AWS runs and maintains the control plane
- You do not install, patch, or scale the orchestration software
- You interact with ECS entirely through the interfaces provided by AWS
The control plane is the part of the system that:
- Accepts requests like “run this container”
- Decides where it should run
- Monitors whether it is still running
- Starts replacements when it stops
In ECS, all of that logic lives inside AWS infrastructure. You never see it, and you never manage it.
What you provide is capacity and intent.
What ECS Actually Does
At a high level, ECS sits between two things:
- Your intent, expressed through configuration
- AWS compute capacity, where containers actually run
You tell ECS what you want to be running. ECS figures out how to run it.
ECS as Layers of Responsibility
Amazon ECS is organised into three layers of responsibility:
- Capacity layer - the machine resources needed to run containers, such as CPU and memory
- Control layer - decides what should be running, and where it should run
- Provisioning layer - how we describe our intent and interact with ECS
We will come to understand these layers later on.
Intent Before Execution
In ECS, you never start containers directly. Instead, ECS is designed around declaring intent. You tell ECS things like:
- Which container images to use
- How much CPU and memory the containers require
- What command should be executed
- Which ports should be exposed
- What environment variables should exist
- What AWS permissions the code should have
ECS takes responsibility for turning that intent into running processes. This distinction - declaring what you want, not how to do it - shapes every ECS concept that follows.
The scheduler
Turning intent into running containers requires a scheduler. A scheduler is the part of the system that decides where work should run. Given a description of required CPU and memory and a pool of available capacity, it selects a placement and starts execution.
In ECS, this scheduling logic is part of the managed control plane. You never call it directly. Every time you run a task or create a service, you are asking the ECS scheduler to make placement decisions on your behalf.
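To make the idea concrete, here is a deliberately simplified placement sketch in Python. It is illustrative only, not ECS's actual algorithm, and the instance records are invented for the example.

```python
# Illustrative only: a toy placement decision, not ECS's real scheduler.
# Each "instance" dict is an invented stand-in for a container host.
def place_task(task_cpu, task_mem, instances):
    """Pick the first instance with enough free CPU and memory."""
    for instance in instances:
        if instance["free_cpu"] >= task_cpu and instance["free_mem"] >= task_mem:
            instance["free_cpu"] -= task_cpu   # reserve the capacity
            instance["free_mem"] -= task_mem
            return instance["id"]
    return None  # no capacity anywhere: the task stays pending

instances = [
    {"id": "i-aaa", "free_cpu": 512, "free_mem": 1024},
    {"id": "i-bbb", "free_cpu": 2048, "free_mem": 4096},
]
print(place_task(1024, 2048, instances))  # -> "i-bbb"
```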
Task Definitions: Describing an application
A task definition acts like a blueprint for running containers. It defines a group of one or more containers that belong together and are expected to run as a single unit.
In the task definition, you specify things like:
- The container image to use
- How much CPU and memory the containers require
- The command(s) to be executed
- The ports to be exposed
- The environment variables that are needed
- The AWS permissions the code should have
It is important to note that task definitions do not run containers. They are static, versioned configuration: just a declaration of intent. You can think of a task definition as describing a process layout without actually starting the process.
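As a concrete sketch, here is roughly how such a blueprint might be registered using boto3, the AWS SDK for Python. The family name, image, and role ARN are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Register a versioned, static blueprint. Nothing runs yet.
ecs.register_task_definition(
    family="web-app",                      # placeholder family name
    networkMode="awsvpc",                  # each task gets its own network identity
    requiresCompatibilities=["FARGATE"],
    cpu="256",                             # 0.25 vCPU for the whole task
    memory="512",                          # 512 MiB for the whole task
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[
        {
            "name": "web",
            "image": "nginx:1.25",
            "essential": True,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "environment": [{"name": "APP_ENV", "value": "production"}],
        }
    ],
)
```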
Tasks: from definition to reality
ECS uses task definitions to create tasks. A task is a concrete instance of a task definition. It corresponds to real containers running as real Linux processes on a real machine. Running the same task definition several times gives you identical tasks, yet each task is independent even though it was created from the same blueprint (the task definition).
Basically, a task represents one or more containers that must start together and share networking and a lifecycle. It is not a template; it is a running instance. ECS does not let you create tasks from scratch: each task is created from a task definition.
Tasks are ephemeral by design. They can start, stop, crash, and be replaced. ECS assumes this volatility and builds higher-level concepts on top of it.
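Continuing the sketch, turning that blueprint into a running task is a single run_task call. The cluster name, subnet, and security group IDs below are placeholders, and this assumes the Fargate-compatible task definition registered above.

```python
import boto3

ecs = boto3.client("ecs")

# Ask the scheduler to create one concrete task from the blueprint.
response = ecs.run_task(
    cluster="demo-cluster",                # placeholder cluster name
    taskDefinition="web-app",              # latest revision of the family
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],     # placeholder
            "securityGroups": ["sg-0123456789abcdef0"],  # placeholder
            "assignPublicIp": "ENABLED",
        }
    },
)
print(response["tasks"][0]["taskArn"])     # the concrete, ephemeral instance
```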
Services: Keeping tasks running
If tasks are ephemeral, we need something to enforce stability and continuity. This is what services are for.
When you create a service, you are telling the ECS control plane:
- I want this many tasks running
- If one stops, replace it
- If I update my task definition, roll out the changes safely
The service continuously observes the system and compares it to the desired state you declared. Whenever what is actually running drifts from that desired state, ECS takes corrective action.
This is how ECS achieves self-healing without you writing any recovery logic, and how it moves from running containers to maintaining a running system.
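A minimal sketch of declaring that desired state with boto3, reusing the placeholder names from the earlier examples:

```python
import boto3

ecs = boto3.client("ecs")

# Declare desired state: keep three copies of the task running.
ecs.create_service(
    cluster="demo-cluster",                # placeholder
    serviceName="web-service",             # placeholder
    taskDefinition="web-app",
    desiredCount=3,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],     # placeholder
            "securityGroups": ["sg-0123456789abcdef0"],  # placeholder
            "assignPublicIp": "ENABLED",
        }
    },
)

# Rolling out a change is just pointing the service at a new revision;
# ECS replaces tasks gradually rather than stopping everything at once.
ecs.update_service(
    cluster="demo-cluster",
    service="web-service",
    taskDefinition="web-app:2",            # placeholder revision number
)
```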
Clusters and Capacity: where tasks are run
Tasks and services still need a place to run. In ECS, this place is called a cluster.
A cluster is a logical boundary that answers the question:
Where is ECS allowed to place containers?
It is a logical grouping of compute capacity. When we say logical here, we mean virtual: it is not a physical grouping or boundary. A cluster does not run code by itself. Instead it aggregates resources and gives the scheduler a place to work within.
Capacity here means resources like CPU, memory, and storage.
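Because nothing physical is being provisioned, creating a cluster is a single call. A sketch with boto3; the cluster name is a placeholder reused across these examples:

```python
import boto3

ecs = boto3.client("ecs")

# A cluster is just a named, logical boundary; creating one
# provisions no machines by itself.
ecs.create_cluster(clusterName="demo-cluster")  # placeholder name
print(ecs.list_clusters()["clusterArns"])
```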
Capacity inside an ECS cluster can come from either EC2 or Fargate.
EC2-Based Capacity
In this model, you create EC2 instances, install Docker and the ECS agent on them, and the agent registers the instances with the cluster.
Once registered, each instance advertises its available CPU, memory, and supported features. ECS then schedules tasks on these machines.
In this model, you manage the OS, patching, and instance lifecycle while ECS manages scheduling, restart decisions, and state tracking.
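Once instances have registered, you can inspect the capacity they advertise to the scheduler. A sketch, again using the placeholder cluster name:

```python
import boto3

ecs = boto3.client("ecs")

# List the EC2 instances registered with the cluster, then inspect
# what each one advertises (registered) and still has free (remaining).
arns = ecs.list_container_instances(cluster="demo-cluster")["containerInstanceArns"]
if arns:
    detail = ecs.describe_container_instances(
        cluster="demo-cluster", containerInstances=arns
    )
    for instance in detail["containerInstances"]:
        registered = {r["name"]: r.get("integerValue") for r in instance["registeredResources"]}
        remaining = {r["name"]: r.get("integerValue") for r in instance["remainingResources"]}
        print(instance["ec2InstanceId"], registered["CPU"], remaining["CPU"])
```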
AWS-Managed Capacity (Fargate)
In the Fargate model, you do not manage or provision any machines. You just specify how much CPU and memory a task needs, and AWS provides the underlying infrastructure automatically. The containers still run on real machines, you just never see or manage them.
ECS always schedules onto capacity; you just decide whether you see that capacity or not.
Networking: Giving Tasks an Identity
Once tasks are running, they need to communicate and access resources. Modern ECS networking (the awsvpc network mode) assigns each task its own network identity. Each task can have its own:
- IP address
- Security group rules
- Direct access to VPC resources
Behind the scenes, this is implemented using Linux network namespaces connected directly to AWS VPC networking.
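You can observe this identity directly: a running awsvpc task carries an ElasticNetworkInterface attachment. A sketch, assuming a placeholder task ARN returned by the earlier run_task call:

```python
import boto3

ecs = boto3.client("ecs")

# Read a task's network identity. The task ARN is a placeholder.
task_arn = "arn:aws:ecs:us-east-1:123456789012:task/demo-cluster/abc123"
task = ecs.describe_tasks(cluster="demo-cluster", tasks=[task_arn])["tasks"][0]

for attachment in task["attachments"]:
    if attachment["type"] == "ElasticNetworkInterface":
        details = {d["name"]: d["value"] for d in attachment["details"]}
        print(details["privateIPv4Address"], details["networkInterfaceId"])
```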
IAM: Permissions Without Credentials
Code running inside a container often needs access to AWS services. ECS makes this possible by assigning IAM roles directly to tasks.
When a task starts, temporary credentials are generated. These credentials are made available to the container, and permissions are enforced by AWS. No secrets are baked into images. No static credentials are stored on disk. Identity becomes part of the runtime, not the configuration.
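The practical effect is that code inside the container uses the SDK with no credentials in sight. Assuming the task role grants the relevant S3 permission, a sketch like this would work unchanged inside the task:

```python
import boto3

# No access keys anywhere: inside an ECS task, boto3 discovers temporary
# credentials through the container credentials endpoint that ECS injects
# (advertised via the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI variable).
s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
```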
Putting the Pieces Together
By now, the architecture should feel cohesive rather than fragmented.
- You describe your application using task definitions.
- You turn those descriptions into running processes by creating tasks.
- You keep tasks alive and scaled using services.
- You give ECS a boundary for scheduling by creating clusters.
- You supply or delegate the compute capacity underneath.
ECS continuously reconciles what is running with what should be running. That is the core idea.
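As a mental model only (not ECS's actual control-plane code), that reconciliation loop looks something like this:

```python
import time

# Illustrative only: the reconciliation idea as a mental model,
# not ECS's real implementation.
def reconcile(desired_count, list_running_tasks, start_task, stop_task):
    while True:
        running = list_running_tasks()
        if len(running) < desired_count:
            for _ in range(desired_count - len(running)):
                start_task()           # replace crashed or missing tasks
        elif len(running) > desired_count:
            for task in running[desired_count:]:
                stop_task(task)        # scale down to the desired state
        time.sleep(10)                 # then observe again
```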