How to Build and Push Production-Ready Container Images to Amazon ECR
Introduction: “It builds” is not the same as “it’s ready”
I remember the first time I pushed a container image to Amazon Elastic Container Registry (ECR). I was so excited; it felt like one of my greatest feats. It was straightforward: I wrote a Dockerfile, ran docker build, tagged the image, and pushed it. I confirmed the image existed, went into my EC2 instance, pulled it, and ran it.
Well, years later, I realised that things were only that straightforward because I was doing it wrong. The problems didn’t happen in one day or in one project. They appeared later, in several incidents:
- I once found that we had an image in our production environment with a critical vulnerability. I discovered this after reading a newsletter from a DevOps blog.
- A team at a cryptocurrency startup I joined had their secrets baked into the image layers.
- A CI/CD failure taught me that the image I build locally on my laptop must be consistent with the one built for the production environment.
- We had a service deployed to Kubernetes that refused to update because we had set imagePullPolicy to IfNotPresent while using the latest tag.
There is a very big difference between an image that works and an image that is production-ready, and Amazon ECR is more than just a place where we store images.
In reality, Amazon ECR sits at the center of the software delivery chain. How one builds and pushes images into ECR determines the security, reliability, and traceability of everything that runs after that.
In this article, I walk you through how I approach ECR as part of a disciplined production workflow instead of just a container storage service.
What Amazon ECR Is - and What It’s Not
Before we dive into how to use ECR correctly, let’s look at the role it plays.
Amazon Elastic Container Registry is:
- A private container image registry provided by AWS
- Integrated with AWS IAM for authentication and authorisation
- Capable of scanning images for known vulnerabilities
- Regionally scoped and tightly coupled to AWS infrastructure
What ECR does not do
- It does not build images for you
- It does not enforce good Dockerfile practices
- It does not prevent you from pushing unsafe or poorly designed images
- It doesn’t replace good CI/CD practices
ECR will faithfully store whatever you give it. That means it is your responsibility, not ECR’s, to make sure an image is safe and ready for production.
What “Production-Ready” Actually Means for Container Images
The phrase “production-ready” is often used loosely to mean functional or feature-complete software that works in basic demos. In our context, it means an image satisfies a few non-negotiable standards.
A production-ready image should meet the following standards:
- It should be reproducible: the same inputs should always produce the same image. This requires more than good intentions: pinned base images, deterministic dependency installation, and an awareness that package repositories, timestamps, and network access can quietly undermine reproducibility.
- Minimal: It should contain only what is required to run the application; everything else should be stripped out. For example, a React app image should not carry node_modules, package.json, or lock files - when possible, ship only the static files from a build.
- Secure by default: It does not run as root, does not embed secrets, and minimizes attack surface.
- Immutable: Once built and tagged, it is never modified in place.
- Traceable: You can tell when, how, and from which code it was built.
Everything else - performance, scalability, cost - builds on these foundations.
Designing an Image Strategy Before Writing a Dockerfile
A common mistake is starting with the Dockerfile and figuring out strategy later. That almost always leads to cleanup work.
An image strategy is the set of decisions about how images will be built, secured, and optimized, made before any Dockerfile is written: how to minimize image size, speed up builds, and reduce attack surface. Making those decisions up front is what keeps the resulting images lightweight and production-ready.
Before building anything, decide a few things clearly.
- One Image, One Responsibility: A container image should do one thing well. If an image needs to be configured differently per environment, that configuration should happen at runtime, not build time.
This allows:
- The same image to run in dev, staging, and production
- Clear rollback paths
- Predictable behavior
When the images differ per environment, comparing them becomes meaningless. It also becomes harder to trace failures: with a single shared image, if something fails in dev, we fix it before it ever reaches prod.
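To make this concrete, here is a sketch of runtime configuration, assuming a hypothetical my-service image and an APP_ENV variable the application reads at startup:

```shell
# Same immutable image everywhere; only runtime configuration differs.
# Image name, tag, and APP_ENV are illustrative.
docker run -e APP_ENV=staging    my-service:1.4.2   # staging
docker run -e APP_ENV=production my-service:1.4.2   # production
```

The artifact deployed to production is byte-for-byte the one tested in staging; only the environment around it changes.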
- Tagging is a Promise: Tags aren’t just labels - they’re commitments to what users get.
Use a clear strategy:
- Fixed version tags (like 1.4.2 or git-sha) that never change.
- Optional rolling tags (main, stable) for ease.
Avoid:
- Pushing “latest” to production.
- Retagging rebuilt images.
Changing a deployed tag breaks trust, complicates debugging, and makes post-incident analysis unreliable.
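ECR can enforce the “never retag” rule for you. As a sketch (the repository name is illustrative), enabling tag immutability makes the registry reject any push that would overwrite an existing tag:

```shell
# Reject any push that would overwrite an existing tag
aws ecr put-image-tag-mutability \
  --repository-name my-service \
  --image-tag-mutability IMMUTABLE
```

With this set, a rebuilt image simply cannot reuse an old tag, so every tag keeps pointing at exactly what it pointed at on day one.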
Writing a Production-Grade Dockerfile
The Dockerfile is not just a build script. It is a security and operational boundary.
Base Image Selection
Base images are inherited risk - they carry whatever vulnerabilities, bloat, or outdated packages the original creator baked in. Your choice of a base image can affect the overall outcome of your image build process.
Some practical guidelines:
- Prefer official images or well-maintained minimal images
- Understand the trade-offs:
- Alpine is small, but can cause compatibility issues
- Distroless images reduce attack surface, but require maturity
- Pin base image versions explicitly. Instead of FROM node:alpine (unpinned), use FROM node:20-alpine3.19 (pinned). An unpinned base image introduces invisible change, even when your own code has not changed.
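Note that even a pinned tag can be re-pointed by the upstream publisher; pinning by digest removes that possibility entirely. A sketch (the digest is a placeholder, not a real value):

```dockerfile
# Tag pinning: predictable, but the tag could still be re-pointed upstream
FROM node:20-alpine3.19

# Digest pinning: fully immutable reference (placeholder digest)
# FROM node:20-alpine3.19@sha256:<digest-of-the-image-you-verified>
```

Digest pinning trades convenience for certainty: updates become deliberate edits instead of silent drift.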
Multi-Stage Builds
A container image is built in stages. Each stage produces files, binaries, or artifacts that can be passed forward, but the next stage does not need to inherit everything that came before it.
This distinction is important.
By default, a single-stage Dockerfile accumulates everything: compilers, package managers, temporary files, and runtime dependencies all end up in the final image - even if they were only needed briefly during the build.
A multi-stage Dockerfile separates concerns explicitly. A typical structure looks like this:
- A build stage that includes: compilers, build tools, and dependency managers
- A runtime stage that receives: only the compiled binaries or required artifacts, and the libraries needed to execute the application
Everything else is left behind.
The runtime image does not need:
- The compiler that produced the binary
- The package manager used to install dependencies
- Temporary files created during the build
As a result, multi-stage builds:
- Reduce image size: Smaller images are faster to pull, faster to deploy, and simpler to reason about during incidents.
- Remove build tools from production: If a shell, compiler, or package manager is not present, it cannot be abused. Entire attack paths disappear simply because the tools are not there.
- Limit the blast radius of vulnerabilities: Vulnerabilities in build-time dependencies cannot be exploited at runtime if those dependencies never make it into the final image.
This separation mirrors a broader principle: production environments should contain only what they need to run, not what was needed to create them.
Multi-stage builds make that principle enforceable, not aspirational.
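As a sketch, here is what that structure looks like for the React-style app mentioned earlier: build tooling in the first stage, only static files and a web server in the second. The image versions and the /app/dist output path are assumptions:

```dockerfile
# --- Build stage: compilers, package managers, dev dependencies ---
FROM node:20-alpine3.19 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build            # emits static files into /app/dist

# --- Runtime stage: only the artifacts and what serves them ---
FROM nginx:1.25-alpine
COPY --from=build /app/dist /usr/share/nginx/html
```

The final image contains nginx and the static files - no Node.js, no npm, no source code.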
Running as Non-Root
Containers often run as root by default - not because they need to, but because it is the easiest option.
When an application runs as root inside a container, it has full control inside that container. If something goes wrong, such as a bug in the application, a vulnerable library, or a misconfiguration, that power can be abused. In the worst case, an attacker can use that foothold to access the host machine itself or interfere with other workloads running nearby.
Containers are meant to isolate applications, but that isolation is not absolute. Running as root increases the impact of mistakes when the boundary is crossed.
A production image should:
- Define a non-root user
- Switch to that user explicitly
- Ensure file permissions are correct for runtime access
This does not make containers “secure,” but it removes an entire class of avoidable risk.
When something goes wrong, running as non-root limits how far the failure can spread.
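A minimal sketch of what this looks like in a Dockerfile, assuming a Node.js service (the official Node images already ship a non-root node user):

```dockerfile
FROM node:20-alpine3.19
WORKDIR /app

# Files owned by the runtime user, so the app can read what it needs
COPY --chown=node:node . .

# Switch away from root before the application ever starts
USER node

CMD ["node", "server.js"]
```

For base images without a prebuilt user, create one explicitly (for example with adduser on Alpine) before the USER instruction.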
Layer Hygiene
Every instruction in a Dockerfile creates a new image layer. Each layer is kept, cached, and shipped, even if it only existed to support a temporary step during the build.
Good layer hygiene means being intentional about what actually ends up in the final image:
- Combine related commands so temporary steps do not leave permanent traces
- Remove temporary files and caches once they are no longer needed
- Use .dockerignore to prevent unnecessary files from ever entering the build
Large images are not just slower to deploy. They carry more unused files, tools, and libraries - each one another thing that can fail, leak information, or be exploited when something goes wrong.
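As an illustration of the first point, on a Debian-based image the difference looks like this (the package names are arbitrary):

```dockerfile
# Bad: three layers; files deleted in the last step still exist
# in the earlier layers and still ship with the image
# RUN apt-get update
# RUN apt-get install -y curl
# RUN rm -rf /var/lib/apt/lists/*

# Good: one layer; the cache is removed before the layer is committed
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```

Because each RUN commits a layer, cleanup only shrinks the image when it happens in the same instruction that created the files.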
Building Images Correctly: Local vs CI
Local builds are useful for development.
They allow engineers to iterate quickly and test ideas. They should not be the source of truth for production images.
A healthy production workflow looks like this:
- Developers build locally for iteration
- CI systems build images that will be deployed
- CI builds are:
- Ephemeral
- Repeatable
- Logged and auditable
CI, however, is not automatically trustworthy. If build machines change over time, tools are updated without notice, or old dependency caches are reused, the same build can start producing different images without anyone realizing it.
When production images are built on laptops, or on loosely controlled CI runners, you can no longer reliably answer where an image came from or why it behaves the way it does. At that point, traceability is already gone.
Authenticating to Amazon ECR the Right Way
ECR authentication is often misunderstood because it looks like a Docker concern, but it is really an IAM concern.
At its core:
- Docker clients authenticate using temporary credentials
- Those credentials are issued based on IAM permissions
This means access to ECR is meant to be short-lived and role-based, not hard-coded. In practice:
- EC2, ECS, and EKS workloads should use IAM roles
- CI/CD systems should use OIDC-based role assumption
- Long-lived access keys should be avoided entirely
If AWS access keys are stored in CI environment variables, access has already been decoupled from identity and intent - and failures or leaks become much harder to contain.
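In practice, the short-lived handshake looks like this (the account ID and region are placeholders). The token is valid for 12 hours and is derived entirely from the caller’s IAM identity:

```shell
# Exchange the caller's IAM identity for a temporary registry token,
# then feed it straight to docker login (never stored on disk)
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.eu-west-1.amazonaws.com
```

On EC2, ECS, EKS, or an OIDC-enabled CI runner, the aws CLI picks up role credentials automatically, so no access keys ever appear in the pipeline.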
Tagging and Pushing Images to ECR
Pushing an image to ECR is straightforward. What matters is what that image represents once it is there.
In a production workflow, an image is not just something you push - it is something you will later need to identify, audit, and possibly roll back.
A healthy flow looks like this:
- The image is built in CI, not on a developer’s machine
- The image is tagged with:
- A unique identifier tied to the source code (for example, a Git commit SHA)
- Optionally, a human-friendly version tag
- CI authenticates to ECR using a role
- The image is pushed
- Metadata about the build is recorded
This discipline exists for one reason:
You must be able to answer, at any time, exactly which image is running in production and where it came from.
If an image tag cannot be traced back to a specific build and a specific commit, it is no longer a reliable production artifact.
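Put together, a CI push step might look like this sketch - the repository name and version tag are illustrative, and the registry address is a placeholder:

```shell
# Identifiers come from the build context, not from a human
REGISTRY=123456789012.dkr.ecr.eu-west-1.amazonaws.com
SHA=$(git rev-parse --short HEAD)

# One build, two names: an immutable SHA tag and a human-friendly version
docker build -t "$REGISTRY/my-service:$SHA" .
docker tag "$REGISTRY/my-service:$SHA" "$REGISTRY/my-service:1.4.2"

docker push "$REGISTRY/my-service:$SHA"
docker push "$REGISTRY/my-service:1.4.2"
```

Whatever ends up in production, the SHA tag leads straight back to the commit that produced it.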
Image Scanning and Its Limits
Amazon ECR can scan images for known vulnerabilities. This is useful, but it is important to understand what scanning does - and what it does not do.
Image scanning can:
- Identify known vulnerabilities in operating system packages and libraries
It cannot:
- Understand how your application behaves
- Detect logical or configuration flaws
- Decide whether a vulnerability is exploitable in your environment
- Enforce fixes on its own
For this reason, scanning should be treated as one signal among many, not as proof of safety.
It works best as:
- A warning mechanism for serious issues
- A gate for clearly unacceptable risk
- A supplement to good base image selection and build practices
Relying on scanners alone shifts responsibility away from design decisions - and that is where most real failures begin.
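With basic scanning enabled, findings can be pulled into the pipeline as one of those signals. A sketch, with an illustrative repository and tag:

```shell
# Trigger a scan for a specific image...
aws ecr start-image-scan \
  --repository-name my-service \
  --image-id imageTag=1.4.2

# ...then fetch the findings, e.g. to fail the pipeline on CRITICAL issues
aws ecr describe-image-scan-findings \
  --repository-name my-service \
  --image-id imageTag=1.4.2
```

The describe call returns severity counts, which makes it straightforward to gate on “no CRITICAL findings” while leaving judgment calls to humans.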
Operating ECR in Real Systems
Once images are built and pushed regularly, operational concerns start to matter.
Lifecycle Policies
Without lifecycle policies, container registries tend to grow quietly and indefinitely. Over time, old images accumulate, storage costs rise without visibility, and engineers struggle to identify which images are still relevant during incidents. Lifecycle policies exist to impose order.
They should retain recent images, preserve images that are currently deployed or referenced, and remove artifacts that are no longer used. The goal is not aggressive cleanup. It is being able to reason clearly when something breaks.
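As a sketch, a single rule like this (repository name and count are illustrative) already prevents unbounded growth:

```shell
aws ecr put-lifecycle-policy \
  --repository-name my-service \
  --lifecycle-policy-text '{
    "rules": [{
      "rulePriority": 1,
      "description": "Expire all but the 30 most recent images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }]
  }'
```

One caveat: count-based rules cannot tell which images are currently deployed, so pair them with generous counts or tag-prefix rules that protect release tags.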
Cross-Account Patterns
In more mature environments, images are built in one account, and deployed from other accounts. This separation forces clarity around ownership and trust.
To make this work safely, repository access must be explicit, deployment accounts should not be able to mutate images, and trust should be intentional and limited.
A common rule is: build once, consume read-only. When deployment environments can only pull immutable images - not rebuild or retag them - the integrity of the supply chain is preserved across accounts.
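One way to express “consume read-only” is a repository policy in the build account that grants a deployment account pull permissions and nothing else. A sketch with placeholder account IDs:

```shell
aws ecr set-repository-policy \
  --repository-name my-service \
  --policy-text '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "AllowPullFromDeployAccount",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::210987654321:root" },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }]
  }'
```

Because no push or delete actions are granted, the deployment account can consume images but can never alter what the build account produced.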
Common Mistakes to Avoid
Over the years, there are some mistakes I have seen appear repeatedly in production systems:
- Deploying the latest tag to production
- Baking secrets into images
- Rebuilding images separately for each environment
- Giving CI unrestricted access to ECR
- Treating image size as an aesthetic concern
These issues often remain invisible during normal operation. They surface during incidents, when clarity matters most.
Let’s Recap What We Learnt
A production-ready ECR workflow is not complicated, but it is intentional.
It means:
- Writing Dockerfiles that include only what is needed
- Building images in controlled, repeatable CI environments
- Using short-lived, role-based authentication
- Tagging images immutably
- Managing images with the expectation that they will be inspected later
- Deploying the same artifact across environments
This is less about tools and more about discipline.
Conclusion
A container image is not just a file stored in a registry.
It is a promise:
- That what was tested is what is running
- That changes are intentional
- That failures can be understood after the fact
Amazon ECR gives you a place to store images. Whether those images deserve trust depends entirely on how they were built, tagged, and pushed.
Production readiness is rarely about adding more steps. It is about removing ambiguity - and keeping it removed.



