Cloud Native Blog - Container Solutions

Lean Go Containers with Multi-Stage Dockerfiles

Written by Cyle Riggs | Jul 18, 2017 12:02:15 PM

On June 28 Docker 17.06 CE was released, which, among other improvements, adds support for multi-stage image builds. While traditional Docker builds had to do their work and produce their output in a single image, multi-stage builds allow the use of intermediate containers to generate artifacts. Artifacts from intermediate containers are then copied into the final image, meaning the intermediate tools needn’t ship in the final image. While the community has found ways to perform multi-stage builds in prior versions of Docker, this is the first time that multi-stage builds can be accomplished in a single Dockerfile. By placing all of the build logic in a single Dockerfile we can use build tools without fear of bloating the output image, and integrate cleanly with build pipelines that accept a Dockerfile, even for complex builds.

These improvements have a particularly large impact on Go projects. Go can use static compilation to generate a self-contained executable, so many projects can now easily build images holding just a single binary without resorting to hacks or breaking their build pipelines.

Let’s take a look at how this new feature impacts the containerization of Go’s introductory hello world app:

 
package main

import "fmt"

func main() {
	fmt.Println("Hello world")
}
	
	

Containerizing this simple program with traditional Docker techniques results in a staggering 700MB image:

  
cd ~/Documents/GitHub/lean-go-containers/hello-world
cat Dockerfile.legacy
(out)FROM golang:1.8.3
(out)WORKDIR /go/src/hello-world
(out)COPY main.go /go/src/hello-world
(out)RUN go install
(out)CMD ["hello-world"]
docker build -q -t hello-world-legacy -f Dockerfile.legacy .
(out)sha256:95052c72e4afda066e3bf2c05c854c5ac30e2a8cfc2e6a14772483a7a1b86976
docker run hello-world-legacy
(out)Hello world
docker image ls hello-world-legacy
(out)REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
(out)hello-world-legacy   latest              95052c72e4af        22 seconds ago      700MB

The application works, but why is the image so large? Upon inspecting the image it becomes obvious that most of that 700MB is extraneous data, as the generated binary is a standalone, statically linked executable:

  
docker run hello-world-legacy ldd /go/bin/hello-world
(out)	not a dynamic executable
docker run hello-world-legacy ls -lh /go/bin/hello-world
(out)-rwxr-xr-x 1 root root 1.5M Jul 12 13:20 /go/bin/hello-world

If you don’t see the “not a dynamic executable” message when inspecting your Go binaries with ldd, don’t worry; we’ll cover that case later.

While “not a dynamic executable” may look like an error message, it is exactly what we were hoping to see: it indicates that the binary packages all of its runtime code in a self-contained, portable way. Static binaries are easily moved between systems, and in this case between containers. Multi-stage Dockerfiles allow us to move that static binary into an empty image and execute it from there, leaving behind the roughly 698.5MB of build-time dependencies.

There are hacks and other techniques that have been developed over time to work around this problem of large images, but most result in something that isn’t as easy to grok or maintain, or doesn’t play well with systems that expect a standard, monolithic Dockerfile. Let’s see how the new multi-stage build feature handles this problem:

  
cat Dockerfile.multi-stage
(out)FROM golang:1.8.3 as builder
(out)WORKDIR /go/src/hello-world
(out)COPY main.go ./
(out)RUN go install
(out)RUN ldd /go/bin/hello-world | grep -q "not a dynamic executable"
(out)
(out)FROM scratch
(out)COPY --from=builder /go/bin/hello-world /hello-world
(out)CMD ["/hello-world"]
docker build -q -t hello-world-multi-stage -f Dockerfile.multi-stage .
(out)sha256:657aa29b8b48f1680618ad7b882a4a40151c362e68a54c822d31acef821b6a83
docker run hello-world-multi-stage
(out)Hello world
docker image ls hello-world-multi-stage
(out)REPOSITORY                TAG                 IMAGE ID            CREATED             SIZE
(out)hello-world-multi-stage   latest              657aa29b8b48        22 seconds ago      1.55MB
 

By using a multi-stage build, the output image size was reduced by a factor of roughly 450 (from 700MB to 1.55MB), but how does the Dockerfile work? The first block, starting with “FROM golang”, creates an intermediary image named builder, which contains our static binary. Note that I’ve automated the binary inspection using ldd and grep to ensure the compiled binary is statically linked, as that is critical to the next step. If ldd does not report a static binary, grep exits with a non-zero status and the build fails.

The second block, beginning with “FROM scratch”, starts a new, empty image, into which our build artifact is copied from the builder image. The --from=builder argument to the COPY instruction tells Docker to take files from our intermediate image rather than from the build context.

Here’s a slightly more complex example, which downloads and prints RFC 2795:


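package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Download RFC 2795 over HTTPS.
	resp, err := http.Get("https://www.ietf.org/rfc/rfc2795.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Stream the response body straight to standard output.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		log.Fatal(err)
	}
}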
	
	

Let’s try to containerize this simple HTTPS application and inspect the result:

  
cd ~/Documents/GitHub/lean-go-containers/https-get
cat Dockerfile.legacy
(out)FROM golang:1.8.3
(out)WORKDIR /go/src/https-get
(out)COPY main.go ./
(out)RUN go install
(out)CMD ["https-get"]
docker build -q -t https-get-legacy -f Dockerfile.legacy .
(out)sha256:23af1003e0a59c17bf91d5500b5ac0f6b99c89d18f2436079890422ab91c23f4
docker run https-get-legacy | head
(out) 
(out) 
(out) 
(out) 
(out) 
(out) 
(out)Network Working Group                                     S. Christey
(out)Request for Comments: 2795                         MonkeySeeDoo, Inc.
(out)Category: Informational                                  1 April 2000
(out) 
(out)write /dev/stdout: broken pipe

Note that the “broken pipe” message is expected: head exits after printing the first ten lines and closes the pipe while output is still being written to it.

Using this Dockerfile the image builds and the app downloads RFC 2795 as expected, but can we apply the same strategy as before to create a lean container? Let’s inspect the resulting binary and see:

  
docker run https-get-legacy ldd /go/bin/https-get
(out)	linux-vdso.so.1 (0x00007ffe05d8a000)
(out)	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4ec4f8c000)
(out)	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4ec4be1000)
(out)	/lib64/ld-linux-x86-64.so.2 (0x000056508be4a000)
docker run https-get-legacy ls -lh /go/bin/https-get
(out)-rwxr-xr-x 1 root root 5.3M Jul 12 13:53 /go/bin/https-get

The output from ldd looks quite different this time. Instead of the “not a dynamic executable” message, we see a list of shared libraries (.so files) the binary needs at runtime, which indicates that it isn’t self-contained. We can also see that adding the automated ldd test now breaks the build:

  
cat Dockerfile.legacy-broken-ldd-test
(out)FROM golang:1.8.3
(out)WORKDIR /go/src/https-get
(out)COPY main.go ./
(out)RUN go install
(out)RUN ldd /go/bin/https-get | grep -q "not a dynamic executable"
(out)CMD ["https-get"]
docker build -f Dockerfile.legacy-broken-ldd-test .
(out)Sending build context to Docker daemon  6.144kB
(out)Step 1/6 : FROM golang:1.8.3
(out) ---> 6d9bf2aec386
(out)Step 2/6 : WORKDIR /go/src/https-get
(out) ---> Using cache
(out) ---> 5e3efedc7b15
(out)Step 3/6 : COPY main.go ./
(out) ---> Using cache
(out) ---> 4d6ebcd06e4f
(out)Step 4/6 : RUN go install
(out) ---> Using cache
(out) ---> 4826fee44cfb
(out)Step 5/6 : RUN ldd /go/bin/https-get | grep -q "not a dynamic executable"
(out) ---> Running in e06948888e93
(out)The command '/bin/sh -c ldd /go/bin/https-get | grep -q "not a dynamic executable"' returned a non-zero code: 1
	
	

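This happens because cgo, Go’s mechanism for calling C code, is enabled by default for native builds, and in that configuration the net package (which net/http depends on) is built with a DNS resolver that calls into the system C library. Linking against that library makes the binary dynamically linked. The fix is to change the install command to force a static build: CGO_ENABLED=0 disables cgo, -tags netgo selects Go’s pure-Go network and DNS implementation, -a forces all packages, including the standard library, to be rebuilt with these settings, and -ldflags '-extldflags "-static"' asks any external linker to link statically: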

  
cat Dockerfile.legacy-static
(out)FROM golang:1.8.3
(out)WORKDIR /go/src/https-get
(out)COPY main.go ./
(out)RUN CGO_ENABLED=0 go install -a -tags netgo -ldflags '-extldflags "-static"'
(out)RUN ldd /go/bin/https-get | grep -q "not a dynamic executable"
(out)CMD ["https-get"]
docker build -q -t https-get-legacy-static -f Dockerfile.legacy-static .
(out)sha256:20d7d491b320495ac586cb72eee7db40266bc6bc1cfea368b1021464ce6f991e
docker run https-get-legacy-static | head
(out) 
(out) 
(out) 
(out) 
(out) 
(out) 
(out)Network Working Group                                     S. Christey
(out)Request for Comments: 2795                         MonkeySeeDoo, Inc.
(out)Category: Informational                                  1 April 2000
(out)
(out)write /dev/stdout: broken pipe
docker run https-get-legacy-static ldd /go/bin/https-get
(out)	not a dynamic executable
docker run https-get-legacy-static ls -lh /go/bin/https-get
(out)-rwxr-xr-x 1 root root 5.2M Jul 12 14:10 /go/bin/https-get

That looks better: the application runs and is reported as a static binary. Let’s try creating a lean container again:

  
cat Dockerfile.multi-stage
(out)FROM golang:1.8.3 as builder
(out)WORKDIR /go/src/https-get
(out)COPY main.go ./
(out)RUN CGO_ENABLED=0 go install -a -tags netgo -ldflags '-extldflags "-static"'
(out)RUN ldd /go/bin/https-get | grep -q "not a dynamic executable"
(out)
(out)FROM scratch
(out)COPY --from=builder /go/bin/https-get /https-get
(out)COPY --from=builder /etc/ssl/certs/ /etc/ssl/certs
(out)CMD ["/https-get"]
docker build -q -t https-get-multi-stage -f Dockerfile.multi-stage .
(out)sha256:be8d79a56e247fb50f64f4b595fcdd0d54b1fcba8001655b08dd8774a92679a9
docker run https-get-multi-stage | head
(out)
(out)
(out)
(out)
(out)
(out)
(out)Network Working Group                                     S. Christey
(out)Request for Comments: 2795                         MonkeySeeDoo, Inc.
(out)Category: Informational                                  1 April 2000
(out)
(out)write /dev/stdout: broken pipe

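It works! Note that we also had to copy the CA certificates from the build environment into /etc/ssl/certs. The scratch image contains no certificates at all, so without them the program could not verify the server’s certificate when contacting an https:// URL. How do the image sizes compare?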

  
docker image ls | grep https-get
(out)https-get-multi-stage     latest              be8d79a56e24        50 seconds ago      5.71MB
(out)https-get-legacy-static   latest              20d7d491b320        3 minutes ago       717MB
(out)https-get-legacy          latest              23af1003e0a5        20 minutes ago      704MB

The https-get-multi-stage image, at 5.71MB, is barely larger than the 5.2MB statically linked binary it contains, and more than 120 times smaller than the 717MB image produced by the legacy static Dockerfile.

When using multi-stage Dockerfiles, only the image resulting from the last block of the Dockerfile is tagged when the build is complete. What happens to the intermediary images? They’re still present, taking up precious disk space, and at some point will need to be garbage collected. One such intermediary image can be seen below, untagged (<none>) but still present on disk:

  
docker image ls
(out)REPOSITORY                TAG                 IMAGE ID            CREATED             SIZE
(out)<none>                    <none>              cadbacc14117        14 minutes ago      700MB
(out)hello-world-multi-stage   latest              7c978fc374db        14 minutes ago      1.55MB
(out)golang                    1.8.3               d2f558dda133        2 weeks ago         699MB

The --force-rm flag, which removes intermediary containers throughout the build, seems promising here, but it doesn’t resolve the issue because it doesn’t remove intermediary images. Make sure you understand whether your build process creates untagged images, and have a plan for regularly cleaning them up. The docker image prune command, which by default removes only dangling images (images that are untagged and not referenced by any container), could prove useful for this purpose.

The code used here can be found at https://github.com/ContainerSolutions/lean-go-containers.