docker container start 源码分析
tags: docker,container,源码分析
docker container start 源码分析
本文编写时,最新release为v20.10.14, 下文涉及代码均为该版本的代码。1. docker container start 简介
启动容器,通过docker container start
命令调用,该流程在运行容器时也会执行。
详见官方文档: https://docs.docker.com/engine/reference/commandline/container_start/
2. 源码入口位置
由cli接收docker container start
命令参数,发送至docker engine api
cli与engine api的代码入口分别位于:
cli
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L29
func NewStartCommand(dockerCli command.Cli) *cobra.Command {
...
}
engine
https://github.com/moby/moby/blob/v20.10.14/api/server/router/container/container.go#L54
func (r *containerRouter) initRoutes() {
r.routes = []router.Route{
...
router.NewPostRoute("/containers/{name:.*}/start", r.postContainersStart),
...
}
}
3. cli
docker使用 github.com/spf13/cobra
实现cli功能,NewStartCommand
函数中定义了cobra.Command
。该函数中用户输入的参数经cobra
解析后,会被传给runStart
函数。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L38
func NewStartCommand(dockerCli command.Cli) *cobra.Command {
...
cmd := &cobra.Command{
...
RunE: func(cmd *cobra.Command, args []string) error {
opts.containers = args
return runStart(dockerCli, &opts)
},
}
...
}
3.1 attach
如果参数中指定了--attach
或--interactive
, 则需要将容器的STDOUT/STDERR,或STDIN “attach"到当前shell。
3.1.1 确保只有一个容器
因为要attach
容器, 所以需要确认要启动的只有一个容器。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L64-L66
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
// We're going to attach to a container.
// 1. Ensure we only have one container.
if len(opts.containers) > 1 {
return errors.New("you cannot start and attach multiple containers at once")
}
...
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
3.1.2 attach
调用docker inspect api, 判断容器的配置中有无设置tty
选项。如果没有,将信号直接发给容器。其中,inspect部分的分析详见docker container inspect 源码分析。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L69-L80
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
// 2. Attach to the container.
container := opts.containers[0]
c, err := dockerCli.Client().ContainerInspect(ctx, container)
if err != nil {
return err
}
// We always use c.ID instead of container to maintain consistency during `docker start`
if !c.Config.Tty {
sigc := notfiyAllSignals()
go ForwardAllSignals(ctx, dockerCli, c.ID, sigc)
defer signal.StopCatch(sigc)
}
...
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
覆盖detach按键。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L82-L84
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
if opts.detachKeys != "" {
dockerCli.ConfigFile().DetachKeys = opts.detachKeys
}
...
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
调用docker container attach api 实现attach,详细内容参见docker container attach 源码分析。
调用attach api后,将io流绑定到std上。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L86-L104
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
options := types.ContainerAttachOptions{
Stream: true,
Stdin: opts.openStdin && c.Config.OpenStdin,
Stdout: true,
Stderr: true,
DetachKeys: dockerCli.ConfigFile().DetachKeys,
}
var in io.ReadCloser
if options.Stdin {
in = dockerCli.In()
}
resp, errAttach := dockerCli.Client().ContainerAttach(ctx, c.ID, options)
if errAttach != nil {
return errAttach
}
defer resp.Close()
cErr := make(chan error, 1)
go func() {
cErr <- func() error {
streamer := hijackedIOStreamer{
streams: dockerCli,
inputStream: in,
outputStream: dockerCli.Out(),
errorStream: dockerCli.Err(),
resp: resp,
tty: c.Config.Tty,
detachKeys: options.DetachKeys,
}
errHijack := streamer.stream(ctx)
if errHijack == nil {
return errAttach
}
return errHijack
}()
}()
...
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
3.1.3 等待容器退出或删除
等待容器退出或删除,退出或被删除后statusChan
将收到信号。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L130-L134
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
// 3. We should open a channel for receiving status code of the container
// no matter it's detached, removed on daemon side(--rm) or exit normally.
statusChan := waitExitOrRemoved(ctx, dockerCli, c.ID, c.HostConfig.AutoRemove)
startOptions := types.ContainerStartOptions{
CheckpointID: opts.checkpoint,
CheckpointDir: opts.checkpointDir,
}
...
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
waitExitOrRemoved
函数会调用多个api,相关api的分析详见:
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/utils.go#L16
func waitExitOrRemoved(ctx context.Context, dockerCli command.Cli, containerID string, waitRemove bool) <-chan int {
if len(containerID) == 0 {
// containerID can never be empty
panic("Internal Error: waitExitOrRemoved needs a containerID as parameter")
}
// Older versions used the Events API, and even older versions did not
// support server-side removal. This legacyWaitExitOrRemoved method
// preserves that old behavior and any issues it may have.
if versions.LessThan(dockerCli.Client().ClientVersion(), "1.30") {
return legacyWaitExitOrRemoved(ctx, dockerCli, containerID, waitRemove)
}
condition := container.WaitConditionNextExit
if waitRemove {
condition = container.WaitConditionRemoved
}
resultC, errC := dockerCli.Client().ContainerWait(ctx, containerID, condition)
statusC := make(chan int)
go func() {
select {
case result := <-resultC:
if result.Error != nil {
logrus.Errorf("Error waiting for container: %v", result.Error.Message)
statusC <- 125
} else {
statusC <- int(result.StatusCode)
}
case err := <-errC:
logrus.Errorf("error waiting for container: %v", err)
statusC <- 125
}
}()
return statusC
}
3.1.4 启动容器
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L136-L145
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
// 4. Start the container.
if err := dockerCli.Client().ContainerStart(ctx, c.ID, startOptions); err != nil {
cancelFun()
<-cErr
if c.HostConfig.AutoRemove {
// wait container to be removed
<-statusChan
}
return err
}
...
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
调用docker engine /containers/{CONTAINERID}/start
func (cli *Client) ContainerStart(ctx context.Context, containerID string, options types.ContainerStartOptions) error {
query := url.Values{}
if len(options.CheckpointID) != 0 {
query.Set("checkpoint", options.CheckpointID)
}
if len(options.CheckpointDir) != 0 {
query.Set("checkpoint-dir", options.CheckpointDir)
}
resp, err := cli.post(ctx, "/containers/"+containerID+"/start", query, nil, nil)
ensureReaderClosed(resp)
return err
}
3.1.5 等待attach退出
接收管道消息,即等待attach退出。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L148-L163
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
// 5. Wait for attachment to break.
...
if attachErr := <-cErr; attachErr != nil {
if _, ok := attachErr.(term.EscapeError); ok {
// The user entered the detach escape sequence.
return nil
}
return attachErr
}
if status := <-statusChan; status != 0 {
return cli.StatusError{StatusCode: status}
}
} else if opts.checkpoint != "" {
...
} else {
...
}
return nil
}
3.2 未attach,设置了checkpoint
设置了checkpoint,也要求待启动容器只能有一个。因为checkpoint是与容器绑定的。
https://github.com/docker/cli/blob/master/cli/command/container/start.go#L178-L188
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
} else if opts.checkpoint != "" {
if len(opts.containers) > 1 {
return errors.New("you cannot restore multiple containers at once")
}
container := opts.containers[0]
startOptions := types.ContainerStartOptions{
CheckpointID: opts.checkpoint,
CheckpointDir: opts.checkpointDir,
}
return dockerCli.Client().ContainerStart(ctx, container, startOptions)
} else {
...
}
return nil
}
调用docker engine /containers/{CONTAINERID}/start
。
func (cli *Client) ContainerStart(ctx context.Context, containerID string, options types.ContainerStartOptions) error {
query := url.Values{}
if len(options.CheckpointID) != 0 {
query.Set("checkpoint", options.CheckpointID)
}
if len(options.CheckpointDir) != 0 {
query.Set("checkpoint-dir", options.CheckpointDir)
}
resp, err := cli.post(ctx, "/containers/"+containerID+"/start", query, nil, nil)
...
}
3.3 未attach, 也未设置checkpoint
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L178
func runStart(dockerCli command.Cli, opts *startOptions) error {
...
if opts.attach || opts.openStdin {
...
} else if opts.checkpoint != "" {
...
} else {
return startContainersWithoutAttachments(ctx, dockerCli, opts.containers)
}
return nil
}
调用docker engine API。
https://github.com/docker/cli/blob/v20.10.14/cli/command/container/start.go#L184
func startContainersWithoutAttachments(ctx context.Context, dockerCli command.Cli, containers []string) error {
...
for _, container := range containers {
if err := dockerCli.Client().ContainerStart(ctx, container, types.ContainerStartOptions{}); err != nil {
...
continue
}
...
}
...
}
4. dockerd api
api /containers/{CONTAINERID}/start
对应 r.postContainersStart
方法。
https://github.com/moby/moby/blob/v20.10.14/api/server/router/container/container.go#L54
func (r *containerRouter) initRoutes() {
r.routes = []router.Route{
...
router.NewPostRoute("/containers/{name:.*}/start", r.postContainersStart),
...
}
}
https://github.com/moby/moby/blob/v20.10.14/api/server/router/container/container_routes.go#L210
func (s *containerRouter) postContainersStart(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
...
if err := s.backend.ContainerStart(vars["name"], hostConfig, checkpoint, checkpointDir); err != nil {
return err
}
...
}
4.1 Daemon.ContainerStart
跟进到Daemon.ContainerStart
方法,该方法封装了Daemon.containerStart
方法,作为Backend的实现,供api调用。
该方法内容较长,下面我们依次走读一下。
根据客户端提供的容器名称找到容器对象。
容器名称参数可以是:
- 容器完整ID
- 容器完整名称
- 容器部分ID
Daemon.GetContainer
会按照上述顺序依次查找对应的容器对象。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L17
func (daemon *Daemon) ContainerStart(name string, hostConfig *containertypes.HostConfig, checkpoint string, checkpointDir string) error {
...
ctr, err := daemon.GetContainer(name)
if err != nil {
return err
}
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/container.go#L38
func (daemon *Daemon) GetContainer(prefixOrName string) (*container.Container, error) {
...
if containerByID := daemon.containers.Get(prefixOrName); containerByID != nil {
return containerByID, nil
}
if containerByName, _ := daemon.GetByName(prefixOrName); containerByName != nil {
return containerByName, nil
}
containerID, indexError := daemon.idIndex.Get(prefixOrName)
...
return daemon.containers.Get(containerID), nil
}
校验容器状态,如果容器处在Paused, Running, RemovalInProgress, Dead状态,则返回错误。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L27-L43
func (daemon *Daemon) ContainerStart(name string, hostConfig *containertypes.HostConfig, checkpoint string, checkpointDir string) error {
...
validateState := func() error {
ctr.Lock()
defer ctr.Unlock()
if ctr.Paused {
return errdefs.Conflict(errors.New("cannot start a paused container, try unpause instead"))
}
if ctr.Running {
return containerNotModifiedError{running: true}
}
if ctr.RemovalInProgress || ctr.Dead {
return errdefs.Conflict(errors.New("container is marked for removal and cannot be started"))
}
return nil
}
if err := validateState(); err != nil {
return err
}
...
}
在调用start api时,可以通过body传递容器的HostConfig, 不过当前版本(v20.10.14)docker cli不传递请求体。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L50-L93
func (daemon *Daemon) ContainerStart(name string, hostConfig *containertypes.HostConfig, checkpoint string, checkpointDir string) error {
...
if runtime.GOOS != "windows" {
if hostConfig != nil {
logrus.Warn("DEPRECATED: Setting host configuration options when the container starts is deprecated and has been removed in Docker 1.12")
...
}
}
...
if _, err = daemon.verifyContainerSettings(ctr.OS, ctr.HostConfig, nil, false); err != nil {
return errdefs.InvalidParameter(err)
}
if hostConfig != nil {
if err := daemon.adaptContainerSettings(ctr.HostConfig, false); err != nil {
return errdefs.InvalidParameter(err)
}
}
...
}
调用Daemon.containerStart
方法。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L94
func (daemon *Daemon) ContainerStart(name string, hostConfig *containertypes.HostConfig, checkpoint string, checkpointDir string) error {
...
return daemon.containerStart(ctr, checkpoint, checkpointDir, true)
}
4.2 Daemon.containerStart
Daemon.containerStart
方法实际处理容器启动。
该方法内容较长,下面我们依次走读一下。
4.2.1 checkpointDir
可以看到,尽管docker cli中已经有了checkpointDir参数,但该功能在daemon中还未被支持。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L114-L117
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
if checkpointDir != "" {
// TODO(mlaventure): how would we support that?
return errdefs.Forbidden(errors.New("custom checkpointdir is not supported"))
}
...
}
4.2.2 遇到错误清理容器
预埋一个defer函数,在Daemon.containerStart
方法结束前,如果有报错,统一在这个函数里做一些清理工作,使得容器恢复到可以再次被启动的状态。
Daemon.Cleanup
方法将释放分配给容器的网络资源,以及卸载容器文件系统。该方法也会在Daemon.handleContainerExit
方法中被调用(即处理容器退出事件时)。
daemon.ContainerRm
方法在配置自动删除容器选项时被调用,用于删除容器。该方法在执行docker container rm
时也会被调用,本文不对该方法进行分析。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L121-L143
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
defer func() {
if err != nil {
container.SetError(err)
// if no one else has set it, make sure we don't leave it at zero
if container.ExitCode() == 0 {
container.SetExitCode(128)
}
if err := container.CheckpointTo(daemon.containersReplica); err != nil {
logrus.Errorf("%s: failed saving state on start failure: %v", container.ID, err)
}
container.Reset(false)
daemon.Cleanup(container)
// if containers AutoRemove flag is set, remove it after clean up
if container.HostConfig.AutoRemove {
container.Unlock()
if err := daemon.ContainerRm(container.ID, &types.ContainerRmConfig{ForceRemove: true, RemoveVolume: true}); err != nil {
logrus.Errorf("can't remove container %s: %v", container.ID, err)
}
container.Lock()
}
}
}()
...
}
daemon.releaseNetwork
方法释放与容器连接的相关网络资源(libnetwork.sandbox,libnetwork.Network), 清除容器网络配置。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L229
func (daemon *Daemon) Cleanup(container *container.Container) {
daemon.releaseNetwork(container)
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations.go#L1012
func (daemon *Daemon) releaseNetwork(container *container.Container) {
...
settings := container.NetworkSettings.Networks
container.NetworkSettings.Ports = nil
...
var networks []libnetwork.Network
for n, epSettings := range settings {
...
cleanOperationalData(epSettings)
}
sb, err := daemon.netController.SandboxByID(sid)
...
if err := sb.Delete(); err != nil {
logrus.Errorf("Error deleting sandbox id %s for container %s: %v", sid, container.ID, err)
}
for _, nw := range networks {
daemon.tryDetachContainerFromClusterNetwork(nw, container)
}
...
}
Container.UnmountIpcMount
方法用于卸载shm。
共享了IPC的容器会在容器的“工作目录”创建shm文件,供其他容器挂载。
例如,使用docker run -ti -d --ipc=shareable ubuntu
命令创建的容器,将存在形如/var/lib/docker/containers/{ContainerID}/mounts/shm
的挂载点。
而Container.UnmountIpcMount
方法,就是用于卸载上述挂载点。但如果容器中存在主动挂载至/dev/shm
的情况(例如docker run -v test:/dev/shm
)时,则不会卸载。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L231-L233
func (daemon *Daemon) Cleanup(container *container.Container) {
...
if err := container.UnmountIpcMount(); err != nil {
logrus.Warnf("%s cleanup: failed to unmount IPC: %s", container.ID, err)
}
...
}
https://github.com/moby/moby/blob/v20.10.14/container/container_unix.go#L179
func (container *Container) UnmountIpcMount() error {
if container.HasMountFor("/dev/shm") {
return nil
}
shmPath, err := container.ShmResourcePath()
...
if err = mount.Unmount(shmPath); err != nil && !errors.Is(err, os.ErrNotExist) {
return err
}
return nil
}
卸载容器的rootfs。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L235-L241
func (daemon *Daemon) Cleanup(container *container.Container) {
...
if err := daemon.conditionalUnmountOnCleanup(container); err != nil {
// FIXME: remove once reference counting for graphdrivers has been refactored
// Ensure that all the mounts are gone
if mountid, err := daemon.imageService.GetLayerMountID(container.ID, container.OS); err == nil {
daemon.cleanupMountsByID(mountid)
}
}
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/daemon_unix.go#L1365
func (daemon *Daemon) conditionalUnmountOnCleanup(container *container.Container) error {
return daemon.Unmount(container)
}
https://github.com/moby/moby/blob/v20.10.14/daemon/daemon.go#L1341
func (daemon *Daemon) Unmount(container *container.Container) error {
if container.RWLayer == nil {
return errors.New("RWLayer of container " + container.ID + " is unexpectedly nil")
}
if err := container.RWLayer.Unmount(); err != nil {
logrus.WithField("container", container.ID).WithError(err).Error("error unmounting container")
return err
}
return nil
}
使用swarm secrets时,secrets会挂载在/var/lib/docker/containers/{ContainerID}/mounts/secrets
目录。在清理时,会递归卸载该目录。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L243
func (daemon *Daemon) Cleanup(container *container.Container) {
...
if err := container.UnmountSecrets(); err != nil {
logrus.Warnf("%s cleanup: failed to unmount secrets: %s", container.ID, err)
}
...
}
https://github.com/moby/moby/blob/v20.10.14/container/container_unix.go#L255
func (container *Container) UnmountSecrets() error {
p, err := container.SecretMountPath()
...
return mount.RecursiveUnmount(p)
}
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/moby/sys/mount/mount_unix.go#L43
func RecursiveUnmount(target string) error {
if err := unix.Unmount(target, mntDetach); err == nil {
return nil
}
...
}
递归卸载容器的"工作目录”, 即/var/lib/docker/containers/{ContainerID}/
目录。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L247-L249
func (daemon *Daemon) Cleanup(container *container.Container) {
...
if err := recursiveUnmount(container.Root); err != nil {
logrus.WithError(err).WithField("container", container.ID).Warn("Error while cleaning up container resource mounts.")
}
...
}
取消注册容器的exec命令。(该命令在执行docker exec或Health Check时被添加)
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L251
func (daemon *Daemon) Cleanup(container *container.Container) {
...
for _, eConfig := range container.ExecCommands.Commands() {
daemon.unregisterExecCommand(container, eConfig)
}
...
}
卸载容器挂载的卷。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L256
func (daemon *Daemon) Cleanup(container *container.Container) {
...
if container.BaseFS != nil && container.BaseFS.Path() != "" {
if err := container.UnmountVolumes(daemon.LogVolumeEvent); err != nil {
logrus.Warnf("%s cleanup: Failed to umount volumes: %v", container.ID, err)
}
}
...
}
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L493
func (container *Container) UnmountVolumes(volumeEventLog func(name, action string, attributes map[string]string)) error {
var errors []string
for _, volumeMount := range container.MountPoints {
if volumeMount.Volume == nil {
continue
}
if err := volumeMount.Cleanup(); err != nil {
errors = append(errors, err.Error())
continue
}
...
}
...
}
https://github.com/moby/moby/blob/v20.10.14/volume/mounts/mounts.go#L78
func (m *MountPoint) Cleanup() error {
if m.Volume == nil || m.ID == "" {
return nil
}
if err := m.Volume.Unmount(m.ID); err != nil {
return errors.Wrapf(err, "error unmounting volume %s", m.Volume.Name())
}
...
}
取消Attach。在attach流程中,会等待至container.attachContext
结束时detach。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L261
func (daemon *Daemon) Cleanup(container *container.Container) {
...
container.CancelAttachContext()
...
}
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L625
func (container *Container) CancelAttachContext() {
container.attachContext.mu.Lock()
if container.attachContext.ctx != nil {
container.attachContext.cancel()
container.attachContext.ctx = nil
}
container.attachContext.mu.Unlock()
}
最后,调用containerd删除容器。containerd具体如何删除将在下一章节分析containerd源码时统一分析。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L263
func (daemon *Daemon) Cleanup(container *container.Container) {
...
if err := daemon.containerd.Delete(context.Background(), container.ID); err != nil {
logrus.Errorf("%s cleanup: failed to delete container from containerd: %v", container.ID, err)
}
}
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L482
func (c *client) Delete(ctx context.Context, containerID string) error {
ctr, err := c.getContainer(ctx, containerID)
if err != nil {
return err
}
...
if err := ctr.Delete(ctx); err != nil {
return wrapError(err)
}
...
}
4.2.3 挂载容器rootfs
挂载容器的rootfs。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L145
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
if err := daemon.conditionalMountOnStart(container); err != nil {
return err
}
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/daemon_unix.go#L1359
func (daemon *Daemon) conditionalMountOnStart(container *container.Container) error {
return daemon.Mount(container)
}
https://github.com/moby/moby/blob/v20.10.14/daemon/daemon.go#L1316
func (daemon *Daemon) Mount(container *container.Container) error {
if container.RWLayer == nil {
return errors.New("RWLayer of container " + container.ID + " is unexpectedly nil")
}
dir, err := container.RWLayer.Mount(container.GetMountLabel())
if err != nil {
return err
}
...
container.BaseFS = dir // TODO: combine these fields
return nil
}
具体的挂载由graphDriver实现。例如overlay2驱动挂载的路径将返回形如 /var/lib/docker/overlay2/${LayerID}/merged
的rootfs路径。
下文我仅以overlay2为例,展开分析,至于其他类型的驱动,可在未来针对graphDriver进行源码分析时进行。
https://github.com/moby/moby/blob/v20.10.14/layer/mounted_layer.go#L103
func (rl *referencedRWLayer) Mount(mountLabel string) (containerfs.ContainerFS, error) {
return rl.layerStore.driver.Get(rl.mountedLayer.mountID, mountLabel)
}
生成调用mount函数将要传入的参数。
- mountData形如
index=off,lowerdir=/var/lib/docker/overlay2/l/{LowerID}:/var/lib/docker/overlay2/l/{LowerID},upperdir=/var/lib/docker/overlay2/{LayerID}/diff,workdir=/var/lib/docker/overlay2/{LayerID}/work
。 - mountTarget形如
/var/lib/docker/overlay2/{LayerID}/merged
。 - 要调用的mount函数即golang对mount系统调用的封装。
其中LowerId是指向相关目录的软链接。
https://github.com/moby/moby/blob/v20.10.14/daemon/graphdriver/overlay2/overlay.go#L524
func (d *Driver) Get(id, mountLabel string) (_ containerfs.ContainerFS, retErr error) {
...
var opts string
if readonly {
opts = indexOff + userxattr + "lowerdir=" + diffDir + ":" + strings.Join(absLowers, ":")
} else {
opts = indexOff + userxattr + "lowerdir=" + strings.Join(absLowers, ":") + ",upperdir=" + diffDir + ",workdir=" + workDir
}
mountData := label.FormatMountLabel(opts, mountLabel)
mount := unix.Mount
mountTarget := mergedDir
rootUID, rootGID, err := idtools.GetRootUIDGID(d.uidMaps, d.gidMaps)
if err != nil {
return nil, err
}
if err := idtools.MkdirAndChown(mergedDir, 0700, idtools.Identity{UID: rootUID, GID: rootGID}); err != nil {
return nil, err
}
...
}
但mount系统调用的data参数有长度限制,只会使用在page size内的mount data。
https://elixir.bootlin.com/linux/latest/source/fs/namespace.c#L3224
static void *copy_mount_options(const void __user * data)
{
char *copy;
unsigned left, offset;
if (!data)
return NULL;
copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!copy)
return ERR_PTR(-ENOMEM);
left = copy_from_user(copy, data, PAGE_SIZE);
/*
* Not all architectures have an exact copy_from_user(). Resort to
* byte at a time.
*/
offset = PAGE_SIZE - left;
while (left) {
char c;
if (get_user(c, (const char __user *)data + offset))
break;
copy[offset] = c;
left--;
offset++;
}
if (left == PAGE_SIZE) {
kfree(copy);
return ERR_PTR(-EFAULT);
}
return copy;
}
因为有page size的限制,所以docker才使用会使用“lower目录”,通过较短的软链接替代较长的路径。常见情况下,已经不需要考虑page size的边界问题。
如果mount data长度超过了page size, 则可以使用相对路径以减少长度。
如果还是超出了长度,则报错。
https://github.com/moby/moby/blob/v20.10.14/daemon/graphdriver/overlay2/overlay.go#L592-L613
func (d *Driver) Get(id, mountLabel string) (_ containerfs.ContainerFS, retErr error) {
...
pageSize := unix.Getpagesize()
// Use relative paths and mountFrom when the mount data has exceeded
// the page size. The mount syscall fails if the mount data cannot
// fit within a page and relative links make the mount data much
// smaller at the expense of requiring a fork exec to chroot.
if len(mountData) > pageSize-1 {
if readonly {
opts = indexOff + userxattr + "lowerdir=" + path.Join(id, diffDirName) + ":" + string(lowers)
} else {
opts = indexOff + userxattr + "lowerdir=" + string(lowers) + ",upperdir=" + path.Join(id, diffDirName) + ",workdir=" + path.Join(id, workDirName)
}
mountData = label.FormatMountLabel(opts, mountLabel)
if len(mountData) > pageSize-1 {
return nil, fmt.Errorf("cannot mount layer, mount label too large %d", len(mountData))
}
mount = func(source string, target string, mType string, flags uintptr, label string) error {
return mountFrom(d.home, source, target, mType, flags, label)
}
mountTarget = path.Join(id, mergedDirName)
}
...
}
注意到,使用相对路径时,mount函数需要替换成mountFrom
函数。mountFrom
函数中实际调用了docker自己实现的命令docker-mountfrom
。关于该命令的源码分析,我们将在 docker reexec docker-mountfrom 源码分析 一文中详细展开。该函数的作用是,先chroot到一个目录,然后在该目录中执行mount,即可实现使用相对路径进行mount。
https://github.com/moby/moby/blob/v20.10.14/daemon/graphdriver/overlay2/overlay.go#L609
func (d *Driver) Get(id, mountLabel string) (_ containerfs.ContainerFS, retErr error) {
...
if len(mountData) > pageSize-1 {
...
mount = func(source string, target string, mType string, flags uintptr, label string) error {
return mountFrom(d.home, source, target, mType, flags, label)
}
...
}
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/graphdriver/overlay2/mount.go#L34
func mountFrom(dir, device, target, mType string, flags uintptr, label string) error {
...
cmd := reexec.Command("docker-mountfrom", dir)
w, err := cmd.StdinPipe()
if err != nil {
return fmt.Errorf("mountfrom error on pipe creation: %v", err)
}
output := bytes.NewBuffer(nil)
cmd.Stdout = output
cmd.Stderr = output
if err := cmd.Start(); err != nil {
w.Close()
return fmt.Errorf("mountfrom error on re-exec cmd: %v", err)
}
// write the options to the pipe for the untar exec to read
if err := json.NewEncoder(w).Encode(options); err != nil {
w.Close()
return fmt.Errorf("mountfrom json encode to pipe failed: %v", err)
}
...
}
最后调用mount函数,挂载rootfs。
https://github.com/moby/moby/blob/v20.10.14/daemon/graphdriver/overlay2/overlay.go#L615-L617
func (d *Driver) Get(id, mountLabel string) (_ containerfs.ContainerFS, retErr error) {
...
if err := mount("overlay", mountTarget, "overlay", 0, mountData); err != nil {
return nil, fmt.Errorf("error creating overlay mount to %s: %v", mergedDir, err)
}
if !readonly {
// chown "workdir/work" to the remapped root UID/GID. Overlay fs inside a
// user namespace requires this to move a directory from lower to upper.
if err := os.Chown(path.Join(workDir, workDirName), rootUID, rootGID); err != nil {
return nil, err
}
}
return containerfs.NewLocalContainerFS(mergedDir), nil
}
4.2.4 初始化容器网络
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L149
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
if err := daemon.initializeNetworking(container); err != nil {
return err
}
...
}
如果是与其他容器共享网络 (即使用--network container:{ContainerID}
参数运行容器,参见官方文档) ,则初始化网络过程较为简单。
因为网络已经存在,找到共享网络的容器,更新hostname等配置即可。然后直接返回,初始化网络流程结束。
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations.go#L958
func (daemon *Daemon) initializeNetworking(container *container.Container) error {
var err error
if container.HostConfig.NetworkMode.IsContainer() {
// we need to get the hosts files from the container to join
nc, err := daemon.getNetworkedContainer(container.ID, container.HostConfig.NetworkMode.ConnectedContainer())
if err != nil {
return err
}
err = daemon.initializeNetworkingPaths(container, nc)
if err != nil {
return err
}
container.Config.Hostname = nc.Config.Hostname
container.Config.Domainname = nc.Config.Domainname
return nil
}
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations_unix.go#L448
func (daemon *Daemon) initializeNetworkingPaths(container *container.Container, nc *container.Container) error {
container.HostnamePath = nc.HostnamePath
container.HostsPath = nc.HostsPath
container.ResolvConfPath = nc.ResolvConfPath
return nil
}
如果是host模式的网络,需要额外配置hostname,并继续执行。
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations_unix.go#L448
func (daemon *Daemon) initializeNetworking(container *container.Container) error {
...
if container.HostConfig.NetworkMode.IsHost() {
if container.Config.Hostname == "" {
container.Config.Hostname, err = os.Hostname()
if err != nil {
return err
}
}
}
...
}
调用 daemon.allocateNetwork
,创建并连接至网络。
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations.go#L987
func (daemon *Daemon) initializeNetworking(container *container.Container) error {
...
if err := daemon.allocateNetwork(container); err != nil {
return err
}
...
}
创建sandbox, endpoint, 连接至容器网络。
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations.go#L559-L605
func (daemon *Daemon) allocateNetwork(container *container.Container) (retErr error) {
...
defaultNetName := runconfig.DefaultDaemonNetworkMode().NetworkName()
if nConf, ok := container.NetworkSettings.Networks[defaultNetName]; ok {
cleanOperationalData(nConf)
if err := daemon.connectToNetwork(container, defaultNetName, nConf.EndpointSettings, updateSettings); err != nil {
return err
}
}
// the intermediate map is necessary because "connectToNetwork" modifies "container.NetworkSettings.Networks"
networks := make(map[string]*network.EndpointSettings)
for n, epConf := range container.NetworkSettings.Networks {
if n == defaultNetName {
continue
}
networks[n] = epConf
}
for netName, epConf := range networks {
cleanOperationalData(epConf)
if err := daemon.connectToNetwork(container, netName, epConf.EndpointSettings, updateSettings); err != nil {
return err
}
}
// If the container is not to be connected to any network,
// create its network sandbox now if not present
if len(networks) == 0 {
if nil == daemon.getNetworkSandbox(container) {
sbOptions, err := daemon.buildSandboxOptions(container)
if err != nil {
return err
}
sb, err := daemon.netController.NewSandbox(container.ID, sbOptions...)
if err != nil {
return err
}
updateSandboxNetworkSettings(container, sb)
defer func() {
if retErr != nil {
sb.Delete()
}
}()
}
}
...
}
Daemon.allocateNetwork
方法最后更新HostConfig文件。
hostconfig文件位于/var/lib/docker/{ContainerID}/hostconfig.json
, 以json格式存储了容器的host配置。
func (daemon *Daemon) allocateNetwork(container *container.Container) (retErr error) {
...
if _, err := container.WriteHostConfig(); err != nil {
return err
}
...
}
Daemon.initializeNetworking
方法最后创建hostname文件。
除了container类型的网络,其他类型都会在容器“工作目录”例如/var/lib/docker/containers/{ContainerID}
创建hostname文件,并写入容器的hostname。
https://github.com/moby/moby/blob/v20.10.14/daemon/container_operations.go#L991
func (daemon *Daemon) initializeNetworking(container *container.Container) error {
...
return container.BuildHostnameFile()
}
https://github.com/moby/moby/blob/v20.10.14/container/container_unix.go#L54
func (container *Container) BuildHostnameFile() error {
hostnamePath, err := container.GetRootResourcePath("hostname")
if err != nil {
return err
}
container.HostnamePath = hostnamePath
return ioutil.WriteFile(container.HostnamePath, []byte(container.Config.Hostname+"\n"), 0644)
}
4.2.5 创建oci配置
创建符合oci标准的容器配置,后续该配置将被传递给oci。考虑到篇幅问题,该流程源码分析可以参见docker 创建oci运行时配置 源码分析。
https://github.com/moby/moby/blob/master/daemon/start.go#L153-L156
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
spec, err := daemon.createSpec(container)
if err != nil {
return errdefs.System(err)
}
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/oci_linux.go#L1008
func (daemon *Daemon) createSpec(c *container.Container) (retSpec *specs.Spec, err error) {
var (
opts []coci.SpecOpts
s = oci.DefaultSpec()
)
opts = append(opts,
WithCommonOptions(daemon, c),
WithCgroups(daemon, c),
WithResources(c),
WithSysctls(c),
WithDevices(daemon, c),
WithUser(c),
WithRlimits(daemon, c),
WithNamespaces(daemon, c),
WithCapabilities(c),
WithSeccomp(daemon, c),
WithMounts(daemon, c),
WithLibnetwork(daemon, c),
WithApparmor(c),
WithSelinux(c),
WithOOMScore(&c.HostConfig.OomScoreAdj),
)
if c.NoNewPrivileges {
opts = append(opts, coci.WithNoNewPrivileges)
}
// Set the masked and readonly paths with regard to the host config options if they are set.
if c.HostConfig.MaskedPaths != nil {
opts = append(opts, coci.WithMaskedPaths(c.HostConfig.MaskedPaths))
}
if c.HostConfig.ReadonlyPaths != nil {
opts = append(opts, coci.WithReadonlyPaths(c.HostConfig.ReadonlyPaths))
}
if daemon.configStore.Rootless {
opts = append(opts, WithRootless(daemon))
}
return &s, coci.ApplyOpts(context.Background(), nil, &containers.Container{
ID: c.ID,
}, &s, opts...)
}
4.2.6 为容器对象保存apparmor配置
尽管在docker create
和本文4.2.5 创建oci配置
节中,已经处理过apparmor配置,但container.Container.AppArmorProfile
字段可能为空值。这里为了在docker inspect时显示docker-default
,再解析和更新一次apparmor配置。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L163-L165
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
if err := daemon.saveAppArmorConfig(container); err != nil {
return err
}
...
}
4.2.7 checkpoint
获取checkpoint目录。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L167-L172
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
if checkpoint != "" {
checkpointDir, err = getCheckpointDir(checkpointDir, checkpoint, container.Name, container.ID, container.CheckpointDir(), false)
if err != nil {
return err
}
}
...
}
该目录默认形如/var/lib/docker/containers/{ContainerID}/checkpoints/{checkpoint}
。
https://github.com/moby/moby/blob/v20.10.14/daemon/checkpoint.go#L20
func getCheckpointDir(checkDir, checkpointID, ctrName, ctrID, ctrCheckpointDir string, create bool) (string, error) {
var checkpointDir string
var err2 error
if checkDir != "" {
...
} else {
checkpointDir = ctrCheckpointDir
}
checkpointAbsDir := filepath.Join(checkpointDir, checkpointID)
stat, err := os.Stat(checkpointAbsDir)
if create {
...
} else {
switch {
case err != nil:
err2 = fmt.Errorf("checkpoint %s does not exist for container %s", checkpointID, ctrName)
case stat.IsDir():
err2 = nil
default:
err2 = fmt.Errorf("%s exists and is not a directory", checkpointAbsDir)
}
}
return checkpointAbsDir, err2
}
4.2.8 containerd shim路径和选项
获取containerd所需的runtime shim路径和选项。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L174-L177
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
shim, createOptions, err := daemon.getLibcontainerdCreateOptions(container)
if err != nil {
return err
}
...
}
getLibcontainerdCreateOptions()
函数根据HostConfig.Runtime获取对应的runtime, 返回runtime的shim路径和选项。默认情况,HostConfig.Runtime值为"runc"。
https://github.com/moby/moby/blob/v20.10.14/daemon/start_unix.go#L10
func (daemon *Daemon) getLibcontainerdCreateOptions(container *container.Container) (string, interface{}, error) {
// Ensure a runtime has been assigned to this container
if container.HostConfig.Runtime == "" {
container.HostConfig.Runtime = daemon.configStore.GetDefaultRuntimeName()
container.CheckpointTo(daemon.containersReplica)
}
rt, err := daemon.getRuntime(container.HostConfig.Runtime)
if err != nil {
return "", nil, translateContainerdStartErr(container.Path, container.SetExitCode, err)
}
return rt.Shim.Binary, rt.Shim.Opts, nil
}
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L132-L135
func (daemon *Daemon) getRuntime(name string) (*types.Runtime, error) {
rt := daemon.configStore.GetRuntime(name)
if rt == nil {
return nil, errdefs.InvalidParameter(errors.Errorf("runtime not found in config: %s", name))
}
...
}
从conf.Runtimes
中根据名称查询runtime。
https://github.com/moby/moby/blob/v20.10.14/daemon/config/config_common_unix.go#L32
func (conf *Config) GetRuntime(name string) *types.Runtime {
conf.Lock()
defer conf.Unlock()
if rt, ok := conf.Runtimes[name]; ok {
return &rt
}
return nil
}
其中conf.Runtimes
有3个默认配置项,用户也可在dockerd启动参数中额外配置。
在dockerd启动时默认配置了runc
, io.containerd.runtime.v1.linux
, io.containerd.runc.v2
3个runtime。
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L24-L41
const (
defaultRuntimeName = "runc"
...
)
func configureRuntimes(conf *config.Config) {
if conf.DefaultRuntime == "" {
conf.DefaultRuntime = config.StockRuntimeName
}
if conf.Runtimes == nil {
conf.Runtimes = make(map[string]types.Runtime)
}
conf.Runtimes[config.LinuxV1RuntimeName] = types.Runtime{Path: defaultRuntimeName, Shim: defaultV1ShimConfig(conf, defaultRuntimeName)}
conf.Runtimes[config.LinuxV2RuntimeName] = types.Runtime{Path: defaultRuntimeName, Shim: defaultV2ShimConfig(conf, defaultRuntimeName)}
conf.Runtimes[config.StockRuntimeName] = conf.Runtimes[config.LinuxV2RuntimeName]
}
https://github.com/moby/moby/blob/v20.10.14/daemon/config/config.go#L50-L56
const (
...
// StockRuntimeName is the reserved name/alias used to represent the
// OCI runtime being shipped with the docker daemon package.
StockRuntimeName = "runc"
// LinuxV1RuntimeName is the runtime used to specify the containerd v1 shim with the runc binary
// Note this is different than io.containerd.runc.v1 which would be the v1 shim using the v2 shim API.
// This is specifically for the v1 shim using the v1 shim API.
LinuxV1RuntimeName = "io.containerd.runtime.v1.linux"
// LinuxV2RuntimeName is the runtime used to specify the containerd v2 runc shim
LinuxV2RuntimeName = "io.containerd.runc.v2"
)
io.containerd.runtime.v1.linux
, io.containerd.runc.v2
对应的shim结构体实现如下,runc
与io.containerd.runc.v2
相同。其中runc的root目录默认为/var/run/docker/runtime-runc
。
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L55
func defaultV1ShimConfig(conf *config.Config, runtimePath string) *types.ShimConfig {
return &types.ShimConfig{
Binary: linuxShimV1,
Opts: &runctypes.RuncOptions{
Runtime: runtimePath,
RuntimeRoot: filepath.Join(conf.ExecRoot, "runtime-"+defaultRuntimeName),
SystemdCgroup: UsingSystemd(conf),
},
}
}
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L43
func defaultV2ShimConfig(conf *config.Config, runtimePath string) *types.ShimConfig {
return &types.ShimConfig{
Binary: linuxShimV2,
Opts: &v2runcoptions.Options{
BinaryName: runtimePath,
Root: filepath.Join(conf.ExecRoot, "runtime-"+defaultRuntimeName),
SystemdCgroup: UsingSystemd(conf),
NoPivotRoot: os.Getenv("DOCKER_RAMDISK") != "",
},
}
}
containerd API 只会直接调用oci runtime文件,对于指定了额外参数的情况(可能会出现在用户自定义runtime的场景中),docker封装了一层脚本,再将脚本传递给containerd API。
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L137-L144
func (daemon *Daemon) getRuntime(name string) (*types.Runtime, error) {
...
if len(rt.Args) > 0 {
p, err := daemon.rewriteRuntimePath(name, rt.Path, rt.Args)
if err != nil {
return nil, err
}
rt.Path = p
rt.Args = nil
}
...
}
Daemon.rewriteRuntimePath()
方法中将含有参数的runtime路径替换成了/var/lib/docker/runtimes/{runtime_name}
路径。
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L118
func (daemon *Daemon) rewriteRuntimePath(name, p string, args []string) (string, error) {
if len(args) == 0 {
return p, nil
}
// Check that the runtime path actually exists here so that we can return a well known error.
if _, err := exec.LookPath(p); err != nil {
return "", errors.Wrap(err, "error while looking up the specified runtime path")
}
return filepath.Join(daemon.configStore.Root, "runtimes", name), nil
}
该路径在dockerd启动时已被写入为sh脚本。
https://github.com/moby/moby/blob/v20.10.14/daemon/runtime_unix.go#L100-L111
func (daemon *Daemon) initRuntimes(runtimes map[string]types.Runtime) (err error) {
...
for name, rt := range runtimes {
if len(rt.Args) > 0 {
script := filepath.Join(tmpDir, name)
content := fmt.Sprintf("#!/bin/sh\n%s %s $@\n", rt.Path, strings.Join(rt.Args, " "))
if err := ioutil.WriteFile(script, []byte(content), 0700); err != nil {
return err
}
}
if rt.Shim == nil {
rt.Shim = defaultV2ShimConfig(daemon.configStore, rt.Path)
}
}
...
}
4.2.9 调用containerd api
完成准备工作后,docker将准备好的参数传递至containerd API。
首先调用containerd.Create()
,如果是因为ErrConflict
错误,则分别调用containerd.DeleteTask()
,containerd.Delete()
, 然后再尝试一次containerd.Create()
。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L179-L195
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
ctx := context.TODO()
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
if err != nil {
if errdefs.IsConflict(err) {
logrus.WithError(err).WithField("container", container.ID).Error("Container not cleaned up from containerd from previous run")
// best effort to clean up old container object
daemon.containerd.DeleteTask(ctx, container.ID)
if err := daemon.containerd.Delete(ctx, container.ID); err != nil && !errdefs.IsNotFound(err) {
logrus.WithError(err).WithField("container", container.ID).Error("Error cleaning up stale containerd container object")
}
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
}
if err != nil {
return translateContainerdStartErr(container.Path, container.SetExitCode, err)
}
}
...
}
然后调用containerd.Start()
, 如果返回错误,则调用containerd.Delete()
。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L198-L207
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
// TODO(mlaventure): we need to specify checkpoint options here
pid, err := daemon.containerd.Start(context.Background(), container.ID, checkpointDir,
container.StreamConfig.Stdin() != nil || container.Config.Tty,
container.InitializeStdio)
if err != nil {
if err := daemon.containerd.Delete(context.Background(), container.ID); err != nil {
logrus.WithError(err).WithField("container", container.ID).
Error("failed to delete failed start container")
}
return translateContainerdStartErr(container.Path, container.SetExitCode, err)
}
...
}
4.2.10 更新容器状态信息
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L209-L211
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
container.SetRunning(pid, true)
container.HasBeenStartedBefore = true
daemon.setStateCounter(container)
...
}
4.2.11 HealthCheck
根据容器或镜像中配置的HealthCheck,启动HealthMonitor。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L213
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
daemon.initHealthMonitor(container)
...
}
https://github.com/moby/moby/blob/v20.10.14/daemon/health.go#L297
func (daemon *Daemon) initHealthMonitor(c *container.Container) {
...
daemon.updateHealthMonitor(c)
}
https://github.com/moby/moby/blob/v20.10.14/daemon/health.go#L276
func (daemon *Daemon) updateHealthMonitor(c *container.Container) {
...
go monitor(daemon, c, stop, probe)
...
}
定时执行probe.run()
方法。
https://github.com/moby/moby/blob/v20.10.14/daemon/health.go#L207
func monitor(d *Daemon, c *container.Container, stop chan struct{}, probe probe) {
...
for {
...
select {
...
case <-intervalTimer.C:
...
go func() {
...
result, err := probe.run(ctx, d, c)
...
}()
...
}
}
}
封装、注册exec命令。
https://github.com/moby/moby/blob/v20.10.14/daemon/health.go#L65-L93
func (p *cmdProbe) run(ctx context.Context, d *Daemon, cntr *container.Container) (*types.HealthcheckResult, error) {
cmdSlice := strslice.StrSlice(cntr.Config.Healthcheck.Test)[1:]
if p.shell {
cmdSlice = append(getShell(cntr), cmdSlice...)
}
entrypoint, args := d.getEntrypointAndArgs(strslice.StrSlice{}, cmdSlice)
execConfig := exec.NewConfig()
execConfig.OpenStdin = false
execConfig.OpenStdout = true
execConfig.OpenStderr = true
execConfig.ContainerID = cntr.ID
execConfig.DetachKeys = []byte{}
execConfig.Entrypoint = entrypoint
execConfig.Args = args
execConfig.Tty = false
execConfig.Privileged = false
execConfig.User = cntr.Config.User
execConfig.WorkingDir = cntr.Config.WorkingDir
linkedEnv, err := d.setupLinkedContainers(cntr)
if err != nil {
return nil, err
}
execConfig.Env = container.ReplaceOrAppendEnvValues(cntr.CreateDaemonEnvironment(execConfig.Tty, linkedEnv), execConfig.Env)
d.registerExecCommand(cntr, execConfig)
attributes := map[string]string{
"execID": execConfig.ID,
}
...
}
调用ContainerExecStart
方法,返回执行结果。
https://github.com/moby/moby/blob/v20.10.14/daemon/health.go#L95-L113
func (p *cmdProbe) run(ctx context.Context, d *Daemon, cntr *container.Container) (*types.HealthcheckResult, error) {
...
output := &limitedBuffer{}
err = d.ContainerExecStart(ctx, execConfig.ID, nil, output, output)
if err != nil {
return nil, err
}
info, err := d.getExecConfig(execConfig.ID)
if err != nil {
return nil, err
}
if info.ExitCode == nil {
return nil, fmt.Errorf("healthcheck for container %s has no exit code", cntr.ID)
}
// Note: Go's json package will handle invalid UTF-8 for us
out := output.String()
return &types.HealthcheckResult{
End: time.Now(),
ExitCode: *info.ExitCode,
Output: out,
}, nil
}
Daemon.ContainerExecStart()
方法在docker exec流程中也会被调用, 本文仅分析本流程涉及的部分代码。
设置exec进程的参数。
https://github.com/moby/moby/blob/v20.10.14/daemon/exec.go#L154-L240
func (daemon *Daemon) ContainerExecStart(ctx context.Context, name string, stdin io.Reader, stdout io.Writer, stderr io.Writer) (err error) {
...
ec, err := daemon.getExecConfig(name)
...
p := &specs.Process{}
if runtime.GOOS != "windows" {
ctr, err := daemon.containerdCli.LoadContainer(ctx, ec.ContainerID)
if err != nil {
return err
}
spec, err := ctr.Spec(ctx)
if err != nil {
return err
}
p = spec.Process
}
p.Args = append([]string{ec.Entrypoint}, ec.Args...)
...
if err := daemon.execSetPlatformOpt(c, ec, p); err != nil {
return err
}
...
}
设置进程的安全选项。
https://github.com/moby/moby/blob/v20.10.14/daemon/exec_linux.go#L13
func (daemon *Daemon) execSetPlatformOpt(c *container.Container, ec *exec.Config, p *specs.Process) error {
if len(ec.User) > 0 {
var err error
p.User, err = getUser(c, ec.User)
if err != nil {
return err
}
}
if ec.Privileged {
p.Capabilities = &specs.LinuxCapabilities{
Bounding: caps.GetAllCapabilities(),
Permitted: caps.GetAllCapabilities(),
Effective: caps.GetAllCapabilities(),
}
}
if apparmor.IsEnabled() {
var appArmorProfile string
if c.AppArmorProfile != "" {
appArmorProfile = c.AppArmorProfile
} else if c.HostConfig.Privileged {
...
appArmorProfile = unconfinedAppArmorProfile
} else {
appArmorProfile = defaultAppArmorProfile
}
if appArmorProfile == defaultAppArmorProfile {
if err := ensureDefaultAppArmorProfile(); err != nil {
return err
}
}
p.ApparmorProfile = appArmorProfile
}
s := &specs.Spec{Process: p}
return WithRlimits(daemon, c)(context.Background(), nil, nil, s)
}
调用containerd.Exec()
方法, 交由containerd处理exec进程。如果超时,调用daemon.containerd.SignalProcess()
方法kill进程。
https://github.com/moby/moby/blob/v20.10.14/daemon/exec.go#L263-L303
func (daemon *Daemon) ContainerExecStart(ctx context.Context, name string, stdin io.Reader, stdout io.Writer, stderr io.Writer) (err error) {
...
systemPid, err := daemon.containerd.Exec(ctx, c.ID, ec.ID, p, cStdin != nil, ec.InitializeStdio)
...
select {
case <-ctx.Done():
...
daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["TERM"]))
timeout := time.NewTimer(termProcessTimeout)
...
select {
case <-timeout.C:
...
daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["KILL"]))
...
}
...
}
return nil
}
4.2.12 保存容器配置
将容器的当前配置和状态保存在文件和数据库中。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L215
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
if err := container.CheckpointTo(daemon.containersReplica); err != nil {
logrus.WithError(err).WithField("container", container.ID).
Errorf("failed to store container")
}
...
return nil
}
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L197
func (container *Container) CheckpointTo(store ViewDB) error {
deepCopy, err := container.toDisk()
if err != nil {
return err
}
return store.Save(deepCopy)
}
将容器状态和配置保存在/var/lib/docker/containers/{CONTAINER_ID}/config.v2.json
。
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L162
func (container *Container) toDisk() (*Container, error) {
var (
buf bytes.Buffer
deepCopy Container
)
pth, err := container.ConfigPath()
if err != nil {
return nil, err
}
// Save container settings
f, err := ioutils.NewAtomicFileWriter(pth, 0600)
if err != nil {
return nil, err
}
defer f.Close()
w := io.MultiWriter(&buf, f)
if err := json.NewEncoder(w).Encode(container); err != nil {
return nil, err
}
if err := json.NewDecoder(&buf).Decode(&deepCopy); err != nil {
return nil, err
}
deepCopy.HostConfig, err = container.WriteHostConfig()
if err != nil {
return nil, err
}
return &deepCopy, nil
}
将HostConfig另外保存在/var/lib/docker/containers/{CONTAINER_ID}/hostconfig.json
。
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L236
func (container *Container) WriteHostConfig() (*containertypes.HostConfig, error) {
var (
buf bytes.Buffer
deepCopy containertypes.HostConfig
)
pth, err := container.HostConfigPath()
if err != nil {
return nil, err
}
f, err := ioutils.NewAtomicFileWriter(pth, 0644)
if err != nil {
return nil, err
}
defer f.Close()
w := io.MultiWriter(&buf, f)
if err := json.NewEncoder(w).Encode(&container.HostConfig); err != nil {
return nil, err
}
if err := json.NewDecoder(&buf).Decode(&deepCopy); err != nil {
return nil, err
}
return &deepCopy, nil
}
将容器状态和配置保存到内存数据库中。
https://github.com/moby/moby/blob/v20.10.14/container/view.go#L154
func (db *memDB) Save(c *Container) error {
return db.withTxn(func(txn *memdb.Txn) error {
return txn.Insert(memdbContainersTable, c)
})
}
5. containerd
本文中,docker通过以下函数作为客户端调用了containerd api。下文将分析这些函数,直到跟踪到调用containerd api的具体接口地址。
5.1 daemon.containerd.Create
docker在容器启动时,调用daemon.containerd.Create
方法。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L181
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
...
}
调用containerd api前,添加一些适配选项。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L131-L139
func (c *client) Create(ctx context.Context, id string, ociSpec *specs.Spec, shim string, runtimeOptions interface{}, opts ...containerd.NewContainerOpts) error {
bdir := c.bundleDir(id)
...
newOpts := []containerd.NewContainerOpts{
containerd.WithSpec(ociSpec),
containerd.WithRuntime(shim, runtimeOptions),
WithBundle(bdir, ociSpec),
}
opts = append(opts, newOpts...)
...
}
将oci配置序列化。
func WithSpec(s *oci.Spec, opts ...oci.SpecOpts) NewContainerOpts {
return func(ctx context.Context, client *Client, c *containers.Container) error {
...
c.Spec, err = typeurl.MarshalAny(s)
return err
}
}
将runtime选项序列化。
func WithRuntime(name string, options interface{}) NewContainerOpts {
return func(ctx context.Context, client *Client, c *containers.Container) error {
...
if options != nil {
any, err = typeurl.MarshalAny(options)
if err != nil {
return err
}
}
c.Runtime = containers.RuntimeInfo{
Name: name,
Options: any,
}
return nil
}
}
创建bundle目录,默认路径为/var/run/docker/containerd/{CONTAINER_ID}
。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client_linux.go#L61
func WithBundle(bundleDir string, ociSpec *specs.Spec) containerd.NewContainerOpts {
return func(ctx context.Context, client *containerd.Client, c *containers.Container) error {
if c.Labels == nil {
c.Labels = make(map[string]string)
}
uid, gid := getSpecUser(ociSpec)
if uid == 0 && gid == 0 {
c.Labels[DockerContainerBundlePath] = bundleDir
return idtools.MkdirAllAndChownNew(bundleDir, 0755, idtools.Identity{UID: 0, GID: 0})
}
p := string(filepath.Separator)
components := strings.Split(bundleDir, string(filepath.Separator))
for _, d := range components[1:] {
p = filepath.Join(p, d)
fi, err := os.Stat(p)
if err != nil && !os.IsNotExist(err) {
return err
}
if os.IsNotExist(err) || fi.Mode()&1 == 0 {
p = fmt.Sprintf("%s.%d.%d", p, uid, gid)
if err := idtools.MkdirAndChown(p, 0700, idtools.Identity{UID: uid, GID: gid}); err != nil && !os.IsExist(err) {
return err
}
}
}
if c.Labels == nil {
c.Labels = make(map[string]string)
}
c.Labels[DockerContainerBundlePath] = p
return nil
}
}
调用containerd提供的客户端代码,创建containerd语境下的容器。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L141
func (c *client) Create(ctx context.Context, id string, ociSpec *specs.Spec, shim string, runtimeOptions interface{}, opts ...containerd.NewContainerOpts) error {
...
_, err := c.client.NewContainer(ctx, id, opts...)
...
}
应用docker创建的适配选项,调用containerd API创建容器。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/client.go#L263
func (c *Client) NewContainer(ctx context.Context, id string, opts ...NewContainerOpts) (Container, error) {
...
container := containers.Container{
ID: id,
Runtime: containers.RuntimeInfo{
Name: c.runtime,
},
}
for _, o := range opts {
if err := o(ctx, c, &container); err != nil {
return nil, err
}
}
r, err := c.ContainerService().Create(ctx, container)
if err != nil {
return nil, err
}
return containerFromRecord(c, r), nil
}
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/client.go#L577
func (c *Client) ContainerService() containers.Store {
...
return NewRemoteContainerStore(containersapi.NewContainersClient(c.conn))
}
func (r *remoteContainers) Create(ctx context.Context, container containers.Container) (containers.Container, error) {
created, err := r.client.Create(ctx, &containersapi.CreateContainerRequest{
Container: containerToProto(&container),
})
...
}
调用containerd API,API接口为/containerd.services.containers.v1.Containers/Create
,API具体实现本文不再跟踪,将在分析containerd源码时详细介绍。
func (c *containersClient) Create(ctx context.Context, in *CreateContainerRequest, opts ...grpc.CallOption) (*CreateContainerResponse, error) {
out := new(CreateContainerResponse)
err := c.cc.Invoke(ctx, "/containerd.services.containers.v1.Containers/Create", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
5.2 daemon.containerd.DeleteTask
在调用daemon.containerd.Create()
时,可能因为上一次containerd退出时未清理容器,导致名称冲突错误,则分别调用daemon.containerd.DeleteTask()
,daemon.containerd.Delete()
, 然后再尝试一次daemon.containerd.Create()
。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L179-L195
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
if err != nil {
if errdefs.IsConflict(err) {
logrus.WithError(err).WithField("container", container.ID).Error("Container not cleaned up from containerd from previous run")
// best effort to clean up old container object
daemon.containerd.DeleteTask(ctx, container.ID)
if err := daemon.containerd.Delete(ctx, container.ID); err != nil && !errdefs.IsNotFound(err) {
logrus.WithError(err).WithField("container", container.ID).Error("Error cleaning up stale containerd container object")
}
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
}
if err != nil {
return translateContainerdStartErr(container.Path, container.SetExitCode, err)
}
}
...
}
删除容器init进程。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L459
func (c *client) DeleteTask(ctx context.Context, containerID string) (uint32, time.Time, error) {
p, err := c.getProcess(ctx, containerID, libcontainerdtypes.InitProcessName)
...
status, err := p.Delete(ctx)
...
}
根据容器ID查询容器,返回容器当前进程。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L604
func (c *client) getProcess(ctx context.Context, containerID, processID string) (containerd.Process, error) {
ctr, err := c.getContainer(ctx, containerID)
if err != nil {
return nil, err
}
t, err := ctr.Task(ctx, nil)
if err != nil {
...
}
if processID == libcontainerdtypes.InitProcessName {
return t, nil
}
...
}
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L593
func (c *client) getContainer(ctx context.Context, id string) (containerd.Container, error) {
ctr, err := c.client.LoadContainer(ctx, id)
...
return ctr, nil
}
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/client.go#L289
func (c *Client) LoadContainer(ctx context.Context, id string) (Container, error) {
r, err := c.ContainerService().Get(ctx, id)
...
return containerFromRecord(c, r), nil
}
func (r *remoteContainers) Get(ctx context.Context, id string) (containers.Container, error) {
resp, err := r.client.Get(ctx, &containersapi.GetContainerRequest{
ID: id,
})
...
return containerFromProto(&resp.Container), nil
}
调用containerd API查询容器,API接口的地址为/containerd.services.containers.v1.Containers/Get
。
func (c *containersClient) Get(ctx context.Context, in *GetContainerRequest, opts ...grpc.CallOption) (*GetContainerResponse, error) {
out := new(GetContainerResponse)
err := c.cc.Invoke(ctx, "/containerd.services.containers.v1.Containers/Get", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
查询任务状态,要求任务状态必须是停止的。然后调用containerd API删除进程。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/task.go#L296
func (t *task) Delete(ctx context.Context, opts ...ProcessDeleteOpts) (*ExitStatus, error) {
...
status, err := t.Status(ctx)
...
switch status.Status {
case Stopped, Unknown, "":
case Created:
...
fallthrough
default:
return nil, errors.Wrapf(errdefs.ErrFailedPrecondition, "task must be stopped before deletion: %s", status.Status)
}
...
r, err := t.client.TaskService().Delete(ctx, &tasks.DeleteTaskRequest{
ContainerID: t.id,
})
...
}
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/task.go#L258
func (t *task) Status(ctx context.Context) (Status, error) {
r, err := t.client.TaskService().Get(ctx, &tasks.GetRequest{
ContainerID: t.id,
})
...
}
查询任务状态的API接口地址为/containerd.services.tasks.v1.Tasks/Get
。
func (c *tasksClient) Get(ctx context.Context, in *GetRequest, opts ...grpc.CallOption) (*GetResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Get", in, out, opts...)
...
}
删除任务的API接口地址为/containerd.services.tasks.v1.Tasks/Delete
。
func (c *tasksClient) Delete(ctx context.Context, in *DeleteTaskRequest, opts ...grpc.CallOption) (*DeleteResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Delete", in, out, opts...)
...
}
5.3 daemon.containerd.Delete
在调用daemon.containerd.Create()
时,可能因为上一次containerd退出时未清理容器,导致名称冲突错误,则分别调用daemon.containerd.DeleteTask()
,daemon.containerd.Delete()
, 然后再尝试一次daemon.containerd.Create()
。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L179-L195
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
if err != nil {
if errdefs.IsConflict(err) {
logrus.WithError(err).WithField("container", container.ID).Error("Container not cleaned up from containerd from previous run")
// best effort to clean up old container object
daemon.containerd.DeleteTask(ctx, container.ID)
if err := daemon.containerd.Delete(ctx, container.ID); err != nil && !errdefs.IsNotFound(err) {
logrus.WithError(err).WithField("container", container.ID).Error("Error cleaning up stale containerd container object")
}
err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
}
if err != nil {
return translateContainerdStartErr(container.Path, container.SetExitCode, err)
}
}
...
}
也可能在调用daemon.containerd.Start
启动容器失败后,调用daemon.containerd.Delete
清理容器。
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L202
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
pid, err := daemon.containerd.Start(context.Background(), container.ID, checkpointDir,
container.StreamConfig.Stdin() != nil || container.Config.Tty,
container.InitializeStdio)
if err != nil {
if err := daemon.containerd.Delete(context.Background(), container.ID); err != nil {
...
}
...
}
...
}
根据容器ID查询容器,调用容器的Delete方法删除容器、删除bundle目录。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L472
func (c *client) Delete(ctx context.Context, containerID string) error {
ctr, err := c.getContainer(ctx, containerID)
if err != nil {
return err
}
labels, err := ctr.Labels(ctx)
if err != nil {
return err
}
bundle := labels[DockerContainerBundlePath]
if err := ctr.Delete(ctx); err != nil {
return wrapError(err)
}
c.oomMu.Lock()
delete(c.oom, containerID)
c.oomMu.Unlock()
c.v2runcoptionsMu.Lock()
delete(c.v2runcoptions, containerID)
c.v2runcoptionsMu.Unlock()
if os.Getenv("LIBCONTAINERD_NOCLEAN") != "1" {
if err := os.RemoveAll(bundle); err != nil {
c.logger.WithError(err).WithFields(logrus.Fields{
"container": containerID,
"bundle": bundle,
}).Error("failed to remove state dir")
}
}
return nil
}
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L482
func (c *container) Delete(ctx context.Context, opts ...DeleteOpts) error {
if _, err := c.loadTask(ctx, nil); err == nil {
return errors.Wrapf(errdefs.ErrFailedPrecondition, "cannot delete running task %v", c.id)
}
...
return c.client.ContainerService().Delete(ctx, c.id)
}
func (r *remoteContainers) Delete(ctx context.Context, id string) error {
_, err := r.client.Delete(ctx, &containersapi.DeleteContainerRequest{
ID: id,
})
...
}
删除容器的API接口地址为/containerd.services.containers.v1.Containers/Delete
。
func (c *containersClient) Delete(ctx context.Context, in *DeleteContainerRequest, opts ...grpc.CallOption) (*types.Empty, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.containers.v1.Containers/Delete", in, out, opts...)
...
}
5.4 daemon.containerd.Start
https://github.com/moby/moby/blob/v20.10.14/daemon/start.go#L198
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
...
pid, err := daemon.containerd.Start(context.Background(), container.ID, checkpointDir,
container.StreamConfig.Stdin() != nil || container.Config.Tty,
container.InitializeStdio)
...
}
5.4.1 checkpoint
如果指定了checkpoint,则将checkpoint目录下的文件打包。待容器启动后,再删除打包好的内容。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L169-L191
func (c *client) Start(ctx context.Context, id, checkpointDir string, withStdin bool, attachStdio libcontainerdtypes.StdioCallback) (int, error) {
...
if checkpointDir != "" {
// write checkpoint to the content store
tar := archive.Diff(ctx, "", checkpointDir)
cp, err = c.writeContent(ctx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)
// remove the checkpoint when we're done
defer func() {
if cp != nil {
err := c.client.ContentStore().Delete(context.Background(), cp.Digest)
...
}
}()
...
}
...
taskOpts := []containerd.NewTaskOpts{
func(_ context.Context, _ *containerd.Client, info *containerd.TaskInfo) error {
info.Checkpoint = cp
return nil
},
}
...
}
Diff()
函数返回两个目录之间差别的tar包, 这里a目录实际为空,表示直接打包b目录的文件。
func Diff(ctx context.Context, a, b string) io.ReadCloser {
r, w := io.Pipe()
go func() {
err := WriteDiff(ctx, w, a, b)
...
}()
return r
}
func WriteDiff(ctx context.Context, w io.Writer, a, b string) error {
cw := newChangeWriter(w, b)
err := fs.Changes(ctx, a, b, cw.HandleChange)
...
}
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/continuity/fs/diff.go#L104
func Changes(ctx context.Context, a, b string, changeFn ChangeFunc) error {
if a == "" {
logrus.Debugf("Using single walk diff for %s", b)
return addDirChanges(ctx, changeFn, b)
}
...
}
遍历目录下的文件,执行changeFn
,将文件添加到tar包中。这里changeFn
就是changeWriter.HandleChange
方法。关于changeWriter.HandleChange
方法的具体实现,我们将在分析containerd/archive
时具体分析。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/continuity/fs/diff.go#L114
func addDirChanges(ctx context.Context, changeFn ChangeFunc, root string) error {
return filepath.Walk(root, func(path string, f os.FileInfo, err error) error {
if err != nil {
return err
}
// Rebase path
path, err = filepath.Rel(root, path)
if err != nil {
return err
}
path = filepath.Join(string(os.PathSeparator), path)
// Skip root
if path == string(os.PathSeparator) {
return nil
}
return changeFn(ChangeKindAdd, path, f, nil)
})
}
将tar包写入ContentStore。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L892
func (c *client) writeContent(ctx context.Context, mediaType, ref string, r io.Reader) (*types.Descriptor, error) {
writer, err := c.client.ContentStore().Writer(ctx, content.WithRef(ref))
if err != nil {
return nil, err
}
defer writer.Close()
size, err := io.Copy(writer, r)
if err != nil {
return nil, err
}
labels := map[string]string{
"containerd.io/gc.root": time.Now().UTC().Format(time.RFC3339),
}
if err := writer.Commit(ctx, 0, "", content.WithLabels(labels)); err != nil {
return nil, err
}
return &types.Descriptor{
MediaType: mediaType,
Digest: writer.Digest(),
Size_: size,
}, nil
}
5.4.2 创建任务选项
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L193-L231
func (c *client) Start(ctx context.Context, id, checkpointDir string, withStdin bool, attachStdio libcontainerdtypes.StdioCallback) (int, error) {
ctr, err := c.getContainer(ctx, id)
...
spec, err := ctr.Spec(ctx)
...
uid, gid := getSpecUser(spec)
taskOpts := []containerd.NewTaskOpts{
func(_ context.Context, _ *containerd.Client, info *containerd.TaskInfo) error {
info.Checkpoint = cp
return nil
},
}
if runtime.GOOS != "windows" {
taskOpts = append(taskOpts, func(_ context.Context, _ *containerd.Client, info *containerd.TaskInfo) error {
c.v2runcoptionsMu.Lock()
opts, ok := c.v2runcoptions[id]
c.v2runcoptionsMu.Unlock()
if ok {
opts.IoUid = uint32(uid)
opts.IoGid = uint32(gid)
info.Options = &opts
} else {
info.Options = &runctypes.CreateOptions{
IoUid: uint32(uid),
IoGid: uint32(gid),
NoPivotRoot: os.Getenv("DOCKER_RAMDISK") != "",
}
}
return nil
})
} else {
...
}
...
}
5.4.3 创建任务
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L233-L241
func (c *client) Start(ctx context.Context, id, checkpointDir string, withStdin bool, attachStdio libcontainerdtypes.StdioCallback) (int, error) {
...
labels, err := ctr.Labels(ctx)
...
bundle := labels[DockerContainerBundlePath]
...
t, err = ctr.NewTask(ctx,
func(id string) (cio.IO, error) {
fifos := newFIFOSet(bundle, libcontainerdtypes.InitProcessName, withStdin, spec.Process.Terminal)
rio, err = c.createIO(fifos, id, libcontainerdtypes.InitProcessName, stdinCloseSync, attachStdio)
return rio, err
},
taskOpts...,
)
...
}
创建io,供将要创建的任务使用。
func (c *container) NewTask(ctx context.Context, ioCreate cio.Creator, opts ...NewTaskOpts) (_ Task, err error) {
i, err := ioCreate(c.id)
...
cfg := i.Config()
request := &tasks.CreateTaskRequest{
ContainerID: c.id,
Terminal: cfg.Terminal,
Stdin: cfg.Stdin,
Stdout: cfg.Stdout,
Stderr: cfg.Stderr,
}
...
}
设置io文件路径,路径分别为
/var/run/docker/containerd/{CONTAINER_ID}/init-stdout
/var/run/docker/containerd/{CONTAINER_ID}/init-stdin
/var/run/docker/containerd/{CONTAINER_ID}/init-stderr
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client_linux.go#L99
func newFIFOSet(bundleDir, processID string, withStdin, withTerminal bool) *cio.FIFOSet {
config := cio.Config{
Terminal: withTerminal,
Stdout: filepath.Join(bundleDir, processID+"-stdout"),
}
paths := []string{config.Stdout}
if withStdin {
config.Stdin = filepath.Join(bundleDir, processID+"-stdin")
paths = append(paths, config.Stdin)
}
if !withTerminal {
config.Stderr = filepath.Join(bundleDir, processID+"-stderr")
paths = append(paths, config.Stderr)
}
closer := func() error {
for _, path := range paths {
if err := os.RemoveAll(path); err != nil {
logrus.Warnf("libcontainerd: failed to remove fifo %v: %v", path, err)
}
}
return nil
}
return cio.NewFIFOSet(config, closer)
}
创建fifo,并将输出attach到stdout, stderr。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L631
func (c *client) createIO(fifos *cio.FIFOSet, containerID, processID string, stdinCloseSync chan struct{}, attachStdio libcontainerdtypes.StdioCallback) (cio.IO, error) {
...
io, err = c.newDirectIO(context.Background(), fifos)
...
if io.Stdin != nil {
...
pipe := io.Stdin
io.Stdin = ioutils.NewWriteCloserWrapper(pipe, func() error {
stdinOnce.Do(func() {
err = pipe.Close()
...
go func() {
<-stdinCloseSync
p, err := c.getProcess(context.Background(), containerID, processID)
if err == nil {
err = p.CloseIO(context.Background(), containerd.WithStdinCloser)
...
}
}()
})
return err
})
}
rio, err := attachStdio(io)
...
return rio, err
}
准备调用containerd api的参数。
func (c *container) NewTask(ctx context.Context, ioCreate cio.Creator, opts ...NewTaskOpts) (_ Task, err error) {
...
request := &tasks.CreateTaskRequest{
ContainerID: c.id,
Terminal: cfg.Terminal,
Stdin: cfg.Stdin,
Stdout: cfg.Stdout,
Stderr: cfg.Stderr,
}
r, err := c.get(ctx)
if err != nil {
return nil, err
}
if r.SnapshotKey != "" {
...
}
info := TaskInfo{
runtime: r.Runtime.Name,
}
for _, o := range opts {
if err := o(ctx, c.client, &info); err != nil {
return nil, err
}
}
if info.RootFS != nil {
...
}
if info.Options != nil {
any, err := typeurl.MarshalAny(info.Options)
if err != nil {
return nil, err
}
request.Options = any
}
...
if info.Checkpoint != nil {
request.Checkpoint = info.Checkpoint
}
...
}
创建任务
func (c *container) NewTask(ctx context.Context, ioCreate cio.Creator, opts ...NewTaskOpts) (_ Task, err error) {
...
t := &task{
client: c.client,
io: i,
id: c.id,
c: c,
}
response, err := c.client.TaskService().Create(ctx, request)
...
t.pid = response.Pid
return t, nil
}
调用containerd api,接口地址为/containerd.services.tasks.v1.Tasks/Create
。
func (c *tasksClient) Create(ctx context.Context, in *CreateTaskRequest, opts ...grpc.CallOption) (*CreateTaskResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Create", in, out, opts...)
...
}
5.4.4 启动任务
启动任务,如果失败则删除任务。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L254-L262
func (c *client) Start(ctx context.Context, id, checkpointDir string, withStdin bool, attachStdio libcontainerdtypes.StdioCallback) (int, error) {
...
if err := t.Start(ctx); err != nil {
if _, err := t.Delete(ctx); err != nil {
c.logger.WithError(err).WithField("container", id).
Error("failed to delete task after fail start")
}
return -1, wrapError(err)
}
return int(t.Pid()), nil
}
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/task.go#L209
func (t *task) Start(ctx context.Context) error {
r, err := t.client.TaskService().Start(ctx, &tasks.StartRequest{
ContainerID: t.id,
})
...
t.pid = r.Pid
return nil
}
启动任务的API接口地址为/containerd.services.tasks.v1.Tasks/Start
。
func (c *tasksClient) Start(ctx context.Context, in *StartRequest, opts ...grpc.CallOption) (*StartResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Start", in, out, opts...)
...
}
5.5 daemon.containerdCli.LoadContainer
在多处被使用,用于查询containerd容器对象。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/client.go#L289
func (c *Client) LoadContainer(ctx context.Context, id string) (Container, error) {
r, err := c.ContainerService().Get(ctx, id)
...
}
func (r *remoteContainers) Get(ctx context.Context, id string) (containers.Container, error) {
resp, err := r.client.Get(ctx, &containersapi.GetContainerRequest{
ID: id,
})
...
}
对应的containerd api接口地址为/containerd.services.containers.v1.Containers/Get
。
func (c *containersClient) Get(ctx context.Context, in *GetContainerRequest, opts ...grpc.CallOption) (*GetContainerResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.containers.v1.Containers/Get", in, out, opts...)
...
}
5.6 daemon.containerd.Exec
在Daemon.ContainerExecStart()
方法中调用containerd.Exec()
方法, 交由containerd处理exec进程。
https://github.com/moby/moby/blob/v20.10.14/daemon/exec.go#L263-L303
func (daemon *Daemon) ContainerExecStart(ctx context.Context, name string, stdin io.Reader, stdout io.Writer, stderr io.Writer) (err error) {
...
systemPid, err := daemon.containerd.Exec(ctx, c.ID, ec.ID, p, cStdin != nil, ec.InitializeStdio)
...
}
containerd客户端调用task.Exec()
来在shim端注册exec配置。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L273-L322
func (c *client) Exec(ctx context.Context, containerID, processID string, spec *specs.Process, withStdin bool, attachStdio libcontainerdtypes.StdioCallback) (int, error) {
ctr, err := c.getContainer(ctx, containerID)
...
t, err := ctr.Task(ctx, nil)
if err != nil {
...
}
...
labels, err := ctr.Labels(ctx)
...
fifos := newFIFOSet(labels[DockerContainerBundlePath], processID, withStdin, spec.Terminal)
...
p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
return rio, err
})
...
}
创建exec任务。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/task.go#L334
func (t *task) Exec(ctx context.Context, id string, spec *specs.Process, ioCreate cio.Creator) (_ Process, err error) {
...
i, err := ioCreate(id)
if err != nil {
return nil, err
}
defer func() {
if err != nil && i != nil {
i.Cancel()
i.Close()
}
}()
any, err := typeurl.MarshalAny(spec)
if err != nil {
return nil, err
}
cfg := i.Config()
request := &tasks.ExecProcessRequest{
ContainerID: t.id,
ExecID: id,
Terminal: cfg.Terminal,
Stdin: cfg.Stdin,
Stdout: cfg.Stdout,
Stderr: cfg.Stderr,
Spec: any,
}
if _, err := t.client.TaskService().Exec(ctx, request); err != nil {
i.Cancel()
i.Wait()
i.Close()
return nil, errdefs.FromGRPC(err)
}
return &process{
id: id,
task: t,
io: i,
}, nil
}
对应的API接口地址为/containerd.services.tasks.v1.Tasks/Exec
func (c *tasksClient) Exec(ctx context.Context, in *ExecProcessRequest, opts ...grpc.CallOption) (*types1.Empty, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Exec", in, out, opts...)
...
}
调用process.Start
方法启动exec进程, 报错则删除进程。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L307
func (c *client) Exec(ctx context.Context, containerID, processID string, spec *specs.Process, withStdin bool, attachStdio libcontainerdtypes.StdioCallback) (int, error) {
...
if err = p.Start(ctx); err != nil {
...
p.Delete(ctx)
...
}
return int(p.Pid()), nil
}
process.Start
方法调用containerd api启动exec进程。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/process.go#L117
func (p *process) Start(ctx context.Context) error {
r, err := p.task.client.TaskService().Start(ctx, &tasks.StartRequest{
ContainerID: p.task.id,
ExecID: p.id,
})
...
p.pid = r.Pid
return nil
}
对应的api接口地址为/containerd.services.tasks.v1.Tasks/Start
。
func (c *tasksClient) Start(ctx context.Context, in *StartRequest, opts ...grpc.CallOption) (*StartResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Start", in, out, opts...)
...
}
调用containerd api 删除进程。
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/process.go#L201
func (p *process) Delete(ctx context.Context, opts ...ProcessDeleteOpts) (*ExitStatus, error) {
...
status, err := p.Status(ctx)
...
switch status.Status {
case Running, Paused, Pausing:
return nil, errors.Wrapf(errdefs.ErrFailedPrecondition, "process must be stopped before deletion")
}
r, err := p.task.client.TaskService().DeleteProcess(ctx, &tasks.DeleteProcessRequest{
ContainerID: p.task.id,
ExecID: p.id,
})
...
}
删除进程的api接口地址为/containerd.services.tasks.v1.Tasks/DeleteProcess
。
func (c *tasksClient) DeleteProcess(ctx context.Context, in *DeleteProcessRequest, opts ...grpc.CallOption) (*DeleteResponse, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/DeleteProcess", in, out, opts...)
...
}
5.7 daemon.containerd.SignalProcess
调用containerd.Exec()
方法, 交由containerd处理exec进程。如果超时,调用daemon.containerd.SignalProcess()
方法kill进程。
https://github.com/moby/moby/blob/v20.10.14/daemon/exec.go#L263-L303
func (daemon *Daemon) ContainerExecStart(ctx context.Context, name string, stdin io.Reader, stdout io.Writer, stderr io.Writer) (err error) {
...
systemPid, err := daemon.containerd.Exec(ctx, c.ID, ec.ID, p, cStdin != nil, ec.InitializeStdio)
...
select {
case <-ctx.Done():
...
daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["TERM"]))
timeout := time.NewTimer(termProcessTimeout)
...
select {
case <-timeout.C:
...
daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["KILL"]))
...
}
...
}
return nil
}
根据容器ID和进程ID,获取进程对象,调用process.Kill
方法杀死进程。
https://github.com/moby/moby/blob/v20.10.14/libcontainerd/remote/client.go#L336
func (c *client) SignalProcess(ctx context.Context, containerID, processID string, signal int) error {
p, err := c.getProcess(ctx, containerID, processID)
...
return wrapError(p.Kill(ctx, syscall.Signal(signal)))
}
process.Kill
方法调用containerd api
https://github.com/moby/moby/blob/v20.10.14/vendor/github.com/containerd/containerd/process.go#L134
func (p *process) Kill(ctx context.Context, s syscall.Signal, opts ...KillOpts) error {
var i KillInfo
...
_, err := p.task.client.TaskService().Kill(ctx, &tasks.KillRequest{
Signal: uint32(s),
ContainerID: p.task.id,
ExecID: p.id,
All: i.All,
})
return errdefs.FromGRPC(err)
}
containerd杀死进程的api接口地址为/containerd.services.tasks.v1.Tasks/Kill
func (c *tasksClient) Kill(ctx context.Context, in *KillRequest, opts ...grpc.CallOption) (*types1.Empty, error) {
...
err := c.cc.Invoke(ctx, "/containerd.services.tasks.v1.Tasks/Kill", in, out, opts...)
...
}