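tags: source code analysis, container

Source Code Analysis of docker-tar and docker-untar

At the time of writing, the latest release is v20.10.6, so all code shown here is taken from v20.10.6.

1. Introduction

docker-tar and docker-untar are two "subcommands" of dockerd. They extend the docker cp command into the container, providing the ability to pack and unpack archives inside the container, respectively.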

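2. Usage

2.1 Usage from code

See the docker reexec source code analysis article.

2.2 Usage as a binary

Besides being invoked from code in a fork-like fashion, docker-tar can also be run as a standalone binary, although this is uncommon. The value of this approach is that it drops you straight into the docker-tar logic, which is convenient and direct when analyzing the docker-tar implementation or investigating related security issues.

Make a copy of dockerd to obtain the docker-tar binary. Both use the same binary; the cmdline used at execution time determines which code path is taken.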

cp /usr/bin/dockerd /usr/bin/docker-tar
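The "same binary, different cmdline" idea can be illustrated with a minimal Go sketch. The handlers map and the dispatch function are hypothetical names used only for illustration, and matching on the base name of argv[0] is a simplifying assumption, not necessarily how dockerd's reexec package resolves its entry points.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// handlers maps an invocation name to an entry point.
// This registry is hypothetical and only mimics the idea of reexec.
var handlers = map[string]func() string{
	"docker-tar":   func() string { return "tar" },
	"docker-untar": func() string { return "untar" },
}

// dispatch selects a handler based on the name the program was started as.
func dispatch(argv0 string) (func() string, bool) {
	h, ok := handlers[filepath.Base(argv0)]
	return h, ok
}

func main() {
	if h, ok := dispatch(os.Args[0]); ok {
		fmt.Println("running handler:", h())
		return
	}
	fmt.Println("no handler registered; running the default program")
}
```

Copying or symlinking such a binary to a registered name is enough to select a different code path at startup.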

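docker-tar reads its configuration options as JSON from stdin; arg1 is the path to pack, arg2 is the directory to chroot into, and the resulting archive is written to stdout. Because the chroot happens before packing, arg1 is resolved inside the new root: in the commands below, /test/hosts refers to /tmp/test/hosts on the host.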

echo "{}" > /tmp/json
mkdir /tmp/test
cp /etc/hosts /tmp/test/
docker-tar < /tmp/json /test/hosts /tmp/ > /tmp/test.tar

The configuration options accepted by docker-tar are defined at: https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L37

TarOptions struct {
    IncludeFiles     []string
    ExcludePatterns  []string
    Compression      Compression
    NoLchown         bool
    UIDMaps          []idtools.IDMap
    GIDMaps          []idtools.IDMap
    ChownOpts        *idtools.Identity
    IncludeSourceDir bool
    // WhiteoutFormat is the expected on disk format for whiteout files.
    // This format will be converted to the standard format on pack
    // and from the standard format on unpack.
    WhiteoutFormat WhiteoutFormat
    // When unpacking, specifies whether overwriting a directory with a
    // non-directory is allowed and vice versa.
    NoOverwriteDirNonDir bool
    // For each include when creating an archive, the included name will be
    // replaced with the matching name from this map.
    RebaseNames map[string]string
    InUserNS    bool
}
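As a rough illustration of how a JSON document on stdin becomes options, the sketch below decodes into a small struct. The tarOptions type is a hypothetical subset of TarOptions declared locally so that the example is self-contained; decodeOptions is likewise an illustrative helper, not part of moby.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tarOptions mirrors a few fields of archive.TarOptions.
// It is a hypothetical subset used only for this sketch.
type tarOptions struct {
	IncludeFiles     []string
	ExcludePatterns  []string
	IncludeSourceDir bool
	RebaseNames      map[string]string
}

// decodeOptions parses a JSON document into tarOptions.
// Fields missing from the JSON keep their zero values.
func decodeOptions(data string) (tarOptions, error) {
	var opts tarOptions
	err := json.NewDecoder(strings.NewReader(data)).Decode(&opts)
	return opts, err
}

func main() {
	inputs := []string{
		`{}`,
		`{"IncludeFiles":["hosts"],"RebaseNames":{"hosts":"renamed"}}`,
	}
	for _, in := range inputs {
		opts, err := decodeOptions(in)
		fmt.Printf("%+v err=%v\n", opts, err)
	}
}
```

An empty object such as the "{}" used in the commands above therefore selects the default behavior for every option.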

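docker-untar is used similarly: the archive is passed in via stdin, the JSON configuration options are read from fd 3, arg1 is the directory to extract into, and arg2 is the directory to chroot into.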

cp /usr/bin/dockerd /usr/bin/docker-untar
echo "{}" > /tmp/json
mkdir -p /tmp/test
docker-untar </tmp/test.tar  3</tmp/json /test/ /tmp/

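docker-untar accepts the same configuration options as docker-tar.

3. Implementation analysis

The early part of the call flow and code for docker-tar and docker-untar was already covered in the docker reexec source code analysis article. Below, we start directly from the concrete implementations of docker-tar and docker-untar.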

3.1 docker-tar

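The tar() function is the entry point of docker-tar. It implements four pieces of functionality: argument parsing, chroot, packing, and output, each of which is described below.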

https://github.com/moby/moby/blob/v20.10.6/pkg/chrootarchive/archive_unix.go#L121

func tar() {
    runtime.LockOSThread()
    flag.Parse()

    src := flag.Arg(0)
    var root string
    if len(flag.Args()) > 1 {
        root = flag.Arg(1)
    }

    if root == "" {
        root = src
    }

    if err := realChroot(root); err != nil {
        fatal(err)
    }

    var options archive.TarOptions
    if err := json.NewDecoder(os.Stdin).Decode(&options); err != nil {
        fatal(err)
    }

    rdr, err := archive.TarWithOptions(src, &options)
    if err != nil {
        fatal(err)
    }
    defer rdr.Close()

    if _, err := io.Copy(os.Stdout, rdr); err != nil {
        fatal(err)
    }

    os.Exit(0)
}

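3.1.1 Argument parsing

The argument-parsing code is shown below, where:

  • arg1 is src, the path to pack. It is passed to archive.TarWithOptions together with the configuration options to perform the actual packing.
  • arg2 is root. Before packing starts, the process chroots into this path to guard against related security issues. If arg2 is not given, root takes the value of src.
  • stdin carries the packing configuration options.
  • Once packing is done, the result is written to stdout.

func tar() {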
    ...
    
    flag.Parse()

    src := flag.Arg(0)
    var root string
    if len(flag.Args()) > 1 {
        root = flag.Arg(1)
    }

    if root == "" {
        root = src
    }
    
    ...

    var options archive.TarOptions
    if err := json.NewDecoder(os.Stdin).Decode(&options); err != nil {
        fatal(err)
    }

    ...
}
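The defaulting rule for root can be isolated into a small, testable function. The sketch below assumes a hypothetical helper named resolveArgs that takes the positional arguments as a slice; it reproduces only the src/root logic shown above.

```go
package main

import "fmt"

// resolveArgs returns the source path and the chroot directory.
// When no root is given, root falls back to src.
func resolveArgs(args []string) (src, root string) {
	if len(args) > 0 {
		src = args[0]
	}
	if len(args) > 1 {
		root = args[1]
	}
	if root == "" {
		root = src
	}
	return src, root
}

func main() {
	src, root := resolveArgs([]string{"/test/hosts", "/tmp/"})
	fmt.Printf("src=%s root=%s\n", src, root)

	src, root = resolveArgs([]string{"/tmp/test"})
	fmt.Printf("src=%s root=%s\n", src, root)
}
```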

3.1.2 chroot

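The function that performs the chroot is named realChroot to distinguish it from Docker's own pivot_root-based implementation; what it actually calls is the traditional chroot.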

https://github.com/moby/moby/blob/v20.10.6/pkg/chrootarchive/archive_unix.go#L135-L137

if err := realChroot(root); err != nil {
    fatal(err)
}

https://github.com/moby/moby/blob/v20.10.6/pkg/chrootarchive/chroot_linux.go#L106

func realChroot(path string) error {
    if err := unix.Chroot(path); err != nil {
        return fmt.Errorf("Error after fallback to chroot: %v", err)
    }
    if err := unix.Chdir("/"); err != nil {
        return fmt.Errorf("Error changing to new root after chroot: %v", err)
    }
    return nil
}

3.1.3 Packing

https://github.com/moby/moby/blob/v20.10.6/pkg/chrootarchive/archive_unix.go#L144

func tar() {
    ...
    rdr, err := archive.TarWithOptions(src, &options)
    ...
}

3.1.3.1 pkg/archive/archive.go/TarWithOptions

Packing is done by calling the TarWithOptions function of the archive package: https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L724

3.1.3.2 pkg/archive/archive.go/TarWithOptions: handling long paths on Windows

On Windows, paths may run into the path length limit. On Linux, the path is left unchanged.

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L728

func TarWithOptions(srcPath string, options *TarOptions) (io.ReadCloser, error) {

	// Fix the source path to work with long path names. This is a no-op
	// on platforms other than Windows.
	srcPath = fixVolumePathPrefix(srcPath)

As the references below (including the official Windows documentation) explain, Windows has a dedicated notation for long paths:

https://stackoverflow.com/questions/21194530/what-does-mean-when-prepended-to-a-file-path

https://docs.microsoft.com/zh-cn/windows/win32/fileio/naming-a-file?redirectedfrom=MSDN#short-vs-long-names

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive_windows.go#L14

func fixVolumePathPrefix(srcPath string) string {
	return longpath.AddPrefix(srcPath)
}

const Prefix = `\\?\`

func AddPrefix(path string) string {
	if !strings.HasPrefix(path, Prefix) {
		if strings.HasPrefix(path, `\\`) {
			// This is a UNC path, so we need to add 'UNC' to the path as well.
			path = Prefix + `UNC` + path[1:]
		} else {
			path = Prefix + path
		}
	}
	return path
}

3.1.3.3 pkg/archive/archive.go/TarWithOptions: skipping files with ExcludePatterns

If options.ExcludePatterns is set in the packing options, matching files are excluded from the archive (docker-tar does not use this). https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L730

func TarWithOptions(srcPath string, options *TarOptions) (io.ReadCloser, error) {
    ...
    pm, err := fileutils.NewPatternMatcher(options.ExcludePatterns)
    ...
    for _, include := range options.IncludeFiles {
        ...
        skip, err = pm.Matches(relFilePath)
        ...
    }
    if skip {
        ...
        if !pm.Exclusions() {
            return filepath.SkipDir
        }
        ...
        for _, pat := range pm.Patterns() {
            if !pat.Exclusion() {
                continue
            }
            ...
        }
    ...
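To give a feel for exclusion patterns with "!" exceptions, here is a simplified sketch. It uses filepath.Match from the standard library as a stand-in and is not the fileutils.PatternMatcher used by moby; the excluded function and its "last matching pattern wins" rule are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// excluded reports whether rel should be skipped.
// A pattern starting with "!" re-includes a path; the last match wins.
func excluded(patterns []string, rel string) (bool, error) {
	skip := false
	for _, p := range patterns {
		negate := strings.HasPrefix(p, "!")
		p = strings.TrimPrefix(p, "!")
		ok, err := filepath.Match(p, rel)
		if err != nil {
			return false, err
		}
		if ok {
			skip = !negate
		}
	}
	return skip, nil
}

func main() {
	patterns := []string{"*.log", "!keep.log"}
	for _, name := range []string{"a.log", "keep.log", "a.txt"} {
		skip, err := excluded(patterns, name)
		fmt.Printf("%s skip=%v err=%v\n", name, skip, err)
	}
}
```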
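
3.1.3.4 pkg/archive/archive.go/TarWithOptions: input, output, and compression

A pair of pipe read/write streams is created (a pipe's reads and writes are matched one-to-one; it is a synchronous in-memory pipe). The tar archive is written to pipeWriter below, and the reader is returned for other code to consume: https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L735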

func TarWithOptions(srcPath string, options *TarOptions) (io.ReadCloser, error) {
    ...
    pipeReader, pipeWriter := io.Pipe()
    ...

If a compression option is configured, the output is compressed (docker-tar does not compress, so this does not apply).

func TarWithOptions(srcPath string, options *TarOptions) (io.ReadCloser, error) {
    ...
    compressWriter, err := CompressStream(pipeWriter, options.Compression)
    ...

Two output formats are supported: uncompressed and Gzip. Bzip2 and Xz return an error.

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L234

func CompressStream(dest io.Writer, compression Compression) (io.WriteCloser, error) {
	p := pools.BufioWriter32KPool
	buf := p.Get(dest)
	switch compression {
	case Uncompressed:
		writeBufWrapper := p.NewWriteCloserWrapper(buf, buf)
		return writeBufWrapper, nil
	case Gzip:
		gzWriter := gzip.NewWriter(dest)
		writeBufWrapper := p.NewWriteCloserWrapper(buf, gzWriter)
		return writeBufWrapper, nil
	case Bzip2, Xz:
		return nil, fmt.Errorf("Unsupported compression format %s", (&compression).Extension())
	default:
		return nil, fmt.Errorf("Unsupported compression format %s", (&compression).Extension())
	}
}
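For readers unfamiliar with the Gzip branch, the following sketch shows a gzip round trip using only the standard library. The helper names gzipBytes and gunzipBytes are hypothetical, and the sketch does not use moby's buffer pools.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data into an in-memory gzip stream.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes pending data and writes the gzip trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses a gzip stream produced by gzipBytes.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	packed, err := gzipBytes([]byte("hello tar"))
	if err != nil {
		fmt.Println("compress error:", err)
		return
	}
	plain, err := gunzipBytes(packed)
	fmt.Printf("%q err=%v\n", plain, err)
}
```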

3.1.3.5 pkg/archive/archive.go/TarWithOptions: whiteout conversion

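Whiteout handling depends on whether a user namespace is in use. Specifically:

  • If OverlayWhiteoutFormat is configured and no user namespace is in use, whiteouts are converted later on.
  • If OverlayWhiteoutFormat is configured and a user namespace is in use, an error is returned.

The second case was only implemented in 20.10: because overlay2 no longer supports using OverlayWhiteout inside a user namespace, the code can simply return an error. Before that, handling whiteouts inside a user namespace was a rather complicated process (see https://github.com/moby/moby/commit/c1e7924f7cb85f1ee0ad168eb9d6b74790ef4b65 ).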

The whiteout mentioned here is a special marker-file convention used to hide files that exist in lower layers (branches). For details, see: https://github.com/opencontainers/image-spec/blob/master/layer.md#whiteouts http://aufs.sourceforge.net/aufs.html https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt
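In the standard (AUFS-style) format described by the image spec, a whiteout is an empty file whose base name carries the ".wh." prefix. The sketch below shows that naming rule only; whiteoutFor and isWhiteout are hypothetical helpers, and the overlay on-disk format (a 0/0 character device) is not covered.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// whiteoutPrefix is the base-name prefix that marks a whiteout entry.
const whiteoutPrefix = ".wh."

// whiteoutFor returns the whiteout entry name that hides p.
func whiteoutFor(p string) string {
	dir, base := path.Split(p)
	return dir + whiteoutPrefix + base
}

// isWhiteout reports whether the entry name denotes a whiteout.
func isWhiteout(p string) bool {
	return strings.HasPrefix(path.Base(p), whiteoutPrefix)
}

func main() {
	w := whiteoutFor("etc/app/old.conf")
	fmt.Println(w, isWhiteout(w), isWhiteout("etc/app/new.conf"))
}
```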

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L742

func TarWithOptions(srcPath string, options *TarOptions) (io.ReadCloser, error) {
    ...
    whiteoutConverter, err := getWhiteoutConverter(options.WhiteoutFormat, options.InUserNS)

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive_linux.go#L14

func getWhiteoutConverter(format WhiteoutFormat, inUserNS bool) (tarWhiteoutConverter, error) {
	if format == OverlayWhiteoutFormat {
		if inUserNS {
			return nil, errors.New("specifying OverlayWhiteoutFormat is not allowed in userns")
		}
		return overlayWhiteoutConverter{}, nil
	}
	return nil, nil
}

3.1.3.6 pkg/archive/archive.go/TarWithOptions: creating the TarAppender

The remaining work runs in a goroutine, so the function does not block here and the caller can carry on with other logic:

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L747

func TarWithOptions(srcPath string, options *TarOptions) (io.ReadCloser, error) {
    ...
    go func() {
    ...
    }()

    return pipeReader, nil
}

A TarAppender is created; this is where the UIDMaps, GIDMaps, and ChownOpts options are used (not relevant to docker-tar). https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L748-L753

ta := newTarAppender(
    idtools.NewIDMappingsFromMaps(options.UIDMaps, options.GIDMaps),
    compressWriter,
    options.ChownOpts,
)
ta.WhiteoutConverter = whiteoutConverter

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L434

func newTarAppender(idMapping *idtools.IdentityMapping, writer io.Writer, chownOpts *idtools.Identity) *tarAppender {
	return &tarAppender{
		SeenFiles:       make(map[uint64]string),
		TarWriter:       tar.NewWriter(writer),
		Buffer:          pools.BufioWriter32KPool.Get(nil),
		IdentityMapping: idMapping,
		ChownOpts:       chownOpts,
	}
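}

3.1.3.7 pkg/archive/archive.go/TarWithOptions: walking the paths to pack

Check whether srcPath is a directory. If it is not, its parent directory becomes the source path and its base name becomes the only include. https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L776-L793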

stat, err := os.Lstat(srcPath)
if err != nil {
    return
}

if !stat.IsDir() {
    // We can't later join a non-dir with any includes because the
    // 'walk' will error if "file/." is stat-ed and "file" is not a
    // directory. So, we must split the source path and use the
    // basename as the include.
    if len(options.IncludeFiles) > 0 {
        logrus.Warn("Tar: Can't archive a file with includes")
    }

    dir, base := SplitPathDirEntry(srcPath)
    srcPath = dir
    options.IncludeFiles = []string{base}
}
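The effect of splitting a non-directory source can be approximated with the standard library. The splitDirEntry helper below is a hypothetical simplification of SplitPathDirEntry and ignores special cases such as a trailing "/.".

```go
package main

import (
	"fmt"
	"path/filepath"
)

// splitDirEntry splits a path into its parent directory and base name.
// It is a simplified approximation, not moby's SplitPathDirEntry.
func splitDirEntry(p string) (dir, base string) {
	cleaned := filepath.Clean(p)
	return filepath.Dir(cleaned), filepath.Base(cleaned)
}

func main() {
	dir, base := splitDirEntry("/test/hosts")
	// The walk then starts at dir with base as the only include.
	fmt.Printf("srcPath=%s IncludeFiles=[%s]\n", dir, base)
}
```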

Next comes a loop that walks each include path: https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L801

seen := make(map[string]bool)

for _, include := range options.IncludeFiles {
    rebaseName := options.RebaseNames[include]

    walkRoot := getWalkRoot(srcPath, include)
    filepath.Walk(walkRoot, func(filePath string, f os.FileInfo, err error) error {
    ...
    })
}

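The following function is executed for every path visited during the walk. Some logic that decides whether to skip a path is omitted here.

During packing, the file path written into the tar archive is rewritten according to the rebaseName option. https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L805-L895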

func(filePath string, f os.FileInfo, err error) error {
    ...

    // Rename the base resource.
    if rebaseName != "" {
        var replacement string
        if rebaseName != string(filepath.Separator) {
            // Special case the root directory to replace with an
            // empty string instead so that we don't end up with
            // double slashes in the paths.
            replacement = rebaseName
        }

        relFilePath = strings.Replace(relFilePath, include, replacement, 1)
    }

    if err := ta.addTarFile(filePath, relFilePath); err != nil {
        logrus.Errorf("Can't add file %s to tar: %s", filePath, err)
        // if pipe is broken, stop writing tar stream to it
        if err == io.ErrClosedPipe {
            return err
        }
    }
    return nil
}
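The rebasing step is a single string replacement, which the sketch below isolates. The rebase function is a hypothetical helper that follows the snippet above and assumes "/" as the path separator.

```go
package main

import (
	"fmt"
	"strings"
)

// rebase replaces the first occurrence of include in relFilePath.
// A rebaseName of "/" maps to an empty replacement to avoid "//".
func rebase(relFilePath, include, rebaseName string) string {
	if rebaseName == "" {
		return relFilePath
	}
	replacement := ""
	if rebaseName != "/" {
		replacement = rebaseName
	}
	return strings.Replace(relFilePath, include, replacement, 1)
}

func main() {
	fmt.Println(rebase("hosts", "hosts", "renamed"))
	fmt.Println(rebase("dir/a.txt", "dir", "/"))
	fmt.Println(rebase("dir/a.txt", "dir", ""))
}
```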

3.1.3.8 pkg/archive/archive.go/addTarFile: handling symbolic links

Adding a file to the tar archive is done by the addTarFile function: https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L457

If the file is a symbolic link, its link target is read.

func (ta *tarAppender) addTarFile(path, name string) error {
    fi, err := os.Lstat(path)
    if err != nil {
        return err
    }

    var link string
    if fi.Mode()&os.ModeSymlink != 0 {
        var err error
        link, err = os.Readlink(path)
        if err != nil {
            return err
        }
    }
    ...
}

3.1.3.9 pkg/archive/archive.go/addTarFile: building the Header from FileInfo

Build the file header for the path: https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L472

func (ta *tarAppender) addTarFile(path, name string) error {
    ...
    hdr, err := FileInfoHeader(name, fi, link)
    ...

FileInfoHeader creates the common file header fields. These mostly come from FileInfoHeader in Go's standard library archive/tar, followed by some light post-processing. However, 19.03.8 added vendor/archive/tar, which replaces the standard library package (https://github.com/moby/moby/commit/aa6a9891b09cce3d9004121294301a30d45d998d#diff-630ba09448af522154f38ef7685ef1f44b0f3e9430f80829a03ce24f400f3754).

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L360-L375

func FileInfoHeader(name string, fi os.FileInfo, link string) (*tar.Header, error) {
	hdr, err := tar.FileInfoHeader(fi, link)
	if err != nil {
		return nil, err
	}
	hdr.Format = tar.FormatPAX
	hdr.ModTime = hdr.ModTime.Truncate(time.Second)
	hdr.AccessTime = time.Time{}
	hdr.ChangeTime = time.Time{}
	hdr.Mode = fillGo18FileTypeBits(int64(chmodTarEntry(os.FileMode(hdr.Mode))), fi)
	hdr.Name = canonicalTarName(name, fi.IsDir())
	if err := setHeaderForSpecialDevice(hdr, name, fi.Sys()); err != nil {
		return nil, err
	}
	return hdr, nil
}

Therefore, before 19.03.8 (exclusive), archive/tar refers to Go's standard library archive/tar; from 19.03.8 onward, it refers to the code that Docker modified and copied into vendor/archive/tar.

3.1.3.10 go/src/archive/tar/common.go: building the Header from FileInfo

Name, ModTime, Perm, and other fields are taken from the FileInfo: https://github.com/golang/go/blob/go1.16/src/archive/tar/common.go#L629

If sysStat has been set, it is called. https://github.com/golang/go/blob/go1.16/src/archive/tar/common.go#L701-L703

if sysStat != nil {
    return h, sysStat(fi, h)
}

sysStat is a package-level function variable; unless something assigns to it, it remains nil. https://github.com/golang/go/blob/go1.16/src/archive/tar/common.go#L602

var sysStat func(fi fs.FileInfo, h *Header) error

On Unix systems, sysStat is assigned during init. https://github.com/golang/go/blob/go1.16/src/archive/tar/stat_unix.go#L18-L101

func init() {
    sysStat = statUnix
}

func statUnix(fi fs.FileInfo, h *Header) error {
    ...

3.1.3.11 go/src/archive/tar/stat_unix.go: setting username, groupname, Devmajor, and Devminor

The main job of statUnix is to set the username, groupname, Devmajor, and Devminor values.

When setting username and groupname, the cache is consulted first; on a cache miss, the names are looked up through os/user, which consults /etc/passwd and /etc/group. https://github.com/golang/go/blob/go1.16/src/archive/tar/stat_unix.go#L37-L48

if u, ok := userMap.Load(h.Uid); ok {
    h.Uname = u.(string)
} else if u, err := user.LookupId(strconv.Itoa(h.Uid)); err == nil {
    h.Uname = u.Username
    userMap.Store(h.Uid, h.Uname)
}
if g, ok := groupMap.Load(h.Gid); ok {
    h.Gname = g.(string)
} else if g, err := user.LookupGroupId(strconv.Itoa(h.Gid)); err == nil {
    h.Gname = g.Name
    groupMap.Store(h.Gid, h.Gname)
}
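The "cache first, then look up" pattern built on sync.Map can be sketched on its own. The nameCache type and the injected lookup function are hypothetical; they only mirror the shape of the userMap logic above.

```go
package main

import (
	"fmt"
	"sync"
)

// nameCache memoizes id-to-name lookups, like userMap and groupMap.
type nameCache struct {
	m sync.Map
}

// get returns the cached name for id, calling lookup only on a miss.
// Failed lookups are not cached and yield an empty name.
func (c *nameCache) get(id int, lookup func(int) (string, error)) string {
	if v, ok := c.m.Load(id); ok {
		return v.(string)
	}
	name, err := lookup(id)
	if err != nil {
		return ""
	}
	c.m.Store(id, name)
	return name
}

func main() {
	cache := &nameCache{}
	calls := 0
	lookup := func(id int) (string, error) {
		calls++
		return fmt.Sprintf("user%d", id), nil
	}
	fmt.Println(cache.get(1000, lookup), cache.get(1000, lookup), "lookups:", calls)
}
```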

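User and group lookup is provided by Go's os/user package. It first tries to obtain the user information from the current user; if the current user cannot be determined, or does not match the target uid, it falls back to the lookupUserId function. https://github.com/golang/go/blob/go1.16/src/os/user/lookup.go#L41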

func LookupId(uid string) (*User, error) {
	if u, err := Current(); err == nil && u.Uid == uid {
		return u, err
	}
	return lookupUserId(uid)
}

Current calls the current function and caches the result of the first call. https://github.com/golang/go/blob/go1.16/src/os/user/lookup.go#L14

func Current() (*User, error) {
	cache.Do(func() { cache.u, cache.err = current() })
	if cache.err != nil {
		return nil, cache.err
	}
	u := *cache.u // copy
	return &u, nil
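}

3.1.3.12 go/src/os/user/: looking up the current user

The implementation of current depends heavily on how the binary is built. If cgo is enabled at build time, the code in os/user/cgo_lookup_unix.go is compiled; that file declares its build constraints in comments at the top. If cgo is not enabled, the code in os/user/lookup_stubs.go is used instead.

This change came out of https://github.com/golang/go/issues/23265 . Its purpose was to solve the problem in earlier versions where loading the libnss-related shared libraries broke static builds.

Selecting which functions to compile based on the build configuration is implemented with Go's build tag feature.

The build tags in os/user/cgo_lookup_unix.go are as follows: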

(aix or darwin or dragonfly or freebsd or ((not android) and linux) or netbsd or openbsd or solaris) and (cgo and not osusergo)

In other words, cgo must be enabled and the osusergo tag must not be specified.

https://github.com/golang/go/blob/go1.16/src/os/user/cgo_lookup_unix.go#L5-L6

// +build aix darwin dragonfly freebsd !android,linux netbsd openbsd solaris
// +build cgo,!osusergo

The build tags in os/user/lookup_stubs.go are as follows:

((not cgo) and (not windows) and (not plan9)) or android or (osusergo and (not windows) and (not plan9))

Roughly speaking, this file is compiled when cgo is not used:

https://github.com/golang/go/blob/go1.16/src/os/user/lookup_stubs.go#L5

// +build !cgo,!windows,!plan9 android osusergo,!windows,!plan9

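For more on build tags, see: https://golang.org/cmd/go/#hdr-Build_constraints https://www.digitalocean.com/community/tutorials/customizing-go-binaries-with-build-tags

3.1.3.13 go/src/os/user/cgo_lookup_unix.go: looking up the current user (cgo version)

The cgo version of current calls the getpwuid_r function, which goes through NSS, and assembles the result into Go's User struct. This means that calling it requires dynamically loading the NSS-related shared libraries.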

https://github.com/golang/go/blob/go1.16/src/os/user/cgo_lookup_unix.go#L48

// +build aix darwin dragonfly freebsd !android,linux netbsd openbsd solaris
// +build cgo,!osusergo

package user

...

/*
#cgo solaris CFLAGS: -D_POSIX_PTHREAD_SEMANTICS
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>
#include <grp.h>
#include <stdlib.h>
static int mygetpwuid_r(int uid, struct passwd *pwd,
	char *buf, size_t buflen, struct passwd **result) {
	return getpwuid_r(uid, pwd, buf, buflen, result);
}
static int mygetpwnam_r(const char *name, struct passwd *pwd,
	char *buf, size_t buflen, struct passwd **result) {
	return getpwnam_r(name, pwd, buf, buflen, result);
}
static int mygetgrgid_r(int gid, struct group *grp,
	char *buf, size_t buflen, struct group **result) {
 return getgrgid_r(gid, grp, buf, buflen, result);
}
static int mygetgrnam_r(const char *name, struct group *grp,
	char *buf, size_t buflen, struct group **result) {
 return getgrnam_r(name, grp, buf, buflen, result);
}
*/
import "C"

func current() (*User, error) {
	return lookupUnixUid(syscall.Getuid())
}

func lookupUnixUid(uid int) (*User, error) {
	var pwd C.struct_passwd
	var result *C.struct_passwd

	buf := alloc(userBuffer)
	defer buf.free()

	err := retryWithBuffer(buf, func() syscall.Errno {
		// mygetpwuid_r is a wrapper around getpwuid_r to avoid using uid_t
		// because C.uid_t(uid) for unknown reasons doesn't work on linux.
		return syscall.Errno(C.mygetpwuid_r(C.int(uid),
			&pwd,
			(*C.char)(buf.ptr),
			C.size_t(buf.size),
			&result))
	})
	if err != nil {
		return nil, fmt.Errorf("user: lookup userid %d: %v", uid, err)
	}
	if result == nil {
		return nil, UnknownUserIdError(uid)
	}
	return buildUser(&pwd), nil
}

func buildUser(pwd *C.struct_passwd) *User {
	u := &User{
		Uid:      strconv.FormatUint(uint64(pwd.pw_uid), 10),
		Gid:      strconv.FormatUint(uint64(pwd.pw_gid), 10),
		Username: C.GoString(pwd.pw_name),
		Name:     C.GoString(pwd.pw_gecos),
		HomeDir:  C.GoString(pwd.pw_dir),
	}
	// The pw_gecos field isn't quite standardized. Some docs
	// say: "It is expected to be a comma separated list of
	// personal data where the first item is the full name of the
	// user."
	if i := strings.Index(u.Name, ","); i >= 0 {
		u.Name = u.Name[:i]
	}
	return u
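}

3.1.3.14 go/src/os/user/lookup_stubs.go: looking up the current user (pure Go version)

The above covers the cgo case. A pure Go build takes a different branch.

In a pure Go build, the lookup is implemented by reading the /etc/passwd file directly.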

https://github.com/golang/go/blob/go1.16/src/os/user/lookup_stubs.go#L21

func current() (*User, error) {
    uid := currentUID()
    // $USER and /etc/passwd may disagree; prefer the latter if we can get it.
    // See issue 27524 for more information.
    u, err := lookupUserId(uid)
    if err == nil {
        return u, nil
    }

    homeDir, _ := os.UserHomeDir()
    ...
}

https://github.com/golang/go/blob/go1.16/src/os/user/lookup_unix.go#L190

const userFile = "/etc/passwd"
...
func lookupUserId(uid string) (*User, error) {
    f, err := os.Open(userFile)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    return findUserId(uid, f)
}
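What "finding a uid in /etc/passwd" amounts to can be sketched with plain string handling. The findUserByID helper is hypothetical and much simpler than the standard library's findUserId; it takes the file content as a string so that the example needs no real file.

```go
package main

import (
	"fmt"
	"strings"
)

// findUserByID scans passwd-formatted text for the given uid.
// Each line has the form name:passwd:uid:gid:gecos:home:shell.
func findUserByID(uid, passwd string) (string, bool) {
	for _, line := range strings.Split(passwd, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		fields := strings.Split(line, ":")
		if len(fields) < 7 {
			continue // skip malformed entries
		}
		if fields[2] == uid {
			return fields[0], true
		}
	}
	return "", false
}

func main() {
	passwd := "root:x:0:0:root:/root:/bin/bash\n" +
		"daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
	name, ok := findUserByID("1", passwd)
	fmt.Println(name, ok)
}
```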

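3.1.3.15 go/src/os/user/cgo_lookup_unix.go: lookupUserId (cgo version)

If Current returns an error, or the user it finds does not match the target user ID, lookupUserId is called. Like current, this function has different implementations for cgo and pure Go builds.

The implementation is much the same as that of current, so it is not analyzed in detail here. https://github.com/golang/go/blob/go1.16/src/os/user/cgo_lookup_unix.go#L81-L112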

func lookupUserId(uid string) (*User, error) {
	i, e := strconv.Atoi(uid)
	if e != nil {
		return nil, e
	}
	return lookupUnixUid(i)
}

func lookupUnixUid(uid int) (*User, error) {
	var pwd C.struct_passwd
	var result *C.struct_passwd

	buf := alloc(userBuffer)
	defer buf.free()

	err := retryWithBuffer(buf, func() syscall.Errno {
		// mygetpwuid_r is a wrapper around getpwuid_r to avoid using uid_t
		// because C.uid_t(uid) for unknown reasons doesn't work on linux.
		return syscall.Errno(C.mygetpwuid_r(C.int(uid),
			&pwd,
			(*C.char)(buf.ptr),
			C.size_t(buf.size),
			&result))
	})
	if err != nil {
		return nil, fmt.Errorf("user: lookup userid %d: %v", uid, err)
	}
	if result == nil {
		return nil, UnknownUserIdError(uid)
	}
	return buildUser(&pwd), nil
}

3.1.3.16 go/src/os/user/lookup_unix.go: lookupUserId (pure Go version)

This is the same function that current calls: https://github.com/golang/go/blob/go1.16/src/os/user/lookup_unix.go#L190

3.1.3.17 vendor/archive/tar/stat_unix.go

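So far we have followed the Go standard library implementation. As mentioned, though, Docker 19.03.8 and later use the code that Docker modified and copied into vendor/archive/tar. The main difference in that code is analyzed below.

Compared with the Go standard library, the vendored code removes the username and groupname lookup.

https://github.com/moby/moby/blob/v20.10.6/vendor/archive/tar/stat_unix.go#L19

The following code does not exist in the vendored copy: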

if u, ok := userMap.Load(h.Uid); ok {
    h.Uname = u.(string)
} else if u, err := user.LookupId(strconv.Itoa(h.Uid)); err == nil {
    h.Uname = u.Username
    userMap.Store(h.Uid, h.Uname)
}
if g, ok := groupMap.Load(h.Gid); ok {
    h.Gname = g.(string)
} else if g, err := user.LookupGroupId(strconv.Itoa(h.Gid)); err == nil {
    h.Gname = g.Name
    groupMap.Store(h.Gid, h.Gname)
}

3.1.3.18 pkg/archive/archive.go/addTarFile: reading file capabilities into the tar header

Most of the header information has been read by this point; this step adds the file's capability information.

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L476

func (ta *tarAppender) addTarFile(path, name string) error {
    ...
    hdr, err := FileInfoHeader(name, fi, link)
    if err != nil {
        return err
    }
    if err := ReadSecurityXattrToTarHeader(path, hdr); err != nil {
        return err
    }
    ...

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L404

func ReadSecurityXattrToTarHeader(path string, hdr *tar.Header) error {
    capability, _ := system.Lgetxattr(path, "security.capability")
    if capability != nil {
        hdr.Xattrs = make(map[string]string)
        hdr.Xattrs["security.capability"] = string(capability)
    }
    return nil
}

This is implemented with the lgetxattr system call, via unix.Lgetxattr. https://github.com/moby/moby/blob/v20.10.6/pkg/system/xattrs_linux.go#L8

func Lgetxattr(path string, attr string) ([]byte, error) {
    // Start with a 128 length byte array
    dest := make([]byte, 128)
    sz, errno := unix.Lgetxattr(path, attr, dest)

    for errno == unix.ERANGE {
        // Buffer too small, use zero-sized buffer to get the actual size
        sz, errno = unix.Lgetxattr(path, attr, []byte{})
        if errno != nil {
            return nil, errno
        }
        dest = make([]byte, sz)
        sz, errno = unix.Lgetxattr(path, attr, dest)
    }

    switch {
    case errno == unix.ENODATA:
        return nil, nil
    case errno != nil:
        return nil, errno
    }

    return dest[:sz], nil
}

3.1.3.19 pkg/archive/archive.go/addTarFile: handling hard links

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L482-L496

func (ta *tarAppender) addTarFile(path, name string) error {
    ...
    if !fi.IsDir() && hasHardlinks(fi) {
        inode, err := getInodeFromStat(fi.Sys())
        if err != nil {
            return err
        }
        // a link should have a name that it links too
        // and that linked name should be first in the tar archive
        if oldpath, ok := ta.SeenFiles[inode]; ok {
            hdr.Typeflag = tar.TypeLink
            hdr.Linkname = oldpath
            hdr.Size = 0 // This Must be here for the writer math to add up!
        } else {
            ta.SeenFiles[inode] = name
        }
    }
    ...
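The hard-link bookkeeping boils down to remembering the first name seen for each inode. The sketch below isolates that rule with inode numbers passed in directly; recordInode is a hypothetical helper, and obtaining the inode from a FileInfo is left out.

```go
package main

import "fmt"

// recordInode tracks the first archive name seen for each inode.
// For a repeated inode it returns the earlier name and isLink=true,
// meaning the entry should be written as a hard link to that name.
func recordInode(seen map[uint64]string, inode uint64, name string) (linkTo string, isLink bool) {
	if old, ok := seen[inode]; ok {
		return old, true
	}
	seen[inode] = name
	return "", false
}

func main() {
	seen := make(map[uint64]string)
	entries := []struct {
		inode uint64
		name  string
	}{
		{42, "bin/tool"},
		{43, "bin/other"},
		{42, "bin/tool-alias"},
	}
	for _, e := range entries {
		target, isLink := recordInode(seen, e.inode, e.name)
		fmt.Printf("%s isLink=%v target=%q\n", e.name, isLink, target)
	}
}
```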

3.1.3.20 pkg/archive/archive.go/addTarFile: handling uid and gid

Handle the uid/gid mapping between container and host files; whiteout files are skipped.

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L500-L520

func (ta *tarAppender) addTarFile(path, name string) error {
    ...
    isOverlayWhiteout := fi.Mode()&os.ModeCharDevice != 0 && hdr.Devmajor == 0 && hdr.Devminor == 0

    // handle re-mapping container ID mappings back to host ID mappings before
    // writing tar headers/files. We skip whiteout files because they were written
    // by the kernel and already have proper ownership relative to the host
    if !isOverlayWhiteout && !strings.HasPrefix(filepath.Base(hdr.Name), WhiteoutPrefix) && !ta.IdentityMapping.Empty() {
        fileIDPair, err := getFileUIDGID(fi.Sys())
        if err != nil {
            return err
        }
        hdr.Uid, hdr.Gid, err = ta.IdentityMapping.ToContainer(fileIDPair)
        if err != nil {
            return err
        }
    }
    ...

If ChownOpts is configured, the UID and GID it specifies take precedence.

if ta.ChownOpts != nil {
    hdr.Uid = ta.ChownOpts.UID
    hdr.Gid = ta.ChownOpts.GID
}
...
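The host-to-container translation can be understood as a range lookup. The sketch below assumes a locally declared idMap type with the same three fields as idtools.IDMap and a hypothetical toContainer function; it illustrates the arithmetic rather than reproducing the idtools code.

```go
package main

import "fmt"

// idMap describes one contiguous ID range.
// It is a local stand-in for idtools.IDMap.
type idMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// toContainer translates a host ID into the container's ID space.
// With no mappings configured, the ID is returned unchanged.
func toContainer(hostID int, maps []idMap) (int, error) {
	if len(maps) == 0 {
		return hostID, nil
	}
	for _, m := range maps {
		if hostID >= m.HostID && hostID < m.HostID+m.Size {
			return m.ContainerID + (hostID - m.HostID), nil
		}
	}
	return -1, fmt.Errorf("host ID %d cannot be mapped to a container ID", hostID)
}

func main() {
	maps := []idMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
	id, err := toContainer(100005, maps)
	fmt.Println(id, err)
	id, err = toContainer(5, maps)
	fmt.Println(id, err)
}
```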

3.1.3.21 pkg/archive/archive.go/addTarFile: whiteout

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L522-L542

func (ta *tarAppender) addTarFile(path, name string) error {
    ...
    if ta.WhiteoutConverter != nil {
        wo, err := ta.WhiteoutConverter.ConvertWrite(hdr, path, fi)
        if err != nil {
            return err
        }

        if wo != nil {
            if err := ta.TarWriter.WriteHeader(hdr); err != nil {
                return err
            }
            if hdr.Typeflag == tar.TypeReg && hdr.Size > 0 {
                return fmt.Errorf("tar: cannot use whiteout for non-empty file")
            }
            hdr = wo
        }
    }

3.1.3.22 pkg/archive/archive.go/addTarFile: writing the file

https://github.com/moby/moby/blob/v20.10.6/pkg/archive/archive.go#L544-L571

func (ta *tarAppender) addTarFile(path, name string) error {
    ...
    if err := ta.TarWriter.WriteHeader(hdr); err != nil {
        return err
    }

    if hdr.Typeflag == tar.TypeReg && hdr.Size > 0 {
        // We use system.OpenSequential to ensure we use sequential file
        // access on Windows to avoid depleting the standby list.
        // On Linux, this equates to a regular os.Open.
        file, err := system.OpenSequential(path)
        if err != nil {
            return err
        }

        ta.Buffer.Reset(ta.TarWriter)
        defer ta.Buffer.Reset(nil)
        _, err = io.Copy(ta.Buffer, file)
        file.Close()
        if err != nil {
            return err
        }
        err = ta.Buffer.Flush()
        if err != nil {
            return err
        }
    }

    return nil
}
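The "write the header, then copy the content" sequence can be tried in isolation with the standard library archive/tar package. The buildTar and readFirst helpers below are hypothetical and keep everything in memory; they do not use moby's tarAppender or its buffer pool.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// buildTar writes a single regular file into an in-memory tar archive.
func buildTar(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{
		Name:     name,
		Typeflag: tar.TypeReg,
		Mode:     0644,
		Size:     int64(len(content)), // must match the bytes written
		Format:   tar.FormatPAX,
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(content); err != nil {
		return nil, err
	}
	// Close writes the end-of-archive marker.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readFirst returns the name and content of the first archive entry.
func readFirst(data []byte) (string, []byte, error) {
	tr := tar.NewReader(bytes.NewReader(data))
	hdr, err := tr.Next()
	if err != nil {
		return "", nil, err
	}
	body, err := io.ReadAll(tr)
	return hdr.Name, body, err
}

func main() {
	archive, err := buildTar("hosts", []byte("127.0.0.1 localhost\n"))
	if err != nil {
		fmt.Println("build error:", err)
		return
	}
	name, body, err := readFirst(archive)
	fmt.Printf("%s %q err=%v\n", name, body, err)
}
```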

3.2 docker-untar