feat(layerscanning): add optional FileRequirer to image Config #2170

Open: waldemar-kindler wants to merge 4 commits into google:main from waldemar-kindler:layerscanning-file-requirer

Summary

Adds an optional FileRequirer to the layerscanning image Config so callers
can avoid materializing files that no extractor needs. When set, a regular file
is unpacked into the image content store only if FileRequired returns true;
directories and symlinks are always kept, and a nil requirer means
"require all" — the existing default behavior is unchanged.

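Impact: this is primarily a footprint win, with a minor unpack-time gain. With
the default require-all requirer, decompression cost is unchanged. With a
filtering requirer, resolving required symlink targets can re-decompress each
layer stream up to MaxSymlinkDepth+1 times. On a 398 MB test image the content
store shrank from ~29.9k to ~6.5k files.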

Changes

  • feat(layerscanning): add the optional FileRequirer field to Config.
    DefaultConfig and validateConfig default a nil requirer to
    FileRequirerAll{}, so the require-all path is preserved.
  • fix(layerscanning): keep whiteouts when filtering with a requirer.
    Whiteouts are 0-byte regular files whose path encodes a deleted entry; gating
    tar.TypeReg entries on the requirer dropped whiteouts whose de-whiteouted
    path was not required, leaking deleted files back into the merged filesystem.
    Whiteouts are now exempt from the requirer check, preserving layer deletion
    semantics. Includes a regression test where an upper layer deletes a directory
    via a whiteout while a path requirer is active.
  • fix(layerscanning): materialize symlink targets of required files. A
    required path may be a symlink whose target regular file is not itself
    required; the single-pass filter skipped the target, leaving the required path
    dangling (e.g. /etc/os-release -> /usr/lib/os-release). The layers are now
    swept to a fixpoint: a required symlink records its resolved target, and a
    later pass materializes the file it resolves to, following symlink chains. The
    sweep is idempotent and breaks early, so the default require-all path stays
    single-pass. This mirrors the multi-pass requiredTargets approach already
    used in image/unpack.
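
The fixpoint sweep in the last item can be illustrated with a small model. This is a sketch under stated assumptions rather than the PR's implementation: layers are flattened into one slice of hypothetical `entry` values, `maxSymlinkDepth` is a made-up bound standing in for MaxSymlinkDepth, and `sweep` and `resolve` are illustrative names. Symlinks themselves are always kept in the real code, so the sketch only tracks which regular files get materialized.

```go
package main

import (
	"fmt"
	"path"
)

// entry is a hypothetical, simplified tar entry: a non-empty target marks
// a symlink, an empty target marks a regular file.
type entry struct {
	name   string
	target string
}

// maxSymlinkDepth is an assumed bound for this sketch only.
const maxSymlinkDepth = 3

// resolve turns a symlink target into a cleaned absolute path.
func resolve(link, target string) string {
	if path.IsAbs(target) {
		return path.Clean(target)
	}
	return path.Join(path.Dir(link), target)
}

// sweep re-scans the entries until no new symlink target is discovered,
// returning the materialized regular files and the number of passes used.
func sweep(entries []entry, required func(string) bool) (map[string]bool, int) {
	materialized := map[string]bool{}
	targets := map[string]bool{} // paths pulled in through required symlinks
	passes := 0
	for passes < maxSymlinkDepth+1 {
		passes++
		grew := false
		for _, e := range entries {
			if !required(e.name) && !targets[e.name] {
				continue
			}
			if e.target == "" {
				materialized[e.name] = true
				continue
			}
			// A wanted symlink records its target, unless the requirer
			// already covers it (this keeps require-all single-pass).
			t := resolve(e.name, e.target)
			if !required(t) && !targets[t] {
				targets[t] = true
				grew = true
			}
		}
		if !grew {
			break // fixpoint reached
		}
	}
	return materialized, passes
}

func main() {
	entries := []entry{
		{name: "/usr/lib/os-release"},
		{name: "/etc/os-release", target: "/usr/lib/os-release"},
		{name: "/etc/release", target: "os-release"},
	}
	onlyRelease := func(p string) bool { return p == "/etc/release" }
	files, passes := sweep(entries, onlyRelease)
	fmt.Println(files, passes)
}
```

Each extra pass corresponds to another scan of the layer streams, which is the decompression cost the bound on symlink depth limits.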

Testing

  • go test ./artifact/image/layerscanning/image/ passes.
  • Adds requirer_test.go covering: requirer gating of regular files, whiteout
    preservation under a requirer, and symlink-target materialization (including
    chains).

No new dependencies. gofmt, go vet, and golangci-lint are clean on the
changed files.

Materialize a regular file only when FileRequired returns true; dirs and
symlinks are always kept and a nil requirer means require-all (unchanged
default). Lets callers skip unpacking files no extractor needs, shrinking
the content store (~29.9k -> ~6.5k files on a 398MB image). Decompression
is unaffected, so this is a footprint win with a minor unpack-time gain.

Whiteouts are 0-byte regular files whose path encodes the deleted entry.
Gating tar.TypeReg entries on FileRequirer skipped any whiteout whose
de-whiteouted path was not required, dropping directory whiteouts and
leaking deleted files back into the merged filesystem. Exempt whiteouts
from the requirer check so layer deletion semantics are preserved.

Add a regression test that filters with a path requirer while an upper
layer deletes a directory via a whiteout, asserting the file stays gone.

A required path may be a symlink whose target regular file is not itself
required. The single-pass requirer filter skipped the target, so reading
the required path through the symlink dangled (e.g. /etc/os-release ->
/usr/lib/os-release).

Sweep the layers repeatedly until a fixpoint: a required symlink records
its resolved target, and a later pass materializes the regular file it
resolves to, following symlink chains. The sweep is idempotent and breaks
early, so the default (require-all) path stays single-pass. Mirrors the
multi-pass requiredTargets approach already used in image/unpack.

The comment claimed decompression of layer streams is unaffected by
filtering, but resolving required symlink targets sweeps the layers
repeatedly, re-decompressing each stream up to MaxSymlinkDepth+1 times.
Note the trade-off and that the default requirer settles in one pass.