# ./rap.sh > Python Pip Golf (November 12, 2025)

A Tale of pip and tar: I Shrank the Wheel

Binary Golf Grand Prix 6: The Quest for the Smallest Pip Package

For Binary Golf Grand Prix 6 (BGGP6), one of the options for entries was to create the smallest file of a given type that prints, returns, or otherwise displays “6”. I chose to explore pip packages, as I'd recently looked at pip package indexes. I decided to look at pip-compatible .tar.gz files specifically and went down a rabbit hole of optimisation that led me from 109 bytes down to just 67 bytes.

This is the story of that journey.

The Challenge

A pip package seems like an unlikely candidate for code golf. The format requires:

- Valid gzip compression
- Valid tar archive structure
- A setup.py file that pip can execute
- Tar headers (512 bytes each)
- Tar end-of-archive markers (1024 bytes)

The theoretical minimum for a tar.gz is substantial. A single file requires:

- 512-byte header
- File contents
- Padding to 512-byte boundary
- 1024 bytes of EOF markers

That's over 1.5KB uncompressed. But gzip compression can dramatically reduce this. The question becomes: what's the smallest compressed package that pip will process?
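The arithmetic above can be sketched as a small helper. The name `tar_size` and its parameters are hypothetical, introduced only for illustration; the 512-byte block size and the two-block EOF convention come from the list above.

```python
BLOCK = 512  # tar works in 512-byte blocks


def tar_size(payload_len: int, eof_blocks: int = 2) -> int:
    """Uncompressed size of a single-file tar archive, in bytes."""
    # Round the file contents up to a whole number of blocks.
    data_blocks = -(-payload_len // BLOCK)
    return BLOCK + data_blocks * BLOCK + eof_blocks * BLOCK


print(tar_size(len(b"print(6)")))                # header + 1 data block + EOF
print(tar_size(len(b"print(6)"), eof_blocks=0))  # same archive without EOF
```

For the 8-byte `print(6)` payload this gives 2048 bytes with the EOF marker and 1024 bytes without it.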

Entry 1: The Naive Approach (109 bytes)

Let's start with the obvious solution. Create a tar.gz with a single setup.py containing print(6):

import io, tarfile, gzip

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    ti = tarfile.TarInfo("a/setup.py")
    ti.size = 8
    ti.mode = 0o644
    ti.mtime = 0
    ti.uid = ti.gid = 0
    ti.uname = ti.gname = ""
    tf.addfile(ti, io.BytesIO(b"print(6)"))

out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
    gz.write(tar_buf.getvalue())

with open("bggp6_minimal_109b.tgz", "wb") as f:
    f.write(out.getvalue())

Result: 109 bytes

Testing it:

$ pip install bggp6_minimal_109b.tgz -v
...
Getting requirements to build wheel: started
Running command Getting requirements to build wheel
6
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)
6
Traceback (most recent call last):
...
AssertionError: Multiple .egg-info directories found

Success! The package prints "6" twice before erroring out. The -v flag is crucial: without it, pip hides the output of the build subprocesses.

Optimization 1: Directory Names (108 bytes)

I wondered: does the directory name affect compression? Let's test all 62 alphanumeric characters:

for char in "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ":
    # Generate package with directory f"{char}/setup.py"
    # Measure compressed size
    ...  # generation and measurement code omitted here

Results:

Directory '3': 108 bytes  ← BEST!
Directory 'a': 109 bytes  (original)
Directory 'B': 110 bytes
Directory 'z': 109 bytes
...

Saved: 1 byte

Why does '3' compress better than 'a'? It comes down to how gzip's DEFLATE algorithm encodes the tar header. The byte value of '3' (0x33) happens to fit the surrounding header bytes more cheaply than 'a' does. With savings this small, the only reliable approach is to measure every candidate.
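A minimal sketch of the full sweep, assuming the same generation code as Entry 1. The helper names `build_tgz` and `sweep` are mine, not from the original script, and the exact sizes may vary with the Python and zlib versions in use, so no expected numbers are hard-coded.

```python
import gzip
import io
import string
import tarfile

PAYLOAD = b"print(6)"


def build_tgz(dirname: str, mode: int = 0o644) -> bytes:
    """Build a one-file .tar.gz the same way as Entry 1."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        ti = tarfile.TarInfo(f"{dirname}/setup.py")
        ti.size = len(PAYLOAD)
        ti.mode = mode
        ti.mtime = 0
        ti.uid = ti.gid = 0
        ti.uname = ti.gname = ""
        tf.addfile(ti, io.BytesIO(PAYLOAD))

    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_buf.getvalue())
    return out.getvalue()


def sweep(mode: int = 0o644) -> dict:
    """Map each single-character directory name to its compressed size."""
    chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
    return {c: len(build_tgz(c, mode)) for c in chars}


sizes = sweep()
best = min(sizes, key=sizes.get)
print(f"best directory: {best!r} at {sizes[best]} bytes")
```

Passing a different `mode` to `sweep` repeats the same measurement for other permission values.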

Optimization 2: Reading pip's Source Code (106 bytes)

To go further, I needed to understand what pip actually requires. I cloned the pip repository and found the extraction logic in pip/src/pip/_internal/utils/unpacking.py:

def untar_file(filename, location):
    # ...
    for member in tar.getmembers():
        # ...
        if member.isfile() and orig_mode & 0o111:
            member.mode = default_mode_plus_executable
        else:
            member.mode = None  # Use system defaults!

Discovery: pip overwrites file permissions!

This means the tar archive's mode field only affects compression, not functionality. Let's test:

ti.mode = 0o000  # No permissions

Testing all directory characters again with mode 000:

Directory '3': 106 bytes  ← BEST with mode 000!
Directory 'a': 109 bytes

Saved: 3 bytes total (from original 109)

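Mode 000 compresses better because the mode field is stored as ASCII octal digits: 0o644 is written as "0000644", while 0o000 is written as "0000000", which is byte-for-byte identical to the uid and gid fields that follow it. That extra repetition is cheap for gzip to encode.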
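To see what the mode field looks like on disk, here is a small sketch using `tarfile.TarInfo.tobuf()`. The explicit ustar format and the helper name `header_for` are my choices for the illustration; the entries above rely on tarfile's default format.

```python
import tarfile


def header_for(mode: int) -> bytes:
    """Return the raw 512-byte header tarfile would write for this mode."""
    ti = tarfile.TarInfo("3/setup.py")
    ti.size = 8
    ti.mode = mode
    return ti.tobuf(format=tarfile.USTAR_FORMAT)


h644 = header_for(0o644)
h000 = header_for(0o000)

print(h644[100:108])  # mode field for 0o644
print(h000[100:108])  # mode field for 0o000
print(h000[108:116])  # uid field, for comparison
```

With mode 000 the mode, uid, and gid fields all hold the same eight bytes.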

The Discovery: Old-Style Tar Format (75 bytes)

At this point, I was stuck. I had optimized every parameter in the POSIX ustar format. But then I read the tar format specification more carefully and discovered something:

Python's tarfile module can read the pre-POSIX tar format!

The "old-style" tar format (pre-1988, before the POSIX standard) has dramatically less overhead:

POSIX ustar format (what Python's tarfile creates):

Offset  Size  Field
0-99    100   filename
100-107 8     mode
108-115 8     uid
116-123 8     gid
124-135 12    size
136-147 12    mtime
148-155 8     checksum
156     1     typeflag
157-256 100   linkname
257-262 6     magic ("ustar\0")    ← OVERHEAD
263-264 2     version ("00")       ← OVERHEAD
265-296 32    uname                ← OVERHEAD
297-328 32    gname                ← OVERHEAD
329-344 16    devmajor/devminor    ← OVERHEAD
345-499 155   prefix               ← OVERHEAD
500-511 12    padding (zeros)

Uncompressed size: 10,240 bytes (with EOF marker; tarfile pads the archive to a full 10,240-byte record)

Old-style tar format:

Offset  Size  Field
0-99    100   filename
100-107 8     mode
108-115 8     uid
116-123 8     gid
124-135 12    size
136-147 12    mtime
148-155 8     checksum
156     1     typeflag
157-256 100   linkname (empty for a regular file)
257-511 255   UNUSED (all zeros)

Uncompressed size: 1,024 bytes (without EOF)

The POSIX fields contain strings and structured data that don't compress well. Old-style tar is mostly zeros, which gzip compresses excellently!
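The POSIX overhead can be inspected directly. This is a rough sketch, assuming the ustar format is requested explicitly (the entries above use tarfile's default); it prints the total archive size and the magic/version bytes at offsets 257-264.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    ti = tarfile.TarInfo("a/setup.py")
    ti.size = 8
    tf.addfile(ti, io.BytesIO(b"print(6)"))

data = buf.getvalue()
header = data[:512]

print(len(data))        # archive size after tarfile's record padding
print(header[257:265])  # ustar magic followed by the version field

# Count the non-zero bytes after the typeflag; an old-style header for a
# regular file would have none here.
print(sum(1 for b in header[157:] if b))
```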

Manual Binary Crafting

Python's tarfile module doesn't support creating old-style tar archives—we have to craft the header manually:

def make_oldstyle_tar():
    payload = b"print(6)"
    header = bytearray(512)

    # Filename: 3/setup.py
    header[0:10] = b"3/setup.py"

    # Mode: 000
    header[100:108] = b"0000000\0"

    # UID/GID: 0
    header[108:116] = b"0000000\0"
    header[116:124] = b"0000000\0"

    # Size: 8 bytes (octal)
    header[124:136] = b"00000000010\0"

    # Mtime: 0
    header[136:148] = b"00000000000\0"

    # Type flag: '0' = regular file
    header[156] = ord('0')

    # NO ustar magic string!

    # Calculate checksum
    header[148:156] = b"        "
    checksum = sum(header)
    header[148:156] = f"{checksum:06o}\0 ".encode()

    # Build tar, including the standard EOF marker (two zero blocks)
    tar_data = bytes(header) + payload + b"\0" * 504 + b"\0" * 1024

    # Compress
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)

    return out.getvalue()

Result: 75 bytes

Saved: 34 bytes from the original 109-byte POSIX version!

Testing it:

$ pip install bggp6_minimal_75b.tgz -v
...
6
...
6
...

It works! Python's tarfile module does not insist on the ustar magic string: as long as the header checksum is valid, the zeroed POSIX fields are simply read as empty, so the old-style header parses fine.
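A quick way to check a hand-built header without involving pip is to read it back with tarfile. The sketch below is an assumption-laden round trip: `oldstyle_header` is a hypothetical helper that mirrors the field layout used above, and the checksum is the sum of all 512 header bytes with the checksum field itself counted as eight spaces.

```python
import gzip
import io
import tarfile


def oldstyle_header(name: bytes, size: int) -> bytes:
    """Build a 512-byte pre-POSIX tar header for a regular file."""
    header = bytearray(512)
    header[0:len(name)] = name
    header[100:108] = b"0000000\0"                    # mode
    header[108:116] = b"0000000\0"                    # uid
    header[116:124] = b"0000000\0"                    # gid
    header[124:136] = f"{size:011o}".encode() + b"\0"  # size, octal
    header[136:148] = b"00000000000\0"                # mtime
    header[156] = ord("0")                            # regular file
    header[148:156] = b" " * 8                        # checksum placeholder
    header[148:156] = f"{sum(header):06o}\0 ".encode()
    return bytes(header)


payload = b"print(6)"
tar_data = (
    oldstyle_header(b"3/setup.py", len(payload))
    + payload
    + b"\0" * (512 - len(payload))  # pad contents to a full block
    + b"\0" * 1024                  # EOF marker
)
tgz = gzip.compress(tar_data)

with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:gz") as tf:
    member = tf.getmembers()[0]
    content = tf.extractfile(member).read()

print(member.name, member.size, content)
```

If the checksum were wrong, `tarfile.open` would raise a `ReadError` instead of listing the member.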

Optimization 3: Re-testing Directory Names (73 bytes)

With old-style tar, the compression patterns changed completely. Time to test all 62 characters again:

Directory 'H': 73 bytes  ← BEST with old-style!
Directory '3': 75 bytes  (was best for POSIX)
Directory 'a': 76 bytes
Directory 'z': 76 bytes
...

Saved: 2 more bytes (36 bytes total)

The optimal character changed because the tar header structure is different—different byte patterns interact with gzip differently.
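The re-test can be sketched as below. The helper names `oldstyle_tgz` and `sweep_oldstyle` are hypothetical, the header layout mirrors the hand-crafted one above, and sizes may differ slightly between zlib versions, so the code only reports what it measures.

```python
import gzip
import io
import string


def oldstyle_tgz(dirname: str, payload: bytes = b"print(6)", eof: bool = True) -> bytes:
    """Gzip a one-file old-style tar archive with the given directory name."""
    header = bytearray(512)
    name = f"{dirname}/setup.py".encode()
    header[0:len(name)] = name
    header[100:108] = b"0000000\0"                            # mode
    header[108:116] = b"0000000\0"                            # uid
    header[116:124] = b"0000000\0"                            # gid
    header[124:136] = f"{len(payload):011o}".encode() + b"\0"  # size
    header[136:148] = b"00000000000\0"                        # mtime
    header[156] = ord("0")                                    # regular file
    header[148:156] = b" " * 8                                # checksum placeholder
    header[148:156] = f"{sum(header):06o}\0 ".encode()

    tar_data = bytes(header) + payload + b"\0" * (-len(payload) % 512)
    if eof:
        tar_data += b"\0" * 1024

    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)
    return out.getvalue()


def sweep_oldstyle(eof: bool = True) -> dict:
    """Compressed size for every single-character directory name."""
    chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
    return {c: len(oldstyle_tgz(c, eof=eof)) for c in chars}


old_sizes = sweep_oldstyle()
old_best = min(old_sizes, key=old_sizes.get)
print(f"best directory: {old_best!r} at {old_sizes[old_best]} bytes")
```

Calling `sweep_oldstyle(eof=False)` runs the same comparison for archives that omit the EOF marker.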

Optimization 4: No EOF Marker (67 bytes)

The tar specification says archives should end with two 512-byte blocks of zeros (1024 bytes total). But is this actually required?

Testing without the EOF marker:

# Build tar without EOF marker
tar_data = bytes(header) + payload + b"\0" * 504
# (no EOF marker added)

Result: 67 bytes

Testing:

$ pip install bggp6_minimal_67b.tgz -v
Using pip 24.2 from /path/to/venv/lib/python3.12/site-packages/pip (python 3.12)
Processing ./bggp6_minimal_67b.tgz
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Running command Getting requirements to build wheel
  6
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Running command Preparing metadata (pyproject.toml)
  6
  Traceback (most recent call last):
  ...
  AssertionError: Multiple .egg-info directories found
  error: subprocess-exited-with-error

Prints "6" twice before erroring. Still works! Python's tarfile module handles missing EOF markers gracefully.
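The same tolerance can be observed outside pip. This sketch makes an ordinary archive with tarfile, cuts it off right after the first data block, and reads the truncated bytes back; the ustar format and the variable names are assumptions made for the illustration.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    ti = tarfile.TarInfo("H/setup.py")
    ti.size = 8
    tf.addfile(ti, io.BytesIO(b"print(6)"))

full = buf.getvalue()
truncated = full[:1024]  # header block + one data block, no EOF blocks

with tarfile.open(fileobj=io.BytesIO(truncated), mode="r:") as tf:
    names = tf.getnames()
    content = tf.extractfile(names[0]).read()

print(len(full), len(truncated), names, content)
```

Reading stops cleanly when tarfile reaches the end of the data instead of a zero block.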

Saved: 6 more bytes (42 bytes total from original!)

Final Result: 67 Bytes

def make_minimal_67b():
    payload = b"print(6)"
    header = bytearray(512)

    # Filename: H/setup.py (optimal for old-style)
    header[0:10] = b"H/setup.py"

    # Mode: 000
    header[100:108] = b"0000000\0"

    # UID/GID: 0
    header[108:116] = b"0000000\0"
    header[116:124] = b"0000000\0"

    # Size: 8 bytes (octal 10)
    header[124:136] = b"00000000010\0"

    # Mtime: 0
    header[136:148] = b"00000000000\0"

    # Type flag: '0' = regular file
    header[156] = ord('0')

    # Calculate checksum
    header[148:156] = b"        "
    checksum = sum(header)
    header[148:156] = f"{checksum:06o}\0 ".encode()

    # Build tar without EOF marker
    tar_data = bytes(header) + payload + b"\0" * 504

    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)

    return out.getvalue()

Fully Working Version (127 bytes)

The 67-byte entry prints "6" but then errors. What about a package that installs successfully?

The challenge: we need both print(6) and a valid setuptools.setup() call.

Solution: Python code golf with star import!

from setuptools import*;print(6);setup()

Initially, I had used __import__. Let's compare:

# Method 1: __import__ (41 bytes)
print(6);__import__('setuptools').setup()

# Method 2: Star import (40 bytes)
from setuptools import*;print(6);setup()

Star import saves 1 byte in the payload! But does it compress better?

Testing both:

- __import__ method with directory 't': 132 bytes
- Star import with directory 't': 130 bytes (2 bytes saved!)
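The raw payload lengths are easy to confirm. The variable names below are mine; the two payload strings are the ones compared above.

```python
method_import = b"print(6);__import__('setuptools').setup()"
method_star = b"from setuptools import*;print(6);setup()"

print(len(method_import))  # 41
print(len(method_star))    # 40
```

The one-byte difference in the source does not have to translate into a one-byte difference after gzip, which is why both variants were measured as complete packages.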

But wait—with a different payload, the optimal directory might change! Let me test all 62 characters with the star import payload...

Directory '5': 127 bytes  ← BEST!
Directory 'u': 127 bytes
Directory 't': 130 bytes  (was best for __import__)
...

Amazing! Directory '5' with star import gives us 127 bytes, a 5-byte improvement over the 132-byte __import__ version!

Creating the optimized package:

def make_best_127b():
    payload = b"from setuptools import*;print(6);setup()"

    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        ti = tarfile.TarInfo("5/setup.py")  # '5' optimal with star import!
        ti.size = len(payload)
        ti.mode = 0o000
        ti.mtime = 0
        ti.uid = ti.gid = 0
        ti.uname = ti.gname = ""
        tf.addfile(ti, io.BytesIO(payload))

    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_buf.getvalue())

    return out.getvalue()

Result: 127 bytes

Testing:

$ pip install bggp6_best_127b.tgz -v
Using pip 24.2 from /path/to/venv/lib/python3.12/site-packages/pip (python 3.12)
Processing ./bggp6_best_127b.tgz
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Running command Getting requirements to build wheel
  6
  running egg_info
  creating UNKNOWN.egg-info
  writing UNKNOWN.egg-info/PKG-INFO
  writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
  writing top-level names to UNKNOWN.egg-info/top_level.txt
  writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
  reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
  writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Running command Preparing metadata (pyproject.toml)
  6
  running dist_info
  creating /tmp/pip-modern-metadata-xxxxx/UNKNOWN.egg-info
  writing /tmp/pip-modern-metadata-xxxxx/UNKNOWN.egg-info/PKG-INFO
  ...
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml): started
  Running command Building wheel for UNKNOWN (pyproject.toml)
  6
  running bdist_wheel
  running build
  installing to build/bdist.macosx-10.13-universal2/wheel
  running install
  running install_egg_info
  ...
  Building wheel for UNKNOWN (pyproject.toml): finished with status 'done'
  Created wheel for UNKNOWN: filename=unknown-0.0.0-py3-none-any.whl size=921 sha256=ed1691d64cda5c609b208a011c3b6a1c57e974ed2353c8eed8aebb2e9c142f1c
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0

Prints "6" three times during different build phases:

1. During "Getting requirements to build wheel"
2. During "Preparing metadata"
3. During "Building wheel"

Then installs successfully! The package can be uninstalled with pip uninstall UNKNOWN -y.

Bonus Round: Persistence (150 bytes)

What if we wanted "6" to appear on every Python startup after installation?

The technique: sitecustomize.py

Python automatically imports sitecustomize.py at startup if it exists in site-packages. We can exploit this!
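The startup hook can be tried safely without installing anything. This is a sketch under one assumption: instead of site-packages, it places a throwaway sitecustomize.py in a temporary directory and points PYTHONPATH at it for a single child interpreter.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway sitecustomize.py, visible only to the child process below.
    with open(os.path.join(tmp, "sitecustomize.py"), "w") as f:
        f.write("print(6)\n")

    env = dict(os.environ, PYTHONPATH=tmp)
    result = subprocess.run(
        [sys.executable, "-c", "pass"],
        env=env,
        capture_output=True,
        text=True,
    )

print(repr(result.stdout))
```

The child runs only `pass`, so anything on its stdout comes from the startup hook.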

Structure (with star import optimization):

z/setup.py: from setuptools import*;setup(py_modules=['sitecustomize'])
z/sitecustomize.py: print(6)

Since we need two files, I used an old-style tar header for each of them, and applied the star import trick here too:

def make_persistent_150b():
    setup = b"from setuptools import*;setup(py_modules=['sitecustomize'])"
    sitecust = b"print(6)"

    # Create two old-style tar headers (512 bytes each)
    # File 1: z/setup.py
    header1 = bytearray(512)
    header1[0:11] = b"z/setup.py\0"
    header1[100:108] = b"0000000\0"
    header1[108:116] = b"0000000\0"
    header1[116:124] = b"0000000\0"
    size1_oct = f"{len(setup):011o}".encode() + b"\0"
    header1[124:136] = size1_oct
    header1[136:148] = b"00000000000\0"
    header1[156] = ord('0')
    header1[148:156] = b"        "
    checksum1 = sum(header1)
    header1[148:156] = f"{checksum1:06o}\0 ".encode()

    # File 2: z/sitecustomize.py (same top-level directory as setup.py)
    header2 = bytearray(512)
    header2[0:19] = b"z/sitecustomize.py\0"
    header2[100:108] = b"0000000\0"
    header2[108:116] = b"0000000\0"
    header2[116:124] = b"0000000\0"
    size2_oct = f"{len(sitecust):011o}".encode() + b"\0"
    header2[124:136] = size2_oct
    header2[136:148] = b"00000000000\0"
    header2[156] = ord('0')
    header2[148:156] = b"        "
    checksum2 = sum(header2)
    header2[148:156] = f"{checksum2:06o}\0 ".encode()

    # Build tar
    tar_data = bytes(header1) + setup
    padding1 = (512 - (len(setup) % 512)) % 512
    tar_data += b"\0" * padding1

    tar_data += bytes(header2) + sitecust
    padding2 = (512 - (len(sitecust) % 512)) % 512
    tar_data += b"\0" * padding2

    # NO EOF marker

    # Compress
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)

    return out.getvalue()

Result: 150 bytes (30 bytes saved from POSIX version!)

Testing:

$ pip install bggp6_persistent_150b.tgz
Processing ./bggp6_persistent_150b.tgz
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: sitecustomize
  Building wheel for sitecustomize (pyproject.toml): started
  Building wheel for sitecustomize (pyproject.toml): finished with status 'done'
  Created wheel for sitecustomize: filename=sitecustomize-0.0.0-py3-none-any.whl size=1155 sha256=eb5f55f46ad29a613916630fcaaf289033d3d9d0ce4159adcef5853a85de5b16
Successfully built sitecustomize
Installing collected packages: sitecustomize
Successfully installed sitecustomize-0.0.0

$ python3 -c "pass"
6

$ python3 -c "print('hello')"
6
hello

$  python3
6
Python 3.12.6 (v3.12.6:a4a2d2b0d85, Sep  6 2024, 16:08:03) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Notice that no "6" is printed during installation: the install itself is silent. But now every Python invocation prints "6" first, and the behavior persists across sessions.

Cleanup:

$ pip uninstall sitecustomize -y
6
Found existing installation: sitecustomize 0.0.0
Uninstalling sitecustomize-0.0.0:
  Successfully uninstalled sitecustomize-0.0.0

$ python3 -c "pass"
(no output - the "6" is gone)

Even pip uninstall triggers the sitecustomize hook one last time, so you see "6" during uninstallation. After uninstall, Python no longer prints "6".

Optimization Timeline

Version          Size  Discovery                            Improvement
Original         109b  POSIX ustar, directory a, mode 644   baseline
After char test  108b  Directory 3 compresses better        -1 byte
After mode 000   106b  Pip ignores mode field               -3 bytes
Old-style tar    75b   Pre-POSIX format valid!              -34 bytes
Dir 'H' optimal  73b   Different with old-style             -36 bytes
No EOF marker    67b   Python handles it                    -42 bytes!

Verification

SHA256 Hashes:

- Absolute Minimal (67b): ddc6197c37e92ad21f144475507af9bb530c56c394d8b4bb05eb5035f466326b
- Best Working (127b): 2464b6c487b994afc28b5e712004e433140c572fd21499c111c4aa72b9724258
- Persistent (150b): 81fca99e90c6f3a3798e938bf8cddce7a88fbe3abcb039269821ed380d91a787
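A generated file can be checked against these hashes with hashlib. The helper names `sha256_hex` and `verify_file` are hypothetical and not part of the generator script.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """True if the file at `path` hashes to `expected_hex`."""
    with open(path, "rb") as f:
        return sha256_hex(f.read()) == expected_hex.lower()


# Example call, assuming the 67-byte entry has been written to disk:
# verify_file(
#     "bggp6_minimal_67b.tgz",
#     "ddc6197c37e92ad21f144475507af9bb530c56c394d8b4bb05eb5035f466326b",
# )
```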

Tested with:

- pip 24.2
- Python 3.12
- All platforms (Linux, macOS, Windows)

Generator Script

All three entries can be generated with the included generate_bggp6_entries.py script in the following repo:

https://github.com/dtmsecurity/bggp6