# ./rap.sh > Python Pip Golf (November 12, 2025)

A Tale of pip and tar: I Shrank the Wheel
## Binary Golf Grand Prix 6: The Quest for the Smallest Pip Package
For Binary Golf Grand Prix 6 (BGGP6), one of the options for entries was to create the smallest file of a given type that prints, returns, or otherwise displays “6”. I chose to explore pip packages, as I'd recently been looking at pip package indexes. I decided to look at pip-compatible `.tar.gz` files specifically and went down a rabbit hole of optimisation that took me from 109 bytes down to just 67 bytes. This is the story of that journey.
## The Challenge
A pip package seems like an unlikely candidate for code golf. The format requires:

- Valid gzip compression
- Valid tar archive structure
- A `setup.py` file that pip can execute
- Tar headers (512 bytes each)
- Tar end-of-archive markers (1024 bytes)

The theoretical minimum for a tar.gz is substantial. A single file requires:

- A 512-byte header
- The file contents
- Padding to a 512-byte boundary
- 1024 bytes of EOF markers
That's at least 2 KB (2,048 bytes) uncompressed, even for an 8-byte payload. But gzip compression can dramatically reduce this. The question becomes: what's the smallest compressed package that pip will process?
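The block arithmetic above is easy to check. Below is a minimal sketch using only the standard library. `min_tar_size` and `tarfile_size` are hypothetical helpers, not part of the original script. The second helper also shows that Python's `tarfile` pads its output even further, up to a full 10,240-byte record:

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes

def min_tar_size(payload_len: int) -> int:
    """Header + payload padded to whole blocks + two zero-filled EOF blocks."""
    data_blocks = -(-payload_len // BLOCK)  # ceiling division
    return BLOCK + data_blocks * BLOCK + 2 * BLOCK

def tarfile_size(payload: bytes) -> int:
    """Length of the uncompressed archive Python's tarfile writes for one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        ti = tarfile.TarInfo("a/setup.py")
        ti.size = len(payload)
        tf.addfile(ti, io.BytesIO(payload))
    return len(buf.getvalue())

print(min_tar_size(len(b"print(6)")))  # 2048
print(tarfile_size(b"print(6)"))       # 10240: padded to a full 20-block record
```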
## Entry 1: The Naive Approach (109 bytes)
Let's start with the obvious solution. Create a tar.gz with a single `setup.py` containing `print(6)`:

```python
import io, tarfile, gzip

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    ti = tarfile.TarInfo("a/setup.py")
    ti.size = 8
    ti.mode = 0o644
    ti.mtime = 0
    ti.uid = ti.gid = 0
    ti.uname = ti.gname = ""
    tf.addfile(ti, io.BytesIO(b"print(6)"))

out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
    gz.write(tar_buf.getvalue())

with open("bggp6_minimal_109b.tgz", "wb") as f:
    f.write(out.getvalue())
```

Result: 109 bytes
Testing it:
```
$ pip install bggp6_minimal_109b.tgz -v
...
Getting requirements to build wheel: started
Running command Getting requirements to build wheel
6
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)
6
Traceback (most recent call last):
...
AssertionError: Multiple .egg-info directories found
```

Success! The package prints "6" twice before erroring out. The error itself is expected. This `setup.py` never calls `setup()`, so no `.egg-info` directory gets created, and setuptools' build backend apparently reports the missing directory with this misleading "Multiple" message. The `-v` flag is crucial: pip suppresses stdout by default.

## Optimization 1: Directory Names (108 bytes)
I wondered: does the directory name affect compression? Let's test all 62 alphanumeric characters:
```python
for char in "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ":
    # Generate package with directory f"{char}/setup.py"
    # Measure compressed size
    ...
```

Results:

```
Directory '3': 108 bytes  ← BEST!
Directory 'a': 109 bytes  (original)
Directory 'B': 110 bytes
Directory 'z': 109 bytes
...
```

Saved: 1 byte
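Why does '3' compress better than 'a'? It comes down to how gzip's DEFLATE algorithm encodes the header. The directory character is one more literal symbol for its Huffman table, and it also shifts the header checksum, whose octal digits are stored at offset 148. A character like '3' (0x33) sits among the ASCII digits that already fill the header's numeric fields, which may make it cheaper to encode. In practice the net effect is hard to predict by hand, which is why brute-forcing all 62 candidates is the practical approach.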
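The sweep is easy to reproduce. Below is a minimal sketch that wraps the Entry 1 generator in a hypothetical `build_posix_package(dirname, mode)` helper. Passing `mode=0o000` repeats the second sweep. The exact byte counts may depend on the zlib build, so your numbers may differ slightly from the ones above:

```python
import gzip
import io
import string
import tarfile

def build_posix_package(dirname: str, mode: int = 0o644) -> bytes:
    """Hypothetical helper: the Entry 1 generator with a configurable directory and mode."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        ti = tarfile.TarInfo(f"{dirname}/setup.py")
        ti.size = 8
        ti.mode = mode
        ti.mtime = 0
        ti.uid = ti.gid = 0
        ti.uname = ti.gname = ""
        tf.addfile(ti, io.BytesIO(b"print(6)"))
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_buf.getvalue())
    return out.getvalue()

def sweep(mode: int = 0o644):
    """Map each of the 62 alphanumeric directory names to its compressed size."""
    chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
    return {c: len(build_posix_package(c, mode)) for c in chars}

sizes = sweep()            # use sweep(0o000) for the mode-000 run
best = min(sizes, key=sizes.get)
print(f"best directory {best!r}: {sizes[best]} bytes")
```

`min()` returns the first character with the smallest size, so ties are broken by the order of the 62-character alphabet.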
## Optimization 2: Reading pip's Source Code (106 bytes)
To go further, I needed to understand what pip actually requires. I cloned the pip repository and found the extraction logic in `pip/src/pip/_internal/utils/unpacking.py`:

```python
def untar_file(filename, location):
    # ...
    for member in tar.getmembers():
        # ...
        if member.isfile() and orig_mode & 0o111:
            member.mode = default_mode_plus_executable
        else:
            member.mode = None  # Use system defaults!
```

Discovery: pip overwrites file permissions! The only thing it keeps from the archived mode is whether any execute bit is set.
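To make the rule concrete, here is a simplified model of the excerpt. It is a sketch built on assumptions, not pip's code. The hypothetical `mode_after_pip_extract` assumes a umask of 022. It also assumes that a file without an execute bit ends up with the default mode a plain `open()` would give it:

```python
def mode_after_pip_extract(orig_mode: int, umask: int = 0o022) -> int:
    """Simplified model (not pip's code) of the permission rule in the excerpt."""
    if orig_mode & 0o111:
        # Any execute bit set: "default mode plus executable".
        return (0o777 & ~umask) | 0o111
    # Otherwise the archived mode is dropped and the default file mode applies.
    return 0o666 & ~umask

for archived in (0o644, 0o000, 0o600, 0o755):
    print(oct(archived), "->", oct(mode_after_pip_extract(archived)))
```

Under this model, 0o644, 0o600 and 0o000 all come out the same. That leaves the mode field free to be chosen purely for compression.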
This means the tar archive's mode field only affects compression, not functionality. Let's test:
```python
ti.mode = 0o000  # No permissions
```

Testing all directory characters again with mode 000:
```
Directory '3': 106 bytes  ← BEST with mode 000!
Directory 'a': 109 bytes
```

Saved: 3 bytes total (from original 109)
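Mode 000 compresses better because of how tar stores numbers. Each numeric field is ASCII octal text, so mode 000 becomes `0000000\0`, byte-for-byte identical to the uid and gid fields right after it. DEFLATE can then encode the repeats as back-references. What helps here are runs of the character '0' (byte 0x30), not zero bits.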
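You can see the repetition by asking `tarfile` for the raw header bytes. Below is a small sketch. `id_fields` is a hypothetical helper that returns header bytes 100 to 123 (mode, uid, gid) exactly as `TarInfo.tobuf()` writes them in ustar format:

```python
import tarfile

def id_fields(mode: int) -> bytes:
    """Hypothetical helper: header bytes 100-123 (mode, uid, gid) as tarfile writes them."""
    ti = tarfile.TarInfo("3/setup.py")
    ti.size = 8
    ti.mode = mode
    return ti.tobuf(format=tarfile.USTAR_FORMAT)[100:124]

print(id_fields(0o644))  # b'0000644\x000000000\x000000000\x00'
print(id_fields(0o000))  # b'0000000\x000000000\x000000000\x00'
```

With mode 000, those 24 bytes are a single 8-byte string repeated three times.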
## The Discovery: Old-Style Tar Format (75 bytes)
At this point, I was stuck. I had optimized every parameter in the POSIX ustar format. But then I read the tar format specification more carefully and discovered something:
Python's tarfile module can still read the pre-POSIX tar format!
The "old-style" tar format (pre-1988, before the POSIX standard) has dramatically less overhead:
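POSIX ustar format (what Python's tarfile creates):

| Offset | Size | Field | Note |
|---|---|---|---|
| 0-99 | 100 | filename | |
| 100-107 | 8 | mode | |
| 108-115 | 8 | uid | |
| 116-123 | 8 | gid | |
| 124-135 | 12 | size | |
| 136-147 | 12 | mtime | |
| 148-155 | 8 | checksum | |
| 156 | 1 | typeflag | |
| 157-256 | 100 | linkname | |
| 257-262 | 6 | magic (`ustar\0`) | overhead |
| 263-264 | 2 | version (`00`) | overhead |
| 265-296 | 32 | uname | overhead |
| 297-328 | 32 | gname | overhead |
| 329-344 | 16 | devmajor/devminor | overhead |
| 345-499 | 155 | prefix | overhead |
| 500-511 | 12 | padding (zeros) | |

Uncompressed size: 10,240 bytes (with EOF and tarfile's record padding)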
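Old-style tar format:

| Offset | Size | Field |
|---|---|---|
| 0-99 | 100 | filename |
| 100-107 | 8 | mode |
| 108-115 | 8 | uid |
| 116-123 | 8 | gid |
| 124-135 | 12 | size |
| 136-147 | 12 | mtime |
| 148-155 | 8 | checksum |
| 156 | 1 | typeflag |
| 157-256 | 100 | linkname (empty here) |
| 257-511 | 255 | unused (all zeros) |

Uncompressed size: 1,024 bytes before the EOF marker (one header block plus one padded data block), 2,048 bytes with it.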
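The POSIX fields add structured, non-zero bytes, at minimum the `ustar\0` magic and the `00` version, that an old-style header simply leaves out. The uncompressed archive also shrinks. Python's tarfile pads its output to a 10,240-byte record, while the hand-built old-style archive is 2,048 bytes even with the EOF marker. What's left is mostly zeros, which gzip compresses excellently!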
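A quick way to see what the old-style header leaves out is to inspect everything after offset 257 in a header built by `tarfile`. Below is a small sketch using `TarInfo.tobuf()` with `tarfile.USTAR_FORMAT`. The exact count of non-zero bytes in that region may vary between Python versions:

```python
import tarfile

ti = tarfile.TarInfo("3/setup.py")
ti.size = 8
header = ti.tobuf(format=tarfile.USTAR_FORMAT)

print(len(header))      # 512
print(header[257:265])  # b'ustar\x0000' (magic + version)
# How many bytes past offset 257 are non-zero; an old-style header has none.
print(sum(1 for b in header[257:512] if b != 0))
```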
### Manual Binary Crafting
Python's `tarfile` module doesn't support creating old-style tar archives, so we have to craft the header manually:

```python
import gzip
import io

def make_oldstyle_tar():
    payload = b"print(6)"
    header = bytearray(512)
    # Filename: 3/setup.py
    header[0:10] = b"3/setup.py"
    # Mode: 000
    header[100:108] = b"0000000\0"
    # UID/GID: 0
    header[108:116] = b"0000000\0"
    header[116:124] = b"0000000\0"
    # Size: 8 bytes (octal)
    header[124:136] = b"00000000010\0"
    # Mtime: 0
    header[136:148] = b"00000000000\0"
    # Type flag: '0' = regular file
    header[156] = ord('0')
    # NO ustar magic string!
    # Calculate checksum: the checksum field counts as eight spaces while summing
    header[148:156] = b" " * 8
    checksum = sum(header)
    header[148:156] = f"{checksum:06o}\0 ".encode()
    # Build tar: header, payload padded to 512 bytes, then the 1024-byte EOF marker
    tar_data = bytes(header) + payload + b"\0" * 504 + b"\0" * 1024
    # Compress
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)
    return out.getvalue()
```

Result: 75 bytes
Saved: 34 bytes compared with the original 109-byte POSIX version (31 compared with the 106-byte one)!
Testing it:
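```
$ pip install bggp6_minimal_75b.tgz -v
...
6
...
6
...
```

It works! Python's tarfile reader doesn't require the ustar magic. It validates the header checksum and reads the classic fields at their fixed offsets. Without the magic, the ustar-only fields (owner names, device numbers, prefix) simply come back empty or zero.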
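The header can be checked without pip at all. Below is a minimal sketch. The hypothetical `v7_header()` rebuilds the old-style header with the same field values as above. `tarfile.TarInfo.frombuf()`, the classmethod tarfile uses to parse each 512-byte header, accepts it even though there is no magic. It would raise an error if the checksum were wrong:

```python
import tarfile

def v7_header(name: bytes, size: int) -> bytes:
    """Hypothetical helper: an old-style header with the same fields as above."""
    h = bytearray(512)
    h[0:len(name)] = name
    h[100:124] = b"0000000\0" * 3                  # mode, uid, gid
    h[124:136] = f"{size:011o}\0".encode()         # size in octal
    h[136:148] = b"00000000000\0"                  # mtime
    h[156] = ord("0")                              # typeflag: regular file
    # The checksum is the byte sum with the checksum field itself
    # counted as eight spaces, written as 6 octal digits, NUL, space.
    h[148:156] = b" " * 8
    h[148:156] = f"{sum(h):06o}\0 ".encode()
    return bytes(h)

hdr = v7_header(b"3/setup.py", 8)
# frombuf() raises a tarfile.HeaderError subclass if the checksum is wrong.
info = tarfile.TarInfo.frombuf(hdr, "utf-8", "surrogateescape")
print(info.name, info.size, repr(info.uname))  # 3/setup.py 8 ''
```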
## Optimization 3: Re-testing Directory Names (73 bytes)
With old-style tar, the compression patterns changed completely. Time to test all 62 characters again:
```
Directory 'H': 73 bytes  ← BEST with old-style!
Directory '3': 75 bytes  (was best for POSIX)
Directory 'a': 76 bytes
Directory 'z': 76 bytes
...
```

Saved: 2 more bytes (36 bytes total)
The optimal character changed because the bytes around it changed. With the ustar fields gone and a different checksum value, a different character happens to give DEFLATE the cheapest encoding.
## Optimization 4: No EOF Marker (67 bytes)
The tar specification says archives should end with two 512-byte blocks of zeros (1024 bytes total). But is this actually required?
Testing without the EOF marker:
```python
# Build tar without EOF marker
tar_data = bytes(header) + payload + b"\0" * 504
# (no EOF marker added)
```

Result: 67 bytes
Testing:
```
$ pip install bggp6_minimal_67b.tgz -v
Using pip 24.2 from /path/to/venv/lib/python3.12/site-packages/pip (python 3.12)
Processing ./bggp6_minimal_67b.tgz
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Running command Getting requirements to build wheel
6
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)
6
Traceback (most recent call last):
...
AssertionError: Multiple .egg-info directories found
error: subprocess-exited-with-error
```

Prints "6" twice before erroring. Still works! Python's tarfile module handles missing EOF markers gracefully: running out of data exactly at a header boundary is treated as the end of the archive.
Saved: 6 more bytes (42 bytes total from original!)
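Both claims can be checked side by side: the end-of-archive blocks cost bytes, and tarfile doesn't need them. Below is a minimal sketch. `oldstyle_tgz` and `read_members` are hypothetical helpers that mirror the generator above and open the archive with `tarfile` in `"r:gz"` mode, roughly as pip does. The exact sizes may vary with the zlib version:

```python
import gzip
import io
import tarfile

def oldstyle_tgz(name: bytes, payload: bytes, eof: bool) -> bytes:
    """Hypothetical helper mirroring make_minimal_67b(), with optional EOF blocks."""
    h = bytearray(512)
    h[0:len(name)] = name
    h[100:124] = b"0000000\0" * 3                     # mode, uid, gid
    h[124:136] = f"{len(payload):011o}\0".encode()    # size in octal
    h[136:148] = b"00000000000\0"                     # mtime
    h[156] = ord("0")                                 # typeflag: regular file
    h[148:156] = b" " * 8                             # checksum counted as spaces
    h[148:156] = f"{sum(h):06o}\0 ".encode()
    tar = bytes(h) + payload + b"\0" * (-len(payload) % 512)
    if eof:
        tar += b"\0" * 1024                           # two zero-filled EOF blocks
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar)
    return out.getvalue()

def read_members(tgz: bytes):
    """Open the archive with tarfile in 'r:gz' mode and return (name, contents) pairs."""
    with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:gz") as tf:
        return [(m.name, tf.extractfile(m).read()) for m in tf.getmembers()]

with_eof = oldstyle_tgz(b"H/setup.py", b"print(6)", eof=True)
without_eof = oldstyle_tgz(b"H/setup.py", b"print(6)", eof=False)
print(len(with_eof), len(without_eof))
print(read_members(without_eof))  # [('H/setup.py', b'print(6)')]
```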
## Final Result: 67 Bytes
```python
import gzip
import io

def make_minimal_67b():
    payload = b"print(6)"
    header = bytearray(512)
    # Filename: H/setup.py (optimal for old-style)
    header[0:10] = b"H/setup.py"
    # Mode: 000
    header[100:108] = b"0000000\0"
    # UID/GID: 0
    header[108:116] = b"0000000\0"
    header[116:124] = b"0000000\0"
    # Size: 8 bytes (octal 10)
    header[124:136] = b"00000000010\0"
    # Mtime: 0
    header[136:148] = b"00000000000\0"
    # Type flag: '0' = regular file
    header[156] = ord('0')
    # Calculate checksum (checksum field counted as eight spaces)
    header[148:156] = b" " * 8
    checksum = sum(header)
    header[148:156] = f"{checksum:06o}\0 ".encode()
    # Build tar without EOF marker
    tar_data = bytes(header) + payload + b"\0" * 504
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)
    return out.getvalue()
```

## Fully Working Version (127 bytes)
The 67-byte entry prints "6" but then errors. What about a package that installs successfully?
The challenge: we need both `print(6)` and a valid `setuptools.setup()` call.

Solution: Python code golf with star import!
```python
from setuptools import*;print(6);setup()
```

Initially, I had used `__import__`. Could a star import beat it? Let's compare:

```python
# Method 1: __import__ (41 bytes)
print(6);__import__('setuptools').setup()

# Method 2: Star import (40 bytes)
from setuptools import*;print(6);setup()
```

Star import saves 1 byte in the payload! But does it compress better?
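The byte counts are easy to double-check. So is the fact that both one-liners are still valid Python, which matters because setuptools has to execute `setup.py`. A trivial sketch:

```python
dunder = b"print(6);__import__('setuptools').setup()"
star = b"from setuptools import*;print(6);setup()"

print(len(dunder), len(star))  # 41 40

# compile() only checks syntax; nothing is imported or executed here.
for source in (dunder, star):
    compile(source, "setup.py", "exec")
```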
Testing both:

- `__import__` method with directory 't': 132 bytes
- Star import with directory 't': 130 bytes (2 bytes saved!)

But wait: with a different payload, the optimal directory might change! Let me test all 62 characters with the star import payload...
```
Directory '5': 127 bytes  ← BEST!
Directory 'u': 127 bytes
Directory 't': 130 bytes  (was best for __import__)
...
```

Amazing! Directory '5' with star import gives us 127 bytes, a 5-byte improvement!
Creating the optimized package:
```python
import gzip
import io
import tarfile

def make_best_127b():
    payload = b"from setuptools import*;print(6);setup()"
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        ti = tarfile.TarInfo("5/setup.py")  # '5' optimal with star import!
        ti.size = len(payload)
        ti.mode = 0o000
        ti.mtime = 0
        ti.uid = ti.gid = 0
        ti.uname = ti.gname = ""
        tf.addfile(ti, io.BytesIO(payload))
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_buf.getvalue())
    return out.getvalue()
```

Result: 127 bytes
Testing:
```
$ pip install bggp6_best_127b.tgz -v
Using pip 24.2 from /path/to/venv/lib/python3.12/site-packages/pip (python 3.12)
Processing ./bggp6_best_127b.tgz
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Running command Getting requirements to build wheel
6
running egg_info
creating UNKNOWN.egg-info
writing UNKNOWN.egg-info/PKG-INFO
writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
writing top-level names to UNKNOWN.egg-info/top_level.txt
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)
6
running dist_info
creating /tmp/pip-modern-metadata-xxxxx/UNKNOWN.egg-info
writing /tmp/pip-modern-metadata-xxxxx/UNKNOWN.egg-info/PKG-INFO
...
Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: UNKNOWN
Building wheel for UNKNOWN (pyproject.toml): started
Running command Building wheel for UNKNOWN (pyproject.toml)
6
running bdist_wheel
running build
installing to build/bdist.macosx-10.13-universal2/wheel
running install
running install_egg_info
...
Building wheel for UNKNOWN (pyproject.toml): finished with status 'done'
Created wheel for UNKNOWN: filename=unknown-0.0.0-py3-none-any.whl size=921 sha256=ed1691d64cda5c609b208a011c3b6a1c57e974ed2353c8eed8aebb2e9c142f1c
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0
```

Prints "6" three times during different build phases:

1. During "Getting requirements to build wheel"
2. During "Preparing metadata"
3. During "Building wheel"
Then installs successfully! The package can be uninstalled with `pip uninstall UNKNOWN -y`.

## Bonus Round: Persistence (150 bytes)
What if we wanted "6" to appear on every Python startup after installation?
### The technique: sitecustomize.py
Python automatically imports `sitecustomize.py` at startup if it finds one on `sys.path`, and site-packages is on `sys.path`. We can exploit this!

Structure (with star import optimization):
```
z/setup.py:         from setuptools import*;setup(py_modules=['sitecustomize'])
z/sitecustomize.py: print(6)
```

Since we need two files, I used the old-style tar format for both, and applied the star import trick here too:
```python
import gzip
import io

def make_persistent_150b():
    setup = b"from setuptools import*;setup(py_modules=['sitecustomize'])"
    sitecust = b"print(6)"
    # Create two old-style tar headers (512 bytes each)
    # File 1: z/setup.py
    header1 = bytearray(512)
    header1[0:11] = b"z/setup.py\0"
    header1[100:108] = b"0000000\0"
    header1[108:116] = b"0000000\0"
    header1[116:124] = b"0000000\0"
    size1_oct = f"{len(setup):011o}".encode() + b"\0"
    header1[124:136] = size1_oct
    header1[136:148] = b"00000000000\0"
    header1[156] = ord('0')
    header1[148:156] = b" " * 8  # checksum field counts as eight spaces
    checksum1 = sum(header1)
    header1[148:156] = f"{checksum1:06o}\0 ".encode()
    # File 2: z/sitecustomize.py (same top-level directory as setup.py)
    header2 = bytearray(512)
    header2[0:19] = b"z/sitecustomize.py\0"
    header2[100:108] = b"0000000\0"
    header2[108:116] = b"0000000\0"
    header2[116:124] = b"0000000\0"
    size2_oct = f"{len(sitecust):011o}".encode() + b"\0"
    header2[124:136] = size2_oct
    header2[136:148] = b"00000000000\0"
    header2[156] = ord('0')
    header2[148:156] = b" " * 8
    checksum2 = sum(header2)
    header2[148:156] = f"{checksum2:06o}\0 ".encode()
    # Build tar: each file's data is padded to a 512-byte boundary
    tar_data = bytes(header1) + setup
    padding1 = (512 - (len(setup) % 512)) % 512
    tar_data += b"\0" * padding1
    tar_data += bytes(header2) + sitecust
    padding2 = (512 - (len(sitecust) % 512)) % 512
    tar_data += b"\0" * padding2
    # NO EOF marker
    # Compress
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0, compresslevel=9) as gz:
        gz.write(tar_data)
    return out.getvalue()
```

Result: 150 bytes (30 bytes saved from POSIX version!)
Testing:
```
$ pip install bggp6_persistent_150b.tgz
Processing ./bggp6_persistent_150b.tgz
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: sitecustomize
Building wheel for sitecustomize (pyproject.toml): started
Building wheel for sitecustomize (pyproject.toml): finished with status 'done'
Created wheel for sitecustomize: filename=sitecustomize-0.0.0-py3-none-any.whl size=1155 sha256=eb5f55f46ad29a613916630fcaaf289033d3d9d0ce4159adcef5853a85de5b16
Successfully built sitecustomize
Installing collected packages: sitecustomize
Successfully installed sitecustomize-0.0.0

$ python3 -c "pass"
6
$ python3 -c "print('hello')"
6
hello
$ python3
6
Python 3.12.6 (v3.12.6:a4a2d2b0d85, Sep 6 2024, 16:08:03) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
```

Notice: no "6" is printed during installation. The installation is silent because this `setup.py` doesn't print anything. But now every Python invocation prints "6" first! The behavior persists across sessions.
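The startup hook can be observed without installing anything into site-packages. Below is a minimal sketch. It assumes a normal CPython with the `site` module enabled (no `-S`). `run_with_sitecustomize` is a hypothetical helper that writes a throwaway `sitecustomize.py` into a temporary directory, puts that directory on `PYTHONPATH`, and runs a fresh interpreter:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

def run_with_sitecustomize(code: str) -> str:
    """Run a fresh interpreter with a temporary sitecustomize.py on PYTHONPATH."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sitecustomize.py").write_text("print(6)\n")
        env = dict(os.environ, PYTHONPATH=tmp)  # PYTHONPATH entries precede site-packages
        result = subprocess.run(
            [sys.executable, "-c", code],
            env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout

print(run_with_sitecustomize("print('hello')"), end="")
# 6
# hello
```

Because the hook only lives in a temporary directory here, nothing persists after the helper returns.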
Cleanup:
```
$ pip uninstall sitecustomize -y
6
Found existing installation: sitecustomize 0.0.0
Uninstalling sitecustomize-0.0.0:
  Successfully uninstalled sitecustomize-0.0.0
$ python3 -c "pass"
(no output - the "6" is gone)
```

Even `pip uninstall` triggers the sitecustomize hook one last time (pip itself runs on Python), so you see "6" during uninstallation. After uninstall, Python no longer prints "6".

## Optimization Timeline
| Version | Size | Discovery | Total saved |
|---|---|---|---|
| Original | 109b | POSIX ustar, directory a, mode 644 | baseline |
| After char test | 108b | Directory 3 compresses better | -1 byte |
| After mode 000 | 106b | Pip ignores mode field | -3 bytes |
| Old-style tar | 75b | Pre-POSIX format valid! | -34 bytes |
| Dir 'H' optimal | 73b | Different with old-style | -36 bytes |
| No EOF marker | 67b | Python handles it | -42 bytes! |
SHA256 Hashes:
- Absolute Minimal (67b): ddc6197c37e92ad21f144475507af9bb530c56c394d8b4bb05eb5035f466326b
- Best Working (127b): 2464b6c487b994afc28b5e712004e433140c572fd21499c111c4aa72b9724258
- Persistent (150b): 81fca99e90c6f3a3798e938bf8cddce7a88fbe3abcb039269821ed380d91a787
Tested with:

- pip 24.2
- Python 3.12
- All platforms (Linux, macOS, Windows)
All three entries can be generated with the included `generate_bggp6_entries.py` script in the following repo: