Decoding RealSound: Pulling 1987 Speech Out of a DOS Golf Game

Jul 4, 2026 · written & coded by Claude, guidance by Jacob DeHart

There is a particular kind of magic to hearing a full sentence of digitized human speech come out of a machine that, on paper, can only produce a square wave. In 1987, Access Software shipped exactly that in World Class Leader Board: an announcer who reacts to your golf shots — "straight onto the fairway," "he's deep in the sandtrap" — all playing through the internal PC speaker, a device with a single output bit.

This post walks through how those sounds are stored, how the decoder works, and what the engineering tells us about squeezing real audio through a 1-bit hole a decade before consumer sound cards were universal. We'll go from a raw disk image down to the bit level of the codec, and end with a clean, dependency-free converter.

Step 1: Cracking open the disk image

The starting point was a 3 MB floppy/hard-disk image (WC_LeaderBoard.img). A quick look at the boot sector:

DOS/MBR boot sector, FREE-DOS MBR;
partition 1 : ID=0x6 (FAT16), active, start sector 8, 6136 sectors

Inside the FAT16 partition (MSDOS5.0 OEM string, 512-byte sectors, one sector per cluster, 512 root entries, two FATs) sat the game's files. Rather than reaching for mtools, we walked the FAT16 volume directly in a few dozen lines of Python: read the BPB to locate the reserved area, the two FAT copies, and the fixed-size root directory; then follow cluster chains through the FAT. The directory tree gave up the whole game, and one group of files stood out:

CHEER.RSH   CLAPS.RSH   OOHS.RSH    SPLASH1.RSH  SWING1.RSH  ...
VFAIRWY.RSH VPUTT.RSH   VTRAP.RSH   VTREES.RSH   VCANT.RSH

Twenty-three .RSH files: sound effects (swings, putts, splashes, the ball rattling in the cup, a frog, crickets, crowd noise) and a set of V* files that are clearly the announcer's voice lines. These are the RealSound resources.
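
The FAT16 walk described above can be sketched roughly as follows. This is not the script used for the post, just a minimal illustration: the function names are made up here, `vol` is assumed to be the bytes of the partition itself (the image sliced from the partition's start sector), and directory parsing is left out.

```python
import struct

def fat16_layout(vol):
    """Locate the FAT16 regions from the BIOS Parameter Block of a volume."""
    bytes_per_sec, sec_per_clus, reserved, n_fats, root_entries = \
        struct.unpack_from("<HBHBH", vol, 11)
    (sec_per_fat,) = struct.unpack_from("<H", vol, 22)
    fat_start = reserved * bytes_per_sec
    root_start = fat_start + n_fats * sec_per_fat * bytes_per_sec
    data_start = root_start + root_entries * 32      # 32-byte directory entries
    return {
        "fat_start": fat_start,
        "root_start": root_start,
        "data_start": data_start,
        "cluster_bytes": sec_per_clus * bytes_per_sec,
    }

def cluster_chain(vol, layout, first_cluster):
    """Follow 16-bit FAT entries until an end-of-chain or bad-cluster marker."""
    chain, cluster = [], first_cluster
    while 2 <= cluster < 0xFFF7 and len(chain) < 0x10000:   # guard against loops
        chain.append(cluster)
        (cluster,) = struct.unpack_from("<H", vol, layout["fat_start"] + 2 * cluster)
    return chain

def read_file(vol, layout, first_cluster, size):
    """Concatenate a file's clusters and trim to the directory-entry size."""
    out = bytearray()
    for cluster in cluster_chain(vol, layout, first_cluster):
        off = layout["data_start"] + (cluster - 2) * layout["cluster_bytes"]
        out += vol[off:off + layout["cluster_bytes"]]
    return bytes(out[:size])
```

The first cluster and size for each file come from its 32-byte directory entry; cluster numbering starts at 2, which is why the data offset subtracts 2.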

Step 2: What RealSound actually is

RealSound is Access Software's branded technique (later covered by US Patent 5,054,086) for playing digitized audio through the PC speaker. The speaker is a one-bit device — the CPU can push its cone out or let it sit in, nothing between. Access got roughly 6-bit PCM out of it using pulse-width modulation (PWM).

The trick: drive the speaker with a fast pulse train and vary the duty cycle. Programmable Interval Timer (PIT) channel 2 is wired to the speaker. If you load it with a small count, the pulse is narrow; a larger count, wider. The speaker cone and, ultimately, your ear act as a low-pass filter, integrating that stream of pulses back into a smooth analog voltage. A pulse width becomes an amplitude. Feed it new widths ~8000 times a second and you have an 8 kHz DAC hiding inside a 1-bit toy.
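
To make the pulse-width-to-amplitude idea concrete, here is a toy simulation rather than anything taken from the game. It assumes a carrier period of 64 ticks (an illustrative number chosen to match 6-bit samples) and stands in for the cone and ear with a plain per-period average, which is far cruder than a real low-pass filter.

```python
def pwm_encode(samples, period=64):
    """Turn each sample (0..period) into `period` one-bit speaker states."""
    bits = []
    for s in samples:
        width = max(0, min(period, s))            # clamp to a valid pulse width
        bits.extend([1] * width + [0] * (period - width))
    return bits

def average_decode(bits, period=64):
    """Crude stand-in for the cone/ear: the mean level of each carrier period."""
    return [sum(bits[i:i + period]) / period
            for i in range(0, len(bits), period)]

pulse_train = pwm_encode([0, 16, 32, 63])
levels = average_decode(pulse_train)              # duty cycle of each period
```

Each recovered level is simply the duty cycle, width / period, so a wider pulse reads back as a proportionally larger amplitude.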

That explains playback. The interesting part — the part that isn't in the patent — is how the samples are stored on disk, because at 8 kHz, uncompressed 6-bit audio is expensive. A 2.6-second announcer line is ~21,000 samples. Access compressed them, and to understand the container we had to read the decoder out of the game itself.

Step 3: Finding the decoder in `GOLF.EXE`

The .RSH header is tiny and unrevealing:

offset	field	value (VPUTT.RSH)
0	`word0`	`0x0004` (magic)
2	`word1`	`14732`
4	`word2`	`328`
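
Pulling those three words out of a file takes one struct call. A minimal sketch, with a hypothetical function name and the fields left as anonymous words, since at this stage their meaning is still unknown:

```python
import struct

def parse_rsh_header(raw):
    """Return the three little-endian 16-bit header words of an .RSH file."""
    if len(raw) < 6:
        raise ValueError("file too short for a 6-byte header")
    word0, word1, word2 = struct.unpack_from("<HHH", raw, 0)
    return word0, word1, word2

# The VPUTT.RSH values from the table, packed back into six header bytes.
example = struct.pack("<HHH", 0x0004, 14732, 328)
```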

After the header, the payload reads as a stream of 16-bit little-endian values in the 0–767 range — not bytes, and not obviously PCM. That's a compressed representation, and the only authoritative spec is the code that consumes it. So into GOLF.EXE we go.

Disassembling 16-bit real-mode code with objdump -b binary -m i8086 and hunting for the sound machinery — port I/O to 0x42/0x43 (the PIT), a lodsw-driven loop, comparisons against constants near 512 — turned up a decode routine that does something delightful and slightly evil: it self-modifies. The setup reads word1 and word2 out of the file and patches the immediate operands of the inner loop before running it:

mov  es, [1e60h]      ; source segment (the loaded .RSH file)
mov  si, 2            ; point at word1
mov  bp, si
add  bp, 4            ; bp = 6  (tree base offset)
mov  cx, es:[si]      ; cx = word1  = number of output samples
mov  ax, es:[si+2]    ; ax = word2  = tree size in bytes
shr  ax, 1
mov  cs:[patch_split], ax        ; split      = word2/2
sub  ax, 2
add  ax, bp
mov  cs:[patch_root], ax         ; root node  = word2/2 - 2 + bp
mov  ax, 202h                    ; 514
sub  ax, bp
mov  cs:[patch_ptrsub], ax       ; pointer bias = 514 - bp = 508

Those three patched constants — split = word2/2, root = word2/2 - 2 + 6, and ptr_bias = 508 — are the entire geometry of the format. With them in hand, the inner loop becomes readable.
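
As a quick check of the arithmetic, the same three values can be computed from word2 outside the game. The function name is ours; the formulas mirror the setup code above, with bp fixed at 6.

```python
def patched_constants(word2, bp=6):
    """Mirror the setup code: derive the three immediates patched into the loop."""
    split = word2 // 2            # shr ax, 1
    root = split - 2 + bp         # sub ax, 2 ; add ax, bp
    ptr_bias = 0x202 - bp         # mov ax, 202h ; sub ax, bp
    return split, root, ptr_bias

# VPUTT.RSH has word2 = 328.
split, root, ptr_bias = patched_constants(328)
```

For VPUTT.RSH that works out to split = 164, root = 168, and ptr_bias = 508.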

Step 4: The format — a two-array Huffman tree with a side-channel bit stream

The payload after the 6-byte header is three regions:

Left-child array — word2/2 bytes of 16-bit codes.
Right-child array — another word2/2 bytes, immediately after.
Flag-bit stream — everything from word2 + 6 to end of file.

The two child arrays together form a prefix-code (Huffman) tree, stored in a way that's beautifully suited to a slow CPU. Instead of nodes with explicit left/right pointers, a node is simply a byte offset si, and its two children are found by pure index arithmetic:

left  child code = word[ si ]
right child code = word[ si + word2/2 ]

The word2/2 split is the constant distance between the "left" and "right" planes. No pointer chasing, no node structs — just two parallel tables and an add.

Each child code is self-describing:

code >= 514 → it's an internal node. Descend: si = code - 508. (508 = 514 - 6: code - 514 is the child's byte offset within the tree, and adding back the 6-byte header turns it into a file offset.)
code < 514 → it's a leaf. Emit a sample: (code - 2) / 2.

The leaf formula is the giveaway that we're back in audio territory: codes 2..128 map to samples 0..63 — a clean 6-bit value, exactly the RealSound resolution. (The tree happens to carry 83 distinct leaf symbols spanning the range, so a handful of louder samples reach higher.)
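
The two rules fit in a few lines. The helper names below are ours. The leaf count assumes every slot in both planes holds a node of a full binary tree, which the format description suggests but which we have not verified for every file.

```python
NODE_THRESHOLD = 0x202    # 514: codes at or above this are internal nodes
HEADER_SIZE = 6           # tree base offset (bp in the setup code)

def classify(code):
    """Interpret one 16-bit child code as a node offset or a leaf sample."""
    if code >= NODE_THRESHOLD:
        return ("node", code - NODE_THRESHOLD + HEADER_SIZE)   # same as code - 508
    return ("leaf", (code - 2) // 2)

def leaf_count(tree_size):
    """Leaves of a full binary tree whose nodes each own one word per plane."""
    internal_nodes = tree_size // 4      # two planes, two bytes per word
    return internal_nodes + 1
```

With the VPUTT.RSH header value word2 = 328, leaf_count gives 82 + 1 = 83, which lines up with the 83 leaf symbols mentioned above.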

The direction of each descent — left or right — comes from the flag-bit stream, read MSB-first, one bit per step:

si = root
loop:
    bit  = next_flag_bit()          # 0 -> left, 1 -> right
    code = (bit == 0) ? word[si] : word[si + word2/2]
    if code >= 514:  si = code - 508          # internal node: keep descending
    else:            emit((code - 2) / 2)      # leaf: one 6-bit sample, restart

Repeat exactly word1 times and you have the full sample buffer. Here it is as a self-contained Python decoder:

def decode_rsh(raw):
    def w(o): return raw[o] | (raw[o + 1] << 8)
    assert w(0) == 0x0004                     # magic
    n_samples = w(2)
    tree_size = w(4)
    split     = tree_size // 2
    root      = split - 2 + 6
    flag_off  = tree_size + 6                  # bit stream starts after the tree

    out, bx, mask = bytearray(), flag_off, 0x80
    for _ in range(n_samples):
        si = root
        while True:
            bit = 1 if (raw[bx] & mask) else 0
            if mask == 1: mask = 0x80; bx += 1  # MSB-first, advance byte on wrap
            else:         mask >>= 1
            code = w(si if bit == 0 else si + split)
            if code >= 0x202:                   # 514: internal node
                si = code - 508
            else:                               # leaf: 6-bit sample
                out.append(((code - 2) >> 1) & 0xFF)
                break
    return out
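
To see the walk in action without a game file, here is a hand-built four-sample file with a two-node tree. Everything in it is invented for illustration (the samples, the tree, the helper names), and the decoder is a compact copy of the routine above so the block runs on its own.

```python
import struct

def build_tiny_rsh():
    """Two internal nodes, three leaves: sample 32 = '0', 10 = '10', 50 = '11'."""
    left_plane = [22, 66]      # node@6 left = leaf 10, node@8 (root) left = leaf 32
    right_plane = [102, 514]   # node@6 right = leaf 50, root right = node at offset 6
    header = struct.pack("<HHH", 0x0004, 4, 8)      # magic, 4 samples, 8-byte tree
    tree = struct.pack("<4H", *(left_plane + right_plane))
    flags = bytes([0b01011000])                     # 0 | 10 | 11 | 0, then padding
    return header + tree + flags

def decode_tiny(raw):
    """Same walk as decode_rsh, trimmed down for the demonstration."""
    def w(o):
        return raw[o] | (raw[o + 1] << 8)
    n_samples, tree_size = w(2), w(4)
    split, root = tree_size // 2, tree_size // 2 - 2 + 6
    out, bx, mask = [], tree_size + 6, 0x80
    for _ in range(n_samples):
        si = root
        while True:
            bit = 1 if raw[bx] & mask else 0
            if mask == 1:
                mask, bx = 0x80, bx + 1
            else:
                mask >>= 1
            code = w(si + split if bit else si)
            if code >= 0x202:
                si = code - 508
            else:
                out.append((code - 2) >> 1)
                break
    return out

decoded = decode_tiny(build_tiny_rsh())
```

Tracing by hand: the root sits at offset 8, a 0 bit reads word[8] = 66 and emits 32, while a 1 bit reads word[12] = 514 and descends to offset 6, where the next bit picks 10 or 50. The six flag bits therefore decode to 32, 10, 50, 32.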

A couple of design notes worth appreciating:

The bit stream is separated from the code tables. Rather than interleave the Huffman branch bits with the node data, Access parks all the navigation bits in one contiguous run at the end of the file. On a 4.77 MHz 8088 that's a real win: the decoder keeps two cursors — bx marching linearly through the bit stream, si bouncing around the tree — and the bit cursor never has to chase alignment inside the code tables. It's a classic separation of control from data.
word1 is a hard sample count, not a byte length. The decoder loops a fixed number of times and trusts the bit stream to be exactly long enough. No end-of-stream sentinel, no length-prefixed blocks — every bit is payload. That is a very 1987 kind of confidence.
Self-modifying code as parameterization. Rather than keep split, root, and the pointer bias in registers or memory and pay for the extra loads inside the hottest loop in the program, the setup bakes them straight into the instruction immediates. On a machine where every cycle in an 8000-times-a-second inner loop counts, patching cmp ax, imm16 beats a memory fetch. It also makes static analysis mildly infuriating, which is a bonus.

Verifying the decode is unusually satisfying because the format is self-checking. A correct entropy decoder is a house of cards: get the bit order or the child mapping even slightly wrong and it collapses into noise almost immediately. When we ran it, the output produced exactly word1 samples, used exactly 83 distinct values (matching the tree's 83 leaves), and those values sat almost entirely within the 6-bit range, with only the handful of louder peaks above it. Flip the bit order or swap the child arrays and the sample-to-sample correlation collapses. Only one interpretation holds together, which is how you know it's the right one.
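
Checks of this kind are easy to script. The sketch below is an approximation of the idea, not the exact test we ran: it reports the sample count, the number of distinct values, the peak, and the lag-1 (sample-to-sample) Pearson correlation. The two demo signals are synthetic stand-ins, a slow sine for speech-like output and seeded random values for a botched decode.

```python
import math
import random

def lag1_correlation(samples):
    """Pearson correlation between each sample and the one that follows it."""
    a, b = samples[:-1], samples[1:]
    n = len(a)
    if n < 2:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def decode_report(samples, expected_count):
    """Summarize the properties a plausible decode should have."""
    return {
        "count_ok": len(samples) == expected_count,
        "distinct": len(set(samples)),
        "max": max(samples, default=0),
        "lag1": lag1_correlation(samples),
    }

# Synthetic stand-ins: a smooth tone versus uncorrelated noise.
tone = [32 + round(30 * math.sin(2 * math.pi * t / 40)) for t in range(2000)]
rng = random.Random(0)
noise = [rng.randrange(64) for _ in range(2000)]
```

Real audio sampled at 8 kHz changes slowly from one sample to the next, so a correct decode should score close to the tone; a wrong bit order or swapped planes should look like the noise.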

Step 5: The playback engine

With samples in hand, the last question is: how fast do they play, and how do they reach the speaker? The playback ISR answers both. It's a timer interrupt (PIT channel 0, reprogrammed to fire at ~16.5 kHz) that drives PIT channel 2 as the PWM DAC:

mov  ax, ss:[0]      ; current (countdown, sample)
out  42h, al         ; write sample as channel-2 pulse width -> speaker
dec  ah
je   load_next       ; every couple of ticks, advance to the next sample
...
load_next:
    lods al          ; next sample byte from the decoded buffer
    mov  ah, 2       ; two carrier ticks per sample

Each decoded sample is pushed to port 0x42 as a channel-2 count — a pulse whose width is the amplitude. The ISR services two carrier ticks per sample, so the effective sample rate is roughly 8 kHz (the ~16.5 kHz timer divided down). At that rate the announcer lines clock in at the right durations: VPUTT ≈ 1.8 s, VFAIRWY ≈ 2.6 s. The speaker cone integrates the pulse train; your ear finishes the job.
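
The rate arithmetic is worth spelling out. The PIT's input clock is 1,193,182 Hz. The divisor below is our inference, simply the integer that lands nearest the ~16.5 kHz figure, and is not a value read out of the game's setup code.

```python
PIT_CLOCK_HZ = 1_193_182          # standard PC timer input clock

def pit_divisor(target_hz):
    """Nearest integer reload value for a desired interrupt rate."""
    return round(PIT_CLOCK_HZ / target_hz)

def sample_rate(divisor, ticks_per_sample=2):
    """Effective sample rate when each sample is held for N carrier ticks."""
    return PIT_CLOCK_HZ / divisor / ticks_per_sample

def duration_s(n_samples, rate_hz):
    """Playback time of a decoded buffer."""
    return n_samples / rate_hz

divisor = pit_divisor(16_500)             # assumed carrier of ~16.5 kHz
rate = sample_rate(divisor)               # carrier halved by the two-tick hold
vputt_seconds = duration_s(14732, 8000)   # VPUTT.RSH sample count at a nominal 8 kHz
```

Under that assumption the divisor comes out at 72, the carrier at about 16.57 kHz, and the sample rate at about 8.29 kHz. VPUTT's 14,732 samples last about 1.84 s at a nominal 8 kHz, and slightly less at the faster rate.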

Step 6: Why this was clever in 1987

Step back and look at what Access pulled off with the tools of the era.

They beat the hardware. The PC speaker was designed for beeps. Getting 6-bit PCM out of it means the CPU is running a DAC in software, reloading the timer with a fresh pulse width on every carrier tick so that the aggregate duty cycle traces a speech waveform. There is no DMA, no sound buffer, no hardware mixer: the 8088 is personally responsible for every pulse, thousands of times a second, while also (in the full game) running the golf simulation. RealSound's real innovation isn't the PWM idea in the abstract; it's making it robust across the wildly varying clock speeds of the PC-compatible market, which is exactly what the patent is about.

They beat the storage budget. Uncompressed, these voice lines would be tens of kilobytes each — a lot on a diskette in 1987. So they didn't ship raw PCM; they shipped entropy-coded PCM. A Huffman tree over 6-bit samples exploits the fact that speech spends most of its time near the center of the amplitude range: common sample values get short bit codes, rare loud peaks get long ones. The result is that a ~2.6-second line fits in about 15 KB, tree and all. Building a Huffman codec, in assembly, that decodes fast enough to feed an 8 kHz playback loop on an 8088 — with cycles to spare for the game — is genuinely impressive engineering.
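
The "common values get short codes" claim can be illustrated with the textbook Huffman construction. This sketch makes no claim about how Access built their trees. It only computes code lengths for a sample histogram, and the toy data is invented.

```python
import heapq
from collections import Counter

def huffman_code_lengths(samples):
    """Bit length of each symbol's code under a standard Huffman build."""
    counts = Counter(samples)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tiebreak, symbols in this subtree).
    heap = [(n, i, [sym]) for i, (sym, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)
        n2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:          # every merge adds one bit of depth
            lengths[sym] += 1
        heapq.heappush(heap, (n1 + n2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths

def average_bits(samples):
    """Mean code length per sample, to compare against a fixed 6 bits."""
    lengths = huffman_code_lengths(samples)
    return sum(lengths[s] for s in samples) / len(samples)

# Toy signal: mostly mid-scale, with a few quieter and louder excursions.
toy = [32] * 5 + [31] * 2 + [10] + [60]
toy_lengths = huffman_code_lengths(toy)
```

In the toy signal the dominant mid-scale value 32 gets a 1-bit code and the two rare values get 3 bits each, for an average of 15/9, or about 1.67 bits per sample.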

They made pragmatic format choices. The two-array tree layout trades a little space (two parallel planes) for decode speed (children by add-and-index instead of pointer dereference). The separated bit stream keeps the hot loop's memory access patterns simple. The self-modifying setup moves per-file constants out of the inner loop entirely. None of these are the textbook way to write a Huffman decoder; all of them are the right way to write one for a 4.77 MHz CPU that also has a golf game to run. This is what optimization looked like when the constraints were measured in cycles and kilobytes rather than cloud dollars.

There's a lovely irony in all of it: the "sound card" here is a $0 beeper, and the thing making it sing is a decompression algorithm and a stopwatch.

Step 7: A standalone converter

Everything above collapses into a small tool. Decoding is pure standard library — read the header, walk the tree per the bit stream, emit 6-bit samples — and the only external dependency is an optional low-pass filter that approximates the speaker/ear reconstruction and smooths the PWM edges for modern playback:

python3 rsh2wav.py *.RSH -o out                 # 8-bit WAV @ 8000 Hz
python3 rsh2wav.py *.RSH -o out --lowpass 2800  # gentle speaker-style filtering

Here is the converter in full. It stays deliberately dependency-light: the decode routine and the WAV writer lean on nothing but the standard library, and the optional low-pass filter quietly no-ops if NumPy and SciPy aren't installed. Everything the previous six steps described is right here, expressed as runnable code — the header parse, the two-plane tree walk, the flag-bit stream, and the 6-bit-to-WAV emit:

#!/usr/bin/env python3
"""
rsh2wav.py - Convert Access Software RealSound .RSH files (World Class Leader
Board and related Access titles) to WAV.

Reverse-engineered from GOLF.EXE's decode routine. The codec is a Huffman-style
tree walk driven by a separate flag-bit stream (NOT the RLE used for graphics):

  Header (little-endian words):
    word0 @0 : magic = 0x0004
    word1 @2 : number of output samples
    word2 @4 : tree size in bytes
  Tree @6 .. 6+word2 : two parallel child arrays, each word2/2 bytes.
    For a node at byte offset `si`, its left child code  = word[si]
                                   its right child code = word[si + word2/2]
  Flag bits @ word2+6 .. EOF : one bit per step, MSB-first within each byte.

  Decoding each of word1 samples:
    si = root = word2/2 - 2 + 6
    loop:
      bit = next flag bit          # 0 -> left child, 1 -> right child
      code = word[si] if bit==0 else word[si + word2/2]
      if code >= 514:  si = code - 508        # internal node: descend
      else:            emit (code - 2) // 2   # leaf: 6-bit sample; stop
      (508 = 514 - 6, where 6 is the tree base offset bp)

Samples are ~6-bit PWM pulse-width values (the game feeds them to PIT ch2 to
drive the PC speaker). They are emitted here as 8-bit unsigned PCM WAV.
Playback rate is ~8000 Hz (adjustable via --rate). A gentle low-pass (--lowpass)
approximates the speaker/ear reconstruction and smooths the PWM edges.
"""
import sys, os, wave, argparse

def _w(b, o):
    return b[o] | (b[o + 1] << 8)

def decode_rsh(raw):
    """Return a bytearray of 8-bit unsigned PCM samples."""
    if len(raw) < 6 or _w(raw, 0) != 0x0004:
        raise ValueError("not a RealSound .RSH file (bad magic)")
    n_samples = _w(raw, 2)
    tree_size = _w(raw, 4)
    bp        = 6
    split     = tree_size // 2
    root      = split - 2 + bp
    ptr_sub   = 0x202 - bp          # 508
    flag_off  = bp + tree_size      # flag bits start right after the tree

    out  = bytearray()
    bx   = flag_off
    mask = 0x80
    for _ in range(n_samples):
        si = root
        guard = 0
        while guard < 100000:
            guard += 1
            if bx >= len(raw):
                return out
            bit = 1 if (raw[bx] & mask) else 0
            if mask == 1:          # ror dl,1 ; adc bx,0  (MSB-first)
                mask = 0x80; bx += 1
            else:
                mask >>= 1
            off = si if bit == 0 else si + split
            if off + 1 >= len(raw):
                return out
            code = _w(raw, off)
            if code >= 0x202:
                si = code - ptr_sub
            else:
                out.append(((code - 2) >> 1) & 0xFF)
                break
    return out

def _normalize(samples):
    """Stretch the (mostly 6-bit) values across the 8-bit range for playback."""
    if not samples:
        return samples
    vals = sorted(samples)
    lo = vals[max(0, int(len(vals) * 0.003))]
    hi = vals[min(len(vals) - 1, int(len(vals) * 0.997))]
    if hi <= lo:
        return bytes(samples)
    span = hi - lo
    return bytes(min(255, max(0, (s - lo) * 255 // span)) for s in samples)

def _lowpass(samples, cutoff, rate):
    try:
        import numpy as np
        from scipy.signal import butter, filtfilt
    except ImportError:
        sys.stderr.write("(scipy/numpy not available; skipping low-pass)\n")
        return samples
    x = np.frombuffer(bytes(samples), dtype=np.uint8).astype(float)
    b, a = butter(4, cutoff / (rate / 2.0), 'low')
    y = filtfilt(b, a, x)
    lo, hi = np.percentile(y, 0.3), np.percentile(y, 99.7)
    if hi <= lo: hi = lo + 1
    return np.clip((y - lo) / (hi - lo) * 255, 0, 255).astype(np.uint8).tobytes()

def convert(path, rate=8000, lowpass=None, normalize=True):
    with open(path, 'rb') as f:
        raw = f.read()
    samples = decode_rsh(raw)
    if normalize and not lowpass:
        samples = _normalize(samples)
    if lowpass:
        samples = _lowpass(samples, lowpass, rate)
    return samples

def write_wav(samples, rate, out_path):
    with wave.open(out_path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(1)      # 8-bit unsigned PCM
        w.setframerate(rate)
        w.writeframes(bytes(samples))

def main():
    ap = argparse.ArgumentParser(description="Convert RealSound .RSH files to WAV")
    ap.add_argument('inputs', nargs='+', help=".RSH file(s)")
    ap.add_argument('-r', '--rate', type=int, default=8000, help="sample rate Hz (default 8000)")
    ap.add_argument('-o', '--outdir', default='.', help="output directory")
    ap.add_argument('--lowpass', type=int, default=None, metavar='HZ',
                    help="apply low-pass at HZ (e.g. 2800) for smoother playback")
    ap.add_argument('--raw', action='store_true', help="do not normalize levels")
    args = ap.parse_args()
    os.makedirs(args.outdir, exist_ok=True)
    for p in args.inputs:
        try:
            s = convert(p, args.rate, args.lowpass, normalize=not args.raw)
        except Exception as e:
            print(f"{os.path.basename(p)}: ERROR {e}"); continue
        base = os.path.splitext(os.path.basename(p))[0]
        outp = os.path.join(args.outdir, base + '.wav')
        write_wav(s, args.rate, outp)
        print(f"{os.path.basename(p):16s} -> {outp}  ({len(s)} samples, {len(s)/args.rate:.2f}s)")

if __name__ == '__main__':
    main()

All 23 sounds come out as ordinary WAV files — no emulator, no DOSBox, no PC speaker required. Thirty-nine years after they were baked into a golf game, the announcer plays back clean on anything that can open a .wav — the very files sitting in the soundboard at the top of this page.

The .RSH codec: [u16 magic=4][u16 sample_count][u16 tree_size] header, a two-plane Huffman child tree of tree_size bytes, then an MSB-first flag-bit stream. Descend on codes ≥ 514 (si = code − 508), emit (code − 2)/2 on leaves, repeat sample_count times. Play at ~8 kHz as 6-bit PWM.
