Decoding RealSound: Pulling 1987 Speech Out of a DOS Golf Game
There is a particular kind of magic to hearing a full sentence of digitized human speech come out of a machine that, on paper, can only produce a square wave. In 1987, Access Software shipped exactly that in World Class Leader Board: an announcer who reacts to your golf shots — "straight onto the fairway," "he's deep in the sandtrap" — all playing through the internal PC speaker, a device with a single output bit.
This post walks through how those sounds are stored, how the decoder works, and what the engineering tells us about squeezing real audio through a 1-bit hole a decade before consumer sound cards were universal. We'll go from a raw disk image down to the bit level of the codec, and end with a clean, dependency-free converter.
Step 1: Cracking open the disk image
The starting point was a 3 MB floppy/hard-disk image (WC_LeaderBoard.img). A
quick look at the boot sector:
DOS/MBR boot sector, FREE-DOS MBR;
partition 1 : ID=0x6 (FAT16), active, start sector 8, 6136 sectors
Inside the FAT16 partition (MSDOS5.0 OEM string, 512-byte sectors, one sector
per cluster, 512 root entries, two FATs) sat the game's files. Rather than reach
for mtools, it's a few dozen lines of Python to walk a FAT16 volume directly:
read the BPB to locate the reserved area, the two FAT copies, and the fixed-size
root directory; then follow cluster chains through the FAT. The directory tree
gave up the whole game, and one group of files stood out:
CHEER.RSH CLAPS.RSH OOHS.RSH SPLASH1.RSH SWING1.RSH ...
VFAIRWY.RSH VPUTT.RSH VTRAP.RSH VTREES.RSH VCANT.RSH
Twenty-three .RSH files: sound effects (swings, putts, splashes, the ball
rattling in the cup, a frog, crickets, crowd noise) and a set of V* files that
are clearly the announcer's voice lines. These are the RealSound resources.
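The BPB arithmetic described above is compact enough to sketch. What follows is a minimal illustration rather than the script used for this post: the names `fat16_layout`, `cluster_chain`, and `cluster_offset` are hypothetical, and it assumes a well-formed FAT16 volume whose boot sector is the first sector of the partition (sector 8 in this image).

```python
import struct
from collections import namedtuple

# Byte offsets of the fixed regions plus the cluster size (hypothetical helper type).
Layout = namedtuple("Layout", "fat_start root_start data_start cluster_bytes")

def fat16_layout(boot, part_lba=0):
    """Locate the FAT, root directory and data area from a FAT16 boot sector."""
    # BPB fields from offset 11: bytes/sector, sectors/cluster, reserved sectors,
    # FAT count, root entries, total sectors (16-bit), media byte, sectors/FAT.
    bps, spc, reserved, nfats, root_entries, _total, _media, spf = \
        struct.unpack_from("<HBHBHHBH", boot, 11)
    fat_start = (part_lba + reserved) * bps
    root_start = fat_start + nfats * spf * bps
    root_sectors = (root_entries * 32 + bps - 1) // bps   # 32-byte directory entries
    data_start = root_start + root_sectors * bps
    return Layout(fat_start, root_start, data_start, spc * bps)

def cluster_chain(fat, first):
    """Follow a FAT16 cluster chain until a non-data value (end-of-chain, bad, free)."""
    chain, cluster = [], first
    while (2 <= cluster <= 0xFFEF and cluster * 2 + 1 < len(fat)
           and len(chain) < len(fat) // 2):               # guard against loops
        chain.append(cluster)
        cluster = struct.unpack_from("<H", fat, cluster * 2)[0]
    return chain

def cluster_offset(layout, cluster):
    """Byte offset of a data cluster; cluster numbering starts at 2."""
    return layout.data_start + (cluster - 2) * layout.cluster_bytes
```

Reading a file is then a matter of taking the first cluster from its 32-byte directory entry, expanding it with `cluster_chain`, and concatenating the clusters at `cluster_offset`, trimmed to the size recorded in the entry.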
Step 2: What RealSound actually is
RealSound is Access Software's branded technique (later covered by US Patent 5,054,086) for playing digitized audio through the PC speaker. The speaker is a one-bit device — the CPU can push its cone out or let it sit in, nothing between. Access got roughly 6-bit PCM out of it using pulse-width modulation (PWM).
The trick: drive the speaker with a fast pulse train and vary the duty cycle. Programmable Interval Timer (PIT) channel 2 is wired to the speaker. If you load it with a small count, the pulse is narrow; a larger count, wider. The speaker cone and, ultimately, your ear act as a low-pass filter, integrating that stream of pulses back into a smooth analog voltage. A pulse width becomes an amplitude. Feed it new widths ~8000 times a second and you have an 8 kHz DAC hiding inside a 1-bit toy.
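To make the duty-cycle idea concrete, here is an idealized sketch in Python. It is not the game's code: `pwm_bits` and `boxcar` are hypothetical names, the 64-tick period is an assumption chosen to match 6-bit samples, and a plain per-period average stands in for the speaker and ear.

```python
def pwm_bits(samples, period=64):
    """Expand each sample (0..period-1) into one period of 1-bit output."""
    bits = []
    for s in samples:
        bits.extend([1] * s + [0] * (period - s))   # high for s ticks, low for the rest
    return bits

def boxcar(bits, period=64):
    """Crude low-pass: average each period of the pulse train."""
    return [sum(bits[i:i + period]) / period for i in range(0, len(bits), period)]

samples = [0, 16, 32, 48, 63]
recovered = boxcar(pwm_bits(samples))
print([round(r * 64) for r in recovered])   # -> [0, 16, 32, 48, 63]
```

The output stream only ever holds 0 or 1, yet averaging it over each period returns the original amplitudes.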
That explains playback. The interesting part — the part that isn't in the patent — is how the samples are stored on disk, because at 8 kHz, uncompressed 6-bit audio is expensive. A 2.6-second announcer line is ~21,000 samples. Access compressed them, and to understand the container we had to read the decoder out of the game itself.
Step 3: Finding the decoder in GOLF.EXE
The .RSH header is tiny and unrevealing:
| offset | field | value (VPUTT.RSH) |
|---|---|---|
| 0 | word0 | 0x0004 (magic) |
| 2 | word1 | 14732 |
| 4 | word2 | 328 |
After the header, the payload reads as a stream of 16-bit little-endian values
in the 0–767 range — not bytes, and not obviously PCM. That's a compressed
representation, and the only authoritative spec is the code that consumes it. So
into GOLF.EXE we go.
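Pulling the three header words out takes one `struct` call. In this small sketch the bytes are rebuilt from the VPUTT.RSH values in the table rather than read from disk, and `parse_rsh_header` is a hypothetical helper name.

```python
import struct

def parse_rsh_header(raw):
    """Split the 6-byte .RSH header into its three little-endian words."""
    word0, word1, word2 = struct.unpack_from("<HHH", raw, 0)
    return word0, word1, word2

# Header bytes rebuilt from the VPUTT.RSH values in the table, not read from disk.
vputt_header = struct.pack("<HHH", 0x0004, 14732, 328)
print(vputt_header.hex(" "))            # -> 04 00 8c 39 48 01
print(parse_rsh_header(vputt_header))   # -> (4, 14732, 328)
```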
Disassembling 16-bit real-mode code with objdump -b binary -m i8086 and hunting
for the sound machinery — port I/O to 0x42/0x43 (the PIT), a lodsw-driven
loop, comparisons against constants near 512 — turned up a decode routine that
does something delightful and slightly evil: it self-modifies. The setup
reads word1 and word2 out of the file and patches the immediate operands of
the inner loop before running it:
mov es, [1e60h] ; source segment (the loaded .RSH file)
mov si, 2 ; point at word1
mov bp, si
add bp, 4 ; bp = 6 (tree base offset)
mov cx, es:[si] ; cx = word1 = number of output samples
mov ax, es:[si+2] ; ax = word2 = tree size in bytes
shr ax, 1
mov cs:[patch_split], ax ; split = word2/2
sub ax, 2
add ax, bp
mov cs:[patch_root], ax ; root node = word2/2 - 2 + bp
mov ax, 202h ; 514
sub ax, bp
mov cs:[patch_ptrsub], ax ; pointer bias = 514 - bp = 508
Those three patched constants — split = word2/2, root = word2/2 - 2 + 6, and
ptr_bias = 508 — are the entire geometry of the format. With them in hand, the
inner loop becomes readable.
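The same arithmetic, mirrored line for line in Python as a sanity check on the VPUTT.RSH header values. The function name `rsh_geometry` is made up for this sketch; the constants come from the listing above.

```python
def rsh_geometry(word2, bp=6):
    """Mirror the setup code: derive the three patched immediates from word2."""
    split = word2 >> 1            # shr ax, 1
    root = split - 2 + bp         # sub ax, 2 ; add ax, bp
    ptr_bias = 0x202 - bp         # mov ax, 202h ; sub ax, bp
    return split, root, ptr_bias

print(rsh_geometry(328))   # VPUTT.RSH, word2 = 328 -> (164, 168, 508)
```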
Step 4: The format — a two-array Huffman tree with a side-channel bit stream
The payload after the 6-byte header is three regions:
- Left-child array: word2/2 bytes of 16-bit codes.
- Right-child array: another word2/2 bytes, immediately after.
- Flag-bit stream: everything from word2 + 6 to end of file.
The two child arrays together form a prefix-code (Huffman) tree,
but stored in a way that's beautifully suited to a slow CPU. Instead of nodes
with explicit left/right pointers, a node is simply a byte offset si, and
its two children are found by pure index arithmetic:
left child code = word[ si ]
right child code = word[ si + word2/2 ]
The word2/2 split is the constant distance between the "left" and "right"
planes. No pointer chasing, no node structs — just two parallel tables and an add.
Each child code is self-describing:
- code >= 514 → it's an internal node. Descend: si = code - 508. (508 is 514 - 6: subtracting 514 strips the internal-node marker and adding 6 restores the tree's base offset, so the result is the child's byte offset in the file.)
- code < 514 → it's a leaf. Emit a sample: (code - 2) / 2.
The leaf formula is the giveaway that we're back in audio territory: codes
2..128 map to samples 0..63, a clean 6-bit value and exactly the RealSound
resolution. (The tree carries 83 distinct leaf symbols, more than a strict
6-bit range can hold, so a handful of louder samples reach above 63.)
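The two rules for interpreting a child code fit in a few lines. This is an illustrative sketch with a hypothetical `classify` helper, using only the constants recovered from the decoder.

```python
NODE_THRESHOLD = 0x202   # 514: codes at or above this are internal nodes
PTR_BIAS = 508           # 514 - 6, the patched pointer bias

def classify(code):
    """Return ('node', file_offset) or ('leaf', sample) for one child code."""
    if code >= NODE_THRESHOLD:
        return ("node", code - PTR_BIAS)
    return ("leaf", (code - 2) // 2)

for code in (2, 128, 514, 676):
    print(code, classify(code))
```

Codes 2 and 128 come back as leaves 0 and 63, code 514 points at file offset 6 (the first node slot), and code 676 points at offset 168, which is the root offset computed for VPUTT.RSH.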
The direction of each descent — left or right — comes from the flag-bit stream, read MSB-first, one bit per step:
si = root
loop:
    bit  = next_flag_bit()                 # 0 -> left, 1 -> right
    code = (bit == 0) ? word[si] : word[si + word2/2]
    if code >= 514:  si = code - 508       # internal node: keep descending
    else:            emit((code - 2) / 2)  # leaf: one 6-bit sample, restart
Repeat exactly word1 times and you have the full sample buffer. Here it is as a
self-contained Python decoder:
def decode_rsh(raw):
    def w(o): return raw[o] | (raw[o + 1] << 8)
    assert w(0) == 0x0004                  # magic
    n_samples = w(2)
    tree_size = w(4)
    split = tree_size // 2
    root = split - 2 + 6
    flag_off = tree_size + 6               # bit stream starts after the tree
    out, bx, mask = bytearray(), flag_off, 0x80
    for _ in range(n_samples):
        si = root
        while True:
            bit = 1 if (raw[bx] & mask) else 0
            if mask == 1: mask = 0x80; bx += 1   # MSB-first, advance byte on wrap
            else: mask >>= 1
            code = w(si if bit == 0 else si + split)
            if code >= 0x202:              # 514: internal node
                si = code - 508
            else:                          # leaf: 6-bit sample
                out.append(((code - 2) >> 1) & 0xFF)
                break
    return out
A couple of design notes worth appreciating:
The bit stream is separated from the code tables. Rather than interleave the Huffman branch bits with the node data, Access parks all the navigation bits in one contiguous run at the end of the file. On a 4.77 MHz 8088 that's a real win: the decoder keeps two cursors (bx marching linearly through the bit stream, si bouncing around the tree) and the bit cursor never has to chase alignment inside the code tables. It's a classic separation of control from data.
word1 is a hard sample count, not a byte length. The decoder loops a fixed number of times and trusts the bit stream to be exactly long enough. No end-of-stream sentinel, no length-prefixed blocks: every bit is payload. That is a very 1987 kind of confidence.
Self-modifying code as parameterization. Rather than keep split, root, and the pointer bias in registers or memory and pay for the extra loads inside the hottest loop in the program, the setup bakes them straight into the instruction immediates. On a machine where every cycle in an 8000-times-a-second inner loop counts, patching cmp ax, imm16 beats a memory fetch. It also makes static analysis mildly infuriating, which is a bonus.
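The linear bit cursor from the first note is easy to model on its own. In this sketch a Python generator stands in for the bx/mask pair; `flag_bits` is a hypothetical name, not something taken from GOLF.EXE.

```python
def flag_bits(raw, start):
    """Yield flag bits from raw[start:], most significant bit of each byte first."""
    for byte in raw[start:]:
        mask = 0x80
        while mask:
            yield 1 if byte & mask else 0
            mask >>= 1

print(list(flag_bits(bytes([0xA5]), 0)))   # -> [1, 0, 1, 0, 0, 1, 0, 1]
```

Because the cursor only ever moves forward, the tree walk can pull one bit per step without knowing anything about where the tables live.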
Verifying the decode is unusually satisfying because the format is
self-checking. A correct entropy decoder is a house of cards: get the bit order
or the child mapping even slightly wrong and it collapses into noise almost
immediately. When we ran it, the output produced exactly word1 samples,
used exactly 83 distinct values (matching the tree's 83 leaves), and those
values landed almost entirely in the 6-bit range, with only a few louder peaks
above 63. Flip the bit order or swap the child arrays and the sample-to-sample
correlation collapses toward zero. Only one interpretation
holds together, which is how you know it's the right one.
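A complementary check that needs no game data is to build a tiny file by hand from the format description and decode it. The three-leaf tree below is entirely hypothetical (it is not taken from any shipped .RSH), and the compact decoder is a restatement of the rules above, included only so the sketch runs on its own.

```python
import struct

def decode_rsh(raw):
    """Compact decoder following the format description above."""
    magic, n_samples, tree_size = struct.unpack_from("<HHH", raw, 0)
    assert magic == 0x0004
    split, root = tree_size // 2, tree_size // 2 - 2 + 6
    pos, mask, out = tree_size + 6, 0x80, []
    for _ in range(n_samples):
        si = root
        while True:
            bit = raw[pos] & mask
            mask >>= 1
            if mask == 0:                      # byte exhausted: move to the next one
                mask, pos = 0x80, pos + 1
            code = struct.unpack_from("<H", raw, si + split if bit else si)[0]
            if code < 514:                     # leaf
                out.append((code - 2) // 2)
                break
            si = code - 508                    # internal node

    return out

def leaf(sample):
    """Child code for a leaf carrying this sample value."""
    return 2 + 2 * sample

# Hypothetical three-leaf tree: "0" -> 32, "10" -> 20, "11" -> 45.
# tree_size = 8, so split = 4 and root = 4 - 2 + 6 = 8.
tree = struct.pack("<4H",
                   leaf(20),   # offset 6:  left child of node 6
                   leaf(32),   # offset 8:  left child of root
                   leaf(45),   # offset 10: right child of node 6
                   6 + 508)    # offset 12: right child of root -> node 6
bits = bytes([0b01001100])     # 0, 10, 0, 11, 0 plus one pad bit
toy = struct.pack("<HHH", 0x0004, 5, len(tree)) + tree + bits

print(decode_rsh(toy))   # -> [32, 20, 32, 45, 32]
```

Note how the root lands on the last slot of the left plane (offset 8), exactly as split - 2 + 6 predicts.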
Step 5: The playback engine
With samples in hand, the last question is: how fast do they play, and how do they reach the speaker? The playback ISR answers both. It's a timer interrupt (PIT channel 0, reprogrammed to fire at ~16.5 kHz) that drives PIT channel 2 as the PWM DAC:
mov ax, ss:[0] ; current (countdown, sample)
out 42h, al ; write sample as channel-2 pulse width -> speaker
dec ah
je load_next ; every couple of ticks, advance to the next sample
...
load_next:
lods al ; next sample byte from the decoded buffer
mov ah, 2 ; two carrier ticks per sample
Each decoded sample is pushed to port 0x42 as a channel-2 count — a pulse whose
width is the amplitude. The ISR services two carrier ticks per sample, so the
effective sample rate is roughly 8 kHz (the ~16.5 kHz timer divided down). At
that rate the announcer lines clock in at the right durations: VPUTT ≈ 1.8 s,
VFAIRWY ≈ 2.6 s. The speaker cone integrates the pulse train; your ear finishes
the job.
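The rate arithmetic can be checked with a few lines of Python. The PIT input clock of 1,193,182 Hz is standard PC hardware; the divisor of 72 is an assumption for illustration only (it was not read out of GOLF.EXE and is simply a value that lands near the ~16.5 kHz figure above).

```python
PIT_HZ = 1_193_182          # input clock of the PC's 8253/8254 timer

def playback_rate(divisor, ticks_per_sample=2):
    """Carrier and effective sample rate for a given channel-0 divisor."""
    carrier = PIT_HZ / divisor
    return carrier, carrier / ticks_per_sample

# ASSUMPTION: 72 is not taken from GOLF.EXE; it is a divisor that happens
# to give a carrier close to 16.5 kHz.
carrier, rate = playback_rate(72)
print(round(carrier), round(rate))   # -> 16572 8286
print(round(14732 / rate, 2))        # VPUTT.RSH sample count -> 1.78 (seconds)
```

Under that assumed divisor, VPUTT's 14732 samples come out at about 1.78 s, in line with the ≈ 1.8 s quoted above.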
Step 6: Why this was clever in 1987
Step back and look at what Access pulled off with the tools of the era.
They beat the hardware. The PC speaker was designed for beeps. Getting 6-bit PCM out of it means the CPU is bit-banging a DAC in software, timing pulse widths precisely enough that the aggregate duty cycle traces a speech waveform. There is no DMA, no sound buffer, no hardware mixer — the 8088 is personally responsible for every edge, thousands of times a second, while also (in the full game) running the golf simulation. RealSound's real innovation isn't the PWM idea in the abstract; it's making it robust across the wildly varying clock speeds of the PC-compatible market, which is exactly what the patent is about.
They beat the storage budget. Uncompressed, these voice lines would be tens of kilobytes each — a lot on a diskette in 1987. So they didn't ship raw PCM; they shipped entropy-coded PCM. A Huffman tree over 6-bit samples exploits the fact that speech spends most of its time near the center of the amplitude range: common sample values get short bit codes, rare loud peaks get long ones. The result is that a ~2.6-second line fits in about 15 KB, tree and all. Building a Huffman codec, in assembly, that decodes fast enough to feed an 8 kHz playback loop on an 8088 — with cycles to spare for the game — is genuinely impressive engineering.
They made pragmatic format choices. The two-array tree layout trades a little space (two parallel planes) for decode speed (children by add-and-index instead of pointer dereference). The separated bit stream keeps the hot loop's memory access patterns simple. The self-modifying setup moves per-file constants out of the inner loop entirely. None of these are the textbook way to write a Huffman decoder; all of them are the right way to write one for a 4.77 MHz CPU that also has a golf game to run. This is what optimization looked like when the constraints were measured in cycles and kilobytes rather than cloud dollars.
There's a lovely irony in all of it: the "sound card" here is a $0 beeper, and the thing making it sing is a decompression algorithm and a stopwatch.
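The storage argument can be illustrated with a textbook Huffman construction. This is a generic sketch, not Access's encoder (which we never saw), and the sample data is synthetic: a few values clustered near mid-scale plus two rare peaks.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(samples):
    """Return {symbol: code length in bits} for a Huffman code over samples."""
    freq = Counter(samples)
    if len(freq) == 1:                       # degenerate case: one symbol, one bit
        return {next(iter(freq)): 1}
    tie = count()                            # tie-breaker so lists are never compared
    heap = [(n, next(tie), [sym]) for sym, n in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:            # every symbol below the merge gains a bit
            lengths[sym] += 1
        heapq.heappush(heap, (n1 + n2, next(tie), syms1 + syms2))
    return lengths

# Synthetic data: mostly mid-scale values, two rare excursions.
samples = [32] * 5 + [31] * 2 + [10, 50]
lengths = huffman_code_lengths(samples)
total_bits = sum(lengths[s] for s in samples)
print(lengths[32], lengths[31], lengths[10], lengths[50])   # -> 1 2 3 3
print(total_bits, "bits vs", 6 * len(samples), "fixed-width")   # -> 15 bits vs 54 fixed-width
```

The skew in this toy data is far stronger than in real speech, so the ratio here says nothing about the savings Access actually achieved; it only shows the mechanism of short codes for common values.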
Step 7: A standalone converter
Everything above collapses into a small tool. Decoding is pure standard library — read the header, walk the tree per the bit stream, emit 6-bit samples — and the only external dependency is an optional low-pass filter that approximates the speaker/ear reconstruction and smooths the PWM edges for modern playback:
python3 rsh2wav.py *.RSH -o out # 8-bit WAV @ 8000 Hz
python3 rsh2wav.py *.RSH -o out --lowpass 2800 # gentle speaker-style filtering
Here is the converter in full. If NumPy and SciPy aren't installed, the optional low-pass filter is skipped with a note on stderr; everything else is the header parse, the two-plane tree walk, the flag-bit stream, and the 6-bit-to-WAV emit from the steps above, expressed as runnable code:
#!/usr/bin/env python3
"""
rsh2wav.py - Convert Access Software RealSound .RSH files (World Class Leader
Board and related Access titles) to WAV.

Reverse-engineered from GOLF.EXE's decode routine. The codec is a Huffman-style
tree walk driven by a separate flag-bit stream (NOT the RLE used for graphics):

  Header (little-endian words):
    word0 @0 : magic = 0x0004
    word1 @2 : number of output samples
    word2 @4 : tree size in bytes

  Tree @6 .. 6+word2 : two parallel child arrays, each word2/2 bytes.
    For a node at byte offset `si`, its left  child code = word[si]
                                    its right child code = word[si + word2/2]

  Flag bits @ word2+6 .. EOF : one bit per step, MSB-first within each byte.

  Decoding each of word1 samples:
    si = root = word2/2 - 2 + 6
    loop:
      bit  = next flag bit             # 0 -> left child, 1 -> right child
      code = word[si] if bit==0 else word[si + word2/2]
      if code >= 514: si = code - 508  # internal node: descend
      else: emit (code - 2) // 2       # leaf: 6-bit sample; stop
    (508 = 514 - 6, where 6 is the tree base offset bp)

Samples are ~6-bit PWM pulse-width values (the game feeds them to PIT ch2 to
drive the PC speaker). They are emitted here as 8-bit unsigned PCM WAV.
Playback rate is ~8000 Hz (adjustable via --rate). A gentle low-pass (--lowpass)
approximates the speaker/ear reconstruction and smooths the PWM edges.
"""

import sys, os, wave, argparse


def _w(b, o):
    return b[o] | (b[o + 1] << 8)


def decode_rsh(raw):
    """Return a bytearray of 8-bit unsigned PCM samples."""
    if len(raw) < 6 or _w(raw, 0) != 0x0004:
        raise ValueError("not a RealSound .RSH file (bad magic)")
    n_samples = _w(raw, 2)
    tree_size = _w(raw, 4)
    bp = 6
    split = tree_size // 2
    root = split - 2 + bp
    ptr_sub = 0x202 - bp                  # 508
    flag_off = 2 + tree_size + 4          # tree_size + 6

    out = bytearray()
    bx = flag_off
    mask = 0x80
    for _ in range(n_samples):
        si = root
        guard = 0
        while guard < 100000:
            guard += 1
            if bx >= len(raw):
                return out
            bit = 1 if (raw[bx] & mask) else 0
            if mask == 1:                 # ror dl,1 ; adc bx,0  (MSB-first)
                mask = 0x80; bx += 1
            else:
                mask >>= 1
            off = si if bit == 0 else si + split
            if off + 1 >= len(raw):
                return out
            code = _w(raw, off)
            if code >= 0x202:
                si = code - ptr_sub
            else:
                out.append(((code - 2) >> 1) & 0xFF)
                break
    return out


def _normalize(samples):
    """Stretch the (mostly 6-bit) values across the 8-bit range for playback."""
    if not samples:
        return samples
    vals = sorted(samples)
    # Clip to the 0.3rd..99.7th percentile so a few outliers don't set the scale.
    lo = vals[max(0, int(len(vals) * 0.003))]
    hi = vals[min(len(vals) - 1, int(len(vals) * 0.997))]
    if hi <= lo:
        return bytes(samples)
    span = hi - lo
    return bytes(min(255, max(0, (s - lo) * 255 // span)) for s in samples)


def _lowpass(samples, cutoff, rate):
    try:
        import numpy as np
        from scipy.signal import butter, filtfilt
    except ImportError:
        sys.stderr.write("(scipy/numpy not available; skipping low-pass)\n")
        return samples
    x = np.frombuffer(bytes(samples), dtype=np.uint8).astype(float)
    # 4th-order Butterworth, applied forward and backward (zero phase).
    b, a = butter(4, cutoff / (rate / 2.0), 'low')
    y = filtfilt(b, a, x)
    lo, hi = np.percentile(y, 0.3), np.percentile(y, 99.7)
    if hi <= lo:
        hi = lo + 1
    return np.clip((y - lo) / (hi - lo) * 255, 0, 255).astype(np.uint8).tobytes()


def convert(path, rate=8000, lowpass=None, normalize=True):
    with open(path, 'rb') as f:
        raw = f.read()
    samples = decode_rsh(raw)
    if normalize and not lowpass:         # the low-pass path rescales on its own
        samples = _normalize(samples)
    if lowpass:
        samples = _lowpass(samples, lowpass, rate)
    return samples


def write_wav(samples, rate, out_path):
    with wave.open(out_path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(1)                 # 8-bit unsigned PCM
        w.setframerate(rate)
        w.writeframes(bytes(samples))


def main():
    ap = argparse.ArgumentParser(description="Convert RealSound .RSH files to WAV")
    ap.add_argument('inputs', nargs='+', help=".RSH file(s)")
    ap.add_argument('-r', '--rate', type=int, default=8000, help="sample rate Hz (default 8000)")
    ap.add_argument('-o', '--outdir', default='.', help="output directory")
    ap.add_argument('--lowpass', type=int, default=None, metavar='HZ',
                    help="apply low-pass at HZ (e.g. 2800) for smoother playback")
    ap.add_argument('--raw', action='store_true', help="do not normalize levels")
    args = ap.parse_args()

    os.makedirs(args.outdir, exist_ok=True)
    for p in args.inputs:
        try:
            s = convert(p, args.rate, args.lowpass, normalize=not args.raw)
        except Exception as e:
            print(f"{os.path.basename(p)}: ERROR {e}")
            continue
        base = os.path.splitext(os.path.basename(p))[0]
        outp = os.path.join(args.outdir, base + '.wav')
        write_wav(s, args.rate, outp)
        print(f"{os.path.basename(p):16s} -> {outp}  ({len(s)} samples, {len(s)/args.rate:.2f}s)")


if __name__ == '__main__':
    main()
All 23 sounds come out as ordinary WAV files — no emulator, no DOSBox, no PC
speaker required. Thirty-nine years after they were baked into a golf game, the
announcer plays back clean on anything that can open a .wav — the very files
sitting in the soundboard at the top of this page.
The .RSH codec: [u16 magic=4][u16 sample_count][u16 tree_size] header, a
two-plane Huffman child tree of tree_size bytes, then an MSB-first flag-bit
stream. Descend on codes ≥ 514 (si = code − 508), emit (code − 2)/2 on leaves,
repeat sample_count times. Play at ~8 kHz as 6-bit PWM.