Thursday, August 31, 2006

My shortest program

The shortest program I ever expended significant intellectual energy on occupied 116 bits on the Bendix G-15. It was a variant of the "four-word memory clear" routine. [1]

Background: Around 1960, programmers still took bugs rather personally. If you were running a program (for other than debugging purposes) and the machine crashed or started misbehaving, you didn't just reboot it, you stopped what you were doing and tried to track down the cause. Basically, there were two:
  • Hardware failure. This is what we always hoped would be the cause. Mean time between failure was measured in days or weeks, not months or years, so the first thing to do was to run the hardware diagnostics. [2] If they found a problem, call the hardware engineer. [3]
  • Software failure. Since there was no operating system, this basically meant a bug in the program you were running. If it was written by someone else, you either passed the buck immediately, or tried to score points by finding the other programmer's error before he did. If you wrote the program yourself, tough luck--and back to debugging.
Why memory clear? As is still the case today, the hardest bugs to track down were the ones that didn't happen repeatably. They were much harder to locate, and even if you found a bug, it was harder to be sure that you had fixed the original problem.

On the G-15, there were few sources of non-determinism. There was no operating system, no multi-programming, no behind-the-scenes virtual memory or page swapping. I/O was asynchronous, but rarely the source of the problem. Most of the time it was the reliance of the program on some memory location or bit of machine state that it had failed to initialize, making the program's behavior dependent on whatever the previous program had left.

An obvious first step in trying to narrow the possibilities when looking for a non-repeatable bug was to load the program into a machine that had been wiped clean--not just the contents of memory and the registers, but all the other pieces of machine state (such as the overflow flag) that a program might depend on. [4] If the problem completely vanished or became repeatable, there was an initialization error; if not, keep looking.

A second reason for clearing memory before running a program was to get clean memory dumps--anything non-zero must have been put there by the program. But this was less important, since memory dumps (via the 10 cps typewriter) were a last resort.

Why four words? The length of a memory clear routine might not seem very interesting or important. Just keep a (paper) tape of the program handy and read it in and execute as needed. Well, most of us could type four words faster than we could locate a tape, mount it, read it in, and dismount it. Four words was "magic," because that was the length of line 23, which the I/O system used to buffer input, and which was also command line 7. There was a short incantation (something like "ENABLE sc7fq") that activated typewriter input and transfered control to 23:00. So it was easy to type in and execute a four-word routine. Anything longer required transfering to another command line, and quickly got complicated. Besides, why memorize five or ten words if four would do?

Desiderata: The main requirement of a memory clear routine was to clear the long lines, 0 to 19. But it was highly desirable to also clear the short lines, 20 to 23, and the registers, lines 24 to 26 and 28. And there were two bits of state associated with arithmetic that couldn't be cleared by ENABLEd typewriter commands: the overflow bit, and the sign bit associated with multiplication and division.

Difficulty: How hard it is to write a memory clear routine obviously depends to a large extent on the machine's instruction set and memory architecture. On the IBM 1620, memory could be cleared by a single instruction typed in from the console. On many other machines it is practically impossible, because the routine needs to both clear itself out and clear any registers it has been using. [5] The G-15 fell in between these extremes, due to its block commands that made it possible for one instruction to clear a complete memory line (saving line 23 for last).

My attempt: At that age, I firmly believed that I could "improve anyone's code." So, having been mightily impressed by the four-word memory clear, I decided to write a three-word version. It occupied much of my spare time for weeks, perhaps months, as I tried every devious trick and scheme I could think of and scoured the hardware's logic diagrams for undocumented features that might provide some assistance. But in the end, the shortest program I could write that satisfied the desiderata given above required three instructions and one constant, for a total of four words. I salvaged some pride by noting that the constant had a very simple and memorable form (minus zero), so my routine was marginally easier to memorize and quicker to type, but it was a small gain for the effort expended. [6]

Notes:
[1] I have been credited with writing the original four-word version of this program, but I'm pretty sure it was someone else, because
a) I remember my astonishment at the cleverness of the program when I first encountered it;
b) I typed it on my Programmer's Reference Card before it was laminated, which suggests that I had access to it very early in my career as a machine-language programmer;
c) I don't think I would have worked so hard trying to shorten it to three words if I had been the person who established four words as the record.
[2] I recently discovered, quite by accident, that my home PC came equipped with a set of hardware diagnostic routines, but I never run them.
[3] Programmers would sometimes swear that there must be a hardware failure that the diagnostics were failing to detect. Hardware engineer par excellence Takashi Yogi (photo) would always respond, "I'll bet on the hardware," and he almost always won.
[4] Some of these, such as the NCAR (take Next Command from Accumulator Register) bit, could be reset from the typewriter, using keys in combination with the ENABLE switch.* It is the others we are concerned with here.
* In the original G-15, NCAR was not resettable from the typewriter, meaning that the machine could be put into a tight infinite loop by placing an NCAR command in AR and then executing it. Recovery then required shutting off machine power, waiting for the drum to spin down, turning power back on, waiting for the drum to spin back up, and then performing initialization, including reading in the "number track" and running the hardware diagnostics. This was something you definitely did not want to do--it took even longer than rebooting Windows does today, and the mechanical, thermal, and electrical stresses involved increased the probability of hardware failure.
[5] You may find it educational or entertaining to see how many instructions it takes on your favorite instruction set architecture.
[6] At this point, I would love to give both the original and the improved version of the program, but--although they once seemed engraved on my cortex--after 45 years I have forgotten them (except for the value of the constant). They probably reside on pieces of paper in my garage, but that is a needle-in-a-haystack sort of a challenge.

3 Comments:

Comment by Anonymous Thad Smith:

I remember seeing the G-15 clear routine. Back in the late 60s, the math department at college at Oklahoma University had a couple of donated G-15s and it was easier to get time on them than the newer computers. Before I looked at the instructions I tried to duplicate it. Try as I could, I only got down to 6 instructions. "Maybe that code was 6 and I only thought it was 4." Nope. When I looked at the instructions, I was astounded at the cleverness.

As I recall the first instuction cleared the accumulator. What a waste of an instruction! It executed a carefully coded instruction (which did a block copy from the accumulator), placed the magic instruction in the accumulator by a block copy, I think. It modified it by decrementing a certain number of word times (3?), copied the instruction back into the line 23 with the same block copy, which probably left the accumulator zero, then executed the instruction again. It cleared all the registers and even turned off the optional "digital differential analyzer" or something like that!

6:43 PM  
Comment by Anonymous Henry Troup:

As late as the 1980's, mainframe programmers still sometimes cared immensely about code size. In the 80's I programmed on VM/CMS. In CMS, there was an 8K "transient area" which was favored for particular kinds of programs, especially utilities to be called from scipts (EXECs). (Actually, it's still there in VM/ESA!)

We had a transient area utility called SCANFILE, sort of like grep. Adding features to this was an art, as the 8K limit was just bytes away. I think I enhanced it twice with new options, but I clearly remember finding ways to squeeze the last few bytes of overflow out.

8:54 AM  
Comment by Blogger Jim Horning:

Thad Smith's comment has jogged my memory enough to pretty well reconstruct the original four-word memory clear routine (though not the creative process by which it was devised). The key was that counter-intuitive first command, which cleared AR. The command itself was only executed once, but the zero in AR was the source of a third of the zeros written to memory and was executed every time around the loop.

Let me explain. The work was done in a loop of the form:
CLEAR
SWAP
NOOP
INCREMENT
SWAP

Yes, this was a five-command loop in a four-word line. The two SWAPS were the same command, and CLEAR and NOOP occupied the same location--at different times.

CLEAR was initially an ordinary one-line (108 word) block copy from AR (line 28, containing that all-important ZERO) to line 0.
[Why line 28, rather than line 29, which was a constant source of zeros? Read on.]

SWAP was a command with source and destination 23 (the line containing the routine), with characteristic 2, which caused 23 to be copied to AR, and AR to 23. The first SWAP in the loop exchanged the ZERO and CLEAR, so CLEAR could be incremented.

NOOP was that ZERO again, now in line 23. It copied line 0 to line 0; its only relevant effect was to select the next command from word zero (of line 23), whereas the command following the other execution of SWAP was the next CLEAR.

INCREMENT was, in fact, a Shift command that had the side effect of incrementing AR by the number of places shifted, namely 3. This compressed into one command what would normally have been an Add command plus a constant.
[This property of the Shift command was very useful for floating point arithmetic routines.]
[Why 3? Read on.]

SWAP again switched the ZERO and the (now incremented) CLEAR. On successive executions, CLEAR thus cleared lines 3, 6, 9, ...

Why 3? It might seem natural to clear lines 0, 1, 2, ... in that order. The problem would be that this would clear line 23 before getting to the registers (24, 25, and 26; 28 was already clear). Starting with the registers and working down would have the same problem. But 3 is relatively prime to 32, so the sequence of destinations generated by successively adding 3 would be
0, 3, 6, ..., 21, 24, 27, 30, 1, 4, ..., 28, 31, 2, 5, ... 23, 26, 29, 0, ...
Thus counting by 3 mod 32 would generate all 32 destinations, but in an order that put 23 near the end. When it cleared itself, the routine "ended" (actually, just started executing a continuous sequence of noops).

What about 26 and 29? 23 was not actually the last number in the sequence before it repeated. However, destination 29 was not a real line; it was the destination used to add to AR. Destination 26 was real, and the reason it could be omitted was more subtle. To optimize multiplication and division, loading a value into line 25 (the multiplicand/denominator register) had the side effect of clearing line 26 (the product/numerator register). As a further bonus, it also set the "sign" register (IP) to that of the value being copied in (i.e., cleared it).

Why start using line 28? INCREMENT performed ordinary arithmetic when adding 3 to AR. Thus, there was an overflow from the destination field to the source field after incrementing destinations 30 and 31. So it was necessary to have three consecutive sources of zero. Line 28 was initialized by the first instruction. Source 29 was a constant source of zeros. Source 30 was the logical and of line 21 and not line 20. By the time source 30 was used, line 21 had already been cleared (!), so source 30 was zero, too.

What about destinations 27 and 31? Destination 27 was used to test for a non-zero value, so copying a zero to it was a noop. Destination 31 indicated a special command. The source-destination combination 29 31 tested for overflow--and reset the overflow flag (!). However, this placed a constraint on the order of the instructions; if the overflow flag was on, the next instruction after CLEAR would be taken from the location one word-time later than SWAP--this had to be harmless.

Word 0: The really astute reader will have noticed that the initial command to clear AR was in location 0, and I said it was only executed once. However, the NOOP necessarily had a next-command field of 0, so how could it tranfser control to the INCREMENT, rather than the initial command? Only if the routine put INCREMENT in word 0. The final bit of cleverness was the third use of SWAP. When used in the loop, it executed for a single word time, thereby exchanging AR and a single word of line 23. But it was a block command, and its first execution was at a different word time than all the others. It precessed line 23 to place INCREMENT in word 0.

So many properties of the G-15 had to be "just right" for this routine to work that one is inescapably driven to the conclusion that the G-15 had an Intelligent Designer who intended for it to be clearable by a four-word routine.

6:42 PM  

Post a Comment

<< The Way It Was home