Thursday, August 31, 2006

My shortest program

The shortest program I ever expended significant intellectual energy on occupied 116 bits on the Bendix G-15. It was a variant of the "four-word memory clear" routine. [1]

Background: Around 1960, programmers still took bugs rather personally. If you were running a program (for other than debugging purposes) and the machine crashed or started misbehaving, you didn't just reboot it, you stopped what you were doing and tried to track down the cause. Basically, there were two:
  • Hardware failure. This is what we always hoped would be the cause. Mean time between failure was measured in days or weeks, not months or years, so the first thing to do was to run the hardware diagnostics. [2] If they found a problem, call the hardware engineer. [3]
  • Software failure. Since there was no operating system, this basically meant a bug in the program you were running. If it was written by someone else, you either passed the buck immediately, or tried to score points by finding the other programmer's error before he did. If you wrote the program yourself, tough luck--and back to debugging.
Why memory clear? As is still the case today, the hardest bugs to track down were the ones that didn't happen repeatably. They were much harder to locate, and even if you found a bug, it was harder to be sure that you had fixed the original problem.

On the G-15, there were few sources of non-determinism. There was no operating system, no multi-programming, no behind-the-scenes virtual memory or page swapping. I/O was asynchronous, but rarely the source of the problem. Most of the time it was the reliance of the program on some memory location or bit of machine state that it had failed to initialize, making the program's behavior dependent on whatever the previous program had left.

An obvious first step in trying to narrow the possibilities when looking for a non-repeatable bug was to load the program into a machine that had been wiped clean--not just the contents of memory and the registers, but all the other pieces of machine state (such as the overflow flag) that a program might depend on. [4] If the problem completely vanished or became repeatable, there was an initialization error; if not, keep looking.

A second reason for clearing memory before running a program was to get clean memory dumps--anything non-zero must have been put there by the program. But this was less important, since memory dumps (via the 10 cps typewriter) were a last resort.

Why four words? The length of a memory clear routine might not seem very interesting or important. Just keep a (paper) tape of the program handy and read it in and execute as needed. Well, most of us could type four words faster than we could locate a tape, mount it, read it in, and dismount it. Four words was "magic," because that was the length of line 23, which the I/O system used to buffer input, and which was also command line 7. There was a short incantation (something like "ENABLE sc7fq") that activated typewriter input and transfered control to 23:00. So it was easy to type in and execute a four-word routine. Anything longer required transfering to another command line, and quickly got complicated. Besides, why memorize five or ten words if four would do?

Desiderata: The main requirement of a memory clear routine was to clear the long lines, 0 to 19. But it was highly desirable to also clear the short lines, 20 to 23, and the registers, lines 24 to 26 and 28. And there were two bits of state associated with arithmetic that couldn't be cleared by ENABLEd typewriter commands: the overflow bit, and the sign bit associated with multiplication and division.

Difficulty: How hard it is to write a memory clear routine obviously depends to a large extent on the machine's instruction set and memory architecture. On the IBM 1620, memory could be cleared by a single instruction typed in from the console. On many other machines it is practically impossible, because the routine needs to both clear itself out and clear any registers it has been using. [5] The G-15 fell in between these extremes, due to its block commands that made it possible for one instruction to clear a complete memory line (saving line 23 for last).

My attempt: At that age, I firmly believed that I could "improve anyone's code." So, having been mightily impressed by the four-word memory clear, I decided to write a three-word version. It occupied much of my spare time for weeks, perhaps months, as I tried every devious trick and scheme I could think of and scoured the hardware's logic diagrams for undocumented features that might provide some assistance. But in the end, the shortest program I could write that satisfied the desiderata given above required three instructions and one constant, for a total of four words. I salvaged some pride by noting that the constant had a very simple and memorable form (minus zero), so my routine was marginally easier to memorize and quicker to type, but it was a small gain for the effort expended. [6]

[1] I have been credited with writing the original four-word version of this program, but I'm pretty sure it was someone else, because
a) I remember my astonishment at the cleverness of the program when I first encountered it;
b) I typed it on my Programmer's Reference Card before it was laminated, which suggests that I had access to it very early in my career as a machine-language programmer;
c) I don't think I would have worked so hard trying to shorten it to three words if I had been the person who established four words as the record.
[2] I recently discovered, quite by accident, that my home PC came equipped with a set of hardware diagnostic routines, but I never run them.
[3] Programmers would sometimes swear that there must be a hardware failure that the diagnostics were failing to detect. Hardware engineer par excellence Takashi Yogi (photo) would always respond, "I'll bet on the hardware," and he almost always won.
[4] Some of these, such as the NCAR (take Next Command from Accumulator Register) bit, could be reset from the typewriter, using keys in combination with the ENABLE switch.* It is the others we are concerned with here.
* In the original G-15, NCAR was not resettable from the typewriter, meaning that the machine could be put into a tight infinite loop by placing an NCAR command in AR and then executing it. Recovery then required shutting off machine power, waiting for the drum to spin down, turning power back on, waiting for the drum to spin back up, and then performing initialization, including reading in the "number track" and running the hardware diagnostics. This was something you definitely did not want to do--it took even longer than rebooting Windows does today, and the mechanical, thermal, and electrical stresses involved increased the probability of hardware failure.
[5] You may find it educational or entertaining to see how many instructions it takes on your favorite instruction set architecture.
[6] At this point, I would love to give both the original and the improved version of the program, but--although they once seemed engraved on my cortex--after 45 years I have forgotten them (except for the value of the constant). They probably reside on pieces of paper in my garage, but that is a needle-in-a-haystack sort of a challenge.


Tuesday, August 29, 2006

G-15 Instruction Set Architecture

[Skip this post if you're interested only in my stories.
It is expository, rather than anecdotal, part of my discussion of the Bendix G-15 architecture, which will feature in some of my stories. It relies on material from the Bit-serial arithmetic and G-15 Memory Model posts. [1]]

"Computers were expensive,
programmers were cheap." [2]

The basic structure of the G-15 involved 28 drum lines. In normal operation, the write head of each was connected to the corresponding read head, recirculating an amount of information that depended on the physical distance between the two heads. A key to minimizing the necessary hardware was that all buses were one bit wide.

There was an important exception to recirculation: Execution of an instruction could cause information from a source (any of the drum lines plus certain bitwise functions of particular drum lines) to a destination (any of the drum lines) for a period of from 1 to 108 word times. [3]

Instructions could be executed from any of the eight command lines (0-5, 19, and 23) or the accumulator register (AR, line 28). [4]

Each instruction occupied one 29-bit word, formatted into the following fields:
P T B N C S D c
T specified a (7-bit) word time, as did N. S specified a (5-bit) memory line, as did D. The characteristic, C was split: the sign bit (c) represented the value 4; the two bits between N and S were its low-order bits. P and B were 1-bit flags.

Combinations of C, S, and D specified the operation to be performed. S:T and/or D:T specified the address(es) of the operand(s) and/or result. N specified the word position of the next command. [5] B indicated that the command was breakpointed.

Ordinary (Data Transfer) Commands

Copy: The basic command transferred the contents of the word located at S:T to the location D:T, then executed the command located in word N (in the current command line).

This simple command structure could be modified in a number of ways:

Complementation: If C was odd, S:T was complemented if negative.

Addition: The special destinations 29 and 30 caused the contents of S:T to be added to 28:T or 26:T (the single- and double-precision accumulators) and the result stored back in 28:T or 26:T.

Absolute value: C of 2 with D 28 or greater cleared the sign of S:T.

Interchange: C of 2 or 3 with S and D less than 28 transferred the contents of S:T to 28:T (AR) and the contents of 28:T to D:T. [6]

Subtraction: C of 3 with S or D 28 or greater changed the sign of the source operand, then (since C was odd) complemented if negative.

Double-precision: Adding 4 to C (i.e., making the sign bit of the instruction negative) changed the command to double-precision, operating on 58 bits (starting at an even address) instead of 29, treating the even word as the sign and 28 low-order bits, and all 29 bits of the odd word as the high-order bits.

Block (immediate) commands: Setting the P flag [7] converted a command from deferred (executed starting at word time T) to immediate (starting at the word time immediately following the command and continuing through word time T-1). Thus, for example, a single command could copy an entire line to another, or sum from 1 to 108 consecutive locations in a line. [8]

Precession: A block command with a C of 2 and S = D would copy each word of the line to the location that was one word-time later, ending with the previous contents of AR in the first location, and the previous contents of the last location in AR. In a machine without index registers, it was frequently more efficient to iterate over data by "moving the data past the program," rather than modifying all the commands that accessed the data array. [9]

Special sources: The 5-bit S field encoded 32 sources, only 28 of which were drum lines. Source 27 was a complicated function of lines 20, 21, and 28. Source 29 provided all zeros, a very useful constant. Source 30 was the logical and of line 21 and not line 20; source 31 was the logical and of line 21 and line 20. These were very useful for unpacking data within words, e.g., floating point numbers.

Special destinations: Destination 27 was used to test for zero. If S:T was all zero the next command was taken from word N of the current command line. Otherwise, it was taken from word N+1. This was the most common form of conditional branching. (Destinations 29 and 30 for addition/subtraction have already been discussed.)

Destination 31

Almost none of the descriptions above applied when the destination was 31, which caused a different interpretation of the S and C fields. Most notably, S selected the complex operation to be performed, rather than a source line. [10]

Input/Output: Sources 0 to 15 initiated I/O operations on the various available devices (e.g., 8 selected type-out and 12 selected type-in; 10 selected punching paper tape and 15 selected reading paper tape). A full description of the I/O system would take us even further afield.

Special Commands: Sources 16 to 31 invoked a variety of different commands that did not fit into the basic transfer-source-to-destination paradigm.
16: Halt
17: Test for PUNCH switch on (and ring bell)
18: Send data to auxiliary equipment
19: Control optional Differential Analyzer
20: Select Command Line and Return (i.e., branch to command line C at the word time saved by a previous Mark; subroutine return)
21: Select Command Line and Mark (i.e., branch to C:N and store T; subroutine call) [11]
22: Test if Sign of AR Negative
23: Depending on C:
-- 0: Clear Multiplication and Division Registers (24 to 26)
-- 3: Extract and Copy into line 25 the bits of line 26 that correspond to 1 bits in 2:T, clearing the extracted bits in line 26 [12]
24, 25: Multiply and Divide (T determined the number of iterations of the bit-by-bit process, two word times per bit).
26: Shift line 24 left and line 25 right (the number of places could be controlled by T or by the contents of AR)
27: Normalize (shift the contents of line 24 left until there are no leading zeros, and add the number of places shifted to AR) [13]
28: Test for I/O System Ready
29: Test for Overflow (and clear the overflow bit)
30: Write File Code on Magnetic Tape
31: Take Next Command from AR (useful when instructions had to be explicitly indexed)

[1] I've simplified slightly for ease of exposition. For the nitty-gritty details, see the Coding Manual for the Bendix G-15, the G-15D Programmer's Reference Manual, and the G-15D Technical Manual, Simplified Drawings, from which the various figures have been extracted.
[2] A G-15 cost about $50K. A typical programmer's annual salary was about a tenth of that.
[3] Specifying a destination of 31 invoked a quite different class of instructions that will be discussed later.
[4] Memory line 19 was numbered as command line 6, and memory line 23 as command line 7, allowing the command line to be coded in the three bits of C. Since AR could hold only a single instruction, it was not considered a command line, and there was a special instruction to take the next command from AR (NCAR), after which execution continued in the previous command line.
[5] For minimum access coding, successive commands were not generally stored in consecutive locations. Inclusion of this field was why the command format was often referred to as "2.5-address," and inspired the comment: "The modified type of two-address code is unusual and difficult to classify."
[6] See also the section on precession.
[7] Customarily written as "u" on coding sheets.
[8] The sum of an entire line was called its checksum and was commonly used as a hash code to verify that information had not been damaged, either by an I/O failure or by corruption of memory. Its advantages were that it was fast, and caught the most common errors. Its disadvantages were that it did not provide enough redundancy to correct any errors, and it did not have the collision resistance required when hash functions are used as authenticators. (In fact, it was common to use any free word in a line to store the negative of the checksum of the rest of the line, so the full checksum would be the easily-recognized value zero.)
[9] There was also a trick shared among advanced programmers that used a peculiarity in the I/O system to precess line 19 by 4 words in a single drum cycle, allowing a factor of four speedup in certain programs.
[10] Since there was no clear pattern associating S with the commands, this list was the most-referenced part of the Programmer's Reference Card.
[11] Turing's insight about the need for a stack of return addresses was not reflected in the G-15. The hardware preserved a single return word time (mark). Thus subroutines could not call other subroutines, and needed to know to which line they should return. There was no provision for the programmer to access the mark other than by returning to it.
[12] A rather complex-looking command for a minimalist machine, but key to efficient binary-to-decimal conversion.
[13] Vital for software implementation of floating point arithmetic.


G-15 Memory Model

[This is part of a series of expository, rather than anecdotal, notes, but it provides background for my discussion of the Bendix G-15 architecture. Skip it if you're interested only in my stories.]

As far as the programmer was concerned, the G-15's memory of 2,183 (29-bit) words [1] was organized into
  • 20 "long lines" numbered from 0 to 19, each containing 108 words.
  • 4 "short lines" numbered from 20 to 23, each containing 4 words.
  • 3 double-precision registers numbered from 24 to 26, each containing 2 words.
  • 1 single-precision register, numbered 28.

A memory address consisted of a line number and a word time (from 0 to 107), e.g., 5:17 denoted line 5, word 17. All word times were modulo the length (in words) of the line addressed. E.g., 21:17 was the same as 21:1.

The variety of line lengths enabled a trade-off between storage capacity and access time, with a modest amount of hardware. [2] A long line contained 108 words, but its average access time was half a drum revolution (14.5 msec). [3] A short line contained only 4 words, but its average access time was just 2 word times (0.5 msec), so it could be used as a programmer-managed cache. The single-precision accumulator had to be available at every word time, so that operands could be accessed from, and results stored to, every location in memory.

Optimum programming required not only placing frequently-used values in fast memory, but ensuring that values and instructions in long lines were located to minimize the delay between reading an instruction and reading its operand, and between completing the execution of an instruction and reading the next instruction. Each of these delays could be anything from 0 to 107 word times. We programmers firmly believed that we could do a better job of choosing locations than any automatic system could, which is why systems like POGO (the Program Optimizer for G-15 Operation) [4] weren't commonly used.

Some memory lines had additional properties:

  • Instructions could be executed from lines 0 to 5, 19, 23, and 28.
  • Lines 19 and 23 were also used for buffering by the I/O system.
  • Lines 24 and 25 were used for shifting, multiplication, and division.
  • Line 26 was the double-precision accumulator, also used for multiplication and division.
  • Line 28 was the single-precision accumulator.

[1] There were at least 112 words of drum memory that were not ordinarily accessible to the programmer.
[2] Compare the G-15's 30 pairs of drum read-write heads with the contemporary IBM 650's 119 or 219 pairs (for 1,000 or 2,000 words of memory, respectively), giving an average access time of 2.4 msec.
[3] This was a "logical" drum revolution. Each memory line was recirculating and the time between writing a bit and reading it again depended on the physical distance between the read and write heads. But it did not mattter to programmers how rapidly the physical drum rotated.
[4] POGO was called a "compiler," but was actually more like a very simple assembler.


Friday, August 25, 2006

Aiken's Maxim

Don’t worry about people stealing your ideas. If your ideas are any good, you’ll have to ram them down people’s throats.
-- Howard Aiken
Reported by me as heard from his student Ken Iverson at the History of Programming Languages conference in 1978.


Wednesday, August 23, 2006

Hard Disk's 50th Anniversary

Yes, there were hard disks for computers even before I started using them. Yesterday's Wall Street Journal had a story by Lee Gomes on the 50th anniversary of the hard-disk drive. Worth reading in full, but here are some excerpts:

[The hard drive] is the storage device that makes possible not only PCs, but also iPods, TiVos and other consumer technology must-haves.

The first disk drive, called the RAMAC, was created by International Business Machines Corp. engineers in San Jose, Calif., in 1956...

The disks on it were 24 inches in diameter. The whole unit weighed over a ton, and had to be delivered on forklifts and loaded on to large cargo bays of airplanes. You had only five megabytes of storage. That's about five minutes worth of MP3 music...

It was four or five years between the first RAMAC and the next one, and there was a significant jump in storage capacity, which has been steady since then. For the first 35 years, storage capacity increased about 30% a year. Those annual increases got as high as 100% between 1998 and 2002. Today, they are running around 30% to 40% a year...

The RAMAC stored 2,000 bits per square inch. In disk drives today, the figure is as high as 135 billion bits per square inch. That's almost a 70-million-fold increase. And in the next five years, we will ship more disk drives than we shipped in the first 50 years...

I remember being told by Tom Steel that the machine room at the Equitable Life Insurance company was located directly below the office of a vice president, who was definitely not amused that his office floor had to be taken up and a crane brought in whenever they had to change the actual RAMAC disk. (Disk reliability has improved enormously since then.)

My first encounter with hard disks was in 1965. The Loma Linda University Scientific Computation Facility had an IBM 1620 Data Processing System with two 1311 Disk Storage Drives. Each removable 1316 Disk Pack consisted of six 14-inch diameter disks, weighed 10 pounds, and could store 2 million characters (2MB, the equivalent of 25,000 punched cards). This was a revolutionary advance over storing all the system software, application programs, and datasets on cards and manually placing them in the card reader (or removing them from the card punch) as needed. For me, it was much more important than the speed and memory advantages that the 1620 had over the Bendix G-15D.

Edited on 9/14/2006 to add: Other interesting sites