My previous article ESI-X – The first stored program scientific interpreter for the PDP-8 describes a programming language interpreter for the 1960’s era PDP-8 minicomputer. The PDP-8 had a very limited native instruction set, by design, in order to keep the number of discrete electronic components, and thus the price, low.
This article describes how the ESI-X interpreter uses custom data formats and subroutines to implement a virtual machine that raises the level of abstraction beyond the PDP-8’s supported data types and operations.
This virtual abstraction approach demonstrates some of the power of stored-program digital computing. By sacrificing speed and performance, software can be used to implement high-level functionality using repetitive executions of more primitive operations. This allows lower-cost hardware to support advanced features, and is a common trade-off in the world of computing.
The PDP-8 word size was 12 bits. This may seem odd today, with word sizes based on multiples of a standard 8 bit byte, but many computers designed in the 1950’s and 60’s had word sizes that were multiples of 6 bits. 18 and 36 bit machines were not unusual.
War-surplus military teletypes were originally used as input/output terminals. These teletypes could only transmit uppercase letters plus a few essential special characters and control codes. Various character-encoding patterns were used, all based on 6 bits per character, plus a 7th error-detecting parity bit. Many of the open-reel magnetic tape drives from the period had 7-track read and write heads that followed this pattern. Text strings using these various “sixbit” codes could be packed as multiple characters per 12, 18, or 36 bit memory word.
Newer teletype terminals appeared in the 1960’s with built-in paper tape readers and punches. By then the convention used for paper tape text encoding was the 7-bit American Standard Code for Information Interchange (ASCII), with an 8th parity bit. However, the keyboard and printer components were still limited to uppercase letters.
Direct terminal input/output for the PDP-8 teletype interface transmitted using the full 7-bit ASCII character set, with the parity bit value determined by switches on the teletype itself. Digital Equipment Corporation defined a SIXBIT code that used the “middle” 64 characters from the full set of 128 ASCII codes. For any ASCII character in the correct range, and with the terminal set to blank parity (always 0), simply subtracting decimal 32 from the value yielded the SIXBIT code.
Lowercase ASCII character codes are offset from their uppercase counterparts by 32 decimal. A lowercase ASCII character can be converted to its uppercase SIXBIT character by first subtracting 32, branching if the value is still higher than 64, and subtracting another 32. SIXBIT-to-ASCII conversion merely requires adding 32. (This history explains why 1950’s era languages like FORTRAN and COBOL are frequently written in all uppercase. No they weren’t shouting.)
ESI-X uses a more simplistic approach for its internal 6 bit character code: it masks out all high-order bits and only uses the right-most 6 bits. This causes letters and the special characters [ \ ] ^ and _ to be coded as 01 thru 37 octal, and the remaining special characters as 40 thru 77 octal. (ESI-X assumes the terminal is incapable of transmitting lowercase letters.) When converting back to ASCII for output, it adds back in one of two offsets depending on the range of the 6-bit value. ESI-X transmits to the teletype using mark parity (always 1) .
ESI-X packs strings as 2 characters per 12 bit word. Its string processing subroutines use string pointers that consist of two 12 bit words. The first word of the pointer points to the memory word containing the current character being pointed to. The second word only uses one bit to indicate which character in the word is the current position: 0 means the left character, 1 means the right character.
ESI-X does not support string variables. Instead, user-entered statements are stored as text strings, and text literals for TYPE statements remain embedded between quotes within stored statements.
Variable length text strings require some convention to indicate the length of the string, or alternately to mark the end of the string. One option is to store a prefix word containing the character count of the text string. This allows any character code to be included in the contents of the string. (Microsoft Visual Basic used this approach, with the text length stored at the memory word preceding the word addressed by the string pointer.) This approach allows a loop counter to be initialized to the length value and also simplifies calculating the size for allocating a new buffer to contain the result of appending two text strings together.
Another option is to mark the end of the string with a designated character code, typically the code with all zero bits. This designated character was not allowed to be part of the actual string value. (This null terminator approach was commonly used on the DEC PDP series of computers, which led to its use in the C language library routines.) ESI-X uses this approach.
The basic PDP-8 models only support 12-bit integers. These can be interpreted as unsigned values from 0 thru 4095, or as twos-complement signed values from -2048 thru +2047. Such a limited range of values does not support ESI-X’s intended calculation tasks.
Instead, ESI-X defines a custom floating-point number format using three 12-bit words:
- The power-of-ten exponent is a twos-complement signed 7 bit binary value stored in the leftmost bits of the 1st word, with allowed stored values -63 to +63, interpreted as E-64 to E+62.
- A positive exponent indicates the number of integer digits; the number 1.0 has exponent value +1, or E0.
- A negative exponent indicates the number of non-significant leading zeros after the decimal point; the number 0.1 has exponent 0 or E-1. The number 0.01 has exponent -1 or E-2.
- The signed exponent value of -64 (E-65) is not used, and is treated as an underflow.
- The floating-point number’s mantissa contains a sign bit, followed by 7 binary-coded decimal (BCD) digits.
- The 1st word rightmost 4 bits contain the least significant mantissa BCD digit.
- The 2nd and 3rd words contain 3 mantissa BCD digits each, stored in increasing order of significance.
- After any calculation, a normalization subroutine shifts the position of the significant digits, adjusting the exponent value, to ensure the most significant digit is in the last 4 bits of the 3rd word.
- The number 0.0 is stored with all bits of all 3 words set to 0. (Due to the normalization rule, an efficient check for a value of 0.0 merely needs to check if the rightmost 4 bits of the 3rd word is zero.)
Here are some examples of constant definitions contained in the ESI-X interpreter source code.
- The resulting bit patterns appear in Octal and the corresponding Binary grouped as 3 bits per Octal digit.
- The same Binary bits also appear in groups of 4 along with the corresponding Hexadecimal digits.
- Next appear the separate fields of signed Exponent, Mantissa Sign, and Mantissa.
- Finally the Meaning of the fields are shown as the resulting decimal value.
FP1, 0040 /1.0 0000 0001 Octal 0040 0000 0001 Binary 000 000 100 000 000 000 000 000 000 000 000 001 0000 0010 0000 0000 0000 0000 0000 0000 0001 Hex 020 000 001 Exponent 0000001 Mantissa Sign 0 Mantissa Digits 0 0 0 0 0 0 1 Meaning .100 000 0 x 10^1 = 1.0 PIBY2, 0046 /PI/2 = 1.570796326 4560 3521 Octal 0046 4560 3521 Binary 000 000 100 110 100 101 110 000 011 101 010 001 0000 0010 0110 1001 0111 0000 0111 0101 0001 Hex 026 970 751 Exponent 0000001 Mantissa Sign 0 Mantissa Digits 6 9 7 0 7 5 1 Meaning .157 079 6 x 10^1 = 1.570796 /ARCTANGENT CONSTANTS. ARCCON, 7730 /-.004054058 2404 2404 Octal 7730 2404 2404 Binary 111 111 011 000 010 100 000 100 010 100 000 100 1111 1101 1000 0101 0000 0100 0101 0000 0100 Hex FD8 504 504 Exponent 1111110 Mantissa Sign 1 Mantissa Digits 8 5 0 4 5 0 4 Meaning -0.405 405 8 x 10^-2 = -0.004054048
Note that storage using binary-coded decimal digits, rather than a binary representation, for the mantissa avoids any rounding errors for common currency values such as ten cents ($0.10).
This is important for financial analysis, as the value 1/10 decimal is a repeating fraction in a floating-point binary representation, requiring truncation to an inexact value. This is similar to the fact that 1/3 cannot be stored without truncation as a decimal floating-point value. It must be approximated as 0.3333333 and truncated after all the available significant digits are consumed.
This critical issue is one of the primary reasons the COBOL language persists as a popular language for accounting programs. IBM mainframe computers support binary-coded decimal formats in their native arithmetic instruction set. Other mainframe-class computers implement COBOL fixed-point currency values as scaled binary integers to avoid the truncation of pennies and dimes.
Modern standard binary floating-point values of type float or double remain unsuitable for financial calculations involving amounts less than a dollar (or pound or euro). Object-oriented languages like C# and Java resolved this issue by providing library classes such as BigDecimal. Often these library classes suffer from being unusable with normal arithmetic operators like + and -. The expression a = b + c; must be expressed similar to a = b.add(c); or a.setValue(b.getValue().addValue(c.getValue());
An optional Extended Arithmetic Element (EAE) hardware module for the PDP-8 provided native support for integer multiplication and division as well as add, subtract, multiply, and divide operations for floating-point numbers. The floating-point format for the EAE used one 12-bit word for the twos-complement signed exponent, and two more words to store the signed binary mantissa. This binary format could not store the value $0.10 exactly, as it was subject to the 1/10 as a binary repeating fraction issue. A copy of ESI-X running on a PDP-8 with the EAE option simply ignored its presence and provided slower, but decimal fraction friendly, calculations suitable for financial modeling.
Basic Numeric Operations
ESI-X emulates a higher-level set of numeric registers using a pre-allocated set of words in page zero of the PDP-8 core memory. (Remember that memory words on page zero are always directly addressible via the 7-bit address field of memory instructions without requiring indirect access.)
The two most important software-defined register structures are FAC (floating accumulator) and IR (input register). Each of these registers has the following memory layout:
- Seven 12-bit words, each containing a single mantissa digit unpacked from the original 3-word floating-point value. The least-significant digit is stored first, and the most-significant digit is stored last.
- A single 12-bit word storing the sign-extended exponent.
- A full 12-bit word storing the mantissa sign as the value 0 (positive) or 1 (negative).
An additional MQ (multiplier/quotient) register structure stores 7 unpacked digits, without any exponent or sign.
ESI-X contains three support subroutines that work with these registers:
- An unpack routine expands a 3-word decimal variable into the FAC or IR register format.
- A normalize routine adjusts a calculation result to ensure the most-significant digit is stored in the last register word.
- A pack routine repacks the result from the FAC register into a 3-word floating-point variable.
The addition subroutine adds the value from IR to the value in the FAC register and then calls the normalize routine for the FAC register.
Part of the addition logic involves aligning the digits of the two operand registers so they have the same exponent. The actual addition is performed by looping thru the pairs of digits in least to most-significant order, adding each pair and propagating any carry digit to the next higher digit pair. Adding two digits simply requires adding their binary-coded values, then checking if the result is greater than 9. If the result exceeds 9, a carry value is set to 1 and 10 is subtracted from the digit sum. The carry value is added to the result of the next higher digit pair sum before detecting a carry in that position.
The subtraction subroutine calls a ten’s complement routine for the IR register to negate it, and then calls the addition subroutine.
Multiplication and division subroutines implement decimal multiplication and long division as taught in many elementary schools. An outer loop uses each digit in one of the operands to control an inner loop of repeated addition or subtraction using the other operand. Each iteration of the outer loop includes a decimal shift loop that copies the unpacked digit words “up” or “down” in memory.
Advanced Numeric Functions
ESI-X provides mathematical functions to calculate values for square-root (SQRT), sine (SIN), cosine (COS), arctangent (ARCTAN), natural logarithm (LN), base-10 logarithm (LOG), and exponentiation (X^Y).
The square-root function uses the Newton–Raphson method of approximation.
The other advanced functions implement the approximation approach defined in the 1955 paper Approximations for Digital Computers by Cecil Hastings, Jr.
The approximation error for each function is less than 0.0000003 or 3E-7.
The ESI-X language only supports single-letter uppercase variable names. All values stored in variables are floating-point numbers.
The ESI-X interpreter pre-allocates a memory area containing 26 entries of three 12-bit words for each entry. Each entry represents the value for one variable A thru Z.
The six-bit binary character value of the variable name “A” thru “Z” is used as a numeric index to compute the position of the variable’s 3-word entry in the variable storage section.
The first time a variable is referenced during the interpretation of an ESI-X program determines whether the variable is a simple value, or is an array of one or two dimensions.
An array variable contains the octal value 3777 in the first word of its entry in the primary variable storage section. This octal value 3777 in hexadecimal is 7FF, with the last F in the position normally used to store the least-significant mantissa digit of a floating-point value. Since digit values only range from 0 thru 9, the value 7FF hex is safe to use as the indicator that the variable entry represents an array.
The second word of an array variable’s entry contains the number of subscripts, which can be 1 or 2.
The third word of the entry is a pointer to a linked list of array element values allocated from a heap memory area.
Each array element value on the heap contains six 12-bit words:
- The first element word contains the first subscript value within the array as a binary integer, with allowed values from -999 thru +999.
- The second element word contains the second subscript value, or the value 1 if the array is one-dimensional.
- The third element word is a pointer to next heap entry for the same array variable.
- The fourth thru sixth element words containt the floating-point value for the array element.
When referencing an array element in an ESI-X statement, a subscript value is converted from packed decimal floating point to signed binary integer in the allowed range -999 to +999. A non-integer value used as a subscript throws an error.
The conversion subroutine from packed decimal to binary involves mutiplication of single digits by decimal 10, which is 1010 in binary. This specific multiplication step is implemented as two bit shifts and an addition, rather than a looping addition.
Array variables are effectively sparse matrixes due to the tagged linked list storage scheme. Array elements can be deleted at run-time, causing a garbage compaction of the linked list heap area that moves the “last” heap entry into the deleted entry, and moves the heap pointer to recover the six memory words.
ESI-X implements an internal stack structure used for three purposes:
- Passing 3 or more values to internal subroutines.
- Storage of return step locations for DO PART and DO STEP statements.
- Storage of each user-defined Step statement in the currently active program.
The ESI-X stack allows a Part to recursively call itself, either directly or indirectly thru other Parts. Note that no user-accessible stack exists for storing local variables. The user can use array variables and increment/decrement an array index variable to simulate an application accessible stack.
The stack area grows upward in memory addresses toward the heap, which grows downward toward the stack. Thus the two memory areas share the extra memory not taken up by the ESI-X interpreter and support subroutines.
The user-defined ESI-X statements are defined as Steps within Parts. Each Step is labeled with a non-integer floating point number. A Part consists of all Steps with the same integer portion of their label.
Part numbers are limited to single-digit integer values 1 thru 9. Any number of Steps can be defined within a single Part. In the absence of a TO statement, steps are executed within their Part in numeric order of their labels. When the last statement in a Part finishes execution, the Part returns to the caller, which is a DO PART n. statement.
ESI-X pre-allocates an array of 12-bit words indexed by the integer Part number from 1 thru 9. Each of these Part List words is a pointer to a linked list of Steps allocated on the ESI-X internal stack. A Part List word containing zero indicates that Part number is undefined.
Each Step occupies a variable number of memory words organized in a linked-list on the ESI-X internal stack. A given Step entry has the following structure:
- The first thru third words contain the floating-point Step number.
- The fourth word is a pointer to the next Step in the linked list.
- The remaining words contain the null-terminated text of the Step statement packed as 2 sixbit characters per word.
During the run of a user-defined ESI-X program, each Step’s statement string is parsed, syntax analyzed, and interpreted on each execution of the step. A support subroutine provides the implementation of each ESI-X command.
Deletion of any step causes garbage compaction of all steps allocated higher in the memory stack.
The PDP-8 minicomputer has long been obsolete. However, its minimalist elegance allows the PDP-8 to remain popular for teaching fundamental concepts of software organization.
Modern integrated circuit technology, including massive amounts of inexpensive random access memory, may make these early software techniques seem irrelevant. On the contrary, we have merely stacked on many more levels of abstraction in our current software implementations. As we continue to add more layers of abstraction, we proceed to tackle ever more complex problems. Our ability to continue down this path depends on the primitive operations that still occur at the lowest levels of our execution stacks.
The PDP-8 and its surviving software provide useful insights into the foundation layers of computing.