Friday, February 14, 2014

80486 PC: Motherboard

Motherboard

 
This motherboard has 8 ISA bus expansion slots, 3 of them with VESA local bus extensions and 2 of them 8-bit.  The ISA bus dates back to the first PC as the 8-bit PC XT bus.  It was extended to 16 bits with the PC AT.  The increasing resolution of displays demanded a faster bus, so a simple VESA local bus was added, mainly to support graphics cards.  By 1994, the PCI bus was starting to appear on motherboards.  Even so, graphics cards still demanded more bandwidth, so the AGP slot was introduced.  Eventually, the serial PCI Express bus became the only expansion bus remaining.  The Universal Serial Bus would not appear for another one or two years, and it took a few more years to become prevalent.

The CPU socket is a 19x19 238-pin ZIF socket.  It can house a number of 486 models.  However, it is not keyed for the i486DX2, so installing the chip requires care to get the orientation right.

The main crystal is 14.31818MHz, which is three times 4.77MHz, the CPU frequency of the PC XT, and four times 3.58MHz, the frequency of the NTSC color burst signal.  The CPU frequency generator chip AV9107-03 generates a 33.3MHz clock for the CPU bus.  The ISA bus frequency is usually set at around 8MHz, so the configurable divider is set to 4 on this board, giving about 8.3MHz.
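
The clock relationships in this paragraph are simple divisions, and a short Python sketch makes them easy to check.  The frequencies and the divider of 4 come from the text; the function and key names are only illustrative.

```python
# Clock relationships described in the text; names are illustrative only.
CRYSTAL_MHZ = 14.31818   # main crystal on the board
CPU_BUS_MHZ = 33.3       # output of the AV9107-03 clock generator
ISA_DIVIDER = 4          # configurable divider setting on this board

def derived_clocks(crystal_mhz=CRYSTAL_MHZ, bus_mhz=CPU_BUS_MHZ,
                   isa_div=ISA_DIVIDER):
    """Return the clocks that follow from the crystal and the CPU bus clock."""
    return {
        "pc_xt_cpu_mhz": crystal_mhz / 3,         # crystal / 3, about 4.77 MHz
        "ntsc_color_burst_mhz": crystal_mhz / 4,  # crystal / 4, about 3.58 MHz
        "isa_bus_mhz": bus_mhz / isa_div,         # CPU bus / 4, about 8.3 MHz
    }

if __name__ == "__main__":
    for name, mhz in derived_clocks().items():
        print(f"{name}: {mhz:.3f} MHz")
```

With a 33.3MHz bus, a divider of 3 would give about 11MHz and a divider of 5 about 6.7MHz, so 4 is the setting closest to the usual 8MHz.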

The OPTi 82C895 chipset is the most complex IC besides the CPU.  It handles the cache, DRAM, VESA and ISA interfaces.  It uses 0.8um CMOS and comes in a 0.5mm pitch 208-pin plastic quad flat pack.  Its companion chip, the 82C602, is a buffer device for AT bus signals with the addition of the real-time clock function.  It is fabricated in 1.0um CMOS and packaged in a 100-pin plastic flat pack.  OPTi Inc was a major chipset producer in the early 1990s.  Its stock peaked in 1995, but the company faltered soon afterwards because of Intel's dominance in the chipset market.

The motherboard has 9 x 32KB 15ns SRAM chips that are used as L2 cache: eight for 256KB of data and one for the tags.  Interestingly, the L2 cache size is the same in the latest Intel processors.  The board sockets can accommodate 4 x 128KB chips plus one 32KB chip, for a total of 512KB.  The SRAM is asynchronous; there is no clock.  In the read cycle, the data lines are valid 15ns after the address lines are presented; in the write cycle, the address, write enable and data are asserted for 15ns and the data are written.  The SRAM is power hungry; each chip consumes 150mA during operation.  The cache controller in the chipset uses a direct-mapped write-back cache organization.  The cache line size is 16 bytes.  The 8-bit tag is stored in one 32KB SRAM, so the total cacheable memory is 64MB.  The cache memory shares the 32-bit bus with the CPU.  When there is an L1 cache miss, the lower address bits A[17:4] are used to read the tag SRAM and the content is compared with the upper address bits A[25:18].  If there is a match, the SRAM is read or written for the next four bus cycles; otherwise the main memory is accessed.  When the CPU clock period is 30ns, the SRAM is fast enough to support the cache timing 2-1-1-1, i.e. the first word takes two cycles and each of the remaining three takes one cycle.  If the write-back mode is enabled, one of the tag bits is the dirty bit and the cacheable memory size is halved to 32MB.  When data is written to the cache, the main memory is not accessed, which improves efficiency but leaves the memory inconsistent, so the dirty bit is set.  The next time there is a cache read miss on a dirty line, the line must first be flushed to the main memory; then the CPU and cache read the new data concurrently from the main memory.  If it is a write miss, the cache is bypassed.
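
The address split used by the L2 cache can be sketched in a few lines of Python.  This is only a model of the tag lookup described above (offset in A[3:0], index in A[17:4], tag in A[25:18]); it is not the chipset's actual logic, and the class and function names are made up for illustration.

```python
# A minimal model of the direct-mapped L2 tag lookup described in the text.
LINE_BYTES = 16              # cache line size
CACHE_BYTES = 256 * 1024     # data SRAM, 8 x 32KB
TAG_BITS = 8                 # width of the tag SRAM

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4  -> A[3:0]
NUM_LINES = CACHE_BYTES // LINE_BYTES       # 16384 lines
INDEX_BITS = NUM_LINES.bit_length() - 1     # 14 -> A[17:4]

def split_address(addr):
    """Split a physical address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset

def cacheable_bytes(tag_bits=TAG_BITS):
    """Each extra tag bit doubles the memory the cache can cover."""
    return CACHE_BYTES << tag_bits

class DirectMappedTags:
    """Tag store only: one tag per line, no data and no dirty bits."""
    def __init__(self):
        self.tags = [None] * NUM_LINES

    def lookup(self, addr):
        tag, index, _ = split_address(addr)
        return self.tags[index] == tag      # True on a hit

    def fill(self, addr):
        tag, index, _ = split_address(addr)
        self.tags[index] = tag              # replaces whatever was there

if __name__ == "__main__":
    print(split_address(0x123456))          # (4, 9029, 6)
    print(cacheable_bytes() // 2**20, "MB") # 64 MB
    print(cacheable_bytes(7) // 2**20, "MB with one tag bit used as dirty")
```

Two addresses that are exactly 256KB apart share an index but differ in tag, so in a direct-mapped cache they evict each other.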

The main memory is in four 72-pin SIMM slots.  In 1994, DRAM was still expensive, but soon its price would drop and the SIMM would be replaced by the DIMM.  Though a SIMM has contacts on both sides, they are connected together.  I believe the PC came with two 4MB modules, and I added another 4MB a little later at considerable expense.  Each of the three modules uses eight of TI's 4M-bit fast page DRAMs.  Two modules have 60ns RAS access time, and the other 70ns.  Afterwards, in 1996, the DRAM price dropped significantly and I added another 4MB module that uses two 16M-bit DRAMs from LG, also with 70ns access time.  In 1998, TI sold its memory business to Micron.  The board can have up to 128MB.  The 72-pin SIMM is a standard defined by JEDEC.  It has 36 data lines and 14 address lines; with enable lines, it can support up to 2GB per module.

There are presence detect bits that indicate the speed and size of the memory.  DRAM access is more complex than SRAM access.  The DRAM used here is asynchronous.  Take the TI TMS44400 for example: it has 10 address lines, 4 data lines, write enable, output enable, column address strobe (CAS) and row address strobe (RAS).  The memory is organized as a matrix.  In the read cycle, the row address is driven along with the falling edge of RAS, then the column address is driven with the falling edge of CAS, and some time later the data is available.  The 60ns access time refers to the time from the falling edge of RAS to data available.  A precharge time is required before the next read, so the total cycle time is 110ns.  When reading from the same row address, the access can be faster using the page mode, which keeps RAS asserted and strobes only CAS for different column addresses.  One page is 1024 columns.  The minimum page-mode cycle time is 40ns.  Consider the case where a cache line needs to be filled: 4 reads take 230ns (one 110ns full cycle plus three 40ns page-mode cycles), or 8 clock cycles at 33MHz, while a cache access is only 5 clock cycles.  The DRAM also requires periodic refresh; the refresh period is 16ms.  A refresh is achieved by strobing each of the 1024 rows.  The current consumption for each chip is about 100mA, so it is more than 20 times lower per bit than for the SRAM.
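
The timing and current comparisons above can be reproduced with a few lines of Python.  This is a back-of-the-envelope sketch using only the figures quoted in the text.  Treating the line fill as one full cycle plus three page-mode cycles is one reading of the 230ns figure, and the evenly spread refresh is an assumption; the chipset may time both differently.

```python
import math

# Figures quoted in the text for the 60ns TMS44400 and the 33MHz bus.
FULL_CYCLE_NS = 110   # random read cycle, including precharge
PAGE_CYCLE_NS = 40    # minimum page-mode cycle (CAS-only access)
CLOCK_NS = 30         # CPU bus clock period at 33MHz
WORDS_PER_LINE = 4    # 16-byte cache line over a 32-bit bus

def line_fill_ns(words=WORDS_PER_LINE):
    """One full RAS/CAS cycle, then page-mode cycles for the other words."""
    return FULL_CYCLE_NS + (words - 1) * PAGE_CYCLE_NS

def to_clocks(ns, clock_ns=CLOCK_NS):
    """Round a duration up to whole bus clocks."""
    return math.ceil(ns / clock_ns)

def refresh_interval_us(period_ms=16, rows=1024):
    """Average time between row refreshes, assuming they are spread evenly."""
    return period_ms * 1000 / rows

def current_per_bit_ratio(sram_ma=150, sram_bits=32 * 1024 * 8,
                          dram_ma=100, dram_bits=4 * 1024 * 1024):
    """How many times more current per bit the SRAM draws than the DRAM."""
    return (sram_ma / sram_bits) / (dram_ma / dram_bits)

if __name__ == "__main__":
    fill = line_fill_ns()
    print(fill, "ns =", to_clocks(fill), "clocks")              # 230 ns = 8 clocks
    print("SRAM 2-1-1-1 burst:", sum((2, 1, 1, 1)), "clocks")   # 5 clocks
    print(refresh_interval_us(), "us between row refreshes")
    print(current_per_bit_ratio(), "x current per bit, SRAM vs DRAM")
```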

This motherboard has a regular DIN connector for the keyboard.  Mini-DIN connectors would later become standard for keyboard and mouse.  The keyboard controller chip is made by American Megatrends.  There is no mouse port; it has to come from an expansion card, as do the printer port and the serial ports.  There is also no interface for floppy, CD and hard drives.  They also have to come from an expansion card.

The BIOS is from American Megatrends, a company that specializes in PC BIOS and survives to this day.  The version of BIOS for this motherboard is 486DX ISA BIOS 1993.  It is in a 28-pin DIP 64KB UV-erasable EPROM with 45ns access time, 16-bit address and 8-bit data.  The motherboard socket can accommodate a 32-pin DIP.

The motherboard is a product of Taiwan, which dominated PC motherboard production and spurred a lot of entrepreneurs who sold PCs in local stores.  At the time, through-hole technology was still fairly dominant.  All the discrete components are through-hole.  They belong to the era when clock and edge rates were slow.  Nowadays, the lead inductance of through-hole capacitors would make them useless as bypass capacitors.  The number of bypass capacitors for the CPU is far fewer than for today's CPUs, which require multiple rails.  A lot of 74-series TTL logic chips in DIP are still used here: F series for the fast logic and LS for the slower logic.  Most of these chips were made by National Semiconductor, which Fairchild Semiconductor was a part of.  Motorola also made a few of the logic chips; the Motorola spin-off ON Semiconductor continued this part of the business.  The PC board is dated the 21st week of 1994 and I bought it in the summer of that year.  Most of the components were also made in 1994 and a few in 1993.  The technology was rapidly evolving, so everything was fairly fresh.  The PC board measures 10.2" x 8.65" (26cm x 22cm) and has the standard 62mil thickness.  This is smaller than the 13" x 8.5" Baby AT form factor.  Later the ATX form factor, 12" × 9.6", would become the standard for desktop PCs.  The PC board probably has 4 layers: the top and bottom are signal layers routed in orthogonal directions and the inner two are Vcc and GND planes.

The board itself runs on a single 5V supply, but +/-12V and -5V are brought in and distributed to the ISA bus expansion slots.  The 12-pin power connector pin-outs are Power Good, +5V, +12V, -12V, Ground, Ground, Ground, Ground, -5V, +5V, +5V, +5V.  This pin assignment goes back to the IBM PC XT.  Eventually 3.3V would also be brought in from the power supply and -5V dropped.  This board has the option to install a low-drop-out voltage regulator to generate the supply voltage for a 3.3V CPU.  Later, more and more voltages had to be generated on board to accommodate multiple rails and continually dropping core voltages, with peak currents reaching over a hundred amps.

Wednesday, February 12, 2014

80486 PC: CPU

I just recently opened a box of parts from my first PC.  It was purchased in 1994 from a computer store in Pasadena for maybe $1,800.  Back then, there were numerous computer stores that assembled and sold PCs; Microsoft was still a year away from releasing Windows 95 and Linux was just starting to gain a following.  I decided to resurrect this old PC and provide a very detailed description of it.  I'll try to include as much information as I can possibly gather.  I think it'll be interesting to review this 20-year-old technology and draw some comparisons with current technology.

CPU

The Intel 80486DX2 was fabricated on a 5V 0.8um (800nm in today's units) process and contains 1.2 million transistors.  It was first introduced in March 1992.  By 1994, the Pentium, which has 3 times as many transistors, had already appeared, so the 486DX2 was probably sold at a reasonable price.  The 0.8um process was already mature; the leading-edge process was 0.6um.

The 4th generation x86 processor was first introduced in April 1989.  While the instruction set is similar to the 386 with the addition of only six new instructions, it integrates on chip the floating-point unit and a unified L1 cache.  The i486DX2 has an 8KB write-through cache.  The processor runs at 66MHz and the front side bus runs at half that speed, 33MHz.  While the processor is software compatible with the 386, the processor bus is not.  Intel literature said the i486 featured a 32-bit RISC-technology core despite x86 carrying CISC baggage, so by then it was clear that RISC was the preferred computer architecture.

The i486DX2 comes in a 17x17 168-pin 0.1" pitch ceramic PGA package, which was quite common for that era.  PGA packaging would continue, except for the Pentium II, which used a slot package.  Eventually the land grid array (LGA), a surface mount technology, would become the standard.  Throughout, ZIF sockets have been used to house the CPUs.  The maximum power for the i486DX2 is about 6W.  The junction-to-ambient thermal resistance is 17 deg C/W.  A heat sink with forced air flow is required to operate at room temperature.
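
The need for a heat sink follows from the two thermal figures above.  The sketch below applies the usual steady-state relation, junction temperature = ambient + power x thermal resistance, to the bare package.  The 25 deg C room temperature is an assumption, and the function names are illustrative.

```python
# Steady-state junction temperature estimate for the bare package.
MAX_POWER_W = 6.0          # approximate maximum power from the text
THETA_JA_C_PER_W = 17.0    # junction-to-ambient thermal resistance from the text

def temperature_rise_c(power_w=MAX_POWER_W, theta_ja=THETA_JA_C_PER_W):
    """Junction temperature rise above ambient, in deg C."""
    return power_w * theta_ja

def junction_temp_c(ambient_c, power_w=MAX_POWER_W, theta_ja=THETA_JA_C_PER_W):
    """Junction temperature for a given ambient temperature, in deg C."""
    return ambient_c + temperature_rise_c(power_w, theta_ja)

if __name__ == "__main__":
    print(temperature_rise_c())    # 102.0
    print(junction_temp_c(25.0))   # 127.0, assuming a 25 deg C room
```

A rise of about 100 deg C over ambient with no heat sink is why the thermal resistance has to be lowered with a heat sink and air flow.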

Twenty years later, the transistor count has increased by three orders of magnitude, the clock frequency by a factor of 50, the power consumption by a factor of 20 and the pin count by a factor of 10, while the core voltage has dropped by a factor of 5.
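
To see what these factors imply, the sketch below applies them to the i486DX2 figures quoted in these posts.  The results are only what the rough factors imply, not the specifications of any particular modern processor, and the dictionary keys are illustrative.

```python
# i486DX2 figures from the text, and the rough 20-year scaling factors.
I486DX2 = {
    "transistors": 1.2e6,
    "clock_mhz": 66,
    "power_w": 6,
    "pins": 168,
    "core_v": 5.0,
}
FACTORS = {
    "transistors": 1000,   # three orders of magnitude
    "clock_mhz": 50,
    "power_w": 20,
    "pins": 10,
    "core_v": 1 / 5,       # the core voltage dropped rather than grew
}

def scale(baseline, factors):
    """Multiply each baseline figure by its scaling factor."""
    return {key: baseline[key] * factors[key] for key in baseline}

if __name__ == "__main__":
    for key, value in scale(I486DX2, FACTORS).items():
        print(f"{key}: {value:g}")
```

This works out to roughly 1.2 billion transistors, 3.3GHz, 120W, 1680 pins and a 1V core.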

The i486 features a five-stage pipeline, so many instructions execute in one clock cycle.  The design goal was a 2 to 3 times performance improvement over the 386 at the same clock with 4 times the transistor budget.  Intel made extensive use of CAD tools developed in cooperation with UC Berkeley.  Intel created its own hardware description language, iHDL, which was not replaced by Verilog until 2005.  A standard cell library was created to enable automatic RTL-to-layout conversion.  It was the first time an RTL-to-layout system was employed in a major microprocessor development program, though it was applied only to the control logic.  The data path was still done manually.  It is interesting to note that Intel had been using transparent latches with a two-phase clocking system, which was not compatible with synthesis tools.  By the time of the Pentium design, the two-phase clocking scheme had largely been replaced by a single clock and master-slave flip-flops. [Gelsinger, "Coping with the Complexity of Microprocessor Design at Intel –  A CAD History"]


The main enhancement of the 486 over the previous generation is the inclusion of an 8KB on-chip cache, which is a buffered write-through, four-way set-associative cache with LRU replacement and a 16-byte line size. [Crawford, "The i486 CPU: Executing Instructions in One Clock Cycle", Feb. 1990]  About a quarter of the die area is occupied by the cache.  A core today would have 32KB data and 32KB instruction L1 caches and an internal unified L2 cache of 256KB, and a multicore processor could have more than 20MB of L3 cache shared by all cores on chip.  The L1 instruction cache is 4-way associative and the data cache 8-way associative; the L2 cache is 8-way associative and the L3 cache 16-way associative.  The cache now takes up the majority of the die area.
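
The cache parameters above determine how an address is divided for a lookup.  The following Python sketch derives the number of sets and the bit fields for the i486 cache from its size, associativity and line size, assuming a 32-bit physical address.  The function name and dictionary keys are illustrative.

```python
def cache_geometry(size_bytes, ways, line_bytes, addr_bits=32):
    """Derive set count and address bit fields for a set-associative cache.

    Assumes all sizes are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # selects a byte within a line
    index_bits = sets.bit_length() - 1          # selects a set
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }

if __name__ == "__main__":
    # i486 on-chip cache: 8KB, four-way set-associative, 16-byte lines.
    print(cache_geometry(8 * 1024, 4, 16))
    # {'sets': 128, 'offset_bits': 4, 'index_bits': 7, 'tag_bits': 21}
```

The same function can be applied to the modern caches mentioned above once a line size is chosen; the text does not give one, so any value used there would be an assumption.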

The i486DX2 has 32-bit address lines with byte enable signals, 32-bit data lines with parity, and memory and I/O read/write control, bus control, bus arbitration, burst control and cache control signals.  Out of the 168 pins, 24 are Vcc and 28 are Vss.  JTAG boundary scan ports are included.  The CLK input has a maximum frequency of 33MHz.  The logic input level is TTL compatible, i.e. 2.0V and above is high.  The minimum setup time is 5ns and the hold time 3ns.