Wednesday, April 29, 2020

Reading Computer Architecture

Hennessy and Patterson's Computer Architecture book was first published in 1990, and the sixth edition came out in 2017, a span of almost three decades.  I have read a few editions over the years, since each edition contains substantial new material, and I have gathered every edition.  We will read them again to see how computer architecture has evolved, as described by this influential book over 30 years, from the 1e (1990), 2e (1996), 3e (2002), 4e (2006), 5e (2011), to the 6e (2017), about 5 years per edition.

The 1st edition, published in 1990, has over 750 pages including the appendices.  The 2nd edition, published in 1996, has 1000 pages including the appendices, and the 6th edition has 1500 pages with appendices.

It would be a mistake not to read the 1st edition (especially for beginners).  It lays the foundation for the subsequent editions and describes the technology up to 1989.  It presents a hypothetical RISC processor, DLX (similar to MIPS), and emphasizes the importance of making quantitative measurements.  The quantitative principles are actually fairly simple, just some elementary-school mathematics.  This seems to make it an empirical science.  Does the lack of mathematical sophistication mean that the computer architecture discipline is still in its infancy?   The main technical discussions center on pipelining and the memory hierarchy.  Vector processors and I/O are also covered, and the DEC VAX and IBM 360 are used as the main examples.  In the pre-VLSI days, computer designers seemed to worry a lot about the speed of light being the limiting factor; they missed both the miniaturization brought about by nanotechnology and the fact that the ultimate limitation would actually be power dissipation.
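The quantitative principles the book leans on come down to a few such formulas, notably the CPU time equation and Amdahl's law.  A minimal sketch in Python, with made-up illustrative numbers (none of these figures come from the book):

```python
# CPU performance equation: CPU time = instruction count x CPI / clock rate
def cpu_time(instr_count, cpi, clock_hz):
    """Seconds needed to execute a program."""
    return instr_count * cpi / clock_hz

# Amdahl's law: overall speedup when only a fraction of the
# execution time benefits from an enhancement.
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Illustrative numbers: 1 billion instructions at CPI 1.5 on a 100MHz clock.
print(cpu_time(1e9, 1.5, 100e6))          # 15.0 seconds
# A 10x speedup on 40% of the workload gives only ~1.56x overall.
print(round(amdahl_speedup(0.4, 10), 2))  # 1.56
```

Elementary-school mathematics indeed, but the discipline of applying it to measured data is the book's real lesson.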

In the 2e, the graph of microprocessor performance growth shows the top performer, the DEC Alpha, in 1995; microprocessor performance grows at 1.58x per year (vs 1.35x per year performance growth from technology alone).  The graph appears to show a SPECint92 rating of 330, which matches the data from netlib.org: the DEC Alpha 21164 266MHz in the AlphaStation 600 5/266 of 1995 has a SPECint92 rating of 329.0 and a SPECint95 rating of 6.30.   The Intel Pentium 133MHz available at that time has a SPECint92 rating of 190.9 and a SPECint95 rating of 3.96.  (A lot of historical CPU performance data can be found at http://cpudb.stanford.edu/ .)   In the 3e, the top performer is the Intel Pentium III of 2000, with the same growth rate; the Pentium III 1GHz of 2000 has a SPECint95 rating of 46.8.  In the 4e, the top performer is the Intel Xeon 3.6GHz; the growth rate is revised to 25%/year before 1986, 52%/year for 1986-2002, and 20%/year for 2002-2005.  In the 5e, the top performer is the Intel Xeon with 6 cores at 3.3GHz; the growth rate after 2003 is 22%/year.  In the 6e, the top microprocessor is the Intel Core i7 with 4 cores at 4.2GHz; the growth rate after 2003 is refined: 23%/year for 2003-2011, 12%/year for 2011-2015, and 3.5%/year for 2015-2017.

Also notable is the scaling of clock rate: in the first 16 years (1987-2003), the clock rate goes from 16MHz to 3.2GHz (a 40%/year increase), while in the recent 14 years (2003-2017) the clock rate goes only from 3.2GHz to 4.2GHz (2%/year).  Between 2003 and 2010, there was almost no improvement in clock rate.  FinFET production (at 22nm) started around 2011, which possibly boosted the clock rate slightly.  The power limitation is first mentioned in the 3e, as power consumption reached over 100W.  (The Xeon 9282, launched in 2019Q2, reaches a TDP of 400W.)
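These per-year figures are just compound annual growth rates and can be sanity-checked in a couple of lines (the endpoints are the ones quoted above):

```python
def annual_growth_pct(start, end, years):
    """Compound annual growth rate, in percent."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100

# 16MHz -> 3.2GHz over 1987-2003: ~40%/year
print(round(annual_growth_pct(16e6, 3.2e9, 2003 - 1987)))   # 39
# 3.2GHz -> 4.2GHz over 2003-2017: ~2%/year
print(round(annual_growth_pct(3.2e9, 4.2e9, 2017 - 2003)))  # 2
```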

The 2e's emphasis is mostly on the RISC principle; one entire chapter is devoted to describing RISC instruction sets, with DLX introduced as a textbook example.  The DLX instruction set is kept simple through a load-store design, fixed-length instruction encoding, and a large register set.  The CISC vs RISC question is settled decisively by comparing CPI (cycles per instruction) and the number of instructions executed, using the quantitative principles: RISC's CPI advantage more than compensates for the larger number of instructions per program, and a RISC is much simpler to build.  Rarely used complex instructions just do not pay off.  ILP (instruction-level parallelism) is the main focus, from basic pipelining, dynamic scheduling, branch prediction, superscalar execution, and speculation to compiler technology.  The fundamental ideas remain unchanged through the editions, though the modern superscalar, speculative, dynamically scheduled processor has gotten very complex.  It is hinted that multiprocessors may be the way to go, as it became possible to put two processors on a single die around the end of the 20th century.
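The CISC-vs-RISC verdict is a direct application of the CPU time equation.  A toy comparison with hypothetical numbers (illustrative only, not measurements from the book):

```python
def exec_time(instructions, cpi, clock_hz):
    # CPU time = instruction count x CPI / clock rate
    return instructions * cpi / clock_hz

# Hypothetical workload: the RISC machine executes 30% more
# instructions, but each takes far fewer cycles on average.
cisc_time = exec_time(1.0e9, 6.0, 50e6)  # 120 seconds
risc_time = exec_time(1.3e9, 1.5, 50e6)  # 39 seconds
print(round(cisc_time / risc_time, 1))   # 3.1: RISC wins despite more instructions
```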

TLP (thread-level parallelism) first appears in 3e.  Basic pipelining is relegated to the appendix.

From 4e, instruction set principles are relegated to the appendix.

DLP (data-level parallelism) is first introduced in the 5e, and the GPU is first described as a DLP machine.  The vector processor is brought into the DLP section; in previous editions it was relegated to an appendix.  RLP (request-level parallelism) is also introduced, and the term WSC (warehouse-scale computer) is coined; a WSC is made up of hundreds of thousands of networked servers.

DSA (domain-specific architecture) is first discussed in the 6e; AI neural-network chips are presented as the examples.


Tuesday, April 7, 2020

$3 USB Li+ Battery Charger

The $3 Li+ battery charger is very simple in construction, with a single-layer PCB.  I could not identify the IC, which linearly regulates the voltage and current from the 5V USB input.   It also controls two LEDs, so it may be designed specifically for this application.  One 2K resistor possibly programs the charge current.  The board draws about 5mA when idle and about 500mA when charging.  It cuts off at the correct voltage of 4.2V, but the open-circuit voltage is 4.38V.  The negative battery terminal tab is spring-loaded to accommodate different battery sizes.
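Many single-cell linear charger ICs set the charge current with a program resistor; on the common TP4056, for example, I_BAT (A) is roughly 1200 / R_PROG (ohms).  Whether this board's unidentified IC follows that convention is only my guess, but the numbers line up roughly:

```python
# TP4056-style charge-current programming (assumption: the unidentified
# IC on this board follows this common convention; per the TP4056
# datasheet, I_BAT in amps is approximately 1200 / R_PROG in ohms).
def charge_current_ma(r_prog_ohms):
    return 1200.0 / r_prog_ohms * 1000.0

# With the 2K resistor found on the board:
print(charge_current_ma(2000))  # 600.0 mA, near the observed ~500 mA
```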