Wednesday, November 10, 2010

EE books for a practicing EE

This list also appears here.

1. Troubleshooting Analog Circuits (EDN Series for Design Engineers) by Robert A. Pease

"To learn about subtleties of basic components."
2. Analysis and Design of Analog Integrated Circuits by Robert G. Meyer

"For analog IC design. It is still unsurpassed as the standard introduction to analog IC design."
3. The Art of Electronics by Paul Horowitz

"Very comprehensive coverage. It is a good reference to start with, and its usefulness is not exhausted even as you gain experience."
4. Noise Reduction Techniques in Electronic Systems, 2nd Edition by Henry W. Ott

"For thorny noise problems. It's done much to demystify noise mitigation methods."
5. Analog Circuit Design: Art, Science and Personalities (EDN Series for Design Engineers) by Jim Williams

"To learn about the temperament of an analog circuit designer."
6. CMOS Circuit Design, Layout, and Simulation by R. Jacob Baker

"For CMOS IC design."
7. The Design of CMOS Radio-Frequency Integrated Circuits by Thomas H. Lee

"For CMOS RF IC design. If it can be done in CMOS, it will be done in CMOS."
8. High Speed Digital Design: A Handbook of Black Magic by Howard W. Johnson

"For high speed digital design. It deals with the unpleasant non-ideal aspects of digital signals."
9. The Feynman Lectures on Physics, The Definitive Edition Volume 2 (2nd Edition) by Richard P. Feynman

"For the physical foundation of electromagnetism. Even if you do not intend to be a physicist, a solid background in classical EM will serve you well for years to come."
10. Semiconductor Devices: Physics and Technology by S. M. Sze

"For device physics. Unless you are involved in device fabrication, it is a sufficient introduction for circuit designers."
11. Operational Amplifiers: Theory and Practice by James K. Roberge

"For feedback theory. It is slightly dated, but still useful. To be an effective circuit designer, you have to know feedback theory."
12. Solid State Radio Engineering by Herbert L. Krauss

"For RF design. You are an incomplete EE if you cannot build a radio."
13. Principles of CMOS VLSI Design by Neil H. E. Weste

"For CMOS digital ASIC design."
14. Electric Motors and Drives: Fundamentals, Types and Applications (3rd Edition) by Austin Hughes

"For the operating principles of various motor types."
15. RF Circuit Design by John E. Blyler

"A very good starter book on RF design."
16. Oliver Heaviside: The Life, Work, and Times of an Electrical Genius of the Victorian Age by Paul J. Nahin

"To see how our profession began."
17. Designing Analog Chips by Hans Camenzind

"A good introductory account from an experienced designer."
18. The Soul of A New Machine by Tracy Kidder

"The human aspect of computer design."
19. Switching Power Supply Design, 3rd Ed. by Abraham I. Pressman

"An insightful book on switching regulators."



Sunday, November 7, 2010

Enlightenment

In the course of one's life, there are a few moments of enlightenment that significantly alter one's conceptions. Here are some of a technical nature.

Origin of Scientific Theory

Studying science up to high school gives the impression that a scientific theory is a summary of all the scientific facts and observations that have been so painstakingly collected. Then there is modern physics, with the theory of relativity and quantum theory, which are much harder to deduce from experimental observations alone. There was relatively little experimental data to suggest the theory of general relativity. Einstein claimed that scientific theory is the free creation of the human mind. Kant had already pointed out that the constitution of the human mind shapes our perception of the external world. Science does not simply proceed from experiments to theory.

Evolution of Humans

From my early science classes, it seemed natural that humans evolved locally in each region, so Peking Man might be the ancestor of the Asian people. It was astonishing to learn that all modern humans descend from a single group in Africa that migrated to the rest of the world.

Pseudo code

When I first learned computer programming in high school, the teacher drew flow diagrams. I never quite got it, and programming was very hard for me. Then in college, the professor showed pseudocode and stepwise refinement: you start with a very high-level description in human language and refine it until it can be coded in a programming language. Suddenly programming became a lot easier. Later I learned that the technique was described in the article "Program Development by Stepwise Refinement" by Niklaus Wirth in 1971.
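A small illustration of stepwise refinement, using a toy problem chosen here for illustration (it is not an example from Wirth's article). The comments record each refinement level, from the plain-English statement down to code; the function names are hypothetical.

```python
from __future__ import annotations

# Level 0 (human language): "print every prime number up to n".
# Level 1: for each candidate k from 2 to n, if k is prime, keep k.
# Level 2: refine "k is prime" into "no d with d*d <= k divides k".


def is_prime(k: int) -> bool:
    """Level-2 refinement of the phrase 'k is prime'."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def primes_up_to(n: int) -> list[int]:
    """Level-1 refinement: keep each candidate that passes the test."""
    return [k for k in range(2, n + 1) if is_prime(k)]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Each level only replaces one vague phrase with more precise steps, which is what makes the final program easy to write.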

Circuit Theory

In high school physics class, I encountered some circuit analysis problems. At that time, no systematic approach was presented, and each problem seemed to require some trick. Finally, in the first electrical engineering course, Kirchhoff's circuit laws were introduced, and with them came a mechanical method for analyzing any linear circuit. But later, in real-world engineering, it became clear that the loop method and the nodal method were unwieldy by hand for anything but the simplest circuits.
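To show what "mechanical" means, here is a minimal nodal-analysis sketch: each resistor is stamped into a conductance matrix G, and G v = i is solved for the node voltages. The circuit values are hypothetical, and the solver is plain Gaussian elimination; circuit simulators automate a generalization of this bookkeeping.

```python
def stamp_resistor(G, a, b, ohms):
    """Add a resistor between nodes a and b (node 0 is ground, not in G)."""
    g = 1.0 / ohms
    for n in (a, b):
        if n:
            G[n - 1][n - 1] += g
    if a and b:
        G[a - 1][b - 1] -= g
        G[b - 1][a - 1] -= g


def solve(A, rhs):
    """Gaussian elimination with partial pivoting; inputs are not modified."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# Hypothetical circuit: 1 mA source into node 1; R1 = 1k from node 1 to
# ground, R2 = 2k from node 1 to node 2, R3 = 2k from node 2 to ground.
G = [[0.0, 0.0], [0.0, 0.0]]
stamp_resistor(G, 1, 0, 1e3)
stamp_resistor(G, 1, 2, 2e3)
stamp_resistor(G, 2, 0, 2e3)
v = solve(G, [1e-3, 0.0])
print([round(x, 6) for x in v])  # [0.8, 0.4]
```

The check by inspection: R2 + R3 = 4k in parallel with 1k is 0.8k, so node 1 sits at 0.8 V and the divider puts node 2 at 0.4 V. The method never needs a trick, but the matrix grows with every node, which is why it becomes unwieldy by hand.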

Computer architecture

My first introduction to the central processing unit (CPU) was ugly. The accumulator architecture of the early days just did not appeal to me. Then the x86 assembly class was even worse; it is hard to like the x86 architecture. Only after I read Hennessy and Patterson's book on RISC architecture did computer design as a whole start to make sense to me. But it was a little too late.

Next in the computing world

Over the short span of computer history, we saw mainframe computers give way to minicomputers, minicomputers succumb to workstations, and workstations overtaken by personal computers. The trend is clear: large yields to small, and expensive yields to low-cost. Computer companies such as DEC and Sun flourished and perished in the process, while computers reached an ever larger share of the population. If we extrapolate, PCs will be replaced by something even smaller and more ubiquitous, and it is already happening: smartphones will supplant PCs. So what will happen to today's tech giants? Who will flourish, and who will perish?

PC Power Supply

I have a number of PC power supplies that have failed. They are easy enough to replace, but the reasons for failure are unclear. So I decided to tear one down to investigate.

The Enermax Noisetaker 420W power supply (designed and manufactured in 2004-2005) looks nice on the outside, but it failed after a short time and produced no output. I had opened it up before and found the fuse blown. When I replaced the fuse (250V 10A), it blew violently as soon as I plugged in the AC, so there was a short somewhere. Despite the massive heatsinks, the single-layer PCB is surprisingly easy to desolder. Inside is the familiar forward converter topology. There are two separate transformers, one for the main power and the other for the 5V standby power. Two FETs in parallel drive the primary, controlled by a UC3842 with optocoupler feedback. Schottky diode pairs are on the secondary. A magnetic amplifier is there to improve regulation. A 7912 regulator provides the -12V output. One IC monitors the outputs. There is one custom IC with the Enermax logo; it appears to throttle the fans depending on the temperature.

Before long I found the culprit: a diode (HER208) connected to the reset winding of the primary had failed short, effectively shorting the rectified DC bus. A nice power supply was brought down by a lowly diode. How did the diode fail? The HER208 is rated 1000V and 30A peak, and there is no obvious damage on the outside.
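For a sense of what the reset diode sees, here is a sketch of the textbook steady-state stresses in an ideal single-switch forward converter with a reset winding. This assumes that topology, a 230 VAC input rectified to its peak, and equal primary and reset turns; the actual turns ratio of this supply is not known, and the turn counts below are hypothetical.

```python
import math


def forward_reset_stress(v_bus, n_primary, n_reset):
    """Ideal single-switch forward converter with a reset winding.

    Returns (maximum duty cycle, reset-diode reverse voltage, switch off voltage).
    """
    d_max = n_primary / (n_primary + n_reset)      # volt-second balance of the core
    v_diode = v_bus * (1 + n_reset / n_primary)    # reverse bias during the on-time
    v_switch = v_bus * (1 + n_primary / n_reset)   # drain voltage during reset
    return d_max, v_diode, v_switch


# Assumptions: 230 VAC mains rectified to its peak, equal primary and reset turns.
v_bus = 230 * math.sqrt(2)
d, vd, vs = forward_reset_stress(v_bus, n_primary=40, n_reset=40)
print(f"bus {v_bus:.0f} V, D_max {d:.2f}, diode {vd:.0f} V, switch {vs:.0f} V")
```

With equal turns, the diode's steady-state reverse voltage is twice the bus voltage, about 650 V here; leakage-inductance spikes at turn-off come on top of that.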

Now I have in hand a nice collection of capacitors, inductors, transformers, FETs and diodes.

Smart AC Adapter

I'm sure the designers thought it was a good idea to build smarts into a lowly notebook AC adapter. My HP notebook AC adapter has a 3-pin power plug. As an electrical engineer, I had guessed that the center pin was a sense pin to compensate for the IR drop in the wires. I measured the voltages: the inner ring is at 20.0V and the center pin at 15.4V. I did not think much about why HP needed a center pin until a few months ago, when my computer ran slowly and took a long time to boot. Of course, I suspected software problems, maybe a virus. I reinstalled the OS and the problem seemed to go away, only to reappear later. It was very frustrating. One day I noticed that the computer ran fine when I unplugged the power adapter. It was very repeatable; I could see the CPU loading change as I plugged and unplugged the AC adapter. I could not believe that running on the AC adapter could slow the computer down. A quick web search showed that other people had similar problems. HP calls it a smart AC adapter. Apparently information is communicated between the AC adapter and the notebook, and the notebook throttles the CPU accordingly. When there is a connection problem on the center pin, the notebook gets stuck in the slow mode. I could jiggle the wire to make the problem go away. HP technical support seemed unaware of this particular problem. (In general, HP tech support just isn't very good.)

Power supply failures seem very common in computers. The smart AC adapter adds a new way for the power to fail.

Power factor

The power factor is defined as the ratio of real power to apparent power. Real power is consumed to do work (useful or not); apparent power is the product of the RMS voltage and the RMS current that the generator must deliver. The concept is relevant only to AC power. If power is delivered to a purely inductive load, no real power is consumed: energy is stored and then returned to the generator. No power appears to be lost, so why do we care about power factor? What does the generator see when it is driving an inductive load? If a person were cranking the generator, what would he feel? During one half of the cycle, he spends energy cranking the generator; in the other half, he gets pushed by the generator. Since a person cannot really absorb this returned energy, in a sense this energy is lost. Similarly, in a power plant, a free-spinning turbine cannot really put the energy back. Not to mention that the current sloshing back and forth loses power in the wiring resistance. We need to maximize the power factor to get the most out of the generator.
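The definition can be checked numerically: average the instantaneous product v(t)i(t) to get real power, and divide by Vrms times Irms. A minimal sketch with sampled sinusoids; the 170 V amplitude and the sample count are arbitrary choices.

```python
import math


def power_factor(v, i):
    """Real power divided by apparent power (Vrms * Irms).

    v and i are equally spaced samples covering a whole number of cycles.
    """
    n = len(v)
    p_real = sum(vk * ik for vk, ik in zip(v, i)) / n
    v_rms = math.sqrt(sum(vk * vk for vk in v) / n)
    i_rms = math.sqrt(sum(ik * ik for ik in i) / n)
    return p_real / (v_rms * i_rms)


def sine(amplitude, phase, n=1000):
    """One cycle of a sampled sinusoid; phase in radians."""
    return [amplitude * math.sin(2 * math.pi * k / n + phase) for k in range(n)]


v = sine(170.0, 0.0)
pf_resistive = power_factor(v, sine(1.0, 0.0))
pf_inductive = power_factor(v, sine(1.0, -math.pi / 2))  # current lags 90 degrees
pf_rl = power_factor(v, sine(1.0, -math.pi / 3))         # current lags 60 degrees
print(f"{pf_resistive:.3f} {abs(pf_inductive):.3f} {pf_rl:.3f}")  # 1.000 0.000 0.500
```

The purely inductive case draws full RMS current from the generator while the average of v times i is zero. Because the function works on samples rather than phasors, it also applies to distorted current waveforms.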

Rectification

Traditionally, rectification is simply a diode bridge, usually followed by capacitor filtering. It behaves as a peak detector, which means current is drawn only in short pulses near the voltage peaks. The drawback is that the entire cycle is not used, so the power draw is not maximized; in other words, the power factor is low. We know that a resistive load achieves unity power factor. If we can design rectifiers so that the load appears resistive, i.e. the current is proportional to the input voltage, we maximize the power factor. We also know that in a switch-mode converter, the output voltage is a function of the duty cycle of the pulse width modulation (PWM). If we continuously adjust the duty cycle according to the instantaneous AC input voltage, we can draw current in proportion to the input voltage while still getting a constant output voltage; this way power is extracted over the entire cycle.
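The post does not name a topology. A boost stage is a common choice for this job, so the sketch below assumes an ideal boost converter in continuous conduction, where Vout = Vin / (1 - D). Holding Vout constant while |vin| sweeps through the half-cycle then requires the duty cycle to follow D = 1 - |vin| / Vout. The 325 V peak and 400 V output are assumed values.

```python
import math


def boost_pfc_duty(v_in_peak, v_out, thetas):
    """Duty cycle over the line half-cycle for an ideal CCM boost PFC stage.

    Ideal boost relation: Vout = Vin / (1 - D), so D = 1 - |vin| / Vout.
    """
    return [1.0 - abs(v_in_peak * math.sin(th)) / v_out for th in thetas]


# Assumed operating point: 230 VAC line (about 325 V peak), 400 V output bus.
angles_deg = (0, 30, 60, 90)
duties = boost_pfc_duty(325.0, 400.0, [math.radians(a) for a in angles_deg])
for a, d in zip(angles_deg, duties):
    print(f"{a:3d} deg: D = {d:.3f}")
```

The duty cycle is largest near the zero crossings and smallest at the line peak. In a real controller, a current loop forces the inductor current to track the rectified input voltage, and this duty profile results from that. Near the zero crossings the ideal D approaches 1, which a practical PWM stage cannot reach.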

Twisted pairs and differential signals

In a noisy environment, twisted-pair wiring is used in the hope of achieving noise immunity. The thinking is that noise is coupled equally onto both wires as a common-mode voltage and cancels. People are sometimes surprised to find that twisted-pair wiring does not give the expected benefit. Often people simply twist a signal wire with a ground wire. For twisted pairs to be effective, the signals have to be balanced. When noise is coupled onto the signal lines, the resulting noise voltage depends on the impedance of each line. The signal source impedance is likely to be larger than the impedance of the ground wire, so unequal amounts of noise appear on the signal and ground wires and do not cancel. Therefore, it is important to convert single-ended signals into balanced differential signals before sending them over long twisted-pair wires.
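A rough way to see why the impedances must match is to model capacitive coupling as a divider. A coupling capacitance C runs from the noise source to each wire, and the wire's impedance to ground, R, forms the lower leg. Twisting is assumed to equalize C on both wires; the noise amplitude, frequency, capacitance and impedances below are all hypothetical.

```python
import math


def coupled_noise(v_noise, freq_hz, c_couple, r_to_ground):
    """Noise coupled onto a conductor through capacitance C.

    Divider model: C in series from the noise source, and the conductor's
    impedance to ground R (its source or termination impedance) as the lower leg.
    """
    jwc = 1j * 2 * math.pi * freq_hz * c_couple
    return v_noise * (jwc * r_to_ground) / (1 + jwc * r_to_ground)


# Hypothetical aggressor: 10 V at 100 kHz, with 10 pF of coupling to each wire.
VN, FREQ, C = 10.0, 100e3, 10e-12


def differential_noise(r_a, r_b):
    """Magnitude of the noise difference between the two wires of the pair."""
    return abs(coupled_noise(VN, FREQ, C, r_a) - coupled_noise(VN, FREQ, C, r_b))


print(f"signal 10k vs ground 1 ohm: {differential_noise(10e3, 1.0):.3f} V")
print(f"balanced 10k vs 10k:        {differential_noise(10e3, 10e3):.3f} V")
```

Even with identical coupling capacitance, the unbalanced pair leaves a large differential residue, because nearly all of the coupled noise appears on the high-impedance signal wire. The balanced pair cancels.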

Common mode rejection

One method to measure a small differential signal riding on a large common-mode voltage is to make the differential gain much larger than the common-mode gain. With the standard difference amplifier configuration, we can have a large differential gain, with the common-mode gain set by the resistor matching and the common-mode rejection ratio of the opamp. But one of its drawbacks is its low input impedance. If we add unity-gain buffers to get high-impedance inputs, the buffers pass the full common-mode voltage along with the signal and add their own common-mode errors. One way to reduce the common-mode influence is to raise only the differential gain in the input buffer stage. One arrangement takes two non-inverting amplifiers, but instead of returning their gain resistors to ground, it disconnects the grounded ends and joins them together, so the stage is referenced to the common-mode voltage. Now only the differential signal drives current through these resistors and gets amplified, while the common-mode voltage passes through at unity gain. The signals are then fed to the difference amplifier to remove the common-mode voltage. This is the classic three-opamp instrumentation amplifier configuration.
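The benefit can be shown with ideal opamps. In the sketch, the two front-end gain resistors R1 are joined through a single resistor R_G, so the front-end differential gain is 1 + 2*R1/R_G while the common-mode gain stays at 1. A hypothetical 0.1% error in one difference-amplifier resistor (the `mismatch` parameter) stands in for imperfect matching. All component values are assumptions.

```python
import math


def three_opamp_inamp(v_plus, v_minus, r1, r_g, r2, r3, mismatch=0.0):
    """Output of a three-opamp instrumentation amplifier with ideal opamps.

    'mismatch' is a hypothetical fractional error in the reference-side R3
    of the difference amplifier, standing in for imperfect resistor matching.
    """
    # Input stage: the two R1 resistors return through R_G instead of ground,
    # so only the differential voltage drives current through them.
    i_g = (v_plus - v_minus) / r_g
    o1 = v_plus + i_g * r1
    o2 = v_minus - i_g * r1
    # Difference amplifier: R2 into each input; R3 as feedback and as the
    # leg from the non-inverting input to the grounded reference.
    r3_ref = r3 * (1.0 + mismatch)
    v_noninv = o1 * r3_ref / (r2 + r3_ref)
    return v_noninv * (1.0 + r3 / r2) - o2 * (r3 / r2)


R1 = R2 = R3 = 10e3
for g_front in (2.0, 100.0):
    r_g = 2 * R1 / (g_front - 1)  # from G_front = 1 + 2*R1/R_G
    g_diff = three_opamp_inamp(0.5e-3, -0.5e-3, R1, r_g, R2, R3) / 1e-3
    g_cm = three_opamp_inamp(1.0, 1.0, R1, r_g, R2, R3, mismatch=1e-3)
    cmrr_db = 20 * math.log10(g_diff / abs(g_cm))
    print(f"front gain {g_front:g}: Gdiff {g_diff:.2f}, Gcm {g_cm:.2e}, CMRR {cmrr_db:.0f} dB")
```

The common-mode error set by the mismatched resistor is the same in both cases, but the differential gain grows with the front-end gain. The effective CMRR therefore improves by roughly the front-end gain (about 34 dB more going from 2 to 100).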

Radiation Tolerant Electronics

On some occasions, an electrical engineer has to design radiation-tolerant electronics for applications such as satellites and nuclear power plants. Radiation deposits energy in electronic devices, which can generate electron-hole pairs, break bonds and create dislocations in the crystal lattice. There are two main kinds of effects: the total dose effect and the single event effect. The total dose effect is cumulative: charge trapped in the oxides and increased surface states cause shifts in MOSFET thresholds. A single event effect can be a single event latch-up (SEL) or a single event transient (SET). In latch-up, the electron-hole pairs created by an energetic particle turn on parasitic structures in CMOS. Single event latch-up can be mitigated with different device structures, such as epitaxial substrates or silicon on insulator. A single event transient can cause a single event upset (SEU) when the disturbed signal is latched into a register or memory cell. Single event upsets are usually mitigated either by circuit design, such as SEU-immune registers, or by redundancy, in particular triple modular redundancy (TMR).
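In hardware, TMR keeps three copies of each register and feeds them to a majority voter, so an upset in any single copy is outvoted. A bit-level software model of that 2-of-3 voter:

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two copies."""
    return (a & b) | (b & c) | (a & c)


stored = 0b1011_0110
copies = [stored, stored, stored]
copies[1] ^= 0b0000_1000  # simulate a single event upset flipping one bit
print(bin(tmr_vote(*copies)))  # 0b10110110: the upset is outvoted
```

Upsets in different bit positions of different copies are also corrected, but two copies upset in the same bit position defeat the vote. This is why TMR registers are often periodically refreshed from the voted value.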

The radiation tolerance level is specified in krad(Si) for total ionizing dose (TID) and in terms of linear energy transfer (LET) for single event effects. The single event upset rate is given in errors per bit per day. In space applications, the TID requirement is usually 100-300 krad(Si), but in some extreme cases, such as Jupiter missions, TID could exceed 1 Mrad.

Thermal Design

Thermal design does not usually appear in an electrical engineer's formal education. Circuit design classes may mention components' thermal resistance, but usually do not make a great impression on him. He may also encounter heat transfer in physics class and in applied math class, where the heat diffusion equation may be discussed and some partial differential equations with simplified boundary conditions may be solved, but the connection to actual circuit design may not be apparent.

More likely, he first realizes the importance of thermal design when his fingers are burnt by an overheated component on a circuit board, when the plastic case of a component melts, when a dark scorch mark is left on the circuit board, or when a formal thermal analysis is required. Knowledge of thermal design often comes from IC manufacturers' application notes. Simple calculations can be made with the thermal resistance numbers given in data sheets, which usually include junction-to-case and case-to-ambient thermal resistances. He is told that thermal resistances can be treated as analogs of electrical resistances, with temperature difference playing the role of voltage and heat flow the role of current, so thermal networks can be simulated with circuit simulators.
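With the electrical analogy, the data-sheet calculation is Ohm's law: temperature rise equals dissipated power times the series thermal resistance. A minimal sketch; the part values are hypothetical, not from any particular data sheet.

```python
def junction_temp(t_ambient_c, power_w, *thetas_c_per_w):
    """Series thermal path: T_j = T_a + P * (sum of thermal resistances)."""
    return t_ambient_c + power_w * sum(thetas_c_per_w)


# Hypothetical part: 2 W dissipated, theta_JC = 5 C/W, theta_CA = 30 C/W,
# 25 C ambient.
print(junction_temp(25.0, 2.0, 5.0, 30.0))  # 95.0
```

Parallel heat paths, such as heat leaving through both the leads and the case, combine like parallel resistors. This is why a circuit simulator can solve larger thermal networks.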

Conductive thermal resistance is derived from Fourier's law of heat conduction, which states that heat flux is proportional to the temperature gradient. The proportionality constant is the thermal conductivity k. Thus the thermal resistance is the length L of the heat path divided by the product of the thermal conductivity and the cross-sectional area A: θ = L/(kA). So we can calculate that a 1 sq inch square of 2 oz copper has a thermal resistance of 40 °C/W in the lateral direction (taking the thermal conductivity of copper as 9 W/(in·°C)). A 20 mil diameter via in a standard circuit board has a thermal resistance of 67 °C/W if plated with 1 oz copper.
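The lateral copper figure follows directly from θ = L/(kA). For a square of copper the length equals the width, so θ = 1/(k·t), independent of the square's size. The sketch assumes about 1.4 mil of copper thickness per ounce, which is a common rule of thumb.

```python
def conduction_theta(length_in, k_w_per_in_c, area_sq_in):
    """Fourier's law in resistance form: theta = L / (k * A), in degC/W."""
    return length_in / (k_w_per_in_c * area_sq_in)


K_CU = 9.0                 # W/(in*degC), the value used in the text
OZ_THICKNESS_IN = 0.0014   # assumption: about 1.4 mil of copper per oz

# Lateral flow across a 1 in x 1 in square of 2 oz copper:
# length 1 in, cross-section = 1 in wide x copper thickness.
theta_sq = conduction_theta(1.0, K_CU, 1.0 * 2 * OZ_THICKNESS_IN)
print(f"{theta_sq:.1f} degC/W")  # 39.7 degC/W, i.e. the ~40 degC/W quoted above
```

The same function applies to a via barrel, with L as the board thickness and A as the annulus of plating. That result is sensitive to the assumed board thickness, the plating thickness, and whether "20 mil" means the drilled or the finished hole diameter.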

Convective heat transfer poses a much greater challenge to analysis. In still air, we have natural convection, where heated air rises by buoyancy in the earth's atmosphere. Note that when we design for satellites or spacecraft, natural convection does not exist. Newton's law of cooling is a simplified description of convection; it introduces a proportionality constant called the convection heat-transfer coefficient, or film coefficient. However, the film coefficient is hardly a constant; it depends on many factors, including temperature. Any accurate calculation would require computational fluid dynamics.

Radiation is another means of heat transfer. The heat flux is calculated from the Stefan-Boltzmann law. A square inch of a perfect blackbody emitter 50 °C above room temperature radiates a net 1/4 W. Circuit components may radiate only 1/3 as much, depending on their emissivity.
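The 1/4 W figure can be reproduced from the Stefan-Boltzmann law, using the net exchange with surroundings at ambient temperature. The sketch assumes a 25 °C room, so the surface is at 75 °C, and absolute temperatures must be used.

```python
SIGMA = 5.670374419e-8      # Stefan-Boltzmann constant, W/(m^2 K^4)
SQ_IN_TO_M2 = 0.0254 ** 2   # 1 sq inch in m^2


def net_radiation_w(area_m2, t_surface_c, t_ambient_c, emissivity=1.0):
    """Net power radiated by a surface to surroundings at ambient temperature."""
    ts = t_surface_c + 273.15
    ta = t_ambient_c + 273.15
    return emissivity * SIGMA * area_m2 * (ts ** 4 - ta ** 4)


# Assumption: 25 C room temperature, so the surface is at 75 C.
p_black = net_radiation_w(SQ_IN_TO_M2, 75.0, 25.0)
print(f"{p_black:.3f} W")  # about 0.25 W
```

For a real component, pass its emissivity, which scales the result linearly. Because of the fourth-power dependence, the benefit of radiation grows quickly with the temperature rise.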

A Survey of Microcontroller CPU Core Architectures: 8-bit core

By unit count, microcontrollers dominate the processing world. They are ubiquitous in industrial controls and consumer electronics. And unlike desktop computers, which are dominated by the single Intel architecture, microcontrollers are characterized by greater variety and multiple sources. Here I survey the CPU core architectures of the microcontrollers that I have used.

8-bit Microcontroller Core

Cypress PSoC M8C Core

We start with the M8C core in the Cypress PSoC (Programmable System-on-Chip) microcontrollers. PSoC is a very interesting microcontroller because of its configurable array of digital and analog peripherals. But here we focus on the CPU core, which is among the simplest. Cypress says, "The M8C is a 4 MIPS 8-bit Harvard architecture microprocessor." This is 4 MIPS at a 24MHz clock rate, and it is a Harvard architecture because of its separate data and instruction memory spaces, typical for small microcontrollers with on-chip memory. Even Cypress would not call it high-performance.
M8C is an archaic architecture. It has five internal registers: Accumulator (A), Index (X), Program Counter (PC), Stack Pointer (SP) and Flags (F). The PC register is 16-bit and all the others are 8-bit. M8C has three separate address spaces: ROM, RAM and Registers. ROM has its own 16-bit address bus and 8-bit data bus, while RAM and Registers share an 8-bit data bus and an 8-bit address bus. But RAM and Registers are not in the same address space; Register access is an I/O operation with its own read/write strobes. ROM, which is flash memory, has a maximum 64KB address space; the largest device introduced so far has 32KB of flash memory. The Registers space is two banks of 256 bytes. RAM consists of a number of 256-byte pages; the largest device introduced so far has 2KB of RAM. The Flags register is directly accessible via register address CPU_F, which contains the Zero, Carry, interrupt enable, register space and RAM page mode bit fields. The SP register points to a RAM address set up by the application.
There are thirty-seven types of instructions. Instructions are one to three bytes long and take four to fifteen cycles. The operands of ALU instructions can be RAM locations; the logic instructions and the MOV (Move) instruction can also access the Registers space. Given that the minimum instruction takes four cycles, it appears that the CPU is not pipelined. The arithmetic instructions take from four to ten cycles depending on the addressing mode. For instance, for the ADD instruction with the Accumulator and an immediate value as operands, the number of CPU cycles is four: possibly one cycle to fetch the first instruction byte, one cycle to decode, one cycle to fetch the second instruction byte and one cycle to execute. If the second operand is a direct RAM address, the number of CPU cycles is six, with two additional cycles to access memory. If the second operand is an indexed memory location, the number of cycles is seven, incurring one extra cycle to add the index register. If the first operand is a direct RAM address, the number of cycles is seven, one extra to write back the result. The JMP (Jump) instruction is two bytes long and takes five cycles, of which two may be spent on 16-bit address arithmetic. A conditional jump instruction such as JC (Jump if Carry) is also two bytes long and takes five cycles regardless of whether the branch is taken. The CALL instruction is two bytes long and takes eleven cycles, which include two pushes to store the PC register on the stack and the increments of the stack pointer. The ROM area can be read with the ROMX instruction, which retrieves the byte addressed by the concatenation of registers A and X, or with the INDEX instruction, which retrieves ROM data relative to the PC. There is also an MVI instruction that moves RAM data using a data pointer with post-increment. The stack is accessed with PUSH and POP, and some instructions can operate on the SP register.
So we can conclude that the architecture's implementation is very simple and takes up very few resources, and its performance is relatively poor. The code density is good because of the variable-length instructions and the accessible ROM space. Preemptive multitasking is possible because the stack space and the stack pointer are accessible. The power of PSoC lies not in its CPU core but in the configurable arrays. Some devices include a Multiply and Accumulate unit as a peripheral accessible through I/O registers. C compilers for M8C are available from HI-TECH Software and ImageCraft. The ImageCraft compiler is based on David Hanson's lcc retargetable compiler. There is also an open-source utility package, m8cutils, which contains an assembler, disassembler, programmer and simulator.
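Using only the cycle counts quoted above, rough timing on the M8C is simple arithmetic. The instruction sequence below is hypothetical, and the operand notation in the table keys is informal labeling, not exact M8C assembler syntax.

```python
CPU_HZ = 24_000_000

# Cycle counts quoted in the text (informal operand notation).
CYCLES = {
    "ADD A, imm": 4,
    "ADD A, [dir]": 6,
    "ADD A, [X+dir]": 7,
    "ADD [dir], A": 7,
    "JMP": 5,
    "JC": 5,
    "CALL": 11,
}


def run_time_us(sequence):
    """Total cycles and time in microseconds for a straight-line sequence."""
    cycles = sum(CYCLES[op] for op in sequence)
    return cycles, cycles / CPU_HZ * 1e6


# Hypothetical loop body: two indexed adds, a read-modify-write and a branch.
body = ["ADD A, [X+dir]", "ADD A, [X+dir]", "ADD [dir], A", "JC"]
cycles, t_us = run_time_us(body)
print(cycles, round(t_us, 3))  # 26 1.083

# 4 MIPS at 24 MHz implies an average of 6 cycles per instruction.
print(CPU_HZ / 4_000_000)  # 6.0
```

The 4 MIPS rating corresponds to an average of six cycles per instruction, which sits in the middle of the four-to-seven-cycle range of the common ALU instructions.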

Holtek HT46

Holtek's microcontroller is billed as a cost-effective MCU, and Holtek calls the HT46 series 8-bit high performance RISC architecture microcontrollers. They are either OTP (one-time-programmable) or mask type. They feature up to 8K words of program memory and up to 384 bytes of data memory. Most instructions take one instruction cycle, and some branch instructions and table read instructions take two. One instruction cycle is actually four clock cycles. A two-stage pipeline is employed: Fetch and Execute. If a branch instruction is encountered, the pipeline is flushed. A separate memory indexed by the stack pointer (SP) is used for the stack. The stack is 6, 8 or 16 levels deep, which limits the nesting of subroutine calls but is not a serious restriction for a small microcontroller. The I/O registers and the CPU registers all reside in the data memory space, addressable by the same instructions as the data memory. The CPU registers include the accumulator (ACC), the low byte of the program counter (PCL), status (STATUS) and the look-up table registers (TBLP and TBLH). The status register contains the usual CPU status bits: zero flag (Z), carry flag (C), auxiliary carry flag (AC), overflow flag (OV), as well as the power down flag (PDF) and the watchdog time-out flag (TO). All instructions are one instruction word long, which can be 14, 15 or 16 bits depending on the subtype, to accommodate the size of the program counter. The instructions operate on data memory directly, between data memory and the accumulator, or between the accumulator and an immediate value. The program memory can be read with the table read instructions. The skip instructions are used for conditional branching. The subroutine call instruction pushes the program counter onto the stack, and the return instruction restores it. Besides the regular ALU instructions, there is a decimal adjust instruction for BCD (binary coded decimal) support.
The instructions use direct memory addressing, but indexed addressing is partially supported through two additional registers, the memory pointer (MP) and the indirect addressing register (IAR). MP stores a memory address, and an access to IAR actually accesses the memory location pointed to by MP.
So the HT46 has fixed-length instructions, which facilitate decoding. It does not really use a load/store architecture, though the data memory can be considered as general-purpose registers; the accumulator is still a special register in the instruction set. Mapping the I/O registers into the data memory and having no indexed or other addressing modes simplify the instruction set. The two-stage pipeline is too short, so each instruction cycle takes many clock cycles and the maximum performance is not achieved. The simplicity of the instruction set probably leads to a very compact implementation, hence very low cost. The actual instruction encoding is not published, so there are few third-party development tools.
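A toy model makes the MP/IAR mechanism concrete: IAR is not real storage, and any access to its address is redirected to the location held in MP. The addresses 0x00 and 0x01 used for IAR and MP here are hypothetical placeholders; check the data sheet of the specific part for the real locations.

```python
class DataMemory:
    """Toy model of indirect addressing through IAR/MP (addresses hypothetical)."""

    IAR = 0x00  # hypothetical address of the indirect addressing register
    MP = 0x01   # hypothetical address of the memory pointer

    def __init__(self, size=256):
        self.cells = [0] * size

    def read(self, addr):
        if addr == self.IAR:  # redirect the access through MP
            return self.cells[self.cells[self.MP]]
        return self.cells[addr]

    def write(self, addr, value):
        if addr == self.IAR:
            self.cells[self.cells[self.MP]] = value & 0xFF
        else:
            self.cells[addr] = value & 0xFF


# Clear a 4-byte buffer at 0x40 with a pointer loop, the way firmware would.
mem = DataMemory()
for k in range(4):
    mem.write(0x40 + k, 0xAA)
mem.write(DataMemory.MP, 0x40)
while mem.read(DataMemory.MP) < 0x44:
    mem.write(DataMemory.IAR, 0)                            # clear [MP]
    mem.write(DataMemory.MP, mem.read(DataMemory.MP) + 1)   # advance pointer
print(mem.cells[0x40:0x44])  # [0, 0, 0, 0]
```

Because MP has to be incremented with a separate instruction, each loop iteration costs an extra instruction compared with a post-increment addressing mode.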

Microchip PIC

Microchip PIC microcontrollers are popular especially among the hobbyists. Here we focus on the mid-range devices, the PIC16 series.
The PIC16 has separate program and data memory buses. The data memory is made up of CPU registers, I/O registers and general-purpose registers. The program counter can be accessed through two registers, PCL and PCLATH. Instructions are 14 bits long. A two-stage pipeline is used for fetch and execution; each instruction cycle is 4 clock cycles. Most instructions take one instruction cycle, and the branching instructions take two. It has an eight-level hardware stack. There are thirty-five instructions. Most instructions involve the accumulator (W). The instructions operating on W and a file register contain a bit that determines whether the result goes to W or to the file register. The file register address is encoded with seven bits, for a total of 128 bytes. The Register Bank Select bits in the Status Register allow access to a larger file register space. The skip instructions are used for conditional branching. The lower eleven bits of the PC are encoded in the unconditional branch instruction, and the upper two bits come from the PCLATH register.
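The two address extensions can be written out as bit arithmetic. For direct addressing, the bank select bits RP1:RP0 (STATUS bits 6:5) are prepended to the 7-bit file address. For GOTO, PCLATH bits 4:3 supply PC bits 12:11 above the 11-bit literal. This is a sketch of the standard mid-range scheme; the function names are hypothetical.

```python
def pic16_file_address(rp_bits: int, f7: int) -> int:
    """Direct addressing: bank bits RP1:RP0 prepended to the 7-bit file address."""
    return ((rp_bits & 0b11) << 7) | (f7 & 0x7F)


def pic16_goto_target(pclath: int, k11: int) -> int:
    """GOTO: 11-bit literal from the opcode, PC<12:11> from PCLATH<4:3>."""
    return (((pclath >> 3) & 0b11) << 11) | (k11 & 0x7FF)


print(hex(pic16_file_address(0b01, 0x06)))  # 0x86: file register 0x06 in bank 1
print(hex(pic16_goto_target(0x08, 0x123)))  # 0x923: in program memory page 1
```

Forgetting to set PCLATH before a GOTO across a 2K page boundary is a classic PIC bug, and these two functions show where it comes from.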

Xilinx PicoBlaze

PicoBlaze is an 8-bit RISC microcontroller soft core for Xilinx FPGAs. Xilinx does distribute synthesizable VHDL/Verilog code, but the code is structural rather than behavioral, and its use of Xilinx FPGA primitives prevents it from being used on other programmable devices. PicoBlaze evolved from Ken Chapman's programmable state machines.
PicoBlaze has sixteen general-purpose registers, a 64-byte RAM, 1K 18-bit words of program memory and 256 I/O ports. There is a separate 31-level stack for subroutine calls. The ALU instructions operate on the general-purpose registers and immediate values. It uses a load/store architecture, with the FETCH and STORE instructions to access the RAM space and the INPUT and OUTPUT instructions for the I/O space. There are conditional jump instructions as well as conditional call instructions. There is no direct access to the status register, so separate instructions are provided to enable or disable interrupts. The return-from-interrupt instruction restores the carry, zero and interrupt flags.
So here we have fixed-length instructions for easy decoding, uniform register-based ALU operations and a simple addressing mode. There is no instruction to access the program space. The execution is not pipelined; the fixed two-clock instruction cycle constrains the maximum speed attainable. Its implementation on a Xilinx FPGA takes 96 slices (106 LUTs and 76 flip-flops) and 1 BlockRAM, and 100 MIPS is possible on some Xilinx FPGAs. There is a free behavioral implementation of PicoBlaze called PacoBlaze, supported by a Java version of the assembler.

Atmel AVR

Atmel's AVR microcontrollers are among the most popular. Atmel offers an extensive line of AVR-based microcontrollers with a full set of peripherals. They range from the tiny 1KB Flash/no SRAM/1.2MHz parts to the XMEGA with 384KB Flash/32KB SRAM/32MHz.
AVR has 32 8-bit general-purpose registers. The instructions are a fixed 16 bits long (with a few exceptions), and the instruction encoding is quite regular, with fixed locations for the source and destination registers and immediate values. The status register (SREG) is mapped into the I/O space, and special instructions are provided to set, clear or test individual bits in it. Most instructions execute in one or two cycles; some branch instructions take longer. A two-stage pipeline is used for fetch and execution. The AVR instruction set is rich and strives for performance rather than minimalism; multiplication instructions are included. The program memory, RAM and I/O space are accessed separately: RAM by the LD/ST instructions, the I/O space by the IN/OUT instructions and the program memory by the LPM/SPM instructions. The CPU registers, I/O registers and RAM actually share one data address space: the 32 registers occupy addresses 0x0000 to 0x001F, followed by the 64 I/O registers from 0x0020 to 0x005F, and then, depending on the device, extended I/O registers and the SRAM. Since IN/OUT reach only the 64 I/O registers, the extended I/O registers have to be accessed with the LD/ST instructions. The stack is maintained by the stack pointer, uses the RAM area and is accessed by PUSH/POP. Many addressing modes are available to the load and store instructions, including pointers with pre-decrement and post-increment. Registers 26-31 serve as the indirect address registers (the X, Y and Z pointer pairs). The load/store instructions take two cycles and are not penalized by the complex addressing modes. There are branch instructions for almost every conceivable condition. If the branch is taken, an extra cycle is incurred to flush the pipeline. The call instruction takes 4 cycles, two extra cycles to push the PC onto the stack. The return and return-from-interrupt instructions take 4 cycles; the status register is not restored by the RETI instruction.
Here we see an implementation of a modern RISC architecture: a large register set, a fixed-length and uniform instruction format, load/store operations and simple addressing modes for ALU instructions. Development is well supported by the GNU compiler tools, including the Windows distribution WinAVR. Different versions of the AVR core in VHDL are available as open source; pavr is a pipelined implementation of the instruction set, where a 6-stage pipeline is used to achieve a high clock rate.
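A toy model shows two points from the paragraph above: the registers live at the bottom of the data address space, and r27:r26 form the X pointer used by post-increment loads. It models only LD Rd, X+ and ignores I/O side effects. The memory size and the array location are hypothetical.

```python
class AvrDataSpace:
    """Toy model: r0-r31 occupy data addresses 0x00-0x1F (size hypothetical)."""

    def __init__(self, size=0x900):
        self.mem = [0] * size

    def reg(self, n):
        return self.mem[n]  # register n is simply data address n

    def x_pointer(self):
        """X is the register pair r27:r26 (high:low)."""
        return self.mem[26] | (self.mem[27] << 8)

    def set_x(self, addr):
        self.mem[26], self.mem[27] = addr & 0xFF, (addr >> 8) & 0xFF

    def ld_x_postinc(self, rd):
        """Model of LD Rd, X+: load the byte at X into Rd, then increment X."""
        addr = self.x_pointer()
        self.mem[rd] = self.mem[addr]
        self.set_x(addr + 1)


avr = AvrDataSpace()
avr.mem[0x100:0x103] = [1, 2, 3]  # hypothetical array in SRAM
avr.set_x(0x100)
total = 0
for _ in range(3):
    avr.ld_x_postinc(16)          # LD r16, X+
    total += avr.reg(16)
print(total, hex(avr.x_pointer()))  # 6 0x103
```

The pointer update comes free with the load, which is why array walks on the AVR need no separate increment instruction.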

Zilog eZ80

The Z80 has the heritage of 8080. The latest from Zilog is the eZ80 series, which we'll focus on. eZ80 is upward object-code compatible with Z80 and Z180.
eZ80 uses a three-stage pipeline: fetch, decode and execute. The eZ80 CPU has two banks of 8-bit registers, which include the accumulator (A), six working registers (B, C, D, E, H, L) and the Flag register (F). The working registers can combine to form 16-bit registers. The control registers include Interrupt Page Address Register (I), Index Register (IX, IY), Memory Mode Base Address Register (MBASE), Program Counter Register (PC, 16 or 24-bit), Refresh Counter Register (R) and Stack Pointer Register (SPL for 24-bit and SPS for 16-bit). eZ80 can operate in two modes, the 16-bit addressing Z80 mode and the 24-bit addressing ADL mode.
This is a fairly sophisticated CISC architecture. Maintaining compatibility while increasing performance (four times faster, with a 256 times larger address space) were the main objectives of this implementation.
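A sketch of the register-pair and addressing arithmetic above. It assumes the eZ80 behavior in which, in Z80 mode, MBASE supplies address bits 23-16 so that a 16-bit address lands somewhere in the 24-bit space; the helper names are hypothetical. The last computation checks the "256 times larger" figure.

```python
def pair(hi: int, lo: int) -> int:
    """Combine two 8-bit registers (e.g. H and L) into a 16-bit pair (HL)."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def z80_mode_address(mbase: int, addr16: int) -> int:
    """Assumed Z80-mode translation: MBASE forms the upper byte of a 24-bit address."""
    return ((mbase & 0xFF) << 16) | (addr16 & 0xFFFF)


hl = pair(0x12, 0x34)               # HL = 0x1234
phys = z80_mode_address(0x0D, hl)   # 24-bit address 0x0D1234
ratio = 2**24 // 2**16              # ADL (24-bit) vs Z80 (16-bit) address space
print(hex(hl), hex(phys), ratio)    # 0x1234 0xd1234 256
```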

8051

The venerable 8051 originated at Intel, was widely second-sourced and is still popular. It serves as the embedded processor in the Cypress USB controllers and the Chipcon (now TI) ZigBee transceivers. 8051 microcontrollers are available from Atmel, Silicon Laboratories (formerly Cygnal), NXP (formerly Philips), Dallas Semiconductor (Maxim), Cypress (PSoC 3) and others. HDL cores, including several open-source versions, are also available for FPGAs. SDCC is an open-source C compiler that targets the 8051.
The 8051 was introduced around 1980 as an enhancement of the 8048 architecture. The original 8051 was fabricated in an nMOS process, with 60,000 transistors and 4KB of factory mask-programmable ROM.
The 8051 is an accumulator-based architecture: almost all ALU instructions go through register A. Register B is dedicated to the multiply and divide instructions. The 8051 has separate code and data spaces. The 16-bit Program Counter addresses the 64KB program memory space. The registers A, B, SP (Stack Pointer), PSW (Program Status Word) and DPTR (the 16-bit Data Pointer) are also mapped into the 128-byte Special Function Register (SFR) address space (addresses 128-255). The Stack Pointer is 8-bit and therefore addresses only the internal RAM. The PSW holds the carry, auxiliary carry, user flag, register bank select, overflow and parity bits; the flags are updated as a side effect of instruction execution. The SFR space also hosts the peripheral registers. There are 4 banks of general-purpose registers R0-R7, mapped to internal RAM addresses 0-31. The internal RAM is 128 bytes. The architecture actually allows 256 bytes of internal RAM (as on the 8052); the upper 128 bytes share addresses with the SFRs, but they are accessed with the register-indirect addressing mode, whereas the SFRs can only be accessed with the direct addressing mode. Addresses 20h to 2Fh hold 128 individually addressable bits, and SFRs whose addresses end in 0h or 8h are bit addressable as well. These make bit operations easy.
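A sketch of the internal RAM arithmetic above: where Rn of the active bank lives, and how a bit address maps to a byte and bit position. It assumes the standard PSW layout with the bank-select bits RS1:RS0 in bits 4 and 3; the function names are hypothetical.

```python
def reg_address(psw: int, n: int) -> int:
    """Internal RAM address of Rn (0-7) in the bank selected by PSW.RS1:RS0."""
    if not 0 <= n <= 7:
        raise ValueError("only R0-R7 exist")
    bank = (psw >> 3) & 0x03          # RS1 = PSW.4, RS0 = PSW.3
    return bank * 8 + n


def bit_location(bit_addr: int) -> tuple[int, int]:
    """Map a bit address (0x00-0xFF) to (byte address, bit number)."""
    if 0x00 <= bit_addr <= 0x7F:      # bit area in internal RAM 20h-2Fh
        return 0x20 + bit_addr // 8, bit_addr % 8
    if 0x80 <= bit_addr <= 0xFF:      # SFRs at addresses ending in 0h or 8h
        return bit_addr & 0xF8, bit_addr & 0x07
    raise ValueError("bit addresses are 0x00-0xFF")


print(reg_address(0x18, 7))   # bank 3, R7 -> 31
print(bit_location(0x7F))     # last RAM bit -> (47, 7), i.e. 2Fh.7
print(bit_location(0xE7))     # ACC.7 -> (224, 7); ACC is at E0h
```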
Instructions are 1 to 3 bytes long. There are five basic addressing modes: register, direct, register-indirect, immediate and base-register-plus-index-register-indirect. Register addressing accesses R0-R7, direct addressing accesses internal RAM addresses 0-127 and the SFRs at 128-255, and register-indirect addressing accesses the memory address held in R0 or R1. MOVC moves a byte from program memory using base-register-plus-index-register-indirect addressing, where the accumulator is the index register and DPTR or PC is the base register; it is intended for table lookup. MOVX moves data between the external data memory and the accumulator. In addition to the usual arithmetic and logic instructions, there is an instruction for the packed BCD format (DA A, decimal adjust). Jumps can be PC-relative, use an 11-bit or 16-bit absolute address, or be indirect through A+DPTR. Subroutine calls save the PC on the stack and return instructions restore it. The CPU begins execution at program memory address 0000h. The interrupt vectors start at address 0003h with 8-byte spacing.
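A sketch of three address computations from the paragraph above: the target of an 11-bit absolute jump (AJMP/ACALL), the MOVC A,@A+DPTR table-lookup address and the interrupt vector addresses. It assumes the 11-bit operand replaces the low 11 bits of the PC after the PC has advanced past the 2-byte instruction, so the upper 5 bits come from the next instruction's address; the function names are hypothetical.

```python
def ajmp_target(instr_addr: int, addr11: int) -> int:
    """AJMP/ACALL: upper 5 bits from the incremented PC, low 11 bits from the operand."""
    next_pc = (instr_addr + 2) & 0xFFFF
    return (next_pc & 0xF800) | (addr11 & 0x07FF)


def movc_dptr_address(a: int, dptr: int) -> int:
    """MOVC A,@A+DPTR reads code memory at DPTR + A (16-bit wraparound)."""
    return (dptr + (a & 0xFF)) & 0xFFFF


def vector(n: int) -> int:
    """Interrupt vector n: starts at 0003h with 8-byte spacing."""
    return 0x0003 + 8 * n


# An AJMP in the last two bytes of a 2KB page jumps into the *next* page.
print(hex(ajmp_target(0x07FE, 0x123)))      # 0x923
print([hex(vector(n)) for n in range(5)])   # 0003h, 000Bh, 0013h, 001Bh, 0023h
```

The page-boundary case is the classic AJMP pitfall: the same 11-bit operand lands in a different 2KB page depending on where the instruction sits.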
The original 8051 runs from a 12MHz clock, and one instruction (machine) cycle takes 12 clock cycles. Instructions take 1 or 2 machine cycles, except MUL and DIV, which take 4. More recent implementations achieve one machine cycle per clock cycle. Clearly, execution in the original core is not pipelined.
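The timing above turns into a one-line calculation. This sketch uses integer nanoseconds to avoid floating-point noise; the 25MHz single-clock-per-cycle figure is only an illustrative assumption, since clock rates and per-instruction cycle counts vary across derivatives.

```python
def exec_time_ns(machine_cycles: int, clock_hz: int, clocks_per_cycle: int = 12) -> int:
    """Execution time in ns (truncated) for a given number of machine cycles."""
    return machine_cycles * clocks_per_cycle * 1_000_000_000 // clock_hz


print(exec_time_ns(1, 12_000_000))                       # 1000 ns: 1 us per machine cycle
print(exec_time_ns(4, 12_000_000))                       # 4000 ns: MUL/DIV on the original
print(exec_time_ns(1, 25_000_000, clocks_per_cycle=1))   # 40 ns on a one-clock core
```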

68HC11

The 68HC11 also uses an accumulator architecture. It has two 8-bit accumulators, A and B, which can be combined into the 16-bit register D. The program counter (PC), the stack pointer (SP) and the two index registers IX and IY are all 16-bit. The 8-bit condition code register (CCR) holds the carry, overflow, zero, negative, half-carry and interrupt mask bits. The 68HC11 has a unified memory space: data memory, program memory and memory-mapped peripheral registers all reside in the same 64KB address space. Apart from inherent instructions, which need no operand address, there are five addressing modes: immediate, direct, extended, indexed and relative. Direct addressing reaches the first 256 bytes of the address space ($0000-$00FF), where the on-chip RAM normally resides. Extended addressing covers the entire 16-bit address space. Indexed addressing adds an 8-bit unsigned offset to an index register. Relative addressing is used for branches, with an 8-bit signed offset from the PC. The instruction set supports some 16-bit operations through register D, such as adding two 16-bit numbers.
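A sketch of the effective-address arithmetic for the addressing modes above, assuming the offset semantics of the 68HC11 (unsigned offset for indexed mode, signed offset for branches, measured from the address of the next instruction). The helper names are hypothetical.

```python
def to_signed8(b: int) -> int:
    """Interpret an 8-bit value as two's complement (-128..127)."""
    b &= 0xFF
    return b - 0x100 if b & 0x80 else b


def ea_direct(op8: int) -> int:
    """Direct: the 8-bit operand addresses $0000-$00FF."""
    return op8 & 0xFF


def ea_extended(op16: int) -> int:
    """Extended: a full 16-bit address in the instruction."""
    return op16 & 0xFFFF


def ea_indexed(index_reg: int, offset8: int) -> int:
    """Indexed: IX or IY plus an unsigned 8-bit offset (0-255)."""
    return (index_reg + (offset8 & 0xFF)) & 0xFFFF


def branch_target(next_pc: int, rel8: int) -> int:
    """Relative: signed 8-bit offset from the address of the next instruction."""
    return (next_pc + to_signed8(rel8)) & 0xFFFF


def d_register(a: int, b: int) -> int:
    """D is the concatenation A:B, with A as the high byte."""
    return ((a & 0xFF) << 8) | (b & 0xFF)


print(hex(ea_indexed(0x1000, 0xFF)))     # 0x10ff: the indexed offset is never negative
print(hex(branch_target(0x8002, 0xFE)))  # 0x8000: $FE means -2
print(hex(d_register(0x12, 0x34)))       # 0x1234
```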
In general, operations in the immediate addressing mode take two cycles, direct three, extended and IX-indexed four, and IY-indexed five. Instructions that use IY-indexed addressing have an extra opcode byte (a prebyte). Branch and jump instructions take three cycles, and the jump-to-subroutine instruction takes 5-7 cycles depending on the addressing mode.
Because the 16-bit stack pointer can point to any memory location, the 68HC11 is less restrictive than the 8051, whose 8-bit stack pointer confines the stack to internal RAM.
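A sketch that totals cycles with the per-mode figures above, as they apply to a typical 8-bit load such as LDAA, and converts them to time. It assumes the common configuration in which the E clock is the crystal frequency divided by 4 (an 8MHz crystal gives a 2MHz E clock, 500ns per cycle). The table and names are hypothetical; the reference manual has the exact count for each instruction.

```python
# Cycle counts per addressing mode for a typical 8-bit load (e.g. LDAA).
LOAD_CYCLES = {"imm": 2, "dir": 3, "ext": 4, "ind_x": 4, "ind_y": 5}


def total_cycles(modes: list[str]) -> int:
    """Sum the cycles for a sequence of loads given their addressing modes."""
    return sum(LOAD_CYCLES[m] for m in modes)


def time_ns(cycles: int, crystal_hz: int) -> int:
    """Time in ns (truncated), assuming E clock = crystal / 4."""
    e_clock_hz = crystal_hz // 4
    return cycles * 1_000_000_000 // e_clock_hz


seq = ["imm", "dir", "ind_y"]
print(total_cycles(seq), time_ns(total_cycles(seq), 8_000_000))  # 10 5000
```

Swapping the IY-indexed load for an IX-indexed one saves a cycle (and the prebyte), which is why IX is usually preferred in tight loops.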

PCB layout: Techniques

Two-layer Board

Two-layer boards are the most common choice for low-cost applications. PCB vendors normally offer a 6mil minimum trace width and trace separation and a 14mil minimum hole size. Some offer boards without solder mask at very attractive prices. Because there are no dedicated power and ground planes, the quality of the layout is largely determined by how the power and ground traces are managed, and effective use of copper area for ground can reduce noise coupling and EMI. Good layout software helps a great deal; time spent learning the tools is worthwhile, and knowing the keyboard shortcuts improves efficiency. The following discussion assumes features supported by Protel or OrCAD.
Setup
The first step is to set up rules for trace width and clearance. Set the minimum width and clearance higher than the fab house minimum; this improves board durability and reduces fab-related issues. Set the nominal (or preferred) width larger than the minimum, and give power and ground traces a larger nominal width, while still allowing a smaller width so they can connect to pads. Then select a default via size. Make sure to specify an adequate annular ring, usually at least the minimum trace width. I prefer to have solder mask over vias (tented vias). Some layout software lets you specify the solder mask expansion, usually 3-4mils. The thermal relief should also be specified. The layout process is error-prone, so make the computer do the checking. Next, set up the board outline as well as the keep-out areas. Traces and holes should not be too close to the edge; keep them, say, 50mil away from it.
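The rule setup above can be sanity-checked in a few lines. This is a sketch, not a feature of Protel or OrCAD: the fab minimums echo the 6mil/14mil figures above, while the default rule values are illustrative assumptions. It derives the via pad diameter from the hole and annular ring (pad = hole + 2 * ring) and flags rules that fall at or below the fab limits or outside the guidelines in the text.

```python
from dataclasses import dataclass


@dataclass
class FabLimits:               # fab house minimums (mil)
    min_trace: float = 6.0
    min_space: float = 6.0
    min_hole: float = 14.0


@dataclass
class Rules:                   # our design rules (mil); values are illustrative
    min_trace: float = 8.0
    min_space: float = 8.0
    nominal_trace: float = 10.0
    power_trace: float = 20.0
    via_hole: float = 16.0
    annular_ring: float = 8.0
    edge_clearance: float = 50.0


def via_pad(rules: Rules) -> float:
    """Pad diameter needed for the requested annular ring."""
    return rules.via_hole + 2 * rules.annular_ring


def check(rules: Rules, fab: FabLimits) -> list[str]:
    """Return a list of rule problems (empty if the rule set looks sane)."""
    problems = []
    if rules.min_trace <= fab.min_trace:
        problems.append("min trace not above fab minimum")
    if rules.min_space <= fab.min_space:
        problems.append("min clearance not above fab minimum")
    if rules.via_hole < fab.min_hole:
        problems.append("via hole below fab minimum")
    if rules.annular_ring < rules.min_trace:
        problems.append("annular ring smaller than min trace width")
    if not rules.min_trace < rules.nominal_trace < rules.power_trace:
        problems.append("expected min < nominal < power trace width")
    if rules.edge_clearance < 50.0:
        problems.append("copper closer than 50mil to board edge")
    return problems


print(via_pad(Rules()), check(Rules(), FabLimits()))   # 32.0 []
```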
Placement
A good layout is predicated on a good placement. A placement grid makes it easier to align components. Set an appropriate grid size based on the packages; 20mil is probably a good choice for placement and 10mil for routing. Choosing a different rat's nest color for power nets can be helpful. Components can usually be grouped logically, such as power, analog, digital and RF. Be careful not to mix noisy digital signals with sensitive analog signals. High-current sections should be close to the power regulator to minimize noise. In the first pass, bring related components into proximity; cross-probing with the schematic is useful here. Some components usually have placement constraints, such as mounting holes and connectors. Place these first and lock their locations. The locations and orientations of the other components are then adjusted to minimize trace lengths to the components they connect to. Here experience helps in judging how easy the later routing will be. Consider the critical nets first and ensure that they have direct, short paths. It takes some trial and error: move components around until the nets seem evenly distributed rather than clustered. Look at the density map if the software produces one. If certain areas appear congested, make more room there so more traces can be routed. Place fiducial marks. Do a design rule check before moving on. The final placement should also be aesthetically pleasing.
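Two pieces of the placement loop above, sketched under assumptions: snapping to the 20mil placement grid, and scoring a placement by total half-perimeter wirelength (HPWL). HPWL is a common estimate of routed length that is swapped in here as an illustrative metric; the text itself only asks to minimize trace lengths. Coordinates are in mil, and the component and net names are hypothetical.

```python
Point = tuple[float, float]


def snap(p: Point, grid: float = 20.0) -> Point:
    """Snap a coordinate to the nearest placement grid point."""
    return (round(p[0] / grid) * grid, round(p[1] / grid) * grid)


def hpwl(net: list[str], pos: dict[str, Point]) -> float:
    """Half-perimeter of the bounding box of a net's components."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets: list[list[str]], pos: dict[str, Point]) -> float:
    return sum(hpwl(net, pos) for net in nets)


nets = [["U1", "C1"], ["U1", "J1"]]
before = {"U1": (0, 0), "C1": (100, 0), "J1": (500, 300)}
after = {"U1": (0, 0), "C1": (100, 0), "J1": (200, 100)}   # connector moved closer

print(total_wirelength(nets, before), total_wirelength(nets, after))  # 900 400
print(snap((107, 93)))                                                # (100.0, 100.0)
```

Locked parts such as connectors and mounting holes would simply be excluded from the set of components you move while comparing scores.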
Routing
Route the critical nets first, then power and ground. Route analog nets and high-speed nets before the others. Do not worry too much about getting all traces neat and tidy at first; clean up later. In general, orient traces in one direction on the top layer and in the orthogonal direction on the bottom layer. The goal is to minimize trace length and the number of vias. Once all traces are routed, examine the nets again to optimize: straighten traces, widen power and ground traces, reroute traces to remove vias, etc. Finally, create copper pours for ground and power, and make small adjustments to maximize the continuous copper area. The ability to turn copper areas on and off is helpful. Add teardrops if the tool supports them.
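A sketch for auditing a routed net against the goals above: total length, via count (inferred from layer changes along an ordered path) and segments that run against the layer's preferred direction. It assumes horizontal on top and vertical on bottom, treats 45-degree jogs as acceptable, and uses a hypothetical segment format rather than any tool's export format.

```python
import math
from dataclasses import dataclass

PREFERRED = {"top": "h", "bottom": "v"}   # assumption: top horizontal, bottom vertical


@dataclass
class Segment:
    layer: str
    x1: float
    y1: float
    x2: float
    y2: float

    def direction(self) -> str:
        if self.y1 == self.y2:
            return "h"
        if self.x1 == self.x2:
            return "v"
        return "diag"

    def length(self) -> float:
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)


def audit(path: list[Segment]) -> dict:
    """Summarize an ordered path of segments for one routed connection."""
    vias = sum(1 for a, b in zip(path, path[1:]) if a.layer != b.layer)
    against = [s for s in path
               if s.direction() not in (PREFERRED[s.layer], "diag")]
    return {"length": sum(s.length() for s in path),
            "vias": vias,
            "against_preferred": len(against)}


route_a = [Segment("top", 0, 0, 400, 0), Segment("bottom", 400, 0, 400, 300)]
route_b = [Segment("top", 0, 0, 0, 300), Segment("top", 0, 300, 400, 300)]
print(audit(route_a))  # length 700.0, 1 via, 0 against preferred direction
print(audit(route_b))  # length 700.0, 0 vias, 1 against preferred direction
```

The two routes illustrate the trade-off the text describes: respecting layer directions costs a via, while avoiding the via blocks the top layer vertically.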
Verification
Check the routing statistics to make sure all nets are enabled and routed. Run the DRC and resolve any issues. Silk screen label orientation should be consistent. Include pin 1 and polarity markings. Leave a silk screen area for labeling, and add necessary labels, such as jumper settings. Make sure there is no silk screen over pads; I wish there were a design rule for that. Move silk screen labels away from via holes, which make them illegible. Look at each layer by itself, especially the solder mask, to ensure the pads are exposed. Look at the layers in outline mode. If the software can generate a 3D view, take a look to check that the board looks correct.
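The missing silk-over-pad rule can be approximated with an axis-aligned rectangle overlap test between silk screen items and pad copper, grown by a clearance. A sketch: rectangles are (x_min, y_min, x_max, y_max) in mil, and the data and the 5mil default clearance are hypothetical.

```python
Rect = tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max), mil


def grow(r: Rect, d: float) -> Rect:
    """Expand a rectangle by d on every side."""
    return (r[0] - d, r[1] - d, r[2] + d, r[3] + d)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles intersect (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def silk_over_pads(silk: dict[str, Rect], pads: dict[str, Rect],
                   clearance: float = 5.0) -> list[tuple[str, str]]:
    """List (silk item, pad) pairs that violate the clearance."""
    return [(s, p) for s, sr in silk.items() for p, pr in pads.items()
            if overlaps(sr, grow(pr, clearance))]


pads = {"R1.1": (0, 0, 40, 60), "R1.2": (100, 0, 140, 60)}
silk = {"R1 label": (30, 62, 110, 90), "C3 label": (200, 200, 260, 230)}
print(silk_over_pads(silk, pads))   # the R1 label crowds both R1 pads
```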
Output
The last step is to generate the CAM files, usually Gerber files. Load the Gerber files into a Gerber viewer, and look at the bottom layer from the bottom side. Also produce drawing files in a format such as PDF: composite top and bottom views, and a drill drawing with drill tables. Print out a copy and look at it. When designing a board that plugs into another board, it is helpful to print the layout on a transparency and do a fit check against the other board to make sure connectors and holes line up correctly.
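A drill table is essentially a tally of holes per diameter with a tool number for each size. A sketch, assuming the hole sizes have already been extracted from the design (the list here is hypothetical); numbering tools in ascending size order is a common convention, not a requirement.

```python
from collections import Counter


def drill_table(hole_diameters_mil: list[float]) -> list[tuple[str, float, int]]:
    """Return (tool, diameter, quantity) rows, tools numbered by ascending size."""
    counts = Counter(hole_diameters_mil)
    return [(f"T{i:02d}", dia, counts[dia])
            for i, dia in enumerate(sorted(counts), start=1)]


holes = [14, 14, 14, 32, 40, 40, 125, 125, 125, 125]
for tool, dia, qty in drill_table(holes):
    print(f"{tool}  {dia:>6.1f} mil  x{qty}")
```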