Sunday, December 11, 2011

A Survey of Microcontroller CPU Core Architectures: 32-bit core

Continued from the previous post.

ARM7

ARM7 is a popular 32-bit architecture for microcontrollers; it has been implemented in the NXP (formerly Philips) LPC series, the Analog Devices ADuC7000 and the Atmel AT91.  The architectural efficiency is 0.9 Dhrystone MIPS/MHz, which is about the integer performance of the Intel 486.  The most commonly used core is the ARM7TDMI-S, which includes the Thumb instruction set.  60MHz is the common top clock rate.  (See also the open source tools for the ARM processors.)

The ARM7 uses a relatively short 3-stage pipeline.  Registers R0-R13 are truly general-purpose 32-bit registers; R14 is used as the link register for subroutine calls and R15 holds the program counter.  The ARM instructions are 32 bits long.  As is typical of a RISC machine, it uses a load-store architecture.  The addressing modes include PC-relative and indexed with offset, with optional auto-increment.  One unique feature is that every instruction is conditionally executed according to a 4-bit condition field in the instruction.  There are 15 different conditions, depending on the Z (zero), C (carry), N (negative) and V (overflow) flags in the program status register CPSR.  This conditional execution feature reduces the need for branching.  The long instruction word allows ALU operations to name a second source operand that is independent of the destination register.  The multiply and multiply-accumulate instructions use arrays of 8-bit Booth's algorithm, so the number of instruction cycles depends on how many of the arrays are activated.  It does not have a division instruction.
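The conditional execution described above can be sketched in a few lines of Python. This is only an illustration, not ARM's own pseudocode: the function names are made up for this post, and the table follows the standard ARM condition mnemonics (EQ, NE, ..., AL) with the condition field taken from bits 31-28 of the instruction word.

```python
# Sketch of ARM conditional execution: each instruction carries a 4-bit
# condition field (bits 31..28) that is tested against the N, Z, C, V flags.
# The names cond_field and condition_passed are hypothetical helpers.

CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z),             # equal
    0b0001: ("NE", lambda n, z, c, v: not z),         # not equal
    0b0010: ("CS", lambda n, z, c, v: c),             # carry set
    0b0011: ("CC", lambda n, z, c, v: not c),         # carry clear
    0b0100: ("MI", lambda n, z, c, v: n),             # negative
    0b0101: ("PL", lambda n, z, c, v: not n),         # positive or zero
    0b0110: ("VS", lambda n, z, c, v: v),             # overflow
    0b0111: ("VC", lambda n, z, c, v: not v),         # no overflow
    0b1000: ("HI", lambda n, z, c, v: c and not z),   # unsigned higher
    0b1001: ("LS", lambda n, z, c, v: (not c) or z),  # unsigned lower or same
    0b1010: ("GE", lambda n, z, c, v: n == v),        # signed >=
    0b1011: ("LT", lambda n, z, c, v: n != v),        # signed <
    0b1100: ("GT", lambda n, z, c, v: (not z) and n == v),  # signed >
    0b1101: ("LE", lambda n, z, c, v: z or n != v),   # signed <=
    0b1110: ("AL", lambda n, z, c, v: True),          # always
}


def cond_field(instr):
    """Extract the 4-bit condition field from a 32-bit ARM instruction."""
    return (instr >> 28) & 0xF


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition would execute."""
    name, test = CONDITIONS[cond]
    return bool(test(n, z, c, v))


if __name__ == "__main__":
    instr = 0xE1A00000  # MOV R0, R0 with the AL (always) condition
    cond = cond_field(instr)
    print(CONDITIONS[cond][0], condition_passed(cond, False, False, False, False))
```

With the flags left by a compare of two equal values (Z set), an EQ instruction executes and an NE instruction is skipped, which is how short if/else sequences avoid a branch.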

The Thumb mode was introduced to improve code density.  The instruction length is reduced to 16 bits and the number of accessible registers to 8.

ARM7 is superseded by Cortex-M.

ARM Cortex-M

ARM Cortex-M has several flavors.

ARM Cortex-M0

Cortex-M0 is ARM's low-power architecture with a low gate count for deeply embedded systems. It implements the ARMv6-M Thumb instruction set.
Of the 16 core registers, 13 of them (R0-R12) are general-purpose. R13 is the stack pointer, R14 is the link register for subroutine calls, and R15 is the program counter. There is also a program status register (PSR).

ARM Cortex-M3

Cortex-M3 is the replacement for the ARM7TDMI.  One of the changes most visible to programmers is the addition of a hardware division instruction.  The Cortex-M3 implements the Thumb-2 instructions, which are a superset of the Thumb instructions.  Thumb-2 code is only slightly larger than Thumb code, but its performance is close to that of the ARM instructions.

The three-stage pipeline is augmented with branch prediction.  The relatively short pipeline does not push the clock speed, which is around 100MHz; this is sufficient for the applications of this microcontroller.

ARM Cortex-M4

Cortex-M4 adds single-precision floating-point instructions.

ARM Cortex-R

Cortex-R is the real-time flavor of the Cortex core.  Its performance is between Cortex-A and Cortex-M.  

Hitachi/Renesas SH

The SuperH is a RISC architecture designed for mobile and embedded applications.  High code density drives the choice of a fixed 16-bit instruction length.  It has 16 32-bit general registers; R0 is also used as an index register and R15 is used as the stack pointer.  The three 32-bit control registers are the Status Register (SR), which holds instruction status bits (such as the T bit for conditional instructions) and interrupt mask bits, the Global Base Register (GBR) for indirect addressing modes, and the Vector Base Register (VBR) for exception processing.  Of the four 32-bit system registers, two are used to store MAC results, one is the Procedure Register, which stores the return address of a subroutine call, and the last is the Program Counter.  System control instructions can operate on the system registers.

Like other RISC architectures, SH is a load/store architecture.  It has several addressing modes: immediate data, register indirect with increment/decrement and 4-bit displacement, register indirect indexed with R0, GBR with displacement, GBR indirect indexed with R0, PC relative with 8-bit and 12-bit displacement, and PC relative with register.

The 16-bit instruction set follows a fixed and very regular encoding pattern with great simplicity and elegance.  The 16-bit instruction is essentially divided into 4 nibbles; the operands are limited to 2 registers and the maximum displacement is 12 bits.  An instruction starts with a 4-bit opcode (some instructions use 4, 8 or 12 more opcode bits).  Two 4-bit fields encode the two register operands, the displacement can be 4, 8 or 12 bits, and the immediate value is 8 bits.
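The nibble layout described above is easy to pick apart in code. The following is a minimal sketch in Python; the helper names are made up for illustration, and the three sample words (0x321C as ADD R1,R2, 0xE1FF as MOV #-1,R1, 0xAFFE as BRA with a displacement of -2) are my reading of the SH-2 encoding tables and should be checked against the programming manual.

```python
# Sketch: split a 16-bit SH instruction word into its four nibbles and the
# wider immediate/displacement fields. Helper names are hypothetical.

def nibbles(instr):
    """Return the four 4-bit fields, most significant first."""
    return ((instr >> 12) & 0xF, (instr >> 8) & 0xF,
            (instr >> 4) & 0xF, instr & 0xF)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def fields(instr):
    """Decode every field a 16-bit word could carry; the opcode decides
    which of them are meaningful for a given instruction."""
    top, n, m, low = nibbles(instr)
    return {
        "opcode": top,                              # leading 4-bit opcode
        "rn": n,                                    # first register field
        "rm": m,                                    # second register field
        "low": low,                                 # extra opcode bits or 4-bit displacement
        "imm8": sign_extend(instr & 0xFF, 8),       # 8-bit immediate
        "disp12": sign_extend(instr & 0xFFF, 12),   # 12-bit displacement
    }


if __name__ == "__main__":
    for word in (0x321C, 0xE1FF, 0xAFFE):
        print(hex(word), fields(word))
```

Because every field sits on a nibble boundary, an instruction word can almost be read directly from its hexadecimal form.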

The branching instructions include PC-relative conditional, unconditional and procedure branches with a 12-bit displacement, as well as PC-relative register and register-indirect branches.  There are instructions for exception processing, TRAPA and RTE, which can be used to implement system calls.  There is no memory management.

A five-stage pipeline (IF, ID, EX, MA, WB) with delayed branch is used.  Most instructions execute in one cycle; branching takes two cycles.  Multiply and MAC can take 2-4 cycles.  The IF and MA stages require memory access, which can take more than one cycle.  The delayed branch is used to reduce pipeline stalls by executing the delay slot while the branch destination is being resolved.  This means that, when writing assembly code, the instruction that follows the branch is executed before the branch takes effect; a nop may have to be used if no useful instruction can be placed there.  Delayed branch is a simple scheme that is no longer used in newer processors, where more advanced branch prediction is normally used.
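The effect of the delay slot on program order can be shown with a toy interpreter. This is a sketch in Python, not a model of the real SH pipeline: the tuple-based instruction format and the run function are invented for illustration, and a branch placed in a delay slot (which SH does not allow) is not handled.

```python
# Toy machine with one branch delay slot: the instruction right after a
# branch always executes before control transfers to the branch target.

def run(program, max_steps=100):
    """Execute a list of tuples such as ("set", "r0", 1), ("add", "r0", 10),
    ("bra", target_index), ("nop",) and ("halt",).
    Returns (registers, trace of executed instruction indices)."""
    regs = {}
    trace = []
    pc = 0
    pending = None  # branch target waiting for the delay slot to finish
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # This instruction is the delay slot; jump once it has run.
            next_pc = pending
            pending = None
        op = instr[0]
        if op == "halt":
            break
        elif op == "bra":
            pending = instr[1]
        elif op == "set":
            regs[instr[1]] = instr[2]
        elif op == "add":
            regs[instr[1]] = regs.get(instr[1], 0) + instr[2]
        # "nop" does nothing
        pc = next_pc
    return regs, trace


if __name__ == "__main__":
    program = [
        ("set", "r0", 1),    # 0
        ("bra", 4),          # 1: branch to index 4
        ("add", "r0", 10),   # 2: delay slot, still executed
        ("add", "r0", 100),  # 3: skipped
        ("halt",),           # 4
    ]
    print(run(program))
```

In the sample program the add at index 2 runs even though it follows the branch, while the add at index 3 is skipped; replacing index 2 with a nop gives the behavior a programmer might naively expect.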

Floating-point instructions are added in the SH2A-FPU, with 16 32-bit single-precision floating-point registers, which also form 8 64-bit double-precision registers.  The FPU instructions are still encoded in the same 16-bit format.  To distinguish single-precision from double-precision instructions, a precision mode bit is used in the FP status/control register.  Most SP instructions execute in 1 cycle, including MAC; only division and square root take longer, at 10 and 9 cycles respectively.  The DP instructions take longer: data movement takes 2 cycles, add/subtract/multiply 6 cycles, division 23 cycles and sqrt 22 cycles.

The high-end SH-4 was used in the Sega Dreamcast game console.  A 64-bit version, SH-5, was defined, but no hardware was produced.  SH-2 is still being produced for deeply embedded systems, but the architecture did not see widespread use.  With the patents expired, there has been some effort to create soft cores, notably the J-core project.  The architecture is supported by GCC.

Renesas RX 

The RX uses variable-length instructions and has 16 general-purpose registers; R0 doubles as the stack pointer.  It supports 32-bit floating-point numbers.

MIPS

SPARC V8

Saturday, October 15, 2011

Low temperature electronics

The commercial temperature range for electronic components is 0 °C to 70 °C, industrial −40 °C to 85 °C and military −55 °C to 125 °C.  So what happens if you take parts beyond these ranges?  Do they fail immediately? Usually not.  It is very much component dependent.  Some components are quite resilient to very low temperature.

Components generally degrade gradually as the temperature drops, but below some temperature, some might exhibit large changes and some might fall apart completely.  There are some reasons for these abrupt changes.  For instance, a gate driver stops working completely below −120 °C because the logic threshold increases too much and the input fails to trigger.  One voltage regulator falls apart below −130 °C while another regulator with similar specifications holds up well at −150 °C; this can be attributed to some small design differences in the current limit and thermal limit circuitry.  An SRAM-based FPGA can fail in an unpredictable and dangerous way.  As the temperature goes below −125 °C, the increase in transistor threshold can cause the SRAM bits that hold the logic configuration to flip, which could completely alter the logic output.

There is no good way of knowing which part might do better at low temperatures other than testing and screening.

Saturday, September 17, 2011

Batteries

  • Samsung cell phone battery Li-ion 3.7V 900mAh 4.6x3.4x0.5cm 19g
  • Samsung cell phone battery Li-ion 3.7V 1000mAh 21g
  • Energizer NiMH AA 1.2V 2300mAh 5.0cm Ø1.4cm 29g
  • Energizer NiMH AAA 1.2V 850mAh 4.5cm Ø1.0cm 12g
  • Energizer Alkaline C 1.5V 5.0cm Ø2.5cm
  • Duracell Alkaline MN1500 AA(LR6) 1.5V 5.0cm Ø1.4cm 24g
  • Duracell Alkaline MN2400 AAA(LR03) 1.5V 4.5cm Ø1.0cm 11g
  • Energizer 377/376 1.55V Silver Oxide [SR626W]
  • Energizer 364 1.55V Silver Oxide [SR621SW]
  • Rayovac Hybrid NiMH AA 1.2V 2100mAh 28g
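To compare the rechargeable cells in the list, the stored energy can be estimated as nominal voltage times rated capacity, and the specific energy as that figure divided by the mass. The Python sketch below does this for the entries that list both capacity and weight; the function names are made up, and the results are rough estimates since the real delivered energy depends on load and temperature.

```python
# Rough energy estimate for the cells listed above:
# energy (Wh) = nominal voltage (V) * capacity (mAh) / 1000
# specific energy (Wh/kg) = energy (Wh) / mass (kg)

def energy_wh(voltage_v, capacity_mah):
    """Nominal stored energy in watt-hours."""
    return voltage_v * capacity_mah / 1000.0


def specific_energy_wh_per_kg(voltage_v, capacity_mah, mass_g):
    """Nominal energy per unit mass in Wh/kg."""
    return energy_wh(voltage_v, capacity_mah) / (mass_g / 1000.0)


# (name, volts, mAh, grams), copied from the list entries that give all three
CELLS = [
    ("Samsung Li-ion 900mAh", 3.7, 900, 19),
    ("Samsung Li-ion 1000mAh", 3.7, 1000, 21),
    ("Energizer NiMH AA", 1.2, 2300, 29),
    ("Energizer NiMH AAA", 1.2, 850, 12),
    ("Rayovac Hybrid NiMH AA", 1.2, 2100, 28),
]

if __name__ == "__main__":
    for name, volts, mah, grams in CELLS:
        wh = energy_wh(volts, mah)
        wh_per_kg = specific_energy_wh_per_kg(volts, mah, grams)
        print(f"{name}: {wh:.2f} Wh, {wh_per_kg:.0f} Wh/kg")
```

By this estimate the Li-ion phone cells store roughly twice the energy per gram of the NiMH cells.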

Monday, June 6, 2011

Open-source tools for ARM microcontrollers

The ARM (Advanced RISC Machine) architecture appears in many 32-bit microcontrollers. I have NXP (Philips) LPC2138 (ARM7TDMI-S) and Luminary Micro LM3S811 (ARM Cortex-M3) microcontrollers. Embedded software development is well supported by open-source tools.

My current preferred development environment is Cygwin running under Windows, but Linux works well too. The GNU cross compiler for ARM7 can be built to run with Cygwin: the Binutils, GCC and GDB source code can be configured and compiled under Cygwin. Graphical user interfaces for gdb are available with Insight and DDD. Windows Emacs also works with gdb. Cortex-M3 support is not available in earlier versions of GCC. A relatively compact C library can be built from newlib.  Flash programming for the LPC21xx can be done with the open-source program lpc21isp.

The ARM7 microcontrollers have on-chip debugging capability through a JTAG interface. OpenOCD (Open On-Chip Debugger) is the package that provides flash memory download and the debugging interface for gdb. OpenOCD supports a number of JTAG hardware interfaces, including the simple parallel-port "wiggler" interface and the USB FT2232 interface. It can also be built under Cygwin. The parallel-port interface requires the giveio driver (from AVRDUDE). The driver package (including .h and .lib) from FTDI can be used for the FT2232 interface. An open-source driver for the FT2232 is also available.

The Luminary Micro LM3S811 eval board uses FTDI's FT2232 chip to drive the JTAG signals. It can also be used as the JTAG interface for external devices. Despite lacking the TRST (TAP Reset) and SRST (System Reset) signals at the external connector, it is able to interface with the LPC2138. A small modification to the LM3S811 eval board allows control of TRST: lift the resistor connecting to the USB_RSTn line and solder a haywire to the connector TRST pad. To debug the on-board microcontroller, we then need to insert a jumper wire between the TRST pin and the external debug pin. Note that the existing OpenOCD software thinks it is toggling SRST, but I found it works quite well with the LPC2138.  [Update: The later version of the LM3S811 eval board has the SRST signal added.]
Modified to debug external device with OpenOCD.

Thus embedded software development is possible with this set of open-source tools. If a real-time operating system is desired, FreeRTOS has been ported to a number of ARM microcontrollers.

However, it is worth noting that the code generated by the GNU compiler seems far less efficient than that of the commercial compilers.  My Whetstone benchmark testing has shown it to be 6 times as slow in single precision and 3 times as slow in double precision, with almost 3 times the code size.

Sunday, May 29, 2011

Solid state relay

Solid state relays that carry large currents are bulky.  A simple alternative is to have a small optocoupler drive a power MOSFET.

Monday, March 28, 2011

Layout Check List

  • DRC
    • Make sure the rules are complete
    • Resolve all DRC errors
  • Footprint verification
    • Check for footprint orientation and pin assignment, especially for transistors
    • Check for soldermask occlusion
  • Component clearance in 3D
    • Check for component protrusion beyond the footprint, especially edge connectors
  • Silkscreen visibility
    • Check for overlap with holes
    • Check for overlap with component protrusion beyond the footprint
    • Check for component designators
  • Fiducial marks
    • Three fiducial marks for the board
    • Fiducial marks for high pin count components
  • Polarization marks
    • Mark all polarized components
    • Mark Pin 1 of components/connectors
  • Logo/labels
    • Company logo
    • Board name
    • Fab date
    • Serial number
  • Gerber inspection
    • Inspect Gerber files, possibly with a different viewer
    • View in skeleton mode
    • View plane layer in positive mode

Tuesday, January 25, 2011

Miniature servo drive

A miniature servo motor may require a voltage of 5V or less and a current of 200mA.  An audio amplifier is ideally suited to driving this type of motor.  It normally uses a bridge-tied load (BTL) configuration, which makes an H-bridge unnecessary.  It is capable of delivering 1-2W into a 4 or 8 Ohm speaker, so it can source and sink the required motor current, and the bandwidth is more than adequate for a servo drive.  The National LM4866 2.2W stereo audio amplifier in the small TSSOP-20 package can drive two small motors.  It is likely to result in fewer components and a smaller footprint than other solutions.
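A quick sanity check of the current claim: an amplifier that delivers power P into a speaker of resistance R supplies an RMS current of sqrt(P/R). The Python sketch below compares that with the 200mA motor figure from the text. The function names are made up, and the check ignores stall current, supply headroom and thermal limits, which a real design would need to look up in the amplifier datasheet.

```python
import math

# Compare the current an audio amplifier delivers into its rated speaker
# load with the current a small servo motor needs.

def motor_power_w(voltage_v, current_a):
    """Electrical power drawn by the motor."""
    return voltage_v * current_a


def speaker_rms_current_a(power_w, load_ohm):
    """RMS current when delivering power_w into a resistive load."""
    return math.sqrt(power_w / load_ohm)


def speaker_rms_voltage_v(power_w, load_ohm):
    """RMS voltage across the load at that power."""
    return math.sqrt(power_w * load_ohm)


if __name__ == "__main__":
    motor_current = 0.2  # 200 mA, from the text
    print("motor power: %.2f W" % motor_power_w(5.0, motor_current))
    for power, load in ((1.0, 8.0), (2.0, 4.0)):
        current = speaker_rms_current_a(power, load)
        print("%.0f W into %.0f Ohm: %.3f A rms, enough: %s"
              % (power, load, current, current >= motor_current))
```

Even at the low end (1W into 8 Ohm) the amplifier supplies about 0.35A RMS, comfortably above the 200mA the motor needs.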