System16 - 16 bit CPU Core
           
    Introduction:
      The CPU core design I am working on at home started off as a 6809 VHDL
      design, which was too big to fit in a 200K gate FPGA. I am in the process
      of simplifying the design to a 16 bit architecture with 14 general
      purpose registers, but using the 6809 ALU design extended to 16 bits.
      Although this initially looks like a larger design, I am hoping to reduce
      the size of the core by making the CPU more orthogonal. All the registers
      are 16 bit, including the Program Counter, so it can address 64K words.
      The biggest file in the 6809 design was the state sequencer for the
      instructions, so I am hoping a more regular design will simplify that.
      
      I am using the OpenCores.org mini UART and MC6845 CRTC core around the
      CPU core to provide a complete SOC (System On a Chip). I am looking to
      multiplex the 16 bit 15 nsec SRAM between the CPU and the CRTC, and use
      a 24 MHz clock (a cycle period of about 41.7 nsec).
      
           
    
           
    
    
    
    
    
           
    Design Updates:
     
    23 May 2002
 I've modified the addressing modes so there is now no indirect indexed
 addressing mode, only absolute and PC relative indirect addressing. I've
 defined a Constant addressing mode, which uses the effective address
 register number as an immediate constant. Note that the constant addressing
 mode is valid for both absolute and indexed addressing modes. Effective
 Address register 7 is used to signify absolute addressing, but the register
 number is used as a constant if the EA mode is set for constant addressing.
 I've also added indexed (no offset) and indexed PC relative (no offset)
 instructions to supplement the addressing modes.
     
 I've added static (#n) and dynamic (Rea) Shift instructions and reinstated
 the Bit operators. These instructions only operate on the target register
 (Rt). The Target Register is assumed to be 16 bit and the Control Register
 (Rea) always 8 bit. Rotate Left and Rotate Right (ROL and ROR) have been
 renamed for this purpose and do not use the carry bit. The MSBs are wrapped
 around to the LSBs using a barrel shifter.
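
 As a rough illustration of the carry-less rotates described above, here is a
 minimal C sketch. The function names rol16 and ror16 are made up for this
 example, and masking the 8 bit count down to its low 4 bits is an assumption;
 the notes above do not say how counts of 16 or more are treated.

```c
#include <stdint.h>

/* Rotate a 16 bit target register left without involving the carry flag:
   bits leaving the MSB end re-enter at the LSB end.
   Assumption: only the low 4 bits of the 8 bit count are significant. */
uint16_t rol16(uint16_t rt, uint8_t count)
{
    unsigned n = count & 15u;
    if (n == 0)
        return rt;
    return (uint16_t)(((unsigned)rt << n) | ((unsigned)rt >> (16u - n)));
}

/* Rotate right: bits leaving the LSB end re-enter at the MSB end. */
uint16_t ror16(uint16_t rt, uint8_t count)
{
    unsigned n = count & 15u;
    if (n == 0)
        return rt;
    return (uint16_t)(((unsigned)rt >> n) | ((unsigned)rt << (16u - n)));
}
```

 With these definitions, rotating 0x1234 left by four bits gives 0x2341, and
 rotating it right by four bits gives 0x4123.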
     
 The single operand instructions are now called RCL and RCR (Rotate through
 Carry Left and Rotate through Carry Right), which is a better description of
 what the instructions actually do.
 The Shift Register Left and Shift Register Right (SHL and SHR) instructions
 shift the register up or down by the specified number of bits without
 wrapping around, using the carry bit as the MSB for SHR or LSB for SHL.
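
 The rotate-through-carry behaviour of RCL and RCR can be sketched in C as
 follows. This assumes a one bit rotate on a 16 bit register; the rot_result
 type and the function names are invented for the example, and SHL and SHR
 are deliberately left out because their exact carry handling is not pinned
 down here.

```c
#include <stdint.h>

typedef struct {
    uint16_t value;  /* register contents after the rotate */
    unsigned carry;  /* carry flag after the rotate (0 or 1) */
} rot_result;

/* RCL: the old carry enters bit 0 and the old bit 15 becomes the carry. */
rot_result rcl16(uint16_t rt, unsigned carry_in)
{
    rot_result r;
    r.carry = (rt >> 15) & 1u;
    r.value = (uint16_t)(((unsigned)rt << 1) | (carry_in & 1u));
    return r;
}

/* RCR: the old carry enters bit 15 and the old bit 0 becomes the carry. */
rot_result rcr16(uint16_t rt, unsigned carry_in)
{
    rot_result r;
    r.carry = rt & 1u;
    r.value = (uint16_t)(((unsigned)rt >> 1) | ((carry_in & 1u) << 15));
    return r;
}
```

 The register and the carry together behave as a 17 bit ring, which is why
 these need the carry as an input where ROL and ROR do not.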
     
 I've also simplified the Push and Pull instructions so they only push and
 pull one register. It's a bit tedious saving a batch of registers, but it
 means that the inherent instructions are all single opcodes, rather than
 some having a second word. The Push and Pull instructions will always work
 on both higher and lower 8 bit register pairs.
 The LSB of the register number defines big or little endian storage.
     
 The software interrupt now uses the effective address register field as a
 4 bit vector number. The top 16 words of memory can be used for Interrupt
 Vectors. Note that software interrupts and hardware interrupts share the
 same vectors. I'm not sure if this is a good idea. If they do, a software
 interrupt will have to mask the interrupts in the interrupt mask word just
 like a hardware interrupt. This means that low priority software interrupts
 will be blocked by high priority hardware or software interrupts, so there
 is a possibility that you can hang the CPU :-(
    
 I'm not sure if I should have 15 interrupts + Reset or 7 interrupts + Reset.
     
 The 68000 only prioritises hardware interrupts. Presumably the interrupt
 status is not cleared on a return from interrupt, so if you return from a
 SWI or TRAP it won't affect the hardware interrupt status. Hmmm...
     
 I've removed the condition codes from Register 7 and made Register 7 the
 stack pointer rather than Register 6. It seemed pointless being able to
 perform arithmetic operations on the Condition Codes and Interrupt Mask.
 Additional instructions have been added to Load Condition Codes (LCC), Store
 Condition Codes (SCC), And Condition Codes (ACC) and Or Condition Codes
 (OCC). I've also added Load Interrupt Mask (LIM), Store Interrupt Mask
 (SIM), And Interrupt Mask (AIM) and Or Interrupt Mask (OIM).
 All Condition Code and Interrupt Mask instructions use an 8 bit register
 as the argument.
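
 To show how the four Condition Code instructions relate to each other, here
 is a small C sketch. The flag bit positions follow the Condition Code
 Register layout given on this page (H, N, Z, V, C in bits 4 to 0); the
 status_regs type and the lower case function names are made up for the
 example, and storing the three unused upper bits exactly as written is an
 assumption.

```c
#include <stdint.h>

/* Flag positions taken from the Condition Code Register layout. */
enum { CC_C = 0x01, CC_V = 0x02, CC_Z = 0x04, CC_N = 0x08, CC_H = 0x10 };

typedef struct {
    uint8_t cc;  /* condition codes */
    uint8_t im;  /* interrupt mask (LIM, SIM, AIM, OIM would mirror the below) */
} status_regs;

/* LCC: load the condition codes from an 8 bit register value. */
void lcc(status_regs *s, uint8_t arg) { s->cc = arg; }

/* SCC: store the condition codes to an 8 bit register. */
uint8_t scc(const status_regs *s) { return s->cc; }

/* ACC: AND the condition codes with the argument (clears selected flags). */
void acc(status_regs *s, uint8_t arg) { s->cc = (uint8_t)(s->cc & arg); }

/* OCC: OR the condition codes with the argument (sets selected flags). */
void occ(status_regs *s, uint8_t arg) { s->cc = (uint8_t)(s->cc | arg); }
```

 Setting the carry flag is then an OCC with CC_C, and clearing it is an ACC
 with the complement of CC_C, much like ORCC and ANDCC on the 6809.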
    
This does mean you cannot use the stack pointer as an effective address.
This is probably a good idea, although
it does mean any stack based operations, such as manipulating local variables
in a C function must be done
with a frame pointer. You have to load another register with the stack pointer
and index the stack that way.
A LINK and UNLINK instruction would be handy to allocate stack space, but
I have run out of opcode space.
           
    7th March 2002
  It was suggested to me some time back that the idea of making a purely
  16 bit machine was fraught with peril and that most designs were eventually
  modified for byte manipulation. To that end I have made the register file
  16 x 8 bit registers or 8 x 16 bit registers. The Size field is now only
  one bit, selecting either Byte (8 bit) or Word (16 bit), which gives scope
  to expand the addressing modes by one bit.
      
  The addressing range will be 64 Kbytes and not 64K words as before. There
  will be two 8 bit ALUs that can be concatenated for 16 bit operations. A
  byte reversal switch, on the left and right sides of both ALUs, is derived
  from the LSB of the register number. This will allow the lower ALU to
  access the upper or lower byte of a register, and doubles as a big endian /
  little endian switch on word accesses. The upper ALU will pass the opposite
  byte in 8 bit mode, which means that all memory accesses can be 16 bit.
  Note that 16 bit accesses must be word aligned (on even byte addresses); a
  bit messy admittedly, but the 68000 survived doing that for many years.
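
  The way the LSB of the register number selects a byte, or swaps the two
  bytes of a word, can be sketched in C as below. This is only a software
  model of the idea, not the VHDL. The regfile type and function names are
  invented, and the choice that even register numbers hold the high byte
  (Rn.H) and odd numbers the low byte (Rn.L) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* The register file viewed as 16 x 8 bit registers. */
typedef struct {
    uint8_t byte[16];
} regfile;

/* 8 bit access: the 4 bit register number indexes a byte directly, so the
   LSB picks the upper or lower half of a 16 bit register pair. */
uint8_t read8(const regfile *rf, unsigned reg)
{
    assert(reg < 16u);
    return rf->byte[reg];
}

/* 16 bit access: the upper three bits select the pair and the LSB acts as
   the byte reversal (big endian / little endian) switch. */
uint16_t read16(const regfile *rf, unsigned reg)
{
    assert(reg < 16u);
    unsigned pair = reg & ~1u;
    unsigned hi = rf->byte[pair];       /* assumed to be Rn.H */
    unsigned lo = rf->byte[pair + 1u];  /* assumed to be Rn.L */
    if (reg & 1u)
        return (uint16_t)((lo << 8) | hi);  /* LSB set: bytes swapped */
    return (uint16_t)((hi << 8) | lo);      /* LSB clear: natural order */
}
```

  With byte 0 holding 0x12 and byte 1 holding 0x34, a word read through
  register number 0 gives 0x1234 and through register number 1 gives 0x3412.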
      
  I have not modified the assembler or simulator at this stage.
      
           
    17th November 2001
   I've been a bit slack on the design these past couple of months, being
   busy with a 6809 Flex computer proposal for the Flex and UniFlex Users
   group. I am also interested in designing a board using the 68EZ328
   Dragon Ball processor as used by the Open Hardware group, running uClinux,
   with a Spartan II FPGA for image processing.
      
   I have gone through the instruction cycle timing, but it's not complete
   yet. I have also used the Cygwin tools to compile an assembler and
   simulator. The assembler and simulator are not complete, but I've put
   them up on the web in case anyone would like to look at them or work on
   them (as unlikely as that may seem).
      
           
    20th August 2001
     1. I have re-arranged the instructions because I needed to add an LEA
     (Load Effective Address) instruction for position independent code, as
     well as an EOR (Exclusive OR), BIT (Bit Test) and MUL (8 bit Multiply).

     2. The static shift operators have been limited to one bit shifts,
     because that is what is used most in arithmetic calculations. More than
     one shift can be done with multiple instructions or a loop. I put the
     Shift instructions in the Single Operand line.

     3. The static Bit operators have been removed, as I figured that can
     just as easily be done with other instructions: "BIT" for "BTST", "AND"
     for "BCLR", "OR" for "BSET" and "EOR" for "BCHG". The only difference is
     that bit operators work on a bit number whereas the logical operators
     work on a bit mask, which is probably more useful.

     4. Conditional Branches are now all PC Relative. An 8 bit signed offset
     is included in the instruction. This is more consistent with the
     original 6809. I figured I could use a zero offset to indicate that the
     following word was a 16 bit offset for Long Branches.
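
     The short / long branch offset scheme from item 4 can be sketched in C
     as follows. The names are invented for the example. How the 8 bit
     field is extracted from the instruction, what the offset is measured
     from, and whether it counts bytes or words are all left to the caller,
     since none of that is fixed here.

```c
#include <stdint.h>

typedef struct {
    int32_t displacement;  /* signed offset to apply to the PC */
    int uses_extension;    /* 1 if the following 16 bit word was consumed */
} branch_offset;

/* Sign extend the low `bits` bits of `value`. */
static int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1u);
    return (int32_t)(value ^ sign) - (int32_t)sign;
}

/* A non-zero 8 bit field is a short signed offset. A zero field means the
   next word holds a 16 bit signed offset (a long branch). */
branch_offset decode_branch_offset(uint8_t short_field, uint16_t next_word)
{
    branch_offset b;
    if (short_field == 0) {
        b.displacement = sign_extend(next_word, 16);
        b.uses_extension = 1;
    } else {
        b.displacement = sign_extend(short_field, 8);
        b.uses_extension = 0;
    }
    return b;
}
```

     One consequence of this encoding is that a short branch can never have
     an offset of zero, which costs nothing useful.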
      
           
    5th October 2001
    1. Added "Word" (.W), "Low Byte" (.L), "High Byte" (.H) and "Double Word"
    (.D) to the opcode map. Double word format is still being worked out,
    however it is proposed that it will be microcoded as two 16 bit word
    operations with the appropriate condition codes carried over from the
    second operation.

    2. The assembler mnemonics for branches will specifically refer to Long
    Branches (LBRA) as well as short branches (BRA), even though it is the
    same opcode. The reason is to avoid phasing errors that result from
    trying to guess the length of the instruction in forward references,
    i.e. you cannot guess in the first pass if a forward reference offset is
    larger or smaller than 128 bytes.

    3. Moved inherent operators under single operand instructions to save
    space in the opcode map. Inherent operators do not have a size or
    effective addressing mode, so those bits are used for sub instruction
    decoding.

    4. Conditional Branches have been moved from 0010 to 0001.
    Load Effective Address (LEA) has been moved from 1111 to 0010.
    This leaves a spare opcode at 1111 (f line).
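
    The opcode lines mentioned in item 4 could be picked out of an
    instruction word like this. It is only a sketch: placing the 4 bit line
    number in the top four bits of a 16 bit instruction word is an
    assumption (suggested by the "f line" naming), and the enum and function
    names are made up.

```c
#include <stdint.h>

/* Opcode lines named in the update above; all other lines are omitted. */
enum opcode_line {
    LINE_BRANCH = 0x1,  /* conditional branches, moved from 0010 */
    LINE_LEA    = 0x2,  /* Load Effective Address, moved from 1111 */
    LINE_SPARE  = 0xF   /* now unused (f line) */
};

/* Assumption: the line number is the top four bits of the instruction. */
unsigned opcode_line_of(uint16_t instruction)
{
    return (unsigned)instruction >> 12;
}

/* Short description of the line an instruction falls in. */
const char *line_name(uint16_t instruction)
{
    switch (opcode_line_of(instruction)) {
    case LINE_BRANCH: return "conditional branch";
    case LINE_LEA:    return "LEA";
    case LINE_SPARE:  return "spare";
    default:          return "other";
    }
}
```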
           
    Tools:
           
    Assembler:
   The assembler is based on the Motorola microprocessor assembler suite.
   I have changed the output format to a modified Intel Hex format to match
   the simulator. The assembler does not support byte addressing any more.
   All byte accesses must be explicitly specified in the instruction as a
   .W word, .L low byte or .H high byte instruction. Pseudo ops have also
   been modified to reflect word addressing rather than byte addressing.
      
           
    Simulator:
   I have written a simulator for the System 16 in C++. It is based on
   Ray Bellis's Usim0.91 for the 6809 but is so different as to be considered
   a complete re-write. I have included my FD1771 Floppy disk controller
   simulator for what it's worth, and I need to modify the mc6850 simulation
   code to match the MiniUART design from opencores.org.
      
   Neither the Assembler nor the Simulator is complete, but you can download
   them to take a look at what I have done so far. I am using Cygwin to
   compile them.
      
      UAsm16.tgz
      
      USim16.tgz
      
      
           
    Kbug16:
   The monitor program for the simulator needs a lot of work. I started off
   with a 6809 monitor program, but the stack manipulation and the major
   change in register allocation make a complete re-write necessary.
      
           
    Processor Model:
      
           
    Registers:
           
    
        
          
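        +--------------------+--------------------+
        |        R0.H        |        R0.L        |
        +--------------------+--------------------+
        |        R1.H        |        R1.L        |
        +--------------------+--------------------+
        |        R2.H        |        R2.L        |
        +--------------------+--------------------+
        |        R3.H        |        R3.L        |
        +--------------------+--------------------+
        |        R4.H        |        R4.L        |
        +--------------------+--------------------+
        |        R5.H        |        R5.L        |
        +--------------------+--------------------+
        |        R6.H        |        R6.L        |
        +--------------------+--------------------+
        |            Stack pointer R7             |
        +--------------------+--------------------+
        | Interrupt Masks IM | Condition Codes CC |
        +--------------------+--------------------+
        |           Program Counter PC            |
        +-----------------------------------------+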
          
               
                
    
      
           
    Condition Code Register
    
    
      
        
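        +----+----+----+----+----+----+----+----+
        | B7 | B6 | B5 | B4 | B3 | B2 | B1 | B0 |
        +----+----+----+----+----+----+----+----+
        |    |    |    | H  | N  | Z  | V  | C  |
        +----+----+----+----+----+----+----+----+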
        
      
    
    
    Interrupt Mask Register
    
 There are 7 hardware interrupts (IRQ0 to IRQ6) and Reset (IRQ7). Interrupts
 are prioritised. The Interrupt Mask register reflects the interrupt level
 that the CPU is servicing. Interrupts at and below the current interrupt
 level are masked. The interrupt mask will be 1 more than the interrupt
 number, i.e. if Interrupt IRQ0 is generated the interrupt mask register
 will read 001; similarly Interrupt IRQ6 will read 111. Interrupt 7 or Reset
 is not maskable.
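
 The masking rule above can be written down as a short C sketch. The
 function names are made up for the example, and it assumes that a mask of
 000 means no interrupt is being serviced and that a higher IRQ number means
 a higher priority.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_RESET 7u  /* Reset is treated as interrupt 7 and is not maskable */

/* Mask value loaded when a maskable interrupt is taken: one more than the
   interrupt number, so IRQ0 gives 001 and IRQ6 gives 111. */
uint8_t mask_for_irq(unsigned irq)
{
    assert(irq < IRQ_RESET);
    return (uint8_t)(irq + 1u);
}

/* An interrupt is accepted only if it is above the level currently being
   serviced; interrupts at or below that level are masked. */
int irq_accepted(unsigned irq, uint8_t mask)
{
    assert(irq <= IRQ_RESET);
    if (irq == IRQ_RESET)
        return 1;
    return irq + 1u > (mask & 7u);
}
```

 For example, while IRQ0 is being serviced (mask 001) a second IRQ0 is held
 off but IRQ1 gets through, and with mask 111 only Reset is accepted.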
    
 The Interrupt Mask bits are set on an interrupt *after* the interrupt
 service routine is called and the mask register and condition codes are
 pushed onto the stack. The mask register can be cleared by using the Load
 Interrupt Mask instruction (LIM), or it may be restored by popping the
 Interrupt Mask and Condition Code Registers on a Return from Interrupt.
     
 The Interrupt Mask bits are priority encoded so that Reset (the highest
 priority interrupt) will mask all lower level interrupts. This is the only
 way that the Interrupt Mask register can be restored in an orderly manner
 when returning from interrupts, i.e. the lowest priority interrupt mask
 will be the last to be restored in the case of nested interrupts.
    
    
    
      
        
          
            
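        +-----+-----+-----+-----+-----+-----+-----+-----+
        | B7  | B6  | B5  | B4  | B3  | B2  | B1  | B0  |
        +-----+-----+-----+-----+-----+-----+-----+-----+
        |     |     |     |     |     | IM2 | IM1 | IM0 |
        +-----+-----+-----+-----+-----+-----+-----+-----+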