CD2470A, 80A, 90A Summary

 

 

 

1.  CD2470A Summary
1.1  CD2470A Block Diagram
1.2  CD2470A Core Pin Summary
1.3  CD2470A Core – Memory Interface Diagram
1.4  CD2470A Basic Memory Access Timing Idea
1.5  CD2470A Register summary
1.6  CD2470A Registers Bit assignment
1.7  CD2470A ALU Flag Updating Summary

2.  CD2470A 16 Bit DSP Description
2.1  GENERAL DESCRIPTION
     Numeric Data Representation and Overflow
     Timing
2.2  ALU
2.3  Barrel Shifter / Normalizer
2.4  Multiplier
2.5  Data Registers
     Numeric Registers
     Status Register
     Program Counter
     Other function Registers
2.6  Memories
2.7  Address Pointer Registers
     Indirect Addressing
     Stacks
2.8  Conditional Instructions
2.9  System Functions
     Interrupts
     Reset
     Clock Control

3.  CD2470A Instruction Details
3.1  CD2470A Instruction summary Table
3.2  CD2470A Instruction condensed code table

4.  Appendix
4.1  Cyclic Buffer
     MODE 0
     MODE 1
     MODE 2
4.2  Quick Do Loop
     Full Software Solution
     Repeat Operation
     Macro
     Nesting
4.3  Round off
     General idea about the Round off
     Instructions for Round off
4.4  Long Word Multiplication
4.5  Bit Stream Data Read/Write
     RAM Pointer bit assignment
     Bit stream reading
     Bit stream writing
4.6  Variable Length Code
     Huffman Decoder
4.7  Table Look Up
4.8  Find Min/Max Value

 

 

 


 

1.  CD2470A Summary

 

 

This manual describes Clarkspur's 16-bit fixed-point DSP core, the CD2470A. The 24-bit version (CD2480A) and the 32-bit version (CD2490A) are also referred to in this manual as derivatives of the CD2470A. The CD2480A and CD2490A have the same instruction set and architecture as the CD2470A; they differ only in the bit width of the data registers and data memories.

 

 

- Summary of the CD2470A -

 

·        16 bit Fixed point DSP with strong double word instructions.

·        Compatible with standard single port clocked memory IP.

·        Four sets of double word accumulator.

·        Strong barrel shifter / normalizer.

·        Bit stream data handling.

·        Variable length code handling (e.g. Huffman code).

·        Table look up capability.

·        Instruction compatibility with higher precision DSP (CD2480A, CD2490A).

·        No pipeline latency.

·        6% code space reserved for custom instructions.

·        Verilog HDL Synthesizable design with visualized block diagrams.

·        Communication port with outside Host hardware.

·        70MIPS for common 90nm Xilinx, Altera FPGA chips.

 

 

 

 

 

                        CD2470A          CD2480A          CD2490A
Data bit width          16               24               32
Instruction bit width   16               16               16
Memory                  Program 64Kx16   Program 16Mx16   Program 4Gx16
                        Data0 64Kx16     Data0 16Mx24     Data0 4Gx32
                        Data1 64Kx16     Data1 16Mx24     Data1 4Gx32

 


 

 

1.1  CD2470A Block Diagram

 

 

 

 


 

1.2  CD2470A Core Pin Summary

 

 

 

1.3  CD2470A Core – Memory Interface Diagram

 

 

 

1.4  CD2470A Basic Memory Access Timing Idea 

 

 

1.5  CD2470A Register summary

 

 

 

 

1.6  CD2470A Registers Bit assignment

 

 

AL/AL0 (Accumulator-Low Register)  Register Address=0

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

A/AH/AH0 (Accumulator-High Register, Single word Accumulator)  Register Address=1

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

AL1/TL (Shadow Accumulator-Low Register)  Register Address=2

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

AH1/TH (Shadow Accumulator-High Register)  Register Address=3

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

AL2 (General Purpose-Low Register)  Register Address=4

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

AH2 (General Purpose-High Register)  Register Address=5

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

PL/AL3 (Product-Low Register)  Register Address=6

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

PH/AH3 (Product-High Register)  Register Address=7

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

X (MPYer input Register)  Register Address=8

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

Y (MPYer input Register)  Register Address=9

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

ST (Status Register)  Register Address=10

Bit     15   14   13   12   11   10   9    8    7    6    5-0
Field   N    OV   Z    CY   RZ   OP   IEx  IE2  IE1  IE0  PRL

 

 

PC (Program Counter)  Register Address=11

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

SP (Stack Pointer Register)  Register Address=12

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   P15  P14  P13  P12  P11  P10  P9   P8   P7   P6   P5   P4   P3   P2   P1   P0

 

 

BF (PC I/F Buffer Register)  Register Address=13 (SELBF=1)

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

RC (Repeat Counter Register)  Register Address=13 (SELBF=0)

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   n7   n6   n5   n4   n3   n2   n1   n0   m7   m6   m5   m4   m3   m2   m1   m0

 

 

TR (Temporary Register or Barrel Shifting bit counter Register [Lower 6 bits])  Register Address=14

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   T15  T14  T13  T12  T11  T10  T9   T8   T7   T6   T5   T4   T3   T2   T1   T0

 

 

PM (Temporary Register or Pointer Modifier Register [Lower 9 bits], Guard bit Register [15-9])  Register Address=15

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   M15  M14  M13  M12  M11  M10  M9   M8   M7   M6   M5   M4   M3   M2   M1   M0

 

 

 

1.7  CD2470A ALU Flag Updating Summary

 

 

Instructions                   W   C     Notes
ADSI Rij,#sSImm                1   1
Aop A,RAM,w                    1   1
Aop Ad,As,(Rij),w              1   1
Aop Ad,As,Rij                  1   1
Aop Ad,As,S,w                  1   1
AopI Ad,As,#Imm                2   2
AopSI As,#Simm                 1   1
BRA cond                       2   2     * DRZi,NDRZi conditions only
CALL cond                      2   2
INC (Rij) / DEC (Rij),cond     1   2
INC S / DEC S,cond             1   2
LD A,RAM,w / LD RAM,A,w        1   1
LD D,(Rij),w / LD (Rij),S,w    1   1
LD D,(Rij)p / LD (Rij)p,S      1   3/4
LD D,Rij / LD Rij,S            1   1
LD D,S,w                       1   1
LD Rij,RAM / LD RAM,Rij        1   1
LDI D,#Imm                     2   2
LDI Rij,#Imm                   2   2
LDI (Rij),#Imm                 2   2
LDSI D,#SImm                   1   1
Lop A,RAM                      1   1
Lop Ad,As,(Rij)                1   1
Lop Ad,As,S                    1   1
LopI Ad,As,#Imm                2   2
LopSI As,#Simm                 1   1
MODA OpA,cond,w                1   1     * Z only: cmpl    CY,Z only: lstr,rotl,rotr,lsl,lsr    CY,OV,Z,N: astr,addc,ro
MODB Ri,n                      1   1
MODF                           1   1     * setcy,rescy
MODR Rj,Ri                     1   1
MPYx (Rj),(Ri)                 1   1     * MPYS, MPYA, MPYAU only
MPYx S,(Ri)                    1   1     * MPYS, MPYA, MPYAU only
MV (Rj),(Ri) / MV (Ri),(Rj)    1   1
NORM As,w                      1   1     * When actual shift takes place
PUSH S,w / POP D,w             1   1/2
RET                            1   3
SHA As,w                       1   1
SHL As,S,mode                  1   1
SHL As,w                       1   1
SWAP A,S,cond,w                1   1
TST (Rij)                      1   1
TST S                          1   1
VLCD (Ri)                      1   1

W : Word count   C : Execution cycle count
Flag columns: N, OV, Z, CY, RZ (individual flag marks not shown)

 

2.  CD2470A 16 Bit DSP Description

 

 

2.1 GENERAL DESCRIPTION

 

Numeric Data Representation and Overflow

 

The basic precision of the CD2470A is 16 bits. While instructions and program memory are always 16 bits wide, the D-Bus and the data path elements, including data memory, can have a fixed-point precision N between 16 and 32 bits in the CD24XX architecture. The CD2470A uses N = 16. The accumulator, the A-Bus and the total product P bus have a corresponding precision of 32 (2N) bits.

 

Data is represented in two's complement form with an implicit binary point to the right of the sign bit, which is the most significant bit (MSB). Representable values range between $+1.0 - 2^{-N+1}$ and $-1.0$.
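The range above can be checked with a short Python sketch. This is only an illustration of the number format; the function names `fixed_to_float` and `full_scale_range` are hypothetical and are not part of any CD2470A tool.

```python
def fixed_to_float(word, n_bits=16):
    """Interpret an N-bit two's complement word as a fraction in [-1.0, +1.0)."""
    word &= (1 << n_bits) - 1            # keep only the N data bits
    if word & (1 << (n_bits - 1)):       # sign bit set -> negative value
        word -= 1 << n_bits
    return word / (1 << (n_bits - 1))    # binary point right of the sign bit


def full_scale_range(n_bits=16):
    """Return (most negative, most positive) representable values."""
    return -1.0, 1.0 - 2.0 ** (-n_bits + 1)


print(fixed_to_float(0x7FFF), fixed_to_float(0x8000), full_scale_range())
```

With `n_bits=24` or `n_bits=32` the same sketch covers the CD2480A and CD2490A data widths.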

 

Digital signal processors commonly model the clipping or limiting that occurs in analog components, so that the system stays stable even when a computation overflows. The CD2470A provides this limiting either through Overflow Protection (the OP flag) or by executing the MOD sat instruction. When the OP flag is on or MOD sat is executed, a positive or negative overflow in the AH register results in the corresponding positive or negative full-scale value being substituted in AH or AW (AH|AL). For example, 0x7FFF is substituted on a positive overflow and 0x8000 on a negative overflow. Note that this protection applies to both AH and AL when MOD sat is executed in double-word mode. The MOD sat instruction checks the OV flag and the N flag to determine whether to replace the current AH or AW value with the full-scale value, even when the OP flag is "0". If the OP flag is "1", this limiting takes place automatically at every arithmetic instruction.
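As a rough illustration of the limiting described above, the following Python sketch models a single-word add into AH with and without overflow protection. The names `wrap16`, `add16` and the `op` parameter are illustrative assumptions, not CD2470A mnemonics, and the sketch does not model guard bits.

```python
def wrap16(value):
    """Wrap an integer into the signed 16-bit two's complement range."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add16(a, b, op=False):
    """Add two signed 16-bit values.

    Returns (result, overflow). When op is True the result is replaced by
    the full-scale value on overflow, as with the OP flag or MOD sat.
    """
    exact = a + b
    overflow = not (-0x8000 <= exact <= 0x7FFF)
    if overflow and op:
        # Positive overflow -> 0x7FFF, negative overflow -> 0x8000.
        result = 0x7FFF if exact > 0 else -0x8000
    else:
        result = wrap16(exact)
    return result, overflow


wrapped, ov = add16(0x7000, 0x2000)           # wraps around to a negative value
limited, _ = add16(0x7000, 0x2000, op=True)   # clipped to positive full scale
print(hex(wrapped & 0xFFFF), hex(limited & 0xFFFF), ov)
```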

 

As commonly seen in FIR filter computation, a numeric overflow in the accumulator does not necessarily lead to an incorrect result, provided the final total of the summation lies within the range that the accumulator can represent. Overflow protection is therefore optional, and when it is needed in successive MPY-ADD operations it may be applied by checking the Guard bit register. Using overflow protection blindly may reduce the dynamic range of the computation unnecessarily.
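The point about intermediate overflow can be seen in a small Python sketch. It compares plain two's complement wraparound with saturation applied at every step for a 32-bit accumulator; the helper names and the three sample terms are made up for illustration.

```python
def wrap32(value):
    """Wrap an integer into the signed 32-bit two's complement range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def sat32(value):
    """Clip an integer to the signed 32-bit range."""
    return max(-0x80000000, min(0x7FFFFFFF, value))


# The running sum overflows after the second term, but the final total fits.
terms = [0x60000000, 0x60000000, -0x70000000]

acc_wrap = 0
acc_sat = 0
for term in terms:
    acc_wrap = wrap32(acc_wrap + term)   # no protection: wraps, then recovers
    acc_sat = sat32(acc_sat + term)      # protection at every step: loses range

exact = sum(terms)
print(hex(exact), hex(acc_wrap), hex(acc_sat))
```

In this example the wrapped accumulator ends at the exact total, while the accumulator that saturates at every step does not.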

 

 

Timing

 

The CD2470A uses a conventional three-step instruction sequence: fetch the instruction, decode it and fetch operands, and finally execute it. This sequence is normally invisible to the user as long as the instruction does not write a new value to the PC (program counter). If the PC is altered by a data transfer instruction, a dummy machine cycle is automatically inserted to restore the pipeline.

 

Most CD2470A instructions execute in one clock cycle (counted with CKOUT). The obvious exceptions are two-word instructions with immediate data fields and branch/subroutine calls with a PRAM address field. INCrement/DECrement instructions also take two clock cycles. Three clock cycles are taken when a three-word long immediate instruction (CD2480A, CD2490A) is executed or when the program memory is read indirectly. Four cycles are consumed when the program memory is written indirectly. Stack-related instructions, such as a return from a subroutine or a load from memory to a register through the stack pointer, take an additional clock cycle for pre-incrementing the SP. Whenever the PC is the destination register, an additional dummy cycle is inserted to allow the instruction pipeline to refill.

 

These variations in instruction execution are normally invisible to users because an operation can be considered complete at the end of the last clock cycle of its instruction. The only exception is the CD2470A multiplier: the product for a pair of X, Y input register values becomes available on PW only in the cycle after the one in which Y (and/or X) is updated.

 

 

2.2  ALU

 

The arithmetic and logical operations in the CD2470A are performed by a full-function 32-bit AU (Arithmetic Unit) and a 16-bit LU (Logical Unit). These units work either as a 32-bit AU or as a 16-bit ALU, depending on the double-word option in the instruction. The unit has two input ports, a and b, each connected to one of the AH|AL registers, the D-Bus or the multiplier output.

 

Most operations are on 16-bit data and operate on only one of the 16-bit accumulator registers AHi. However, the MPY (multiply) and double-word AU instructions work on both AHi and ALi at once. The MOD OpA instruction works in either single-word 16-bit or double-word 32-bit mode. The results of an ALU operation control the N, OV, Z and CY flags in the status register ST, which are available immediately. (The actual ALU operation in these instructions takes place one cycle after the current execution cycle, but this is not visible to the user.) For example, when one ALU instruction modifies AH, the next instruction can use the result. No dummy wait cycle is necessary.

 

 

2.3  Barrel Shifter / Normalizer

 

The CD2470A comes with a +/- 31 bit one-cycle barrel shifter. Only the AHi|ALi (or AHi) registers can be modified by the dedicated arithmetic or logical shift instructions in one cycle. If the shift bit count in a shift instruction is "0", the lower 6 bits of the TR register are used as the shift count, interpreted as a two's complement number. A positive shift count means a right shift toward the LSB, and a negative count means a left shift toward the MSB. The sign bit is extended in an arithmetic right shift, whereas zeros are filled in on the MSB side in a logical right shift. Zeros are shifted in from the LSB side in a left shift.
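The shift-count convention can be sketched in Python as follows. This is a behavioral illustration only, assuming a 32-bit AHi|ALi operand; the function names are hypothetical, flag updates (such as CY) and guard bits are not modeled, and the count is not checked against the +/- 31 limit.

```python
def shift_count_from_tr(tr):
    """Interpret the lower 6 bits of TR as a two's complement shift count."""
    count = tr & 0x3F
    return count - 0x40 if count & 0x20 else count


def barrel_shift32(value, count, arithmetic=True):
    """Shift a 32-bit word: positive count = right, negative count = left."""
    value &= 0xFFFFFFFF
    if count >= 0:
        if arithmetic and value & 0x80000000:
            value -= 0x100000000          # treat as negative so >> sign-extends
        return (value >> count) & 0xFFFFFFFF
    return (value << -count) & 0xFFFFFFFF  # zeros enter from the LSB side


print(shift_count_from_tr(0xFFFC))                      # count encoded in TR
print(hex(barrel_shift32(0x80000000, 4, True)))         # arithmetic right shift
print(hex(barrel_shift32(0x80000000, 4, False)))        # logical right shift
```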

The CD2470A offers a two-cycle normalizer through the NORM instruction. The NORM instruction detects how many bits AHi|ALi (or AHi) can be shifted toward the MSB (left) without overflowing, and sets the TR register to that bit count in one cycle. As an option of the NORM instruction, AHi|ALi (or AHi) can actually be shifted by that count in the same cycle. These operations may include the guard bit register as part of the accumulator. Normalizing AHi|ALi (or AHi) with NORM lets the register keep the best accuracy for subsequent use.
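The bit count that a normalizer looks for can be sketched in Python. The function below counts how far a signed 32-bit value can move left before the sign bit would change; it is a hypothetical illustration that ignores the guard bits and does not model how the count is encoded in TR.

```python
def norm_count32(value):
    """Return how many bits a signed 32-bit word can shift left without overflow."""
    value &= 0xFFFFFFFF
    sign = (value >> 31) & 1
    count = 0
    # Each bit below the sign bit that equals the sign bit is redundant
    # and can be shifted out safely.
    while count < 31 and ((value >> (30 - count)) & 1) == sign:
        count += 1
    return count


word = 0x00010000
shift = norm_count32(word)
print(shift, hex((word << shift) & 0xFFFFFFFF))
```

After shifting by the returned count, the most significant data bit sits directly below the sign bit, which is the position that preserves the most precision.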

 

2.4  Multiplier

 

The multiplier takes either two 16-bit signed (or unsigned) inputs, or one signed and one unsigned input, from the X and Y registers. It produces either a signed 31-bit product (sign + 30 bits + "0") in the 32 bits of PH and PL, or a 32-bit product (sign + 31 bits). The MSB is the sign bit, with the implicit binary point to its right. The LSB of PL is filled with zero when two signed numbers are fed into X and Y. For -1.0 x -1.0 (0x8000 x 0x8000) in two-signed-input mode, the result is the representable number nearest to +1.0, namely 0x7FFFFFFF (+0.9999999995).
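The signed-by-signed case can be illustrated with a short Python sketch. It assumes the fractional alignment described above (product shifted left by one so the LSB is zero) and the special handling of -1.0 x -1.0; the function names are hypothetical, and the unsigned and mixed modes are not modeled.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a signed two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy_signed(x, y):
    """Signed fractional multiply of two 16-bit words; returns the 32-bit PW."""
    product = (to_signed16(x) * to_signed16(y)) << 1   # align point, LSB = 0
    if product > 0x7FFFFFFF:
        # Only -1.0 x -1.0 reaches +1.0, which is not representable.
        product = 0x7FFFFFFF
    return product & 0xFFFFFFFF


print(hex(mpy_signed(0x4000, 0x4000)))   # 0.5 x 0.5
print(hex(mpy_signed(0x8000, 0x8000)))   # -1.0 x -1.0
```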

 

This multiplier accepts one or two new operands and produces a new product every instruction clock cycle, with one cycle of latency. When a pair of input data is set in the X and Y registers, the product of that pair becomes available in the PH|PL (PW) registers in the next clock cycle. This means that the MPYA and MPYS addition/subtraction uses the pre-existing PH|PL value, not the new X*Y result. The new product reflecting the current X, Y values is available in PH|PL from the next clock (instruction) cycle.

 

Normally the multiplier operation starts with a load of the Y register, either by a multiply instruction or by a load instruction targeting Y. Loading the X register does not initiate a multiplication. Once the Y register is written, the multiplier writes the new product into the PH|PL register, regardless of whether the pre-existing data in PH|PL has been read or not. This operation is automatic, so the current product in PH|PL must be used on or before the cycle next to the one in which the Y register is written with new data.

If the product registers PH or PL are loaded from the D-Bus in the same cycle that a multiply operation completes, the D-Bus value takes precedence.
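The one-cycle latency described in this section can be pictured with a toy cycle-level model in Python. The class and method names are invented for this sketch, plain integer operands are used, and the D-Bus precedence rule is not modeled.

```python
class MultiplierModel:
    """Toy model of X/Y/PW timing: a Y write starts a multiply, PW updates next cycle."""

    def __init__(self):
        self.x = 0
        self.y = 0
        self.pw = 0
        self._y_written = False

    def write_x(self, value):
        # Writing X alone never starts a multiplication.
        self.x = value

    def write_y(self, value):
        # Writing Y starts a multiplication in the current cycle.
        self.y = value
        self._y_written = True

    def end_cycle(self):
        # At the cycle boundary the new product replaces the old PW value.
        if self._y_written:
            self.pw = self.x * self.y
            self._y_written = False


mpy = MultiplierModel()
mpy.write_x(3)
mpy.write_y(4)
old_pw = mpy.pw      # still the pre-existing product during this cycle
mpy.end_cycle()
new_pw = mpy.pw      # X*Y is visible from the next cycle
print(old_pw, new_pw)
```

An accumulate step that reads PW in the same cycle as the Y write therefore sees the previous product, which is the behavior the text attributes to MPYA and MPYS.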

 

 

 

2.5  Data Registers

 

The CD2470A core has sixteen essential 16-bit read/write data registers. Eight of them are general-purpose registers for data handling, usable as eight individual registers or as four double-word registers. The other eight have dedicated functions, such as PC, SP and ST. See the following table.

 

 

Numeric Registers

 

AH0 (AH) and AL0 (AL) Registers. The high and low 16-bit halves of the accumulator. Only double-word operations treat them together as the 32-bit register AW; single-word operations treat AH and AL as two separate registers. AH or AW is the implicit accumulator when an instruction makes no explicit accumulator selection. These are the most convenient registers in the CD2470A and are regarded as "the accumulator".

 

AH1 (TH) and AL1 (TL) Registers. The second accumulator, used either as two 16-bit registers or as one 32-bit register (TW). Most ALU instructions can specify this register as an alternative accumulator.

 

AH2 and AL2 Registers. The third accumulator, used either as two 16-bit registers or as one 32-bit register (BW). Most ALU instructions can specify this register as an alternative accumulator.

 

AH3 (PH) and AL3 (PL) Registers. The high and low 16-bit halves of the multiplier product. These may be used as temporary general-purpose data registers, but the timing and precedence of the multiplier operation should be fully understood first. These registers may also act as the fourth accumulator, used either as two 16-bit registers or as one 32-bit register (PW). Most ALU instructions can specify this register as an alternative accumulator.

 

Status Register

 

ST Register. The 16-bit Status register contains not only the five condition flags but also the flags for interrupt control, overflow protection mode and address pointer loop options. The N, OV, Z and CY flags are updated one cycle after the related ALU instruction is executed. However, the instruction following that ALU instruction can use the resulting flag contents without seeing this internal delay. No dummy wait cycles are needed to refer to the previous flag modification.

 

Status Register

Bit     15   14   13   12   11   10   9    8    7    6    5-0
Field   N    OV   Z    CY   RZ   OP   IEx  IE2  IE1  IE0  PRL

N: Negative Flag. Indicates a negative ALU result for arithmetic operations. If an overflow occurs in the ALU operation, the N flag does not show the correct sign.

OV: OVerflow Flag. Indicates that the ALU result has an arithmetic overflow. This flag represents overflows only for single- or double-word operations. Guard bit register overflow is not detected.

Z: Zero Flag. Indicates that the ALU result is zero. Checks either the single-word or the full double-word ALU output bits, depending on the double-word option.

CY: Carry/Borrow Flag. Indicates a carry/borrow from the MSB of the ALU. For the SHA and SHL instructions, CY holds the bit shifted out from the MSB or LSB.

RZ: Pointer Register Zero Flag. Indicates whether the address pointer register just updated (Rj or Ri) became zero. If both Rj and Ri are modified in the same instruction, the Ri modification takes precedence for the RZ update. If no modification option is used, RZ is not updated.

OP: Automatic Overflow Protection. If this flag is on, any ADD/SUB instruction, except INC/DEC on registers other than AH, automatically overflow-protects the AH value.

IEx: Interrupt Enable. Enables external interrupts. If this flag is off, no interrupts are acknowledged. This flag goes off when any interrupt is acknowledged and goes on when "MODF ie" is executed or IE2, IE1, IE0 are rewritten. The effect of a flag update appears one cycle after the IEx modification.

IE2,1,0: Interrupt Priority. Sets the acceptable external interrupt level.

PRL: Pointer Register Loop. Selects the address pointer looping mode and the size of the loop.

Loop Size     PRL:ST[5:0]
No Looping    00x000
4             00x001
8             00x010
16            00x011
32            00x100
64            00x101
128           00x110
256           00x111
64P           01pppp
2Q            1qqqqq

Where
P:(pppp)={1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,0*}
Q:(qqqqq)={1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,
           10,11,12,13,14,15,16,17,18,19,1A,1B,1C,1D,1E,1F,0**}
*: counted as 'h10    **: counted as 'h20
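The PRL encoding in the table above can be decoded with a few lines of Python. This is a sketch based only on that table (including the rule that a zero field counts as 'h10 or 'h20); the function name `prl_loop_size` is hypothetical.

```python
def prl_loop_size(st):
    """Return the loop size selected by ST[5:0], or None for no looping."""
    prl = st & 0x3F
    if prl & 0x20:                       # 1qqqqq: size is 2Q
        q = prl & 0x1F
        return 2 * (q if q else 0x20)    # qqqqq = 0 counts as 'h20
    if prl & 0x10:                       # 01pppp: size is 64P
        p = prl & 0x0F
        return 64 * (p if p else 0x10)   # pppp = 0 counts as 'h10
    n = prl & 0x07                       # 00xnnn: bit 3 is a don't-care
    return (2 << n) if n else None       # nnn = 1..7 -> 4..256


for code in (0b000000, 0b000011, 0b010010, 0b100101):
    print(format(code, "06b"), prl_loop_size(code))
```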

 

 

 

The MODF (Modify Flags instruction) takes only one clock cycle to modify some frequently used flags.

 

 Program Counter

 

PC Register. The 16-bit program counter holds the current program memory address. Loading this register with new data always introduces a one-clock-cycle NOP delay to restore the instruction pipeline. The PC is treated as just one of the general-purpose registers in the CD2470A.

 

Other function Registers

 

BF/RC Register. The function of this register is switched between BF and RC by a dedicated input pin (SELBF), normally controlled by a host CPU. When SELBF is held high, the BF/RC register serves as a 16-bit parallel I/O port dedicated to the host interface for emulator control. This port may also be used for on-the-fly communication between the PC host and the CD2470A DSP. When SELBF is held low, the RC (Repeat Counter) function is selected. Writing a pair of eight-bit numbers to this register starts a repeat operation; there are no dedicated instructions for the repeat operation.

 

The CD2470A provides a loop repeat function. Loading a value into RC initiates the repeat operation: writing the 16-bit number n(8 bits)|m(8 bits) to the RC register repeats the instructions occupying the next n+1 clock cycles for m+1 loops. The clock cycle count must match the instruction execution cycles precisely. The value in RC does not change during or after the repeat operation. Interrupts are inhibited during the repeat operation, but are remembered.

 

 

BF (PC I/F Buffer Register: Address 13 (0xD) when SELBF=1)

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   D15  D14  D13  D12  D11  D10  D9   D8   D7   D6   D5   D4   D3   D2   D1   D0

 

 

RC (Repeat Counter Register: Address 13 (0xD) when SELBF=0)

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   n7   n6   n5   n4   n3   n2   n1   n0   m7   m6   m5   m4   m3   m2   m1   m0
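The n|m packing of the RC register can be sketched in Python. The helper names `encode_rc` and `decode_rc` are hypothetical; the sketch only shows how a block length in clock cycles and a loop count map onto the two 8-bit fields, using the n+1 and m+1 convention described for the repeat function.

```python
def encode_rc(cycles, loops):
    """Pack a repeat request into the 16-bit RC value n|m.

    cycles: clock cycles occupied by the repeated instructions (1..256),
            stored as n = cycles - 1 in bits 15..8.
    loops:  number of passes through the block (1..256),
            stored as m = loops - 1 in bits 7..0.
    """
    if not (1 <= cycles <= 256 and 1 <= loops <= 256):
        raise ValueError("cycles and loops must both be in 1..256")
    return ((cycles - 1) << 8) | (loops - 1)


def decode_rc(rc):
    """Return (cycles, loops) for a 16-bit RC value."""
    return ((rc >> 8) & 0xFF) + 1, (rc & 0xFF) + 1


rc_value = encode_rc(cycles=3, loops=10)
print(hex(rc_value), decode_rc(rc_value))
```

Because the first field counts clock cycles rather than instructions, a block containing a two-cycle instruction needs a correspondingly larger `cycles` value.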

 

 

X and Y Registers. The X and Y registers are the pair of multiplier input registers. When the Y register is written by any instruction (even INC/DEC Y), PW (PH|PL) is updated with X*Y in the next cycle. Writing the X register does not change PW.

 

SP Register. The SP is the only Stack Pointer in the CD2470A. The stack is implicitly placed in RAM0. It is used for interrupt and subroutine-call return addresses and for simple data storage through the POP/PUSH instructions.

 

SP (Stack Pointer Register)

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   P15  P14  P13  P12  P11  P10  P9   P8   P7   P6   P5   P4   P3   P2   P1   P0

 

 

TR Register. The TR is used for barrel shifting, normalization and the MODB and VLCD instructions. It may also be used as an additional temporary register.

 

The lower six bits of the TR register are used by the NORM instruction and the SHA and SHL barrel shifting instructions. The other bits of TR are simply readable and writable. When SHA and SHL refer to TR, they ignore all bits other than the lower six.

 

 

 

TR (Temporary Register or Barrel Shifting bit counter Register [Lower 6 bits])

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   T15  T14  T13  T12  T11  T10  T9   T8   T7   T6   T5   T4   T3   T2   T1   T0

 

 

PM Register. The PM register is used for RAM pointer modification and as an overflow counter. The lower 9 bits act as the Pointer Modifier (PM), and the upper 7 bits as the Guard bit register (G). Both PM and G are two's complement signed numbers. This register may also be used as an additional temporary register. The Guard bit register works only with ADD/SUB/MPY/SHA on the A (or AW) register. The MLD and MOD cla instructions clear the Guard bit register. The CMPB, MAX and MIN instructions modify the PM register implicitly.

 

PM (Temporary Register or Pointer Modifier Register [Lower 9 bits], Guard bit Register [15-9])

Bit     15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
Field   M15  M14  M13  M12  M11  M10  M9   M8   M7   M6   M5   M4   M3   M2   M1   M0
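The two fields packed into the PM register can be separated as in the following Python sketch, which treats bits 15-9 as the signed guard value G and bits 8-0 as the signed pointer modifier. The function name `split_pm` is an illustrative assumption.

```python
def split_pm(pm):
    """Split a 16-bit PM word into (guard, modifier), both as signed integers."""
    modifier = pm & 0x1FF            # lower 9 bits: pointer modifier
    if modifier & 0x100:
        modifier -= 0x200            # two's complement: range -256..+255
    guard = (pm >> 9) & 0x7F         # upper 7 bits: guard bit register G
    if guard & 0x40:
        guard -= 0x80                # two's complement: range -64..+63
    return guard, modifier


print(split_pm(0x01FF), split_pm(0xFE01))
```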

 

 

 

 

2.6  Memories

 

There are three operational memories in the CD2470A: the data memories RAM0 and RAM1 (16 bits x 64K words each) and the program memory PRAM (16 bits x 64K words). All the memories should use a simple synchronous clocked RAM architecture for best performance. The common clocked RAM IP available from most IC vendors works with the CD2470A. The address and read/write control signals are available at the beginning of a RAM clock cycle (rising edge of the clock). Read data from the RAM must be available before the end of the RAM clock cycle. Write data is available at the end of the RAM cycle (rising edge of the next clock).

 

The bottom 512 words of the data memories are directly addressable through the 10-bit DRAM address field in several instructions. The full 64K address spaces are indirectly addressable through the corresponding 16-bit address pointer registers. The Ri pointer registers (R0-R3) address RAM0 and the Rj pointer registers (R4-R7) address RAM1. RAM addresses may also be used as I/O addresses (memory-mapped I/O), by decoding specific addresses and generating the necessary WAIT cycles for slow I/O devices.

 

When the program memory PRAM is used for data storage, its full 64K address space can be addressed indirectly using any of the address pointer registers Rij (R0-R7).

 

Stack pointer register SP can be used to form an arbitrarily placed LIFO stack in RAM0. It is important to note that the stack is used not only for subroutine call and interrupts, but also as a data storage through POP, PUSH instructions.

 

 

2.7  Address Pointer Registers

Indirect Addressing

 

The eight address pointer registers R0-R7 generate indirect addresses for the data and program memories. The Ri registers, R0-R3, address RAM0 and PRAM, while the Rj registers, R4-R7, address RAM1 and PRAM. Indirect addresses are set by loading the pointer registers with short or long immediate data instructions or with data transfer instructions. After a pointer is referenced in an instruction, it may be modified with one of the following options.

 

Symbol   Pointer Modification
Rij      No modification
Rij+     +1 with the looping boundary option
Rij-     -1 with the looping boundary option
Rij+!    Add the 9-bit signed integer (+255 to -256) held in the PM register

 

Address pointer modification with looping options allows no-overhead circular buffers, which are useful in digital signal processing. The loop sizes for each DRAM are selected in the PRL field of the Status register. The loop size may be specified in this field in one of three ways, as follows.

 

Mode 0

This mode places address boundaries at every $2^N$ addresses across the entire data memory (N = 2 to 8). A circular buffer of size $2^N$ starts at every $2^N$ address. Only +1 or -1 modification, or a +PM modification smaller than the loop size, is allowed.

 

Mode 1

This mode places address boundaries at every 1024 addresses across the entire data memory. A circular buffer of size 64(P+1) starts at every 1024th address. The cyclic buffer occupies only the range between 1024n and 1024n+64(P+1), where n={0,1,...} and P={0,1,...,14}. Only +1 or -1 modification is allowed.

 

Mode 2

This mode places address boundaries at every 64 addresses across the entire data memory. A circular buffer of size 2(Q+1) starts at every 64th address. The cyclic buffer occupies only the range between 64n and 64n+2(Q+1), where n={0,1,...} and Q={0,1,...,30}. Only +1 or -1 modification is allowed.
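The three looping modes share one idea: the pointer wraps inside a buffer that starts on an address boundary. A minimal Python sketch of that update is shown below. The function name and its `boundary`/`size` parameters are assumptions made for illustration, and the sketch assumes the pointer already lies inside the buffer.

```python
def step_pointer(pointer, delta, boundary, size):
    """Advance a 16-bit pointer by delta, wrapping inside its circular buffer.

    boundary: spacing of buffer start addresses
              (the loop size in Mode 0, 1024 in Mode 1, 64 in Mode 2).
    size:     number of words in the circular buffer.
    """
    base = pointer - (pointer % boundary)        # start of the current buffer
    offset = (pointer - base + delta) % size     # wrap within the buffer
    return (base + offset) & 0xFFFF


# Mode 0 style: 8-word buffer on an 8-word boundary.
print(hex(step_pointer(0x0107, +1, boundary=8, size=8)))
# Mode 1 style: 128-word buffer starting on a 1024-word boundary.
print(step_pointer(1024 + 127, +1, boundary=1024, size=128))
```

In Mode 0 the boundary and the size are the same power of two, so the wrap reduces to masking the low address bits.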

 

 


Stacks

 

A stack is formed in RAM0 (RAM1 and RAM0 in double-word operation). The SP register is used as the LIFO stack pointer. When the POP Reg instruction executes, SP is incremented before the address is accessed. Popping data from the downward-growing stack is done only with the POP Reg instruction. When the stack is accessed, SP is automatically incremented by one, ignoring any looping boundary setup.

 

The program counter is pushed onto the stack by interrupts, reset and subroutine calls. Looping boundaries do not affect stack operation on interrupts, reset, subroutine calls or returns; these are automatically treated as no-looping operations regardless of the PRL field setting in the ST register.

 

 

2.8  Conditional Instructions

 

 

The CD2470A has several conditional instructions: CALL, BRA, INC, DEC, MODA and SWAP. The conditions testable in these instructions are listed in the following table.

 

 

 

Four ALU-related flags (N, OV, Z, CY), the RAM pointer zero-detection flag (RZ) and two user input pins (USR1, USR0) can each be tested for "1" or "0" by the conditional instructions. Because the flags are effectively modified within the current instruction cycle(s), the instruction immediately following a flag-changing instruction can test them. No pipeline wait is needed for flag testing in the CD2470A. The ALU-related flags are modified by arithmetic and logical instructions such as ADD, SUB, CMP, AND, OR, EOR, INC, DEC, MODA, some MPY instructions and MODF. The RZ flag is updated when either Rj or Ri is updated in an instruction; the Ri modification takes precedence if both are updated at once. The USR pins are directly testable as conditions. This feature is reserved for user enhancement when user-specific instructions are added.

 

 

When a conditional instruction includes an Rij pointer update, Rij is updated even if the condition is not met.

 

There is an extended set of conditions that are referenced only by two instructions: BRA and CALL. These extended conditions are shown in the following table.

 

 

 

Bit 3 of the code selects a positive (Bit3=1) or negative (Bit3=0) condition.

* R0~7 is decremented regardless of whether the condition matches.

 

 

2.9  System Functions

 

 

System functions include signals for system reset, interrupts, user input/output and clock control along with their associated instructions

 

Interrupts

 

The CD2470A comes with seven external interrupt request pins. INT1, INT2, ... INT7 are positive-going edge-triggered inputs. Each must remain high until the processor acknowledges the request. The time needed for acknowledgement (interrupt response time) is one clock cycle plus the number of clock cycles of the current instruction, that is, a minimum of two and a maximum of five instruction clock cycles. (Interrupts are suspended while a repeat operation is in progress.)

 

Each interrupt pin has its own priority level, with INT1 being the highest and INT7 the lowest. Interrupts are enabled by the IEx flag in the Status register, with the accepted priority level determined by the IE[2:0] bits. Changes to IE[2:0] or IEx take effect one instruction clock cycle later, so that the changing instruction can complete and unnecessary nesting of interrupt service routines is avoided. Once an interrupt request has been acknowledged, IEx is cleared and all further interrupts are disabled automatically. Interrupts may be re-enabled within the interrupt service routine to accept further interrupts.

 

Interrupt vectors are stored in the highest seven program memory addresses as follows. Once an interrupt is acknowledged, the address of the corresponding service routine is read from its vector location, and control transfers to that service routine.

 

Interrupt   Priority   Interrupt Vector Address
INT1        High       0xFFF9
INT2                   0xFFFA
INT3                   0xFFFB
INT4                   0xFFFC
INT5                   0xFFFD
INT6                   0xFFFE
INT7        Low        0xFFFF
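
The vector layout and the response-time rule can be checked with a short Python sketch. This is only an illustrative model, not part of the CD2470A tool set: the function names are hypothetical, and the one-to-four-cycle instruction length is inferred from the two-to-five-cycle response time stated above.

```python
RESET_VECTOR = 0xFFF8  # reset vector address, one word below the INT1 vector


def vector_address(level: int) -> int:
    """Program memory address that holds the service routine address of INT<level>."""
    if not 1 <= level <= 7:
        raise ValueError("interrupt level must be 1..7")
    return RESET_VECTOR + level


def response_cycles(current_instruction_cycles: int) -> int:
    """Interrupt response time: one clock cycle plus the instruction in progress."""
    if not 1 <= current_instruction_cycles <= 4:
        raise ValueError("assumed instruction length is 1..4 cycles")
    return 1 + current_instruction_cycles
```

For example, `vector_address(1)` gives 0xFFF9 and `response_cycles(4)` gives the five-cycle worst case.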

 

 

 

 

 

Asserting any of the interrupts (rising edge) or Reset takes the processor out of the Sleep mode invoked by the MODF sleep instruction. The interrupt and Reset logic stays active even in Sleep mode.

 

Reset

 

The system Reset signal RES is an asynchronous input that initiates a non-maskable interrupt to a reset service routine. It must remain asserted for a minimum of two clock cycles. When it is released, the CD2470A starts execution at the address stored at program memory address 0xFFF8.

 

Interrupt   Priority       Reset Vector Address
RESb        Non-Maskable   0xFFF8

 

 

All existing interrupt requests are cleared and, like other interrupts, Reset initially disables all further interrupts. The system Reset clears the IEx flag, but all other processor conditions remain unchanged. Reset also takes the processor out of the Sleep mode.

 

 

Clock Control

 

There are five control signals for the system clock on the CD2470A core processor:

 

SLEEP

This asynchronous input signal stops the internal processor clock when asserted. Processor operation resumes when this signal goes low again. The internal processor clock may also be stopped with the Sleep mode invoked by the MODF instruction and released from the sleep mode by an interrupt or system Reset. These two means of reducing power consumption work independently. Either condition will stop the internal clock and both must be released for normal operation.

 

 

CKEN

Ready or CKEN. A synchronous input that controls the internal processor clock to extend the processor instruction clock cycle for memory access or input/output. It must be stable before and while CKIN is high.

 

 

CKIN

The system clock input.

 

 

CKOUT

The internal clock output. It remains low when the processor is in sleep mode. May be used as an internal clock monitor.

 

 

(M1)*

Shows the first cycle of an instruction execution.

 

 

 

* Available in some releases only.


3.  CD2470A Instruction Details

3.1  CD2470A Instruction summary Table

 

Table 1  CD2470A Instruction summary    

 

Data Transfer

ADD/SUB/CMP

AND/OR/EOR

Pointer data Transfer

Pointer Modifier

 

 

 

 

 

 

 

 

Single

Word

LD D,S,x

LD S,D,x

Aop Ads,S

Lop Ads,S

LD Rij,RAM

LD RAM,Rij

 

PUSH x

POP x

 

 

 

 

 

LD A,RAM

LD RAM,A

Aop A,RAM

 

 

 

 

LD D,(Rij),x

LD (Rij),S,x

Aop Ads,(Rij)

Lop Ads,(Rij)

 

 

 

LD D,(Rij)p

LD (Rij)p,S

 

 

 

 

 

LD D,Rij

LD Rij,S

Aop Ads,Rij

 

LD D,Rij

LD Rij,S

 

MV (Rj),(Ri)

MV (Ri),(Rj)

 

 

 

 

MODR Rj,Ri

 

 

 

 

 

 

 

LDI D,#Imm

LDI (Rij),#Imm

Aop Ads,#Imm

Lop Ads,#Imm

LDI Rij,#Imm

 

 

LDI D,#Simm

 

Aop Ad,#SImm

Lop Ads,#SImm

 

 

ADSI Rij,#Simm

 

 

 

 

 

 

 

 

 

 

 

 

 

 

SWAP A,D

 

 

 

 

 

 

Double

Word

LD Dw,Sw,x

LD Sw,Dw,x

Aop Awds,Sw

 

 

 

 

PUSHw x

POPw x

 

 

 

 

 

LD Aw,RAMw

LD RAMw,Aw

Aop Aw,RAMw

 

 

 

 

LD Dw,(Rij)w,x

LD (Rij)w,Sw,x

Aop Awds,(Rij)w

 

 

 

 

SWAP Aw,Dw

 

 

 

 

 

 

 

 

 

 

Multiply

INC/DEC

TEST

MOD Acc

Special

Inst.

Machine Inst.

 

 

 

 

 

 

 

Single

Word

MPY S,(Ri)

INC/DEC S

TST S

MODA

 

MODF

MPY (Rj),(Ri)

 

 

SHA

 

BRA

 

 

 

SHL

 

 

 

INC/DEC (Rij)

TST (Rij)

NORM

 

CALL

 

 

 

 

 

RET (POP PC)

 

 

 

 

 

 

 

 

 

 

VLCD

 

 

 

 

 

MODB

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Double

Word

 

INC/DEC Sw

 

MODA Awd

 

 

 

 

 

SHA Awd

 

 

 

 

 

SHL Awd

 

 

 

 

 

NORM Awd

 

 

 

 

 

 

 

 

 


 

Table 2   CD2470A Conditional instructions and Conditions to check

 

            BRA, CALL   BRA, CALL
            (Basic)     (Extended)    INC/DEC   MOD Acc   SWPA
Condition   N           ALZ           N         N         N
            OV          AHZ           OV        OV        OV
            Z           AWZ           Z         Z         Z
            CY          EAZ           CY        CY        CY
            RZ          AN            RZ        RZ        RZ
            USR1        EAN           USR1      USR1      USR1
            USR0        NS            USR0      USR0      USR0
            ALWAYS                    ALWAYS    ALWAYS    ALWAYS
                        DRZ0 ~ DRZ7

 

 

 

 

Table 3  CD2470A MOD, MODF operation

 

            MOD OpA,cond                       MODF
            Mne.   Description                 Mne.    Description                   Group
Operation   lstr   Logical shift (TR) bit      seti7   Set INT level 7               Group0
            astr   Arithmetic shift (TR) bit   seti6   Set INT level 6
            rotl   Rotate left through CY      seti5   Set INT level 5
            rotr   Rotate right through CY     seti4   Set INT level 4
            lsl    Logical shift 1 bit left    seti3   Set INT level 3
            lsr    Logical shift 1 bit right   seti2   Set INT level 2
                                               seti1   Set INT level 1
                                               nop
            cla    Clear A                     ei      Enable Int                    Group1
            claf   Clear A and Guard bit       di      Disable Int
            addc   Add CY to A                 ei1     Enable INT for only 1 cycle
            ro     Roundoff                    nop
            neg    Negate                      setcy   Set CY flag                   Group2
            cmpl   1's complement              rescy   Reset CY flag
            sat    Saturate if OVF=1           setop   Set OP flag
            sata   Sat. regardless of OVF      resop   Reset OP flag
                                               sleep   Get into sleep mode
                                               gen1    Generate INT1
                                               gen7    Generate INT7
                                               nop

 

 

 


Status Register

 

15  14  13  12  11  10  9  8  7  6  5  4  3  2  1  0

 

 

 

 

 

 

 

 

 

 

 

 

N

Negative Flag. Indicates a negative ALU result for arithmetic operations. If an overflow occurs at the ALU operation, N flag does not show a correct sign.

 

OV

OVerflow Flag. Indicates the ALU result has an arithmetic overflow. This flag represents overflows only for single or double word operation. No guard bit register overflow is detected.

 

Z

Zero Flag. Indicates the ALU result is zero. Checks the full ALU output bits, single or double word depending on the double word option.

 

CY

Carry/Borrow Flag. Indicates a carry/borrow from the MSB of the ALU. CY holds shifted out bits from MSB or LSB at SHA, SHL instructions.

 

RZ

Pointer Register Zero Flag. Indicates if an updated address pointer register Rj or Ri turned out to be zero. If No modification option was taken, RZ is not updated.

 

OP

Automatic Overflow Protection. Any ADD/SUB instruction except INC/DEC non AH instructions makes the AH value overflow protected automatically.

 

IEx

Interrupt Enable. Enables external interrupts. If this flag is off, no interrupts are acknowledged. This flag goes off when any interrupt is acknowledged and goes on when MODF ei (or MODF ei1) is executed or IE2, IE1,IE0 are updated. Actual effect of the flag updating appears after one cycle of such IEx modification takes place.

 

IE 2,1,0

Interrupt Priority. Sets the acceptable external interrupt level.

 

PRL

Pointer Register Loop. Determines the address pointer looping mode and the size of the loop.

Loop Size    PRL:ST[5:0]
No Looping   00x000
4            00x001
8            00x010
16           00x011
32           00x100
64           00x101
128          00x110
256          00x111
64P          01pppp
2Q           1qqqqq

 

 

 

 

 

 

 

 

 

 

 

 

Where

P:(pppp)={1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,0*}

Q:(qqqqq)={1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,
                     10,11,12,13,14,15,16,17,18,19,1A,1B,1C,1D,1E,1F,0**}

*:counted as ‘h10    **: counted as ‘h20
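
The PRL encoding in the table above can be decoded as in the following Python sketch. It is an illustrative reading of the table only; the function name `loop_size` is hypothetical, and the result is the loop size in words, with `None` standing for "No Looping".

```python
def loop_size(st: int):
    """Decode ST[5:0] into the cyclic buffer loop size in words (None = no looping)."""
    prl = st & 0x3F
    if prl & 0x20:              # 1qqqqq: loop size 2Q, qqqqq = 0 counted as 0x20
        q = prl & 0x1F
        return 2 * (q or 0x20)
    if prl & 0x10:              # 01pppp: loop size 64P, pppp = 0 counted as 0x10
        p = prl & 0x0F
        return 64 * (p or 0x10)
    code = prl & 0x07           # 00xnnn: bit 3 is "don't care"
    return None if code == 0 else 2 << code
```

With this reading, ST[5:0] = 0x02 selects an 8-word loop, 0x12 a 128-word loop and 0x25 a 10-word loop.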

 


 

 

3.2 CD2470A Instruction condensed code table                      

 

 

 

Instruction                     W (words)   C (cycles)
ADSI Rij,#sSImm                 1           1
Aop A,RAM                       1           1
Aop Ad,As,(Rij)                 1           1
Aop Ad,As,Rij                   1           1
Aop Ad,As,S                     1           1
AopI Ad,As,#Imm,sh              2           2
AopSI As,#Simm                  1           1
BRA addr,cond                   2           2
CALL addr,cond                  2           2
INC (Rij) / DEC (Rij),cond      1           2
INC S / DEC S,cond              1           2
LD A,RAM / LD RAM,A             1           1
LD D,(Rij),x / LD (Rij),S,x     1           1
LD D,(Rij)p / LD (Rij)p,S       1           3/4
LD D,Rij / LD Rij,S             1           1
LD D,S,x                        1           1
LD Rij,RAM / LD RAM,Rij         1           1
LDI D,#Imm,sh                   2           2
LDI Rij,#Imm,sh                 2           2
LDI (Rij),#Imm,sh               2           2
LDSI D,#SImm                    1           1
Lop A,RAM                       1           1
Lop Ad,As,(Rij)                 1           1
Lop Ad,As,S                     1           1
LopI Ad,As,#Imm,sh              2           2
LopSI As,#Simm                  1           1
MODA As,OpA,cond                1           1
MODB Rij,n                      1           1
MODF Gr2,Gr1,Gr0                1           1
MODR Rj,Ri,lp                   1           1
MPY (Rj),(Ri),m                 1           1
MPYx S,(Ri),m                   1           1
MV (Rj),(Ri) / MV (Ri),(Rj)     1           1
NORM As,g,exe                   1           1
PUSH S,x / POP D,x              1           1/2
SHA As,shift,g                  1           1
SHL As,S,mode                   1           1
SHL As,shift                    1           1
SWPA A,S,cond                   1           1
TST (Rij),Bit                   1           1
TST S,Bit                       1           1
VLCD (Ri),rs                    1           1

 

 


4.  Appendix

4.1 Cyclic Buffer                      

 

Signal processing usually needs a long data buffer in memory where real-time data is stored and updated at every sampling interval. Suppose an analog input signal sampled every 22.7 µs is passed through a 51-tap FIR filter. You need to keep 51 successive samples in memory to compute one output value. Once this computation is done, you add the new sample to the input memory and discard the oldest one so that the next output can be computed. This memory handling takes place every sampling period.

 

Address   RAMi (t=k)         RAMi (t=k+1)
st+50     in(k-4)      ->    in(k-3)
st+49     in(k-3)            in(k-2)
...       in(k-2)            in(k-1)
st+1      in(k-1)            in(k)
st        in(k)              in(k+1)

 

 

 

 

 

 

 

 

 

 

 

Fig. 4.1.1  A sequence of data should be shifted.

 

 

You can do it by shifting all 51 data words in the memory up by one address and putting the new incoming data at the starting address (st) of the memory space. This procedure consumes considerable computing resources. Without a special instruction for it, it may take about 102 clock cycles, about twice the time of the filter computation itself. (See Fig. 4.1.1.)

 

 

Address   RAMi (t=k)         Address   RAMi (t=k+1)
st+50     in(k-4)      ->              in(k-4)
st+49     in(k-3)            st+50     in(k-3)
...       in(k-2)            st+49     in(k-2)
st+1      in(k-1)            ...       in(k-1)
st        in(k)              st+1      in(k)
                             st        in(k+1)

 

 

 

 

 

 

 

 

 

 

 

Fig. 4.1.2  Starting address of the memory may be moved instead of data shifting.

 

 

There is a smarter way to avoid this heavy work. Instead of actually moving all the data, just decrement the "st" address by 1 and put the newly sampled input data at the new "st" address (Fig. 4.1.2). This scheme needs a little care to bound the memory usage, since blindly decrementing the "st" address at every new sample would consume memory without limit. Set a lower and an upper boundary for the memory region and check whether the target address lies between the two boundaries. If the target address falls outside the range, wrap it around from the lower boundary to the upper boundary, as if the boundaries were joined like the paper tape ring you made in your kindergarten days (Fig. 4.1.3). Because the memory area is reused repeatedly in this scheme, we call it a "cyclic buffer" hereafter.

 

Note that the cyclic buffer does not have to be exactly the same size as the data memory your application needs. As long as the physical buffer is at least as large as the necessary data size, it works fine. For example, if you need a 51-word input buffer, the cyclic buffer size (Upper Boundary – Lower Boundary) can be 51, 52, 53, ...
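
The scheme can be sketched in Python as follows. This is a generic software model of a cyclic buffer, not the CD2470A pointer hardware; the helper names `push_sample` and `tap` are hypothetical, and the buffer is assumed to occupy addresses LB to UB-1 so that its size is UB - LB.

```python
def push_sample(ram, st, lb, ub, sample):
    """Decrement the start address with wrap-around and store the new sample there."""
    st -= 1
    if st < lb:          # fell below the lower boundary: wrap to the top
        st = ub - 1
    ram[st] = sample
    return st


def tap(ram, st, lb, ub, i):
    """Return the sample that is i steps older than the newest one."""
    return ram[lb + (st - lb + i) % (ub - lb)]
```

Only one word is written per new sample, regardless of the buffer length; a filter loop then reads `tap(ram, st, lb, ub, i)` for i = 0, 1, 2, ... to walk from the newest sample to the oldest.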

 

 

Address               RAMi (t=k)         Address   RAMi (t=k+1)
Upper Boundary (UB)                      st        in(k+1)

st+50                 in(k-4)      ->              in(k-4)
st+49                 in(k-3)            st+50     in(k-3)
...                   in(k-2)            st+49     in(k-2)
st+1                  in(k-1)            ...       in(k-1)
st                    in(k)              st+1      in(k)
Lower Boundary (LB)

 

 

 

 

 

 

 

 

 

 

Fig. 4.1.3  Set Lower/Upper boundaries to make the memory address cyclic.

 

The CD2470A has the following three different modes to set up the boundaries on the DRAM.

 

MODE 0

The boundaries are placed at addresses that are multiples of 2^N, where N={2,3,4,5,6,7,8}. The RAM pointers respect the boundaries only when they are incremented or decremented by less than 2^N. If a value larger than 2^N is added to the RAM pointer, the upper bits are simply added while the lower N bits remain cyclic.

 

Example 1.

Set a MODE0 with N=3 and modify the RAM pointer with the following program. The pointer value at the end of every instruction is shown on the right.

 

LDLI ST,0x000002

Set Mode0 N=3

LDSI PM,0x6

 

LDI R2,0x12AE

R2=0012AE

LD (R2+),AH

R2=0012AF

LD (R2+),AH

R2=0012A8

LD (R2+),AH

R2=0012A9

LD (R2-),AH

R2=0012A8

LD (R2-),AH

R2=0012AF

LD (R2-),AH

R2=0012AE

LD (R2-),AH

R2=0012AD

ADSI R2,0x95

R2=001332

MOD R4,R2+!

R2=001330

MOD R4,R2+!

R2=001336

LD R2,ST

R2=000002

MOD R4,R2+!

R2=000000
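
The small-step behaviour in Example 1 can be reproduced with the Python sketch below. It models only increments and decrements smaller than 2^N, where the upper pointer bits are held and the lower N bits wrap; the function name `mode0_step` is hypothetical, and the large ADSI addition in the example is deliberately not modelled.

```python
def mode0_step(pointer: int, delta: int, n: int) -> int:
    """Add delta to the pointer, wrapping within its aligned 2**n-word block."""
    mask = (1 << n) - 1
    if abs(delta) > mask:
        raise ValueError("this sketch only covers steps smaller than 2**n")
    # Keep the upper bits, let only the lower n bits change cyclically.
    return (pointer & ~mask) | ((pointer + delta) & mask)
```

With N=3, `mode0_step(0x12AF, 1, 3)` wraps to 0x12A8, and a step of 6 (the PM value used by the MOD instructions in the example) takes 0x1332 to 0x1330.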

 

 

MODE 1

MODE 1 defines a cyclic buffer of up to 960 words, in steps of 64 words, within each 1024-word memory block. The lower boundaries in this mode are implicitly placed at addresses 1024n. The upper boundaries are placed at addresses 1024n+64(P+1), where n={0,1,2,3...} and P={0,1,2...14}. The RAM pointers respect the boundaries only when they are incremented or decremented by one ("1"). The current RAM pointer modifier logic checks only the lower 10 bits of the pointer for either "0" or 64(P+1)-1. If the current RAM pointer lies outside the valid boundaries, the operation is unpredictable.

 

Example 2.

Set a MODE1 with P=2 and modify the RAM pointer with the following program. The pointer value at the end of every instruction is shown on the right.

 

LDLI ST,0x000012

Set Mode1 P=2

LDSI PM,0x6

 

LDI R2,0x12BE

R2=0012BE

LD (R2+),AH

R2=0012BF

LD (R2+),AH

R2=001280

LD (R2+),AH

R2=001281

LD (R2-),AH

R2=001280

LD (R2-),AH

R2=0012BF

LD (R2-),AH

R2=0012BE

LD (R2-),AH

R2=0012BD

ADSI R2,0x95

R2=001352

MOD R4,R2+!

R2=001338

MOD R4,R2+!

R2=00133E

LD R2,ST

R2=000002

MOD R4,R2+!

R2=000006

 

MODE 2

MODE 2 defines a cyclic buffer of up to 62 words, in steps of 2 words, within each 64-word memory block. The lower boundaries in this mode are implicitly placed at addresses 64n. The upper boundaries are placed at addresses 64n+2(Q+1), where n={0,1,2,3...} and Q={0,1,2...30}. The RAM pointers respect the boundaries only when they are incremented or decremented by one ("1"). The current RAM pointer modifier logic checks only the lower 6 bits of the pointer for either "0" or 2(Q+1)-1. If the current RAM pointer lies outside the valid boundaries, the operation is unpredictable.

 

Example 3.

Set a MODE2 with Q=5 and modify the RAM pointer with the following program. The pointer value at the end of every instruction is shown on the right.

 

LDLI ST,0x000025

Set Mode2 Q=5

LDSI PM,0x6

 

LDI R2,0x1208

R2=001208

LD (R2+),AH

R2=001209

LD (R2+),AH

R2=001200

LD (R2+),AH

R2=001201

LD (R2-),AH

R2=001200

LD (R2-),AH

R2=001209

LD (R2-),AH

R2=001208

LD (R2-),AH

R2=001207

ADSI R2,0x95

R2=0013A2

MOD R4,R2+!

R2=0013A8

MOD R4,R2+!

R2=0013AE

LD R2,ST

R2=000002

MOD R4,R2+!

R2=000008

 

 

In the current implementation, MODE 1 and MODE 2 limit the increment/decrement value to the loop size or less. For example, if you define a loop size of 48 in MODE 2, you can choose any increment/decrement value from 1 to 48.

 

 

 

4.2 Quick Do Loop                      

 

 

Full Software Solution

 

Algorithmic processing often calls for quick DO loops. A DO loop may be built entirely in software when the loop overhead is small compared to the processing time of each pass. This software approach is the usual way to implement a DO loop in applications without tight time constraints. Let's see how much computing time a DO loop costs when you use common CPU resources. Suppose you need to execute a small routine 20 times. Your assembly code may then look like this:

 

LDI  R6,19

Loop1:

....

....

  (Process)

....

....

MOD R6-,R0

BRA @Loop1,RZ1

 

This means you spend 3 clock cycles on every pass just to maintain the DO loop. Whether these 3 cycles matter depends on how fast your application must run. The worst case occurs when the target process (Process) is a single one-cycle instruction such as MPYA (Rj+),(Ri-): you then lose 75% of each pass's execution time, whereas you lose only 10% if the target process takes 27 cycles. The good thing is that the software approach is quite flexible in its resource usage. You can use any RAM pointer as a DO loop counter. If you need to keep the RAM pointers for other purposes, you can use another general-purpose register as the loop counter, though it takes one more cycle to execute.
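
The overhead figures quoted above follow from a one-line ratio, shown here as a small Python sketch. The function name is hypothetical; the default of 3 cycles is the software loop overhead stated in the text.

```python
def loop_overhead_fraction(process_cycles: int, overhead_cycles: int = 3) -> float:
    """Fraction of each loop pass spent on loop control rather than on the process."""
    return overhead_cycles / (process_cycles + overhead_cycles)
```

A one-cycle process gives 3/4 = 75%, and a 27-cycle process gives 3/30 = 10%.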

 

 

 

Repeat Operation

 

Those concerned about the 3-cycle overhead of each DO loop pass have another option, the "repeat" operation. It uses a special cycle counter to count the execution time of the target process (Process) as well as a loop counter that counts the passes. The CD2470A offers the RC register for this purpose. When two 8-bit numbers concatenated as m|n are written to the RC register, the instructions following the RC modification cycle are executed in m+1 cycles, n+1 times. There is no per-pass overhead in this operation. However, you need to know the exact cycle count of the target process (Process), which is not necessarily the same as the instruction count, and any interrupt asserted during the loop is not serviced until the loop finishes. The "repeat" operation thus gives a truly zero-overhead loop.

 

LDI RC,m|n

....

....

  (Process)

....

....

 

 

 

Macro

 

Although the "repeat" operation seems the most attractive option because of its minimal overhead, it brings another tough problem: long interrupt response time. Weighing these pros and cons, we strongly recommend using a "macro" wherever enough program memory is available. This option takes up more program memory. If the target process (Process) consists of m instruction words and is executed n times, the macro needs m·n words of program memory, whereas the "repeat" operation takes only m words. The good thing about the macro is that it needs no special hardware and no interrupt considerations. With a macro, you can stop the operation at any moment and resume it when you need to.

 

 

Nesting

 

Only the innermost loop of a nested DO loop may be assigned to a repeat operation. The outer loops are written fully in software on the CD2470A. Suppose a 20-cycle Process in the core part is repeated 10 times, and the parameters are then changed to repeat the same Process p times. You lose about 2% (4/203) of the computation time if you use software for the outer loop, and the program memory usage is about 24+ words. If the CD2470A had two sets of repeat hardware and allowed nested "repeat", the time loss would be less than 1% (1/203) with 21+ words of program memory. As long as the total cycle count of the first-level nested loop is large compared to the software DO loop overhead (3 words/3 cycles), nested DO loop hardware brings little benefit. What if you need a triple-nested DO loop in which even the combined cycle count of the two inner loops is not large compared to the software DO loop overhead (3 words/3 cycles)? You can write the innermost DO loop as a MACRO.

 

 

 

4.3 Round off                      

 

 

 General idea about the Round off.

 

Any digital computation has limited precision because the hardware size is limited. Suppose you have a sixteen (16) bit register and an associated arithmetic unit to compute some algorithm; you may incur an error of up to one LSB in some operations if you simply ignore the part below the LSB. Consider a simple multiplication of two 16-bit numbers. Assuming the binary point of each number lies right after the sign bit, you get a 31-bit result (sign bit + 30 bits). If you need only the most significant 16 bits as the result, how do you get the best accuracy in those 16 bits? Strictly speaking, since you do not know whether the original numbers were exact 16-bit values or were truncated from longer ones, you cannot say whether the 17th bit of the 31-bit result is meaningful. Here, however, let's assume the lower 15 bits of the multiplication result are meaningful. If you simply truncate the lower 15 bits, the resulting 16-bit number may have an error of up to 1 LSB (0 LSB ~ 1 LSB), which may or may not degrade your algorithm. So you may consider rounding off. If the 17th bit is "1", you add "1" to the LSB of the 16-bit result. If the 17th bit is "0", you take the original 16 bits as the final result. The maximum error then becomes 0.5 LSB (-0.5 LSB ~ +0.5 LSB). This procedure reduces the error power to one quarter. This is the round off.
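
The "one quarter" figure can be verified numerically. The Python sketch below is an illustration only, with hypothetical helper names: it treats the discarded 15 bits as equally likely to take any value and compares the mean-square error of truncation with that of round off, both measured in units of the 16-bit LSB.

```python
FRACTION_BITS = 15            # bits below the LSB of the kept 16-bit result
M = 1 << FRACTION_BITS


def truncate(p: int) -> int:
    """Drop the lower 15 bits."""
    return p >> FRACTION_BITS


def round_off(p: int) -> int:
    """Add the 17th bit (the MSB of the discarded part) to the kept LSB."""
    return (p + (M >> 1)) >> FRACTION_BITS


def mean_square_error(reduce) -> float:
    """Mean-square error in LSB^2 over every possible discarded bit pattern."""
    total = 0.0
    for r in range(M):
        err = reduce(r) - r / M      # the exact value is r / M of an LSB
        total += err * err
    return total / M


ratio = mean_square_error(truncate) / mean_square_error(round_off)
```

The ratio comes out very close to 4, that is, roughly 1/3 LSB² for truncation against 1/12 LSB² for round off.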

 

 

Instructions for Round off.

 

The CD2470A DSP comes with two 16-bit accumulator registers that combine to form a 32-bit long word accumulator. Computations done with double precision (32-bit) instructions use this long word accumulator in full length. You may then need to convert the word length to 16 bits for another operation with the least error. The CD2470A has an instruction for this:

 

MODA ro

 

where the MSB of the lower half of the long word accumulator is added to the LSB of the upper half. Without this instruction, the round-off operation may need 3~4 times more computation time. The CD2470A also provides a "conditional" option for this instruction, which lets you round off the accumulator selectively. For example, you can keep results symmetric for positive and negative values in computations such as divide-by-2 by right shifting, without resorting to absolute value and sign change operations.

 

Ex.

...

SHA AW,1         // Right shift AW long register by one bit   A <-- 0.5*A

MODA ro,n        // Round off AH only when AH is negative

...

 

This two cycle operation gives a symmetrical result on plus and minus values.
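
The symmetry can be seen in a small Python model of the two instructions. This is a sketch under stated assumptions, not a cycle-accurate description: the function names are hypothetical, the lower accumulator half is assumed to be zero before the shift, and SHA AW,1 is taken to be an arithmetic right shift by one bit as the comment in the example says.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def halve_symmetric(ah: int) -> int:
    """Model of SHA AW,1 followed by MODA ro,n on a value held in AH."""
    aw = to_signed16(ah) << 16        # long accumulator AH:AL with AL = 0
    aw >>= 1                          # SHA AW,1: arithmetic right shift
    high = aw >> 16                   # upper half after the shift
    low_msb = (aw >> 15) & 1          # MSB of the lower half
    if high < 0:                      # MODA ro,n: round off only when negative
        high += low_msb
    return high
```

Both +3 and -3 are halved toward zero (to +1 and -1), whereas a plain arithmetic shift would give +1 and -2.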

 

 

 

4.4 Long Word Multiplication                      

 

 

The CD2470A carries only a 16 x 16 bit signed/unsigned multiplier, whereas a full set of double word instructions other than multiplication is available. While the CD2480 and CD2490 DSPs are recommended for applications with extensive long word multiplication, even the CD2470A can perform a double word multiplication in about 11 cycles by program.

 

#1 If two double word data are in RAM0 and RAM1 at successive addresses, the double word multiplication takes the following 10 cycles, where u(L1) represents the unsigned lower half of input data 1, s(H0) represents the signed upper half of input data 0, and so on. The results are stored in A0 and A2 in the form of sign bit + 62 bits + "0".

 

MLD (L1),(L0),uu   // set two lower halves on the MPYer

LD AL2,PL          // Store the lower half of the result

SHA  PW,16         // Shift the (L1)*(L0) by 16 bit right

MPYA (L1),(H0),us  // Move the shifted result to AW and set next input data

MPYA (H1),(L0),su  // Add the (L1)*(H0) to AW  and set next input data

MPYA (H1),(H0),ss  // Add the (H1)*(L0) to AW  and set next input data

LD AH2,AL          // Store the lower half of the accumulated result

SHA  AW,15         // Shift the upper half of the result to lower half

ADD  AW,PW         // Add the (H1)*(H0) to the AW

SHA  AW2,-1        // Compensate the lower half(A2) bit position.

 

If the input data are in the form of other CD2470A double word instructions, the word locations of the data need to be modified before starting the routine above. It takes another two(2) cycles as follows.

 

LD  AW,(H1)|(L0),swap  // Exchange the data position of the (H1) and (L0)

LD  (H1)|(L0),AW       //

 

This may also be replaced with simple two MV instructions.

 

Original input data reside in RAM1 and RAM0. The upper half double words are in RAM1 and the lower half double words are in the RAM0. All the double word data are stored in this way with any double word instruction in the CD2470A, so that one cycle access to the DRAM is available through two single word data RAMs.
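
The arithmetic behind the routine is the usual decomposition of a 32 x 32 bit product into four 16 x 16 bit partial products. The Python sketch below shows only that decomposition, with hypothetical function names; it returns the exact integer product and does not reproduce the left-justified "sign bit + 62 bits + 0" result format or the register usage of the CD2470A routine.

```python
def split(x: int):
    """Split a signed 32-bit value into (signed upper half, unsigned lower half)."""
    low = x & 0xFFFF
    high = (x - low) >> 16
    return high, low


def mul32(a: int, b: int) -> int:
    """Signed 32 x 32 bit multiply built from four 16 x 16 bit partial products."""
    h1, l1 = split(a)
    h0, l0 = split(b)
    uu = l1 * l0          # unsigned x unsigned
    us = l1 * h0          # unsigned x signed
    su = h1 * l0          # signed x unsigned
    ss = h1 * h0          # signed x signed
    return (ss << 32) + ((us + su) << 16) + uu
```

The uu, us, su and ss products correspond to the multiplier modes used in the listings, and the shifts by 16 and 32 bits correspond to the SHA alignment steps.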

 

EX1.  12 cycles, 12 words

 

LD  AW,(H1)|(L0),swap  // Exchange the data position of the (H1) and (L0)

LD  (H1)|(L0),AW       //

MLD (L1),(L0),uu        // set two lower halves on the MPYer

LD AL2,PL              // Store the lower half of the result

SHA  PW,16             // Shift the (L1)*(L0) by 16 bit right

MPYA (L1),(H0),us      // Move the shifted result to AW and set

                       // next input data

MPYA (H1),(L0),su      // Add the (L1)*(H0) to AW  and set next input data

MPYA (H1),(H0),ss      // Add the (H1)*(L0) to AW  and set next input data

LD AH2,AL              // Store the lower half of the accumulated result

SHA  AW,15             // Shift the upper half of the result to lower half

ADD  AW,PW             // Add the (H1)*(H0) to the AW

SHA  AW2,-1            // Compensate the lower half(A2) bit position.

 

 

EX2. 12 cycles  12 words

 

LD PL,(L0)

MLD PL,(L1),uu

LD  AL2,PL

SHA  PW,16

MPYA (H0),(L1),su

MPYA (H1),(L0),su

LD (Temp),(H0)

MPYA (H1),(Temp),ss

LD AH2,AL

SHA  AW,16,G

ADD  AW,PW

SHA  AW2,-1

 

 

 

4.5 Bit Stream Data Read/Write                      

 

 

The CD2470A supports double word DRAM access, which, together with several special instructions, allows seamless bit stream access to RAM. Assuming bit stream data is stored in the sequence shown in Fig. 4.5.1, the word pointer Wp and the bit position pointer Bp point to the current bit position in the whole bit stream stored in RAM1 and RAM0. There are two bit counting directions: one counts bits from MSB to LSB (m=0), the other from LSB to MSB (m=1). Also, there are two cases when the target bit granule (n bits) lies across the word boundary. Here, we assume the target bit granule size is equal to or less than the word size. Hereafter, the Rij pointers are used as the word pointers Wp, and the TR[4:0] register as Bp.

 

 

 

 

 

 

Fig 4.5.1. Bit stream data storage and construction

 

 

RAM Pointer bit assignment

         Fig 4.5.2. Bit stream pointers

 

The RAM bit address pointer register is composed of one pair of RAM pointers, [R4,R0], [R5,R1], [R6,R2] or [R7,R3], and the five LSBs of the TR register, as shown in Fig. 4.5.2.

One pair of data words {(Ri+4),(Ri)} is accessed at a time, reading or writing a 32-bit long word, where the bit address within the long word is designated by TR[4:0]. Since the contents of (Ri+4) and (Ri) are modified independently, an arbitrary number of bits (16 or fewer) can be read or written from any bit-word address, as shown in Fig. 4.5.1.

 

Bit stream reading

 

A bit string of length "n" bits is conveniently read out from a buffer memory in the DRAM using the LD and SHL instructions. It takes 3 or 4 cycles to read up to 16 bits of bit stream data from the memory in RAM1 and RAM0. A double word is first read, with or without word swapping, by LD in double word mode; a couple of SHL instructions then cut out the necessary part of the words and leave the final bit stream data in one of the accumulators. The bit address pointers are updated with the MODB instruction for successive reads.
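
The read sequence (fetch a double word, shift the wanted field into place, advance the bit pointer) can be illustrated with the Python sketch below. It is a simplified model with several assumptions: the stream is a flat list of 16-bit words rather than the RAM1/RAM0 pair layout, only the MSB-to-LSB direction (m=0) is covered, and the function name `read_bits` is hypothetical.

```python
def read_bits(words, bit_pos: int, n: int):
    """Read n (1..16) bits MSB-first starting at absolute bit position bit_pos.

    Returns (value, new_bit_pos).
    """
    if not 1 <= n <= 16:
        raise ValueError("n must be 1..16")
    wp, bp = divmod(bit_pos, 16)                  # word pointer, bit pointer
    first = words[wp]
    second = words[wp + 1] if wp + 1 < len(words) else 0
    window = (first << 16) | second               # double word read
    value = (window >> (32 - bp - n)) & ((1 << n) - 1)   # cut out the field
    return value, bit_pos + n                     # pointer update for the next read
```

Because a 32-bit window is always fetched, a field that straddles a word boundary needs no special case.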

 

 

          Fig. 4.5.3  Bit stream reading

 

 

Bit stream writing

 

 

Writing a bit string of length "n" to the top of an existing bit stream in DRAM is conveniently programmed with the SHL and LD instructions. A double word is first read, with or without word swapping, by LD in double word mode; a couple of SHL instructions then cut out the necessary part of the words so that the current bit stream data can be patched with the new bits. Finally, the patched data is written back into the DRAM. Writing up to 16 bits of stream data to a buffer memory area in RAM1 and RAM0 takes either 5 or 6 cycles, depending on the bit streaming direction.

 

 

 

 

          Fig. 4.5.4  Bit stream writing

 

 

4.6 Variable Length Code                      

 

 

Huffman Decoder

 

Variable bit Length Coding (VLC) is widely used in data compression algorithms. The CD2470A DSP carries a special instruction for building a fast decoder for such codes. The VLCD instruction (Variable bit Length Code – Decoding) provides simple yet fast decoding with a special reference table format. The code table for a VLC must be a standard tree structure in which each node stores two numbers in a paired location of RAM1 and RAM0 words; they represent the two address distances on the table to the next tree nodes. Each node has two branches, one for input code bit "0" and one for "1". The VLCD instruction handles this one-bit decoding in one cycle.

 

 

 

Let's see an example of a Huffman decoder program using VLCD instruction.

 

 Suppose you had 4-bit original binary code that was encoded using VLC technique based on the code table Tab.1. This table means your original code "0000" is encoded as "1", "0001" is encoded as "010", "0010" is encoded as "000110" and so on.

 

The code table in the Tab.1 needs to be stored in the DRAM with the form like Tab.2 so that the VLCD instruction can work properly for decoding of such encoded bit stream. (The data in the Tab.2 are represented in Hex format.)

 

 

A component of the DRAM table contains two 16-bit values. For example, the Table_top + 8 address has the data H = Tt+09 and L = Tt+16.

 

These values are possible next table addresses. The next address we may need to reference is either Table_top + 9 or Table_top + 16 in this case. The values (H, L) give the absolute next table addresses to reference.

 

Tab.2 represents a binary tree for Tab.1. When the first bit of an encoded bit stream comes in, you are at the Table_top of Tab.2. If the first bit is "1", Tab.1 shows that the original code is "0000", because "0000" is the only code whose encoded form begins with "1". On Tab.2, you read the data and get H=Tt+01 and L=Tt+02. As the first bit of the encoded bit stream was "1", you choose H as the new table address. (If the incoming encoded bit were "0", you would choose L as the new address.) The table data you then read at Tt+1 is H=0, L=0. Seeing L = "0", you know you have reached the end of decoding, and the corresponding original code is found in H, that is "0".

 

Assuming the incoming bit stream is 010..., you can trace how it is decoded with Tab.2. You go through Table_top, Tt+2 and Tt+3, then reach Tt+5, which gives the original code "1".
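
The table walk described above can be sketched in Python. This is an illustrative model only: the entries at Tt+0 and Tt+1 and the path through Tt+2, Tt+3 and Tt+5 follow the text, while the remaining entries (the codes "011" and "00") are hypothetical fillers chosen to complete a small tree; they are not taken from Tab.2. The function names are also hypothetical.

```python
# Each entry is (H, L). A leaf has L == 0 and holds the decoded value in H.
TABLE = [
    (1, 2),   # Tt+0: root, bit "1" -> Tt+1, bit "0" -> Tt+2
    (0, 0),   # Tt+1: leaf, original code 0  (encoded "1")
    (3, 4),   # Tt+2
    (6, 5),   # Tt+3
    (3, 0),   # Tt+4: leaf, hypothetical code 3 (encoded "00")
    (1, 0),   # Tt+5: leaf, original code 1  (encoded "010")
    (2, 0),   # Tt+6: leaf, hypothetical code 2 (encoded "011")
]


def decode_symbol(table, bits: str, pos: int = 0):
    """Walk the tree from the table top; return (decoded value, next bit position)."""
    node = 0
    while True:
        high, low = table[node]
        if low == 0:                  # leaf reached: H holds the decoded value
            return high, pos
        node = high if bits[pos] == "1" else low
        pos += 1


def decode(table, bits: str):
    """Decode a whole bit string made of complete code words."""
    out, pos = [], 0
    while pos < len(bits):
        value, pos = decode_symbol(table, bits, pos)
        out.append(value)
    return out
```

Each loop iteration in `decode_symbol` corresponds to one node branch, which is the step the VLCD instruction performs in a single cycle.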

 

 

The CD2470A DSP can process this tree searching in one cycle at every node branching as follows.

 

LD TR,encoded_bit_stream

LDI R0,@Tt  //Table Top

LDI R4,@Tt  //Table Top

LDSI RC,15

VLCD (R0+),L

LD AH0,(R4)

 

// The "repeat loop" breaks once a decoding is completed.

// Zero flag is on if decoding is completed.

// A0 will have a decoded code.

// "Repeat" continues one more cycle after a "Break" is detected.

// Once a decoding is completed, the RAM pointer content stays at the same value.

// Consumed original code bit count should be placed in part of decoded code data.

 

 

This example program decodes one symbol in at most 14 cycles (the worst case) for the Tab.1 code table.

 

 

 

4.7 Table Look Up
                      

 

 

Finding a target bit pattern in a large table is a common, time-consuming routine in signal processing programs. The CD2470A provides a quick table look-up method that compares one memory word per cycle. The CMPB As,(Rij+) instruction compares the RAM data with the As register; if they match, it clears the loop counter (terminating the Repeat operation) and copies the Rij contents to the PM register. Used with the repeat function, this instruction leaves the address of the matching word in the PM register, as follows.

 

 

Example:

You have a table of 62 words in RAM0. The AH0 register holds the value “0AA6”, and you want to find the address of the table entry that contains “0AA6”.

 

Ri

Address   Data
st+61     74F2
st+60     98DD
...
st+23     0AA6
...
st+1      2C05
st        01A3

Fig. 4.7.1

 

 

 

LDSI PM,0

LDI R0,@st

LDSI RC,61

CMPB AH0,(R0+)

LD AH0,PM

BRA @no_match, AHZ

 

 

This program takes 7 to 68 cycles to find a match. The bit pattern to search for is given in AH0, and the address of the matching word is returned in AH0.
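The behaviour of the repeated CMPB search can be sketched as a small Python model. It is an illustration under stated assumptions, not a cycle-accurate description: the function name cmpb_search, the base address ST and the filler value 0xFFFF are hypothetical, and the model assumes that PM receives the address of the matching word itself. Because PM is preset to 0 and tested for zero afterwards, the model also assumes the table is not located at address 0.

```python
def cmpb_search(ram, start, count, pattern):
    """Model of: LDSI PM,0 / LDI R0,@st / LDSI RC,count-1 / CMPB AH0,(R0+)."""
    pm = 0                  # LDSI PM,0 : 0 is later read as "no match"
    r0 = start              # LDI R0,@st
    for _ in range(count):  # RC = count - 1 repeats CMPB count times
        if ram[r0] == pattern:
            pm = r0         # assumption: PM gets the matching word's address
            break           # loop counter cleared, Repeat terminates
        r0 += 1             # post-increment of the RAM pointer
    return pm               # LD AH0,PM


ST = 0x200                                        # hypothetical table start
ram = {ST + i: 0xFFFF for i in range(62)}         # filler words (invented)
ram.update({ST: 0x01A3, ST + 1: 0x2C05, ST + 23: 0x0AA6,
            ST + 60: 0x98DD, ST + 61: 0x74F2})    # values from Fig. 4.7.1

print(hex(cmpb_search(ram, ST, 62, 0x0AA6)))      # address of the match
print(cmpb_search(ram, ST, 62, 0x1234))           # 0 when nothing matches
```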

 

An alternative program that does not use the CMPB instruction takes about five times as many cycles, as in the following example.

 

LDI R0,@st

LDI R7,62

 

LP1:

CMP AH0,(R0+)

BRA @LP2,Z

BRA @LP1,NDRZ7

BRA @no_match

 

LP2:

MODR R4,R0-

LD AH0,R0

 

 

 

4.8 Find Min/Max Value

 

 

Finding the maximum or minimum value in a large table is a common, time-consuming routine in signal processing programs. The CD2470A has specific instructions for this purpose. The MAX As,(Rij+) and MIN As,(Rij+) instructions compare the RAM data with the As register and load the larger or smaller value into As; if As is updated, the Rij contents are copied to the PM register. These instructions find the maximum or minimum value in the table and leave its memory address in the PM register.

 

 

Example:

You have a table of 200 words in RAM0. Find the most positive value and its memory address.

 

LDI AH0,'h8000

LDI R0,@st

LDSI RC,199

MAX AH0,(R0+)

 

Ri

Address   Data
st+199    0344
st+198    FA09
...
st+1      790F
st        A432

Fig. 4.8.1

 

 

 

This program takes 202 cycles to find the maximum value, leaving its memory address in the PM register.
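As a cross-check of what the repeated MAX instruction computes, here is a minimal Python model. It is a sketch under assumptions, not the hardware definition: the names to_signed16 and max_scan and the base address ST are hypothetical, the comparison is assumed to be signed 16-bit with a strict "greater than" update, and PM is modelled as None until the first update because the example program does not initialise it.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's complement value."""
    return word - 0x10000 if word & 0x8000 else word


def max_scan(ram, start, count):
    """Model of: LDI AH0,'h8000 / LDI R0,@st / LDSI RC,count-1 / MAX AH0,(R0+)."""
    ah0 = 0x8000          # most negative 16-bit value as the starting maximum
    pm = None             # not written until AH0 is first updated
    r0 = start
    for _ in range(count):
        if to_signed16(ram[r0]) > to_signed16(ah0):  # assumed strict compare
            ah0 = ram[r0]  # larger value loaded into AH0
            pm = r0        # assumption: PM gets the address of that word
        r0 += 1            # post-increment of the RAM pointer
    return ah0, pm


ST = 0x300                                 # hypothetical table start
words = [0xA432, 0x790F, 0xFA09, 0x0344]   # the four values shown in Fig. 4.8.1
ram = {ST + i: w for i, w in enumerate(words)}

value, address = max_scan(ram, ST, len(words))
print(hex(value), hex(address))
```

The MIN case would follow the same pattern with the comparison reversed and a starting value of 'h7FFF, which is an assumption by symmetry rather than something the text states.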

 

 

An alternative program that does not use the MAX or MIN instruction takes about 3 to 8 times as many cycles, as in the following example.

 

 

LDI AH0,'h8000

LDI R0,@st

LDI R7,199

 

MM1:

CMP AH0,(R0+)

BRA @MM1,N

MODR R4,R0-

LD PM,R0

LD AH0,(R0+)

BRA @MM1,NRZ7-