CD2470A, 80A, 90A Summary
1.3 CD2470A Core – Memory Interface Diagram
1.4 CD2470A Basic Memory Access Timing Idea
1.6 CD2470A Registers Bit Assignment
1.7 CD2470A ALU Flag Updating Summary
2. CD2470A 16 Bit DSP Description
Numeric Data Representation and Overflow
2.3 Barrel Shifter / Normalizer
3. CD2470A Instruction Details
3.1 CD2470A Instruction Summary Table
3.2 CD2470A Instruction Condensed Code Table
General idea about the Round off.
4.5 Bit Stream Data Read/Write
This manual presents a comprehensive description of Clarkspur’s 16-bit fixed-point DSP core, the CD2470A. The 24-bit version (CD2480A) and the 32-bit version (CD2490A) are also referred to in this manual as derivatives of the CD2470A. The CD2480A and CD2490A have the same instruction set and architecture as the CD2470A, except that their data registers and data memories have different bit widths.
- Summary of the CD2470A -
· 16 bit Fixed point DSP with strong double word instructions.
· Compatible with standard single port clocked memory IP.
· Four sets of double word accumulator.
· Strong barrel shifter / normalizer.
· Bit stream data handling.
· Variable length code handling (e.g. Huffman code).
· Table look up capability.
· Instruction compatibility with higher precision DSP (CD2480A, CD2490A).
· No pipeline latency.
· 6% code space reserved for custom instructions.
· Verilog HDL Synthesizable design with visualized block diagrams.
· Communication port with outside Host hardware.
· 70MIPS for common 90nm Xilinx, Altera FPGA chips.
| | CD2470A | CD2480A | CD2490A |
| Data bit width | 16 | 24 | 32 |
| Instruction bit width | 16 | 16 | 16 |
| Memory | Program 64Kx16, Data0 64Kx16, Data1 64Kx16 | Program 16Mx16, Data0 16Mx24, Data1 16Mx24 | Program 4Gx16, Data0 4Gx32, Data1 4Gx32 |





AL/AL0 (Accumulator-Low Register) Register Address=0
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
A/AH/AH0 (Accumulator-High Register, Single word Accumulator) Register Address=1
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
AL1/TL (Shadow Accumulator-Low Register) Register Address=2
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
AH1/TH (Shadow Accumulator-High Register) Register Address=3
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
AL2 (General Purpose-Low Register) Register Address=4
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
AH2 (General Purpose-High Register) Register Address=5
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
PL/AL3 (Product-Low Register) Register Address=6
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
PH/AH3 (Product-High Register) Register Address=7
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
X (MPYer input Register) Register Address=8
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
Y (MPYer input Register) Register Address=9
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
ST (Status Register) Register Address=10
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| N | OV | Z | CY | RZ | OP | IEx | IE2 | IE1 | IE0 | PRL (bits 5-0) |
PC (Program Counter) Register Address=11
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
SP (Stack Pointer Register) Register Address=12
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| P15 | P14 | P13 | P12 | P11 | P10 | P9 | P8 | P7 | P6 | P5 | P4 | P3 | P2 | P1 | P0 |
BF (PC I/F Buffer Register) Register Address=13 (SELBF=1)
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
RC (Repeat Counter Register) Register Address=13 (SELBF=0)
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| n7 | n6 | n5 | n4 | n3 | n2 | n1 | n0 | m7 | m6 | m5 | m4 | m3 | m2 | m1 | m0 |
TR (Temporary Register or Barrel Shifting bit counter Register [Lower 6 bit]) Register Address=14
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| T15 | T14 | T13 | T12 | T11 | T10 | T9 | T8 | T7 | T6 | T5 | T4 | T3 | T2 | T1 | T0 |
PM (Temporary Register or Pointer Modifier Register [Lower 9 bit], Guard bit Register[15-9]) Register Address=15
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| M15 | M14 | M13 | M12 | M11 | M10 | M9 | M8 | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
| Instructions | W | C | N | OV | Z | CY | RZ | |
| ADSI Rij,#sSImm | 1 | 1 | | | | | | |
| Aop A,RAM,w | 1 | 1 | | | | | | |
| Aop Ad,As,(Rij),w | 1 | 1 | | | | | | |
| Aop Ad,As,Rij | 1 | 1 | | | | | | |
| Aop Ad,As,S,w | 1 | 1 | | | | | | |
| AopI Ad,As,#Imm | 2 | 2 | | | | | | |
| AopSI As,#Simm | 1 | 1 | | | | | | |
| BRA cond | 2 | 2 | | | | | | * DRZi,NDRZi conditions only |
| CALL cond | 2 | 2 | | | | | | |
| INC (Rij) / DEC (Rij),cond | 1 | 2 | | | | | | |
| INC S / DEC S,cond | 1 | 2 | | | | | | |
| LD A,RAM,w / LD RAM,A,w | 1 | 1 | | | | | | |
| LD D,(Rij),w / LD (Rij),S,w | 1 | 1 | | | | | | |
| LD D,(Rij)p / LD (Rij)p,S | 1 | 3/4 | | | | | | |
| LD D,Rij / LD Rij,S | 1 | 1 | | | | | | |
| LD D,S,w | 1 | 1 | | | | | | |
| LD Rij,RAM / LD RAM,Rij | 1 | 1 | | | | | | |
| LDI D,#Imm | 2 | 2 | | | | | | |
| LDI Rij,#Imm | 2 | 2 | | | | | | |
| LDI (Rij),#Imm | 2 | 2 | | | | | | |
| LDSI D,#SImm | 1 | 1 | | | | | | |
| Lop A,RAM | 1 | 1 | | | | | | |
| Lop Ad,As,(Rij) | 1 | 1 | | | | | | |
| Lop Ad,As,S | 1 | 1 | | | | | | |
| LopI Ad,As,#Imm | 2 | 2 | | | | | | |
| LopSI As,#Simm | 1 | 1 | | | | | | |
| MODA OpA,cond,w | 1 | 1 | | | | | | * Z only: cmpl; CY,Z only: lstr, rotl, rotr, lsl, lsr; CY,OV,Z,N: astr, addc, ro |
| MODB Ri,n | 1 | 1 | | | | | | |
| MODF | 1 | 1 | | | | | | * setcy, rescy |
| MODR Rj,Ri | 1 | 1 | | | | | | |
| MPYx (Rj),(Ri) | 1 | 1 | | | | | | |
| MPYx S,(Ri) | 1 | 1 | | | | | | |
| MV (Rj),(Ri) / MV (Ri),(Rj) | 1 | 1 | | | | | | |
| NORM As,w | 1 | 1 | | | | | | * When actual shift takes place |
| PUSH S,w / POP D,w | 1 | 1/2 | | | | | | |
| RET | 1 | 3 | | | | | | |
| SHA As,w | 1 | 1 | | | | | | |
| SHL As,S,mode | 1 | 1 | | | | | | |
| SHL As,w | 1 | 1 | | | | | | |
| SWAP A,S,cond,w | 1 | 1 | | | | | | |
| TST (Rij) | 1 | 1 | | | | | | |
| TST S | 1 | 1 | | | | | | |
| VLCD (Ri) | 1 | 1 | | | | | | |
W: Word count. C: Execution cycle count.
2. CD2470A 16 Bit DSP Description
The basic precision of the CD2470A is 16 bits. While the instructions and program memory are 16 bits wide, the D-Bus and the data path elements, including data memory, can have a fixed-point precision N selected between 16 and 32 bits in the CD24XX architecture. The CD2470A uses N = 16. The accumulator, the A-Bus and the total product P bus have a corresponding precision of 32 (2N) bits.
Data is represented in two’s complement form with an implicit binary point to the right of the sign bit, which is the most significant bit (MSB). Representable values range between +1.0 - 2^(-N+1) and -1.0.
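As an illustration of this number format, the following Python sketch converts between an N-bit two's complement word and the fraction it represents. The function names are hypothetical and not part of any CD2470A tool; the code only assumes the format described above.

```python
def word_to_fraction(word: int, n: int = 16) -> float:
    """Interpret an n-bit two's complement word as a fraction in [-1.0, +1.0)."""
    word &= (1 << n) - 1              # keep only the n data bits
    if word & (1 << (n - 1)):         # sign bit (MSB) set: negative value
        word -= 1 << n
    return word / (1 << (n - 1))      # binary point sits right of the sign bit


def fraction_to_word(value: float, n: int = 16) -> int:
    """Quantize a fraction to an n-bit word, clamping to the representable range."""
    scaled = round(value * (1 << (n - 1)))
    scaled = max(-(1 << (n - 1)), min((1 << (n - 1)) - 1, scaled))
    return scaled & ((1 << n) - 1)
```

With N = 16, the largest positive word 0x7FFF maps to 1.0 - 2^(-15) and 0x8000 maps to -1.0, which matches the range given above.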
Digital signal processors commonly model the clipping or limiting of signals that happens in analog components, so that the system stays stable even when numerical data handling encounters overflows. The CD2470A provides this limiting of numeric overflow either through Overflow Protection (the OP flag) or by executing the MOD sat instruction. When the OP flag is on or the MOD sat instruction is executed, a positive or negative overflow in the AH register results in the corresponding positive or negative full-scale value being substituted in AH or AW (AH|AL). For example, 0x7FFF is substituted on a positive overflow and 0x8000 on a negative overflow. Note that this protection applies to both AH and AL when MOD sat is executed in double word mode. The MOD sat instruction checks the OV flag and the N flag to determine whether it should replace the current AH or AW value with the full-scale value, even when the OP flag is set to “0”. If the OP flag is set to “1”, this limiting takes place automatically at every arithmetic instruction execution.
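The substitution rule can be sketched in Python for a single-word addition. This is a behavioral illustration only: `add16` and its `overflow_protect` argument are hypothetical names standing in for an ADD executed with the OP flag off or on, and the sketch does not model the flags themselves.

```python
def to_signed16(word: int) -> int:
    """Reinterpret a 16-bit word as a signed two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def add16(a: int, b: int, overflow_protect: bool) -> int:
    """Add two 16-bit words; optionally clip to full scale on overflow."""
    total = to_signed16(a) + to_signed16(b)
    if overflow_protect:
        if total > 0x7FFF:
            return 0x7FFF             # positive full-scale value
        if total < -0x8000:
            return 0x8000             # negative full-scale value
    return total & 0xFFFF             # plain two's complement wraparound
```

For example, `add16(0x7000, 0x2000, True)` yields 0x7FFF, whereas the unprotected sum wraps to 0x9000, a negative number.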
As is commonly seen in FIR filter computation, a numeric overflow in the accumulator does not necessarily lead to an incorrect result if the final total of a summation lies within the range that can be represented in the accumulator. Overflow protection is therefore optional, and overflow may instead be handled by checking the Guard bit register when needed in successive MPY-ADD operations. Using overflow protection blindly may reduce the dynamic range of the computation unnecessarily.
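The point about intermediate overflow can be checked with a small Python experiment on a 32-bit accumulator. The helper names and the sample terms are made up for this illustration; the code compares plain wraparound accumulation with clipping at every step.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word: int) -> int:
    """Reinterpret a 32-bit word as a signed two's complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def sum_wrapping(terms) -> int:
    """Accumulate with two's complement wraparound (no overflow protection)."""
    acc = 0
    for term in terms:
        acc = (acc + term) & MASK32
    return to_signed32(acc)


def sum_saturating(terms) -> int:
    """Accumulate with clipping to full scale after every addition."""
    acc = 0
    for term in terms:
        acc = max(-(1 << 31), min((1 << 31) - 1, acc + term))
    return acc


# The running sum overflows after the second term, but the true total fits.
TERMS = [0x60000000, 0x60000000, -0x70000000]
```

Here `sum_wrapping(TERMS)` returns the exact total 0x50000000, while `sum_saturating(TERMS)` returns 0x0FFFFFFF because the intermediate clip discarded information.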
The CD2470A uses a conventional three-step instruction sequence: fetch the instruction, decode it and fetch its operands, and finally execute it. This sequence is normally invisible to the user as long as the instruction does not set a new value in the PC (program counter). If the PC is altered by a data transfer instruction, a dummy machine cycle is automatically inserted to restore the pipeline.
Most CD2470A instructions execute in one clock cycle (counted on CKOUT). The obvious exceptions are two-word instructions with immediate data fields and branch/subroutine call instructions with a PRAM address field. INCrement/DECrement instructions also take two clock cycles. Three clock cycles are taken when a three-word long immediate instruction (CD2480A, CD2490A) is executed, or when the program memory is read indirectly. Four cycles are consumed when the program memory is written indirectly. Stack-related instructions, such as a return from a subroutine or a load from memory to a register using the stack pointer, take an additional clock cycle for pre-incrementing the SP. Whenever the PC is the destination register, an additional dummy cycle is inserted to allow the instruction pipeline to refill.
These variations in instruction execution are normally invisible to users because an operation can be considered complete at the end of the last clock cycle of its instruction. The only exception is the CD2470A multiplier: the product for a pair of X, Y input register values becomes available on PW only one cycle after the cycle in which Y (and/or X) is updated.
Arithmetic and logical operations in the CD2470A are performed by a full-function 32-bit AU (Arithmetic Unit) and a 16-bit LU (Logical Unit). These units work either as a 32-bit AU or as a 16-bit ALU, depending on the double word option in the instruction. The unit has two input ports, a and b, that are connected to one of the AH|AL registers, the D-Bus or the multiplier output. Most operations are for 16-bit data and operate only on one of the 16-bit accumulator registers AHi. However, the MPY (multiply) and double word AU instructions work on both AHi and ALi at a time. The MOD OpA instruction works on either single-word 16-bit or double-word 32-bit data. The results of an ALU operation control the N, OV, Z and CY flags in the status register ST, which are available immediately. (The actual ALU operation in these instructions takes place one cycle after the current execution cycle, but this is not visible to users.) For example, when one ALU instruction modifies AH, the next instruction can use the result. No dummy wait cycle is necessary.
The CD2470A comes with a +/-31 bit one-cycle barrel shifter. Only the AHi|ALi (or AHi) registers can be modified by the special arithmetic or logical shift instructions in one cycle. If the shift bit count is set to “0” in a shift instruction, the lower 6 bits of the TR register are used as the shift count, interpreted as a two’s complement number. A positive shift count means a right shift toward the LSB, and a negative count means a left shift toward the MSB. The sign bit is extended when an arithmetic right shift takes place, whereas zeros are filled in on the MSB side in a logical right shift. Zeros are shifted in from the LSB side when a left shift takes place.
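The shift-count convention can be modeled in a few lines of Python. This is a minimal sketch of the double-word case under the rules stated above; the function names are hypothetical, and the CY flag and guard bits are not modeled.

```python
def tr_shift_count(tr: int) -> int:
    """Read the lower 6 bits of TR as a two's complement shift count."""
    count = tr & 0x3F                 # bits above bit 5 are ignored
    return count - 64 if count & 0x20 else count


def barrel_shift32(value: int, count: int, arithmetic: bool) -> int:
    """Shift a 32-bit word: positive count = right, negative count = left."""
    value &= 0xFFFFFFFF
    if count >= 0:
        if arithmetic and value & 0x80000000:
            value -= 1 << 32          # make Python's >> extend the sign bit
        return (value >> count) & 0xFFFFFFFF
    return (value << -count) & 0xFFFFFFFF   # zeros enter from the LSB side
```

For instance, a TR value of 0xFFFC decodes to a count of -4, that is, a left shift by four bits.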
The CD2470A offers a two-cycle normalizer through the NORM instruction. The NORM instruction detects how many bits AHi|ALi (or AHi) can be shifted toward the MSB (left) without causing an overflow, and sets the TR register to that bit count in one cycle. As an option of the NORM instruction, AHi|ALi (or AHi) can actually be shifted by that bit count in the same cycle. These operations may include the guard bit register as part of the accumulator. Normalizing the contents of AHi|ALi (or AHi) with the NORM instruction lets them maintain the best accuracy for subsequent use.
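The bit count that NORM detects corresponds to the number of redundant sign bits below the MSB. The Python sketch below computes that count for a word of a given width; `norm_count` is a hypothetical name, the guard bit register is not included, and the sign convention with which the hardware stores the count in TR is not modeled.

```python
def norm_count(value: int, bits: int = 32) -> int:
    """Count how far a word can shift left without changing its sign."""
    value &= (1 << bits) - 1
    sign = value >> (bits - 1)        # MSB is the sign bit
    count = 0
    for pos in range(bits - 2, -1, -1):
        if (value >> pos) & 1 != sign:
            break                     # first bit that differs from the sign
        count += 1
    return count


def normalize(value: int, bits: int = 32) -> int:
    """Shift the word left by its normalization count."""
    return (value << norm_count(value, bits)) & ((1 << bits) - 1)
```

After normalization, a positive value has its highest set bit directly below the sign bit, so the word carries as many significant bits as the width allows.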
The multiplier takes either two 16-bit signed (or unsigned) inputs or a pair of one signed and one unsigned number in the X and Y registers. It produces either a signed 31-bit product (sign + 30 bits + “0”) or a 32-bit product (sign + 31 bits) in the 32 bits of PH and PL. The MSB is the sign bit with the implicit binary point to its right. The LSB of PL is filled with zero when two signed numbers are fed into X and Y. For the case of -1.0 x -1.0 (0x8000 x 0x8000) in two signed input mode, the result is the number nearest to +1.0 that can be represented, 0x7FFFFFFF (+0.9999999995).
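The signed-by-signed case can be sketched in Python as an integer multiply followed by a one-bit left shift, which realigns the binary point and leaves the LSB at zero. The function name is hypothetical, and the unsigned and mixed modes are not covered by this sketch.

```python
def to_signed16(word: int) -> int:
    """Reinterpret a 16-bit word as a signed two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy_signed(x: int, y: int) -> int:
    """Return the 32-bit PH|PL product of two signed fractional 16-bit words."""
    if (x & 0xFFFF) == 0x8000 and (y & 0xFFFF) == 0x8000:
        return 0x7FFFFFFF             # -1.0 x -1.0: nearest value to +1.0
    product = to_signed16(x) * to_signed16(y)
    return (product << 1) & 0xFFFFFFFF   # sign + 30 bits + trailing zero
```

For example, 0.5 x 0.5 (0x4000 x 0x4000) gives 0x20000000, which is 0.25 in the 32-bit fractional format.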
This multiplier accepts one or two new operands and produces a new product every instruction clock cycle. The multiplier needs one clock cycle to produce a product (one cycle latency). When a pair of input data is set in the X and Y registers, the product of the pair becomes available in the PH|PL (PW) registers at the next clock cycle. This means that the MPYA and MPYS addition/subtraction uses the pre-existing PH|PL value, not the new X*Y result. The new product reflecting the current X, Y values is available in PH|PL from the next clock (instruction) cycle.
Normally a multiplier operation starts with the loading of the Y register, either by a multiply instruction or as the result of a load instruction to Y. Loading the X register does not initiate a multiplication. Once the Y register is written in some cycle, the multiplier starts operating and writes the new product onto the PH|PL registers, regardless of whether the pre-existing data in PH|PL has been read out or not. This operation is automatic, so the current product in PH|PL must be used on or before the cycle following the one in which the Y register is written with new data.
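The one-cycle latency and the role of the Y write can be pictured with a toy cycle model in Python. The class and method names are hypothetical, and a plain integer product stands in for the fractional product; only the timing relation described above is modeled.

```python
class MultiplierModel:
    """Toy model: a Y write schedules X*Y, which appears in PW one cycle later."""

    def __init__(self) -> None:
        self.x = 0
        self.y = 0
        self.pw = 0                   # product register pair PH|PL
        self._pending = None          # product scheduled for the next cycle

    def write_x(self, value: int) -> None:
        self.x = value                # writing X alone does not start a multiply

    def write_y(self, value: int) -> None:
        self.y = value
        self._pending = self.x * self.y

    def end_cycle(self) -> None:
        """Advance one clock: a scheduled product overwrites PW."""
        if self._pending is not None:
            self.pw = self._pending
            self._pending = None
```

In this model an instruction that reads `pw` in the same cycle as `write_y` still sees the previous product, which is the behavior the MPYA and MPYS description relies on.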
If the output product registers PH or PL are loaded from the D-Bus at the same time that a multiply operation completes, the D-Bus value takes precedence.
The CD2470A core has sixteen essential 16-bit read/write data registers. Eight of the sixteen are general purpose registers used for data handling, which may be used as eight individual registers or as four double word registers. The other eight registers have dedicated functions, such as PC, SP and ST. See the following table.

AH0(AH) and AL0(AL) Registers. The high and low 16-bit halves of the accumulator. Only double-word operations treat them together as a 32-bit register AW. Single-word operations treat AH and AL as two separate registers. AH or AW is the implicit accumulator when an instruction makes no explicit accumulator selection. These are the most convenient registers among the CD2470A resources and are regarded as the “accumulator”.
AH1(TH) and AL1(TL) Registers. The second accumulator, used either as two 16-bit registers or as one 32-bit long register (TW). Most of the ALU instructions can specify this register as an alternative accumulator.
AH2 and AL2 Registers. The third accumulator, used either as two 16-bit registers or as one 32-bit long register (BW). Most of the ALU instructions can specify this register as an alternative accumulator.
AH3(PH) and AL3(PL) Registers. The high and low 16-bit halves of the multiplier product. These may be used as temporary general-purpose data registers, but the operation of the multiplier should be fully understood with regard to timing and precedence. These registers may act as the fourth accumulator, used either as two 16-bit registers or as one 32-bit long register (PW). Most of the ALU instructions can specify this register as an alternative accumulator.
ST Register. The 16-bit Status register contains not only the five condition flags but also the flags for interrupt control, overflow protection mode and address pointer loop options. The N, OV, Z and CY flags are updated one cycle after the related ALU instruction is executed. However, the instruction following that ALU instruction can use the resulting flag contents without being affected by this internal delay. No dummy wait cycles are necessary to refer to the previous flag modification.
Status Register
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| N | OV | Z | CY | RZ | OP | IEx | IE2 | IE1 | IE0 | PRL (bits 5-0) |
| N | Negative Flag. Indicates a negative ALU result for arithmetic operations. If an overflow occurs in the ALU operation, the N flag does not show the correct sign. |
| OV | OVerflow Flag. Indicates that the ALU result has an arithmetic overflow. This flag represents overflows only for single or double word operations. No guard bit register overflow is detected. |
| Z | Zero Flag. Indicates that the ALU result is zero. Checks all ALU output bits of either the single or the double word, depending on the double word option. |
| CY | Carry/Borrow Flag. Indicates a carry/borrow from the MSB of the ALU. CY holds the bit shifted out from the MSB or LSB in SHA and SHL instructions. |
| RZ | Pointer Register Zero Flag. Indicates whether the updated address pointer register, Rj or Ri, became zero. If both Rj and Ri are modified at the same time, the Ri modification takes precedence for the RZ update. If no modification option is used, RZ is not updated. |
| OP | Automatic Overflow Protection. Any ADD/SUB instruction, except INC/DEC non-AH instructions, automatically overflow-protects the AH value if this flag is on. |
| IEx | Interrupt Enable. Enables external interrupts. If this flag is off, no interrupts are acknowledged. This flag goes off when any interrupt is acknowledged, and goes on when "MODF ie" is executed or IE2, IE1, IE0 are rewritten. The effect of an IEx update appears one cycle after the modification takes place. |
| IE2,1,0 | Interrupt Priority. Sets the acceptable external interrupt level. |
| PRL | Pointer Register Loop. Determines the address pointer looping mode and the size of the loop. Where P:(pppp)={1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,0*} Q:(qqqqq)={1,2,3,4,5,6,7,8,9,0,A,B,C,D,E,F,10,11,12,13,14,15,16,17,18,19,1A,1B,1C,1D,1E,1F,0**} *: counted as 'h10 **: counted as 'h20 |
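For software such as a simulator or a debugger front end, the ST fields can be unpacked as in the Python sketch below. The bit positions are taken from the ST bit assignment shown earlier (N in bit 15 down to PRL in bits 5-0), with PRL assumed to occupy all six low bits; `decode_st` is a hypothetical helper, and the internal encoding of the PRL field is left undecoded.

```python
def decode_st(st: int) -> dict:
    """Split a 16-bit ST value into its named fields."""
    return {
        "N": (st >> 15) & 1,          # negative
        "OV": (st >> 14) & 1,         # overflow
        "Z": (st >> 13) & 1,          # zero
        "CY": (st >> 12) & 1,         # carry/borrow
        "RZ": (st >> 11) & 1,         # pointer register zero
        "OP": (st >> 10) & 1,         # automatic overflow protection
        "IEx": (st >> 9) & 1,         # interrupt enable
        "IE": (st >> 6) & 0x7,        # IE2..IE0 interrupt level
        "PRL": st & 0x3F,             # pointer register loop field (raw)
    }
```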
The MODF (Modify Flags) instruction takes only one clock cycle to modify some frequently used flags.
PC Register. The 16-bit program counter holds the current program memory address. Loading this register with new data always introduces a one clock cycle NOP delay to restore the instruction pipeline. The PC is treated as just one of the general purpose registers in the CD2470A.
BF/RC Register. The function of this register is switched between BF and RC by a special input pin (SELBF), normally controlled by a host CPU. When SELBF is held high, the BF/RC register serves as a 16-bit parallel I/O port dedicated to the host interface for emulator control. This port may also be used to establish on-the-fly communication between the PC host and the CD2470A DSP. When SELBF is held low, the RC (Repeat Counter) function is selected. A set of two eight-bit numbers is written to this register to start a repeat operation. No specific instructions for the repeat operation exist.
The CD2470A provides a loop repeat function. Loading a number into RC initiates the repeat operation. Writing a 16-bit number n(8 bit)|m(8 bit) to the RC register initiates a repeat of the instructions occupying the next n+1 clock cycles for m+1 loops. The clock cycle count must match the instruction execution cycles precisely. The value set in RC does not change during or after the repeat operation. Interrupts are inhibited during the repeat operation (but memorized).
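Because the loop body is specified in clock cycles rather than instructions, it can help to compute the RC value with a small helper. The Python sketch below packs and unpacks the n|m pair, with n in the upper byte as in the RC bit assignment; the function names are hypothetical.

```python
def encode_rc(body_cycles: int, loops: int) -> int:
    """Build the RC value for a body of body_cycles clocks run loops times."""
    if not (1 <= body_cycles <= 256 and 1 <= loops <= 256):
        raise ValueError("body_cycles and loops must each be in 1..256")
    n = body_cycles - 1               # RC stores the cycle count minus one
    m = loops - 1                     # RC stores the loop count minus one
    return (n << 8) | m


def decode_rc(rc: int) -> tuple:
    """Return (body_cycles, loops) encoded in a 16-bit RC value."""
    n = (rc >> 8) & 0xFF
    m = rc & 0xFF
    return n + 1, m + 1
```

For example, a body of three clock cycles repeated ten times corresponds to the RC value 0x0209.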
BF (PC I/F Buffer Register: Address 13 when SELBF=1)
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
RC (Repeat Counter Register: Address 13 when SELBF=0)
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| n7 | n6 | n5 | n4 | n3 | n2 | n1 | n0 | m7 | m6 | m5 | m4 | m3 | m2 | m1 | m0 |
X and Y Registers. The X and Y registers form the pair of multiplier input registers. When the Y register is written by any instruction (even INC/DEC Y), PW (PH|PL) is updated with X*Y at the next cycle. PW does not change when the X register is written.
SP Register. The SP is the only Stack Pointer in the CD2470A. The stack is implicitly placed in RAM0. It is used for interrupt and subroutine call return address storage, and for simple data storage through the POP/PUSH instructions.
SP (Stack Pointer Register)
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| P15 | P14 | P13 | P12 | P11 | P10 | P9 | P8 | P7 | P6 | P5 | P4 | P3 | P2 | P1 | P0 |
TR Register. The TR is used for barrel shifting, normalization and the MODB and VLCD instructions. This register may also be used as an additional temporary register.
The lower six bits of the TR register apply to the NORM instruction and the SHA and SHL barrel shifting instructions. The other bits of the TR register are simply readable and writable. The SHA and SHL instructions ignore all bits other than the lower six when they refer to the TR register.
TR (Temporary Register or Barrel Shifting bit counter Register [Lower 6 bit])
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| T15 | T14 | T13 | T12 | T11 | T10 | T9 | T8 | T7 | T6 | T5 | T4 | T3 | T2 | T1 | T0 |
PM Register. The PM register is used for RAM pointer modification and as an overflow counter. The lower 9 bits act as a Pointer Modifier, and the upper 7 bits as a Guard bit register (G). Both PM and G are 2's complement signed numbers. This register may also be used as an additional temporary register. The Guard bit register works only with ADD/SUB/MPY/SHA on the A (or AW) register. The MLD and MOD cla instructions clear the Guard bit register. The CMPB, MAX and MIN instructions modify the PM register implicitly.
PM (Temporary Register or Pointer Modifier Register [Lower 9 bit], Guard bit Register[15-9])
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| M15 | M14 | M13 | M12 | M11 | M10 | M9 | M8 | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
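The two signed fields packed into PM can be separated as in the following Python sketch. `split_pm` is a hypothetical helper that follows the layout above, with the Guard bits G in bits 15-9 and the pointer modifier in bits 8-0, each read as a two's complement number.

```python
def split_pm(pm: int) -> tuple:
    """Return (guard, modifier) from a 16-bit PM register value."""
    modifier = pm & 0x1FF             # lower 9 bits: pointer modifier
    if modifier & 0x100:
        modifier -= 0x200             # sign-extend: range -256..+255
    guard = (pm >> 9) & 0x7F          # upper 7 bits: guard bit register G
    if guard & 0x40:
        guard -= 0x80                 # sign-extend: range -64..+63
    return guard, modifier
```

The modifier range of -256 to +255 obtained this way matches the range quoted for the Rij+! pointer modification option.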
There are three operational memories available in the CD2470A: the data memories RAM0 and RAM1 (16 bit x 64 Kw each) and the program memory PRAM (16 bit x 64 Kw). All the memories should have a simple synchronous clocked RAM architecture to get the best performance. Common clocked RAM IP available from most IC vendors works fine with the CD2470A. The address and read/write control signals are available at the beginning of a RAM clock cycle (rising edge of the clock). Read data from the RAM must be available before the end of the RAM clock cycle. Write data is available at the end of the RAM cycle (rising edge of the next clock).
The bottom 512 words of the data memories are directly addressable with the 10-bit DRAM address field in several instructions. The full 64K address spaces are indirectly addressable through the corresponding 16-bit address pointer registers. The Ri pointer registers (R0-R3) address RAM0 and the Rj pointer registers (R4-R7) address RAM1. RAM addresses may also be used as I/O addresses (memory-mapped I/O) by decoding the specific addresses to generate the necessary WAIT cycles for slow I/O devices.
When the program memory PRAM is used for data storage, its full 64K address space can be addressed indirectly using any of the address pointer registers Rij (R0-R7).
The stack pointer register SP can be used to form an arbitrarily placed LIFO stack in RAM0. It is important to note that the stack is used not only for subroutine calls and interrupts, but also for data storage through the POP and PUSH instructions.
The eight address pointer registers R0-R7 generate indirect addresses for the data and program memories. The Ri registers, R0-R3, address RAM0 and PRAM, while the Rj registers, R4-R7, address RAM1 and PRAM. Indirect addresses are set in the pointers by short or long immediate load instructions or by data transfer instructions. After a pointer is referred to in an instruction, it may be modified with one of the following modification options.
| Symbol | Pointer Modification |
| Rij | No modification |
| Rij+ | +1 with the Looping boundary option. |
| Rij- | -1 with the Looping boundary option. |
| Rij+! | Add 9 bit signed integer number (+255~-256) in PM register. |
Address pointer modification with looping options allows no-overhead circular buffers, which are useful in digital signal processing. The loop size for each DRAM is selected in the PRL field of the Status register. The loop size may be specified in this field in one of the following three ways.
| Mode 0 | This mode places address boundaries at every 2^N addresses over the entire data memory (N = 2 to 8). A circular buffer starting at every 2^N-th address and having a size of 2^N is set. Only +1, -1 modification, or +PM modification smaller than the loop size, is allowed. |
| Mode 1 | This mode places address boundaries at every 1024 addresses over the entire data memory. A circular buffer starting at every 1024th address and having a size of 64(P+1) is set. The cyclic buffer is made only between 1024n and 1024n+64(P+1), where n={0,1…} and P={0,1…14}. Only +1, -1 modification is allowed. |
| Mode 2 | This mode places address boundaries at every 64 addresses over the entire data memory. A circular buffer starting at every 64th address and having a size of 2(Q+1) is set. The cyclic buffer is made only between 64n and 64n+2(Q+1), where n={0,1…} and Q={0,1…30}. Only +1, -1 modification is allowed. |
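The three looping modes can be summarized by the address arithmetic in the Python sketch below. The function names are hypothetical, the pointer is assumed to lie inside the circular buffer already, and the way the PRL field selects the mode and the values N, P or Q is not modeled.

```python
def step_mode0(addr: int, n: int, delta: int) -> int:
    """Mode 0: wrap within the aligned 2**n-word block that contains addr."""
    size = 1 << n
    base = addr & ~(size - 1)         # block start: low n bits cleared
    return base | ((addr + delta) & (size - 1))


def step_bounded(addr: int, block: int, size: int, delta: int) -> int:
    """Modes 1 and 2: buffer of `size` words at the start of each `block`."""
    base = addr - addr % block        # nearest block boundary at or below addr
    return base + (addr - base + delta) % size


def step_mode1(addr: int, p: int, delta: int) -> int:
    return step_bounded(addr, 1024, 64 * (p + 1), delta)


def step_mode2(addr: int, q: int, delta: int) -> int:
    return step_bounded(addr, 64, 2 * (q + 1), delta)
```

For example, in Mode 0 with N = 3 a pointer at 0x0107 stepped by +1 wraps back to 0x0100 instead of advancing to 0x0108.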
A stack is made in RAM0 (RAM1 and RAM0 in double word operation). The register SP is used as the LIFO stack pointer. It is incremented before the address is accessed when executing the POP Reg instruction. This popping of data from a downward-growing stack is done only with the POP Reg instruction. When the stack is accessed, the SP is automatically incremented by one, ignoring the looping boundary setup.
The program counter is pushed onto the stack by interrupts, reset and subroutine calls. Looping boundaries do not affect stack operation on interrupts, reset, subroutine calls or returns. These are automatically treated as no-looping operations regardless of the PRL field setting in the ST register.
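The stack discipline can be pictured with the small Python model below. The pre-increment on POP follows the text; the post-decrement on PUSH is an assumption made here so that the stack grows downward and pairs with that POP, since the text does not spell out the PUSH sequence. The class name and the dictionary used for RAM0 are illustrative only.

```python
class StackModel:
    """Single-word LIFO stack in RAM0, growing toward lower addresses."""

    def __init__(self, sp: int) -> None:
        self.ram0 = {}                # sparse stand-in for the RAM0 contents
        self.sp = sp & 0xFFFF

    def push(self, value: int) -> None:
        self.ram0[self.sp] = value & 0xFFFF
        self.sp = (self.sp - 1) & 0xFFFF   # ASSUMPTION: post-decrement on PUSH

    def pop(self) -> int:
        self.sp = (self.sp + 1) & 0xFFFF   # SP is incremented before the access
        return self.ram0[self.sp]
```

In this model SP always points at the next free word, so a matched PUSH and POP pair leaves SP unchanged.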
The CD2470A has several conditional instructions: CALL, BRA, INC, DEC, MODA and SWAP. The conditions testable in these instructions are listed in the following table.

Four ALU-related flags (N, OV, Z, CY), one RAM pointer zero detection flag (RZ) and two user input pins (USR1, USR0) can be tested by the conditional instructions for being either “1” or “0”. As the flags are effectively modified in the current instruction cycle(s), the instruction right after a flag-changing instruction can test these flags immediately. No waiting for pipeline propagation is necessary for flag testing in the CD2470A. The ALU-related flags are modified by arithmetic and logical instructions such as ADD, SUB, CMP, AND, OR, EOR, INC, DEC, MODA, some MPY and MODF. The RZ flag is updated if either Rj or Ri is updated in an instruction. Ri modification takes precedence in the RZ flag update if both Rj and Ri are updated at the same time. The USR pins are directly testable as conditions. This feature is reserved for user enhancement when user-specific instructions are added.
When a conditional instruction includes an Rij pointer update operation, it updates Rij even if the condition is not met.
There is an extended set of conditions that are referenced only by two instructions: BRA and CALL. These extended conditions are shown in the following table.

Bit 3 of the code determines a positive (Bit3=1) or negative (Bit3=0) condition.
* R0~7 is decremented regardless of the condition matching.
System functions include signals for system reset, interrupts, user input/output and clock control, along with their associated instructions.
The CD2470A comes with seven external interrupt request signal pins. INT1, INT2, ..., INT7 are positive-going edge-triggered inputs. They must remain high until the processor acknowledges the request. The time needed for this acknowledgement (the interrupt response time) is one clock cycle plus the number of clock cycles of the current instruction, that is, a minimum of two and a maximum of five instruction clock cycles. (Interrupts are suspended while a Repeat operation is taking place.)
Each interrupt pin has its own level of priority, with INT1 being the highest and INT7 the lowest. Interrupts are enabled by the IEx flag in the Status register, with a priority level determined by the IE[2:0] bits. Changes in IE[2:0] or IEx are delayed by one instruction clock cycle to allow completion of the changing instruction and to avoid unnecessary nesting of interrupt service routines. Once an interrupt request has been acknowledged, all further interrupts are disabled automatically by clearing IEx. Interrupts may be re-enabled within the interrupt service routine to accept further interrupts.
Interrupt vectors are stored in the highest seven program memory addresses as follows. Once an interrupt is acknowledged, the service routine address of the corresponding interrupt is read from its vector address, and control is transferred to the service routine.
| Interrupt | Priority | Interrupt Vector Address |
| INT1 | High | 0xFFF9 |
| INT2 | | 0xFFFA |
| INT3 | | 0xFFFB |
| INT4 | | 0xFFFC |
| INT5 | | 0xFFFD |
| INT6 | | 0xFFFE |
| INT7 | Low | 0xFFFF |
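The vector layout in the table follows a simple rule: the vector for INTn sits n words above 0xFFF8. The Python sketch below expresses that lookup; the function names and the dictionary standing in for program memory are hypothetical.

```python
VECTOR_BASE = 0xFFF8                  # INTn vector is at VECTOR_BASE + n


def vector_address(level: int) -> int:
    """Return the program memory address holding the vector for INT<level>."""
    if not 1 <= level <= 7:
        raise ValueError("interrupt level must be in 1..7")
    return VECTOR_BASE + level


def service_routine_address(pram: dict, level: int) -> int:
    """Read the service routine start address from the vector table."""
    return pram[vector_address(level)]
```

A program image that places its INT3 handler at 0x0120 would therefore store 0x0120 at address 0xFFFB.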
Asserting any of the interrupts (rising edge) or Reset takes the processor out of the Sleep mode invoked by the MODF sleep instruction. The logic for these signals stays active even in Sleep mode.
The system Reset signal RES is an asynchronous input that initiates a non-maskable interrupt to a reset service routine. It must remain asserted for a minimum of two clock cycles. When released, the CD2470A starts at the address stored at program memory address 0xFFF8.
| Interrupt | Priority | Reset Vector Address |
| RESb | Non-Maskable | 0xFFF8 |
Reset clears all existing interrupt requests and, like other interrupts, initially disables all future interrupts. The system Reset clears the IEx flag, but all other processor conditions remain unchanged. Reset also takes the processor out of the Sleep mode.
There are five control signals for the system clock on the CD2470A
core processor:
| Signal | Description |
| SLEEP | This asynchronous input signal stops the internal processor clock when asserted. Processor operation resumes when the signal goes low again. The internal processor clock may also be stopped by the Sleep mode, which is invoked by the MODF instruction and released by an interrupt or system Reset. These two means of reducing power consumption work independently: either condition stops the internal clock, and both must be released for normal operation. |
| CKEN | Ready or CKEN. A synchronous input that controls the internal processor clock to extend the processor instruction clock cycle for memory access or input/output. Must be stable before and while CKIN is high. |
| CKIN | The system clock input. |
| CKOUT | The internal clock output. It remains low when the processor is in sleep mode. May be used as an internal clock monitor. |
| (M1)* | Shows the first cycle of an instruction execution. |
* Available in some releases only.
3. CD2470A Instruction Details
Table 1 CD2470A Instruction summary
| | Data Transfer | | ADD/SUB/CMP | AND/OR/EOR | Pointer data Transfer | | Pointer Modifier |
| Single Word | LD D,S,x | LD S,D,x | Aop Ads,S | Lop Ads,S | LD Rij,RAM | LD RAM,Rij | |
| | PUSH x | POP x | | | | | |
| | LD A,RAM | LD RAM,A | Aop A,RAM | | | | |
| | LD D,(Rij),x | LD (Rij),S,x | Aop Ads,(Rij) | Lop Ads,(Rij) | | | |
| | LD D,(Rij)p | LD (Rij)p,S | | | | | |
| | LD D,Rij | LD Rij,S | Aop Ads,Rij | | LD D,Rij | LD Rij,S | |
| | MV (Rj),(Ri) | MV (Ri),(Rj) | | | | | MODR Rj,Ri |
| | LDI D,#Imm | LDI (Rij),#Imm | Aop Ads,#Imm | Lop Ads,#Imm | LDI Rij,#Imm | | |
| | LDI D,#Simm | | Aop Ad,#SImm | Lop Ads,#SImm | | | ADSI Rij,#Simm |
| | SWAP A,D | | | | | | |
| Double Word | LD Dw,Sw,x | LD Sw,Dw,x | Aop Awds,Sw | | | | |
| | PUSHw x | POPw x | | | | | |
| | LD Aw,RAMw | LD RAMw,Aw | Aop Aw,RAMw | | | | |
| | LD Dw,(Rij)w,x | LD (Rij)w,Sw,x | Aop Awds,(Rij)w | | | | |
| | SWAP Aw,Dw | | | | | | |

| | Multiply | INC/DEC | TEST | MOD Acc | Special Inst. | Machine Inst. |
| Single Word | MPY S,(Ri) | INC/DEC S | TST S | MODA | | MODF |
| | MPY (Rj),(Ri) | | | SHA | | BRA |
| | | | | SHL | | |
| | | INC/DEC (Rij) | TST (Rij) | NORM | | CALL |
| | | | | | | RET (POP PC) |
| | | | | | VLCD | |
| | | | | | MODB | |
| Double Word | | INC/DEC Sw | | MODA Awd | | |
| | | | | SHA Awd | | |
| | | | | SHL Awd | | |
| | | | | NORM Awd | | |
Table 2 CD2470A Conditional instructions and Conditions to check
| | BRA, CALL (Basic) | BRA, CALL (Extended) | INC/DEC | MOD Acc | SWPA |
| Condition | N | ALZ | N | N | N |
| | OV | AHZ | OV | OV | OV |
| | Z | AWZ | Z | Z | Z |
| | CY | EAZ | CY | CY | CY |
| | RZ | AN | RZ | RZ | RZ |
| | USR1 | EAN | USR1 | USR1 | USR1 |
| | USR0 | NS | USR0 | USR0 | USR0 |
| | ALWAYS | | ALWAYS | ALWAYS | ALWAYS |
| | | DRZ0 ~ DRZ7 | | | |
Table 3 CD2470A MOD, MODF operation
| | MOD OpA,cond | | MODF | | |
| | Mne. | description | Mne. | description | Group |
| Operation | lstr | Logical shift (TR) bit | seti7 | Set INT level 7 | Group0 |
| | astr | Arithmetic shift (TR) bit | seti6 | Set INT level 6 | |
| | rotl | Rotate left through CY | seti5 | Set INT level 5 | |
| | rotr | Rotate right through CY | seti4 | Set INT level 4 | |
| | lsl | Logical shift 1 bit left | seti3 | Set INT level 3 | |
| | lsr | Logical shift 1 bit right | seti2 | Set INT level 2 | |
| | | | seti1 | Set INT level 1 | |
| | | | nop | | |
| | cla | Clear A | ei | Enable Int | Group1 |
| | claf | Clear A and Guard bit | di | Disable Int | |
| | addc | Add CY to A | ei1 | Enable INT for only 1 cycle | |
| | ro | Roundoff | nop | | |
| | neg | Negate | setcy | Set CY flag | Group2 |
| | cmpl | 1's complement | rescy | Reset CY flag | |
| | sat | Saturate if OVF=1 | setop | Set OP flag | |
| | sata | Sat. regardless of OVF | resop | Reset OP flag | |
| | | | sleep | Get into sleep mode | |
| | | | gen1 | Generate INT1 | |
| | | | gen7 | Generate INT7 | |
| | | | nop | | |
Status Register
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| Flag | Description |
| N | Negative Flag. Indicates a negative ALU result for arithmetic operations. If the ALU operation overflows, the N flag does not show the correct sign. |
| OV | OVerflow Flag. Indicates the ALU result has an arithmetic overflow. This flag represents overflows only for single or double word operation. No guard bit register overflow is detected. |
| Z | Zero Flag. Indicates the ALU result is zero. Checks either the single or the double word full ALU output bits, depending on the double word options. |
| CY | Carry/Borrow Flag. Indicates a carry/borrow from the MSB of the ALU. For the SHA and SHL instructions, CY holds the bit shifted out of the MSB or LSB. |
| RZ | Pointer Register Zero Flag. Indicates whether an updated address pointer register Rj or Ri turned out to be zero. If the no-modification option was taken, RZ is not updated. |
| OP | Automatic Overflow Protection. Any ADD/SUB instruction except INC/DEC non AH instructions makes the AH value overflow protected automatically. |
| IEx | Interrupt Enable. Enables external interrupts. If this flag is off, no interrupts are acknowledged. This flag goes off when any interrupt is acknowledged and goes on when MODF ei (or MODF ei1) is executed or IE2, IE1, IE0 are updated. The change takes effect one cycle after the IEx modification takes place. |
| IE 2,1,0 | Interrupt Priority. Sets the acceptable external interrupt level. |
| PRL | Pointer Register Loop. Determines the address pointer looping mode and the size of the loop, where P:(pppp)={1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,0*} Q:(qqqqq)={1,2,3,4,5,6,7,8,9,0,A,B,C,D,E,F,10,11,12,13,14,15,16,17,18,19,1A,1B,1C,1D,1E,1F,0**} *: counted as 'h10 **: counted as 'h20 |
| Instruction | W | C |
| ADSI Rij,#sSImm | 1 | 1 |
| Aop A,RAM | 1 | 1 |
| Aop Ad,As,(Rij) | 1 | 1 |
| Aop Ad,As,Rij | 1 | 1 |
| Aop Ad,As,S | 1 | 1 |
| AopI Ad,As,#Imm,sh | 2 | 2 |
| AopSI As,#Simm | 1 | 1 |
| BRA addr,cond | 2 | 2 |
| CALL addr,cond | 2 | 2 |
| INC (Rij) / DEC (Rij),cond | 1 | 2 |
| INC S / DEC S,cond | 1 | 2 |
| LD A,RAM / LD RAM,A | 1 | 1 |
| LD D,(Rij),x / LD (Rij),S,x | 1 | 1 |
| LD D,(Rij)p / LD (Rij)p,S | 1 | 3/4 |
| LD D,Rij / LD Rij,S | 1 | 1 |
| LD D,S,x | 1 | 1 |
| LD Rij,RAM / LD RAM,Rij | 1 | 1 |
| LDI D,#Imm,sh | 2 | 2 |
| LDI Rij,#Imm,sh | 2 | 2 |
| LDI (Rij),#Imm,sh | 2 | 2 |
| LDSI D,#SImm | 1 | 1 |
| Lop A,RAM | 1 | 1 |
| Lop Ad,As,(Rij) | 1 | 1 |
| Lop Ad,As,S | 1 | 1 |
| LopI Ad,As,#Imm,sh | 2 | 2 |
| LopSI As,#Simm | 1 | 1 |
| MODA As,OpA,cond | 1 | 1 |
| MODB Rij,n | 1 | 1 |
| MODF Gr2,Gr1,Gr0 | 1 | 1 |
| MODR Rj,Ri,lp | 1 | 1 |
| MPY (Rj),(Ri),m | 1 | 1 |
| MPYx S,(Ri),m | 1 | 1 |
| MV (Rj),(Ri) / MV (Ri),(Rj) | 1 | 1 |
| NORM As,g,exe | 1 | 1 |
| PUSH S,x / POP D,x | 1 | 1/2 |
| SHA As,shift,g | 1 | 1 |
| SHL As,S,mode | 1 | 1 |
| SHL As,shift | 1 | 1 |
| SWPA A,S,cond | 1 | 1 |
| TST (Rij),Bit | 1 | 1 |
| TST S,Bit | 1 | 1 |
| VLCD (Ri),rs | 1 | 1 |
Signal processing usually
needs a long data buffer memory where real-time data is stored and updated
at every sampling interval. Suppose an analog input signal sampled every
22.7 µs is passed through a 51-tap FIR filter. You need to keep the 51 most
recent samples in memory to compute one output value. Once this computation
is done, you add the newest sample to the input memory and discard the oldest
one so that the next output can be computed. This memory handling takes place
every sampling period.
| Address | RAMi (t=k) | → | RAMi (t=k+1) |
| st+50 | in(k-4) | | in(k-3) |
| st+49 | in(k-3) | | in(k-2) |
| ... | in(k-2) | | in(k-1) |
| st+1 | in(k-1) | | in(k) |
| st | in(k) | | in(k+1) |
Fig. 4.1.1 A sequence of data should be shifted.
You can do this by shifting
all 51 data words in the memory up by one address and putting the new incoming
data at the starting address (st) of the memory space. This procedure consumes
a fairly large amount of computing resource. Without a special instruction for
it, it may take about 102 clock cycles, about twice the time of the filter
computation itself. (See Fig. 4.1.1.)
| Address | RAMi (t=k) | → | Address | RAMi (t=k+1) |
| st+50 | in(k-4) | | | in(k-4) |
| st+49 | in(k-3) | | st+50 | in(k-3) |
| ... | in(k-2) | | st+49 | in(k-2) |
| st+1 | in(k-1) | | ... | in(k-1) |
| st | in(k) | | st+1 | in(k) |
| | | | st | in(k+1) |
Fig. 4.1.2 Starting address of the memory may be moved instead of data shifting.
There is a smarter way
to avoid this heavy work. Just decrement the "st"
address by 1 instead of actually moving all the data, and put the newly sampled
input data at the new "st" address (Fig. 4.1.2). This scheme needs a little
care to limit memory usage, since blindly
decrementing the "st" address at every new sample would use an
unlimited amount of memory. You set lower and upper boundaries for the memory
area and check whether the target address lies between the two boundaries.
If the target address falls outside the range, the address wraps around
from the lower boundary to the upper boundary, as if the two boundaries were
joined like a paper ring (Fig.
4.1.3). Because the memory area is reused repeatedly in this scheme, we call it
a "cyclic buffer" hereafter.
Note that the cyclic
buffer does not have to be exactly the same size as the data memory
your application needs. As long as the physical buffer is at least as large as
the required data size, it works fine. For example, if your input buffer needs
51 words, the cyclic buffer size (Upper Boundary - Lower
Boundary) can be 51, 52, 53, ...
| Address | RAMi (t=k) | → | Address | RAMi (t=k+1) | Boundary |
| | | | st | in(k+1) | Upper Boundary (UB) |
| | | | | | |
| st+50 | in(k-4) | | | in(k-4) | |
| st+49 | in(k-3) | | st+50 | in(k-3) | |
| ... | in(k-2) | | st+49 | in(k-2) | |
| st+1 | in(k-1) | | ... | in(k-1) | |
| st | in(k) | | st+1 | in(k) | Lower Boundary (LB) |
Fig. 4.1.3 Set Lower/Upper boundaries to make the memory address cyclic.
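The cyclic buffer idea can be sketched in a few lines of Python. This is a software model for illustration only, not CD2470A code: the class name, the method names and the use of a modulo operation for the wrap-around are assumptions made for this sketch.

```python
class CyclicBuffer:
    """Fixed-size ring of samples; `st` is the offset of the newest sample."""

    def __init__(self, size):
        self.data = [0] * size   # memory between the lower and upper boundary
        self.st = 0              # offset of the newest sample from the lower boundary

    def push(self, sample):
        # Move the start address down by one instead of shifting the data,
        # wrapping from the lower boundary back to the upper boundary.
        self.st = (self.st - 1) % len(self.data)
        self.data[self.st] = sample

    def tap(self, i):
        """Return in(k-i): the sample taken i periods before the newest one."""
        return self.data[(self.st + i) % len(self.data)]


def fir_output(buf, coeffs):
    """Compute one FIR output from the newest len(coeffs) samples."""
    return sum(c * buf.tap(i) for i, c in enumerate(coeffs))
```

Each `push` costs one store regardless of the filter length, which is the saving the text describes compared with moving every stored sample.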
The CD2470A has the
following three different modes to set up the boundaries on the DRAM.
In MODE 0, the boundaries are placed
at address multiples of 2^N, where N={2,3,4,5,6,7,8}. The RAM pointers observe
the boundaries only when they are
incremented or decremented by less than 2^N. If a value larger than 2^N
is added to the RAM pointer, the digits above N are simply added and the
digits at or below N are cyclic.
Example 1.
Set MODE0 with N=3 and
modify the RAM pointer with the following program. The pointer value at the end
of every instruction is shown on the right.
| LDLI ST,0x000002 | Set MODE0 N=3 |
| LDSI PM,0x6 | |
| LDI R2,0x12AE | R2=0012AE |
| LD (R2+),AH | R2=0012AF |
| LD (R2+),AH | R2=0012A8 |
| LD (R2+),AH | R2=0012A9 |
| LD (R2-),AH | R2=0012A8 |
| LD (R2-),AH | R2=0012AF |
| LD (R2-),AH | R2=0012AE |
| LD (R2-),AH | R2=0012AD |
| ADSI R2,0x95 | R2=001332 |
| MOD R4,R2+! | R2=001330 |
| MOD R4,R2+! | R2=001336 |
| LD R2,ST | R2=000002 |
| MOD R4,R2+! | R2=000000 |
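The single-step rows of Example 1 (the LD (R2+) and LD (R2-) lines) can be reproduced with a small Python model. It is a sketch under one assumption: for steps smaller than 2^N only the lower N bits of the pointer change, and they wrap within the block. The function name is hypothetical, and the larger modifications (the ADSI and MOD rows) are not modeled here.

```python
def mode0_step(pointer, delta, n_bits):
    """Add delta to the low n_bits of pointer, wrapping inside the 2**n_bits block."""
    mask = (1 << n_bits) - 1
    upper = pointer & ~mask            # bits above the cyclic field stay unchanged
    lower = (pointer + delta) & mask   # low bits wrap around the block
    return upper | lower
```

With N=3, stepping up from 0x12AF returns to 0x12A8, the lower boundary of the 8-word block, as the table shows.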
In MODE 1, a cyclic buffer of up to 960 words, in steps of 64 words,
is defined within every 1024-word memory block.
The lower boundaries in this mode are implicitly placed at
address 1024n. The upper boundaries are placed at address
1024n+64(P+1), where n={0,1,2,3...} and P={0,1,2...14}. The RAM pointers
observe the boundaries only when they are incremented or decremented by one ("1").
The current RAM pointer modifier logic checks only whether the lower 10 bits of
the pointer are either "0" or 64(P+1)-1. If the current RAM pointer
lies outside the valid boundaries, the operation is unpredictable.
Example 2.
Set MODE1 with P=2 and
modify the RAM pointer with the following program. The pointer value at the end
of every instruction is shown on the right.
| LDLI ST,0x000012 | Set Mode1 P=2 |
| LDSI PM,0x6 | |
| LDI R2,0x12BE | R2=0012BE |
| LD (R2+),AH | R2=0012BF |
| LD (R2+),AH | R2=001280 |
| LD (R2+),AH | R2=001281 |
| LD (R2-),AH | R2=001280 |
| LD (R2-),AH | R2=0012BF |
| LD (R2-),AH | R2=0012BE |
| LD (R2-),AH | R2=0012BD |
| ADSI R2,0x95 | R2=001352 |
| MOD R4,R2+! | R2=001338 |
| MOD R4,R2+! | R2=00133E |
| LD R2,ST | R2=000002 |
| MOD R4,R2+! | R2=000006 |
In MODE 2, a cyclic buffer of up to 62 words, in steps of 2 words,
is defined within every 64-word memory block.
The lower boundaries in this mode are implicitly placed at
address 64n. The upper boundaries are placed at address
64n+2(Q+1), where n={0,1,2,3...} and Q={0,1,2...30}. The RAM pointers
observe the boundaries only when they are incremented or decremented by one ("1").
The current RAM pointer modifier logic checks only whether the lower 6 bits of
the pointer are either "0" or 2(Q+1)-1. If the current RAM pointer
lies outside the valid boundaries, the operation is unpredictable.
Example 3.
Set MODE2 with Q=5 and
modify the RAM pointer with the following program. The pointer value at the end
of every instruction is shown on the right.
| LDLI ST,0x000012 | Set Mode1 P=2 |
| LDSI PM,0x6 | |
| LDI R2,0x1208 | R2=001208 |
| LD (R2+),AH | R2=001209 |
| LD (R2+),AH | R2=001200 |
| LD (R2+),AH | R2=001201 |
| LD (R2-),AH | R2=001200 |
| LD (R2-),AH | R2=001209 |
| LD (R2-),AH | R2=001208 |
| LD (R2-),AH | R2=001207 |
| ADSI R2,0x95 | R2=0013A2 |
| MOD R4,R2+! | R2=0013A8 |
| MOD R4,R2+! | R2=0013AE |
| LD R2,ST | R2=000002 |
| MOD R4,R2+! | R2=000008 |
MODE 1 and MODE 2 currently limit the
increment/decrement value to the loop size or less. For example, if you define
a loop size of 48 in MODE 2, you can choose any increment/decrement value
from 1 to 48.
Fast DO loops are needed
frequently in algorithmic processing. A DO loop may be built
entirely in software when the loop overhead is small compared
with the processing time of each pass. This software approach is the usual way
to implement a DO loop in applications without tight timing constraints.
Let's see how much computing time a DO loop setup needs when you use
common CPU resources. Suppose you need to execute a small routine 20
times. Then your assembly code may look like this:
LDI R6,19
LOOP1:
....
....
(Process)
....
....
MOD R6-,R0
BRA @LOOP1,RZ1
This means you spend
3 clock cycles on every pass just to form the DO loop. These 3 cycles
may or may not be a problem, depending on how fast the execution
must be. One of the worst cases occurs when the target process
(Process) is a single one-cycle instruction such as MPYA (Rj+),(Ri-). Then you
lose 75% of each loop's execution time, whereas you lose only 10%
if the target process (Process) takes 27 cycles. The good thing is that
the software approach is quite flexible in its resource usage. You can use any RAM
pointer as a DO loop counter. If you need to keep the RAM pointers for another
purpose, you can use another general purpose register as the loop counter, though
it takes one more cycle to execute.
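The two percentages quoted above follow from a single ratio: the 3 overhead cycles divided by the total cycles per pass. The Python helper below only reproduces that arithmetic; its name and the default value of 3 are taken from this discussion, not from any CD2470A tool.

```python
def loop_overhead_fraction(process_cycles, overhead_cycles=3):
    """Fraction of each loop pass spent on loop control rather than on (Process)."""
    return overhead_cycles / (process_cycles + overhead_cycles)
```

A one-cycle process gives 3/4, and a 27-cycle process gives 3/30.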
Those concerned
about the 3-cycle overhead of each DO loop pass
need another option, the "repeat" instruction. This option
uses a special cycle counter that counts the execution time of the target
process (Process), as well as a loop counter that counts the loop passes. The
CD2470A provides the RC register for this purpose. When two 8-bit numbers
concatenated as m|n are set in the RC register, the instructions following the
RC modification cycle are executed in m+1 cycles, n+1 times. There is no
per-pass overhead in this
operation. However, you need to know the exact cycle count of the target
process (Process), which is not necessarily the same as the instruction count, and
any interrupt asserted during the loop operation will not be serviced until the
loop operation finishes. This "repeat" instruction gives you a
truly zero-overhead loop.
LDI RC,m|n
....
....
(Process)
....
....
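As a rough illustration of the m|n encoding, the Python sketch below packs the two 8-bit fields and compares cycle counts with the software loop. Placing m in the upper byte is an assumption based only on the notation m|n, and all function names are hypothetical.

```python
def pack_rc(m, n):
    """Concatenate two 8-bit fields as m|n (m in the upper byte is an assumption)."""
    if not (0 <= m <= 0xFF and 0 <= n <= 0xFF):
        raise ValueError("m and n must each fit in 8 bits")
    return (m << 8) | n


def repeat_total_cycles(m, n):
    """The body runs in m+1 cycles and is executed n+1 times, with no loop overhead."""
    return (m + 1) * (n + 1)


def software_loop_total_cycles(process_cycles, passes, overhead_cycles=3):
    """Software DO loop: every pass pays the 3-cycle loop control overhead."""
    return passes * (process_cycles + overhead_cycles)
```

For a one-cycle body executed 20 times, the repeat form takes 20 cycles where the software loop takes 80.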
Although the
"repeat" operation seems the most attractive option
because of its minimal overhead, it comes with another serious problem: long
interrupt response time. Considering these pros and cons, we strongly
recommend using a "macro" wherever enough program memory
space exists. This option takes up more program memory. If the
target process (Process) consists of m instruction words and must be
executed n times, the macro needs m·n
words of program memory, whereas the "repeat" instruction
takes only m words. The advantage of the macro is that it needs
no special hardware and no special interrupt consideration. With a macro, you can
stop the operation at any moment and resume it when needed.
Only the innermost loop of a
nested DO loop may be assigned to a repeat instruction. Outer DO loops
may be written entirely in software on the CD2470A. Suppose you have a 20-cycle
Process in a core part that should be repeated 10 times, after which parameters
are changed and the same Process is repeated, p times in all. You would have about 2%
(4/203) computation time loss if you use software for the outer loop. The program
memory usage will be about 24+ words. If the CD2470A had two sets of repeat
hardware and allowed nested "repeat", the time loss would be less than 1%
(1/203), with 21+ words of program memory. As long as the total cycle count
of the first-level inner loop is large compared with the software DO loop
overhead (3 words/3 cycles), nested DO loop hardware brings little benefit. What
if you need a triple-nested DO loop in which even the total cycle count of the
two inner loops is not large compared with the software DO loop overhead (3 words/3
cycles)? You can write the
innermost DO loop as a macro.
Any digital computation carries a precision limitation
due to finite hardware size. Suppose you have a sixteen (16) bit
register and an associated arithmetic unit to compute some algorithm; you may
get an error of up to one LSB in some operations if you simply ignore the part
below the LSB. Consider a simple multiplication of two 16-bit numbers. Assuming
the binary point of each number lies immediately to the right of the sign bit, you get
a 31-bit result (sign bit + 30 bits). If you need only the most significant 16
bits as the result, how can you get the best accuracy in those 16 bits? Strictly speaking,
since you do not know whether the original numbers were exact 16-bit values or
were truncated from longer values, you cannot say whether the 17th bit of the
31-bit result is meaningful. Here, however, let's assume the lower 15 bits of the
multiplication result are meaningful. If you simply truncate the lower
15 bits, the resulting 16-bit number may have an error of up to 1 LSB (0 LSB ~ 1 LSB),
which may or may not degrade your algorithm. So you may consider rounding
off. If the 17th bit is "1", you add "1" to the LSB of the
16-bit result. If the 17th bit is "0", you take the original 16
bits as the final result. By doing so, the maximum error becomes 0.5
LSB (-0.5 LSB ~ +0.5 LSB). This procedure reduces the error power to one quarter.
This is the round-off.
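The "one quarter" figure can be checked numerically. The Python sketch below enumerates every possible pattern of the 15 discarded bits and compares the mean squared error of truncation and of rounding, in units of one LSB of the 16-bit result. Treating all discarded patterns as equally likely is an assumption of this sketch, and the names are illustrative.

```python
FRACTION_BITS = 15            # bits dropped from the 31-bit product
STEPS = 1 << FRACTION_BITS    # number of distinct discarded bit patterns


def truncation_error(low_bits):
    """Error in LSBs of the 16-bit result when the low 15 bits are simply dropped."""
    return low_bits / STEPS


def rounding_error(low_bits):
    """Error in LSBs when 1 is added to the result if the 17th bit is set."""
    carry = low_bits >> (FRACTION_BITS - 1)   # the 17th bit of the product
    return low_bits / STEPS - carry


def mean_square(errors):
    errors = list(errors)
    return sum(e * e for e in errors) / len(errors)


trunc_power = mean_square(truncation_error(v) for v in range(STEPS))
round_power = mean_square(rounding_error(v) for v in range(STEPS))
ratio = round_power / trunc_power   # close to one quarter
```

Under the same assumption, the continuous limit is 1/3 for truncation and 1/12 for rounding.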
The CD2470A DSP has two 16-bit accumulator
registers combined to form a 32-bit long word accumulator. Computations done with
double precision (32-bit) instructions use this long word accumulator at
full length. You may then need to convert the word length to 16 bits for
another operation with the least error. The CD2470A has the instruction
MODA ro
in which the MSB of the lower half of the long word
accumulator is added to the LSB of the upper half of the long word accumulator.
Without this instruction, the round-off operation may take 3~4 times longer.
The CD2470A also provides a
"conditional" option for this instruction, which lets
you round off the accumulator selectively. For example, you can keep results
symmetric for positive and negative values in computations such as divide-by-2 by right
shift, without extra absolute value and sign change operations.
Ex.
...
SHA AW,1      // Right shift AW long register by one bit: A <-- 0.5*A
MODA ro,n     // Round off AH only when AH is negative
...
This two-cycle operation gives a symmetrical result for
positive and negative values.
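The effect of the conditional round-off can be modeled in Python. The sketch assumes that the value starts in AH with AL cleared, that SHA AW,1 is an arithmetic right shift of the 32-bit accumulator, and that the condition n means "the result is negative". The helper names are hypothetical.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def halve_symmetric(ah):
    """Model SHA AW,1 followed by MODA ro,n for a 16-bit value held in AH (AL = 0)."""
    aw = to_signed32(ah << 16) >> 1   # arithmetic right shift of the long word
    ah_out = aw >> 16                 # upper half after the shift (rounded down)
    al_msb = (aw >> 15) & 1           # MSB of the lower half
    if ah_out < 0:                    # condition n: round off only when negative
        ah_out += al_msb
    return ah_out
```

In this model 3 becomes 1 and -3 becomes -1, whereas a plain arithmetic shift would turn -3 into -2.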
The CD2470A carries only a 16
x 16 bit signed/unsigned multiplier, whereas a full set of double word
instructions other than multiplication is available. While the CD2480A and CD2490A DSPs are recommended
for applications with extensive long word multiplication, even the CD2470A
can do a double word multiplication in about
11 cycles by program.
#1 If two double word data are on RAM0, RAM1 in
successive addresses, double word multiplication takes the following 10
cycles, where u(L1) represents the unsigned
lower half of input data 1, s(H0) represents the signed upper half of
input data 0, and so on. The results are stored in A0 and A2 in the form of sign bit +
62 bits + "0".
MLD  (L1),(L0),uu   // Set the two lower halves on the multiplier
LD   AL2,PL         // Store the lower half of the result
SHA  PW,16          // Shift (L1)*(L0) right by 16 bits
MPYA (L1),(H0),us   // Move the shifted result to AW and set next input data
MPYA (H1),(L0),su   // Add (L1)*(H0) to AW and set next input data
MPYA (H1),(H0),ss   // Add (H1)*(L0) to AW and set next input data
LD   AH2,AL         // Store the lower half of the accumulated result
SHA  AW,15          // Shift the upper half of the result to the lower half
ADD  AW,PW          // Add (H1)*(H0) to AW
SHA  AW2,-1         // Compensate the lower half (A2) bit position
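The decomposition behind this routine, four 16 x 16 partial products combined with shifts, can be checked with a short Python sketch. It works on plain integers and returns the ordinary 64-bit product, so it does not model the fractional "sign bit + 62 bits + 0" layout, the accumulator registers or the cycle count. The function names are illustrative.

```python
def split(x):
    """Split a signed 32-bit value into a signed upper and an unsigned lower 16-bit half."""
    return x >> 16, x & 0xFFFF      # Python's >> keeps the sign of the upper half


def mul32_by_parts(x1, x0):
    """Multiply two signed 32-bit values using only 16 x 16 bit partial products."""
    h1, l1 = split(x1)
    h0, l0 = split(x0)
    low = l1 * l0                   # unsigned x unsigned (uu)
    cross = l1 * h0 + h1 * l0       # unsigned x signed and signed x unsigned (us, su)
    high = h1 * h0                  # signed x signed (ss)
    return (high << 32) + (cross << 16) + low
```

The `uu`, `us`, `su` and `ss` comments correspond to the signedness options used on the multiply instructions in the listing.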
If the input data are stored in
the layout used by the other CD2470A double word instructions, the word locations of the
data need to be rearranged before starting the routine above. This takes another
two (2) cycles, as follows.
LD   AW,(H1)|(L0),swap   // Exchange the data positions of (H1) and (L0)
LD   (H1)|(L0),AW
This may also be replaced
with two simple MV instructions.
The original input data reside
in RAM1 and RAM0. The upper halves of the double words are in RAM1 and the lower
halves are in RAM0. Every double word instruction in the CD2470A stores
double word data this way, so that the DRAM can be accessed in one cycle
through the two single word data RAMs.
EX1. 12 cycles, 12 words
LD   AW,(H1)|(L0),swap   // Exchange the data positions of (H1) and (L0)
LD   (H1)|(L0),AW
MLD  (L1),(L0),uu        // Set the two lower halves on the multiplier
LD   AL2,PL              // Store the lower half of the result
SHA  PW,16               // Shift (L1)*(L0) right by 16 bits
MPYA (L1),(H0),us        // Move the shifted result to AW and set next input data
MPYA (H1),(L0),su        // Add (L1)*(H0) to AW and set next input data
MPYA (H1),(H0),ss        // Add (H1)*(L0) to AW and set next input data
LD   AH2,AL              // Store the lower half of the accumulated result
SHA  AW,15               // Shift the upper half of the result to the lower half
ADD  AW,PW               // Add (H1)*(H0) to AW
SHA  AW2,-1              // Compensate the lower half (A2) bit position
EX2. 12 cycles, 12 words
LD PL,(L0)
MLD PL,(L1),uu
LD AL2,PL
SHA PW,16
MPYA (H0),(L1),su
MPYA (H1),(L0),su
LD (Temp),(H0)
MPYA (H1),(Temp),ss
LD AH2,AL
SHA AW,16,G
ADD AW,PW
SHA AW2,-1
The CD2470A can handle double
word DRAM access, and with it seamless bit stream RAM access is available
through several special instructions. Assuming bit stream data is stored in
the sequence shown in Fig. 4.5.1, the word pointer Wp and the bit position
pointer Bp point to the current bit position in the whole bit stream stored
in RAM1 and RAM0. There are two bit counting directions: one counts
bits from MSB to LSB (m=0), the other from LSB to MSB (m=1). There are also two
cases, depending on whether the target bit granule (n bits) lies across a word
boundary. Here we assume the target bit granule is no larger than the word size.
Hereafter we use the Rij pointers as the word pointers Wp and the TR[4:0]
register as Bp.
Fig. 4.5.1 Bit stream data storage and construction
Fig. 4.5.2 Bit stream pointers
The
RAM bit address pointer register is composed of one of the RAM pointer pairs
[R4,R0], [R5,R1], [R6,R2], [R7,R3] and the five LSBs of the TR register, as shown in
Fig. 4.5.2.
One
pair of data words {(Ri+4),(Ri)} is accessed at a
time, reading or writing a 32-bit long word, where the bit address within the long
word is given by TR[4:0]. Since the contents of (Ri+4) and (Ri) are modified
independently, an arbitrary number (16 or fewer) of bits can be read or
written at any bit-word address, as shown in Fig. 4.5.1.
A
bit string of length "n" bits is conveniently read from a buffer memory
formed in the DRAM by using the LD and SHL instructions. It takes 3 or 4 cycles
to read up to 16 bits of bit stream data from the memory composed in RAM1 and
RAM0. A double word read, with or without word swapping, is done with LD in
double word mode, and then a couple of SHL instructions cut out the necessary
part of the words as the final bit stream data in one of the accumulators. The
bit address pointers are updated with the MODB instruction for successive reads.
Fig. 4.5.3 Bit stream reading
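The read sequence amounts to cutting a field out of a 32-bit window with two shifts. The Python sketch below shows the idea for the MSB-first direction (m=0). Counting `bit_pos` from the MSB of the long word is an assumption of this sketch, and the function is not a cycle-accurate model of the LD/SHL sequence.

```python
def read_bits(long_word, bit_pos, n):
    """Extract n bits (n <= 16) from a 32-bit word, counting bit_pos from the MSB."""
    if not (0 < n <= 16 and 0 <= bit_pos and bit_pos + n <= 32):
        raise ValueError("field must lie inside the 32-bit window")
    shifted = (long_word << bit_pos) & 0xFFFFFFFF   # drop the bits before the field
    return shifted >> (32 - n)                       # drop the bits after the field
```

Because the window is 32 bits wide, a field of up to 16 bits can straddle the boundary between the two 16-bit words and still be read in one pass.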
Writing
a bit string of length "n" at the top of an existing bit stream in
DRAM is conveniently programmed with the SHL and LD instructions. It takes 5 or
6 cycles, depending on the bit streaming direction, to write up to 16 bits of
stream data to a buffer memory area composed in RAM1 and RAM0. A double word read,
with or without word swapping, is done with LD in double word mode, and
then a couple of SHL instructions cut out the necessary part of the words
so that the current bit stream data is patched with the new bits. Finally the patched
data is written back into the DRAM.
Fig. 4.5.4 Bit stream writing
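The read, patch and write-back sequence can be sketched in Python as a mask-and-merge on a 32-bit window. As before, counting `bit_pos` from the MSB (the m=0 direction) is an assumption, and the function name is hypothetical.

```python
def write_bits(long_word, bit_pos, n, value):
    """Return long_word with n bits at bit_pos (counted from the MSB) replaced by value."""
    if not (0 < n <= 16 and 0 <= bit_pos and bit_pos + n <= 32):
        raise ValueError("field must lie inside the 32-bit window")
    shift = 32 - bit_pos - n                 # distance of the field from the LSB
    mask = ((1 << n) - 1) << shift           # ones over the field to be patched
    kept = long_word & ~mask & 0xFFFFFFFF    # existing stream bits outside the field
    return kept | ((value << shift) & mask)  # merge the new bits into the window
```

Bits outside the patched field are preserved, which is why the existing word pair has to be read before it is written back.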
Variable bit Length Coding
(VLC) is widely used in data compression algorithms. The CD2470A DSP has a special
instruction for building a fast decoder for such codes. The VLCD
instruction (short for Variable bit Length Code Decoding) makes decoding
simple yet fast by using a special reference table format. The code table for a VLC must be a
standard tree structure in which two numbers are stored in a paired location of
RAM1 and RAM0 words that represent the two address distances on the table to the next
tree nodes. Each node has two branches, for an input code bit of "0"
or "1". The VLCD instruction handles this one-bit decoding in one
cycle.
Let's look at an example of a
Huffman decoder program that uses the VLCD instruction.
Suppose a 4-bit original binary code was encoded with the
VLC technique based on the code table Tab.1. This table means the original
code "0000" is encoded as "1", "0001" is encoded
as "010", "0010" is encoded as "000110", and so
on.
The code table in Tab.1
must be stored in the DRAM in the form of Tab.2 so that the VLCD
instruction can decode the encoded bit stream properly. (The
data in Tab.2 are in hexadecimal.)
Each entry of the DRAM table
contains two 16-bit values. For example, the entry at address Table_top + 8 holds
H = Tt+09 and L = Tt+16.
These values are the possible
next table addresses. In this case, the next address to reference is either
Table_top + 9 or Table_top + 16. The values (H, L) give the
absolute next table addresses to reference.
Tab.2 represents a binary
tree for Tab.1. When the first bit of an encoded bit stream comes in,
you are at the Table_top of Tab.2. If the first bit is "1",
Tab.1 shows that the original code is "0000", because
"0000" is the only code whose encoded form begins with "1".
In Tab.2,
you read the data and get H=Tt+01 and L=Tt+02. As the first bit of the encoded
bit stream was "1", you choose H as the new table address. (If the
incoming encoded bit were "0", you would choose L as the new address.)
The table data read at Tt+1 is then H=0, L=0. Seeing L =
"0", you know that decoding is complete, and the corresponding
original code is found in H, that is, "0".
Assuming the incoming bit
stream is 010..., you can trace its decoding through Tab.2. You go
through Table top, Tt+2, Tt+3, and then reach Tt+5, which gives the original code
"1".
The CD2470A DSP can process
this tree search in one cycle per node branch, as follows.
LD TR,encoded_bit_stream
LDI R0,@Tt //Table Top
LDI R4,@Tt //Table Top
LDSI RC,15
VLCD (R0+),L
LD AH0,(R4)
// The "repeat loop" breaks once a decoding is completed.
// Zero flag is on if decoding is completed.
// A0 will have a decoded code.
// "Repeat" continues one more cycle after a "Break" is detected.
// Once a decoding is completed, the RAM pointer content stays at the same value.
// Consumed original code bit count should be placed in part of decoded code data.
This example program decodes
one symbol in 14 cycles in the worst case for the Tab.1 code table.
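The table walk can be illustrated with a small Python decoder. The table below is a hypothetical four-symbol code, not Tab.2: it was chosen so that the bit string "010" visits entries 0, 2, 3 and 5, like the trace described above. Each entry is a pair (H, L) of table offsets, bit "1" selects H, bit "0" selects L, and an entry with L = 0 is a leaf whose H field holds the decoded value.

```python
# Hypothetical code table for illustration: "1"->0, "010"->1, "011"->2, "00"->3.
TABLE = [
    (1, 2),  # 0: root; bit 1 goes to entry 1, bit 0 goes to entry 2
    (0, 0),  # 1: leaf, decoded value 0
    (3, 4),  # 2
    (6, 5),  # 3
    (3, 0),  # 4: leaf, decoded value 3
    (1, 0),  # 5: leaf, decoded value 1
    (2, 0),  # 6: leaf, decoded value 2
]


def decode_stream(bit_string, table=TABLE):
    """Decode a string of '0'/'1' characters; one table step per input bit."""
    decoded, node = [], 0
    for ch in bit_string:
        h, l = table[node]
        node = h if ch == "1" else l   # one branch per bit, like one VLCD cycle
        h, l = table[node]
        if l == 0:                     # leaf reached: H holds the decoded value
            decoded.append(h)
            node = 0                   # restart at the table top for the next symbol
    return decoded
```

Using L = 0 as the leaf marker works because no branch ever points back to the table top.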
Finding a target bit pattern
in a big table is a common time-consuming routine in signal
processing programs. The CD2470A has a quick table look-up method in which one
memory word is compared per cycle. The CMPB As,(Rij+)
instruction compares the RAM data with the As register; if they match, it clears
the loop counter (terminating the Repeat operation)
and copies the Rij contents to the PM register. Used with the repeat
function, this instruction leaves the matched memory address in the PM register,
as follows.
Example:
You have a table of 62 words
in RAM0. The value "0AA6" is in the AH0
register, and you want to find the table address that holds the value "0AA6".
Ri
| st+61 | 74F2 |
| st+60 | 98DD |
| ... | … |
| st+23 | 0AA6 |
| ... | … |
| st+1 | 2C05 |
| st | 01A3 |
Fig. 4.7.1
LDSI PM,0
LDI R0,@st
LDSI RC,61
CMPB AH0,(R0+)
LD AH0,PM
BRA @no_match, AHZ
This program takes 7 to 68
cycles to find matching data. The bit pattern to be searched for is given in AH0,
and the address of the matched data is returned in AH0.
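The search can be pictured with the Python sketch below. It assumes that the value left in PM is the address of the matching word and that PM keeps its initial 0 when nothing matches. Whether the hardware copies the pointer before or after the post-increment is not stated here, so treat the exact address as an assumption.

```python
def find_pattern(table, pattern, base):
    """Return the address of the first word equal to pattern, or 0 if none matches."""
    for offset, word in enumerate(table):   # one comparison per table word
        if word == pattern:
            return base + offset            # plays the role of PM after a match
    return 0                                # PM keeps its initial value of 0
```

A nonzero `base` is needed for the 0 result to be an unambiguous "no match" marker, which mirrors the BRA @no_match,AHZ test in the listing.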
An alternative program that
does not use the CMPB instruction takes about 5 times as many cycles. See the
example code below.
LDI R0,@st
LDI R7,62
LP1:
CMP AH0,(R0+)
BRA @LP2,Z
BRA @LP1,NDRZ7
BRA @no_match
LP2:
MODR R4,R0-
LD AH0,R0
Finding the maximum or minimum
value in a big table is a common time-consuming routine in signal
processing programs. The CD2470A has
specific instructions for this purpose. The MAX As,(Rij+) and MIN
As,(Rij+) instructions compare the RAM data with the As register and load the larger
or smaller value into As; if As is updated, the Rij contents are copied to the
PM register. These instructions find the maximum or minimum value in the
table, with its memory address in the PM register.
Example:
You have a table of 200 words
in RAM0. Find the most positive value and
its memory address.
LDI AH0,'h8000
LDI R0,@st
LDSI RC,199
MAX AH0,(R0+)
Ri
| st+199 | 0344 |
| st+198 | FA09 |
| ... | … |
| st+1 | 790F |
| st | A432 |
Fig. 4.8.1
This program takes 202 cycles
to find the MAX value, with its memory address in the PM register.
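A Python sketch of the same search follows. It treats each table word as a signed 16-bit value and starts from the most negative value, matching the 'h8000 initial load. Updating only on a strictly larger value, and returning the address of the word itself, are assumptions of this sketch.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's complement value."""
    return word - 0x10000 if word & 0x8000 else word


def find_max(table, base=0):
    """Return (largest signed value, its address); the address is None if never updated."""
    best, best_addr = -0x8000, None          # start from the most negative value
    for offset, word in enumerate(table):    # one comparison per table word
        value = to_signed16(word)
        if value > best:                     # update only on a strictly larger value
            best, best_addr = value, base + offset
    return best, best_addr
```

With the four values visible in Fig. 4.8.1 (A432, 790F, FA09, 0344), the largest signed value is 790F.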
An alternative program that
does not use the MAX or MIN instruction takes about 3~8 times as many cycles.
See the example code below.
LDI AH0,'h8000
LDI R0,@st
LDI R7,199
MM1:
CMP AH0,(R0+)
BRA @MM1,N
MODR R4,R0-
LD PM,R0
LD AH0,(R0+)
BRA @MM1,NRZ7-