# -------------------------------------------------------------------
# 68K                                   (c) Copyright 1996 Nat! & KKP
# -------------------------------------------------------------------
# These are some of the results/guesses that Klaus and Nat! found
# out about the Jaguar with a few helpful hints by other people, 
# who'd prefer to remain anonymous. 
#
# Since we are not under NDA or anything from Atari we feel free to 
# give this to you for educational purposes only.
#
# Please note, that this is not official documentation from Atari
# or derived work thereof (neither of us has ever seen the Atari docs)
# and Atari isn't connected with this in any way.
#
# Please use this informationphile as a starting point for your own
# exploration and not as a reference. If you find anything inaccurate,
# missing, needing more explanation etc. by all means please write
# to us:
#    nat@zumdick.rhein-main.de
# or
#    kkp@gamma.dou.dk
#
# Please do us a small favor and don't use this information for
# those lame flamewars on r.g.v.a or the mailing list.
#
# HTML soon ?
# -------------------------------------------------------------------
#  68k.html,v 1.11 1997/03/30 02:27:11 
# -------------------------------------------------------------------

Preface:
   There isn't much we need to tell you about the 68K. First, you
   have probably known the chip for ten years already, and secondly
   there are enough reference books available in case your memory
   is failing you. Let's just look at the way the processor is bound
   into the system and at some things to watch out for.






IRQs:
=-=-=

      IPL         Name           Vector            Control
   ---------+---------------+---------------+---------------
       2      VBLANK IRQ         $100         INT1 bit #0 
       2      GPU IRQ            $100         INT1 bit #1
       2      HBLANK IRQ         $100         INT1 bit #2
       2      Timer IRQ          $100         INT1 bit #3

   Note: Both timer interrupts (JPIT && PIT) are on the same INT1 bit
         and are therefore indistinguishable.

   A typical way to install a LEVEL2 handler for the 68000 would be
   something like this; you have to supply "last_line" and "handler".
   Note that the interrupt does not go through the level-2 autovector
   at $68 but through the vector at $100 (vector number 64).


   V_AUTO   = $100
   VI       = $F004E
   INT1     = $F00E0
   INT2     = $F00E2
   
   IRQS_HANDLED=$909                ;; VBLANK and TIMER (low byte: enable, high byte: clear latch)

         move.w   #$2700,sr         ;; no IRQs please
         move.l   #handler,V_AUTO   ;; install our routine

         move.w   #last_line,VI     ;; scanline where IRQ should occur
                                    ;; should be 'odd' BTW
         move.w   #IRQS_HANDLED&$FF,INT1  ;; enable VBLANK + TIMER
         move.w   #$2100,sr         ;; enable IRQs on the 68K
         ...

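handler:
         move.l   d0,-(a7)          ;; save all of d0
         move.w   INT1,d0           ;; low byte = pending INT1 sources
         btst     #0,d0             ;; VBLANK pending?
         beq.b    .no_blank         ;; no: skip the VBLANK code

         ...                        ;; VBLANK work goes here

.no_blank:
         btst     #3,d0             ;; TIMER pending?
         beq.b    .no_timer         ;; no: skip the timer code
      
         ...                        ;; timer work goes here

.no_timer:
         move.w   #IRQS_HANDLED,INT1      ; clear latch, keep IRQ alive
         move.w   #0,INT2                 ; let GPU run again
         move.l   (a7)+,d0
         rte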

   As you can see, all INT1 sources share the vector at $100, so if you
   enable several of them, the handler has to check the lower byte of
   INT1 to see which interrupt(s) happened.
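To make the INT1 values explicit, here is a small C sketch of the two words the code above writes and of the test the handler does. It assumes, as IRQS_HANDLED = $909 and the "clear latch" comment suggest, that the low byte of INT1 holds the enable bits (and, when read, the pending bits) while the high byte clears the matching latches; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* INT1 source bits, as listed in the IRQ table (hypothetical names). */
enum {
    INT1_VBLANK = 1 << 0,
    INT1_GPU    = 1 << 1,
    INT1_HBLANK = 1 << 2,
    INT1_TIMER  = 1 << 3
};

/* Value written during setup: only the enable bits (IRQS_HANDLED & $FF). */
static uint16_t int1_enable_value(uint8_t sources)
{
    return sources;
}

/* Value written at the end of the handler: clear the latches of the
 * handled sources (high byte) while keeping them enabled (low byte).
 * ASSUMPTION: bit layout inferred from IRQS_HANDLED = $909. */
static uint16_t int1_ack_value(uint8_t sources)
{
    return (uint16_t)(((unsigned)sources << 8) | sources);
}

/* Decode a value read from INT1: is this source pending? */
static int int1_pending(uint16_t int1, uint8_t source)
{
    return (int1 & source) != 0;
}
```

For VBLANK + TIMER this yields $09 for the setup write and $909 for the acknowledge, matching the assembly.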


Superstitions / Things to watch out for:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

   It looks like word/byte accesses to ROM space don't work. Some code
   in the Jaguar Server suggests that the MEMCON registers come into
   play here.

   I have a hunch that RMW cycles (like CLR.W (a0)) on TOM registers
   aren't 100% safe.

   NEUROMANCER adds:
      NEVER do a clr.l (a0) into GPU/DSP memory; you must do a
      move.l #0,(a0) or a move.l d0,(a0) instead.
   
   The special thing about CLR (on the 68000; fixed in the 68010
   and onwards, I believe) is that the processor reads the destination
   before writing to it. It could be that this spurious read is done
   in a slightly different fashion from the other RMW instructions
   like TAS, BCLR, ASL <ea> etc.

   If that is not the case, you must refrain from using any RMW
   instruction on GPU/DSP memory at all.
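If RMW accesses to GPU/DSP memory are indeed unsafe, one workaround is never to modify such a location in place: keep a copy of its value in ordinary RAM, change the copy, and write the whole long word back with a plain store (the equivalent of move.l d0,(a0)). The C sketch below shows the idea; shadow_reg and its functions are hypothetical helpers, not part of any Jaguar library, and whether a given compiler turns these stores into a single move.l (and not, say, a clr.l for zero) has to be checked in its generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper type: one 32-bit location in GPU/DSP memory plus a
 * copy of its value kept in ordinary RAM. The hardware location is only
 * ever written with plain 32-bit stores, never read or modified in place. */
typedef struct {
    volatile uint32_t *hw;  /* GPU/DSP memory location (written only) */
    uint32_t shadow;        /* last value written, kept in RAM */
} shadow_reg;

/* Bind the shadow to a location and write an initial value. */
static void shadow_init(shadow_reg *r, volatile uint32_t *hw, uint32_t value)
{
    r->hw = hw;
    r->shadow = value;
    *r->hw = value;         /* plain store, like move.l d0,(a0) */
}

/* Set bits: modify the RAM copy, then store the whole long word. */
static void shadow_set_bits(shadow_reg *r, uint32_t mask)
{
    r->shadow |= mask;
    *r->hw = r->shadow;
}

/* Clear bits the same way, without reading the hardware location. */
static void shadow_clear_bits(shadow_reg *r, uint32_t mask)
{
    r->shadow &= ~mask;
    *r->hw = r->shadow;
}
```

The price is one extra long word of RAM per location, and the shadow is only valid as long as nothing else (e.g. the GPU itself) writes that location.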

   If the 68K has nothing useful to do but still uses up valuable bus
   bandwidth, it's best to put it to sleep with

         STOP  #$2000

   so it will sleep until the next IRQ wakes it up again.
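The immediate operand of STOP is loaded into the status register, so it has to be hex $2000: supervisor bit set, interrupt mask 0. The same two SR fields explain the $2700 and $2100 values in the setup code. A small C sketch of those 68000 SR fields (the helper names are made up for illustration):

```c
#include <stdint.h>

/* 68000 status register fields used in this document:
 * bit 13 = supervisor (S) bit, bits 8-10 = interrupt priority mask. */
static int sr_supervisor(uint16_t sr) { return (sr >> 13) & 1; }
static int sr_ipl_mask(uint16_t sr)   { return (sr >> 8) & 7; }

/* An interrupt of level 1-6 is taken only if its level is above the
 * mask; level 7 is non-maskable. */
static int sr_accepts(uint16_t sr, int level)
{
    return level == 7 || level > sr_ipl_mask(sr);
}
```

Written as decimal 2000 ($07D0), the operand would clear the supervisor bit and set the mask to 7, so the level-2 interrupt could never wake the processor.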



ADDENDUM:
=========


Timing:
=-=-=-=

A few timing sessions got us the following results. Note that the timing
was done with the video system, the GPU and the DSP shut down.
See "The code" below for part of the timing routine. [ This could all be
bullshit, of course ]
                                       total           instr
I R W                             min  max  avg   min  max  avg   sus
------------------------------------------------+----------------------
1      8x moveq   #0,d0            28  132   81    4   17   10
1      8x move.w  d0,d0            28  132   81    4   17   10
1      8x move.l  d0,d0            28  132   81    4   17   10
2      8x move.w  #$FFF0,d0       108  212  162   14   27   20
1 1    8x move.w  (a0),d0         172  276  223   22   35   28
1   1  8x move.w  d0,(a0) (+/-)   188  292  243   24   37   30    34
3      8x move.l  #$3FFF0,d0      188  292  243   24   37   30
2 1    8x move.w  $3FF0,d0        252  356  308   32   45   39
1 2    8x move.l  (a0),d0         252  356  309   32   45   39    42
2   1  8x move.w  d0,$3FF0        268  372  324   34   47   41
1   2  8x move.l  d0,(a0) (+/-)   284  388  341   36   49   43    46
3 1    8x move.w  $3FFF0,d0       332  436  390   42   55   49
3   1  8x move.w  d0,$3FFF0       348  453  406   44   57   51
3 2    8x move.l  $3FFF0,d0       412  516  471   52   65   59
3   2  8x move.l  d0,$3FFF0       444  548  503   56   69   63 
3 1 1  8x move.w  $1000,$1004     492  596  552   62   75   69
5 1 1  8x move.w  $30000,$30004   652  756  716   82   95   90
3 2 2  8x move.l  $1000,$1004     668  772  732   84   97   92
       8x mulu.w  d1,d0           700  784  754   88   98   94
5 2 2  8x move.l  $30000,$30004   828  932  894  104  117  112

1 2    4x move.l  (a0),d0         100  204  154   25   51   39
1 2    8x move.l  (a0),d0         252  356  309   32   45   39
1 2   32x move.l  (a0),d0        1164 1268 1236   36   40   39

------------------------------------------------+----------------------
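I:     instruction words
R:     data words read
W:     data words written

total: cycles for the whole sequence (4x, 8x or 32x the instruction)
instr: cycles per instruction (total / repetition count, rounded)
avg:   average
min:   minimum encountered
max:   maximum encountered
sus:   approx. sustained average   (doing 16 million accesses)

All cycle times are in 26.591 MHz cycles.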
-----------------------------------------------------------------------
A minimum of 4 cycles for move.l d0,d0 (8x row) looks weird at first: the
68000 runs at half the system clock (about 13.3 MHz), so even a 4-clock
instruction should take at least 8 of these cycles. This result can happen
if the 'reference value' was off. Likewise, the maximum can happen if the
'reference value' is OK and the timing 'value' is off. Looking closely, the
difference between min and max is about 104 cycles per measurement in almost
every row, therefore the average value should be about right.
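The instr columns appear to be the total columns divided by the repetition count and rounded to the nearest cycle. The C sketch below reproduces them; timing_row, per_instr and spread are hypothetical names used only for illustration.

```c
/* One row of the timing table: totals measured over `reps` repetitions. */
struct timing_row {
    int reps;
    int min, max, avg;
};

/* Per-instruction cycles from a total, rounded to nearest (halves round
 * up), which matches the instr columns of the table. */
static int per_instr(int total, int reps)
{
    return (total + reps / 2) / reps;
}

/* Spread between best and worst measurement (about 104 for most rows). */
static int spread(const struct timing_row *r)
{
    return r->max - r->min;
}
```

Dividing by the repetition count is also why the 32x row has the tightest per-instruction range (36 to 40): the same ~104-cycle measurement jitter is spread over more instructions.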

Due to the apparent preference for immediate data (move.w #$FFF0,d0, two
instruction words, averages 20 cycles, while move.w (a0),d0, one instruction
word plus one data read, averages 28), it would appear that the I/O latch
also acts as a small read cache (64 bit probably) for the 68000. Technically,
though, this sounds like a risky idea for a multiprocessor system, because
there's no bus snooping to be expected.

Data writes are on average a bit slower than data reads (about 2 cycles
per word, e.g. 30 vs. 28 for move.w and 43 vs. 39 for move.l). This is
a bit strange: taken at face value, the timings suggest that for every
write the 68K does an indivisible read-modify-write cycle, effectively
using two bus cycles for a write. Of course, architecturally this would
be very stupid.

It would rather seem that the memory interface acknowledges a write to
the 68000 only when the data has actually been written (it doesn't buffer
writes). The 2 cycles slower average in the timings suggests that this
is happening.


The sustained measurement was done with a simple C program, doing 16 times
      move  d0,(a0)+
or    move  (a0)+,d0
per loop iteration, for 1 million iterations (not very accurate, because
the loop overhead was not filtered out).

The results with VIDEO OFF:

         access      time   MB/s (10^6)   cycles/move
   ---------------+--------+------------+-------------
    16 bit writes    20.6s       1.6          34 
    32 bit writes    27.8s       2.3          46
    32 bit reads     25.3s       2.5          42
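The last two columns follow from the run times: 16 moves x 1 million iterations = 16 million accesses of 2 or 4 bytes each, counted in 26.591 MHz cycles. A C sketch of that arithmetic (the constant and function names are illustrative):

```c
/* 16 moves per loop iteration x 1,000,000 iterations. */
#define ACCESSES  16000000.0
/* System clock in Hz, from the legend of the timing table. */
#define CLOCK_HZ  26591000.0

/* Millions of bytes per second for a run of the given length. */
static double mbytes_per_s(double seconds, int bytes_per_access)
{
    return ACCESSES * bytes_per_access / seconds / 1e6;
}

/* System clock cycles per move, loop overhead included. */
static double cycles_per_move(double seconds)
{
    return seconds * CLOCK_HZ / ACCESSES;
}
```

For example, 20.6 s for 16 bit writes gives about 1.55 MB/s and 34.2 cycles per move. Because the loop overhead is included, the cycle figures are upper bounds for a single move.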


The code:
=-=-=-=-=

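;; TESTCODE must not touch d6/d7: they hold the timer readings
      .macro   TESTCODE
         .rept    8
            move.l   (a0),d0        ; instruction under test, 8 times
         .endr
      .endm
code:
      movem.l  d1-a6,-(a7)

      lea      $3FFF0,a0
      moveq    #23,d1
      moveq    #7,d0
      moveq    #-1,d5               ; timer reload value $FFFF
.punt:
;; --- reference run: 16 NOPs between the two timer reads ---
      move.w   d5,PITLO             ; restart the timer
      move.w   PITLO,d6             ; start reading
      .rept    16
         nop
      .endr
      move.w   PITLO,d7             ; end reading
      sub.w    d6,d7
      bcc.b    .ok
      neg.w    d7                   ; take the absolute difference
.ok:
      move.w   d7,-(a7)             ; reference

;; --- test run: the same 16 NOPs with TESTCODE in the middle ---
      lea      $3FFF0,a0
      move.w   d5,PITLO
      move.w   PITLO,d6
      .rept    8
         nop
      .endr
      TESTCODE
      .rept    8
         nop
      .endr
      move.w   PITLO,d7
      sub.w    d6,d7
      bcc.b    .ok2
      neg.w    d7
.ok2:
      sub.w    (a7)+,d7             ; subtract (and pop) the reference
      bcs      .punt                ; negative: measurement was off, retry

      moveq    #0,d0
      move.w   d7,d0                ; result: timer ticks spent in TESTCODE
      movem.l  (a7)+,d1-a6
      rts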
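The arithmetic and retry logic of the routine can be modelled in C to make them explicit. In this sketch tick_delta and net_ticks are hypothetical names, and the timer readings are passed in as plain numbers instead of being read from PITLO.

```c
#include <stdint.h>

/* Absolute difference of two 16-bit timer readings, which is what the
 * sub.w d6,d7 / bcc.b / neg.w d7 sequence computes. */
static uint16_t tick_delta(uint16_t start, uint16_t end)
{
    return end >= start ? (uint16_t)(end - start) : (uint16_t)(start - end);
}

/* One measurement attempt: ticks of the test run minus ticks of the
 * reference run. Returns -1 where the assembly would take bcs .punt
 * and start over. */
static int32_t net_ticks(uint16_t ref_start, uint16_t ref_end,
                         uint16_t test_start, uint16_t test_end)
{
    uint16_t ref  = tick_delta(ref_start, ref_end);
    uint16_t test = tick_delta(test_start, test_end);

    if (test < ref)         /* sub.w (a7)+,d7 would borrow */
        return -1;
    return (int32_t)(test - ref);
}
```

Both runs contain the same timer accesses and 16 NOPs, so subtracting the reference leaves only the time spent in TESTCODE.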






------------------------------------------------------------------------
Nat! (nat@zumdick.rhein-main.de)
Klaus (kkp@gamma.dou.dk)



