-Snes9x's APU code Sampled-
When trying to figure out how to create an SPC emulation
loop, I started by looking through the snes9x source code.
After examining it I copied what you were doing with
anything related to the APU (with an adjustment to the timers,
of course). I copied it because I just wanted to get it
working and didn't want to change too many things at once.
So here is what I came up with:
    for(APU.Cycles=0;APU.Cycles<20480;APU.Cycles++){
        APU_EXECUTE1();
        IAPU.TimerErrorCounter++;
        if((IAPU.TimerErrorCounter&31)==0){
            DoTimer();
        }
        APURegisters.PC = IAPU.PC - IAPU.RAM;
        S9xAPUPackStatus();
    }
    S9xMixSamples(buffer,882);
Of course the timer is different and more accurate because
I can afford to be, and I mix samples 100 times a second
(also because I can do it that way). I made a small
version of the code just to test performance using some tools
that come with Watcom (they sample the running code and tell you
where most of the time is spent). This code sampled like
this, testing 3 spc files (main.cpp is where the emulation loop
is):
chrono.spc (ChronoTrigger)
r-type.spc (R-Type)
ranma.spc (Ranma1/2 hard battle part2)
So these results show that a lot of time is spent in the main
loop. Since that is what was slow, I decided to examine this code
more closely to see if there was any way to optimize it. I also
decided to examine the emulation core at this time.
After examining it I found that 2 things were being done
unnecessarily: the APURegisters.PC = IAPU.PC - IAPU.RAM; and
the S9xAPUPackStatus();. I searched the spc emulation core and
couldn't find anywhere that needed APURegisters.PC or
APURegisters.P to be kept current. Where they were used, the code
refreshed the values before using them, so the refreshes in the
loop were unnecessary. One thing I didn't check was the code
outside the APU code, like the snes cpu emulation cores, ppu
emulation and so on, so I don't know if any of those use those
2 variables.
I altered the code to not do any of these refreshes :
    for(APU.Cycles=0;APU.Cycles<20480;APU.Cycles++){
        APU_EXECUTE1();
        IAPU.TimerErrorCounter++;
        if((IAPU.TimerErrorCounter&31)==0){
            DoTimer();
        }
    }
    S9xMixSamples(buffer,882);
It sampled like this :
chrono.spc (ChronoTrigger)
r-type.spc (R-Type)
ranma.spc (Ranma1/2 hard battle part2)
After altering the code I tested it with a lot of spc files just
in case something might go wrong, but it worked fine.
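
A minimal sketch of why dropping the per-iteration refresh can be safe,
assuming nothing reads the exported copy while the loop is running. This
is not the snes9x code: Core, step(), sync(), runEager() and runLazy()
are made-up names that only mimic the relationship between IAPU.PC and
APURegisters.PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the interpreter state; not the real snes9x structs.
struct Core {
    std::uint8_t  ram[0x10000] = {};
    std::uint8_t* pc = ram;        // live pointer, in the role of IAPU.PC
    std::uint16_t packedPC = 0;    // exported copy, in the role of APURegisters.PC
    long          syncs = 0;       // how many times the copy was refreshed
};

struct Result {
    std::uint16_t packedPC;
    long          syncs;
};

// Pretend to execute one instruction: advance the pointer, wrapping at 64 KB.
inline void step(Core& c) {
    std::uint16_t next = static_cast<std::uint16_t>((c.pc - c.ram) + 1);
    c.pc = c.ram + next;
}

// Refresh the exported copy from the live pointer.
inline void sync(Core& c) {
    c.packedPC = static_cast<std::uint16_t>(c.pc - c.ram);
    ++c.syncs;
}

// Original shape: refresh after every step.
Result runEager(int steps) {
    Core c;
    for (int i = 0; i < steps; ++i) { step(c); sync(c); }
    return Result{c.packedPC, c.syncs};
}

// Altered shape: refresh once, just before the copy would be read.
Result runLazy(int steps) {
    Core c;
    for (int i = 0; i < steps; ++i) step(c);
    sync(c);
    return Result{c.packedPC, c.syncs};
}

// Both shapes must leave the same exported value behind.
void checkEquivalent(int steps) {
    assert(runEager(steps).packedPC == runLazy(steps).packedPC);
}
```

Calling checkEquivalent(20480) passes, while the lazy version does one
refresh instead of 20480. The sketch says nothing about readers outside
the loop, which is the part I could not check in the real code.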
Just to show you what the finished optimized code looks like
(but I don't think it would be usable in snes9x) :
    for(c=0;c<640;c++){
        for(ic=0;ic<32;ic++){
            APU_EXECUTE1();
        }
        IAPU.TimerErrorCounter+=32;
        DoTimer();
    }
    S9xMixSamples(buffer,882);
Note that I am not unrolling the inner loop. I didn't do this
because testing shows it is slower. I think it might be AGI
stalls, so the problem seems to be more with my Pentium not
being able to take advantage of an unrolled loop.
Here is how it samples :
chrono.spc (ChronoTrigger)
r-type.spc (R-Type)
ranma.spc (Ranma1/2 hard battle part2)
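
As a rough check that the nested loop does the same amount of work as
the flat one, here is a small sketch that only counts calls. Counts,
runFlat() and runNested() are hypothetical names, and the counters stand
in for APU_EXECUTE1() and DoTimer(). It assumes the error counter starts
at a multiple of 32, otherwise the flat loop would call the timer at
different points.

```cpp
// Hypothetical counters standing in for APU_EXECUTE1() and DoTimer().
struct Counts {
    long     executes = 0;
    long     timerCalls = 0;
    unsigned errorCounter = 0;   // in the role of IAPU.TimerErrorCounter
};

constexpr int kStepsPerBatch    = 20480;  // loop bound from the original loop
constexpr int kTimerPeriod      = 32;     // DoTimer() once per 32 steps
constexpr int kBatchesPerSecond = 100;    // one mix per batch, 100 a second

// 20480 / 32 = 640 outer iterations, with nothing left over.
static_assert(kStepsPerBatch % kTimerPeriod == 0, "batch must divide evenly");
static_assert(kStepsPerBatch / kTimerPeriod == 640, "outer loop count");
// 640 timer calls per batch * 100 batches = 64000 timer calls a second.
static_assert((kStepsPerBatch / kTimerPeriod) * kBatchesPerSecond == 64000,
              "timer calls per second");

// Flat shape: test the counter on every step.
Counts runFlat() {
    Counts n;
    for (int i = 0; i < kStepsPerBatch; ++i) {
        ++n.executes;
        ++n.errorCounter;
        if ((n.errorCounter & (kTimerPeriod - 1)) == 0) ++n.timerCalls;
    }
    return n;
}

// Nested shape: 32 steps, then one unconditional timer call.
Counts runNested() {
    Counts n;
    for (int c = 0; c < kStepsPerBatch / kTimerPeriod; ++c) {
        for (int ic = 0; ic < kTimerPeriod; ++ic) ++n.executes;
        n.errorCounter += kTimerPeriod;
        ++n.timerCalls;
    }
    return n;
}
```

Both functions end with the same three counts, so the nested form only
removes the increment and the &31 test from each step.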
It seems kind of strange that the only code I optimized was the
small inner loop, but I hope you can benefit in some way
from this information. I also realize that snes emulation
is slowed more by graphics, but I haven't tried sampling the
entire snes9x source code, as I would have to rewrite the code
to run under Watcom. I will do it later when I get the time,
but for now this is the only information I can give.
CitiZen X