-Snes9x's APU code Sampled-


When trying to figure out how to create an SPC emulation loop, I started by looking through the snes9x source code. After examining it I copied what you were doing for anything related to the APU (with an adjustment to the timers, of course). I copied it because I just wanted to get it working and didn't want to change too many things at once. Here is what I came up with:
for(APU.Cycles=0;APU.Cycles<20480;APU.Cycles++){
    APU_EXECUTE1();
    IAPU.TimerErrorCounter++;
    if((IAPU.TimerErrorCounter&31)==0){
        DoTimer();
    }
    APURegisters.PC = IAPU.PC - IAPU.RAM;
    S9xAPUPackStatus();
}
S9xMixSamples(buffer,882);
Of course the timer is different and more accurate, because I can afford that, and I mix samples 100 times a second (again, because I can do it that way). I made a small version of the code just to test performance using the tools that come with Watcom (the sampler runs the code and reports where most of the time is spent). This code sampled as follows, testing 3 SPC files (Main.cpp is where the emulation loop is):

chrono.spc (Chrono Trigger)

47.2% Main.cpp
30.6% SPC700.cpp
21.7% soundux.cpp

r-type.spc (R-Type)

46.9% Main.cpp
33.7% SPC700.cpp
18.8% soundux.cpp

ranma.spc (Ranma1/2 hard battle part2)

48.6% Main.cpp
32.3% SPC700.cpp
18.4% soundux.cpp

These results show that a lot of time is spent in the main loop. Since that was the slow part, I decided to examine this code to see if there was any way to optimize it; I also examined the emulation core at the same time.
I found 2 things that were being done unnecessarily: the APURegisters.PC = IAPU.PC - IAPU.RAM; assignment and the S9xAPUPackStatus(); call. I searched the SPC emulation core and couldn't find anywhere that needs APURegisters.PC or APURegisters.P to be kept current; wherever they are used, the code refreshes the values first, so the refreshes on every iteration are unnecessary. One thing I didn't check was the code outside the APU, such as the SNES CPU emulation cores, the PPU emulation and so on, so I don't know whether any of those use these 2 variables. I altered the code to skip these refreshes:
for(APU.Cycles=0;APU.Cycles<20480;APU.Cycles++){
    APU_EXECUTE1();
    IAPU.TimerErrorCounter++;
    if((IAPU.TimerErrorCounter&31)==0){
        DoTimer();
    }
}
S9xMixSamples(buffer,882);
It sampled like this :

chrono.spc (Chrono Trigger)

31.9% Main.cpp
39.9% SPC700.cpp
27.7% soundux.cpp

r-type.spc (R-Type)

30.9% Main.cpp
44.0% SPC700.cpp
24.6% soundux.cpp

ranma.spc (Ranma1/2 hard battle part2)

32.3% Main.cpp
42.3% SPC700.cpp
24.5% soundux.cpp

After altering the code I tested it with a lot of SPC files in case something went wrong, but it worked fine.

Just to show you what the finished optimized code looks like (but I don't think it would be usable in snes9x) :
for(c=0;c<640;c++){
    for(ic=0;ic<32;ic++){
        APU_EXECUTE1();
    }
    IAPU.TimerErrorCounter+=32;
    DoTimer();
}
S9xMixSamples(buffer,882);
Note that I am not unrolling the inner loop. I didn't because testing showed the unrolled version to be slower. I think it might be AGI stalls, so the problem seems to be more with my Pentium not being able to take advantage of an unrolled loop. Here is how it samples:

chrono.spc (Chrono Trigger)

14.9% Main.cpp
49.2% SPC700.cpp
35.1% soundux.cpp

r-type.spc (R-Type)

15.2% Main.cpp
53.2% SPC700.cpp
30.6% soundux.cpp

ranma.spc (Ranma1/2 hard battle part2)

15.9% Main.cpp
53.0% SPC700.cpp
30.0% soundux.cpp

It seems kind of strange that the only code I optimized was the small inner loop, but I hope you can benefit in some way from this information. I also realize that SNES emulation is slowed more by graphics, but I haven't tried sampling the entire snes9x source code, as I would have to rewrite the code to build under Watcom. I will do it later when I get the time, but for now this is the only information I can give.

CitiZen X