Here is the rest of the Neon64 development history, moved off the main page
because it was getting too big. Note that some of the earliest images are broken;
I've moved this page between about four servers and some things have been lost.
The Wizards & Warriors bugs go away when I shrink the graphics cache by
half, and the games that need a large cache for speed (Battle Tank, TMNT2 and 3,
Xenophobe) all seem to run fine. So here's version 1.2a, with source.
I'm preparing to release a v1.2a, with that crash bug fixed (and including
source), and I noticed that Wizards & Warriors has been slightly broken since
v1.2. I'm not sure exactly what the issue is, but the title screen is messed
up. The game is still totally playable, though.
I took another look at the HI/LO thing because it seemed like the fix I made
was slowing things down a lot (and because I remember fondly the days of
developing for a console where everyone in the world has the same specs...). I
rewrote the fix, and now speed is unaffected while the bug stays fixed.
If there are future releases, they will contain a version of extend.exe that
only extends ROMs to 0x101000 bytes instead of 0x200000 bytes; 0x101000 is the
minimum size needed for the CRC to work reliably. This will allow for faster send
times and more space on a CD made with makeall.bat.
I found the cause of the odd crashing with the FPS meter activated.
While I save all other registers before doing anything that might change them
in the interrupt handler, I failed to save the HI and LO registers, which are
used to store the result of multiplication and division operations. Because of
this there is a small chance that the interrupt will occur just between the
instruction that writes HI or LO and the instruction that reads them, and any
multiplication done during the interrupt will overwrite the value. I fixed this
by simply saving HI and LO before, and restoring them after, anything that changes
them: in this case the text routine and the framerate calculation. The calculation itself is done continuously, so this may have been the cause of the random crashes which occasionally plagued the previous version (even with FPS off).
It definitely eliminates the bizarre behavior seen when starting Elite with FPS enabled.
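The fix amounts to treating HI and LO like any other register the interrupt handler clobbers. Below is a minimal C model of that idea, not Neon64's actual MIPS assembly: `MulRegs`, `mult`, `interrupt_body`, and `interrupt_handler` are hypothetical names, and the multiply inside the handler stands in for the text routine and framerate calculation.

```c
#include <stdint.h>

/* Hypothetical model of the MIPS HI/LO multiply result registers. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} MulRegs;

/* Signed 32x32 -> 64-bit multiply, split across HI and LO like MULT. */
static void mult(MulRegs *r, int32_t a, int32_t b) {
    int64_t product = (int64_t)a * b;
    r->lo = (uint32_t)((uint64_t)product & 0xFFFFFFFFu);
    r->hi = (uint32_t)((uint64_t)product >> 32);
}

/* Stand-in for interrupt work that multiplies (e.g. a framerate calculation). */
static void interrupt_body(MulRegs *r) {
    mult(r, 1234, 5678);
}

/* Save HI/LO before anything that changes them, restore them before returning,
   so an interrupt landing between MULT and MFLO cannot corrupt the result. */
static void interrupt_handler(MulRegs *r) {
    MulRegs saved = *r;
    interrupt_body(r);
    *r = saved;
}
```

Without the save and restore in `interrupt_handler`, the interrupted code would read the handler's product instead of its own.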
I've had v1.2 sitting unchanged for weeks now; I was just hoping I'd find a
solution to my problems, but... I didn't. So here it is: not the huge
improvement 1.1 was, but significantly better, as well as the addition of
PAL Super Mario 64 support.
Found some more bugs, none newly introduced (thankfully), check the TODO
I have fixed a problem with DMC IRQ timing; Ian Bell's Tank Demo now works
as well as it does in any emulator I know of.
I have verified that the new solution works on actual PAL Super Mario 64
After several hours with my v64jr plugged into my Gameshark I figured out
how to load Neon64 with the European version of Super Mario 64. Now someone
had better use it (let me know now if you actually have the cartridge to test it
Fixed some bugs in the save system (the first save you made would not
actually work because the signature was not fully written), and also made it
backwards-compatible with saves from v1.1.
Got Battle Tank up to speed by dramatically increasing the number of cache
Ok, I lied, I am working on the program again. I've rewritten the audio
mixing as per some new information blargg has published, and I've made
substantial increases in the speed of games using VRAM (Starship Hector and
that Wizardry game are now running properly) and overall everything is running
faster due to some code reorganization. Battle Tank is still slow at the intro.
I made sound mixing a little faster and a little more accurate.
I also found several games that are unbearably slow due to graphics compilation:
Battle Tank, Wizardry: Legacy of Llygamyn (or something), Starship Hector.
I have no plans to fix these, just letting it be known that they're problematic.
Apparently the Z64 does in fact have support for emulators, though I said
otherwise in the readme. Sorry for any confusion this may have caused.
I noticed that the Minigame Pack that Memblers had put together was running slow, and I saw that it runs its code out of WRAM (aka SRAM). I made SRAM cached so
that this would be faster (I have to copy it to an uncached location when
transferring to the controller pack, and then transfer it out of that area to
the controller), and it also brought Zelda up to speed.
I now use triple buffering for the video, which fixed the flickering in
I added an option to disable the DMC, as it was really irritating me in Final
Fantasy 3 (which is an incredible game, get the translation at The Whirlpool!)
So at long last here's the release... I'm going to have all the source code
available just as soon as I put the files together.
I don't plan to work on this
again any time soon.
Made it much harder for a fatal exception to happen by putting proper
restrictions on reading opcodes from undefined memory (it would use 0 as a base
address). Aladdin and Sonic 3D 6 still crash, but Aladdin keeps playing its
music and both can be restarted via the menu.
I made a set of batch files and utilities (which will be included with the
release) which can be used to make a set of NES ROMs into N64 ROMs for backup
units without proper emulator support (like CD64 and, I think, Z64).
I've put together v1.1; I'm just going to wait until the 25th to
release it. If you want it now, I'll send it to you; I'm just not ready for the
full-blown release yet.
There was a click in the square channels in SMB; I fixed it by changing the
time and manner in which the volume is updated, based on some info I got from
FCEU. I also switched to 16-bit sound and made the volume a bit lower to
prevent sound overflows (which were happening in Kirby) while still keeping
all of the detail of the sound. You'll just have to turn up the volume a bit.
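A simplified sketch of why lowering the volume prevents overflow: if every channel is scaled by one constant chosen so that all channels at full volume still fit in a signed 16-bit sample, no combination of outputs can wrap. This assumes a plain linear sum (the real NES mixer is nonlinear) and the channels' usual output ranges (0-15 for the two squares, triangle, and noise; 0-127 for the DMC). `mix_linear` is a hypothetical name, not Neon64's routine.

```c
#include <stdint.h>

/* Largest possible linear sum: four 4-bit channels plus the 7-bit DMC. */
#define MAX_SUM (15 + 15 + 15 + 15 + 127)

/* Scale factor chosen so that MAX_SUM * SCALE <= 32767 (no 16-bit overflow). */
#define SCALE (32767 / MAX_SUM)

/* Mix raw channel levels into one signed 16-bit sample with fixed headroom. */
static int16_t mix_linear(int pulse1, int pulse2, int triangle, int noise, int dmc) {
    int32_t sum = pulse1 + pulse2 + triangle + noise + dmc;
    return (int16_t)(sum * SCALE);
}
```

With these ranges the scale works out to 175, so the loudest possible sample is 32725, just under the 16-bit limit.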
OK, I lied, Bignose Freaks Out isn't perfect, there's an annoying
buzz after the Code Masters logo appears. But gameplay is perfect.
I'm currently in the process of compiling a readme for v1.1. Again I beg
people with backup devices to help me write a section for their devices. I'm aiming right now for a Christmas release date.
Here are the Bignose Freaks Out screenshots:
I'm trying to fix the linear counter to solve that irritating bug where the
tri channel cuts off too soon in the SMB3 castle. blargg recently made a post on
the subject on NESDev, and I'm going to implement his algorithm exactly.
Yep, it worked; now it sounds the same as in NotSoFatso. I also fixed
a dumb bug I introduced yesterday which broke the iNES NES Test; now it works.
So what's left to do? :)
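For reference, here is the triangle linear counter as it is commonly documented on the NESdev wiki; it is a sketch, not a transcription of blargg's post or of Neon64's code. The struct and function names are hypothetical. The key detail is the reload flag: a write to $400B sets it, and it is only cleared on a clock when the control bit is 0.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t counter;      /* current linear counter value */
    uint8_t reload_value; /* bits 0-6 of $4008 */
    bool control;         /* bit 7 of $4008 (also halts the length counter) */
    bool reload_flag;     /* set by any write to $400B */
} LinearCounter;

static void write_4008(LinearCounter *lc, uint8_t value) {
    lc->control = (value & 0x80) != 0;
    lc->reload_value = value & 0x7F;
}

static void write_400b(LinearCounter *lc) {
    lc->reload_flag = true;
}

/* Clocked by the frame sequencer on every quarter frame. */
static void clock_linear_counter(LinearCounter *lc) {
    if (lc->reload_flag)
        lc->counter = lc->reload_value;
    else if (lc->counter > 0)
        lc->counter--;
    if (!lc->control)
        lc->reload_flag = false;
}
```

With the control bit set, the reload flag stays set, so the counter is reloaded on every clock and never runs out; with it clear, the counter counts down once and the triangle stops when it reaches 0.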
My DMC IRQ support was totally broken. I fixed it and now Bignose Freaks Out
works perfectly! It's something like Sonic crossed with Bignose the Caveman.
I'll have screen shots of this rarely-well-emulated game as
soon as I can get a disk drive back in my laptop.
Fixed a plethora of sound errors today through proper emulation of the sound
status register. This fixed NES Test, Star Wars, and the DMC of Retrocoders and
Solar Wars (both using NT2, I
I also fixed that evil clicking problem in the triangle wave by simply stopping
oscillation when the wave is disabled, instead of setting the output to zero. While
that may work for the square waves or the noise channel, which are either on or off,
the tri wave has many different output levels, and suddenly switching it on or off
will produce a click.
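A minimal sketch of "stop oscillating instead of zeroing": the 32-step triangle sequence only advances while both counters are nonzero, and when it stops, the output holds its current level, so there is no sudden jump to 0. `Triangle` and `triangle_tick` are hypothetical names; the function is assumed to be called each time the channel's timer expires.

```c
#include <stdint.h>

/* The NES triangle's 32-step output sequence. */
static const uint8_t TRI_SEQ[32] = {
    15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};

typedef struct {
    uint8_t step; /* position in TRI_SEQ */
} Triangle;

/* Advance only if both counters are running; otherwise hold the last level. */
static uint8_t triangle_tick(Triangle *t, int length_counter, int linear_counter) {
    if (length_counter > 0 && linear_counter > 0)
        t->step = (uint8_t)((t->step + 1) & 31);
    return TRI_SEQ[t->step];
}
```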
There was also a slight bug introduced into the SMB underworld music; this was
fixed too (a problem with the linear counter).
Implemented proper screen limits for PAL and NTSC (NTSC doesn't show top and
bottom 8 lines but PAL does).
Loading message now shows for entire duration of loading.
I tried to implement mapper 66 (Mega Man, SMB Duck Hunt) but it seemed to fail
totally at making either game work, so I went back to making Mega Man just use
mapper 2, which it works perfectly with.
I added partial mapper 71 support (Big Nose the Caveman). Solstice is also
supposed to use this mapper, but it works better with mapper 7 so I have it use
that. Big Nose Freaks Out works until the game itself starts, then it freezes.
It does some pretty weird things with scrolling, so I think I'll just leave it be.
I don't know of any other mapper 71 games that are working.
I had a bug in the MMC3 SRAM enable/disable loops, they were overwriting
registers in use by the CPU emulator. Now Startropics works perfectly, the
intro music plays, there is no longer a graphics problem upon entering the
"test of island courage", and the jumping sound always works right. I fixed a
similar issue in the Arkanoid paddle handler, but it didn't seem to affect
Tested 3D Block (3D Tetris) and Startropics 2 for the first time in a while,
both work. Rad Racer no longer looks as good as it used to.
Tada! With a new timing innovation (not just one copied from FCEU) I've
solved all the bouncing problems in the several Rare games that had them
(Wizards & Warriors 3, Battletoads, Battletoads Double Dragon). This also fixed
the periodic graphics glitching in Elite, so now it is practically perfect. The
only remaining flaw is the fault of the author (it runs in PAL mode, not
NTSC), but fortunately Neon64 is able to handle that.
with PAL mode active
For the longest time I thought that was my fault... then I finally got
around to testing it in FCE Ultra. Same results.
I found that the Star Wars problem was an MMC3 IRQ timing issue and not
something fundamentally wrong with scrolling. I fiddled the timing a bit to fix
that problem, but it caused some other very minor problems. I don't know if it
is worth striving for total accuracy in this area, as the errors are very small
and in no way affect gameplay.
By disallowing writes to CHRROM I fixed a slight glitch on the Star Wars
title screen (not the star destroyer problem), and the RTC demo now fails its
emulator test and works properly. The Dropoff 7 demo now also displays its title
I also removed an erroneous optimization I had made to the CHRRAM pattern table
compiler: I was only compiling when the second bitplane had been written, but now I
compile upon a write to either. This fixed the reversed colors in the
Wizards and Warriors 3 intro and made Princess Tomato in Salad Kingdom playable
(you can see text now!)
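Why either write matters: an NES tile row's color comes from one bit in each of two bitplanes, stored 8 bytes apart, so a write to either plane changes the decoded pixels. The sketch below shows the decode and a write hook that recompiles on any write; `decode_tile_row` and `chr_ram_write` are hypothetical names, and Neon64's actual compiler output (an N64 texture) is not modeled.

```c
#include <stdint.h>

/* NES 2bpp tile: bytes 0-7 are bitplane 0, bytes 8-15 are bitplane 1. */
static void decode_tile_row(const uint8_t tile[16], int row, uint8_t out[8]) {
    uint8_t plane0 = tile[row];
    uint8_t plane1 = tile[row + 8];
    for (int x = 0; x < 8; x++) {
        int bit = 7 - x; /* leftmost pixel is the most significant bit */
        out[x] = (uint8_t)(((plane0 >> bit) & 1) | (((plane1 >> bit) & 1) << 1));
    }
}

/* Hypothetical CHR-RAM write hook: store the byte, then recompile the
   affected row no matter which bitplane the write touched. */
static void chr_ram_write(uint8_t chr[], uint16_t addr, uint8_t value, uint8_t row_out[8]) {
    chr[addr] = value;
    uint16_t tile_base = (uint16_t)(addr & 0xFFF0); /* 16 bytes per tile */
    int row = addr & 0x07;                           /* same row in either plane */
    decode_tile_row(&chr[tile_base], row, row_out);
}
```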
I made many changes to the order of things in the emulation loop to more
accurately reflect FCEU's EmLoop, and lo and behold Bomberman works with nary a
glitch. Capcom OST also switches smoothly. I made some more changes and got both
Double Dragon and W&W2 working with no status text bouncing.
I fixed a bug I had introduced in MMC1 when I fixed Dragon Warrior 3 and 4,
Monster Party now works again. I had to adjust the number of cycles before NMI
significantly before I could get both Big Foot and Dragon Warrior 3 working
properly in the same version. Unfortunately the optimum arrangement involves a
small amount of status text bouncing in Battletoads.
I now have only minor graphical or sound errors in every licensed USA game I
know of (with a mapper I support).
I've compressed the main program with RNC Pro-Pack to get it down to 22K from 80K. It was one
of the easiest things I've ever done...
I changed the ROM detection system to be able to find the ROM at the very end of
the code and not just at 2MB, this is for an experimental method to allow
creation of a ROM CD with the CD64. By the way, if anyone has a CD64 and would
like to test it let me know!
Made a slight change so that the FPS meter wouldn't interfere with the
Wrote all of the December log entries online. A lot of work, no?
Several games that did nothing but crash before (Solstice and Big Foot) are
now working. I attribute this to the fact that I just removed a stupid
irritating debugging tool I've had sitting around for months, which checks for
stack overflow constantly. This had been left in because for some reason removing
it slowed emulation down considerably. Removing it now has not changed speed at
all that I can tell, but it's certainly saved the processor some work and the
program some space. I know this directly fixed flickering in
Marble Madness, which works quite well. It also fixed a slight flickering in
the "door opening"/"door closing" effect when entering/leaving a town in FF.
I fixed a graphics corruption issue with the backup unit version's reset
function (most noticeable in SMB3).
I now clip the top and bottom 8 lines of the screen (all the work is actually
done by the RDP, I just changed the "scissor" value), as I think the NES is
supposed to do. Now games with one-screen vertical scrolling (like Final Fantasy) no longer have a line of garbage at the bottom.
Double Dragon problem was due to my handling of reads from valueless
registers, I fixed that and it now works.
Bomberman shows a blank screen after the stage number, though the music still
I figured out an odd slowdown that was happening after loading a new ROM in the
GS version, and in doing so I found the ultimate source of the cache problems that
had plagued me for so long and fixed them (it involved initialization, I would
set the screen buffer to 0 and then draw there, but this would overwrite code!)
Implemented a full save system, with three saves per controller pack and
name entry so you know what saves there are (and which to delete when out of
Fixed a slight timing issue: instructions that store across a page boundary
don't get the extra-cycle penalty; only read instructions do.
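On a real 6502, `LDA abs,X` takes 4 cycles plus 1 when the indexed address crosses a page, while `STA abs,X` always takes 5 whether or not a page is crossed, so a store never pays an *additional* crossing penalty. A hypothetical helper showing that split for absolute,X (absolute,Y and (indirect),Y follow the same read/store pattern):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when base + index lands on a different 256-byte page than base. */
static bool page_crossed(uint16_t base, uint8_t index) {
    return ((base + index) & 0xFF00) != (base & 0xFF00);
}

/* Cycle count for an absolute,X instruction such as LDA or STA. */
static int abs_x_cycles(uint16_t base, uint8_t x, bool is_store) {
    if (is_store)
        return 5;                                  /* fixed cost, crossing or not */
    return 4 + (page_crossed(base, x) ? 1 : 0);    /* reads: +1 only on crossing */
}
```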
By waiting 4 cycles instead of 35 between setting the vblank flag and triggering
the NMI, I now have Deadly Towers working properly. The two values came from
two different versions of the FCEU source I had looked at.
Added a PAL mode which simply extends the vblank period by 50 lines, doesn't
change the audio generation rate though. Elite and Asterix work nearly perfectly
in this mode.
For some reason Double Dragon does not get past the start screen.
Added support for mapper 34, but only the Deadly Towers portion. Mapper 34
is two mappers combined under one number: one is for games with VRAM like
Deadly Towers, the other is for games with CHRROM like Impossible Mission 2. I
am not supporting the Impossible Mission 2 part (it will generate an unsupported
mapper error if there is CHRROM present). This also means that I now check all
8 bits of the mapper number instead of just the low 4. I had to add a hack to
get Mega Man working with this: for some reason it is listed as mapper 66 but
runs fine with mapper 2, though perhaps 66 is subtly different, as SMB+Duck
Hunt is mapper 66 and doesn't work right.
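In the iNES header the low nibble of the mapper number is the upper nibble of byte 6 and the high nibble is the upper nibble of byte 7, which is where the "all 8 bits" come from (byte 5 holds the CHR ROM size in 8K units, 0 meaning CHR RAM). A sketch with hypothetical function names; note that some old dumps have junk in bytes 7-15, which is presumably why many loaders once read only the low nibble.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full 8-bit mapper number from an iNES header. */
static uint8_t ines_mapper(const uint8_t header[16]) {
    return (uint8_t)((header[6] >> 4) | (header[7] & 0xF0));
}

/* Number of 8K CHR ROM banks; 0 means the board uses CHR RAM. */
static uint8_t ines_chr_banks(const uint8_t header[16]) {
    return header[5];
}

/* Mapper 34 without CHR ROM is the Deadly Towers-style board (BNROM);
   with CHR ROM it is the NINA-001 variant, which is rejected here. */
static bool mapper34_supported(const uint8_t header[16]) {
    return ines_mapper(header) == 34 && ines_chr_banks(header) == 0;
}
```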
Deadly Towers only works until you go into the first room, then the controller
stops working. It looks like the NMI is called while still in the NMI handler.
Added menu system.
I added a function to restart the GS version so that a new ROM can be loaded
without having to restart the N64 and patch the emulator in again, but there
were some issues with cache and nonsense. I seem to have gotten it to work.
Changed some UI aspects, made a neat new ASCII logo (though it could use
I'm going to stop continuously working on Neon64 after the next release (1.1 is coming up soon, a whole lot better than 1.0 and 1.05), and I was wondering if there are any games people would like to see emulated before that happens. Just send me a list and I'll tell you:
if they'll work in the current dev version
if they might work before the release
if I'll never support them (within the foreseeable future)
It would probably be a good idea to test the games on the current version before
Bugs fixed today:
The issue of Dragon Warrior 3 and 4 not working was solved, I had not realized
that the 1st/2nd 256K PRGROM switch applied to the hardwired bank as well.
This also enabled FF1+2 to run, however when I tried to play FF1 it crashed when
I attempted to strike up a conversation with anyone. This is only with the FF1+2
combined cart, FF1 by itself works fine. I didn't notice any problems in the
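A sketch of the Dragon Warrior 3/4 fix: on 512K MMC1 boards (SUROM), a bit in the CHR bank register picks which 256K half of PRG ROM is visible, and that outer bit also applies to the "hardwired" bank at $C000. This assumes, as many emulators do, that bit 4 of the CHR bank 0 register is the selector; `Mmc1` and `mmc1_prg_bank` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint8_t control; /* bits 2-3: PRG banking mode */
    uint8_t chr0;    /* on SUROM, bit 4 selects the 256K half */
    uint8_t prg;     /* bits 0-3: 16K bank within that half */
} Mmc1;

/* 16K bank mapped at $8000 (slot 0) or $C000 (slot 1). */
static unsigned mmc1_prg_bank(const Mmc1 *m, int slot) {
    unsigned outer = (m->chr0 & 0x10) ? 16u : 0u; /* applies to every mode */
    unsigned inner = m->prg & 0x0Fu;
    switch ((m->control >> 2) & 3) {
    case 0:
    case 1: /* 32K mode: ignore the low bit */
        return outer + (inner & ~1u) + (unsigned)slot;
    case 2: /* first bank of the half fixed at $8000 */
        return slot == 0 ? outer : outer + inner;
    default: /* last bank of the half fixed at $C000 */
        return slot == 0 ? outer + inner : outer + 15u;
    }
}
```

Without the `outer` term in the fixed-bank case, the $C000 bank would always be bank 15 of the first 256K, which is exactly the mistake described above.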
I removed the irritating sound from Bubble Bobble by extending the "if
wavelength < 8 disable channel" thing to the triangle channel. Now isn't that
I still need to remember to go back in and optimize (ever so slightly) the sp0
hit detection. I don't need to cycle through each pixel of the sprite now that
I'm just checking for any active pixels at all; I can just OR the bitplanes
together and check if the result is nonzero.
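The OR trick works because a sprite pixel is transparent only when both of its bitplane bits are 0, so `plane0 | plane1` for a row is a mask of its opaque pixels. A hypothetical sketch (sprite flipping and 8x16 sprites are ignored):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask of opaque pixels in one row of an 8x8 NES tile
   (bytes 0-7 are bitplane 0, bytes 8-15 are bitplane 1). */
static uint8_t opaque_mask(const uint8_t tile[16], int row) {
    return (uint8_t)(tile[row] | tile[row + 8]);
}

/* True if the sprite has any opaque pixel at all. */
static bool sprite_has_opaque_pixel(const uint8_t tile[16]) {
    for (int row = 0; row < 8; row++)
        if (opaque_mask(tile, row) != 0)
            return true;
    return false;
}

/* First row containing an opaque pixel, or -1 if the sprite is blank. */
static int first_opaque_row(const uint8_t tile[16]) {
    for (int row = 0; row < 8; row++)
        if (opaque_mask(tile, row) != 0)
            return row;
    return -1;
}
```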
Additionally I just recently found
Wizards & Warriors 2 (it is called Ironsword) and noted that the following
happens when you enter a store without any money:
I found that the Battletoads hack caused Zelda 2's status text to glitch. I then
found a much better hack which not only improves Battletoads but causes no known
problems in other games: instead of checking for an intersection between sprite
0 and the background I just check for the first non-transparent sprite pixel
(thinking about it I could do this a whole lot more efficiently than I do now).
The Battletoads background is now properly aligned with the sprites.
For whatever reason the break flag is supposed to be set by PHP. I implemented
this and it fixed the R/W SR and BRK flags tests in NEStress.
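The "break flag" only exists in copies of P pushed to the stack: PHP and BRK push P with bit 4 (B) and bit 5 set, while IRQ and NMI push it with B clear. A minimal sketch of PHP; `Cpu`, `push`, and `op_php` are hypothetical, and the $0100 stack page is modeled as a 256-byte array.

```c
#include <stdint.h>

#define FLAG_B 0x10 /* "break" bit, only present in pushed copies of P */
#define FLAG_U 0x20 /* unused bit, always pushed as 1 */

typedef struct {
    uint8_t p;          /* status register */
    uint8_t s;          /* stack pointer */
    uint8_t stack[256]; /* stands in for $0100-$01FF */
} Cpu;

static void push(Cpu *c, uint8_t value) {
    c->stack[c->s] = value;
    c->s--; /* uint8_t arithmetic wraps within the page, like the 6502 */
}

/* PHP: push P with B and bit 5 set; P itself is left unchanged. */
static void op_php(Cpu *c) {
    push(c, (uint8_t)(c->p | FLAG_B | FLAG_U));
}
```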
For some reason Wizards & Warriors reads from $4006. I return 0 for unknown
I combined the Arkanoid paddle data into the second player controller so that
both two player games and Arkanoid work without modification.
I fixed two problems with sound. SMB2 sets the sweep shift amount to 0 to
stop the sweep of square channel 1, but I did not detect this. The other fix was
with the sweep end conditions, where the channel should be disabled with a
wavelength over $7ff or under 8; my code wasn't properly disabling the
channel. The two problems this fixed were in Dr. Mario and SMB3, and both
situations involved a thud sound.
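Both fixes can be sketched together, following the pulse sweep unit as the NESdev wiki documents it: a shift of 0 means the sweep never changes the period, and the channel is silenced when the period is below 8 or the sweep's target period exceeds $7FF. Names are hypothetical; the sweep divider and reload logic are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t period;      /* 11-bit timer period ("wavelength") */
    uint8_t shift;        /* sweep shift count, bits 0-2 of $4001/$4005 */
    bool negate;          /* sweep direction */
    bool ones_complement; /* true for square 1, which subtracts one extra */
    bool enabled;         /* sweep enable bit */
} Pulse;

static uint16_t sweep_target(const Pulse *p) {
    uint16_t change = (uint16_t)(p->period >> p->shift);
    if (p->negate)
        return (uint16_t)(p->period - change - (p->ones_complement ? 1 : 0));
    return (uint16_t)(p->period + change);
}

/* The end conditions: too low a period, or an upward sweep past $7FF. */
static bool pulse_muted(const Pulse *p) {
    return p->period < 8 || (!p->negate && sweep_target(p) > 0x7FF);
}

/* Called when the sweep divider fires; shift 0 leaves the period alone. */
static void sweep_clock(Pulse *p) {
    if (p->enabled && p->shift != 0 && !pulse_muted(p))
        p->period = sweep_target(p);
}
```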
After fixing these problems the program ran noticeably slower, but not because
of an increased workload. The ordering of the various included files greatly
affects the speed of the program, due, I expect, to caching issues. I managed
to get everything to a very nice speed by putting all includes except the
read and write handlers at the very start of the code, and I left the read and
write handlers at the end (just before the logo data).
I also noticed a new sound problem: Startropics doesn't have any sound at all
(except for an occasional bizarre glitchy squeal) until you talk to the island
chief. The sound cuts off again just when you enter the "Test of Island Courage", but quickly returns. When the music is active it sounds perfect... I tried
reenabling frame IRQs, but that was not helpful. The sound registers are all
zeroed until sound is activated.
I also found a problem which may go a long way towards fixing some other issues.
The Dropoff 7 demo is essentially just a DMC demo, but it is supposed to have a
title screen. The title screen does not appear in Neon64, though the sound works
"perfectly". Since the graphics setup is likely very simple it should be easy
to disassemble this code and see how it works, and from there I can find my
On a positive note, the To-Do list fits on a single screen for the first time
Ys and RC Pro-Am were fixed by recognizing that there are two one-screen
mirroring modes in the MMC1. This tidbit of information was found in FCEU.
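In the MMC1 control register, bits 0-1 select mirroring: 0 and 1 are the two one-screen modes (lower or upper nametable), 2 is vertical, and 3 is horizontal. A sketch of how that maps a $2000-$2FFF address into the console's 2K of nametable RAM; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    ONE_SCREEN_LOWER = 0,
    ONE_SCREEN_UPPER = 1,
    VERTICAL = 2,
    HORIZONTAL = 3
} Mirroring;

static Mirroring mmc1_mirroring(uint8_t control) {
    return (Mirroring)(control & 3);
}

/* Map a nametable address to an offset in 2K of nametable RAM. */
static uint16_t ciram_offset(Mirroring m, uint16_t addr) {
    unsigned table = (addr >> 10) & 3u; /* which of the four logical nametables */
    unsigned offset = addr & 0x3FFu;    /* offset within a 1K nametable */
    switch (m) {
    case ONE_SCREEN_LOWER: return (uint16_t)offset;
    case ONE_SCREEN_UPPER: return (uint16_t)(0x400u | offset);
    case VERTICAL:         return (uint16_t)(((table & 1u) << 10) | offset);
    case HORIZONTAL:
    default:               return (uint16_t)(((table >> 1) << 10) | offset);
    }
}
```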
Implemented support for greyscale mode; the only games I know of that use it do so only
for effects, like flashing the screen (FF3 does this when you enter a battle, and
SMB3 also does it when you have beaten a castle).
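Greyscale is bit 0 of $2001 (PPUMASK); when it is set, the PPU keeps only the upper two bits of each 6-bit palette entry, which selects the grey column of the NES palette. A hypothetical helper:

```c
#include <stdint.h>

#define PPUMASK_GREYSCALE 0x01 /* bit 0 of $2001 */

/* Apply greyscale to a palette entry before looking up its RGB color. */
static uint8_t apply_greyscale(uint8_t ppumask, uint8_t color) {
    if (ppumask & PPUMASK_GREYSCALE)
        return (uint8_t)(color & 0x30); /* keep only the brightness row */
    return (uint8_t)(color & 0x3F);
}
```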
I also discovered that the frame IRQ causes Wizards & Warriors 3 to crash, so I
left the IRQ itself out, though I left in the rest of the logic for it. I see no
reason to leave
it in, it doesn't help anything.
Began to implement support for the Zapper, but a complete lack of success in
actually hitting anything (the trigger worked fine) made me pause. I did
successfully implement the Arkanoid paddle, though it seems to me to be a great
deal harder to use. It's nice to have some true "analog" control in there,
though. In order to allow the user to select this feature I intend to implement
a menu for configuration.
I got the brick breaking sound in Super Mario Brothers working by putting
the frame counter for sound channel length right at the beginning of vblank
instead of after it. This was done by accident trying to fix another problem.
This also made Samus' footfalls in Metroid sound right.
I also added support for sound frame IRQs, but no game yet seems improved by
them (I have it to the point where it doesn't break anything, either).
FluBBa's NEStress enabled me to find a problem with the TXS instruction, it
should not affect the flags at all. This doesn't seem to have fixed anything at
all, but it's nice to know that I'm just a little more accurate than I was
before. It also pointed out some other problems which I'll look into.
I also implemented color deemphasis (the three high bits of $2001). The
programs I know the behavior of seem to work properly with this, Chris Covell's
Wall Demo, Super Spy Hunter (when paused), and Final Fantasy (upon entering a battle).
I also doubled the size of the pattern cache. Teenage Mutant Ninja Turtles 2 and
3, as well as Xenophobe, run at a good speed as a result.
Reverted to an old version of sprite 0 hit detection, as the new
experimental one gave me nothing but headaches. Solar Wars scrolls properly
anyway. The following screenshots are from working games, new and old (but only
Fixed by proper formatting of $4016.
This used to crash
at various points but is now at least playable in the first level (a slight
graphical glitch in the intro is the only problem I've seen).
incredible little unlicensed game...
FF2: This used to have scroll problems, but not anymore. The translation intro is
messed up, but I'm told it doesn't work on the hardware anyway. And a little note from Wizards & Warriors 3
I think I've located the thread that binds Battletoads and Elite in
glitchiness, both have the screen turned off at the beginning of the drawing of
a screen. Also, it seems that Elite reads from VRAM 8 times for no apparent
reason, yet this is vitally important to getting the game working right.
Many thanks to Jsr from the NESDev forum for pointing out my problem with controller access: I had to OR all of the
values read from $4016 with $40. Now Mad Max, Dirty Harry, Paperboy, and
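A sketch of a standard-controller read that returns the $40: only bit 0 carries the button state, and the upper bits are usually explained as open bus left over from the high byte of the address $4016. `Pad` and its functions are hypothetical; bit 0 of `buttons` is assumed to be A, in the usual A, B, Select, Start, Up, Down, Left, Right order.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t buttons; /* current button state, bit 0 = A */
    uint8_t shift;   /* latched copy being shifted out */
    bool strobe;     /* last value written to bit 0 of $4016 */
} Pad;

static void pad_write_4016(Pad *p, uint8_t value) {
    p->strobe = (value & 1) != 0;
    if (p->strobe)
        p->shift = p->buttons;
}

static uint8_t pad_read_4016(Pad *p) {
    if (p->strobe)
        p->shift = p->buttons; /* while strobe is high, keep reloading */
    uint8_t bit = p->shift & 1;
    if (!p->strobe)
        p->shift = (uint8_t)((p->shift >> 1) | 0x80); /* 1s after 8 reads */
    return (uint8_t)(bit | 0x40);
}
```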
I should test Super Mario Brothers Adventure (SMB3 hack) and Flubba's
Improved timing (by adding the various +1 cycle when a page boundary is
crossed and added a delay between vblank flag set and NMI) to the point that the
status text in Battletoads no longer bounces as the character moves and the line
that appeared in the Vulture (in the scene just before the game starts) has been
eliminated. I still
need the sprite 0 hit hack to make BT work, but now I know why:
When the game starts the one-screen mirroring is set up wrong, and the screen is
set to a blank screen until an sp0 hit is detected. Of course with a blank screen there is nothing for sp0 to hit, so nothing happens. The only reason I was getting any results at all is that on skipped frames the faster sp0 hit detection
will always find a hit, because it is set in the first detection of sp0
regardless of background. I don't yet know how to get this to toggle right.
Reverted to the old SPRDMA method (copying the data to a memory location
set aside for SPRRAM) and the Castlevania and Battle of Olympus flickering
problems were corrected.
Fixed some timing issues: sound is now calculated at the proper 262 lines
per screen (it didn't work before because I had left the sound calculation out of
one line), and an h-retrace is 1 cycle longer (due to two rounding operations this
cycle was left off both here and the scanline, and Battletoads and W&W were not happy).
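One general way to keep rounding from eating cycles is to carry the fractional remainder from line to line. On NTSC a scanline is 341 PPU dots and one CPU cycle is 3 dots, so a line is 113 2/3 CPU cycles. The sketch below (hypothetical names, not Neon64's actual counting) hands out 113, 114, 114 cycles in turn, so three lines total exactly 341 cycles and nothing is lost.

```c
#include <stdint.h>

#define DOTS_PER_LINE 341u /* NTSC PPU dots per scanline */
#define DOTS_PER_CPU 3u    /* PPU dots per CPU cycle */

typedef struct {
    uint32_t dot_remainder; /* leftover dots not yet turned into CPU cycles */
} LineTimer;

/* Whole CPU cycles to run for the next scanline, carrying the fraction. */
static uint32_t cpu_cycles_for_line(LineTimer *t) {
    uint32_t dots = t->dot_remainder + DOTS_PER_LINE;
    t->dot_remainder = dots % DOTS_PER_CPU;
    return dots / DOTS_PER_CPU;
}
```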
Made sp0 hit on frames with gfx disabled (for speed) more accurate by recording
what line the hit was triggered on in the previous frame.
By putting v=t (see loopy's docs) after the junk scanline for loading line 0
sprites I was able to fix W&W without any hacks, as well as get Battletoads to
work almost perfectly (except that sp0 hit thing), all with the 512 cycle SPRDMA latency so that Castlevania works. Yay! The only remaining issue with 'vania
is the "C" on the title screen flickers. There was a similar problem when I
first implemented the new SPR DMA method...
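For readers without loopy's document at hand, here is a sketch of the registers it describes: v (current VRAM address), t (temporary address), x (fine X scroll), and the w write toggle. At frame start, with rendering on, v=t; at the start of each scanline only the horizontal bits are copied. The register names follow loopy; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t v; /* current VRAM address */
    uint16_t t; /* temporary VRAM address (top-left of the screen) */
    uint8_t x;  /* fine X scroll */
    bool w;     /* first/second write toggle for $2005/$2006 */
} Loopy;

/* $2006: first write sets the high 6 bits (clearing bit 14), the second
   sets the low byte and copies t into v. */
static void write_2006(Loopy *l, uint8_t d) {
    if (!l->w) {
        l->t = (uint16_t)((l->t & 0x00FF) | ((d & 0x3F) << 8));
    } else {
        l->t = (uint16_t)((l->t & 0xFF00) | d);
        l->v = l->t;
    }
    l->w = !l->w;
}

/* Frame start, rendering enabled: v = t (the fix described in the log). */
static void frame_start(Loopy *l) {
    l->v = l->t;
}

/* Start of each visible scanline: copy only coarse X and the horizontal
   nametable bit from t. */
static void scanline_start(Loopy *l) {
    l->v = (uint16_t)((l->v & 0xFBE0) | (l->t & 0x041F));
}
```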
I had fiddled with sp0 timing quite a bit, and the Solar Wars title scroll is
now off by a line or two. Drat. And I must find out why sp0 doesn't work for
Battletoads, I'd rather not have any stupid hacks in there.
Wizards and Warriors 3 works perfectly, and Castlevania 2 now runs at a good
speed (before it was unplayably slow).
And now, just so you know this isn't one big lie, here are some screen shots.
Wizards & Warriors
Wizards & Warriors 3, the great abuser of graphics
And I just recently discovered an Easter Egg in Kirby's Adventure, so I loaded it in
Neon64 to see if it worked:
I currently do not know the cause of the black block on the top, but there was
a similar issue in Mike Tyson's Punch-Out.
Also, the dialogs in Final
Fantasy now cut off at the right point! Yay! I'll have to test Battle of
I changed MMC1 implementation a bit, to more correctly fit Matt Richey's
explanation of a reset, but it has had no noticeable effect.
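For context, here is the MMC1 serial port and its reset as commonly documented: data is loaded one bit per write (bit 0 of the value, least significant bit first), the fifth write's address picks the internal register, and a write with bit 7 set clears the shift register and forces the PRG mode bits (control bits 2-3) to 3. Names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t shift;   /* bits collected so far */
    int count;       /* number of bits collected (0-4) */
    uint8_t regs[4]; /* control, CHR bank 0, CHR bank 1, PRG bank */
} Mmc1Serial;

static void mmc1_write(Mmc1Serial *m, uint16_t addr, uint8_t value) {
    if (value & 0x80) { /* reset */
        m->shift = 0;
        m->count = 0;
        m->regs[0] |= 0x0C; /* PRG mode 3: last bank fixed at $C000 */
        return;
    }
    m->shift |= (uint8_t)((value & 1) << m->count);
    if (++m->count == 5) {
        /* Only the fifth write's address matters: $8000, $A000, $C000, $E000. */
        m->regs[(addr >> 13) & 3] = m->shift;
        m->shift = 0;
        m->count = 0;
    }
}
```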
Zelda intro scroll and Wizards and Warriors worked when I cleared the 8 sprite
and sp0 flag when the vblank flag is set instead of waiting until the end of
There was quite a nasty bug in the DMC IRQ, which would have messed with frame
toggling and MMC1 reg0, but fixing it doesn't help anything. Oh well. Bomberman 2 now works, but I'm not sure if I can attribute this to this fix. Probably not.
The games which were previously quite slow seem to be running at a good speed,
such as SMB3, Journey to Silius, and Zelda, I'm not really sure what to attribute this to.
I messed with SPRDMA timing a bit, and to that end I made the 6502 emulator
check if the cycle counter is > 0 before it executes a single instruction, but
again this messed things up a bit so I left the 512 cycle delay out.
I played around with the name table bits in the VRAM address register, but in
the end I decided to leave it alone and stick with a strict interpretation of
I got Battletoads working decently by triggering an sp0 hit at the end of a
sprite, whether it has hit the bg or not. This is not perfect, and is in fact a
dirty hack, but I don't yet know what aspect of sp0 hit I am emulating
I've noticed that Castlevania no longer works right, but will work very well
with an SPRDMA latency of 512, but this destroys Wizards & Warriors and messes
up Battletoads a lot. I'm just going to leave it for now and be happy with
W&W and BT working.
I've been running through the disassembly of Elite to try and find my
problem with it, and in the process I found an RLE routine. Pretty neat stuff.
A series of dirty hacks got Punch-Out!! working as well as it can. The thing
is that it would require greater precision than the Neon64 engine currently can
provide in order to work perfectly, so the only way to even approximate is via
hacks. Mostly what I did was disregard sprites and VRAM writes/reads, check the
first 16 (why? because it works.) tiles in a line for something to change the
MMC2 latch, and set the latch to $fd at the beginning of each frame. And here...
are my results!
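The hack as described can be sketched like this. It is deliberately not the real MMC2: the hardware flips its latches when the PPU fetches pattern data for tile $FD or $FE and keeps a separate latch per pattern table, whereas this hypothetical single-latch version just scans the first 16 background tile indices of each line, as the log says.

```c
#include <stdint.h>

typedef struct {
    uint8_t latch; /* $FD or $FE: which CHR bank register is in use */
} Mmc2Hack;

/* The latch is forced back to $FD at the beginning of each frame. */
static void mmc2_frame_start(Mmc2Hack *m) {
    m->latch = 0xFD;
}

/* Check only the first 16 tile indices of a line; a $FD or $FE tile
   sets the latch (the last one found wins). */
static void mmc2_scan_line(Mmc2Hack *m, const uint8_t tiles[32]) {
    for (int i = 0; i < 16; i++) {
        if (tiles[i] == 0xFD || tiles[i] == 0xFE)
            m->latch = tiles[i];
    }
}
```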
The glitches on the VS screen and on the intro to the Mike Tyson version are the
only ones I know of, they would require mid-scanline bank-switching to work
correctly and that ain't gonna happen.
I also fiddled with Battletoads a bit, but with no success.
I've a nice Zelda 2 screenshot I never bothered to upload, demonstrating that it
does indeed work:
Also, I've noticed some bizarre status bar problems with Ys and Princess Tomato
in Salad Kingdom. Xenophobe has been tested and is really slow.
Began to write support for MMC2. The Punch-Out! demo mode works, but the
actual in-game graphics are messed up (the status bar and crowd).
Notes: make sure sprite checker increments pointer to SPRRAM. Also, SPRRAM DMA
may take 512 CPU cycles, not 256.
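For comparison, the NESdev wiki documents a $4014 sprite DMA as halting the CPU for 513 cycles, or 514 when it starts on an odd CPU cycle, so the 512 guess is close. A hypothetical handler, assuming OAMADDR is 0 and that the caller passes a pointer to the 256-byte source page; which cycles count as "odd" depends on the emulator's own cycle counting.

```c
#include <stdint.h>
#include <string.h>

/* Copy one CPU page into sprite RAM and return the CPU stall in cycles. */
static unsigned oam_dma(uint8_t oam[256], const uint8_t page_data[256],
                        unsigned long cpu_cycle) {
    memcpy(oam, page_data, 256);
    /* 513 cycles, plus one alignment cycle when starting on an odd cycle. */
    return 513u + (unsigned)(cpu_cycle & 1u);
}
```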
Added a bit of a hack to MMC1 that allows Die Hard to work correctly. It
sets 8k CHRROM mode in reg0, but it still uses both reg1 and reg2 as if it were
in 4k mode. I check if any write is made to reg2, and if so I override the
reg0 setting and treat it as 4k. There is no reason why this should be happening,
but sometimes you must just go with what works.
It should be noted that even FCE Ultra doesn't do MMC3 IRQs right.
I've been working for a few days on getting MMC3 IRQ timing to work right.
It is impossible for it to be perfect, because Neon64 is only scanline-accurate
at best. It is now good enough, however, that Kirby and Super Mario Brothers 3
work almost perfectly (SMB3 has a few very minor glitches on screens where more
than one IRQ is done per screen.) As a side effect Earthbound's battle screens
now work perfectly, though there are occasional points in the game with minor
name table corruption, but it is highly playable. For unknown reasons Solar
Wars' music is now up to speed. Several games also no longer work: Star Wars,
Sonic 3D, Somario (all MMC3)
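For reference, here is the MMC3 scanline counter as commonly documented (the behavior of the later MMC3 revisions), clocked once per scanline, which matches the scanline-level accuracy the log mentions. On hardware the clock comes from PPU address line A12 rising, which is what real mid-line timing would depend on. Names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t counter;
    uint8_t latch;    /* reload value written to $C000 */
    bool reload;      /* set by a write to $C001 */
    bool enabled;     /* $E001 enables, $E000 disables */
    bool irq_pending;
} Mmc3Irq;

static void write_c000(Mmc3Irq *m, uint8_t value) { m->latch = value; }
static void write_c001(Mmc3Irq *m) { m->counter = 0; m->reload = true; }
static void write_e000(Mmc3Irq *m) { m->enabled = false; m->irq_pending = false; }
static void write_e001(Mmc3Irq *m) { m->enabled = true; }

/* Called once per rendered scanline. */
static void mmc3_clock(Mmc3Irq *m) {
    if (m->counter == 0 || m->reload) {
        m->counter = m->latch;
        m->reload = false;
    } else {
        m->counter--;
    }
    if (m->counter == 0 && m->enabled)
        m->irq_pending = true;
}
```

Note that with a latch value of N the IRQ fires on the (N+1)th clock after a reload, which is the kind of "off by one" detail the log keeps running into.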
Added mapper 11 support, now you can play all the fun Bible and pest
control- themed games you grew up with. But seriously, P'radikus Conflict
actually seems like a decent game.
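Mapper 11 is the Color Dreams board, and it is about as simple as mappers get: any write to $8000-$FFFF selects a 32K PRG bank with bits 0-1 and an 8K CHR bank with bits 4-7. A hypothetical sketch; bus conflicts and masking the offsets by the actual ROM size are left out.

```c
#include <stdint.h>

typedef struct {
    uint8_t prg_bank; /* 32K PRG bank at $8000-$FFFF */
    uint8_t chr_bank; /* 8K CHR bank at PPU $0000-$1FFF */
} ColorDreams;

static void mapper11_write(ColorDreams *m, uint8_t value) {
    m->prg_bank = value & 0x03;
    m->chr_bank = (uint8_t)((value >> 4) & 0x0F);
}

/* Offsets into the PRG and CHR ROM images. */
static uint32_t mapper11_prg_offset(const ColorDreams *m, uint16_t cpu_addr) {
    return (uint32_t)m->prg_bank * 0x8000u + (cpu_addr & 0x7FFFu);
}

static uint32_t mapper11_chr_offset(const ColorDreams *m, uint16_t ppu_addr) {
    return (uint32_t)m->chr_bank * 0x2000u + (ppu_addr & 0x1FFFu);
}
```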
Tried to work out the problems with Final Fantasy 1&2 (a combo cart). A lot
of my possible solutions involved initial values for the MMC1 registers,
but I only succeeded in confusing myself.
Fiddled with MMC3 IRQ timing a bit, seems to have fixed the little
flickering issue above the SMB3 status text. There does seem to be some "off by
one" issues, such as on the world select screen and in the room in the world 1
castle where the ceiling moves.
Made screen not swap when in debug mode, so that status text doesn't
end up on the other screen. Also made bad opcode trigger debug mode
so it would also enjoy this benefit.
SMB3 still a bit glitchy just above the status text...
Found problem with Zelda and Solar Wars scroll: since I now have
the spr DMA go directly through the RSP I never write the values to the
CPU's copy of SPRRAM. I fixed this by having the DMA routine copy the
sp0 data (first 4 bytes of SPRRAM) into the CPU's SPRRAM, where it can
be accessed by the sp0 hit syncer. Mach Rider works very well again.
Made sprite 0 detection a bit more appropriate, easy with the 8bpp
background texture where 0 is unconditionally transparent.
Zelda 2 status text now works right.
Implemented a system where sprite DMA is done directly by the RSP
instead of intermediately by the CPU first, had to write back the
values from cache first.
Sprite 0 flag is now set at (approximately) the correct point on the
line. Zelda status text cuts off a bit late, Mach Rider is horrible.
Solar Wars title is slow.
VRAM status is now only updated when it is changed, so I don't have to
be continuously accessing the RSP.
Implemented proper RMW operation (though not for zero page
instructions as they would never access any memory mapped I/O).
Also implemented MMC1 "too fast" exception.
With these modifications (both based on comments by Xodnizel) Bill &
Ted now works.
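The two changes interact: a 6502 read-modify-write instruction such as `INC abs` writes twice on consecutive cycles (first the unmodified value, then the result), and the MMC1 ignores a serial write that comes on the cycle right after another one. A sketch with hypothetical names; `first_cycle` is assumed to be the instruction's first cycle, so the writes land on its 5th and 6th cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tracks the last serial write so back-to-back writes can be rejected. */
typedef struct {
    bool has_written;
    unsigned long last_cycle;
} Mmc1WriteFilter;

static bool mmc1_accepts(Mmc1WriteFilter *f, unsigned long cycle) {
    bool accepted = !(f->has_written && cycle == f->last_cycle + 1);
    f->has_written = true;
    f->last_cycle = cycle;
    return accepted;
}

/* INC abs takes 6 cycles: read on cycle 4, write the old value on cycle 5,
   write the incremented value on cycle 6. Returns how many writes count. */
static int inc_abs_to_mmc1(Mmc1WriteFilter *f, unsigned long first_cycle,
                           uint8_t old_value, uint8_t writes[2]) {
    writes[0] = old_value;                /* dummy write of the original value */
    writes[1] = (uint8_t)(old_value + 1); /* real write of the result */
    int accepted = 0;
    accepted += mmc1_accepts(f, first_cycle + 4);
    accepted += mmc1_accepts(f, first_cycle + 5);
    return accepted;
}
```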
Figured out how to make the MMC1 CHRROM switch work right for
almost everything. I removed any special treatment for 8k except the
following: upon a write to reg1 (while in 8k mode) write that value +1
to reg2. This works in Zelda 2, Big Foot, and Die Hard (until you
change floors, but then activating the start menu clears up the
problem, which I still haven't identified. There is also occasional
name table glitching...)
Solved "line doubling" in both SMB and Zelda by putting both a
PipeSync and TileSync after each rectangle.
I made TLUT (palette) load only when the palette actually changes.
Interestingly, when I disabled changing it the palette remained in TMEM
even after N64 power had been turned off.
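The "only when it actually changes" idea is a dirty flag: palette writes compare against a cached copy, and the expensive reload happens at most once per frame and only when something differs. In this hypothetical sketch, `uploads` counts how often the reload would happen, standing in for the actual RDP TLUT load; palette mirroring ($3F10 and friends) is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t entries[32]; /* cached NES palette RAM */
    bool dirty;          /* set when an entry really changed */
    int uploads;         /* how many times the palette would be reloaded */
} PaletteCache;

static void palette_write(PaletteCache *c, unsigned index, uint8_t value) {
    index &= 31u;
    value &= 0x3F;
    if (c->entries[index] != value) {
        c->entries[index] = value;
        c->dirty = true;
    }
}

/* Called once per frame before drawing. */
static void palette_flush(PaletteCache *c) {
    if (!c->dirty)
        return;
    c->uploads++; /* hypothetical: the TLUT load would be issued here */
    c->dirty = false;
}
```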
I moved the MMC3 IRQ to before hblank; this seems to fix the jumpiness
in Kirby (start screen) and Crystalis (message boxes), but I suspect it
may be triggered a line too late (or early?). Plus, I don't have it
react properly to changes of the VRAM address. Fiddling the
timing by a line seems to mess it up in certain situations, so I'll
just leave it be for now. I don't have any mechanism for triggering it
during a scanline anyway.
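For reference while chasing that off-by-one, here is a sketch of the MMC3 scanline counter as it is commonly documented for emulator authors: $C000 sets the reload latch, $C001 requests a reload, $E000 disables and acknowledges the IRQ, $E001 enables it. This is not Neon64's code; the names are made up, and one call per rendered scanline stands in for the PPU A12 edges the real chip actually counts.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t latch;     /* value written to $C000 */
    uint8_t counter;
    bool reload;       /* set by a write to $C001 */
    bool enabled;      /* $E001 enables, $E000 disables */
    bool irq_pending;
} Mmc3Irq;

static void mmc3_write(Mmc3Irq *m, uint16_t addr, uint8_t value) {
    switch (addr & 0xE001) {   /* registers mirror across their 8 KB ranges */
    case 0xC000: m->latch = value; break;
    case 0xC001: m->reload = true; break;            /* value is ignored */
    case 0xE000: m->enabled = false; m->irq_pending = false; break;
    case 0xE001: m->enabled = true; break;
    default: break;
    }
}

/* Clocked once per rendered scanline in this simplified model. */
static void mmc3_clock_scanline(Mmc3Irq *m) {
    if (m->counter == 0 || m->reload) {
        m->counter = m->latch;
        m->reload = false;
    } else {
        m->counter--;
    }
    if (m->counter == 0 && m->enabled) {
        m->irq_pending = true;
    }
}
```

With a latch of N, the IRQ fires on the N+1th clocked line after the reload, which is exactly the kind of one-line offset worth checking against the jumpiness described above.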
Removed PC "optimization", W&W works.
Very, very rarely I see a line out of place. The flaw would likely be
invisible to the untrained eye. Both times I noticed it on the W&W
title screen, where the knight is facing off against two bad guys.
The status text in W&W bounces a lot. I'm not sure if this is new.
Maybe if the RSP DMAs the sprite data directly from the address
specified by the DMA command, we can save costly CPU overhead?
Writes to SPRRAM via $2004 (which I'm told no game uses) could
be made directly to the RAM in the RSP. Something to consider,
since sprite DMA efficiency seems to be very important in some
of the slower games (Sprite demo).
I think the slight MMC3 rearrangement has caused an unwanted line in
SMB3 (world select screen).
In addition to the music being slow, Solar Wars also occasionally
freezes after the planet select screen (waiting for music to end?)
Big Foot runs super slow, as well as having the Zelda 2/Die Hard
glitches. I've heard it has multi-split-screen scrolling; maybe it is
too big for the cache?
I moved the VRAM_V and VRAM_X registers to RAM; the RSP now DMAs them
when it wants to update them (along with the pages), instead of having
the CPU write the values directly to DMEM. This was intended to help
with the glitchy scrolling. It didn't work.
SMB sp0 is not flickering, but rather bouncing. Zelda status text does
the same thing. Happens no matter how much DMA protection and what
parts of sprite rendering I take out. Doesn't happen when paused (with
Journey to Silius had major scroll issues, it was a problem with the
new VRAM_V DMA. This was fixed by caching VRAM_V, VRAM_X, and VRAM_T.
I also had to perform a cache op (#25) each time I use VRAM_V or VRAM_X.
Put the gfxless PPU in a separate loop; it didn't help with the occasional
crash on start but may have made it a little faster.
Wizards & Warriors is broken, probably because of PC optimization.
I never knew that Journey to Silius and Zelda ran in 8x16 sprite mode.
This is interesting because both have speed problems; this may be
because a game in 8x16 sprite mode can have twice as many sprite pixels
on the screen at one time, and more pixels means less speed. To help
out with this a little I made 8x16 sprite switching a tad more
efficient (no noticeable speed increase).
I've found out some information as to why Bill & Ted doesn't work:
apparently instructions which read from memory, modify the value, and
then write the result first write back the original value. This,
combined with the fact that when writes are made quickly only the first
one is detected by the MMC1, should enable Neon64 to run Bill & Ted.
Also, only the address to which the last (fifth) bit is written
determines the MMC1 register which the value is written to.
This information, by the way, came mostly from a thread on Memblers'
NESemDev forum about FCEU, with some comments by Xodnizel himself.
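Putting those three facts together (the dummy write of a read-modify-write instruction, the MMC1 ignoring a write that immediately follows another, and the fifth write's address selecting the register), a C sketch of an MMC1 write handler might look like this. Stamping each write with its CPU cycle to detect back-to-back writes is an assumption made for illustration, as are all the names.

```c
#include <stdint.h>

typedef struct {
    uint8_t shift;     /* bits collected so far, LSB first */
    uint8_t count;     /* number of bits collected (0-4) */
    uint8_t regs[4];   /* control, CHR bank 0, CHR bank 1, PRG bank */
    uint64_t last_write_cycle;
    int has_written;
} Mmc1;

/* cpu_cycle: CPU cycle on which this write to $8000-$FFFF happens. */
static void mmc1_write(Mmc1 *m, uint16_t addr, uint8_t value, uint64_t cpu_cycle) {
    /* An RMW instruction writes the old value, then the new one on the very
       next cycle; only the first of such back-to-back writes is seen. */
    int too_fast = m->has_written && cpu_cycle == m->last_write_cycle + 1;
    m->has_written = 1;
    m->last_write_cycle = cpu_cycle;
    if (too_fast) return;

    if (value & 0x80) {            /* bit 7 resets the shift register */
        m->shift = 0;
        m->count = 0;
        m->regs[0] |= 0x0C;        /* and forces the fixed-last-bank PRG mode */
        return;
    }
    m->shift |= (uint8_t)((value & 1) << m->count);
    m->count++;
    if (m->count == 5) {
        /* Only the address of the fifth write selects the register. */
        m->regs[(addr >> 13) & 3] = m->shift;
        m->shift = 0;
        m->count = 0;
    }
}
```

Bill & Ted's trick then falls out naturally: the first (original-value) write of an INC on the mapper is accepted and the second is dropped.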
I also got an updated version of Brad Taylor's PPU doc, hopefully to
help me with the scroll bug and some other issues.
I also found a description of how monochrome mode and the various color
emphasis bits work from Chris Covell.
An idea that I had simmering for a while came to life today: I figured
out how to pipeline sprite pattern DMAs like I did with BG patterns.
This did make things a bit faster. There is some serious flickering.
I figured out why the fadeout wouldn't work sometimes, it was
simply that the double buffering was switching the screen, and so half
of the time the fadeout would be on the other screen.
Zelda crashed, seemingly at random because I haven't been able to
reproduce it. Crystalis had done the same thing...
I think that the CPU optimization implemented yesterday provides such
a small benefit yet still makes emulation less accurate... it should
by all rights be removed.
Yet the difficulty in such an operation is not to be underestimated.
I could replace codes.asm, 6502macs.inc, 6502defs.inc, and a6502.asm
with older versions... but I'd still have to make additional changes
to neon64.asm, sound.asm, and probably others.
Added a slight CPU optimization where the PC page isn't recalculated on
every read from PRGROM (or RAM, if there is self-modifying code). Unfortunately
this does not seem to have had much of an improvement in speed, and this
decreases compatibility measurably... oh well, it will stay for now, it's too
hard to undo.
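A C sketch of the general idea, assuming the CPU address space is split into 8 KB slots (the log doesn't say what granularity Neon64 uses, and these names are invented). The cached base pointer is recomputed only when the PC enters a different slot, and a bank switch invalidates it. Whether Neon64's version handled bank switches this way isn't stated, so this is not a diagnosis of the compatibility loss.

```c
#include <stdint.h>

#define PAGE_BITS 13                      /* 8 KB slots: an assumption */
#define PAGE_MASK ((1u << PAGE_BITS) - 1)

typedef struct {
    const uint8_t *bank[8];   /* memory backing each 8 KB slot of $0000-$FFFF */
    uint16_t pc;
    const uint8_t *pc_base;   /* cached pointer for the PC's current slot */
    int pc_slot;              /* slot pc_base was computed for; -1 = invalid */
} Cpu;

static uint8_t fetch_opcode_byte(Cpu *cpu) {
    int slot = cpu->pc >> PAGE_BITS;
    if (slot != cpu->pc_slot) {           /* recompute only on a slot change */
        cpu->pc_base = cpu->bank[slot];
        cpu->pc_slot = slot;
    }
    uint8_t b = cpu->pc_base[cpu->pc & PAGE_MASK];
    cpu->pc++;
    return b;
}

/* Without this invalidation the CPU would keep fetching from the old bank. */
static void switch_bank(Cpu *cpu, int slot, const uint8_t *mem) {
    cpu->bank[slot] = mem;
    cpu->pc_slot = -1;
}
```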
I also fixed a problem with CHRROM corruption involving that blasted cache
issue again, the ROM is now once more uncached (I think I changed this because
of GS). This fixes at least Mega Man 4, Mega Man 6, and Final Fantasy 3.
I also invalidate any cache overlapping the ROM, which fixes some
crashes, most notably Crystalis.
By removing some dmarealwaits in the PPU I was able to speed it up slightly,
with no ill effects. This was based on the assumption that the RDP runs slower
than the RSP's DMA, so I don't have to wait until a texture RSP DMA (out of DMEM
to DRAM) is complete to tell the RDP to draw something with that texture, as the
RDP probably isn't done drawing the previous primitive anyway. I did have to fix
some problems with the bg renderer due to dmawaits being inside DMA requests
instead of before them, as was true everywhere else in the PPU.
I fixed the Solar Wars title flickering problem by only updating VRAM_X at the
end of a scanline. Graphics are only scanline-accurate in Neon64 anyway
(which means it'll never be able to run certain games), so this doesn't change
anything (other than fixing that problem, though I'm not sure why that wasn't a
problem in the old version). I listened to the Solar Wars intro the first time
while testing this, and it seemed abnormally slow, not that the tones were
downshifted, just the tempo was off.
There are issues with the noise channel in NES Test (of course I have issues with
the noise channel everywhere anyway.)
I fixed a problem or two with my new alpha=1 graphics: in several places
where code cleared the screen, it filled it with zeros and this produced small
I stabilized the sprites (Kirby sprites jumped around a lot) by inserting a
tile sync at the end of the display list. I also implemented a slightly better
sprite 0 hit detection (when I revised the PPU for the RDP I made it just do the
hit on the first line of the sp0), which sets the flag on the first line of the
sprite with any set pixels in it. I don't know of any game that this isn't
currently working for, though it is a bit of a cheat.
Solar Wars seems to drop lines out of its scrolling flame effect at random.
There are also some problems with saving and resetting (part of the same system),
problems of the "it doesn't work" variety.
Also, Mega Man 6 has developed the same pattern corruption issue that Zelda 2
and Final Fantasy 3 already had.
It also seems that in some instances sprites don't flicker
properly, like in Castlevania (lines drop out in an odd way).
I got the background rendering working a little faster with the RDP, but
still only around 20 FPS (by a stretch of the imagination). I came up with a
better idea anyway, which let me draw the BG with the RDP yet still operate at
the speed of the RSP "compiled tile" version. The idea is that I have the
old background renderer write an 8bpp color indexed texture instead of an actual
16bpp truecolor line. I then have the RDP draw this line to the screen, using
the NES palette as a TLUT (texture lookup table). This is the same way I handle
sprites, so now all I do with the RDP is draw four rectangles (one to fill with
the background color, one for background sprites, one for the
background/playfield, and one for the foreground sprites, drawn in that order)
and all transparency is handled for me (except I have to have the sprites erase
each other...). I had to implement double
buffering for the video, because there was quite a bit of flicker caused by the
sequential drawing. I also synced the screen updates to the vertical
retrace so it looks considerably smoother now.
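The same compositing done on the CPU instead of the RDP, as a C sketch: each layer is an 8bpp line of color indices where 0 means transparent, the layers are painted back to front in the order given above, and the final index is expanded through a 16-bit TLUT. The function and its parameters are hypothetical; on the N64 the RDP does this per rectangle.

```c
#include <stdint.h>

#define LINE_WIDTH 256

/* Composite one scanline: backdrop, background sprites, playfield,
   foreground sprites. Index 0 in any layer is transparent. */
static void composite_line(const uint8_t *bg_sprites, const uint8_t *playfield,
                           const uint8_t *fg_sprites, uint8_t backdrop_index,
                           const uint16_t tlut[256], uint16_t out[LINE_WIDTH]) {
    const uint8_t *layers[3] = { bg_sprites, playfield, fg_sprites };
    for (int x = 0; x < LINE_WIDTH; x++) {
        uint8_t index = backdrop_index;        /* rectangle 1: backdrop fill */
        for (int l = 0; l < 3; l++) {          /* rectangles 2-4, back to front */
            uint8_t texel = layers[l][x];
            if (texel != 0) index = texel;     /* later opaque layers win */
        }
        out[x] = tlut[index];                  /* palette lookup to 16bpp */
    }
}
```

Because transparency is just "index 0", no per-pixel priority logic is needed beyond the draw order.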
The sprite priority problems in SMB3 and Castlevania were fixed by the new RDP
usage, as that was the main point of doing it anyway. I have the suspicion
that some games run a little slower (like Journey to Silius) but I haven't done
a side-by-side comparison yet, and they actually seem to run smoother anyway
from the double buffering. I should also note that when I press reset in
Journey to Silius the fade to red routine which I have inside the reset
interrupt handler is not run. I have not noticed this in any other game, perhaps
it is a clue as to the game's extraordinary slowness?
Expect an update to both the backup unit and gameshark versions soon, after I
implement a few more optimizations I'd like to try.
Back on the subject of resets, I've found that a reset can be delayed by
constantly issuing SI DMA requests, i.e. if you press the reset button but the
N64 program is constantly issuing DMA requests the N64 will not reset, until,
that is, the DMAs stop. Another issue is the freeze at boot. For some reason
every N64 program must write a certain value to a certain place in PIFRAM (IIRC)
or the N64 will freeze. It turns out that constantly issuing SI DMAs will also
prevent this freeze from occurring. I have not yet done any quantitative
experiments to find out exactly how frequent the DMAs must be to prevent a reset
or freeze. Also, since the freeze on boot can be prevented either by writing to
PIFRAM or by issuing DMAs, I thought that maybe reset might be stopped in the
same way. It didn't work, though, when I tried it. When I had my reset handler
write the value it didn't reset, but merely froze. Maybe the interrupt line was
Background rendering with the RDP is also complete, but everything is very,
very slow at the moment. It comes down to an issue of waiting for certain
things at the right time and no other.
I've been able to convert the sprite rendering portion of Neon64 to use the
RDP, next comes the background.
I also merged the two versions of Neon64 into a single pile of code so they can
both be updated at once.
I'm noticing a bit of a pattern in the log, I say something is wrong and
then the next day I contradict myself.
It turns out that RDP control from the RSP is almost exactly the same as from
the CPU, in fact the way I was doing it was correct, there was just so much else
wrong with my test program that it wouldn't work.
I did learn that it is apparently impossible for the RDP to read textures or
palettes from RSP DMEM (or IMEM, I tried), even with the fully qualified
address (0xa4000000). This means I'll have to use another SP DMA to get my
rendered sprites out of DMEM and into RAM where the RDP can get to them.
Controlling the RDP from the RSP is not like using it from the CPU.
No success yet.
I would like to apologize to the RDP, there's nothing odd about the palette
at all. I was just getting odd effects because it appears that you can't make a
texture less than 16 texels wide, and I was trying to do it with 8. By padding
my textures, everything works swimmingly. I haven't done it through the RSP yet,
that's the next step.
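A sketch of the padding step in C, assuming 8bpp indexed texels and the 16-texel minimum observed above (the underlying hardware reason isn't established here). Padding texels are set to index 0 so they stay transparent; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MIN_TEX_WIDTH 16   /* minimum width observed in the entry above */

/* Copy a w x h 8bpp texture into dst with each row padded to at least
   MIN_TEX_WIDTH texels. dst must hold padded_width * h bytes.
   Returns the padded row width. */
static int pad_texture(const uint8_t *src, int w, int h, uint8_t *dst) {
    int padded_w = w < MIN_TEX_WIDTH ? MIN_TEX_WIDTH : w;
    for (int y = 0; y < h; y++) {
        memcpy(dst + y * padded_w, src + y * w, (size_t)w);
        memset(dst + y * padded_w + w, 0, (size_t)(padded_w - w));
    }
    return padded_w;
}
```

An 8-wide sprite row then occupies 16 texels, and the rectangle drawn with it still only covers the left 8.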
Not only have I figured out textured rectangles, but I've got palettized
textures working, too. The only problem is that they're really weird and I don't
fully understand them yet.
I also have a fairly solid system for sprite/bg priority set up, which should
fix the SMB3 and Castlevania sprite priority glitches. It turns out that things
are a tad more complex than they appear.
I also made several modifications to U64ASM:
support for parentheses and commas in macro parameters
trapping divide by zero errors
fixed a problem with multiline macros with 10 or more parameters
I finally got around to looking through the RDP source Destop sent me, I've
been happily drawing variously colored (even striped) rectangles for a while
now. The next step is textured rectangles, where I see the greatest promise for
Neon64 acceleration. I also figured out a way to use the RDP for drawing
without sacrificing the scanline-at-a-time methodology I've been using: I need
only use the Set Scissor command to limit the RDP to drawing a single line.
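Packing that single-line scissor might look like this in C. The layout used (opcode 0x2D in the top byte, upper-left X/Y and lower-right X/Y as 12-bit 10.2 fixed-point fields, interlace field/odd bits left clear) follows the commonly published RDP command tables. Treating the bottom edge as exclusive, so that y+1 confines drawing to line y, is an assumption worth checking on hardware; the function name is hypothetical.

```c
#include <stdint.h>

/* Build a 64-bit RDP Set Scissor command that clips drawing to
   columns [x0, x1) of scanline y. */
static uint64_t rdp_scissor_line(unsigned x0, unsigned x1, unsigned y) {
    uint64_t xh = (uint64_t)(x0 * 4) & 0xFFF;        /* 10.2 fixed point */
    uint64_t yh = (uint64_t)(y * 4) & 0xFFF;
    uint64_t xl = (uint64_t)(x1 * 4) & 0xFFF;
    uint64_t yl = (uint64_t)((y + 1) * 4) & 0xFFF;   /* assumed exclusive */
    return (0x2DULL << 56) | (xh << 44) | (yh << 32) | (xl << 12) | yl;
}
```

Emitting one of these before each line's rectangles keeps the scanline-at-a-time structure while the RDP does the pixel work.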
I also fixed a bug in the controller strobe code, but it didn't fix
I think I may have found the source of the terrible slowdown in the
Gameshark version. I had enabled an interrupt accidentally, and I have no idea
what it is for, but apparently it was eating up a whole lot of cycles running
through my exception handler.
I implemented a frame rate monitor, accessible with the L button. It tells me
that most games run around 40 to 50 FPS.
I doubled the horizontal screen resolution at the intro/loading screen, which
allowed me to fit an old extended ASCII art logo there.
Of course, on a TV the aspect ratio works
right; it almost looks like normal text mode.
Fixed an error in the GS version which was introduced yesterday; SMB3
sprites now work right.
It should be pointed out that the GS version is abysmally slow when it comes to
sprites, in Journey to Silius with many sprites on screen I have seen the
framerate drop to about 2 FPS. The frameskip is just about perfect, though, I
have yet to hear the music skip.
You might ask why the Gameshark version of Neon64 has just been released while
the normal version has not been updated. The reason is that my v64jr has been very
unreliable as of late and I've rarely gotten more than a few seconds of use out
of it before it fails. The next big release, which will include the sprite
speedup (which is NOT in either version right now) will update both versions and
will be called v1.1. By then I may have worked out a general purpose loader for
the GS and separate versions may not be needed, who knows.
To clear up a bit of confusion, the GS version is indeed different from the
normal version. It has better frameskipping, bugs in the DMC (a sound channel)
have been worked out, and mapper 7 support (Wizards & Warriors) has been
improved. The GS version does not "enable use of the GameShark"; it is an
entirely separate version which can be used with the GameShark. The two
versions are not interchangeable.
Improved frameskip so that the music is actually kept working. Simplified
the sending procedure for Gameshark (down from 1,000 steps to 999!). Got
Wizards & Warriors working.
FF3 has significant sprite corruption in the GS version.
I have made modifications which allow me to run ROMs of larger size over the
GS, such as Kirby or Metroid. Neon64 runs noticeably slower over the GS...
Some of the slowdown (most noticeable on SMB, which slowed to a crawl) was
because I hadn't written back the NES ROM to RAM, it was all in cache. This
also caused some sprite corruption. Neon64 is still a bit slower on the
Gameshark, though, mostly when a lot of sprites are on screen the music will
break up under the strain. Interestingly enough, I just happen to be working on
a sprite speedup now.
I've successfully run Neon64 using only a GameShark! Go out and buy one if you
know what's good for you.
Once I figured out how to send code to the N64's RAM, all I had to do was find
some space in RAM to store the emulator and NES ROM and make some small changes
to the Neon64 initialization. I believe this is the first time the Gameshark
has ever been used in this way...
I'm still in the debugging process, but I have played Super Mario Brothers for
quite a while. Awesome!
I am indeed working on the sprite speedup, but since I don't have my
equipment with me I've been forced to test in an emulator, the retardation
of which is well known. I do think that there remain no massive technical
hurdles to o'erleap.
Note to self: have separate function for bg and fg sprites, just execute
I have an idea of how I can use compiled tiles for sprites, which will
help speed quite a bit, as sprites are currently the least efficient part of
the PPU. The problem is that this will require a complete redesign of several
components, and I don't really feel up to it now.
I've hit upon an interesting new idea, which may allow anyone
with a Gameshark to play Neon64. See my post at Dextrose for details.
Fixed DMC support, Kirby's Adventure doesn't crackle all the time now.
Whoops, big error in CRC calculation, fixed now.
Added a proper saving system. Still only one save, but it CRCs the game
which makes the save and asks you if you want to overwrite data from another
Added Delta Modulation Channel (aka DMC or DPCM).
I can draw black rectangles with the RDP.
Applied my solution to the noise channel, while not perfect it makes
Final Fantasy sound worlds better.
There are still issues, the sound of breaking blocks in SMB and Samus' footfalls
aren't right. I stopped the linear counter from automatically switching to
load mode on terminal count, this fixed several situations where the triangle
wave was playing too long (Castlevania pause, SMB underworld).
Improved speed throttling to eliminate the effect I used to get of half
the screen being rendered twice as often as the other half.
Made several changes that resulted in stopping the buzzing in Mega Man 2 before
the intro music starts.
Found the cause of a Zelda problem (on name entry screen), came up with a
good solution which might also be applicable to the noise channel. The issue
involved audio resolution.
Made essentially an infinite number of changes and additions.
Everything up through the noise channel is working pretty well. And by pretty
well I mean it kicks ass. I love game music.
I've had a lot of initial success with sound, but it remains initial
success. Songs and sounds are recognizable but so much is off so often...
It's going to take a lot more work, my overall audio generation engine is crap.
CPYABS was set up to use rRAM_A6502, which can only (correctly) access the
zero page. Thus essentially every use of CPYABS was broken, because any time
you might want to use the zero page you'd use CPYIMM for speed and size
savings. I found the same problem in CPXABS. This warrants closer scrutiny of
the CPU for this kind of error.
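The distinction behind that bug, sketched in C with a deliberately tiny memory map (2 KB RAM mirrored below $2000, PRG ROM from $8000, I/O omitted): a zero-page operand can index RAM directly, but an absolute operand has to go through the full map. The names are hypothetical, not taken from Neon64's 6502 core.

```c
#include <stdint.h>

typedef struct {
    uint8_t ram[0x800];    /* 2 KB internal RAM */
    uint8_t prg[0x8000];   /* 32 KB PRG ROM at $8000-$FFFF */
    uint8_t y;
    int carry, zero, negative;
} Cpu;

static uint8_t bus_read(const Cpu *cpu, uint16_t addr) {
    if (addr < 0x2000) return cpu->ram[addr & 0x7FF];   /* RAM and mirrors */
    if (addr >= 0x8000) return cpu->prg[addr - 0x8000];
    return 0;                                           /* I/O omitted */
}

/* Flag updates shared by CMP/CPX/CPY. */
static void compare(Cpu *cpu, uint8_t reg, uint8_t operand) {
    uint8_t diff = (uint8_t)(reg - operand);
    cpu->carry = reg >= operand;
    cpu->zero = reg == operand;
    cpu->negative = (diff & 0x80) != 0;
}

/* CPY zero page: an 8-bit address, so RAM can be indexed directly. */
static void cpy_zp(Cpu *cpu, uint8_t zp_addr) {
    compare(cpu, cpu->y, cpu->ram[zp_addr]);
}

/* CPY absolute: a 16-bit address that may point anywhere, so it must
   go through the memory map rather than straight into RAM. */
static void cpy_abs(Cpu *cpu, uint16_t addr) {
    compare(cpu, cpu->y, bus_read(cpu, addr));
}
```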
Super Mario Brothers 3 is now continuously playable!
Dragon Warrior is now playable!
Wizardry is now playable!
You have done well in defeating the Bug. Thy Experience increases by 1.
Interestingly, when I removed my debugging code Neon64 actually ran slower...
so I put it back in.
By removing the MMC3 IRQ count reload from latch (on disable) I was able to make
the moving-ceiling room of the world 1 castle (MMC3) have proper status text.
I implemented mapper #7 (AOROM), which is used on many Rare games, but when I
tried it with Wizards & Warriors it was pretty much unplayable.
I've been trying to implement sound scheduling and so on, but I've been running
into some trouble. Quite a bit actually.
I fixed the jumpiness in Total Recall (et al).
It seems that when IRQs are disabled SMB3 runs fine, but that's as far as I've
I made another boot cart out of my Gameshark. Now I can run Neon64 without
worrying about the v64jr failing, in fact I don't even need to leave it
connected at all. I still haven't found the SMB3 problem, but I have narrowed it
Still unknown problem with SMB3, I'm closer to finding it but it is
becoming difficult, my v64jr keeps crashing after a while. I may need to
rebuild my boot cart just to get anything to work.
I fixed a few CPU errors involving improper behavior during stack overflow.
Now the SMB3 freezes completely instead of crashing. I expect some infinite
I fixed the bg clipping so that it actually works with scrolling, SMB3 and
Stars SE work much better now.
I found that PNG is much better than JPEG for these screenshots, so the below
images have been duly replaced.
I found several problems preventing SMB3 from running. First, I needed to
disable MMC3 IRQs when the background is turned off. This allowed most of the
game to run correctly. However then, upon exiting Toad's "line up the pictures"
game, the game crashed. I found that it was due to an error in the CPU core,
improper behavior when the stack overflowed. However, even upon fixing this,
the stack still overflowed, thus the game would freeze for several seconds and
then crash. I do not currently know why, but I suspect the CPU. Here are some
I also added a much more efficient tracer (no more bullet time while debugging)
and found out more about the Zelda 2 and Die Hard bugs (which increasingly
seem to be the same bug.)
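Regarding the stack overflow error in the CPU core mentioned in the SMB3 entries above, this is the 6502 behavior an emulator has to reproduce: the stack is fixed to page 1 and the stack pointer is 8 bits, so overflow and underflow wrap within $0100-$01FF instead of running into neighboring pages. A C sketch with hypothetical names:

```c
#include <stdint.h>

typedef struct {
    uint8_t ram[0x800];
    uint8_t sp;            /* 8-bit stack pointer */
} Cpu;

static void push(Cpu *cpu, uint8_t value) {
    cpu->ram[0x100 | cpu->sp] = value;
    cpu->sp = (uint8_t)(cpu->sp - 1);   /* $00 - 1 wraps to $FF */
}

static uint8_t pull(Cpu *cpu) {
    cpu->sp = (uint8_t)(cpu->sp + 1);   /* $FF + 1 wraps to $00 */
    return cpu->ram[0x100 | cpu->sp];
}
```

A game that overflows its stack then silently overwrites the top of page 1, rather than zero page variables or page 2 data.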
Fixed several issues with cache age processing, made Sack of Flour title
screen work. Also fixed an MMC3 PRGROM problem (in $a000 and $c000 mode), which
has allowed the SMB3 title screen to work perfectly, though when I try to start
a game it crashes. Also AD&D Dragon Strike now works, since I added a modulo
to the PRGROM switch (games seem to like to try to access stuff like page $3f
when they really mean $0f, since the game only has $10 pages in the first place:
$3f % $10 = $0f). I have added that same feature to CHRROM (since Dragon Strike's
gfx are still a bit off) and to MMC1 PRGROM and CHRROM (hopefully will improve
Zelda 2 and maybe Bill & Ted), but I have not yet been able to test it.
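The modulo trick in C. For the usual power-of-two bank counts it is the same as masking with (count - 1), which roughly mirrors how a cartridge with fewer banks simply leaves the higher bank-select lines unconnected. The function names are illustrative:

```c
#include <stdint.h>

/* Reduce a requested bank number to one that exists on the cartridge. */
static unsigned wrap_bank(unsigned requested, unsigned bank_count) {
    return requested % bank_count;          /* e.g. $3F % $10 = $0F */
}

/* Byte offset of the selected bank inside the PRG (or CHR) ROM image. */
static uint32_t bank_offset(unsigned requested, unsigned bank_count,
                            uint32_t bank_size) {
    return (uint32_t)wrap_bank(requested, bank_count) * bank_size;
}
```

With 8 KB banks, a request for $3F on a 16-bank cart lands at offset $1E000, the start of bank $0F.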
Tested, no improvements.
I found a nice image conversion program called pic2pic, it let me make a bunch
of screenshots (JPEG compressed) for ya:
There was an issue wherein the last entry in the compiled graphics
pointer table was not being transferred to the RSP; this is what
caused the Mega Man 3 (et al) problem. This was introduced by accident on
Another issue involved speed. Neon64 has been running slow since I
changed the ROM pointer to uncached. When I changed it to cached it
sped up but several games displayed sprite glitches. I reached a
solution by making two pointers, one cached and one uncached. The
cached version is used to access PRGROM, and the uncached version
is used to access CHRROM. I may do something similar with the VRAM
pointers, which need to be written uncached so the RSP can read
them, but they are read much more often than written. Some
well-placed cache instructions should solve this and boost speed.
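The two-pointer arrangement relies on the VR4300's unmapped segments: KSEG0 (0x80000000 to 0x9FFFFFFF) and KSEG1 (0xA0000000 to 0xBFFFFFFF) both map to the same physical memory, cached and uncached respectively. A small C sketch of converting between them; the ROM address in the test is made up for illustration.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u   /* cached window */
#define KSEG1_BASE 0xA0000000u   /* uncached window */
#define PHYS_MASK  0x1FFFFFFFu   /* 512 MB physical range both windows cover */

static uint32_t to_cached(uint32_t addr)   { return (addr & PHYS_MASK) | KSEG0_BASE; }
static uint32_t to_uncached(uint32_t addr) { return (addr & PHYS_MASK) | KSEG1_BASE; }
static uint32_t to_physical(uint32_t addr) { return addr & PHYS_MASK; }
```

Both pointers address the same bytes, so anything written through the cached one still has to be written back (the cache instructions mentioned above) before another processor reads it through memory.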
All Mega Man games, as well as SMB2 and FF3, now run without visible
Quick story: To find the first issue mentioned above I went to
the Media Center (computer lab with some books and a copy machine).
Usually I can set up my N64, V64jr, and laptop, connect the N64 to a
monitor (via the N64->VGA adaptor I finally got) and code away, but
today they wouldn't let me use a monitor (despite the fact that there
were a dozen free). So how does one debug a video game without video?
I loaded the ROM, pressed buttons on the controller from memory, took
a screen shot, then saved it as a BMP on the laptop, viewed it in
MS Paint. And that's how I found my bug. Good timing, eh?
This suggested the possibility of taking many screen shots rapidly
and displaying them instead of saving them. Video over parallel port...
Some issues have sprung up since 5/31/03, causing glitches between tiles in
just about every game.
It turns out that JrGrab sucked, so I wrote my own version. It supports any
resolution and color depth and I intend to release it, with full source code,
when I get it into a more final form. (Here it is
in case you're really interested.)
Here's a screenshot from MM6, dumped with my
screen capture utility.
There were at least three things wrong, and I managed to find them all:
1. The ROM was being treated as if it were in cache (which may have been what
crashed SMB3). This helped me to catch #2.
2. The 1K CHRROM switch was writing two pointers to the VRAM page table, which
was writing over name table pointers, so effectively pattern tables were being
used as name & attribute tables.
3. The mirroring register was backwards (I think this was goroh's mistake [edit: no, it was my misreading]).
Now Mega Man 3 is working almost perfectly, with just a small pattern table
confusion at the title screen and stage select screen. Haven't gotten to test
others yet, I only had 30 minutes. Screenshots will be available as soon as I
take them. I may incorporate JrGrab into Neon64 to save some trouble.
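To make the 1K CHRROM switch bug (#2 above) concrete, here is a C sketch of a 1 KB VRAM page table, assuming entries 0-7 cover the pattern tables and 8-11 the name tables, as the PPU's $0000-$2FFF layout suggests. The names are hypothetical. A 1K switch writes exactly one entry, so mapping slot 7 can't spill into entry 8, the first name table.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x400u   /* 1 KB */
#define NUM_PAGES 16       /* PPU $0000-$3FFF */

/* Entry i points at the memory backing PPU addresses
   [i * 1 KB, (i + 1) * 1 KB). */
typedef struct {
    const uint8_t *page[NUM_PAGES];
} VramPages;

/* Map 1 KB CHR bank `bank` into pattern-table slot `slot` (0-7). */
static void switch_chr_1k(VramPages *v, const uint8_t *chr_rom,
                          unsigned chr_banks_1k, unsigned slot, unsigned bank) {
    v->page[slot & 7] = chr_rom + (size_t)(bank % chr_banks_1k) * PAGE_SIZE;
}

static uint8_t ppu_read(const VramPages *v, uint16_t addr) {
    addr &= 0x3FFF;
    return v->page[addr / PAGE_SIZE][addr % PAGE_SIZE];
}
```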
Got MMC3 IRQs to work right, but it looks like I'm still having problems
in a variety of games (Zelda 2 included, which isn't even an MMC3 game). I have
something fundamentally wrong with name table/attribute table.
MMC3 doesn't crash, but the graphics are messed up. Does anyone know of a
document more detailed than Firebug's and in English?
Fixed Kung Fu; apparently it had sprite clipping on, and sprite 0 was in
the clipping zone, and thus not being caught. Apparently even if sprite 0 is
not being drawn it will still trigger an sp0 hit.
I'm trying to add MMC3 support...
Changed some stuff in an attempt to fix scroll problems (mostly just
removed old vestigial stuff). I reenabled the clearing of the vblank flag on
the recommendation of some other authors. I also discovered the reason for the
crashing of several games, I had miscoded the return from the MMC1 handler. I
was returning to ra, which is the return address for the instruction, not the
write function (which was stored at writera), so the instruction was exited
before the PC could be incremented, thus there was an attempt to execute the
middle of an instruction. Dragon Warrior has some bg corruption (wrong tiles in
the wrong places). Kung Fu freezes waiting for sprite 0, it looks like sprite 0
is completely transparent and thus would never trigger a hit.
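For comparison, here is the documented behavior of a $2002 (PPUSTATUS) read on the real PPU, which the reenabled vblank clearing follows. It is a minimal C sketch with hypothetical names, and it ignores the undefined low five bits.

```c
#include <stdint.h>

typedef struct {
    uint8_t status;     /* bit 7 vblank, bit 6 sprite 0 hit, bit 5 overflow */
    int write_toggle;   /* shared first/second-write latch for $2005/$2006 */
} Ppu;

/* Reading $2002 returns the flags, then clears the vblank flag and
   resets the $2005/$2006 write toggle. */
static uint8_t ppu_read_status(Ppu *ppu) {
    uint8_t value = ppu->status;
    ppu->status &= (uint8_t)~0x80;
    ppu->write_toggle = 0;
    return value;
}
```

A game polling $2002 in a loop therefore sees vblank set exactly once per frame.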
I also noticed an interesting phenomenon: when the debugger is waiting for a
button to be pressed and the reset button on the N64 is pressed nothing happens.
The reset only occurs after a button is pressed and the wait loop is exited.
The ability to delay the reset could be most interestingly exploited...
In the past 24 hours I completely revised Neon64 to work with the cache for
compiled tiles. Mechanized Attack, Jaws, and several others which required
midscreen bank switching now work, and I have a nice, clean interface for
adding more mappers.
I looked into the NES Test source code again, and in the pause function
it seemed pretty obvious that it wasn't expecting the vblank flag to
be cleared after it was read, although all documents I have say it
should be. So I took out the vblank clearing and NES Test now runs as
nice as you could ask for. Plus it didn't seem to break anything new,
in fact Tecmo Bowl's scroll now looks a lot nicer.
During Calculus and Lunch I implemented analog stick control and second
player control, respectively.
I've implemented basic saving, triggered on pressing the reset button
(which also activates a neat fadeout). I've worked on some compression
research, and it seems that RLE would be the best to implement if I want to
fit more than 4 saves onto the controller pack at once, but I'll have to get
some sort of menu up and running. Right now it just writes SRAM on reset and
reads it on startup (though only if the SRAM enabled bit in the header is set.)
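One simple RLE variant, sketched in C: runs are stored as (count, value) byte pairs with count from 1 to 255. The format is an assumption, since the log doesn't say what Neon64 would use. The point is that battery RAM which is mostly one value shrinks dramatically: an 8 KB buffer that is all zeros except two bytes encodes to 70 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode src as (count, value) pairs. Returns the encoded size,
   or 0 if dst_cap is too small (or n is 0). */
static size_t rle_encode(const uint8_t *src, size_t n, uint8_t *dst, size_t dst_cap) {
    size_t out = 0;
    size_t i = 0;
    while (i < n) {
        uint8_t value = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == value && run < 255) run++;
        if (out + 2 > dst_cap) return 0;
        dst[out++] = (uint8_t)run;
        dst[out++] = value;
        i += run;
    }
    return out;
}

/* Expand (count, value) pairs. Returns the decoded size, or 0 on overflow. */
static size_t rle_decode(const uint8_t *src, size_t n, uint8_t *dst, size_t dst_cap) {
    size_t out = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        uint8_t run = src[i];
        if (out + run > dst_cap) return 0;
        for (uint8_t k = 0; k < run; k++) dst[out++] = src[i + 1];
    }
    return out;
}
```

The worst case (no repeated bytes) doubles the size, so a real save format would want a fallback to storing the data raw.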
I also changed the controller access to work properly (DMA command to PI,
DMA result back) with the PI driver I wrote for accessing the controller pack.
This seems to be a bit faster.
Thanks to LaC for clearing up any confusion as to how the PI works.
The Nemu debugger has proved very useful, just minutes after I got it
working I found a bug which was the cause of Final Fantasy crashing:
Neon64 has a section of memory reserved for what I call "bgline", which is
just the background color repeated 320 times. This is loaded by the RSP via DMA
in order to clear out its internal buffer at the beginning of a line.
This means that I never have to explicitly write a background color, which in
all cases gives me a speed advantage.
Several versions ago I was running Neon64 in 256x240 mode, and recently I
switched to 320 or so. As such, I had to increase the size of the bgline from
256*2=512 (2 bytes per pixel) to 320*2=640. I changed the code which writes and
reads the bgline, but not the actual size of the memory region itself. The next
area in memory was the compiled pattern table array. Whenever bgline was being
updated, the pattern table would be overwritten with pixels from bgline. It
just so happens that the primary background colors I'd been testing with
(black and Super Mario Brothers' blue) were valid instructions, so this never
became a problem until Final Fantasy's map screen used a different color for
background. This color was written over the compiled pattern table and read
into the RSP, but when it was executed the RSP simply gave up and crashed.
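The underlying mistake was a buffer size declared separately from the width it had to hold. A C sketch of the defensive version, assuming the 320-pixel, 2-bytes-per-pixel line described above: the array is sized from the same constant the fill loop uses, and a C11 static assertion documents the expected 640 bytes. The names are illustrative.

```c
#include <stdint.h>

#define SCREEN_WIDTH 320
#define BYTES_PER_PIXEL 2

/* One line of the background color, sized from the width constant. */
static uint16_t bgline[SCREEN_WIDTH];
_Static_assert(sizeof bgline == SCREEN_WIDTH * BYTES_PER_PIXEL,
               "bgline must hold exactly one full line");

static void fill_bgline(uint16_t color) {
    for (int x = 0; x < SCREEN_WIDTH; x++) bgline[x] = color;
}
```

Changing SCREEN_WIDTH now resizes the buffer and the loop together, so the writes can't run into whatever sits next in memory.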
Thanks to LaC and lemmy for this great debugging tool!
So here's the evil plan: Nemu 0.8 has a very nice RSP debugger built in, so
I should be able to use it to debug Neon64. I actually got to try it out
today, and I found a bug in the RSP emulator that I need to work around (at
least until I can get LaC to fix it). It turns out that there is no point to
using the lwu instruction on the RSP (as opposed to lw), since the registers
are only 32-bit anyway, so in Nemu the lwu instruction is not recognized,
though it is on the N64 (it's treated exactly the same as lw). I hope that this
debugging ability, though really slow, will enable me to more quickly find bugs.
And in response to the message someone left in the guestbook, if you'd like
to beta test the latest version just ask.
LaC sent me a corrected version of Nemu. I'll see if I can speed it up at
all past the 3 FPS I'm getting now...
I finally got to implement the pipelining I had wanted, along with more
efficient address calculation, and that has finally pushed emulation speed over
the top to 110%!
I'll work on that bug in Mach Rider (and possibly the same one in Final
Fantasy) some other time.
I was going to work on optimizing the PPU, but first I found a problem in
Mach Rider, with it occasionally crashing on the Absolute Indexed INC
instruction. In trying to find the source of this problem I found bugs in the
assembler, which I did manage to fix, but I have still not located the root of
the problem. I am not even sure if it is a problem with a) the ABXINC
instruction b) some other part of the CPU core c) some other part of the
program, possibly even running on the other processor so I'll never be able to
track it. To top it off I now have no faith in my exception handling routine's
error reporting capabilities. All of this will no doubt be resolved when I have
some more time and patience.
By the way, Mach Rider runs at about the same accuracy level as in loopy's
NES emulator (loopyNES), and it's his documentation that I'm basing my work on.
The problem is more likely that we both decided not to bother with writes to
the VRAM address register mid-frame for speed considerations. At least I'm
pretty sure that's my problem.
I reverted to the version from 2/8/03 and then went about making many of the
changes I'd been planning. I removed sprite caching and got an accurate sprite 0
hit working, so now Mach Rider is playable (though still very glitchy).
I took a bit of a speed hit in this, though, so speed was then at 80%.
Then I fixed some glitches with the new sprite loading routine and finally
got the graphics compiler to work when only one byte needs to be recompiled,
rather than the whole tile, which led to much nicer speed on games like A Boy
and his Blob and Alfred Chicken.
Then I modified attribute byte loading to only load each byte once, for a new
speed of 83%.
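Why each attribute byte only needs loading once per four tiles: an attribute byte covers a 4x4-tile area, two bits per 2x2-tile quadrant, and the table sits at offset $3C0 in each name table. A C sketch of the lookup and of a row loop that fetches each byte once (names are hypothetical):

```c
#include <stdint.h>

/* Pick the 2-bit palette number for a tile out of its attribute byte. */
static unsigned attribute_palette(uint8_t attr, unsigned tile_x, unsigned tile_y) {
    unsigned shift = ((tile_y & 2) << 1) | (tile_x & 2);   /* 0, 2, 4 or 6 */
    return (attr >> shift) & 3;
}

/* Palette numbers for the 32 tiles of one tile row, reading each
   attribute byte only once per group of four tiles. */
static void row_palettes(const uint8_t *nametable, unsigned tile_y, unsigned out[32]) {
    uint8_t attr = 0;
    for (unsigned tx = 0; tx < 32; tx++) {
        if (tx % 4 == 0) attr = nametable[0x3C0 + (tile_y / 4) * 8 + tx / 4];
        out[tx] = attribute_palette(attr, tx, tile_y);
    }
}
```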
Then I made the RSP start before checking the sprite 0 hit stuff in the CPU and
made some sprite VRAM table lookup changes to get rid of a problem I would
have when using some higher mappers. I also made the sprite 0 hit check not
take place if it's been found already, and no effort is expended if the sprite
is blank, because then no hit can take place (that check is really cheap anyway).
After that speed was back up to 85% again!
I also made some attempts at rearranging the background drawing function,
as it still does plenty of redundant things (such as looking up addresses in
the VRAM page table, which will only change on a name table change), but I
messed something up and had to scrap that. I did get some good ideas for when
I'll finally implement the pipelining that I've always wanted, mostly involving
loops. I think a lot of inspiration for this came indirectly from Brad Taylor's
NES Emulation Discussion document.
So effectively I got no speed increase but greater compatibility. Not a bad
I moved the sprite section of the PPU to the CPU, spent a few hours
optimizing, then realized that it wasn't worth it, as I was still getting low
I think I'll just revert to the 2/8/03 version, remove the stuff with the
cache (which at best causes no improvement and at worst is slow and glitchy),
and try to optimize the PPU there. I will keep the concept of loading the
sprite patterns when loading the sprite, instead of while drawing as in previous
versions, as this will allow me to avoid having to wait for the DMA to
I was able to rearrange the BG rendering so that it was calculated on the
CPU and drawn on the RSP, but this proved to be too much of a load on the CPU,
as I got only about 60% speed without attribute tables or sprites...
The new strategy will be to only do sprites on the CPU... we'll see how that
A few screenshots I just got around to uploading:
The best part of the game...
Um, should I laugh or scream?
In the running for best intro on the NES.
What SMB really looks like on the NES.
The NES' video output isn't quite at the same quality level as the N64's, is
I'm going to take the PPU and totally rework it, read the TODO for details.
By increasing the cache size from 8 to 16 I got Castlevania working
pretty much perfectly.
Sorry there haven't been updates for a while; I have been working heavily
on the PPU, which I now have at about 82% speed, getting those revisions I've
been promising in place, such as the overall reorganization and sprite caching
(only for 8x8 sprites at the moment). I'll get the bugs out of those (such as
Final Fantasy crashing at, before, and around the map screen) before
proceeding. I have a new TODO list up, the link is above. Consider all
release date questions answered.
I also finally got an AC adapter for my NES (at Radio Shack, on sale for $5;
apparently they've devalued from the $16 it was in my catalog), so I now have an
accurate layout for the video output. I'll be changing my output to reflect
this sometime soon.
I did a huge amount of work on the sprite cache, getting it to work with
various games and 8x16 sprites, but I still have bugs in it. Given the small,
if any, performance boost, I wonder if it is worth all of the effort I'm
putting into it.
On a positive note, Total Recall and Alfred Chicken have been tested and
work splendidly, despite the fact that Total Recall does not deserve to exist.
I also got the sprite 0 hit working reliably in both SMB and Solar Wars at
I did a major upgrade of the exception handler in order to track down
the cause of the annoying little line that appeared when Final Fantasy crashed
(just before going into the battle scene). It now displays the contents of all
registers as well as the word at the location causing the exception.
I also implemented a system to allow me to watch a block of addresses. This
allowed me to track the source of the bug to the CPU emulator, where I had
used my zero page memory handler instead of the absolute memory handler in the
increment and decrement instructions. The zero page handler is faster but
doesn't work outside of the zero page.
This caused the Final Fantasy battle scenes to work, as well as the enemies in
I fixed the annoying line of garbage at the top of the screen. I also fixed
the interpretation of the MMC1 CHRROM switch registers, all 5 bits are used
(contrary to Firebug's mapper doc). This made Teenage Mutant Ninja Turtles
and A Boy and His Blob work correctly.
This section wasn't working before.
I noticed some random crashes in Zelda and TMNT, but this could be because
of a hardware problem, as before each crash the N64 died in a way it never had before.
I accidentally deleted this entry, it went something like:
Fixed Mighty Bomb Jack (I wasn't clearing unsupported registers like controller
2, it was reading both controllers and ORing the results together). Fixed some
MMC1 graphical glitches (skipped CHRROM switching for games without CHRROM).
Made a breakthrough in MMC1 (I had an xori instead of an ori because of a
misinterpretation) which now allows Zelda, Metroid, and Final Fantasy 1 to run!
The Real Thing!
Yeah, they're slow and still have plenty of glitches, but at least they run!
In other news I ran Mighty Bomb Jack again, it still seems to confuse the right
arrow button with the start button, another case to look into.
There actually is no problem in SMB, I just misremembered the behavior
of the world -1 glitch.
Found a bug in BIT (bit test) that cleared the carry flag instead of the zero
flag, this fixed Rygar and Destination Earthstar, but not the world -1 SMB
Got a whole lot done today:
MMC1 working (though not on games with PRGROM switch, like Zelda, Metroid, Final Fantasy, etc)
Increased performance on games that change the pattern tables frequently from
"abysmally slow" to "annoyingly slow" by limiting graphics recompiles to one
per frame and making graphics compiler a macro instead of a subroutine.
Made debugger accessible at run time by pressing L and R.
Maybe it doesn't seem like a lot, but look at all the games I've got running!
Ah, the sweet smell of failure.
A little glitchy, but most of it works.
This is a cool game, how come I've never heard of it before?
Those crazy Russians and their Mind Games
Blame ATI for the horrible image quality. Do any of these games look
familiar, Gavin? Oh, and Stars now works.
Not done revising the PPU yet, but I thought I should post some pictures
of the scaling in action.
This doesn't hurt my speed a bit, as it is all done internally by the VI.
I do lose a few pixels on the left, but this may just be my video capture card
misbehaving. It doesn't affect gameplay, anyway.
There was more wrong with the sprite 0 detection than I'd care to mention
now. I'm going to bite the bullet and rewrite the PPU emulator, maintaining
much of the same organization and functionality but placing it in an order that
can be easily maintained. As it stands currently I just added features
haphazardly and jiggled them around a bit until they worked, which has led to
the "Total nonsense" mentioned above. Or below, rather.
By the way, I'm planning to add zip decompression to the ROM loading routine,
so that one can cram more games onto a single CD (for those who have backup
units with CD-ROM drives). The entire GoodNES set is about 400 MB... wouldn't
it be neat to play any NES game you want on your N64 from one disk?
It'll also cut out that extra step in loading a ROM. Many PC NES emulators
support this and I figure it's about time Neon64 did, too.
Reporting from school today... I suspect that the problem with Solar Wars is
caused by the failure of the sprite 0 hit to account for X scrolling within
individual tiles, but I'll check upon my return to the newly reconfigured
Bat Cave (aka my room).
I'm also planning to implement pipelined DMAs, so that I can load one tile
while the previous one is drawing, as I had wanted to do from the start.
Changed the initial VI settings and the PPU to work with a real (scaled)
256x240 video mode. It looks a little goofy now, but I'll see how it looks on
a real TV before I make any final decision on it.
There is a problem with the emulation of some little flag somewhere, I
think. In SMB, when one attempts to enter world -1, occasionally the game will
reset as soon as you enter the "invisible pipe", and if you do actually make it,
there is an end, while world -1 is not supposed to end. This may also be the
bug that crashes Stars. I tried to get the source code to find the problem, but
it appears that the bug is in the sound code, which Chris didn't include
because someone else wrote it...
Started to work on MMC1 but so far I've been thwarted, I want to fix the CPU
bug first now.
I reinstituted drawing the top 8 lines of the screen, it turns out that my TV
is nonstandard and shows more at the top than others.
Another problem to solve:
Chris Covell's Stars demo crashes after a few seconds (but the scrolling works!) with a bad opcode. CPU debugging, ahoy!
Yes, my true inner awesomeness has shone through.
I fixed the bug, which was caused not by the CPU not being fast enough but by
my new PPU code being too fast for even the mighty SP DMA, so I had to put a
wait in there. I should have known this when I noticed that if I put a delay in
the attribute table section (which is between this particular DMA and where the
DMAed data is used) the problem went away, but not if I put it elsewhere, but I
didn't. The breakthrough came when I ran my favorite debugging buddy, Arkanoid,
and noticed that the glitches were not with the attribute table (the
flickersomeness was with actual patterns, not colors) but rather with the
pattern or name table. As the attribute and name table calculations work
exactly the same way, based on the PPU V register, this allowed me to free V
from suspicion, as I had been examining it before. I knew that it must be a
problem with the pattern table access.
AND INDEED IT WAS!
I was trying to run my compiled pattern table tiles occasionally before they
were fully loaded into the RSP, thus it would be drawing the previous tile, thus
the screen would appear to be shifted right.
Using the new highly scientific "SMB Timeout Test", in which I start a Super
Mario Brothers game in both Neon64 and a full-speed PC NES emulator at the same
time, then read how much time is left on the clock on the slower emulator
(mine) when the time on the other runs out, I determined that Neon64 runs at
78% of full speed. Again, as the CPU runs at about 600%, this is all due to the
PPU, which is eight pages of total nonsense at the moment. Total
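The arithmetic behind the SMB Timeout Test is just a ratio of elapsed in-game time. A minimal sketch, assuming both emulators start the level at the same moment from the same clock value (400 on World 1-1); `relative_speed` is a hypothetical helper, not part of Neon64:

```c
/* Relative speed from the "SMB Timeout Test". When the full-speed reference
   reaches 0, the slower emulator has only counted down
   (start_time - remaining_on_slower) of the same in-game units.
   Assumes both runs started together from the same starting time. */
static double relative_speed(int start_time, int remaining_on_slower)
{
    return (double)(start_time - remaining_on_slower) / (double)start_time;
}
```

Under that assumption, a result of 78% corresponds to roughly 88 units left on a 400-unit clock. The ratio is unit-free, so it doesn't matter that SMB's timer ticks aren't real seconds.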
Solar Wars still has... issues with the title screen, minor ones but still
annoying, considering that this was one of the strongest points of the early
PPU. Bah, humbug.
NES Test runs almost perfectly, if "perfectly" can be used to describe the
behavior of a program running at an estimated 2 FPS. Let's call it bullet-time.
It's no longer physically painful to play this game.
Yeah, that's right, it runs.
Wow, it's 2 AM. Time to celebrate... let's write a Physics lab report! Yay!
Worked on name table speedup, some very entertaining PPU bugs have cropped
up, but at least speed has improved.
That problem was quickly dispatched, let's note the structure of the SP DMA
length register for posterity:
bits 11-00: length to transfer -8, lower three bits are ignored
bits 19-12: number of times to transfer that length -1
bits 31-20: bytes to skip between each transfer (skipping only occurs in DRAM
for either RSP->RAM or RAM->RSP), probably lower three bits ignored
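To make that layout concrete, here's a sketch that packs and unpacks the length register exactly as described in the three bit ranges above, using the CPU-side view (stored length is bytes minus 8, stored count is transfers minus 1). The struct and function names are hypothetical, and masking the low three bits of the skip field is an assumption carried over from the "probably" above:

```c
#include <stdint.h>

/* Decoded SP DMA length register, per the bit layout noted above. */
typedef struct {
    uint32_t bytes_per_row; /* bytes moved per transfer (multiple of 8) */
    uint32_t rows;          /* number of transfers */
    uint32_t skip;          /* DRAM bytes skipped between transfers */
} SpDmaLength;

/* bytes_per_row: nonzero multiple of 8; rows: 1..256 */
static uint32_t sp_dma_len_encode(uint32_t bytes_per_row, uint32_t rows, uint32_t skip)
{
    return ((skip & 0xFF8u) << 20)          /* bits 31-20: skip, low 3 bits assumed ignored */
         | (((rows - 1) & 0xFFu) << 12)     /* bits 19-12: count - 1 */
         | ((bytes_per_row - 8) & 0xFF8u);  /* bits 11-0: length - 8, low 3 bits ignored */
}

static SpDmaLength sp_dma_len_decode(uint32_t reg)
{
    SpDmaLength d;
    d.bytes_per_row = (reg & 0xFF8u) + 8;   /* CPU-side view adds the 8 back */
    d.rows = ((reg >> 12) & 0xFFu) + 1;
    d.skip = (reg >> 20) & 0xFF8u;
    return d;
}
```

For example, four 64-byte rows with a 256-byte stride in DRAM encode as 0x10003038.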
Fixed mapper 2 games by compiling the entire tile each time a write was made to
VRAM, I'm still not quite sure what the problem was.
Made sprites draw backwards but load forwards, in order to properly emulate the
Added attribute table speedup, really nice speed now, but it seems that now it
is unwilling to wait for the CPU to catch up, we somehow occasionally start at
the end of the current line, so the screen flickers a lot. A problem with too
much speed is a welcome one :-)
Made a few changes to BRK.
There's a problem with the scrolling on the Solar Wars title screen, I'll be
looking into it shortly.
Started to implement the speedup for name tables today, but other real life
things came up so I didn't have the time to finish.
Made sprites draw in reverse order, as they should.
Eliminated drawing of top eight lines of screen, apparently this is the way
things should be, which fixed the "Mario going off screen" problem.
Enforced black borders on either side of the drawn screen, so scrolled
games look right at the edges. Here's some of the nice new screenshots:
And away we go!
WOW! The mushroom is behind the other graphics! HOW DID HE DO THAT?!?!?
Um, jumping ...
I noticed that when Mario jumps off the top of the screen his head disappears
before he gets there. Apparently it is possible to have a negative screen
position for sprites or something, I'll have to look into it.
Got some nice new ideas for speeding up the PPU. They should almost always
work, I'll need a special case for some mappers, but since I don't support those
mappers yet anyway that's not a problem.
Hammered at mapper 2 some more, problem still not solved.
Made mapper 2 (Castlevania, Mega Man, et al) work, had to call graphics
compiler more frequently since those games spontaneously draw patterns. I'm
having problems with the compiler getting tiles confused, almost as if it's
using the sprite pattern table for the background, which I'm sure it isn't.
Got 8x16 sprites working. Need to get NES Test working completely, appears
to have some background and palette issues.
After much toil I got a sprite 0 hit detector working that is as
accurate as any I can devise. The only foreseeable flaws are that it
does not account for 8x16 sprites or sprite 0 being flipped.
Come to think of it, I don't have 8x16 sprites covered at all, or the
option to ignore things in the 8 pixel border. Oh well, a later version.
Seems to work nicely on Mario, except that whenever another sprite is
in the same scanline as sprite 0 it crashes. Oops.
It turns out that sltiu will clear the result if it doesn't set it,
rather than leave it alone. I knew this but didn't compensate for it.
So I added an or and that seems to have fixed it all up.
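Here's a C model of that sltiu behavior. The destination is always written, so a failing compare leaves 0 rather than the old value. The usual workaround, assumed here to be what the extra `or` does, is to compare into a scratch register and OR it into the running flag. `any_below` is a hypothetical example, not the actual sprite 0 code:

```c
#include <stdint.h>

/* MIPS sltiu: rt = (rs < sign_extend(imm)), compared as unsigned.
   rt is written either way, so a 0 result wipes whatever was there. */
static uint64_t sltiu(uint64_t rs, int16_t imm)
{
    uint64_t simm = (uint64_t)(int64_t)imm; /* sign-extend, then compare unsigned */
    return rs < simm ? 1u : 0u;
}

/* Accumulate several tests without losing earlier hits: each sltiu goes
   into a scratch value, and only the OR touches the real flag. */
static uint64_t any_below(const uint64_t *vals, int n, int16_t limit)
{
    uint64_t flag = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = sltiu(vals[i], limit); /* scratch register */
        flag |= t;                          /* the extra 'or' */
    }
    return flag;
}
```

With values {50, 3, 70} and a limit of 10, writing each sltiu straight into the flag would leave 0, because the last compare fails. The OR keeps the hit from the middle value.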
I'd have screenshots, but I'm away at my father's house and have no screen capture capability.
Attribute tables (background colors) work perfectly. Running a little
slow and still have arithmetic errors...
Fixed problem with sprites occasionally becoming background by
accident; it turns out that I was checking bit 5 of the tile index byte, not
the sprite property byte. It was another case of confusion between ones
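For reference, here's each 4-byte OAM entry with the priority bit read from the right byte. The layout (Y, tile, attributes, X; attribute bit 5 = behind background, bits 6/7 = flips, bits 0-1 = palette) is the standard NES one. The names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

/* One 4-byte OAM entry as the NES stores it. */
typedef struct {
    uint8_t y;
    uint8_t tile;   /* tile index: every bit is part of the pattern number */
    uint8_t attr;   /* property byte: palette, priority, flips */
    uint8_t x;
} OamEntry;

static bool sprite_behind_bg(const OamEntry *s)   { return (s->attr & 0x20) != 0; } /* attr bit 5 */
static bool sprite_flip_h(const OamEntry *s)      { return (s->attr & 0x40) != 0; } /* attr bit 6 */
static bool sprite_flip_v(const OamEntry *s)      { return (s->attr & 0x80) != 0; } /* attr bit 7 */
static unsigned sprite_palette(const OamEntry *s) { return 4u + (s->attr & 0x03); } /* palettes 4-7 */
```

A sprite using tile 0x20 has bit 5 set in its tile index, which is exactly the case that would wrongly send it behind the background if the tile byte were tested.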
THE CASE IS SOLVE-ED!
I didn't clear the carry flag for LSR (logical shift right).
This appears to make Solar Wars behave splendidly, I have yet to test
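A sketch of LSR's flag handling in C. Carry comes from bit 0 of the operand, so a stale carry has to be cleared when that bit is 0. N is always cleared and Z follows the result. The flag constants are just illustrative bit positions in the 6502 P register:

```c
#include <stdint.h>

enum { FLAG_C = 0x01, FLAG_Z = 0x02, FLAG_N = 0x80 };

/* 6502 LSR: shift right one bit; bit 0 goes into carry. */
static uint8_t lsr(uint8_t value, uint8_t *p)
{
    uint8_t result = value >> 1;
    *p &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_N); /* clear stale C, Z, N first */
    if (value & 0x01) *p |= FLAG_C;
    if (result == 0)  *p |= FLAG_Z;
    return result;                              /* bit 7 is always 0, so N stays clear */
}
```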
MARIO WORKS! But since my sprite 0 hit is a bit cheap the status text
cuts off a little early...
I was checking ppu control reg 1 ($2000) instead of ppu control reg 2
($2001) to determine if fixit should run... Thus even if the program
had deactivated the screen so that the rendering process wouldn't be
a problem, it would usually happen anyway. Doh.
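The rendering-enabled test only needs $2001: bit 3 shows the background and bit 4 shows sprites, while $2000 has no rendering-enable bits at all. A minimal sketch of the kind of gate that decides whether fixit should run; the function name and signature are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* True if the PPU is rendering, i.e. background or sprites are enabled. */
static bool rendering_enabled(uint8_t reg2000, uint8_t reg2001)
{
    (void)reg2000;                /* deliberately ignored: the bug checked this one */
    return (reg2001 & 0x18) != 0; /* 0x08 = background, 0x10 = sprites */
}
```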
Seems to fix the terrain problem with Solar Wars and the sprite
corruption problem with Arkanoid, which now has a few other visible
problems but I think I'll leave them alone for now and concentrate on
the graphics generation.
Got pseudo-color graphics compiler working. Some problems still exist,
namely that the Arkanoid screen still doesn't clear entirely, Solar
Wars arithmetic is messed up, Arkanoid demo levels are wrong, but I'll
examine that further tomorrow. It looks great!
Took a chance without backing up the last version, luckily it paid off.
For a while there was a single line of the wrong name table at the top
of the screen. I set off into the code to find the reason and discovered
that I have made the $2000/$2001 mistake again when resetting v=t, which
only happens if the bg or sprites are active. This also solved the
incomplete clearance problem in Arkanoid! There is still a sprite/bg
misalignment problem, though. (Solved!)
Captain! Three screenshots off the starboard bow!
All of the background is drawn by code running on the RSP which was dynamically
generated by the CPU.
Well, cross out one more place to search for the problem. The random
hill things being found in Solar Wars are caused, once again, by
the vblank happening during them.
AHA! In read.inc the read 2002 function loaded the clear vblank flag
constant with an li, while the constant was greater in size than could
be loaded with an li, so I changed it to an la. But that seemed not to
have solved anything.
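For context, this is the job the read 2002 function has to do. It returns the status, then clears the vblank flag and resets the shared $2005/$2006 write toggle. This sketch models only those effects, ignores the low five open-bus bits, and uses a hypothetical state struct:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t status; /* $2002: bit 7 = vblank, bit 6 = sprite 0 hit, bit 5 = overflow */
    bool    w;      /* shared $2005/$2006 first/second write toggle */
} PpuState;

static uint8_t read_2002(PpuState *ppu)
{
    uint8_t value = ppu->status; /* caller sees the flags as they were */
    ppu->status &= 0x7F;         /* clear vblank (bit 7) */
    ppu->w = false;              /* next $2005/$2006 write is the first one */
    return value;
}
```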
Tore apart Solar Wars' terrain drawing function, which appears to get
all the right data from the terrain generation function.
It works in two parts: one draws air (blank space) from the top of the
screen and the other places a surface tile below that. This works
column by column. Initially the entire screen is an array of ground
tiles. I determined that even when the air drawing section is changed to
fill the entire column there are occasional places where it stops
partway down the column (and before it has reached the surface), so
either it does not make the change completely or the change does not
get completely put to the screen. I am more or less sure that the code
to make the change executes, so next I'll check if that is reflected in
the local pattern table and then in the RSP (which accesses the CPU's
pattern table to actually do the drawing, but I strongly doubt that it
I played around a lot with ADC and SBC, I'm now pretty sure that
they are accurate (I didn't change anything).
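For anyone checking their own ADC and SBC, here's a reference sketch of binary-mode ADC (the NES 2A03 has no decimal mode) with SBC expressed as ADC of the operand's ones' complement. This is a standard formulation, not a transcription of Neon64's assembly:

```c
#include <stdint.h>

enum { C_FLAG = 0x01, Z_FLAG = 0x02, V_FLAG = 0x40, N_FLAG = 0x80 };

static uint8_t adc(uint8_t a, uint8_t m, uint8_t *p)
{
    unsigned sum = (unsigned)a + m + (*p & C_FLAG); /* uses the incoming carry */
    uint8_t r = (uint8_t)sum;
    *p &= (uint8_t)~(C_FLAG | Z_FLAG | V_FLAG | N_FLAG);
    if (sum > 0xFF) *p |= C_FLAG;
    if (r == 0)     *p |= Z_FLAG;
    if (r & 0x80)   *p |= N_FLAG;
    /* overflow: both inputs have the same sign and the result's sign differs */
    if (~(a ^ m) & (a ^ r) & 0x80) *p |= V_FLAG;
    return r;
}

/* A - M - (1 - C) == A + ~M + C, so carry set means "no borrow". */
static uint8_t sbc(uint8_t a, uint8_t m, uint8_t *p)
{
    return adc(a, (uint8_t)~m, p);
}
```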
I've found that Arkanoid's failure to clear the screen is because
it does not have enough time in vblank to do so, the VRAM address
gets reset as drawing begins and thus accidental writes are made to
the CHRROM (which I shouldn't allow).
I'm sticking with 20*113 cycles per vblank for now, which helps a
little but does not solve the problem. Maybe the instructions
take too many cycles, or maybe something is running in error that takes too
long, sort of like an emulation cancer. On second thought, I had partially
solved this problem before and I didn't do anything with instruction
What's that? You want screenshots? Alright, but just remember you
brought this on yourself.
Title screen, notice newly working blue arrow.
Planet selection screen (also newly working).
Fixed a palette bug with Solar Wars et al, I had dw instead of dcw.
Fixed a bug in the pause routine of the debugger.
In true "Idiotic Programmer" fashion, I fixed a bg problem with Arkanoid, then promptly
lost the change and forgot what I did.
Fixed CHRROM loading, both initially and for mapper #3 (CNROM, helped Solar Wars a lot). Before pattern table #1 wasn't being loaded into the RSP, and it wasn't updated on either the RSP or the CPU on a CNROM switch.
I have verified that the screen-clearing function does indeed execute.
I found the screen clearing function, which is what I have been looking for.
Now all I have to do is get my computer back and I can track down the problem
I put up another page to document my Arkanoid
exploits. I've come pretty far, but I haven't found what I'm looking for yet.
I've been doing a bit more disassembly; it's coming along steadily, and so far
I've found the routines for drawing the blocks and the warp to next level.
I'm also pondering possibly writing a disassembler to do what I'm doing
now, that is to break the code up into subroutines and pseudo-emulate.
If you change byte 0x3900 (0xb8f0 in PROM) to 0x00 the next level warp will
always be there, except while the level is just starting.
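The two addresses in that tip are the same byte seen two ways. Skip the 16-byte iNES header, then add the $8000 base where the PRG-ROM appears to the CPU. A sketch assuming 32 KB of unbanked PRG-ROM and no trainer; the helper names are hypothetical:

```c
#include <stdint.h>

#define INES_HEADER_SIZE 16u /* iNES header precedes PRG-ROM in the file */

/* File offset -> CPU address, assuming 32 KB PRG mapped at $8000-$FFFF. */
static uint16_t file_offset_to_cpu(uint32_t file_offset)
{
    return (uint16_t)(0x8000u + (file_offset - INES_HEADER_SIZE));
}

static uint32_t cpu_to_file_offset(uint16_t cpu_addr)
{
    return (uint32_t)(cpu_addr - 0x8000u) + INES_HEADER_SIZE;
}

/* Apply the always-present warp patch to an in-memory copy of the ROM. */
static void patch_warp(uint8_t *rom, uint32_t rom_size)
{
    uint32_t off = cpu_to_file_offset(0xB8F0);
    if (off < rom_size) rom[off] = 0x00;
}
```

0x3900 - 0x10 = 0x38F0, and 0x8000 + 0x38F0 = 0xB8F0, which matches the two numbers above.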
I should be getting my real development computer back in about a week,
'till then I'm using my new laptop.
Sorry about the lack of updates, but I haven't done much over the past 2
weeks. I currently don't have access to my nice new computer
and I'm reluctant to work with old versions of the
source that I have floating around. What I have been doing is looking over
a disassembly of Arkanoid, trying to find how it works and therefore what
the emulator is doing wrong. I have several variables and the reset
vector understood, but I'm in a bit over my head so it will take a while.
I've been toying with the idea of replacing the CPU emulator with the version
from beta 1 and then swapping back in bits of the beta 2 emulator until I
find the segment that causes a problem, but again I'll have to wait until I
have access to the computer.
Some improvement but
none of the major problems are fixed yet, among them the frequent corruption
of sprites (with the addition of background, but other problems as well) and
the still-looming problem with the CPU. I did look through the addressing
modes but found nothing problematic (except something redundant which saved
me 300 bytes). The Solar Wars title screen works, with the wavy effect.
I figured out that the background in Arkanoid, along with some other problems,
was not being cleared before drawn on. Since most everything else in Arkanoid
works, this should enable me to pinpoint the problem. I've disassembled the
program and I'll be searching for the loop that clears the name tables. When
that is found, I should be able to see if it is executing, why if not, and
what it's doing wrong if it is.
Also found a rather major flaw in U64ASM: a label can be defined in terms of
itself successfully, but it won't have any meaning whatsoever. This should
generate an error, but I can't think of a quick way to do that, so I'm
leaving it as is for now.
I got the graphics compiler running under very artificial circumstances,
but it works. That's what you get for planning things out in advance.
My preliminary work on the PPU has been to read the name tables. I made a lot
of progress today, fixed a lot of long-standing bugs, and possibly identified
the CPU emulation problem. I ran Pac-Man, and the ghosts just kept bouncing
around in a small square. This might imply an error in an addressing mode.
The PPU still needs a lot of work, and it seems like whenever I make a change
the sprites get messed up. I'll be working on locating the source of this
Got color, horizontal and vertical flipping to work for sprites, which
now only requires 8x16 sprite support. Here's a nice color image from
Arkanoid, one of the only games that works flawlessly:
And here's a view of Headless Mario, all that is displayed when I try to run
Super Mario Brothers:
I've also had problems with Solar Wars, Space Invaders, and the PD NES Test,
but I'm not going to work on fixing the CPU right away. First I'm going to
finish up the graphics, and as of now the next step in that is the
background. My idea for speeding this up, graphics compilation, is now well
developed and I have the compiler itself written in only 16 instructions.
I hope that the burden of loading name table, attribute table, and compiled
patterns won't slow down the emulator enough to take it sub-realtime, while
it's at about 2x-ish now. At some point I'll put together an FPS counter...
When I do, however, get around to the CPU part, I'll probably use
Nintendulator or some other PC NES emulator with a debugger to step through
a game and see what differences in execution arise.
I finally regained access to my old source from beta 1 (the
version I'm working on now is beta 2). In celebration, I assembled
it and saw that it's quite a bit better than what's out now. So now you can
download beta 1 v3, it's very nice. But it is still the year-old
version, so it's not a huge improvement. I mostly just took out the
vsync, which boosted speed a lot.
Later that day...
I take back all of the bad stuff I've said about the RSP, it's all my fault.
I keep getting the parameters of my "make a string of x data type" instruction
confused: the first parameter is how many and the second is what they should all be.
I kept doing this in reverse, so I'd be defining 0 bytes each containing 32
instead of 32 bytes containing 0. I'm going to put a safeguard into U64ASM
to remind me next time I try to make zero of something.
Anyway, this was causing the primary problem with the program, namely that my
sprite buffer was size 0 so other variables were writing over it. The
sprites now look as pretty as can be! Note that I don't have flipping,
priority, or even color working yet so all you see is a shadow of the true
sprite, thus Vaus looks a bit odd.
You can see a brick shining here, very nice.
I'd also like to note that this is the exact opposite order that I wrote beta 1
in. For it I did the background first, then the sprites.
Worked on a proto-sprite renderer.
Revelation of the day:
Never use any other data size but a word on the RSP. It just doesn't work.
I might get to do some experiments later to confirm exactly what doesn't work,
why and how, but right now suffice it to say that sb does nothing but cause
the RSP to overwrite nearby data. Eeew.
I replaced my fake sprite renderer with the skeleton of the real one, now it
shows lines instead of dots where the sprites should be (a step in the right
direction, the next will be colored blocks!) This new version more closely
mimics the NES's internal "temporary sprite buffer" with one of its own that
is loaded with the next scanline's sprites. It's also at least twice as fast,
despite doing 8 times as much work. Yay!
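Filling that temporary sprite buffer amounts to a scan over all 64 OAM entries in order, copying the first 8 whose vertical range covers the scanline. This is a simplified sketch: it treats byte 0 as the first line the sprite covers (ignoring the NES's one-line Y offset) and models overflow as a simple flag, not the buggy hardware behavior. The function name is hypothetical:

```c
#include <stdint.h>

#define OAM_SPRITES  64
#define MAX_PER_LINE  8

/* Copy up to 8 sprites that touch `scanline` into `out`, in OAM order.
   Returns how many were copied; *overflow is set if a 9th was found. */
static int evaluate_sprites(const uint8_t oam[OAM_SPRITES * 4], int scanline,
                            int sprite_height, uint8_t out[MAX_PER_LINE * 4],
                            int *overflow)
{
    int count = 0;
    *overflow = 0;
    for (int i = 0; i < OAM_SPRITES; i++) {
        const uint8_t *s = &oam[i * 4];
        int row = scanline - s[0];                 /* byte 0 is the sprite's Y */
        if (row < 0 || row >= sprite_height) continue;
        if (count == MAX_PER_LINE) { *overflow = 1; break; }
        for (int b = 0; b < 4; b++) out[count * 4 + b] = s[b];
        count++;
    }
    return count;
}
```

The `sprite_height` parameter is there so the same scan works for 8x8 and 8x16 sprites.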
Set up palette so I can charge right into actual graphics. The coin color in
SMB is seen to be blinking. I have the background drawing done (I had it done
by default a while ago, but now it has the right color.) SMB currently
looks like this:
The vertical bars on the left are the palette. You can't see them against
the bright background, but there are a few white lines for sprites on
the screen. SMB still doesn't run, though...
Finally got a chance to sit down to the program for a few minutes today,
changed main memory from kseg1 to kseg0 for some speed increase. I noticed some
weird things happening in Arkanoid, as well as the old problem with Solar
Wars, but it seemed like SMB accidentally worked for a moment. I figure that
this is a 6502 emulation problem, I'm going to go back through the old code
from last year and see what I'm doing differently, since that worked the CPU
flawlessly. My main concern is the ADC and SBC instructions.
Made background color set to black at start. I found that my DMA wait
macros for the RSP branched 8 bytes back instead of 4, oops. This and another
minor fix caused a change to the Arkanoid screen. Vaus is made up of 4
sprites, but the one farthest to the left was often not being displayed.
Here's the new image:
School has started again, and in the flurry of activity Neon64 development
has slowed. I do, however, have some great news: I got a new computer from the
father of a friend, and I'll be using it for Neon64 now because it had a TV
tuner, so I can code and test on the same screen. I am also finally able to take
screenshots from the actual hardware. As such, I present to you the first public
image of Neon64 beta 2:
No, it isn't much as it stands, but when you realize that only the positions of
sprites are shown, and that the PPU is running separately from the CPU in
custom ucode on the RSP, maybe it'll mean a little more. This is actually fully
playable, you just can't see where the blocks are yet. In case you can't
figure it out, here's an explanation:
This marks about the third time I've made a sweeping, fundamental change
to the program. The RSP is now custodian of all things PPU, because I decided
that my problem was caused by the CPU- and RSP-instigated DMAs interrupting
each other. They both use exactly the same registers, so you can see how, if the
CPU is writing to these registers, the PPU could be doing so at the exact same
time, thus causing a mix of values that screws everything up. Now only the
RSP is allowed to DMA, which meant that I had to go back and make the CPU's
routines subservient to the call of the RSP.
An interesting side effect (well, something I needed to do to make this work
but not really a goal) is that
I made the PPU status bits show up in the RSP's signal bits, so it's really
easy to set and clear them from either processor. I also used a signal bit
(there are 8 of them, so I have 2 left) to indicate when the RSP's copy of the
SPRRAM is out of date, so it can know when to DMA the CPU's copy. The CPU does
an SPRRAM DMA with the ld and sd instructions (moving 8 bytes at a time is faster)
from the RAM location to its local copy of SPRRAM.
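The signal bits make good shared flags because SP_STATUS has a separate set bit and clear bit for each signal. Either processor can change one flag with a single write, with no read-modify-write race. This sketch models only that part of the register, using the bit positions from libultra's rcp.h (SP_SET_SIG0 = 0x400 and so on). The struct and helpers are hypothetical:

```c
#include <stdint.h>

/* Write side: clear signal n with bit 9+2n, set it with bit 10+2n. */
static uint32_t sp_clr_sig(unsigned n) { return 1u << (9 + 2 * n); }
static uint32_t sp_set_sig(unsigned n) { return 1u << (10 + 2 * n); }

/* Read side: signal n appears at bit 7+n of SP_STATUS. */
typedef struct { uint32_t status; } SpStatus;

static void sp_status_write(SpStatus *sp, uint32_t value)
{
    for (unsigned n = 0; n < 8; n++) {
        if (value & sp_clr_sig(n)) sp->status &= ~(1u << (7 + n));
        if (value & sp_set_sig(n)) sp->status |=  (1u << (7 + n));
    }
}

static int sp_sig_is_set(const SpStatus *sp, unsigned n)
{
    return (int)((sp->status >> (7 + n)) & 1u);
}
```

Writing `sp_set_sig(2)` touches only signal 2. The other signals, including the PPU status bits kept in them, are left alone.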
My problem was that, when I made any change to the PPU emulator (even just adding a NOP), it
wouldn't run or would run erratically. I expect that this was because I was
just lucky enough with the old version that the timing was just right and
the DMAs were nonconflicting. Now that problem seems to be gone, and I can
finally get some more actual emulation done.
On the assembler front, I made a new instruction, watch, that displays a
warning when an instruction is assembled at the given PC. This allows me
to take the EPC given to me by the exception handler and match it up with
its assembly source. Before I had to use my really slow debug version, which
outputs every line with its PC and offset.
It turns out that what I said about RSP writes before was inaccurate, the
eight is added. I didn't have a chance to fully test this, and I still don't,
but I'm pretty sure that I was mistaken. I have Arkanoid running
on the new setup.
I've been converting all of my writes to DMEM into DMAs, an icky task,
yet I think I've got it down.
I found out how to hook the pre-NMI from when you press the reset button,
(the general exception vector (0x180) is called with bit 12 (0x1000) of the
Cause COP0 register set) so I could have some neat effect there later, but
right now all I do is break the RSP so that the reset can proceed normally.
Actually, now I found that, if I use a breakpoint, the RSP won't restart on
reset. If I halt it, however, everything works nicely.
I made use of the RSP's signal bits for the first time today, so I can use
them to tell the PPU emulator to start, and report back to the CPU when it is
done, for much more accurate synchronization.
I started work on the palette, using BMF's RGB palette ('The only palette slow-roasted to perfection'), yesterday, and I got
a nice display of all of the colors, but when I tried to implement it fully
I got all sorts of weird problems, which led me to the DMA revision that I
mentioned above. It was still giving me trouble today, so I performed a
On another topic, I have 4 MB of space free on this hard drive. Now that my
assembler outputs a 2 MB ROM, I run out of space really quickly. It turns out
that I didn't check if there was actually enough space for the finished ROM,
so now if there isn't an error is generated.
On yet another topic, I discovered that there is a rather large difference
between the sp_wr_len_reg (which gives, approximately, the number of bytes
to DMA out of the RSP into DRAM) as seen from the CPU (at 0x0404000c) and as
seen from the RSP (at register 3, entrylo1 by my incorrect notation). From the
CPU, the actual number of bytes is the value in sp_wr_len_reg, with the low 3
bits masked out, plus 8. On the RSP the low three bits are also masked out,
but the 8 is not added. This is probably true for sp_rd_len_reg, too, but I
haven't tested it.
I made the ROM loader correct the VRAM page array so that any CHRROM will
be accessed as pattern tables 0 and 1.
I ran across the source to Andreas Sterbenz's Checksum 64, so I decided to finally take
the leap and make U64ASM produce complete ROM images, which it does perfectly.
I also incorporated the drjr send utility, so now I have one program to do
everything I need.
I've also been doing more RSP research, and I've now found what the
rest of the length DMA register is for...
I wrote the ROM loading routines for everything but the CHRROM. I moved a
lot of stuff around and rewrote the SPRRAM handler. After a talk with LaC I've
decided that I'm (eventually) going to change all writes to DMEM to DMAs, for
greater accuracy and speed.
As I mentioned before, you can have two simultaneous SP DMAs going on (one active,
one queued). That's what the DMA_FULL register is for; it tells you when you can't
schedule another DMA. When I began to doubt this today, I performed several experiments
that appear to confirm it.
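The one-active, one-queued behavior can be modeled in software to show when DMA_FULL would be reported. This is a hypothetical C model of the behavior described above, not a register-level driver. The struct and function names are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical model: one transfer active, at most one queued behind it.
 * DMA_FULL corresponds to the queued slot being occupied. */
typedef struct {
    bool active;   /* a transfer is in progress */
    bool pending;  /* a second transfer waits behind it */
} sp_dma_model;

static bool dma_full(const sp_dma_model *m) { return m->pending; }
static bool dma_busy(const sp_dma_model *m) { return m->active; }

/* Try to schedule a transfer; false means the caller must wait. */
static bool dma_schedule(sp_dma_model *m)
{
    if (!m->active)  { m->active = true;  return true; }
    if (!m->pending) { m->pending = true; return true; }
    return false;                 /* both slots in use: DMA_FULL */
}

/* The active transfer finishes and the queued one (if any) takes over. */
static void dma_complete(sp_dma_model *m)
{
    m->active = m->pending;
    m->pending = false;
}
```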
I just discovered that my 2005 writing routine never actually read the
current value of the t register (temporary VRAM address). I used a store instead of
a load. Oops. Again.
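Both halves of a $2005 write change only some bits of t, so the routine has to load t before merging in the new bits. Skipping that load is the bug described above. The following C sketch follows loopy's description of the v/t/x/w registers. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* loopy's scroll state: v (current VRAM address), t (temporary VRAM
 * address), x (fine X scroll), w (first/second write toggle). */
typedef struct {
    uint16_t v, t;
    uint8_t  x;
    bool     w;
} ppu_scroll;

/* $2005 write: a read-modify-write of t, so t must be read first. */
static void write_2005(ppu_scroll *s, uint8_t value)
{
    if (!s->w) {
        /* first write: coarse X into t bits 0-4, fine X into x */
        s->t = (uint16_t)((s->t & ~0x001Fu) | (value >> 3));
        s->x = value & 0x07;
    } else {
        /* second write: fine Y into t bits 12-14, coarse Y into bits 5-9 */
        s->t = (uint16_t)((s->t & ~0x73E0u) |
                          ((value & 0x07u) << 12) |
                          ((value & 0xF8u) << 2));
    }
    s->w = !s->w;
}
```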
I ported over a ROM loader I wrote for another assembler, and in doing so I
fixed an error in my two-part addressing macro: when I wrote it, I didn't realize
that the offset was sign-extended. This loader doesn't actually
set up all of the pointers like it should just yet, but it gets the CHR and
PRG-ROM page counts and displays them onscreen, along with the mapper and
mirroring types (as text, not just numbers).
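The sign-extension issue with two-part addressing works like this. A LUI followed by a load, store, or ADDIU adds a sign-extended 16-bit offset, so whenever bit 15 of the address is set, the upper half must be one larger to compensate. The C sketch below shows the arithmetic only. The helper names are hypothetical and are not U64ASM's macro.

```c
#include <stdint.h>

/* Split addr into a LUI value and a 16-bit offset that the CPU will
 * sign-extend. Adding 0x8000 before shifting bumps the upper half by one
 * exactly when bit 15 is set, i.e. when the offset will come out negative. */
static void split_hi_lo(uint32_t addr, uint16_t *hi, int16_t *lo)
{
    int32_t low = (int32_t)(addr & 0xFFFFu);
    if (low >= 0x8000)
        low -= 0x10000;                 /* value after sign extension */
    *lo = (int16_t)low;
    *hi = (uint16_t)((addr + 0x8000u) >> 16);
}

/* What the CPU computes: (hi << 16) + sign_extend(lo), modulo 2^32. */
static uint32_t join_hi_lo(uint16_t hi, int16_t lo)
{
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
}
```

Without the adjustment, an address like 0x8000A000 would be rebuilt as 0x80000000 + (-0x6000) = 0x7FFFA000.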
I incorporated the exception handler into Neon64, which helped with debugging the
above. I also tested the emulator with Arkanoid, which, unlike Solar Wars and
SMB, performed perfectly. I'll be starting on the graphics now, so I'll
probably be using Arkanoid for that and I'll come back to the others later.
I started working with interrupts and exceptions today. I managed to use
the VI and count/compare interrupts to run two counters at different speeds.
I also got error detection and reporting working, which will allow a lot
of fun debugging to take place! No work on Neon64 just yet.
Well, I seem to have fixed that specific problem by immediately following
a write to the RSP with a read from the same location. I don't entirely know
why, but it seems to always solve the problem. I made the other writes to the
RSP use the same method, and I rewrote the SPRRAM write. There is now a RAM
copy of the SPRRAM: a write to SPRRAM goes to this copy, then the updated SPRRAM
is DMAed into the RSP. I had to have the real SPRRAM DMA (4014) write its
values back into this local copy, too.
I also found that the write to PPU control register 2 was a load instead of a store. Oops.
I wrote a new 'main emulation loop' and PPU emulator skeleton for Neon64, using a more structured
format that I actually planned out first, based roughly on FCEU's main loop.
After the change none of my test programs appeared to run any better or worse.
SMB loads two sprites and just runs aimlessly, and in Solar Wars everything works
pretty well except the projectile jumps around the screen.
FCEU counts three cycles for every one on the 6502; this helps make its count
more accurate, since there are three PPU cycles (12 master clock cycles) to a CPU cycle and a lot of my
timing is off by 1/3 or 2/3 of a cycle, which can add up. I might incorporate this into Neon64.
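Keeping the clock in PPU cycles means nothing is rounded away, and any remainder carries into the next scanline. This is a minimal C sketch using the NTSC figures of 3 PPU cycles per CPU cycle and 341 PPU cycles per scanline. The struct and function names are hypothetical, and this is not FCEU's code.

```c
#include <stdint.h>

#define PPU_PER_CPU        3u   /* NTSC: 3 PPU cycles per CPU cycle */
#define PPU_PER_SCANLINE 341u   /* PPU cycles per scanline */

/* Hypothetical timing state kept in PPU cycles, so no fraction is lost. */
typedef struct {
    uint32_t ppu_cycles;   /* PPU cycles into the current scanline */
    uint32_t scanline;
} timing;

/* Advance by the cycle count of one emulated 6502 instruction. */
static void run_cpu_cycles(timing *t, uint32_t cpu_cycles)
{
    t->ppu_cycles += cpu_cycles * PPU_PER_CPU;
    while (t->ppu_cycles >= PPU_PER_SCANLINE) {
        t->ppu_cycles -= PPU_PER_SCANLINE;   /* remainder carries over */
        t->scanline++;
    }
}
```

A scanline is 113 2/3 CPU cycles. Counting a whole 113 per line would drift by 2 CPU cycles every 3 lines, while 341 CPU cycles here land exactly on the start of the fourth line.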
I also may have fixed that register math error that I kept getting in U64ASM.
I also moved a lot of the older news off of this main page, since it's gotten
I managed to isolate the problem with SMB, it seems that there are
specific times when the RSP ignores writes to DMEM. Even though the
NMI was turned back on, the write was occasionally (and fatally) not
made to PPU control register 1, which I keep on the RSP. If I can't figure
out how to detect these moments of ignorance, or work around them, I'll
have to move all the variables for the PPU into main RAM and have the
RSP DMA them in, or DMA the variables into the RSP. I hope that it doesn't
come down to that, though.
I came back from vacation today to find that my R4000 User's Manual had arrived in the mail.
I've got stacks of plans for the PPU that I'm ready to try out, but I want
to fix a weird problem with SMB first: it turns off the vertical retrace NMI
and then just sits...
I got some info on interrupts from LaC, I think that this will help schedule the
While I was sitting on the beach today I was developing an idea I had
last night: graphics compilation. Since the pattern table changes very rarely,
if at all, during the life of an NES program, the same tile is interpreted
in exactly the same way over and over again. It makes sense, therefore, to
figure out in advance what R4300i instructions are required to write a tile
to the screen. Then I can just DMA these instructions into the RSP to have it
draw the tile. I can even have one tile drawing while the next one is being
loaded. This should speed up PPU emulation quite a bit. It is also far simpler
than total CPU recompilation, since all of the tiles will always be aligned
in exactly the same way (except in 8x16 sprite mode, but I've worked that out).
PPU emulation was the main speed problem in the first version of Neon64,
so hopefully this will help a lot.
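The work that graphics compilation avoids repeating is decoding the NES's 2-bit-per-pixel tile format. The C sketch below shows a simpler relative of the idea. It caches each tile's decoded pixels once instead of generating RSP instructions, which is the technique actually planned above. The tile layout is standard: bytes 0-7 are bit plane 0, bytes 8-15 are bit plane 1, and the most significant bit is the leftmost pixel. The function names are hypothetical.

```c
#include <stdint.h>

/* Decode one 16-byte tile into 8x8 two-bit colour indices. */
static void decode_tile(const uint8_t tile[16], uint8_t out[8][8])
{
    for (int row = 0; row < 8; row++) {
        uint8_t lo = tile[row];       /* bit plane 0 */
        uint8_t hi = tile[row + 8];   /* bit plane 1 */
        for (int col = 0; col < 8; col++) {
            int bit = 7 - col;        /* MSB is the leftmost pixel */
            out[row][col] = (uint8_t)(((lo >> bit) & 1) |
                                      (((hi >> bit) & 1) << 1));
        }
    }
}

/* Decode all 256 tiles of a 4 KB pattern table once, up front, so
 * drawing never has to repeat the bit-plane work. */
static void decode_pattern_table(const uint8_t table[4096],
                                 uint8_t cache[256][8][8])
{
    for (int t = 0; t < 256; t++)
        decode_tile(&table[t * 16], cache[t]);
}
```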
I got SPR-RAM DMA working, using a real RSP DMA to speed things up. In
the process, I switched a lot of labels over to kseg1 to avoid having the
RSP/PPU work with old data. This was as simple as ORing the mem label with
0xA0000000, since all of my other arrays build on that.
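ORing with 0xA0000000 works because a kseg0 address (cached, base 0x80000000) and its kseg1 alias (uncached, base 0xA0000000) differ only in bit 29. Both map the same physical memory. The conversion is shown below as a minimal C sketch; the helper names are hypothetical.

```c
#include <stdint.h>

#define PHYS_MASK  0x1FFFFFFFu   /* kseg0/kseg1 both map the low 512 MB */
#define KSEG0_BASE 0x80000000u   /* cached window */
#define KSEG1_BASE 0xA0000000u   /* uncached window */

static uint32_t to_kseg1(uint32_t addr) { return addr | KSEG1_BASE; }
static uint32_t to_kseg0(uint32_t addr) { return (addr & PHYS_MASK) | KSEG0_BASE; }
```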
I made a fake sprite renderer that just shows a vertical line where a sprite
is, this allowed me to see that Solar Wars is indeed progressing well. There
just seems to be a problem with the projectile, it jumps around a lot,
I'm not sure if this is normal behavior or not.
I'm going to be away for two weeks, and my computer isn't coming with me. I'll
try to write out some plans for the PPU renderer when I find the time.
I rewrote the SPR-RAM access register ($2004) with two things in mind:
That the CPU can only access RSP data in 1 word chunks, reading or writing
That all word accesses must be word-aligned.
Now Solar Wars works. I can even see a change in behavior when I press the
start button! Yay!
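Under the two constraints above, writing a single SPR-RAM byte becomes a read-modify-write of the aligned word that contains it. This is a minimal C sketch where a plain array stands in for RSP DMEM. It assumes big-endian byte lanes, as on the N64, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Write one SPR-RAM byte when storage may only be accessed as aligned
 * 32-bit words: read the word, replace one byte lane, write it back. */
static void spr_write_byte(volatile uint32_t *sprram_words,
                           uint8_t addr, uint8_t value)
{
    uint32_t index = addr >> 2u;                /* aligned word index */
    unsigned shift = (3u - (addr & 3u)) * 8u;   /* big-endian byte lane */
    uint32_t word  = sprram_words[index];
    word &= ~(0xFFu << shift);
    word |= (uint32_t)value << shift;
    sprram_words[index] = word;
}
```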
When I did the VRAM page array, I found that you can have at least two
simultaneous DMAs to the RSP, something I had previously only suspected.
Well, not truly simultaneous, but I can queue one while the other is still
I got the 2005/2006 registers emulated, as well as their accompanying
V, T, and X registers (as described by loopy).
By the way, the "Backup Often" mantra protects not only against natural
disaster, but also your own stupidity. After I wrote the above and got it
all working, I stormed ahead to all sorts of other stuff and got myself
confused, so I had to revert to my last backed-up working version from last
night. That meant that I had to rewrite all of the stuff that I had already
done. Oh well.
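For reference, the $2006 half of the 2005/2006 work mentioned above works like this under loopy's description. The first write sets the high 6 bits of t and clears bit 14. The second write sets the low byte and copies t into v. The toggle w is shared with $2005. This is a minimal C sketch with hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint16_t v, t;  /* current and temporary VRAM address */
    uint8_t  x;     /* fine X scroll (untouched by $2006) */
    bool     w;     /* write toggle shared with $2005 */
} ppu_addr_regs;

static void write_2006(ppu_addr_regs *r, uint8_t value)
{
    if (!r->w) {
        /* high byte: only 6 bits are kept, which also clears bit 14 */
        r->t = (uint16_t)((r->t & 0x00FFu) | ((value & 0x3Fu) << 8));
    } else {
        /* low byte, then the completed address becomes current */
        r->t = (uint16_t)((r->t & 0xFF00u) | value);
        r->v = r->t;
    }
    r->w = !r->w;
}
```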
On another note, I made the $2007 (VRAM I/O) register work, after beating the
bugs out of it with a big stick. I still have to get my hands on a good
Solar Wars and SMB are running without graphics; I can select different
planets in Solar Wars and so on.
I had the idea of making two page tables, one for reading and one
for writing, so that trapping mapper writes will be easy. This didn't take
even a single additional instruction in the read and write macros.
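The two-table idea can be illustrated as follows. Reads and writes index different pointer tables with the same shift-and-mask, so the lookup costs nothing extra. Meanwhile, writes to ROM pages can be aimed at a scratch page where mapper code picks them up. This is a minimal C sketch assuming 2 KB pages. The page size, table names, and trap-page scheme are hypothetical, not Neon64's actual layout.

```c
#include <stdint.h>

#define PAGE_SHIFT 11u                      /* assumed 2 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define PAGE_COUNT (0x10000u >> PAGE_SHIFT)

static uint8_t *read_pages[PAGE_COUNT];
static uint8_t *write_pages[PAGE_COUNT];

static uint8_t cpu_read(uint16_t addr)
{
    return read_pages[addr >> PAGE_SHIFT][addr & PAGE_MASK];
}

static void cpu_write(uint16_t addr, uint8_t value)
{
    write_pages[addr >> PAGE_SHIFT][addr & PAGE_MASK] = value;
}

/* Map $8000-$FFFF: reads come from 32 KB of PRG-ROM, writes land in a
 * trap page, so ROM is never modified and mapper writes are caught. */
static void map_prg_rom(uint8_t *prg_rom, uint8_t *trap_page)
{
    for (unsigned p = 0x8000u >> PAGE_SHIFT; p < PAGE_COUNT; p++) {
        read_pages[p]  = prg_rom + ((p << PAGE_SHIFT) - 0x8000u);
        write_pages[p] = trap_page;
    }
}
```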
I wrote the skeleton of the PPU emulator and the routines needed to load it
to the RSP and reboot it once per frame.
I wrote a lot of the PPU register handlers. I have a basic PPU emulator
(without any actual graphical output) working. I got the vblank interrupt
My tests with Super Mario Brothers run until a loop that waits for the sprite
0 hit flag to be set, which is good. But Solar Wars crashes, possibly because
VRAM isn't properly emulated yet...
I also found out how to detect a reset, so that the N64 doesn't lock up when
you press the reset button, waiting for the RSP to break. I might make a cool
transition on reset, like Mario 64 has.
Totally reorganized the program, so everything will be nice and clean
for the NES stuff coming in.
Wrote NMI and IRQ interrupts, tested with square root program.
Wrote controller 1 handler, might be a bit different in the final version.
I tested Neon64 out with Super Mario Brothers today. I don't have any
graphics/sound/etc emulated yet, but the program did appear to run until it
hit the wait loop, where it waits until the vblank interrupt occurs. Solar Wars
performed equally well. To get these working I had to set up a skeleton for
the registers. I also discovered that including a file as large as an
NES ROM image apparently crashes my assembler, so I had to rewrite a little
More bug fixes in the 6502 emulator took place. One test program I was
using, that took the square root of a number, didn't work before but then
worked when I made another mistake. I found the source of the original
error (an incorrectly set carry flag in CMP/CPX/CPY), fixed the subsequent one,
and now everything seems to be working fine. I also figured out how the JSR/RTS
instruction pair works:
JSR increments the PC, then pushes the PC and loads the new one, then RTS
pulls/pops the PC and increments it again, so the PC points to the instruction
after the JSR.
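In terms of addresses, JSR pushes the address of its own last byte (opcode address + 2), high byte first. RTS pulls that value and adds one. The following C sketch uses a hypothetical minimal CPU struct to show the pairing.

```c
#include <stdint.h>

/* Hypothetical minimal 6502 state; the stack lives at $0100-$01FF. */
typedef struct {
    uint16_t pc;     /* address of the current opcode */
    uint8_t  sp;
    uint8_t  mem[0x10000];
} cpu6502;

static void push(cpu6502 *c, uint8_t b) { c->mem[0x0100 | c->sp--] = b; }
static uint8_t pull(cpu6502 *c)         { return c->mem[0x0100 | ++c->sp]; }

/* JSR abs (3 bytes): push pc + 2, high byte first, then jump. */
static void op_jsr(cpu6502 *c)
{
    uint16_t target = (uint16_t)(c->mem[(uint16_t)(c->pc + 1)] |
                                 (c->mem[(uint16_t)(c->pc + 2)] << 8));
    uint16_t ret = (uint16_t)(c->pc + 2);
    push(c, (uint8_t)(ret >> 8));
    push(c, (uint8_t)(ret & 0xFF));
    c->pc = target;
}

/* RTS: pull low then high byte, add one: the instruction after the JSR. */
static void op_rts(cpu6502 *c)
{
    uint16_t lo = pull(c);
    uint16_t hi = pull(c);
    c->pc = (uint16_t)(((hi << 8) | lo) + 1);
}
```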
There was also an error in the day of week calculation test program, since
TASM will assume that LSR without an operand is pointing to zero, while this
program used that to mean LSR A.
I made an option to not export a label into a header file (for U64ASM), so I
don't have to keep editing codes.h whenever I assemble codes.asm, which
contains the opcodes. I assemble this separately from the rest of the
program because the opcodes use macros heavily, which slows
down assembly by A LOT.
After I test the emulator a little more thoroughly, I will move on to
actually loading NES ROM images, the registers, the controller, video, sound,
and eventually mappers.
I reassembled everything to work with the new CPU text renderer, which
protects the registers, too. I fixed several glitches and cosmetic problems
in U64ASM, including one that would cause the program to crash if a macro
parameter had size 0.
I found several glaring errors in Neon64. The emulated RAM overlapped
part of the executable code because of a misplaced 0. The sign and zero flags
were frequently not cleared before setting them. These errors were corrected.
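The flag bug and the earlier CMP carry bug share one pattern: clear the affected bits, then set them from the result. The C sketch below shows N/Z handling plus the CMP/CPX/CPY rule, where carry is set when the register is greater than or equal to the operand. The flag bit positions are the 6502's; the function names are hypothetical.

```c
#include <stdint.h>

#define FLAG_C 0x01u
#define FLAG_Z 0x02u
#define FLAG_N 0x80u

/* Set N and Z from a result, clearing both first so stale flags from
 * an earlier instruction cannot survive. */
static uint8_t set_nz(uint8_t p, uint8_t result)
{
    p &= (uint8_t)~(FLAG_N | FLAG_Z);
    if (result == 0)
        p |= FLAG_Z;
    p |= result & FLAG_N;
    return p;
}

/* CMP/CPX/CPY: C = (reg >= operand); N and Z from the 8-bit difference. */
static uint8_t compare(uint8_t p, uint8_t reg, uint8_t operand)
{
    uint8_t diff = (uint8_t)(reg - operand);
    p &= (uint8_t)~FLAG_C;
    if (reg >= operand)
        p |= FLAG_C;
    return set_nz(p, diff);
}
```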
I revised my preprocessor to allow macro parameters to include spaces and I fixed
some of my macros. I rewrote my text routine to run on the CPU, and in doing
so I learned the difference between kseg0 and kseg1 (kseg0 is cached). For a
while I was writing to the frame buffer in kseg0 but nothing was happening,
because my writes were just being cached and not put to the framebuffer.
I can now use this text routine to start debugging the PPU emulator that I
will be writing for the RSP, since I think it would be too complicated to
have both the text writer and the PPU emulator running on the RSP, plus
I'd probably run out of IMEM to work with (there's only 4K).
I also dumped the DOS extended character set, because I want to use the
line characters to draw windows and boxes. I only had the ASCII set before.
No direct work on Neon64 was done during this time.
Made paged memory access work. The speed is now around 7 MHz, using
mostly the absolute addressing mode to test.
There are 743 lines of code, not including the opcodes or test programs.
I got the jump table working, so now I have a primitive 6502 emulator.
I assembled the opcodes separately, so I just need to include their assembled
form rather than reassemble them each time I build Neon64. This and a change
I made to U64ASM's binary inclusion mean that assembly speed has increased fourfold.
That's not terribly important to you, but I felt that I should mention it.
I fixed a problem with my text macros that caused the CPU to try to make
the RSP draw a new string before it was done with the old one. The busy
flag isn't set fast enough by the RSP, so the CPU now waits for the
busy flag to be set before finishing with a text string.
I also got to run some test programs, assembled with TASM. From observing
the operation of these programs with my stopwatch, I determined that the
emulator is currently running at about 10 MHz, approximately 6 times faster
than the NES's 6502, approximately twice as fast as the last version of
Neon64's 6502 emulator.
I fixed an error in the jump table, opcode 0x8E (STX absolute) was listed
as STY zero page, a rather large error since those two instructions are
different sizes and the high byte of the absolute address will be read as
the next opcode, plus it would write to the wrong register.
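The jump-table mix-up matters because each entry implies an instruction length. The following C sketch, with a hypothetical entry layout, fills in just the four store opcodes involved. With 0x8E correctly marked as 3 bytes, the PC skips both address bytes instead of decoding the high byte as an opcode.

```c
#include <stdint.h>

/* Hypothetical jump-table entry: mnemonic plus total instruction length. */
typedef struct {
    const char *name;
    uint8_t     length;   /* bytes, including the opcode */
} op_entry;

/* Only four entries are filled in for illustration. */
static const op_entry op_table[256] = {
    [0x84] = { "STY zp",  2 },
    [0x86] = { "STX zp",  2 },
    [0x8C] = { "STY abs", 3 },
    [0x8E] = { "STX abs", 3 },
};

static uint16_t next_pc(uint16_t pc, uint8_t opcode)
{
    return (uint16_t)(pc + op_table[opcode].length);
}
```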
Happy Birthday, Dad!
I added cycle counting to each and every opcode, by hand, today. I also made
the jump table but I haven't gotten it to work right yet. There are 1,537
lines in Neon64 so far. I deleted a lot of blank lines in the opcodes, so this
number is a little deceptive.
I had a little time when I came home today, so I finished off the rest of the opcodes.
Now all I need to do is make the jump table, handle periodic tasks, and handle interrupts, and then I can start on the real NES stuff.
Neon64 is currently 1,488 lines.
I got a lot more opcodes done today. I have been writing them in alphabetical order, up to LDY.
Only a few more need to be done, and everything has tested out well. There are 1,096 lines of code in Neon64 right now.
I also edited the drjr send program so it says "Uploading" instead of "Downloading". That had always gotten on my nerves.
I'll be away for the rest of the weekend, so there won't be any more work for a few days.
It isn't that I haven't done anything over the past few days, it's just
that I haven't gotten around to updating the site. I have the ADC, AND, and ASL
opcodes completely emulated in all of their addressing modes. As I was doing this, I noticed that the time
taken to assemble was becoming excessive. I have added many little speed-ups to the preprocessor, where most of
the time was spent. Just changing this part of the program, I managed to almost double the speed of the entire assembly!
I also added a routine to check if a macro's name contains another macro's name, which can cause unexpected results. I found that I had
a few instances of this in code I had already written. Neon64 is currently 574 lines in size.
This evening I also did some work with sound generation on the N64. I expect this will be my weakest point, since the furthest I've ever come in sound programming has been my PC Speaker Zelda Theme.
So I wrote a little program that makes a two-tone siren sound (two different frequency sawtooths), and fiddled with it until the clicking went away. It wound up being 90 lines long.
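A two-tone sawtooth siren can be built from a phase accumulator whose step changes with the tone. This is a minimal C sketch, not the 90-line N64 program described above. The function and parameters are hypothetical. One common source of clicks is a jump in the waveform when the pitch switches, which carrying the same phase across the switch avoids. Whether that was the cause here is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* Fill out[0..n) with a sawtooth alternating between freq_a and freq_b
 * every samples_per_tone samples. The phase is never reset, so the
 * waveform stays continuous when the pitch changes. */
static void siren(int16_t *out, size_t n, uint32_t sample_rate,
                  uint32_t freq_a, uint32_t freq_b, size_t samples_per_tone)
{
    uint32_t phase = 0;   /* 2^32 units = one full period */
    for (size_t i = 0; i < n; i++) {
        uint32_t f = ((i / samples_per_tone) % 2 == 0) ? freq_a : freq_b;
        uint32_t step = (uint32_t)(((uint64_t)f << 32) / sample_rate);
        /* sawtooth: top 16 bits of the phase, shifted to signed range */
        out[i] = (int16_t)((int32_t)(phase >> 16) - 32768);
        phase += step;    /* unsigned wrap-around ends each period */
    }
}
```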
I wound up working into the next day this time. I finished most of the operation macros, up to ROR in alphabetical order. My macro preprocessor isn't very good, so I changed some names around to speed it up. All of my macros started
with A6502 before, so it has to read that at the beginning of each entry. I changed the order of the names so that A6502 is at the end of the macro name. I probably saved one or two seconds at the current stage. Right now I have 391 lines of code.
Finished up the addressing modes and wrote the ADC macro, for the add with carry series of opcodes. I've already been able to improve the program, the old version of the ADC macro had 19 instructions, some of them branch and load instructions. The n
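For comparison with the ADC macro, here is what ADC has to compute in binary mode. The NES's 2A03 has no decimal mode, so binary mode is the only one needed. This is a C sketch with hypothetical names. Overflow is set when both inputs have the same sign and the result's sign differs.

```c
#include <stdint.h>

#define FLAG_C 0x01u
#define FLAG_Z 0x02u
#define FLAG_V 0x40u
#define FLAG_N 0x80u

/* A = A + M + C; returns the updated status byte. */
static uint8_t adc(uint8_t *a, uint8_t p, uint8_t m)
{
    unsigned sum = (unsigned)*a + m + (p & FLAG_C);
    uint8_t result = (uint8_t)sum;
    p &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);
    if (sum > 0xFF)
        p |= FLAG_C;
    if (result == 0)
        p |= FLAG_Z;
    if (~(*a ^ m) & (*a ^ result) & 0x80)   /* signed overflow */
        p |= FLAG_V;
    p |= result & FLAG_N;
    *a = result;
    return p;
}
```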
Fixed more bugs in U64ASM, added nested macros. In Neon64 I started on the 6502 CPU core, I have the absolute addressing mode and zero and sign flag processing done. No actual instructions are emulated yet...
Today I got back to work on Neon64. I moved my PC next to my TV and started making some debugging macros (for text output, etc.)
In the process of doing this, I found a bug in the number display routine that would cause unpredictable results if the number was larger than could be displayed.
I also added some more functions to my assembler, U64ASM. I also found a bug in my byteswapping program that caused it to mess up files with an odd number of bytes.
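A byte-swapping loop has to stop before the last byte when the length is odd, since that byte has no partner. This is a minimal C sketch that leaves an odd trailing byte unchanged. That policy is one reasonable choice, assumed here rather than taken from the actual fix, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Swap each pair of bytes in place. The i + 1 < len test stops before an
 * odd trailing byte instead of reading or writing past the buffer. */
static void byteswap16(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint8_t tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```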
The new version will have nothing in common with the old one, I am starting from scratch. There are currently 164 lines of code, not counting the I/O functions, which are preassembled. None of this code actually does any
emulation yet, but at least it's a start. I wanted to wait until I got my N64 to VGA adaptor box, but it's been months since I ordered it and it seems like it will never arrive. Oh well.