Here is the rest of the Neon64 development history, moved off the main page because it was getting too big.
Note that some of the earliest images are broken; I've moved this page between about four servers and some things have been lost.
I'm preparing to release a v1.2a, with that crash bug fixed (and including source), and I noticed that Wizards & Warriors has been slightly broken since v1.2. I'm not sure exactly what the issue is, but the title screen is messed up. The game is still totally playable, though.
I took another look at the HI/LO thing because it seemed like the fix I made was slowing things down a lot (and because I remember fondly the days of developing for a console where everyone in the world has the same specs...). I rewrote the fix and now speed is not impacted, yet the bug is still fixed.
If there are future releases they will contain a version of extend.exe that only extends them to 0x101000 bytes instead of 0x200000 bytes, which is the minimum needed for the CRC to work reliably. This will allow for faster send times and more space on a CD made with makeall.bat.
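As a rough illustration of what an extender like this does, here is a minimal Python sketch. It is not the code of extend.exe: the function name and the fill byte are assumptions, and only the 0x101000 target size comes from the entry above.

```python
MIN_SIZE = 0x101000  # target size quoted in the entry above


def extend_rom(data: bytes, size: int = MIN_SIZE, fill: int = 0xFF) -> bytes:
    """Pad data up to size bytes; return it unchanged if it is already that big.

    The fill byte is an assumption, the real tool may pad differently.
    """
    if len(data) >= size:
        return data
    return data + bytes([fill]) * (size - len(data))
```

Padding to 0x101000 rather than 0x200000 saves just under 1MB per image, which is where the faster send times would come from.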
I found the cause of the odd crashing with the FPS meter activated. While I save all other registers before doing anything that might change them in the interrupt handler I failed to save the HI and LO registers, which are used to store the result of multiplication and division operations. Because of this there is a small chance that the interrupt will occur just between the instruction that writes LO or HI and the instruction that reads them. A multiplication done during the interrupt will overwrite this value. I fixed this by simply saving HI and LO before and restoring them after anything that changes them, in this case the text routine and the framerate calculation. The calculation itself is done continuously, and this may have been the cause of the random crashes which occasionally plagued the previous version (even with FPS off). It definitely eliminates the bizarre behavior seen when starting Elite with FPS enabled.
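The race is easier to see in a toy model. The sketch below is Python, not the real interrupt handler; the Cpu class, the function names and the numbers are all made up for illustration, and MULT is simplified to non-negative operands.

```python
class Cpu:
    """Toy model of the two MIPS multiply result registers."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def mult(self, a, b):
        # Simplified MULT: the 64-bit product is split across HI and LO.
        product = a * b
        self.hi = (product >> 32) & 0xFFFFFFFF
        self.lo = product & 0xFFFFFFFF


def interrupt_handler(cpu, save_hi_lo):
    # Optionally save HI/LO, do a multiply of our own, then restore.
    saved = (cpu.hi, cpu.lo) if save_hi_lo else None
    cpu.mult(60, 1000)  # stand-in for the frame rate calculation
    if saved is not None:
        cpu.hi, cpu.lo = saved


def multiply_with_interrupt(cpu, a, b, save_hi_lo):
    cpu.mult(a, b)
    interrupt_handler(cpu, save_hi_lo)  # interrupt lands between MULT and MFLO
    return cpu.lo  # what MFLO would read back
```

With save_hi_lo set the emulated code reads back its own product; without it, it reads whatever the handler left behind.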
I've had v1.2 sitting unchanged for weeks now, I was just hoping I'd find a solution to my problems, but... I didn't. So here it is, not the huge improvement 1.1 was but significantly better, as well as the addition of the PAL Super Mario 64 support.
Found some more bugs, none newly introduced (thankfully), check the TODO page.
I have fixed a problem with DMC IRQ timing, Ian Bell's Tank Demo now works as well as it does in any emulator I know of.
I have verified that the new solution works on actual PAL Super Mario 64 cartridges.
After several hours with my v64jr plugged into my Gameshark I figured out how to load Neon64 with the European version of Super Mario 64. Now someone better use it (let me know now if you actually have the cartridge to test it with.)
Fixed some bugs in the save system (the first save you make would not actually work, the signature was not fully written) and also allowed it to be backwards-compatible with saves from v1.1.
Got Battle Tank up to speed by dramatically increasing the number of cache pages.
Ok, I lied, I am working on the program again. I've rewritten the audio mixing as per some new information blargg has published, and I've made substantial increases in the speed of games using VRAM (Starship Hector and that Wizardry game are now running properly) and overall everything is running faster due to some code reorganization. Battle Tank is still slow at the intro.
Oh, look, I've been linked to from Lik-Sang's Gameshark page... it's at the very bottom.
I made sound mixing a little faster and a little more accurate.
I also found several games that are unbearably slow due to graphics compilation: Battle Tanks, Wizardry: Legacy of Llygamyn (or something), Starship Hector.
I have no plans to fix these, just letting it be known that they're problematic.
Apparently the Z64 does in fact have support for emulators, I said differently in the readme. Sorry for any confusion this may have caused.
I noticed that the Minigame Pack that Memblers had put together was running slow, and I saw that it runs its code out of WRAM (aka SRAM). I made SRAM cached so that this would be faster (I have to transfer it to an uncached location when transferring to the controller pack, and then out of that area when transferring to the controller) and it brought Zelda also up to speed.
I now use triple buffering for the video, which fixed the flickersomeness in Final Fantasy.
I added an option to disable the DMC, as it was really irritating me in Final Fantasy 3 (which is an incredible game, get the translation at The Whirlpool!)

So at long last here's the release... I'm going to have all the source code available just as soon as I put the files together.
I don't plan to work on this again any time soon.
Made it much harder for a fatal exception to happen by putting proper restrictions on reading opcodes from undefined memory (it would use 0 as a base address). Aladdin and Sonic 3D 6 still crash, but Aladdin keeps playing its music and both can be restarted via the menu.
I made a set of batch files and utilities (which will be included with the release) which can be used to make a set of NES ROMs into N64 ROMs for backup units without proper emulator support (like CD64 and, I think, Z64).
I've put together v1.1, I'm just going to wait until the 25th to release it. If you want it now I'll send it to you, I'm just not ready for the full-blown release yet.
There was a click in the square channels in SMB, I fixed it by changing the time and manner in which the volume is updated, based on some info I got from FCEU.
I also switched to 16-bit sound and made the volume a bit lower to prevent any sound overflows (which was happening in Kirby) but still maintain all of the detail of the sound. You'll just have to turn up the volume a bit.
OK, I lied, Bignose Freaks Out isn't perfect, there's an annoying buzz after the Code Masters logo appears. But gameplay is perfect.
I'm currently in the process of compiling a readme for v1.1. Again I beg for the help of people with backup devices to help me write a section for their devices.
I'm aiming right now for a Christmas release date.
Here's the Bignose Freaks Out screenshots:
I'm trying to fix the linear counter to solve that irritating bug where the tri channel cuts off too soon in the SMB3 castle. blargg recently made a post on the subject on NESDev and I'm going to implement his algorithm exactly. Yep, it worked, now it sounds the same as in NotSoFatso. I also fixed a dumb bug I introduced yesterday which broke the iNES NES Test, now it works.
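For reference, this is the triangle linear counter behavior as it is commonly documented by the NESDev community, sketched in Python. It may differ in detail from blargg's post and from what Neon64 actually does, and the class and method names are hypothetical.

```python
class LinearCounter:
    """Triangle channel linear counter, clocked once per quarter frame."""

    def __init__(self):
        self.counter = 0
        self.reload_value = 0
        self.control = False      # bit 7 of $4008
        self.reload_flag = False

    def write_4008(self, value):
        self.control = bool(value & 0x80)
        self.reload_value = value & 0x7F

    def write_400b(self):
        # Any write to $400B sets the reload flag.
        self.reload_flag = True

    def clock(self):
        if self.reload_flag:
            self.counter = self.reload_value
        elif self.counter > 0:
            self.counter -= 1
        # The reload flag only clears while the control bit is clear.
        if not self.control:
            self.reload_flag = False
```

The channel is audible only while the counter is nonzero, so getting the reload flag wrong would be one way to make a note cut off too soon.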
So what's left to do? :)
My DMC IRQ support was totally broken. I fixed it and now Bignose Freaks Out works perfectly! It's something like Sonic crossed with Bignose the Caveman. I'll have screen shots of this rarely-well-emulated game as soon as I can get a disk drive back in my laptop.
Fixed a plethora of sound errors today through proper emulation of the sound status register. This fixed NES Test, Star Wars, and the DMC of Retrocoders and Solar Wars (both using NT2, I think.)
I also fixed that evil clicking problem in the triangle wave by just stopping oscillation when the wave is disabled, not setting the output to zero (while that may work for square waves or the noise channel which are either on or off, the tri wave has different output levels and suddenly switching it on or off will produce a click.)
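A small Python sketch of the idea: when the channel is disabled the sequencer simply stops stepping, so the output holds its last level instead of jumping to zero. The names are hypothetical; the 32-step sequence is the standard triangle shape.

```python
# 32-step triangle sequence: 15 down to 0, then 0 up to 15.
TRIANGLE_STEPS = list(range(15, -1, -1)) + list(range(16))


def clock_triangle(position, enabled):
    """Advance the sequencer only while the channel is enabled."""
    if enabled:
        return (position + 1) % 32
    return position  # frozen: the output level stays where it was


def triangle_output(position):
    return TRIANGLE_STEPS[position]
```

Freezing at step 5 keeps the output at level 10; forcing it to 0 there would be a sudden 10-level jump, which is the click.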
There was also a slight bug introduced into the SMB underworld music, this was fixed (a problem with the linear counter.)
Implemented proper screen limits for PAL and NTSC (NTSC doesn't show top and bottom 8 lines but PAL does).
Loading message now shows for entire duration of loading.
I tried to implement mapper 66 (Mega Man, SMB Duck Hunt) but it seemed to fail totally at making either game work, so I went back to making Mega Man just use mapper 2, which it works perfectly with.
I added partial mapper 71 support (Big Nose the Caveman). Solstice is also supposed to use this mapper, but it works better with mapper 7 so I have it use that. Big Nose Freaks Out works until the game itself starts, then it freezes. It does some pretty weird things with scrolling so I think I'll just leave it be. I don't know of any other mapper 71 games that are working.
I had a bug in the MMC3 SRAM enable/disable loops, they were overwriting registers in use by the CPU emulator. Now Startropics works perfectly, the intro music plays, there is no longer a graphics problem upon entering the "test of island courage", and the jumping sound always works right. I fixed a similar issue in the Arkanoid paddle handler, but it didn't seem to affect anything.
Tested 3D Block (3D Tetris) and Startropics 2 for the first time in a while, both work. Rad Racer no longer looks as good as it used to.
Tada! With a new timing innovation (not just one copied from FCEU) I've solved all the bouncing problems in the several Rare games that had them (Wizards & Warriors 3, Battletoads, Battletoads Double Dragon). This also fixed the periodic graphics glitching in Elite, so now it is practically perfect (the only remaining flaw being the fault of the author (it runs in PAL mode, not NTSC), but fortunately Neon64 is able to handle that).
with PAL mode active / without

For the longest time I thought that was my fault... then I finally got around to testing it in FCE Ultra. Same results.
I found that the Star Wars problem was an MMC3 IRQ timing issue and not something fundamentally wrong with scrolling. I fiddled the timing a bit to fix that problem, but it caused some other very minor problems. I don't know if it is worth striving for total accuracy in this area, as the errors are very small and in no way affect gameplay.
By disallowing writes to CHRROM I fixed a slight glitch on the Star Wars title screen (not the star destroyer problem), and the RTC demo now fails its emulator test and works properly. The Dropoff 7 demo now also displays its title screen.
I also removed an erroneous optimization I had made to the CHRRAM pattern table compiler: I was only compiling when the second bitplane had been written, now I compile upon the writing of either. This fixes the reversed colors in the Wizards and Warriors 3 intro and made Princess Tomato in Salad Kingdom playable (you can see text now!)
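In sketch form, the corrected rule is "a write to either bitplane marks the tile for recompilation". The Python below is only an illustration; PatternCache and its methods are hypothetical names, not Neon64's.

```python
class PatternCache:
    """Tracks which CHR RAM tiles need to be recompiled."""

    def __init__(self):
        self.chr_ram = bytearray(0x2000)
        self.dirty = set()

    def write(self, addr, value):
        addr &= 0x1FFF
        self.chr_ram[addr] = value & 0xFF
        # 16 bytes per tile: offsets 0-7 are bitplane 0, 8-15 are bitplane 1.
        # Either plane marks the tile dirty, not just the second one.
        self.dirty.add(addr >> 4)

    def take_dirty(self):
        """Return the tiles to recompile and reset the dirty set."""
        tiles = sorted(self.dirty)
        self.dirty.clear()
        return tiles
```

A game that rewrites only the first bitplane of a tile would never trigger a recompile under the old rule, which fits the reversed colors symptom.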
I made many changes to the order of things in the emulation loop to more accurately reflect FCEU's EmLoop, and lo and behold Bomberman works with nary a glitch. Capcom OST also switches smoothly. I made some more changes and got both Double Dragon and W&W2 working with no status text bouncing.
I fixed a bug I had introduced in MMC1 when I fixed Dragon Warrior 3 and 4, Monster Party now works again. I had to adjust the number of cycles before NMI significantly before I could get both Big Foot and Dragon Warrior 3 working properly in the same version. Unfortunately the optimum arrangement involves a small amount of status text bouncing in Battletoads.
I now have only minor graphical or sound errors in every licensed USA game I know of (with a mapper I support).
I've compressed the main program with RNC Pro-Pack to get it down to 22K from 80K. It was one of the easiest things I've ever done...
I changed the ROM detection system to be able to find the ROM at the very end of the code and not just at 2MB, this is for an experimental method to allow creation of a ROM CD with the CD64. By the way, if anyone has a CD64 and would like to test it let me know!
Made slight change so that the FPS meter wouldn't interfere with the logo drawing.
Wrote all of the December log entries online. A lot of work, no?
Several games that did nothing but crash before (Solstice and Big Foot) are now working. I attribute this to the fact that I just removed a stupid irritating debugging tool I've had sitting around for months, which checks for stack overflow constantly. This had been left in because for some reason removing it slowed emulation down considerably. Removing it now has not changed speed at all that I can tell, but it's certainly saved the processor some work and the program some space. I know this directly fixed flickering in Marble Madness, which works quite well. It also fixed a slight flickering in the "door opening"/"door closing" effect when entering/leaving a town in FF.
I fixed a graphics corruption issue with the backup unit version's reset function (most noticeable in SMB3).
I now clip the top and bottom 8 lines of the screen (all the work is actually done by the RDP, I just changed the "scissor" value), as I think the NES is supposed to do. Now games with one-screen vertical scrolling (like Final Fantasy) no longer have a line of garbage at the bottom.
Double Dragon problem was due to my handling of reads from valueless registers, I fixed that and it now works.
Bomberman shows a blank screen after the stage number, though the music still plays.
I figured out an odd slowdown that was happening after loading a new ROM in the GS version, by doing so I found the ultimate source of the cache problems that had plagued me for so long and fixed them (it involved initialization, I would set the screen buffer to 0 and then draw there, but this would overwrite code!)
Implemented a full save system, with three saves per controller pack and name entry so you know what saves there are (and which to delete when out of space).
Fixed a slight timing issue, instructions that do stores across page boundaries do not have the extra cycle penalty, only read-only instructions.
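The rule can be written as a small Python sketch (function names are hypothetical): reads through an indexed addressing mode pay one extra cycle when the index carries into the next page, stores never add one on top of their base count.

```python
def page_crossed(base, index):
    """True when base + index lands in a different 256-byte page than base."""
    return ((base + index) & 0xFF00) != (base & 0xFF00)


def extra_cycle(base, index, is_store):
    """Extra cycle to add to an indexed instruction's base cycle count."""
    if is_store:
        return 0  # stores have a fixed cycle count
    return 1 if page_crossed(base, index) else 0
```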
By waiting 4 cycles instead of 35 between setting the vblank flag and triggering the NMI I now have Deadly Towers working properly. The change in values was between two versions of the FCEU source I had looked at.
Added a PAL mode which simply extends the vblank period by 50 lines, doesn't change the audio generation rate though. Elite and Asterix work nearly perfectly in this mode.
For some reason Double Dragon does not get past the start screen.
Added support for mapper 34, but only the Deadly Towers portion (mapper 34 is two mappers combined under one number: one mapper is for games with VRAM like Deadly Towers, the other is for games with CHRROM like Impossible Mission 2). I am not supporting the Impossible Mission 2 part (it will generate an unsupported mapper error if there is CHRROM present). This also means that I now check all 8 bits of the mapper number instead of just the low 4. I had to add a hack to get Mega Man working with this, for some reason it is listed as mapper 66 but runs fine with mapper 2, though perhaps 66 is subtly different as SMB+Duck Hunt is mapper 66 and doesn't work right.
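For anyone curious what "all 8 bits" means in practice, here is a Python sketch of reading the mapper number from an iNES header. The header layout is the standard iNES one; the function names and the mapper 34 check are illustrative assumptions, not Neon64's code.

```python
def parse_ines_header(header: bytes):
    """Return (mapper, prg_banks, chr_banks) from a 16-byte iNES header."""
    if header[:4] != b"NES\x1a":
        raise ValueError("not an iNES image")
    prg_banks = header[4]  # PRG ROM size in 16K units
    chr_banks = header[5]  # CHR ROM size in 8K units, 0 means no CHR ROM
    # Low nibble of the mapper is in byte 6, high nibble in byte 7.
    mapper = (header[6] >> 4) | (header[7] & 0xF0)
    return mapper, prg_banks, chr_banks


def mapper34_supported(chr_banks):
    """Only the variant without CHR ROM (the Deadly Towers style) is handled."""
    return chr_banks == 0
```

Note that 34 is $22 and 66 is $42, so with only the low 4 bits both read as mapper 2. That may be part of why Mega Man ran as mapper 2 before this change.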
Deadly Towers only works until you go into the first room, then the controller stops working. It looks like the NMI is called while still in the NMI handler.
Added menu system.
I added a function to restart the GS version so that a new ROM can be loaded without having to restart the N64 and patch the emulator in again, but there were some issues with cache and nonsense. I seem to have gotten it to work.
Changed some UI aspects, made a neat new ASCII logo (though it could use improvement).
I'm going to stop continuously working on Neon64 after the next release (1.1 is coming up soon, a whole lot better than 1.0 and 1.05), and I was wondering if there are any games people would like to see emulated before that happens. Just send me a list and I'll tell you:
  1. if they'll work in the current dev version
  2. if they might work before the release
  3. if I'll never support them (within the foreseeable future)
It would probably be a good idea to test the games on the current version before asking...

Bugs fixed today:
The issue of Dragon Warrior 3 and 4 not working was solved, I had not realized that the 1st/2nd 256K PRGROM switch applied to the hardwired bank as well.
This also enabled FF1+2 to run, however when I tried to play FF1 it crashed when I attempted to strike up a conversation with anyone. This is only with the FF1+2 combined cart, FF1 by itself works fine. I didn't notice any problems in the FF2 portion.
I removed the irritating sound from Bubble Bobble by extending the "if wavelength < 8 disable channel" thing to the triangle channel. Now isn't that nice.
I still need to remember to go back in and optimize (ever so slightly) the sp0 hit detection, I don't need to cycle through each pixel of the sprite now that I'm just checking for any active pixels at all, I can just or the bitplanes together and check if it is nonzero.
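The optimization described above might look like this in a Python sketch. It assumes the usual NES tile layout (8 bytes of bitplane 0 followed by 8 bytes of bitplane 1) and an 8x8 sprite; the function names are hypothetical.

```python
def sprite_has_opaque_pixel(tile: bytes) -> bool:
    """True if any pixel in the 8x8 tile is non-transparent.

    A pixel is transparent only when its bit is 0 in both planes,
    so OR-ing the two plane bytes of a row tests 8 pixels at once.
    """
    return any(tile[row] | tile[row + 8] for row in range(8))


def first_opaque_row(tile: bytes):
    """Row index of the first row holding an opaque pixel, or None."""
    for row in range(8):
        if tile[row] | tile[row + 8]:
            return row
    return None
```

That is 8 OR operations per sprite instead of a check on each of the 64 pixels.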

Additionally I just recently found Wizards & Warriors 2 (it is called Ironsword) and noted that the following happens when you enter a store without any money:
I found that the Battletoads hack caused Zelda 2's status text to glitch. I then found a much better hack which not only improves Battletoads but causes no known problems in other games: instead of checking for an intersection between sprite 0 and the background I just check for the first non-transparent sprite pixel (thinking about it I could do this a whole lot more efficiently than I do now). The Battletoads background is now properly aligned with the sprites.
For whatever reason the break flag is supposed to be set by PHP. I implemented this and it fixed the R/W SR and BRK flags tests in NEStress.
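As a Python sketch, the value PHP pushes is the status register with the break bit forced on. Setting bit 5 as well follows general 6502 documentation rather than anything stated above, and the names are hypothetical.

```python
FLAG_B = 0x10       # break flag, bit 4
FLAG_UNUSED = 0x20  # bit 5, assumed to read as 1 on the stack


def php_pushed_value(p):
    """Byte that PHP writes to the stack for status register p."""
    return (p | FLAG_B | FLAG_UNUSED) & 0xFF
```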
For some reason Wizards & Warriors reads from $4006. I return 0 for unknown registers...
I combined the Arkanoid paddle data into the second player controller so that both two player games and Arkanoid work without modification.
I fixed two problems with sound. SMB2 sets the sweep shift amount to 0 to stop the sweep of square channel 1, but I did not detect this. The other fix was with the sweep end conditions, where the channel would be disabled with a wavelength over $7ff or under 8. I didn't have this properly disabling the channel. The two problems this fixed were in Dr. Mario and SMB3, and both situations involved a thud sound.
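Both fixes reduce to two small checks, sketched here in Python. The sweep register layout (bit 7 enable, low 3 bits shift) is the standard one for the square channels; the function names are hypothetical.

```python
def sweep_enabled(sweep_reg):
    """The sweep only acts when it is enabled and the shift amount is nonzero."""
    enabled = bool(sweep_reg & 0x80)
    shift = sweep_reg & 0x07
    return enabled and shift != 0


def channel_silenced(wavelength):
    """The channel is cut when the wavelength is under 8 or over $7FF."""
    return wavelength < 8 or wavelength > 0x7FF
```

A sweep register value of $80 (enabled, shift 0) is the case described for SMB2.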
After fixing these problems the program ran noticeably slower, but not because of an increased workload. The ordering of the various included files greatly affects the speed of the program, due, I expect, to cacheing issues. I managed to get everything to a very nice speed by putting all includes except the read and write handlers at the very start of the code, and I left the read and write handlers at the end (just before the logo data).
I also noticed a new sound problem: Startropics doesn't have any sound at all (except for an occasional bizarre glitchy squeal) until you talk to the island chief. The sound cuts off again just when you enter the "Test of Island Courage", but quickly returns. When the music is active it sounds perfect... I tried reenabling frame IRQs, but that was not helpful. The sound registers are all zeroed until sound is activated.
I also found a problem which may go a long way towards fixing some other issues. The Dropoff 7 demo is essentially just a DMC demo, but it is supposed to have a title screen. The title screen does not appear in Neon64, though the sound works "perfectly". Since the graphics setup is likely very simple it should be easy to disassemble this code and see how it works, and from there I can find my problem.

On a positive note, the To-Do list fits on a single screen for the first time ever!
Ys and RC Pro-Am were fixed by recognizing that there are two one-screen mirroring modes in the MMC1. This tidbit of information was found in FCEU.
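A Python sketch of the four MMC1 mirroring modes as they are usually documented (low two bits of the control register). The function name is hypothetical and this is not lifted from FCEU or Neon64.

```python
def mmc1_nametable_bank(control, nametable):
    """Map logical nametable 0-3 to physical nametable 0 or 1."""
    mode = control & 0x03
    if mode == 0:
        return 0                   # one-screen, lower nametable
    if mode == 1:
        return 1                   # one-screen, upper nametable
    if mode == 2:
        return nametable & 1       # vertical mirroring
    return (nametable >> 1) & 1    # horizontal mirroring
```

Treating modes 0 and 1 as the same single screen would show the wrong nametable whenever a game picks the other one.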
Implemented support for greyscale mode, the only games I know of use it only for effects, like flashing the screen (FF3 does this when you enter a battle, SMB3 also does it when you have beaten a castle.)
I also discovered that the frame IRQ causes Wizards & Warriors 3 to crash, so I left the IRQ itself out, though I left in the rest of the logic for it. I see no reason to leave it in, it doesn't help anything.
Began to implement support for the Zapper, but a complete lack of success in actually hitting anything (the trigger worked fine) made me pause. I did successfully implement the Arkanoid paddle, though it seems to me to be a great deal harder to use. It's nice to have some true "analog" control in there, though. In order to allow the user to select this feature I intend to implement a menu for configuration.
I got the brick breaking sound in Super Mario Brothers working by putting the frame counter for sound channel length right at the beginning of vblank instead of after it. This was done by accident trying to fix another problem. This also made Samus' footfalls in Metroid sound right.
I also added support for sound frame IRQs, but no game yet seems improved by them (I have it to the point where it doesn't break anything, either).
FluBBa's NEStress enabled me to find a problem with the TXS instruction, it should not affect the flags at all. This doesn't seem to have fixed anything at all, but it's nice to know that I'm just a little more accurate than I was before. It also pointed out some other problems which I'll look into.
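The asymmetry is easy to miss because TSX, the opposite transfer, does update the flags. A Python sketch with a hypothetical dictionary-based CPU state:

```python
def set_nz(flags, value):
    """Return flags with N (bit 7) and Z (bit 1) updated for value."""
    flags &= 0x7D  # clear N and Z
    if value == 0:
        flags |= 0x02
    if value & 0x80:
        flags |= 0x80
    return flags


def tsx(cpu):
    # TSX copies SP to X and updates N and Z.
    cpu["x"] = cpu["sp"]
    cpu["p"] = set_nz(cpu["p"], cpu["x"])


def txs(cpu):
    # TXS copies X to SP and leaves the flags alone.
    cpu["sp"] = cpu["x"]
```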
I also implemented color deemphasis (the three high bits of $2001). The programs I know the behavior of seem to work properly with this, Chris Covell's Wall Demo, Super Spy Hunter (when paused), and Final Fantasy (upon entering a battle).
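Decoding those bits is simple; the Python sketch below also pulls out the greyscale bit from the same register. Function names are hypothetical, and it returns the raw emphasis bits without mapping them to colors.

```python
def decode_ppumask(value):
    """Return (greyscale, emphasis) decoded from a write to $2001."""
    greyscale = bool(value & 0x01)
    emphasis = (value >> 5) & 0x07  # the three high bits
    return greyscale, emphasis


def displayed_colour(palette_entry, greyscale):
    """Greyscale keeps only the brightness bits of the palette entry."""
    entry = palette_entry & 0x3F
    return entry & 0x30 if greyscale else entry
```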
I also doubled the size of the pattern cache. Teenage Mutant Ninja Turtles 2 and 3, as well as Xenophobe, run at a good speed as a result.
Reverted to an old version of sprite 0 hit detection, as the new experimental one gave me nothing but headaches. Solar Wars scrolls properly anyway. The following screenshots are from working games, new and old (but only recently tested.)

Fixed by proper formatting of $4016.

This used to crash at various points but is now at least playable in the first level (a slight graphical glitch in the intro is the only problem I've seen).

Quite an incredible little unlicensed game...

FF2: This used to have scroll problems, but not anymore. The translation intro is messed up, but I'm told it doesn't work on the hardware anyway.

And a little note from Wizards & Warriors 3
I think I've located the thread that binds Battletoads and Elite in glitchiness, both have the screen turned off at the beginning of the drawing of a screen. Also, it seems that Elite reads from VRAM 8 times for no apparent reason, yet this is vitally important to getting the game working right.
Many thanks to Jsr from the NESDev forum for pointing out my problem with controller access, I had to OR all of the values read from $4016 with $40. Now Mad Max, Dirty Harry, Paperboy, and Bomberman work.
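In sketch form (Python, hypothetical names), a controller read returns the next button bit in bit 0 with $40 ORed in. The detail that reads past the eighth button return 1 is taken from general NES documentation, not from the entry above.

```python
class Controller:
    """Standard controller read through $4016."""

    def __init__(self):
        self.buttons = 0  # bit 0 = A, then B, Select, Start, Up, Down, Left, Right
        self.shift = 0

    def strobe(self):
        # Latch the current button state into the shift register.
        self.shift = self.buttons

    def read(self):
        bit = self.shift & 1
        # Shift in 1s so reads after the eighth button return 1.
        self.shift = (self.shift >> 1) | 0x80
        return bit | 0x40  # games such as Paperboy expect bit 6 to be set
```

A game that compares the whole byte against $41 rather than masking bit 0 would never see a button press without the OR.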
I should test Super Mario Brothers Adventure (SMB3 hack) and FluBBa's NEStress.
Improved timing (by adding the various +1 cycle when a page boundary is crossed and added a delay between vblank flag set and NMI) to the point that the status text in Battletoads no longer bounces as the character moves and the line that appeared in the Vulture (in the scene just before the game starts) has been eliminated. I still need the sprite 0 hit hack to make BT work, but now I know why:
When the game starts the one-screen mirroring is set up wrong, and the screen is set to a blank screen until an sp0 hit is detected. Of course with a blank screen there is nothing for sp0 to hit, so nothing happens. The only reason I was getting any results at all is that on skipped frames the faster sp0 hit detection will always find a hit, because it is set in the first detection of sp0 regardless of background. I don't yet know how to get this to toggle right.
Reverted to the old SPRDMA method (copying the data to a memory location set aside for SPRRAM) and the Castlevania and Battle of Olympus flickering problems were corrected.
Fixed some timing issues, sound is now calculated at the proper 262 lines per screen (didn't work before because I had left the sound calculation out of one line) and an h-retrace is 1 cycle longer (due to two rounding operations this cycle was left off both here and the scanline, battletoads and w&w were not happy).
Made sp0 hit on frames with gfx disabled (for speed) more accurate by recording what line the hit was triggered on on the previous line.
By putting v=t (see loopy's docs) after the junk scanline for loading line 0 sprites I was able to fix W&W without any hacks, as well as get Battletoads to work almost perfectly (except that sp0 hit thing), all with the 512 cycle SPRDMA latency so that Castlevania works. Yay! The only remaining issue with 'vania is the "C" on the title screen flickers. There was a similar problem when I first implemented the new SPR DMA method...
I had fiddled with sp0 timing quite a bit, and the Solar Wars title scroll is now off by a line or two. Drat. And I must find out why sp0 doesn't work for Battletoads, I'd rather not have any stupid hacks in there.
Wizards and Warriors 3 works perfectly, and Castlevania 2 now runs at a good speed (before it was unplayably slow).
And now, just so you know this isn't one big lie, here are some screen shots.


Wizards & Warriors

Wizards & Warriors 3, the great abuser of graphics
And I just recently discovered an Easter Egg in Kirby's Adventure, so I loaded it in Neon64 to see if it worked:

I currently do not know the cause of the black block on the top, but there was a similar issue in Mike Tyson's Punch-Out.

Also, the dialogs in Final Fantasy now cut off at the right point! Yay! I'll have to test Battle of Olympus...
I changed MMC1 implementation a bit, to more correctly fit Matt Richey's explanation of a reset, but it has had no noticeable effect.
Zelda intro scroll and Wizards and Warriors worked when I cleared the 8 sprite and sp0 flag when the vblank flag is set instead of waiting until the end of vblank.
There was quite a nasty bug in the DMC IRQ, which would have messed with frame toggling and mmc1 reg0, but fixing it doesn't help anything. Oh well. Bomberman 2 now works, but I'm not sure if I can attribute this to this fix. Probably not.
The games which were previously quite slow seem to be running at a good speed, such as SMB3, Journey to Silius, and Zelda, I'm not really sure what to attribute this to.
I messed with SPRDMA timing a bit, and to that end I made the 6502 emulator check if the cycle counter is > 0 before it executes a single instruction, but again this messed things up a bit so I left the 512 cycle delay out.
I played around with the name table bits in the VRAM address register, but in the end I decided to leave it alone and stick with a strict interpretation of loopy's docs.
I got Battletoads working decently by triggering a sp0 hit at the end of a sprite, whether it has hit the bg or not. This is not perfect, and is in fact a dirty hack, but I don't yet know what aspect of sp0 hit I am emulating incorrectly.
I've noticed that Castlevania no longer works right, but will work very well with an SPRDMA latency of 512, but this destroys Wizards & Warriors and messes up Battletoads a lot. I'm just going to leave it for now and be happy with W&W and BT working.
I've been running through the disassembly of Elite to try and find my problem with it, and in the process I found an RLE routine. Pretty neat stuff.
A series of dirty hacks got Punch-Out!! working as well as it can. The thing is that it would require greater precision than the Neon64 engine currently can provide in order to work perfectly, so the only way to even approximate is via hacks. Mostly what I did was disregard sprites and VRAM writes/reads, check the first 16 (why? because it works.) tiles in a line for something to change the MMC2 latch, and set the latch to $fd at the beginning of each frame. And here . . . are my results!

The glitches on the VS screen and on the intro to the Mike Tyson version are the only ones I know of, they would require mid-scanline bank-switching to work correctly and that ain't gonna happen.
I also fiddled with Battletoads a bit, but with no success.
I've a nice Zelda 2 screenshot I never bothered to upload, demonstrating that it does indeed work:

Also, I've noticed some bizarre status bar problems with Ys and Princess Tomato in Salad Kingdom. Xenophobe has been tested and is really slow.
Began to write support for MMC2. The Punch-Out! demo mode works, but the actual in-game graphics are messed up (the status bar and crowd).
Notes: make sure sprite checker increments pointer to SPRRAM. Also, SPRRAM DMA may take 512 CPU cycles, not 256.
Added a bit of a hack to MMC1 that allows Die Hard to work correctly. It sets 8k CHRROM mode in reg0, but it still uses both reg1 and reg2 as if it were in 4k mode. I check if any write is made to reg2, and if so I override the reg0 setting and treat it as 4k. There is no reason why this should be happening, but sometimes you must just go with what works.
It should be noted that even FCE Ultra doesn't do MMC3 IRQs right.
I've been working for a few days on getting MMC3 IRQ timing to work right. It is impossible for it to be perfect, because Neon64 is only scanline-accurate at best. It is now good enough, however, that Kirby and Super Mario Brothers 3 work almost perfectly (SMB3 has a few very minor glitches on screens where more than one IRQ is done per screen.) As a side effect Earthbound's battle screens now work perfectly, though there are occasional points in the game with minor name table corruption, but it is highly playable. For unknown reasons Solar Wars' music is now up to speed. Several games also no longer work: Star Wars, Sonic 3D, Somario (all MMC3).
Added mapper 11 support, now you can play all the fun Bible and pest control-themed games you grew up with. But seriously, P'radikus Conflict actually seems like a decent game.
Tried to work out the problems with Final Fantasy 1&2 (a combo cart). A lot of my possible solutions involved initial values for the MMC1 registers, but I only succeeded in confusing myself.
Fiddled with MMC3 IRQ timing a bit, seems to have fixed the little flickering issue above the SMB3 status text. There does seem to be some "off by one" issues, such as on the world select screen and in the room in the world 1 castle where the ceiling moves.
Made screen not swap when in debug mode, so that status text doesn't end up on the other screen. Also made bad opcode trigger debug mode so it would also enjoy this benefit.
SMB3 still a bit glitchy just above the status text...
Found problem with Zelda and Solar Wars scroll: since I now have the spr DMA go directly through the RSP I never write the values to the CPU's copy of SPRRAM. I fixed this by having the DMA routine copy the sp0 data (first 4 bytes of SPRRAM) into the CPU's SPRRAM, where it can be accessed by the sp0 hit syncer. Mach Rider works very well again.
Made sprite 0 detection a bit more appropriate, easy with the 8bpp background texture where 0 is unconditionally transparent.
Zelda 2 status text now works right.
Implemented a system where sprite DMA is done directly by the RSP instead of intermediately by the CPU first, had to write back the values from cache first.
Sprite 0 flag is now set at (approximately) the correct point on the line. Zelda status text cuts off a bit late, Mach Rider is horrible. Solar Wars title is slow.
VRAM status is now only updated when it is changed, so I don't have to be continuously accessing the RSP.
Implemented proper RMW operation (though not for zero page instructions as they would never access any memory mapped I/O). Also implemented MMC1 "too fast" exception. With these modifications (both based on comments by Xodnizel) Bill & Ted now works.
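How the two changes interact can be shown with a Python sketch of the MMC1 serial port. It follows the commonly documented behavior (5-bit shift register, bit 7 resets, a write on the cycle right after another write is ignored); the class, its fields and the cycle numbers are hypothetical, not Neon64's implementation.

```python
class Mmc1:
    """Minimal model of the MMC1 serial write port."""

    def __init__(self):
        self.shift = 0
        self.count = 0
        self.regs = [0x0C, 0, 0, 0]
        self.last_write_cycle = None

    def write(self, addr, value, cycle):
        # "Too fast" rule: ignore a write on the cycle right after a write.
        consecutive = (self.last_write_cycle is not None
                       and cycle == self.last_write_cycle + 1)
        self.last_write_cycle = cycle
        if consecutive:
            return
        if value & 0x80:
            # Bit 7 set: reset the shift register.
            self.shift = 0
            self.count = 0
            self.regs[0] |= 0x0C
            return
        # Bits arrive least significant first.
        self.shift |= (value & 1) << self.count
        self.count += 1
        if self.count == 5:
            # Address bits 13-14 pick the register on the fifth write.
            self.regs[(addr >> 13) & 3] = self.shift
            self.shift = 0
            self.count = 0
```

A read-modify-write instruction such as INC writes the old value and then the new one on back-to-back cycles. If the old value is $FF the first write resets the port and the second must be dropped, otherwise a stray bit ends up in the shift register.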
Figured out how to make the MMC1 CHRROM switch work right for almost everything. I removed any special treatment for 8k except the following: upon a write to reg1 (while in 8k mode) write that value +1 to reg2. This works in Zelda 2, Big Foot, and Die Hard (until you change floors, but then activating the start menu clears up the problem, which I still haven't identified. There is also occasional name table glitching...)
Solved "line doubling" in both SMB and Zelda by putting both a PipeSync and TileSync after each rectangle.
I made TLUT (palette) load only when the palette actually changes. Interestingly, when I disabled changing it the palette remained in TMEM even after N64 power had been turned off.
I moved the MMC3 IRQ to before hblank, this seems to fix the jumpiness in Kirby (start screen) and Crystalis (message boxes), but I suspect it may be triggered a line too late (early?) Plus I don't have it properly react to the effect of changing the VRAM addr. Fiddling the timing by a line seems to mess it up in certain situations, so I'll just leave it be for now. I don't have any mechanism for triggering it during a scanline anyway.
Removed PC "optimization", W&W works. Very, very rarely I see a line out of place. The flaw would likely be invisible to the untrained eye. Both times I noticed it on the W&W title screen, where the knight is facing off two bad guys.
The status text in W&W bounces a lot. I'm not sure if this is new.
Maybe if the RSP DMAs the sprite data directly from the address specified by the DMA command we can save costly CPU overhead in the matter? Writes to SPRRAM via $2004 (which I'm told no game uses) could be made directly to the RAM in the RSP. Something to consider... Because it seems like sprite DMA efficiency is very important in some of the slower games (Sprite demo).
I think the slight MMC3 rearrangement has caused an unwanted line in SMB3 (world select screen)
In addition to the music being slow, Solar Wars also occasionally freezes after the planet select screen (waiting for music to end?)
Big Foot runs super slow, as well as having the zelda 2/die hard glitches. I've heard it has multi-split-screen scrolling, maybe it is too big for the cache?
I moved the VRAM_V and VRAM_X registers to RAM, the RSP now DMAs them when it wants to update them (along with the pages), instead of having the CPU write the values directly to DMEM. This was intended to help with the glitchy scrolling. It didn't work.
SMB sp0 is not flickering, but rather bouncing. Zelda status text does the same thing. Happens no matter how much DMA protection and what parts of sprite rendering I take out. Doesn't happen when paused (with SELECT).
Journey to Silius had major scroll issues, it was a problem with the new VRAM_V DMA. This was fixed by caching VRAM_V, VRAM_X, and VRAM_T. I also had to perform a cache op (#25) each time I use VRAM_V or VRAM_X.
Put gfxless PPU in a separate loop, didn't help with the occasional crash on start but may have made it a little faster.
Wizards & Warriors is broken, probably because of PC optimization.
I never knew that Journey to Silius and Zelda ran in 8x16 sprite mode. This is interesting because both have speed problems, this may be because a game in 8x16 sprite mode can have twice as many sprite pixels on the screen at one time, and more pixels means less speed. To help out with this a little I made 8x16 sprite switching a tad more efficient (no noticeable speed increase).
I've found out some information as to why Bill & Ted doesn't work, apparently instructions which read from memory, modify the value, and then write the result, first write back the original value. This, combined with the fact that when writes are made quickly only the first one is detected by the MMC1, should enable Neon64 to run Bill & Ted. Also, only the address to which the last (fifth) bit is written determines the MMC1 register which the value is written to. This information, by the way, came mostly from a thread on Memblers' NESemDev forum about FCEU, with some comments by Xodnizel himself.
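A minimal sketch of how these two behaviours might combine, assuming a hypothetical `Mmc1Serial` class with explicit cycle numbers passed to each write. It models only the serial shift register (bits arrive least significant first, and a write with bit 7 set resets it) and leaves out the other side effects of a reset. The rule that a write on the cycle right after another write is ignored is taken from the description above; how the real chip treats three or more back-to-back writes is not covered here.

```python
class Mmc1Serial:
    """Hypothetical MMC1 serial port model; not the Neon64 implementation."""

    def __init__(self):
        self.shift = 0
        self.count = 0
        self.regs = [0, 0, 0, 0]
        self.last_write_cycle = None

    def write(self, addr, value, cycle):
        too_fast = (self.last_write_cycle is not None
                    and cycle - self.last_write_cycle == 1)
        self.last_write_cycle = cycle
        if too_fast:
            return  # only the first of two quick writes is detected
        if value & 0x80:
            self.shift = 0  # reset the shift register
            self.count = 0
            return
        self.shift |= (value & 1) << self.count
        self.count += 1
        if self.count == 5:
            # Only the address of the fifth write selects the register.
            self.regs[(addr >> 13) & 3] = self.shift
            self.shift = 0
            self.count = 0


def rmw_inc(mapper, addr, old_value, cycle):
    """Model the two writes of a read-modify-write INC on mapper space."""
    mapper.write(addr, old_value, cycle)                   # original value first
    mapper.write(addr, (old_value + 1) & 0xFF, cycle + 1)  # then the result
```

An INC on a ROM byte holding $FF therefore writes $FF (a reset) and then $00, and the $00 is dropped because it arrives one cycle later. Without the "too fast" rule the $00 would be shifted in as the first data bit.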
I also got an updated version of Brad Taylor's PPU doc, hopefully to help me with the scroll bug and some other issues.
I also found a description of how monochrome mode and the various color emphasis bits work from Chris Covell.
An idea that I had simmering for a while came to life today, I figured out how to pipeline sprite pattern DMAs like I did with BG patterns. This did make things a bit faster. There is some serious flickering.
I figured out why the fadeout wouldn't work sometimes, it was simply that the double buffering was switching the screen, and so half of the time the fadeout would be on the other screen.
Zelda crashed, seemingly at random because I haven't been able to reproduce it. Crystalis had done the same thing...
I think that the CPU optimization implemented yesterday provides such a small benefit yet still makes emulation less accurate... it should by all rights be removed. Yet the difficulty in such an operation is not to be underestimated. I could replace codes.asm,,, and a6502.asm with older versions... but I'd still have to make additional changes to neon64.asm, sound.asm, and probably others.
Added a slight CPU optimization where the PC page isn't recalculated on every read from PRGROM (or RAM, if there is self-modifying code). Unfortunately this does not seem to have had much of an improvement in speed, and this decreases compatibility measurably... oh well, it will stay for now, it's too hard to undo.
I also fixed a problem with CHRROM corruption involving that blasted cache issue again, the ROM is now once more uncached (I think I changed this because of GS). This fixes at least Mega Man 4, Mega Man 6, and Final Fantasy 3.
I also invalidate any cache overlapping the ROM, which fixes some crashes, most notably Crystalis.
By removing some dmarealwaits in the PPU I was able to speed it up slightly, with no ill effects. This was based on the assumption that the RDP runs slower than the RSP's DMA, so I don't have to wait until a texture RSP DMA (out of DMEM to DRAM) is complete to tell the RDP to draw something with that texture, as the RDP probably isn't done drawing the previous primitive anyway. I did have to fix some problems with the bg renderer due to dmawaits being inside DMA requests instead of before them, as was true everywhere else in the PPU.
I fixed the Solar Wars title flickering problem by only updating VRAM_X at the end of a scanline. Graphics are only scanline-accurate in Neon64 anyway (which means it'll never be able to run certain games), so this doesn't change anything (other than fixing that problem, but I'm not sure why that wasn't a problem in the old version). I listened to the Solar Wars intro for the first time while testing this, and it seemed abnormally slow; not that the tones were downshifted, just the tempo was off.
There are issues with the noise channel in NES Test (of course I've issues with the noise channel everywhere anyway.)
I fixed a problem or two with my new alpha=1 graphics, several places where code cleared the screen it filled it with zeros and this produced small problems.
I stabilized the sprites (Kirby sprites jumped around a lot) by inserting a tile sync at the end of the display list. I also implemented a slightly better sprite 0 hit detection (when I revised the PPU for the RDP I made it just do the hit on the first line of the sp0), which sets the flag on the first line of the sprite with any set pixels in it. I don't know of any game that this isn't currently working for, though it is a bit of a cheat.
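The "first line with any set pixels" rule could look something like the sketch below. It assumes an 8x8 sprite with no vertical flip, the standard NES pattern layout (8 bytes for bit plane 0 followed by 8 bytes for bit plane 1), and that the sprite appears one scanline below its stored Y value. The function names are made up for the example.

```python
def first_opaque_row(tile):
    """Return the first row (0-7) of a 16-byte NES tile with any set pixel."""
    for row in range(8):
        # A pixel is non-transparent if either bit plane has its bit set.
        if tile[row] | tile[row + 8]:
            return row
    return None  # completely blank tile: no hit can ever happen


def sprite0_hit_line(sprite_y, tile):
    """Scanline on which the sprite 0 flag would be set, or None."""
    row = first_opaque_row(tile)
    if row is None:
        return None
    return sprite_y + 1 + row  # assumed one-line delay between OAM Y and display
```

This ignores the background entirely, which is what makes it a cheat: a real hit also needs an opaque background pixel under the sprite pixel.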
Solar Wars seems to drop lines out of its scrolling flame effect at random.
There are also some problems with saving and resetting (part of the same system), problems of the "it doesn't work" variety.
Also, Mega Man 6 has developed the same pattern corruption issue that Zelda 2 and Final Fantasy 3 already had.
It also seems that there are some instances where sprites seem not to flicker properly, like in Castlevania (lines drop out in an odd way).
I got the background rendering working a little faster with the RDP, but still only around 20 FPS (by a stretch of the imagination). I came up with a better idea anyway, which let me draw the BG with the RDP yet still operate at the speed of the RSP "compiled tile" version. The idea is that I have the old background renderer write an 8bpp color indexed texture instead of an actual 16bpp truecolor line. I then have the RDP draw this line to the screen, using the NES palette as a TLUT (texture lookup table). This is the same way I handle sprites, so now all I do with the RDP is draw four rectangles (one to fill with the background color, one for background sprites, one for the background/playfield, and one for the foreground sprites, drawn in that order) and all transparency is handled for me (except I have to have the sprites erase each other...)
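As a software model of what the four rectangles accomplish, the sketch below composes one scanline from three 8bpp indexed layers over a fill color. It is an illustration only: the function name, the list-based layers, and the 4-entry palette in the usage note are assumptions, and on real hardware the RDP does this work while drawing textured rectangles.

```python
def compose_line(bg_color, back_sprites, playfield, front_sprites, tlut):
    """Compose one scanline of 16-bit colors from indexed layers.

    Index 0 is treated as unconditionally transparent in every layer;
    any other index is looked up in tlut (the palette).
    """
    line = [bg_color] * len(playfield)      # rectangle 1: background color fill
    for layer in (back_sprites, playfield, front_sprites):  # rectangles 2 to 4
        for x, index in enumerate(layer):
            if index != 0:
                line[x] = tlut[index]
    return line
```

For example, with `tlut = [0, 0x1111, 0x2222, 0x3333]`, a background sprite pixel is covered wherever the playfield is opaque, and a foreground sprite pixel covers both, purely because of the drawing order.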
I had to implement double buffering for the video, because there was quite a bit of flicker caused by the sequentiality of drawing, I also synced the screen updates to the vertical retrace so it looks considerably smoother now.
The sprite priority problems in SMB3 and Castlevania were fixed by the new RDP usage, as that was the main point of doing it anyway. I have the suspicion that some games run a little slower (like Journey to Silius) but I haven't done a side-by-side comparison yet, and they actually seem to run smoother anyway from the double buffering. I should also note that when I press reset in Journey to Silius the fade to red routine which I have inside the reset interrupt handler is not run. I have not noticed this in any other game, perhaps it is a clue as to the game's extraordinary slowness?
Expect an update to both the backup unit and gameshark versions soon, after I implement a few more optimizations I'd like to try.
Back on the subject of resets, I've found that a reset can be delayed by constantly issuing SI DMA requests, i.e. if you press the reset button but the N64 program is constantly issuing DMA requests the N64 will not reset, until, that is, the DMAs stop. Another issue is the freeze at boot. For some reason every N64 program must write a certain value to a certain place in PIFRAM (IIRC) or the N64 will freeze. It turns out that constantly issuing SI DMAs will also prevent this freeze from occurring. I have not yet done any quantitative experiments to find out exactly how frequent the DMAs must be to prevent a reset or freeze. Also, since the freeze on boot can be prevented either by writing to PIFRAM or by issuing DMAs, I thought that maybe reset might be stopped in the same way. It didn't work, though, when I tried it. When I had my reset handler write the value it didn't reset, but merely froze. Maybe the interrupt line was never cleared?
Background rendering with the RDP is also complete, but everything is very, very slow at the moment. It comes down to an issue of waiting for certain things at the right time and no other.
I've been able to convert the sprite rendering portion of Neon64 to use the RDP, next comes the background.
I also merged the two versions of Neon64 into a single pile of code so they can both be updated at once.
I'm noticing a bit of a pattern in the log, I say something is wrong and then the next day I contradict myself.
It turns out that RDP control from the RSP is almost exactly the same as from the CPU, in fact the way I was doing it was correct, there was just so much else wrong with my test program that it wouldn't work.
I did learn that it is apparently impossible for the RDP to read textures or palettes from RSP DMEM (or IMEM, I tried), even with the fully qualified address (0xa4000000). This means I'll have to use another SP DMA to get my rendered sprites out of DMEM and into RAM where the RDP can get to it.
Controlling the RDP from the RSP is not like using it from the CPU. No success yet.
I would like to apologize to the RDP, there's nothing odd about the palette at all. I was just getting odd effects because it appears that you can't make a texture less than 16 texels wide, and I was trying to do it with 8. By padding my textures everything works swimmingly. I haven't done it through the RSP yet, that's the next step.
Not only have I figured out textured rectangles, but I've got palettized textures working, too. The only problem is that they're really weird and I don't fully understand them yet.
I also have a fairly solid system for sprite/bg priority set up, which should fix the SMB3 and Castlevania sprite priority glitches. It turns out that things are a tad more complex than they appear.
I also made several modifications to U64ASM:
  1. support for parentheses and commas in macro parameters
  2. trapping divide by zero errors
  3. fixed a problem with multiline macros with 10 or more parameters
I finally got around to looking through the RDP source Destop sent me, I've been happily drawing variously colored (even striped) rectangles for a while now. The next step is textured rectangles, where I see the greatest promise for Neon64 acceleration. I also figured out a way to use the RDP for drawing without sacrificing the scanline-at-a-time methodology I've been using: I need only use the Set Scissor command to limit the RDP to drawing a single line.
I also fixed a bug in the controller strobe code, but it didn't fix anything.
I think I may have found the source of the terrible slowdown in the Gameshark version. I had enabled an interrupt accidentally, and I have no idea what it is for, but apparently it was eating up a whole lot of cycles running through my exception handler.
I implemented a frame rate monitor, accessible with the L button. It tells me that most games run around 40 to 50 FPS.
I doubled the horizontal screen resolution at the intro/loading screen, which allowed me to fit an old extended ASCII art logo there.

Of course on a TV the aspect ratio works right, it almost looks like normal text mode.
Fixed an error in the GS version which was introduced yesterday, SMB3 sprites now work right.
It should be pointed out that the GS version is abysmally slow when it comes to sprites, in Journey to Silius with many sprites on screen I have seen the framerate drop to about 2 FPS. The frameskip is just about perfect, though, I have yet to hear the music skip.
You might ask why the Gameshark version of Neon64 has just been released while the normal version remains unupdated. The reason is that my v64jr has been very unreliable as of late and I've rarely gotten more than a few seconds of use out of it before it fails. The next big release, which will include the sprite speedup (which is NOT in either version right now) will update both versions and will be called v1.1. By then I may have worked out a general purpose loader for the GS and separate versions may not be needed, who knows.

To clear up a bit of confusion, the GS version is indeed different from the normal version. It has better frameskipping, bugs in the DMC (a sound channel) have been worked out, and mapper 7 support (Wizards & Warriors) has been improved. The GS version does not "enable use of the GameShark", it is an entirely separate version which can be used with the GameShark. The two versions are not interchangeable.
Improved frameskip so that the music is actually kept working. Simplified the sending procedure for Gameshark (down from 1,000 steps to 999!). Got Wizards & Warriors working.
FF3 has significant sprite corruption in the GS version.
I have made modifications which allow me to run ROMs of larger size over the GS, such as Kirby or Metroid. Neon64 runs noticeably slower over the GS...
Some of the slowdown (most noticeable on SMB, which slowed to a crawl) was because I hadn't written back the NES ROM to RAM, it was all in cache. This also caused some sprite corruption. Neon64 is still a bit slower on the Gameshark, though, mostly when a lot of sprites are on screen the music will break up under the strain. Interestingly enough, I just happen to be working on a sprite speedup now.
I've successfully run Neon64 using only a GameShark! Go out and buy one if you know what's good for you.
Once I figured out how to send code to the N64's RAM, all I had to do was find some space in RAM to store the emulator and NES ROM and make some small changes to the Neon64 initialization. I believe this is the first time the Gameshark has ever been used in this way...
I'm still in the debugging process, but I have played Super Mario Brothers for quite a while. Awesome!
I am indeed working on the sprite speedup, but since I don't have my equipment with me I've been forced to test in an emulator, the retardation of which is well known. I do think that there remain no massive technical hurdles to o'erleap.
Note to self: have separate function for bg and fg sprites, just execute straight through?
I have an idea of how I can use compiled tiles for sprites, which will help speed quite a bit, as sprites are currently the least efficient part of the PPU. The problem is that this will require a complete redesign of several components, and I don't really feel up to it now.
I've hit upon an interesting new idea, which may allow anyone with a Gameshark to play Neon64. See my post at Dextrose for details.
Fixed DMC support, Kirby's Adventure doesn't crackle all the time now.
Whoops, big error in CRC calculation, fixed now.
Added a proper saving system. Still only one save, but it CRCs the game which makes the save and asks you if you want to overwrite data from another game.
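A rough sketch of the check this implies, using Python's `zlib.crc32` as a stand-in. The CRC variant Neon64 actually uses is not stated here, and the function names and the idea of an empty slot being `None` are assumptions for the example.

```python
import zlib


def game_crc(rom_bytes):
    """CRC identifying the game that owns a save (stand-in algorithm)."""
    return zlib.crc32(rom_bytes) & 0xFFFFFFFF


def needs_overwrite_prompt(saved_crc, rom_bytes):
    """True when the single save slot holds data from a different game.

    saved_crc is the CRC stored with the existing save, or None if the
    slot is empty (hypothetical convention for this sketch).
    """
    return saved_crc is not None and saved_crc != game_crc(rom_bytes)
```

Storing the CRC next to the save data means the question "overwrite data from another game?" only comes up when the loaded ROM differs from the one that made the save.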
Added Delta Modulation Channel (aka DMC or DPCM).
I can draw black rectangles with the RDP.
Applied my solution to the noise channel, while not perfect it makes Final Fantasy sound worlds better.
There are still issues, the sound of breaking blocks in SMB and Samus' footfalls aren't right. I stopped the linear counter from automatically switching to load mode on terminal count, this fixed several situations where the triangle wave was playing too long (Castlevania pause, SMB underworld).
Improved speed throttling to eliminate the effect I used to get of half the screen being rendered twice as often as the other half.
Made several changes that resulted in stopping the buzzing in Mega Man 2 before the intro music starts.
Found the cause of a Zelda problem (on name entry screen), came up with a good solution which might also be applicable to the noise channel. The issue involved audio resolution.
Increased the size of the audio buffer to add some stability.
Here's samples of sounds... Mega Man 3 intro Metroid intro ...recorded from my N64.
Made essentially an infinite number of changes and additions.
Everything up through the noise channel is working pretty well. And by pretty well I mean it kicks ass. I love game music.
I've had a lot of initial success with sound, but it remains initial success. Songs and sounds are recognizable but so much is off so often...
It's going to take a lot more work, my overall audio generation engine is crap.
Problem located:
CPYABS was set up to use rRAM_A6502, which can only (correctly) access the zero page. Thus essentially every use of CPYABS was broken, because any time you might want to use the zero page you'd use CPYIMM for speed and size savings. I found the same problem in CPXABS. This warrants a closer scrutiny of the CPU for this kind of error.
Super Mario Brothers 3 is now continuously playable!
Dragon Warrior is now playable!
Wizardry is now playable!

You have done well in defeating the Bug.
Thy Experience increases by 1.

Interestingly, when I removed my debugging code Neon64 actually ran slower... so I put it back in.
By removing the MMC3 IRQ count reload from latch (on disable) I was able to make the moving-ceiling room of the world 1 castle (MMC3) have proper status text.
I implemented mapper #7 (AOROM), which is used on many Rare games, but when I tried it with Wizards & Warriors it was pretty much unplayable.
I've been trying to implement sound scheduling and so on, but I've been running into some trouble. Quite a bit actually.
I fixed the jumpiness in Total Recall (et al).
It seems that when IRQs are disabled SMB3 runs fine, but that's as far as I've gotten.
I made another boot cart out of my Gameshark. Now I can run Neon64 without worrying about the v64jr failing, in fact I don't even need to leave it connected at all. I still haven't found the SMB3 problem, but I have narrowed it down.
Still unknown problem with SMB3, I'm closer to finding it but it is becoming difficult, my v64jr keeps crashing after a while. I may need to rebuild my boot cart just to get anything to work.
I fixed a few CPU errors involving improper behavior during stack overflow. Now the SMB3 freezes completely instead of crashing. I expect some infinite recursion somewhere.
I fixed the bg clipping so that it actually works with scrolling, SMB3 and Stars SE work much better now.
I found that PNG is much better than JPEG for these screenshots, so the below images have been duly replaced.
I found several problems preventing SMB3 from running. First, I needed to disable MMC3 IRQs when the background is turned off. This allowed most of the game to run correctly. However then, upon exiting Toad's "line up the pictures" game, the game crashed. I found that it was due to an error in the CPU core, improper behavior when the stack overflowed. However, even upon fixing this, the stack still overflowed, thus the game would freeze for several seconds and then crash. I do not currently know why, but I suspect the CPU. Here are some SMB3 screenshots:

I also added a much more efficient tracer (no more bullet time while debugging) and found out more about the Zelda 2 and Die Hard bugs (which increasingly seem to be the same bug.)
Fixed several issues with cache age processing, made Sack of Flour title screen work. Also fixed an MMC3 PRGROM problem (in $a000 and $c000 mode), which has allowed the SMB3 title screen to work perfectly, though when I try to start a game it crashes. Also AD&D Dragon Strike now works, since I added a modulo to the PRGROM switch (games seem to like to try to access stuff like page $3f when they really mean $0f, since it only has $10 pages in the 1st place $3f%$10=$0f). I have added that same feature to CHRROM (since Dragon Strike's gfx are still a bit off) and to MMC1 PRGROM and CHRROM (hopefully will improve Zelda 2 and maybe Bill & Ted), but I have not yet been able to test it.
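The modulo on the bank number is small enough to show directly. The function name is made up for the example; the numbers are the ones from the Dragon Strike case above.

```python
def wrap_bank(requested_page, num_pages):
    """Wrap a requested bank number into the range the ROM actually has."""
    return requested_page % num_pages


# A game with $10 pages asking for page $3f really means page $0f.
selected = wrap_bank(0x3F, 0x10)
```

When the page count is a power of two this is equivalent to masking with `num_pages - 1`, which may be cheaper in assembly.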
Tested, no improvements.
I found a nice image conversion program called pic2pic, it let me make a bunch of screenshots (JPEG compressed) for ya:

(yesterday's news)
There was an issue wherein the last entry in the compiled graphics pointer table was not being transferred to the RSP, this is what caused the Mega Man 3 (et al) problem. This was introduced by accident on 6/2/03.
Another issue involved speed. Neon64 has been running slow since I changed the ROM pointer to uncached. When I changed it to cached it sped up but several games displayed sprite glitches. I reached a solution by making two pointers, one cached and one uncached. The cached version is used to access PRGROM, and the uncached version is used to access CHRROM. I may do something similar with the VRAM pointers, which need to be written uncached so the RSP can read them, but they are read much more often than written. Some well-placed cache instructions should solve this and boost speed.
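For reference, the two pointers can be derived from the same physical address, since on the N64's MIPS CPU the KSEG0 segment (0x80000000) is cached and the KSEG1 segment (0xA0000000) is uncached, and both map onto the same physical memory. The sketch below only does the address arithmetic; the ROM location used in the usage note is a made-up value.

```python
KSEG0 = 0x80000000      # cached, unmapped segment
KSEG1 = 0xA0000000      # uncached, unmapped segment
PHYS_MASK = 0x1FFFFFFF  # low 29 bits give the physical address


def cached(addr):
    """Return the cached (KSEG0) alias of an address."""
    return (addr & PHYS_MASK) | KSEG0


def uncached(addr):
    """Return the uncached (KSEG1) alias of the same physical address."""
    return (addr & PHYS_MASK) | KSEG1
```

With a hypothetical ROM copy at physical 0x00200000, PRGROM reads would go through `cached(0x00200000)` (0x80200000) and CHRROM reads through `uncached(0x00200000)` (0xA0200000).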
All Mega Man games, as well as SMB2 and FF3, now run without visible flaws.
Quick story:
To find the first issue mentioned above I went to the Media Center (computer lab with some books and a copy machine). Usually I can set up my N64, V64jr, and laptop, connect the N64 to a monitor (via the N64->VGA adaptor I finally got) and code away, but today they wouldn't let me use a monitor (despite the fact that there were a dozen free). So how does one debug a video game without video?
I loaded the ROM, pressed buttons on the controller from memory, took a screen shot, then saved it as a BMP on the laptop, viewed it in MS Paint. And that's how I found my bug. Good timing, eh?
This suggested the possibility of taking many screen shots rapidly and displaying them instead of saving them. Video over parallel port...
Some issues have sprung up since 5/31/03, causing glitches between tiles in just about every game.
It turns out that JrGrab sucked, so I wrote my own version. It supports any resolution and color depth and I intend to release it, with full source code, when I get it into a more final form. (Here it is in case you're really interested.)

Here's a screenshot from MM6, dumped with my screen capture utility.
There were at least three things wrong, and I managed to find them all during homeroom:
  1. The ROM was being treated as if it were in cache (which may have been what crashed SMB3). This helped me to catch #2.
  2. The 1K CHRROM switch was writing two pointers to the VRAM page table, which was writing over name table pointers, so effectively pattern tables were being used as name & attribute tables.
  3. The mirroring register was backwards (I think this was goroh's mistake [edit: no, it was my misreading]).
Now Mega Man 3 is working almost perfectly, with just a small pattern table confusion at the title screen and stage select screen. Haven't gotten to test others yet, I only had 30 minutes. Screenshots will be available as soon as I take them. I may incorporate JrGrab into Neon64 to save some trouble.
Got MMC3 IRQs to work right, but it looks like I'm still having problems in a variety of games (Zelda 2 included, which isn't even an MMC3 game). I have something fundamentally wrong with name table/attribute table.
MMC3 doesn't crash, but the graphics are messed up. Does anyone know of a document more detailed than \Firebug\'s and in English?
Fixed Kung Fu; apparently it had sprite clipping on, and sprite 0 was in the clipping zone, and thus not being caught. Apparently even if sprite 0 is not being drawn it will still trigger an sp0 hit.
I'm trying to add MMC3 support...
Changed some stuff in an attempt to fix scroll problems (mostly just removed old vestigial stuff). I reenabled the clearing of the vblank flag on the recommendation of some other authors. I also discovered the reason for the crashing of several games, I had miscoded the return from the MMC1 handler. I was returning to ra, which is the return address for the instruction, not the write function (which was stored at writera), so the instruction was exited before the PC could be incremented, thus there was an attempt to execute the middle of an instruction. Dragon Warrior has some bg corruption (wrong tiles in the wrong places). Kung Fu freezes waiting for sprite 0, it looks like sprite 0 is completely transparent and thus would never trigger a hit.
I also noticed an interesting phenomenon: when the debugger is waiting for a button to be pressed and the reset button on the N64 is pressed nothing happens. The reset only occurs after a button is pressed and the wait loop is exited. The ability to delay the reset could be most interestingly exploited...
In the past 24 hours I completely revised Neon64 to work with the cache for compiled tiles. Mechanized Attack, Jaws, and several others which required midscreen bank switching now work, and I have a nice, clean interface for adding more mappers.
I looked into the NES Test source code again, and in the pause function it seemed pretty obvious that it wasn't expecting the vblank flag to be cleared after it was read, although all documents I have say it should be. So I took out the vblank clearing and NES Test now runs as nice as you could ask for. Plus it didn't seem to break anything new, in fact Tecmo Bowl's scroll now looks a lot nicer.
During Calculus and Lunch I implemented analog stick control and second player control, respectively.
I've implemented basic saving, triggered on pressing the reset button (which also activates a neat fadeout). I've worked on some compression research, and it seems that RLE would be the best to implement if I want to fit more than 4 saves onto the controller pack at once, but I'll have to get some sort of menu up and running. Right now it just writes SRAM on reset and reads it on startup (though only if the SRAM enabled bit in the header is set.)
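To give an idea of why RLE looks attractive for SRAM, here is a minimal byte-oriented run-length coder. The (count, value) pair format with runs capped at 255 is an assumption for the example, not a format Neon64 is known to use.

```python
def rle_encode(data):
    """Encode bytes as (run length, value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(packed):
    """Inverse of rle_encode."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        out += bytes([packed[i + 1]]) * packed[i]
    return bytes(out)
```

An 8K SRAM that is still all zeros packs down to 66 bytes with this scheme, while data with no repeats doubles in size, so how well it works depends on how sparsely a game uses its save RAM.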
I also changed the controller access to work properly (DMA command to PI, DMA result back) with the PI driver I wrote for accessing the controller pack. This seems to be a bit faster. Thanks to LaC for clearing up any confusion as to how the PI works.
The Nemu debugger has proved very useful, just minutes after I got it working I found a bug which was the cause of Final Fantasy crashing:
Neon64 has a section of memory reserved for what I call "bgline", which is just the background color repeated 320 times. This is loaded by the RSP via DMA in order to clear out its internal buffer at the beginning of a line. This means that I never have to explicitly write a background color, which in all cases gives me a speed advantage.
Several versions ago I was running Neon64 in 256x240 mode, and recently I switched to 320 or so. As such, I had to increase the size of the bgline from 256*2=512 (2 bytes per pixel) to 320*2=640. I changed the code which writes and reads the bgline, but not the actual size of the memory region itself. The next area in memory was the compiled pattern table array. Whenever bgline was being updated, the pattern table would be overwritten with pixels from bgline. It just so happens that the primary background colors I'd been testing with (black and Super Mario Brothers' blue) were valid instructions, so this never became a problem until Final Fantasy's map screen used a different color for background. This color was written over the compiled pattern table and read into the RSP, but when it was executed the RSP simply gave up and crashed.
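The overrun can be reproduced in miniature. In this sketch a flat `bytearray` stands in for RAM, the bgline region is still reserved at the old 512-byte size, and the 256 bytes after it stand in for the compiled pattern table. The fill value 0xEE, the color 0x1234, and the function name are arbitrary choices for the example.

```python
OLD_PIXELS = 256
NEW_PIXELS = 320
BYTES_PER_PIXEL = 2
PATTERN_FILL = 0xEE  # stand-in for compiled tile code


def simulate_bgline_write(pixels_written, color):
    """Write the background color into bgline and return (memory, reserved).

    The region is reserved for OLD_PIXELS only, so anything written past
    that lands in the neighbouring stand-in pattern table.
    """
    reserved = OLD_PIXELS * BYTES_PER_PIXEL
    memory = bytearray(reserved) + bytearray([PATTERN_FILL]) * 256
    for i in range(pixels_written):
        memory[2 * i] = color >> 8        # big-endian, high byte first
        memory[2 * i + 1] = color & 0xFF
    return memory, reserved


memory, reserved = simulate_bgline_write(NEW_PIXELS, 0x1234)
overrun = NEW_PIXELS * BYTES_PER_PIXEL - reserved  # bytes of pattern table clobbered
```

Here `overrun` comes to 128 bytes, all of which now hold copies of the background color rather than tile code.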
Thanks to LaC and lemmy for this great debugging tool!
So here's the evil plan: Nemu 0.8 has a very nice RSP debugger built in, so I should be able to use it to debug Neon64. I actually got to try it out today, and I found a bug in the RSP emulator that I need to work around (at least until I can get LaC to fix it). It turns out that there is no point to using the lwu instruction on the RSP (as opposed to lw), since the registers are only 32-bit anyway, so in Nemu the lwu instruction is not recognized, though it is on the N64 (it's treated exactly the same as lw). I hope that this debugging ability, though really slow, will enable me to more quickly find bugs.
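The reason lw and lwu collapse into the same thing on 32-bit registers can be checked with a little arithmetic. The functions below are a model for illustration only: they take the loaded 32-bit word and the register width, with lw sign-extending and lwu zero-extending before the value is truncated to the register.

```python
def lw(word, reg_bits):
    """Load word: sign-extend the 32-bit value, then fit it to the register."""
    if word & 0x80000000:
        word |= ~0xFFFFFFFF  # set every bit above bit 31 (Python ints are unbounded)
    return word & ((1 << reg_bits) - 1)


def lwu(word, reg_bits):
    """Load word unsigned: zero-extend, then fit it to the register."""
    return (word & 0xFFFFFFFF) & ((1 << reg_bits) - 1)
```

With 64-bit registers the two differ for any word with bit 31 set, but with 32-bit registers the extension bits are cut off and both return the same value.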
And in response to the message someone left in the guestbook, if you'd like to beta test the latest version just ask.
LaC sent me a corrected version of Nemu. I'll see if I can speed it up at all past the 3 FPS I'm getting now...
I finally got to implement the pipelining I had wanted, along with more efficient address calculation, and that has finally pushed emulation speed over the top to 110%!
I'll work on that bug in Mach Rider (and possibly the same one in Final Fantasy) some other time.
I was going to work on optimizing the PPU, but first I found a problem in Mach Rider, with it occasionally crashing on the Absolute Indexed INC instruction. In trying to find the source of this problem I found bugs in the assembler, which I did manage to fix, but I have still not located the root of the problem. I am not even sure if it is a problem with a) the ABXINC instruction b) some other part of the CPU core c) some other part of the program, possibly even running on the other processor so I'll never be able to track it. To top it off I now have no faith in my exception handling routine's error reporting capabilities. All of this will no doubt be resolved when I have some more time and patience.
By the way, Mach Rider runs at about the same accuracy level as in loopy's NES emulator (loopyNES), and it's his documentation that I'm basing my work on, so maybe...
The problem is more likely that we both decided not to bother with writes to the VRAM address register mid-frame for speed considerations. At least I'm pretty sure that's my problem.
I reverted to the version from 2/8/03 and then went about making many of the changes I'd been planning. I removed sprite caching and got an accurate sprite 0 hit working, so now Mach Rider is playable (though still very glitchy).

I took a bit of a speed hit in this, though, so speed was then at 80%.
Then I fixed some glitches with the new sprite loading routine and finally got the graphics compiler to work when only one byte needs to be recompiled, rather than the whole tile, which led to much nicer speed on games like A Boy and his Blob and Alfred Chicken.
Then I modified attribute byte loading to only load each byte once, for a new speed of 83%.
Then I made the RSP start before checking the sprite 0 hit stuff in the CPU and made some sprite VRAM table lookup changes to get rid of a problem I would have when using some higher mappers. I also made the sprite 0 hit check not take place if it's been found already, and no effort is expended if the sprite is blank, because then no hit can take place (that check is really cheap anyway). After that speed was back up to 85% again!

I also made some attempts at rearranging the background drawing function, as it still does plenty of redundant things (such as looking up addresses in the VRAM page table, which will only change on a name table change), but I messed something up and had to scrap that. I did get some good ideas for when I'll finally implement the pipelining that I've always wanted, mostly involving loops. I think a lot of inspiration for this came indirectly from Brad Taylor's NES Emulation Discussion document.
So effectively I got no speed increase but greater compatibility. Not a bad day's work.
I moved the sprite section of the PPU to the CPU, spent a few hours optimizing, then realized that it wasn't worth it, as I was still getting low speeds.
I think I'll just revert to the 2/8/03 version, remove the stuff with the cache (which at best causes no improvement and at worst is slow and glitchy), and try to optimize the PPU there. I will keep the concept of loading the sprite patterns when loading the sprite, instead of while drawing as in previous versions, as this will allow me to avoid having to wait for the DMA to complete.
I was able to rearrange the BG rendering so that it was calculated on the CPU and drawn on the RSP, but this proved to be too much of a load on the CPU, as I got only about 60% speed without attribute tables or sprites...
The new strategy will be to only do sprites on the CPU... we'll see how that works out.
A few screenshots I just got around to uploading:
The best part of the game... Um, should I laugh or scream?

In the running for best intro on the NES.

What SMB really looks like on the NES. The NES' video output isn't quite at the same quality level as the N64's, is it?
I'm going to take the PPU and totally rework it, read the TODO for details.
By increasing the cache size from 8 to 16 I got Castlevania working pretty much perfectly.
Sorry there haven't been updates for a while. I have been working heavily on the PPU, which I now have at about 82% speed, getting those revisions I've been promising in place, such as the overall reorganization and sprite caching (only for 8x8 sprites at the moment). I'll get the bugs out of those (such as Final Fantasy crashing at, before, and around the map screen) before proceeding. I have a new todo list up, the link is above. Consider all release date questions answered.
I also finally got an AC adapter for my NES (at Radio Shack, on sale for $5; apparently they've devalued it from the $16 it was in my catalog) so I now have an accurate layout for the video output. I'll be changing my output to reflect this sometime soon.
I did a huge amount of work on the sprite cache, getting it to work with various games and 8x16 sprites, but I still have bugs in it. Given the small, if any, performance boost, I wonder if it is worth all of the effort I'm putting into it.
On a positive note, Total Recall and Alfred Chicken have been tested and work splendidly, despite the fact that Total Recall does not deserve to exist.
I also got the sprite 0 hit working reliably in both SMB and Solar Wars at last.
I did a major upgrade of the exception handler in order to track down the cause of the annoying little line that appeared when Final Fantasy crashed (just before going into the battle scene). It now displays the contents of all registers as well as the word at the location causing the exception. I also implemented a system to allow me to watch a block of addresses. This allowed me to track the source of the bug to the CPU emulator, where, in the increment and decrement instructions, I had used my zero page memory handler (which is faster but doesn't work outside of the zero page) instead of the absolute memory handler.
This caused the Final Fantasy battle scenes to work, as well as the enemies in Metroid.

I fixed the annoying line of garbage at the top of the screen. I also fixed the interpretation of the MMC1 CHRROM switch registers, all 5 bits are used (contrary to Firebug's mapper doc). This made Teenage Mutant Ninja Turtles and A Boy and His Blob work correctly.

This section wasn't working before.
I noticed some random crashes in Zelda and TMNT, but this could be because of a hardware problem, as the N64 died in a way it never has before each crash.
I accidentally deleted this entry, it went something like:
Fixed Mighty Bomb Jack (I wasn't clearing unsupported registers like controller 2, it was reading both controllers and ORing the results together). Fixed some MMC1 graphical glitches (skipped CHRROM switching for games without CHRROM).
Made a breakthrough in MMC1 (I had an xori instead of an ori because of a misinterpretation) which now allows Zelda, Metroid, and Final Fantasy 1 to run!

The Real Thing!

Yeah, they're slow and still have plenty of glitches, but at least they run!
In other news I ran Mighty Bomb Jack again, it still seems to confuse the right arrow button with the start button, another case to look into.
There actually is no problem in SMB, I just misremembered the behavior of the world -1 glitch.
Found a bug in BIT (bit test) that cleared the carry flag instead of the zero flag, this fixed Rygar and Destination Earthstar, but not the world -1 SMB problem.
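For reference, here is a minimal Python sketch of how the 6502 BIT instruction treats the status flags. It is not Neon64's code (that is MIPS assembly); the function name and constants are made up for illustration. The point is that BIT updates Z, V and N and leaves the carry flag alone.

```python
# Status flag bit positions in the 6502 P register.
FLAG_C = 0x01  # carry (must NOT be touched by BIT)
FLAG_Z = 0x02  # zero
FLAG_V = 0x40  # overflow
FLAG_N = 0x80  # negative


def bit_test(a, m, p):
    """Return the new status register after BIT with accumulator a and operand m."""
    # Clear only the three flags that BIT defines.
    p &= ~(FLAG_Z | FLAG_V | FLAG_N) & 0xFF
    # Z is set when the AND of accumulator and operand is zero.
    if (a & m) == 0:
        p |= FLAG_Z
    # V and N are copied straight from bits 6 and 7 of the operand.
    p |= m & (FLAG_V | FLAG_N)
    return p
```

A version that cleared FLAG_C where it should clear FLAG_Z would leave a stale zero flag behind and destroy the carry, which is the kind of bug described above.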
Got a whole lot done today: Maybe it doesn't seem like a lot, but look at all the games I've got running!

Ah, the sweet smell of failure.

A little glitchy, but most of it works.

This is a cool game, how come I've never heard of it before?

Those crazy Russians and their Mind Games

Blame ATI for the horrible image quality. Do any of these games look familiar, Gavin?
Oh, and Stars now works.
Not done revising the PPU yet, but I thought I should post some pictures of the scaling in action.

This doesn't hurt my speed a bit, as it is all done internally by the VI.
I do lose a few pixels on the left, but this may just be my video capture card misbehaving. It doesn't affect gameplay, anyway.
There was more wrong with the sprite 0 detection than I'd care to mention now. I'm going to bite the bullet and rewrite the PPU emulator, maintaining much of the same organization and functionality but placing it in an order that can be easily maintained. As it stands currently I just added features haphazardly and jiggled them around a bit until they worked, which has led to the "Total nonsense" mentioned above. Or below, rather.
By the way, I'm planning to add zip decompression to the ROM loading routine, so that one can cram more games onto a single CD (for those who have backup units with CD-ROM drives). The entire GoodNES set is about 400 MB... wouldn't it be neat to play any NES game you want on your N64 from one disk?
It'll also cut out that extra step in loading a ROM. Many PC NES emulators support this and I figure it's about time Neon64 did, too.
Reporting from school today... I suspect that the problem with Solar Wars is caused by the failure of the sprite 0 hit to account for X scrolling within individual tiles, but I'll check upon my return to the newly reconfigured Bat Cave (aka my room).
I'm also planning to implement pipelined DMAs, so that I can load one tile while the previous one is drawing, as I had wanted to do from the start.
Changed the initial VI settings and the PPU to work with a real (scaled) 256x240 video mode. It looks a little goofy now, but I'll see how it looks on a real TV before I make any final decision on it.
There is a problem with the emulation of some little flag somewhere, I think. In SMB, when one attempts to enter world -1, occasionally the game will reset as soon as you enter the "invisible pipe", and if you do actually make it there is an end, while world -1 is not supposed to end. This may also be the bug that crashes Stars. I tried to get the source code to find the problem but it appears that the bug is in the sound code, which Chris didn't include because someone else wrote it...
Started to work on MMC1 but so far I've been thwarted, I want to fix the CPU bug first now.
I reinstituted drawing the top 8 lines of the screen, it turns out that my TV is nonstandard and shows more at the top than others.
Another problem to solve:
Chris Covell's Stars demo crashes after a few seconds (but the scrolling works!) with a bad opcode. CPU debugging, ahoy!
Yes, my true inner awesomeness has shone through.
I fixed the bug, which was caused not by the CPU not being fast enough but by my new PPU code being too fast for even the mighty SP DMA, so I had to put a wait in there. I should have known this when I noticed that if I put a delay in the attribute table section (which is between this particular DMA and where the DMAed data is used) the problem went away, but not if I put it elsewhere, but I didn't. The breakthrough came when I ran my favorite debugging buddy, Arkanoid, and noticed that the glitches were not with the attribute table (the flickersomeness was with actual patterns, not colors) but rather with the pattern or name table. As the attribute and name table calculations work exactly the same way, based on the PPU V register, this allowed me to free V from suspicion, as I had been examining it before. I knew that it must be a problem with the pattern table access.


I was trying to run my compiled pattern table tiles occasionally before they were fully loaded into the RSP, thus it would be drawing the previous tile, thus the screen would appear to be shifted right.

Using the new highly scientific "SMB Timeout Test", in which I start a Super Mario Brothers game in both Neon64 and a full speed PC NES emu at the same time, then read how much time is left on the clock on the slower emulator (mine) when the time on the other runs out, I determined that Neon64 runs at 78% of full speed. Again, as the CPU runs at about 600%, this is all due to the PPU, which is eight pages of total nonsense at the moment. Total unoptimized nonsense.
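The arithmetic behind the timeout test can be sketched in a few lines of Python. The numbers here are assumptions for illustration only: a level timer that starts at 400 and a reading of 88 left on the slower emulator are not figures from the actual test, they are just one pair of values that works out to 78%.

```python
def relative_speed(start_time, time_left_on_slower):
    """Fraction of full speed, given the in-game timer readings.

    When the full-speed emulator's clock reaches zero it has consumed
    start_time ticks; in the same wall-clock time the slower emulator
    has consumed only (start_time - time_left_on_slower) ticks.
    """
    consumed = start_time - time_left_on_slower
    return consumed / start_time


# Hypothetical readings: timer starts at 400, slower emulator shows 88.
print(round(relative_speed(400, 88) * 100))
```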
Solar Wars still has... issues with the title screen, minor ones but still annoying, considering that this was one of the strongest points of the early PPU. Bah, humbug.
NES Test runs almost perfectly, if "perfectly" can be used to describe the behavior of a program running at an estimated 2 FPS. Let's call it bullet-time.

It's no longer physically painful to play this game.

Yeah, that's right, it runs.

Wow, it's 2 AM. Time to celebrate... let's write a Physics lab report! Yay!
Worked on name table speedup, some very entertaining PPU bugs have cropped up, but at least speed has improved.
That problem was quickly dispatched, let's note the structure of the SP DMA length register for posterity:
bits 11-00: length to transfer -8, lower three bits are ignored
bits 19-12: number of times to transfer that length -1
bits 31-20: bytes to skip between each transfer (skipping only occurs in DRAM for either RSP->RAM or RAM->RSP), probably lower three bits ignored
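The layout above can be written out as a small Python sketch. This only encodes the three fields as described in these notes (it is not taken from official documentation), and the function names are invented for the example.

```python
def decode_sp_dma_len(reg):
    """Split a 32-bit SP DMA length register value into its three fields."""
    # bits 11-0 hold (length - 8); the lower three bits are ignored.
    length = ((reg & 0xFFF) & ~0x7) + 8
    # bits 19-12 hold (number of transfers - 1).
    count = ((reg >> 12) & 0xFF) + 1
    # bits 31-20 hold the number of bytes to skip between transfers.
    # (Its low three bits are probably ignored too; not masked here.)
    skip = (reg >> 20) & 0xFFF
    return length, count, skip


def encode_sp_dma_len(length, count, skip):
    """Build a register value from a length, a transfer count and a skip."""
    return (((skip & 0xFFF) << 20)
            | (((count - 1) & 0xFF) << 12)
            | ((length - 8) & 0xFFF))
```

For example, encode_sp_dma_len(64, 4, 16) describes four transfers of 64 bytes each with 16 bytes skipped in DRAM between them.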
Fixed mapper 2 games by compiling the entire tile each time a write was made to VRAM, I'm still not quite sure what the problem was.
Made sprites draw backwards but load forwards, in order to properly emulate the 8-sprites-on-a-line behavior.
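A rough Python sketch of the "load forwards, draw backwards" idea follows. It is a simplification under stated assumptions, not the RSP code: it ignores the one-line offset in the NES sprite Y coordinate, and the function names are hypothetical. Loading in forward order means the first eight sprites in SPRRAM win the per-line limit; drawing in reverse means the lowest-numbered sprite is painted last and so ends up on top.

```python
def sprites_on_line(oam, line, height=8, limit=8):
    """Return indices of the first `limit` sprites (in OAM order) on a scanline.

    oam is the 256-byte SPRRAM: 64 entries of (Y, tile, attributes, X).
    """
    found = []
    for index in range(64):
        y = oam[index * 4]
        if y <= line < y + height:
            found.append(index)
            if len(found) == limit:
                break  # the 8-sprites-on-a-line limit: later sprites are dropped
    return found


def draw_order(found):
    """Draw from last to first so that lower-numbered sprites overwrite higher ones."""
    return list(reversed(found))
```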
Added attribute table speedup, really nice speed now, but it seems that now it is unwilling to wait for the CPU to catch up, we somehow occasionally start at the end of the current line, so the screen flickers a lot. A problem with too much speed is a welcome one :-)
Made a few changes to BRK.
There's a problem with the scrolling on the Solar Wars title screen, I'll be looking into it shortly.
Started to implement the speedup for name tables today, but other real life things came up so I didn't have the time to finish.
Made sprites draw in reverse order, as they should.
Eliminated drawing of top eight lines of screen, apparently this is the way things should be, which fixed the "Mario going off screen" problem.
Enforced black borders on either side of the drawn screen, so scrolled games look right at the edges. Here's some of the nice new screenshots:
And away we go!

WOW! The mushroom is behind the other graphics! HOW DID HE DO THAT?!?!?

Um, jumping ...
I noticed that when Mario jumps off the top of the screen his head disappears before he gets there. Apparently it is possible to have a negative screen position for sprites or something, I'll have to look into it.
Got some nice new ideas for speeding up the PPU. They should almost always work, I'll need a special case for some mappers, but since I don't support those mappers yet anyway that's not a problem.
Hammered at mapper 2 some more, problem still not solved.
Made mapper 2 (Castlevania, Mega Man, et al) work, had to call graphics compiler more frequently since those games spontaneously draw patterns. I'm having problems with the compiler getting tiles confused, almost as if it's using the sprite pattern table for the background, which I'm sure it isn't.
Got 8x16 sprites working. Need to get NES Test working completely, appears to have some background and palette issues.
After much toil I got a sprite 0 hit detector working that is as accurate as any I can devise. The only foreseeable flaws are that it does not account for 8x16 sprites or sprite 0 being flipped. Come to think of it, I don't have 8x16 sprites covered at all, or the option to ignore things in the 8 pixel border. Oh well, a later version. Seems to work nicely on Mario, except that whenever another sprite is in the same scanline as sprite 0 it crashes. Oops.
It turns out that sltiu will clear the result if it doesn't set it, rather than leave it alone. I knew this but didn't compensate for it. So I added an or and that seems to have fixed it all up.
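The sltiu behavior can be modeled in Python as below. The instruction semantics (sign-extend the 16-bit immediate, compare as unsigned, write 1 or 0) are standard MIPS; how the result is combined with an earlier flag is only a guess at the kind of fix an "or" would provide, and the variable names are hypothetical.

```python
def sltiu(rs, imm16):
    """Model MIPS sltiu: result is 1 if rs < immediate (unsigned), else 0.

    The 16-bit immediate is sign-extended to 32 bits first and then
    compared as an unsigned value. The destination is always written,
    so a "false" comparison clears it rather than leaving it alone.
    """
    imm = imm16 & 0xFFFF
    if imm & 0x8000:
        imm |= 0xFFFF0000
    return 1 if (rs & 0xFFFFFFFF) < imm else 0


# A flag set by an earlier test...
hit = 1
# ...is wiped out if a later sltiu writes to the same register:
overwritten = sltiu(10, 5)
# ORing the new result into the old one keeps the earlier hit:
sticky = hit | sltiu(10, 5)
```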
I'd have screenshots, but I'm away at my father's house and have no screen capture capability.
Attribute tables (background colors) work perfectly. Running a little slow and still have arithmetic errors...
Fixed problem with sprites occasionally becoming background by accident, it turns out that I was checking bit 5 of the tile index byte, not the sprite property byte. It was another case of confusion between ones and twos.
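To make the byte mix-up concrete, here is a small Python sketch of a 4-byte SPRRAM entry using the standard NES layout. The function and dictionary keys are invented for illustration. The background priority flag lives in bit 5 of byte 2 (the property byte), not byte 1 (the tile index).

```python
def decode_sprite(entry):
    """Decode one 4-byte NES sprite entry: Y, tile index, properties, X."""
    y, tile, attr, x = entry
    return {
        "y": y,
        "tile": tile,
        "palette": attr & 0x03,                   # bits 0-1 of the property byte
        "behind_background": bool(attr & 0x20),   # bit 5 of the property byte
        "flip_horizontal": bool(attr & 0x40),     # bit 6
        "flip_vertical": bool(attr & 0x80),       # bit 7
        "x": x,
    }


# Tile index 0x20 has bit 5 set, but the property byte does not,
# so this sprite belongs in front of the background.
sprite = decode_sprite(bytes([0x20, 0x20, 0x00, 0x10]))
```

Testing bit 5 of byte 1 instead would send every sprite whose tile number happens to have that bit set behind the background.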
I didn't clear the carry flag for LSR (logical shift right). This appears to make Solar Wars behave splendidly, I have yet to test other games. YAY!
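As an illustration of the flag handling involved (a Python model, not the emulator's MIPS code, with made-up names): LSR moves the old bit 0 into carry, so carry has to be cleared first and then set only if that bit was 1. If it is never cleared, a carry left over from an earlier instruction survives the shift.

```python
FLAG_C = 0x01  # carry
FLAG_Z = 0x02  # zero
FLAG_N = 0x80  # negative


def lsr(value, p):
    """Return (result, new status register) for a 6502 logical shift right."""
    result = (value & 0xFF) >> 1
    # Clear every flag LSR defines, including carry.
    p &= ~(FLAG_C | FLAG_Z | FLAG_N) & 0xFF
    if value & 0x01:
        p |= FLAG_C  # the bit shifted out goes into carry
    if result == 0:
        p |= FLAG_Z
    # N always ends up clear, since bit 7 of the result is 0.
    return result, p
```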
MARIO WORKS! But since my sprite 0 hit is a bit cheap the status text cuts off a little early...
I was checking ppu control reg 1 ($2000) instead of ppu control reg 2 ($2001) to determine if fixit should run... Thus even if the program had deactivated the screen so that the rendering process wouldn't be a problem, it would usually happen anyway. Doh.
Seems to fix the terrain problem with Solar Wars and the sprite corruption problem with Arkanoid, which now has a few other visible problems but I think I'll leave them alone for now and concentrate on the graphics generation.
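A short Python sketch of the check in question, using the standard meaning of the $2001 bits (the constant and function names are hypothetical): background display is bit 3 and sprite display is bit 4 of PPU control register 2.

```python
SHOW_BACKGROUND = 0x08  # bit 3 of PPU control register 2 ($2001)
SHOW_SPRITES = 0x10     # bit 4 of PPU control register 2 ($2001)


def rendering_enabled(ppu_ctrl2):
    """True if the game has turned on the background or the sprites."""
    return bool(ppu_ctrl2 & (SHOW_BACKGROUND | SHOW_SPRITES))


# With the screen turned off, $2001 has both bits clear:
screen_off = rendering_enabled(0x00)
# Testing $2000 by mistake gives a misleading answer, because its bits
# mean something else (0x90 here is NMI enable plus a pattern table select):
wrong_register = rendering_enabled(0x90)
```

Here screen_off is False while wrong_register comes out True, which matches the symptom: the rendering-time work ran even though the program had deactivated the screen.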
Got pseudo-color graphics compiler working. Some problems still exist, namely that the Arkanoid screen still doesn't clear entirely, Solar Wars arithmetic is messed up, Arkanoid demo levels are wrong, but I'll examine that further tomorrow. It looks great!
Took a chance without backing up the last version, luckily it paid off. For a while there was a single line of the wrong name table at the top of the screen. I set off into the code to find the reason and discovered that I had made the $2000/$2001 mistake again when resetting v=t, which only happens if the bg or sprites are active. This also solved the incomplete clearance problem in Arkanoid! There is still a sprite/bg misalignment problem, though. (Solved!)
Captain! Three screenshots off the starboard bow!
(slightly old)

All of the background is drawn by code running on the RSP which was dynamically generated by the CPU.
Well, cross out one more place to search for the problem. The random hill things being found in Solar Wars are caused, once again, by the vblank happening during them.
AHA! In the read 2002 function I loaded the clear vblank flag constant with an li, while the constant was greater in size than could be loaded with an li, so I changed it to an la. But that seemed not to have solved anything.
Tore apart Solar Wars' terrain drawing function, which appears to get all the right data from the terrain generation function.
It works in two parts: one draws air (blank space) from the top of the screen and the other places a surface tile below that. This works column by column. Initially the entire screen is an array of ground tiles. I determined that even when the air drawing section is changed to fill the entire column there are occasional places where it stops partway down the column (and before it has reached the surface), so either it does not make the change completely or the change does not get completely put to the screen. I am more or less sure that the code to make the change executes, so next I'll check if that is reflected in the local pattern table and then in the RSP (which accesses the CPU's pattern table to actually do the drawing, but I strongly doubt that it is responsible.)
I played around a lot with ADC and SBC, I'm now pretty sure that they are accurate (I didn't change anything).
I've found that Arkanoid's failure to clear the screen is because it does not have enough time in vblank to do so, the VRAM address gets reset as drawing begins and thus accidental writes are made to the CHRROM (which I shouldn't allow). I'm sticking with 20*113 cycles per vblank for now, which helps a little but does not solve the problem. Maybe the instructions take too many cycles, or maybe something is running in error that takes too long, sort of like an emulation cancer. On second thought, I had partially solved this problem before and I didn't do anything with instruction timing...
What's that? You want screenshots? Alright, but just remember you brought this on yourself.

Title screen, notice newly working blue arrow.

Planet selection screen (also newly working).

Fixed a palette bug with Solar Wars et al, I had dw instead of dcw.
Fixed a bug in the pause routine of the debugger.
In true "Idiotic Programmer" fashion, I fixed a bg problem with Arkanoid, then promptly lost the change and forgot what I did.
Fixed CHRROM loading, both initially and for mapper #3 (CNROM, helped Solar Wars a lot). Before pattern table #1 wasn't being loaded into the RSP, and it wasn't updated on either the RSP or the CPU on a CNROM switch.
I have verified that the screen-clearing function does indeed execute.
I found the screen clearing function, which is what I have been looking for. Now all I have to do is get my computer back and I can track down the problem with Neon64.
I put up another page to document my Arkanoid exploits. I've come pretty far, but I haven't found what I'm looking for yet.
I've been doing a bit more disassembly, it's coming along steadily but slowly. I've found the routines for drawing the blocks and the warp to next level so far. I'm also pondering possibly writing a disassembler to do what I'm doing now, that is to break the code up into subroutines and pseudo-emulate.
If you change byte 0x3900 (0xb8f0 in PROM) to 0x00 the next level warp will always be there, except while the level is just starting.
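The two addresses line up if one assumes the usual 16-byte iNES header in front of the program ROM and a program ROM that is mapped starting at CPU address $8000. Both are assumptions about the file being patched, and the names below are made up for this Python sketch.

```python
INES_HEADER_SIZE = 16  # assumed: standard iNES header, no trainer
PRG_BASE = 0x8000      # assumed: PRG-ROM mapped at the start of $8000-$FFFF


def file_offset_to_cpu_address(offset):
    """Convert an offset in the ROM file to the address the 6502 sees."""
    return PRG_BASE + (offset - INES_HEADER_SIZE)


print(hex(file_offset_to_cpu_address(0x3900)))  # 0xb8f0
```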
I should be getting my real development computer back in about a week, 'till then I'm using my new laptop.
Sorry about the lack of updates, but I haven't done much over the past 2 weeks. I currently don't have access to my nice new computer and I'm reluctant to work with old versions of the source that I have floating around. What I have been doing is looking over a disassembly of Arkanoid, trying to find how it works and therefore what the emulator is doing wrong. I have several variables and the reset vector understood, but I'm in a bit over my head so it will take a while.
I've been toying with the idea of replacing the CPU emulator with the version from beta 1 and then swapping back in bits of the beta 2 emulator until I find the segment that causes a problem, but again I'll have to wait until I have access to the computer.
Some improvement but none of the major problems are fixed yet, among them the frequent corruption of sprites (with the addition of background, but other problems as well) and the still-looming problem with the CPU. I did look through the addressing modes but found nothing problematic (except something redundant which saved me 300 bytes). The Solar Wars title screen works, with the wavy effect.

I figured out that the background in Arkanoid, along with some other problems, was not being cleared before drawn on. Since most everything else in Arkanoid works, this should enable me to pinpoint the problem. I've disassembled the program and I'll be searching for the loop that clears the name tables. When that is found, I should be able to see if it is executing, why if not, and what it's doing wrong if it is.
Also found a rather major flaw in U64ASM, a label can be defined in terms of itself successfully, but it won't have any meaning whatsoever. This should generate an error, but I can't think of a quick way to do that, so I'm leaving it as is for now.
I got the graphics compiler running under very artificial circumstances, but it works. That's what you get for planning things out in advance.
My preliminary work on the PPU has been to read the name tables. I made a lot of progress today, fixed a lot of long-standing bugs, and possibly identified the CPU emulation problem. I ran Pac-Man, and the ghosts just kept bouncing around in a small square. This might imply an error in an addressing mode.
The PPU still needs a lot of work, and it seems like whenever I make a change the sprites get messed up. I'll be working on locating the source of this problem.
Got color, horizontal and vertical flipping to work for sprites, which now only require 8x16 sprite support. Here's a nice color image from Arkanoid, one of the only games that works flawlessly:

And here's a view of Headless Mario, all that is displayed when I try to run Super Mario Brothers:

I've also had problems with Solar Wars, Space Invaders, and the PD NES Test, but I'm not going to work on fixing the CPU right away. First I'm going to finish up the graphics, and as of now the next step in that is the background. My idea for speeding this up, graphics compilation, is now well developed and I have the compiler itself written in only 16 instructions. I hope that the burden of loading name table, attribute table, and compiled patterns won't slow down the emulator enough to take it sub-realtime, while it's at about 2x-ish now. At some point I'll put together an FPS counter...
When I do, however, get around to the CPU part, I'll probably use Nintendulator or some other PC NES emulator with a debugger to step through a game and see what differences in execution arise.
I finally regained access to my old source from beta 1 (the version I'm working on now is beta 2). In celebration, I assembled it and saw that it's quite a bit better than what's out now. So now you can download beta 1 v3, it's very nice. But it is still the year-old version, so it's not a huge improvement. I mostly just took out the vsync, which boosted speed a lot.
Later that day...
I take back all of the bad stuff I've said about the RSP, it's all my fault. I keep getting my "make a string of x data type" instruction confused, the first parameter is how many and the second is what they should all be. I kept doing this in reverse, so I'd be defining 0 bytes each containing 32 instead of 32 bytes containing 0. I'm going to put a safeguard into U64ASM to remind me next time I try to make zero of something.
Anyway, this was causing the primary problem with the program, namely that my sprite buffer was size 0 so other variables were writing over it. The sprites now look as pretty as can be! Note that I don't have flipping, priority, or even color working yet so all you see is a shadow of the true sprite, thus Vaus looks a bit odd.

You can see a brick shining here, very nice.
I'd also like to note that this is the exact opposite order that I wrote beta 1 in. For it I did the background first, then the sprites.
Worked on a proto-sprite renderer.

Revelation of the day:
Never use any other data size but a word on the RSP. It just doesn't work. I might get to do some experiments later to confirm exactly what doesn't work, why and how, but right now suffice it to say that sb does nothing but cause the RSP to overwrite nearby data. Eeew.
I replaced my fake sprite renderer with the skeleton of the real one, now it shows lines instead of dots where the sprites should be (a step in the right direction, the next will be colored blocks!) This new version more closely mimics the NES's internal "temporary sprite buffer" with one of its own that is loaded with the next scanline's sprites. It's also at least twice as fast, despite doing 8 times as much work. Yay!
Set up palette so I can charge right into actual graphics. The coin color in SMB is seen to be blinking. I have the background drawing done (I had it done by default a while ago, but now it has the right color.) SMB currently looks like this:

The vertical bars on the left are the palette. You can't see them against the bright background, but there are a few white lines for sprites on the screen. SMB still doesn't run, though...
Finally got a chance to sit down to the program for a few minutes today, changed main memory from kseg1 to kseg0 for some speed increase. I noticed some weird things happening in Arkanoid, as well as the old problem with Solar Wars, but it seemed like SMB accidentally worked for a moment. I figure that this is a 6502 emulation problem, I'm going to go back through the old code from last year and see what I'm doing differently, since that worked the CPU flawlessly. My main concern is the ADC and SBC instructions.
Made background color set to black at start. I found that my DMA wait macros for the RSP branched 8 bytes back instead of 4, oops. This and another minor fix caused a change to the Arkanoid screen. Vaus is made up of 4 sprites, but the one farthest to the left was often not being displayed. Here's the new image:
School has started again, and in the flurry of activity Neon64 development has slowed. I do, however, have some great news: I got a new computer from the father of a friend, and I'll be using it for Neon64 now because it has a TV tuner, so I can code and test on the same screen. I am also finally able to take screenshots from the actual hardware. As such, I present to you the first public image of Neon64 beta 2:

No, it isn't much as it stands, but when you realize that only the positions of sprites are shown, and that the PPU is running separately from the CPU in custom ucode on the RSP, maybe it'll mean a little more. This is actually fully playable, you just can't see where the blocks are yet. In case you can't figure it out, here's an explanation:
This marks about the third time I've made a sweeping, fundamental change to the program. The RSP is now custodian of all things PPU, because I decided that my problem was caused by the CPU- and RSP-instigated DMAs interrupting each other. They both use exactly the same registers, so you can see how, if the CPU is writing to these registers, the PPU could be doing so at the exact same time, thus causing a mix of values that screws everything up. Now only the RSP is allowed to DMA, which meant that I had to go back and make the CPU's routines subservient to the call of the RSP.
An interesting side effect (well, something I needed to do to make this work but not really a goal) is that I made the PPU status bits show up in the RSP's signal bits, so it's really easy to set and clear them from either processor. I also used a signal bit (there are 8 of them, so I have 2 left) to indicate when the RSP's copy of the SPRRAM is out of date, so it can know when to DMA the CPU's copy. The CPU does an SPRRAM DMA with the ld and sd instructions (moving 8 bytes at a time is faster) from the RAM location to its local copy of SPRRAM.
My problem was that, when I made any change to the PPU emulator (even just adding a NOP), it wouldn't run or would run erratically. I expect that this was because I was just lucky enough with the old version that the timing was just right and the DMAs were nonconflicting. Now that problem seems to be gone, and I can finally get some more actual emulation done.
On the assembler front, I made a new instruction, watch, that displays a warning when an instruction is assembled at the given PC. This allows me to take the EPC given to me by the exception handler and match it up with its assembly source. Before I had to use my really slow debug version, which outputs every line with its PC and offset.
It turns out that what I said about RSP writes before was inaccurate, the eight is added. I didn't have a chance to fully test this, and I still don't, but I'm pretty sure that I was mistaken. I have Arkanoid running on the new setup.
I've been converting all of my writes to DMEM into DMAs, an icky task, yet I think I've got it down.
I found out how to hook the pre-NMI from when you press the reset button, (the general exception vector (0x180) is called with bit 12 (0x1000) of the Cause COP0 register set) so I could have some neat effect there later, but right now all I do is break the RSP so that the reset can proceed normally. Actually, now I found that, if I use a breakpoint, the RSP won't restart on reset. If I halt it, however, everything works nicely.
I made use of the RSP's signal bits for the first time today, so I can use them to tell the PPU emulator to start, and report back to the CPU when it is done, for much more accurate synchronization.
I started work on the palette, using BMF's RGB palette ('The only palette slow-roasted to perfection'), yesterday, and I got a nice display of all of the colors, but when I tried to implement it fully I got all sorts of weird problems, which led me to the DMA revision that I mentioned above. It was still giving me trouble today, so I performed a complete palette-ectomy.
On another topic, I have 4 MB of space free on this hard drive. Now that my assembler outputs a 2 MB ROM, I run out of space really quickly. It turns out that I didn't check if there was actually enough space for the finished ROM, so now if there isn't, an error is generated.
On yet another topic, I discovered that there is a rather large difference between the sp_wr_len_reg (which gives, approximately, the number of bytes to DMA out of the RSP into DRAM) as seen from the CPU (at 0x0404000c) and as seen from the RSP (at register 3, entrylo1 by my incorrect notation). From the CPU, the actual number of bytes is the value in sp_wr_len_reg, with the low 3 bits masked out, plus 8. On the RSP the low three bits are also masked out, but the 8 is not added. This is probably true for sp_rd_len_reg, too, but I haven't tested it.
I made the ROM loader correct the VRAM page array so that any CHRROM will be accessed as pattern tables 0 and 1.
I ran across the source to Andreas Sterbenz's Checksum 64, so I decided to finally take the leap and make U64ASM produce complete ROM images, which it does perfectly. I also incorporated the drjr send utility, so now I have one program to do everything I need.
I've also been doing more RSP research, and I've now found what the rest of the length DMA register is for...
I wrote the ROM loading routines for everything but the CHRROM. I moved a lot of stuff around and rewrote the SPRRAM handler. After a talk with LaC I've decided that I'm (eventually) going to change all writes to DMEM to DMAs, for greater accuracy and speed.
As I mentioned before, you can have two simultaneous SP DMAs going on (one active, one queued). That's what the DMA_FULL register is for, it tells you when you can't schedule another DMA. When this came into doubt today I performed several experiments that appear to prove that point.
I just discovered that my 2005 writing routine never actually read the current value of the t register (temporary VRAM address). I used a store instead of a load. Oops. Again.
I ported over a ROM loader I wrote for another assembler, and in doing so I fixed an error in my two-part addressing macro, because I didn't realize that the offset was sign extended when I wrote it. This loader doesn't actually setup all of the pointers like it should just yet, but it gets the CHR and PRG-ROM page counts and displays them onscreen, along with the mapper and mirroring types (as text, not just numbers).
I incorporated the exception handler into Neon64, which helped with debugging the above. I also tested the emulator with Arkanoid, which, unlike Solar Wars and SMB, performed perfectly. I'll be starting on the graphics now, so I'll probably be using Arkanoid for that and I'll come back to the others later.
I started working with interrupts and exceptions today. I managed to use the VI and count/compare interrupts to run two counters at different speeds. I also got error detection and reporting working, which will allow a lot of fun debugging to take place! No work on Neon64 just yet.
Well, I seem to have fixed that specific problem by immediately following a write to the RSP with a read from the same location. I don't entirely know why, but it seems to always solve the problem. I made the other writes to the RSP use the same method, and I rewrote the SPRRAM write. There is now a RAM copy of the SPRRAM; a write to SPRRAM is written there first, then the new SPRRAM is DMAed into the RSP. I had to have the real SPRRAM DMA (4014) write its values back into this local copy, too.
I also found that the write to PPU control register 2 was a load instead of a store. Oops.
I wrote a new 'main emulation loop' and PPU emulator skeleton for Neon64, using a more structured format that I actually planned out first, based roughly on FCEU's main loop. After the change none of my test programs appeared to run any better or worse. SMB loads two sprites and just runs aimlessly, and in Solar Wars everything works pretty well except the projectile jumps around the screen.
FCEU counts three cycles for every one on the 6502. This helps to make its count more accurate since there are 12 PPU cycles to a CPU cycle and a lot of my timing is off by 1/3 or 2/3 of a cycle, which can add up. I might incorporate this into Neon64.
I also may have fixed that register math error that I kept getting in U64ASM.
I also moved a lot of the older news off of this main page, since it's gotten rather large.
I managed to isolate the problem with SMB. It seems that there are specific times when the RSP ignores writes to DMEM. Even though the NMI was turned back on, the write was occasionally (and fatally) not made to PPU control register 1, which I keep on the RSP. If I can't figure out how to detect these moments of ignorance, or work around them, I'll have to move all the variables for the PPU into main RAM and have the RSP DMA them in, or DMA the variables into the RSP. I hope that it doesn't come down to that, though.
I came back from vacation today to find that my R4000 User's Manual had arrived in the mail. I've got stacks of plans for the PPU that I'm ready to try out, but I want to fix a weird problem with SMB first, it turns off the vertical retrace NMI and then just sits...
I got some info on interrupts from LaC, I think that this will help schedule the audio emulation.
While I was sitting on the beach today I was developing an idea I had last night: graphics compilation. Since the pattern table changes very rarely, if at all, during the life of an NES program, the same tile is interpreted in exactly the same way over and over again. It makes sense, therefore, to figure out in advance what R4300i instructions are required to write a tile to the screen. Then I can just DMA these instructions into the RSP to have it draw the tile. I can even have one tile drawing while the next one is being loaded. This should speed up PPU emulation quite a bit. It is also far simpler than total CPU recompilation, since all of the tiles will always be aligned in exactly the same way (except in 8x16 sprite mode, but I've worked that out).
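The idea of doing the tile interpretation once, up front, can be shown with a much simpler stand-in. The Python sketch below does not generate any instructions; it only pre-decodes each tile the first time it is needed and reuses the result afterwards. The `TileCache` name is hypothetical. The tile layout assumed is the usual NES one: 16 bytes per tile, eight bytes for bit plane 0 followed by eight for bit plane 1, leftmost pixel in the high bit.

```python
def decode_tile(tile_bytes):
    """Decode one 16-byte pattern-table tile into 8 rows of 8 two-bit pixels."""
    rows = []
    for y in range(8):
        plane0 = tile_bytes[y]       # low bit of each pixel in this row
        plane1 = tile_bytes[y + 8]   # high bit of each pixel in this row
        row = []
        for x in range(8):
            bit = 7 - x              # leftmost pixel is the most significant bit
            row.append(((plane0 >> bit) & 1) | (((plane1 >> bit) & 1) << 1))
        rows.append(row)
    return rows


class TileCache:
    """Hypothetical cache: each tile is decoded once and then reused."""

    def __init__(self, pattern_table):
        self.pattern_table = pattern_table
        self.decoded = {}

    def get(self, index):
        if index not in self.decoded:
            start = index * 16
            self.decoded[index] = decode_tile(self.pattern_table[start:start + 16])
        return self.decoded[index]
```

A real implementation would also have to throw away cached tiles whenever the pattern table is written, which is the "very rarely, if at all" case mentioned above.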
PPU emulation was the main speed problem in the first version of Neon64, so hopefully this will help a lot.
I got SPR-RAM DMA working, using a real RSP DMA to speed things up. In the process, I switched a lot of labels over to kseg1 to avoid having the RSP/PPU work with old data. This was as simple as ORing the mem label with 0xA0000000, since all of my other arrays build on that.
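For illustration, the address trick looks like this in Python (the helper names are hypothetical). On the R4300i, kseg0 starts at 0x80000000 and is cached, while kseg1 starts at 0xA0000000 and is uncached; both map onto the same physical memory.

```python
KSEG0_BASE = 0x80000000  # cached, unmapped segment
KSEG1_BASE = 0xA0000000  # uncached, unmapped segment


def to_kseg1(addr: int) -> int:
    """Turn a kseg0 (or physical) address into its uncached kseg1 alias."""
    return (addr | KSEG1_BASE) & 0xFFFFFFFF


def physical(addr: int) -> int:
    """Physical address behind a kseg0 or kseg1 address."""
    return addr & 0x1FFFFFFF


if __name__ == "__main__":
    mem = 0x80001000  # hypothetical address of the mem label
    print(hex(to_kseg1(mem)), hex(physical(mem)))
```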
I made a fake sprite renderer that just shows a vertical line where a sprite is, this allowed me to see that Solar Wars is indeed progressing well. There just seems to be a problem with the projectile, it jumps around a lot, I'm not sure if this is normal behavior or not.
I'm going to be away for two weeks, and my computer isn't coming with me. I'll try to write out some plans for the PPU renderer when I find the time.
I rewrote the SPR-RAM access register ($2004) with two things in mind:
  1. That the CPU can only access RSP data in 1 word chunks, reading or writing
  2. That all word access must be word aligned.
Now Solar Wars works. I can even see a change in behavior when I press the start button! Yay!
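The two constraints above mean that a single-byte access has to become an aligned word access plus some shifting and masking. A minimal Python sketch, assuming big-endian byte order within each word (as on the N64) and using hypothetical helper names:

```python
def read_byte(words, addr):
    """Read one byte from memory that can only be accessed as aligned 32-bit words."""
    word = words[addr >> 2]              # aligned word containing the byte
    shift = (3 - (addr & 3)) * 8         # big-endian: byte 0 is the top byte
    return (word >> shift) & 0xFF


def write_byte(words, addr, value):
    """Write one byte by reading the whole word, replacing the byte, and writing it back."""
    index = addr >> 2
    shift = (3 - (addr & 3)) * 8
    mask = 0xFF << shift
    words[index] = (words[index] & ~mask & 0xFFFFFFFF) | ((value & 0xFF) << shift)


if __name__ == "__main__":
    spr_ram = [0] * 64                   # 256 bytes of SPR-RAM as 64 words
    write_byte(spr_ram, 1, 0xAB)
    print(hex(spr_ram[0]), hex(read_byte(spr_ram, 1)))
```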
When I did the VRAM page array, I found that you can have at least two simultaneous DMAs to the RSP, something I had previously only suspected.
Well, not truly simultaneous, but I can queue one while the other is still in progress.
I got the 2005/2006 registers emulated, as well as their accompanying V, T, and X registers (as described by loopy).
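For reference, here is a minimal Python sketch of the register writes as loopy's scrolling notes are commonly summarized. This is not the Neon64 code, and the class and method names are hypothetical. `w` is the shared first/second write toggle.

```python
class ScrollRegs:
    """v: current VRAM address, t: temporary VRAM address, x: fine X scroll."""

    def __init__(self):
        self.v = 0
        self.t = 0
        self.x = 0
        self.w = False  # False = next write is the first of the pair

    def write_2005(self, value):
        value &= 0xFF
        if not self.w:
            # First write: coarse X into t bits 0-4, fine X into x.
            self.t = (self.t & ~0x001F) | (value >> 3)
            self.x = value & 0x07
        else:
            # Second write: fine Y into t bits 12-14, coarse Y into bits 5-9.
            self.t = (self.t & ~0x73E0) | ((value & 0x07) << 12) | ((value & 0xF8) << 2)
        self.w = not self.w

    def write_2006(self, value):
        value &= 0xFF
        if not self.w:
            # First write: high 6 bits of the address, bit 14 cleared.
            self.t = (self.t & 0x00FF) | ((value & 0x3F) << 8)
        else:
            # Second write: low byte, then t is copied into v.
            self.t = (self.t & 0x7F00) | value
            self.v = self.t
        self.w = not self.w
```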
By the way, the "Backup Often" mantra protects not only against natural disaster, but also your own stupidity. After I wrote the above and got it all working, I stormed ahead to all sorts of other stuff and got myself confused, so I had to revert to my last backed-up working version from last night. That meant that I had to rewrite all of the stuff that I had already done. Oh well.
On another note, I made the $2007 (VRAM I/O) register work, after beating the bugs out of it with a big stick. I still have to get my hands on a good palette...
Solar Wars and SMB are running without graphics, I can select different planets in Solar Wars and so on.
I had the idea of making two page tables, one for reading and one for writing, so that trapping mapper writes will be easy. This didn't take even a single additional instruction in the read and write macros.
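A rough Python sketch of the two-table idea follows. The 4 KB page size, the memory layout, and all names are assumptions for illustration, not Neon64's actual layout. Reads and writes do the same lookup, just in different tables, so trapping a write to ROM costs nothing extra on the access path.

```python
PAGE_SHIFT = 12  # hypothetical 4 KB pages, 16 pages for the 64 KB address space


def build_tables(ram, rom, mapper_writes):
    """Return (read_pages, write_pages); ROM pages get a trapping write handler."""
    def ram_read(addr):
        return ram[addr & 0x7FF]            # 2 KB RAM, mirrored

    def ram_write(addr, value):
        ram[addr & 0x7FF] = value & 0xFF

    def rom_read(addr):
        return rom[addr & 0x7FFF]

    def rom_write(addr, value):
        mapper_writes.append((addr, value))  # trapped: hand off to the mapper

    def open_read(addr):
        return 0

    def open_write(addr, value):
        pass

    read_pages = [open_read] * 16
    write_pages = [open_write] * 16
    for page in range(0x0, 0x2):             # 0x0000-0x1FFF: RAM and mirrors
        read_pages[page], write_pages[page] = ram_read, ram_write
    for page in range(0x8, 0x10):            # 0x8000-0xFFFF: PRG-ROM
        read_pages[page], write_pages[page] = rom_read, rom_write
    return read_pages, write_pages


def read(read_pages, addr):
    return read_pages[addr >> PAGE_SHIFT](addr)


def write(write_pages, addr, value):
    write_pages[addr >> PAGE_SHIFT](addr, value)
```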
I wrote the skeleton of the PPU emulator and the routines needed to load it to the RSP and reboot it once per frame.
I wrote a lot of the PPU register handlers. I have a basic PPU emulator (without any actual graphical output) working. I got the vblank interrupt working.
My tests with Super Mario Brothers run until a loop that waits for the sprite 0 hit flag to be set, which is good. But Solar Wars crashes, possibly because VRAM isn't properly emulated yet...
I also found out how to detect a reset, so that the N64 doesn't lock up when you press the reset button, waiting for the RSP to break. I might make a cool transition on reset, like Mario 64 has.
Totally reorganized the program, so everything will be nice and clean for the NES stuff coming in.
Wrote NMI and IRQ interrupts, tested with square root program.
Wrote controller 1 handler, might be a bit different in the final version.
I tested Neon64 out with Super Mario Brothers today. I don't have any graphics/sound/etc emulated yet, but the program did appear to run until it hit the wait loop, where it waits until the vblank interrupt occurs. Solar Wars performed equally well. To get these working I had to set up a skeleton for the registers. I also discovered that including a file as large as an NES ROM image apparently crashes my assembler, so I had to rewrite a little of that.
More bug fixes in the 6502 emulator took place. One test program I was using, that took the square root of a number, didn't work before but then worked when I made another mistake. I found the source of the original error (an incorrectly set carry flag in CMP/CPX/CPY), fixed the subsequent one, and now everything seems to be working fine. I also figured out how the JSR/RTS instruction pair works:
JSR increments the PC, then pushes the PC and loads the new one, then RTS pulls/pops the PC and increments it again, so the PC points to the instruction after the JSR.
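The JSR/RTS pairing can be sketched in Python like this. The `CPU` class is a hypothetical stand-in, and `pc` is assumed to point at the JSR opcode itself when `jsr` is called. The value pushed is the address of the last byte of the JSR instruction, which is why RTS has to add one after pulling it.

```python
class CPU:
    def __init__(self):
        self.mem = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFF

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def jsr(self):
        # Operand is a little-endian 16-bit target address.
        target = self.mem[self.pc + 1] | (self.mem[self.pc + 2] << 8)
        ret = (self.pc + 2) & 0xFFFF      # last byte of the JSR instruction
        self.push(ret >> 8)               # high byte first
        self.push(ret & 0xFF)
        self.pc = target

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF  # instruction after the JSR
```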
There was also an error in the day of week calculation test program, since TASM will assume that LSR without an operand is pointing to zero, while this program used that to mean LSR A.
I made an option to not export a label into a header file (for U64ASM), so I don't have to keep editing codes.h whenever I assemble codes.asm, which contains the opcodes. I assemble this separately from the rest of the program because the opcodes use macros heavily, which slows down assembly by A LOT.
After I test the emulator a little more thoroughly, I will move on to actually loading NES ROM images, the registers, the controller, video, sound, and eventually mappers.
I reassembled everything to work with the new CPU text renderer, which protects the registers, too. I fixed several glitches and cosmetic problems in U64ASM, including one that would cause the program to crash if a macro parameter had size 0.
I found several glaring errors in Neon64. The emulated RAM overlapped part of the executable code because of a misplaced 0. The sign and zero flags were frequently not cleared before setting them. These errors were corrected.
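The flag bug is easy to reproduce in a few lines. A minimal Python sketch (hypothetical names, using the 6502 status bit positions): both flags are cleared first, then set from the result, so a stale flag from an earlier instruction can't survive.

```python
FLAG_Z = 0x02  # zero flag, bit 1 of the 6502 status register
FLAG_N = 0x80  # sign (negative) flag, bit 7


def set_nz(status: int, result: int) -> int:
    """Return the status register updated for an 8-bit result."""
    status &= ~(FLAG_Z | FLAG_N) & 0xFF   # clear both flags before setting them
    if (result & 0xFF) == 0:
        status |= FLAG_Z
    status |= result & FLAG_N             # sign flag mirrors bit 7 of the result
    return status
```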
I revised my preprocessor to allow macro parameters to include spaces and I fixed some of my macros. I rewrote my text routine to run on the CPU, and in doing so I learned the difference between kseg0 and kseg1 (kseg0 is cached). For a while I was writing to the frame buffer in kseg0 but nothing was happening, because my writes were just being cached and not put to the framebuffer.
I can now use this text routine to start debugging the PPU emulator that I will be writing for the RSP, since I think it would be too complicated to have both the text writer and the PPU emulator running on the RSP, plus I'd probably run out of IMEM to work with (there's only 4K).
I also dumped the DOS extended character set, because I want to use the line characters to draw windows and boxes. I only had the ASCII set before.
No direct work on Neon64 was done during this time.
Made paged memory access work. The speed is now around 7 MHz, using mostly the absolute addressing mode to test.
There are 743 lines of code, not including the opcodes or test programs.
I got the jump table working, so now I have a primitive 6502 emulator.
I assembled the opcodes separately so I just need to include their assembled form rather than reassemble them each time I build Neon64. This and a change I made to U64ASM's binary inclusion means that assembly speed has increased fourfold. That's not terribly important to you, but I felt that I should mention it anyway.
I fixed a problem with my text macros that caused the CPU to try to make the RSP draw a new string before it was done with the old one. The busy flag isn't set fast enough by the RSP, so the CPU now waits for the busy flag to be set before finishing with a text string.
I also got to run some test programs, assembled with TASM. From observing the operation of these programs with my stopwatch, I determined that the emulator is currently running at about 10 MHz, approximately 6 times faster than the NES's 6502 and approximately twice as fast as the last version of Neon64's 6502 emulator.
I fixed an error in the jump table, opcode 0x8E (STX absolute) was listed as STY zero page, a rather large error since those two instructions are different sizes and the high byte of the absolute address will be read as the next opcode, plus it would write to the wrong register.
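To show why a wrong instruction size throws everything after it out of step, here is a small Python sketch with a four-entry opcode table (the table format and function name are hypothetical). With 0x8E mislabeled as a 2-byte STY zero page, the high byte of the absolute address gets decoded as the next opcode.

```python
# opcode -> (mnemonic, addressing mode, size in bytes)
OPCODES = {
    0x84: ("STY", "zero page", 2),
    0x86: ("STX", "zero page", 2),
    0x8C: ("STY", "absolute", 3),
    0x8E: ("STX", "absolute", 3),
}


def disassemble(code, table=OPCODES):
    """Walk the code using the table's sizes; return (offset, mnemonic, mode) tuples."""
    pc = 0
    out = []
    while pc < len(code):
        mnemonic, mode, size = table[code[pc]]
        out.append((pc, mnemonic, mode))
        pc += size
    return out


if __name__ == "__main__":
    code = bytes([0x8E, 0x10, 0x8C, 0x86, 0x20])   # STX $8C10 ; STX $20
    broken = dict(OPCODES)
    broken[0x8E] = ("STY", "zero page", 2)         # the mistake described above
    print(disassemble(code))
    print(disassemble(code, broken))
```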
Happy Birthday, Dad!
I added cycle counting to each and every opcode, by hand, today. I also made the jump table but I haven't gotten it to work right yet. There are 1,537 lines in Neon64 so far. I deleted a lot of blank lines in the opcodes, so this number is a little deceptive.
I had a little time when I came home today, so I finished off the rest of the opcodes. Now all I need to do is make the jump table, handle periodic tasks, and handle interrupts, and then I can start on the real NES stuff. Neon64 is currently 1,488 lines.
I got a lot more opcodes done today. I have been writing them in alphabetical order, up to LDY. Only a few more need to be done, and everything has tested out well. There are 1,096 lines of code in Neon64 right now.
I also edited the drjr send program so it says "Uploading" instead of "Downloading". That had always gotten on my nerves.
I'll be away for the rest of the weekend, so there won't be any more work for a few days.
It isn't that I haven't done anything over the past few days, it's just that I haven't gotten around to updating the site. I have the ADC, AND, and ASL opcodes completely emulated in all of their addressing modes. As I was doing this, I noticed that the time taken to assemble was becoming excessive. I have added many little speed-ups to the preprocessor, where most of the time was spent. Just changing this part of the program, I managed to almost double the speed of the entire assembly! I also added a routine to check if a macro's name contains another macro's name, which can cause unexpected results. I found that I had a few instances of this in code I had already written. Neon64 is currently 574 lines in size.
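The macro name check amounts to a substring test between every pair of names. A minimal Python sketch follows; the function name and the example macro names are made up, and U64ASM's own routine may work differently.

```python
def overlapping_macro_names(names):
    """Return (inner, outer) pairs where one macro name appears inside another."""
    return [(a, b) for a in names for b in names if a != b and a in b]


if __name__ == "__main__":
    # A text-substituting preprocessor could expand ADC inside ADC_IMM by mistake.
    print(overlapping_macro_names(["ADC", "ADC_IMM", "AND"]))
```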
This evening I also did some work with sound generation on the N64. I expect this will be my weakest point, since the furthest I've ever come in sound programming has been my PC Speaker Zelda Theme. So I wrote a little program that makes a two-tone siren sound (two different frequency sawtooths), and fiddled with it until the clicking went away. It wound up being 90 lines long.
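A two-tone sawtooth siren is simple to sketch in Python. This is only an illustration of the waveform, not the 90-line N64 program, and the sample rate, frequencies, and amplitude are made-up values. One common source of clicks is a jump in the waveform when the tone changes, so this sketch carries the phase over between tones; whether that was the cause in the original program is not stated.

```python
def siren(freqs, rate, samples_per_tone, amplitude=0x3FFF):
    """Generate signed samples that step through sawtooth tones at the given frequencies."""
    out = []
    phase = 0.0                              # position within one cycle, 0.0 to 1.0
    for freq in freqs:
        step = freq / rate                   # cycle fraction advanced per sample
        for _ in range(samples_per_tone):
            out.append(int((2.0 * phase - 1.0) * amplitude))
            phase = (phase + step) % 1.0     # phase is kept across tone changes
    return out


if __name__ == "__main__":
    samples = siren([440, 660], 22050, 11025)
    print(len(samples), min(samples), max(samples))
```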
I wound up working into the next day this time. I finished most of the operation macros, up to ROR in alphabetical order. My macro preprocessor isn't very good, so I changed some names around to speed it up. All of my macros started with A6502 before, so it has to read that at the beginning of each entry. I changed the order of the names so that A6502 is at the end of the macro name. I probably saved one or two seconds at the current stage. Right now I have 391 lines of code.
Finished up the addressing modes and wrote the ADC macro, for the add with carry series of opcodes.
I've already been able to improve the program, the old version of the ADC macro had 19 instructions, some of them branch and load instructions. The n
Fixed more bugs in U64ASM, added nested macros. In Neon64 I started on the 6502 CPU core, I have the absolute addressing mode and zero and sign flag processing done. No actual instructions are emulated yet...
Today I got back to work on Neon64. I moved my PC next to my TV and started making some debugging macros (for text output, etc.) In the process of doing this, I found a bug in the number display routine that would cause unpredictable results if the number was larger than could be displayed. I also added some more functions to my assembler, U64ASM. I also found a bug in my byteswapping program that caused it to mess up files with an odd number of bytes.
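The odd-length case in a byteswapper is easy to get wrong. A minimal Python sketch that swaps each pair of bytes and leaves a trailing unpaired byte untouched is shown below. Leaving it alone is an assumption here; the original tool may pad or handle it some other way.

```python
def byteswap16(data: bytes) -> bytes:
    """Swap every pair of bytes; an odd trailing byte is copied through unchanged."""
    out = bytearray(data)
    paired = len(out) & ~1                   # number of bytes that form complete pairs
    for i in range(0, paired, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


if __name__ == "__main__":
    print(byteswap16(b"\x01\x02\x03\x04").hex(), byteswap16(b"\x01\x02\x03").hex())
```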
The new version will have nothing in common with the old one, I am starting from scratch. There are currently 164 lines of code, not counting the I/O functions, which are preassembled. None of this code actually does any emulation yet, but at least it's a start. I wanted to wait until I got my N64 to VGA adaptor box, but it's been months since I ordered it and it seems like it will never arrive. Oh well.