Making MGS1 PSFs playable in HE?! by Nisto at 3:31 PM EDT on August 6, 2018
So I've been battling this for about an entire week now, and I'm out of ideas. I'm trying to make the PSF set of MGS1 playable in Highly Experimental by replacing SPU IRQ related code, since it isn't well supported in HE and most other PSF engines. I have learned much more about the game code since the initial rip I made last year, but I still can't figure out a way to make the PSFs compatible with Highly Experimental. I have now managed to hack out all the multi-threading related calls (the Multi Task Scheduler) and also the SPU IRQ related code, so the code is actually pretty straight-forward now. However, without SPU IRQs, I have not been able to replicate correct timing. It either plays too fast or too slow, depending on various approaches and parameters.
SPU IRQs in MGS1 are triggered from a single one-time call to SpuSetKey. The magic of it is that the ADPCM address specified in the SpuVoiceAttr struct to SpuSetVoiceAttr is the same address defined to trigger an interrupt when the data there is read (and I must assume the data read operation happens through SpuSetKey, because if I remove the call, no interrupts occur). Furthermore, the data at the IRQ address contains a silent, looped ADPCM stream, so interrupts are triggered automatically and continuously. The most important piece of information here though, is that they set the pitch value in the SpuVoiceAttr struct to 4096 ("pt" units), which represents 44100 Hz according to the SDK docs. From what I can tell, at this rate (pitch), 256 bytes (448 samples) are read between each interrupt, which is half of the stream's size (512 bytes).
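The figures in the paragraph above can be checked with a little arithmetic. This is only a minimal sketch, assuming the usual PS1 ADPCM layout of 16-byte blocks that each decode to 28 samples (which is consistent with "256 bytes = 448 samples"); the constant and function names are made up for illustration.

```python
# Assumed PS1 ADPCM layout: each 16-byte block decodes to 28 PCM samples.
BLOCK_BYTES = 16
SAMPLES_PER_BLOCK = 28
SAMPLE_RATE_HZ = 44100  # pitch 4096, per the SDK docs quoted above


def samples_in(adpcm_bytes):
    """Number of PCM samples decoded from a run of whole ADPCM blocks."""
    return adpcm_bytes // BLOCK_BYTES * SAMPLES_PER_BLOCK


samples_per_irq = samples_in(256)                # half of the 512-byte stream
irq_period_ms = samples_per_irq / SAMPLE_RATE_HZ * 1000
irqs_per_second = SAMPLE_RATE_HZ / samples_per_irq

print(samples_per_irq)           # 448
print(round(irq_period_ms, 3))   # 10.159
print(irqs_per_second)           # 98.4375
```

So, under these assumptions, the sequencer would be getting invoked roughly every 10.16 ms.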
So now, the question is, how could I replicate this timing in a simple loop without the use of SPU IRQs? Any ideas? I'm simply calling the sequencer function (similar to calling SsSeqCalledTbyT in a loop), which is essentially the same thing that happens with SPU IRQs. Here's what I've tried so far:
* Calling SpuSetKey to trigger a non-looped silent ADPCM stream followed by polling SpuGetKeyStatus to check that it finished playing (SPU_ON_ENV_OFF) - this actually works decently, but it's just a little too slow in PSFLab/foo_psf - no good in viopsf (w/ spumednafen.dll), skips a lot - Mednafen itself does play it without a hitch
* Polling root counters 0, 1 and 2 - I used SetRCnt followed by GetRCnt in RCntMdNOINTR mode, but the results seem totally random?
* VSync(-1), VSync(0), VSync... - Obviously not enough precision with this, but I tried it...
I had no idea about this! Sadly I can't help you though, but I'll keep an eye on this.
I just checked the preliminary set and I have to ask, once you figure it out, can you do the same for the Integral/VR Mission exclusive tracks? I ripped the hidden tracks in Integral from an emulator's SPU plugin years ago, and I was able to eliminate the codec noise at the beginning but the game plays the songs with a little fade-in, and I assume the tracks are not stored that way.
Here are the songs in question for anyone interested (there are more in the VR Missions disc, but those were already ripped by someone else) https://mediafire.com/file/x6ibr4fazg6bbnx/ https://mediafire.com/file/zd29hczn4xhlsoc/
I thought the Integral/VR Mission tracks were already ripped by nisto in the general xsf thread. That was sometime after the initial release of the MGS psfs.
EDIT: The tracks in question were indeed already ripped. The codec noise is not there; the fade-in effect, however, is.
Also to add to the list of players that can play MGS's psf's, I think purei's source was updated to support them. But that's just the source, I'm not aware of any exe floating around anywhere.
While i can't help you with technical stuff, i'll just let you know that your current rip will play fine on the real hardware.
I converted your *.psf to .exe using psf2exe. And then used no$psx "Utility > Upload to PSX" to send on a scph-7502 via Xplorer cartridge flashed with "NO$CASH Kernel Clone - New Expansion ROM Version" http://www.psxdev.net/forum/viewtopic.php?f=76&t=1319
I also tried the same using original Xplorer roms but for some reason it will play way too fast regardless if video mode is set to PAL or NTSC.
Since the set is not in miniPSF format, it gets pretty big, so I'd advise you to rip it yourself instead with the Python script/kit I uploaded if you want everything. It's not that difficult to use, really.
Lastly, I'd recommend playing these PSFs with viopsf with the Mednafen SPU plugin.
Any real benefit to using the mednafen spu plugin over the updated peop's? I suppose it's more accurate, but it just doesn't sound it, plus the viopsf author updated the peops plugin, so whatever issues it had before that I noticed are gone... plus sinc interpolation.
I honestly wouldn't know which one is supposed to be more accurate. I haven't tried Pete's plugin much.
Is the source/dll provided in the viopsf archive being updated by the viopsf developer, or can it be downloaded separately somewhere? I looked on Pete's site (pbernert.com) and the latest release I see is from 2004.
The plugin is different from the one on Pete's site, but I'm not sure if it's the viopsf author updating it or someone else. Before the July 22 update, I would've had a hard time recommending it due to some issues here and there, but they appear to be gone now.
If you do end up trying Pete's plugin, use "PSX Reverb 2". The default reverb ends up sounding the same as all the other ones.
Isn't the reverb modeled after the actual hardware and down to which coefficients the software actually uses? It's also running at half the sample rate of the hardware, so it downsamples on input, and upsamples on output.
Okay, so I almost, almost got it working properly in HE. I used events and root counter 2 set to a target value of 44100 in interrupt mode. This even plays in viopsf without problems. It's still ever so slightly too slow though; in the span of 1 minute, it has desynced about 1.5 seconds compared to the OST CD version and Mednafen's output (which match pretty closely).
Maybe you should also be using counts of how many root counter events you miss? That high of a resolution sounds like something that could easily lead to missed interrupts.
Hmm, are you saying that the counter might increment past the target value and never trigger an interrupt? Or are there cases where something causes the counter to reset or something? I did notice when polling (non-interrupt mode) that it incremented in units of 18 after each GetRCnt call in a loop that just polled until the target value (or greater) was reached.
The event handler essentially just sets a flag that the "main loop" checks for, so I doubt there are any additional interrupts ever occurring during that time.
Well, if it's only setting a flag and not actually incrementing an atomic counter, then you could be missing interrupt events anyway, if the main loop doesn't repeat frequently enough for 44100Hz interrupts. I'm not really sure on the timing of the root counters.
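To illustrate the flag-versus-counter point, here is a toy simulation. It is not PS1 code, just a sketch with made-up timestamps: interrupts fire at a steady rate while the "main loop" only gets around to checking every so often.

```python
def run(irq_times, poll_times, use_counter):
    """Return how many sequencer ticks the main loop ends up performing.

    irq_times and poll_times are sorted timestamps in arbitrary units.
    """
    pending = 0
    ticks = 0
    i = 0
    for now in poll_times:
        # Deliver every interrupt that fired up to this poll.
        while i < len(irq_times) and irq_times[i] <= now:
            # A counter accumulates events; a flag just stays set.
            pending = pending + 1 if use_counter else 1
            i += 1
        # Main loop: run one tick per pending event, then clear.
        ticks += pending
        pending = 0
    return ticks


irqs = list(range(0, 100, 10))    # 10 interrupts, one every 10 units
polls = list(range(5, 130, 25))   # main loop only polls every 25 units

print(run(irqs, polls, use_counter=False))  # 5
print(run(irqs, polls, use_counter=True))   # 10
```

With a plain flag, half of the events are silently dropped in this scenario, which would show up as playback running slow.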
Ahh. Not sure if maybe there's been a miscommunication here, but the target value for a root counter is not like in the playback rate of streamed music. Instead, the lower the target value, the more frequent the interrupts occur. My guess at this point is that the count actually exceeds the target value a little every time the interrupt occurs, since a tick of root counter 2 is an eighth of the system clock. Either that, or the pitch defined for the IRQ key is not exactly 44100 Hz after all??? I mean, I've tried so many things now but I keep seeing this one specific sample in "CAVERN" at 1:00.666 instead of at 0:59.158 (as with SPU IRQs).
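For what it's worth, the two timestamps above already give a rough estimate of how far off the target is. A quick sketch, assuming the tick interval scales linearly with the target value and that 44100 was the target in use for the slower rendering:

```python
irq_time_s = 59.158    # position of the sample with SPU IRQ timing
rcnt_time_s = 60.666   # same sample with root counter 2 timing
target_used = 44100    # assumed target value for the slower rendering

slowdown = rcnt_time_s / irq_time_s
rescaled_target = target_used * irq_time_s / rcnt_time_s

print(round((slowdown - 1) * 100, 2))  # 2.55 (percent too slow)
print(round(rescaled_target))          # 43004
```

That is about 1.5 seconds of drift per minute, and it suggests a target somewhere around 43000 rather than 44100.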
I did the math in a comparison against the output of viopsf (SPU IRQ timing) vs foo_psf (root counter 2 timing) and got this:
The time measured here was from the start up to a distinct sample that's clearly visible in both of the transcodes about a minute in. Indeed, setting the target value to 43011 does result in pretty much the same playback rate, but the value feels kind of arbitrary, and I feel there isn't any logic to prove its correctness.
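One possible way to sanity-check that value from first principles is sketched below. It assumes the commonly documented PS1 system clock of 33.8688 MHz (not stated in this thread), root counter 2 ticking at an eighth of that (as mentioned above), and one sequencer tick per 448 samples at 44100 Hz (from the first post). Whether the hardware behaves exactly like this is an assumption.

```python
CPU_CLOCK_HZ = 33_868_800   # assumed system clock (44100 * 768)
RCNT2_DIVIDER = 8           # root counter 2 at 1/8 of the system clock
SAMPLE_RATE_HZ = 44100
SAMPLES_PER_IRQ = 448       # 256 ADPCM bytes between interrupts

ticks_per_sample = CPU_CLOCK_HZ // RCNT2_DIVIDER // SAMPLE_RATE_HZ
target = ticks_per_sample * SAMPLES_PER_IRQ

print(ticks_per_sample)  # 96
print(target)            # 43008
```

Under those assumptions the expected target is 43008, which is within a few ticks of the empirically found 43011.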
Maybe it's because Highly Experimental's R3000 only syncs the timer cycle counts on register accesses or at the end of a batch of execution, not on every branch like some emulators do? (for instance, the N64 emulator mupen64plus)
Question, is there any I/O register I can use to check the current address for a voice? Or anything at all that could indicate the playing progress of a sample?
According to the no$psx docs, there is a register that's supposed to be updated when the "loop end" flag is read, but at least in PSFLab and Mednafen, it doesn't seem like it ever is. Unfortunately so, because I had this idea:
1. Call SpuSetKey to trigger playback of a silent, looping sample at 44100 Hz
2. Call the MDX tick handler
3. Poll 0x1f801d7e ("loop end" register) until it's been updated to the address containing the flag
4. Reset 0x1f801d7e to 0 so we can use it again to determine if the "loop end" flag (last block) has been read
5. Keep looping over to step 2
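The control flow of that idea can be sketched roughly as follows. This is a simulation only: FakeSpu is a hypothetical stand-in that pretends the register gets latched every few polls, and the flag address is an arbitrary made-up value. As noted above, the register does not seem to behave this way in PSFLab or Mednafen.

```python
LOOP_END_REG = 0x1F801D7E  # register address quoted in the steps above


class FakeSpu:
    """Hypothetical SPU stand-in: latches the flag block's address into
    the "loop end" register on every `period`-th read."""

    def __init__(self, flag_addr, period):
        self.regs = {LOOP_END_REG: 0}
        self.flag_addr = flag_addr
        self.period = period
        self.polls = 0

    def read(self, addr):
        self.polls += 1
        if self.polls % self.period == 0:
            self.regs[addr] = self.flag_addr
        return self.regs[addr]

    def write(self, addr, value):
        self.regs[addr] = value


def play(spu, flag_addr, tick, n_ticks):
    # Step 1 (SpuSetKey on the silent looping sample) is assumed done.
    for _ in range(n_ticks):
        tick()                                      # step 2
        while spu.read(LOOP_END_REG) != flag_addr:  # step 3: busy-wait
            pass
        spu.write(LOOP_END_REG, 0)                  # step 4: re-arm


calls = []
spu = FakeSpu(flag_addr=0x1010, period=4)
play(spu, 0x1010, lambda: calls.append(1), n_ticks=3)
print(len(calls), spu.polls)  # 3 12
```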
So, I finally got a disc with CAVERN to play on my PS1, and noticed the recent rip provided by Squaresoft74 is a bit different from the rip I made. The delay on my rip is much greater. Secondly, the waveform on his rip appears to be inverted compared to all other rips (maybe that's expected from older PS1 models though?) So I decided to do a comparison; I measured the distance of two specific sample points (~1 min length) which are distinct and clearly visible in all rips of the track. The OST rip is used as the base for "correctness". To analyze the waveforms, I use iZotope RX 2 with interpolation disabled.
I was slightly bothered to see such a delay on my rip. I'm not sure what caused it, but it's probably from a number of things:
- I have a slim PSOne, and looking at Sony's history, I'm gonna guess later models are overall less accurate
- My PS1 is hard-modded since long ago (2004/2005); not even sure what chip anymore
- The PSF executable is NTSC-J, but the console is PAL
- Recording flaw? I recorded via DirectSound in Audacity using a simple RCA->3.5mm adapter
Anyway, would be interesting to see more hardware recordings of CAVERN, if anyone else is able/willing to contribute.
The record i made was on a PAL 1002 and the executable was sent via an Xplorer flashed with Nocash's bios clone so maybe that made a difference ?
I could try other records with this setup using different "Video Mode" settings and see what it gives (Auto, Ntsc, Pal).
My 1002 has a first gen modchip so i could also try with your disc and the original Sony BIOS.
I also have 7502 and 102 consoles i could test on this way.
EDIT: "I recorded via DirectSound"
I have Audacity set to use "MME". https://ttmanual.audacityteam.org/man/Device_Toolbar
"On Windows XP or earlier (given a recent computer), DirectSound's shorter path to the hardware should produce lower latency than MME.
On Windows Vista, Windows 7 and Windows 8, DirectSound may have only slightly lower latency than MME because both interfaces are emulated. Selecting DirectSound and enabling both "Exclusive Mode" boxes in Windows "Sound" allows Audacity to request audio direct from the device without resampling. See the Wiki page for Windows 7 for more explanation."
> My 1002 has a first gen modchip so i could also try with your disc and the original Sony BIOS. > I also have 7502 and 102 consoles i could test on this way.
That would be great! Mine is an SCPH-102, so the comparison should be close there.. I hope.
> So which one should i use (Windows 10x64) ?
I think maybe DirectSound in Exclusive Mode would be best. I didn't read that part about Exclusive Mode until after my recording, so I didn't use it myself, but I'll do that in the future. But honestly, I'm not sure whether the audio host really affects the end result, or if it just affects the latency of the 'live' recording so to speak? Does anyone know?
EDIT: Well, I think I can remove Audacity and all the settings from the equation. I just recorded the track again using both MME and WASAPI with varying buffer lengths (Edit -> Preferences -> Devices) and shift correction, and the difference is negligible. The distance in my recording is still around 2607600 samples, give or take 5 samples. It seems even the hardware (at least my model) has some timing inaccuracies since it does not always produce the same output.
I reference the no$psx document quite frequently, and believe me I've tried to find something that can help me, but I'm not finding anything. Thanks anyway.
Yeah, it's all good, don't worry about it. As I mentioned, the settings are actually irrelevant for the final recording and comparison.
by Squaresoft74 at 12:39 AM EDT on August 24, 2018
Ok first batch, please bookmark this link: https://mega.nz/#F!6cgnUTAK!XTHk0L7UsbhVkFPZk8NDEQ
I used both your disc and its exe.
CAVERN Scph1002 (Disc) Nocash // Kernel Clone Expansion ROM Version
CAVERN Scph1002 (Disc) Sony // Original Bios
CAVERN Scph1002 (Exe) Nocash // Kernel Clone Expansion ROM Version
Let me know if something is broken between them.
*EDIT 1* Added :
CAVERN Scph7502 (Disc) Nocash // Kernel Clone Expansion ROM Version
CAVERN Scph7502 (Disc) Sony // Original Bios
CAVERN Scph7502 (Exe) Nocash // Kernel Clone Expansion ROM Version
*EDIT 2* Added :
CAVERN Scph102 (Disc) Sony // Original Bios
The console used is a v1 (PM-41), more info here if you want to check against yours:
CAVERN Scph39004 (Disc) Sony // Original Bios (PS2)
CAVERN Scph77004 (Disc) Sony // Original Bios (PS2)
My Scph7002 doesn't have a modchip, let me know if it's worth trying with Nocash's stuff.
*EDIT 4* Added :
CAVERN Scph1002 (Exe) Xplorer // Original Xplorer ROM
This one is the issue I mentioned on page 1 of this topic. It will play way too fast regardless of what the video mode is set to (PAL/NTSC). For some reason it only happens with your MGS rip.
Other PSF rips i tried so far will adjust the playback speed according to the video mode set.
Only way to play your rip at proper speed using this rom is to hit reset just after starting to send the exe so it will start the upload before the Xplorer rom's menu gets initialized.
Wow, this is more than I could've asked for. Thank you! I've compiled an archive with all my comparisons so far for anyone interested. I've done some additional emulator recordings too. I wanted to check your EXE/Xplorer rip in particular but it was too short (I need at least 1 full minute recorded), so I ended up only comparing your Disc/Sony recordings. No big deal though, this should be enough as a reference.
So, when all is said and done, I think I'm satisfied with what I have now. And I think, considering all of these deviations, even between various hardware models, root counter 2 at a target value of 43009 seems like the sweet spot, and I wouldn't regard it as inaccurate per se, since the playback evidently never was consistently reproducible anyway. With root counter 2 set to 43009, it produces the same output time-wise in almost every emulator I've tried. For all conversions from viopsf (all SPU plugins) and Highly Experimental (foo_psf), the distance is 2607461 samples; in Xebra it's 2607457 (4 less); in no$psx it's 2607509 (48 more). In ePSXe however, it's significantly longer, with about 1374 samples more. This may have to do with the root counter not incrementing in some branches, like kode54 mentioned.
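To put those sample counts in perspective, here is a small sketch converting the differences to milliseconds. It assumes all of the transcodes are 44.1 kHz (not stated explicitly above); the helper name is made up.

```python
SAMPLE_RATE_HZ = 44100  # assumption: every transcode is 44.1 kHz

reference = 2607461     # viopsf (all SPU plugins) and Highly Experimental
others = {
    "Xebra": 2607457,
    "no$psx": 2607509,
    "ePSXe": reference + 1374,  # "about 1374 samples more"
}


def delta_ms(distance):
    """Difference from the reference distance, in milliseconds."""
    return (distance - reference) / SAMPLE_RATE_HZ * 1000


for name, distance in others.items():
    print(f"{name}: {distance - reference:+d} samples, {delta_ms(distance):+.2f} ms")
```

Over a span of about 59 seconds, the Xebra and no$psx differences come out at around a millisecond or less, while the ePSXe one is roughly 31 ms.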
Nevertheless, I'll stick with this, and hopefully no one will mind an update to the PSF set. I'll see about miniPSF-ing it and making an 'official' upload for MGS Integral, too.
> Other PSF rips i tried so far will adjust the playback speed according to the video mode set.
My guess is that the other rips you've tried use a root counter instead of SPU interrupts. Final Fantasy 7 and Silent Hill use root counters to name some examples. MGS uses the timing of a key playing a short looped sample at a pitch of 4096 (44100 Hz) instead.
Time-wise, there's only a difference of 2 samples between the disc recording and the uploaded EXE recording for SCPH-1002. No difference between the disc/EXE recordings for SCPH-7502. So it should be totally fine.
I would also like to say thank you for taking the time to record some stuff for us. I apologize for not doing so sooner; a personal event in the last couple of months took a part of my mind with it.
Anybody wanna record one last thing? I was surprised to see such a difference on ePSXe when using root counters, so I hope it won't be the same on hardware.
I'm gonna hijack this thread since it's kinda related, but @Nisto, can you do a writeup on what you found out about how the MDX sequencer (or better yet, the whole sound system) works? I've been fiddling with sequences for both MGS1 as well as ZOE1 (and to a lesser extent MGS2) and they are similar enough that you can parse them in almost the same way, but I'm at my wits' end (and I haven't gotten enough into disassembly yet to just work from there, especially since for the PS2 titles, the IOP is doing the heavy lifting and any modules that still have symbols in them don't really help me)
@Missingno_force: First off, just a quick disclaimer: when ripping PSFs, you don't always need to figure out the sequence format or any of the lower-level sequencer routines (command handlers and such). With that said, I do have a basic grasp of how the sound system works overall, but it's not comprehensive by any means, and I cannot speak for other MDX-compatible games.
Also, I have recently updated my IDA database for MGS with function names based on debug symbols from the TGS build of ZoE. You might want to have a look at that.
So, it all starts with the function at 80082F18. It sets up the following sound buffers/pointers:
sng_data: Contains MDX data (entire MDX file).
wave_header: Contains up to three WVX headers (voice tables). A voice table is pretty self-descriptive: it defines parameters for the voices (samples) in the respective wave data. The wave from the "init" stage is always loaded in the first 2048 bytes; it contains the voice table for most of the instruments and sound effects used in the game. The remaining 2048 bytes are reserved for up to two voice tables, usually occupied by the WVX files for the current stage. In a WVX file, the first 32-bit word specifies where the voice table should be loaded (relative to the wave_header / voice_tbl buffer).
voice_tbl: Synonymous with wave_header.
CDLOAD_BUF: This is used to temporarily hold sound data read in from the CD-ROM (e.g. sample data which later gets transferred to SPU RAM).
I haven't looked into the other buffers much, but they're not relevant to sequenced music.
Next, you have the function at 80082A00, which configures the usual SPU parameters (reverb, volume, etc.), initializes sequencer variables, and sets up interrupts, so that the function which carries out sequence playback (80085218, from here on called IntSdMain) gets invoked at a regular interval. It's a little complicated to explain how the function (originally) actually gets invoked, since it involves understanding a bit about the Multi Task Scheduler (a proprietary engine for handling thread concurrency) and SPU IRQs (interrupt requests). But in my latest PSF rip, I bypass the use of both, which hopefully makes it easier to follow.
To control playback, the game uses "sound codes", which are passed around from place to place. I assume they usually originate from GCL scripts (I know virtually nothing about these, but I can't find many cross-references in the bare executable). They usually pass through the function at 800899B0 first, which stores the sound code in a circular buffer/queue, which can hold up to 16 codes (32 bits each) at a time. Here are some example codes:
01000002 play MDX song 2 (zero is not valid)
01FFFF01 pause music
01FFFF10 activate evasion mode (for ENCOUNTER)
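A rough sketch of such a queue and of how the three example codes might be told apart. The bit layout (top byte 0x01 as a category, low 24 bits as the song number) and the behaviour when the queue is full are guesses made for illustration, not something confirmed from the executable; the class and function names are made up.

```python
from collections import deque

QUEUE_SIZE = 16  # up to 16 codes of 32 bits each


class SoundCodeQueue:
    """Toy model of the sound code queue (overflow policy is a guess)."""

    def __init__(self):
        self._codes = deque()

    def push(self, code):
        if len(self._codes) >= QUEUE_SIZE:
            return False                      # guess: reject when full
        self._codes.append(code & 0xFFFFFFFF)
        return True

    def pop(self):
        return self._codes.popleft() if self._codes else None


KNOWN_CONTROL_CODES = {
    0x01FFFF01: "pause music",
    0x01FFFF10: "activate evasion mode",
}


def describe(code):
    """Interpret a code using the assumed layout described above."""
    if code in KNOWN_CONTROL_CODES:
        return KNOWN_CONTROL_CODES[code]
    song = code & 0xFFFFFF
    if code >> 24 == 0x01 and 0 < song < 0xFFFF00:
        return f"play MDX song {song}"
    return "unknown"


queue = SoundCodeQueue()
for code in (0x01000002, 0x01FFFF01, 0x01FFFF10):
    queue.push(code)
print(describe(queue.pop()))  # play MDX song 2
print(describe(queue.pop()))  # pause music
```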
Eventually, when invoked, IntSdMain pops the next sound code (song code) from the buffer and processes it. Once that's done, it goes on to check the sng_status variable (800BFB18). This variable indicates whether or not song (MDX) data is loaded, if it's ready to be played, etc. It needs to be updated to 2 prior to sending a Play code. This is normally done through the stage TOC parser, when an MDX ('m') entry is found, which prompts a call to both 80084C10 and 80084C58. When a Play code is sent and the sng_status is 2 (ready to be played), the function initializes subtrack info and bumps the sng_status to 3 (playing). Now that the sng_status is 3, IntSdMain goes through an additional loop which iterates through all the MDX subtracks (see 800856E0).
If you look closely, the loop only processes 13 subtracks. This is because the game reserves the first 13 SPU voices for music. The other 11 voices are for sound effects (8), streams (2), and SPU IRQs (1). Note that it keeps a global variable for the track currently being processed, and a pointer to a structure which holds all relevant information about it, such as the address of the next command, the current volume, panning, and much more. These variables are frequently referenced in lower-level functions (such as command handlers) and should give you lots of leads, I think. Also, there's a lot more to find in the IDA database (idb) after my latest update. Probably faster/easier to look at that, really.
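A much simplified model of the flow described in the last two paragraphs. The status values 2 and 3 and the voice counts come from the description above; everything else (class, method names, what happens to a Play code that arrives too early) is made up for the sketch.

```python
# Voice budget described above; only the counts come from the post.
MUSIC_VOICES, SFX_VOICES, STREAM_VOICES, IRQ_VOICES = 13, 8, 2, 1
TOTAL_VOICES = MUSIC_VOICES + SFX_VOICES + STREAM_VOICES + IRQ_VOICES

STATUS_READY = 2     # song data loaded, ready to be played
STATUS_PLAYING = 3   # subtracks are being processed


class Player:
    def __init__(self):
        self.sng_status = 0
        self.current_track = None  # mirrors the "track being processed" global
        self.log = []

    def mark_ready(self):
        """Stand-in for what the stage TOC parser does for an 'm' entry."""
        self.sng_status = STATUS_READY

    def int_sd_main(self, play_code_pending=False):
        """One invocation of the (simplified) sequencer tick."""
        if play_code_pending and self.sng_status == STATUS_READY:
            self.sng_status = STATUS_PLAYING  # subtrack init would go here
        if self.sng_status == STATUS_PLAYING:
            for track in range(MUSIC_VOICES):
                self.current_track = track
                self.log.append(track)        # placeholder for command handling


player = Player()
player.int_sd_main(play_code_pending=True)  # no effect in this sketch: not ready
player.mark_ready()
player.int_sd_main(play_code_pending=True)  # 2 -> 3, then 13 subtracks run
print(TOTAL_VOICES, player.sng_status, len(player.log))  # 24 3 13
```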
The rest is a bit beyond me, sorry to say. There's a lot of subcalls and complex structures being passed around that I haven't quite figured out. But about half-way into the aforementioned loop, you'll find a call to 80086750, which subsequently calls 80086884. This function actually reads the next command from the current subtrack, so this would probably be good to look at as well. It uses an array to map the command number to a handler function, so it should be pretty easy to figure out what most commands are for, especially now that the functions are named thanks to debug symbols from Zone of the Enders. Note that the XREF to the array is to 8009FF1C because of some silly pointer arithmetic; you can find the actual table at 800A011C.
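The "silly pointer arithmetic" can be worked out from the two addresses. The sketch below assumes 4-byte function pointers (32-bit MIPS); the conclusion that command numbers start at 0x80 is only an inference from the size of the offset, not something confirmed here.

```python
XREF_BASE = 0x8009FF1C    # address the code actually references
TABLE_START = 0x800A011C  # where the handler pointers begin
POINTER_SIZE = 4          # assumed: 32-bit pointers

bias_bytes = TABLE_START - XREF_BASE
first_command = bias_bytes // POINTER_SIZE
print(hex(bias_bytes), hex(first_command))  # 0x200 0x80


def handler_slot(command):
    """Address of the table entry for a command number, computed the way
    a biased base would be used: base + command * pointer size."""
    return XREF_BASE + command * POINTER_SIZE


print(hex(handler_slot(0x80)))  # 0x800a011c
```

In other words, indexing from the biased base with a command number of 0x80 lands exactly on the first real table entry, which would make sense if all command bytes have the high bit set.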
If there's something more specific you're wondering about, ask away.
awesome post man, this is exactly what i had hoped for (and i didnt expect a full ready-to-print documentation, no worries). 80086884 sounds like a gold mine for what i was doing when i wrote my post (staring at data from mdx dumped into a csv to make sense of it while trying to change parameters by patching an iso of zoe and testing it via emu). i assume the name of your idb is the mgs executable i want to grab to look around in (and that the functions you named here are all from there as well). if i come across something ill hit you up
> i assume the name of your idb is the mgs executable i want to grab to look around in (and that the functions you named here are all from there as well). if i come across something ill hit you up
Correct. If you're only doing static disassembly, you won't actually need the executable though; you can open the IDB file by itself in IDA 6.8 or later.
by GirianSeed at 4:36 PM EDT on September 14, 2018
Whoa, great writeup. I remember noticing the different channels reserved for song/SE/stream playback a few years back when playing around with the game with a certain SPU plugin.
Only thing I could probably add right now is some of the original function names I noticed your IDB was missing. I could probably add more later but these are just off the top of my head.
This is the string hashing function, you'll see this getting called all over the place if you ever look into other areas of the program.
edited 4:37 PM EDT September 14, 2018
by radornkeldam at 5:10 PM EDT on September 21, 2018
Hey. Sorry to interrupt, as I seldom come around here. I've been waiting for ages for someone to make a PSF set of this game, as I've always found the original soundtrack to be severely limited, missing all cutscene music. I've just found the game has finally been ripped, which got me quite pumped, but it seems it only contains the tracks already present in the OST, for the most part. Is there any plan to get the missing cutscene tracks eventually?
by GirianSeed at 8:44 PM EDT on September 21, 2018
All the cutscene audio is pre-mixed, sorry.
by radornkeldam at 9:54 PM EDT on September 21, 2018
Please, could you explain that? What do you mean by pre-mixed here? Are they XA streams? It was a long time ago but I believe I tried looking at the ISO for XA and they weren't there...? They are not sequenced tracks played by the game engine after all? What makes them different from the regular tracks?
Do you mean perhaps, that the scenes have the animation, camera movement, etc.. and audio also, all mixed in a single script?
edited 10:02 PM EDT September 21, 2018
by GirianSeed at 2:47 PM EDT on September 26, 2018
All the sound effects, voices, and music are mixed together as a single VAG stream for each cutscene.
It's a limitation of the PS1. It can't stream more than one audio source at a time, so a lot of times music, voices, and SFX are premixed into one file like GirianSeed said, although there might be cases where that's not true. It's usually why you don't hear streamed music while a game is loading something.
by radornkeldam at 1:14 AM EDT on September 27, 2018
@Kirishima Sure, I understand about streaming audio, even when streaming voices many games have to stop music playback, unless they are small samples kept in RAM.
@GirianSeed Thanks for the explanation. I guess it would take hacking each file to mute the undesired sounds so that only the music is left. It's a pity. I have the music for the opening screen's submarine scene and the scene when Snake gets on the elevator sort of burned in my head xD.
Do these VAG files also contain the samples themselves or are they references to other files? Maybe zeroing out the sample data for the SFX and voices would work. How hard would you say that'd be? I ask this because I might try to do it myself if the task doesn't require advanced techniques.
I don't think you get it. The cutscene audio is one big audio stream. No amount of programming would be able to remove the unwanted audio from them.
by radornkeldam at 7:43 AM EDT on September 27, 2018
Oh... I thought he was talking about a MIDI-like sequence where the SFX and voices were played like instruments too. Sorry for the confusion
The music in the game didn't sound different in the scenes, and years ago, when using some old tools that played audio from PS1 discs, I couldn't get any of them to play, so I just assumed they were sequenced music and didn't expect them to be plain streams. I didn't know the VAG format was that.
I haven't tried this again recently. Is there any VAG decoder currently? Is it now included in vgmstream? I guess I could try getting different versions of the game in different languages and try to use averaging and compares to, at least, reduce the speech... SFX would still remain there, unfortunately. There are Japanese, English, Spanish, French, German and Italian versions of the game, and I know for a fact that at least the first three have their own dubs, and the others should be the same, presumably. If they did it for Spain, they most surely did it for at least France and Germany... Italy maybe not, but probably yes.
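As a toy illustration of the averaging idea: if the decoded streams were perfectly aligned and the music and SFX were sample-identical across versions (big assumptions, since the audio is lossy ADPCM and the mixes may differ), then averaging N dubs keeps the shared part at full level and scales each dub's speech down to 1/N. The sample values below are made up.

```python
def average_tracks(tracks):
    """Sample-wise mean of several aligned PCM tracks (lists of numbers)."""
    n = len(tracks)
    length = min(len(t) for t in tracks)
    return [sum(t[i] for t in tracks) / n for i in range(length)]


music = [100, -50, 25, 0]                  # shared music + SFX
dubs = [[30, 0, 0, 0],                     # speech differs per language
        [0, 30, 0, 0],
        [0, 0, 30, 0]]
versions = [[m + s for m, s in zip(music, dub)] for dub in dubs]

print(average_tracks(versions))  # [110.0, -40.0, 35.0, 0.0]
```

Each speech sample of 30 ends up contributing only 10 to the result, so the speech gets quieter but does not disappear.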
Perhaps not worth it, though... I'll see about it.