Ive already had a go at these files some time ago. From what ive seen, theyre made up of 4 sections, two of which contain sound and two which i assume to contain instrument/sample info and the sequence. I have my notes somewhere, ill try to dig them up when i get the chance.
So, turns out I didn't get as far in as I thought I did, but what I have found out so far is this:

The file begins with a 32-byte header (8 uint32_t), which goes like this:
offsetMusicSampleTable?
unknown
offsetEffectSampleTable?
unknown
offsetSequence/Instruments1
unknown
offsetSequence/Instruments2
unknown
(offsets are given as a number of DVD sectors, so multiply by 0x800)

The tables I haven't figured out. They are immediately followed by VAG/PlayStation ADPCM data. Music samples have a sampling rate of 44100 Hz, effect samples are 32000 Hz; both are mono.

The first instrument/sequence block seems to have three sections, starting at 0, 0x800 and 0x1800.

The second instrument/sequence block is made up like this: a 64-byte header (the first uint32_t is the size in DVD sectors(?), the second is the offset of a table, the rest is not used). I found some notes that referred to the first value as the number of tables, with the following uint32_t being each table's offset; I'm not 100% sure about this since I haven't touched the files in maybe a year or so! The table contains offsets to (I think) sequences of some kind (relative to the beginning of the file, in bytes). A sequence seems to be made up of uint32_t with 0xFFFE0000 (stored as the bytes 00 00 FE FF in the file) as terminator.

That's about it. If someone can help me continue working on this I'd be really glad (because the music is what made me start reversing the game's file formats in the first place).
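For anyone who wants to poke at the header themselves, here is a minimal Python sketch of the layout described above. It is not taken from the actual tool: the field names are just the guesses from the post turned into identifiers, and little-endian byte order is an assumption based on the terminator showing up as 00 00 FE FF in the file.

```python
import struct

SECTOR_SIZE = 0x800  # header offsets are counted in DVD sectors

# Hypothetical names; they mirror the guessed meanings in the post.
FIELD_NAMES = (
    "music_sample_table", "unknown0",
    "effect_sample_table", "unknown1",
    "sequence_block1", "unknown2",
    "sequence_block2", "unknown3",
)

def parse_sdx_header(data: bytes) -> dict:
    """Read the 32-byte header as eight little-endian uint32 values."""
    if len(data) < 32:
        raise ValueError("need at least 32 bytes for the header")
    values = struct.unpack_from("<8I", data, 0)
    return dict(zip(FIELD_NAMES, values))

def sector_to_byte_offset(sectors: int) -> int:
    """Convert a sector count from the header into a byte offset."""
    return sectors * SECTOR_SIZE
```

Something like `parse_sdx_header(open("test.sdx", "rb").read(32))` would then give the four section offsets in sectors, which still need to go through `sector_to_byte_offset` before seeking.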
Ah neat! Someone was doing some digging on this sequence format! I'd really like ZOE2 in psf someday. Sadly I cannot help. Perhaps give Nisto a shout? He worked on another Konami sequence format.
@punk7890-2: It looks like ZOE2 runs on something between MGS2 and MGS3 code wise, both of which use completely streamed audio, so youre out of luck for psfs, but someone may write a tool for getting the streams out of the datafiles (from what i remember, mgs3 is rippable but mgs2 is not...)
Any progress on this and/or ZOE2 music extraction? I've been fiddling around in RAM on ZOE2 and sadly I can't even find myself a music modifier. I can however mess up the audio, which makes me think it is sequenced, because a bunch of weird instruments start playing.
I'd love ZOE2 game music someday. This site has done it (somehow) http://www.zoneoftheenders.org/music/zoe2/
As luck would have it, i just had another look at the files last sunday. Progress was minimal however, i modified my code for handling the files to dump playable VAGs and read through the documents for CSL data (aka the official sdks "stock" sequence and soundbank formats, sq, bd and hd), however, i havent found anything clearly resembling this kind of data. Ill double-check my code and post it later for people to try themselves. If someone has experience with CSL, have a look at it and see, if you can find things!
Sidenote: i also checked the ZoE HD collection data which i recently got my hands on. While the devs left a lot of interesting data on the disc (i mean it), i found nothing related to the original music (yet), and the wwise data isnt helping either, because i havent found a tool for processing sequenced wwise music that can handle the files on the disc...
Alright, fixed up my code, get it here. Compiling should work with anything on any platform; Linux users may need to fiddle with the includes to get directories working.

Usage: sdx-extract SDX-file (e.g. test.sdx)

This will produce the following files:
test.sdx-BGMsampleBlock.bin
test.sdx-SFXsampleBlock.bin
test.sdx-SequenceDataBlock1.bin
test.sdx-SequenceDataBlock2.bin
as well as these folders:
test.sdx-BGMsamples
test.sdx-SFXsamples

The folders will contain the respective samples as ready-to-play VAGs (note that some SDX files contain "empty" VAGs with no actual samples inside). The binary files are the individual sections of the SDX, separated with no modification done (the sample blocks each contain a table with info regarding the samples, which I'm mostly ignoring, if that interests you). The real fun is in the two sequence blocks: Block1 contains 0-0x800 and 0x800-0x1800 (these look like something related to channels/instruments) and 0x1800-EOF (something sequency-looking). Block2 has a table at the beginning and then sequency data; refer to my earlier post about how this may work. So, uh, enjoy?
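To illustrate what "ready-to-play VAG" means here: a raw block of PlayStation ADPCM becomes playable once a 48-byte VAGp header with the size and sampling rate is put in front of it. The Python sketch below shows that idea only; it is not the code from sdx-extract, the function name is made up, and the version value 0x20 is an assumption (it is simply a value commonly seen in VAG files).

```python
import struct

def make_vag(adpcm: bytes, sample_rate: int, name: str = "sample") -> bytes:
    """Prefix raw PS ADPCM data with a 48-byte VAGp header.

    All numeric header fields are big endian, unlike the rest of the SDX.
    """
    header = struct.pack(
        ">4sIIII12s16s",
        b"VAGp",                    # magic
        0x20,                       # version (assumed value)
        0,                          # reserved
        len(adpcm),                 # size of the ADPCM data in bytes
        sample_rate,                # e.g. 44100 for BGM, 32000 for SFX
        b"",                        # 12 reserved bytes, zero-filled
        name.encode("ascii")[:16],  # sample name, zero-padded to 16 bytes
    )
    return header + adpcm
```

Going by the rates mentioned earlier in the thread, music samples would be wrapped with `make_vag(data, 44100)` and effect samples with `make_vag(data, 32000)`.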
Here is a dump of stage/fa1/eu00007d.sdx Note that the code i posted will work for ZoE1 only, i have yet to come across any docs regarding the compression/crypto used in ZoE2 STAGE.DAT (and aside from that, i didnt look too much into the second game, maybe ill have a look sometime soon).
@AnonRunzes: I updated my code to (hopefully) include linux support as-is. Get the updated code from the link of my above post, save the contents to a file called sdx-extract.c then run gcc -Wall -ggdb -o sdx-extract sdx-extract.c Then run the program using something like ./sdx-extract file.sdx
Youll want a cross-compiler like mingw, i cant help you with setting it up on linux though, sorry. On windows im using msys2 with the included mingw packages.
I tried running the data i assume to be sequences through some code i adapted from Nistos kdt converter, however the output i get only contains Note-On events with note 0x00 (i assumed that kdt may be related to the sequence format used here because the end-of-track markers look similar. Is there any documentation on the SQ format used in sonys CSL?).
The ADPCM part (it's where all of the samples are stored) in the "pk######.sdx-BGMsampleBlock.bin" file is actually a .bd file, which starts after the sound index part of that file. That's about as far as I got, unfortunately.
Missingno_force: KCE Japan appears to be using a different sequenced format from what KCE Tokyo uses. I think it may be similar to the format used in MGS(1), but I haven't looked much at any of it.
Nisto: This sounds about right, considering the way the game data is laid out (i assume that ZoE1 is somewhere between MGS1 and MGS2). Good luck in finding things!
Missingno_force: "both [MGS2 and 3] use completely streamed audio"
MGS2 definitely uses sequenced music like MGS1/ZOE1/2. Only the PC port used streamed BGM.
The MGS2 and 3 HD Editions also use .sdx files and they seem to be same or similar format at first glance in a hex editor. (MGS3 only uses them for sound effects like MGS2 for PC though.) They're probably carried over from the original PS2 versions, unlike ZOEHD which replaced the whole audio system with Wwise. The Vita port of MGS2 has them all combined into one file. "mgs2_misc_**.psp2arc/misc/unified_sdx_archive.sdx"
GirianSeed: hrm, looks like i assumed too much from MGS3 (which is using streams for music, as seen when using demux_dat). The fact that there are no tools to handle MGS2s STAGE.DAT doesnt help (me) here as well. Thanks for letting me know! By the way, most, if not all, of the original files from ZoE1 and 2 can be found on the ZoEHD disc with a bit of digging around (together with some logs from the texture conversion tool and more).
Here's a selection of MGS SDX archives from what I have on-hand. Hope they're at least a bit useful. http://www.mediafire.com/file/wm1okag07sow7ts/SDX.7z
MGS2 seems to be set up like MGS1, where a set of common samples are held in resident memory throughout the game. In MGS1, they were stored inside of the init stage. Most stages just used this default set except for s16 (Rex's hangar) which needed two additional choir samples.
MGS2 obviously has a lot of different sample sets to load in for each stage though it still appears to keep some inside resident memory, namely the ones used for the alert phase BGM. Also, MGS2's init doesn't have any SDX archives, but it seems the select/sselect and r_* stages are used for this.
Edit: Oh yeah I forgot to mention two of the stages in MGS3 actually do contain BGM in their SDX files -- title and theater for basic actions and demo theater respectively -- though it's not sequenced. Just two streams (for the left and right channels).
Oooh, thats awesome. I had a quick glance over them and the MGS3HD ones have some kind of header/magic number thing inside. Ill take a closer look soon and see if i can get something from this (aside from the fact, that the PS3 ones are big endian), thanks a lot!
by GirianSeed at 10:47 PM EST on February 28, 2017
You can use unStg to unpack the individual .stg files. They won't have proper file names, so I included the PC stage/stagevr files for reference. I also created an incomplete dictionary.txt file as a demo since unStg at least outputs the files with their hash IDs.
As for the sound data: wv0000**.wvx : Contains audio samples. sg0000**.mdx : Binary data. Possibly the sequence data. unStg gives these a .mt3 extension. "MDX" is the BGM directory in the PC ver. se0000**.efx : Binary data. Probably cue data or something. "se" obviously stands for sound effect and efx.mgz in the SE archive in the PC version.
Although the PC port completely redid the in-game audio because Digital Dialect couldn't figure out how the original worked and KCEJ didn't deliver their tools in time, they left these files in but deleted their contents.
As for the STAGE.DAT files from the later games, Solidus can decode the header and reveal a list of stages as well as extract files from The Twin Snakes. (The Document of MGS2's STAGE.DAT is an exception though, because the number of stages exceeds what it's willing to spit out.) Supposedly this tool can too, but I remember it not working. I don't think the rest of the file is XORed though.
Also, MGS2, TTS, and MGS3 seem to share the same file hashing algorithm. This carried over into MGSHD where a lot of files still have hashed names (less so in the Vita port though). MGS1: scenerio.gcx = 54ea63.gcx MGS2: scenerio.gcx = 00180720.gcx
Alright, I had a look at the files GirianSeed uploaded (not all of them though), modified my code a lot (will upload that later) and found some interesting things:

The SampleBlocks in the sdx from ZoE1 are the same as the wvx files found in MGS1 (though you have to apply different sampling rates when extracting samples).
The SequenceDataBlock2 looks the same as the mt3 in MGS1 (though I still haven't got a clue as to how these work).
The sdx found in MGS2HD look the same as those in ZoE1 and can be handled exactly like those.
The sdx in MGS3HD are pretty different, with actual magic numbers/4-letter magics for different sections of the files. Also, the sequence data in those looks like plain MIDI (check MGS3HD_PS3_US/mg1/pk0001a1.sdx at offset 0x2B70).

I'll add more when I get back home later.

Edit: Here is the current source for my ZoE tools. It contains more than just the tools for sound, but I'm too lazy to remove those from the makefile for now. Compiling should work on any msys2/Linux setup with gcc and make; just unpack, cd into the folder and run make. For those on win64 without a compilation setup, here are the same applications as precompiled binaries. I'll have another look at MGS3HD later and possibly at MGS2's STAGE.DAT, once I get unStg and unStage ported to C. If you find anything that doesn't work or that might help in exploring (especially the mt3 sequences), tell me!
Well, it is a lot easier to get the files from the MGS2/MGS3 HD Remaster than it is from their original PS2 versions.
I'll probably be working on a .sdx dumper for these games at this point. After all, they're the only ones that are neither encrypted nor compressed in these STAGE.DAT archives.
I looked a little into the STAGE.DAT format and found some general info. There's two types of DAT files KCEJ games use -- DATs for streaming data (BGM, CODEC, DEMO, MOVIE and VOX) and DATs for cached/resident data (STAGE, FACE in MGS2, and SLOT in MGS3). Looking into face and slot might be helpful.
First, FACE.DAT & STAGE.DAT from the MGS2 Trial Edition seem to have a somewhat different header/format from the final retail game. STAGE.DAT's header isn't encrypted here and you can see the stage names clear as day in a hex editor.
Second, FACE.DAT in MGS2 has a non-encrypted header and Solidus can even read it by default, however it refuses to read any version of MGS3's SLOT.DAT at all. Won't even spit out a header file.
Here's some stage headers and DAT files I ripped from every KCEJ PS2 game I have on-hand at the moment. It has all the versions of FACE.DAT as well as the Trial's STAGE. I also included the Guy Savage stage (cdrom0:\ABC\STAGE.DAT) and some of the smaller SLOT and STAGE files. http://www.mediafire.com/file/4rgp36byoc2ogjc/STAGE.DAT.7z I also made a comparison of the contents of FACE and SLOT from the versions I could actually explore and included a few notes about the HD Editions' file sorting system and stuff. http://www.mediafire.com/file/ledx14455x56n1v/STAGE_COMPARE.7z
Also, I dunno where "MT3" came from. Might just be something SecaProject made up. Either way, I'm pretty sure MDX is the official extension. A couple .src files left in stages s19a and s19b in the PC data (both timestamped 11/18/1998 -- way before the PC port started development) use that name.
One last thing. Neither the QuickBMS script nor your dat-extract tool for ZOE1 works with the zoe.dat file from the TGS 2000 demo build. Looks like pretty much the same format as the final game just with a few slight differences. http://www.mediafire.com/file/d8rr59srga0rdvu/ZOE_TGS2000.7z
Edit: I forgot to write in my notes on the HD file system that bp_streams.txt lists the files in the original order that they were stored inside of the DAT file. So for demo.dat in the original PS2 Subsistence the first five files would be these:
Regarding the TGS2000 version, they shifted the offsets of the filelists and filecontents by 0x18 before multiplying. I added a dat-extract_tgs version to both the source and precompiled binary archives and modified the sdx-extract code to use the mdx extension instead of mt3, get the links from my post on page 2. Ill have a closer look at the STAGE.DAT files later.
Okay, so I tried getting into STAGE.DAT from MGS2 and later, but I could neither wrap my head around the header (even the unencrypted one from the MGS2 demo) nor get the decryption from SecaProject's Ruby code to work (sucks if a required value is missing). There don't seem to be any file lists in there at all :/

As a little heads up though, I hacked together an MDX splitter that splits the sequences(?) found inside the MDX into separate files. Once again, source and win64 binaries at the links on page 2. The splitter supports MGS1, ZoE1 and probably MGS2 (the differences between versions are minimal, just the size of the subsequence tables). The smallest a sequence file can get is 4 bytes (just the terminating 00 00 FE FF bytes).

I already tried looking into MIDI to see if there are similarities, but with no success. Maybe it's got something to do with the base "message" size being 4 bytes (all sequences I checked are a multiple of 4 bytes in size). As always, tell me if you find something :)
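The terminator and the 4-byte word size are enough to sketch a naive splitter in Python. Note this is not how the posted splitter works: that one goes by the tables inside the MDX, while the sketch below just scans for the end marker and assumes the sequences sit back to back, which nothing in the thread confirms. The function names are made up for the example.

```python
import struct

TERMINATOR = 0xFFFE0000  # appears as the bytes 00 00 FE FF (little endian)

def split_sequences(blob: bytes) -> list:
    """Cut a run of 4-byte words into chunks that each end with the terminator.

    Trailing bytes without a terminator, or a length that is not a
    multiple of 4, are ignored.
    """
    usable = len(blob) - len(blob) % 4
    sequences, start = [], 0
    for pos in range(0, usable, 4):
        (word,) = struct.unpack_from("<I", blob, pos)
        if word == TERMINATOR:
            sequences.append(blob[start:pos + 4])
            start = pos + 4
    return sequences

def drop_empty(sequences: list) -> list:
    """Discard sequences that consist of nothing but the 4-byte terminator."""
    return [seq for seq in sequences if len(seq) > 4]
```

A 4-byte result from `split_sequences` corresponds to the smallest possible sequence mentioned above.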
In the MGS3 Trial Edition from US PSM demo disc #86 in \MGS\MGS3.ELF at offset E1890 you can find the string "inflate 1.1.3 Copyright 1995-1998 Mark Adler" which hints that some kind of zlib variant was used for...something. The string isn't present in the final versions but that might just be because of how it was built. I dunno.
I also looked into the source code for Beatmania 5thMix since it was made by another team within KCEJ and some of the code was recycled from MGS1 but I don't think there's anything useful. sound.lib is precompiled and it doesn't look like the source is present (made by a different user) and even if it was I don't think the game used any sequenced music anyway. (Might be wrong there; never played it.) On the subject of compression it used some variant of LZSS derived from a file named p_decode.c from some kind of Konami source library.
None of that's probably any use here but hey I like studying KojiPro games.
Beatmania uses sequenced music by the nature of the game (check on YouTube, you'll quickly see what I mean), but I assume they did their stuff differently. Getting into the data from the MGS3 demo would require a working decrypter for the files (which we don't have right now). On another note, do you know if there is an application out there that can handle Wwise sequenced music (.bnk with BKHD magic) in big-endian format? If not, is there any info on the format out there? When I tried searching for info on it, I found two applications which were made for little-endian files and didn't work on the files from the PS3. If this works out, I'll try getting the sequence data from the HD collection to help me figure out the data on the PS2.
Ah. I was just assuming since most of the code I'd read for Beatmania was dealing more with XA data. For what it's worth the file \work.5th\include\SD_CLI.H was originally written by Kazuki Muraoka for MGS in 1996, though it's been modified a bit for BM. I don't think there's much to be gained from looking at it but I still find it pretty neat.
On a more interesting note though, the TGS 2000 build of ZOE1 still contains debug symbols, unlike how most PS2 games from Konami stripped them out.
And sorry, but I've really got no idea about Wwise in this case. ZOE HD's the only game I can think of that does anything that complicated with the BGM. It might be a custom setup by High Voltage Software.
While i had almost zero progress with the mdx sequences from the original, aside from adding a switch to skip empty sequences for my splitter, i checked the wwise banks in the HD remake. Turns out that HV isnt using completely sequenced music, but uses more premade samples (like drum beats or complete melodies) with panning and reverb already applied, as opposed to the original, where they had only a few "long" samples with complete melodies, but all of them were mono with no reverb at all. I doubt ill get any useful info from the banks :/
I also checked the sequences for the titlescreen, despite only playing one sample with reverb applied, theyre pretty big IMHO (check **0000ff.sdx in root/stage/title).
I also updated the archives with the little modifications ive made since the last update, get the links from my post on page 2.
Come to think of it, yeah, they are pretty different. I guess I just never listened all that closely because the original had a lot of phrase-samples as well. It's possible to sequence music in Wwise after all but yeah it won't be too helpful for the PS2 stuff.
By the way the *-SequenceDataBlock1.bin files look like a modification of the se0000**.efx format from MGS1. There's some FF padding but very similar otherwise.
Thanks for pointing that out, I checked the files again and it does look like Block1 and the efx files are more or less the same. It looks like they're made up like this:

[ZoE only] 0x800 bytes of unknown use (I'm still guessing channel management, but I'm not so sure anymore).

The next 0x800 [MGS1] or 0x1000 [ZoE] bytes are a list of some sort; each entry is 0x10 bytes long. The first 4 bytes are a mini-header, where the second byte is the number of sequence offsets after the mini-header. Following are 0 to 3 offsets (with 0xFFFFFFFF being a blank filler). These offsets are relative to the sequence block of these files, which starts at 0x800 for MGS1 and 0x1800 for ZoE.

After the list come mdx-like sequences (if not exactly the same kind); however, there are no empty sequences (as in only containing the end marker).
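The list layout described above can be sketched in Python like this. It is only an illustration of the description, not the efx-splitter itself: the names are hypothetical, the offsets are assumed to be little-endian uint32, and the list is assumed to start at 0 for MGS1 (which follows from the sequence block starting at 0x800 there).

```python
import struct

ENTRY_SIZE = 0x10
BLANK = 0xFFFFFFFF  # filler for unused offset slots

# Per-game layout, taken from the description above.
LIST_START = {"mgs1": 0x0, "zoe": 0x800}
LIST_SIZE = {"mgs1": 0x800, "zoe": 0x1000}
SEQ_BASE = {"mgs1": 0x800, "zoe": 0x1800}

def parse_entry(entry: bytes, game: str) -> list:
    """Return the absolute file offsets of the sequences one entry points to."""
    count = entry[1]  # second byte of the 4-byte mini-header
    raw = struct.unpack_from("<3I", entry, 4)
    return [SEQ_BASE[game] + off for off in raw[:count] if off != BLANK]

def parse_list(data: bytes, game: str) -> list:
    """Parse every 0x10-byte entry of the list into a list of offsets."""
    start = LIST_START[game]
    end = start + LIST_SIZE[game]
    return [parse_entry(data[pos:pos + ENTRY_SIZE], game)
            for pos in range(start, end, ENTRY_SIZE)]
```

With these sizes the list holds 0x80 entries for MGS1 and 0x100 for ZoE; entries with a count of 0 simply come back as empty lists.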
I updated the archives with an efx-splitter (which dumps the 2 or 3 blocks and splits the individual sequences) as well as fixing some leftovers in some of my source files. As always, get the links on page 2.
Also, sorry for the late reply, the thread got buried while I had finals to take care of. I'll check this stuff out ASAP!
Edit: I ran the mdx ruby script against both, mdx from MGS1 and ZoE1 that i extracted with my tools, and in both cases i got a more or less usable midifile! Looks like ill have to read up on how to midi so i can add samples to these to see, what kind of sequence the MDX actually contain (there still are MDX-like sequences inside the EFX files).