Ripping music (and other data) from Koumajou Densetsu .dat files by Atorasu at 8:17 PM EDT on May 30, 2020
I want to rip the music files from Koumajou Densetsu and its sequel for the purpose of music mods in other games. The hex of bgm.dat indicates that each song is an .ogg file, and includes what may be a pointer to that song's location. Additionally, define.dat seems to contain a bgmlist.txt, which could contain each song's loop points.
KD1: there is definitely a file list in the first 0x350 bytes. More precisely:
# Struct
"DPMX" [header size] [file count] [0]
/for each file: {filename}:0x10 [FFFFFFFF] [ ??? ] [file start (relative, after header)] [file size] /
# End struct
On a quick inspection (half an hour, so not really quick!), the files seem rotated. I do not know which rot-N is used and have not checked extensively, but each file seems to have its own rot-N. I do not think it is XORed; if it is, the key must be really large.
KD2: The appearance of the first bytes suggests that, unlike KD1's .dat, there is no file list, maybe just a container string, then the data streams one after another. The key, if it is XOR, must be large (at least 1K chars), and there is little repetition, so I do not think it is rot-N. Just my guess on this one.
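A minimal sketch of reading the DPMX file list described above. The field widths are assumptions, not confirmed by the post: every bracketed field is taken as a 32-bit little-endian integer, filenames as NUL-padded 0x10-byte strings, and [header size] as covering the whole header plus table. Under that layout a 0x350-byte header would hold 26 entries of 32 bytes each. The function name `parse_dpmx` is made up for illustration.

```python
import struct

HEADER = struct.Struct("<4sIII")    # "DPMX", header size, file count, zero (assumed widths)
ENTRY = struct.Struct("<16sIIII")   # filename, FFFFFFFF marker, "???", start, size

def parse_dpmx(blob):
    """Return the file table of a DPMX archive as a list of dicts."""
    magic, header_size, count, _zero = HEADER.unpack_from(blob, 0)
    if magic != b"DPMX":
        raise ValueError("not a DPMX archive")
    entries = []
    for i in range(count):
        name, marker, unknown, start, size = ENTRY.unpack_from(
            blob, HEADER.size + i * ENTRY.size)
        entries.append({
            "name": name.split(b"\0", 1)[0].decode("ascii", "replace"),
            "marker": marker,
            "unknown": unknown,               # the "???" field of the struct
            "offset": header_size + start,    # start is relative to the end of the header
            "size": size,
        })
    return entries
```

With the table parsed, each contained file is simply `blob[e["offset"]:e["offset"] + e["size"]]`, still in its obfuscated form.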
Interesting findings, though I wish I knew how to help more. Do the files from KD2 at least have evident beginning points? If for whatever reason it might be helpful to have other files such as the .exe for either game, I would be able to provide a full download. There isn't a whole lot outside of the "data" folder however.
Hello; yes, of course it's very possible that hints for deobfuscating KD2's music are in the other files, even the .dll's or .exe's - it will require time to investigate. Japanese developers are very structured about data resources - I'm 99.999999% sure all the music is within those bgm.dat files (just as there are dedicated files for static graphics, dynamic-vector ones, sound fx, even menu messages and dialogs).
Edit: the files in KD2 start with similar bytes, so at most they share the same starting bytes (probably a fake container). Progress on KD1.
You're making good progress, the files don't play but extraction is still a good start. Keep it up, working at your own pace of course, this isn't urgent. The MediaFire folder is updated in case you find that the outside files might be necessary.
KD2: XOR key: EF C1 20 01 DC F7 BB 72 FA CB F2 01. It's the most frequent string in all of the .dat files (prominently in se_sys.dat). As I expected, there is a fake container, 0x18 bytes in size. .ogg files extracted with vgm-t: KD2's ogg. The key works on all the .dat files.
KD1: I uploaded ADDed (256-X, X: most frequent byte = $0x8) and XORed (with the same $0x8) .ogg's, which still do not play because they are not entirely decrypted/deobfuscated. Hopefully you or someone else has a good eye to figure out some pattern in these - just use any (normal, unencrypted) .ogg as a reference for byte analysis/comparison. Value $0x8 (i.e. the value at offset 0x8 of each file) contains the most repeated byte, which is used for the bit operations. The values between brackets in the filenames are the "???" in my first post detailing the struct.
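For anyone without a hex editor that can XOR, here is a minimal sketch of applying the 12-byte key as a repeating XOR. The `phase` parameter is an assumption added for illustration: the post does not say how the key is aligned relative to the start of the file or to the 0x18-byte container, so the alignment may need adjusting until "OggS" shows up in the output.

```python
KEY = bytes.fromhex("EF C1 20 01 DC F7 BB 72 FA CB F2 01")

def xor_with_key(data, key=KEY, phase=0):
    """XOR data with a repeating key; phase shifts where the key starts."""
    n = len(key)
    return bytes(b ^ key[(i + phase) % n] for i, b in enumerate(data))
```

XOR is its own inverse, so running the function twice with the same phase gives back the original bytes. It also explains why the key is the most frequent string in the files: any run of zero bytes in the plain data shows up as the bare key.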
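The two KD1 attempts above can be sketched roughly as follows. This only reproduces the partial transforms described in the post (which do not yield playable files); the function names are made up, and treating ADD as byte-wise addition modulo 256 is an assumption.

```python
from collections import Counter

def most_frequent_byte(data):
    """Return the byte value that occurs most often in data."""
    return Counter(data).most_common(1)[0][0]

def add_transform(data, x):
    """Add (256 - x) to every byte, modulo 256, so that x maps to 0."""
    return bytes((b + 256 - x) % 256 for b in data)

def xor_transform(data, x):
    """XOR every byte with x, so that x maps to 0."""
    return bytes(b ^ x for b in data)
```

A quick sanity check of the observation in the post is to compare `data[8]` with `most_frequent_byte(data)` for each extracted file.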
I made memory dumps of both games. I'm not sure whether or not there would be unencrypted audio, but in both dumps I found loop points (as samples). In the second game's dump, I see the loop points close to their song titles, but in the first game's dump I only see the points for b_alice.ogg. Additionally, I see song titles surrounded by what looks like garbage but is probably useful data. They are valid loop points at 44100Hz, but that was all I could see in 2 memory dumps of the first game. KD1's files may need to be fully decrypted for music and loops, but the presence of the song titles may somehow help. Also if it helps, this is what I found in the dump of the first:
It seems that bgm_19.ogg (Last Phantasm) from KD2 didn't fully extract: the end loop point from the memory dump goes past the end of the song, and the song lacks the fadeout that the rest had. Would you mind telling me how you used vgm-t for the .dat, or re-extracting it?
The memory dump for KD1 would be useful, in the hope that some .ogg chunks are present there for byte-comparing with the obfuscated .ogg, as a possible way to learn more about the encryption. For KD2, you can deobfuscate bgm.dat with the 12-byte key I provided; I did not find filenames for the .ogg files, how did you find those? Use wxHexEditor or a better alternative to XOR the original bgm.dat with the key, then use vgm-t on the resulting file to extract the music. vgm-toolbox has automatic OGG extraction in Misc. Tools > Extraction Tools > Streams > Xiph.Org OGG extractor. On track #19, it's possible that some chunk was left out of the extraction; it might have been some 2KB at most iirc, enough for a loop point/data chunk maybe (I just let vgm-t do the extraction).
It seems that no data was cut off: I checked the hex of bgm.dat and the next song starts immediately where that one ends. I tried extracting with vgm-t, with a generic ogg extractor, and by cutting all other parts of the hex out, but all resulted in the same file. It still loops properly if I timeshift the audio.
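As a rough illustration of what a generic Ogg extractor does with the XORed file (this is not vgm-t's actual code), the sketch below walks Ogg pages by their "OggS" capture pattern and cuts one stream from each beginning-of-stream page to the matching end-of-stream page. It assumes non-multiplexed streams and does not verify page CRCs; `carve_ogg_streams` is a made-up name.

```python
def carve_ogg_streams(blob):
    """Return every complete Ogg stream found in blob, as a list of bytes."""
    streams = []
    start = None
    pos = blob.find(b"OggS")
    while pos != -1 and pos + 27 <= len(blob):
        header_type = blob[pos + 5]          # flag bits: 0x02 = first page, 0x04 = last page
        nsegs = blob[pos + 26]               # number of entries in the segment table
        seg_table = blob[pos + 27 : pos + 27 + nsegs]
        if len(seg_table) < nsegs:
            break                            # truncated page at the end of the data
        end = pos + 27 + nsegs + sum(seg_table)
        if header_type & 0x02:
            start = pos
        if header_type & 0x04 and start is not None:
            streams.append(blob[start:end])
            start = None
        pos = blob.find(b"OggS", end)
    return streams
```

Because the end of each stream is taken from the page that carries the end-of-stream flag, a track that comes out shorter than expected points at the source data rather than at a cut-off during extraction.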
The filenames for the .ogg for KD2 were found in my memory dump. After XORing, I can see that the song names (i.e. "Last Phantasm") and their loop points are stored in define.dat, but not their filenames.
Update: I made another memory dump and looked for a mention of "Ogg." I found the typical beginning data of an .ogg and isolated all of it and saved it to a new .ogg. It seems that the music is indeed deobfuscated in memory. Here's the .ogg. It's the first stage theme, and I'm assuming its internal filename is s_entr.ogg.
Just a tiny update (should have done this with my last post): the sample .ogg matches "s_frst1.ogg" instead, both in filesize AND in the translated "OggS" locations. It should be useful somehow for figuring out the algorithm. ... BUT it's easier to just memdump in case someone does not want to bother - at the risk of an incomplete sound rip. So, if taking the easier path, make sure that the filesizes coincide.
I never saw that filename, probably just slipped by me (or I misunderstood it as "forest" and thought that was stage 2). Are you suggesting that I make memory dumps of each song as they are played in game? I was planning to play the full game anyways so I'd be fine with that, especially since this method works.
I'll assume that the loop points are part of bgmlist.txt in define.dat, since the points in KD2 are also stored in define.dat where all the BGM names are listed. I'll see if this information is also in the memory dump, but I'll have to check tomorrow.
In case we don't make progress with deobfuscation, the easier "memdump" should be the path to take. You seem to be proficient with .ogg extraction. (I was not paying attention to loop points since that information is external to the .ogg files in the majority of cases.) If the game has a sound test menu, it would be even easier to memdump.
I've dumped every major .ogg file for stage and boss themes, but in each dump the only loop points present were for b_alice.ogg. It definitely seems as if this file must be manually deobfuscated/decrypted. The MediaFire folder has been updated to include a folder of all dumped music from KD1. If you have any idea on how to extract bgmlist.txt, having the deobfuscated music may help. At least for my use case however, there is no need to continue work on the music itself. The files were manually named. They should all be correct, but I'm unsure exactly what b_comon refers to.
I took a look into the .dat files again and noticed that "black.png" exists both inside system.dat and outside of it, in the folder with them. The files definitely seem to be obfuscated with some Rot-N. However, I could not find anything to try to reverse this; maybe I'm not good at searching. Are you aware of any programs that I could use to reverse this? It would likely help with the bgmlist.txt as well.
I do not have any leads, friend. This is a proper encryption scheme, not a simple XOR or ROT scheme. Sorry that I did not reply earlier. I barely found that the process depends on the hash-like value (i.e. the 32-bit number after the [FFFFFFFF] of each contained file in DPMX) and the original/encrypted bytes, such as /decrypted byte [ZZ] := ff( [hash], /encrypted byte [ZZ-1], /encrypted byte [ZZ] ) ...or something of the like - for the byte at offset ZZ, with "ff" some function, linear maybe. Also, when [hash]==[0x0], we have /decrypted byte [ZZ] == /encrypted byte [ZZ]; i.e. the file in DPMX is not encrypted. There are so many possible values of this hash (2^32) that searching by byte association between decrypted and encrypted data is of no use. Somebody with ASM/disassembler knowledge would make better progress with this.
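One way to put the guessed relation above to a test, given a known pair such as a memory-dumped .ogg and its obfuscated counterpart from the .dat, is to check whether the same (previous encrypted byte, current encrypted byte) pair always yields the same decrypted byte within one file, where the hash is fixed. This is only a sketch for that specific hypothesis; the function name is made up, and it assumes the two files are aligned byte for byte.

```python
def check_pair_hypothesis(encrypted, decrypted):
    """Tabulate (enc[i-1], enc[i]) -> dec[i] and count contradictions."""
    table = {}
    conflicts = 0
    for i in range(1, min(len(encrypted), len(decrypted))):
        pair = (encrypted[i - 1], encrypted[i])
        plain = decrypted[i]
        # setdefault stores the first value seen for this pair and returns it
        if table.setdefault(pair, plain) != plain:
            conflicts += 1
    return table, conflicts
```

Any conflict rules out a function of exactly those three inputs for that file. Zero conflicts does not prove the relation, but the resulting table can then be inspected for a linear pattern.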
Alright then, I was really just taking a guess from the structure of the files. I could try once again to look in a memory dump, just dumped from a different point in the game. If not, I'll have to keep messing with it, or I'll just have to loop manually. Either way, thanks for what you were able to extract.