h4m / HVQM4 audio by hcs at 2:47 AM EDT on July 21, 2010
Haven't seen any discussion on this for a while. Someone asked me to have a look at a file from the Master Quest disc, which contained a .dol and an .h4m video with soundtrack. I've worked out how to play the audio, and I present h4m_audio_decode over here, which will decode to PCM WAV.
Turns out to just be IMA after all, so I replaced the decoder with the more standard one from vgmstream and checked that they behave identically. I'm somewhat unhappy with the sound quality, but this may be due to poor encoding.
I've been trying to get this to work, but I haven't had any luck. When I type the following:
h4m_audio_decode.exe file.h4m output.wav
I get a line telling me this:
expected audio frame size to be 0x00000650, got 0x00000652
It gives me this message no matter what h4m file I use. I've extracted them from the GameCube version of Mystic Heroes. I'm not sure if anyone's familiar with that, but... any information on what's going wrong?
EDIT: The value at 0x00000032 equals 0x652. I can modify it to equal 0x650 instead, and then I'm given a list of stats about the file, and then a different error:
unexpected frame id at 00000060
The 4-byte value of that address is 00000648. I don't really know what I'm doing hahaha, I just thought I'd goof around, and maybe this information will help someone understand my problem.
Just a quick question popped into my head: if a GC emulator (e.g. Dolphin) can play .h4m videos while running an ISO, then why is it so problematic to make a stand-alone player?
It seems that you don't understand what is going on there.... The emulator (aka Dolphin) is running as though it's hardware. Thus, the *hardware* is running a game disc, which contains software. That software (on the game disc) contains the code necessary to read, interpret, and play back the *.h4m data.
In other words, think of it this way - Let's say that a closed-source program exists for the AmigaOS (yes, Amigas still exist, and are currently manufactured, last I checked) that can play back *.h4m files. You'd have to take that program, disassemble it, and then rewrite the assembly code in another, higher-level language (i.e., rewrite it in C, C++, Java, C#, or whatever).
So, the existence of an emulator essentially has nothing to do with *.h4m playback. The two have very little, if any, relation.
That said, I suppose it's possible (depending on the debug features available in the emulator) to see what's going on "in the background" while a *.h4m file is being played. Watching how memory is moved back and forth, what changes, what goes in, and what comes out, etc., you might be able to "learn" or figure out how the software runs/works, without having to disassemble it. Therefore, assuming that's possible, you might be able to write a program that does the same thing, and thereby can play *.h4m files. Personally, although I have no experience in the matter and therefore can't say for certain, I'd think that this method would be almost as difficult, or time consuming, as rewriting the disassembled assembly into a higher level language.
So, basically, the difficulty existed before the emulator existed (though the task was of course still possible), and the existence of the emulator doesn't really alleviate that problem. It just makes it possible to use another method, which is probably almost as difficult as the original method, possibly more so.
Hopefully that provides an answer to your question. Mouser X over and out.
Hello, I'm trying to use your tool to rip the audio from the Pokemon Channel h4m files.
So far the tool works great, but the files seem to be multi-track, so I only get the Japanese audio.
I have both the USA and PAL versions of the game, and comparing them I've reached the conclusion that the h4m files are multi-track:
USA h4m: ~300 MB
PAL h4m: ~400 MB
Using your app on both files generates the exact same output wav file (checked with md5), which is the Japanese language track.
I think that the number of tracks is at offset 0x3f in the header and that it can be 0 or 1 if there's only 1 track, as it is 0 in another h4m file of the game, which is the company's logo video.
I'm trying to modify the source so it will dump the next audio track instead of the first one it finds, but so far no luck.
From what I've seen, the file has a video frame, then an audio frame, and repeats, right? Comparing with a hex editor makes me think the different audio tracks are just one after the other, but I don't know much about video file structure.
Any help will be appreciated
EDIT:
Interestingly enough, replacing the h4m file in the PAL game with the USA one makes the video play in Japanese, even though there's nowhere in the game to select that language. Probably when it can't find the selected stream the player just falls back to the first track (Japanese). Selecting English in the language menu makes the movie play in English.
By the way, the movie has subtitles, probably in another file (g4m), which are associated with the audio stream playing, as each language is accompanied by its subs, even Japanese.
Using hcs's source code, I started adding an H4M demuxer to VGMToolbox for this a long time ago. I seem to recall that each block has its own initializing values (predictor? can't remember...), while the IMA format only has a single predictor (in the header) for the entire file.
Consequently, I'm not sure the converter you're requesting is possible.
hcs, is there any chance you could try to update your tool to decode the H4M files from Resident Evil 0? Sample file here: mt04.h4m. I can upload more if needed.
Aww. I'm not at all familiar with either IMA ADPCM or the HVQM4 format. Maybe sometime in the distant future I'll try looking into it if no one else has...
Is anyone willing to fix hcs' audio decoder to work with the movie files from Biohazard/Resident Evil 0 for a small chunk of money; say $50? (I can only pay through PayPal.)
I've had it on my todo list to get the audio out of those pesky RE0 movie files for a long time now and I've been unable to fix the decoder myself. All I've established is that the gain changes significantly every other block (i.e. usually every other 32064 samples), but I have no idea where exactly the problem is. I've experimented with some modifications to the code, but I'm getting nowhere. A couple of files decode without issues, but everything else fails...
Is anybody still working on this? It's a pity there is no decoder for this yet; people have been asking for it for more than 10 years.
I can't help as a programmer, but I think you should contact people from the Doom9 forum. There you can find Avisynth developers with lots of knowledge about how video and audio work. At least, if you have recognised the audio codec, they sure will be able to help. And as some of them are eager to program and develop new things, I'm sure they will help with the video codec as well.
I took it upon myself to update the decoder to support the Resident Evil 0 movies and potentially other games (use the -n switch; for the moment I don't know if/how the ADPCM variant can be automatically detected). Probably half the time was spent on debugging a really dumb misinterpretation of the assembly, so I hope this will be of use...
I spotted some additional decoding routines and IMA ADPCM step/index arrays in the Resident Evil 0 executable, so there may still be issues with files from other games.
I haven't made a pull request on hcs' Github repository, because I don't have any Git tools on my system and frankly don't understand much of it. I guess I'll need some Git for dummies guide or something. Regardless, I've included the source code in the archive, so do what you want with it!
- Support 8-bit (A)DPCM (can someone properly name/identify this coding?)
- Support raw PCM
- Support non-stereo channel configurations
- Support multi-track audio (Tales of Symphonia, Pokemon Channel) (*)
- Tweak some sanity checks to support more files (Bomberman Jetters)
- Consistently write little-endian samples on all machines
- Correct the offsets for audio/video frame counts (**)
- More header values figured out (detect audio format and display some more info)
- Add -h option to display header info and exit
- Remove output path argument since there may be multiple tracks
- Remove pointless switch from v0.4 and old/redundant code for IMA ADPCM decoding
- Refactor a bunch of code
(*) The tracks in Pokemon Channel are meant to be played separately as they are different languages, whereas in Tales of Symphonia, it seems like it's actually 4 channels rather than two separate stereo tracks, so you'll have to combine them yourself.
(**) I verified this from movies without audio in a bunch of games.
So, as it turns out, the IMA ADPCM code that hcs originally used worked flawlessly for some cases, but there were some edge cases (e.g. Resident Evil 0) where the step index would eventually get out of range in the H4M variant. I compared the decoded audio of the movies from Master Quest generated by the original decoder (v0.3) against the output of the code I've reversed from the Resident Evil 0 executable and can say that they're identical. So I've simply removed the old code instead of having a switch to use it.
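To illustrate why the step index range matters, here is a minimal Python sketch of the index update from standard IMA ADPCM. It is not the H4M variant from the Resident Evil 0 executable (whose tables reportedly differ slightly); the function names are made up for this example, and 88 is the last index of the standard 89-entry step table.

```python
# Standard IMA ADPCM index adjustments, indexed by the 4-bit code.
# Assumption: plain IMA tables, not the H4M-specific ones.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8,
               -1, -1, -1, -1, 2, 4, 6, 8]
MAX_INDEX = 88  # the standard step table has 89 entries (0..88)


def next_index(index, nibble, clamp=True):
    """Advance the step index by one 4-bit code (hypothetical helper)."""
    index += INDEX_TABLE[nibble & 0xF]
    if clamp:
        index = max(0, min(MAX_INDEX, index))
    return index


def run(nibbles, index=0, clamp=True):
    """Feed a sequence of codes through next_index and return the final index."""
    for nibble in nibbles:
        index = next_index(index, nibble, clamp)
    return index
```

Without the clamp, twelve consecutive codes of 7 push the index from 0 to 96, which is past the end of an 89-entry table; with the clamp it stays at 88.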
Furthermore, support for the other audio formats I spotted has now also been added. Whether I got things right is another matter, because I've checked as many games as I could find (see below) now and didn't find a single H4M/HVQM4 video that didn't use IMA ADPCM as the audio format, so I haven't actually been able to test anything else.
I've also looked at Pikmin, but unfortunately wasn't able to figure it out. It uses an audio frame format I haven't seen before, and, if we're interpreting the main header correctly, the bitdepth is set to zero, so I have no idea how to handle that. Someone's gonna have to reverse Pikmin I think.
I've tested all movies from the following GameCube games, all of which now decode properly, as far as I can tell:
Bakuten Shoot Beyblade 2002 Nettou! Magne Tag Battle!
Bomberman Jetters
Kirby's Airride
Mystic Heroes
Naruto - Gekitou Ninja Taisen! 2
Pokemon Channel
Rayman Arena
Resident Evil 2 (Biohazard 2)
Resident Evil 3 Nemesis (Biohazard 3 Last Escape)
Resident Evil Zero (Biohazard 0)
Tales of Symphonia
The Legend of Zelda - Ocarina of Time - Master Quest
Viewtiful Joe
Thanks, SubDrag! I don't know how many games use the format, but I went through all that I could find mentioned in this thread, and whatever I could find via Google. I checked Star Fox Adventures already but didn't find any HVQM4 videos there. Unless they're compressed or something?
I'll check SSBM and Pac-Man World 2 as well, thanks MurraySkull.
Based on the header, there aren't any audio frames, and all the vital audio metadata is zero too. So yeah, there's probably no audio for that movie. There may be audio in a separate file though, like in Eternal Darkness and others.
Pac-Man World 2 had some videos with audio. No problems decoding them.
Hi everybody! As this is going on nicely, I'm once again asking for an audio demuxer if it's possible now. If you can decode the audio there should be an easy way to construct a demuxer, right?
The only problem I see with demuxing audio in H4M videos is that, as far as I know, there is no program, let alone container, that supports these specific codings (aside from 16-bit PCM, which, although supported within H4M videos, has never actually been used afaik).
So, are you really asking for just the pure IMA ADPCM audio, or do you want it demuxed into a container of some kind? Because I don't know of any that support this particular variant of IMA ADPCM (there are edge cases where some cannot be fully decoded as strict/standard IMA ADPCM, due to a slight change in the step/index tables).
Actually, I just realized there's another deal breaker. (It's been a little while, so my memory is not too fresh. Sorry.)
Anyway, there isn't really such a thing as a "raw" IMA ADPCM stream. There are special frames that contain sample history (16-bit PCM) and a step index, which are of course crucial for decoding the ADPCM-encoded samples. So the only thing that really makes sense is extracting the complete audio frames, so that you can detect and extract those parameters.
The decoding routine in h4m_audio_decode works on frame buffers, so you are free to copy that if you want.
Sorry if I seem evasive, but if you are okay with this, I'll see what I can do.
@Nisto/hcs - do you happen to know if output is sample-accurate?
I see every frame it writes the header hist sample (line 300), then decodes all nibbles but the last (to write an even number of samples). This kind of model is used in some IMAs (XBOX IMA), but *not* others (Apple IMA4). The output difference would be minimal though, just curious.
EDIT: checking some samples it does seem most frames are packed considering this extra header sample (ex. 0x643, with 0x642 data size/samples + header) so it must be used.
@bnnm: I'm not entirely sure I understand what you mean to be honest. Are you saying it is (or could be) writing one sample too little, or one sample too many? If anything, I can see if it might write one sample too many, because of this line:
for (int i = 0; i < 2; i++) {
since a mono stream might contain two samples in a byte, and this doesn't allow the program to check if there are actually any remaining samples after the first nibble (since it's kind of "trapped" in processing both nibbles). Is this what you're getting at?
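One way to avoid being "trapped" in processing both nibbles is to count samples rather than bytes. A small Python sketch of that idea follows; it is not the h4m_audio_decode code, the helper name is hypothetical, and the high-nibble-first order is an assumption.

```python
def iter_nibbles(data, count):
    """Yield at most `count` 4-bit codes from `data`.

    Assumption: the high nibble of each byte comes first.
    The remaining-sample check runs before every nibble, so an odd
    `count` stops after the first half of the last byte.
    """
    for byte in data:
        for shift in (4, 0):
            if count <= 0:
                return
            yield (byte >> shift) & 0xF
            count -= 1
```

For example, `list(iter_nibbles(bytes([0x12, 0x34]), 3))` gives `[1, 2, 3]` and leaves the last nibble untouched.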
The sample count in the frame header definitely includes the history sample. Unless the RE0 code is wrong, too.
Regardless, I'll definitely try to fix the issue, whatever it is. I just hope you can enlighten me a bit more on this.
Also, I've implemented demuxing capability now, but the code is very sloppy (sketchy even), so I'll see if I can do a somewhat decent rewrite before I release it.
@hcs: Thanks, mate. You are welcome to update your repository on GitHub.
Thanks for all your work! I'm eagerly waiting for a demuxer. :) You see, I'm collecting VGM in raw form and I need the demuxer to NOT save the h4m videos but only the audio stream. Only a rip with demuxed videos is truly complete (at least in my eyes).
I added basic support, barely tested and output is not 100% correct yet. I'll add extended IMA/RE0 and stuff later once I procure some samples.
***
@Nisto - sorry, it's a bit hard to explain. The number of samples per frame it writes is correct, "which ones" I wasn't so sure. Here is an example:
Pretend we have a "frame_format 1" mono audio frame (with ADPCM hist/index) of size 0x0c+0x02+0x04:
- 0x0c are frame_type + format + size + frame_samples
- 0x02 are 1 hist sample + index (in format 1 style)
- 0x04 are 8 nibbles/samples
So the frame actually contains 1+8 samples.
Now frame_samples could be:
- 9: means you must output *all* samples, including hist (not sure if H4M does this).
- 8 (or lower): you can output in two ways:
  * A: write header sample, then write nibbles - 1 (1 + 7)
  * B: don't write header sample, then write all nibbles (0 + 8)
The code seems to use the "A" model, but I was hoping you could confirm it's correct (since you reverse engineered RE0). It looks fine, but I just wanted to be sure, since I've seen different ADPCMs using those models and it's not so easy to guess.
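The two output models can be written down as a short Python sketch. The function names are invented for this example, and `decoded` stands for the samples already decoded from the frame's nibbles (8 of them in the example frame).

```python
def emit_model_a(hist, decoded, frame_samples):
    """Model A: header (hist) sample first, then frame_samples - 1 decoded samples."""
    if frame_samples <= 0:
        return []
    return [hist] + list(decoded[:frame_samples - 1])


def emit_model_b(hist, decoded, frame_samples):
    """Model B: hist only seeds the decoder; output frame_samples decoded samples."""
    if frame_samples <= 0:
        return []
    return list(decoded[:frame_samples])
```

With `hist = 100`, `decoded = [1, 2, 3, 4, 5, 6, 7, 8]` and `frame_samples = 8`, model A returns the header sample plus the first seven decoded samples, while model B returns all eight decoded samples and never writes the header sample. Both produce the same count, which is why the difference is easy to miss.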
***
Another oddity I noticed, it uses the first hist sample as right channel. Is this intended?
Ex. if (stereo, frame_format 1) ADPCM setup is "7F816582" (as in, sample 7f80 / index 1 + sample 6580 / index 2) it writes 6580=left,7f80=right channel.
(IOW channels may be swapped? ATM vgmstream swaps channels vs h4m_audio_decode, if you need to compare)
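For reference, the setup bytes from that example can be unpacked like this. A minimal Python sketch, assuming big-endian 16-bit words with the hist sample in the upper 9 bits and the index in the lower 7 bits (the 0xFF80 / 0x7F masks); the function name is hypothetical.

```python
import struct


def parse_packed_setup(data, channels):
    """Split big-endian 16-bit words into (hist, index) pairs, one per channel."""
    pairs = []
    for (word,) in struct.iter_unpack(">H", data[:2 * channels]):
        hist = word & 0xFF80
        if hist >= 0x8000:        # reinterpret as a signed 16-bit sample
            hist -= 0x10000
        pairs.append((hist, word & 0x7F))
    return pairs


setup = parse_packed_setup(bytes.fromhex("7F816582"), 2)
# Per the post above, h4m_audio_decode treats the first pair as the right
# channel and the second as the left channel.
right, left = setup
```

Here `setup` is `[(0x7F80, 1), (0x6580, 2)]`, so `left` holds the 0x6580 sample and `right` the 0x7F80 sample.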
@bnnm: It definitely uses "model A". Here's an excerpt of the decoding routine with some of my notes (hopefully enough to give you some idea what goes on even if you're not familiar with PowerPC assembly):
As you might guess from this code, there are separate functions for different channel configurations and all the different codings. That felt redundant, so naturally I did optimize what I originally ended up with in C, to support varying channel configurations. But I think I did so without screwing things up.
> Another oddity I noticed, it uses the first hist sample as right channel. Is this intended?
Yep. I've just double checked the output with the help of Dolphin. Aside from Dolphin/GameCube writing big-endian samples instead of little-endian samples, the output of h4m_audio_decode is identical.
If you want to confirm it for yourself, here is what I did:
1. Open Dolphin with -d switch
2. Open Biohazard 0 (GBZJ08)
3. Put a breakpoint on 801BFD04 and 801BFF5C
4. Start a new game (mt00.h4m plays)
5. Wait for a breakpoint to be triggered
6. Check register r5 for the output address
7. Check register r7 for the sample count
8. Press play to trigger the next breakpoint
9. In the Memory window, press dump MRAM (find it in %userprofile%\Documents\Dolphin Emulator\Dump)
10. Go to the output offset (address - 0x80000000) in the memory dump
11. Copy the samples (sample count * 4 for decoded stereo IMA ADPCM)
12. Press play and repeat from step 5
You could probably automate all of this somehow. I decided to just check a few frames by hand. It covered all frame formats of course.
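Steps 10 and 11 could be scripted roughly as follows. This is only a sketch in Python: it assumes the MRAM dump has already been read into a bytes object, that 4 bytes per sample applies (decoded stereo 16-bit, as in step 11), and that the dump holds big-endian samples which need swapping to compare against h4m_audio_decode output. All function names are made up for the example.

```python
import struct

MRAM_BASE = 0x80000000  # from step 10: offset = address - 0x80000000


def dump_offset(address):
    """Translate an address from register r5 into an offset in the MRAM dump."""
    return address - MRAM_BASE


def be_to_le(chunk):
    """Convert big-endian 16-bit samples to little-endian."""
    n = len(chunk) // 2
    return struct.pack(f"<{n}h", *struct.unpack(f">{n}h", chunk[:2 * n]))


def extract_frame(dump, address, sample_count, bytes_per_sample=4):
    """Cut one decoded frame out of the dump and byte-swap it."""
    start = dump_offset(address)
    return be_to_le(dump[start:start + sample_count * bytes_per_sample])
```

The result of `extract_frame` can then be compared byte for byte with the matching region of the WAV data written by the decoder.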
Thanks for the confirmation, just wanted to be sure since those tiny bugs tend to slip by (vgmstream played all MS/XBOX-IMA slightly off until recently due to mini bugs when handling those models).
I'll add a proper decoder based on your code later, but it's quite playable ATM with standard IMA.
@Nisto - Something I noticed. I was adding proper decoding to vgmstream (with those big tables and stuff). I compared the output with the old IMA method (shifts-and-adds) and surprisingly I get byte-exact results vs h4m_audio_decode (at least in RE0's 1st disc videos).
Makes sense since (I think) tables are just mapping all possible IMA output (cool optimization though). Maybe there was some bug in the old h4m_audio_decode causing distortion? Or is there some particularly sensitive video?
Also, do you know which games (or can upload some samples) use ADPCM8/PCM or mono H4M?
Oh, I meant in vgmstream. To clarify:
- I re-implemented vgmstream's H4M using your 'expansion' algorithm (converts nibble to sample with big [89][8] tables, like v0.5 line 379). Now vgmstream creates files that are byte-exact vs h4m_audio_decode v0.5.
- Then as a test, I swapped the algorithm with the 'classic IMA' one (converts nibble to sample using shifts-and-adds, like v0.3 line 192). This also creates files that are byte-exact.
Since (I think) the big table is just the output of all possible classic-IMA combinations of steps/indexes, it makes sense that both ways decode the same. It's fun to know they optimized it like that tho.
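As a rough illustration of how an [89][8] table could stand in for the shifts-and-adds, here is a Python sketch that precomputes one from the standard IMA step table. The table below is generated from the classic formula, so the agreement is by construction; it does not prove anything about the actual table in the game executable, and all names are invented for the example.

```python
# Standard IMA ADPCM step table (89 entries).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]


def classic_diff(step, code):
    """Classic IMA: build the sample delta with shifts and adds."""
    diff = step >> 3
    if code & 1:
        diff += step >> 2
    if code & 2:
        diff += step >> 1
    if code & 4:
        diff += step
    return -diff if code & 8 else diff


# One row per step index, one column per 3-bit magnitude code.
EXPAND = [[classic_diff(step, mag) for mag in range(8)] for step in STEP_TABLE]


def table_diff(index, code):
    """Table variant: look up the magnitude, then apply the sign bit."""
    mag = EXPAND[index][code & 7]
    return -mag if code & 8 else mag


def tables_agree():
    """Check every (step index, 4-bit code) pair against the classic formula."""
    return all(table_diff(i, c) == classic_diff(STEP_TABLE[i], c)
               for i in range(len(STEP_TABLE)) for c in range(16))
```

The sign bit is handled outside the table, which is why 8 columns are enough for 16 possible codes.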
***
Checking v0.3 source (line ~160), I think the actual problem is that it doesn't understand frame format 0x03 (introduced in RE0 I presume?), unlike v0.5.
At 0x57CD8 it should read: 0xFDD4,0x30 + 0xFC36,0x2F (0x03*2). Instead it reads 0xFDD4 + 0x30FC (0x02*2), a.k.a. 0xFDD4&0xFF80=0xFD80, 0xFDD4&0x007F=0x54 + 0x30FC&0xFF80=0x3080, 0x30FC&0x007F=0x7c = 124, which is past the last step index (88) = error!
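The misread can be reproduced with the six bytes quoted above. A Python sketch, assuming big-endian fields; the function names are hypothetical, values are left unsigned so they match the hex in the post, and 88 is the last index of the standard IMA step table.

```python
import struct

MAX_INDEX = 88  # last valid index in the standard 89-entry step table


def parse_3byte(data, channels):
    """Per channel: 16-bit hist followed by an 8-bit step index (0x03 * channels)."""
    return [struct.unpack_from(">HB", data, 3 * ch) for ch in range(channels)]


def parse_2byte(data, channels):
    """Per channel: one 16-bit word, hist in the top 9 bits, index in the low 7."""
    pairs = []
    for ch in range(channels):
        (word,) = struct.unpack_from(">H", data, 2 * ch)
        pairs.append((word & 0xFF80, word & 0x007F))
    return pairs


setup = bytes.fromhex("FDD430FC362F")
good = parse_3byte(setup, 2)   # intended reading
bad = parse_2byte(setup, 2)    # what a 2-byte-per-channel reader sees
out_of_range = [index for _, index in bad if index > MAX_INDEX]
```

The 3-byte reading yields indexes 0x30 and 0x2F, both valid, while the 2-byte reading yields 0x54 and 0x7C, and 0x7C (124) lands in `out_of_range`.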
Ah, okay. I looked over the vgmstream sources now and noticed that both the decode_standard_ima function and the frame parsing are a bit different from those in h4m_audio_decode v0.3, so it makes sense if there's a discrepancy in the decoding process.
Frame format 0x03 is used in plenty of other H4M videos from various games, but I think you're right about the decoding function in v0.3 just not handling them appropriately.
So by all means, stick to what you've got if you like it better or whatever. If it generates the exact same output, that's totally fine by me. Just curious though, did you compare *all* of the RE0 disc 1 videos?
so, about those Pikmin vids... the audio codec employed for those vids is not unknown. i managed to connect the dots while examining .stx and .h4m files and i somehow found out that said codec is practically the same codec that was used on .stx files used by the game.
how did i manage to find this out, you ask? here, have a test drive: hvq4_hvqm4.bms
do notice how the output .afc file sounds garbled yet you can still hear literally anything at all. assuming you have vgmstream of course.