Anyway, I have finally been working on importing data from the Portable Music History archive (PMH). Of course, I already have a bunch of Game Boy items in the archive but they lack the rich metadata added by PMH. The challenge is munging the .m3u files bundled with each PMH archive.
How common are Japanese titles in the metadata? The first .m3u file I studied (for Castlevania II: Belmont's Revenge) had a katakana (?) title. I haven't been able to find that in any other files I have spot-checked.
Also, while there isn't much documentation about the characteristics seen in these .m3u files, I think I have figured out most of the details. For example:
DMG-CWJ.gbs::GBS,19,Opening [Opening],1:09,,1
First CSV field defines filename and system type (is there ever anything between those 2 colons?); second field is track number within the file; third field is title; and fourth is play length.
Is there ever anything interesting in the 5th field? As for the 6th field, it is always either 1 or 10. Empirically, 1 appears to indicate a non-looping track while 10 is probably a looping indicator. Am I close?
Anything else I should be aware of when munging these files? Otherwise, thanks for being so consistent in the tagging!
There shouldn't be any other games with Japanese tags on pmh; I try to keep everything in ANSI if possible. Most (all?) Japanese playlists have the text in Shift-JIS encoding.
Format of the playlist header is:
# @TITLE Game_Name (in order of preference: USA/European/Japanese region)
# @ARTIST Developer, Publisher (only the earlier release)
# @COMPOSER Composer, Arranger (if available)
# @DATE First release date in YYYY-MM-DD format
# @RIPPER Ripper credits
# @TAGGER Timer and tagger credits
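For anyone scripting against these headers, here is a minimal Python sketch of pulling the tags into a dict. It assumes each tag sits on its own comment line in the form "# @TAG value"; the function name `parse_header` and the sample values in the usage note are made up for illustration and are not part of pmh.

```python
import re

# Matches comment lines such as "# @TITLE Some Game".
# The layout is assumed from the header description above.
TAG_RE = re.compile(r"^#\s*@(\w+)\s+(.*\S)\s*$")


def parse_header(lines):
    """Collect '# @TAG value' comment lines into a dict keyed by lowercase tag."""
    tags = {}
    for line in lines:
        match = TAG_RE.match(line)
        if match:
            tags[match.group(1).lower()] = match.group(2)
    return tags
```

Called on a list like `["# @TITLE Example Game", "# @DATE 2000-01-31"]` (hypothetical values), it should give back a dict with `title` and `date` keys. Lines that are not tag comments, including the track entries, are simply skipped.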
I'm not using either loop_length or number_of_loops, but there are non-pmh playlists that use those.
$song_number can be in either hex or decimal, with different starting values for different file types. pmh is standardised to always use decimal values starting at 0.
For pmh, $fadeout is 1 for non-looping tracks, 5 for looping tracks shorter than 21 seconds, and 10 for looping tracks of 21 seconds or longer.
$title can contain commas, either escaped as \, or enclosed in "". Titles can't contain * characters, and possibly others as well.
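The fadeout rule is small enough to write down as code. This is only a sketch of the rule as stated for pmh playlists; the function names are hypothetical, and other playlist sources may not follow the same convention.

```python
def pmh_fadeout(length_seconds, loops):
    """Return the fadeout value the pmh convention would assign to a track."""
    if not loops:
        return 1
    # Looping tracks: short ones (under 21 seconds) get 5, the rest get 10.
    return 5 if length_seconds < 21 else 10


def looks_looping(fadeout):
    """Guess whether a pmh entry loops, based only on its fadeout value."""
    return fadeout != 1
```

Going the other way, `looks_looping` is what an importer would likely use: under this convention a fadeout of 1 means the track plays once and ends.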
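Putting the field rules together, a rough Python sketch of parsing one entry line might look like this. It assumes the six-field layout seen in the DMG-CWJ example, decimal song numbers as pmh uses, and the two comma conventions described for titles. All names here are hypothetical, and the 5th field is kept as raw text because its meaning isn't pinned down in this thread.

```python
def split_fields(text):
    """Split on commas, honoring backslash-escaped commas and double-quoted fields."""
    fields, buf, in_quotes = [], [], False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == ",":
            buf.append(",")  # "\," is a literal comma inside a field
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # quotes delimit a field and are dropped
        elif ch == "," and not in_quotes:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


def parse_length(text):
    """Convert 'm:ss' (or 'h:mm:ss') to seconds; returns None for an empty field."""
    if not text:
        return None
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_entry(line):
    """Parse one playlist entry into a dict (assumed six-field layout)."""
    fields = split_fields(line.strip())
    fields += [""] * (6 - len(fields))  # pad short lines with empty fields
    filename, _, system = fields[0].partition("::")
    return {
        "filename": filename,
        "system": system,
        "song_number": int(fields[1]),  # assumes decimal, as pmh uses
        "title": fields[2],
        "length_seconds": parse_length(fields[3]),
        "field5": fields[4],  # meaning not established here; kept raw
        "fadeout": int(fields[5]) if fields[5] else None,
    }
```

Python's `csv` module won't handle the `\,` escape with its default settings, which is why the splitting is done by hand. Playlists from other sources that use hex song numbers would need a different conversion for the second field.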
How common are Japanese titles in the metadata? I wouldn't imagine many, because .m3u doesn't support it. That's precisely why .m3u8 exists (which I have to use now or a few sets simply won't load; the filename is all screwed up).
@Knurek: Thanks for your guidance on the format. So far, I haven't found any Shift-JIS song names (but I haven't searched very extensively).
One other thing I'm curious about: Game Boy vs. Super Game Boy vs. Game Boy Color. The latter 2 seem to be playable using regular .gbs file playback engines. Are there any gotchas to watch out for here?
@Hotcakes: I admit I'm very weak at non-ASCII character encodings (something I have been working hard at rectifying thanks to this project). So I want to clarify: the .m3u file, named DMG-CWJ.m3u, from Castlevania II: Belmont's Revenge contains a Japanese title encoded using Unicode. The character ド is encoded as 0xE3 0x83 0x89, which conforms to my limited Unicode understanding. Should the file have a .m3u8 extension instead?
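For reference, the bytes 0xE3 0x83 0x89 are the UTF-8 encoding of ド (U+30C9), which is easy to confirm in Python. The sketch below also shows one possible way to cope with a mix of UTF-8 and Shift-JIS playlists: try UTF-8 first and fall back. The function name `decode_playlist` and the fallback order are assumptions, not anything pmh prescribes.

```python
def decode_playlist(raw):
    """Try UTF-8 first, then Shift-JIS; returns (text, encoding_used)."""
    for encoding in ("utf-8", "shift_jis"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a character, so it never fails.
    return raw.decode("latin-1"), "latin-1"


# The character from the Castlevania II playlist, as UTF-8 bytes.
katakana_do = "\u30c9".encode("utf-8")
```

Trying UTF-8 first is the safer order because Shift-JIS text rarely happens to be valid UTF-8, while plain ASCII playlists decode identically either way.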
Heh. Yeah, that's my bad. I was the one who tagged CV2:BR that way. I had easy access to the Japanese titles, so I figured that for the sake of completionism, I'd put them in.
The comedy of errors that followed included all manner of slack-jawed amazement that in this day and age, Unicode handling is still an awkward, spotty thing. I knew that M3U wasn't meant for Unicode (I don't even think it's meant for a dog's breakfast, but I'm snotty like that) but I recall having much trouble getting .M3U8 to work, also. At the end of the day, I ended up finding something that squeaked by with my configuration and I fired it off into the void, hoping it wouldn't break too badly elsewhere.
Could be worse. I wanted to do a lot more tagging, but my days have been filled with other pursuits. Imagine me polluting the rest of the collection that way!
@Electric Keet: Thanks for the data. You're right, .m3u has problems. I guess it solved the problem at one point. But then it was stretched way beyond its original capabilities. Actually, I believe that the main problem is the way that so many of these music playback engines had to be forced into the rather limited Winamp model.
Still, I'm thinking of extracting the entire comment text and adding it to my site's database so the search engine will index it all.
As for data acceptance, my Python script doesn't seem to have any trouble reading those characters and then properly serializing them into Unicode for the database.
Some of the rips marked as Game Boy Color (not *all* of them though) will need to be played in GBC mode (i.e., CPU running at 8 MHz, not the standard Game Boy 4 MHz). I believe that there's a byte in the header that tells you to do so.