
cracking the .AUDIO_DATA file format by elburg at 2:15 PM EDT on April 13, 2022
it seems that the latest Lego game by Traveller's Tales stores almost all of its music files in the .AUDIO_DATA format

however, after looking at a text dump of the file, it appears that it uses OGG Vorbis for audio data.

below is a screenshot of an example .AUDIO_DATA file, and an example .OGG file:
(image here was removed... look at next post)

I think decoding this format with vgmstream probably wouldn't be too hard to do.

edited 12:28 PM EDT April 14, 2022
actual image: by elburg at 2:19 PM EDT on April 13, 2022
crap the image failed, here's the actual screenshot

UPDATE! (also another update) by elburg at 12:03 PM EDT on April 14, 2022
after removing all the data before "OggS", i can confirm that there is a custom header in play.

i don't know if there is looping info in the headers, but i now know that vorbis is the audio codec in use.

UPDATE!
after investigating with a hex editor, it seems that there is no fixed offset location where the vorbis data starts.
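
For reference, the "remove everything before OggS" step can be sketched in Python as below. This is only a sketch: the function names are made up for illustration, and it assumes the first "OggS" match really is the start of the audio and not a chance match inside the custom header.

```python
def find_first_oggs(data: bytes) -> int:
    """Return the offset of the first "OggS" page magic, or -1 if absent."""
    return data.find(b"OggS")


def strip_custom_header(data: bytes) -> bytes:
    """Drop everything before the first "OggS" so the result starts on an Ogg page."""
    offset = find_first_oggs(data)
    if offset < 0:
        raise ValueError("no OggS magic found")
    return data[offset:]
```

Because the offset differs per file, it has to be searched for (or computed from the header) each time instead of being hard-coded.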

also i tried to make the files work by using TXTH... it didn't work :(

edited 12:27 PM EDT April 14, 2022
by almendaz at 1:18 PM EDT on April 14, 2022
Could you upload a binary dump (i.e. the two-digit hex values shown in a hex editor) of those .ogg? The first 0x1000 bytes (4096 two-digit values) might be enough.
Strange that the .TXTH did not help you. Did you specify start_offset?

edited 1:19 PM EDT April 14, 2022
links by elburg at 2:26 PM EDT on April 14, 2022
@almendaz, i don't think the first 0x1000 bytes would be enough, so I have uploaded the file and the TXTH file.
by almendaz at 4:07 AM EDT on April 15, 2022
Thanks for uploading sample files.
You are right! The source file has OggS streams in it, and 0x1000 bytes would indeed not have been enough to get the information.
The file seems to be an array of {FMT}{SEEK}{DATA} groups of bytes.


Legend (all values are hexadecimal, 0x####):
"TTTT" is a string.
[xxxx_size] is the size in bytes (4-byte LE) of the following (sub-structure){ }.
[yyyy] is (in general) a 4-byte value meaning 'yyyy'.
<zzzz> is a description or meaning (what I could interpret).


"FMT " ("header") chunk/struct
-----------------------------
offset(relative) / data
-----------------------------
0    "FMT "
4    [header_data size]
(header_data)
{
8    10000
C    AC44 : <sample_rate>?
10    5A52A4
14    2002
18    35948
}
-----------------------------
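
A minimal Python sketch of reading this chunk, assuming the layout in the table above. The name parse_fmt_chunk is hypothetical, and treating the body as a run of little-endian 32-bit values is an assumption; only the second value has a guessed meaning (sample rate).

```python
import struct


def parse_fmt_chunk(data: bytes, offset: int = 0):
    """Parse one "FMT " chunk; return (fields, offset of the next chunk)."""
    magic, size = struct.unpack_from("<4sI", data, offset)
    if magic != b"FMT ":
        raise ValueError(f"expected b'FMT ', got {magic!r}")
    body_start = offset + 8
    # Assumption: the body is made of little-endian 32-bit values.
    count = size // 4
    fields = struct.unpack_from(f"<{count}I", data, body_start)
    return fields, body_start + size
```

With the example values, the field at relative offset 0xC is 0xAC44, which is 44100 in decimal and so fits a sample rate.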


"SEEK" ("table") chunk/struct
-----------------------------
offset(relative) / data
-----------------------------
0    "SEEK"
4    [table_data size]
(table_data)
{
8    0 : <"OggS" offset in (stream_data)>#0
C    [ ??? ]#0
10    3A : <"OggS" offset in (stream_data)>#1
14    [ ??? ]#1
18    FFB : <"OggS" offset in (stream_data)>#2
1C    [ ??? ]#2
20    20F4 : <"OggS" offset in (stream_data)>#3
24    [ ??? ]#3
28    335E : <"OggS" offset in (stream_data)>#4
2C    [ ??? ]#4
30    4430 : <"OggS" offset in (stream_data)>#5
34    [ ??? ]#5
...
}
-----------------------------
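
The table above could be read with something like the following Python sketch. It assumes every entry is a pair of little-endian 32-bit values (page offset, unknown value), which matches the listing but is not confirmed; parse_seek_chunk is a made-up name.

```python
import struct


def parse_seek_chunk(data: bytes, offset: int):
    """Parse one "SEEK" chunk; return (entries, offset of the next chunk).

    Each entry is (oggs_offset_in_stream_data, unknown_value).
    """
    magic, size = struct.unpack_from("<4sI", data, offset)
    if magic != b"SEEK":
        raise ValueError(f"expected b'SEEK', got {magic!r}")
    body_start = offset + 8
    usable = size - size % 8  # ignore a trailing partial entry, if any
    entries = [
        struct.unpack_from("<II", data, body_start + pos)
        for pos in range(0, usable, 8)
    ]
    return entries, body_start + size
```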


"DATA" ("stream") chunk/struct
-----------------------------
offset(relative) / data
-----------------------------
0    "DATA"
4    [stream_data size]
(stream_data)
{
8    "OggS" {...}    #0: 8 - 8 =0
42    "OggS" {...}    #1: 42 - 8 =3A
1003    "OggS" {...}    #2: 1003 - 8 =FFB
20FC    "OggS" {...}    #3: 20FC - 8 =20F4
3366    "OggS" {...}    #4: 3366 - 8 =335E
4438    "OggS" {...}    #5: 4438 - 8 =4430
54A1    "OggS" {...}
6765    "OggS" {...}
77C6    "OggS" {...}
880E    "OggS" {...}
988C    "OggS" {...}
A8CF    "OggS" {...}
BBE2    "OggS" {...}
CECA    "OggS" {...}
E14C    "OggS" {...}
F488    "OggS" {...}
10725    "OggS" {...}
119F3    "OggS" {...}
12D38    "OggS" {...}
14088    "OggS" {...}
1510E    "OggS" {...}
...
}
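
The "#n: offset - 8" notes above say that each SEEK offset is the page position minus the 8-byte "DATA" header. A small Python sketch to check that relation on real data (the helper names are hypothetical):

```python
DATA_HEADER_SIZE = 8  # "DATA" magic (4 bytes) + [stream_data size] (4 bytes)


def to_stream_offsets(chunk_offsets):
    """Convert offsets relative to the "DATA" chunk into offsets inside (stream_data)."""
    return [off - DATA_HEADER_SIZE for off in chunk_offsets]


def bad_seek_offsets(stream_data: bytes, seek_offsets):
    """Return the seek offsets that do NOT point at an "OggS" page start."""
    return [off for off in seek_offsets if stream_data[off:off + 4] != b"OggS"]
```

An empty list from bad_seek_offsets would support reading the first value of each SEEK entry as a page offset.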


So in theory (I could not test on this PC), setting this in the .txth:
start_offset=0x211C
(AND NOT "06x2150") should make the data playable.
ffmpeg should read the stream header before recognizing the format, so the first "OggS" bytes must(?) be included (I cannot confirm this).


edited 5:38 PM EDT April 15, 2022
thanks (and another problem) by elburg at 12:16 PM EDT on April 15, 2022
i will be sure to try this.


UPDATE

It was just as I feared: only the example file worked.

I was thinking of a potential git clone of vgmstream to implement this, but I am still learning c++ so I don't know how to do a proper edit.

edited 12:37 PM EDT April 15, 2022

edited 12:41 PM EDT April 15, 2022
by almendaz at 5:27 PM EDT on April 15, 2022
OK, let's try again. Second take!!

start_offset=0x211C
is static text.
As this offset depends on the FMT/SEEK struct sizes, one cannot expect the "DATA" (OggS) struct to start at the same place in every .AUDIO_DATA source file, as this fixed "start_offset" assumes (it was derived from the only file I got - the "example file" as you call it).

The proper way to get "start_offset" is to account for the number of "OggS" pages in each file, i.e. to read the FMT/SEEK struct sizes in advance, like this:
(All values are Hexadecimal)

(def.)
start_offset :=
sizeof("FMT ") + sizeof([header_data size]) + sizeof(header_data)
+ sizeof("SEEK") + sizeof([table_data size]) + sizeof(table_data)
+ sizeof("DATA") + sizeof([stream_data size])

= 4 + 4 + [header_data size]
+ 4 + 4 + [table_data size]
+ 4 + 4

Translated as
8 + [14] + 8 + [20F0] + 8
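
The same sum as a quick Python check, using the two sizes from the example file (the function name is only for illustration):

```python
CHUNK_HEADER_SIZE = 4 + 4  # 4-byte magic + 4-byte size field


def start_offset(header_data_size: int, table_data_size: int) -> int:
    """Offset of the first "OggS" byte, counted from the start of the file."""
    return (
        CHUNK_HEADER_SIZE + header_data_size    # "FMT " chunk
        + CHUNK_HEADER_SIZE + table_data_size   # "SEEK" chunk
        + CHUNK_HEADER_SIZE                     # "DATA" magic + size
    )


print(hex(start_offset(0x14, 0x20F0)))  # 0x211c
```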

And here we have 2 variable values!
Let's try this first (otherwise we would need ALL sample files to rule out the following assumption!)
Assume [header_data size] is constant, i.e. the {FMT} struct is always the same size for any .AUDIO_DATA.
From the example file, [header_data size] == 14
But leave [table_data size] as the variable, which with the above assumption is always at offset 0x20. vgmstream's TXTH uses @<offset>:EE$4 to read the [value] at this <offset>, with endianness EE (LE or BE) and size 4, i.e. 4 bytes. See the TXTH documentation.

So
8 + [14] + 8 + [20F0] + 8
= 8 + 14 + 8 + [20F0] + 8
= 24 + ($20) + 8
= 2C + ($20)

And so, for the .TXTH we would have to edit this line
start_offset=0x211C
into
start_offset=0x2C + @0x20:LE$4
Or, using default options, simply as
start_offset=0x2C + @0x20
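
For anyone who wants to check the result outside vgmstream, here is a Python sketch of what that TXTH line computes. It keeps the same assumption (the FMT body is always 0x14 bytes, so [table_data size] sits at 0x20); the function name is made up.

```python
import struct


def txth_start_offset(data: bytes) -> int:
    """Python equivalent of: start_offset=0x2C + @0x20:LE$4"""
    # Read the 4-byte little-endian [table_data size] at offset 0x20.
    (table_data_size,) = struct.unpack_from("<I", data, 0x20)
    return 0x2C + table_data_size
```

If the bytes at the returned offset are not "OggS", the constant-size assumption does not hold for that file.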


edited 5:31 PM EDT April 15, 2022
by bnnm at 4:10 AM EDT on April 16, 2022
Somebody asked about this on Discord before. This mostly works, but needs adjustments for files without SEEK:


base_offset = 0x00
base_offset = @0x04 + 0x08 + base_offset #fmt
base_offset = @0x04 + 0x08 + base_offset #seek

subfile_size = @0x04 #chunk size
subfile_offset = base_offset + 0x08
subfile_extension = ogg
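
A rough Python sketch of the same chaining idea, for testing outside vgmstream. Instead of assuming FMT then SEEK, it walks the chunks by magic until it reaches "DATA", so a missing SEEK is simply skipped over. This assumes every chunk uses the same 4-byte magic + 4-byte LE size header and that files without SEEK just omit that chunk; both are guesses, and the function names are hypothetical.

```python
import struct


def locate_data_chunk(data: bytes):
    """Walk the chunks from offset 0; return (payload_offset, payload_size) of "DATA"."""
    offset = 0
    while offset + 8 <= len(data):
        magic, size = struct.unpack_from("<4sI", data, offset)
        if magic == b"DATA":
            return offset + 8, size
        offset += 8 + size  # skip this chunk's header and body
    raise ValueError("no DATA chunk found")


def extract_ogg(data: bytes) -> bytes:
    """Return the bytes of the first "DATA" payload (the embedded Ogg stream)."""
    payload_offset, payload_size = locate_data_chunk(data)
    return data[payload_offset:payload_offset + payload_size]
```

Writing the result of extract_ogg to a .ogg file gives something a normal player can be tried on.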


There are other files that have a "FRST"; no idea what codec they use (some kind of ADPCM, I think).
by almendaz at 10:53 AM EDT on April 16, 2022
Thanks for your input, bnnm
I had not figured out this way of chaining input values.

Is there some way of including simple conditional statements in TXTH/TXTP, to make this process simpler? I can see the variables are already there, and audio source files are byte struct/arrays most of the time.

Example:
if (@0x0 == "FMT ") then base_offset = <expr1>
if (@0x0 == "FRST") then base_offset = <expr2>
etc.

And I do not think, hopefully, loops should be implemented - way beyond the scope of playback... I think.
