USF player library by haspor at 5:36 PM EDT on October 2, 2013
I'm looking for a library to play USF/MINIUSF files on Unix/Linux. Anyone know a proper candidate for it? It would be for an Android application I'm developing (free app).
USF is unlikely to be playable on ARM, as the existing USF player/emulator library, lazyusf, makes extensive use of 32-bit x86 recompilation for both R4300i and RSP emulation. Each core's C interpreter is roughly four times as slow as its recompiling version, making the full interpreter about 16 times as slow as the full recompiler on x86.
Under my own testing, the full interpreter mode barely pulls real time on my old Athlon 64 3200+ machine. Just imagine how slow that will be running on ARM.
You can always try, but there's no guarantee that it will work properly.
The emulator loops could probably use some major idle detection to improve their playback speed.
Still, you have to remember, N64 games rendered their music in software using the processors available. The RSP did support vectorized math (add/subtract/multiply/divide/etc operations on four floating point numbers in a single opcode) which the recompiler turns into SSE math opcodes. You'd have to look into converting that to something that will work on Android platforms.
The API is quite simple. You couple it with a PSF loader, such as my psflib C library.
Basically, with psflib, you draft up a file access system. I advise looking in here, specifically the parts involving usf_loader_state.
You would use psf_load with no loader functions, or possibly just the tag loader function for metadata, and a type of 0, and it will return either -1 on failure, or the type of the PSF requested. When you wish to load a specific PSF type, I suggest passing that type to the function so it verifies the types of all files opened.
You instantiate the loader state structure, assign a newly allocated USF emulator instance to it, and call psf_load with both the loader and loader metadata callbacks. The loader uploads the reserved sections to the emulator in the correct order, and the info function retrieves the two supported emulation behavior override tags and sets the corresponding members of the loader structure.
Once loading has completed properly, you use those two tag property variables in the structure to call two setter functions on the lazyusf instance.
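A minimal sketch of the two callbacks described above, not the actual player source. The callback signatures, the members of the loader state structure, the tag names `_enablecompare` and `_enablefifofull`, the rejection of a non-empty program section, and the stand-in `fake_upload_section` are all assumptions made for illustration; check psflib.h and usf.h for the real prototypes.

```c
#include <stddef.h>
#include <stdint.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical stand-in for the emulator's reserved-section upload call.
 * It only counts bytes so the sketch runs without lazyusf. */
struct fake_emu { size_t bytes_uploaded; };

static int fake_upload_section(struct fake_emu *emu, const uint8_t *data, size_t size)
{
    if (!emu || (!data && size)) return -1;
    emu->bytes_uploaded += size;
    return 0; /* 0 on success, -1 on error */
}

/* Loader state: two override flags plus the emulator instance (assumed layout). */
struct usf_loader_state {
    int enable_compare;
    int enable_fifo_full;
    struct fake_emu *emu_state;
};

/* Loader callback: forwards the reserved section to the emulator.
 * Assumption: a non-empty program section is treated as an error. */
static int usf_loader(void *context, const uint8_t *exe, size_t exe_size,
                      const uint8_t *reserved, size_t reserved_size)
{
    struct usf_loader_state *state = context;
    if (exe && exe_size > 0) return -1;
    return fake_upload_section(state->emu_state, reserved, reserved_size);
}

/* Info callback: records an override tag when it is present and non-empty. */
static int usf_info(void *context, const char *name, const char *value)
{
    struct usf_loader_state *state = context;
    if (strcasecmp(name, "_enablecompare") == 0 && value[0] != '\0')
        state->enable_compare = 1;
    else if (strcasecmp(name, "_enablefifofull") == 0 && value[0] != '\0')
        state->enable_fifo_full = 1;
    return 0;
}
```

After loading, the two flags in the structure are what would be passed to the setter functions on the emulator instance.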
Then you call usf_render to produce actual sample data, and optionally retrieve the last DAC sample rate. If your player needs to know the sample rate at startup, it's safe to call this function once, store the sample rate, and save the first block of samples for later. With many USF sets, loading the emulator up and calling for a single block of samples is quite rapid. For a number of others, like Animal Forest or Neon Genesis Evangelion, it's a bit slow, so maybe advisable to only bother when starting playback.
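The "render once, keep the block, remember the rate" idea can be sketched as below. Everything here is hypothetical: `render_fn` is an invented signature rather than usf_render's real prototype, `PREFETCH_FRAMES` is an arbitrary block size, and `counting_render` is a fake emulator that exists only so the sketch runs on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFETCH_FRAMES 1024 /* arbitrary block size for this sketch */

/* Hypothetical render signature: fills `frames` stereo frames and reports
 * the current DAC sample rate. Not lazyusf's actual prototype. */
typedef int (*render_fn)(void *emu, int16_t *out, size_t frames, int32_t *rate);

struct player {
    void     *emu;
    render_fn render;
    int32_t   sample_rate;
    int16_t   first[PREFETCH_FRAMES * 2]; /* saved first block, stereo */
    size_t    first_pos;                  /* frames already handed out */
    size_t    first_len;                  /* frames stored in `first` */
};

/* Render one block up front, keep it, and remember the sample rate. */
static int player_open(struct player *p, void *emu, render_fn render)
{
    memset(p, 0, sizeof *p);
    p->emu = emu;
    p->render = render;
    if (render(emu, p->first, PREFETCH_FRAMES, &p->sample_rate) != 0)
        return -1;
    p->first_len = PREFETCH_FRAMES;
    return 0;
}

/* Hand out the saved block first, then fall through to the emulator. */
static int player_read(struct player *p, int16_t *out, size_t frames)
{
    size_t cached = p->first_len - p->first_pos;
    if (cached > frames) cached = frames;
    memcpy(out, p->first + p->first_pos * 2, cached * 2 * sizeof(int16_t));
    p->first_pos += cached;
    if (cached == frames) return 0;
    return p->render(p->emu, out + cached * 2, frames - cached, &p->sample_rate);
}

/* Fake emulator for demonstration: emits a running counter at 44100 Hz. */
static int counting_render(void *emu, int16_t *out, size_t frames, int32_t *rate)
{
    int16_t *counter = emu;
    for (size_t i = 0; i < frames * 2; i++) out[i] = (*counter)++;
    *rate = 44100;
    return 0;
}
```

With this shape, the player knows the sample rate as soon as `player_open` returns, and no rendered audio is thrown away.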
Before disposing of the allocated memory block, you need to call usf_shutdown on it, to free the mmap'd block of memory used by the emulator core.
I would like to try to make it so that all memory is allocated against the initial state structure, but alas, that doesn't seem as easy as I thought. When I tried this, certain structures would overlap other parts of the structure.
I also use an offset at the head of the base memory block, initialized by usf_clear, which automatically aligns the internal lazyusf state structure on a page boundary. Paragraph alignment is all it really needs, for the RSP register and temporary structures, for aligned access by SSE2 opcodes, if you enable them on x86/x86_64 platforms. It may help a bit with auto vectorization on non-x86 platforms, though.
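The head-offset trick can be sketched in a few lines. This is a generic illustration rather than the code inside usf_clear; the helper names `align_up` and `head_offset` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round `p` up to the next multiple of `alignment` (must be a power of two). */
static void *align_up(void *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (void *)((addr + mask) & ~mask);
}

/* Bytes to skip at the head of `block` so the state that follows is aligned. */
static size_t head_offset(void *block, size_t alignment)
{
    return (size_t)((uintptr_t)align_up(block, alignment) - (uintptr_t)block);
}
```

The caller has to over-allocate by up to `alignment - 1` bytes so the aligned structure still fits. Using 4096 gives page alignment on typical systems, while 16 is the paragraph alignment that aligned SSE2 loads and stores need.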
As with the stock Iconoclast RSP, defining ARCH_MIN_SSE2 for just rsp.c will enable use of SSE2 intrinsic functions, while defining ARCH_MIN_SSSE3 will enable both SSE2 and SSSE3 intrinsics. Do not use both preprocessor definitions at once, because there are several places in the code in which the SSE2 macro takes precedence over the SSSE3 macro. Neither applies to non-x86 architectures, unless there's some compiler which can produce ARM or MIPS compatible SIMD code from Intel intrinsic functions.
EDIT: I chose to use a reserved section upload function, like Neill's libraries, instead of bundling PSF loading into every player. Since PSF parsing is so simple, I figured a simple loader library would benefit any PSF player. Check out that linked Obj-C++ source code for examples of loading every xSF format currently in existence, and within my Github or Bitbucket accounts, all of the libraries necessary to interface as the player cores for those formats.
edited 8:05 PM EST February 15, 2014
Okay, the API is in usf.h. usf_internal.h is the internal structure required by the entire library, which I didn't see fit to split up into components because I am lazy.
All of the public functions in usf.h are fairly self-explanatory, except for the upload function's return value. It's like most POSIX library functions, returning -1 on error and 0 on success. It only really fails if it can't allocate the memory or if the data is invalid.
Hmm, I need to add more error checking, since the AllocateMemory internal function from the original code can return error states on failure.
It also needs some better means of allocating the majority of the core memory, since it uses mmap, and would probably need a generic memory allocation and mapping function set so it can be reused on Windows, for instance. Something based on VirtualAlloc/VirtualFree, like the original Win32 code it's based on.
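Such a function pair could look roughly like the sketch below. The names `core_mem_alloc` and `core_mem_free` are hypothetical and are not part of lazyusf; the system calls are the standard mmap/munmap and VirtualAlloc/VirtualFree.

```c
/* Expose MAP_ANONYMOUS under strict -std= modes on glibc. */
#if !defined(_WIN32) && !defined(_DEFAULT_SOURCE)
#define _DEFAULT_SOURCE
#endif

#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Reserve and commit a zero-filled, read/write region. */
static void *core_mem_alloc(size_t size)
{
    return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
}

/* MEM_RELEASE requires a size of 0 and frees the whole reservation. */
static void core_mem_free(void *p, size_t size)
{
    (void)size;
    if (p) VirtualFree(p, 0, MEM_RELEASE);
}
#else
#include <sys/mman.h>

/* Map a zero-filled, private, anonymous read/write region. */
static void *core_mem_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* munmap needs the original length, so the caller must keep it around. */
static void core_mem_free(void *p, size_t size)
{
    if (p) munmap(p, size);
}
#endif
```

Returning NULL on failure from both variants gives the caller a single error check, which would also help with the missing error handling mentioned above.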
I suggest you pull my latest version as-is and not try to make any further modifications. If you still need any modifications, let me know.
I've eliminated all of the internal headers from the public interface header. That should fix all odd warnings or errors importing it into other projects.
As for missing usf_state_t in any C file, that should not happen. If it does happen, it means that particular C file is missing an include reference for usf_internal.h below all of its other imports.
It pre-renders 5 seconds for silence detection, and the interpreter cores are not quite as fast as the original dynamic recompiler cores. They're certainly more stable now, though.
Turn off start and end of track silence detection if you want slightly faster startup.
Update for haspor, for his attempt at getting this to work on Android.
rsp/su.h:
#ifdef ANDROID // was: #if (0)
#define MASK_SA(sa) (sa & 31) /* Force masking in software. */
#else
#define MASK_SA(sa) (sa) /* Let hardware architecture do the mask for us. */
#endif
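A small sketch of why the software mask matters, assuming the macro guards variable shift amounts in the way MIPS SLLV/SRLV use only the low 5 bits of the count. The helper names `sllv` and `srlv` are hypothetical. C leaves a shift of a 32-bit value by 32 or more undefined; x86 happens to mask the count to 5 bits in hardware, while ARM register-specified shifts take the count from the low byte of the register, so a count of 32 or more yields 0 instead of wrapping.

```c
#include <stdint.h>

/* Mask in software so every target sees a shift count of 0..31.
 * The extra parentheses keep the macro safe for expression arguments. */
#define MASK_SA(sa) ((sa) & 31)

/* Logical left shift by a variable amount. */
static uint32_t sllv(uint32_t value, uint32_t sa)
{
    return value << MASK_SA(sa);
}

/* Logical right shift by a variable amount. */
static uint32_t srlv(uint32_t value, uint32_t sa)
{
    return value >> MASK_SA(sa);
}
```

With the mask in place, `sllv(1, 33)` behaves like a shift by 1 on every architecture, which is the behaviour the unmasked version only gets by accident on x86.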
Unfortunately, it's way too slow to run on my Nexus 4. Looks like we may need to enlist the help of a third party, possibly Iconoclast, if he feels up to it, to hand optimize the vector unit for Neon intrinsic functions.
Disregard the part about it being too slow. GCC doesn't auto vectorize for Neon with just -mfpu=neon, it also needs -funsafe-math-optimizations. It has some silly notion of not wanting to auto vectorize due to potential loss of precision, which doesn't matter in this case.
I was checking the RSP code yesterday and noticed that it doesn't use any NEON stuff at all, even if arm_neon.h is included. Those few SSE2 instructions need to be converted to NEON. Not impossible, but a slow process. Better to go function by function and see how it goes. Also, don't use debug builds of the native libs, they make it heavier. Try with ndk_debug=0.
HLE audio implemented from Mupen64, which significantly speeds up playback and makes Mystical Ninja: Starring Goemon playable on my Nexus 4. Unfortunately, it is also missing features, like CBFD's resonant filters, and random samples in Goldeneye 007.
I already converted three functions. Some are annoying to convert since those SSE2 commands are not 1:1 to NEON. I think the most important part is in the shuffle.h file. If I can convert the SSE3 version of SHUFFLE_VECTOR, that should help a lot.
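As an illustration of an SSE2 operation with no one-instruction NEON counterpart, here is a sketch of the "high half of a signed 16x16 multiply" that `_mm_mulhi_epi16` performs. It is not taken from the RSP source and is not necessarily one of the functions being converted; the names `mulhi_s16x8` and `mulhi_s16x8_scalar` are hypothetical. On NEON the product has to be widened to 32 bits in two halves and then narrowed again. The scalar version serves as a reference and as the fallback on other targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Scalar reference: high 16 bits of each signed 16x16 product.
 * Relies on >> being an arithmetic shift for negative values, which is
 * implementation-defined in C but holds on GCC and Clang. */
static void mulhi_s16x8_scalar(const int16_t a[8], const int16_t b[8],
                               int16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(((int32_t)a[i] * (int32_t)b[i]) >> 16);
}

/* Same result using NEON when available, scalar otherwise. */
static void mulhi_s16x8(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if HAVE_NEON
    int16x8_t va = vld1q_s16(a);
    int16x8_t vb = vld1q_s16(b);
    /* Widening multiplies: four 32-bit products per half. */
    int32x4_t lo = vmull_s16(vget_low_s16(va), vget_low_s16(vb));
    int32x4_t hi = vmull_s16(vget_high_s16(va), vget_high_s16(vb));
    /* Shift right by 16 and narrow back to 16-bit lanes. */
    vst1q_s16(out, vcombine_s16(vshrn_n_s32(lo, 16), vshrn_n_s32(hi, 16)));
#else
    mulhi_s16x8_scalar(a, b, out);
#endif
}
```

Comparing each converted routine against a scalar reference like this, lane by lane, is one way to catch a multiply that silently produces the wrong half of the product.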
Hey guys, maybe I'm a little slow, but I haven't been able to get any audio out of Droidsound+LazyUSF using the prebuilt stuff on droidmjt/Droidsound or building it myself or trying to pull in kode54/lazyusf on either my phone (running Android 4.2.2) or the emulator.
Dug into the debugger a bit today and it looked like things were running ok... but just silence. So maybe something high level if lucky?
Testing with Super Mario 64 and Space Station Silicon Valley, so there shouldn't be anything out of the ordinary going on, they work fine in kode's foobar2000 plugin.
btw otherwise the improved Droidsound is working great, awesome!
yeah the LazyUSF is not functioning in it. I can commit the latest codes/.apk that plays some tunes but it's still way too heavy. Currently i'm vectorizing some codes in it to make it hopefully run faster. It's very slow process.
I have tried to convert that to NEON for one week now, no luck. I had earlier different values and it worked but with these values it just won't multiply them correctly. I have tried every multiply function, I don't know what I'm doing wrong.
I know this is prolly not the correct place to post stuff like this but just a heads up of my current situation. Without proper NEON support for LazyUSF, there will never be USF plugin in Droidsound.
All of my xSF libraries are capable of building on ARM, although LazyUSF will require you to use HLE audio in many cases, as ARM is not quite powerful enough for full RSP emulation.
007 The World is Not Enough (silent)
Blast Corps (silent)
Body Harvest (Causes Droidsound to crash, actually)
Doubutsu no Mori (Causes Droidsound to crash as well)
Gauntlet Legends (silent)
Harvest Moon 64 (Crash)
Mace - The Dark Age (Crash)
Mario Party (Crash)
Nintama Rantarou 64 Game Gallery (Crash)
San Francisco Rush 2049 (silent)
Turok Rage Wars (sparse32 only Crash)
Yoshi's Story (silent)
Edit: neglected to see your latest post as I had this tab open from a few days ago, so you already identified the issue with Banjo Tooie. Sorry bout that.
I found the reason for the link errors before with memory.h. You do not need to include the lazyusf folder in the C include paths list, as that overrides what gets imported by <includes>, so stdlib.h is pulling in lazyusf's memory.h instead of the system one. Also, types.h doesn't appear to exist as a system-wide include, at least in the SDK I have, so that was getting pulled in from the lazyusf SDK as well. I struck that out.
Then I removed the usf_start() call, and changed the sample rate function to call usf_render with null pointer and for zero samples, which is a safe call by design, so long as you're not accessing the same instance from multiple threads at the same time.