Let's wait for robinsonb5 to comment on available CPU/MCU time of his menu engine,
The 832 CPU and control module is currently clocked at 50MHz, and manages a shade under 7 DMIPS. It's deliberately clocked at well under its theoretical maximum speed to avoid adding timing pressure to the guest cores.
(The goal with the CPU was always to be small, both in terms of logic and code density, rather than to be fast. Because of that, it's not well suited to decompression workloads. Shifting, in particular, is slow - the usual optimisation of replacing multiplies with shifts applies in reverse here!)
Currently the host CPU has "custody" of the SD card, so it's responsible for loading the ROM at bootup and for handing data on request to the disk emulation. It's also responsible for passing joystick events to the core. With some cores it handles keyboard events, too - but MiSTery has enough provision for PS/2 devices that I've given it direct access to the physical keyboard and mouse.
My goal with most of the MiST cores I've ported to TC64 has been to do it as non-invasively as possible - to avoid modifying the upstream core any more than strictly necessary, so that it's easier to keep in sync with upstream changes, and so it's possible to merge the codebases in the longer term. (Gyurco and I have already collaborated closely with both Minimig and TurboGrafx16, and both of those now have a shared repo for MiST and Chameleon64.)
Making the DeMiSTify wrapper as transparent as possible to the guest core means keeping the firmware as small and streamlined as it can be, since a lot of the cores use up most of the FPGA's block RAM.
For most of the cores so far, the firmware, working RAM and stack have fit within just 12k. (TurboGrafx16 needed 24k because of the CD image support and bin/cue parsing.) The port of MiSTery currently needs a shade under 14k - but that's likely to be nearer 20k once hard drive images and the C64 keyboard are supported.
There's currently enough free block RAM in this core for the firmware to grow to 30k, so there's plenty of headroom at the moment. (I'm reluctant to extend into SDRAM since it means more invasive changes and slower execution - but it's nonetheless possible: Minimig already does it that way, and I believe it will be necessary for the Archimedes core.)
As for a new floppy image format, provided there's a straightforward way of mapping from track and sector to file offset I don't see it presenting too many difficulties for the firmware. If large decompression buffers or index tables are needed then it'd be a different story.
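To illustrate what I mean by a "straightforward" mapping, here's a minimal sketch in C, assuming a plain uncompressed image with sectors stored in track/side/sector order. This isn't taken from the DeMiSTify firmware: the struct, the function name and the inclusion of a side number are all hypothetical, and a real format might add a header or order its sectors differently.

```c
#include <stdint.h>

/* Hypothetical geometry descriptor - field names are illustrative only. */
struct disk_geometry {
    uint16_t tracks;            /* tracks per side */
    uint8_t  sides;             /* 1 or 2 */
    uint8_t  sectors_per_track;
    uint16_t bytes_per_sector;
};

/* Map (track, side, sector) to a byte offset within the image file.
 * Assumes sectors are numbered from 1 and stored contiguously in
 * track-major, then side, then sector order, with no file header.
 * Returns 0 on success, or -1 if the request is out of range. */
int sector_to_offset(const struct disk_geometry *g,
                     unsigned track, unsigned side, unsigned sector,
                     uint32_t *offset)
{
    if (track >= g->tracks || side >= g->sides ||
        sector < 1 || sector > g->sectors_per_track)
        return -1;

    /* Linear sector index, computed with multiplies only. */
    uint32_t index = ((uint32_t)track * g->sides + side)
                     * g->sectors_per_track + (sector - 1);

    *offset = index * g->bytes_per_sector;
    return 0;
}
```

With an 80-track, double-sided, 9-sector, 512-byte geometry, track 1 / side 0 / sector 1 lands at offset 9216. The point is that nothing here needs a lookup table or a buffer - it's a handful of multiplies and adds, which is about the level of work the firmware can do comfortably.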