let's find out much compression of opcodes (not including their immediate operands) would reduce the code size of Super Mario Bros.
https://gist.github.com/notwa/1aba32c1dd5fa60862aa63e7b7665b6f
would you have paid the price of dedicated decoding hardware to fit 30% more code in your NES game? i wonder…
if my interest in this persists, i might try this with megaman 1. it's a bigger cart to begin with, so it might take a lot more work to trace all of its subroutines, or maybe i'd just source the disassembly going around.
quick and *very* non-exhaustive test shows MM only(?) has ~25% more code and compresses a couple percent better.
also neither of these games use more than 128 unique instructions, so you could just drop a bit and get 12.5% for free*.
* not including the LUT cost, but this is all theoretical anyway.
update: added battletoads to the mix, and stats split for opcodes and their addressing modes.
the answer is not quite 30%. i used Golomb coding here, but Huffman yields a very similar result.
also this code is kinda bleh (i even just spotted a var called 'asdf' and an off-by-one) but i don't feel like working on it anymore tonight.