
I am not sure it is a good idea to compile code written for modern processors to 8-bit CPUs like the 6502. For example:

Languages like C (or Rust) allocate variables on the stack because that is cheap on modern CPUs, but 8-bit CPUs don't have addressing modes to access them easily. (By the way, some modern CPUs like ARM also cannot add a register directly to a variable on the stack; they have to load it into a register first.)

The solution is not to use the stack for variables and instead use zero-page locations. As there are only 256 zero-page bytes, the same locations should be reused for variables in different functions that are never active at the same time. This cannot be used with recursive functions, but such code is inefficient anyway, so it is better not to use them at all and use loops instead.
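To make the reuse idea concrete, here is a rough sketch of how zero-page offsets could be handed out from a call graph. Everything in it is hypothetical (the name `assign_offsets`, the numbering of functions, the `budget` parameter), and it is not how any particular compiler does it: a callee simply starts after the end of its deepest caller, so functions that never run at the same time end up sharing bytes.

```rust
/// Hypothetical sketch: assign each function a zero-page offset for its locals.
/// `sizes[f]` is the number of bytes function `f` needs, `calls[f]` lists the
/// functions that `f` calls, and `budget` is the number of zero-page bytes available.
/// Returns None if recursion keeps pushing frames further out, or if the
/// locals do not fit in `budget`.
fn assign_offsets(sizes: &[usize], calls: &[Vec<usize>], budget: usize) -> Option<Vec<usize>> {
    let n = sizes.len();
    let mut offset = vec![0usize; n];
    let mut passes = 0;
    loop {
        let mut changed = false;
        for caller in 0..n {
            // A callee's locals must start after its caller's locals end.
            let end = offset[caller] + sizes[caller];
            for &callee in &calls[caller] {
                if offset[callee] < end {
                    offset[callee] = end;
                    changed = true;
                }
            }
        }
        if !changed {
            break;
        }
        passes += 1;
        // An acyclic call graph settles in fewer than n changing passes.
        // If offsets are still growing after that, there is a cycle (recursion).
        if passes > n {
            return None;
        }
    }
    let fits = (0..n).all(|f| offset[f] + sizes[f] <= budget);
    if fits {
        Some(offset)
    } else {
        None
    }
}
```

For example, with sizes `[4, 6, 3, 2]` and calls `[[1, 2], [3], [3], []]` (function 0 calls 1 and 2, which both call 3), functions 1 and 2 both get offset 4 and function 3 gets offset 10, so the whole program needs 12 bytes instead of 15.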

Another issue is the heap and closures (which allocate variables on the heap). Instead of a heap, code for 8-bit CPUs should use static allocation.
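As a minimal illustration of static allocation in place of a heap, a fixed-capacity container can carry its storage inline, so its size is known at compile time. The `FixedList` type below is made up for this example and is only a sketch of the general idea:

```rust
/// Hypothetical fixed-capacity list of bytes. The storage is part of the value
/// itself, so it can live in a `static` or in a statically allocated frame;
/// no heap allocator is involved.
struct FixedList<const N: usize> {
    items: [u8; N],
    len: usize,
}

impl<const N: usize> FixedList<N> {
    /// Creates an empty list; `const` so it can initialize a `static`.
    const fn new() -> Self {
        FixedList { items: [0; N], len: 0 }
    }

    /// Appends a byte, or hands it back as `Err` if the list is already full.
    fn push(&mut self, value: u8) -> Result<(), u8> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// Returns the bytes stored so far.
    fn as_slice(&self) -> &[u8] {
        &self.items[..self.len]
    }
}
```

The trade-off is that the capacity has to be chosen up front, and running out of space becomes an explicit error the caller must handle rather than an allocation failure.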

The article contains an example of 6502 code compiled from Rust, and this code is inefficient. It uses too many locations for variables (rc6-rc39), and it wastes time saving and restoring those locations in the prologue and epilogue.

No wonder the programs run slowly. It would be much better to compile CHIP-8 directly to 6502 assembly.



Most of the inefficiency in the article isn't due to the issues you've raised, but rather due to us just starting to optimize LLVM-MOS.

First, I have utterly no idea why there are so many calls to memset; it looks like it's unrolling a loop or something... poorly. It also doesn't seem to be reusing registers when setting up the calls; that's also bad and should be fixed.

Second, if you take a look at the actual structure of the prologue and epilogue, you might notice that it's copying zero page to an absolute memory region called __clear_screen_sstk. This is because LLVM-MOS ran a whole-program analysis on the program and proved that at most one activation of that function could occur at any given time. Thus, its "stack frame" was automatically allocated statically as a global array, not relative to a moving stack pointer.
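To give a feel for what "at most one activation" means, here is a deliberately simplified sketch, not LLVM-MOS's actual analysis: a function that cannot reach itself through the call graph can never be active twice on a single thread of control. The names `may_reenter` and `static_frame_candidates` are invented for this example, and a real compiler would also have to account for things like interrupt handlers and indirect calls.

```rust
/// Hypothetical sketch: returns true if `func` can reach itself through
/// `calls` (where `calls[f]` lists the functions that `f` calls), meaning it
/// might be active more than once at the same time.
fn may_reenter(func: usize, calls: &[Vec<usize>]) -> bool {
    let mut seen = vec![false; calls.len()];
    // Depth-first search over everything reachable from `func`'s callees.
    let mut stack: Vec<usize> = calls[func].clone();
    while let Some(f) = stack.pop() {
        if f == func {
            return true;
        }
        if !seen[f] {
            seen[f] = true;
            stack.extend(calls[f].iter().copied());
        }
    }
    false
}

/// Functions that can never be re-entered are candidates for keeping their
/// frame in a fixed global array instead of on a moving stack.
fn static_frame_candidates(calls: &[Vec<usize>]) -> Vec<usize> {
    (0..calls.len()).filter(|&f| !may_reenter(f, calls)).collect()
}
```

With a call graph of `[[1, 2], [], [3], [2]]`, functions 2 and 3 call each other, so only functions 0 and 1 come out as candidates for a static frame.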

The reason the prologue and epilogue spend so much time copying in and out of the zero page is just that we haven't taught LLVM-MOS how to access the stack directly, but there's no technical obstacle to doing so. Once that's done, the whole body of the function would operate on __clear_screen_sstk directly, and the prologue and epilogue would disappear completely.

Of course, from the first point, you shouldn't need any stack locations to do the body of this routine; there's a big ball of yarn here, but pulling on any of a number of threads would unravel it.



