Funny - I remember when one of our new Engineers needed a memory-copy method for our ARM embedded solution - he went to Linux source and got some library routine.
It faulted when I used it the 1st time. Fixed the bug (alignment of source), ran again and it faulted again.
So I spent 10 minutes writing a test - move 0-128 bytes from source buffer offset 0-128 to destination buffer offset 0-128. Simple, overkill right?
11 bugs later the damned memory copy thing worked. 11.
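For anyone who wants to try the same thing, here is a rough sketch of that kind of exhaustive test in C. It is not the original test, just one way to write it. The names `check_copy`, `copy_fn` and `broken_copy` are made up for illustration, and the guard-byte check is an extra assumption on my part: it catches a copy that writes outside the destination window.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OFF 128   /* largest source/destination offset tried */
#define MAX_LEN 128   /* largest copy length tried */
#define GUARD   16    /* guard bytes checked on each side of the destination */
#define FILL    0xFF  /* fill value; the source pattern never contains it */

/* Hypothetical signature: same shape as memcpy. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Tries every (length, source offset, destination offset) combination and
 * returns how many of them produced a wrong result. */
static unsigned long check_copy(copy_fn copy)
{
    static unsigned char src[MAX_OFF + MAX_LEN];
    static unsigned char dst[GUARD + MAX_OFF + MAX_LEN + GUARD];
    unsigned long failures = 0;

    assert(GUARD > 0);
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i % 251);      /* values 0..250, never FILL */

    for (size_t len = 0; len <= MAX_LEN; len++) {
        for (size_t so = 0; so <= MAX_OFF; so++) {
            for (size_t d = 0; d <= MAX_OFF; d++) {
                unsigned char *target = dst + GUARD + d;
                int bad = 0;

                memset(dst, FILL, sizeof dst);
                if (copy(target, src + so, len) != target)
                    bad = 1;                    /* wrong return value */
                if (memcmp(target, src + so, len) != 0)
                    bad = 1;                    /* wrong bytes copied */
                for (size_t g = 1; g <= GUARD; g++) {
                    if (target[-(ptrdiff_t)g] != FILL)
                        bad = 1;                /* wrote before the window */
                    if (target[len + g - 1] != FILL)
                        bad = 1;                /* wrote past the window */
                }
                failures += (unsigned long)bad;
            }
        }
    }
    return failures;
}

/* Deliberately buggy example (illustration only): copies whole 4-byte
 * chunks and forgets the tail. */
static void *broken_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n & ~(size_t)3);
    return dst;
}
```

Call `check_copy(your_routine)` from `main` and expect zero failures. On an ARM target that faults on unaligned access the interesting failures will show up as faults rather than as a count, which is still a perfectly good test result.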
The next thing to ask is, What did I learn from that bug? What I learned is, accept NO CODE as bug-free, no matter the source, no matter what authoritative base it came from.
Other learning: why oh why don't CPU designers put a damned memory-copy instruction into the machine? We all need it, all the time, for every project and we all hack something together that works until it doesn't. Sigh.
You can't express memcpy in hardware any more efficiently than you can in C because of the way memory controllers work. It'd end up being microcoded, and ARM can't afford that for the same reason it can't afford unaligned access.
I think x86 does have a microcoded memcpy (rep movs; rep stos is the memset equivalent) but efficiency varies.
You mention the memory controller; that's probably where the logic belongs, not on the processor. So the microcode would come down to "ask mc to move; wait for completion".
That's not actually any better speed-wise. And the CPU would still have to microcode the copy because of caches (think of what involvement the memory controller has in doing a cache-to-cache copy).
The real gain in being able to have the memory controller do a memcpy() independent of the CPU would be to let the CPU operate on data out of its caches in parallel to the memcpy() being executed. But that only helps for a very specific class of memcpy() and is highly system dependent (you have to worry about the expense of keeping caches coherent among other things.) Anyway, an integrated GPU or other additional block of hardware behind the memory controller is a better candidate for this sort of thing than a user-level CPU instruction.
> why oh why don't CPU designers put a damned memory-copy instruction into the machine?
x86 has had REP MOVSB since forever, complete with a direction flag so you can handle the cases where the source and destination regions overlap. But it fell out of favor because for a while, from the 80386 to the early Pentium processors (when Linux was written), REP MOVSx was slower than writing an explicit memcopy loop.
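To illustrate what the direction flag buys you, here is a minimal portable C sketch of the same idea: copy forward when the destination is below the source, backward otherwise. This is not how any particular libc does it; `overlap_safe_copy` and `overlap_demo` are made-up names, and the byte-at-a-time loops are only there to mirror what REP MOVSB does with DF clear or set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte copy that is safe for overlapping regions. */
static void *overlap_safe_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: ascending copy (like DF = 0). */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination at or above source: descending copy (like DF = 1),
         * so bytes are read before they get overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}

/* Small usage example: shift a string right and left within one buffer. */
static void overlap_demo(void)
{
    char right[16] = "abcdefghij";
    char left[16]  = "abcdefghij";

    overlap_safe_copy(right + 2, right, 8);   /* regions overlap, dst above */
    assert(memcmp(right, "ababcdefgh", 10) == 0);

    overlap_safe_copy(left, left + 2, 8);     /* regions overlap, dst below */
    assert(memcmp(left, "cdefghijij", 10) == 0);
}
```

A plain ascending loop would get the first case wrong, because it overwrites source bytes before reading them. That is the case the backward direction exists for.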
That said, such an instruction would seem to go against RISC philosophies, where you want your operations to be small and atomic and predictable in terms of time and resource consumption.
Right! Foolish programmer! Using REP MOVSB has been broken since about the 2nd issuance of the processor. Dumb folks (read: DOS) used it as a timing loop to calibrate interrupt timers, complained when it got faster and broke their code, so Intel 'dumbed it down' until it's about the worst way to move memory you could try.
So you say it works again. Cool!
Maybe what we really need is some sort of 'architecture library' that compilers resort to for things like this. Maybe an instruction, maybe a routine, but guaranteed to work for every wrinkle in the architecture.
Because if it's not in the compiler, folks will continue to cobble together buggy code of their own, with only a vague idea of the vast architecture landscape they are navigating blindly.