With a tower of MOVEM.L instructions and enough scratch registers, you could get within a percent or two of theoretical bus speed when moving aligned data around. The Atari ST used this in spots, and the Macintosh memory manager and toolbox used MOVEM extensively.
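
For anyone who hasn't seen the trick, here is a minimal sketch of the idea, not code taken from the ST or the Mac ROMs. The routine name, the register assignment, and the 48-byte block size are all my own assumptions, and the `.loop` local-label syntax depends on the assembler.

```asm
; Hypothetical routine: copy d7 blocks of 48 bytes from (a0) to (a1).
; Assumes both pointers are even (word-aligned) and d7.w >= 1.
; Clobbers d0-d6/a2-a6; a real routine would save and restore them.
copy48:
        subq.w  #1,d7               ; dbra runs the loop (d7.w + 1) times
.loop:
        movem.l (a0)+,d0-d6/a2-a6   ; load 12 longwords, a0 advances by 48
        movem.l d0-d6/a2-a6,(a1)    ; store the same 12 longwords at (a1)
        lea     48(a1),a1           ; bump a1 by hand: MOVEM has no (An)+ store form
        dbra    d7,.loop
        rts
```

The "tower" is what you get by unrolling that loop body several times, using displacements such as `48(a1)` and `96(a1)` for the stores so the loop overhead is spread over more data. The more registers you can free up, the more longwords each MOVEM.L moves per instruction fetch.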