I don't know whether the OP's implementation passes arguments by copy or not, but there's a fairly simple transformation that converts purely functional code into continuation-passing style (CPS): instead of returning, each function takes a continuation and passes what it would normally return into that continuation. So this:
(def avg (a b) (/ (+ a b) 2))
...becomes:
(def avg (a b cont) (+ a b (lambda (s) (/ s 2 cont))))
Note that in the converted code, + and / also don't return values; instead they call the continuations they're passed on their results.
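The same transformation can be sketched in runnable form. This is just an illustration, not anyone's actual implementation; the names cps_add, cps_div, and cps_avg are made up here, and each primitive "returns" by calling the continuation it was handed:

```python
def cps_add(a, b, cont):
    # CPS version of +: no return value, just call the continuation
    cont(a + b)

def cps_div(a, b, cont):
    # CPS version of /
    cont(a / b)

def cps_avg(a, b, cont):
    # (def avg (a b cont) (+ a b (lambda (s) (/ s 2 cont))))
    cps_add(a, b, lambda s: cps_div(s, 2, cont))

cps_avg(4, 6, print)  # prints 5.0
```

Notice that every call in cps_avg is in tail position, which is exactly the property the next paragraph relies on.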
Pervasive CPS means that everything is a tail call, which means that you can tail call optimize (TCO) everything. The trivial implementation of TCO probably uses copy, but notably you can store values created in the current frame on the stack instead of the heap, and then pass by reference. Any unreferenced values can be removed, and the referenced values can be compacted within the frame to reduce stack space. However, in practice, it's faster to wait until you run out of stack space and then go back and compact everything in one go (this is essentially Henry Baker's "Cheney on the MTA" technique, which Chicken Scheme is built on). If you think through how this stack compaction might work, you'll quickly see that it's equivalent to a mark-sweep-compact garbage collector.
If your language is purely functional (Chicken Scheme isn't) then you have an additional property, which is that objects can only be referenced by objects higher on the stack. This means that if you start from the top of the stack, you can mark and sweep in a single pass, because when you arrive at a node, if it hasn't been marked by any of the objects higher in the stack, it's not referenced and can be swept. If your stack is stored in reverse order in memory (that is, the top of the stack is the earliest in memory) this mark/sweep pass is extremely cache friendly. You can also build up a backreference table during this pass, which means you only need one additional pass to compact. Again, the inverted stack is cache friendly, because even though you're passing over the stack in reverse order, backpatched references are likely to be local to the things they reference, which pulls objects into cache before they are referenced.
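The single-pass property is easy to demonstrate with an invented data layout. Below, the stack is a list with the top first, each object is an (id, refs) pair, and the purity invariant guarantees refs only point deeper (older) in the stack; none of this is from any real collector:

```python
def live_objects(stack_top_first, roots):
    # One pass from the top of the stack: an object is live iff it is
    # a root or was already marked by something above it. Dead objects
    # never mark their referents, so mark and sweep fuse into one pass.
    marked = set(roots)
    live = []
    for obj_id, refs in stack_top_first:
        if obj_id in marked:
            live.append(obj_id)
            marked.update(refs)  # refs only point deeper, so this is safe
        # else: unreferenced from above -- swept immediately
    return live

stack = [
    ("c", ["a"]),  # top of stack: c references a
    ("b", []),     # nothing above references b -> garbage
    ("a", []),
]
print(live_objects(stack, roots={"c"}))  # prints ['c', 'a']
```

The invariant that references only point down the stack is what makes the single pass sound; with cycles or upward references (i.e. mutation) you'd need the usual multi-pass marking.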
That's an absurdly long way of saying I don't know the answer to the question, but it does show that there are good reasons why an implementation of a heapless architecture might pass by reference rather than by copy.