The whole point of the project was to go from "from what I've heard" to "I observe (and it's remarkable)": write non-trivial code, see it run, and look at the generated ASM.
You can git clone the project above on Debian/Ubuntu or a similar distribution, run "make", and look at the .lst listings of the generated assembly with opcode bytes.
I consistently observed that while z88dk produced extremely verbose code, shuffling values back and forth with many extra `push` and `pop` instructions, SDCC produced sane code.
Of course it helps a lot when you make all relevant variables `const`, declare all relevant pointers as `const type * const` when applicable, and use `uint8_t` instead of `int`. Otherwise the generated code is heavy because it pays the cost of a generality that is probably not needed. Help the compiler and, if it's a good compiler, it will help you. SDCC is a good compiler in this regard. This is especially important in the Z80 context because the Z80 does most of its arithmetic in 8-bit registers, while C's default `int` is 16 bits with SDCC.
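As a small illustration of the kind of declarations meant here, a sketch (the functions `checksum8` and `checksum_generic` are hypothetical, not from the project): the first takes a `const uint8_t * const` pointer and uses `uint8_t` for the count and the accumulator, which gives a Z80 compiler the chance to keep the loop in 8-bit registers; the second is the "generic" version with `int` everywhere.

```c
#include <stdint.h>

/* Hypothetical example: 8-bit additive checksum of a small buffer.
 * - data is "const uint8_t * const": neither the bytes nor the pointer change,
 * - len, i and sum are uint8_t: enough for buffers up to 255 bytes,
 *   so no 16-bit arithmetic is requested. */
static uint8_t checksum8(const uint8_t * const data, const uint8_t len)
{
    uint8_t sum = 0;
    for (uint8_t i = 0; i < len; ++i) {
        sum += data[i];          /* wraps modulo 256, as intended */
    }
    return sum;
}

/* The same computation written "generically": int everywhere, non-const pointer.
 * With SDCC's 16-bit int, this asks for 16-bit arithmetic for no benefit. */
static int checksum_generic(unsigned char *data, int len)
{
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum & 0xFF;
}
```

Both return the same value; the difference is only in what the compiler is allowed to assume, which is what shows up in the .lst listing.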
SDCC produced good code. The details are too long to explain here, but, for example, it allocates local variables to registers when possible, and the code is fairly close to hand-written code (like the code in the answers to the SO question).
For example, it replaced a `memset()` call with this inline code, which fits what one SO answer calls the "kind of code one writes where optimization does not matter", something z88dk could not do:
(you might have zeroed `a` and used `ld (hl),a`, or even `ldir`, but this is still correct code)
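The `ldir` variant mentioned above is the classic Z80 fill idiom, and it can be modelled in C: store the fill value in the first byte, then do a forward, byte-by-byte copy from `dst` to `dst + 1`, which is what `ldir` does (copy (HL) to (DE), increment both, decrement BC until it reaches zero). Because the source runs one byte behind the destination, each copied byte is the one just written. This is a sketch of the idiom, not SDCC's actual output, and `ldir_fill` is a hypothetical name. Note that `memmove` could not be used for this, since it is specified to behave as if copying through a temporary buffer, which defeats the trick.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of the Z80 fill idiom (asm version assumes n >= 2):
 *     ld (hl),a      ; first byte = value
 *     ld d,h
 *     ld e,l
 *     inc de         ; DE = HL + 1
 *     ld bc,n-1
 *     ldir           ; repeat: (DE) <- (HL), HL++, DE++, BC--
 * Hypothetical helper, for illustration only. */
static void ldir_fill(uint8_t *dst, uint8_t value, size_t n)
{
    if (n == 0) {
        return;                 /* a real LDIR with BC = 0 would copy 64 KiB */
    }
    dst[0] = value;             /* ld (hl),a */
    const uint8_t *hl = dst;    /* source pointer */
    uint8_t *de = dst + 1;      /* destination pointer, one byte ahead */
    for (size_t bc = n - 1; bc != 0; --bc) {
        *de++ = *hl++;          /* copies the byte written on the previous step */
    }
}
```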
The next one below looks easy. It is, provided the compiler can figure out what is constant and you write sane C code. Garbage C code yields garbage ASM.
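In it, the line `uint8_t column = 20 - ( sizeof( message ) - 1 ) / 2;` yields no code at all, zero bytes: `sizeof( message )` is known at compile time, so the whole expression is folded into a constant.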
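A C11 compiler can even be asked to prove this. In the sketch below, `message` is assumed to be a string array and the 20 is assumed to be the middle of a 40-column screen (the original definitions aren't shown); `MESSAGE_COLUMN` and `message_column` are hypothetical names. `_Static_assert` only accepts integer constant expressions, so if it compiles, the value is computed entirely at compile time.

```c
#include <stdint.h>

/* Assumed definition: the original "message" is not shown in the text. */
static const char message[] = "Hello, world!";   /* 13 characters + '\0' */

/* sizeof(message) includes the terminating '\0', hence the "- 1".
 * 20 is assumed to be the middle of a 40-column screen. */
#define MESSAGE_COLUMN (20 - (sizeof(message) - 1) / 2)

/* Accepted only because the expression is an integer constant expression,
 * i.e. it is fully evaluated by the compiler. */
_Static_assert(MESSAGE_COLUMN == 14, "13 chars centred on 40 columns start at column 14");

static uint8_t message_column(void)
{
    const uint8_t column = 20 - (sizeof(message) - 1) / 2;  /* the line from the text */
    return column;   /* typically compiles to loading the constant 14 */
}
```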
The `push de` is required by the C ABI (calling convention), not an inefficiency of the compiler.
Another Z80-specific consideration: the Z80 is not designed to access data on a stack. Compared with the 8080, it added the IX and IY index registers, so you can write `ld someregister, (ix+d)`, where d is a signed displacement from -128 to +127, and similar instructions. They are fairly handy (fairly compact code) but slow: `ld r,(ix+d)` takes 19 T-states. SDCC uses them extensively.
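To make the addressing mode concrete: the displacement in `(ix+d)` is a single signed byte, so one instruction can only reach 128 bytes below and 127 bytes above the address in IX, and the sum wraps around the 16-bit address space. A small C model of that computation (the helper name `ix_effective_address` is hypothetical):

```c
#include <stdint.h>

/* Model of Z80 indexed addressing (ix+d) / (iy+d):
 * d is the instruction's displacement byte, interpreted as two's complement
 * (-128..+127) and added to the 16-bit index register, modulo 65536. */
static uint16_t ix_effective_address(uint16_t ix, uint8_t d)
{
    const int16_t disp = (d < 0x80) ? (int16_t)d : (int16_t)(d - 0x100);
    return (uint16_t)(ix + disp);   /* wraps around like the CPU's 16-bit adder */
}
```

For example, with IX = 0x8000 the reachable range is 0x7F80 (d = 0x80, i.e. -128) to 0x807F (d = 0x7F, i.e. +127).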
The Z80 is more comfortable with code where everything is at a fixed address. Making a function's local variables `static` kills reentrancy, but then you can replace the slow `ld someregister, (ix+d)` with the faster and often shorter `ld a,(someaddress)` (13 T-states). I observed a 20% reduction in SDCC-generated code in a big algorithmic function, which is attributable both to the Z80's constraints and to SDCC's ability to navigate them decently. In complex functions, the code generated by SDCC is not extremely nice, but it's decent: good enough that only the performance-critical parts have to be written in assembly.
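The reentrancy cost can be seen in plain C. In this sketch (both functions are hypothetical), `sum_auto` keeps its local in the stack frame, which on SDCC/Z80 typically means IX-relative accesses, while `sum_static` keeps it at a fixed address, which allows `ld a,(someaddress)`-style access but shares the variable between all active calls. Any recursion, or a call from an interrupt handler while the function is running, then overwrites it:

```c
#include <stdint.h>

/* Automatic local: each activation gets its own copy (stack frame). */
static uint16_t sum_auto(uint8_t n)
{
    uint8_t local = n;
    if (n == 0) {
        return 0;
    }
    const uint16_t rest = sum_auto(n - 1);
    return rest + local;             /* still n: this call's own copy */
}

/* Static local: one fixed address shared by every activation.
 * Cheaper to access on the Z80, but the function is no longer reentrant. */
static uint16_t sum_static(uint8_t n)
{
    static uint8_t local;
    local = n;
    if (n == 0) {
        return 0;
    }
    const uint16_t rest = sum_static(n - 1);
    return rest + local;             /* overwritten by the nested calls: now 0 */
}
```

`sum_auto(3)` gives 6, while `sum_static(3)` gives 0. A function that is never recursive and never re-entered from an interrupt loses nothing by the change, which is why it can pay off in a big algorithmic function.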
On another note, SDCC supports most (all?) of C11. I've tested it: you can, e.g., write `print(var)` much as you'd write `std::cout << var` in C++, and let the compiler do what you mean. That's unusual in a microcontroller context.
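The text doesn't show how `print(var)` was written; C11's `_Generic` is the standard mechanism for this kind of compile-time, type-based dispatch, so here is a minimal sketch under that assumption. `print`, `print_u8`, `print_int` and `print_str` are hypothetical names, and the sketch assumes a working `printf`; on a Z80 target, where `printf` output goes depends on the platform's library.

```c
#include <stdint.h>
#include <stdio.h>

/* One function per supported type; each returns printf's character count. */
static int print_u8(uint8_t v)       { return printf("%u (uint8_t)\n", (unsigned)v); }
static int print_int(int v)          { return printf("%d (int)\n", v); }
static int print_str(const char *s)  { return printf("%s (string)\n", s); }

/* C11 _Generic selects the function from the static type of the argument,
 * at compile time, much like overload resolution for std::cout << var. */
#define print(x) _Generic((x),      \
        uint8_t:      print_u8,     \
        int:          print_int,    \
        char *:       print_str,    \
        const char *: print_str)(x)
```

Adding a type means adding one line to the macro; an argument type with no matching entry is a compile-time error rather than a silent conversion.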