That's because y is never used, so why should the pointer be dereferenced? If you call bar(y) instead of just bar(), "x86-64 clang 3.9.1" with -O3 does the load as well (but after the check)
foo(int*): # @foo(int*)
test rdi, rdi
je .LBB0_1
mov edi, dword ptr [rdi]
jmp bar(int) # TAILCALL
.LBB0_1:
ret
Only GCC does the kind of aggressive optimization the article mentions (and might need to be tamed by -fno-delete-null-pointer-checks).
Ha... yes, a good point. That's a reasonable reason not to generate the load, and stupid me for not noticing. What's also dumb is that my eyes just glossed over the ud2 instruction that both put in main too. The program (not unreasonably) won't even run properly anyway.
gcc does seem to be keener than clang to chop bits out - I think I prefer clang here. But let's see how I feel if I encounter this in anger in a non-toy example ;) I must say I'm still a bit suspicious, but I can't really argue that this behaviour is especially surprising here, or difficult to explain.