> 1. The value of a pointer to an object whose lifetime has ended remains the same as it was when the object was alive.
This does not help anyone; making this behavior defined is stupid, because it prevents debugging tools from identifying uses of these pointers as early as possible. In practice, existing C compilers behave like this anyway: though any use of the pointer (not merely dereferencing it) is undefined behavior, copying the value around does work.
> 2. Signed integer overflow results in two’s complement wrapping behavior at the bitwidth of the promoted type.
This seems like a reasonable request, since only museum machines do not use two's complement. However, by making this programming error defined, you interfere with the ability to diagnose it. C becomes friendly in the sense that assembly language is friendly: things that are not necessarily correct have a defined behavior. The problem is that people then write code which depends on this. Then, when they do want overflow trapping, they will have to deal with reams of false positives.
The solution is to have a declarative mechanism in the language whereby you can say "in this block of code, please trap overflows at run time (or even compile time if possible); in this other block, give me two's comp wraparound semantics".
> 3. Shift by negative or shift-past-bitwidth produces an unspecified result.
This is just a matter of word semantics. Undefined behavior, unspecified behavior: either way it spells nonportable. Unspecified behavior may seem better because it cannot outright fail. But, by the same token, it won't be diagnosed either.
A friendly C should remove all gratuitous undefined behaviors, like ambiguous evaluation orders, and diagnose as many of the remaining ones as possible: especially those which are errors.
Not all undefined behaviors are errors. Undefined behavior is required so that implementations can extend the language locally (in a conforming way).
One interpretation of ISO C is that calling a nonstandard function is undefined behavior. The standard doesn't describe what happens, no diagnostic is required, and the range of possibilities is very broad. If you put "extern int foo()" into a program and call it, you may get a diagnostic like "unresolved symbol foo". Or a run-time crash (because there is an external foo in the platform, but it's actually a character string!) Or you may get the expected behavior.
> 4. Reading from an invalid pointer either traps or produces an unspecified value. In particular, all but the most arcane hardware platforms can produce a trap when dereferencing a null pointer, and the compiler should preserve this behavior.
The claim here is false. Firstly, even common platforms like Linux do not actually trap null pointers. They trap accesses to an unmapped page at address zero. That page is often as small as 4096 bytes. So a null dereference like ptr[i] or ptr->memb where the displacement goes beyond the page may not actually be trapped.
Reading from invalid pointers already has the de facto behavior of reading an unspecified value or else trapping. The standard makes it formally undefined, though, and this only helps: it allows advanced debugging tools to diagnose invalid pointers. We can run our program under Valgrind, for instance, while the execution model of that program remains conforming to C. We cannot valgrind the program if invalid pointers dereference to an unspecified value, and programs depend on that; we then have reams of false positives and have to deal with generating tedious suppressions.
> 5. Division-related overflows either produce an unspecified result or else a machine-specific trap occurs.
Same problem again, and this is already the actual behavior: possibilities like "demons fly out of your nose" do not happen in practice.
The friendly thing is to diagnose this, always.
Carrying on with a garbage result is anything but friendly.
> It is permissible to compute out-of-bounds pointer values including performing pointer arithmetic on the null pointer.
Arithmetic on null works on numerous compilers already, which use it to implement the offsetof macro.
> memcpy() is implemented by memmove().
This is reasonable. The danger in memcpy not supporting overlapped copies is not worth the microoptimization. Any program whose performance is tied to that of memcpy is badly designed anyway. For instance if a TCP stack were to double in performance due to using a faster memcpy, we would strongly suspect that it does too much copying.
> The compiler is granted no additional optimization power when it is able to infer that a pointer is invalid.
That's not really how it works. The compiler assumes that your pointers are valid and proceeds accordingly. For instance, aliasing rules tell it that an "int *" pointer cannot be aimed at an object of type "double", so when that pointer is used to write a value, objects of type double can be assumed to be unaffected.
C compilers do not look for rule violations as an excuse to optimize more deeply, they generally look for opportunities based on the rules having been followed.
> When a non-void function returns without returning a value, an unspecified result is returned to the caller.
This just brings us back to K&R C before there was an ANSI standard. If functions can fall off the end without returning a value, and this is not undefined, then again, the language implementation is robbed of the power to diagnose it (while remaining conforming). Come on, C++ has fixed this problem, just look at how it's done! For this kind of undefined behavior which is erroneous, it is better to require diagnosis, rather than to sweep it under the carpet by turning it into unspecified behavior. Again, silently carrying on with an unspecified value is not friendly. Even if the behavior is not classified as "undefined", the value is nonportable garbage.
It would be better to specify it as zero than to leave it unspecified: falling off the end of a function that returns a pointer causes it to return null; off the end of a function that returns a number, 0 or 0.0 in that type; off the end of a function that returns a struct, a zero-initialized struct; and so on.
Predictable and portable results are more friendly than nonportable, unspecified, garbage results.
I believe you have confused undefined behavior and implementation defined behavior. Undefined behavior means that the code is not legal, and if the compiler encounters it, it is allowed to eliminate it, and all consequential code. (The linked posts and papers have lots of examples of this.)
Implementation defined behavior means that the code is legal, but the compiler has freedom to decide what to do. It, however, is not allowed to eliminate it.
After 20 years of comp.lang.c participation, it's unlikely that I'm confusing UB and IB.
Undefined behavior doesn't state anything about legality; only that the ISO C standard (whichever version applies) doesn't impose a requirement on what the behavior should be.
Firstly, that doesn't mean there doesn't exist any requirement; implementations are not only written to ISO C requirements and none other. ISO C requirements are only one ingredient.
Secondly, compilers which play games like what you describe are not being earnestly implemented. If a compiler detects undefined behavior it should either diagnose it or provide a documented extension. Any other response is irresponsible.
The unpredictable actual behaviors should arise only when the situation was ignored. In fact it may be outright nonconforming.
The standard says:
"Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."
A possible interpretation of the above statement is that the unpredictable results occur only if the undefined behavior is ignored (i.e. not detected).
If some weird optimizations are based on the presence of undefined behavior, they are in essence extensions, and must be documented. This is because the situation is being exploited rather than ignored, and the program isn't being terminated with or without a diagnostic. That leaves "behaving in a documented manner".
But optimizing based on explicitly detecting undefined behavior is not a legitimate extension. It is simply insane, because the undefined behavior is not actually being defined. There is no extension there, per se. Optimization requires abstract semantics, but in this situation, there aren't any; the implementation is taking C that has no (ISO standard) meaning, it is not giving it any meaning, and yet, it is trying to make the meaningless program go faster. Doing all this without issuing a diagnostic is criminal.
I don't think the GCC people are really doing this; people only think they are. Rather, they are writing optimizations which assume that behavior is not undefined, which is completely different. The potential there is to be over-zealous: to forget that GCC is expected to be consistent from release to release, that it preserves its set of documented extensions, and even some of its undocumented behaviors. Not every behavior in GCC that is not documented is necessarily a fluke. Maybe it was intentional, but failed to be documented properly.
Compiler developers must cooperate with their community of users. If 90% of the users are relying on some undocumented feature, the compiler people must be prepared to make compromises. Firstly, revert any change which breaks it, and then, having learned about it, try to avoid breaking it. Secondly, explore and discuss this behavior to see how reliable it really is (or under what conditions). See whether it can be bullet-proofed and documented. Failing that, see if it can be detected and diagnosed. If such a behavior can be detected and diagnosed, then it can be staged through planned obsolescence: for a few compiler releases, there is a diagnostic, but the behavior keeps working. Then it stops working, and the diagnostic can change to a different one, which merely flags the error.
But optimizing based on explicitly detecting undefined behavior is not a legitimate extension. It is simply insane, because the undefined behavior is not actually being defined.
The authors of this proposal agree, and are trying to avoid such situations by just eliminating undefined behavior. Their blog posts and academic papers (linked from the blog post in this submission) have many examples where such insanity has happened.