Hacker News

Excuses are provided in http://blog.llvm.org/2011/05/what-every-c-programmer-should-... . But they are just that, excuses.

To summarize at least one of them, the compiler doesn't really see it as “detecting undefined behavior and optimizing accordingly”. It sees it as doing the right thing for all defined behaviors. The sort of imprecise analysis it does leads it to consider plenty of possible undefined behaviors, many of which cannot happen in real executions. It ignores these as a matter of fact, but reporting them would not tell the programmer anything they don't already know, and would be perceived as noise.

On the example for (int i=1; i==0; i++) …, the compiler does not infer that i eventually overflows (undefined behavior). It infers that i is always positive, and thus that the condition is always false.



How about using statistics/machine learning and showing these spam warnings only when people want them? Yes, it is hard, but this is not an excuse!

Besides, fixing spammy warnings shouldn't be more difficult than fixing actual spam! I mean, come on, with spam you have intelligent adversaries and compilers haven't reached that level. Not yet, anyway...


-Wi_want_undefined_behavior_spam_please_thanks


Overflows are not undefined. They are overflows. Maybe I want to overflow on purpose. Your for loop (int is signed) will complete assuming the body of the loop doesn't manipulate i, and given enough time.


pascal_cuoq is correct. Here's an excellent introduction to the subject:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-...


Unsigned overflow is well defined (it wraps modulo 2^N for an N-bit type), but signed overflow is undefined behavior.
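To make the unsigned half of that concrete, here is a minimal sketch. The function names `next_unsigned` and `unsigned_add_wraps` are made up for illustration; the only thing relied on is the standard's rule that unsigned arithmetic is reduced modulo `UINT_MAX + 1`.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: unsigned arithmetic wraps modulo UINT_MAX + 1,
   so next_unsigned(UINT_MAX) is 0u by definition, not undefined. */
unsigned int next_unsigned(unsigned int u) {
    return u + 1u;
}

/* Hypothetical helper: a well-defined "did it wrap?" test. After a wrap
   the sum is smaller than either operand, so comparing with a is enough. */
bool unsigned_add_wraps(unsigned int a, unsigned int b) {
    return a + b < a;
}
```

The same `a + b < a` comparison written with signed `int` is the one a compiler may legally fold to false.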


This has to be one of the most irritatingly pedantic aspects of C, as the vast majority of systems use 2's complement and so would overflow in the same way, but the compiler writers think it's an "opportunity for optimisation" and I think this ends up causing more trouble than the optimisations are worth. The only sane interpretation of something like

    if(x + 1 < x)
     ...
is an overflow check, but silently "optimising away" that code because of the assumption that signed integers will never overflow is just horribly hostile and absolutely idiotic behaviour in my opinion. A sensible and pragmatic way to fix this would be to update the standard to define signed overflow, and maybe add a macro that is defined only on non-2s-complement platforms.
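For what it's worth, the check can be written so that it never performs the overflowing addition, which keeps it inside defined behavior and so leaves the optimizer nothing to remove. A rough sketch, with `add_would_overflow` being a name invented here rather than anything from the standard library:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a + b would overflow int,
   without ever computing a + b. Every subtraction below stays in range:
   INT_MAX - b is only evaluated for b > 0, INT_MIN - b only for b < 0. */
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false; /* adding zero never overflows */
}
```

With this, `if (x + 1 < x)` becomes `if (add_would_overflow(x, 1))`, which for a constant 1 boils down to `x == INT_MAX`.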


There is one solution that would keep the crazy semantics of C, but would still allow for 2's complement arithmetic to be well-defined when one wants to.

C99 defines int8_t, if it exists, to be a 2's complement signed integer of exactly 8 bits. Same for 16, 32, etc. The standard could very well define behavior on overflow for these (that is, turn them into actual types instead of typedefs), and leave int, long, etc. alone. I think this would be a viable, realistic solution. Integer conversions would probably still be a pain, though.


That makes sense (sort of). Better to use unsigned if you are trying to do modular arithmetic.
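A sketch of what "use unsigned" can look like when the values you hold are signed: do the addition in `uint32_t`, then copy the bits back. `wrap_add_i32` is a made-up name. The `memcpy` is there because converting an out-of-range `uint32_t` value to `int32_t` with a cast is implementation-defined in C99, whereas reinterpreting the bits is fine given that `int32_t` is required to be 2's complement with no padding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit signed addition that wraps instead of
   invoking undefined behavior. The sum is computed modulo 2^32 in
   unsigned arithmetic, then the bit pattern is reinterpreted as int32_t. */
int32_t wrap_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;
    memcpy(&result, &sum, sizeof result);
    return result;
}
```

This only compiles where the exact-width types exist, which is the same condition the parent comment relies on.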

Signed integers have some weirdness attached. The number that's one followed by all zeroes in binary (INT_MIN in limits.h) is defined as negative, because the sign bit is set. But, the rules for 2's complement arithmetic predict that -INT_MIN == INT_MIN. So it's not a normal number.


Similarly amusing, INT_MIN / -1 will throw a "division by zero" on Intel CPUs, even though there isn't a zero anywhere in sight. INT_MIN * -1 is fine, of course (according to the CPU, even if not the language spec).


INT_MIN / -1 works fine for me on amd64 with gcc 4.9. It produces INT_MIN, just as you would expect. INT_MIN * -1 is also INT_MIN.

  #include <limits.h>
  #include <stdio.h>
  
  void main(void) {
          int x = INT_MIN;
          printf("INT_MIN = %d\n", x);
          printf("INT_MIN * -1 = %d\n", x * -1);
          printf("INT_MIN / -1 = %d\n", x / -1);
  }


    $ clang --version
    Debian clang version 3.5-1~exp1 (trunk) (based on LLVM 3.5)
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    $ clang -o intmin_c -std=c99 intmin.c 
    $ ./intmin_c
         INT_MIN = -2147483648
    INT_MIN * -1 = -2147483648
    Floating point exception

    $ gcc --version
    gcc-4.7.real (Debian 4.7.2-5) 4.7.2
    (...)
    $ gcc -o intmin -std=c99 intmin.c 
    $ ./intmin
         INT_MIN = -2147483648
    INT_MIN * -1 = -2147483648
    INT_MIN / -1 = -2147483648

    $ tcc -v
    tcc version 0.9.25
    $ tcc -run intmin.c 
         INT_MIN = -2147483648
    INT_MIN * -1 = -2147483648
    Floating point exception

    $ cat intmin.c 
    #include <limits.h>
    #include <stdio.h>
      
    int main(void) { // Change from void to int for c99
      int x = INT_MIN;
      printf("     INT_MIN = %d\n", x);
      printf("INT_MIN * -1 = %d\n", x * -1);
      printf("INT_MIN / -1 = %d\n", x / -1);
      return 0; // return 0 - c99
    }

So, is this a gcc thing?


Your arithmetic is being optimized out by the compiler; https://ideone.com/KmTSUB crashes for example.


Looks like you're right (re my sibling comment above):

    $ gcc -std=c99 -S intmin.c -o intmin.s
    $ clang -std=c99 -S intmin.c -o intmin.sc
    $ grep -i div intmin.s*
    intmin.sc:      idivl   %esi

(No idivl in the gcc version)


This is a good reminder that "undefined behavior" includes "doing what I want it to do".

It's also a good example of how undefined behavior allows for optimizations. The compiler is able to evaluate your expression at compile time rather than emitting a division instruction, even though this changes how the program behaves.

C sure is fun.


Because it's undefined, the compiler is allowed to do anything: including giving the expected result.
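If you would rather not depend on which of those outcomes you get, one option is to reject the two problematic divisions before they happen. A minimal sketch, assuming 2's complement `int` (so that `INT_MIN / -1` is the only quotient that doesn't fit); `checked_div` is a name invented for this example:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: divides num by den only when the result is
   representable. Returns false, leaving *out untouched, for division
   by zero and for INT_MIN / -1. */
bool checked_div(int num, int den, int *out) {
    if (den == 0) {
        return false;
    }
    if (num == INT_MIN && den == -1) {
        return false;
    }
    *out = num / den;
    return true;
}
```

Since the guarded division can no longer overflow, it shouldn't matter whether the compiler folds it at compile time or emits an actual idiv.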


In some programs I like to have something like this to catch potential bugs:

    int abs_s(int x) {
        assert(x != INT_MIN);
        return (x >= 0) ? x : -x;
    }


This assumes two's complement, though. With one's complement even `INT_MIN` would be fine to negate ;)

(On that note: Are there even machines left to write code for that don't use two's complement? Or 8 bits per byte?)


It has been mentioned on HN before that there are DSPs that use the same size for byte, short, and int, and that size is not 8 bits.
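Code that wants to cope with such targets can ask the implementation instead of assuming 8. A small sketch (the function name `storage_bits_in_int` is made up); note that this counts storage bits, which may include padding bits on exotic platforms:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of bits of storage an int occupies.
   CHAR_BIT is the number of bits in a byte (at least 8 per the
   standard), and sizeof counts in those bytes. */
size_t storage_bits_in_int(void) {
    return sizeof(int) * CHAR_BIT;
}
```

On a DSP of the kind described, `sizeof(int)` could be 1 while `CHAR_BIT` is 16 or 32.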


Pascal is the developer of Frama-C, a static analyser for C.



