
If anyone is curious, on gcc 12.3.0 and clang 16.0.6 (x86_64), the answers are what most people (who have written lots of C) would expect:

1) 8

2) 0

3) 160

4) 1 (both clang and gcc output a warning)

5) 2 (only clang outputs a warning)

While I like the idea of this quiz, I think it would be more powerful if it provided examples of compilers / architectures where these are not the correct answers. (I also think thorough unit tests would catch most of these errors)



The author is making a deliberate point about undefined behaviour in the article. Hence them not executing worked examples.

In fact, by not doing so they are making a subtle implicit statement that it is uninteresting to consider actually attempting to execute these snippets.

The third paragraph of the "P.S" of the article (you have to press submit to see it) is the one that really gives the game away.


Most of these things are implementation defined rather than undefined. Only the 5th is undefined.


More than implementation defined, for some you need context that simply isn't given. On the ones with mixed-type structs, even if you know what system it's compiled for, you don't know whether someone has used #pragma pack(1) to byte-pack the data instead of relying on the default layout. Just seeing the struct, you still don't know.


Good point, although that is not part of standard C.


'#pragma pack' isn't part of the C standard, but #pragma is and "causes the implementation to behave in an implementation-defined manner."


I agree that in theory it would be cool to have C code that uses only defined behavior and works on all platforms for all eternity. However, I think most programs have a fairly clear understanding of what platforms (OS+arch) they are targeting and what compilers they are using to target those platforms.

If the compiler has defined behavior on all of these platforms (and you have unit tests for that behavior), I don't think it is a huge deal. (Ideally you wouldn't rely on it... but sometimes it's an accident or unavoidable.)

As an example, while struct padding (problem 1) might not technically be in the spec, it is a cornerstone of FFI and every new compiler (that supports C FFI) has a way to compile structs with the same padding.

To my original point, if the article had instead given examples of compilers + architectures that produce different answers, I might feel differently. However, just mentioning that these weird edge cases are undefined (in the spec) doesn't mean much to me.


My answers for 2, 3 and 5 were different:

2) I thought the type would be promoted to short. It turns out the operands of the arithmetic operation are promoted to int, so the result is an int.

3) The signedness of char is platform dependent. It is signed on x86 and amd64, but unsigned on some other architectures (ARM, for instance). After seeing my mistake, I would expect the answer to be -96 on amd64 from sign extension when it is converted to an integer, yet it is 160, which is what I would have expected from a platform where char is unsigned. If anyone knows why it is 160 here, please let me know.

5) This is a classic. I knew to answer "I do not know" because, despite C having operator precedence rules, it famously leaves this undefined. I have no idea why the standard does this when there is a clearly right answer. Java, for example, defines the evaluation order so this has only one right answer.


I decided to try #3 for myself. The results are interestingly inconsistent. If you cast a to int and print it, it comes out -96. But the shell reports it as 160. Godbolt clearly shows it returning -96 (movsx should sign-extend it), so I don't know what's happening. https://godbolt.org/z/9rxcnM3G3


Replying to myself. I did some digging and figured it out: the shell itself truncates the return value to an 8-bit unsigned number. If you have a simple program that consists of only "return -96;" the shell will still report a return value of 160.


Can you explain 3? A space is 20 IIRC, and 20 * 13 is 260. An unsigned char tops out at 255, but I guess this one is signed so... that's 127. And then I have no idea what happens, some kind of overflow, but I don't know the wrapping rules.


' ' in original C is the encoding for space.

This might be ASCII or EBCDIC or something else local to a specific hardware implementation.

https://en.wikipedia.org/wiki/EBCDIC

So, maybe 0x20, maybe 0x40, maybe something else.

At least you know that '0', '1', ..., '9' are contiguous.


I don't know if wrapping rules are defined by the standard or implementation defined. But the easiest thing for a compiler to implement is simple truncation. A space is 0x20 (32 decimal) in most C compilers, so multiplying it by 13 is 416. Truncating that to 8 bits, the size of char on most compilers, is 160 (0xa0). If char is signed, the upper bit being set will cause it to be a negative number -96. Promotion of the char to int won't change its value.

There are a huge number of assumptions in that simple chain of events, and if any of them are wrong you get a different answer.


I made the same mistake having worked with URL encoding for so long. " " is 20...in hex.


" " is a C string constant ... so a space encoding followed by a NUL encoding.


It's 0x20 like others pointed out.

The rule here isn't really wrapping: the multiplication happens in int, so there's no overflow; it's the conversion of the out-of-range result back to a signed char that is implementation-defined. (Signed integer overflow in the arithmetic itself would be UB, but there is none here.)


Assuming ASCII char encoding ... which isn't a given in C, just extremely commonplace.


The ASCII values being assigned to characters is a famous example of one C compiler passing knowledge on to a compiler it builds, without it ever being specified in the source code. Given that, I am surprised to hear it ever is anything different.


EBCDIC persisted later than many might expect - to 1990 in legacy hardened IBM System/360's used in air traffic and defence (branded as IBM 9020's IIRC).

Early C compiler projects (eg: The Hendrix Small-C of ~1982) would get patched by some to support the full C language and extended to cross compile to and from whatever machines were about at the time, System/360's, VAX, PDP's, early PC's, BBC micros, etc.

It wasn't always the case that char encoding was passed on by default; there was always the option to insert a translation table, whether compiling or dealing with data stored in a non-native form (similar to big-endian vs little-endian data).


' ' is 0x20 or 32.


or 0x40 .. or something else.

https://en.wikipedia.org/wiki/EBCDIC


The case with multiple increments in an expression might produce different results depending on optimization level, perhaps not in this case but in other cases. That is because the compiler is allowed to use any order, so the order it picks might depend on what is in the registers.


5) How does 2 make sense? Shouldn't it be 0 + 1? Or does the pre-increment take precedence over the addition, so the left i is 1, but not because of the post-increment?


To get 2, there are (at least) a couple of ways it can happen, we can do i=0,i++ and get LHS=0, now i=1,++i and get RHS=2. Or we can do i=0,++i and get RHS=1, then i=1,i++ and get LHS=1.

However we’re also allowed to do something like this: i=0, a=i, b=i, b=b+1, RHS=b (RHS=1), LHS=a (LHS=0), a=a+1, i=a, i=b.

Probably quite a lot of other things are allowed to happen. Usual disclaimer that a standard-compliant compiler is allowed to vaporise your cat etc as part of UB.

The thing to Google is “sequence points”.



