Sure, that's an AVR platform where structs are packed and ints are 16 bits, and ...

dzaima · on Sept 17, 2023

But, thus, upon seeing the program, without additional information you cannot know whether it'll give 8 or 3 (or who knows what else), and thus "I don't know." is very much an appropriate answer. Of course, it might be more clear to say "it depends" or "there isn't enough information to answer", but I'd say "don't know" is nevertheless a correct answer. Definitely more correct than "8" at least.

"It's not doing anything weird" holds only for people who already know the C standard inside out. But there is a significant number of people who might not know everything about C (or might know that there are some weird things about types, but still assume 'int' will be at least 4 bytes or something).

It's of course not a question of much practical impact (for anyone not working in embedded at least) but it's nevertheless one that can be at least interesting to some.

rwmj · on Sept 17, 2023

The only time you're writing sizeof(x) is when you want to know the size of 'x', e.g. to store it somewhere else or zero out the memory or something of that sort. And it gives the right answer, great! It doesn't ever do something that's undefined or strange, and it's not an obscure part of the C language.

gjm11 · on Sept 17, 2023

It seems like you're responding to something that isn't there -- an accusation along the lines of "... and therefore C is a bad language" or "... and therefore sizeof is poorly designed" or "... and it's bad that the correct answer is 'I don't know'".

The author isn't, so far as I can tell, making any such claim.

He's claiming only this: many people who program in C (or C++, which in this particular respect is the same) think they know that sizeof(...) will be 8, or think they know that sizeof(...) will be 5, and all those people are wrong, because it could be either of those things or various other things too, and there are contexts in which you need to be aware that the assumptions you're inclined to make around this sort of code are wrong.

All of which is straightforwardly correct, so far as I can tell.
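
For illustration, a minimal sketch of why the number can't be read off the source. The struct here is a hypothetical stand-in (the article's actual program isn't reproduced in this thread), and `example_padding` is a made-up helper name. It reports how many bytes of the struct are padding rather than members, which is exactly the part the compiler and ABI decide:

```c
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical stand-in for the struct in the quiz. */
struct example {
    int  a;
    char b;
};

/* Bytes of the struct not occupied by its two members, i.e. padding.
   Commonly 3 with a 4-byte int and natural alignment, and 0 where
   structs are packed; the C standard fixes neither value. */
static size_t example_padding(void)
{
    return sizeof(struct example) - (sizeof(int) + sizeof(char));
}
```

The only things the standard does promise here are that `a` sits at offset 0 and that the struct is at least as large as its members combined.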

As the author says, the question is really about struct padding more than it's about sizeof. It most likely doesn't matter that much whether or not someone knows that sizeof(...) might not be 8 in this situation. But it might matter if, e.g., they read the docs for some binary file format and see that it looks like

  offset type   name
  0000   int    block_size
  0004   char   record_type
  0008   int    user_id
  000C   int    unit_id
  0010   double radiation_level
  0018   int    timestamp

and think "aha, I'll make this neater" and write

  struct protocol_block {
    int block_size;
    char record_type;
    int user_id;
    int unit_id;
    double radiation_level;
    int timestamp;
  };

and

  fread(...);
  struct protocol_block * block = buffer;
  int uid = block->user_id;

without being aware that they are making assumptions about what their compiler does with structs (and also about endianness, and other things).
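
A hedged sketch of the alternative: pull each field out of the raw bytes at its documented offset instead of overlaying a struct. The offsets come from the table above; everything else is an assumption for illustration, namely that the format is little-endian, that the trailing timestamp is 4 bytes (so a block is 0x1C bytes), and the helper names `read_u32_le` and `parse_user_id`.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets copied from the format table; the constant names are hypothetical. */
enum {
    OFF_BLOCK_SIZE  = 0x00,
    OFF_RECORD_TYPE = 0x04,
    OFF_USER_ID     = 0x08,
    OFF_UNIT_ID     = 0x0C,
    BLOCK_BYTES     = 0x1C  /* assumes a 4-byte timestamp at 0x18 */
};

/* Assemble a 32-bit value from four bytes, least significant byte first.
   Little-endian is an assumption; the format docs would have to confirm it. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Extract user_id from a raw block. Returns 0 on success, or -1 if the
   buffer is too short to hold a whole block. */
static int parse_user_id(const unsigned char *buf, size_t len, uint32_t *out)
{
    if (len < BLOCK_BYTES)
        return -1;
    *out = read_u32_le(buf + OFF_USER_ID);
    return 0;
}
```

Written this way, the result no longer depends on the compiler's struct layout, the host's endianness, or the width of 'int'.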

Buttons840 · on Sept 17, 2023

You originally said:

> The first case, for example, returns 8 under any reasonable compiler

That is wrong. This part of C is apparently obscure enough that people make false assumptions, like the one of yours I quoted.

After you made that claim, someone provided a case where GCC returns something other than 8 and you edited your statement. Again, your statement was incorrect until you edited it, and so there must be some obscurity involved.

dzaima · on Sept 17, 2023

sizeof isn't an obscure part of the language, yes, but the specific behavior here might still be unexpected for a decent number of people, who might, say, think they can always use "sizeof(int)" and "4" interchangeably to shorten code (which could very well be true for all platforms they will ever care about, but nevertheless is not a property guaranteed by C itself).

AnimalMuppet · on Sept 17, 2023

I am old. I remember when sizeof(int) was usually 2. And I may live long enough to see sizeof(int) typically be 8.

The 16-to-32 bit transition broke a lot of code that assumed sizeof(int) was 2. The next transition may do the same. (Or, we may keep "int is 32 bits" forever, and use long for 64. Who knows? I don't. You probably don't, either, so don't assume that sizeof(int) = 4.)

dzaima · on Sept 17, 2023

The present behavior on normal 32-bit and 64-bit platforms is that, on both, 'int' is 4 bytes, and 'long long' is 8 bytes; and on 64-bit, whether 'long' is 4 or 8 bytes depends on the ABI/target OS.

I'd imagine it's quite likely that 'int' stays 4 bytes even on hypothetical 128-bit CPUs - there's not much reason to change it, as on 64-bit it's already less than the CPU width, and thus is pretty much arbitrary even today. But yes, anything that wants a 4-byte/32-bit integer should just use <stdint.h>'s int32_t.
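
To make that concrete, a small sketch (the function names are hypothetical): the width of int32_t can be stated as a compile-time fact, while the width of plain 'int' can only be queried for the current target.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int32_t */

/* Where int32_t exists, it is exactly 32 bits wide with no padding bits,
   so this C11 compile-time check holds on every such target. */
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");

/* Width of int32_t in bits; the same on every target that provides it. */
static unsigned int32_bits(void)
{
    return (unsigned)(sizeof(int32_t) * CHAR_BIT);
}

/* Width of plain int in bits on this target (including any padding bits).
   The standard only guarantees at least 16. */
static unsigned int_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}
```

Code that genuinely depends on a 4-byte 'int' could add its own `_Static_assert(sizeof(int) == 4, "...")` so the assumption fails loudly at compile time rather than silently at run time.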

lelanthran · on Sept 17, 2023

> But, thus, upon seeing the program, without additional information you cannot know whether it'll give 8 or 3 (or who knows what else), and thus "I don't know." is very much an appropriate answer.

Maybe it is appropriate, but it's underhanded.

I mean if someone gave the single line of Go code:

     s := x + y

And then, when you said "that adds two numbers", that someone replied "Hah! Gotcha! The answer is 'I don't know'"

It's puerile.

stephen_g · on Sept 17, 2023

I don’t get what you’re saying - what’s underhanded about saying structure packing can have unexpected results across platforms, architectures, compilers and even compiler flags?

The point is that you could only ever say for sure what the answer will be if all that is exactly specified, otherwise your assumption could be wrong.