
In C an index is a difference that you add to a pointer to get a pointer: `a[i]` is `*(a + i)`. Given two indices `i` and `j`, you want `i - j` to be such that `a[j + (i - j)]` is `a[i]`, and it then makes sense to me that `i - j` is signed. The expression works out whether they are signed or unsigned, but signedness matters for how a user interprets the value (e.g. "oh, this is 2 elements before, because it says -2") and for making comparisons like `i < j` equivalent to `i - j < 0`. That's why it's always made sense to me to use `ptrdiff_t` (or just `int`) for an index, rather than `size_t`.
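For instance, all three readings agree when the index type is signed (a minimal sketch; the array and values are illustrative):

  #include <assert.h>
  #include <stddef.h>

  int main(void) {
    int a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    ptrdiff_t i = 5, j = 7;

    assert(a[i] == *(a + i));        /* a[i] is *(a + i) */
    ptrdiff_t d = i - j;             /* -2: "2 elements before" */
    assert(&a[j + d] == &a[i]);      /* a[j + (i - j)] is a[i] */
    assert((i < j) == (i - j < 0));  /* i < j agrees with i - j < 0 */
    return 0;
  }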


ptrdiff_t exists because subtraction between pointers can produce negative values. But how many times have you ever needed to subtract p and q where p points to an array element at a lower index than q? For that matter, how many times have you ever needed to add a negative integer to a pointer?

In C an object can be larger than PTRDIFF_MAX, a real possibility in modern 32-bit environments. (Some libcs have been modified to fail malloc invocations that large, but mmap can still hand you such an object.) Because the result of pointer subtraction has type ptrdiff_t, the expression &a[n] - a produces undefined behavior when n > PTRDIFF_MAX. But a + n is well defined for all positive n (signed or unsigned) as long as the size of a is >= n.
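A sketch of the hazard and the guard it forces (whether an allocation that large succeeds is platform-dependent; the point is the types):

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
    size_t n = (size_t)PTRDIFF_MAX + 1;  /* larger than any representable difference */
    char* a = malloc(n);                 /* many libcs refuse; mmap may not */
    if (!a) return 0;

    char* end = a + n;                   /* well defined: pointer + unsigned size */
    if (n <= PTRDIFF_MAX) {              /* end - a is only safe when it fits... */
      ptrdiff_t d = end - a;
      printf("%td\n", d);
    }                                    /* ...here it doesn't, so we must not subtract */
    free(a);
    return 0;
  }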

There's an asymmetry between pointer-pointer arithmetic and pointer-integer arithmetic; they behave differently and have different semantics. Pointers are a powerful concept, but like most powerful concepts the abstraction can leak and produce aberrations. I realize opinions vary on whether to prefer signed or unsigned indices and object sizes (IME, the camps tend to split along C vs. C++ lines), but the choice shouldn't be predicated on the semantics of C pointers, because those semantics alone don't favor one over the other.


Negative offsets are often used to access fields in a parent struct when you only have a pointer to one of its fields, for example to implement garbage collection or a string type.


That should be:

  Parent* p = container_of(field, Parent, pa_somefield);
  access(p->pa_otherfield);
You'd usually define container_of using subtraction (not negative offset per se):

  #define container_of(FIELD, TYPE, MEMB) ({ \
    const typeof( ((TYPE*)0)->MEMB )* _mptr = (FIELD); \
    (TYPE*)( (char*)_mptr - __builtin_offsetof(TYPE, MEMB) ); \
  })
but you shouldn't actually be writing that subtraction by hand, because that's what the macro is for.
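For completeness, a self-contained sketch of the pattern (the struct and field names here are illustrative; the macro needs GNU C for typeof and statement expressions):

  #include <stdio.h>

  #define container_of(FIELD, TYPE, MEMB) ({ \
    const typeof( ((TYPE*)0)->MEMB )* _mptr = (FIELD); \
    (TYPE*)( (char*)_mptr - __builtin_offsetof(TYPE, MEMB) ); \
  })

  typedef struct {
    int pa_otherfield;
    int pa_somefield;
  } Parent;

  int main(void) {
    Parent parent = { .pa_otherfield = 42, .pa_somefield = 7 };
    int* field = &parent.pa_somefield;  /* all we have is a pointer to one field */

    Parent* p = container_of(field, Parent, pa_somefield);
    printf("%d\n", p->pa_otherfield);   /* recovers the parent; prints 42 */
    return 0;
  }

Note that the quantity subtracted is the size_t produced by offsetof; no negative value ever appears.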


But p - 2 (subtraction of a positive offset) is not the same usage as p + -2 (addition of a negative offset), and it's not clear in your example whether the former suffices or the latter is required. I can definitely imagine examples where the latter is required; C clearly accommodates this usage. But IME it's nonetheless a rare scenario, and not something that could, alone, justify always using signed offsets. Pointers are intrinsically neither signed nor unsigned; it's how you use them that matters.
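To make the distinction concrete (a sketch; the array and offsets are illustrative): the two forms land at the same address, but only the second requires a signed type anywhere:

  #include <assert.h>
  #include <stddef.h>

  int main(void) {
    int a[4] = {0, 1, 2, 3};
    int* p = &a[3];

    size_t off = 2;              /* unsigned: subtract a positive offset */
    ptrdiff_t neg = -2;          /* signed: add a negative offset */

    assert(p - off == p + neg);  /* same address, different type demands */
    return 0;
  }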


The reason we are in this subthread is my message about being bitten by unsigned underflow when iterating backwards using an unsigned `i`.
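The classic shape of that bug, and one common fix, for reference (a sketch; the loop body is illustrative):

  #include <stdio.h>

  int main(void) {
    int a[4] = {10, 20, 30, 40};
    size_t n = sizeof a / sizeof a[0];

    /* Buggy: i is unsigned, so i >= 0 is always true and i-- wraps
       around to SIZE_MAX, reading out of bounds forever:
       for (size_t i = n - 1; i >= 0; i--) { ... }                  */

    /* One common fix: test before decrementing. */
    for (size_t i = n; i-- > 0; )
      printf("%d\n", a[i]);
    return 0;
  }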



