Go: A Surprising Edge Case Concerning Append and Slice Aliasing (jjinux.com)
63 points by tosh on May 16, 2015 | hide | past | favorite | 72 comments


The Go Blog has an article about slice internals. [0]

After reading that, it is a bit easier to see why this happens. Everything is a value in Go, whether it's an int, a struct type or a reference type like a pointer/map/slice. [1]

Since everything is a value, this is not really "aliasing"

    // b "aliases" a
    b := a
It only appears to be by coincidence: The entire slice data structure gets copied, and one part of that is a pointer to the underlying array, which does not get copied. This is why updating the underlying array (via `a`) reflects in `b`.

Since `append()` either modifies the underlying original array, or allocates a new one if the underlying array is not long enough (classic dynamic array behaviour) [2], the use of this function will only impact `b` if a new array is not allocated.

To be fair, the article mentions all of this, but sort of in passing and at the end. ("They're slices which are a value type that have a pointer within them.") I feel that this is sort of the key point.

0. http://blog.golang.org/go-slices-usage-and-internals

1. https://blog.golang.org/go-maps-in-action

2. http://golang.org/pkg/builtin/#append
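To make the header-copy point concrete, here's a minimal sketch (element values chosen arbitrarily): copying the slice copies the len/cap fields but shares the backing array.

```go
package main

import "fmt"

// shareDemo copies a slice header and shows that the copy shares
// the original's backing array while keeping its own length.
func shareDemo() (a, b []int) {
	a = make([]int, 1, 2) // len 1, cap 2
	b = a                 // header copy: same array pointer, own len/cap

	a[0] = 7         // visible through b: same backing array
	a = append(a, 8) // fits within cap 2, so no reallocation
	return a, b
}

func main() {
	a, b := shareDemo()
	fmt.Println(a, b) // [7 8] [7] — b still sees index 0, but its len is 1
}
```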


Copying a pointer is by definition equal to aliasing.


A slice is:

  type sliceStruct struct {
	array unsafe.Pointer
	len   int
	cap   int
  }
I think you both _mean_ the same thing, just the previous poster is focusing on the entire struct itself, and you're focusing on the 'array' member of the struct.

Or I'm just babbling.


My point was that hiding a pointer in a struct does not magically make the aliasing go away.


The slice itself is not being aliased, but yes, of course, the underlying array is.


Yeah exactly, once you take the time to understand how slices work (I definitely recommend those articles to anyone coding much in Go), the behavior is completely unsurprising.


This does not mean it's good behavior. Nothing is surprising after you learn how it works.


It's not bad behavior. It's expected once you understand what slices are, and it's a better set of tradeoffs than slices in e.g. Python, which copies the underlying array.


Then try this one on for size. Very similar... but not quite:

  package main

  import "fmt"

  func main() {
   // Create a slice of length 1 and capacity 2. Note: try changing this to make([]int, 1, 1) ...
   a := make([]int, 1, 2)

   // b "aliases" a
   b := a
   fmt.Printf("%v %v\n", a, b)

   a = append(a, 2)
   fmt.Printf("%v %v\n", a, b)

   b = append(b, 5)
   // Question : what does a contain now ?
   fmt.Printf("%v %v\n", a, b)
  }
http://play.golang.org/p/Hf2noTyNrp

Run the program, change the capacity in the make call to 1, and re-run.

If they were values, a and b should simply diverge, so the last line should output [0 2] [0 5].

If they were references, then the last line should output [0 2 5] [0 2 5].

So slices (and maps) are neither values nor references. They're "half-references". Their length is passed as a value, their contents as a reference. Sort of like the linux kernel iovec structures, but completely hidden.

I would say this is a bug, as it violates both ways to think about these values. But it can't be changed without changing the complexity of lists in Go ... so this isn't going to change, and therefore will be declared "not a bug".


I think it's reasonable to ask: "who is this bugging?"

When I was learning Go it took me about 5 minutes to understand how slices work, and I've never had a problem with this behaviour in my work (or my coworkers' work).

Also, it's very rare to see anyone in the #go-nuts IRC asking for help with slices, which leads me to believe that most people don't really have a problem understanding them. You yourself clearly have a solid understanding of them. The semantics are clearly defined in the spec and the internals are easy to understand if you read the docs.


No, they are values that contain a pointer. Think of a struct that contains a pointer. Is that called a 'half-reference'? Take a look at this poor man's slice example using structs and pointers, does it make more sense now? http://play.golang.org/p/Nnx5Z6DRNo
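For anyone who can't load the playground link, here's a similar poor-man's-slice sketch (my own reconstruction, not the linked code): a struct holding a pointer plus its own len/cap fields, copied by value.

```go
package main

import "fmt"

// poorSlice mimics the runtime's slice header: a pointer to a
// backing array plus the header's own len and cap fields.
type poorSlice struct {
	array *[4]int
	len   int
	cap   int
}

// demo copies the header struct and mutates through one copy.
func demo() (a, b poorSlice) {
	var backing [4]int
	a = poorSlice{array: &backing, len: 1, cap: 4}
	b = a // struct copy: the pointer is copied, the array is not

	a.array[0] = 9 // writes through the shared pointer
	a.len = 2      // only a's copy of the header changes
	return a, b
}

func main() {
	a, b := demo()
	fmt.Println(b.array[0]) // 9: both structs point at the same array
	fmt.Println(a.len, b.len) // 2 1: the headers diverged
}
```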


That's not a difficult problem, the behavior is straightforwardly predictable. append says it will use the same slice until it has no more capacity. A slice reference behaves like it's a pointer, size, and capacity. If you're expecting reference b to "know" about what a has done, or vice versa, the problem is not with the language, it's with you.


> "know"

> the problem is not with the language, it's with you

You're going too far there. The language already has action-at-a-distance behavior from references: they keep the array from being garbage collected. It would be both entirely possible and straightforwardly predictable to have them update to a new location, or enforce the rule you gave in your other post that there be no other users of the data when you're appending to it.


> It would be both entirely possible and straightforwardly predictable to have them update to a new location,

Only if you expected append to take linear time per usage, and literally just started using it without understanding what it does.

> enforce the rule you gave in your other post that there be no other users of the data when you're appending to it.

Another arbitrary hypothetical where you go using functions you don't understand, but for you to expect this you'd have to expect the language to punish reasonable uses of slices, like sharing the part you've already written and appending after that, and just having a local reference to one in a variable, with linear time behavior per usage. Which also makes it kind of unlikely that one ought to just assume the language works that way.


Lots of ways to do it, wouldn't have to be linear time.

>for you to expect this

Oh, no, I wouldn't expect it to enforce a rule like that by default, but it would be a good thing to do when the behavior is so unpredictable.

>sharing the part you've already written and appending after that

You could require an immutable slice for that, in the situations where append is the most reasonable option.

Anyway I'm not trying to suggest this method. I'm just giving an example to support the idea that it could do something predictable.


I can't understand what you wrote. Append is linear time. It could have been amortized O(1), but the Go authors decided that used too much memory, so it's O(n) instead, with amortized O(1) behaviour for short slices. Amortized meaning that it's O(1), then suddenly O(n) for one of the calls, and then O(1) again for a while. But because of the limit on how much the size increases, for large slices it's O(n), and a fair assessment of the function would only mention the largest bound.

What do you mean that the way append functions is obvious? Because it doesn't seem obvious to me at all. Do you mean it's obvious to C/assembly programmers? Obvious to someone who's implemented the Go standard library?

This means slices work well, until the growth of one slice happens too fast and then, suddenly, your program crawls to a halt (and because it's doing more work, likely requests will pile up and make the offending slice grow even faster, resulting in effectively an infinite loop, and behaviour that is indistinguishable from the scheduler freezing, because now there is ridiculous growth in the number of goroutines).

The rule you gave in the other post is that in Go, everything behaves as a value. None of the complex Go datatypes do that. Slices, maps, interfaces, channels, ... all are half-half reference-value, with curious behaviour resulting from that. Go's simplicity comes from the fact that there are very few advanced datatypes to remember, and that the language makes it utterly impossible to implement any new ones in a reasonable way. So on the one hand you don't have idiots using B+trees where an array would have sufficed, but good luck expressing matrix equations in Go.


Append has amortized O(1) time at all slice sizes.


No it doesn't. The proof for amortized O(1) only works if you double the array size on every expansion, which only happens until a "ceiling" gets hit and then the slice expands by less.

This results in O(n) complexity.

And this is assuming the absence of memory pressure; if there is memory pressure it very quickly becomes O(n^2) complexity, and god help you if it hits swap.


I had looked at append's behavior when I wrote that post, and for large slices it was increasing the size by 25% each time. That (or any proportion) gives you O(1) amortized time.


Really ? I thought it was limited to something like a fixed number of megabytes increase per call.


Appending to a slice I get these capacities:

  cap:  1
  cap:  2
  cap:  4
  ...
  cap:  512
  cap:  1024
  cap:  1312
  cap:  1696
  cap:  2208
  cap:  3072
  cap:  4096
  cap:  5120
  cap:  7168
  ...
  cap:  192797696
  cap:  240997376
So it looks like it starts by doubling, then it gets weird between 1024 and 4096, and then it multiplies by 1.25.
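A sketch of the program that produces that kind of output (the exact capacities are an implementation detail and differ between Go releases, so treat the numbers above as one snapshot):

```go
package main

import "fmt"

func main() {
	// Print the capacity each time append reallocates the backing
	// array. Capacities only ever grow, never shrink.
	var s []int
	last := 0
	for i := 0; i < 1_000_000; i++ {
		s = append(s, i)
		if cap(s) != last {
			last = cap(s)
			fmt.Println("cap: ", last)
		}
	}
}
```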


"Immutable data structures (like lists in Lisp) don't have these issues."

Lists in Lisp are mutable (in almost all dialects, from the original 1950s LISP to Common Lisp). You will also have bad issues if you mutate the cells of lists in Lisp, when there are other references starting at different cells inside the list. List-mutating forms in the standard library directly warn about the inputs being destroyed, hence to use the newly returned head instead of holding on to old pointers.

(It's befuddling how often Lisp gets mentioned on HN, and how often it's wrong.)


Maybe it comes from academia. Lisps are used to teach recursive programming and as a first step into purely functional idioms. I wasn't introduced to the effectful APIs or libraries, only car, cdr and recursion as primitive means. Also, history rarely reports Lisp being imperative. McCarthy's first paper shows a functional structural symbolic derivator, he was said to be a proponent of recursion when nobody cared, and I think Richard Gabriel said the `progn` form was introduced a posteriori to satisfy FORTRAN programmers. I've read only once that Lisp was planned to have imperative traits from the start. So it's not unnatural to think of Lisp as a mostly pure recursive language that caved in to allow side effects (just like ML has Ref types where it makes sense). It's a bit Edisonian in how simplified the story has been.


Clear mutable semantics or clear immutable semantics but please, don't give us a mushy combination of the two.

In other words, the operation "append to an alias for this slice" should either consistently modify this slice or consistently not modify this slice. Behavior of "sometimes modifies this slice and sometimes doesn't" is, in my opinion, a serious API/language design failure.


The use of `append` is absolutely clear. It always returns a new slice; the compiler forces you to take the value returned.

What scares you is the possibility that the underlying array may be copied to a newer, larger array giving enough space for the appended items. If there exists enough space, why bother with a new allocation? I want `append` to handle this for me. If you want different behavior, you can easily setup your own design with `copy`
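As a sketch of that `copy`-based design (my own example, not from the article): give the copy its own backing array and no append or write through the original can ever show through.

```go
package main

import "fmt"

// defensiveCopy returns a slice with its own backing array, so no
// later append or write through the original can be observed in it.
func defensiveCopy(a []int) []int {
	b := make([]int, len(a))
	copy(b, a)
	return b
}

func main() {
	a := make([]int, 1, 2)
	b := defensiveCopy(a)

	a = append(a, 2)  // reuses a's spare capacity
	a[0] = 9          // b is untouched either way
	fmt.Println(a, b) // [9 2] [0]
}
```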

Regarding this blog post, if you have multiple slices to the same array and are arbitrarily appending to any of those slices, then your design is wrong to begin with.

This isn't an edge-case but a lack of slice understanding.


You're describing how it actually works, which is by way of 'shallow value' semantics, which is precisely what the commenter is against. Such semantics require understanding the implementation rather than a simple opaque description. Either full mutability or 'deep value' semantics is simpler. For instance, if `b := a` followed by `a[0] = 2` resulted in a copy-on-write, then b would be unaffected by the assignment. Of course the language and libraries do what they do and the user must be aware. Go was supposed to be easier, so copy-on-write semantics would have made more sense.


I'm not sure what you're trying to say. This is, IMO, nothing more than a lack of understanding. Is the result of `a := make([]struct{}, 0, 1); b := a; b = append(b, struct{}{}); println(len(a), len(b))` also surprising?


Your understanding is incorrect, too. append() only returns a new slice when "the capacity of s is not large enough to fit the additional values", in which case "append allocates a new, sufficiently large underlying array that fits both the existing slice elements and the additional values. Otherwise, append re-uses the underlying array."

[-] http://golang.org/ref/spec#Appending_and_copying_slices
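Both branches of that spec wording can be observed by comparing capacities before and after the call; a small sketch (the `grew` helper is mine, not from the spec):

```go
package main

import "fmt"

// grew reports whether appending v to s forced a reallocation,
// by checking whether the capacity changed across the append.
func grew(s []int, v int) bool {
	before := cap(s)
	s = append(s, v)
	return cap(s) != before
}

func main() {
	room := make([]int, 1, 2) // spare capacity: append reuses the array
	full := make([]int, 1, 1) // no spare capacity: append must allocate

	fmt.Println(grew(room, 2)) // false
	fmt.Println(grew(full, 2)) // true
}
```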


No, nicksardo's understanding was quite correct. Slices are values, and it is impossible to return the same one, because value semantics don't work like that.

The slice returned only points to a new array when the capacity is not large enough to fit additional values.


You seem to be confusing `array` with `slice`. AFAIK, append always returns a new slice. How could it not, when everything is copy/pass by value? A slice is essentially a struct with fields for len, cap and a pointer to an array.


Go's all about practicality over ideological purity. This article was unsurprising to me ('cuz I know how slices work!), and in fact, is an obvious consequence of the design of slices. Personally I like that design -- it is just enough to be both simple and powerful.


This is why Rust has a borrow checker.

A slice is a mutable borrow. Rust only allows you one mutable borrow at a time per object. So does Go, as someone just found out the hard way. Rust checks this, and Go doesn't.

I've remarked in the past that borrow checking isn't fundamentally restricted to Rust. Now that we have a usable theory of borrowing, it belongs in more languages.


You don't need a borrow checker; at most, what you'd need from that is const annotations. But you don't even need that: all you really need is a wrapper type that prohibits append, or just the ability to shrink the capacity of a slice reference.

But really you don't need any of that, if you're passing a slice to something and some knucklehead is calling append on it, take him out back and shoot him.


It's very easy to accidentally borrow in Go, because Go has implicit reference types. If you send a slice across a channel, you're sending a reference, not a copy.
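A sketch of that: only the slice header crosses the channel, not the array, so the receiver can write into the sender's data.

```go
package main

import "fmt"

func main() {
	ch := make(chan []int, 1)
	s := []int{1, 2, 3}
	ch <- s // only the header (pointer, len, cap) is sent

	got := <-ch
	got[0] = 99 // writes into the sender's backing array

	fmt.Println(s) // [99 2 3] — the sender observes the write
}
```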


People have been making slices in Go and sending them across channels without unique ownership and borrow checking for years, and it hasn't been a problem.


Are you sure there's not a problem? That's a potential race condition. Are you sure there's no problem after a few years of changes to the code?

(Claims that there's no need for programmer checking systems are usually wrong. Go read CERT advisories or go to DefCon. Those are just the exploitable bugs. I once spent four years debugging other people's code using mainframe OS crash dumps. Most programmers are not as good as they think they are.)


Can you come up with a reasonable way this could turn into an exploitable security issue? I can see crashes and maybe live locking but Go doesn't have pointer arithmetic, so I can't see how this could be used to bust the stack.


One user's private information could get sent to another user, or you could have a subslice that gets "validated" and then overwritten.

Edit: Of course, this isn't exclusive to data races, this can happen with anything that decides to save the slice for a while, without, say, defensively copying it, when some other code decides to reuse it.


Oh no, a small chance of a bug!

Edit: Way to edit your post. In fact, if you look at Go users and their ability to use it productively, they seem to be doing just fine. You'll make less money if you use a version of Go with unique ownership and borrow checking.


Data races on an underlying array buffer due to simultaneous accesses from multiple threads do not have "a small chance" of occurring in virtually any language where it's allowed.

You can argue that a borrow checker isn't a good fit for Golang, and I'd even agree with you on that, but let's be candid about the frequency of data races in the wild.


In reasonably written message-passing Go code it's not likely. It's no different from any other situation where you've got a slice that isn't supposed to be modified anymore.

> let's be candid about the frequency of data races in the wild.

Wow.


I think we should assume by default that code is not reasonably written.

Tool chains that check properties for us are incredibly useful.


Have you used Go and ran into problems with people overwriting data in slices underneath you? Was it to such a degree that it took precedence over other bugs and slowed down development in the effort to catch these mistakes manually?


> But really you don't need any of that, if you're passing a slice to something and some knucklehead is calling append on it, take him out back and shoot him.

"You don't need mechanisms to prevent you from doing stuff like this, because you can just avoid doing them", in other words.


Yes, that is another way to put it.


Well that attitude hasn't worked yet... But any time now!


The main mistake in this article is the assumption that b := a makes 'b' an "alias" of 'a'. It doesn't. It creates a new slice that starts off pointing to the same array as 'a'. Also slices are immutable so you can't change 'a'. If you create a new slice by manipulating 'a' then the copy 'b' won't see anything outside of its bounds.


This isn't about aliasing, as the specification notes "If the capacity of s is not large enough to fit the additional values, append allocates a new, sufficiently large underlying array that fits both the existing slice elements and the additional values. Otherwise, append re-uses the underlying array." [1].

I encourage anyone who writes Go to read the specification. I found it very digestible and my code has improved a lot as a result. In this case, for example, I can pre-allocate a slice of a given capacity and then append to it several times without re-allocations, which I sometimes find more semantic when dealing with binary data -- and the reference to the slice doesn't need to change.

[1] http://golang.org/ref/spec#Appending_and_copying_slices
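A sketch of that pre-allocate-then-append pattern (the sizes and byte values here are arbitrary):

```go
package main

import "fmt"

func main() {
	// Pre-allocate the full capacity up front; the appends below
	// never exceed it, so the backing array is never reallocated.
	buf := make([]byte, 0, 8)

	buf = append(buf, 0xCA, 0xFE)
	buf = append(buf, 0xBA, 0xBE)

	fmt.Println(len(buf), cap(buf)) // 4 8
}
```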


I used to think like that: learn all of the quirks and the internals behind them, and you always have an explanation.

With postgres, I always had an explanation why it was doing so much work on every update, and why replication was so hard, and why checksums were impossible to implement efficiently, and why index only scans wouldn't work, and...

After seeing that all of these things were fixed by people who didn't think that way, I changed my mind.

This Go behavior is surprising behavior and non-deterministic. Changing a constant can have some bizarre action at a distance that breaks your code. It affects ordinary code even if the author doesn't care about these details. I really don't see anything good about it even if you do understand it.


The contract for append() has _always_ been that it might return the original pointer or it might copy-and-extend into a new pointer. Keeping a reference to the previous version is programmer error for not having clear ownership of that pointer. There should be one canonical context for it, be it a member of a struct or a local variable on the stack. Then you don't have to worry about having two copies of it. If you need multiple threads accessing the pointer, then you should put a mutex on it.


The point is more that it's a bad contract, isn't it?

If a slice is something which can't safely be copied, then why can it be copied?


So that you can pass your slice to functions and actually make use of them.

That somebody could go along and start modifying data structures they aren't supposed to is a problem in Java, C#, ... many languages.


I'd have to agree, that would fit my programming style, however it's not unreasonable for a beginner to think of slices and interfaces as basically "A pointer with some additional info". So if you have two data-structures that have a slice of the same data, you may conceptualize that as two data-structures pointing to the same data. It's wrong, but it's also not far off.

I do think it's a little surprising that this doesn't work:

  func append6(is []int) {
    is = append(is, 6)
  }
even though it appears to with basic testing.
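For what it's worth, the usual fix is to return the slice and have the caller reassign, just like the built-in append itself requires; a minimal sketch reusing the name from the snippet above:

```go
package main

import "fmt"

// append6 returns the updated slice header; the caller must
// reassign, exactly as with the built-in append itself.
func append6(is []int) []int {
	return append(is, 6)
}

func main() {
	a := []int{1, 2}
	a = append6(a) // the caller now sees the new length
	fmt.Println(a) // [1 2 6]
}
```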


Everyone here is pointing out that it's in the specification.

If that's the case, and it does seem to be, then the problem is with the specification. Inconsistent behavior is nasty at the best of times, and this is no exception. I can easily see bugs coming out of this.

It's like C and undefined behavior. (Null checks in the linux kernel, anyone?) Even the best developers get bitten by it occasionally. It's one more subtle thing to remember, and everyone can only remember so much.


But this isn't surprising behavior and it's not edge case behavior either. It's exactly what the docs say will happen, and you don't even need to read the docs, it's exactly what you'd expect, given what the interface is and what that implies about how the feature could possibly be implemented.


That something is "expected" (for some definition of expected) doesn't mean the code is easy to reason about.

Initially, modifying "a" also modifies "b". Later, modifying "a" does not modify "b". I can't imagine a piece of code that uses "a" or b" that wouldn't care which of those two states the system is in.

And, knowing which of those two states code will be operating under will not be clear by simply looking at an arbitrary piece of code. The fact that it is well documented that transition may occur does not resolve this ambiguity.


The code in this example is very easy to reason about. Because it's a single function.

If it's hard to reason about two different handles of shared data where one of them is used to modify the shared data... don't do that. append is for operating on a slice that your code is using exclusively. Like in this example, where it's all in the same function and the behavior's predictable.


And you could say the same about null checks in C. And yet bugs of that sort still appear, up to and including in the linux kernel.


No, that's not a reasonable thing you're saying. Even if you can say foo about A and foo about B, the fact that foo is false about B does not imply that foo is false about A.


Strawman. The argument is that they are very similar situations with similar downsides and shoddy defenses. The argument is not that the defenses being wrong about C proves that they're wrong about Go.


It's not a "strawman" when you're arguing with the logic somebody else has presented to you.


That was not their logic. They were not depending on a fallacy. The comparison with C was not the basis of the argument. They were merely giving an example of a similar situation.


It's not a similar situation.


That's an opinion, not a fact. Some people call it similar.


What is everyone ranting about?

Append is not a method of the type slice, it is a function that returns a slice. What I read is people writing

  a := make([]int, 2, 2)
  b := a
  a = f()
And then "mommyyyy a is not equal to b anymoreeeee".

Just sayin...


This is not an edge case.

effective go[0] clearly states "Slices hold references to an underlying array, and if you assign one slice to another, both refer to the same array."

[0] https://golang.org/doc/effective_go.html#slices


Go does have immutable data structures... They are called strings :-) It's not like Lisp or Scala, of course. Simply convert your slice to a string and you are set.
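A sketch of that: a []byte-to-string conversion copies the bytes, so later mutation of the slice can't touch the string.

```go
package main

import "fmt"

func main() {
	b := []byte{'h', 'i'}
	s := string(b) // the conversion copies the bytes

	b[0] = 'H' // mutating the slice cannot touch the string
	fmt.Println(s, string(b)) // hi Hi
}
```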

The slice behavior can be a bit surprising to new Go devs if they just jump in trying to write code. It's pretty common because Go looks simple and very familiar.

You can find more traps for new Go devs in this post: http://devs.cloudimmunity.com/gotchas-and-common-mistakes-in...


Wasn't the whole point of Go that everything was simple and "just works"? I've heard a lot of people here on HN say that they love go because you don't need to keep track of weird quirks or implementation details. This seems to fly in the face of that philosophy.


Go's philosophy strikes me more as "when many solutions are available, pick the most practical one".

In this case, an alternative solution (among others) would be to always allocate a new array when calling `append`. This solution would offer higher predictability but lower performance.

Apparently the Go team decided that the current solution was the most practical - good performance and a relatively low risk of biting developers.

That doesn't mean it's a perfect solution. But has anyone in this thread suggested a solution that would be unequivocally better? Rust's approach, for example, is much more solid, but at the cost of increased complexity (at least at first glance; I'm just a beginner in Rust, but the language already seems much bigger than Go).


That's not a weird quirk. "append" returns a value. One should expect that they shouldn't use the old one.


If you wanted to do that:

   b := &a
   fmt.Println((*b)[0])
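Fleshed out into a runnable sketch: taking a pointer to the slice header gives real aliasing, visible even across a reallocating append.

```go
package main

import "fmt"

func main() {
	a := make([]int, 1, 1)
	b := &a // b aliases the header itself, not just the array

	// This append must reallocate (cap is full), but because the
	// result is stored through the pointer, a sees it too.
	*b = append(*b, 5)
	fmt.Println(a) // [0 5]
}
```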


I really don't want to talk badly about another language, but append always returning a new slice, that being a simple statement about a function, seems way more reasonable than, for example, Python 2 having 2/3 result in 0.

Both of these decisions were made for simplicity.

It's maybe also comparable with `new Number()` in JavaScript not being a primitive but an Object, which as a side effect is also passed by reference.

I disagree with the statement that "it's in the spec, but that's bad", mainly because a huge number of bugs happen out of a lack of understanding of a language, and "unintuitive" usually just means that language X doesn't work like the first language you really learned.

The whole concept of Go is to have a small, simple spec that you can actually know. Compare Go's specification with other languages (maybe other than Scheme).

Complex specifications often have their roots in far more complex problems. A common example is C++ code, which is already complex, being ported to Java, where you end up with side effects from things like NUL-delimited strings suddenly not being NUL-delimited, and so on.

I relatively frequently stumble across code that shows a complete misunderstanding of a language. It's really common and often leads to strange workarounds that manifest the view of programmers not completely understanding the language they are using. Removing hundreds of lines of workaround code and pointing out the mistake made usually looks quite impressive.

However, I do not want to deny that specifications should try to avoid such behavior. I do not deny that this isn't optimal, but neither are so many other things (see floats). What I think is important is to actually acknowledge that things are not optimal and therefore provide ways to work around them. append is really nice, because with that one bit of knowledge you end up with one single simple rule on how to program.

The problem that sometimes occurs is that you have rules about a language that go like "If you stumble across this problem the way to solve it depends on ...". So you actually have more side cases.

Also, a bare append like that is maybe something that should be warned about, because it is really, really likely an error. This is another reason I think simple, general statements with such side effects are good, or at least better than needing to know a lot about a language.

What I want to point out is that one should judge a language by how it works out after a year of seriously working with it on a day-by-day basis. I really don't think this will end up as a practical issue. Finding something that is different in a language and might be confusing for people not confident in it is really easy, but can also give you a completely wrong impression.

This is true for Python, Java, JavaScript, C++, ... On the other hand, it mostly happens when there is hype around a language, which totally makes sense. It's also good to point out that a new language is not a silver bullet, because it stops especially inexperienced programmers from following every new hype that comes up.



