
Fixing the string mess in Py2 was a good thing.

But the fact that the same code now silently does something that is always wrong in Py3 wrt CSV is clearly a bug.

Actually, the design defect here is calling str() on everything, and assuming that the output is sensible for CSV. It may be a decent rule of thumb, but it clearly does not apply to bytes. Given the likelihood that someone might mistakenly use bytes as a string (for example, because they're porting a legacy Py2 codebase), this should be a hard error, immediately reported as such, and not just a silent behavior change.
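
For anyone who hasn't seen it, here is a minimal sketch of the behavior under discussion, using only the standard csv and io modules. The row values are made up for illustration:

```python
import csv
import io

# Write a row that mixes bytes, str, and int through the stock csv writer.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([b"foo", "bar", 42])

output = buf.getvalue()
# On Python 3 the bytes value goes through str(), so the cell contains
# the repr-style text b'foo' instead of foo, and no exception is raised.
print(repr(output))  # "b'foo',bar,42\r\n"
```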



Except if you make an exception for bytes, what about other types that might get passed into a CSV writer, whose __str__ is something "wrong" for CSV purposes? Do they also get auto-detected? Do we add a new __csv__() method just for when outputting to CSV (since it might not be "wrong" for other output formats)? Or do we ditch str()-ifying altogether, but then add back in a bunch of special cases for numeric types and other things where str() is "the right thing"?

Or do we say "CSV outputs strings, whatever is the string representation of what you passed in is what gets written out", and trust people to figure out when they're working with something that has a "wrong" string representation for their use case?

Because remember: the whole underlying cause of this was treating a dangerously non-string value as a string. Those bytes objects should have been decoded to strings long before reaching the CSV writer. Python 3 does raise more and louder exceptions when you pass bytes to things that expect strings, but the CSV writer isn't a thing that expects strings; it expects things that have a string representation, and several common use cases get much more difficult if you change that to force every user to explicitly do throwaway casts to string in the name of protecting people who keep insisting on writing dangerous "I'll treat bytes as string until it breaks, and then complain that the language did the wrong thing, not me" code.
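
A minimal sketch of what decoding before the writer looks like. The UTF-8 encoding and the sample values are assumptions for illustration; real data needs whatever encoding it actually arrived in:

```python
import csv
import io

# Bytes as they might arrive from a socket or a file opened in binary mode.
raw_fields = [b"caf\xc3\xa9", b"bar"]

# Decode at the boundary, so everything downstream only ever sees str.
# The encoding is an assumption here; use the one your data really has.
decoded = [field.decode("utf-8") for field in raw_fields]

buf = io.StringIO()
csv.writer(buf).writerow(decoded)
result = buf.getvalue()
print(repr(result))
```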


`bytes` is plainly a special case for historical reasons here - it's something that is not a string, but that so many people assume to be a string.

So yeah, I would be fine with making an exception for it (and providing some kind of option to disable that exception, for that incredibly rare case where someone really does need b"foo" in their CSV output).

And then in 5 years, flip the default of that switch, and deprecate it. In another 5, remove it entirely.

Also, note that raising an error in this case is not placating the people who insist on using bytes as strings. Quite the opposite - it very loudly and unambiguously tells them that they're wrong, and how exactly they're wrong.


The entire problem, though, is people assuming bytes and strings are interchangeable. Anything which allows that assumption to go unquestioned, or without program-wrecking consequences, leads right back to where we were. And the "phase it out" model doesn't work; you proposed a ten-year phase-out, but in ten years people are just going to say "we never updated our code, we're not ready, keep it this way another ten years and we'll think about fixing our code". The only thing that works is actively breaking people's programs when they try to intermix bytes and strings.


> The only thing that works is actively breaking people's programs when they try to intermix bytes and strings.

Um, this is exactly what I proposed above!

"this should be a hard error, immediately reported as such, and not just a silent behavior change."

What I'm asking for is that the csv writer raise an exception by default if it sees bytes anywhere. The problem is that right now, it doesn't! It just gives you "incorrect" output that might go undetected for a long time.
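
A rough sketch of that behavior, done in user code rather than as a patch to the csv module. StrictWriter and its allow_bytes flag are hypothetical names invented for this example; nothing like them exists in the standard library:

```python
import csv
import io


class StrictWriter:
    """Hypothetical wrapper around csv.writer that rejects bytes by default."""

    def __init__(self, f, allow_bytes=False, **kwargs):
        # allow_bytes is the opt-out switch for the rare case where
        # someone really does want b'foo' in the output.
        self._writer = csv.writer(f, **kwargs)
        self._allow_bytes = allow_bytes

    def writerow(self, row):
        row = list(row)
        if not self._allow_bytes:
            for index, value in enumerate(row):
                # bytearray is included as an assumption; its str() is
                # just as unhelpful for CSV as that of bytes.
                if isinstance(value, (bytes, bytearray)):
                    raise TypeError(
                        f"field {index} is {type(value).__name__}, not str; "
                        "decode it before writing"
                    )
        return self._writer.writerow(row)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


buf = io.StringIO()
StrictWriter(buf).writerow(["foo", 1])   # fine
try:
    StrictWriter(buf).writerow([b"foo", 1])
except TypeError as exc:
    error_message = str(exc)             # loud failure instead of b'foo'
```

With allow_bytes=True the wrapper falls back to the current str() behavior, which is the escape hatch suggested earlier in the thread.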


If it's a bug, why not submit a fix?

I think the Python maintainers fundamentally disagree with you on that point, but a fix submission would settle it unambiguously.



