
Yeah, but if you are dealing only with a subset of English in the U.S., and the API endpoint you are scraping wants to serve all peoples in all locales in all situations, you are fucked if you want to use Python 3 and its csv module.

You genuinely are better off using Python 2.7.x and its naive approach to text.



I don't understand what you mean by "the API endpoint you are scraping wants to serve all peoples in all locales in all situations".

That would mean to me that the API endpoint could be sending me Unicode, in which case Python 3's Unicode-aware CSV is going to work great, and Python 2's csv is fucked. The limitations of Python 2's csv module were one of the key points that moved my company to Python 3.

On Python 3, if you want to be naive about text (not sure why you're celebrating only working in a subset of English, but you have this option), you could open the file as Latin-1 and get the same results as Python 2.
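A minimal sketch of that Latin-1 trick (the sample bytes are made up for illustration): since Latin-1 maps every byte 0x00–0xFF to a code point, decoding never fails, and you get the same byte-transparent behavior Python 2's csv gave you, including a lossless round-trip back to the original bytes.

```python
import csv
import io

# Hypothetical sample: CSV bytes containing a non-ASCII byte (0xE9, "é" in Latin-1).
raw = b"name,city\nJos\xe9,Paris\n"

# Latin-1 decoding is total over all 256 byte values, so this cannot raise.
rows = list(csv.reader(io.StringIO(raw.decode("latin-1"))))
print(rows)  # [['name', 'city'], ['José', 'Paris']]

# Round-trip: re-encoding as Latin-1 restores the original bytes unchanged.
assert rows[1][0].encode("latin-1") == b"Jos\xe9"
```

The catch, of course, is that if the bytes were actually UTF-8, the "é" comes out as mojibake — you're reproducing Python 2's naivety, not fixing it.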

Many CSVs are made with Excel. Excel's only form of Unicode CSV is tab-separated UTF-16. Python 2's csv can't parse those at all, can it?


> Python 2's csv can't parse those at all, can it?

Nope, not without re-encoding to UTF-8 before parsing (learned that the hard way, and found it's easier to just take Excel files as input).

Python 2's csv module is byte-based, and basically only handles ASCII-compatible supersets: it assumes your special characters (quote chars, field and record separators) are plain ASCII.
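On Python 3 the Excel case is unremarkable, which is the point being made upthread: csv works on decoded text, so a UTF-16 tab-separated file is no different from any other encoding. A sketch, using made-up sample data in place of a real Excel export:

```python
import csv
import io

# Hypothetical bytes as Excel's "Unicode Text" export produces them:
# UTF-16 with a BOM, tab-separated fields.
data = "name\tcity\nJosé\tParis\n".encode("utf-16")

# The "utf-16" codec consumes the BOM; csv just sees ordinary text.
rows = list(csv.reader(io.StringIO(data.decode("utf-16")), delimiter="\t"))
print(rows)  # [['name', 'city'], ['José', 'Paris']]
```

With a real file you'd write `open(path, encoding="utf-16", newline="")` instead of the StringIO wrapper; on Python 2 the same bytes choke the byte-based parser because every other byte of UTF-16 is a NUL.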



