
Deliberately-straightforward -- agreed. But "highly proficient" after writing one or two scripts? That's quite a stretch.

For instance, one of the questions I give in phone screens is for the candidate to write a program to count the number of occurrences of unique words in a text file. The "after writing one or two Python scripts" approach is something like this:

    counts = {}
    f = open('test.txt')
    lines = f.read().split('\n')
    for line in lines:
        for word in line.split(' '):
            if word:
                word = word.lower()
                if word in counts.keys():
                    counts[word] += 1
                else:
                    counts[word] = 1
    f.close()
    count_items = [(count, word) for word, count in counts.items()]
    count_items.sort()
    for count, word in reversed(count_items):
        print word, count

Whereas the "highly proficient" (and much simpler and more Pythonic) approach might look something like this:

    import collections
    counts = collections.Counter()
    with open('test.txt') as f:
        for line in f:
            for word in line.lower().split():
                counts[word] += 1
    for word, count in counts.most_common():
        print word, count


My solution:

    lines = [line for line in open("bible.txt")]
    words = [word for line in lines for word in line.split()]
    counts = {word:0 for word in words}
    for word in words:
        counts[word] += 1

No imports needed. Linear time. A bit inefficient in the dictionary comprehension, but easy to read.

The "lines=" and "words=" can be compressed into one line, but I figure this is a bit easier to read for people who aren't familiar with nested list comprehensions.
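
For anyone curious, the compressed form could look roughly like this. It is only a sketch in Python 3; `count_words` is a made-up name, and `io.StringIO` stands in for `open("bible.txt")` so it runs without the file:

```python
import io

def count_words(lines):
    # One nested comprehension replaces the separate "lines" and "words" steps.
    words = [word for line in lines for word in line.split()]
    # Start every distinct word at zero, then tally in a second pass.
    counts = {word: 0 for word in words}
    for word in words:
        counts[word] += 1
    return counts

# A small in-memory sample instead of a real file.
sample = io.StringIO("the cat sat\non the mat\n")
counts = count_words(sample)
print(counts["the"])  # 2
```

Read the nested comprehension left to right: the outer `for line in lines` comes first, then the inner `for word in line.split()`.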


I'm not too familiar with Python, but this is case sensitive unlike benhoyt's examples. Would this be case insensitive?

    lines = [line for line in open("bible.txt")]
    words = [word.lower() for line in lines for word in line.split()]
    counts = {word:0 for word in words}
    for word in words:
        counts[word] += 1


yes!


> A bit inefficient in the dictionary comprehension, but easy to read.

If you allow one import to sneak in, I think a `defaultdict` would do nicely. :-)
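
Something along these lines, as a minimal Python 3 sketch (`count_words` is a hypothetical helper, and a list of strings stands in for the open file):

```python
from collections import defaultdict

def count_words(lines):
    # defaultdict(int) creates missing keys with int() == 0 on first access,
    # so no up-front dictionary comprehension is needed.
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return counts

counts = count_words(["The cat", "the mat"])
print(counts["the"])  # 2
```

This keeps it to a single pass over the words instead of two.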


    from collections import Counter
    for word, count in Counter(open('test.txt').read().lower().split()).most_common():
        print word, count


I like this one. Though I would do it like this to keep the lines under 80 characters:

    from collections import Counter
    words = open('test.txt').read().lower().split()
    for word, count in Counter(words).most_common():
        print word, count

(edited per child comment)


Yeah, those are nice -- and may actually be more efficient on smaller files, as you're only doing the lower() once on a big string. However, for big files you don't necessarily want to read the whole thing in at once.

One nitpick: it's Pythonic (I think) to just name the list of words "words" rather than "word_list".


Yes, that's a classic tradeoff; a proficient programmer will have to pick one.

Personally, I always read entire files into memory first unless I have reason to believe memory will be an issue or I need to program defensively against malicious/careless input. The code is always much cleaner and easier to read, and if you need to do a second pass on the data you don't need to re-read it from disk.
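
To make the tradeoff concrete, here is a rough Python 3 sketch of both styles. The function names are made up for illustration, and `io.StringIO` stands in for a real file:

```python
import io
from collections import Counter

def count_all_at_once(f):
    # Reads the whole file into memory; lower() runs once on one big string.
    return Counter(f.read().lower().split())

def count_streaming(f):
    # Holds one line at a time, plus the running counts.
    counts = Counter()
    for line in f:
        counts.update(line.lower().split())
    return counts

text = "Foo bar\nfoo BAZ bar\n"
whole = count_all_at_once(io.StringIO(text))
streamed = count_streaming(io.StringIO(text))
print(whole == streamed)  # True
```

Both give the same counts because `split()` with no arguments treats newlines like any other whitespace.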


Here's mine, for what it's worth, since this is one of the Google Python assignments. Admittedly, I'm not very proficient at all.

    def get_file_words(filename):
        file = open(filename, 'rU')
        words = {}
        for line in file:
            for word in [word.lower() for word in line.split()]:
                if not word in words.keys():
                    words[word] = 1
                else:
                    words[word] += 1
        return words

    def print_words(filename):
        wordcount = get_file_words(filename)
        for word in wordcount:
            print word, wordcount[word]


I'd like to jump in with a little R here: it's not all that difficult in "that" either!

    open(con <- file('text.txt'))
    text = readLines(con, n= -1L) # n is number of lines to read, -1L means read all of it
    words = strsplit(text,split = " ")
    counts = table(unlist(words))

I put this in because the good thing about R is that it provides functions for many such mathematical operations. And along with this, I'll say something any self-respecting pythoner will know: less is better than more.


Actually that's not 'highly proficient'. This is:

    def read_words(words_file):
        return [word for line in open(words_file, 'r') for word in line.split()]
    len(set(read_words('test.txt')))


Smashing tons of crap together isn't necessarily "highly proficient". In some cases it makes things harder to read and/or maintain, and often harder to edit.


True that. But there is also another side to this story, where such nested calls translate naturally to what we are thinking, i.e.

length(unique(text)) comes from the thought "How many (length()) unique (unique()) words are there in this text (text)?"


R version for comparison (;

    length(unique(scan('test.txt',character(),sep=" ")))


Very clean. One small change makes it count frequencies:

    table(scan('test.txt',character(),sep=" "))


I don't know when set comprehensions and context managers (the `with` statement) were introduced, so this might not run everywhere:

    def read_words(words_file):
        with open(words_file, 'r') as f:
            return {word for line in f for word in line.split()}


    len(set(open("test.txt").read().split()))


Oh, I like this one. Not quite what OP asked, but very clean.


GP asked for counts, not uniques.


Oops, you're right.

    words = open("test.txt").read().split()
    [(k, len(list(g))) for k, g in groupby(words, lambda x: x)]


Doesn't that require an "import itertools"?


Yup. Sorry.


Oops, should have been this:

    from itertools import groupby
    # groupby only merges adjacent equal items, so sort the words first
    words = sorted(open("test.txt").read().split())
    [(k, len(list(g))) for k, g in groupby(words)]
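
A caveat worth spelling out for the groupby approach: `itertools.groupby` only merges adjacent equal items, so the word list needs to be sorted before grouping or the same word can show up in several groups. A minimal Python 3 sketch with a made-up word list:

```python
from itertools import groupby

words = "b a b a a".split()

# Without sorting, each run of equal neighbours becomes its own group.
unsorted_counts = [(k, len(list(g))) for k, g in groupby(words)]

# Sorting first puts equal words next to each other: one group per word.
sorted_counts = [(k, len(list(g))) for k, g in groupby(sorted(words))]

print(unsorted_counts)  # [('b', 1), ('a', 1), ('b', 1), ('a', 2)]
print(sorted_counts)    # [('a', 3), ('b', 2)]
```

The sort makes this O(n log n), whereas a dict or `Counter` tally stays linear.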


Except that's wrong: all you're doing is counting the number of unique words, and you didn't even treat Foo and foo as the same in your example. Part of proficiency is understanding the problem.
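
To illustrate the difference, here is a small Python 3 sketch on a made-up three-word input:

```python
from collections import Counter

words = "Foo foo bar".split()

# Number of distinct words, case-sensitive: "Foo" and "foo" are separate.
unique_case_sensitive = len(set(words))

# Per-word tallies with case folded, which is what the question asked for.
counts = Counter(w.lower() for w in words)

print(unique_case_sensitive)  # 3
print(counts["foo"])          # 2
```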


How does this type of question go during a phone screen? Does the candidate send the code to you afterward?


that first version is hilarious. i'd consider it hideous at any proficiency level in any language :D



