"Deliberately straightforward" -- agreed. But "highly proficient" after writing one or two scripts? That's quite a stretch.
For instance, one of the questions I give in phone screens is for the candidate to write a program to count the number of occurrences of unique words in a text file. The "after writing one or two Python scripts" approach is something like this:
counts = {}
f = open('test.txt')
lines = f.read().split('\n')
for line in lines:
    for word in line.split(' '):
        if word:
            word = word.lower()
            if word in counts.keys():
                counts[word] += 1
            else:
                counts[word] = 1
f.close()
count_items = [(count, word) for word, count in counts.items()]
count_items.sort()
for count, word in reversed(count_items):
    print word, count
Whereas the "highly proficient" (and much simpler and more Pythonic) approach might look something like this:
import collections
counts = collections.Counter()
with open('test.txt') as f:
    for line in f:
        for word in line.lower().split():
            counts[word] += 1
for word, count in counts.most_common():
    print word, count
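For what it's worth, Counter will also count any iterable handed to its constructor, so the explicit loop can fold away entirely (at the cost of reading the whole file into memory). A sketch, shown on an inline string so it's self-contained; with a file you'd pass `open('test.txt').read().lower().split()` instead:

```python
from collections import Counter

# Counter counts an iterable directly, collapsing the counting loop
# into the constructor call.
text = "The quick brown fox jumps over the lazy dog the end"
counts = Counter(text.lower().split())

for word, count in counts.most_common():
    print(word, count)
```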
lines = [line for line in open("bible.txt")]
words = [word for line in lines for word in line.split()]
counts = {word:0 for word in words}
for word in words:
    counts[word] += 1
No imports needed. Linear time. A bit inefficient in the dictionary comprehension, but easy to read.
The "lines=" and "words=" can be compressed into one line, but I figure this is a bit easier to read for people who aren't familiar with nested list comprehensions.
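For the curious, the compressed version would look something like this (a sketch, with `io.StringIO` standing in for the file so the snippet is self-contained):

```python
import io

# The two comprehensions fused into one: the nested "for" clauses
# read left to right, outer loop first.
f = io.StringIO("in the beginning\nthe word\n")
words = [word for line in f for word in line.split()]
counts = {word: 0 for word in words}
for word in words:
    counts[word] += 1
```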
I'm not too familiar with Python, but this is case-sensitive, unlike benhoyt's examples. Would this be case-insensitive?
lines = [line for line in open("bible.txt")]
words = [word.lower() for line in lines for word in line.split()]
counts = {word:0 for word in words}
for word in words:
    counts[word] += 1
Yeah, those are nice -- and may actually be more efficient on smaller files, as you're only doing the lower() once on a big string. However, for big files you don't necessarily want to read the whole thing in at once.
One nitpick: it's Pythonic (I think) to just name the list of words "words" rather than "word_list".
Yes, that's a classic tradeoff; a proficient programmer will have to pick one.
Personally, I always read entire files into memory first unless I have reason to believe memory will be an issue, or I need to program defensively against malicious/careless input. The code is always much cleaner and easier to read, and if you need to do a second pass on the data you don't need to re-read it from disk.
Here's mine, for what it's worth, since this is one of the Google Python assignments. Admittedly, I'm not very proficient at all.
def get_file_words(filename):
    file = open(filename, 'rU')
    words = {}
    for line in file:
        for word in [word.lower() for word in line.split()]:
            if not word in words.keys():
                words[word] = 1
            else:
                words[word] += 1
    return words

def print_words(filename):
    wordcount = get_file_words(filename)
    for word in wordcount:
        print word, wordcount[word]
I'd like to jump in with a little R here -- it's not all that difficult in "that" either!
open(con <- file('text.txt'))
text = readLines(con, n= -1L) # n is number of lines to read, -1L means read all of it
words = strsplit(text,split = " ")
counts = table(unlist(words))
I put this in because the good thing about R is that it provides functions for many such mathematical operations. And along with this, I'll say something any self-respecting Pythoner will know: less is better than more.
Smashing tons of crap together isn't necessarily "highly proficient". In some cases it makes things harder to read and/or maintain, and many times certainly harder to edit.
Except that's wrong: all you're doing is counting the number of unique words, and you didn't even consider "Foo" and "foo" as the same in your example. Part of proficiency is understanding the problem.