
Hang on, this article links to instructions on how to do some of the things it claims can't be done.

It complains that the news archives frontend is gone, but then links to the page which explains how to do the same things using search: https://support.google.com/news/answer/1638638?hl=en

It also complains that groups is dead because you can't search by date... but the exact same method used in those instructions works just fine: https://www.google.co.uk/search?q=site%3Agroups.google.com+a...

The article claims that book scanning is slowing, and links to an article which says that scanning is still going on in some places and explicitly says that it's slowing down because some of the libraries are running out of books that need scanning.

It links to an old Quartz article from 2012 claiming that "20% time is dead". After the first three paragraphs, that article links to the rebuttals: http://qz.com/116196/google-engineers-insist-20-time-is-not-... http://qz.com/117164/20-time-is-officially-alive-and-well-sa...

I'm not all that interested in arguing the title point of the article, but when an article provides a whole stack of "evidence" and superficial investigation reveals that most of it does not support what the article claims, I question the motives of the author.



Hi! I wrote the article. First off, I find it disingenuous that you don't mention you work for Google. But, hey! I'll give you the benefit of the doubt that it was a simple oversight.

Addressing your comments:

1. Google News Archive is, without question, a dead project. No new material is being added, no new development is being done, and it's unsupported. They removed the News Archive homepage and redirected it to News.

The method Google suggests for web search isn't limited to news articles, making it effectively useless for research. (It shows everything indexed in Google.)

You can search for some newspapers in Google Search, but it's impossible to find any date before January 1970, order by date, or filter by publication. You're stuck with post-1970 date filtering for all papers, ordered by relevance. https://www.google.com/search?q=site%3Agoogle.com%2Fnewspape...

For reference, these were the options that were available in News Archive Search: http://www.library.illinois.edu/hpnl/images/newspapers/gna_a...

2. I didn't say Groups was dead. I said it was effectively dead for research purposes, which is true. For example, you can't search or filter by date across groups anymore: https://groups.google.com/forum/#!search/linux

In your example, how would you propose (for example) finding the first mention of Linux on Usenet? You can't, at least in part because the option to order by date is completely broken: https://www.google.co.uk/search?q=site%3Agroups.google.com+a...

Not to mention, only a fraction of the total posts are indexed and available in Google Search. For example, limiting your query to 1995 returns only 70 posts. Far more than that were posted every month in 1995 in comp.os.linux.advocacy alone.

3. It's entirely plausible that Google's library partners are running low on books, though that doesn't explain why the project appears to be completely dormant. As I mentioned, the official blog stopped updating in 2012 and the Twitter account's been dormant since February 2013. It doesn't seem like any book's been added in the last year -- no new books from January 2014 to today: https://www.google.com/search?q=a&biw=1146&bih=933&source=ln...

4. The 20% time thing is interesting. As a Google engineer, I imagine you'd have a better perspective on that than I would.

Former employees have explicitly said that 20% time no longer exists in the way it used to, and current employees, including here on Hacker News, say that it exists but only on top of your existing workload (effectively making it 120% time). I tend to trust them over a PR person, but really, that was a brief aside in my overall article.

The fact that a tiny fraction of the former functionality of a service is possible, albeit with an obscure and user-unfriendly method, does not detract from the overall point:

Google's current priorities don't appear to be in archiving the past.


For a specific example of real problems caused by killing Google News Archive search, it affected the work of Wikipedia editors. I and a lot of other editors had found it very useful as a high-quality and fast way to find good sources for articles we were working on (https://en.wikipedia.org/wiki/Wikipedia:Free_English_newspap...). There's not really anything else like it, so you end up combing through tons of general Google results that don't qualify as reliable sources in order to find a few newspaper articles.


Sigh. Standard disclaimer: nothing I write here has got anything to do with my job, I'm not representing my employer in any way. I cannot talk about anything I know personally. I've also never worked on any of these projects. However, I can read, describe, and link to public material on the internet like anybody else.

In all honesty, I have no interest in the news archive projects; I read an HN link, followed some of the links in it, and said "this isn't what I was promised on the previous page". It sounds like you just made a second attempt at writing the article. I suggest you take the original one down and put this one up instead; it stands up to at least the completely superficial fact-checking of reading the links in it, which makes it a significant improvement - although it now appears to be a list of fairly straightforward bug reports. (I like bug reports. Bug reports are actionable.)

Engaging with the subject would require substantially more effort on my part to research and investigate what's going on here, because I don't know anything about it beyond what I read in links here. I'm not going to do that. However, I would encourage anybody with an interest in this subject to do the research and write up their findings.


Saying that these problems look like bug reports is dismissive of the depth of the problems. Stopping development of products and removing access to features isn't unintentional, and a lot of people have already complained about each of these problems over the years. Andy's article is making a larger point that what has happened to these products is part of a pattern, that Google is not being as responsible in stewarding its information as its mission statement said it would try to be.


> Saying that these problems look like bug reports is dismissive of the depth of the problems.

Personally I completely disagree with your priorities. I think a bug report is far more valuable, since people can act on bug reports and make things better, while I would not anticipate any meaningful action as a result of speculation about mission statements.


That makes me wonder why the Chromium bugtracker appears to effectively be a black hole, if the bug reports are so "valuable". I don't think I've ever gotten a single response to my CSS calculation bug report.


> Hang on, this article links to instructions on how to do some of the things it claims can't be done.

After confirming that the linked sources disagree with major claims from the article, I've flagged it. I encourage others to do the same, for the sake of truth and honesty.


This is more a cheap dismissal than a "superficial investigation". The author writes "Google News Archives are dead, killed off in 2011, now directing searchers to just use Google." and you have discovered that this is true?



