Your tl;dr by an ai: a deep reinforced model for abstractive summarization (metamind.io)
120 points by etiam on May 13, 2017 | 24 comments


While this is pretty neat, FWIW I've always been blown away by the summarization tool built into macOS. You just select text, hit Summarize, and adjust the length. It works wonderfully; I used it in college all the time for annotated bibliographies. To be honest, I've always found it good enough, and it's a wildly simple tool (or so it looks) by comparison to using AI.


I agree. But it has been pretty much neglected by Apple these days.


Wow, I didn't even know it existed. Thanks for the comment!


Awesome!

P.S. Richard Socher, one of the authors of the paper, taught a great Stanford course, 'CS224d: Deep Learning for Natural Language Processing', with videos and notes available here:

http://cs224d.stanford.edu/


According to linguists, there is an inevitable gap between syntax and semantics, due to the fundamental principle of the arbitrariness of phonemes. To put it simply, any sequence of sounds could be associated with any meaning (given a distinct semantics). Morphemes, however, while in some cases usable as direct pointers to meaning, nevertheless require one or even several contexts to be interpreted correctly.

TL;DR - in principle, there is no way to get proper semantics from mere syntax without mastering the appropriate contexts (domain knowledge). One would get certain word patterns, but not the corresponding deep structure (the intended meaning). The summarization would be arbitrary.

Try to summarize the Sermon on the Mount.


Some would further argue that learning underlying meaning (which can only be derived from context, AKA pragmatics) is completely impossible without agency. Until these AI systems can grok information relative to some sort of spiritual/philosophical/metaphysical "self", they won't be able to make quality judgements.


Is there any open source code to try? It looks wonderful!


I have been wondering whether it would be good for someone to set up a Patreon and develop clean open source implementations of popular/current machine learning papers, getting paid for every model he/she puts out.


...and find out that most authors forgot to mention some critical details :)


Yeah, like the unoptimized implementation of WaveNet when DeepMind published it. But hey, they published, so it's cool!


You should ask for your money back.


From what they've done so far, I believe Salesforce's business model with MetaMind is to publish papers openly, but expose the code only via paid products/APIs.


There is code for some of the related papers, if you just want to play around with something, and something pretty similar at that. Here's one: https://github.com/abisee/pointer-generator
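
For a sense of what that repo implements, here's a rough sketch of the copy/generate mixture at the heart of the pointer-generator model (my own names, not the repo's code): the final word distribution blends the decoder's vocabulary distribution with the attention weights over the source tokens.

    import torch

    def final_distribution(p_gen, vocab_dist, attention, src_ids, extended_vsize):
        """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention
        over the source positions where w appears.
        p_gen: (batch, 1), vocab_dist: (batch, vsize),
        attention and src_ids: (batch, src_len); src_ids must be int64."""
        batch, vsize = vocab_dist.shape
        # Pad so the distribution also covers source-only (OOV) words.
        extra = torch.zeros(batch, extended_vsize - vsize)
        dist = torch.cat([p_gen * vocab_dist, extra], dim=1)
        # Scatter-add the copy probabilities onto the source token ids.
        dist.scatter_add_(1, src_ids, (1.0 - p_gen) * attention)
        return dist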


thanks!


If someone could share the other implementations mentioned, that would be great!


Yeah, that would be great.


I don't think reinforcement learning is equivalent to optimizing a joint loss.

I mean, their model executes X steps, then they calculate the loss using supervised data and use that loss to learn.

The same is done with machine translation models when they optimize for BLEU. It's still supervised learning, because to calculate the loss you need reference data.


It is RL because the loss is non-differentiable: they don't do standard backprop, but use a "self-critical policy gradient training algorithm" (a form of RL). You could argue it's supervised in the sense that there is ground-truth data, but then again RL also has 'ground truth' in the form of a score function. They don't provide the ground-truth sentence to the model but a different metric based on the accumulated outputs of the model, so if you squint you can see how it fits in classic RL terms (though the starting state is always the same, the action/state space is ridiculous, etc.).
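
For concreteness, a minimal sketch of that self-critical loss (my own names, not the paper's code): the log-probability of a sampled summary is weighted by how much its score (e.g. ROUGE) beats the model's own greedy decode, which serves as the baseline. The paper then mixes this with the usual maximum-likelihood loss.

    import torch

    def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
        """sample_logprobs: (batch, time) log p of each sampled token.
        sample_reward / greedy_reward: (batch,) non-differentiable scores,
        e.g. ROUGE; no gradient flows through them."""
        advantage = sample_reward - greedy_reward
        # Minimizing this pushes up the probability of samples that
        # out-score the greedy baseline, and down otherwise.
        return -(advantage * sample_logprobs.sum(dim=1)).mean()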


Well, BLEU is non-differentiable and not decomposable over the sequence of translation decisions. Yet I wouldn't call a method reinforcement learning just because the loss is tricky.

But yeah, I guess there's more to it than meets the eye.


I suspect (I have not read that much NLP literature) that BLEU is typically used for evaluation only, not as the training loss. E.g. Google's "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" mentions directly optimizing for BLEU, but again via RL and not supervised learning. It certainly is a quirky example of RL, though... I guess that's the pace at which new ideas/approaches are introduced these days.
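
To make the non-differentiability concrete: BLEU is assembled from discrete n-gram overlap counts, so a small parameter change either leaves the score unchanged or makes it jump, and there is no gradient to backprop through. A toy illustration with NLTK (assuming it's installed):

    from nltk.translate.bleu_score import sentence_bleu

    reference = [["the", "cat", "sat", "on", "the", "mat"]]
    hyp_a = ["the", "cat", "sat", "on", "the", "mat"]
    hyp_b = ["the", "cat", "sat", "on", "a", "mat"]  # one token differs

    # The score moves in discrete jumps as tokens change, which is why
    # optimizing it directly calls for RL-style estimators, not backprop.
    print(sentence_bleu(reference, hyp_a))  # 1.0
    print(sentence_bleu(reference, hyp_b))  # lower, in one discrete step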


I heard a similar talk in June 2016 at NAACL - this paper is probably a good improvement. I have spent time writing extractive summarization code, which is a much easier problem to solve. The ability to ingest text, form an internal representation, and then generate a summary is impressive.
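
To give a feel for why extractive summarization is so much easier, here's roughly the whole trick in its simplest form (a generic frequency-based sketch, not the code I mentioned): score each sentence by how frequent its words are in the document and keep the top few in their original order.

    import re
    from collections import Counter

    def extractive_summary(text, n_sentences=2):
        """Naive extractive summarizer: rank sentences by the average
        document-level frequency of their words."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        freq = Counter(re.findall(r"[a-z']+", text.lower()))
        def score(sent):
            toks = re.findall(r"[a-z']+", sent.lower())
            return sum(freq[t] for t in toks) / max(len(toks), 1)
        top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
        # Emit the winners in their original document order.
        return " ".join(s for s in sentences if s in top)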


My startup does exactly that. I wonder if you'd be available to share some more thoughts on this topic.


Sure, I would enjoy talking with you. My email is in my profile.


That looks pretty long.



