Show HN: Duckling – Open-source datetime expression parsing (duckling-lib.org)
136 points by blandinw on Oct 1, 2014 | 67 comments


> Intervals (like June 12-13)

I wrote a Python library to do things like

    >>> r = Recurrence('June-July 2014').intersect(Recurrence('Monday to Friday'))
    >>> datetime.date(2014, 6, 13) in r
    True
    >>> datetime.date(2014, 6, 14) in r
    False
If anyone is interested in hacking on it with me, send me an email and I'll try to get it up on GitHub.
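
For readers who want to experiment with the idea before that library is published, here is a minimal sketch of date sets that can be intersected. It is not the commenter's library: the `DateSet` class and its predicates are hypothetical, and it skips the string parsing entirely.

```python
import datetime


class DateSet:
    """A set of dates described by a membership predicate (hypothetical helper)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def intersect(self, other):
        # A date belongs to the intersection only if both sets contain it.
        return DateSet(lambda day: day in self and day in other)

    def __contains__(self, day):
        return self.predicate(day)


# Stand-ins for 'June-July 2014' and 'Monday to Friday'.
june_july_2014 = DateSet(lambda d: d.year == 2014 and d.month in (6, 7))
monday_to_friday = DateSet(lambda d: d.weekday() < 5)

r = june_july_2014.intersect(monday_to_friday)
print(datetime.date(2014, 6, 13) in r)  # True (a Friday)
print(datetime.date(2014, 6, 14) in r)  # False (a Saturday)
```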


That sounds really useful.


I'm impressed. I thought I'd try it with `next wednesday at 20 to 3 in the afternoon`. It understood!

It didn't understand if I used `afternon`. Perhaps as an improvement it could try selecting the most likely word from misspellings.
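
One way to pick the most likely word for a misspelling is fuzzy matching against the words the parser already knows. The sketch below uses Python's standard `difflib`; the vocabulary list and the 0.8 cutoff are assumptions for illustration and have nothing to do with how Duckling tokenizes input.

```python
import difflib

# Hypothetical vocabulary; a real parser would use the words its grammar knows.
VOCABULARY = ["morning", "afternoon", "evening", "tonight", "tomorrow", "wednesday"]


def correct_token(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest known word, or the token unchanged if nothing is close."""
    if token in vocabulary:
        return token
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_phrase(phrase):
    """Apply the correction word by word before handing the phrase to a parser."""
    return " ".join(correct_token(word) for word in phrase.lower().split())


print(correct_phrase("next wednesday at 20 to 3 in the afternon"))
```

A high cutoff keeps short words such as "at" or "to" from being rewritten into something unrelated.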


Thanks! Yes, we should probably relax the lexical analysis a little to cope with typos. Thanks for your feedback.


I don't even understand that. What time period is that meant to specify?


20 to 3 in the afternoon = 2:40 PM
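
The arithmetic behind that reading can be written out as a small sketch. The function name and its parameters are made up for illustration.

```python
from datetime import time


def minutes_to(minutes, hour, afternoon=False):
    """'<minutes> to <hour>' means <minutes> before the hour."""
    # Convert the 12-hour value to a 24-hour one (12 wraps to 0 before the shift).
    hour_24 = hour % 12 + (12 if afternoon else 0)
    # Work in minutes since midnight, wrapping around for cases like '20 to 12' at night.
    total = (hour_24 * 60 - minutes) % (24 * 60)
    return time(total // 60, total % 60)


print(minutes_to(20, 3, afternoon=True))  # 14:40:00
```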


Oooh, "twenty minutes until three". I typically use "until" or shorten it to "'til". But I've seen/heard "to" as well.


'To' is the most common version in British English.


One suggestion from a use-case standpoint.

On iOS, "tomorrow at 6" is parsed as 6PM instead of 6AM. This makes sense because usually people really mean 6PM. This is context dependent--in chat logs etc this is desirable.

Semantically speaking, the Duckling library does the right thing by parsing it at 6AM, but if the goal is ultimately to parse human expressions, then the iOS approach is probably better.

EDIT:

Another issue I ran into is that it correctly parses:

"tomorrow evening at 6"

but fails with:

"see you tomorrow evening at 6"

It would be nice to pass it the entire sentence since that's how most people will intend to use it.


I think the best way to handle ambiguities like this is to present the user with an unambiguous result that highlights the choice that it made, but then to also provide a list of possible ambiguities to the application itself so it knows what the most likely corrections would be.

From the Limitations section:

> ... we only display the closest upcoming time, if any, or the closest past time otherwise. It can result in surprising outcomes, like “one year after Christmas” will be actually analyzed as “one year after last Christmas”

So this could be the interaction:

> User: "one year after Christmas"

> Computer: "OK, one year after last Christmas" // putting emphasis on what could be ambiguous

> User: "no, after next Christmas" // the application expected that next vs last could be ambiguous, so this is understood correctly
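
The quoted rule (closest upcoming time if any, else closest past time) can be sketched in a few lines. This is only one plausible reading of the Limitations text, not Duckling's code; `pick_candidate` and the candidate list are assumptions.

```python
from datetime import datetime


def pick_candidate(candidates, now):
    """Closest upcoming candidate if there is one, else the closest past one."""
    upcoming = [c for c in candidates if c >= now]
    if upcoming:
        return min(upcoming)
    past = [c for c in candidates if c < now]
    return max(past) if past else None


now = datetime(2014, 10, 1)

# Candidates for "one year after Christmas": each Christmas shifted by one year,
# mapped back to the Christmas it came from.
anchors = [datetime(year, 12, 25) for year in (2012, 2013, 2014)]
candidates = {a.replace(year=a.year + 1): a for a in anchors}

chosen = pick_candidate(list(candidates), now)
print(chosen, "is one year after", candidates[chosen])
```

Under this reading the closest upcoming result is 25 December 2014, which is one year after the Christmas that has already passed, hence the surprise.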


Absolutely. We are working on something called "assumptions" where Duckling informs you about what assumptions it made to produce the result (like: time was ambiguous, I chose PM), and then you can change these and get a new result. Coming soon.


You're spot on. The next version of Duckling will return the list of assumptions made, as well as a list of alternate results. Applications will also be able to give assumptions as input (past vs. future, etc.).
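
To make the idea concrete, here is a rough sketch of what a result carrying its assumptions and alternates might look like. Everything in it is hypothetical (the `ParseResult` class, the `meridiem` key, the function name); the thread does not describe the actual shape of the upcoming Duckling feature.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ParseResult:
    value: datetime
    assumptions: dict = field(default_factory=dict)  # choices the parser made
    alternates: list = field(default_factory=list)   # results under other choices


def parse_tomorrow_at(hour, now, assumptions=None):
    """Resolve 'tomorrow at <hour>' for an ambiguous 12-hour clock value (1-12)."""
    chosen = {"meridiem": "am"}        # default guess, overridable by the caller
    chosen.update(assumptions or {})
    day = (now + timedelta(days=1)).replace(minute=0, second=0, microsecond=0)
    am = day.replace(hour=hour % 12)
    pm = day.replace(hour=hour % 12 + 12)
    value, other = (pm, am) if chosen["meridiem"] == "pm" else (am, pm)
    return ParseResult(value, chosen, [other])


now = datetime(2014, 10, 1, 9, 30)
print(parse_tomorrow_at(6, now))
print(parse_tomorrow_at(6, now, {"meridiem": "pm"}).value)
```

An application could show `value` to the user, highlight the entries in `assumptions`, and offer `alternates` as one-tap corrections.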


We just updated the demo. Now you can try "see you tomorrow evening at 6" and see the partial parse.


cf. my answer to another comment. Even though the demo website expects the whole input to be a time expression, Duckling -- used as a library -- actually detects substrings inside a larger block of text.

EDIT regarding 6pm vs 6am, applications will have control over the assumptions made by the system. See https://news.ycombinator.com/item?id=8397113


This project looks great. I think it's a great example of finding one thing to do well, and doing it well (though of course, there are the other competitors you will have to catch up to, like SUTime, etc).

I also like that this project was attempted by a layman (no offense intended). I feel that a lot of academic projects have this "if you haven't been studying ngrams for 20 years don't bother" feel to them, so people handwave "somebody smart thought of this" instead of seeking to understand deeply. That kind of thinking stifles new ideas in a given field.

Will be using the library in my personal projects for sure, extra points for using Clojure (in my book), as I've been recently learning about it and getting into it.


Unfortunately it doesn't parse a Scottish colloquial expression such as "the next again day at 4pm", which means in two days' time at 4pm :)


I'm very excited about supporting Scottish colloquial expressions! Working on it now, thanks :)


Colloquialisms will prove tricky when they conflict. For example, "3pm next friday" is deciphered by Duckling as Friday, 3 October 2014 at 15:00:00 +0000 (UTC), which would not be correct colloquially where I grew up: there "next" means not this one but the one after or, more generally, "the one next week, not the one this week".

This is probably not the case everywhere, which is why Duckling uses "next" and "this" interchangeably.
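
The two readings of "next friday" differ by a week on the day this thread was posted. A small sketch with hypothetical helper names, using only the standard library:

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday(): Monday is 0


def upcoming_weekday(today, weekday):
    """First occurrence of `weekday` strictly after `today`."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def weekday_of_next_week(today, weekday):
    """Occurrence of `weekday` in the Monday-based week after the current one."""
    next_monday = today + timedelta(days=7 - today.weekday())
    return next_monday + timedelta(days=weekday)


today = date(2014, 10, 1)  # a Wednesday
print(upcoming_weekday(today, FRIDAY))      # 2014-10-03
print(weekday_of_next_week(today, FRIDAY))  # 2014-10-10
```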


Likewise it doesn't know what a fortnight is.


Hell, I don't know what a fortnight is either!


From 'fourteen nights' - two weeks


Another reason to know what fortnight means is that it enables you to use the word "fortnightly" instead of "bi-weekly." With the latter I never know if the speaker meant "twice a week" or "every two weeks" without a follow-up question.


Will add it soon. Thanks!


It also doesn't understand basic Britishisms like "Friday week" or "a week on Friday".


Perl modules to implement this would be Date::Manip and DateTime::Format::Natural. To see some funnier modules (like calculating Discordian dates or Japanese eras) look here: http://www.perl.com/pub/2003/03/13/datetime.html

I actually bumped up against legacy time/date issues while working on SSL cert parsing. An old Perl interpreter's 32-bit limits kept resetting my dates! Rather than upgrade Perl or my architecture, I wrote my own Perl methods to calculate infinite time (sorta?) on 32-bit systems with old Perls.

For those that haven't worked with date parsing before: timezones are surprisingly complex, leap years are stupid, daylight savings is really stupid, and leap seconds are impossible without a regularly updated leap second database (similar to timezones, but worse). (The math to calculate dates correctly is rather simple, but you need to be pretty good at math to optimize it) https://github.com/psypete/public-bin/blob/public-bin/src/ne...
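
Two of those pain points fit in a few lines. The sketch assumes the 32-bit limit in question is the usual signed count of seconds since 1970 (the comment does not say so explicitly), and the leap year function is the standard Gregorian rule rather than anything from the linked script.

```python
import calendar
from datetime import datetime, timezone


def is_leap(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Sanity check against the standard library over a few centuries.
assert all(is_leap(y) == calendar.isleap(y) for y in range(1600, 2401))

# Largest timestamp a signed 32-bit counter of seconds since 1970 can hold.
MAX_INT32 = 2**31 - 1
last_moment = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(last_moment)  # 2038-01-19 03:14:07+00:00
```

Any certificate expiring after that moment overflows such a counter, which would explain dates "resetting".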


Nice tool!

Feedback: I shared it with a friend and his reaction was "bah, it doesn't even work with the example suggested".

Meaning, he saw the placeholder and pressed enter.


Yes, the placeholder is misleading in that it looks like it's been entered into the textbox already and I only need to click "Try me!", but actually I need to type it out first.

It could at least detect that the input is empty instead of saying it failed to parse the input!


You're right, fixing it now.


Fixed, thanks!


Really cool project. Thanks for sharing the source.

As an aside, I noticed it was renamed from "Picsou" (https://github.com/wit-ai/duckling/commit/0d9f666ae4da114803...)

Were you worried about getting scrooged by Disney? :)


Ha! I was waiting for somebody to ask :)

The original name was Picsou (Uncle Scrooge's name in French) because the parsing strategy is super greedy. We liked the name, but when we decided to open source it we thought it may be hard to pronounce, so we switched to Duckling (keeping the duck link...).


If you're having a big party

   tomorrow at three thirty people are coming over
you may be a half hour late.


Haha, nice example! If you want to do this you'd better use our full platform Wit.ai [1], that will arbitrate competition between the datetime entity (`tomorrow at three`) and the number_of_person entity!

[1] https://wit.ai


As a human, without a colon that thing is ambiguous.

It can be:

    tomorrow at three thirty, people are coming over
or: tomorrow at three, thirty people are coming over
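
A toy illustration of how a greedy matcher lands on the first reading while a lazy one lands on the second. The mini-grammar below is hypothetical and far smaller than anything a real parser uses.

```python
import re

sentence = "tomorrow at three thirty people are coming over"

# Hypothetical mini-grammar: an hour word optionally followed by a minutes word.
HOUR = r"(?:one|two|three|four|five|six)"
MINUTES = r"(?:fifteen|thirty|forty-five)"

# Greedy: take the optional minutes word whenever it is available.
greedy = re.search(rf"at ({HOUR}(?: {MINUTES})?)", sentence)
# Lazy: stop as soon as a valid time has been matched.
lazy = re.search(rf"at ({HOUR}(?: {MINUTES})??)", sentence)

print(greedy.group(1))  # three thirty
print(lazy.group(1))    # three
```

Neither strategy is right in general; choosing between them needs context about the rest of the sentence.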


Totally. I guess that was my point, that it's completely ambiguous, so I was curious to see which interpretation the algorithm picked.


Looking forward to time zones support in something like this. Parsing phrases like "4 o'clock tomorrow my time" or "8am on the East Coast" would be useful. Few libraries that try to guess time zones do this well (they just assume you mean whatever your device is localized to at best).


I have been frantically searching, with no good results, for a temporal tagging library that can be used in an Android application.

I have looked at SUTime, HeidelTime, natty and some others. I am trying to parse (among others) expressions of the type "the first week of the previous month", "The last week of September". The only library that can parse this type of query is SUTime.

Can you comment on why you implemented a home-grown solution instead of using SUTime or some other readily available library? Have you measured the performance of Duckling against the state of the art in temporal tagging?

Duckling seems very well made, with good docs, but unfortunately for me it will be hard to make it work on Android.


SUTime is very good, like all the StanfordNLP stuff. We chose to do Duckling because:

- To my knowledge SUTime only supports English

- We wanted something that's easy to extend. SUTime is somewhat hard to extend, especially if you are not into Java

- We needed not just temporal expressions, but also monetary data, temperatures, quantities...

That being said, Duckling is still young and certainly not as proven as SUTime yet.


This is cool! I too was pretty impressed it nailed "the second friday of october 2017", was kind of hoping it would get "the second friday of october in 4 years" but still cool


This looks amazing! I wish it would parse ISO8601 times, though.
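
Until that is supported, an application could try a strict ISO 8601 parse first and only then fall back to a natural-language parser. A minimal sketch using Python's `datetime.fromisoformat`; the `fallback` callable is a placeholder for whatever parser you use, not a Duckling API.

```python
from datetime import datetime


def parse_iso_first(text, fallback=None):
    """Try ISO 8601 first; hand anything else to `fallback` (if given)."""
    try:
        return datetime.fromisoformat(text.strip())
    except ValueError:
        return fallback(text) if fallback else None


print(parse_iso_first("2014-10-01T15:00:00+00:00"))  # 2014-10-01 15:00:00+00:00
print(parse_iso_first("tomorrow at 6"))               # None
```

Note that `fromisoformat` accepts only a subset of ISO 8601 on older Python versions (for example, a trailing "Z" is rejected before 3.11).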


If you want to do this in Ruby try Chronic. https://github.com/mojombo/chronic


Your "try me" text box should say "e.g. tomorrow at 6am".

"i.e." means "that is" (as in "restating...")

"e.g." means "for example".


Fixed, thanks.


Suggestion: if you want people using the demo page to be able to spot errors easily, it would be useful to give a plain English description of when the specified time is. For example, if I enter "quarter of six", you could parse it in my local timezone and spit back a piece of text like, "that is a little more than 2 hours from now, or: Thursday, 2 October 2014 at 17:45:00 -0700 (PDT)".
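
Such a description is cheap to produce once the parsed time is known. A small sketch with a hypothetical `describe_offset` helper; the wording and rounding are arbitrary choices.

```python
from datetime import datetime


def describe_offset(target, now):
    """Describe `target` relative to `now` in hours and minutes."""
    seconds = int((target - now).total_seconds())
    suffix = "from now" if seconds >= 0 else "ago"
    hours, remainder = divmod(abs(seconds), 3600)
    minutes = remainder // 60
    parts = []
    if hours:
        parts.append(f"{hours} hour{'s' if hours != 1 else ''}")
    if minutes or not parts:
        parts.append(f"{minutes} minute{'s' if minutes != 1 else ''}")
    return " ".join(parts) + " " + suffix


now = datetime(2014, 10, 2, 15, 35)
target = datetime(2014, 10, 2, 17, 45)
print(describe_offset(target, now))  # 2 hours 10 minutes from now
```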


Along somewhat similar lines, my own date/time-parsing library for golang: https://github.com/bcampbell/fuzzytime

I wrote it to parse dates and times in news articles and blog posts. Still a work-in-progress, but someone might find it useful!


Wow, we've tried to solve this problem in-house and our results are much worse than this. One question: how hard is it to detect that kind of expression inside arbitrary text, like Gmail does when suggesting a calendar appointment from an email?


Even though the demo website expects the whole input to be a time expression, Duckling actually detects substrings inside a larger block of text.

We've mostly used it on short sentences, but it should work on larger inputs, like articles. I'd recommend splitting very large inputs into sentences though.
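
The split-then-scan approach can be sketched as follows. The regular expression is a deliberately tiny stand-in for a real time grammar, and the sentence splitter is naive; neither reflects how Duckling itself works.

```python
import re

# Deliberately tiny, hypothetical pattern; a real grammar covers far more.
TIME_EXPR = re.compile(
    r"\b(?:today|tomorrow|yesterday)"
    r"(?: (?:morning|afternoon|evening))?"
    r"(?: at \d{1,2}(?::\d{2})?)?",
    re.IGNORECASE,
)
# Naive splitter: break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def find_time_expressions(text):
    """Yield (sentence_index, start, end, matched_text) for each hit."""
    for index, sentence in enumerate(SENTENCE_END.split(text)):
        for match in TIME_EXPR.finditer(sentence):
            yield index, match.start(), match.end(), match.group()


hits = list(find_time_expressions("Thanks for lunch. See you tomorrow evening at 6!"))
print(hits)
```

Returning character offsets lets the caller highlight the detected span in the original text.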


Demo site updated, now shows partial parses.


Great! I'm sure we will use it soon. Thanks!


It seems to have problems with places as time zones, e.g. next sunday noon, german time


Indeed, that's not supported yet, good catch. Will do.


Was hoping it would parse "tomorrow's yesterday" as today.


"the day before tomorrow" works though


This is off topic, but how much does Wit use Clojure?


Probably too much :) All our backend is Clojure, and all our new web development is in ClojureScript (with React and Om). The only places we're not using Clojure are iOS, Android and Raspberry Pi/embedded Linux. We're using Rust more and more for the latter.


Wow this is very interesting. I have been following Wit's progress since I am quite a geek for all kinds of automation and AI stuff.

Would it be possible to port this into JavaScript using ClojureScript and use it on the client side?


Actually we are planning to port it to a language suitable for embedded use. Maybe Rust?


I'd love to hear about your experiences using Rust on embedded devices, and I know the Rust devteam would as well.


This is our first experiment with Rust on Raspberry Pi: https://wit.ai/blog/2014/09/12/office-automation-with-raspbe...

We'll share more as we progress.


Who wants to use this to make an Alfred Workflow that creates Google Calendar events? :) QuickCal hasn't been supported in years.


hah! 1/2/2014 is January 2nd. Take that, Europeans.

Unless you are checking my IP address to guess the best convention...


This is just a demo website, but in a real use case you would pass a context (for instance with the user timezone, country...) with your library calls.

You can also try the same sentence in French, you'll get February first.
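
The effect of that context on "1/2/2014" can be shown with a short sketch. The `day_first` flag is a hypothetical stand-in for whatever locale information an application passes in.

```python
from datetime import date


def parse_numeric_date(text, day_first):
    """Parse 'a/b/yyyy', reading a as the day when day_first is true."""
    first, second, year = (int(part) for part in text.split("/"))
    day, month = (first, second) if day_first else (second, first)
    return date(year, month, day)


print(parse_numeric_date("1/2/2014", day_first=False))  # 2014-01-02
print(parse_numeric_date("1/2/2014", day_first=True))   # 2014-02-01
```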


Are there countries that use slashes for d/m/y?

What surprised me was "1-2-2014":

  From Thursday, 2 October 2014 at 1:02:00 +0000 (UTC)
  to Wednesday, 1 January 2014 at 0:00:00 +0000 (UTC)
On top of the "where did it get those timestamps from", time flows backwards in that interval.


> Are there countries that use slashes for d/m/y?

Oh yes, loads of them. Lots more than use m/d/y anyway. See https://en.wikipedia.org/wiki/Date_format_by_country

Canada looks the most hellish, eg: "Immigration Canada Stamps use DD/MM/YYYY and Canada Customs Stamps use MM/DD/YYYY." eek!


Looks very cool, but sadly I have no way to use Clojure on my iOS apps.


I hit the "try me" button, but it does not work; nothing happens.


Hey, sorry, a little downtime on the demo site :) It's back now.



