> Intervals (like June 12-13)
I wrote a Python library to do things like this:
>>> r = Recurrence('June-July 2014').intersect(Recurrence('Monday to Friday'))
>>> datetime.date(2014, 6, 13) in r
True
>>> datetime.date(2014, 6, 14) in r
False
If anyone is interested in hacking on it with me, send me an email and I'll try to get it up on GitHub.
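The library above isn't published here, so this is only a minimal sketch of how such an intersection could work, assuming a recurrence is modelled as a membership predicate over dates. The names `DateSet`, `date_range` and `weekdays` are hypothetical, not the API of that library, and parsing strings like 'June-July 2014' is left out entirely.

```python
import datetime
from typing import Callable


class DateSet:
    """A set of dates described by a membership predicate (hypothetical sketch)."""

    def __init__(self, predicate: Callable[[datetime.date], bool]):
        self._predicate = predicate

    def __contains__(self, day: datetime.date) -> bool:
        return self._predicate(day)

    def intersect(self, other: "DateSet") -> "DateSet":
        # A date belongs to the intersection only if both sets accept it.
        return DateSet(lambda day: day in self and day in other)


def date_range(start: datetime.date, end: datetime.date) -> DateSet:
    """Inclusive range of calendar dates."""
    return DateSet(lambda day: start <= day <= end)


def weekdays(first: int, last: int) -> DateSet:
    """Days whose weekday() lies in [first, last]; Monday is 0, Sunday is 6."""
    return DateSet(lambda day: first <= day.weekday() <= last)


june_july_2014 = date_range(datetime.date(2014, 6, 1), datetime.date(2014, 7, 31))
monday_to_friday = weekdays(0, 4)
r = june_july_2014.intersect(monday_to_friday)

print(datetime.date(2014, 6, 13) in r)  # True: a Friday in June 2014
print(datetime.date(2014, 6, 14) in r)  # False: a Saturday
```

Representing sets as predicates makes intersection trivial, but it can't enumerate members; iterating over occurrences would need a different representation.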
On iOS, "tomorrow at 6" is parsed as 6PM instead of 6AM. This makes sense, because people usually mean 6PM. It is context-dependent, though; in chat logs and the like, this is desirable.
Semantically speaking, the Duckling library does the right thing by parsing it at 6AM, but if the goal is ultimately to parse human expressions, then the iOS approach is probably better.
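The trade-off can be made explicit as a small rule. This is a hypothetical heuristic, not how iOS or Duckling actually resolves hours: when a "prefer PM" flag is set (say, for chat messages), bare hours 1 to 7 are read as afternoon or evening; otherwise the literal reading is kept (6 means 06:00, matching Duckling's answer). The cutoff of 7 is an arbitrary assumption.

```python
def resolve_bare_hour(hour: int, prefer_pm: bool) -> int:
    """Map a bare clock hour (1-12), as in "tomorrow at 6", to a 24-hour value.

    Hypothetical heuristic: with prefer_pm set, hours 1-7 are assumed to mean
    the afternoon/evening; otherwise the literal AM reading is returned.
    """
    if not 1 <= hour <= 12:
        raise ValueError("expected a bare hour between 1 and 12")
    if prefer_pm and hour <= 7:
        return hour + 12
    return hour


print(resolve_bare_hour(6, prefer_pm=True))   # 18
print(resolve_bare_hour(6, prefer_pm=False))  # 6
print(resolve_bare_hour(9, prefer_pm=True))   # 9
```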
EDIT:
Another issue I ran into is that it correctly parses:
"tomorrow evening at 6"
but fails with:
"see you tomorrow evening at 6"
It would be nice to pass it the entire sentence since that's how most people will intend to use it.
I think the best way to handle ambiguities like this is to present the user with an unambiguous result that highlights the choice it made, while also giving the application itself a list of possible ambiguities, so it knows what the most likely corrections would be.
From the Limitations section:
> ... we only display the closest upcoming time, if any, or the closest past time otherwise. It can result in surprising outcomes, like “one year after Christmas” will be actually analyzed as “one year after last Christmas”
So this could be the interaction:
> User: "one year after Christmas"
> Computer: "OK, one year after last Christmas" // putting emphasis on what could be ambiguous
> User: "no, after next Christmas" // the application expected that next vs last could be ambiguous, so this is understood correctly
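One way an application could support this interaction is for the parser to return its chosen reading together with the readings it rejected. The sketch below is hypothetical (`Resolution` and `one_year_after_christmas` are illustrations, not Duckling's API). It applies the quoted "closest upcoming time" rule to the *final* result, which is one plausible way the surprise arises: one year after last Christmas is simply the nearer upcoming date.

```python
import datetime
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Resolution:
    value: datetime.date
    assumption: str  # the reading that was chosen, to echo back to the user
    alternatives: List[Tuple[str, datetime.date]] = field(default_factory=list)


def one_year_after_christmas(today: datetime.date) -> Resolution:
    """Hypothetical resolver for "one year after Christmas".

    Both readings are computed; the closest upcoming result wins (or the
    latest past one if none is upcoming), and the other reading is kept as
    an alternative the application can offer.
    """
    christmas = datetime.date(today.year, 12, 25)
    if christmas >= today:
        last_xmas, next_xmas = christmas.replace(year=today.year - 1), christmas
    else:
        last_xmas, next_xmas = christmas, christmas.replace(year=today.year + 1)

    candidates = {
        "after last Christmas": last_xmas.replace(year=last_xmas.year + 1),
        "after next Christmas": next_xmas.replace(year=next_xmas.year + 1),
    }
    upcoming = {k: v for k, v in candidates.items() if v >= today}
    if upcoming:
        chosen = min(upcoming, key=upcoming.get)
    else:
        chosen = max(candidates, key=candidates.get)
    alternatives = [(k, v) for k, v in candidates.items() if k != chosen]
    return Resolution(candidates[chosen], chosen, alternatives)


r = one_year_after_christmas(datetime.date(2014, 10, 1))
print(r.assumption, r.value)  # after last Christmas 2014-12-25
print(r.alternatives)         # [('after next Christmas', datetime.date(2015, 12, 25))]
```

A reply like "no, after next Christmas" then only has to select the matching entry from `alternatives` instead of re-parsing from scratch.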
Absolutely. We are working on something called "assumptions" where Duckling informs you about what assumptions it made to produce the result (like: time was ambiguous, I chose PM), and then you can change these and get a new result. Coming soon.
You're spot on.
The next version of Duckling will return the list of assumptions made, as well as a list of alternate results. Applications will also be able to give assumptions as input (past vs. future, etc.).
cf. my answer to another comment.
Even though the demo website expects the whole input to be a time expression, Duckling, used as a library, actually detects substrings inside a larger block of text.
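To illustrate the difference between parsing a whole input and detecting expressions inside it, here is a toy scanner built on Python's `re` module. It is a deliberately tiny, hypothetical grammar, nothing like Duckling's rule engine: it only recognises "tomorrow [morning|afternoon|evening] at H" and reports the matched span, so surrounding words such as "see you" are simply skipped.

```python
import re

# Toy grammar (hypothetical): "tomorrow", optional part of day, "at" + hour.
TIME_PHRASE = re.compile(
    r"\btomorrow(?:\s+(?P<part>morning|afternoon|evening))?\s+at\s+(?P<hour>\d{1,2})\b",
    re.IGNORECASE,
)


def find_time_phrases(text: str):
    """Yield (start, end, part_of_day, hour) for every match in the text."""
    for m in TIME_PHRASE.finditer(text):
        yield m.start(), m.end(), m.group("part"), int(m.group("hour"))


print(list(find_time_phrases("see you tomorrow evening at 6")))
# [(8, 29, 'evening', 6)]
```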
This project looks great. I think it's a great example of finding one thing to do and doing it well (though of course, there are other competitors you will have to catch up to, like SUTime, etc.).
I also like that this project was attempted by laypeople (no offense intended). I feel that a lot of academic projects have an "if you haven't been studying n-grams for 20 years, don't bother" feel to them, so people don't seek to understand deeply and instead just handwave "somebody smart thought of this". That kind of thinking reduces new thought in a given field.
Will be using the library in my personal projects for sure. Extra points (in my book) for using Clojure, as I've recently been learning it and getting into it.
Colloquialisms can be tricky when they conflict. For example, "3pm next friday" is interpreted by Duckling as Friday, 3 October 2014 at 15:00:00 +0000 (UTC), which would not be correct colloquially where I grew up, as "next" means not this one but the one after or, more generally, "the one next week, not the one this week".
This is probably not the case everywhere, which is why Duckling uses "next" and "this" interchangeably.
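The two readings of "next friday" differ only in where counting starts. A minimal sketch with the standard `datetime` module (the function names are hypothetical): `nearest_upcoming` returns the first Friday strictly after today, which matches the answer quoted above if today is Thursday 2 October 2014; `in_next_week` returns the Friday of the following calendar week, assuming weeks start on Monday.

```python
import datetime

FRIDAY = 4  # date.weekday(): Monday is 0


def nearest_upcoming(weekday: int, today: datetime.date) -> datetime.date:
    """The first matching weekday strictly after today."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1  # always 1..7
    return today + datetime.timedelta(days=days_ahead)


def in_next_week(weekday: int, today: datetime.date) -> datetime.date:
    """The matching weekday in the calendar week after today's (weeks start Monday)."""
    next_monday = today + datetime.timedelta(days=7 - today.weekday())
    return next_monday + datetime.timedelta(days=weekday)


today = datetime.date(2014, 10, 2)  # a Thursday
print(nearest_upcoming(FRIDAY, today))  # 2014-10-03
print(in_next_week(FRIDAY, today))      # 2014-10-10
```

A parser that wants to serve both communities could compute both and report which one it assumed.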
Another reason to know what fortnight means is that it enables you to use the word "fortnightly" instead of "bi-weekly." With the latter I never know if the speaker meant "twice a week" or "every two weeks" without a follow-up question.
Perl modules to implement this would be Date::Manip and DateTime::Format::Natural. To see some funnier modules (like calculating Discordian dates or Japanese eras) look here: http://www.perl.com/pub/2003/03/13/datetime.html
I actually bumped up against legacy time/date issues while working on SSL cert parsing. An old Perl interpreter's 32-bit limits kept resetting my dates! Rather than upgrade Perl or my architecture, I wrote my own Perl methods to calculate infinite time (sorta?) on 32-bit systems with old Perls.
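The limit in question is presumably the classic signed 32-bit `time_t` overflow: a signed 32-bit count of seconds since the Unix epoch runs out in January 2038, and one more second wraps around to a date in 1901. Python integers don't overflow, so the sketch below simulates the 32-bit wraparound explicitly (`wrap_int32` is a helper written for this illustration).

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Simulate storing an int in a signed 32-bit field (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


last_representable = EPOCH + datetime.timedelta(seconds=INT32_MAX)
one_second_later = EPOCH + datetime.timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_representable)  # 2038-01-19 03:14:07+00:00
print(one_second_later)    # 1901-12-13 20:45:52+00:00
```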
For those that haven't worked with date parsing before: timezones are surprisingly complex, leap years are stupid, daylight savings is really stupid, and leap seconds are impossible without a regularly updated leap second database (similar to timezones, but worse). (The math to calculate dates correctly is rather simple, but you need to be pretty good at math to optimize it) https://github.com/psypete/public-bin/blob/public-bin/src/ne...
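To make "leap years are stupid" concrete: the Gregorian rule has three layers (divisible by 4, except centuries, except centuries divisible by 400). A minimal version, cross-checked against the standard library's `calendar.isleap`:

```python
import calendar


def is_leap(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries, except every 400th year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 2000, 2014, 2016):
    print(year, is_leap(year))
# prints: 1900 False / 2000 True / 2014 False / 2016 True

# Cross-check against the standard library over a wide range of years.
assert all(is_leap(y) == calendar.isleap(y) for y in range(1, 4000))
```

Leap seconds are another matter: Python's `datetime` cannot even represent one (its `second` field only goes up to 59), which fits the point that they need external data.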
Yes, the placeholder is misleading in that it looks like it's been entered into the textbox already and I only need to click "Try me!", but actually I need to type it out first.
It could at least detect that the input is empty instead of saying it failed to parse the input!
The original name was Picsou (Uncle Scrooge's name in French) because the parsing strategy is super greedy. We liked the name, but when we decided to open source it we thought it may be hard to pronounce, so we switched to Duckling (keeping the duck link...).
Haha, nice example! If you want to do this you'd better use our full platform Wit.ai [1], that will arbitrate competition between the datetime entity (`tomorrow at three`) and the number_of_person entity!
Looking forward to time zones support in something like this. Parsing phrases like "4 o'clock tomorrow my time" or "8am on the East Coast" would be useful. Few libraries that try to guess time zones do this well (they just assume you mean whatever your device is localized to at best).
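Once a phrase like "on the East Coast" has been mapped to a zone, the conversion itself is easy with the IANA time zone database through Python's `zoneinfo` module (Python 3.9+, and it needs tz data on the system or the `tzdata` package). The phrase-to-zone table below is a hypothetical, hand-written assumption, and letting `America/New_York` stand for "East Coast" is itself a judgment call; that mapping step is the hard part the comment describes, not the arithmetic.

```python
import datetime
from zoneinfo import ZoneInfo

# Hypothetical phrase-to-zone table; a real system would need far more entries.
REGION_ZONES = {
    "east coast": "America/New_York",
    "west coast": "America/Los_Angeles",
}


def local_time_in_region(day: datetime.date, hour: int, region: str) -> datetime.datetime:
    """Aware datetime for `hour` o'clock on `day` in the named region."""
    zone = ZoneInfo(REGION_ZONES[region.lower()])
    return datetime.datetime(day.year, day.month, day.day, hour, tzinfo=zone)


east = local_time_in_region(datetime.date(2014, 10, 3), 8, "East Coast")
print(east.isoformat())  # 2014-10-03T08:00:00-04:00
print(east.astimezone(ZoneInfo("America/Los_Angeles")).isoformat())
# 2014-10-03T05:00:00-07:00
```

Using a named zone rather than a fixed offset means daylight saving time is applied automatically for whatever date is being parsed.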
I have been frantically searching for a temporal tagging library that can be used in an Android application, with no good results.
I have looked at SUTime, HeidelTime, natty and some others. I am trying to parse (among others) expressions of the type "the first week of the previous month", "The last week of September". The only library that can parse this type of query is SUTime.
Can you comment on why you implemented a home-grown solution instead of using SUTime or some other readily available library? Have you measured the performance of Duckling against the state of the art in temporal tagging?
Duckling seems very well made with good docs but unfortunately for me will be hard to make work on Android.
This is cool! I too was pretty impressed it nailed "the second friday of october 2017", was kind of hoping it would get "the second friday of october in 4 years" but still cool
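For the curious, "the second friday of october 2017" reduces to a small calculation with the standard library, and "in 4 years" just shifts the year first. The function name is hypothetical, and the sketch assumes "in 4 years" is counted from 2014.

```python
import datetime

FRIDAY = 4  # date.weekday(): Monday is 0


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> datetime.date:
    """The n-th (1-based) given weekday of a month, e.g. the 2nd Friday."""
    first = datetime.date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first match
    day = first + datetime.timedelta(days=offset + 7 * (n - 1))
    if day.month != month:
        raise ValueError("that month has no such occurrence")
    return day


print(nth_weekday_of_month(2017, 10, FRIDAY, 2))  # 2017-10-13

# "the second friday of october in 4 years", counted from 2014 (assumption)
print(nth_weekday_of_month(2014 + 4, 10, FRIDAY, 2))  # 2018-10-12
```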
Suggestion: if you want people using the demo page to be able to spot errors easily, it would be useful to give a plain-English description of when the time they specified is. For example, if I enter "quarter of six", you could parse it in my local timezone and spit back a piece of text like, "that is a little more than 2 hours from now, or: Thursday, 2 October 2014 at 17:45:00 -0700 (PDT)".
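The suggested feedback is easy to produce once the parsed value and the user's current time are known. A rough sketch, in which the wording, the half-hour rounding threshold and the function name are all hypothetical choices:

```python
import datetime


def describe_delta(target: datetime.datetime, now: datetime.datetime) -> str:
    """Rough plain-English distance between now and target (hypothetical wording)."""
    seconds = (target - now).total_seconds()
    direction = "from now" if seconds >= 0 else "ago"
    hours, rem = divmod(int(abs(seconds)), 3600)
    if hours == 0:
        return f"{rem // 60} minutes {direction}"
    if rem == 0:
        qualifier = ""
    elif rem < 1800:
        qualifier = "a little more than "
    else:
        qualifier, hours = "almost ", hours + 1
    unit = "hour" if hours == 1 else "hours"
    return f"{qualifier}{hours} {unit} {direction}"


now = datetime.datetime(2014, 10, 2, 15, 30)
print(describe_delta(datetime.datetime(2014, 10, 2, 17, 45), now))
# a little more than 2 hours from now
```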
Wow, we've tried to solve this problem in-house and our results are much worse than this.
One question: how hard is it to detect that kind of expression inside random text? Like Gmail does when suggesting a calendar appointment from an email.
Even though the demo website expects the whole input to be a time expression, Duckling actually detects substrings inside a larger block of text.
We've mostly used it on short sentences, but it should work on larger inputs, like articles. I'd recommend splitting very large inputs into sentences though.
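A naive way to follow that advice is to split on sentence-ending punctuation, run the detector on each piece, and shift the reported offsets back into the original text. The regex splitter is a crude assumption (it will mis-split abbreviations like "Dr."), and `toy_weekday_detector` is a hypothetical stand-in for a real extractor such as a Duckling call.

```python
import re
from typing import Callable, Iterable, List, Tuple

Span = Tuple[int, int, str]

# Crude splitter: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[Tuple[int, str]]:
    """Return (offset, sentence) pairs so positions can be mapped back."""
    pieces, start = [], 0
    for m in SENTENCE_END.finditer(text):
        pieces.append((start, text[start:m.start()]))
        start = m.end()
    if start < len(text):
        pieces.append((start, text[start:]))
    return pieces


def detect_in_document(text: str, detect: Callable[[str], Iterable[Span]]) -> List[Span]:
    """Run a per-sentence detector and shift its spans to document offsets."""
    found = []
    for offset, sentence in split_sentences(text):
        for start, end, label in detect(sentence):
            found.append((offset + start, offset + end, label))
    return found


def toy_weekday_detector(sentence: str) -> Iterable[Span]:
    """Hypothetical stand-in detector: matches English weekday names."""
    pattern = r"\b(?:mon|tues|wednes|thurs|fri|satur|sun)day\b"
    for m in re.finditer(pattern, sentence, re.IGNORECASE):
        yield m.start(), m.end(), m.group(0)


doc = "Lunch is moved. Can we meet on Friday? Otherwise Monday works."
for start, end, label in detect_in_document(doc, toy_weekday_detector):
    assert doc[start:end] == label  # offsets point back into the full text
    print(start, end, label)
```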
Probably too much :) All our backend is Clojure, all our new web developments are in ClojureScript (with React and Om). The only places we're not using Clojure are iOS, Android and Raspberry/embedded linux. We're using Rust more and more for the latter.
This is just a demo website, but in a real use case you would pass a context (for instance with the user timezone, country...) with your library calls.
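Why the context matters: even a plain "tomorrow" depends on the user's time zone, because the same instant falls on different calendar days in different places. A minimal sketch, assuming the context is just a reference instant plus the user's `tzinfo` (`resolve_tomorrow` is a hypothetical name, not Duckling's API). Fixed offsets are used only to keep the example small; they match Paris and Los Angeles in early October 2014.

```python
import datetime

# Fixed offsets valid for early October 2014 only; a real context would carry
# an IANA zone name so daylight saving time is handled automatically.
PARIS_OCT_2014 = datetime.timezone(datetime.timedelta(hours=2), "CEST")
LOS_ANGELES_OCT_2014 = datetime.timezone(datetime.timedelta(hours=-7), "PDT")


def resolve_tomorrow(reference: datetime.datetime, user_tz: datetime.tzinfo) -> datetime.date:
    """Calendar date meant by 'tomorrow' for a user in user_tz at `reference`."""
    local_now = reference.astimezone(user_tz)
    return local_now.date() + datetime.timedelta(days=1)


instant = datetime.datetime(2014, 10, 2, 3, 0, tzinfo=datetime.timezone.utc)
print(resolve_tomorrow(instant, PARIS_OCT_2014))        # 2014-10-03
print(resolve_tomorrow(instant, LOS_ANGELES_OCT_2014))  # 2014-10-02
```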
You can also try the same sentence in French, you'll get February first.