Good question. The regex I tried is for extracting amounts in EUR and USD: / (?<...

dleeftink · on March 7, 2025

I'd imagine many nested named capturing groups may trip even the best automated system! I do like the solution though.

I would've probably approached it differently, trying to first get the 'inverted' match (i.e. ignore anything that isn't a currency-like pattern) and refine from there. A bit like this one I did a while back, to parse garbled strings that may occur after OCR [0]. I imagine the approach does not translate fully, because it's pattern extraction rather than validation.

[0]: https://observablehq.com/@dleeftink/never-go-nuts
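
A minimal sketch of the 'inverted' idea described above, not the code from the linked notebook: first split on everything that cannot be part of an amount, then refine the leftover fragments. The names `NOT_CURRENCY_LIKE`, `AMOUNT` and `candidate_amounts` are hypothetical, and it assumes amounts carry a `€` or `$` symbol rather than an `EUR`/`USD` code.

```python
import re

# Assumption: only digits, '.', ',', '€' and '$' can appear in an amount.
# Everything else is treated as noise and used as a delimiter.
NOT_CURRENCY_LIKE = re.compile(r"[^0-9.,€$]+")

# Refinement pattern: optional leading symbol, digits with separators,
# optional trailing symbol.
AMOUNT = re.compile(r"^[€$]?\d[\d.,]*[€$]?$")


def candidate_amounts(text: str) -> list[str]:
    """Return currency-like fragments found in text."""
    results = []
    # Step 1 (inverted match): split on runs of non-currency characters.
    for fragment in NOT_CURRENCY_LIKE.split(text):
        # Drop sentence punctuation that survived the split.
        fragment = fragment.strip(".,")
        # Step 2 (refine): keep well-formed fragments that carry a symbol.
        if AMOUNT.match(fragment) and re.search(r"[€$]", fragment):
            results.append(fragment)
    return results


print(candidate_amounts("Total due: €1.234,56 incl. VAT, or $99.50 in the US."))
```

The refinement step is deliberately loose; a stricter validation regex could be applied to the surviving fragments afterwards.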

janfoeh · on March 7, 2025

Thanks for sharing! I have to admit I do not have the necessary brain cycles to spare today, but OCR processing is indeed of interest to me, and I will take a more in-depth look in the upcoming days.

The idea of an exclusionary approach sounds interesting as well. I'll have to think about that a bit.

dleeftink · on March 8, 2025

Check out WordNinja in case regex doesn't cut it! [0]

[0]: https://github.com/keredson/wordninja

janfoeh · on March 8, 2025

Will do, thanks again!