You can use Unicode code points for the shift table(s -- the Horspool variant ha...

_delirium · on Nov 28, 2013

> Of course, you can also just reduce the Unicode pattern to bytes

Ah, right. I was under the impression this was unsafe, since you could end up with spurious byte matches that are not on character boundaries. But it seems the keyword is "self-synchronizing": UTF-8 is safe to do byte-oriented searching on, whereas UTF-16 is not, because a byte-level match there can straddle two 16-bit code units.
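
To make that concrete, here is a rough sketch (mine, not from the parent comment) of a Horspool search that works purely on bytes, so the shift table has 256 entries no matter how large the Unicode alphabet is. The names `horspool_find` and `is_char_boundary` and the sample strings are made up for illustration. It assumes both the text and the pattern are valid UTF-8.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match, or -1 if there is none."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Shift table indexed by byte value: always 256 entries.
    shift = [m] * 256
    for i, b in enumerate(needle[:-1]):
        shift[b] = m - 1 - i
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Skip ahead based on the byte aligned with the pattern's last position.
        pos += shift[haystack[pos + m - 1]]
    return -1


def is_char_boundary(data: bytes, offset: int) -> bool:
    # UTF-8 continuation bytes have the form 0b10xxxxxx; anything else starts a character.
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


# UTF-8: a match of a whole encoded pattern lands on a character boundary.
text = "naïve café, 日本語のテキスト".encode("utf-8")
pattern = "本語".encode("utf-8")
offset = horspool_find(text, pattern)
assert is_char_boundary(text, offset)

# UTF-16 counterexample: U+0141 U+0141 encodes (big-endian) as 01 41 01 41,
# and U+4101 encodes as 41 01, so a byte search "finds" it at odd offset 1
# even though that character is not in the text.
text16 = "\u0141\u0141".encode("utf-16-be")
pattern16 = "\u4101".encode("utf-16-be")
offset16 = horspool_find(text16, pattern16)
assert offset16 % 2 == 1  # misaligned, i.e. a spurious match
```

The UTF-8 case works because lead bytes and continuation bytes come from disjoint ranges, so the first byte of an encoded pattern can never line up with the middle of a character in the text.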