> This is really the only way to get composability with regexes
If you restrict yourself to "real" regular expressions (avoid Perl extensions and such), they express regular languages[1], which do compose very well, and can be compiled into finite state machines.
Regular languages are closed not only under concatenation and Kleene star, but also:
- union
- intersection
- complement
- reverse
and more.[1]
Real FST libraries like HFST or OpenFST let you compile and compose regular expressions this way. It's too bad most "standard" regex libraries stop at "re.compile" and never implement concat/intersect/complement/reverse/union.
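To make the closure properties concrete, here is a minimal Python sketch of the textbook constructions for complement and intersection on complete DFAs. The `DFA` class and the function names are made up for illustration; this is not the HFST or OpenFST API, just the idea behind it.

```python
from itertools import product


class DFA:
    """A complete DFA: every state has a transition for every symbol."""

    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)  # (state, symbol) -> state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


def complement(d):
    # Swap accepting and non-accepting states (only valid for a complete DFA).
    return DFA(d.states, d.alphabet, d.delta, d.start, d.states - d.accepting)


def intersection(a, b):
    # Product construction: run both machines in lockstep on the same input.
    assert a.alphabet == b.alphabet
    states = set(product(a.states, b.states))
    delta = {
        ((p, q), s): (a.delta[(p, s)], b.delta[(q, s)])
        for (p, q) in states
        for s in a.alphabet
    }
    accepting = {(p, q) for (p, q) in states
                 if p in a.accepting and q in b.accepting}
    return DFA(states, a.alphabet, delta, (a.start, b.start), accepting)


# Hypothetical example machines over the alphabet {a, b}.
even_a = DFA({0, 1}, "ab",
             {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ends_b = DFA({0, 1}, "ab",
             {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
both = intersection(even_a, ends_b)
```

Union then falls out of De Morgan (complement of the intersection of the complements), or from the same product with "or" instead of "and" in the accepting set.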
-----
For the readability issue, give rx[2] a go – you can turn plain regexes into rx syntax with xr[3] :)
> ... and can be compiled into finite state machines. ...
I am not a big fan of this argument without a 'but'. It is usually overlooked that the finite state machine can become exponentially large (=totally unusable), because sets of non-deterministic states are used for deterministic states. This caused me major trouble in a project where this effect was not noticed during the design phase.
I find this important to stress because the finite state machine can usually be constructed without problems (e.g. by the flex tool), so this property of regexps is undoubtedly useful.
That applies to deterministic finite state machines (DFAs); the nondeterministic (NFA) representation is linear in regex length (unless you use "a{99}", but that's nonstandard for a reason). You can also do concat/intersect/etc. directly on the (actually-regular) regex representation, although it is a bit ugly.
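For a feel of the gap between the two, here is a rough Python sketch of the subset construction applied to the classic "the n-th symbol from the end is an a" language. The helper names are my own, not from flex or any library. The NFA has n + 1 states, while the reachable part of the DFA has 2^n, because each DFA state has to remember the last n symbols.

```python
def nth_from_end_nfa(n):
    """NFA for: the n-th symbol from the end is 'a' (alphabet {a, b}).

    States are 0..n; state 0 loops on everything and guesses the 'a',
    state n is the only accepting state.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, 0, {n}


def determinize(delta, start, accepting, alphabet="ab"):
    """Subset construction, exploring only the reachable subsets."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # A DFA state is the set of NFA states we could be in.
            target = frozenset(
                t for s in current for t in delta.get((s, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, dfa_accepting


def dfa_size(n):
    states, _, _, _ = determinize(*nth_from_end_nfa(n))
    return len(states)


for n in (2, 4, 8, 16):
    print(n + 1, "NFA states ->", dfa_size(n), "DFA states")
```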
Unfortunately, this is not the case as soon as you add grouping! To properly express regular expressions with grouping, finite state automata are not sufficient and you need the theory of transducers, which does not admit the same properties (in particular with regard to determinisation).
> > Regular languages are closed not only under concatenation and Kleene star, but also:
> > - union
> Union doesn't belong in the bottom list. Disjunction is one of the regular operations, just like concatenation and Kleene star.
That seems to be what your parent said. Did you perhaps read "Regular languages are closed not only under …" as "Regular languages are not closed under"?
The parent comment says "closed not only under concatenation and Kleene star, but also under these other four operations". That divides the closure properties into two sets, obvious and nonobvious. The obvious ones are obvious because they are regular operations -- concatenation and Kleene star are operators defined by the language of regular expressions. Intersection isn't.
I'm pointing out that set union belongs in the obvious group and not the nonobvious group. Like concatenation and Kleene star, it is one of the operators used to define what a regular expression is.
To show that regular languages are closed under intersection / complementation / string reversal, you need to do a proof. The proof of closure under set union is just that it's part of the definition of regular expressions.
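As an illustration of what such a proof looks like, closure under reversal can be sketched as a structural induction over the regex itself: swap the operands of every concatenation and leave union and star in place. Below is a small Python sketch of that idea. The tuple-based AST and the `to_pattern` helper are assumptions made for this example, and Python's `re` is only used to check the result on short strings.

```python
import re
from itertools import product


def reverse(r):
    """Return a regex AST whose language is the reversal of r's language."""
    kind = r[0]
    if kind == "sym":
        return r  # a single symbol is its own reversal
    if kind == "cat":
        return ("cat", reverse(r[2]), reverse(r[1]))  # swap the operands
    if kind == "alt":
        return ("alt", reverse(r[1]), reverse(r[2]))
    if kind == "star":
        return ("star", reverse(r[1]))
    raise ValueError(f"unknown node: {kind!r}")


def to_pattern(r):
    """Render the AST as a fully parenthesised Python `re` pattern."""
    kind = r[0]
    if kind == "sym":
        return re.escape(r[1])
    if kind == "cat":
        return f"(?:{to_pattern(r[1])}{to_pattern(r[2])})"
    if kind == "alt":
        return f"(?:{to_pattern(r[1])}|{to_pattern(r[2])})"
    if kind == "star":
        return f"(?:{to_pattern(r[1])})*"
    raise ValueError(f"unknown node: {kind!r}")


# ab*(c|d) as an AST
regex = ("cat", ("sym", "a"),
         ("cat", ("star", ("sym", "b")),
          ("alt", ("sym", "c"), ("sym", "d"))))
forward = re.compile(to_pattern(regex))
backward = re.compile(to_pattern(reverse(regex)))

# Check on all short words: w matches iff reversed w matches the reversal.
for length in range(5):
    for letters in product("abcd", repeat=length):
        word = "".join(letters)
        assert bool(forward.fullmatch(word)) == bool(backward.fullmatch(word[::-1]))
```

Intersection and complement have no equally direct syntactic rule, which is why their proofs usually go through automata.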
[1] https://en.wikipedia.org/wiki/Regular_language#Closure_prope...
[2] https://www.emacswiki.org/emacs/rx
[3] https://github.com/mattiase/xr