I'm loving the Faster CPython project. Just for reference, I have a project originally written in (very optimized) Python that has a Rust module for the demanding path. The Rust version was approximately 150% faster than the Python one on Python 3.10. On Python 3.11 the gap shrank to 100% faster. That's incredible, as I would prefer to keep it all in Python.
Have you tried running it on PyPy? Or using @njit from Numba? I've found both to be faster than CPython. Of course there's also Cython if you're brave, but that tends to be a lot more work.
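
For anyone who hasn't tried the Numba route, here's a minimal sketch. `sum_of_roots` is a made-up function just for illustration (not from the project above), and the `ImportError` fallback is only there so the snippet still runs, without any speedup, if Numba isn't installed:

```python
import math

try:
    from numba import njit  # third-party package: pip install numba
except ImportError:
    # Fallback (assumption for this sketch): a no-op decorator so the
    # code still runs as plain Python when Numba is missing.
    def njit(func):
        return func


@njit
def sum_of_roots(n):
    # A plain numeric loop, the kind of code Numba tends to compile well.
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


# The first call includes JIT compilation time; later calls reuse the
# compiled version.
result = sum_of_roots(1_000_000)
```

Whether this beats a hand-written Rust module depends a lot on the workload, so it's worth timing on your real hot path rather than a toy loop like this.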