If you just want to segment larger blocks of text into tokens, you can try the segment library (it implements the word boundary portion of Unicode Standard Annex #29):
https://github.com/blevesearch/segment
If you need to manipulate the tokens further after segmentation/tokenization, you could look at the analysis sub-package of bleve. It's intended to be usable independently of the rest of the library.
https://github.com/blevesearch/bleve