Maybe record and save these AI NSFW responses then intercept the request and spit out of the saved responses once you have enough of them stored. This would save from behind banned from the API.
My hunch is that they’re too contextually dependent upon the conversation / tokens that have come before. It’s not simply a “if user inputs A, output B” situation.