RecursiveCharacterTextSplitter (and other classes inheriting from TextSplitter) all support custom length functions, and even have the convenient from_huggingface_tokenizer/from_tiktoken_encoder classmethods. The issue I've noticed with those is that there's no way to take advantage of batch tokenization, ...
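
To make the limitation concrete, here is a minimal sketch of the difference between a per-text length function (one string in, one int out, which is the shape TextSplitter's length_function takes) and a batched one. Everything in it is made up for illustration: ToyTokenizer, encode_batch, lengths_one_by_one and lengths_batched are hypothetical names, not LangChain, Hugging Face or tiktoken APIs, and the whitespace "tokenizer" only stands in for a real one.

```python
from typing import Callable, List


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer; it just splits on whitespace."""

    def __init__(self) -> None:
        self.calls = 0  # counts how many times the tokenizer was invoked

    def encode(self, text: str) -> List[str]:
        self.calls += 1
        return text.split()

    def encode_batch(self, texts: List[str]) -> List[List[str]]:
        self.calls += 1  # a single invocation covers the whole batch
        return [t.split() for t in texts]


def lengths_one_by_one(
    pieces: List[str], length_function: Callable[[str], int]
) -> List[int]:
    # Per-text shape: the length function is called once for every piece.
    return [length_function(p) for p in pieces]


def lengths_batched(
    pieces: List[str], batch_length_function: Callable[[List[str]], List[int]]
) -> List[int]:
    # Batched shape (an assumption, not an existing hook): one call for all pieces.
    return batch_length_function(pieces)


pieces = ["a b c", "d e", "f"]

per_text = ToyTokenizer()
single_lengths = lengths_one_by_one(pieces, lambda t: len(per_text.encode(t)))

batched = ToyTokenizer()
batch_lengths = lengths_batched(
    pieces, lambda ts: [len(ids) for ids in batched.encode_batch(ts)]
)

print(single_lengths, per_text.calls)
print(batch_lengths, batched.calls)
```

Both paths produce the same lengths, but the per-text path hits the tokenizer once per piece while the batched path hits it once in total. With a real tokenizer that has a fast batch mode, that call count is where the savings would come from.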