This is characteristic of plain BPE: merges are chosen solely from the frequency distribution of byte pairs, so the algorithm has no knowledge of which bytes form a valid Unicode code point, a character, or a meaningful word. A byproduct is that the same text may be split into sub-tokens differently in different contexts, even for words ...
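To make this concrete, here is a minimal sketch of byte-level BPE in Python. It is a toy, not any particular library's implementation: the function names (`merge`, `train_bpe`, `encode`, `token_bytes`) and the tiny corpus are assumptions made for illustration. The training loop looks only at how often adjacent byte pairs occur, which is the point being made above.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(corpus, num_merges):
    """Learn merges from raw UTF-8 bytes using pair frequency only (hypothetical helper)."""
    ids = list(corpus.encode("utf-8"))
    merges = []
    for new_id in range(256, 256 + num_merges):
        counts = Counter(zip(ids, ids[1:]))
        if not counts:
            break
        pair = counts.most_common(1)[0][0]  # most frequent adjacent pair
        merges.append((pair, new_id))
        ids = merge(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Apply the learned merges, in the order they were learned, to new text."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges:
        ids = merge(ids, pair, new_id)
    return ids


def token_bytes(merges):
    """Map every token id back to the raw bytes it stands for."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges:
        vocab[new_id] = vocab[a] + vocab[b]
    return vocab


# Toy corpus (an assumption): "c" is always followed by an accented vowel,
# and every such vowel starts with the UTF-8 lead byte 0xC3.
merges = train_bpe("cà cè cé cò", num_merges=2)
vocab = token_bytes(merges)
for text in ["é", "cé", " cé"]:
    print(repr(text), [vocab[i] for i in encode(text, merges)])
```

On this corpus the most frequent pair is `c` followed by the lead byte `0xC3`, so the first learned token is `b'c\xc3'`: a letter glued to half of a code point, which is not valid UTF-8 on its own. As a result, the two bytes of "é" stay separate when the character stands alone, but the first byte is absorbed into a neighboring token when the character follows "c" or " c". The same character is therefore tokenized three different ways depending on what precedes it.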