Owing to the orthographic characteristics of Urdu text, such as the optional use of diacritics and the ambiguity of word boundaries, two additional tasks, namely diacritic omission and word boundary identification, are added to the text cleaning process. In Urdu, diacritics are optional, and their...
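
The diacritic omission task can be illustrated with a minimal sketch. It assumes that the optional diacritics are the combining marks U+064B to U+0652 (tanwin, zabar, pesh, zer, tashdid, jazm) plus khari zabar (U+0670); this set, the name `URDU_DIACRITICS`, and the function `strip_diacritics` are illustrative choices and are not taken from the text. The input is first normalized to NFC so that letters such as alef madda stay as single code points and are not altered.

```python
import unicodedata

# Assumed set of optional Urdu diacritics (not specified in the source):
# U+064B..U+0652 (tanwin, zabar, pesh, zer, tashdid, jazm) and U+0670 (khari zabar).
URDU_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Return `text` with the assumed optional diacritics removed."""
    # NFC composes sequences such as alef + madda into one letter (U+0622),
    # so base letters that carry meaning are not touched by the filter below.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if ch not in URDU_DIACRITICS)


if __name__ == "__main__":
    # A diacritized spelling of the word "Urdu" (pesh on alef and on dal).
    diacritized = "\u0627\u064F\u0631\u062F\u064F\u0648"
    print(strip_diacritics(diacritized))
```

An explicit set is used here instead of removing every combining mark, because some marks (for example hamza above) can be part of the spelling of a word and are not optional.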