See the XSS cheat sheet and filter evasion guide for examples of why regular-expression filters do not work, and why a safe, whitelist-based sanitizer built on a parser is the correct approach. See the Cleaner reference if you want a Document returned instead of a String. See the Whitelist ...
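
To illustrate the point about regular-expression filters, here is a minimal sketch of one well-known evasion. The class `NaiveScriptFilter` and its `strip` method are hypothetical names made up for this example; they are not part of jsoup or any other library. The filter removes `<script>` tags in a single regex pass, and the payload is crafted so that the removal itself reassembles a working tag.

```java
import java.util.regex.Pattern;

// Hypothetical naive filter, for illustration only; not part of jsoup.
final class NaiveScriptFilter {
    // Matches opening and closing <script> tags, case-insensitively.
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("</?script[^>]*>", Pattern.CASE_INSENSITIVE);

    // Single pass: removes every match found in the original input.
    static String strip(String html) {
        return SCRIPT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The inner tags are removed, and the surrounding fragments
        // ("<scr" + "ipt>") join up into a new script tag.
        String payload = "<scr<script>ipt>alert(1)</scr</script>ipt>";
        System.out.println(strip(payload)); // prints <script>alert(1)</script>
    }
}
```

Repeating the pass until nothing changes would close this particular hole, but it leaves others open (event-handler attributes, `javascript:` URLs, malformed markup that browsers repair). A parser-based sanitizer avoids this class of problem because it works on the parsed element tree and keeps only what is explicitly allowed, rather than pattern-matching on the raw text.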