* python-goose - HTML Content/Article Extractor.
* python-readability - Fast Python port of arc90's readability tool.
* sanitize - Bringing sanity to the world of messed-up data.
* sumy - A module for automatic summarization of text documents and HTML pages.
* textract - Extract text from any document, Wo...
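Several entries above work by scoring pieces of text and keeping the most relevant ones. The sketch below illustrates one simple form of this, extractive summarization (the task sumy addresses). It uses only the standard library and is **not** sumy's API or algorithm. It makes three assumptions: sentences are split naively on `.`, `!` or `?`; the stop-word list is a small hypothetical one; and each sentence is scored by the average corpus frequency of its words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real summarizer would use
# a proper language-specific list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "for", "on", "with", "as", "are", "that", "this"}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence, keep word characters, drop stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> list[str]:
    """Return the top-scoring sentences, preserving their original order."""
    sentences = split_sentences(text)
    # Word frequencies across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured just for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    text = ("Python is popular. Python libraries make text processing easy. "
            "The weather was nice yesterday. "
            "Text summarization with Python libraries picks key sentences.")
    print(summarize(text, 2))
    # → ['Python is popular.', 'Python libraries make text processing easy.']
```

The off-topic sentence about the weather scores lowest because none of its words recur elsewhere. Averaging rather than summing the frequencies is a design choice that keeps longer sentences from winning automatically. Dedicated libraries offer more robust tokenization and several alternative scoring methods.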