Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML - trafilatura/tests/realworld_tests.py at eb37cf181b6189bd1d2df89a7d11b5f98f6993b9 · purin-blog/trafilatura