Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML - trafilatura/tests/eval/arsnova.thm.de.frag.html at 2e333bb7916df5a266ce9e3222bb79dac4f3327b · purin-blog/trafila