Python & command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML - trafilatura/tests/eval/bondyblog.fr.paris-8.html at 2956cc7df8952ff95f4cefbff5c9bb3e58f40c7a · purin-blog/trafil