Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML - trafilatura/tests/eval/elpais.com.ciencia.html at b6808306670d3ec30cd2ac6591decac10f705fce · purin-blog/trafilat