I don't know how to get the redirect URLs with scrapy-splash. Can you help me? E.g., http://xxx.xxx.xxx/1.php redirects to http://xxx.xxx.xxx/index.php. How can I get http://xxx.xxx.xxx/index.php with scrapy-splash? Below is my code, which cannot get http://xxx.xxx.xxx...
I’ve scraped hundreds of sites, and I always use Scrapy. Scrapy is a popular Python web scraping framework. Compared to other Python scraping libraries, such as Beautiful Soup, Scrapy forces you to structure your code based on some best practices. In exchange, Scrapy takes care of concurrency...
(self):
    for url in self.start_urls:
        yield scrapy.Request(url=url, callback=self.parse,
            # endpoint='render.json',  # optional; default is render.html
            # splash_url='<url>',  # optional; overrides SPLASH_URL
            # slot_policy=scrapy_splash.SlotPolicy.PER_DOMAIN,  # optional
        )

def parse(self, response):
    try: json...
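For the redirect question, one possible approach is to request Splash's render.json endpoint and read the final address from the JSON it returns. The sketch below assumes that payload carries a "requestedUrl" key (the address that was asked for) and a "url" key (the address after redirects). The helper name extract_redirect_info and the sample payload are made up for illustration, and the sample is hand-written, not real Splash output.

```python
import json


def extract_redirect_info(body):
    """Return (requested_url, final_url) from a render.json response body.

    Assumes the payload has "requestedUrl" and "url" keys; a missing key
    yields None for that position. Accepts str or bytes.
    """
    data = json.loads(body)
    return data.get("requestedUrl"), data.get("url")


# Hand-written sample payload for illustration only (hypothetical values).
sample_body = json.dumps({
    "requestedUrl": "http://xxx.xxx.xxx/1.php",
    "url": "http://xxx.xxx.xxx/index.php",
})

requested, final = extract_redirect_info(sample_body)
print(requested, "->", final)
```

Inside a spider's parse callback, something like extract_redirect_info(response.text) might work, provided the request actually went through Splash with the render.json endpoint rather than the default render.html.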
Note, though, that some websites load their data using JavaScript. In that case, you should use the requests_html library instead. I've already made another script that makes some tweaks to the original one and handles JavaScript rendering; check it here. Alright, we're done! Here are som...