from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from myproject.spiders import Spider1, Spider2

runner = CrawlerRunner()
runner.crawl(Spider1)  # crawl() takes the spider class, not an instance
runner.crawl(Spider2)
d = runner.join()  # deferred that fires once every scheduled crawl has finished
d.addBoth(lambda _: reactor.stop())
reactor.run()
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.signalmanager import dispatcher
from scrapy import signals
from myproject.spiders import Spider1, Spider2

def stop_crawl():
    runner.stop()  # gracefully stop all running crawl jobs

runner = CrawlerRunner()
dispatcher.connect(stop_crawl, signal=signals.spider_closed)  # fires when any spider closes
runner.crawl(Spider1)
runner.crawl(Spider2)
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from myproject.spiders import Spider1, Spider2

# Custom settings are passed to the CrawlerRunner constructor and apply to all crawls
runner = CrawlerRunner(settings={"FEED_FORMAT": "json"})
runner.crawl(Spider1)
runner.crawl(Spider2)
d = runner.join()
d.addBoth(lambda _: reactor.stop())
reactor.run()

In this example, we again use the CrawlerRunner to execute two spiders, but this time we also pass custom settings to the CrawlerRunner constructor, setting the feed format to JSON for both crawls. In conclusion, the scrapy.crawler.CrawlerRunner class is a convenient tool for running multiple spiders concurrently in the same process and waiting for all of them to finish via join(). It is an essential part of the Scrapy ecosystem for efficient and effective web scraping.
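As a side note, recent Scrapy releases (2.1+) deprecate FEED_FORMAT in favor of the richer FEEDS setting, which maps an output target to per-feed options. A minimal sketch of such a settings dict is below; the file name "items.json" and the LOG_LEVEL entry are illustrative choices, not anything mandated by Scrapy:

```python
# Hypothetical settings dict for CrawlerRunner; "items.json" is just an example target.
custom_settings = {
    "FEEDS": {
        "items.json": {"format": "json", "encoding": "utf8", "overwrite": True},
    },
    "LOG_LEVEL": "INFO",
}

# It would be passed at construction time, e.g.:
# runner = CrawlerRunner(settings=custom_settings)
```

Passing the dict to the constructor (rather than mutating runner.settings afterwards) ensures the settings are in place before any crawler is created.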