vyvojer/scrapy-async-selenium

Async Selenium middleware for Scrapy

This middleware lets Scrapy drive multiple Selenium instances asynchronously and concurrently.

Install

git clone git@github.com:vyvojer/scrapy-async-selenium.git
cd scrapy-async-selenium
python setup.py install --user

Usage

settings.py

DOWNLOADER_MIDDLEWARES = {
    'scrapy_async_selenium.middlewares.AsyncSeleniumMiddleware': 3, 
}

# Maximum number of selenium instances
SELENIUM_POOL_SIZE = 5
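
To illustrate what a bounded pool means in practice, here is a minimal sketch of the idea, not the middleware's actual implementation. It uses asyncio from the standard library purely for illustration, and `FakeDriver`, `fetch`, and `crawl` are hypothetical names. Requests borrow a driver from a fixed-size pool and wait when every driver is busy, so no more than `POOL_SIZE` pages load at once.

```python
import asyncio

POOL_SIZE = 5  # plays the role of SELENIUM_POOL_SIZE in this sketch


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver instance."""

    def __init__(self, ident: int) -> None:
        self.ident = ident


async def fetch(pool: asyncio.Queue, url: str, stats: dict) -> str:
    driver = await pool.get()  # suspends while all drivers are in use
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        await asyncio.sleep(0.01)  # simulated page load
        return f"{url} via driver {driver.ident}"
    finally:
        stats["active"] -= 1
        pool.put_nowait(driver)  # hand the driver back to the pool


async def crawl(urls: list) -> tuple:
    pool = asyncio.Queue()
    for i in range(POOL_SIZE):
        pool.put_nowait(FakeDriver(i))
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fetch(pool, u, stats) for u in urls))
    return results, stats["peak"]


urls = [f"http://quotes.toscrape.com/js/page/{n}/" for n in range(1, 11)]
results, peak = asyncio.run(crawl(urls))
print(len(results), "pages fetched; peak concurrent drivers:", peak)
```

In this sketch all ten pages are fetched, but the number of drivers in use at once never exceeds the pool size.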

SeleniumRequest

def start_requests(self):
    url = "http://quotes.toscrape.com/js/page/{}/"
    for page in range(1, 11):
        yield SeleniumRequest(url=url.format(page), callback=self.parse)

Interacting with the page

You can pass a function that runs after the page has loaded. Its first parameter must be the driver, of type selenium.webdriver.remote.webdriver.WebDriver.

Pass the function as the driver_func parameter of SeleniumRequest. Pass its positional and keyword arguments as driver_func_args and driver_func_kwargs.

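from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.remote.webdriver import WebDriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Fill the search field on 'python.org', press Enter, and wait for the results
def search(self, webdriver: WebDriver, search: str):
    search_input = webdriver.find_element(By.XPATH, "//input[@id='id-search-field']")
    search_input.clear()
    search_input.send_keys(search)
    search_input.send_keys(Keys.RETURN)
    # Wait up to 15 seconds for the results heading to become visible
    WebDriverWait(webdriver, 15).until(
        EC.visibility_of_element_located((By.XPATH, "//h3[contains(text(),'Results')]")))

# Open 'python.org' and run self.search once the page has loaded
def start_requests(self):
    url = 'https://www.python.org'
    yield SeleniumRequest(url=url,
                          callback=self.parse_search,
                          driver_func=self.search,
                          driver_func_args=('python',))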
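
The calling convention described above can be sketched without Selenium. This is an assumption about how the arguments are presumably combined, not the middleware's real code; `FakeDriver` and `process` are hypothetical names. The driver is passed first, followed by the contents of `driver_func_args` and `driver_func_kwargs`.

```python
class FakeDriver:
    """Hypothetical stand-in for a WebDriver that records what happens to it."""

    def __init__(self) -> None:
        self.log = []

    def get(self, url: str) -> None:
        self.log.append(("get", url))


def process(driver, url, driver_func=None, driver_func_args=(), driver_func_kwargs=None):
    """Load the page, then call driver_func with the driver as first argument."""
    driver.get(url)
    if driver_func is not None:
        driver_func(driver, *driver_func_args, **(driver_func_kwargs or {}))


def search(webdriver, search, exact=False):
    # A real function would interact with the page; this one only records the call.
    webdriver.log.append(("search", search, exact))


driver = FakeDriver()
process(driver, "https://www.python.org",
        driver_func=search,
        driver_func_args=("python",),
        driver_func_kwargs={"exact": True})
print(driver.log)
```

Under this assumption, the function always receives the driver for the page that has just loaded, so it needs no driver lookup of its own.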
