def search_regulars(self):
    """
    Search urls inside the <A> tags of the current response.

    Relative urls (those without a network location) are first fixed
    against the current page via ``self._fix_url`` before being
    normalized.

    Returns:
        set: normalized urls collected from the ``href`` attributes.
    """
    urls = set()
    tree = XPathExtractor().get_object(self.response.raw_html)
    for link_tag in tree.xpath("//a"):
        # Skip anchors that carry no href attribute at all.
        if "href" not in link_tag.attrib:
            continue
        url = link_tag.attrib["href"]
        # No netloc means the url is relative; resolve it first.
        if not urlparse.urlparse(url).netloc:
            url = self._fix_url(url)
        urls.add(self._normalize_url(url))
    return urls
def execute(self):
    """
    Fetch the url given as the first command argument and drop into
    an embedded IPython shell with the crawler ``response`` bound in
    the user namespace.

    Exits with an error message if IPython is not installed.
    """
    try:
        import IPython
    except ImportError:
        exit_with_error("Please install the ipython console")
    url = self.args[0]
    crawler = BaseCrawler()
    response = crawler._get_response(url)
    # NOTE(review): the original also parsed the response with
    # XPathExtractor into a local that was never used (not even
    # exposed in user_ns); dropped as dead code.
    shell = IPython.Shell.IPShellEmbed(argv=[], user_ns={'response': response})
    shell()
def _highlight_nodes(self, html, nodes):
    """
    Highlights the nodes selected by the user in the current page.

    For each xpath expression in *nodes*, the first matching element
    gets SELECTED_CLASS appended to its ``class`` attribute and its
    ``id`` set to the xpath itself, then the rewritten page is
    returned as an html string.
    """
    tree = XPathExtractor().get_object(html)
    for xpath in nodes:
        matches = tree.xpath(xpath)
        # Nothing matched this expression; move on.
        if not matches:
            continue
        node = matches[0]
        existing = node.attrib.get("class", "")
        node.attrib["class"] = ("%s %s" % (existing, SELECTED_CLASS)).strip()
        node.attrib["id"] = xpath
    return etree.tostring(tree.getroot(), pretty_print=True, method="html")
def __init__(self, url_regex, url, html):
    """
    Keep the url pattern and the page url, and parse *html* into an
    xpath-queryable tree stored on the instance.
    """
    self.url = url
    self._url_regex = url_regex
    self.html_tree = XPathExtractor().get_object(html)