Example 1
0
    def search_regulars(self):
        """
            Search urls inside the <A> tags

            Walks every <a> element of the current response's html tree,
            absolutizes relative hrefs, normalizes each url and returns
            the resulting set (duplicates collapse naturally).
        """

        urls = set()

        tree = XPathExtractor().get_object(self.response.raw_html)

        for link_tag in tree.xpath("//a"):

            # Idiomatic membership test (was: `not 'href' in ...`)
            if 'href' not in link_tag.attrib:
                continue

            url = link_tag.attrib["href"]

            # No netloc means the href is relative -- make it absolute
            # against the current page before normalizing.
            if not urlparse.urlparse(url).netloc:

                url = self._fix_url(url)

            url = self._normalize_url(url)

            urls.add(url)

        return urls
Example 2
0
    def execute(self):
        """
            Fetch the url given as the first CLI argument and drop into an
            embedded IPython shell with the response (and its parsed html
            tree) available for interactive exploration.
        """

        try:
            import IPython
        except ImportError:
            exit_with_error("Please install the ipython console")

        url = self.args[0]
        crawler = BaseCrawler()

        response = crawler._get_response(url)
        html = XPathExtractor().get_object(response)

        # Fix: `html` was computed but never exposed -- add it to the
        # shell namespace alongside the response.
        # NOTE(review): IPython.Shell.IPShellEmbed only exists in
        # IPython < 0.11 (newer versions use IPython.embed) -- confirm
        # the pinned IPython version supports this API.
        shell = IPython.Shell.IPShellEmbed(argv=[],
                                           user_ns={'response': response,
                                                    'html': html})
        shell()
Example 3
0
    def _highlight_nodes(self, html, nodes):
        """
            Highlights the nodes selected by the user in the current page

            For each xpath in *nodes*, tags the first matching element with
            the SELECTED_CLASS css class and sets its id to the xpath
            itself, then returns the re-serialized html.
        """

        tree = XPathExtractor().get_object(html)

        for xpath in nodes:

            matches = tree.xpath(xpath)

            # Guard clause: skip xpaths that match nothing on this page.
            if not matches:
                continue

            node = matches[0]

            existing = node.attrib.get("class", "")
            node.attrib["class"] = ("%s %s" % (existing, SELECTED_CLASS)).strip()
            node.attrib["id"] = xpath

        return etree.tostring(tree.getroot(), pretty_print=True, method="html")
Example 4
0
    def __init__(self, url_regex, url, html):
        """
            Keep the url and its matching regex, and parse *html* into a
            tree for later xpath queries.
        """
        self.url = url
        self._url_regex = url_regex
        self.html_tree = XPathExtractor().get_object(html)