Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules
Live-Lyrics/chopper
Chopper

Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules.

Compatible with Python >= 2.6, <= 3.4

Installation

pip install chopper

Full documentation

http://chopper.readthedocs.org/en/latest/

Quick start

from chopper.extractor import Extractor

HTML = """
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <div id="header"></div>
    <div id="main">
      <div class="iwantthis">
        HELLO WORLD
        <a href="/nope">Do not want</a>
      </div>
    </div>
    <div id="footer"></div>
  </body>
</html>
"""

CSS = """
div { border: 1px solid black; }
div#main { color: blue; }
div.iwantthis { background-color: red; }
a { color: green; }
div#footer { border-top: 2px solid red; }
"""

extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
html, css = extractor.extract(HTML, CSS)

The result is:

>>> html
"""
<html>
  <body>
    <div id="main">
      <div class="iwantthis">
        HELLO WORLD
      </div>
    </div>
  </body>
</html>"""

>>> css
"""
div{border:1px solid black;}
div#main{color:blue;}
div.iwantthis{background-color:red;}
"""
