Example #1
def resizeiframe(elem, width=280):
    '''
    Resize an iframe to have a mobile-friendly width

    If elem contains (or is) an iframe element, set its width to a
    mobile-friendly value.  The height attribute, if present, is
    scaled appropriately to preserve the original aspect ratio. 

    This was originally created for the resizing of iframe-based
    embedded Youtube videos.

    TODO: see TODOs on resizeobject, which are mostly relevant to this filter

    '''
    iframe_elem = findonetag(elem, 'iframe')
    if iframe_elem is not None:
        setwidth(iframe_elem, width)
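
The helpers findonetag and setwidth are not shown in this listing. As a rough sketch of the behavior the docstring describes (find the element itself or its first matching descendant, then set the width and scale the height), something like the following could work. The names find_one_tag and set_width are hypothetical stand-ins, and the standard library's ElementTree is used here instead of lxml so the sketch is self-contained.

```python
import xml.etree.ElementTree as ET


def find_one_tag(elem, tag):
    """Return elem if it is a <tag>, else its first <tag> descendant, else None."""
    if elem.tag == tag:
        return elem
    return elem.find('.//' + tag)


def set_width(elem, width):
    """Set the width attribute; scale height, if present, to keep the aspect ratio."""
    old_width = elem.get('width')
    old_height = elem.get('height')
    if old_width is not None and old_height is not None:
        try:
            scale = width / float(old_width)
            elem.set('height', str(int(round(float(old_height) * scale))))
        except (ValueError, ZeroDivisionError):
            # Non-numeric sizes such as "100%" (or a zero width): leave height alone.
            pass
    elem.set('width', str(width))


div = ET.fromstring(
    '<div><iframe width="560" height="320" src="https://example.com/v"></iframe></div>'
)
iframe = find_one_tag(div, 'iframe')
set_width(iframe, 280)
print(iframe.get('width'), iframe.get('height'))
```

The real helpers may handle more cases (for example sizes given in a style attribute); this sketch only covers plain numeric width and height attributes.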
Example #2
def noimgsize(elem):
    '''
    Strip the height and width attributes from the first child img tag

    This filter searches for the first img in the element, and removes
    any sizing attributes.  This is useful if you have a large source
    image, and want to use a "width: 100%" trick in CSS to make it
    span any device.

    @param elem : Element representing an html tag
    @type  elem : lxml.html.HtmlElement
    
    '''
    img_elem = findonetag(elem, 'img')
    if img_elem is not None:
        for a in ('height', 'width'):
            if a in img_elem.attrib:
                del img_elem.attrib[a]
Example #3
def resizeobject(elem, width=280):
    '''
    Resize something embedded in an object tag to have a mobile-friendly width

    If elem contains (or is) an OBJECT element, set its width to a
    mobile-friendly value.  The height attribute, if present, is
    scaled appropriately to preserve the original aspect ratio. This
    is done for both the "object" tag, and also any "embed" tag that
    may be present inside.

    TODO: This will operate on only the first object; if there are
    several object elements within, those beyond the first will be
    ignored.  Best thing is probably to just find and operate on all
    of them.

    '''
    object_elem = findonetag(elem, 'object')
    if object_elem is not None:
        setwidth(object_elem, width)
        embed_elem = object_elem.find('.//embed')
        if embed_elem is not None:
            setwidth(embed_elem, width)
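
The TODO in the docstring suggests operating on every object element rather than only the first. A minimal sketch of that idea follows, using the standard library's ElementTree instead of lxml. The function name resize_all_objects is hypothetical, and for brevity this version only sets the width and does not scale the height.

```python
import xml.etree.ElementTree as ET


def resize_all_objects(elem, width=280):
    """Set width on every <object> under elem (elem included) and on nested <embed> tags.

    Returns the number of <object> elements that were resized.
    """
    count = 0
    for obj in elem.iter('object'):
        obj.set('width', str(width))
        for embed in obj.iter('embed'):
            embed.set('width', str(width))
        count += 1
    return count


doc = ET.fromstring(
    '<div>'
    '<object width="640"><embed width="640"/></object>'
    '<object width="480"/>'
    '</div>'
)
resized = resize_all_objects(doc)
```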
Example #4
def table2divgroupsgs(elem, specmapgen, omit_whitespace=True):
    '''
    Apply the table2divgroups filter with a dynamically generated spec map

    This filter is much like table2divgroups.  However, instead of
    taking an explicit spec map argument, table2divgroupsgs takes a
    callable that generates the spec map.  This callable, specmapgen,
    accepts a table element as its single argument, and returns a spec
    map.
    
    @param elem            : Element to operate on
    @type  elem            : lxml.html.HtmlElement

    @param specmapgen      : Callable that generates a spec map
    @type  specmapgen      : function: HtmlElement -> type(specmap)

    @param omit_whitespace : Whether to omit cells just containing content that would render as whitespace in the browser
    @type  omit_whitespace : bool
    
    '''
    table_elem = findonetag(elem, 'table')
    specmap = specmapgen(table_elem)
    return _table2divgroups(elem, table_elem, specmap, omit_whitespace)
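
To illustrate what a specmapgen callable might look like, here is a small sketch that inspects the table and builds a spec map sized to it. The real spec map is a list of Spec instances whose constructor is not shown here, so plain tuples of (idname, tr_start, td_start, tr_end, td_end) are used as a stand-in. The function name first_column_specmap is hypothetical, and ElementTree replaces lxml to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET


def first_column_specmap(table_elem):
    """Hypothetical specmapgen: one group spanning the entire first column.

    Counts every <tr> under the table, so nested tables would inflate the count.
    """
    n_rows = len(table_elem.findall('.//tr'))
    # Tuple stand-in for a Spec: (idname, tr_start, td_start, tr_end, td_end)
    return [('contact', 0, 0, n_rows - 1, 0)]


table = ET.fromstring(
    '<table><tbody>'
    '<tr><td>CONTACT US</td></tr>'
    '<tr><td>123 Main Str</td></tr>'
    '<tr><td>Springfield, IL</td></tr>'
    '</tbody></table>'
)
generated = first_column_specmap(table)
```

Generating the map at call time is useful when the number of rows varies from page to page, since a fixed spec map would hard-code the row range.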
Example #5
def table2divgroups(elem, specmap, omit_whitespace=True):
    '''
    Extract blocks arranged in a table grid as more semantic elements

    Table based layouts sometimes lead to a grid of elements
    semantically spanning some set of rows and columns.  This filter
    helps extract them into a clearer semantic organization.

    Let's try to make this concrete.  Consider this html:

    <table>
      <tbody>
        <tr>
          <td>CONTACT US</td>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
        </tr>
        <tr>
          <td>123 Main Str</td>
          <td>&nbsp;</td>
          <td>OUR TEAM</td>
          <td>&nbsp;</td>
        </tr>
        <tr>
          <td>Springfield, IL</td>
          <td>&nbsp;</td>
          <td>Mike Smith</td>
          <td><img src="/mike-smith.jpg"/></td>
        </tr>
        <tr>
          <td>1-800-BUY-DUFF</td>
          <td>&nbsp;</td>
          <td>Jen Jones</td>
          <td><img src="/jen-jones.jpg"/></td>
        </tr>
        <tr>
          <td>&nbsp;</td>
          <td>&nbsp;</td>
          <td>Scruffy</td>
          <td><img src="/scruffy-the-dog.jpg"/></td>
        </tr>
      </tbody>
    </table>

    Schematically, this would render as something like this (with ___
    indicating a content-free TD cell):

    CONTACT US       ___  ___         ___
    123 Main Str     ___  OUR TEAM    ___
    Springfield, IL  ___  Mike Smith  <img src="/mike-smith.jpg"/>
    1-800-BUY-DUFF   ___  Jen Jones   <img src="/jen-jones.jpg"/>
    ___              ___  Scruffy     <img src="/scruffy-the-dog.jpg"/>

    There are two clear semantic elements here.  From a mobile design
    perspective, it would be great to parse them more like this:
    
    <div class="mwu-elem-table2divgroups-group" id="mwu-elem-contact">
      <div>CONTACT US</div>
      <div>123 Main Str</div>
      <div>Springfield, IL</div>
      <div>1-800-BUY-DUFF</div>
    </div>

    ... and:
    
    <div class="mwu-elem-table2divgroups-group" id="mwu-elem-ourteam">
      <div>
        <div>OUR TEAM</div>
      </div>
      <div>
        <div>Mike Smith</div>
        <div><img src="/mike-smith.jpg"/></div>
      </div>
      <div>
        <div>Jen Jones</div>
        <div><img src="/jen-jones.jpg"/></div>
      </div>
      <div>
        <div>Scruffy</div>
        <div><img src="/scruffy-the-dog.jpg"/></div>
      </div>
    </div>

    That's exactly what this filter can do.

    You'll need to specify what the semantic groups are, and how to
    extract them from a table grid.  The specmap argument is a list of
    Spec instances.  Each spec object defines a rectangle of cells, from
    1 or more rows and 1 or more columns in the source table.  It also
    defines a DOM ID name (equivalent to 'mwu-elem-contact' and
    'mwu-elem-ourteam' above).  See the Spec class documentation for
    more details, but briefly, one way to define a group of cells is
    with these four numbers:
    
      (tr_start, td_start, tr_end, td_end)

    These integers are 0-based row and column indices, and both the
    start and end bounds are inclusive.  So a specmap for the above
    would read:

    specmap = [
      Spec(idname('contact'), 0, 0, 3, 0),
      Spec(idname('ourteam'), 1, 2, 4, 3),
    ]

    By default, any TD cells that would render as whitespace in the
    browser are omitted. Set omit_whitespace=False if you don't want
    these cells discarded.

    TODO: make the above paragraph true even if a TD element contains, say, an empty SPAN

    If the extracted cells are one-dimensional (i.e. a single column
    or row), the group will be a list of DIVs (as in the "contact us"
    example). But if the cells extend over more than one row and
    column in the source table, they will be organized in divs by row,
    as in the "our team" example.

    @param elem            : Element to operate on
    @type  elem            : lxml.html.HtmlElement

    @param specmap         : Specification of what groups of cells to extract
    @type  specmap         : list of Spec

    @param omit_whitespace : Whether to omit cells just containing content that would render as whitespace in the browser
    @type  omit_whitespace : bool
    
    '''
    table_elem = findonetag(elem, 'table')
    return _table2divgroups(elem, table_elem, specmap, omit_whitespace)
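
The grouping rules in the docstring (inclusive rectangle bounds, whitespace cells dropped, flat output for a single row or column, per-row output otherwise) can be sketched on a plain list-of-rows grid. This is not the library's _table2divgroups implementation, which is not shown here. CellRect and extract_group are hypothetical names, and strings stand in for TD elements.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's Spec class, whose real signature is not shown.
CellRect = namedtuple('CellRect', 'idname tr_start td_start tr_end td_end')


def extract_group(grid, rect, omit_whitespace=True):
    """Return the cells inside rect (inclusive bounds) from a list-of-rows grid."""
    rows = []
    for row in grid[rect.tr_start:rect.tr_end + 1]:
        cells = row[rect.td_start:rect.td_end + 1]
        if omit_whitespace:
            cells = [c for c in cells if c.strip()]
        if cells:
            rows.append(cells)
    one_dimensional = (rect.tr_start == rect.tr_end
                       or rect.td_start == rect.td_end)
    if one_dimensional:
        # Single row or column: a flat list, like the "contact us" group.
        return [c for row in rows for c in row]
    # Otherwise keep the cells organized by row, like the "our team" group.
    return rows


grid = [
    ['CONTACT US', '', '', ''],
    ['123 Main Str', '', 'OUR TEAM', ''],
    ['Springfield, IL', '', 'Mike Smith', 'mike-smith.jpg'],
    ['1-800-BUY-DUFF', '', 'Jen Jones', 'jen-jones.jpg'],
    ['', '', 'Scruffy', 'scruffy-the-dog.jpg'],
]
contact = extract_group(grid, CellRect('contact', 0, 0, 3, 0))
ourteam = extract_group(grid, CellRect('ourteam', 1, 2, 4, 3))
```

Here contact is a flat list of four strings, while ourteam is a list of rows, the first of which holds only 'OUR TEAM' because its neighboring cell is empty.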