Python Element Examples

Programming language: Python

Namespace/package name: pattern.web

Class/type: Element

Examples at hotexamples.com: 2

Python Element - 2 examples found. These are top-rated real-world Python examples of pattern.web.Element, extracted from open source projects.

Frequently used methods

Element(15)

by_class(2)

by_tag(2)

startswith(1)

Example #1

File: wikihow.py Project: christiaanw/theseeker

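    # Assumes `from pattern.web import Element` at module level; the enclosing
    # class and any @classmethod decorator are not part of this excerpt.
    def create_article(cls, title=None):
        # Fetch the page for the given title, or call the helper without one.
        page = (cls.get_raw_wikihow_page(title=title) if title is not None
                else cls.get_raw_wikihow_page())

        # Read the heading text through a CSS selector.
        title = Element(page)("h1.firstHeading a")[0].string
        # Drop a leading "wiki" from the heading.
        if title.startswith("wiki"):
            title = title[4:]

        # Build the article URL from the title after its first 7 characters.
        url = 'http://www.wikihow.com/{}'.format(title[7:].replace(' ', '-'))

        steps, errors = cls.get_steps(page)
        tips = cls.get_tips(page)

        return cls(url, title, steps, tips, errors)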
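
The title and URL handling in this example can be tried without pattern.web. The following is a minimal sketch, assuming the heading text has the form "wikiHow to ..." so that removing "wiki" leaves "How to ..." and the first 7 characters are "How to ". The helper names normalize_title and build_url and the sample title are hypothetical and do not appear in the original project.

```python
def normalize_title(raw_title):
    # Drop a leading "wiki" so that "wikiHow to X" becomes "How to X".
    if raw_title.startswith("wiki"):
        return raw_title[4:]
    return raw_title


def build_url(title):
    # Skip the first 7 characters (assumed to be "How to ") and
    # replace spaces with hyphens to form the URL slug.
    slug = title[7:].replace(" ", "-")
    return "http://www.wikihow.com/{}".format(slug)


# Hypothetical heading text, used only for illustration.
title = normalize_title("wikiHow to Tie a Tie")
url = build_url(title)
print(title)
print(url)
```

Note that the slice title[7:] only yields a sensible slug when the title really starts with "How to "; a title with a different prefix would lose its first 7 characters regardless.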

Example #2

File: crawling - Dit is de oude code die dus niet werkt.py Project: thomasjurriaan/datapro

from pattern.web import Element


def scrape_movie_page(dom):
    """
    Scrape the IMDB page for a single movie.

    Args:
        dom: pattern.web.DOM instance representing the page of a single
            movie.

    Returns:
        A tuple of strings representing the following (in order): title,
        duration, genre(s) (semicolon separated if several), director(s)
        (semicolon separated if several), writer(s) (semicolon separated if
        several), actor(s) (semicolon separated if several), rating, number
        of ratings.

    This function uses Element from pattern.web, which keeps the code
    shorter than repeated nested for loops would. The lookups are based on
    CSS selectors, which extract the relevant parts of the downloaded HTML.
    """

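    # Wrap the DOM so that CSS selectors can be used via element(...)
    element = Element(dom)

    # Title
    title = element.by_class("itemprop")[0].content

    # Duration
    duration = ""
    for e in dom.by_tag("div.infobar"):
        for a in e.by_tag("time"):
            # Strip spaces, the "min" unit and newlines, leaving only the number
            duration = a.content.replace(" ", "").replace("min", "").replace("\n", "")
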
    # Genres
    genres = []
    e = dom.by_tag("div.infobar")[0]
    for genre in e.by_class("itemprop"):
        genres.append(genre.content)
    genres = ";".join(genres)

    # Directors
    directors = []
    e = element('div[itemprop="director"]')[0]
    for a in e.by_tag("span"):
        directors.append(a.content)
    directors = ";".join(directors)

    # Writers
    writers = []
    e = element('div[itemprop="creator"]')[0]
    for a in e.by_tag("span.itemprop"):
        writers.append(a.content)
    writers = ";".join(writers)

    # Actors
    actors = []
    actorscode = element('div[itemprop="actors"]')[0]
    for actor in actorscode.by_tag("span.itemprop"):
        actors.append(actor.content)
    actors = ";".join(actors)

    # Rating
    rating = element.by_class("titlePageSprite star-box-giga-star")[0].content.replace(" ", "")

    # Number of ratings
    n_ratings = element('span[itemprop="ratingCount"]')[0].content

    # Return everything of interest for this movie (all strings as specified
    # in the docstring of this function).
    return title, duration, genres, directors, writers, actors, rating, n_ratings
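
The two string operations this example relies on, cleaning the duration text and joining several names with semicolons, can be checked in isolation. The following is a minimal sketch using only the standard library, assuming the intent of the duration cleanup is to strip spaces, the "min" unit, and newline characters. The helper names clean_duration and join_names and the sample values are hypothetical and are not taken from the original project or from a real IMDB page.

```python
def clean_duration(text):
    # Remove spaces, the "min" unit and newline characters so that
    # only the number of minutes remains, still as a string.
    return text.replace(" ", "").replace("min", "").replace("\n", "")


def join_names(names):
    # Combine several values into one semicolon-separated string.
    return ";".join(names)


# Hypothetical sample values, used only for illustration.
duration = clean_duration("\n    142 min\n")
genres = join_names(["Crime", "Drama"])
print(duration)
print(genres)
```

Joining an empty list gives an empty string, so a page without any matching names would produce "" rather than raise an error at this step.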