Python extractFirst Examples

Programming language: Python

Namespace/package name: bibcrawl.utils.parsing

Method/function: extractFirst

Examples on hotexamples.com: 2

Python extractFirst - 2 examples found. These are the top-rated real-world Python examples of bibcrawl.utils.parsing.extractFirst, extracted from open source projects.

Example #1

File: renderjavascript.py Project: BlogForever/crawler

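# Python 2 code (it relies on itertools.imap and itertools.ifilter).
# foreach, CommentItem and parseHTML are defined elsewhere in the project
# and are not shown in this excerpt.
from collections import OrderedDict
from itertools import ifilter, imap

from selenium.common.exceptions import (
  ElementNotVisibleException, NoSuchElementException)

from bibcrawl.utils.parsing import extractFirst

def extractComments(driver, commentXP, contentXP, authorXP, publishedXP):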
  """Generic procedure to extract comments from precomputed xPaths.

  @type  driver: selenium.webdriver.phantomjs.webdriver.WebDriver
  @param driver: the driver
  @type  commentXP: string
  @param commentXP: the xPath to comment nodes
  @type  contentXP: string
  @param contentXP: the xPath to comment contents
  @type  authorXP: string
  @param authorXP: the xPath to comment authors
  @type  publishedXP: string
  @param publishedXP: the xPath to comment publication dates
  @rtype: tuple of CommentItem
  @return: the extracted comments
  """
  try:
    page = driver.find_element_by_xpath(".//body").get_attribute("innerHTML")
  except (ElementNotVisibleException, NoSuchElementException):
    return tuple()
  parentNodeXP = "./ancestor::" + commentXP[2:]
  getParentNode = lambda node: (node.xpath(parentNodeXP) + [None])[0]
  nodesMapComments = OrderedDict(imap(
    lambda node: (node, CommentItem(
      content=extractFirst(node, contentXP),
      author=extractFirst(node, authorXP),
      published=extractFirst(node, publishedXP),
      parent=getParentNode(node))),
    parseHTML(page).xpath(commentXP)))
  foreach(
    lambda cmmnt: cmmnt.__setattr__("parent", nodesMapComments[cmmnt.parent]),
    ifilter(lambda _: _.parent is not None, nodesMapComments.values()))
  return tuple(ifilter(lambda _: _.content, nodesMapComments.values()))
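
The function above links comments to their parents in two passes: it first maps every comment node to an item whose parent field still holds the raw ancestor node, then replaces that node with the parent's item. The following is a minimal Python 3 sketch of that pattern only, not the project's code. The `Comment` class, the `link_parents` function and the `(node_id, content, parent_id)` input format are hypothetical stand-ins for `CommentItem` and the lxml nodes.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any


@dataclass
class Comment:
    """Hypothetical stand-in for CommentItem."""
    content: str
    parent: Any = None  # holds a raw node id first, a Comment after linking


def link_parents(raw_nodes):
    """Build comments from (node_id, content, parent_id or None) triples."""
    # Pass 1: map each node id to its comment; parent is still a raw id.
    by_node = OrderedDict(
        (node_id, Comment(content=content, parent=parent_id))
        for node_id, content, parent_id in raw_nodes)
    # Pass 2: replace each raw parent id with the parent's Comment object.
    for comment in by_node.values():
        if comment.parent is not None:
            comment.parent = by_node[comment.parent]
    # Drop comments with empty content, as the original function does.
    return tuple(c for c in by_node.values() if c.content)


comments = link_parents([
    ("a", "first", None),
    ("b", "reply", "a"),
    ("c", "", "a"),
])
```

Keeping the map ordered preserves document order, so the returned tuple lists comments in the order they appear on the page.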

Example #2

File: contentextractor.py Project: BlogForever/crawler

  def __call__(self, parsedPage):
    """Extracts content from a page.

    @type  parsedPage: lxml.etree._Element
    @param parsedPage: the web page where content is extracted
    @rtype: tuple of strings
    @return: the extracted content
    """
    if self.needsRefresh:
      self._refresh()
    return tuple(imap(lambda _: extractFirst(parsedPage, _), self.xPaths))
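
The body of extractFirst is not shown on this page, so its exact behaviour is unknown. As a rough illustration of the call pattern above, here is a Python 3 sketch that assumes a helper of this kind returns the text of the first match, or None when nothing matches. It uses the standard library's `xml.etree.ElementTree` (with its limited XPath subset) in place of lxml so that it runs without dependencies. The names `extract_first` and `extract_all` are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_first(node, path):
    """Return the text of the first element matching path, or None.

    Assumed behaviour; the real extractFirst may differ.
    """
    matches = node.findall(path)
    # Same "append a default, take index 0" idiom as getParentNode in Example #1.
    first = (matches + [None])[0]
    return first.text if first is not None else None


def extract_all(node, paths):
    """Apply extract_first to each path, mirroring the tuple(imap(...)) call."""
    return tuple(extract_first(node, path) for path in paths)


page = ET.fromstring(
    "<html><body><h1>Title</h1><p>one</p><p>two</p></body></html>")
result = extract_all(page, (".//h1", ".//p", ".//h2"))
```

In Python 3 a generator expression replaces `imap`, which no longer exists in `itertools`.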