def process(self, method, data):
    """Fetch a document by URL or search term and build the response payload.

    Args:
        method: How to locate the document; must be 'url' or 'term'.
        data: The search term (when method == 'term') or a URL-encoded
            URL to fetch (when method == 'url').

    Returns:
        A dict with 'terms', 'cards', 'shmoop', and 'related' keys, or
        None after signalling a 500 when no document could be obtained.

    Raises:
        ValueError: If `method` is not 'url' or 'term'.
    """
    if method not in ['url', 'term']:
        raise ValueError("method must be one of 'url', 'term'")
    document = None
    if method == 'term':
        page_name = wikipedia.get_page_for_query(data)
        document = wikipedia.fetch_page((None, page_name, None))
    else:
        result = urlfetch.fetch(urllib.unquote(data), deadline=20)
        if result.status_code == 200:
            document = etree.fromstring(result.content, parser=self.parser)
    if document is None:
        self.error(500)
        # BUG FIX: previously fell through after setting the error status
        # and passed None to wikipedia.get_relevant_pages(); abort here.
        return None
    # document_content = document.cssselect('#mw-content-text').text()
    # TODO: Extract terms
    # TODO: Extract definitions
    relevant_pages = wikipedia.get_relevant_pages(document)
    data = {
        # TODO: populate 'terms', 'cards', and 'shmoop' once extraction
        # above is implemented.
        'terms': [],
        'cards': [],
        'shmoop': [],
        'related': [{'url': wikipedia.page_name_to_link(page),
                     'name': page[1]}
                    for page in relevant_pages]
    }
    return data
def get_definition(term, document=None):
    """Get a one-sentence definition for a term.

    Args:
        term: The term to define.
        document: Optional `lxml` document in which the term was first
            found (and thus where a definition may also be found). When
            provided it is used directly; previously this argument was
            silently ignored and a fresh page was always fetched.

    Returns:
        The first sentence of the article's lead paragraph (with
        parenthetical asides stripped), or None when no document could
        be obtained.
    """
    if document is None:
        # First stab: try to fetch a relevant article
        page_name = wikipedia.get_page_for_query(term)
        document = wikipedia.fetch_page((None, page_name, None))
    if document is None:
        # TODO: Other extraction method
        return
    content_el = document.xpath('//div[@id="mw-content-text"]/p[1]')[0]
    text_content = etree.tostring(content_el, method='text', encoding='utf-8')
    text_content = PARENTHETICAL_EXPRESSION.sub(r'\1', text_content)
    # Grab first sentence.
    # TODO: Better sentence segmentation
    sentence = text_content.split('.')[0]
    return sentence.strip() + '.'