def _findAssociation_ReadArticleFirst(self, articles, rankLimit=7):
    """Find shared associations by first reading every article's text.

    Fetches each article, extracts the links it contains, and gathers
    them into per-article multisets (Counters keyed by linked title),
    then delegates the shared-link ranking to _findSharedLinks.

    Input Parameters:
        articles  : iterable of article titles to fetch and scan
        rankLimit : maximum number of ranked results requested from
                    _findSharedLinks (default 7)

    Returns:
        The result of self._findSharedLinks over the collected multisets.
    """
    self.wiki = Wiki()
    reader = WikiTextReader()
    linkCounters = {}
    for title in articles:
        text = self.wiki.getArticle(title)
        # NOTE(review): the trailing 0, 0, 100000 arguments presumably
        # bound the scan window / link limit -- confirm in WikiTextReader.
        extracted = reader.readLinks(title, text, 0, 0, 100000)
        # Keep only the link names; frequencies are recomputed by Counter.
        linkCounters[title] = collections.Counter(name for name, _freq in extracted)
    return self._findSharedLinks(linkCounters, articles, rankLimit)
def _findAssociation_ReadArticleFirst(self, articles, rankLimit=7):
    """Find shared associations across articles, reading each article first.

    For every title, downloads the wiki text, extracts its outgoing
    links, and records them as a multiset; the combined mapping is then
    handed to _findSharedLinks for ranking.

    Input Parameters:
        articles  : iterable of article titles to process
        rankLimit : cap on the number of ranked results (default 7)

    Returns:
        The value returned by self._findSharedLinks.
    """
    self.wiki = Wiki()
    wikiReader = WikiTextReader()

    def _countLinksFor(title):
        # Fetch the article body and count occurrences of each linked title.
        # NOTE(review): 0, 0, 100000 look like offset/limit bounds for the
        # reader -- verify against WikiTextReader.readLinks.
        body = self.wiki.getArticle(title)
        pairs = wikiReader.readLinks(title, body, 0, 0, 100000)
        return collections.Counter(link for link, _ in pairs)

    allLinksMultiSet = {title: _countLinksFor(title) for title in articles}
    return self._findSharedLinks(allLinksMultiSet, articles, rankLimit)
def getImportantLinks(self, articleTitle, selectionAlgorithm=SelectionAlgorithm.PageRank, outputLimit=15):
    """Retrieve the most important links in an article via a chosen algorithm.

    Downloads the article, extracts its links (the reader applies a
    bag-of-words frequency pass while reading), and then ranks them
    with the selection algorithm resolved dynamically from the
    ``selectionAlgorithm`` name.

    Input Parameters:
        articleTitle       : title of the article whose links are ranked
        selectionAlgorithm : ranking algorithm used alongside bag of words
                             (default SelectionAlgorithm.PageRank)
        outputLimit        : number of top links to return (default 15)

    Returns:
        A list of the top link titles, at most ``outputLimit`` entries.
    """
    # Download the raw wiki text for the requested article.
    self.wiki = Wiki()
    wikiText = self.wiki.getArticle(articleTitle)

    # Extract every link from the text; the reader's own frequency
    # counting serves as the bag-of-words step.
    extractedLinks = WikiTextReader().readLinks(articleTitle, wikiText)

    # Resolve the matching _selectLinks_<algorithm> method and apply it.
    rankingMethod = getattr(self, "_selectLinks_%s" % selectionAlgorithm)
    return rankingMethod(extractedLinks, outputLimit)
def getImportantLinks(self, articleTitle, selectionAlgorithm=SelectionAlgorithm.PageRank, outputLimit=15):
    """Return the highest-ranked links found in a Wikipedia article.

    The article text is fetched and scanned for links (the reader runs
    a bag-of-words frequency pass during extraction); the named
    selection algorithm then picks the top entries.

    Input Parameters:
        articleTitle       : article whose links should be ranked
        selectionAlgorithm : algorithm name used to locate the private
                             _selectLinks_<name> ranking method
        outputLimit        : how many link titles to return (default 15)

    Returns:
        A list containing up to ``outputLimit`` top link titles.
    """
    self.wiki = Wiki()

    # Fetch content, then pull out all links it references.
    pageReader = WikiTextReader()
    content = self.wiki.getArticle(articleTitle)
    candidateLinks = pageReader.readLinks(articleTitle, content)

    # Dispatch to the ranking implementation selected by name.
    selector = getattr(self, "_selectLinks_%s" % selectionAlgorithm)
    return selector(candidateLinks, outputLimit)