def _get_documents(self, tokens):
    """Return the unique documents matching *tokens*, sorted by score.

    Documents are ordered from highest score to lowest (ordering is
    produced by the retrieval helpers below — TODO confirm against
    ``_retrieve_documents``).

    Args:
        tokens: iterable of ``(context_id, term_id)`` pairs describing
            the query terms and the context each must occur in.

    Returns:
        The document structures produced by ``self._retrieve_documents``
        (presumably a score-sorted collection keyed by doc id — verify
        against that helper).
    """
    # Collapse the (context_id, term_id) pairs into a term_id -> context_id
    # map for the lookups below. NOTE: if the same term_id appears with
    # several contexts, the last pair wins — this mirrors the original
    # overwrite behavior.
    token_map = {term_id: context_id for context_id, term_id in tokens}

    # Retrieve our dictionary of Term -> [(doc, context), ...] from the
    # database.
    start_time = time.time()
    term_doc_map = TermModel.get_term_doc_map(token_map.keys())
    # Lazy %-args: the message is only formatted if DEBUG is enabled.
    logging.debug('Took %.4fs to retrieve data structure for %d terms',
                  time.time() - start_time, len(term_doc_map))

    # Regroup by document: build a map of doc_id -> [[context_id, term_id],
    # ...] listing each term and the context in which it occurs.
    #
    # _organize also drops document ids whose terms do not occur in the
    # contexts the user required (when a context was specified in the
    # token list).
    start_time = time.time()
    doc_term_map = self._organize(token_map, term_doc_map)
    logging.debug('Took %.4fs to rearrange data into structure for %d docs',
                  time.time() - start_time, len(doc_term_map))

    # Fetch the document data for every document relevant to the query,
    # producing the doc_id -> doc_data structure we return.
    start_time = time.time()
    docs = self._retrieve_documents(doc_term_map)
    logging.debug('Took %.4fs to retrieve %d document data from database',
                  time.time() - start_time, len(docs))

    return docs