Keywordspice

This is a Python implementation of "Oyama, Kokubo, and Ishida: Domain-Specific Web Search with Keyword Spices, TKDE 2004".

Requirements

NumPy

Preparation

Please first prepare labeled data and separate it into training data and validation data.
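A minimal sketch of such a split, in Python. The 70/30 ratio, the fixed seed, and the function name `split_labeled_data` are illustrative assumptions, not taken from keywordspice.py or the paper. A stratified split may be preferable when positive documents are rare.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_labeled_data(
    examples: Sequence[T], valid_ratio: float = 0.3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split labeled examples into (training, validation) lists.

    valid_ratio and seed are hypothetical defaults chosen for illustration.
    """
    if not 0.0 < valid_ratio < 1.0:
        raise ValueError("valid_ratio must be strictly between 0 and 1")
    shuffled = list(examples)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_ratio))
    return shuffled[n_valid:], shuffled[:n_valid]
```

With a hundred labeled documents and the default ratio, this yields 70 training and 30 validation documents.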

Labeled data is a set of documents, each with a positive or negative label. For example, if you want to develop a recipe search engine, you can download a hundred documents and label each of them as either recipe-related (positive) or not (negative).

Training data is the part of the labeled data used to train a decision tree, while validation data is the remainder, used to refine the trained tree. See (Oyama+, TKDE 2004) for details.
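One common way to grow a decision tree over word-presence features is to pick, at each node, the word whose presence test gives the largest information gain. The sketch below assumes that ID3-style criterion and represents each document as a (label, set of words) pair; decision_tree.py may use a different criterion or representation, and the function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Example = Tuple[int, Set[str]]  # (label: 1 or 0, set of words in the document)


def entropy(labels: List[int]) -> float:
    """Binary entropy, in bits, of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(examples: List[Example], word: str) -> float:
    """Entropy reduction from testing 'does the document contain word?'."""
    if not examples:
        return 0.0
    labels = [label for label, _ in examples]
    with_word = [label for label, words in examples if word in words]
    without_word = [label for label, words in examples if word not in words]
    n = len(examples)
    # Weighted entropy of the two child nodes after the split.
    remainder = (len(with_word) / n) * entropy(with_word) + (
        len(without_word) / n
    ) * entropy(without_word)
    return entropy(labels) - remainder


def best_split_word(examples: List[Example]) -> str:
    """Word with the highest information gain (ties go to the first word alphabetically)."""
    vocabulary = set().union(*(words for _, words in examples))
    return max(sorted(vocabulary), key=lambda w: information_gain(examples, w))
```

A word that perfectly separates positives from negatives has a gain of 1 bit on a balanced set, while a word that appears equally often in both classes has a gain of 0.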

The training data and the validation data must be stored in separate files, and each line of these files must have the following format:

<ID> <Label> <Document>

where <ID> is an identifier that is unique within the labeled data, <Label> is either 1 (positive) or 0 (negative), and <Document> is a list of words separated by whitespace. Note that the three fields must be separated by TAB characters, and <Document> should contain only pre-processed words (e.g., with stopwords removed).
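Below is a minimal sketch that writes and reads lines in this format. The tiny stopword list, the lowercasing, the UTF-8 encoding, and the function names are hypothetical illustrations of the pre-processing step, not what the authors used.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical, deliberately tiny stopword list; use a proper one in practice.
STOPWORDS: Set[str] = {"a", "an", "and", "the", "of", "to", "in", "is", "with"}


def preprocess(text: str, stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Lowercase the text, split it on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in stopwords]


def format_line(doc_id: str, label: int, words: List[str]) -> str:
    """Build one line: ID, label and space-joined words, separated by TABs."""
    if label not in (0, 1):
        raise ValueError("label must be 1 (positive) or 0 (negative)")
    if "\t" in doc_id:
        raise ValueError("the ID must not contain a TAB character")
    if any(len(w.split()) != 1 for w in words):
        raise ValueError("each word must be a non-empty token without whitespace")
    return f"{doc_id}\t{label}\t{' '.join(words)}"


def parse_line(line: str) -> Tuple[str, int, List[str]]:
    """Inverse of format_line: returns (ID, label, list of words)."""
    doc_id, label, document = line.rstrip("\n").split("\t", 2)
    return doc_id, int(label), document.split()


def write_data(path: str, rows: Iterable[Tuple[str, int, List[str]]]) -> None:
    """Write (ID, label, words) rows to path, one document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for doc_id, label, words in rows:
            f.write(format_line(doc_id, label, words) + "\n")
```

For example, `write_data("train.tsv", [("doc1", 1, preprocess("Mix the flour and sugar"))])` writes the single line `doc1<TAB>1<TAB>mix flour sugar` (the file name is hypothetical).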

Usage

Run keywordspice.py after preparing the training and validation data. Its usage is as follows:

keywordspice.py train_filepath valid_filepath

where train_filepath is the path to the training data file and valid_filepath is the path to the validation data file.
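A sketch of invoking the script from another Python program. It assumes Python 3.7+ (for `capture_output`), that keywordspice.py is in the current directory, and that it runs under the same interpreter; `build_command` and `run_keywordspice` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def build_command(
    train_filepath: str, valid_filepath: str, script: str = "keywordspice.py"
) -> List[str]:
    """Argument vector equivalent to 'keywordspice.py train_filepath valid_filepath'."""
    return [sys.executable, script, train_filepath, valid_filepath]


def run_keywordspice(train_filepath: str, valid_filepath: str) -> str:
    """Run the script and return what it printed to stdout.

    check=True raises CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        build_command(train_filepath, valid_filepath),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `print(run_keywordspice("train.tsv", "valid.tsv"))`, where both file names are hypothetical.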

The resulting keyword spice is printed to standard output.
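The exact textual format of that output is not described here. Assuming a keyword spice can be represented as a Boolean expression in disjunctive normal form (an OR of ANDs of keywords that must be present or absent), the sketch below shows how one might apply it to filter documents, score it on labeled data with the F-measure (harmonic mean of precision and recall), and append it to a user's query. The representation, the function names, the example keywords, and the AND/OR/NOT query syntax are all hypothetical; real search engines use differing syntax.

```python
from typing import List, Sequence, Set, Tuple

Literal = Tuple[str, bool]  # (keyword, must_be_present)
Conjunction = List[Literal]
Spice = List[Conjunction]  # disjunction of conjunctions (DNF)


def matches(spice: Spice, words: Set[str]) -> bool:
    """True if some conjunction has all of its literals satisfied by the document."""
    return any(all((kw in words) == present for kw, present in conj) for conj in spice)


def f_measure(spice: Spice, examples: Sequence[Tuple[int, Set[str]]]) -> float:
    """Harmonic mean of precision and recall of the spice on (label, words) pairs."""
    tp = fp = fn = 0
    for label, words in examples:
        predicted = matches(spice, words)
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def to_query(user_query: str, spice: Spice) -> str:
    """Combine a user's query with the spice using generic AND/OR/NOT syntax."""

    def conj_text(conj: Conjunction) -> str:
        terms = [kw if present else f"NOT {kw}" for kw, present in conj]
        return terms[0] if len(terms) == 1 else "(" + " AND ".join(terms) + ")"

    return f"{user_query} AND ({' OR '.join(conj_text(c) for c in spice)})"
```

For instance, with the hypothetical spice `[[("tablespoon", True)], [("recipe", True), ("car", False)]]`, `to_query("salmon", spice)` returns `salmon AND (tablespoon OR (recipe AND NOT car))`.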

