ALwR

Active Learning with Rationales

This is the source code for the paper "Active Learning with Rationales for Text Classification" (Manali Sharma, Di Zhuang, Mustafa Bilgic), published at the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT), 2015.

'active_learning_with_rationales.py' is the entry point for all experiments. It accepts the following command-line arguments:

-dataset: Dataset to be used for an experiment. The four datasets used in the paper are IMDB ('imdb'), NOVA ('nova'), SRAA ('SRAA'), and WvsH (from 20newsgroups; two valid group names must be given. The groups used in the paper are comp.os.ms-windows.misc and comp.sys.ibm.pc.hardware)
-tfidf: If specified as true, performs tf-idf transformation of the dataset
-metric: The feature expert ranks features based on the specified metric. Currently supported options are: (i) Chi-squared statistic (chi2). This statistic is used in the paper. (ii) Mutual information (mutual_info) (iii) Ranking based on feature weights obtained by training a logistic regression with L1 regularization (L1)
-c: Penalty term for the L1 feature expert
-debug: If set to true, enables debugging of the code
-trials: Number of trials to run for each experiment
-seed: Seed to the random number generator
-bootstrap: Number of documents to select randomly for bootstrapping the initial model
-balance: Ensures both classes start with an equal number of documents after bootstrapping
-budget: Budget (in terms of number of documents) for each experiment
-step_size: Number of documents to label at each iteration of active learning
-strategy: Active learning strategy to use for selecting documents. Currently supported active learning strategies include random sampling (RND), uncertainty sampling (UNC), uncertainty sampling strategy that prefers documents with conflicting rationales (UNC_PC) and uncertainty sampling strategy that prefers documents with no conflicting rationales (UNC_PNC)
-topk_unc: Number of uncertain documents to consider to differentiate between types of uncertainties
-w_o: The 'o' parameter in the paper. This is the weight of all features other than rationales
-w_r: The 'r' parameter in the paper. This is the weight of all rationale features for a document
-model_type: Type of classifier to be used. Currently supported options include logistic regression with L2 regularization (lrl2), logistic regression with L1 regularization (lrl1), Multinomial naive Bayes (mnb), support vector machines (svm_linear), Strategy presented in Melville et al 2009 paper (Melville_etal), Strategy presented in Zaidan et al 2007 paper (Zaidan_etal)
-alpha: Smoothing parameter for the MultinomialNB instance model
-lr_C: Penalty term for the logistic regression classifier
-svm_C: Penalty term for the SVM classifier
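
The -w_r and -w_o arguments can be illustrated with a minimal sketch, assuming each document is a dense feature vector and the rationale is given as a list of feature indices. The function name `weight_by_rationale` and the example values are hypothetical and are not taken from this repository:

```python
import numpy as np


def weight_by_rationale(x, rationale_idx, w_r=1.0, w_o=0.01):
    """Scale rationale features by w_r and all other features by w_o.

    Hypothetical helper for illustration only; the repository's own
    implementation may differ.
    """
    x = np.asarray(x, dtype=float)
    # Start with the "other features" weight everywhere ...
    weights = np.full(x.shape, w_o)
    # ... then overwrite the positions that were marked as rationales.
    weights[list(rationale_idx)] = w_r
    return x * weights


# Toy document with four features; feature 2 is the annotated rationale.
doc = np.array([1.0, 0.0, 2.0, 1.0])
weighted = weight_by_rationale(doc, rationale_idx=[2], w_r=1.0, w_o=0.01)
print(weighted)
```

With w_r larger than w_o, the rationale feature keeps its value while every other feature is scaled down, so the classifier sees the rationale as the dominant evidence in that document.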

PARAMETERS FOR THE APPROACH PRESENTED IN Melville etal PAPER

-Meville_etal_r: The 'r' parameter for the feature model in Melville et al 2009 paper
-instance_model_weight: Weight for the instance model in Melville et al 2009 paper. Note that the weight for the feature model will be 1 - instance_model_weight
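
The relationship between the two weights can be sketched as follows. This assumes the instance model and the feature model each produce class-probability estimates that are combined linearly; that combination rule is an assumption made for illustration, and `pool_probabilities` is a hypothetical name, not a function from this repository:

```python
import numpy as np


def pool_probabilities(p_instance, p_feature, instance_model_weight=0.5):
    """Linearly combine two class-probability vectors.

    Assumption: a simple weighted average, with the feature model
    receiving weight 1 - instance_model_weight.
    """
    if not 0.0 <= instance_model_weight <= 1.0:
        raise ValueError("instance_model_weight must lie in [0, 1]")
    p_instance = np.asarray(p_instance, dtype=float)
    p_feature = np.asarray(p_feature, dtype=float)
    feature_model_weight = 1.0 - instance_model_weight
    return instance_model_weight * p_instance + feature_model_weight * p_feature


# Toy two-class example with made-up probabilities.
pooled = pool_probabilities([0.9, 0.1], [0.5, 0.5], instance_model_weight=0.75)
print(pooled)
```

Because the two weights sum to 1, the pooled vector remains a valid probability distribution whenever both inputs are.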

PARAMETERS FOR THE APPROACH PRESENTED IN Zaidan etal PAPER

-Zaidan_etal_Ccontrast: Parameter Ccontrast in Zaidan et al 2007 paper
-Zaidan_etal_C: Parameter C in Zaidan et al 2007 paper
-Zaidan_etal_mu: Parameter mu in Zaidan et al 2007 paper
-file_tag: An additional tag you might want to give to the saved file
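
As a rough illustration of how a few of the arguments listed above fit together, here is a minimal argparse sketch. It covers only a subset of the flags, and every type, default, and `choices` list below is an assumption made for this example; the actual definitions are in 'active_learning_with_rationales.py':

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a subset of the documented flags."""
    parser = argparse.ArgumentParser(
        description="Sketch of the experiment arguments (assumed types and defaults)."
    )
    # One name for imdb/nova/SRAA, or two 20newsgroups group names.
    parser.add_argument("-dataset", nargs="+", default=["imdb"])
    parser.add_argument("-strategy", default="RND",
                        choices=["RND", "UNC", "UNC_PC", "UNC_PNC"])
    parser.add_argument("-metric", default="chi2",
                        choices=["chi2", "mutual_info", "L1"])
    parser.add_argument("-model_type", default="lrl2")
    parser.add_argument("-trials", type=int, default=10)
    parser.add_argument("-budget", type=int, default=500)
    parser.add_argument("-step_size", type=int, default=1)
    parser.add_argument("-w_o", type=float, default=1.0)
    parser.add_argument("-w_r", type=float, default=1.0)
    return parser


# Parse an example argument list instead of sys.argv so the sketch is self-contained.
example = "-dataset imdb -strategy UNC -model_type lrl2 -w_r 1 -w_o 0.01".split()
args = build_parser().parse_args(example)
print(args.dataset, args.strategy, args.w_r, args.w_o)
```

The same argument list, passed on the command line after `python active_learning_with_rationales.py`, is the kind of invocation the parameter descriptions suggest; consult the script for the exact value formats it accepts.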
