Django community-trained, Bayesian-inference-based comment moderation app.
django-moderator integrates Django's comments framework with SpamBayes to classify comments into one of four categories (ham, spam, reported or unsure) based on training by users. See Paul Graham's A Plan for Spam for some background.
Users classify comments as reported using a report abuse mechanism. Staff users can then classify these reported comments as ham or spam, thereby training the algorithm to automatically classify similarly worded comments in future. Additionally, comments the algorithm fails to clearly classify as either ham or spam are classified as unsure, allowing staff users to classify them manually via the admin as well.
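To make the idea of training more concrete, here is a minimal sketch of the word-statistics approach described in A Plan for Spam. It is a simplified, Graham-style naive Bayes scorer for illustration only: SpamBayes, which django-moderator actually uses, tokenizes far more carefully and uses a different combining scheme (Gary Robinson's chi-squared method). The names ToyBayes, train and score are hypothetical and are not part of django-moderator or SpamBayes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens (deliberately simplistic)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ToyBayes:
    """Hypothetical Graham-style classifier; not SpamBayes' real algorithm."""

    def __init__(self):
        self.ham_counts = Counter()   # token -> number of ham comments containing it
        self.spam_counts = Counter()  # token -> number of spam comments containing it
        self.ham_total = 0
        self.spam_total = 0

    def train(self, text, is_spam):
        """Record one labelled comment, counting each token once."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_total += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_total += 1

    def token_prob(self, token):
        """Estimate how likely a comment containing `token` is to be spam."""
        ham = self.ham_counts[token] / max(self.ham_total, 1)
        spam = self.spam_counts[token] / max(self.spam_total, 1)
        if ham + spam == 0:
            return 0.5  # unseen token: no evidence either way
        # Clamp so that no single token can force absolute certainty.
        return min(0.99, max(0.01, spam / (ham + spam)))

    def score(self, text):
        """Return a spam score between 0.0 (ham) and 1.0 (spam)."""
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        # Like Graham, keep only the 15 most decisive tokens.
        probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:15]
        if not probs:
            return 0.5
        # P = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_ham - log_spam))


bayes = ToyBayes()
bayes.train("buy cheap pills now", is_spam=True)
bayes.train("great article, thanks for sharing", is_spam=False)
print(round(bayes.score("cheap pills"), 3))         # close to 1: spammy
print(round(bayes.score("thanks for sharing"), 3))  # close to 0: hammy
```

Every staff classification adds to the token counts, which is why similarly worded comments drift towards the same score over time. The resulting 0 to 1 scale is the one the cutoff settings described below are compared against.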
Comments classified as spam, as well as comments reported by users, have their is_removed field set to True and are therefore no longer visible in comment listings.
Comments classified as ham or unsure remain unchanged and therefore stay visible in comment listings.
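The visibility rules above amount to a small mapping from category to the is_removed flag. The following is a minimal sketch assuming a comment can be modelled as an object with comment and is_removed attributes; Comment, apply_classification and listing are hypothetical names, not part of django-moderator's API.

```python
from dataclasses import dataclass

CATEGORIES = {"ham", "spam", "reported", "unsure"}
HIDDEN_CATEGORIES = {"spam", "reported"}  # these set is_removed to True


@dataclass
class Comment:
    """Hypothetical stand-in for a Django comment object."""
    comment: str
    is_removed: bool = False


def apply_classification(comment, category):
    """Hide spam and reported comments; leave ham and unsure ones untouched."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if category in HIDDEN_CATEGORIES:
        comment.is_removed = True
    return comment


def listing(comments):
    """Mimic a comment listing, which omits removed comments."""
    return [c for c in comments if not c.is_removed]


comments = [Comment("Nice post!"), Comment("Buy cheap pills"), Comment("Hmm, not sure")]
for c, category in zip(comments, ["ham", "spam", "unsure"]):
    apply_classification(c, category)
print([c.comment for c in listing(comments)])
# ['Nice post!', 'Hmm, not sure']
```

Note that ham and unsure never reset is_removed to False in this sketch, matching the "remain unchanged" rule rather than actively restoring a comment.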
django-moderator also implements a user-friendly admin interface for efficiently moderating comments.
- Install or add django-moderator to your Python path.
- Add moderator to your INSTALLED_APPS setting.
- Configure django-likes as described in its documentation.
- Add a MODERATOR setting to your project's settings.py file. This setting specifies which classifier storage backend to use (see below) as well as the classification thresholds:
MODERATOR = {
    'CLASSIFIER': 'moderator.storage.DjangoClassifier',
    'HAM_CUTOFF': 0.3,
    'SPAM_CUTOFF': 0.7,
    'ABUSE_CUTOFF': 3,
}
Specifically, a HAM_CUTOFF value of 0.3, as in this example, specifies that any comment scoring less than 0.3 during Bayesian inference is classified as ham. A SPAM_CUTOFF value of 0.7 specifies that any comment scoring more than 0.7 is classified as spam. Anything between 0.3 and 0.7 is classified as unsure, awaiting manual classification by a staff user. Additionally, an ABUSE_CUTOFF value of 3 specifies that any comment receiving 3 or more abuse reports is classified as reported, again awaiting manual classification by a staff user. HAM_CUTOFF, SPAM_CUTOFF and ABUSE_CUTOFF can be omitted, in which case the defaults are 0.3, 0.7 and 3 respectively.
- Optionally, if you want an additional "moderate object" tool on admin change views, configure django-apptemplates as described in its documentation, add moderator to INSTALLED_APPS before django.contrib.admin, and add moderator.admin.AdminModeratorMixin as a base class to the admin classes you want the tool to be available for.
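The threshold rules can be expressed as a small decision function. The sketch below is hypothetical: get_cutoffs and classify are illustrative names rather than django-moderator's API, and it assumes that a comment's abuse report count is checked before its Bayesian score, since the rules above do not say how the two interact.

```python
# Documented defaults used when a cutoff is omitted from MODERATOR.
DEFAULT_CUTOFFS = {"HAM_CUTOFF": 0.3, "SPAM_CUTOFF": 0.7, "ABUSE_CUTOFF": 3}


def get_cutoffs(moderator_setting):
    """Return the three cutoffs, falling back to the defaults for omitted keys."""
    return {key: moderator_setting.get(key, default)
            for key, default in DEFAULT_CUTOFFS.items()}


def classify(score, abuse_reports, cutoffs):
    """Map a spam score and an abuse report count to a category."""
    # Assumption: enough abuse reports take precedence over the Bayesian score.
    if abuse_reports >= cutoffs["ABUSE_CUTOFF"]:
        return "reported"
    if score < cutoffs["HAM_CUTOFF"]:
        return "ham"
    if score > cutoffs["SPAM_CUTOFF"]:
        return "spam"
    return "unsure"  # between the two cutoffs: left for a staff user


cutoffs = get_cutoffs({"CLASSIFIER": "moderator.storage.DjangoClassifier"})
print(classify(0.1, 0, cutoffs), classify(0.5, 1, cutoffs),
      classify(0.9, 0, cutoffs), classify(0.1, 3, cutoffs))
# ham unsure spam reported
```

Because the comparisons are strict, a score of exactly 0.3 or 0.7 lands in unsure in this sketch, which follows the "less than" and "more than" wording above.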
By default all comments are classified as they are created. You can, however, disable this behaviour by setting REALTIME_CLASSIFICATION to False, i.e.:
MODERATOR = {
    ...
    'REALTIME_CLASSIFICATION': False,
    ...
}
By default, moderator replies are posted chronologically after the comment being replied to. If you need replies to be posted before the comment being replied to (for example, if you display your comments in reverse chronological order), set REPLY_BEFORE_COMMENT to True, i.e.:
MODERATOR = {
    ...
    'REPLY_BEFORE_COMMENT': True,
    ...
}
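One way to reason about REPLY_BEFORE_COMMENT is in terms of the reply's timestamp relative to the comment it answers. The sketch below assumes, hypothetically, that a reply is timestamped a fixed offset before or after that comment; reply_timestamp and REPLY_OFFSET are illustrative names, and django-moderator may achieve the ordering differently.

```python
from datetime import datetime, timedelta

# Hypothetical offset; django-moderator's actual mechanism is not documented here.
REPLY_OFFSET = timedelta(seconds=1)


def reply_timestamp(comment_time, reply_before_comment=False):
    """Timestamp a reply so it sorts just before or just after its comment."""
    if reply_before_comment:
        return comment_time - REPLY_OFFSET
    return comment_time + REPLY_OFFSET


comment = ("User comment", datetime(2024, 1, 1, 12, 0, 0))
reply = ("Moderator reply", reply_timestamp(comment[1], reply_before_comment=True))

# Reverse chronological listing: newest first.
newest_first = sorted([comment, reply], key=lambda item: item[1], reverse=True)
print([text for text, _ in newest_first])
# ['User comment', 'Moderator reply']
```

With the reply timestamped earlier, a newest-first listing shows it directly below the comment it answers. A fixed offset keeps the pair adjacent in most cases, but could interleave with other comments posted within the same second.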
django-moderator includes two SpamBayes storage backends: moderator.storage.DjangoClassifier and moderator.storage.RedisClassifier.
Note
moderator.storage.RedisClassifier is recommended for production environments, as it should be much faster than moderator.storage.DjangoClassifier.
To use moderator.storage.RedisClassifier as your classifier storage backend, specify it in your MODERATOR setting, i.e.:
MODERATOR = {
    'CLASSIFIER': 'moderator.storage.RedisClassifier',
    'CLASSIFIER_CONFIG': {
        'host': 'localhost',
        'port': 6379,
        'db': 0,
        'password': None,
    },
    'HAM_CUTOFF': 0.3,
    'SPAM_CUTOFF': 0.7,
    'ABUSE_CUTOFF': 3,
}
You can also create your own backends, in which case note that the contents of CLASSIFIER_CONFIG are passed as keyword arguments to your backend's __init__ method.
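The keyword-argument convention can be illustrated by resolving the CLASSIFIER dotted path and unpacking CLASSIFIER_CONFIG into the constructor. This is a minimal sketch, not django-moderator's loader: load_classifier and ExampleClassifier are hypothetical names, and the sketch only shows how configuration reaches __init__.

```python
import importlib


def load_classifier(moderator_setting):
    """Instantiate the CLASSIFIER backend, passing CLASSIFIER_CONFIG as **kwargs."""
    module_path, _, class_name = moderator_setting["CLASSIFIER"].rpartition(".")
    backend_class = getattr(importlib.import_module(module_path), class_name)
    return backend_class(**moderator_setting.get("CLASSIFIER_CONFIG", {}))


class ExampleClassifier:
    """Hypothetical custom backend: each CLASSIFIER_CONFIG key arrives as a keyword argument."""

    def __init__(self, host="localhost", port=6379, db=0, password=None):
        self.host = host
        self.port = port
        self.db = db
        self.password = password
```

With a hypothetical dotted path such as 'myproject.classifiers.ExampleClassifier' in CLASSIFIER and the CLASSIFIER_CONFIG dictionary shown above, load_classifier would call ExampleClassifier(host='localhost', port=6379, db=0, password=None). Any CLASSIFIER_CONFIG key that __init__ does not accept would raise a TypeError, so the two must be kept in sync. A real backend would also need to implement whatever storage interface SpamBayes expects, which this sketch does not cover.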
Once everything is configured, use the traincommentclassifier management command to train the Bayesian inference system on a sample of existing comment objects (comments with is_removed set to True are trained as spam, all others as ham), i.e.:
$ ./manage.py traincommentclassifier
Note
The traincommentclassifier command clears any existing classification data and starts from scratch.
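The labelling rule and the from-scratch behaviour of traincommentclassifier can be sketched as follows. This is a hypothetical illustration, not the command's source: InMemoryStore stands in for a real storage backend, and train_from_scratch, clear and learn are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical stand-in for a Django comment object."""
    comment: str
    is_removed: bool = False


class InMemoryStore:
    """Hypothetical training-data store standing in for a storage backend."""

    def __init__(self):
        self.examples = []

    def clear(self):
        self.examples.clear()

    def learn(self, text, is_spam):
        self.examples.append((text, is_spam))


def train_from_scratch(store, comments):
    """Discard old data, then train removed comments as spam and the rest as ham."""
    store.clear()
    for c in comments:
        store.learn(c.comment, is_spam=c.is_removed)
    return store


store = InMemoryStore()
store.learn("stale example", is_spam=True)  # data from an earlier run
train_from_scratch(store, [Comment("Great read"), Comment("Cheap pills", is_removed=True)])
print(store.examples)
# [('Great read', False), ('Cheap pills', True)]
```

Since the stale example is gone after training, any manual classifications made since the last run are only preserved to the extent that they are reflected in the comments' is_removed flags.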
You can then periodically run the classifycomments management command to automatically classify comments as ham, spam, reported or unsure based on user reports and previous training, i.e.:
$ ./manage.py classifycomments
Comments can be manually classified as either ham or spam via admin list view actions.