kaggle-allstate

Code for http://www.kaggle.com/c/allstate-purchase-prediction-challenge/data

Some possible approaches:

- Plan combinations appear to follow a power-law distribution, so focus accuracy on the most common combinations.
- Some plan options appear to be correlated. Maybe use some option choices to predict others?
- Stratified model: build a model for each group of plans that have the same number of shopping points.
- Build a model to predict which shopping point to use as the prediction, instead of predicting the plan directly.
- The last observed plan (LOP) tended to predict the 'common' plan combinations much less often than they actually occur. Maybe, when the LOP is sufficiently rare, replace it with a 'close' alternative that occurs more frequently, with closeness measured by a string-similarity distance such as the Levenshtein distance? (I tried picking the most common plan a user looked at instead, but that underperformed the LOP.)
- After skimming over cases that the LOP failed on (I didn't do the opposite, though I might get around to it), there might be a pattern: the user looks at, say, 111111, then 111112, which is more expensive, but picks 111111.
  - In that vein, maybe we should have been trying to learn general patterns: e.g., [123 123 124] [124] and [124 124 123] [123] are identical patterns (where the second [] represents the chosen option). We would need to keep a customer-specific map from each plan to an element in the pattern.
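Two of the ideas above (swapping a rare LOP for a close, more frequent plan, and reducing a viewing history to a general pattern) could be sketched roughly as follows. None of this is code from this repo: the function names, the `min_count` and `max_dist` thresholds, and the representation of plans as digit strings are all assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two plan strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def adjust_rare_lop(lop, plan_counts, min_count=5, max_dist=1):
    """Hypothetical helper: if the LOP is rare, return the most frequent
    plan within max_dist edits of it; otherwise return the LOP unchanged."""
    if plan_counts.get(lop, 0) >= min_count:
        return lop
    nearby = [p for p, c in plan_counts.items()
              if p != lop and c >= min_count and levenshtein(lop, p) <= max_dist]
    return max(nearby, key=lambda p: plan_counts[p], default=lop)


def canonical_pattern(viewed, chosen):
    """Relabel plans by order of first appearance for one customer.

    Returns the relabelled viewing sequence and the label of the chosen
    plan (None if the customer never viewed the chosen plan).
    """
    labels = {}
    for plan in viewed:
        labels.setdefault(plan, len(labels))
    return [labels[p] for p in viewed], labels.get(chosen)


# Toy purchase counts, made up for the example.
counts = Counter({"111111": 50, "111112": 1, "222222": 40})
adjusted = adjust_rare_lop("111112", counts)

# The two histories from the note above reduce to the same pattern.
pattern_a = canonical_pattern(["123", "123", "124"], "124")
pattern_b = canonical_pattern(["124", "124", "123"], "123")
```

Since every plan string has the same length, a plain per-position mismatch count (Hamming distance) would probably work just as well as Levenshtein here and would be cheaper to compute.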
- If I were to continue, I'd probably just take the LOP and try to boost its predictions. Some questions to explore:
  - How large is the variation in error rates for each plan feature?

    2014-04-19 07:44:21 - classes - INFO - Feature A, score 0.879660650043
    2014-04-19 07:44:21 - classes - INFO - Feature B, score 0.889536022431
    2014-04-19 07:44:22 - classes - INFO - Feature C, score 0.873382882001
    2014-04-19 07:44:22 - classes - INFO - Feature D, score 0.905060355225
    2014-04-19 07:44:22 - classes - INFO - Feature E, score 0.891247203868
    2014-04-19 07:44:22 - classes - INFO - Feature F, score 0.875887804224
    2014-04-19 07:44:22 - classes - INFO - Feature G, score 0.80510055768

So it seems like each individual column is pretty accurate on its own, but the accuracy drops substantially when all seven features have to be correct at once.
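As a rough sanity check on that drop, here is what the joint accuracy would be if the per-feature errors were independent, using the scores from the log above. Independence is an assumption made only for this back-of-the-envelope calculation; the real errors are probably correlated, so the measured joint accuracy may well differ.

```python
import math

# Per-feature scores copied from the log output above.
feature_scores = {
    "A": 0.879660650043,
    "B": 0.889536022431,
    "C": 0.873382882001,
    "D": 0.905060355225,
    "E": 0.891247203868,
    "F": 0.875887804224,
    "G": 0.80510055768,
}

# Under the independence assumption, the chance of getting all seven
# features right is the product of the individual accuracies.
independent_joint = math.prod(feature_scores.values())
print(f"joint accuracy if errors were independent: {independent_joint:.3f}")
```

The product comes out at about 0.39. Comparing that with the measured all-features accuracy would hint at how strongly the per-feature errors overlap: a measured value well above 0.39 would mean the misses tend to land on the same customers.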

G does have somewhat lower accuracy than the others (about 0.81, versus 0.87 to 0.91).
  - Does the accuracy of the last observed plan always get higher with more shopping points? In other words, how common is it for a person to look at a plan but then buy one they viewed earlier?
    - It seems that the more plans people look at, the more likely they are to choose the last plan they looked at. LOP accuracy is 49% for purchases at the 3rd shopping point, rising to 73% at the 12th, with a slight drop-off to 66% at the 13th.
  - How common is it for users' information to change (e.g. car value)?
  - How condensed are the interactions in time? Do they all tend to happen on the same day, or across multiple days?
  - How frequently do users' viewed plans change? Shopping points for a given user can have repeated plan selections.
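The shopping-point question above could be checked with something like the sketch below. It is not the code used for the numbers quoted in these notes: the function name, the `(viewed_plans, purchased_plan)` input format, and the toy histories are all assumptions made for illustration.

```python
from collections import defaultdict


def lop_stats_by_length(histories):
    """Group customers by how many quotes they viewed before purchasing.

    histories: iterable of (viewed_plans, purchased_plan) pairs, with
    viewed_plans in shopping-point order.
    Returns {n_viewed: (LOP accuracy, share who bought an earlier-viewed plan)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, LOP hits, earlier-plan hits]
    for viewed, purchased in histories:
        stats = counts[len(viewed)]
        stats[0] += 1
        if viewed[-1] == purchased:
            stats[1] += 1
        elif purchased in viewed[:-1]:
            stats[2] += 1
    return {n: (hits / total, earlier / total)
            for n, (total, hits, earlier) in sorted(counts.items())}


# Toy histories, made up for the example.
histories = [
    (["111111", "111112"], "111112"),            # bought the last plan viewed
    (["111111", "111112"], "111111"),            # went back to an earlier plan
    (["111111", "111112", "111112"], "111112"),  # bought the last plan viewed
]
stats = lop_stats_by_length(histories)
```

The second number in each pair covers only purchases where the LOP was wrong but the purchased plan had been viewed earlier, so whatever is left over is customers who bought a plan they never looked at.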

- rollup.py creates lagged versions of all variables; it's kind of slow.
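For readers unfamiliar with the term, "lagged versions" means copying each variable's value from a customer's previous shopping point(s) onto the current row. The sketch below shows the general idea in plain Python. It is not how rollup.py is written: `add_lags`, the `_lag<k>` naming, and the row-as-dict format are hypothetical, and the `customer_ID` and `shopping_pt` column names are assumed.

```python
from itertools import groupby
from operator import itemgetter


def add_lags(rows, columns, n_lags=1, key="customer_ID", order="shopping_pt"):
    """Return copies of rows with <col>_lag<k> fields added.

    Each lag field holds the value of <col> from the same customer's
    k-th previous shopping point, or None if there is no such row.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(key, order))
    for _, group in groupby(ordered, key=itemgetter(key)):
        history = []  # this customer's earlier rows, oldest first
        for row in group:
            new_row = dict(row)
            for k in range(1, n_lags + 1):
                prev = history[-k] if len(history) >= k else None
                for col in columns:
                    new_row[f"{col}_lag{k}"] = prev[col] if prev is not None else None
            history.append(row)
            out.append(new_row)
    return out


# Toy rows, made up for the example.
rows = [
    {"customer_ID": 1, "shopping_pt": 1, "A": 0},
    {"customer_ID": 1, "shopping_pt": 2, "A": 1},
    {"customer_ID": 2, "shopping_pt": 1, "A": 2},
]
lagged = add_lags(rows, ["A"])
```

Sorting once and then walking each customer's rows in order keeps this to a single pass over the data after the sort, which may help if a row-by-row lookup is what makes the lagging slow.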
