GitHub - durhamsm/Walmart_Weekly_Sales_Data_Mining: final project for data mining

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commits
.idea		.idea
Submissions		Submissions
data		data
pickled_data		pickled_data
__init__.py		__init__.py
constants.py		constants.py
correlation_checker.py		correlation_checker.py
main.py		main.py
oop_objects.py		oop_objects.py
predictions_set_object.py		predictions_set_object.py
project_report.pdf		project_report.pdf
read_data_functions.py		read_data_functions.py
readme.txt		readme.txt
regression.py		regression.py
sales_mapper.py		sales_mapper.py
time_series_analysis.py		time_series_analysis.py
utilities.py		utilities.py

Repository files navigation

-------------------

General Approach:

We used the "pickle" utility of Python to "save" our progress throughout the analysis. This allowed us to perform time-intensive
analyses, save the results to a pickle file (serialization method), and then quickly reload those pickled results to proceed
with further steps in the analysis process. We used Pandas dataframes and OOP techniques to manipulate, perform calculations,
and organize the data and intermediate results.

-------------------

Some Details of Objects and Methods:

The "Store" object within the sams_work/oop_objects.py file is used for storing the data of the historical data. A list of 45 instances
of this object is used to hold the historical store data for all of the stores. Throughout the code, this list is often
referred to as "historicalStores" or just "stores." The "FutureStoreSet" object is used to hold all of the data
for the future store data (i.e. the weeks for which we need to make predictions). This object contains a list of "FutureStore"
objects similar to the historicalStores list, but with some different members and functions.

For my particular program, the predictions are performed within the sams_work/time_series_analysis.py file. At the
bottom of this file, you can see segments of code, along with comments that explain the purpose of the code and which
lines should be uncommented/commented to perform various analyses.

The "get_weighted__normalized_sales_value" method on the "WeekSalesAverage" object (sams_work/oop_objects.py) may be of interest
for understanding exactly how the historical sales values are averaged.

Also, the "make_predictions_sequential_methods," and "make_predictions_with_weighted_average_of_methods" may be of interest
for understanding the two primary methods for combining the predicted sales of the various sales prediction methods.

-------------------

Running the Code:

As I mentioned in the report, I don't think you'll be able to run the code, since I ran the code in an Anaconda environment
on my machine that handled all of the package/module dependencies. I wasn't sure how to export the code with all of
its dependencies in a manner such that you would be able to run the program. If you are able to somehow run the program,
you should run the "sams_work/time_series_analysis.py" file as your main, and make modifications to the code at the bottom
of the file, depending on what you want to run. As I mentioned, I would be availale for demo if desired.

-------------------

Output:

With the current setup, the output will be a "kaggle_predictions.txt" file that is in the format specified by Kaggle
for subbmitting the predictions to the online evaluator. A "missing_holiday_preditions.txt" and a "missing_non_holiday_preditions.txt"
file is also provided to ensure that all predictions were made.