Working with Luigi Workflows

This project shows how Luigi workflows can be used as data science pipelines. A well-formed dataset is used so that the data science itself doesn't get in the way of demonstrating how Luigi works.

Requirements

The project was created using PyCharm, but any IDE should work. The code imports the following packages:

  • luigi
  • numpy
  • pandas
  • fpdf
  • re
  • gc
  • pickle
  • sklearn
  • nltk

You can use pip install (or your preferred package manager) to install the third-party packages. Note that re, gc, and pickle are part of the Python standard library and don't need to be installed, and sklearn is distributed on PyPI as scikit-learn.

How it Works

All the code lives in two Python files within the workflows folder. The first workflow (workflow_one) shows how Luigi works with a very simple example; the real workflow, and the genesis of this project, is workflow_two.
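
To give a flavour of what a simple Luigi workflow looks like, here is a minimal sketch of a two-task pipeline. This is not the actual contents of workflow_one; the task names and file paths are made up for illustration.

```python
import luigi


class MakeGreeting(luigi.Task):
    """Write a greeting to a local file (hypothetical example task)."""

    def output(self):
        return luigi.LocalTarget('data/greeting.txt')

    def run(self):
        with self.output().open('w') as out_file:
            out_file.write('Hello, Luigi!')


class ShoutGreeting(luigi.Task):
    """Depend on MakeGreeting and upper-case its output."""

    def requires(self):
        return MakeGreeting()

    def output(self):
        return luigi.LocalTarget('data/greeting_upper.txt')

    def run(self):
        with self.input().open() as in_file, self.output().open('w') as out_file:
            out_file.write(in_file.read().upper())


if __name__ == '__main__':
    # Run with: python this_file.py ShoutGreeting --local-scheduler
    luigi.run()
```

Luigi figures out the execution order from requires(), and skips any task whose output() target already exists, which is what makes the workflow resumable.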

The basic notion, from a data science perspective, is to take a corpus, vectorize it, split it into test and train sets, pickle it (for later use), then use logistic regression to build a predictive model.

The dataset used is from Reddit.
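
As a rough sketch of those data science steps (not the actual workflow_two code), the core of the pipeline looks something like the following. The column names 'txt' and 'con', and the assumption that the JSON file is newline-delimited, are illustrative guesses; adjust them to the real dataset schema.

```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed schema: a text column ('txt') and a binary "controversial" label ('con').
df = pd.read_json('data/source/controversial-comments.json', lines=True)
texts, labels = df['txt'], df['con']

# Vectorize the corpus.
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
X = vectorizer.fit_transform(texts)

# Split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)

# Pickle the split data for later use by downstream tasks.
with open('data/train_test_split.pickle', 'wb') as f:
    pickle.dump((X_train, X_test, y_train, y_test), f)

# Build a predictive model with logistic regression.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))
```

In the Luigi version, each of these steps becomes its own task with its own output target, so a failed run can pick up where it left off.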

Running Luigi

You need to pass arguments to Luigi for it to run. You can supply them on the command line when running the Python file, or add them to the .py file's run configuration in your IDE.

To see the entire pipeline run with PDF outputs from Luigi, run workflow_two.py.
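
As an alternative to command-line arguments, the pipeline can also be kicked off programmatically with luigi.build(). FinalTask below is a hypothetical name; substitute the last task actually defined in workflow_two.py.

```python
import luigi

# FinalTask is a hypothetical name -- substitute the last task defined
# in workflow_two.py. local_scheduler=True avoids needing the luigid daemon.
from workflow_two import FinalTask

if __name__ == '__main__':
    luigi.build([FinalTask()], local_scheduler=True)
```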

NOTE

You need the controversial-comments.json dataset from Reddit, placed in the /data/source directory.
