w205-project

Please run everything as the w205 user unless otherwise stated.

The w205 user should already have Hadoop and Hive installed and running.

For example, if you are booting a UCB instance, the following commands mount the data volume and start the required services:

As root (change the device, /dev/xvdf below, to wherever your EBS volume is attached):

mount /dev/xvdf /data
/data/start_postgres.sh
./start-hadoop.sh
su - w205
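Running the mount command a second time, for example when repeating this setup on an instance where the volume is already mounted, typically fails with an "already mounted" error. A minimal guard is sketched below. `mount_data_volume` is a hypothetical helper name, and the sketch assumes util-linux's `mountpoint` command is available, as it is on most Linux distributions.

```shell
#!/usr/bin/env bash
# Sketch: mount the EBS volume only if the target is not already a mount point.
# Defaults (/dev/xvdf, /data) come from this README; adjust them for your instance.
mount_data_volume() {
  local device="${1:-/dev/xvdf}" target="${2:-/data}"
  # mountpoint -q exits 0 (silently) when $target is already a mount point
  if mountpoint -q "$target"; then
    echo "$target is already mounted; skipping" >&2
    return 0
  fi
  mount "$device" "$target"
}
```

As root, call it in place of the plain mount command: mount_data_volume /dev/xvdf /data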

As w205 (optional), start the Hive metastore:

/data/start_metastore.sh
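Because everything below assumes Hadoop, Hive and conda are already installed, it can help to check for them before starting. A minimal sketch follows; `require_commands` is a hypothetical helper, and the command names `hdfs`, `hive` and `conda` are our assumption about what this pipeline calls.

```shell
#!/usr/bin/env bash
# Sketch: report every required command that is not on PATH.
# Returns 0 only if all commands given as arguments were found.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_commands hdfs hive conda || exit 1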

Env setup

If you don't already have Anaconda installed, install it from:

https://www.continuum.io/downloads#linux

Set up the conda environment, named "w205-project":

conda env create -f environment.yml

Activate the environment:

source activate w205-project

If environment.yml changes, update the environment (with it activated):

conda env update -f environment.yml

To remove the environment:

conda remove --name w205-project --all
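The create and update commands above can be folded into one idempotent step that is safe to re-run. A minimal sketch, assuming `conda env list` prints each environment's name in its first column (with comment lines starting with `#`) and that environment.yml is in the current directory; `ensure_conda_env` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: create the conda env on first run, update it on later runs.
ensure_conda_env() {
  local name="${1:-w205-project}" spec="${2:-environment.yml}"
  # Skip '#' comment lines, keep the first column (env name), match it exactly.
  if conda env list | awk '!/^#/ {print $1}' | grep -qx -- "$name"; then
    conda env update -n "$name" -f "$spec"
  else
    conda env create -n "$name" -f "$spec"
  fi
}
```

Passing -n explicitly keeps the environment name in one place even if the name field in environment.yml is ever changed.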

Run all

Activate the environment:

source activate w205-project

Add your Google Docs credentials to export_data/client_secret.json.

Run all scripts:

./runAll.sh
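Since the final step uploads to Google Sheets using export_data/client_secret.json, a missing or malformed credentials file may only surface at the end of a long run. A minimal pre-check is sketched below; `check_credentials` is a hypothetical helper, and it only verifies that the file exists and parses as JSON, not that Google will accept the credentials.

```shell
#!/usr/bin/env bash
# Sketch: fail early if the credentials file is missing, empty, or not valid JSON.
check_credentials() {
  local path="${1:-export_data/client_secret.json}" py
  if [ ! -s "$path" ]; then
    echo "missing or empty credentials file: $path" >&2
    return 1
  fi
  # Use whichever Python interpreter is on PATH (python, else python3).
  py="$(command -v python || command -v python3)" || return 1
  # json.load raises on malformed JSON, so the interpreter exits non-zero.
  if ! "$py" -c 'import json, sys; json.load(open(sys.argv[1]))' "$path" 2>/dev/null; then
    echo "credentials file is not valid JSON: $path" >&2
    return 1
  fi
}
```

Usage, from the repository root: check_credentials && ./runAll.sh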

Manual data setup commands

Download the raw data into the data source (data_source/):

python data_get/download.py

Transform the data in the data source:

python data_get/transform.py

Put data into HDFS:

cd loading_and_modelling

./load_data_lake.sh

Transform the data in Hive:

cd ../transforming

./allTransforms.sh
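Before exporting, you may want to confirm that the transforms actually produced the final table. A minimal sketch, assuming the table lives in Hive's default database; `hive_table_exists` is a hypothetical helper, while `-S` (silent mode) and `-e` are standard Hive CLI options.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the named table appears in SHOW TABLES output.
hive_table_exists() {
  local table="${1:-whiskey_business}"
  # SHOW TABLES prints one table name per line; -S suppresses Hive's log chatter.
  hive -S -e 'show tables;' | grep -qx -- "$table"
}
```

Usage: hive_table_exists whiskey_business || echo "transforms did not create whiskey_business" >&2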

Return to the repository root and pull the final table down as a CSV with headers:

cd ..

hive -e 'set hive.cli.print.header=true;select * from whiskey_business;' | sed 's/[\t]/,/g' | sed 's/whiskey_business\.//g' > export_data/data/whiskey_business.csv
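The sed pipeline above turns Hive's tab-separated output into CSV and strips the `whiskey_business.` prefix that Hive adds to header names. A slightly more portable variant is sketched below as a reusable filter; `hive_tsv_to_csv` is a hypothetical name. It uses `tr` because `[\t]` inside a bracket expression is a GNU sed extension, and it strips the prefix from the header line only. Like the original, it does not quote fields, so values containing commas would break the CSV.

```shell
#!/usr/bin/env bash
# Sketch: read Hive's tab-separated output on stdin, write CSV to stdout.
hive_tsv_to_csv() {
  local table="${1:-whiskey_business}"
  # tr swaps every tab for a comma; sed removes "<table>." only on line 1 (the header),
  # so data values that happen to contain that text are left untouched.
  tr '\t' ',' | sed "1s/${table}\\.//g"
}
```

Usage, from the repository root: hive -e 'set hive.cli.print.header=true;select * from whiskey_business;' | hive_tsv_to_csv whiskey_business > export_data/data/whiskey_business.csv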

Export the CSV to Google Sheets:

python export_data/spreadsheet.py
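The manual steps above can be chained so that the run stops at the first failure instead of continuing with missing data. This is a sketch only, not the repository's runAll.sh, whose contents are not shown here; `run_step`, `load_hdfs`, `hive_transforms`, `export_csv` and `run_manual_pipeline` are hypothetical names, and it assumes it is run from the repository root.

```shell
#!/usr/bin/env bash
# Sketch of a fail-fast driver for the manual steps (NOT the repo's runAll.sh).

# Print a label, run the command, and report which step failed, if any.
run_step() {
  local label="$1"; shift
  echo "==> $label" >&2
  "$@" || { echo "step failed: $label" >&2; return 1; }
}

# Each helper body runs in a subshell ( ... ), so its cd does not leak into the caller.
load_hdfs() ( cd loading_and_modelling && ./load_data_lake.sh )
hive_transforms() ( cd transforming && ./allTransforms.sh )
export_csv() (
  set -o pipefail  # let a hive failure fail the whole pipeline
  hive -e 'set hive.cli.print.header=true;select * from whiskey_business;' \
    | sed 's/[\t]/,/g' | sed 's/whiskey_business\.//g' \
    > export_data/data/whiskey_business.csv
)

# && short-circuits, so nothing runs after the first failing step.
run_manual_pipeline() {
  run_step "download" python data_get/download.py &&
    run_step "transform" python data_get/transform.py &&
    run_step "load into HDFS" load_hdfs &&
    run_step "Hive transforms" hive_transforms &&
    run_step "export CSV" export_csv &&
    run_step "upload to Google Sheets" python export_data/spreadsheet.py
}
```

Usage, with the environment activated and from the repository root: run_manual_pipeline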

Repository: chrisfleisch/w205-project