Spark_Fremont_Bridge_Analysis

Simple project integrating Spark into data analysis

Since this project is mainly about integrating Spark, the Spark environment has to be set up correctly first.

I used the findspark package to locate the Spark installation so that pyspark can be imported. After importing findspark, call

findspark.init(spark_home)

where spark_home is the path to the Spark installation directory.

After that, the Spark environment can be set up with:

conf=SparkConf().setMaster('local').setAppName('Fremont Bridge Bike Analysis')
sc=SparkContext(conf=conf)

Of course, SparkConf and SparkContext have to be imported first with: from pyspark import SparkConf, SparkContext

The rest is pretty straightforward. Use sc.textFile to read the data from the csv file into an RDD, then use take to pull the rows into plain Python lists. One thing to note is that I'm working with lists so that no further RDD actions are needed just to get some results, and because I'm far more familiar with lists and dicts than with RDDs. This project can be modified in the future to apply more RDD functions.
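The read-then-take step described above could look roughly like the sketch below. It is not the code in bike_spark.py: the helper names (load_lines, parse_rows, total_crossings), the column names Date, East and West, and the sample rows are all hypothetical and only illustrate the idea of moving from an RDD to ordinary lists and dicts.

```python
import csv


def load_lines(sc, path, n):
    """Fetch the first n lines of a text file through Spark as a plain list."""
    # sc is an existing SparkContext; take(n) is an RDD action that returns a list.
    return sc.textFile(path).take(n)


def parse_rows(lines):
    """Split raw CSV lines (header first) into a list of dicts keyed by column."""
    reader = csv.reader(lines)
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]


def total_crossings(rows, count_columns):
    """Sum the given count columns over all rows, treating blanks as zero."""
    return sum(int(row[col] or 0) for row in rows for col in count_columns)


# Hypothetical sample standing in for the result of load_lines(sc, path, 3).
sample = [
    "Date,East,West",
    "10/03/2012 12:00:00 AM,4,9",
    "10/03/2012 01:00:00 AM,4,6",
]
rows = parse_rows(sample)
print(total_crossings(rows, ["East", "West"]))  # prints 23
```

Once the rows are in a list, everything after load_lines is ordinary Python, which is the trade-off noted above: it is simple to work with, but the processing no longer runs inside Spark.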

Running the code should produce a figure, and the analysis can be made from there. I know there's a lot of improvement to be made, but this is a start for further use of Spark.
