Parthi10/Pyspark

Learning PySpark

Books

a) Spark Definitive Guide

b) High Performance Spark

c) Learning Spark

d) PySpark Cookbook

Certification

Databricks Certified Developer. Note: Databricks says this certification will no longer be available after 31 Oct 2019.
Databricks Certified Associate. According to the portal, this is coming soon.

Git repositories, books

Spark Internals by Jerry Lead
gitbook by Jacek Laskowski
another gitbook
Advice on certification
Tutorials by Mahmoud Parsian
Talks by Daniel Abadi

RDD

See RDD notes
See A primer on Lambda

Dataframes

See Dataframe notes

Spark Internals, architecture, tuning

See architecture

Spark SQL

See spark-sql

Spark Streaming

See spark-streaming

GraphX

Machine Learning

Machine Learning - Feature Engineering

Scala

courses

Python

Other resources

Sequence file
hdfs
External spark packages

TPC-DS Benchmarking

Blogs: http://blog.madhukaraphatak.com/

http://www.cs.sfu.ca/CourseCentral/732/ggbaker/content/spark.html

IBM Cloud resources

https://console.bluemix.net/docs/services/AnalyticsforApacheSpark/using_spark-submit.html#running-a-spark-application-using-the-spark-submit-sh-script

https://developer.ibm.com/clouddataservices/docs/analytics-engine/get-started/

Questions/Comments

View my LinkedIn Profile

Please email me at: kanchan.tewary@gmail.com
