sathishmtech01/pyspark_learning

PySpark Learning Journey

Online Tutorials

  1. https://spark.apache.org/docs/latest/rdd-programming-guide.html
  2. http://www.sparktutorials.net/Getting+Started+with+Apache+Spark+RDDs
  3. https://www.codementor.io/jadianes/spark-python-rdd-basics-du107x2ra
  4. http://files.cnblogs.com/files/sirkevin/Spark_for_Python_Developers.pdf
  5. https://www.tutorialspoint.com/pyspark
  6. https://www.dezyre.com/apache-spark-tutorial/pyspark-tutorial
  7. http://www.kirupagaran.com/images/free_downloads/Apache_Spark_Programming_Cheat_Sheet.pdf
  8. https://medium.com/makemytrip-engineering

Cloudera not opening (solved by enabling virtualization):

https://amiduos.com/support/knowledge-base/article/enabling-virtualization-in-lenovo-systems

PySpark Configuration

  1. Open PyCharm
    • File
      • Settings
        • Project
          • Project Structure
            • Add Content Root
              • Add the Python libraries: py*.zip, pyspark.zip
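
Outside PyCharm, the same archives can be put on the import path by hand. The sketch below is one possible way to do it, assuming the archives live under `python/lib` inside the Spark directory and that `SPARK_HOME` points at that directory. The helper names `spark_python_paths` and `add_spark_to_sys_path` are made up for illustration and are not part of PySpark.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the py*.zip archives under <spark_home>/python/lib, sorted by name.

    The pattern py*.zip mirrors the PyCharm step above: it matches
    pyspark.zip and the bundled py4j archive.
    """
    lib_dir = os.path.join(spark_home, "python", "lib")
    return sorted(glob.glob(os.path.join(lib_dir, "py*.zip")))


def add_spark_to_sys_path(spark_home):
    """Prepend archives not already on sys.path; return the ones added."""
    added = []
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


# Usage (assumes SPARK_HOME is set in the environment):
# add_spark_to_sys_path(os.environ["SPARK_HOME"])
# import pyspark
```

Calling the helper twice is harmless: the second call finds the archives already on `sys.path` and adds nothing.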

csk@csk-ai-revolution:/sparkscala/spark-2.4.0-bin-hadoop2.6/bin$ export PYSPARK_PYTHON=/home/csk/anaconda/envs/face/bin/python
csk@csk-ai-revolution:/sparkscala/spark-2.4.0-bin-hadoop2.6/bin$ ./pyspark

This opens the PySpark shell in the terminal.
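
The same two steps (set `PYSPARK_PYTHON`, then start `bin/pyspark`) can also be prepared from a Python script. This is only a sketch: `pyspark_launch` is a hypothetical helper, and it builds the command and environment without running anything.

```python
import os


def pyspark_launch(spark_home, python_executable, base_env=None):
    """Build the command and environment for starting the PySpark shell.

    Nothing is executed here; the caller decides how to run the command.
    """
    # Copy the environment so the caller's (or the process's) is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_executable  # interpreter PySpark should use
    command = [os.path.join(spark_home, "bin", "pyspark")]
    return command, env


# Usage with the paths from the session above:
# import subprocess
# cmd, env = pyspark_launch("/sparkscala/spark-2.4.0-bin-hadoop2.6",
#                           "/home/csk/anaconda/envs/face/bin/python")
# subprocess.run(cmd, env=env)
```

Keeping the variable in a copied environment, rather than exporting it globally, limits the interpreter choice to this one launch.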
