rootcss/cassandra-pyspark-end-to-end-ml

Spark-Cassandra

  • data_generator.py: Spark job that creates fake data and stores it in Cassandra.
  • data_faker.py: Defines the payload for the fake data.
  • modelling.py: Creates data models from the primary table of JSON data.
  • queryable.py: Lets you run SQL queries on the data models, using Spark as the backend.
  • config.py.sample: Copy this file to config.py and set the values.
  • schema_creator.py: Creates the required schema in Cassandra.
  • See the HOWTO file for full instructions.
  • Scripts for creating dummy graph data are in the graph_data folder; they are used the same way as the scripts above.
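
To give a feel for what a fake-data payload might look like before it is handed to a Spark job, here is a minimal sketch in plain Python (standard library only). It is not the code from data_faker.py: the field names (user_id, event, amount, created_at), the helper names make_fake_record and generate, and the choice of a UUID row key are all assumptions made for illustration.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical event types; the real payload is defined in data_faker.py.
EVENT_TYPES = ["signup", "login", "purchase"]


def make_fake_record(rng):
    """Return (row_id, json_payload) for one fake event.

    The payload is serialized to a JSON string, mirroring the idea of a
    primary table that stores raw JSON data.
    """
    payload = {
        "user_id": rng.randint(1, 1000),
        "event": rng.choice(EVENT_TYPES),
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    # Build the UUID from the seeded generator so runs are reproducible.
    row_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
    return row_id, json.dumps(payload)


def generate(n, seed=42):
    """Generate n fake records from a seeded random generator."""
    rng = random.Random(seed)
    return [make_fake_record(rng) for _ in range(n)]


if __name__ == "__main__":
    for row_id, payload in generate(3):
        print(row_id, payload)
```

In a real pipeline, a list like this would typically be turned into a Spark DataFrame and written to Cassandra; that step depends on the connector and the settings in config.py, so it is left out here.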

About

Generates fake data, saves it to Cassandra, performs ETL over the raw JSON data, and finally runs machine learning on the data. Developed as a proof of concept (POC).
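
The ETL step over raw JSON can be pictured as flattening each nested JSON document into a flat row that a tabular data model can hold. The sketch below uses plain Python as a stand-in for Spark and is not taken from modelling.py; the helper names flatten and to_rows, the underscore-joined column naming, and the sample record are assumptions for illustration only.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_rows(raw_json_records):
    """Parse raw JSON strings and return one flat dict per record."""
    return [flatten(json.loads(record)) for record in raw_json_records]


# Hypothetical raw record, as it might sit in the primary JSON table.
raw = ['{"user": {"id": 7, "country": "IN"}, "event": "login"}']
rows = to_rows(raw)
print(rows)
```

Each resulting row has the columns user_id, user_country, and event, which is the kind of shape that can then be queried with SQL or fed to a model.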
