sagark/smap-materializer

Currently works:

  • "Stream"-focused operators (e.g. running subsample for every stream; many operators per stream)
  • New streams are detected as they appear, and subsample(300) and subsample(3600) are run for all of them
  • subsample(300) and subsample(3600) are updated every 5 minutes (or every hour, respectively) for each stream
  • Proper metadata is set so that powerdb recognizes these as subsample streams (i.e. this is now actually deployable, at least for auto-subsampling, barring any stability issues)
  • Graceful restart after failure (no need to recompute everything; just pick up where we left off)
  • "Query"-focused operators (e.g. the queries used by berkeley.openbms; many streams for one set of operators)
  • Output metadata allows powerdb to automatically recognize substreams
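
Conceptually, the subsample operators above bucket raw readings into fixed-width time windows and reduce each window to one value. A minimal averaging sketch of that idea (an illustration only, not the materializer's actual operator implementation):

```python
from collections import defaultdict

def subsample(readings, width):
    """Average (timestamp, value) readings into fixed windows of `width` seconds.

    Returns one (bucket_start, mean) pair per non-empty window -- a
    simplified stand-in for subsample(300) and subsample(3600).
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % width].append(value)
    return sorted((start, sum(vals) / len(vals))
                  for start, vals in buckets.items())

# Example: readings every 100s, subsampled into 300s windows
raw = [(0, 1.0), (100, 2.0), (200, 3.0), (300, 4.0), (400, 5.0)]
print(subsample(raw, 300))  # [(0, 2.0), (300, 4.5)]
```

The real operators also carry per-stream state so they can resume incrementally rather than rescanning history on every pass.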

Needs to be implemented:

  • Load from DB instead of shelves
  • What happens if a stream that we're computing for goes down?
  • Republisher (not sure this is actually needed... it seems simpler to keep pulling directly from readingdb, tracking the latest time and polling)
  • Propagating metadata
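
The polling alternative to a republisher mentioned above (track the latest processed timestamp per stream, fetch only newer readings) could look roughly like this. The `fetch_after` callable stands in for a readingdb range query; all names here are hypothetical, only the pattern is intended:

```python
def poll_streams(fetch_after, latest):
    """One polling pass over all tracked streams.

    `fetch_after(stream_id, ts)` is a hypothetical stand-in for a
    readingdb query returning (timestamp, value) pairs newer than ts.
    `latest` maps stream_id -> last processed timestamp and is advanced
    in place, which is what makes restart-from-checkpoint cheap.
    Returns {stream_id: new_readings} for this pass.
    """
    fresh = {}
    for stream_id, since in latest.items():
        readings = fetch_after(stream_id, since)
        if readings:
            fresh[stream_id] = readings
            latest[stream_id] = max(ts for ts, _ in readings)
    return fresh

# Usage with a fake in-memory "readingdb"
data = {1: [(10, 0.5), (20, 0.7), (30, 0.9)]}
fake_fetch = lambda sid, ts: [r for r in data[sid] if r[0] > ts]
latest = {1: 10}
print(poll_streams(fake_fetch, latest))  # {1: [(20, 0.7), (30, 0.9)]}
print(latest)                            # {1: 30}
```

A second pass returns nothing new, since the checkpoint has advanced past the last reading.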

How to use:

  • Adding a stream
    • Start your driver
    • The materializer will auto-detect the stream and start subsample(300) and subsample(3600) for it (including all historical data), with no user action needed
    • If you need to add custom ops, stop the materializer, edit the shelf with Python, and restart the materializer (this is annoying and will be fixed once the DB backend is integrated)
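
Editing the shelf uses Python's stdlib `shelve` module. A rough sketch of adding a custom op while the materializer is stopped (the shelf path, key layout, and operator names here are hypothetical, not the materializer's actual schema):

```python
import shelve

# Hypothetical layout: stream UUID -> list of operator specs.
# Open the shelf the materializer persists its state in (path is made up).
with shelve.open("/tmp/materializer-shelf") as db:
    ops = db.get("stream-uuid-1234", ["subsample(300)", "subsample(3600)"])
    ops.append("max(900)")         # add a custom operator spec
    db["stream-uuid-1234"] = ops   # write back so the change persists
```

Because `shelve` only persists values on assignment (without `writeback=True`), the explicit write-back on the last line is what actually saves the change.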
