Skip to content

pombredanne/pulp_db

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pulp_db

AIM:  To capture and query large data.  Realtime capture.  Offline compression.

pulp_db is a NoSQL database.
It was written for realtime, jagged (hard to schema) data streams.  Days of immutable data.
It indexes the main data. Also supports field index and quering.
It supports indexing.

Optimised for high throughput.
It uses a two stage write process to achive this.  The offline one automatcially optimises the indexes.
Most data is immutable. We can use more specialised data structures if we don't need to support insertion once the db has been written.

Optimised for low memory usage in read mode.
Can also be used to index and markup a flat file of data.

Supports python and pypy query front end.


NoSQL db for:
     * jagged  
     * ordered/realtime
     * immutable data

Supports indexing.
Python query api.  Much nicer SQL imo.    ( DB[:200](fld=selector_func) & DB.idx['time'](time_range_func)(un_indexed) ) | DB(some_selector)(some_thing)[::2](some_othering)
    
    SELECT * FROM TableTimes INNER JOIN TableTypes ON TableTimes.id = TableTypes.id
    vs
    DB.idx['time'] & DB.idx['type']
    
    SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name
    vs
    DB.idx['time'] | DB.idx['type']

 Far more expressive and general.   Can just chain.  
    DB(selector)(selector)[::2](selector)(selector)(selector)[:1000]
 Can combine
    ( DB[:200](fld=selector_func) & DB.idx['time'](time_range_func)(un_indexed) ) | DB(some_selector)(some_thing)[::2](some_othering)
    


Targeting:
    High write throughput:   Two stage process to acchive this.  1) Capture and do little.  2) Offline optimise [lots more todo here].
    Super fast read queries.  [More todo here, pushing more to c]
        Low memory use for read mode.  Streaming results.


BUILDING
    todo

TESTING
    todo

INSTALLING
    todo

USING
    todo


About

Capture and query large data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 45.9%
  • Shell 16.9%
  • HTML 13.5%
  • Python 11.9%
  • C++ 9.5%
  • PureBasic 1.2%
  • Other 1.1%