rfaulkner/databayes


databayes

A probabilistic database that uses Bayesian inference to build up relations.

Install the C++ Redis client hiredis and jsoncpp. For hiredis, follow the instructions at https://github.com/redis/hiredis (release 0.11.0).

Repeat for https://github.com/open-source-parsers/jsoncpp.

Installation

(Not yet implemented)

git clone https://github.com/rfaulkner/databayes
cd databayes
cmake .
make
sudo make install

Setup

Ensure that your LD_LIBRARY_PATH is set:

export LD_LIBRARY_PATH=/usr/local/lib

Use the following compiler flags including the jsoncpp library:

-std=c++0x $(pkg-config --cflags --libs jsoncpp)

In the current setup you will need to create an object file for the md5 lib manually (this will eventually be fixed):

g++ -std=c++0x -o md5.o -c src/md5.cpp

To build and run the client (linking hiredis, md5, and libboost_regex):

databayes$ g++ -std=c++0x src/client.cpp $(pkg-config --cflags --libs jsoncpp) -g -o dbcli /usr/lib/libhiredis.a /usr/lib/libboost_regex.a ./md5.o
databayes$ ./dbcli

How does it work?

"Databayes" is a probabilistic database. "Entities" define classes of things that are sets of attributes that may take on different values. The "Relation" is the atomic unit which expresses a link between instances of two entities dependent on attribute values. From the set of relations probability distributions may be determined that allow for sampling and inference to be applied over sets of entity attributes.

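As a rough illustration of how counts over relations can yield a distribution, the sketch below normalizes relation counts into an empirical conditional distribution. `CountedPair` and `conditional` are hypothetical names used only for this example; they are not taken from the databayes source, and the project's own inference code may work differently.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified relation: one attribute value on each side plus
// the number of times this pairing has been observed.
struct CountedPair {
    std::string left_value;
    std::string right_value;
    int count;
};

// Estimate P(right_value | left_value == given) by normalizing counts.
// Returns an empty map when nothing matches the given value.
std::map<std::string, double> conditional(const std::vector<CountedPair>& rels,
                                          const std::string& given) {
    std::map<std::string, double> dist;
    double total = 0.0;
    for (const CountedPair& r : rels) {
        if (r.left_value == given) {
            dist[r.right_value] += r.count;
            total += r.count;
        }
    }
    if (total <= 0.0) {
        dist.clear();
        return dist;
    }
    for (auto& kv : dist) {
        kv.second /= total;  // counts become probabilities that sum to 1
    }
    return dist;
}
```

With three observations of ("1", "a") and one of ("1", "b"), `conditional(rels, "1")` assigns 0.75 to "a" and 0.25 to "b".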
Parser

The parser is an SLR parser. It accepts the following statements:

(1) ADD REL E1(x1=vx1[, x2=vx2, ..]) E2(y1=vy1[, y2=vy2, ..]) [VALUE]
(2) GEN E1[.A_E1] GIVEN E2 [ATTR Ai=Vi[, ...]]
(3) INF E1.A_E1 GIVEN E2 [ATTR Ai=Vi[, ...]]
(4) DEF E1[(x1_type-x1, x2_type-x2, ...)]
(5) LST REL [E1 [E2]]
(6) LST ENT [E1]*
(7) RM REL E1(x_1 [, x_2, ..]) E2(y_1 [, y_2, ..])
(8) RM ENT [E1]*
(9) SET E.A FOR E1(x1=vx1[, x2=vx2, ..]) E2(y1=vy1[, y2=vy2, ..]) AS V
(10) DEC E1(x1=vx1[, x2=vx2, ..]) E2(y1=vy1[, y2=vy2, ..]) [VALUE]
  1. provides a facility for insertion into the system
  2. generate a sample conditional on a set of constraints
  3. infer an expected value for an attribute
  4. define a new entity
  5. list relations, optionally restricted to the given entities
  6. list entities. Either specify them or simply list all.
  7. remove a relation
  8. remove an entity
  9. set an attribute value
  10. decrement the count for this relation
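To make the statement list concrete, here is a minimal sketch that recognises which of the ten statements a line starts with. It is only keyword dispatch, not the SLR parser itself. The function names are hypothetical, and case-insensitive keywords are an assumption based on the lowercase commands used in the examples later in this README.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Split a line on any amount of whitespace.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) { tokens.push_back(tok); }
    return tokens;
}

// Upper-case a copy of the string.
std::string upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// Return the statement keyword(s), e.g. "ADD REL", or "" if unrecognised.
std::string statementKind(const std::string& line) {
    std::vector<std::string> t = tokenize(line);
    if (t.empty()) { return ""; }
    std::string head = upper(t[0]);
    if (head == "GEN" || head == "INF" || head == "DEF" ||
        head == "SET" || head == "DEC") {
        return head;
    }
    if (t.size() < 2) { return ""; }
    std::string second = upper(t[1]);
    if (head == "ADD" && second == "REL") { return "ADD REL"; }
    if ((head == "LST" || head == "RM") &&
        (second == "REL" || second == "ENT")) {
        return head + " " + second;
    }
    return "";
}
```

For example, `statementKind("  add   rel x(a=1) y(b=2)")` yields "ADD REL" regardless of the extra whitespace.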

The sections below give more detail on how to use these statements to build entities and relations, and how to use the generative commands to sample.

Entities

Entities are the core artifacts of databayes that are defined by entity names and entity attributes. They correspond closely to what could be thought of as types in databayes. Attributes are given a type and may take on values of that type when defined in relations (below).

Below is a sample entity representation:

{"entity" : "e1",   "fields" : {"_itemcount" : 1, "x" : "integer" } }

The key "entity" defines the entity name, while "fields" contains keys that correspond to attribute names, each mapped to its type. The "_itemcount" field stores the integer count of the entity's attributes.
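The JSON shape shown above can be produced from a simple in-memory structure. The sketch below uses only the standard library rather than jsoncpp. `EntitySketch` and `toJson` are hypothetical names for illustration, not the project's classes, and the output is compact JSON without the spacing of the sample.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory entity: a name plus attribute name -> type name.
struct EntitySketch {
    std::string name;
    std::map<std::string, std::string> attrs;
};

// Serialize to the entity JSON shape. No string escaping is done, so names
// and types are assumed to contain no quotes or backslashes.
std::string toJson(const EntitySketch& e) {
    std::ostringstream out;
    out << "{\"entity\":\"" << e.name << "\",\"fields\":{"
        << "\"_itemcount\":" << e.attrs.size();
    for (const auto& kv : e.attrs) {
        out << ",\"" << kv.first << "\":\"" << kv.second << "\"";
    }
    out << "}}";
    return out.str();
}
```

An entity named "e1" with a single attribute "x" of type "integer" serializes to the same keys and values as the sample.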

Relations

Relations are defined across entities and allow a relationship among entity attribute values to be established. A relation defines two sets of fields with attribute values assigned to entity attributes, as in the sample below:

{"cause" : "e1", "entity_left" : "e1", "entity_right" : "e2", "fields_left" : { "#x" : "integer", "_itemcount" : 1, "x" : "1"},   "fields_right" : {"#y" : "float", "_itemcount" : 1, "y" : "1.0"},   "instance_count" : 1}

The metadata has the following definitions:

  • entity_left, specifies the left hand entity that defines the "left" attributes
  • entity_right, specifies the right hand entity that defines the "right" attributes
  • cause, for causal relations this defines the causal entity attribute
  • fields_left, defines the list of left-hand attributes with attribute types and values
  • fields_right, defines the list of right-hand attributes with attribute types and values
  • #attributename (e.g. "#x"), key indicating the type of the attribute with that name
  • _itemcount, stores the integer count of the number of entity attributes in the list
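The field layout described above, with a value stored under the attribute name and its type under the same name prefixed by "#", might be modelled as follows. `RelationSketch`, `setAttr`, and `itemCount` are hypothetical names for this sketch and are not the project's own Relation class.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical mirror of the relation JSON; not the project's class.
struct RelationSketch {
    std::string cause;
    std::string entity_left;
    std::string entity_right;
    // Each map holds "name" -> value and "#name" -> type.
    std::map<std::string, std::string> fields_left;
    std::map<std::string, std::string> fields_right;
    int instance_count;
};

// Record one attribute: its value under "name", its type under "#name".
void setAttr(std::map<std::string, std::string>& fields,
             const std::string& name, const std::string& type,
             const std::string& value) {
    fields[name] = value;
    fields["#" + name] = type;
}

// Count attributes, skipping the "#name" type keys. This is the number the
// sample stores under "_itemcount".
std::size_t itemCount(const std::map<std::string, std::string>& fields) {
    std::size_t n = 0;
    for (const auto& kv : fields) {
        if (!kv.first.empty() && kv.first[0] != '#') { ++n; }
    }
    return n;
}
```

Calling `setAttr(r.fields_left, "x", "integer", "1")` reproduces the left-hand side of the sample, for which `itemCount` returns 1.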

Filtering

The index provides a means to filter a set of relations via the following method:

void IndexHandler::filterRelations(std::vector<Relation>& relations, AttributeBucket& filterAttrs, std::string comparator)

Given a set of relations and a bucket of attributes, each relation in the set is compared against each attribute in the bucket. A comparison is made only if the relation contains the given attribute. If the attribute does exist in the relation, its value is compared to that of the bucket attribute using the comparator passed.

Comparators are defined by the following string values:

#define ATTR_TUPLE_COMPARE_EQ "="
#define ATTR_TUPLE_COMPARE_LT "<"
#define ATTR_TUPLE_COMPARE_GT ">"
#define ATTR_TUPLE_COMPARE_LTE "<="
#define ATTR_TUPLE_COMPARE_GTE ">="
#define ATTR_TUPLE_COMPARE_NE "!="

Only if the relation returns true on compare for all bucket attributes is the relation retained in the filtered list. The reference to the relations vector is mutated accordingly.
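The filtering rule described above can be sketched with simplified stand-ins for the project's types. Here `Attrs` (attribute name to numeric value) replaces both Relation and AttributeBucket, and `filterSketch` is a hypothetical name, not the real `IndexHandler::filterRelations`. The sketch assumes that a bucket attribute missing from a relation does not disqualify it, and that an unknown comparator matches nothing.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Relation and AttributeBucket.
typedef std::map<std::string, double> Attrs;

// Apply one of the comparator strings listed above.
bool compare(double lhs, double rhs, const std::string& cmp) {
    if (cmp == "=")  { return lhs == rhs; }
    if (cmp == "<")  { return lhs < rhs; }
    if (cmp == ">")  { return lhs > rhs; }
    if (cmp == "<=") { return lhs <= rhs; }
    if (cmp == ">=") { return lhs >= rhs; }
    if (cmp == "!=") { return lhs != rhs; }
    return false;  // assumption: unknown comparators never match
}

// True if every bucket attribute that the relation contains passes.
bool keep(const Attrs& rel, const Attrs& bucket, const std::string& cmp) {
    for (const auto& kv : bucket) {
        Attrs::const_iterator it = rel.find(kv.first);
        if (it != rel.end() && !compare(it->second, kv.second, cmp)) {
            return false;
        }
    }
    return true;
}

// Mutate the vector in place, dropping relations that fail the filter.
void filterSketch(std::vector<Attrs>& rels, const Attrs& bucket,
                  const std::string& cmp) {
    rels.erase(std::remove_if(rels.begin(), rels.end(),
                              [&](const Attrs& r) {
                                  return !keep(r, bucket, cmp);
                              }),
               rels.end());
}
```

As in the original signature, the caller's vector is passed by reference and shrinks to the retained relations.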

Examples

Defining an entity:

When defining an entity, specify the entity name and its attributes with their types. Entity names must be unique (i.e. not already exist) and all attribute types must be valid:

databayes > def x(y_integer, z_float)
databayes > def a(b_string, c_integer)
databayes > def nofields

Creating a Relation:

When creating a relation, you specify entities and optionally any number of entity attributes constrained by value. Entities and specified attributes must exist; value constraints, however, are optional:

databayes > add rel x(a=1, b=2.1) y(c=22, d=0.3)

Listing Entities:

When listing entities (and in general) a wildcard ("*") for one or more characters may be used:

databayes > lst ent *   // List all entities
databayes > lst ent a*  // List all entities beginning with 'a'
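One way to implement this kind of wildcard is to translate the pattern into a regular expression. The sketch below uses `std::regex` and follows the wording above by treating "*" as one or more characters, so under this reading "a*" does not match the bare name "a". `wildcardMatch` is a hypothetical helper, not a function from the databayes source.

```cpp
#include <regex>
#include <string>

// Match an entity name against a pattern in which '*' stands for one or
// more characters. Regex metacharacters in the pattern are escaped.
bool wildcardMatch(const std::string& pattern, const std::string& name) {
    const std::string special = "\\^$.|?+()[]{}";
    std::string re;
    for (char c : pattern) {
        if (c == '*') {
            re += ".+";
        } else if (special.find(c) != std::string::npos) {
            re += '\\';
            re += c;
        } else {
            re += c;
        }
    }
    // regex_match requires the whole name to match the pattern.
    return std::regex_match(name, std::regex(re));
}
```

Switching ".+" to ".*" would make "*" match zero or more characters instead, if that is the intended behaviour.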

Listing Relations:

When listing relations, conditions on attributes may be specified where desired. Attributes and entities must exist.

databayes > lst rel * *                     // List all relations
databayes > lst rel a* b*                   // List relations between entities beginning with 'a' and entities beginning with 'b'
databayes > lst rel a*(x=20) b*             // As above, where the left entity has x=20
databayes > lst rel a*(x=20) b*(y=hello)    // As above, where the right entity also has y=hello

Removing Entities:

Allows the client to remove entities from the database:

databayes > rm ent myent    // removes entity myent, this cascades to all relations containing myent

Care must be taken when using this command since removal will automatically cascade to all relations dependent upon this entity.

Removing Relations:

Allows the client to remove relations from the database. WARNING: this will remove all relations of this type:

databayes > rm rel x(a=1, b=2.1) y(c=22, d=0.3)

This will remove all relations whose attributes match the assignments.

Functionality Checklist

  • CLI: Token Parsing
  • CLI: Parsing Loop to process input commands
  • CLI: Cursor navigation with arrow keys
  • CLI: Regenerate Last Command
  • CLI: Handle arbitrary amounts of whitespace
  • Parser Modelling: Parser class to maintain internal state of command execution
  • Parser Modelling: Command codes and State Modeling
  • Parser Modelling: Error codes and Error handling Logic
  • Parser Modelling: Termination loop to handle processing of parsed command by calling the index
  • Parser Command: Defining Entities
  • Parser Command: Adding Relations
  • Parser Command: Removing Entities
  • Parser Command: Removing Relations
  • Parser Command: Setting Attribute Values
  • Parser Command: Listing Existing Entities
  • Parser Command: Listing Existing Relations
  • Parser Command: Generating Relation Samples
  • Parser Command: Inferring Expected Value of Relation Samples
  • Object Modeling: Entities
  • Object Modeling: Relations
  • Object Modeling: Attributes
  • Object Modeling: Attribute based collections
  • Object Modeling: JSON Representation of Entities
  • Object Modeling: JSON Representation of Relations
  • Object Modeling: JSON Representation of Relations with type data
  • Object Modeling: Mapping among JSON and object model representations for entities
  • Object Modeling: Mapping among JSON and object model representations for relations
  • Object Modeling: Define attribute types
  • Object Modeling: Attribute type validation
  • Object Modeling: Counts for relations
  • Storage Modeling: Entity Storage
  • Storage Modeling: Relation Storage
  • Storage Modeling: Type Storage
  • Storage Modeling: Index to handle mapping, filtering, extracting relation & entity data from memory to the runtime
  • Storage Modeling: Redis interface for in memory storage
  • Storage Modeling: Disk Storage model
  • Storage Modeling: Swapping logic for disk storage
  • Storage Modeling: Cascading removal of Relations when Entities are removed (NEEDS TESTING)
  • Probabilistic Modeling: Generate Marginal Distributions
  • Probabilistic Modeling: Generate Joint Distributions
  • Probabilistic Modeling: Generate Conditional Distributions
  • Probabilistic Modeling: Sampling Marginal Distributions
  • Probabilistic Modeling: Sampling Joint Distributions
  • Probabilistic Modeling: Sampling Joint Distributions given causality
  • Probabilistic Modeling: Expectation of attribute values across a relation set
  • Probabilistic Modeling: Counting Relations
  • Probabilistic Modeling: Counting Entity occurrences in Relation sets
  • Hosting: Flask server logic utilizing mod_wsgi with Apache
  • Hosting: Mapping scheme from URL to Parser Commands (ADDITIONAL COMMANDS NEED TO BE HANDLED)
  • Hosting: HTTP Parsing logic (see views.py)
  • Hosting: Broker functionality for parser commands coming in from HTTP (currently uses redis)
  • Hosting: Broker serving logic (NEEDS TESTING)

Development

All contributions are certainly welcome! In fact, I'd love to have some partners on this work at this point, especially those with an interest in schemaless relational systems and Bayesian statistics.

To get going simply follow the instructions in "Setup" above (or even fork your own repo) and use the push.sh script to push your changes to your local vagrant instance for testing.
