databayes

A probabilistic database that uses Bayesian inference to build up relations.

Install the Redis client library hiredis and jsoncpp. For hiredis, follow the instructions at https://github.com/redis/hiredis (release 0.11.0).

Repeat for https://github.com/open-source-parsers/jsoncpp.

Installation

(Not yet implemented)

git clone https://github.com/rfaulkner/databayes
cd databayes
cmake .
make
sudo make install

Setup

Ensure that your LD_LIBRARY_PATH is set:

export LD_LIBRARY_PATH=/usr/local/lib

Use the following compiler flags including the jsoncpp library:

-std=c++0x $(pkg-config --cflags --libs jsoncpp)

In the current setup you will need to create an object file for the md5 lib manually (this will eventually be fixed):

g++ -std=c++0x -o md5.o -c src/md5.cpp

For execution (link hiredis, md5, libboost_regex):

databayes$ g++ -std=c++0x src/client.cpp $(pkg-config --cflags --libs jsoncpp) -g -o dbcli /usr/lib/libhiredis.a /usr/lib/libboost_regex.a ./md5.o
databayes$ ./dbcli

How does it work?

"Databayes" is a probabilistic database. "Entities" define classes of things that are sets of attributes that may take on different values. The "Relation" is the atomic unit which expresses a link between instances of two entities dependent on attribute values. From the set of relations probability distributions may be determined that allow for sampling and inference to be applied over sets of entity attributes.

Parser

The parser is implemented as an SLR parser and accepts the following statements:

(1) ADD REL E1(x1=vx1[, x2=vx2, ..]) E2(y1=vy1[, y2=vy2, ..]) [VALUE]
(2) GEN E1[.A_E1] GIVEN E2 [ATTR Ai=Vi[, ...]]
(3) INF E1.A_E1 GIVEN E2 [ATTR Ai=Vi[, ...]]
(4) DEF E1[(x1_type-x1, x2_type-x2, ...)]
(5) LST REL [E1 [E2]]
(6) LST ENT [E1]*
(7) RM REL E1(x_1 [, x_2, ..]) E2(y_1 [, y_2, ..])
(8) RM ENT [E1]*
(9) SET E.A FOR E1(x1=vx1[, x2=vx2, ..]) E2(y1=vy1[, y2=vy2, ..]) AS V
(10) DEC E1(x1=vx1[, x2=vx2, ..]) E2(y1=vy1[, y2=vy2, ..]) [VALUE]

(1) insert a relation into the system
(2) generate a sample conditional on a set of constraints
(3) infer an expected value for an attribute
(4) define a new entity
(5) list relations, optionally restricted to the given entities
(6) list entities, either those specified or all of them
(7) remove a relation
(8) remove an entity
(9) set an attribute value
(10) decrement the count for this relation
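
As a rough illustration of how the statement forms above differ by their leading keywords, the sketch below classifies an input line from its first one or two tokens. It is not the SLR parser itself: the `Statement` enum and the `classify` and `upperCase` functions are hypothetical names, and case-insensitive keywords are an assumption based on the lowercase commands used in the examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical statement kinds, one per numbered statement form.
enum class Statement {
    AddRel, Gen, Inf, Def, LstRel, LstEnt, RmRel, RmEnt, Set, Dec, Unknown
};

// Upper-case a token so that "add" and "ADD" are treated alike (assumption).
std::string upperCase(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// Classify a statement from its first one or two whitespace-separated tokens.
Statement classify(const std::string& line) {
    std::istringstream in(line);
    std::string first, second;
    in >> first >> second;  // missing tokens are left as empty strings
    first = upperCase(first);
    second = upperCase(second);

    if (first == "ADD" && second == "REL") return Statement::AddRel;
    if (first == "GEN") return Statement::Gen;
    if (first == "INF") return Statement::Inf;
    if (first == "DEF") return Statement::Def;
    if (first == "LST" && second == "REL") return Statement::LstRel;
    if (first == "LST" && second == "ENT") return Statement::LstEnt;
    if (first == "RM" && second == "REL") return Statement::RmRel;
    if (first == "RM" && second == "ENT") return Statement::RmEnt;
    if (first == "SET") return Statement::Set;
    if (first == "DEC") return Statement::Dec;
    return Statement::Unknown;
}
```

A real parser would go on to validate the entity and attribute lists that follow the keywords; this sketch only picks the statement kind.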

The sections below give more detail on how to use these statements to build entities and relations, and how to use the generative commands to sample.

Entities

Entities are the core artifacts of databayes that are defined by entity names and entity attributes. They correspond closely to what could be thought of as types in databayes. Attributes are given a type and may take on values of that type when defined in relations (below).

Below is a sample entity representation:

{"entity" : "e1",   "fields" : {"_itemcount" : 1, "x" : "integer" } }

The key "entity" defines the entity name, while "fields" contains a list of keys that correspond to attribute names with their respective types. The "_itemcount" field simply stores the integer count of the number of entity attributes.
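
To make the layout concrete, here is a minimal sketch that renders an entity in the representation shown above using only the standard library. The `Entity` struct and `toJson` function are hypothetical (databayes itself uses jsoncpp), and the sketch assumes names and types contain no characters that need JSON escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory entity: a name plus (attribute name, type) pairs.
struct Entity {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attrs;
};

// Render the entity in the sample layout. "_itemcount" is written first and
// holds the number of attributes. No JSON escaping is performed (assumption:
// names and types are plain identifiers).
std::string toJson(const Entity& e) {
    assert(!e.name.empty());
    std::ostringstream out;
    out << "{\"entity\" : \"" << e.name << "\", \"fields\" : {\"_itemcount\" : "
        << e.attrs.size();
    for (const auto& attr : e.attrs) {
        out << ", \"" << attr.first << "\" : \"" << attr.second << "\"";
    }
    out << "}}";
    return out.str();
}
```

An entity defined without attributes, such as `def nofields`, would under this sketch carry only `_itemcount` with a value of 0.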

Relations

Relations are defined across entities and allow a relationship among entity attribute values to be established. The relation defines two sets of fields with attribute values assigned to entity attributes, see the sample below:

{"cause" : "e1", "entity_left" : "e1", "entity_right" : "e2", "fields_left" : { "#x" : "integer", "_itemcount" : 1, "x" : "1"},   "fields_right" : {"#y" : "float", "_itemcount" : 1, "y" : "1.0"},   "instance_count" : 1}

The metadata keys are defined as follows:

entity_left, specifies the left hand entity that defines the "left" attributes
entity_right, specifies the right hand entity that defines the "right" attributes
cause, for causal relations this defines the causal entity attribute
fields_left, defines the list of left-hand attributes with attribute types and values
fields_right, defines the list of right-hand attributes with attribute types and values
#attributename, key (the attribute name prefixed with "#") indicating the type of that attribute, e.g. "#x" : "integer"
_itemcount, stores the integer count of the number of entity attributes in the list
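
The field layout described above can be sketched as a small helper that builds one side of a relation (`fields_left` or `fields_right`). The `AttrAssignment` struct and `buildFields` function are hypothetical; for simplicity every value, including `_itemcount`, is stored as a string here, whereas the sample stores `_itemcount` as an integer. Attribute names are assumed to be distinct.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical assignment of a typed value to an attribute, e.g. x : integer = 1.
struct AttrAssignment {
    std::string name;
    std::string type;
    std::string value;
};

// Build one side of a relation: "#name" maps to the attribute type, "name"
// maps to the assigned value, and "_itemcount" holds the attribute count.
std::map<std::string, std::string> buildFields(
        const std::vector<AttrAssignment>& attrs) {
    std::map<std::string, std::string> fields;
    for (const auto& a : attrs) {
        // Names must not be empty or collide with the "#" type-key prefix.
        assert(!a.name.empty() && a.name[0] != '#');
        fields["#" + a.name] = a.type;
        fields[a.name] = a.value;
    }
    fields["_itemcount"] = std::to_string(attrs.size());
    return fields;
}
```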

Filtering

The index provides a means to filter a set of relations via the following method:

void IndexHandler::filterRelations(std::vector<Relation>& relations, AttributeBucket& filterAttrs, std::string comparator)

Given a set of relations and a bucket of attributes, each relation in the set is compared against each attribute in the bucket. A comparison is made only if the relation contains the given attribute; in that case the relation's value is compared to the bucket attribute's value using the comparator passed.

Comparators are defined by the following string values:

#define ATTR_TUPLE_COMPARE_EQ "="
#define ATTR_TUPLE_COMPARE_LT "<"
#define ATTR_TUPLE_COMPARE_GT ">"
#define ATTR_TUPLE_COMPARE_LTE "<="
#define ATTR_TUPLE_COMPARE_GTE ">="
#define ATTR_TUPLE_COMPARE_NE "!="

Only if the relation returns true on compare for all bucket attributes is the relation retained in the filtered list. The reference to the relations vector is mutated accordingly.
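
The filtering rule can be sketched as follows. This is not the project's `IndexHandler::filterRelations`: `SimpleRelation`, `SimpleBucket`, `compareValues` and `filterRelationsSketch` are hypothetical stand-ins that model a relation as a map from attribute name to a numeric value. The sketch also assumes the relation's value is the left operand of the comparison and the bucket's value the right.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for Relation and AttributeBucket.
using SimpleRelation = std::map<std::string, double>;
using SimpleBucket = std::map<std::string, double>;

// Apply one of the comparator strings listed above; unknown strings fail.
bool compareValues(double lhs, double rhs, const std::string& comparator) {
    if (comparator == "=") return lhs == rhs;
    if (comparator == "<") return lhs < rhs;
    if (comparator == ">") return lhs > rhs;
    if (comparator == "<=") return lhs <= rhs;
    if (comparator == ">=") return lhs >= rhs;
    if (comparator == "!=") return lhs != rhs;
    return false;
}

// Keep only relations that pass the comparison for every bucket attribute
// they contain. Attributes absent from a relation are skipped, not failed.
void filterRelationsSketch(std::vector<SimpleRelation>& relations,
                           const SimpleBucket& filterAttrs,
                           const std::string& comparator) {
    auto fails = [&](const SimpleRelation& rel) -> bool {
        for (const auto& attr : filterAttrs) {
            auto it = rel.find(attr.first);
            if (it == rel.end()) continue;  // no comparison is made
            if (!compareValues(it->second, attr.second, comparator)) return true;
        }
        return false;
    };
    // Erase-remove mutates the caller's vector in place.
    relations.erase(std::remove_if(relations.begin(), relations.end(), fails),
                    relations.end());
}
```

Note that under this reading a relation lacking every bucket attribute is always retained, since no comparison is ever made for it.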

Examples

Defining an entity:

When defining an entity, specify the entity name and its attributes with their types. Entity names must be unique (i.e. not already exist) and all attribute types must be valid:

databayes > def x(y_integer, z_float)
databayes > def a(b_string, c_integer)
databayes > def nofields

Creating a Relation:

When creating a relation you specify entities and, optionally, any number of entity attributes constrained by value. Entities and specified attributes must exist; value constraints are optional:

databayes > add rel x(a=1, b=2.1) y(c=22, d=0.3)

Listing Entities:

When listing entities (and in general) a wildcard ("*") for one or more characters may be used:

databayes > lst ent *   // List all entities
databayes > lst ent a*  // List all entities beginning with 'a'
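
A minimal sketch of wildcard matching in which "*" stands for one or more characters, as described above. The `wildcardMatch` function is hypothetical and is not taken from the databayes source; it uses simple recursion, which is adequate for short entity names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return true if text matches pattern, where '*' matches one or more
// characters and every other character must match exactly.
bool wildcardMatch(const std::string& pattern, const std::string& text,
                   std::size_t p = 0, std::size_t t = 0) {
    assert(p <= pattern.size() && t <= text.size());
    if (p == pattern.size()) return t == text.size();
    if (pattern[p] == '*') {
        // '*' must consume at least one character, then may consume more.
        for (std::size_t next = t + 1; next <= text.size(); ++next) {
            if (wildcardMatch(pattern, text, p + 1, next)) return true;
        }
        return false;
    }
    return t < text.size() && pattern[p] == text[t] &&
           wildcardMatch(pattern, text, p + 1, t + 1);
}
```

Under the one-or-more reading, the pattern `a*` matches `abc` but not the single-character name `a`.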

Listing Relations:

When listing relations, conditions on attributes may be specified where desired. Attributes and entities must exist.

databayes > lst rel * *                     // List all relations
databayes > lst rel a* b*                   // List relations between entities beginning with 'a' and entities beginning with 'b'
databayes > lst rel a*(x=20) b*             // As above, where attribute x of the left entity equals 20
databayes > lst rel a*(x=20) b*(y=hello)    // As above, where attribute y of the right entity also equals hello

Removing Entities:

Allows client to remove entities from the database:

databayes > rm ent myent    // removes entity myent, this cascades to all relations containing myent

Care must be taken when using this command since removal will automatically cascade to all relations dependent upon this entity.

Removing Relations:

Allows client to remove relations from the database - WARNING, this will remove all relations of this type:

databayes > rm rel x(a=1, b=2.1) y(c=22, d=0.3)

This will remove all relations whose attributes match the assignments.

Functionality Checklist

Development

All contributions are certainly welcome! In fact, I'd love to have some partners on this work at this point, especially those with an interest in schemaless relational systems and Bayesian statistics.

To get going simply follow the instructions in "Setup" above (or even fork your own repo) and use the push.sh script to push your changes to your local vagrant instance for testing.
