Flatten scheme sync experiments

The goal of this project is to evaluate how costly and difficult it is to maintain mongodb's flattened data.

Let's say we have such document:

{
    "name": "userA",
    "posts": [
        {
            "title": "title1",
            "body": "body1"
            "comments": [
                {
                    "content": "comment1"
                },
                {
                    "content": "comment2"
                }
            ]
        }
    ]
}

So, there is user embeds_many posts relationship and also post embeds_many comments relationship.

First idea is to maintain in child tables link to parent and also index field that makes it easy to find child entry based on index.

create table users (_id TEXT, name TEXT);
create table posts (_id TEXT, title TEXT, body TEXT, u_id TEXT, u_index INT);
create table comments (_id TEXT, content TEXT, p_id TEXT, p_index INT);

Second idea is to store all parents' id in chain.

create table users (_id TEXT, name TEXT);
create table posts (_id TEXT, title TEXT, body TEXT, u_id TEXT, u_index INT);
create table comments (_id TEXT, content TEXT, u_id TEXT, u_index INT, p_id TEXT, p_index INT);

Experiment

Generate data so users table contains 100 records, posts contains 10000 records and comments table contains 1M records
Query comments table to get some field vale
Delete all records related to some user _id from all tables.

How to

python schema1.py # generate data according to schema1
python schema2.py # generate data according to schema2

# query to get id for update
select comments.value from users join posts on users._id = posts.user_id join comments on posts._id = comments.post_id where users._id = '33' and posts.user_index = 27 and comments.post_index = 14; # for schema1
select comments.value from comments where user_id = '33' and post_index = 27 and index = 14; # for schema2

# delete stmts for nested strucure updates
delete from comments where user_id = '33'; 
delete from posts where user_id = '33'; 
delete from users where _id = '33';

Results

Do not use postgre features such as CASCADE DELETE of any kind of constraints (foreign key: REFERENCES). Implement all sync logic on app side: will make updates much faster and can easily switch SQL engine.
No difference between schemas.
Query results in 150ms for 1M table.
Delete takes 150ms to remove 10000 records in 1M table. I guess, logN complexity

TODO

Implement sync functionality
Stress test on events stream

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
README.md		README.md
schema1.py		schema1.py
schema1.sql		schema1.sql
schema2.py		schema2.py
schema2.sql		schema2.sql
test_schema1.py		test_schema1.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

README.md

README.md

schema1.py

schema1.py

schema1.sql

schema1.sql

schema2.py

schema2.py

schema2.sql

schema2.sql

test_schema1.py

test_schema1.py

Repository files navigation

Flatten scheme sync experiments

Experiment

How to

Results

TODO

About

Releases

Packages

Languages

rampage644/flatten-sync

Folders and files

Latest commit

History

Repository files navigation

Flatten scheme sync experiments

Experiment

How to

Results

TODO

About

Resources

Stars

Watchers

Forks

Languages