BorderlessNomad/DataPlay

Overview

DataPlay is an open-source data analysis and exploration game developed by PlayGen as part of the EU's CELAR initiative and later extended to support the CACTOS project.

The aim of DataPlay, besides taking CELAR for a spin, is to provide a collaborative environment in which non-expert users get to "play" with government data. The system presents the user with a range of elements of the data, displayed in a variety of visual forms. People are then encouraged to explore this data together. The system also seeks to identify potential correlations between disparate datasets, in order to help users discover hidden patterns within the data.

DataPlay also serves as a use case for CACTOS. The first goal is better resource utilisation and allocation: to keep an application's infrastructure costs under control, it is necessary to control when and what is scaled. Our ultimate aim is to turn the application into a fully automated system that works with minimal human intervention. We envision a workload model that relies mostly on automation in the deployment and monitoring of system components. Additionally, we would like to use an enhanced data centre infrastructure in which CACTOS selects the best possible combination of hardware and tools to improve the performance of the cloud system as a whole.

With dynamic workload distribution, we expect performance improvements in the considerable part of our application that relies heavily on on-demand scaling of resources: when scaling is done across the infrastructure, the initial execution time of newly allocated VMs should decrease. If segmentation is done optimally, there is potential for good cost savings and lower maintenance effort. We also plan to use the simulation framework to test the application's sustainability under very large load and to answer certain performance questions for real market scenarios.

Architecture

The back end is written in Go to provide concurrency for large-volume data processing. It uses a multiple master/frontend architecture that relies on HAProxy for load balancing. The back end also uses Martini for parametric API routing; a number of PostgreSQL instances, replicated and load balanced using pgpool-II, with GORM handling communication between the back end and the database; Cassandra, coupled with gocql, for data obtained by scraping third-party news sources; and Redis for storing monitoring and session-related data.
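To make the concurrency point concrete, the following Go sketch splits a slice of values into chunks and sums each chunk in its own goroutine. It is not taken from the DataPlay code base: the function name `parallelSum`, the worker count, and the sample data are assumptions chosen only to illustrate the fan-out pattern that Go makes convenient.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum divides values into at most `workers` contiguous chunks and
// sums each chunk in a separate goroutine. Every goroutine writes only to
// its own slot of `partial`, so no locking is needed.
func parallelSum(values []float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	partial := make([]float64, workers)
	chunk := (len(values) + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(values) {
			break // fewer chunks than workers
		}
		end := start + chunk
		if end > len(values) {
			end = len(values)
		}
		wg.Add(1)
		go func(slot int, part []float64) {
			defer wg.Done()
			for _, v := range part {
				partial[slot] += v
			}
		}(w, values[start:end])
	}
	wg.Wait()

	// Combine the per-worker results once all goroutines have finished.
	var total float64
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	// Hypothetical column of a dataset: the values 1 through 100.
	values := make([]float64, 100)
	for i := range values {
		values[i] = float64(i + 1)
	}
	fmt.Println(parallelSum(values, 4)) // prints 5050
}
```

The same shape applies to heavier per-chunk work such as parsing or aggregation, where the benefit of spreading chunks across cores is larger than for a plain sum.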

The front end is written in CoffeeScript on top of AngularJS and makes extensive use of libraries such as D3.js, dc.js and NVD3.js to present data in the form of various charts. The user interface is built with Bootstrap, Bootswatch and Font Awesome.

DataPlay contains a rudimentary selection of datasets drawn from DATA.GOV.UK & LONDON DATASTORE, along with political information taken from the BBC, which was extracted and analysed via import.io, kimono and embed.ly.

Screens

Landing Page

Home Page

Activity Monitor

Search Page

Chart Page

Installation

  1. Install Ubuntu & Node.js
  2. Install all necessary dependencies: npm install

Note: Refer to tools/deployment/base.sh for the base system configuration and libraries.

Production:

  1. HAProxy load balancer: tools/deployment/loadbalancer/haproxy.sh
  2. Gamification instances: tools/deployment/app/frontend.sh
  3. Computation/API instances: tools/deployment/app/master.sh
  4. pgpool-II instance: tools/deployment/db/pgpool.sh
  5. PostgreSQL DB instance: tools/deployment/db/postgresql.sh
  6. Cassandra DB instance: tools/deployment/db/cassandra.sh
  7. Redis instance: tools/deployment/db/redis.sh

Monitoring:

  1. API response time monitoring: tools/deployment/monitoring/api.sh
  2. HAProxy API for dynamic scaling: tools/deployment/loadbalancer/api/

Usage

Development:

  1. Run the back end & API server using ./start.sh
  2. Install PostgreSQL and import data
  3. Run the front end: cd www-src && npm install && grunt serve

Staging:

  1. Run the back end & API server using ./start.sh
  2. Install PostgreSQL and import data
  3. Deploy & run the front end: cd www-src && npm install && grunt serve:dist

Production:

  1. Deploy the HAProxy server & the DataPlay HAProxy API (written in Node.js)
  2. Deploy the required number of master nodes (initial multiplicity = 2)
  3. Run the back end and API server on each master node using ./start.sh
  4. Send add master node requests to the DataPlay HAProxy API via cURL
  5. Deploy the required number of frontend nodes (initial multiplicity = 2)
  6. Install Nginx to serve data & set the appropriate path for the www-src directory
  7. Send add gamification node requests to the DataPlay HAProxy API via cURL
  8. Install pgpool-II on a CentOS server (for best compatibility) & the DataPlay PGPOOL API (written in Node.js)
  9. Deploy the required number of PostgreSQL nodes (initial multiplicity = 1)
  10. Install PostgreSQL along with the pgpool-II client plugin
  11. Send an add node request to the DataPlay PGPOOL API via cURL
  12. Create an A record for the required domain and point it to the HAProxy server IP
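The steps above register nodes by sending HTTP requests to the DataPlay HAProxy API, but the README does not document its endpoints. The Go sketch below shows only the general shape of such a request. The `/nodes` path, the JSON fields `role` and `ip`, the host name, and the IP address are all hypothetical placeholders, so consult the code under tools/deployment/loadbalancer/api/ for the real interface.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// nodeRequest is a HYPOTHETICAL payload. The real DataPlay HAProxy API may
// expect different fields or a different encoding.
type nodeRequest struct {
	Role string `json:"role"` // e.g. "master" or "gamification"
	IP   string `json:"ip"`
}

// newAddNodeRequest builds, but does not send, a POST request that asks the
// scaling API to add a node. The "/nodes" path is an assumption.
func newAddNodeRequest(baseURL, role, ip string) (*http.Request, error) {
	body, err := json.Marshal(nodeRequest{Role: role, IP: ip})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/nodes", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder host and address, for illustration only.
	req, err := newAddNodeRequest("http://haproxy-api.example:8080", "master", "10.0.0.11")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

Building the request separately from sending it keeps the example free of network access and makes it easy to inspect what would be sent before pointing it at a live load balancer.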

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Changelog

See CHANGELOG

Authors

The original authors of this application are no longer employed at PlayGen. This software is no longer maintained by PlayGen; use it at your own risk.

Copyright (C) 2013 PlayGen LTD

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License version 3.0 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see LICENSE.

Credits
