Dedupe files via a web interface
Install OS level dependencies:
- Python 2.7
- Redis
Install app requirements:
$ pip install "numpy>=1.6"
$ pip install -r requirements.txt
The deploy_scripts directory contains a collection of bash scripts that, once
run, should give you a standalone instance of the spreadsheet deduper to run
locally. Once you have the OS level dependencies installed (see above) as well
as a C compiler, you should be able to run the scripts like this:
$ bash dedupe_setup.sh
$ bash start_dedupe.sh
Once the app is started, you should be able to navigate to http://127.0.0.1:9999 in a web browser and start deduplicating. To stop the app, do this:
$ bash stop_dedupe.sh
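If you want to confirm from the command line that the app is answering before you open a browser, something like the sketch below may help. It is not part of the deploy scripts: `wait_for_app` is a hypothetical helper name, it assumes `curl` is installed, and the one-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper (not shipped with dedupe-web): poll a URL until it
# responds successfully or the attempts run out.
# Usage: wait_for_app [url] [attempts]
wait_for_app() {
  url="${1:-http://127.0.0.1:9999}"   # address taken from the instructions above
  attempts="${2:-10}"                 # assumed default, adjust as needed
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as refused connections
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "down"
  return 1
}
```

Running `wait_for_app` right after `bash start_dedupe.sh` prints `up` once the page loads, or `down` if nothing answers within the given number of attempts.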
The scripts above take care of running the separate components for dedupe-web. If you are working on this app, you may want to manually control the processes. There are three components that should be running simultaneously for the app to work: Redis, the Flask app, and the worker process that actually does the final deduplication:
$ redis-server # This command may differ depending on your OS
$ nohup python run_queue.py &
$ python app.py
For debugging purposes, it is useful to run these three processes in separate terminal sessions.
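When the components are started by hand, it is easy to lose track of which ones are actually up. The following is a rough sketch of a status check, not something the project provides: `report` and `check_dedupe_web` are hypothetical names, and it assumes `redis-cli` and `pgrep` are available and that the processes were launched with the commands shown above.

```shell
# Hypothetical helper: run a probe command quietly and report the result.
# Usage: report <label> <command> [args...]
report() {
  name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: running"
  else
    echo "$name: NOT running"
    return 1
  fi
}

# Hypothetical wrapper: probe each of the three components in turn.
check_dedupe_web() {
  status=0
  # redis-cli ping succeeds only if a Redis server answers
  report "Redis" redis-cli ping || status=1
  # pgrep -f matches against the full command line of running processes
  report "worker" pgrep -f "run_queue.py" || status=1
  report "Flask app" pgrep -f "app.py" || status=1
  return "$status"
}
```

Calling `check_dedupe_web` prints one line per component and returns a non-zero status if any of them is missing. The `pgrep` patterns are loose, so another process with `app.py` in its command line would also count as a match.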
- Dedupe Google group
- IRC channel, #dedupe on irc.freenode.net