Cocktail search written in Python with werkzeug, scrapy and sphinx

paddymul/cocktail-search

 
 

Cocktail search

A simple but powerful open source web application and crawler for searching cocktail recipes from the web. Check out the demo or read below to run it on your own machine.

Getting started

Cloning the repository

git clone https://github.com/wallunit/cocktail-search
cd cocktail-search
git submodule init
git submodule update

Dependencies

If you are on Debian Wheezy, you can install everything except less with apt-get:

apt-get install python-werkzeug python-scrapy python-stemmer python-lxml sphinxsearch

In order to install less, install node.js and run:

npm install -g less

Crawling

Crawling websites not only consumes a lot of your bandwidth, but also generates a lot of traffic on the websites you are crawling. So please be nice and don't run the crawler unless absolutely necessary, for example when you have to test a spider that you have just added or modified. For any other case, I have made the files with the cocktail recipes I have already crawled available for you:

wget -r -A .json http://cocktails.p24dev.de/data/
mv cocktails.p24dev.de/data/* crawler/
rm -r cocktails.p24dev.de

However, the following commands will run the crawler for a given spider:

cd crawler
rm -f <spider>.json
scrapy crawl <spider> -o <spider>.json

Note that when the output file already exists, scrapy will append the scraped recipes to the end of the existing file. So make sure you delete it first.
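The delete-then-crawl sequence can be wrapped in a small helper so that the stale file is never forgotten. This is only a sketch, assuming it is run from inside the crawler directory; the names `crawl_command` and `fresh_crawl` are hypothetical and not part of the repository:

```python
import os
import subprocess


def crawl_command(spider):
    """Build the scrapy command line that writes <spider>.json."""
    return ["scrapy", "crawl", spider, "-o", spider + ".json"]


def fresh_crawl(spider, run=subprocess.check_call):
    """Delete stale output first, so new recipes are not appended to it."""
    output = spider + ".json"
    if os.path.exists(output):
        os.remove(output)
    # `run` is injectable so the helper can be tried without scrapy installed.
    run(crawl_command(spider))
    return output
```

Calling `fresh_crawl("<spider>")` with a real spider name would then be equivalent to the three shell commands above.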

Setting up Sphinx

There is no RDBMS. All data is stored in a Sphinx index that is built from the crawled cocktail recipes. In order to build the index and run the search daemon in the console, just run:

cd sphinx
indexer --all
searchd --console
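To confirm that the search daemon is actually accepting connections, a quick TCP probe can help. The sketch below assumes searchd listens on 127.0.0.1 port 9312, which is Sphinx's default SphinxAPI port; the configuration in the sphinx directory may specify a different port or a UNIX socket, so adjust accordingly. The function name `is_listening` is hypothetical:

```python
import socket


def is_listening(host="127.0.0.1", port=9312, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    sock.close()
    return True
```

This only shows that a process is listening on the port, not that the index was built correctly.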

Running the development server

In order to serve the website from your local machine and start hacking, there is no need to set up an advanced web server like Apache. Just run the development server and go to http://localhost:8000/ with your web browser:

./web/app runserver

By default, the development server only listens on localhost. However, if you want to access the website from another device, you can make it listen on all interfaces:

./web/app runserver 0.0.0.0:8000

Deploying the production environment

Configuring the web app

Create the file web/settings.py and set the following options:

SITE_URL = 'http://cocktails.p24dev.de/'
LESSC_OPTIONS = ['--compress']
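How the web app consumes these options is not shown here. One plausible reading, offered only as a sketch, is that LESSC_OPTIONS is passed straight through to the lessc command line when the CSS is compiled. The function `lessc_command` and the file names are hypothetical:

```python
LESSC_OPTIONS = ['--compress']  # as set in web/settings.py


def lessc_command(source, target, options=LESSC_OPTIONS):
    """Build a lessc invocation of the form: lessc [options] source target."""
    return ['lessc'] + list(options) + [source, target]


print(lessc_command('style.less', 'style.css'))
# prints ['lessc', '--compress', 'style.less', 'style.css']
```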

Configuring Apache

<VirtualHost *:80>
        ServerName cocktails.p24dev.de
        Alias /static /var/www/cocktails/web/static

        WSGIDaemonProcess cocktails processes=4 maximum-requests=500 threads=1
        WSGIProcessGroup  cocktails
        WSGIScriptAlias   / /var/www/cocktails/web/app.wsgi

        RewriteEngine On
        RewriteRule ^/$ /static/index.html [P]
</VirtualHost>
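The WSGIScriptAlias directive points mod_wsgi at web/app.wsgi, and mod_wsgi expects that file to define a module-level callable named `application`. The repository's actual script is not reproduced here; the following generic stand-in, using nothing beyond plain Python, only illustrates the shape mod_wsgi looks for and can be handy for testing the Apache configuration in isolation:

```python
def application(environ, start_response):
    """Smallest possible WSGI app; mod_wsgi looks up this exact name."""
    body = b"cocktail search placeholder\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI app returns an iterable of byte strings.
    return [body]
```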

Generating static files

Some static files (like the CSS, which is compiled from less) are generated on the fly in the development environment, but must be compiled ahead of time when deploying the production environment, in order to serve them faster:

./web/app deploy

Remember to run that command every time you deploy a new version.

Setting up Sphinx

Build the index and start the search daemon:

cd sphinx
indexer --all
searchd

Note that we omitted the --console option in order to make searchd run in the background. However, instead of just calling searchd on the command line, it would be even better to set up an init script to start and stop Sphinx.

There is rarely a need to restart the search daemon. When you have deployed a new version of the cocktail search or when you have run the crawler again, just rebuild and rotate the index:

cd sphinx
indexer --all --rotate

Getting involved

This project is my playground for new web technologies and frameworks, and you are invited to make it your playground as well. The code base is still small and well organized, and setting up the development environment is easy and straightforward.

The easiest way to get involved would probably be to write spiders for more cocktail websites. Most spiders consist of only a few lines of Python code, and you don't have to know anything about the rest of the stack. Or you could contribute to the wordforms and synonyms lists, which doesn't require any programming skills at all. Also have a look at the open issues and feel free to fix some of them. I prefer to get pull requests via GitHub, but will also accept patches via email.

Have you found a bug and don't want to fix it yourself? Or do you have an awesome idea to improve the cocktail search? That's great. Please send me an email or, even better, use the issue tracker.
