Web video syndicator.
minor change of scoring algorithm
Slovak docs
extended list of supported sites and video players
multiple videos per page functionality
results pagination
video templates
embedded video output
tags
Slovak text search support
CherryPy web framework integration
web crawler
data parser
basic Whoosh indexing and search engine
Further documentation in Slovak:
http://vi.ikt.ui.sav.sk/User:marek.hlavac?view=home
To run Viddle in your local environment you will need:
- Python 3.x
- Whoosh module
- CherryPy module
- BeautifulSoup module
- PyMongo module
- MongoDB database
/conf/db.conf
Should contain a single line with the MongoDB connection string, in the format: mongodb://USER:PASS@SITE:PORT/DB_NAME
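A minimal sketch of how the db.conf line could be read and checked before connecting. The helper names (parse_db_uri, read_db_uri, describe_db_uri, connect) and the relative path conf/db.conf are hypothetical, not taken from Viddle's code. Only connect() needs PyMongo, which is why it imports it lazily.

```python
from urllib.parse import urlsplit


def parse_db_uri(text):
    """Validate the single line from db.conf and return the connection string."""
    uri = text.strip()
    parts = urlsplit(uri)
    # Require the mongodb:// scheme and a non-empty DB_NAME in the path.
    if parts.scheme != "mongodb" or not parts.path.lstrip("/"):
        raise ValueError("expected mongodb://USER:PASS@SITE:PORT/DB_NAME")
    return uri


def read_db_uri(path="conf/db.conf"):
    # Hypothetical default path, relative to the project root.
    with open(path, encoding="utf-8") as f:
        return parse_db_uri(f.readline())


def describe_db_uri(uri):
    # Split the URI into its parts; the password is deliberately left out.
    parts = urlsplit(uri)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "db_name": parts.path.lstrip("/"),
    }


def connect(path="conf/db.conf"):
    # Imported lazily so the helpers above work without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(read_db_uri(path))
    # The database named in the URI path becomes the default database.
    return client.get_default_database()
```

Validating the URI up front gives a clear error for a malformed db.conf, rather than a failure later inside the driver.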
/conf/sites.conf
List of sites whose inner links will be crawled, along with additional information about each site. Each line contains a triplet:
[URL] [INNER_LINKS_FILTER] [NAME]
where:
- URL is the site's URL
- INNER_LINKS_FILTER is used for filtering out cross-domain or other irrelevant inner links
- NAME is used for identifying the site
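The following sketch parses sites.conf triplets and does a breadth-first crawl of a site's inner links. It rests on several assumptions not stated above. Fields are whitespace-separated, so NAME may contain spaces but URL and INNER_LINKS_FILTER may not. Blank lines and "#" comments are skipped. A link is kept when it contains the filter string. get_links is a hypothetical stand-in for fetching a page and extracting its links. Viddle's own crawler may work differently.

```python
from collections import deque
from typing import NamedTuple


class Site(NamedTuple):
    url: str
    inner_links_filter: str
    name: str


def parse_sites_conf(lines):
    """Turn '[URL] [INNER_LINKS_FILTER] [NAME]' lines into Site tuples."""
    sites = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment/blank handling
            continue
        # maxsplit=2 lets NAME (the last field) contain spaces.
        url, links_filter, name = line.split(maxsplit=2)
        sites.append(Site(url, links_filter, name))
    return sites


def keep_link(site, link):
    # Hypothetical rule: keep only links that contain the filter string.
    return site.inner_links_filter in link


def crawl_inner_links(site, get_links, limit=100):
    """Breadth-first crawl from site.url, following only kept inner links."""
    seen = {site.url}
    queue = deque([site.url])
    visited = []
    while queue and len(visited) < limit:
        page = queue.popleft()
        visited.append(page)
        for link in get_links(page):
            if link not in seen and keep_link(site, link):
                seen.add(link)
                queue.append(link)
    return visited
```

Passing get_links as a function keeps the traversal logic separate from HTTP and HTML parsing, so it can be exercised with an in-memory link graph.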
/conf/regex.conf
List of regular expressions used to extract video data. Each line contains a triplet:
[TAG] [URL_REGEX] [PLAYER]
where:
- TAG is the HTML tag from which video data is extracted
- URL_REGEX is a regular expression that matches the video identifier
- PLAYER specifies the type of video player
Example line:
input http://embed.ted.com/talks/.*\.html ted.com
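The sketch below shows how regex.conf rules could be applied to a page. Viddle lists BeautifulSoup as a dependency, but this version uses the standard-library html.parser so it stays self-contained. It assumes URL_REGEX is searched against every attribute value of each matching TAG. The function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser


def parse_regex_conf(lines):
    """Turn '[TAG] [URL_REGEX] [PLAYER]' lines into (tag, compiled_regex, player)."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment/blank handling
            continue
        tag, url_regex, player = line.split(maxsplit=2)
        rules.append((tag.lower(), re.compile(url_regex), player))
    return rules


class VideoFinder(HTMLParser):
    """Collects (player, matched_url) pairs from tags named in the rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lowercased; attrs is a list of (name, value).
        for rule_tag, pattern, player in self.rules:
            if tag != rule_tag:
                continue
            for _name, value in attrs:
                if value is None:  # attributes without a value
                    continue
                match = pattern.search(value)
                if match:
                    self.found.append((player, match.group(0)))


def find_videos(html, rules):
    finder = VideoFinder(rules)
    finder.feed(html)
    finder.close()
    return finder.found
```

Self-closing tags such as `<input ... />` are also covered, because HTMLParser's default handle_startendtag calls handle_starttag.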
Web crawling is started with the miner.py script:
python crawler/miner.py
Search can be run through the web GUI or programmatically, using the query class from the search module.
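Viddle's search is built on Whoosh, and its query class is not reproduced here. To illustrate what Slovak text search support can involve, here is a minimal in-memory inverted index with accent folding. Queries typed without diacritics ("zlta") then match titles written with them ("Žltá"). This is an illustrative stand-in under that assumption, not the project's implementation, and TinyIndex is a hypothetical name.

```python
import re
import unicodedata
from collections import defaultdict


def fold(text):
    # NFKD splits letters like "ž" into "z" + a combining caron; drop the marks.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    return re.findall(r"\w+", fold(text))


class TinyIndex:
    """Inverted index: folded token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.titles = {}

    def add(self, doc_id, title):
        self.titles[doc_id] = title
        for token in tokenize(title):
            self.postings[token].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query token.
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)
```

With Whoosh itself, a comparable effect is usually obtained by adding a CharsetFilter with accent_map to the field's analyzer.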