hmark/viddle

Viddle

Web video syndicator.

Changelog

v0.6

minor change to the scoring algorithm
Slovak docs

v0.5

extended list of supported sites and video players
multiple videos per page functionality

v0.4

results pagination
video templates

v0.3

embedded video output
tags

v0.2

Slovak text search support
CherryPy web framework integration

v0.1

web crawler
data parser
basic Whoosh indexing and search engine

Docs

Further documentation is available in Slovak:
http://vi.ikt.ui.sav.sk/User:marek.hlavac?view=home

Dependencies

To run Viddle in your local environment you will need:
- Python 3.x
- Whoosh module
- CherryPy module
- BeautifulSoup module
- PyMongo module
- MongoDB database

Usage

/conf/db.conf

Should contain a single line with the MongoDB connection string in the format: mongodb://USER:PASS@SITE:PORT/DB_NAME
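
As an illustration of the format above, here is a minimal sketch that splits such a line into its parts using only the standard library. The helper name `parse_db_conf_line` and the sample credentials are assumptions made for this example; they are not taken from the Viddle code.

```python
from urllib.parse import urlsplit, unquote


def parse_db_conf_line(line):
    """Split a mongodb:// URI into its parts (hypothetical helper)."""
    parts = urlsplit(line.strip())
    if parts.scheme != "mongodb":
        raise ValueError("expected a mongodb:// URI")
    return {
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        # The database name is the URI path without the leading slash.
        "db_name": parts.path.lstrip("/"),
    }


# Placeholder values, not real credentials.
settings = parse_db_conf_line("mongodb://viddle_user:secret@localhost:27017/viddle\n")
print(settings)
```

With PyMongo, the same string can also be passed unchanged to `pymongo.MongoClient`, which accepts MongoDB connection URIs directly.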

/conf/sites.conf

List of sites whose inner links will be crawled, together with additional site information. Each line contains a triplet
[URL] [INNER_LINKS_FILTER] [NAME]
where:
- URL is the site's URL
- INNER_LINKS_FILTER is used to filter out cross-domain or otherwise irrelevant inner links
- NAME is used to identify the site
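
The triplet format above could be read roughly as follows. This is a sketch only: it assumes the fields are separated by whitespace and that NAME is everything after the second field. The `Site` tuple, the `parse_sites_conf` helper, and the sample line are hypothetical and do not come from the Viddle source.

```python
from collections import namedtuple

# Hypothetical container for one line of sites.conf.
Site = namedtuple("Site", ["url", "inner_links_filter", "name"])


def parse_sites_conf(text):
    """Parse 'URL INNER_LINKS_FILTER NAME' lines (assumed whitespace-separated)."""
    sites = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split into at most three fields so NAME may contain spaces.
        fields = line.split(None, 2)
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        sites.append(Site(*fields))
    return sites


# Made-up sample entry for illustration.
sample = "http://www.example.com example.com Example Site\n"
sites = parse_sites_conf(sample)
print(sites[0].name)
```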

/conf/regex.conf

List of regular expressions used to find video data. Each line contains a triplet
[TAG] [URL_REGEX] [PLAYER]
where:
- TAG specifies the tags from which video data is extracted
- URL_REGEX is a regular expression for finding the video identifier
- PLAYER specifies the type of video player
- e.g.: input http://embed.ted.com/talks/.*\.html ted.com
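
To show how one such line might be interpreted, the sketch below parses the example rule and checks it against a few candidate strings. It assumes whitespace-separated fields and a regex that contains no spaces. The `Rule` tuple, the `parse_rule` helper, and the candidate URLs are hypothetical, and how Viddle actually extracts values from the tags is not shown here.

```python
import re
from collections import namedtuple

# Hypothetical container for one line of regex.conf.
Rule = namedtuple("Rule", ["tag", "url_regex", "player"])


def parse_rule(line):
    """Parse a 'TAG URL_REGEX PLAYER' line (assumes no spaces in the regex)."""
    tag, pattern, player = line.split()
    return Rule(tag, re.compile(pattern), player)


rule = parse_rule(r"input http://embed.ted.com/talks/.*\.html ted.com")

# Made-up candidate values, e.g. attribute values taken from <input> tags.
candidates = [
    "http://embed.ted.com/talks/some_talk.html",
    "http://www.example.com/page.html",
]
matches = [c for c in candidates if rule.url_regex.search(c)]
print(rule.player, matches)
```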

Web crawling can be started with the miner.py script:

python crawler/miner.py

Search can be run through the web GUI or with the query class from the search module.
