Creepy¶ ↑

DESCRIPTION¶ ↑

Creepy is a simple search engine. It will be developed in four stages: (i) web crawler; (ii) data cleanser; (iii) indexer; and (iv) query processor. Each of the stages are defined by components, some of which are accessible through the Creepy interface, some of which are standalone components.

RUNNING¶ ↑

$ creepy.py [options] <arguments>

Creepy Search Engine¶ ↑

Options:

--version             show program's version number and exit
-h, --help            show this help message and exit
-v, --verbose         Verbose Output [default: False]
-d, --debug           Debug Mode [default: False]
-S STORAGE, --storage=STORAGE
                      Data Storage location [default: ./storage/]
-C, --clean_start     Start clean by deleting existing storage location
                      [default: False]

Web Crawler:
  Crawling the Web: Takes set of seed urls and start crawling the web
  and stores the pages in the storage location.

  -c <seeds | seedfile>, --crawl=<seeds | seedfile>
                      Crawl the web
  -T THRESHOLD, --page_threshold=THRESHOLD
                      Max. number of pages to crawl [default: 2500]
  -N NUM_THREADS, --num_threads=NUM_THREADS
                      Number of threads to use [default: 1]

Data Cleanser:
  Data Preparation/Cleansing: Reads pages stored by the crawler and
  perform requested actions.

  -s, --strip_tags    HTML Tag Stripping
  -t, --tokenize      Tokenize the document

XML Document Graph¶ ↑

$ rucksack.rb [options] <pid_map file>

Options:

-o, --output FILE                Output resultant XML to FILE
-i, --indent N                   Indent N spaces
                                  Default: 2

-a, --append DIR                 Append DIR to the pid_map path
    --delimiter DEL              pid_map uses a delimiter DEL to separate URL and ID
                                  Default: "=>"

XML Validation¶ ↑

$ validate.rb <dtd_file> <xml_file>

REQUIRES¶ ↑

Python 2.6.1
Ruby 1.8.7 or higher
Gem: hpricot
Gem: builder
Gem: libxml-ruby
libxml2

LOCATION¶ ↑

github.com/doggles/creepy

AUTHORS¶ ↑

Travis Hall, trvs.hll@gmail.com

Brittany Thompson, miller317@gmail.com

Bhadresh Patel, bhadresh@wsu.edu

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
doc		doc
lib		lib
test/rucksack		test/rucksack
.gitignore		.gitignore
README.rdoc		README.rdoc
creepy.py		creepy.py
rucksack.rb		rucksack.rb
seed.txt		seed.txt
validate.rb		validate.rb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

doc

doc

lib

lib

test/rucksack

test/rucksack

.gitignore

.gitignore

README.rdoc

README.rdoc

creepy.py

creepy.py

rucksack.rb

rucksack.rb

seed.txt

seed.txt

validate.rb

validate.rb

Repository files navigation

Creepy¶ ↑

DESCRIPTION¶ ↑

RUNNING¶ ↑

Creepy Search Engine¶ ↑

XML Document Graph¶ ↑

XML Validation¶ ↑

REQUIRES¶ ↑

LOCATION¶ ↑

AUTHORS¶ ↑

About

Releases

Packages

Contributors 2

Languages

tvs/creepy

Folders and files

Latest commit

History

Repository files navigation

Creepy¶ ↑

DESCRIPTION¶ ↑

RUNNING¶ ↑

Creepy Search Engine¶ ↑

XML Document Graph¶ ↑

XML Validation¶ ↑

REQUIRES¶ ↑

LOCATION¶ ↑

AUTHORS¶ ↑

About

Resources

Stars

Watchers

Forks

Languages