adamPrestor/Crawler

Crawler and data extraction

This repository contains code and materials for a web crawler implementation and data extraction algorithms.

Crawler

A configurable, multi-threaded crawler that crawls *.gov.si sites by default.
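
As a rough illustration of what a domain-restricted, multi-threaded crawl loop can look like, here is a minimal sketch. It is not the repository's implementation: the names `in_scope`, `crawl`, and `fetch_links`, the batch-per-level scheduling, and the `max_pages` and `workers` parameters are all assumptions made for this example. Fetching is passed in as a function so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Assumption: the default scope is any host under gov.si.
ALLOWED_SUFFIX = ".gov.si"


def in_scope(url, suffix=ALLOWED_SUFFIX):
    """Return True if the URL's host is the suffix domain or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == suffix.lstrip(".") or host.endswith(suffix)


def crawl(seeds, fetch_links, max_pages=100, workers=4):
    """Breadth-first crawl, one level at a time, using a thread pool.

    fetch_links(url) is a hypothetical callable that downloads a page and
    returns the absolute URLs it links to.
    """
    frontier = [u for u in dict.fromkeys(seeds) if in_scope(u)]
    seen = set(frontier)
    visited = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(visited) < max_pages:
            # Take as many URLs as the page budget still allows.
            batch = frontier[: max_pages - len(visited)]
            frontier = frontier[len(batch):]
            # pool.map fetches the batch concurrently and keeps input order.
            results = pool.map(fetch_links, batch)
            visited.extend(batch)
            for links in results:
                for link in links:
                    if link not in seen and in_scope(link):
                        seen.add(link)
                        frontier.append(link)
    return visited
```

A real crawler would also need robots.txt handling, politeness delays, and URL canonicalization, none of which are shown here.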

Details and instructions

Data extraction

  • Regular expressions and XPath queries for data extraction from rtvslo.si, overstock.com and themoviedb.org.
  • Implementation of an automatic data extraction wrapper generator.
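
The two manual extraction approaches listed above can be sketched as follows. This is an illustrative example only: the `SAMPLE` markup, its class names, and the functions `extract_with_regex` and `extract_with_xpath` are invented for the sketch and do not reflect the actual structure of rtvslo.si, overstock.com, or themoviedb.org. The standard-library `xml.etree.ElementTree` module is used here because it supports a limited XPath subset on well-formed markup; real-world HTML would usually need a more lenient parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for this example.
SAMPLE = """<html><body>
<div class="article">
  <h1 class="title">Example headline</h1>
  <span class="author">Jane Doe</span>
  <p class="price">$1,299.00</p>
</div>
</body></html>"""


def extract_with_regex(html):
    """Pull fields out of raw markup with regular expressions."""
    title = re.search(r'<h1 class="title">(.*?)</h1>', html, re.S)
    price = re.search(r"\$([\d,]+\.\d{2})", html)
    return {
        "title": title.group(1).strip() if title else None,
        "price": float(price.group(1).replace(",", "")) if price else None,
    }


def extract_with_xpath(html):
    """Pull fields out of the parsed tree with XPath-style queries."""
    root = ET.fromstring(html)
    title = root.find(".//h1[@class='title']")
    author = root.find(".//span[@class='author']")
    return {
        "title": title.text.strip() if title is not None else None,
        "author": author.text.strip() if author is not None else None,
    }
```

The regular expression version works directly on the raw string and breaks easily when attributes are reordered, while the XPath version depends on the document parsing cleanly but is less sensitive to formatting.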

Details and instructions

Data indexing

  • Inverted index generation for HTML web pages
  • Data retrieval using queries
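
To make the two items above concrete, here is a minimal sketch of building an inverted index over HTML pages and querying it. It is an assumption-laden illustration rather than the repository's code: the functions `tokenize`, `strip_tags`, `build_index`, and `search`, the regex-based tag stripping, and the ranking by summed term frequency are all choices made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def strip_tags(html):
    """Crude tag removal; a real pipeline would use an HTML parser."""
    return re.sub(r"<[^>]+>", " ", html)


def build_index(pages):
    """Map each term to {document name: [token positions]}."""
    index = defaultdict(dict)
    for doc, html in pages.items():
        for pos, term in enumerate(tokenize(strip_tags(html))):
            index[term].setdefault(doc, []).append(pos)
    return index


def search(index, query):
    """Rank documents by the total number of query-term occurrences."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc, positions in index.get(term, {}).items():
            scores[doc] += len(positions)
    # Highest score first; ties are broken by document name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Storing positions rather than bare counts makes it possible to show a snippet of surrounding text for each hit, at the cost of a larger index.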

Details and instructions
