stasbel/articlix

articlix

Information retrieval project at SPbAU 7th term

screen

Installation

Dev

We use python and pipenv as our primary development tools. See Pipfile, Pipfile.lock, requirements-dev.txt (if any) and requirements.txt for the full specification of the platform, python version and dependency packages.
To reproduce the environment, run pip install -r requirements.txt with the python version given in Pipfile. We recommend doing this inside a virtualenv.

Makefile

We provide a Makefile with convenient shortcuts for common commands.
Run make help to list them.

Prerequisites

  • psql>=10.0 (PostgreSQL), used by the crawler to store pages

Usage

We provide a main.py script that implements the command-line interface.
Run python main.py -h to see the available commands and options.

Crawler

python main.py crawler

Index

You can now preprocess the data (look at this).
Then run python main.py --dfpath="data/clean_articles.h5" --indexpath="data/index.json" --workers=8 index to build the index.
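
For readers unfamiliar with the idea, here is a minimal sketch of an inverted index that can be serialized to JSON. It is an illustration only, not this project's implementation: the functions `tokenize`, `build_index` and `search`, the tokenization rule and the on-disk layout are all assumptions made for the example.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms (hypothetical rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map every term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorted lists (not sets) so the index can be dumped to JSON directly.
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index: Dict[str, List[int]], query: str) -> List[int]:
    """Return ids of documents that contain every term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Toy corpus, invented for the example.
docs = {
    1: "Deep learning for search",
    2: "Search engines and ranking",
    3: "Deep ranking",
}
index = build_index(docs)
serialized = json.dumps(index)  # what one might write to an index file
```

With this toy corpus, `search(index, "deep ranking")` returns only document 3, since it is the only one that contains both terms.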

Data

Where to find prepared data

Search

Examples

Web interface

Run python main.py web_interface. The page is then available at localhost on port 8080.

Evaluation

You will need the assessments log file obtained from the server.
DCG
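
As a reminder of what the metric measures, here is a minimal sketch of DCG in its common form, where the relevance grade at rank i is divided by log2(i + 1). Which variant this project's evaluation script uses is not stated here, so the function names `dcg` and `ndcg`, the optional cutoff `k` and the normalized version are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    top = relevances[:k] if k is not None else relevances
    # Index i is 0-based, so rank (i + 1) is discounted by log2(i + 2);
    # the first result has discount log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(top))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Grades of six results for one query, in the order they were shown (made-up data).
grades = [3, 2, 3, 0, 1, 2]
score = dcg(grades)
normalized = ndcg(grades)
```

The normalized variant makes scores comparable across queries with different numbers of relevant documents, at the cost of needing the full set of grades for each query.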

Report

Web
Slides
Report

License

MIT
