search-engine

We designed and implemented a web search engine based on the vector space model. It uses the non-relational database MongoDB and the natural language processing toolkit NLTK.

Dataset

File: data.csv

Source: https://dataverse.harvard.edu/dataset.xhtml?id=3010077

Description: This is a reusable, publicly available dataset for “media bias” studies. It contains the publish date, title, subtitle and text of 3824 news articles. The articles were collected by a project within 3 months, from December 2016 to March 2017. They come from ABC News, CNN News, The Huffington Post, BBC News, DW News, TASS News, Al Jazeera News, China Daily and RTE News, and all of them were collected through the RSS feeds of the respective news sites. (2017-3-31)

Install MongoDB

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 68818C72E52529D4
$ echo "deb http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org
$ sudo systemctl start mongod
$ sudo systemctl enable mongod

Create a database

$ mongo
> use search-engine

Ingest the articles into the database

$ python3 ingestion.py
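
The contents of ingestion.py are not shown in this README. As a rough illustration of what such a step might look like, the sketch below reads data.csv and inserts one document per article with pymongo. The column name `text`, the collection name `articles`, the connection URI and both function names are assumptions made for this example, not details taken from the project.

```python
import csv
import io


def rows_to_documents(csv_file):
    """Turn CSV rows into dicts, skipping rows without article text.

    Assumes a header row containing a column named "text" (hypothetical).
    """
    reader = csv.DictReader(csv_file)
    docs = []
    for row in reader:
        # Drop overflow fields (key None) and normalise missing values to "".
        doc = {k.strip(): (v or "").strip() for k, v in row.items() if k}
        if doc.get("text"):
            docs.append(doc)
    return docs


def ingest(path="data.csv", uri="mongodb://localhost:27017",
           db_name="search-engine", collection="articles"):
    """Insert every article from the CSV file; returns the number inserted."""
    from pymongo import MongoClient  # third-party: pip install pymongo

    with open(path, newline="", encoding="utf-8") as f:
        docs = rows_to_documents(f)
    if docs:
        MongoClient(uri)[db_name][collection].insert_many(docs)
    return len(docs)
```

Keeping the CSV parsing separate from the database call makes the parsing easy to check without a running MongoDB instance.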

Run the search algorithm

$ python3 search.py
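
For readers unfamiliar with the vector space model mentioned in the introduction, here is a minimal, self-contained sketch of the idea: documents and the query are turned into TF-IDF vectors and ranked by cosine similarity. It is not the code in search.py. The regex tokenizer stands in for NLTK, the IDF formula is one common variant, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(tokens, idf):
    """Weight raw term counts by IDF; terms unknown to the index are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return idf, [tfidf(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, docs, idf, vectors, top_k=3):
    """Return up to top_k (document, score) pairs with a non-zero score."""
    q = tfidf(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [(docs[i], s) for s, i in scored[:top_k]]
```

A call such as `search("stock markets", docs, *build_index(docs))` returns the best-matching documents first. A real deployment would precompute the index once and store it, for example in MongoDB, rather than rebuilding it per query.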

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
