A kid-friendly search engine that displays results to enhance children's knowledge. The search engine filters out all kinds of harmful content inappropriate for kids. We use a neural network for classification and rank the results using TF-IDF, tweaked with our own formula.
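As a rough illustration of the ranking step, the sketch below scores documents against a query with standard TF-IDF and optionally multiplies in a per-document kid-friendliness weight. The `kid_score` weight is a hypothetical stand-in for our own formula; the real tweak lives in the project code.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs, kid_score=None):
    """Score each document for the query with TF-IDF.

    kid_score: optional list of per-document weights in [0, 1]; a
    hypothetical stand-in for the project's own ranking tweak.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of docs containing each term.
    df = Counter()
    for tokens in tokenized:
        for term in set(tokens):
            df[term] += 1
    scores = []
    for i, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term] / len(tokens)
            idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF
            score += tf * idf
        if kid_score is not None:
            score *= kid_score[i]  # illustrative kid-friendliness tweak
        scores.append(score)
    return scores

docs = ["cats are cute animals", "dogs are animals too", "stock market news"]
scores = tf_idf_scores("cute animals", docs)
weighted = tf_idf_scores("cute animals", docs, kid_score=[0.1, 1.0, 1.0])
```

Down-weighting the first document (`kid_score=0.1`) demotes it below the second, which is the effect the tweak is meant to have on unsuitable pages.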
The project consists of the following main steps:
- General instructions to run the project
- Scraping data from the web
- Assigning labels to the training data using pattern.en
- Filtering objectionable content
- Identifying topics
- Running Elasticsearch
Clone the repository to your local machine with the command
`git clone repository_url`
To run the project, you need a working installation of Python 3.6 (not 3.7) and pip.
To install all the required dependencies, execute
`pip install -r requirements.txt`
Scraping the data requires Selenium and BeautifulSoup to be installed. Both libraries are listed in the requirements.txt file.
For Data Scraping -
- Run Medium_Scrapper_using_selenium.py
- Run WebScraper.py
- Run Medium_Search_URL_Scrapper.py
- Run WebScraper.py
- Combine the datasets into a single file named final_data.csv
Or you can download the data from this link: https://drive.google.com/file/d/1BrAguUjU6yU4In8iWx4-i37MBcK_gmqi/view
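The scraping scripts rely on Selenium and BeautifulSoup; the standard-library-only sketch below illustrates the core idea of a scraper like WebScraper.py, namely extracting links and visible text from a fetched page. The page string and class name here are purely illustrative.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets and visible text from an HTML page,
    roughly the information the scraping scripts extract."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every anchor tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep non-empty visible text fragments.
        if data.strip():
            self.text_parts.append(data.strip())

# Illustrative page; the real scripts fetch live pages with Selenium.
page = "<html><body><p>Kids science</p><a href='https://example.com/a1'>Article</a></body></html>"
parser = LinkExtractor()
parser.feed(page)
```

In the actual scripts, BeautifulSoup plays the role of `LinkExtractor` and Selenium drives the browser to render JavaScript-heavy pages first.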
- Create a new virtual environment with the command
virtualenv -p python3 venv
- A new folder called venv is created.
- Activate the virtual environment with the command
source venv/bin/activate
- The prompt will now be prefixed with (venv).
- Navigate to the Project folder in the path - /A-Search-Engine-for-Kids/helper_scripts/class_labelling_using_pattern.en
- Run the command
python data_content_labelling.py
- This script was created to classify data as Positive, Strongly Positive, Negative, or Strongly Negative. The input CSV file here is a basic data set limited to 1280 rows.
- The output of this script is the same input data set with an additional sentiment-score column appended.
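The labelling can be pictured as mapping a polarity score (such as the one pattern.en's `sentiment()` returns, in [-1, 1]) onto the four class labels. The 0.5 cutoffs below are illustrative; the actual thresholds used in data_content_labelling.py may differ.

```python
def polarity_to_label(polarity):
    """Map a polarity score in [-1, 1] to one of the four class
    labels. The 0.5 cutoffs are illustrative assumptions, not the
    script's actual thresholds."""
    if polarity >= 0.5:
        return "Strongly Positive"
    if polarity >= 0.0:
        return "Positive"
    if polarity <= -0.5:
        return "Strongly Negative"
    return "Negative"
```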
- Once the final_data.csv file is retrieved, save it in the same directory as web_content_classification.ipynb. Launch the notebook by entering the command
jupyter notebook
- This will open the notebook; all cells can be executed with Shift+Enter or via Cell > Run All.
Note: Since the data set is large (149 MB), it will take a while to see the results.
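Conceptually, the filtering step keeps only rows the classifier marks as safe. The sketch below shows that idea on a CSV snippet; the column name `predicted_label` and the label values are hypothetical and may not match the notebook's actual schema.

```python
import csv
import io

def filter_safe_rows(csv_text, label_column="predicted_label",
                     blocked=frozenset({"objectionable"})):
    """Drop rows whose predicted label marks them as objectionable.
    Column and label names are illustrative assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[label_column] not in blocked]

# Tiny illustrative data set standing in for the classified CSV.
rows = filter_safe_rows(
    "title,predicted_label\n"
    "Solar system,safe\n"
    "Gore site,objectionable\n"
)
```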
- To execute this step, load classification3.ipynb and topic modelling.ipynb in the Jupyter notebook and run them using Shift+Enter or Cell > Run All.
- This step takes as input the output of the Filtering objectionable content step. The input file is "whole_data.csv", found in the same directory as the classification.ipynb file.
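As a crude stand-in for what topic identification produces, the sketch below pulls the most frequent content words out of a document after dropping stopwords. The real notebook's method is more sophisticated; this only illustrates the kind of per-document keywords the step yields.

```python
from collections import Counter

# Small illustrative stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for"}

def top_terms(text, k=3):
    """Return the k most frequent non-stopword terms in the text,
    a crude stand-in for the notebook's topic identification."""
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in STOPWORDS]
    return [term for term, _ in Counter(tokens).most_common(k)]

terms = top_terms("the solar system and the planets of the solar system", 2)
```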
- Create virtual environment:
virtualenv -p python3 venv
- Install Elasticsearch (anywhere other than the project folder); on macOS with Homebrew:
brew install elasticsearch
- Set up a virtual environment inside the app/ folder
virtualenv -p python3 venv
source venv/bin/activate
After the last command, (venv) appears in the terminal prompt.
- Open a second terminal window and start the Elasticsearch process in the background
brew services start elasticsearch
- Go to your Elasticsearch bin directory (e.g. /usr/local/bin) and run
`./elasticsearch` (or `.\elasticsearch` on Windows)
- Once elasticsearch is up and running, go to app/index/ and run,
python elastic_search_helper.py
This will start the Flask app, which can be viewed in the browser at:
http://localhost:5000
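The helper script's core job is turning a user query into an Elasticsearch search request. The sketch below builds a standard match-query body; the field name `content` is an assumption — elastic_search_helper.py may index pages under different field names.

```python
import json

def build_search_body(query, size=10):
    """Build a standard Elasticsearch match-query body.
    The 'content' field name is an illustrative assumption."""
    return {
        "size": size,                           # max hits to return
        "query": {"match": {"content": query}}, # full-text match
    }

body = build_search_body("dinosaurs for kids")
payload = json.dumps(body)  # what gets POSTed to the _search endpoint
```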
If you see an error while starting Elasticsearch such as
- "failed to obtain node locks, tried [[/usr/local/var/lib/elasticsearch .."
find and kill the stale Java process:
`ps aux | grep 'java'`
`kill -9 <PID>`
- Unable to locate Python 3.7 in PyCharm
If Anaconda is installed, find its path with
`which anaconda`
Copy that path into PyCharm's interpreter settings and select Python 3.7 or a similar version.
- Girish Tiwale
- Richa Nahar
- Sabiha Barlaskar
- Supritha Amudhu