A-Search-Engine-for-Kids

A kid-friendly search engine that displays results aimed at enhancing kids' knowledge. The search engine filters out all kinds of harmful content inappropriate for kids. We use a neural network and rank the results using TF-IDF, tweaked with our own formula.
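The custom re-weighting formula is not reproduced here; as a minimal sketch of the underlying TF-IDF ranking idea (using scikit-learn, with placeholder documents and a placeholder query), ranking candidate results against a query could look like this:

    # Minimal TF-IDF ranking sketch (illustrative only; the project applies
    # its own re-weighting formula on top of plain TF-IDF scores).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "The solar system has eight planets.",
        "Dinosaurs lived millions of years ago.",
        "Volcanoes erupt when magma reaches the surface.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_vectors = vectorizer.fit_transform(documents)

    query = "planets in the solar system"
    query_vector = vectorizer.transform([query])

    # Rank documents by cosine similarity to the query.
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    for idx in scores.argsort()[::-1]:
        print(f"{scores[idx]:.3f}  {documents[idx]}")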

The project consists of the following main steps:

  1. General instructions to run the project
  2. Scraping data from the web
  3. Assigning labels to the training data using pattern.en
  4. Filtering objectionable content
  5. Identifying topics
  6. Running ElasticSearch

General Instructions to run the project

Clone the repository onto your local machine:
git clone repository_url

To run the project, you need a working installation of Python 3.6 (not 3.7) and pip.
To install all the required dependencies, execute:
pip install -r requirements.txt

Scraping the data from the web

Scraping the data requires installed versions of Selenium and BeautifulSoup; both libraries are listed in requirements.txt.
For data scraping (a minimal scraping sketch appears after this section):
  1. Run Medium_Scrapper_using_selenium.py
  2. Run WebScraper.py
  3. Run Medium_Search_URL_Scrapper.py
  4. Run WebScraper.py
  5. Combine the datasets into a single file named final_data.csv

Or you can download the data from this link: https://drive.google.com/file/d/1BrAguUjU6yU4In8iWx4-i37MBcK_gmqi/view
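A minimal scraping sketch in the spirit of these scripts, assuming Selenium with ChromeDriver plus BeautifulSoup; the URL, selectors, and CSV columns are placeholders, not the ones used by the repo's scrapers:

    # Minimal scraping sketch (placeholder URL/selectors; the repo's scrapers
    # collect article content into CSV files such as final_data.csv).
    import csv
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()          # requires chromedriver on PATH
    driver.get("https://example.com")    # placeholder URL
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()

    rows = []
    for article in soup.find_all("article"):
        title = article.find("h2")
        body = article.get_text(separator=" ", strip=True)
        rows.append([title.get_text(strip=True) if title else "", body])

    with open("final_data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "content"])
        writer.writerows(rows)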

Assigning labels to the training data using pattern.en

  • Input file - final_data.csv
  • Output file - file_data_output.csv
    1. Create a new virtual environment using the command
      virtualenv -p python3 venv

    2. A new folder called venv gets created.
    3. To activate the virtual environment, type the command
      source venv/bin/activate

    4. (venv) will be prepended to the command prompt.
    5. Navigate to the project folder in the path - /A-Search-Engine-for-Kids/helper_scripts/class_labelling_using_pattern.en
    6. Run the command
      python data_content_labelling.py

    7. This script was initially created to classify data as Positive, Strongly Positive, Negative, or Strongly Negative. The input CSV file used here is a basic data set limited to 1,280 rows.
    8. The output of this script is the same input data set with an additional column for the sentiment score appended (a rough sketch of the labelling logic follows this list).
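The core of the labelling step can be sketched roughly as follows, assuming pattern.en's sentiment() function and pandas; the polarity thresholds and column names are assumptions, not necessarily those used in data_content_labelling.py:

    # Rough sketch of sentiment labelling with pattern.en (thresholds and
    # the "content" column name are assumptions; see data_content_labelling.py
    # for the actual logic).
    import pandas as pd
    from pattern.en import sentiment

    def label(text):
        polarity, _subjectivity = sentiment(str(text))  # polarity in [-1, 1]
        if polarity > 0.5:
            return polarity, "Strongly Positive"
        if polarity > 0:
            return polarity, "Positive"
        if polarity < -0.5:
            return polarity, "Strongly Negative"
        if polarity < 0:
            return polarity, "Negative"
        return polarity, "Neutral"

    df = pd.read_csv("final_data.csv")
    df[["sentiment_score", "sentiment_label"]] = df["content"].apply(
        lambda t: pd.Series(label(t))
    )
    df.to_csv("file_data_output.csv", index=False)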

Filtering objectionable content

  1. Once the final_data.csv file is retrieved, save it in the same directory as web_content_classification.ipynb. Launch the notebook by entering the command
    jupyter notebook

  2. This will open the notebook; all the cells can be executed using Shift+Enter, or via Cell > Run All (a rough sketch of this filtering step follows below).

Note: Since the data set is large (149 MB), it will take a long time to see the results.
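The notebook's exact pipeline is not reproduced here; a rough sketch of filtering objectionable content with a TF-IDF plus logistic-regression classifier (the column names and the "safe" label are assumptions) might look like this:

    # Rough sketch of an objectionable-content filter (not the notebook's
    # exact pipeline; column names and label values are assumptions).
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    df = pd.read_csv("final_data.csv")
    X_train, X_test, y_train, y_test = train_test_split(
        df["content"].astype(str), df["label"], test_size=0.2, random_state=42
    )

    clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                        LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))

    # Keep only rows predicted safe for kids and write them out for the
    # topic-identification step.
    df["predicted"] = clf.predict(df["content"].astype(str))
    df[df["predicted"] == "safe"].to_csv("whole_data.csv", index=False)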

Identifying topics

  1. Load classification3.ipynb and topic modelling.ipynb in Jupyter Notebook and execute them using Shift+Enter or Cell > Run All.
  2. These notebooks take the output of the "Filtering objectionable content" step as input. The input file is "whole_data.csv", which is found in the same directory as the classification.ipynb file. A minimal topic-modelling sketch follows below.
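As a minimal sketch of the topic-identification idea (the notebooks may use a different approach), LDA over the filtered data with scikit-learn could look like this; the "content" column name is an assumption:

    # Minimal LDA topic-modelling sketch (the notebooks' exact approach may
    # differ; column names are assumptions).
    import pandas as pd
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    df = pd.read_csv("whole_data.csv")
    vectorizer = CountVectorizer(stop_words="english", max_features=5000)
    counts = vectorizer.fit_transform(df["content"].astype(str))

    lda = LatentDirichletAllocation(n_components=10, random_state=42)
    lda.fit(counts)

    # Print the top words for each discovered topic
    # (use get_feature_names() on older scikit-learn versions).
    terms = vectorizer.get_feature_names_out()
    for topic_id, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[::-1][:8]]
        print(f"Topic {topic_id}: {', '.join(top)}")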

Running ElasticSearch

  1. Create a virtual environment:
    virtualenv -p python3 venv

  2. Install Elasticsearch (anywhere other than the project folder):
    brew install elasticsearch

  3. Set up the virtual environment inside the app/ folder:
    virtualenv -p python3 venv
    source venv/bin/activate

    After running the last command you will see (venv) in the terminal prompt.

  4. Open a second terminal window and start the Elasticsearch process in the background:
    brew services start elasticsearch

  5. Go to your Elasticsearch bin directory (for example /usr/local/bin) and run
    ./elasticsearch (or .\elasticsearch on Windows)

  6. Once Elasticsearch is up and running, go to app/index/ and run
    python elastic_search_helper.py

    This will start the Flask app, which can be viewed in the browser at this URL:
    http://localhost:5000

  A rough sketch of the indexing and querying performed in this step is shown below.
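For orientation, the kind of indexing and querying this step performs can be sketched with the elasticsearch Python client (7.x-style API assumed); the index name, field names, and input file are assumptions, not taken from elastic_search_helper.py:

    # Rough sketch of indexing and querying with the elasticsearch Python
    # client (7.x-style API; index/field names are assumptions, not taken
    # from elastic_search_helper.py).
    import pandas as pd
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Index the cleaned documents.
    df = pd.read_csv("whole_data.csv")
    for i, row in df.iterrows():
        es.index(index="kids-search",
                 id=i,
                 body={"title": row.get("title", ""),
                       "content": row.get("content", "")})

    # Run a full-text query against the indexed content.
    response = es.search(index="kids-search",
                         body={"query": {"match": {"content": "solar system"}}})
    for hit in response["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("title", ""))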
      

Troubleshooting

If you encounter errors while starting Elasticsearch:

  1. "failed to obtain node locks, tried [[/usr/local/var/lib/elasticsearch ..": find the running Java process and kill it:
    ps aux | grep 'java'
    kill -9 <PID>

  2. Unable to locate Python 3.7 in PyCharm: locate Anaconda, if installed, with
    which anaconda
    Copy the folder path into PyCharm and select Python 3.7 or a similar version.

The PyCharm run/debug configuration will look like this: ![image](https://user-images.githubusercontent.com/25397038/50049102-1f7c2b80-0091-11e9-8369-b13087f1346d.png)

Authors

  • Girish Tiwale
  • Richa Nahar
  • Sabiha Barlaskar
  • Supritha Amudhu