Open Domain Question Answering Model

By Quentin Churet, Boris Tronch and Jiahao Lu (CentraleSupélec) -- coordinated by Martin d'Hoffschmidt (Illuin Technology)

The goal of this project is to create an open-domain question answering platform that lets the user build a turnkey search engine, which then tries to answer the question from the paragraphs that appear in the search results.

This project is the result of a one-year school project carried out in collaboration between CentraleSupélec and Illuin Technology.

Setting up the project

Creating a virtual environment

After cloning the repository, it is highly recommended to set up a virtual environment (with virtualenv, for example) or to use Anaconda, so that the dependencies of this project stay isolated from other system dependencies.

To install virtualenv, simply run:

$ pip install virtualenv

Once installed, a new virtual environment can be created by running:

$ virtualenv venv

This will create a virtual environment in the venv directory in the current working directory. To change the location and/or name of the environment directory, change venv to the desired path in the command above.

To enter the virtual environment on Linux or macOS, run:

$ source venv/bin/activate

To leave the virtual environment, run:

(venv) $ deactivate
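If it is unclear whether the environment is currently active, the interpreter itself can be asked. The snippet below is a minimal sketch and not part of this repository; the helper name `in_virtual_environment` is made up for illustration. It relies on the standard `sys.prefix` and `sys.base_prefix` attributes, plus the `sys.real_prefix` attribute that older virtualenv releases set.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix inside an environment.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and recent virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```

Running this with the environment's `python` should report that the environment is active; running it with the system interpreter should report the opposite.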

Installing Dependencies

While the virtual environment is active, install the required dependencies by running:

(venv) $ pip install -r requirements.txt

This will install all of the dependencies at specific versions to ensure they are compatible with one another.

Load the FQuAD data

To make the data accessible to the models, the FQuAD data must first be processed into the data folder.

You can do this by running:

(venv) $ python process_fquad_data.py
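For readers who want to inspect the FQuAD files before processing them, here is a small sketch of how their contents could be walked. It assumes the files follow the SQuAD-style JSON layout (`data` → `paragraphs` → `context` / `qas`); the function names and the sample record are illustrative only, and this is not a description of what `process_fquad_data.py` actually does.

```python
from typing import Iterator, Tuple


def iter_contexts(squad_like: dict) -> Iterator[str]:
    """Yield every paragraph text from a SQuAD-style dictionary."""
    for article in squad_like.get("data", []):
        for paragraph in article.get("paragraphs", []):
            yield paragraph["context"]


def iter_question_answers(squad_like: dict) -> Iterator[Tuple[str, str]]:
    """Yield (question, first answer text) pairs from a SQuAD-style dictionary."""
    for article in squad_like.get("data", []):
        for paragraph in article.get("paragraphs", []):
            for qa in paragraph.get("qas", []):
                answers = qa.get("answers", [])
                if answers:  # skip questions that carry no answer
                    yield qa["question"], answers[0]["text"]


# Hand-written sample record (assumed layout, not taken from the dataset).
sample = {
    "data": [{
        "title": "Paris",
        "paragraphs": [{
            "context": "Paris est la capitale de la France.",
            "qas": [{
                "id": "q1",
                "question": "Quelle est la capitale de la France ?",
                "answers": [{"text": "Paris", "answer_start": 0}],
            }],
        }],
    }]
}

if __name__ == "__main__":
    print(list(iter_contexts(sample)))
    print(list(iter_question_answers(sample)))
```

To use it on a real file, load the file with `json.load` and pass the resulting dictionary to either function.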

Load the Wikipedia data

To make the data accessible to the models, the Wikipedia data must first be processed into the data folder.

You can do this by running:

(venv) $ python process_wikipedia_data.py

Launching the interface

The interface takes several parameters; their default values are already set to the best-performing ones.

Command to launch the interface on the FQuAD data:

Make sure you have loaded the data as described above!

(venv) $ python interface.py --count 10 \
          --weighting_model okapi-bm25 \
          --lemmatizer spacy-fr \
          --context_retrieval False \
          --question_answering_label camembert-fquad
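The `--weighting_model okapi-bm25` option refers to the Okapi BM25 ranking function, which scores each paragraph against the question. The sketch below shows the general idea only; it is not the code used by `interface.py`. It assumes documents are already tokenized into lists of terms (the project uses a spaCy French lemmatizer for that step), uses the common defaults `k1=1.5` and `b=0.75`, and uses the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)`. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one Okapi BM25 score per tokenized document for a tokenized query."""
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["paris", "est", "la", "capitale", "de", "la", "france"],
        ["le", "chat", "dort"],
        ["la", "france", "gagne"],
    ]
    print(bm25_scores(["capitale", "france"], docs))
```

In this toy corpus the first paragraph matches both query terms and ranks highest, the third matches only one term, and the second matches none and scores zero. The `--count` option would then correspond to how many top-scoring paragraphs are kept, although that reading is an assumption based on the option name.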

Command to launch the interface on the Wikipedia data:

Make sure you have loaded the data as described above!

First, download the CamemBERT model fine-tuned on 13,000 questions generated from this corpus. The model is available here

Then unzip the downloaded folder and place it in the working directory.

Then run the following command:

(venv) $ python interface.py --count 10 \
          --weighting_model okapi-bm25 \
          --lemmatizer spacy-fr \
          --context_retrieval False \
          --question_answering_label ./camembert_fine_tuned_13000_questions
