Watsonsim Question Answering System

Quick Intro

Watsonsim works using a pipeline of operations on questions, candidate answers, and their supporting passages. In many ways it is similar to IBM's Watson and Petr's YodaQA, and it is much less similar to logic-based systems like OpenCog or Wolfram Alpha. Even so, it differs significantly from Watson and YodaQA:

- We don't use a standard UIMA pipeline, which is a product of our student-project history. Sometimes this is a hindrance, but typically it has little impact. We suspect it reduces the learning overhead and boilerplate code.
- Unlike YodaQA, we target Jeopardy! questions, but we do incorporate its method of Lexical Answer Type (LAT) checking in addition to our own.
- Our framework is computationally heavyweight. Depending on which modules are enabled, it can take anywhere from about 1 second to 2 minutes to answer a question. We use Indri to improve accuracy; it is now an optional feature, but one that we highly recommend. (We are investigating alternatives as well.)
- We include (relatively) large amounts of preprocessed article text from Wikipedia as our inputs. Be prepared to use about 100 GB of disk space if you want to try it out at its full power.
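Since the full data set needs about 100 GB, it may be worth checking free space before downloading anything. The following is a minimal sketch, not part of the repository: the helper names free_gb and has_space are hypothetical, and it assumes a POSIX df and awk are available.

```shell
# free_gb DIR: print the whole gigabytes available on the filesystem holding DIR.
# Uses the POSIX output format of df, where field 4 of line 2 is the free space in KB.
free_gb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_space DIR GB: succeed when DIR has at least GB gigabytes free.
has_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# 100 is the approximate figure quoted above for running at full power.
if has_space . 100; then
    echo "at least 100 GB free here"
else
    echo "less than 100 GB free here"
fi
```

Run it from the directory where you plan to keep the checkout, or pass a different directory to has_space.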

Installing the Simulator

- Use git to clone this repository: git clone https://github.com/SeanTater/uncc2014watsonsim.git
- Install Java 8, either:
  - bundled with Eclipse
  - or on Ubuntu utopic+: sudo apt-get install openjdk-8-jdk
  - or on Fedora 20+: sudo yum install java-1.8.0-openjdk
  - or on Windows, Mac, and all others: use the Java 8 installer for your platform
- Install the libSVM machine learning library (native):
  - on Ubuntu and Fedora: install the libsvm-java package
  - otherwise (for example, on Windows): follow the libSVM installation instructions for your platform
- Download Gradle (just unzip it; keep in mind that it updates very often)
- Download the latest data and place it in the data/ directory
- Copy the configuration file config.properties.sample to config.properties and customize it to your liking
- Run gradle eclipse -Ptarget in uncc2014watsonsim/ to download platform-independent dependencies and create an Eclipse project
- Optionally, enable some of the Optional Features
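For Ubuntu, the steps above can be sketched roughly as the script below. It is an illustration rather than a supported installer: the function names init_config and install_watsonsim are hypothetical, it assumes git and Gradle are already on your PATH, and it leaves out the data download because that location is not given here. Nothing runs until you call a function.

```shell
REPO_URL="https://github.com/SeanTater/uncc2014watsonsim.git"

# init_config DIR: copy the sample configuration in DIR to config.properties,
# leaving any existing config.properties untouched.
init_config() {
    dir="${1:-.}"
    if [ ! -f "$dir/config.properties" ]; then
        cp "$dir/config.properties.sample" "$dir/config.properties"
    fi
}

# install_watsonsim: clone, install Java 8 and libSVM, create the config file,
# then let Gradle fetch dependencies and generate the Eclipse project.
# Assumption: Ubuntu package names as listed in the steps above.
install_watsonsim() {
    git clone "$REPO_URL" &&
    sudo apt-get install openjdk-8-jdk libsvm-java &&
    init_config uncc2014watsonsim &&
    (cd uncc2014watsonsim && gradle eclipse -Ptarget)
    # Still to do by hand: place the latest data in uncc2014watsonsim/data/
    # and edit config.properties.
}
```

After sourcing the file, run install_watsonsim once; init_config can be rerun safely because it never overwrites an existing config.properties.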

Running the Simulator

We recommend running the simulator with Gradle:

gradle run -Ptarget=WatsonSim

If you prefer, you can use Eclipse instead. First, create a project:

gradle eclipse -Ptarget

Then you can run WatsonSim.java directly.

There are a few other targets as well:

# Generate statistics reports for accuracy and other measurements
gradle run -Ptarget=scripts.ParallelStats
# Regenerate the Indri, Lucene, SemanticVectors, Bigram and Edge indices
gradle run -Ptarget=index.Reindex

Technologies Involved

This list isn't exhaustive, but it should give a good overview:

- Search
  - Text search from Lucene and Indri (Terrier upcoming)
  - Web search from Bing (Google is in the works)
  - Relational queries using PostgreSQL and SQLite
  - Linked data queries using Jena
- Sources
  - Text from all the articles in Wikipedia, Simple Wikipedia, Wiktionary, and Wikiquote
  - Linked data from DBpedia, used for LAT detection
  - Wikipedia pageviews organized by article
  - Source, target, and label from all links in Wikipedia
- Machine learning with Weka and libSVM
- Text parsing and dependency generation from CoreNLP and OpenNLP
- Parsing logic in Prolog (with tuProlog)

Notes:

You should probably consider using PostgreSQL rather than SQLite if you scale this project to more than a few cores or to any distributed environment. The project should support both engines nicely.
The data is sizable and growing, especially for statistics reports; 154.5 GB as of the time of this writing.
Can't find libindri-jni? Make sure you enabled Java and SWIG and had the right dependencies when compiling Indri.
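To see whether a compiled Indri JNI library exists anywhere on your system, something like the following sketch may help. The function name find_indri_jni is hypothetical, the directories passed to it are only examples, and the file-name pattern is an assumption chosen to match both libindri-jni and libindri_jni with any extension.

```shell
# find_indri_jni DIR...: print every file under the given directories whose
# name looks like the Indri JNI library. Missing directories are skipped.
find_indri_jni() {
    for dir in "$@"; do
        [ -d "$dir" ] && find "$dir" -name 'libindri*jni*' 2>/dev/null
    done
    return 0
}

# Example search locations (assumptions; adjust to wherever Indri was installed).
find_indri_jni /usr/local/lib /usr/lib
```

If nothing is printed, the library was probably never built, which points back to the Java and SWIG options used when compiling Indri.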
