WordCatch

WordCatch is a web app that helps non-native English speakers learn English. It suggests candidate words to fill the blank in an incomplete phrase. For example, when a user types "different _ them", the app returns candidates such as "different from them", "different than them", "different to them", and "different about them", each with its number of occurrences in the UMBC WebBase Corpus. Each candidate is accompanied by two example sentences. The app also supports filtering by part of speech: when a user types "v. an issue", it returns candidates such as "submit an issue" and "have an issue".
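As a rough illustration of the query format described above, the sketch below splits a phrase into its context words, the position of the blank, and an optional part-of-speech filter. It is not taken from the WordCatch code: the function name `parse_query`, the `POS_PREFIXES` table, and every abbreviation other than "v." are assumptions made for this example.

```python
# Hypothetical mapping from user-facing abbreviations to Penn Treebank tag
# prefixes. Only "v." appears in the README; the others are assumptions.
POS_PREFIXES = {"v.": "VB", "n.": "NN", "adj.": "JJ", "adv.": "RB"}


def parse_query(query):
    """Split a query into context words, blank position, and POS filter."""
    tokens = query.split()
    # A blank is either "_" (any word) or a POS abbreviation such as "v.".
    blanks = [i for i, t in enumerate(tokens) if t == "_" or t in POS_PREFIXES]
    if len(blanks) != 1:
        raise ValueError("expected exactly one blank or POS placeholder")
    index = blanks[0]
    return {
        "context": tokens[:index] + tokens[index + 1:],
        "position": index + 1,  # 1-based position of the blank
        "pos_prefix": POS_PREFIXES.get(tokens[index]),  # None for "_"
    }


print(parse_query("different _ them"))
print(parse_query("v. an issue"))
```

Under these assumptions, "different _ them" yields the context words "different" and "them" with the blank in position 2 and no filter, while "v. an issue" yields a blank in position 1 restricted to tags that start with "VB".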

Technology

WordCatch is built on:

Spark
HBase
NLTK (Natural Language Toolkit)
Django

Schema Design

To get all word candidates that fill the blank in the phrase "different _ them", we can query HBase from the HBase shell like this: get 'wordcatch_umbc','07:different,them',[2]

In this example:

Table Name: wordcatch_umbc

Partition key: 07:different,them. The prefix 07 is a salt that spreads keys across regions.

Column Family: 2. It selects all word candidates that fill the middle blank.

Column key: 2:from. This is the word candidate "from", qualified by the column family name 2.

Cell:

{ 
   "count":278,
   "pos":"IN",
   "example":[ 
      "We still have too many Americans who give into their fears of those who are different from them",
      " For centuries, the Japanese have considered Burakumin people as descendants of Korean prisoners of war, even though there is no evidence that they are racially different from them"
   ]
}

In this JSON object:

count: the number of occurrences of the word candidate in the UMBC corpus

pos: the part of speech of the word candidate (see the Penn Treebank Project)

example: example sentences that further clarify the usage
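To show how cells of this shape could be consumed, here is a minimal sketch that decodes them, optionally filters by part-of-speech prefix, and ranks candidates by count. The function `rank_candidates` and the dict-of-strings input are assumptions for illustration, not the project's actual code. Only the count 278 for "from" comes from the example above; the other cell is invented.

```python
import json


def rank_candidates(cells, pos_prefix=None):
    """Rank word candidates by corpus count, most frequent first.

    cells maps a column key such as "2:from" to the JSON string stored
    in that cell (an assumed in-memory form of one HBase row).
    """
    ranked = []
    for column, raw in cells.items():
        cell = json.loads(raw)
        # Keep only candidates whose Penn Treebank tag matches the filter.
        if pos_prefix and not cell["pos"].startswith(pos_prefix):
            continue
        word = column.split(":", 1)[1]  # drop the column family prefix
        ranked.append((word, cell["count"], cell["example"]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


row = {
    "2:from": json.dumps({"count": 278, "pos": "IN", "example": []}),
    # Invented cell, present only to exercise the filter and the sort.
    "2:around": json.dumps({"count": 3, "pos": "RB", "example": []}),
}
for word, count, _ in rank_candidates(row, pos_prefix="IN"):
    print(word, count)
```

Matching on a tag prefix rather than the full tag lets one filter such as "VB" cover related tags like VBD and VBZ.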

More example query results

Check it out

Main program: written with Spark.

Cell Converter: helps generate the cells for the bulk load via HFile.
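The following sketch shows one way a salted row key and a cell value matching the schema above might be produced before a bulk load. It is not the project's Cell Converter: the CRC32-based salt, the bucket count `NUM_SALT_BUCKETS`, and both function names are assumptions, so the salt it computes need not match the 07 prefix shown earlier.

```python
import json
import zlib

NUM_SALT_BUCKETS = 16  # hypothetical number of salt buckets


def salted_row_key(context_words, buckets=NUM_SALT_BUCKETS):
    """Prefix the comma-joined context words with a two-digit salt."""
    base = ",".join(context_words)
    # Assumed salting scheme: a stable hash of the key modulo the bucket
    # count, so that the same phrase always maps to the same prefix.
    salt = zlib.crc32(base.encode("utf-8")) % buckets
    return "%02d:%s" % (salt, base)


def build_cell(count, pos, examples, max_examples=2):
    """Serialize one cell; the README shows two example sentences per word."""
    return json.dumps(
        {"count": count, "pos": pos, "example": examples[:max_examples]}
    )


key = salted_row_key(["different", "them"])
cell = build_cell(278, "IN", ["first example", "second example", "third"])
print(key)
print(cell)
```

A deterministic salt keeps reads to a single get, because the reader can recompute the prefix from the phrase instead of scanning every bucket.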

Issues

The Django app should be upgraded to be compatible with the new schema.
