
ZXShwan/WordCatch


WordCatch

WordCatch is a web app that helps non-native English speakers learn English. It suggests word candidates to fill in an incomplete phrase. For example, when a user types "different _ them", the web app suggests candidates such as "different from them", "different than them", "different to them", and "different about them", each with its number of occurrences in the UMBC WebBase Corpus. Each word candidate is accompanied by two example sentences. It also supports filtering by part of speech: when a user types "v. an issue", it suggests candidates such as "submit an issue" and "have an issue".
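The query syntax described above can be illustrated with a small parser. This is only a sketch under assumptions: the function name `parse_query`, the `Query` tuple, and the abbreviation table (only "v." appears in this README; the other entries and the mapping to Penn Treebank tag prefixes are guesses) are hypothetical and not taken from the WordCatch code.

```python
from typing import NamedTuple, Optional

# Hypothetical mapping from query abbreviations to Penn Treebank tag prefixes.
# Only "v." is mentioned in the README; the rest are illustrative guesses.
POS_PREFIXES = {"v.": "VB", "n.": "NN", "adj.": "JJ", "adv.": "RB"}


class Query(NamedTuple):
    words: tuple               # the known words, in order
    blank: int                 # 1-based position of the blank in the phrase
    pos_prefix: Optional[str]  # tag prefix to filter on, or None


def parse_query(text: str) -> Query:
    """Split a phrase such as 'different _ them' or 'v. an issue'."""
    tokens = text.lower().split()
    # A blank is either "_" (any word) or a part-of-speech abbreviation.
    blanks = [i for i, t in enumerate(tokens) if t == "_" or t in POS_PREFIXES]
    if len(blanks) != 1:
        raise ValueError("expected exactly one blank in the phrase")
    i = blanks[0]
    words = tuple(t for j, t in enumerate(tokens) if j != i)
    return Query(words, i + 1, POS_PREFIXES.get(tokens[i]))
```

With these assumptions, "different _ them" yields the known words ("different", "them") with the blank at position 2, and "v. an issue" yields ("an", "issue") with the blank at position 1 and a verb filter.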

Technology

WordCatch is built on:

  • Spark
  • HBase
  • NLTK (Natural Language Toolkit)
  • Django

Schema Design

To get all word candidates that fill in the phrase "different _ them", we can run a query like this in the HBase shell: get 'wordcatch_umbc','07:different,them',[2]

In this example:

Table Name: wordcatch_umbc

Partition key: 07:different,them. The prefix 07 is a salt that spreads keys across regions.

Column Family: 2. It selects all word candidates that fill the middle blank.

Column key: 2:from, the word candidate "from" qualified by the column family name 2.
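The key layout above can be sketched in a few lines. The README does not say how the salt is computed or how many salt buckets exist, so the MD5-based bucket choice and `NUM_BUCKETS` below are assumptions made purely for illustration; the helper names `salted_row_key` and `column_key` are hypothetical, and this sketch is not expected to reproduce the actual 07 prefix.

```python
import hashlib

NUM_BUCKETS = 16  # assumption: the real number of salt buckets is not documented


def salted_row_key(words, num_buckets=NUM_BUCKETS):
    """Build '<salt>:<word>,<word>' so that keys spread across regions."""
    logical = ",".join(words)
    # Assumption: derive the salt from a stable hash of the logical key,
    # so the same phrase always lands in the same bucket.
    digest = hashlib.md5(logical.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}:{logical}"


def column_key(family, candidate):
    """Build '<family>:<candidate>', e.g. '2:from'."""
    return f"{family}:{candidate}"
```

A hash-derived salt keeps lookups cheap because the reader can recompute the prefix from the phrase alone, instead of scanning every bucket.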

Cell:

{ 
   "count":278,
   "pos":"IN",
   "example":[ 
      "We still have too many Americans who give into their fears of those who are different from them",
      " For centuries, the Japanese have considered Burakumin people as descendants of Korean prisoners of war, even though there is no evidence that they are racially different from them"
   ]
}

In this JSON cell:

count: the number of occurrences of the word candidate in the UMBC corpus

pos: the part of speech of the word candidate (see the Penn Treebank Project)

example: example sentences that further clarify the usage
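As a rough illustration of how a client might consume these cells, the sketch below decodes the JSON values of one row, optionally filters by a Penn Treebank tag prefix, and sorts candidates by count. The function name `rank_candidates` and the input shape (a dict from column key to JSON string) are assumptions, not the Django app's actual code; in the sample data only the "from" entry comes from this README, and the "than" entry is an invented placeholder.

```python
import json


def rank_candidates(cells, pos_prefix=None):
    """cells maps a column key such as '2:from' to its JSON cell string."""
    rows = []
    for column, raw in cells.items():
        cell = json.loads(raw)
        # Optional part-of-speech filter, e.g. "VB" keeps VB, VBD, VBZ, ...
        if pos_prefix and not cell["pos"].startswith(pos_prefix):
            continue
        candidate = column.split(":", 1)[1]  # drop the column family
        rows.append((candidate, cell["count"], cell["example"]))
    # Most frequent candidates first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


sample = {
    # Invented placeholder values, not real corpus counts.
    "2:than": json.dumps({"count": 10, "pos": "IN", "example": []}),
    # Taken from the cell shown in this README.
    "2:from": json.dumps({
        "count": 278,
        "pos": "IN",
        "example": ["We still have too many Americans who give into "
                    "their fears of those who are different from them"],
    }),
}
ranked = rank_candidates(sample)
```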

More Example Query Result

Check it out

Main program, written in Spark

Cell Converter: helps generate the cells for the bulk load via HFile.

Issues

The Django app should be upgraded to be compatible with the new schema.

Future Update
