
YunseokJANG/Sentences-analysis

 
 


Korean SNS & Article Analysis


Back-end

Crawling

Todo

  1. Crawl Naver news or Google news based on a query
  2. Connect the crawler to the parsing stage so that crawled posts are parsed and stored in the database automatically.
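A minimal sketch of the crawl → parse → store pipeline, using only the standard library. Several details are assumptions: the Naver news search URL format, the CSS class `news_tit` used to pick out headline links, and the `store` callback standing in for the KoNLPy parsing and database insert. Google News is not covered.

```python
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint format; check it against Naver's current search page.
NAVER_NEWS_SEARCH = "https://search.naver.com/search.naver"


def build_search_url(query: str) -> str:
    """Build a news-search URL; urlencode percent-encodes Korean queries."""
    return NAVER_NEWS_SEARCH + "?" + urlencode({"where": "news", "query": query})


class TitleExtractor(HTMLParser):
    """Collect the text of <a> tags that carry a given CSS class (hypothetical)."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.in_title = False
        self.titles: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data


def fetch(url: str) -> str:
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def crawl_and_store(query: str,
                    store: Callable[[str], None],
                    fetcher: Callable[[str], str] = fetch,
                    css_class: str = "news_tit") -> List[str]:
    """Crawl one result page for `query` and hand each headline to `store`."""
    parser = TitleExtractor(css_class)
    parser.feed(fetcher(build_search_url(query)))
    titles = [t.strip() for t in parser.titles if t.strip()]
    for title in titles:
        store(title)  # hypothetical hook: parse with KoNLPy, then INSERT the post
    return titles
```

Injecting `fetcher` keeps the network call swappable, so the parsing and storing steps can be exercised on saved HTML.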

Parsing using KoNLPy

Todo

Issue

  1. Are all the morphemes actually needed? (e.g. 의, 는, 이다)
  2. Multithreading in the parsing process does not work; it works only for short sentences. This does not matter, as multithreading will not be used.
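One way to drop function morphemes such as 의, 는 and 이다 is to filter on part-of-speech tags. This sketch assumes Sejong-style tags of the kind KoNLPy's Kkma tagger returns from `pos()` (J* for particles, E* for endings, VCP for the copula); other KoNLPy taggers use different tag sets, so the prefix list would need adjusting. The tagged input in the test is hand-written for illustration.

```python
from typing import Iterable, List, Tuple

# Sejong-style tag prefixes treated as grammatical (function) morphemes:
# J* = particles (의, 는), E* = verbal endings, VCP = positive copula (이다).
DROP_PREFIXES = ("J", "E", "VCP")


def content_morphemes(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only morphemes whose POS tag does not mark a grammatical function."""
    return [m for m, tag in tagged if not tag.startswith(DROP_PREFIXES)]
```

Usage would look like `content_morphemes(Kkma().pos(sentence))` after `from konlpy.tag import Kkma`.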

Done

2016.05.12.

  1. Finished parsing the input sentence to create a rule.

2016.04.09.

  1. Used a dummy text file as input.
  2. Split each line into sentences with multithreading (2 threads).
  3. Split each sentence into morphemes with multithreading (2 threads); this did not work.
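Since multithreading was dropped, the line-to-sentence step can simply run sequentially. This is a simple regex stand-in that splits after `.`, `?` or `!` followed by whitespace; it is an assumption, not the project's splitter. KoNLPy's `Kkma().sentences(text)` is a tagger-based alternative.

```python
import re
from typing import Iterable, List

# Split after sentence-final punctuation that is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def split_sentences(lines: Iterable[str]) -> List[str]:
    """Split each input line into sentences, sequentially (no threads)."""
    sentences: List[str] = []
    for line in lines:
        for sent in _SENTENCE_END.split(line.strip()):
            if sent:  # skip empty lines
                sentences.append(sent)
    return sentences
```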

REST API

Todo

| API | Description | CRUD |
|---|---|---|
| (rulesets) PUT /rulesets/{topic_id}/{ruleset_seq}/{new_name} | Change the name of the ruleset. | U: rulesets |
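A minimal sketch of the rename handler's database step. It uses `sqlite3` as a stand-in for the project's MySQL database, and the column names `seq` and `name` are hypothetical. The parameterized query matters here because `{new_name}` arrives straight from the URL.

```python
import sqlite3


def rename_ruleset(conn: sqlite3.Connection, topic_id: int,
                   ruleset_seq: int, new_name: str) -> bool:
    """Handle PUT /rulesets/{topic_id}/{ruleset_seq}/{new_name}.

    Returns True if exactly one ruleset was renamed.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            # Placeholders keep a user-supplied name from being run as SQL.
            "UPDATE rulesets SET name = ? WHERE topic_id = ? AND seq = ?",
            (new_name, topic_id, ruleset_seq),
        )
    return cur.rowcount == 1
```

Because the new name is a path segment, a client would need to percent-encode it (for example with `urllib.parse.quote`) when it contains Korean text, spaces or slashes.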

Done

Removed

| API | Description | CRUD |
|---|---|---|
| (topics) GET /topics/ | Get all topics from the database. | R: topics |
| (sources) GET /sources/ | Get all sources from the database. | R: sources |
| (rulesets) POST /rulesets/{topic_id}/{category_seq}/{name} | Create a new ruleset, i.e. a package of rules. | C: rulesets |
| (rulesets) GET /rulesets/{topic_id} | Get all the rulesets from the database. | R: rulesets |
| (rulesets) DELETE /rulesets/{topic_id}/{category_seq} | Delete the ruleset and its related rules. | D: rulesets, rules, rule_word_relations |
| (words) POST /words/{fulltext} | Parse {fulltext} into morphemes, store any unregistered morphemes in the words table, and return the morphemes of the full text. | C: words |
| (rules) POST /rules/{topic_id}/{category_seq}/{fulltext}/{word_ids} | Create an actual rule, i.e. a combination of words. | C: rules, rule_word_relations |
| (rules) GET /rules/{topic_id}/{ruleset_seq} | Get the rules of the selected ruleset. | R: rules, (rule_word_relations?) |
| (rules) GET /rules/{rule_id} | Get a specific rule. | R: rules, rule_word_relations |
| (rules) PUT /rules/{rule_id}/{word_ids} | Change the rule, i.e. its combination of words. | U: rule_word_relations |
| (rules) DELETE /rules/{rule_id} | Delete the rule, both its full text and its combination of words. | D: rules, rule_word_relations |
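The word, rule and ruleset endpoints above touch several tables at once. This sketch shows one way the database side could work, again with `sqlite3` standing in for MySQL; the schema and column names are hypothetical, guessed from the URL parameters. `INSERT OR IGNORE` is SQLite syntax (MySQL's counterpart is `INSERT IGNORE`), and the delete runs child tables first so no relation rows are orphaned.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical schema inferred from the endpoint parameters.
SCHEMA = """
CREATE TABLE words (id INTEGER PRIMARY KEY, text TEXT UNIQUE NOT NULL);
CREATE TABLE rulesets (topic_id INTEGER, seq INTEGER, name TEXT,
                       PRIMARY KEY (topic_id, seq));
CREATE TABLE rules (id INTEGER PRIMARY KEY, topic_id INTEGER,
                    ruleset_seq INTEGER, fulltext TEXT);
CREATE TABLE rule_word_relations (rule_id INTEGER, word_id INTEGER,
                                  PRIMARY KEY (rule_id, word_id));
"""


def register_words(conn: sqlite3.Connection, morphemes: Sequence[str]) -> List[int]:
    """POST /words: store unregistered morphemes, return the ids of all of them."""
    with conn:
        conn.executemany("INSERT OR IGNORE INTO words (text) VALUES (?)",
                         [(m,) for m in morphemes])
    return [conn.execute("SELECT id FROM words WHERE text = ?", (m,)).fetchone()[0]
            for m in morphemes]


def create_rule(conn: sqlite3.Connection, topic_id: int, ruleset_seq: int,
                fulltext: str, word_ids: Sequence[int]) -> int:
    """POST /rules: insert the rule row, then one relation row per word."""
    with conn:  # single transaction: rule and relations are stored together
        cur = conn.execute(
            "INSERT INTO rules (topic_id, ruleset_seq, fulltext) VALUES (?, ?, ?)",
            (topic_id, ruleset_seq, fulltext))
        rule_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO rule_word_relations (rule_id, word_id) VALUES (?, ?)",
            [(rule_id, w) for w in word_ids])
    return rule_id


def delete_ruleset(conn: sqlite3.Connection, topic_id: int, ruleset_seq: int) -> None:
    """DELETE /rulesets: remove relations first, then rules, then the ruleset."""
    with conn:
        conn.execute(
            "DELETE FROM rule_word_relations WHERE rule_id IN "
            "(SELECT id FROM rules WHERE topic_id = ? AND ruleset_seq = ?)",
            (topic_id, ruleset_seq))
        conn.execute("DELETE FROM rules WHERE topic_id = ? AND ruleset_seq = ?",
                     (topic_id, ruleset_seq))
        conn.execute("DELETE FROM rulesets WHERE topic_id = ? AND seq = ?",
                     (topic_id, ruleset_seq))
```

In MySQL (InnoDB), foreign keys declared with `ON DELETE CASCADE` could replace the explicit child-first deletes.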

Issue

  1. See Database - Issue 2.

Done


Database

Todo

Issue

  1. Some emojis are not saved properly; some are stored as '?????'.
  2. Consider creating a 'querys' table to store the queries used to crawl the posts (e.g. 총선, "general election"). The posts could then be categorized, and users could analyze only the posts they are interested in. If we only want to analyze all of the recent posts, this data might be redundant; still, it is a good option for extensibility. A topics table now exists for this purpose.
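A likely cause of the emoji issue (an assumption, not confirmed here) is MySQL's legacy `utf8` character set, which stores at most 3 bytes per character. Emojis above U+FFFF need 4 bytes in UTF-8, while older symbols such as ☺ (U+263A) fit in 3, which would explain why only some emojis break. This helper flags text that needs `utf8mb4`; the `posts` table name in the comment is hypothetical.

```python
# Typical fix if the assumption holds:
#   ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
#   pymysql.connect(..., charset="utf8mb4")   # the connection charset must match
from typing import List


def four_byte_chars(text: str) -> List[str]:
    """Return characters that need 4 bytes in UTF-8 (code points above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def needs_utf8mb4(text: str) -> bool:
    """True if `text` cannot be stored in MySQL's 3-byte utf8 charset."""
    return bool(four_byte_chars(text))
```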

Done

2016.05.12.

  1. Crawled posts are stored in the MySQL database.
  2. Rulesets and rules are stored in the MySQL database.
  3. Redis holds the analysis results as key-bitarray pairs: the key is a rule_id, and the value is a bitarray with a 1 at each position whose sentence_id is related to the rule. If no sentences are related to the rule, every bit is 0. Unanalyzed rules have no key in Redis.
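The Redis layout above can be illustrated with an in-memory stand-in that mimics the bitmap commands. The class is hypothetical; with redis-py the equivalent calls would be `r.setbit(key, offset, value)`, `r.getbit`, `r.bitcount`, `r.bitop("AND", dest, *keys)` and `r.exists`. Writing a 0 at the last sentence position allocates an all-zero bitmap, which keeps "analyzed, no matches" distinct from "not analyzed" (no key).

```python
from typing import Dict, List


class BitmapStore:
    """In-memory stand-in for the Redis bitmap commands used here.

    Bit `offset` lives in byte offset // 8, counting from the most significant
    bit of that byte, which matches how Redis lays out SETBIT bitmaps.
    """

    def __init__(self) -> None:
        self.data: Dict[str, bytearray] = {}

    def exists(self, key: str) -> bool:  # Redis EXISTS
        return key in self.data

    def setbit(self, key: str, offset: int, value: int) -> None:  # Redis SETBIT
        buf = self.data.setdefault(key, bytearray())
        byte, bit = divmod(offset, 8)
        if len(buf) <= byte:
            buf.extend(bytes(byte + 1 - len(buf)))  # grow with zero bytes
        mask = 0x80 >> bit
        if value:
            buf[byte] |= mask
        else:
            buf[byte] &= ~mask & 0xFF

    def getbit(self, key: str, offset: int) -> int:  # Redis GETBIT
        buf = self.data.get(key, bytearray())
        byte, bit = divmod(offset, 8)
        if byte >= len(buf):
            return 0
        return (buf[byte] >> (7 - bit)) & 1

    def bitcount(self, key: str) -> int:  # Redis BITCOUNT
        return sum(bin(b).count("1") for b in self.data.get(key, b""))

    def sentences_for(self, key: str) -> List[int]:
        """Sentence ids whose bit is 1 for this rule."""
        size = len(self.data.get(key, b"")) * 8
        return [i for i in range(size) if self.getbit(key, i)]

    def matching_all(self, keys: List[str]) -> List[int]:
        """Sentences related to every rule (what BITOP AND computes in Redis)."""
        if not keys:
            return []
        common = set(self.sentences_for(keys[0]))
        for key in keys[1:]:
            common &= set(self.sentences_for(key))
        return sorted(common)


def record_analysis(store: BitmapStore, rule_id: int,
                    matched: List[int], n_sentences: int) -> None:
    """Store one rule's result; the key exists even when nothing matched."""
    key = str(rule_id)
    store.setbit(key, max(n_sentences - 1, 0), 0)  # allocate an all-zero bitmap
    for sentence_id in matched:
        store.setbit(key, sentence_id, 1)
```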

2016.04.09.

  1. Created the database schema and initialization code.

Front-end

Todo

Issue

Done

About

Korean SNS or article analysis.


Languages

  • Python 70.5%
  • JavaScript 20.0%
  • HTML 7.8%
  • CSS 1.7%