- Write a crawler to collect news posts from PTT (the largest BBS in Taiwan)
- Please contact me if the data would be useful to you
- Use Jieba to segment sentences and the Stanford POS tagger to filter terms
- Perform single-pass clustering, comparing each document against the cluster means
- Please see event_mean.py for details
- Use entropy, precision, recall, and F-measure for evaluation
- Automatic Online News Issue Construction in Web Environment (WWW, 2008)
- A Study of Retrospective and On-line Event Detection (SIGIR, 1998)
- A Comparison of Document Clustering Techniques (1999)
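
The single-pass step with cluster means listed above can be sketched roughly as follows. This is a minimal illustration, not the code in event_mean.py: the function names `cosine` and `single_pass`, the dict-based term vectors, the use of cosine similarity, and the threshold of 0.5 are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.5):
    """Process documents once, in arrival order.

    Each document joins the cluster whose mean vector is most similar to it,
    or starts a new cluster when no mean reaches the threshold.
    Returns one cluster label per document.
    """
    sums = []    # per-cluster sum of member vectors
    sizes = []   # per-cluster member count
    labels = []
    for doc in docs:
        best, best_sim = None, 0.0
        for cid in range(len(sums)):
            # Cluster mean = summed term weights divided by member count.
            mean = {t: w / sizes[cid] for t, w in sums[cid].items()}
            sim = cosine(doc, mean)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is None or best_sim < threshold:
            # No cluster is close enough: open a new one seeded by this document.
            sums.append(dict(doc))
            sizes.append(1)
            labels.append(len(sums) - 1)
        else:
            # Fold the document into the running sum of the chosen cluster.
            for t, w in doc.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
            sizes[best] += 1
            labels.append(best)
    return labels


# Hypothetical toy input: term-count vectors for four already-segmented posts.
docs = [
    {"earthquake": 1, "taipei": 1},
    {"earthquake": 1, "taipei": 1, "damage": 1},
    {"election": 1, "vote": 1},
    {"election": 1, "vote": 1, "poll": 1},
]
labels = single_pass(docs, threshold=0.5)
print(labels)  # [0, 0, 1, 1]
```

The result depends on document order and on the threshold, so both are worth tuning against the evaluation measures listed above.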