A rework of the Sinar Project news-scraper (http://sinarproject.org).

Dependencies:

- lxml
- bs4
- pymongo
- mongod

News sources:

- Bernama [http://www.bernama.com]
- Borneo Post [http://www.theborneopost.com]
- Free Malaysia Today [http://www.freemalaysiatoday.com]
- Ipoh Echo [http://www.ipohecho.com.my]
- Malay Mail [http://www.mmail.com.my]
- Malaysia Chronicle [http://www.malaysia-chronicle.com]
- Malaysiakini BM [http://www.malaysiakini.com/bm]
- MySinchew [http://www.mysinchew.com]
- New Straits Times [http://www.nst.com.my]
- Selangorku [http://www.selangorku.com]
- Selangor Times [http://www.selangortimes.com]
- The Malaysian Insider [http://www.themalaysianinsider.com]
- The Malaysian Times [http://www.themalaysiantimes.com.my]
- The Star [http://www.thestar.com.my]
- The Sun Daily [http://www.thesundaily.my]
- Utusan [http://www.utusan.com.my]

Installation (Debian/Ubuntu, run as root):

    # apt-get install build-essential mongodb python3 python3-dev python3-setuptools
    # easy_install3 pip
    # pip-3.2 install beautifulsoup4
    # pip-3.2 install pymongo
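
After running the install commands above, it can help to confirm that the Python dependencies are importable. The following is a minimal sketch, not part of the scraper itself: the helper name `missing_modules` is hypothetical, and it assumes Python 3.4 or newer, since it relies on `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the Python dependencies listed in this README.
    required = ["lxml", "bs4", "pymongo"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All Python dependencies are importable.")
```

This only checks the Python packages; whether the `mongod` server is running has to be checked separately.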

References:

- Beautiful Soup 4 documentation [http://www.crummy.com/software/BeautifulSoup/bs4/doc/#]
- W3C CSS2 selectors [http://www.w3.org/TR/CSS2/selector.html]
- PyMongo C extension dependencies [http://api.mongodb.org/python/current/installation.html#dependencies-for-installing-c-extensions-on-unix]
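
The Beautiful Soup and CSS selector references above can be combined roughly as in the sketch below. It is an illustration only and not the scraper's actual code: the sample HTML, the `div.headline a` selector, and the `extract_headlines` helper are all assumptions made up for this example, and it uses Python's built-in `html.parser` so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; real news sites use their own markup.
SAMPLE_HTML = """
<html><body>
  <div class="headline"><a href="/news/1">First story</a></div>
  <div class="headline"><a href="/news/2">Second story</a></div>
  <div class="sidebar"><a href="/about">About</a></div>
</body></html>
"""


def extract_headlines(html, selector="div.headline a"):
    """Return a list of {'title', 'url'} dicts for links matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": link.get_text(strip=True), "url": link.get("href")}
        for link in soup.select(selector)
    ]


if __name__ == "__main__":
    for item in extract_headlines(SAMPLE_HTML):
        print(item["title"], item["url"])
```

Each resulting dict is a plain Python mapping, so it could then be stored in MongoDB through pymongo; the selector would need to be adapted for each news site.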