CharlesNie/DECC

Data Engineer Coding Challenge

This page introduces the code I wrote for the Data Engineer Coding Challenge (DECC).

What is it

  • This program crawls news articles from www.bbc.com/news.
  • Crawled articles are stored in my MongoDB instance hosted on Compose.
  • Search is exact match only, because no fuzzy query is implemented. When searching by title, the keyword must be the complete title, such as "Delta blames power cut for worldwide flight delays". When searching by section, it must be the section name, such as "Magazine".
  • You can find more section names at www.bbc.com/news.
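
Because search is exact match only, a query amounts to an equality filter on a single field. The following is a minimal sketch of that behavior, not the code in decc.py: the field names `title` and `section`, the helpers `build_filter` and `matches`, and the sample records are all assumptions made for illustration. The filter is a plain dictionary of the kind MongoDB equality queries use, and the matching is emulated locally so the example runs without a database.

```python
# Hypothetical sketch: the field names and sample records are assumptions,
# not taken from decc.py or from the real database.

def build_filter(field, value):
    """Return an equality filter of the kind a MongoDB find() call accepts."""
    if field not in ("title", "section"):
        raise ValueError("unsupported search field: %s" % field)
    return {field: value}


def matches(document, query):
    """Emulate exact-match semantics locally: every queried field must be equal."""
    return all(document.get(key) == expected for key, expected in query.items())


# Invented sample records, used only to show the matching behavior.
articles = [
    {"title": "Delta blames power cut for worldwide flight delays", "section": "Business"},
    {"title": "An example magazine piece", "section": "Magazine"},
]

full_title = build_filter("title", "Delta blames power cut for worldwide flight delays")
partial_title = build_filter("title", "Delta")

# The complete title matches one record; a partial keyword matches nothing.
found_full = [a for a in articles if matches(a, full_title)]
found_partial = [a for a in articles if matches(a, partial_title)]
```

In this sketch a partial keyword such as "Delta" returns nothing, which is why the complete title is required.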

How to run it

Download the code and run it as shown below:

Usage:

  • python2.7 decc.py -h (help)
  • python2.7 decc.py -c (crawl articles)
  • python2.7 decc.py -s (search article by section)
  • python2.7 decc.py -t <title> (search article by title)
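
The flags above could be wired up roughly as follows. This is a sketch using the standard `argparse` module, not the actual parsing code in decc.py, which may work differently. The `build_parser` helper is hypothetical, and having `-s` take a section value is an assumption, since the usage list does not show an argument for it. `argparse` adds `-h` automatically.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="decc.py",
        description="Crawl BBC News articles and search the stored results.",
    )
    # Exactly one action per run: crawl, search by section, or search by title.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-c", action="store_true", help="crawl articles")
    # Assumption: -s takes the section name as its value.
    group.add_argument("-s", metavar="SECTION", help="search articles by section")
    group.add_argument("-t", metavar="TITLE", help="search articles by complete title")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["-t", "Delta blames power cut for worldwide flight delays"]
)
```

A title containing spaces has to be quoted on the command line so that it arrives as a single argument.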

About

This repository is for Data-Engineer-Coding-Challenge only
