This is the Python 3 version of the date extractor created by Webhose.io.
articleDateExtractor (Article Date Extractor) is a simple open source Python module, built and maintained by Webhose.io, that automatically detects, extracts and normalizes the publication date of an online article or blog post.
- Extracts the publication date when it is specified in a web page, with a success rate of over 90%.
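To illustrate what "specified in a web page" can mean in practice, here is a minimal sketch that reads a publication date from a page's `<meta>` tags using only the standard library. It is not the module's actual implementation: the names `DATE_META_KEYS`, `MetaDateParser` and `date_from_meta` are hypothetical, the list of meta keys is an assumption, and only ISO 8601 values are handled.

```python
from datetime import datetime
from html.parser import HTMLParser

# Assumption: a small, illustrative set of meta keys that often carry a date.
DATE_META_KEYS = {"article:published_time", "pubdate", "date", "dc.date.issued"}


class MetaDateParser(HTMLParser):
    """Collects the content of the first date-like <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.raw_date = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a candidate has been found.
        if tag != "meta" or self.raw_date is not None:
            return
        attributes = dict(attrs)
        key = (attributes.get("property") or attributes.get("name") or "").lower()
        if key in DATE_META_KEYS and attributes.get("content"):
            self.raw_date = attributes["content"]


def date_from_meta(html):
    """Return a datetime parsed from a date meta tag, or None if none is usable."""
    parser = MetaDateParser()
    parser.feed(html)
    if parser.raw_date is None:
        return None
    try:
        # Normalize the raw string into a datetime (ISO 8601 input only).
        return datetime.fromisoformat(parser.raw_date)
    except ValueError:
        return None
```

Calling `date_from_meta` on a page whose head contains `<meta property="article:published_time" content="2015-11-29T10:00:00+00:00">` would yield a timezone-aware `datetime` for 29 November 2015. A page without such a tag yields `None`.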
import articleDateExtractor
d = articleDateExtractor.extractArticlePublishedDate("http://edition.cnn.com/2015/11/28/opinions/sutter-cop21-paris-preview-two-degrees/index.html")
print(d)
d = articleDateExtractor.extractArticlePublishedDate("http://techcrunch.com/2015/11/29/tyro-payments/")
print(d)
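Both example URLs above embed the publication date in their path (`/2015/11/28/` and `/2015/11/29/`). As a rough sketch of one heuristic a date extractor could apply, the snippet below pulls a `/YYYY/MM/DD/` segment out of a URL and normalizes it to a `datetime`. The function name `date_from_url` and the regular expression are hypothetical and are not taken from the module's source.

```python
import re
from datetime import datetime

# Assumption: the date appears as a /YYYY/MM/DD segment in the URL path.
URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def date_from_url(url):
    """Return a datetime built from a /YYYY/MM/DD URL segment, or None."""
    match = URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day)
    except ValueError:
        # The digits matched the pattern but do not form a real calendar date.
        return None


print(date_from_url("http://techcrunch.com/2015/11/29/tyro-payments/"))
# 2015-11-29 00:00:00
```

A URL-only heuristic like this carries no time of day and fails on sites that do not put dates in their paths, so it works best as one signal among several.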
You can install from source:
$ git clone https://github.com/Webhose/article-date-extractor
$ cd article-date-extractor
$ python setup.py install
- BeautifulSoup >= 3.2.1
- python-dateutil >= 2.4.2