feedergrabber

Retrieves the links and titles of recent posts from blog feeds.

Version

0.3, 20130421

Set-up

Note the requirements.txt files, which list the dependencies to install.
There are versions for Python 2.7 (with "27" in the file names) and for Python 3 (no special notation). They are kept in separate PYTHON27 and PYTHON3 directories.
The main program is feedergrabber.py and its main function is feedergrabber(), which takes a single URL as its argument. The URL should point to an RSS or Atom feed; the function normally returns an error if it encounters ordinary HTML or malformed XML.
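To illustrate the distinction between feeds and non-feed input described above, here is a minimal Python 3 sketch. It is not taken from feedergrabber.py, whose parsing code is not shown in this README; the function name classify_feed is hypothetical, and the standard-library xml.etree.ElementTree parser is used only for illustration. The sketch recognizes RSS 2.0 and Atom root elements and ignores other feed variants.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to Atom element names.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def classify_feed(xml_text):
    """Return 'rss' or 'atom'; raise ValueError for anything else.

    Hypothetical helper, not part of feedergrabber.py.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Malformed XML, which includes most real-world HTML pages.
        raise ValueError("malformed XML: {}".format(exc))
    if root.tag == "rss":
        return "rss"
    if root.tag == ATOM_NS + "feed":
        return "atom"
    # Well-formed XML whose root element is not a feed.
    raise ValueError("not an RSS or Atom feed: <{}>".format(root.tag))
```

Raising a single exception type for both failure modes keeps the caller's error handling simple; a real implementation might prefer to report them separately.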

Output

The feedergrabber() function in feedergrabber.py returns a 3-tuple containing two lists and a datetime.datetime object:

a first list of 2-tuples, each containing the URL and title of a single post; this list may be None if something went wrong with the look-up or parsing.
a second list of 2-tuples, each containing a URL and the error message for an error encountered there; if this list is None, no errors were observed.
a datetime.datetime object containing the date of either publication or updating of the post, preferring the latter if available.
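A caller has to allow for either list being None. The following Python 3 sketch shows one way to unpack a value in the shape described above. The summarize function and the sample value are hypothetical and are not part of feedergrabber.py; the sample is hand-written, not real feed output.

```python
import datetime


def summarize(result):
    """Turn a (posts, errors, date) 3-tuple into printable lines."""
    posts, errors, date = result
    lines = []
    for url, title in posts or []:      # posts may be None on failure
        lines.append("{}: {}".format(title, url))
    for url, message in errors or []:   # errors is None when nothing failed
        lines.append("ERROR {}: {}".format(url, message))
    if date is not None:
        lines.append("dated {}".format(date.isoformat()))
    return lines


# Hypothetical sample in the documented shape.
sample = (
    [("http://example.com/post-1", "First post")],
    None,
    datetime.datetime(2013, 4, 21, 12, 0),
)
for line in summarize(sample):
    print(line)
```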

A supplementary program, supply_feedergrabber.py, runs through a list of known feeds and non-feed blogs, calls feedergrabber() for each, and reports a period (.) whenever the look-up and parsing proceed smoothly. Since non-feed sources are no longer supported, they return the error "Parsing methods not successful." This supplementary program is used only for internal testing.
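The kind of loop described above might look like the following Python 3 sketch. It is not the code of supply_feedergrabber.py: check_feeds and fake_grab are hypothetical names, the grabbing function is passed in as a parameter so the example runs without network access, and the URLs are placeholders.

```python
import io
import sys


def check_feeds(urls, grab, out=sys.stdout):
    """Call grab(url) for each URL; write '.' on success, collect failures."""
    failures = []
    for url in urls:
        posts, errors, _date = grab(url)
        if posts is not None and not errors:
            out.write(".")
        else:
            failures.append((url, errors))
    out.write("\n")
    return failures


def fake_grab(url):
    """Hypothetical stand-in for feedergrabber(); makes no network calls."""
    if url.endswith("/feed"):
        return [(url + "/1", "A post")], None, None
    return None, [(url, "Parsing methods not successful.")], None


buffer = io.StringIO()
failures = check_feeds(
    ["http://example.com/feed", "http://example.org/blog.html"],
    fake_grab,
    out=buffer,
)
print(buffer.getvalue(), end="")
print(failures)
```

Passing the grabbing function as an argument is a choice made for this sketch only; it lets the loop be exercised with a stub in place of real HTTP requests.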

New in this version

Empty titles are now checked for and reported as an error if found, parallel to the existing check for empty links.
Doc-strings complete.
Obsolete function removed.
More commenting.
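The empty-title check mentioned in this list could be sketched as follows. This is an illustration under assumptions, not the module's code: validate_post is a hypothetical name, and the error messages are invented for the example.

```python
def validate_post(url, title):
    """Return None if the post is usable, else a (url, message) error tuple.

    Hypothetical helper; the messages are placeholders.
    """
    if not url or not url.strip():
        return (url, "Empty link.")
    if not title or not title.strip():
        return (url, "Empty title.")
    return None
```

The returned 2-tuple matches the (URL, error message) shape of the error list described under Output, so a caller could append it to that list directly.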

Past versions

0.2, 20130420 (initial commit; the previous version was named bloggergrabber v. 0.1). The initial prototype of this module used Beautiful Soup 4 to scrape both feeds and ordinary HTML. Here, however, support for HTML blogs is discontinued, both to eliminate the need to configure the scraping process manually for each new blog and to speed up parsing.

Future work

Unit testing.
Error-logging.
Systematize error codes.
Is it possible to subscribe to a feed using a socket, so that there is no need to process anything more than once or wait for HTTP requests to be answered?

[end]
