Fanfiction Scraper

This repository contains scraping tools for FanFiction.Net. These tools are meant to be used for non-commercial, research purposes. They were originally created for the following paper; please cite if you use this software for your research:

Smitha Milli and David Bamman, "Beyond Canonical Texts: A Computational Analysis of Fanfiction" EMNLP 2016.

We have imposed a rate limit of a page per second in these tools in order to comply with the fanfiction.net terms of service:

E. You agree not to use or launch any automated system, including without limitation, "robots," "spiders," or "offline readers," that accesses the Website in a manner that sends more request messages to the FanFiction.Net servers in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser.

If you want fanfiction from Archive of Our Own instead, check out @radiolarian's Archive of Our Own scraper.

Usage

Install

pip install fanfiction

Example

from fanfiction import Scraper
scraper = Scraper()
metadata = scraper.scrape_story_metadata(STORY_ID)

Documentation

fanfiction.Scraper.get_story_metadata(story_id)

Returns a dictionary with the metadata for the story.

Attributes:

id [int]: The id of the story
canon_type [str]: The type of canon
canon [str]: The name of the canon
author_id [int]: The user id of the author
title [int]: The title of the story
updated [int]: The timestamp of the last time the story was updated
published [int]: The timestamp of when the story was originally published
lang [str]: The language the story is written in
genres [list]: A list of the genres that the author categorized the story as
num_reviews [int]
num_favs [int]
num_follows [int]
num_words [int]: Total number of words in all chapters of the story
rated [str]: The story's fiction rating. i.e. K, K+, T, M

fanfiction.Scraper.scrape_story(story_id, keep_html=False)

Returns a dictionary with the metadata, chapters, and reviews of the story. The dictionary has the same attributes as the metadata attributes listed above plus the additional attributes:

chapters [dict]: A dictionary mapping from the chapter id (where the chapter id for the n-th chapter of the story is n) to the text of the chapter. The text is either stripped of HTML if keep_html is False or with the HTML intact if keep_html is True.
reviews [dict]: A dictionary mapping from the chapter id to a list of review dictionaries (see fanfiction.Scraper.scrape_reviews_for_chapter(story_id, chapter_id))

fanfiction.Scraper.scrape_chapter(story_id, chapter_id, keep_html=False)

Returns the text of the chapter either stripped of the HTML if keep_html is False or with the HTML intact if keep_html is True.

fanfiction.Scraper.scrape_reviews_for_chapter(story_id, chapter_id)

Returns a list of review dictionaries. Each review dictionary has the following attributes:

user_id [int]: The user id of the reviewer. If the review came from an unregistered user, then user_id is set to None.
time [int]: The timestamp of the review.
review [int]: The text of the review.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
fanfiction		fanfiction
.gitignore		.gitignore
MANIFEST		MANIFEST
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fanfiction

fanfiction

.gitignore

.gitignore

MANIFEST

MANIFEST

README.md

README.md

setup.cfg

setup.cfg

setup.py

setup.py

Repository files navigation

Fanfiction Scraper

Usage

Install

Example

Documentation

About

Releases

Packages

Languages

starrybouquet/fanfiction

Folders and files

Latest commit

History

Repository files navigation

Fanfiction Scraper

Usage

Install

Example

Documentation

About

Resources

Stars

Watchers

Forks

Languages