Skip to content

Tool for automatic mirroring of DeviantArt, HentaiFoundry, Pixiv, and FurAffinity

Notifications You must be signed in to change notification settings

mastic544/xA-Scraper

 
 

Repository files navigation

xA-Scraper

This is a automated tool for scraping content from a number of art sites:

  • DeviantArt
  • FurAffinity
  • HentaFoundry
  • Pixiv
  • Tumblr art blogs
  • SoFurry
  • Weaslyl

To Add:

It also has grown a lot of other functions over time. It has a fairly complex, interactive web-interface for browsing the local gallery mirrors, and a tool for uploading galleries to aggregation sites (currently exhentai) is in progress.

It has a lot in common with my other scraping system, MangaCMS. They both use the same software stack (CherryPy, Pyramid, Mako, BS4). This is actually an older project, but I did not decide to release it before now.

Dependencies:

  • Postgres >= 9.3
  • CherryPy
  • Pyramid
  • Mako
  • BeautifulSoup 4
  • others

DB setup is left to the user. xA-Scraper requires it's own database, and the ability to make IP-based connections to the hosting PG instance. The connection information, DB name, and client name must be set by copying settings.base.py to settings.py, and updating the appropriate strings.

Settings.py is also where the login information for the various plugins goes.

Disabling of select plugins can be accomplished by commenting out the appropriate line in main.py. The JOBS list dictates the various scheduled scraper tasks that are placed into the scheduling system.

The preferred bootstrap method is to use run.sh from the repository root. It will ensure the required packages are available (build-essential, libxml2 libxslt1-dev python3-dev libz-dev), and then install all the required python modules in a local virtualenv. Additonally, it checks if the virtualenv is present, so once it's created, ./run.sh will just source the venv, and run the scraper witout any reinstallation.

Currently, there are some aspects that need work. The artist selection system is currently a bit broken, as I was in the process of converting it from being based on text-files to being stored in the database. Currently, there isn't a clean way to remove artists from the scrape list, though you can add or modify them.


Anyways, Pictures!

These are a few DeviantArt Artists culled from the Reddit ImaginaryLandscapes subreddit.

The web-interface has a lot of fancy mouseover preview stuff. Since this is primarily intended to run off a local network, bandwidth concerns are not too relevant, and I went a bit nuts with jQuery.

Basic Popups

There is also a somewhat experimental "gallery slice" viewing system, where horizontal mouse movement seeks through a spaced sub-set of each artist's images. The artist is determined by the row, and each horizontal 10 pixels is a different image.

Fancy Popups

Lastly, there is also a basic, chronological view of each artist's work, though it does support infinite-scrolling for their entire gallery. The scraper also preserves the description that preserves each item, and it is presented with the corresponding image.

About

Tool for automatic mirroring of DeviantArt, HentaiFoundry, Pixiv, and FurAffinity

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 77.2%
  • JavaScript 12.8%
  • Mako 8.9%
  • Other 1.1%