forked from JohnDoee/rippy

Scrape websites using Chrome

Rippy

Rip-it with Rippy

Introduction

Rippy is a downloader that scrapes websites using a real web browser to find, e.g., videos or downloadable files. Its targets are websites that try to be scrape-resistant and where other downloaders have had to give up.

The magic is that Rippy controls a real browser, so many of the usual anti-bot measures, e.g. scrambled JavaScript, are ineffective. To block Rippy you will have to block browsers. I also enjoy a blocking arms race; it keeps my day bright and fulfilled.

Installation

Currently the only officially provided distribution method is docker-compose, but all Rippy really requires is Chrome and Python.

wget https://github.com/JohnDoee/rippy-docker/raw/master/docker-compose.yml

Edit docker-compose.yml before starting. The following values should be changed:

  • /tmp/media should be changed to the directory where Rippy should store downloaded data; it appears in the file twice.
  • BASIC_AUTH_PASSWORD should be changed to a unique password.
  • SECRET_KEY should be changed to something unique.
  • Optional: Change RIPPY_CONCURRENCY to how many scrape and download threads you want to have.
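
The two secret values in the list above can be generated rather than invented by hand. The following is a minimal sketch using Python's standard `secrets` module; the helper name `generate_secret` and the chosen lengths are assumptions for illustration, not something Rippy prescribes.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a random URL-safe string built from num_bytes of entropy."""
    # generate_secret is a hypothetical helper, not part of Rippy.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print lines that can be copied into docker-compose.yml by hand.
    print(f"SECRET_KEY={generate_secret(48)}")
    print(f"BASIC_AUTH_PASSWORD={generate_secret(24)}")
```

Copy the printed values into the matching entries of docker-compose.yml.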

Then start the containers:

docker-compose up -d

Usage

Head over to http://ip:51359, where ip is the address of the machine running Rippy, and add a job. It should start downloading or prompt you to do something manually.

If the status text says “Waiting”, you need to open the browser and fill in a captcha or something similar. If you are using the docker-compose setup, there should be a button in the upper-right corner of the web interface to open the browser. It opens a new window with a VNC session to the hosted Chromium browser.
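
If the page does not load, it can help to check whether the web interface port is reachable at all. This is a small sketch using only the standard library; the port 51359 comes from the URL above, while the function names and the `127.0.0.1` host are assumptions for illustration.

```python
import socket

RIPPY_PORT = 51359  # port taken from the URL in the Usage section


def interface_url(host: str, port: int = RIPPY_PORT) -> str:
    """Build the URL of the web interface for a given host."""
    return f"http://{host}:{port}"


def is_reachable(host: str, port: int = RIPPY_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # assumption: Rippy runs on this machine
    state = "reachable" if is_reachable(host) else "not reachable"
    print(f"{interface_url(host)} is {state}")
```

This only confirms that something is listening on the port; it does not log in or verify that the listener is Rippy.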

New scrapers

Feel free to request a new scraper, but there are a few requirements if you want me to implement it:

  • The site must be scrape-resistant, as in, nobody else should be able to download from it. Check out tools like youtube-dl and JDownloader first.
  • The site must not use encryption or sit behind a paywall, i.e. I can’t do stuff like Netflix (something like that is also not the target at all).

Currently a generic video-site scraper is on the slab, as this project is a merge between a reddit post and a generic video-site scraper.

Accompanying repositories

Docker-compose file and docker chromium repository

Rippy webinterface

FAQ

Q: My tab crashed or elements on the website crashed, what should I do?

A: Close the tab. Rippy should notice it shortly and try again.

TODO

  • [ ] Add (semi-)generic video player extractor
  • [ ] Return (potentially proxied) URL to video instead of downloading

Supported sites

  • Avgle

Docker images

Main backend component (this repository)

Webapp and reverse proxy

Chrome accessible via VNC

Logo / icon

frog by habione 404 from the Noun Project

License

MIT
