Skip to content

A set of python and config files to allow you to download files from a URL

Notifications You must be signed in to change notification settings

aidanhiggins/python-downloader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

python-downloader

A set of python and config files to allow you to download files from a URL

##Overview: These scripts allow you to configure a website URL where it will download all images over a configured size (100KB default). Currently the script processes each URL sequentially so can be slow to process, but it functionally works.

##Dependencies required (python libraries):

  • python3
  • urllib
  • BeautifulSoup4
  • logging
  • yattag
  • PIL

##Usage notes: Usage:
python3 testDownloader.py
python3 testDownloader.py -h -c -i -r

Options:
-h --help Show this screen.
-c --config Set the location of the configuration file to use.
-i --images Download images only (i.e. don't navigate to other pages.
-r --regenerate (Re)Generate the index.htm file for all of the generated output locations.

##Configuration options explained:
[URL]
base: http://imgur.com/r/funny The base URL to download
retries: 2 The HTTP retries setting - not currently being used
maxDepth: 0 How far to follow any links on the page - this setting is troublesome and should be left at the default
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36

[OUTPUT] Options for anything downloaded
; This should end with /
outputFolder: ./images/
includeDateInOutputName: true Include the date in the output folder name
outputTmpFolderPrefix: tmp_ The location to download any temporary files. This value is prefixed to the name above
; Measured in k
minimumFileSize: 100 The minimum file size to download, measured in Kb
minimumContentLength: 10000 The minimum content length for the file, as returned by the server
downloadVideos: true Whether or not to try to download videos
outputHTML: true Whether to output a HTML representation of the downloaded files

[LOGGER] The error log configuration
filename: downloader.log
level: DEBUG
includeDateInName: true

[ACCESSLOGGER] The access log configuration
filename: access.log
level: INFO
includeDateInName: true

[EXTENSIONS] File extensions listed here are attempted to be downloaded
type1: .png
type2: .jpg
type3: .jpeg
type4: .gif
type5: .bmp
type6: .webm
type7: .mp4

[EXCEPTIONS]
; blacklisted urls
url01: myfreecams.com URLs (or partial URLs) listed here are ignored for processing

About

A set of python and config files to allow you to download files from a URL

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages