rdghosal/Honyaku.py

Honyaku.py

Need to translate a website but not sure you will (or want to) catch every word by copy-pasting?
Honyaku can help!

Usage

python honyaku.py URL -d DIRECTORY -f FORMAT {csv, txt} -L LANGUAGE

This saves the text of every webpage internal to the input URL as either a CSV (default) or a text file. Don't worry about duplicates! Honyaku is smart enough to avoid copying the same page twice. And if you're not sure which language you're translating from, Honyaku will ask Google Translate to guess!
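
The duplicate-free crawl described above could be sketched as follows. This is a minimal illustration rather than Honyaku's actual code: `LinkParser`, `crawl_internal`, and the `fetch` callable are hypothetical names, the standard library's `html.parser` stands in for the real scraping libraries, and "internal" is assumed to mean "same host as the starting URL".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl_internal(start_url, fetch):
    """Return internal page URLs reachable from start_url, each listed once.

    `fetch` is a hypothetical callable that maps a URL to its HTML string.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # every URL that has ever been queued
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so the same page
            # is not queued under two different spellings.
            link, _ = urldefrag(urljoin(url, href))
            # Assumption: a link is "internal" when it shares the start host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Passing `fetch` as an argument keeps the sketch independent of any particular HTTP library; the `seen` set is what guarantees that a page linked from several places is only copied once.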

Or, to check your English translation, try the following command!

python honyaku.py URL -c
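
For readers curious how the two commands above might be parsed, here is a rough `argparse` sketch. The short flags and the CSV default come from this README; everything else is an assumption made for illustration, including the `build_parser` name, the attribute names, the `"."` default directory, and the use of `None` to mean "let Google Translate guess".

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the Usage section."""
    parser = argparse.ArgumentParser(prog="honyaku.py")
    parser.add_argument("url", metavar="URL",
                        help="page to start scraping from")
    # Assumption: output defaults to the current directory.
    parser.add_argument("-d", dest="directory", metavar="DIRECTORY",
                        default=".", help="where to save the output file")
    # CSV is the default format according to the README.
    parser.add_argument("-f", dest="format", metavar="FORMAT",
                        choices=["csv", "txt"], default="csv",
                        help="output format: csv or txt")
    # Assumption: None means the language should be detected automatically.
    parser.add_argument("-L", dest="language", metavar="LANGUAGE",
                        default=None, help="language of the website")
    parser.add_argument("-c", dest="check", action="store_true",
                        help="check an English translation")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```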

Note

If you didn't know, Honyaku uses scraping libraries for its magic!
BeautifulSoup is its first choice, but if it finds a link that doesn't resemble a URL while scraping,
Honyaku will summon Selenium to parse what's probably a dynamic webpage.
You'll therefore need to download the chromedriver matching your version of Google Chrome to run Honyaku error-free.
The chromedriver can be downloaded here: https://chromedriver.chromium.org/downloads
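
The README does not spell out how Honyaku decides that a link "doesn't resemble a URL", so the check below is only one plausible heuristic, not the project's own. `looks_like_url` and `needs_browser` are hypothetical names, and the rule (accept http(s) links and relative paths, reject everything else) is an assumption.

```python
from urllib.parse import urlparse


def looks_like_url(href):
    """True for http(s) links and for relative paths such as /about."""
    href = href.strip()
    if not href or href.startswith("#"):
        return False  # empty or in-page anchor, common on scripted pages
    parsed = urlparse(href)
    if parsed.scheme:
        # Rejects javascript: pseudo-links. Note that mailto: and tel:
        # links also fail this check; a real tool may want to skip them.
        return parsed.scheme in ("http", "https") and bool(parsed.netloc)
    return True  # no scheme: treat it as a relative path


def needs_browser(hrefs):
    """Assumed fallback rule: any non-URL href suggests a dynamic page."""
    return any(not looks_like_url(h) for h in hrefs)
```

A caller would scrape the page's links with BeautifulSoup first and switch to a Selenium-driven Chrome session only when `needs_browser` returns `True`, which keeps the slower browser path for pages that need it.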

To see what file outputs look like, take a look at the examples folder!

About

Console application that helps with website translation by scraping text from a target web page and setting up files for the translator.
