I'm studying how to use a web crawler to download novels from websites.
My environment:
Windows 8
Python 3.8.2
Environment setup steps:
-
Install Python 3, then add Python and pip to the Windows PATH environment variable.
https://www.python.org/download/releases/3.0/
During installation, tick the checkbox 【Add Python 3.8.x to PATH】
so that Python is added to the Windows PATH.
https://docs.python.org/3/using/windows.html
-
Install the following Python modules:
pip install PyYAML
pip install lxml
pip install beautifulsoup4
pip install requests
pip install html2text
pip install opencc-python-reimplemented
-
To convert to EPUB and MOBI, the following tools are needed:
AozoraEpub3 (requires Java)
https://w.atwiki.jp/hmdev/pages/21.html
OpenJDK (Java). I tried the newest release, jdk-14.0.1, from the site below, and it works fine with AozoraEpub3.
It also needs to be added to the Windows PATH.
https://jdk.java.net/archive/
KindleGen
https://www.amazon.com/gp/feature.html?docId=1000765211
Put KindleGen and AozoraEpub3 in the same folder.
-
Set the AozoraEpub3 and KindleGen paths: either edit config/globals.yaml or run the command below.
n.bat init
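
The setup steps above can be checked with a short script. This is only a sketch, not part of the project: the import names (`yaml` for PyYAML, `bs4` for Beautiful Soup 4, `opencc` for opencc-python-reimplemented) are assumed to match the installed packages, and the helper names are made up for illustration.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above (assumed mapping:
# PyYAML -> yaml, Beautiful Soup 4 -> bs4, opencc-python-reimplemented -> opencc).
REQUIRED_MODULES = ["yaml", "lxml", "bs4", "requests", "html2text", "opencc"]


def missing_modules(names):
    """Return the module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executables that are not reachable through PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("Missing tools on PATH:", missing_tools(["java"]) or "none")
```

If either list is not empty, repeat the matching install step before continuing.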
-
Show the command help:
n.bat help
-
To download the free chapters from qidian, run either of the commands below in cmd:
n.bat download https://book.qidian.com/info/1010868264
or
n.bat d https://book.qidian.com/info/1010868264
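
To download several books in one go, the same command can be driven from Python. A minimal sketch, assuming `n.bat` is in the current folder, takes one URL per call, and exits with code 0 on success (none of this is confirmed by the project); `build_download_command` and `download_all` are hypothetical helper names.

```python
import subprocess


def build_download_command(url, launcher="n.bat"):
    """Build the command line for one download (launcher assumed to be in the current folder)."""
    return ["cmd", "/c", launcher, "download", url]


def download_all(urls):
    """Run the downloader once per URL and collect the exit codes (Windows only)."""
    return {url: subprocess.run(build_download_command(url)).returncode for url in urls}


if __name__ == "__main__":
    books = ["https://book.qidian.com/info/1010868264"]
    for url, code in download_all(books).items():
        # Treating exit code 0 as success is an assumption about n.bat.
        print(url, "->", "ok" if code == 0 else f"exit code {code}")
```

Running the books one after another keeps the request rate to the site low.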
https://github.com/eight04/ComicCrawler