This is a crawling tool to obtain article information of a certain journal from www.ieeexplore.org
-
Install mongodb.
To verify if mongodb is installed successfully, type the following codes in the shell:
mongod --version
-
Install python 3.x, and pip. (You can use virtualenv if you like it)
-
For linux, install libxml2 and libxslt. For windows, download lxml model from here, then type the following codes:
cd /path/to/ieee-crawler # open virtualenv if you like pip install wheel pip install /path/to/lxml-x.x.x.whl
-
Type the following codes in the shell
cd /path/to/ieee-crawler # open virtualenv if you like pip install -r requirements.txt
-
Type the following codes in the shell
cd /path/to/ieee-crawler mongod --dbpath db
-
Open another shell, and type the following codes:
cd /path/to/ieee-crawler python run.py <number-of-journal> <mode>
where:
- number-of-journal: the number of the journal you are interested in. For instance, the number of "IEEE Transactions on Smart Grid" is 5165411
- mode: there are three modes, i.e. "current"(current issue), "early"(early access), "new"(new articles)
All article information will be saved into the database. If you choose Mode "new", articles already in the database will not be crawled again. Mode "new" can be used when you want to get the most recent articles that you've never watched before.
-
The results are saved in directory "out". There are 3 kinds of files:
- [number-of-journal]_current_issue.txt: for current issue
- [number-of-journal]_early_access.txt: for early access
- [number-of-journal]_new_articles.txt: for new articles
the name and abstract will be shown in these files.