ieee-crawler

This is a crawler that retrieves article information for a given journal from IEEE Xplore (ieeexplore.ieee.org).

Installation

Install MongoDB.

To verify that MongoDB is installed, run the following command in a shell:
```
mongod --version
```
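If you would rather run the same check from Python (for example, in a setup script), it can be sketched with the standard library. This is an illustrative helper, not part of ieee-crawler; the function name `mongod_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def mongod_version(binary: str = "mongod") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None  # not installed, or not on PATH
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(mongod_version() or "mongod not found on PATH")
```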
Install Python 3.x and pip (you can use a virtualenv if you like).
On Linux, install libxml2 and libxslt. On Windows, download a prebuilt lxml wheel (`lxml-x.x.x.whl`) matching your Python version, then run:
```
cd /path/to/ieee-crawler
# activate your virtualenv here, if you use one
pip install wheel
pip install /path/to/lxml-x.x.x.whl
```
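To confirm that the interpreter you will use is Python 3 and can import lxml, a small check like the following may help. It is a sketch using only the standard library; `has_module` is a hypothetical helper name, and it should be run inside your virtualenv if you use one.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """True if the top-level module `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major >= 3)
    print("lxml importable:", has_module("lxml"))
```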

Then install the Python dependencies:

```
cd /path/to/ieee-crawler
# activate your virtualenv here, if you use one
pip install -r requirements.txt
```

How to Use

First, start the MongoDB server, using the project's "db" directory as its data directory:

```
cd /path/to/ieee-crawler
mongod --dbpath db
```
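mongod refuses to start if the directory given to `--dbpath` does not exist. If you launch the server from a Python script instead of a shell, a sketch like this creates the directory first and builds the same command. `mongod_command` is a hypothetical helper, not part of the project.

```python
import os
from typing import List


def mongod_command(dbpath: str = "db") -> List[str]:
    """Ensure the data directory exists and return the mongod command line."""
    os.makedirs(dbpath, exist_ok=True)  # no error if the directory already exists
    return ["mongod", "--dbpath", dbpath]
```

For example, `subprocess.run(mongod_command())` runs the server in the foreground until you stop it, just like the shell command above.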

Then open another shell and run the crawler:
```
cd /path/to/ieee-crawler
python run.py <number-of-journal> <mode>
```
where:
- number-of-journal: the IEEE Xplore publication number of the journal you are interested in. For instance, the number of "IEEE Transactions on Smart Grid" is 5165411.
- mode: one of three modes: "current" (current issue), "early" (early access), or "new" (new articles).
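The two positional arguments can be modelled with `argparse` as follows. This is only a sketch of the interface described above, assuming the journal number is an integer; it is not taken from run.py, which may read its arguments differently.

```python
import argparse
from typing import List, Optional

# The three modes documented in this README.
MODES = ("current", "early", "new")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `<number-of-journal> <mode>`; an unknown mode exits with an error."""
    parser = argparse.ArgumentParser(
        description="Crawl article information for one IEEE Xplore journal."
    )
    parser.add_argument(
        "journal",
        type=int,
        help='journal number, e.g. 5165411 for "IEEE Transactions on Smart Grid"',
    )
    parser.add_argument("mode", choices=MODES, help="which articles to crawl")
    return parser.parse_args(argv)
```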

All article information is saved into the database. In mode "new", articles that are already in the database are not crawled again, so this mode is useful when you only want the most recent articles you have not seen before.
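The "new" mode filtering can be pictured as a set-membership check. In this sketch an in-memory set stands in for the ids already stored in MongoDB, and each article is assumed to be a dict with a hypothetical `"id"` key; the project's real schema and query may differ.

```python
from typing import Dict, Iterable, List, Set


def select_new_articles(
    crawled: Iterable[Dict[str, str]], seen_ids: Set[str]
) -> List[Dict[str, str]]:
    """Keep only articles whose id is not yet stored, and record them as seen.

    `seen_ids` stands in for the article ids already in the database.
    """
    fresh = []
    for article in crawled:
        article_id = article["id"]
        if article_id in seen_ids:
            continue  # already in the database: skip it in "new" mode
        seen_ids.add(article_id)
        fresh.append(article)
    return fresh
```

With a real database, the `in seen_ids` test would become a lookup against the collection.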

The results are saved in the "out" directory. There are three kinds of files:
- [number-of-journal]_current_issue.txt: for current issue
- [number-of-journal]_early_access.txt: for early access
- [number-of-journal]_new_articles.txt: for new articles

Each file shows the name (title) and abstract of every crawled article.
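The naming scheme above maps each mode to a file suffix. The sketch below reproduces that mapping and writes titles and abstracts in a simple layout; the helper names, the article dict keys, and the exact file layout are assumptions, not the project's actual output format.

```python
import os
from typing import Dict, Iterable

# File suffix for each mode, as listed in this README.
SUFFIXES = {
    "current": "current_issue",
    "early": "early_access",
    "new": "new_articles",
}


def output_path(journal: int, mode: str, out_dir: str = "out") -> str:
    """Result file for a journal and mode, e.g. out/5165411_new_articles.txt."""
    return os.path.join(out_dir, f"{journal}_{SUFFIXES[mode]}.txt")


def write_results(path: str, articles: Iterable[Dict[str, str]]) -> None:
    """Write each article's title and abstract, separated by a blank line."""
    with open(path, "w", encoding="utf-8") as f:
        for article in articles:
            f.write(article["title"] + "\n")
            f.write(article["abstract"] + "\n\n")
```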

Repository: cxsmarkchan/ieee-crawler