LiteratureReviewTools

Python tools used to extract publication information from the web and performing data analysis.

PREREQUISITES Anaconda with Python 3.X pip install untangle

TOOLS FOR EXTRACTING PUBLICATION INFORMATION

Science Direct; limited to 200 results per query request. Need an API key to access the database. This can be found by registering at http://dev.elsevier.com/index.html
PubMed; limited to 1000 results per query. No authentication needed as URL accessed with urllib.request.
Google Scholar; no limit on number of results, but at some point, Google will block the IP address if calling too much. URL is accessed with BeautifulSoup, and html tags are extracted relating to article content.

All results are saved in a csv file under the name of the query keywords.

DEVELOPMENT

Try to retrieve results from Web of Science database. Should be possible by using BeautifulSoup (Weekend)
Unsupervised learning algorithm to classify each research article in a cluster. Possible options for classification include:

Topic modelling algorithms such as LDA (Latent Dirichlet Analysis) or LSA (Latent Semantic Analysis)
Gensim Doc2Vec algorithm to represent each document as a set of vectors. Then maybe run clustering algorithm on it. Visualise with pyLDAVis.

Interested in contributing? Email me at andris.piebalgs14@outlook.com

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
__pycache__		__pycache__
machineLearning/LDA		machineLearning/LDA
searchTools		searchTools
testResults		testResults
testResults2		testResults2
toolbox		toolbox
README.md		README.md
runAll.py		runAll.py

Provide feedback