Some stuff with bots

Dependencies

For parsing blogs, use iPython notebook.

pip install ipython

Get the whole scipy stack, I can't remember which ones are necessary.

Ubuntu:

sudo apt-get install python-numpy python-scipy ipython ipython-notebook python-pandas python-sympy python-nose
sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran

On MacOS, you can just get by with Anaconda:

https://www.continuum.io/downloads

Other web scraping tools:

pip install requests
pip install beautifulsoup4
pip install selenium

Semantic tools

Get the semantics library and natural language toolkit.

pip install --upgrade gensim
pip install nltk

Download the nltk packages.

python
>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> punkt
    Downloading package punkt to /home/ubuntu/nltk_data...
      Unzipping tokenizers/punkt.zip.
      
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> stopwords
    Downloading package stopwords to /home/ubuntu/nltk_data...
      Unzipping corpora/stopwords.zip.

Downloader> q
True
>>> exit()

Twitter

pip install TwitterAPI

Scrape some blogs

Open blogparser.ipynb in iPython Notebook. Go through the steps to scrape data. Everything gets dumped in /data/

Run this stuff

Edit the variables in run.py

python run.py

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
README.md		README.md
blogparser.ipynb		blogparser.ipynb
chromedriver		chromedriver
run.py		run.py
semantic_tools.py		semantic_tools.py
test.py		test.py
twitter_tools.py		twitter_tools.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

README.md

README.md

blogparser.ipynb

blogparser.ipynb

chromedriver

chromedriver

run.py

run.py

semantic_tools.py

semantic_tools.py

test.py

test.py

twitter_tools.py

twitter_tools.py

Repository files navigation

Some stuff with bots

Dependencies

Other web scraping tools:

Semantic tools

Twitter

Scrape some blogs

Run this stuff

About

Releases

Packages

Languages

elaineo/twitbots

Folders and files

Latest commit

History

Repository files navigation

Some stuff with bots

Dependencies

Other web scraping tools:

Semantic tools

Twitter

Scrape some blogs

Run this stuff

About

Resources

Stars

Watchers

Forks

Languages