4CAT: Capture and Analysis Toolkit

4CAT is a tool that can be used to analyse and process data from forum-like platforms (such as Reddit, 4chan or Telegram) for research purposes.

A "forum", to 4CAT, is any data structure that can be represented in terms of threads and posts. This includes traditional forums and imageboards, but may also encompass other types of websites such as blogs (where each blog post is a thread) or even Facebook pages (which also contain posts with comments).
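As a rough illustration of the thread-and-post model, the sketch below represents a blog post and its comments as a single thread. The class and field names (`Post`, `Thread`, `post_id`, `thread_id`, `author`, `body`, `subject`) are assumptions made for this example and are not taken from 4CAT's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Hypothetical fields; 4CAT's real schema may differ.
    post_id: str
    thread_id: str
    author: str
    body: str


@dataclass
class Thread:
    thread_id: str
    subject: str
    posts: List[Post] = field(default_factory=list)

    def add_post(self, post_id: str, author: str, body: str) -> Post:
        """Create a post that belongs to this thread and append it."""
        post = Post(post_id=post_id, thread_id=self.thread_id, author=author, body=body)
        self.posts.append(post)
        return post


# A blog post becomes a thread; the article and its comments become posts.
blog_thread = Thread(thread_id="blog-1", subject="Hello world")
blog_thread.add_post("blog-1-0", "alice", "The article text itself.")
blog_thread.add_post("blog-1-1", "bob", "A comment on the article.")
```

Any platform whose content can be flattened into this shape (every post carrying the ID of the thread it belongs to) fits the "forum" definition above.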

By default, 4CAT has a number of data sources corresponding to popular forums that can be configured to retrieve data from those platforms, but you can also add additional data sources with relatively little trouble as long as you keep the data structure 4CAT expects in mind.

4CAT was created by OILab and the Digital Methods Initiative at the University of Amsterdam. The tool was inspired by TCAT, a tool with comparable functionality that can be used to scrape and analyse Twitter data.

4CAT is multiple things:

  • A search engine for scraped corpora
  • A transparent and modular analysis toolkit
  • A means to produce traceable and reproducible digital media research

Those things combined provide a "Capture and Analysis Toolkit", a suite of tools through which discourse on forums may be analysed and processed. The goal is to provide a straightforward aid for *chan and forum research, through which such platforms - often described as amorphous, volatile, or ephemeral - may be analysed from various epistemological perspectives.

Install

We use 4CAT for our own purposes at the University of Amsterdam, but you can (and are encouraged to!) run your own instance. You can find detailed install instructions in our wiki.

Install using docker-compose simply by running:

docker-compose up

Components

4CAT consists of several components, each in a separate folder:

  • backend: A standalone Python 3 app that scrapes defined data sources, downloads and stores the relevant data and performs searches and analyses as queued by the front-end.
  • webtool: A Flask app that provides a web front-end for searching and analysing the stored data.
  • datasources: Data source definitions. Each definition is a set of configuration options, database definitions and Python scripts for processing that source's data. If you want to set up your own data sources, refer to the wiki.
  • processors: A collection of data processing scripts that can plug into 4CAT and manipulate or process datasets created with 4CAT. There is an API you can use to make your own processors.
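To give a feel for what a processor does conceptually, here is a minimal sketch of a script that takes a dataset and derives a new one from it. It does not use 4CAT's processor API: the function name `count_posts_per_thread` and the dictionary keys `thread_id`, `body` and `num_posts` are hypothetical and chosen only for illustration. Refer to the wiki for the real interface.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_posts_per_thread(posts: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Hypothetical processor: derive a per-thread post count from a dataset."""
    counts = Counter(post["thread_id"] for post in posts)
    # Sort by descending count, then thread ID, so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"thread_id": thread_id, "num_posts": n} for thread_id, n in ordered]


# A tiny stand-in dataset; real datasets would come from a 4CAT search.
dataset = [
    {"thread_id": "a", "body": "first"},
    {"thread_id": "b", "body": "second"},
    {"thread_id": "a", "body": "third"},
]
result = count_posts_per_thread(dataset)
```

The point of the sketch is the shape of the operation: a processor reads an existing dataset and writes a derived one, so processors can be chained and each step stays traceable.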

Contributing

This section has yet to be written!

License

4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE file for more information.
