Count words in Wikipedia articles on multiple machines

phueb/WikiCount


WikiCount

Research code to count words in English Wikipedia 2018.

Requirements

  • Access to the file server at the UIUC Learning & Language Lab, where the Wikipedia articles are stored
  • Ludwig - a Python package for parallel execution of jobs

Usage

Use the ludwig CLI to run all jobs (on your local machine or remote workers owned by the lab).

ludwig

Each job saves one Python pickle file for each Wikipedia article folder included in the job. Each file contains a list of Python dictionaries, one per article, recording how many times each word occurs in that article.
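The output format described above can be illustrated with a minimal sketch. This is not the repository's actual code: the function names (`count_words`, `save_folder_counts`, `load_folder_counts`, `merge_counts`), the whitespace tokenization, and the lower-casing are all assumptions made for illustration, and the real jobs may tokenize and name files differently.

```python
import pickle
from collections import Counter
from pathlib import Path
from typing import Dict, List


def count_words(text: str) -> Dict[str, int]:
    """Count tokens in one article.

    Assumption: tokens are lower-cased and split on whitespace.
    """
    return dict(Counter(text.lower().split()))


def save_folder_counts(articles: List[str], out_path: Path) -> None:
    """Save one pickle file holding a list of per-article count dicts."""
    counts = [count_words(text) for text in articles]
    with out_path.open("wb") as f:
        pickle.dump(counts, f)


def load_folder_counts(in_path: Path) -> List[Dict[str, int]]:
    """Load the list of per-article count dicts written by a job."""
    with in_path.open("rb") as f:
        return pickle.load(f)


def merge_counts(per_article: List[Dict[str, int]]) -> Counter:
    """Sum per-article counts into corpus-level counts."""
    total: Counter = Counter()
    for counts in per_article:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total
```

Keeping one dictionary per article, rather than a single merged dictionary per folder, would leave it possible to compute per-article statistics later; corpus-level totals can still be recovered with a helper like the hypothetical `merge_counts`.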

Compatibility

Tested on Ubuntu 18.04 using Python 3.6.
