This repository has been archived by the owner on Jun 4, 2021. It is now read-only.
forked from saffsd/wikidump

Tools to manipulate and extract data from wikipedia dumps


diegocaro/wikidump

 
 


wikidump

Introduction

This module contains code for manipulating the Wikipedia dumps available from http://download.wikimedia.org/backup-index.html

Installation

This module is published on PyPI and can be installed with easy_install.

For example:

easy_install wikidump

Alternatively, you can use pip:

pip install wikidump

I highly recommend using virtualenv to isolate the install environment.

For those on Ubuntu systems, a prebuilt package is available in a PPA. See the PPA page for installation instructions.

Configuration

Upon first importing the module, a file 'wikidump.cfg' will be created. Modify the paths in this file to point to your data.

  • scratch : where indices are stored (must be writable)
  • xml_dumps : where the XML dumps are located (can be read-only)
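
Before building any indices, it can help to confirm that both paths meet the requirements listed above. The following is only a minimal sketch and is not part of wikidump: it assumes that wikidump.cfg is an INI-style file readable by Python's configparser and that the two keys live in a section named "paths". Both are assumptions, and check_paths is a hypothetical helper, so adjust the section name to match the file that was actually generated.

```python
import configparser
import os


def check_paths(cfg_text, section="paths"):
    """Return a list of problems found in the configured paths.

    Hypothetical helper: the section name "paths" is an assumption,
    not something taken from the wikidump source.
    """
    # Disable interpolation so a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    scratch = parser.get(section, "scratch")
    xml_dumps = parser.get(section, "xml_dumps")

    problems = []
    # scratch holds the indices, so it must exist and be writable.
    if not (os.path.isdir(scratch) and os.access(scratch, os.W_OK)):
        problems.append("scratch is not a writable directory: " + scratch)
    # xml_dumps only needs to be readable.
    if not (os.path.isdir(xml_dumps) and os.access(xml_dumps, os.R_OK)):
        problems.append("xml_dumps is not a readable directory: " + xml_dumps)
    return problems
```

To use it, read the contents of wikidump.cfg into a string and pass that string to check_paths; an empty list means both directories look usable.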

Usage

In addition to the Python modules, wikidump comes with a command-line tool for quick access to its functionality. Run wikidump help for a list of options.

Credits
