wikicaptcha is an attempt to build a ReCAPTCHA-like service using books digitized in Wikisource.

CristianCantoro/wikicaptcha

########################################
#                                      #
# wikicaptcha                          #
#                                      #
########################################

wikicaptcha is an attempt to build a ReCAPTCHA-like[1] CAPTCHA system[2] to
help correct errors found in the digitization and collaborative transcription
work done on Wikisource[3], a sister project of Wikipedia.

Wikisource aims to build a free (as in freedom) digital library. Books in
Wikisource are released under a Creative Commons BY-SA license or are otherwise
free to redistribute, copy and modify (e.g. books which are in the Public
Domain).


== History ==
The idea of using CAPTCHAs (and ReCAPTCHA in particular) on Wikipedia to aid
the digitization of books has a long history, as shown by its inclusion on the
"Perennial proposals" page[4].

However, in the case of Wikisource, the considerable number of digitized books
already available led some users in the Italian-language Wikisource community[5]
to wonder whether a similar system could help with the steps involved in adding
a book to Wikisource, especially the proofreading phase.

A user of it.wikisource, Alex brollo[6], discovered that OCRed DjVu files
contain a text layer which can be extracted and analyzed. The characters
for which the OCR was dubious are marked with a caret ("^").
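
The caret marking described above can be illustrated with a minimal Python
sketch. It assumes the text layer has already been extracted into a plain
string by some other means, and it only looks for words that contain a caret.
The function name find_dubious_words and the sample text are hypothetical and
are not taken from wikicaptcha itself.

```python
import re

# Assumption: the extracted text layer is a plain string in which a caret
# ("^") appears inside any word the OCR was unsure about. The exact position
# of the caret relative to the dubious character is not assumed here.
CARET_WORD = re.compile(r"\S*\^\S*")


def find_dubious_words(text):
    """Return (line_number, word) pairs for every word containing a caret."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in CARET_WORD.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    sample = "The quick brown fo^x\njumps over the la^zy dog"
    for line_number, word in find_dubious_words(sample):
        print(line_number, word)
```

Each word found this way is a candidate to show to a human for correction,
which is the role a CAPTCHA challenge would play in the system described here.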

The idea is to set up a ReCAPTCHA-like system that exploits these features to
help correct these dubious recognitions, as first proposed here[7].

== Syntax & options ==
Usage: wikicaptcha [-h] [-v] [-p PAGE] [--debug] infile

positional arguments:
  infile                the input (.djvu) file

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -p PAGE, --page PAGE  first page to process
  --debug               turn on debug messages
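
The help text above has the shape produced by Python's argparse module. The
following sketch shows a parser that accepts the same options. It is a
reconstruction for illustration, not wikicaptcha's actual source: the integer
type and the default of 1 for PAGE, the version string, and the function name
build_parser are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser accepting the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="wikicaptcha")
    parser.add_argument("infile", help="the input (.djvu) file")
    parser.add_argument(
        "-v", "--version", action="version",
        # Placeholder: the real version string is not known here.
        version="%(prog)s (placeholder version)",
        help="show program's version number and exit",
    )
    parser.add_argument(
        "-p", "--page", type=int, default=1,  # type and default are assumptions
        help="first page to process",
    )
    parser.add_argument(
        "--debug", action="store_true",
        help="turn on debug messages",
    )
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list instead of sys.argv, for illustration.
    args = build_parser().parse_args(["-p", "3", "--debug", "test/Horse.djvu"])
    print(args.infile, args.page, args.debug)
```

With a parser like this, -p selects the page where processing starts and
--debug is a simple on/off flag, matching the descriptions in the help text.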

== Examples ==
In the 'test' directory you'll find a couple of DjVu files suitable for testing
wikicaptcha. Try, for example:

./wikicaptcha test/Horse.djvu 

== Authors ==
See AUTHORS file for details.

== License ==
See COPYING file for details.

== Further reading == 
* http://lists.wikimedia.org/pipermail/wikisource-l/2011-February/000939.html

== References ==
[1]http://en.wikipedia.org/wiki/reCAPTCHA
[2]http://en.wikipedia.org/wiki/Captcha
[3]http://wikisource.org/wiki/Main_Page
[4]http://en.wikipedia.org/wiki/Wikipedia:Perennial_proposals#Use_reCAPTCHA
[5]http://it.wikisource.org/wiki/Pagina_principale
[6]http://it.wikisource.org/wiki/Utente:Alex_brollo
[7]http://lists.wikimedia.org/pipermail/wikitech-l/2011-November/056078.html
