Skip to content

2mh/wahatttt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

=========================
wahatttt - wh4t software:
=========================

Goals:
------
The goal of the project "wh4t" (shortened form of "wahatttt") 
is to create a text-technologically traversed archive of texts
originating from Herwart Holland-Moritz (also known as Wau Holland,
or simply "[Ww]au", born 1951, died 2001). The idea is to classify
text material of Wau upon topics and (in a later phase) to visualize
this information in a suitable way for the interested public.

What should emerge is (kind of) a picture of the (varying) 
areas Wau was active in, presented in a structured way.

Wau was one of the founding figures of the Chaos Computer Club (CCC), 
the biggest Hacker Foundation in Europe, with today hackerspaces all
over Europe, which are directly or indirectly related / connected
to the CCC.

After his (early) death in 2001 the Wau Holland Foundation
(German: Wau-Holland-Stiftung or WHS) emerged in order to make his works
public and keep his ideas and ideals alive.


Setting of this project:
------------------------
This project emerges from an idea and some prior work done by
Bernd Fix <bernd@wauland.de> of the WHS. However, it's now 
(independently) carried out as a student programming project at 
the University of Zurich (UZH) by Hernani Marques
<h2m@access.uzh.ch>.


Data/source Text:
------------------
The text material used -- for the time being -- comes from the
debate@ FITUG mailing list. FITUG stands for
"Förderverein Informationstechnik und Gesellschaft", roughly
meaning "Booster club (for) information technology and society". 
From a corpus point of view the texts are likely to be full of
computational technical terms, but also enriched with expressions
from political/sociological fields.

Its (prior) archives are publicly available:
* http://www.fitug.de/debate/index.html

Wau used to post on debate@ between 1996 and 2000.

Further data may be gathered from the (Google) USENET archives
or from (not yet publicly) available material being held in Berlin,
Germany.

This additional text material will be subject of inclusion in later project
phases.


what or wahatttt is this name about?
------------------------------------
"wahatttt" stands for "what" in a longish, not quite correctly
spelled version for the average English speaker; however, it's 
actually considered being a word by the (not quite
representative) Urban Dictionary:
* http://www.urbandictionary.com/define.php?term=Wahatttt

Alternatively, it may be an acronym for the following (longish)
constructions; in German:
(1) WAu Holland-Archiv - TexT-Technologisch(-)Traversiert
(2) WAu Hacker-Archiv - TexT-Technologisch(-)Traversiert

Or, put in English, it may (further) stand for:
(3) WAu Holland Archive -- TexT-Technologically(-)Traversed
(4) WAu Hacker Archive -- TexT-Technologically(-)Traversed

All these ambiguities -- starting already in the name of the
project -- should symbolize the difficulties this project (naturally) 
faces, as it deals with natural language processing.

And, most importantly: This project aims at maximizing not only
the outcome, but also the fun factor.


Software requirements:
----------------------
wh4t was tested with the following software:

$ java -version 
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.13) (6b18-1.8.13-0+squeeze1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

$ python -V # Using nltk-2.0.1rc4 and some other libraries (to be listed)
Python 2.6.6


First exploration:
------------------
Invoke the following commands:

git clone https://github.com/2mh/wahatttt.git
cd wahatttt/resources
./create_nouns_file.sh
cd ../src/
javac GrabFitug.java
java GrabFitug
./wh4t_xml_checker.py
./wh4t_explore.py

Neural clustering:
-------------------
Invoke the following commands:

git clone https://github.com/2mh/wahatttt.git
cd wahatttt/resources
create_nouns_file.sh
cd ../src/
javac GrabFitug.java
java GrabFitug
./wh4t_nn_classificator.py # Caution: May need several gigs of RAM

Common term-based clustering:
-----------------------------
Invoke the following commands:

git clone https://github.com/2mh/wahatttt.git
cd wahatttt/resources
create_nouns_file.sh
cd ../src/
javac GrabFitug.java
java GrabFitug
./wh4t_stems_classificator.py # Works quite fast

Releases

No releases published

Packages

No packages published