darkliquid/NaNoGenMo

NaNoGenMo

My entry for NaNoGenMo. I'm currently investigating various methods of analysing corpus text in order to come up with some kind of engine for generating a few different kinds of sequences. Thinking about this in layers, I'm trying to split generation into several phases, from individual sentences up to high-level plot themes.

I'm looking at hand-building some generators based on rules from various storytelling and roleplaying games such as FATE, Fiasco and Microscope, then combining those with material derived from the corpus text analysis.
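As a rough illustration of the lowest layer, here is a minimal sketch of one common corpus-driven approach: a bigram (Markov chain) model that learns which words follow which, then walks those links to produce a sequence. This is an assumption about what such an engine could look like, not the method used in this repository; the function names and the tiny sample corpus are made up for the example.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate_sequence(model, start, max_words=12, rng=None):
    """Walk the model from `start`, stopping at a dead end or the length cap."""
    rng = rng or random.Random()
    sequence = [start]
    # model.get() avoids creating empty entries for unseen words.
    while len(sequence) < max_words and model.get(sequence[-1]):
        sequence.append(rng.choice(model[sequence[-1]]))
    return " ".join(sequence)


# Tiny stand-in corpus (an assumption; real input would be a full novel).
sample = "the cat sat on the mat and the dog sat on the cat"
model = build_bigram_model(sample)
print(generate_sequence(model, "the", rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps runs repeatable, which helps when comparing the output of different corpora.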

None of this is likely to end well.

Corpus

I'm making use of a few hand-picked novels from Project Gutenberg, namely:

I stripped the non-novel text out of these to make processing easier.
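One way to do that stripping automatically is sketched below. It assumes the file uses the `*** START OF THIS PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` marker lines found in many Project Gutenberg plain-text files; older files use different wording and would need extra patterns. The function name is hypothetical and this is not necessarily how the files in this repository were cleaned.

```python
import re

# Marker lines that commonly bracket the book text (assumed format).
START_MARKER = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_MARKER = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw):
    """Return the text between the START and END markers.

    If a marker is missing, fall back to the start or end of the input.
    """
    start = START_MARKER.search(raw)
    begin = start.end() if start else 0
    end = END_MARKER.search(raw, begin)
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()


sample = (
    "Header and license text\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter 1\n"
    "It was a dark night.\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License footer\n"
)
print(strip_gutenberg_boilerplate(sample))
```

Anything the markers do not cover, such as transcriber notes or a table of contents inside the book text, would still need removing by hand.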

I'm also using various corpora from the NLTK project, namely gutenberg, abc, reuters, brown and movie_reviews, as well as a Lovecraft corpus found here: https://raw.github.com/jiko/lovecraft_ebooks/master/corpus.txt

Resources

So far, to generate the various data I'm using, I've grabbed databases and lists from a variety of sources. The current list includes:

Names http://stackoverflow.com/questions/1803628/raw-list-of-person-names

Titles http://www.gutenberg.org/dirs/GUTINDEX.ALL

US Cities http://wiki.skullsecurity.org/images/5/54/US_Cities.txt

Job Titles http://www.bls.gov/soc/soc_2010_direct_match_title_file.xls

Adjectives http://www.enchantedlearning.com/wordlist/adjectives.shtml

Nouns http://www.momswhothink.com/reading/list-of-nouns.html
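Lists like these lend themselves to simple template filling. The sketch below shows one possible way to combine them into a character description; the short inline lists are stand-in samples rather than the real downloaded data, and the template and function name are assumptions made for illustration.

```python
import random

# Stand-in samples (assumptions); the real lists would be loaded from the
# resource files linked above.
NAMES = ["Ada", "Edgar", "Mary"]
JOB_TITLES = ["Surveyor", "Librarian", "Machinist"]
CITIES = ["Providence", "Salem", "Austin"]
ADJECTIVES = ["weary", "curious", "stubborn"]
NOUNS = ["lantern", "ledger", "storm"]


def describe_character(rng):
    """Fill a fixed sentence template with one random entry from each list."""
    return "{name}, a {adjective} {job} from {city}, carried a {noun}.".format(
        name=rng.choice(NAMES),
        adjective=rng.choice(ADJECTIVES),
        job=rng.choice(JOB_TITLES).lower(),
        city=rng.choice(CITIES),
        noun=rng.choice(NOUNS),
    )


print(describe_character(random.Random(42)))
```

A fixed template like this gets repetitive quickly, so swapping in several templates, or a grammar, would be the natural next step.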

Tools

The Dada Engine http://dev.null.org/dadaengine/

About

National Novel Generation Month. Because.
