This repository has been archived by the owner on Jul 18, 2023. It is now read-only.

merlinschumacher/prosaic

 
 


                               o
       _   ,_    __   ,   __,      __
     |/ \_/  |  /  \_/ \_/  |  |  /
     |__/    |_/\__/  \/ \_/|_/|_/\___/
    /|
    \|

prosaic

being a prose scraper & cut-up poetry generator

by nathanielksmith

using nltk

and licensed under the GPL.

what is prosaic?

prosaic is a tool for cutting up large quantities of text that a poet can then derive poetry from by writing templates that describe a poem line by line.

prerequisites

  • mongodb 2.0+ (see explanation at bottom)
  • python 3.4+
  • linux (it has also been verified to work fine on osx)
  • you might need some -dev libraries to get nltk to compile

quick start

$ sudo pip install prosaic
$ prosaic corpus loadfile pride_and_prejudice.txt
$ prosaic poem new

  -- and so I warn you.
  We _will_ know where we have gone
  Mr. Darcy smiled

slow start

# install mongodb / python3 / virtualenv for your platform
$ virtualenv poetry
$ source poetry/bin/activate
$ pip install prosaic
# wait a bit, nltk compiles some stuff
# find some text, maybe from project gutenberg
$ prosaic corpus loadfile pride_and_prejudice.txt
$ prosaic corpus loadfile call_of_cthulhu.txt
$ prosaic poem new -t sonnet

  Her colour changed, and she said no more.
  They saw much to interest, but nothing to justify inquiry.
  sir, I do indeed.
  Elizabeth could not but look surprised.
 `` I am talking of possibilities, Charles.''
 `` Can it be possible that he will marry her?''
 `` I am talking of possibilities, Charles.''
  He looked surprised, displeased, alarmed
 `` You can not be too much upon your guard.
  One Survivor and Dead Man Found Aboard.
  It had not been very great
 :-- but let me not interrupt you, sir.
  Mrs. Bennet said only,`` Nonsense, nonsense!''
  She could not bear such suspense

use as a library

from pymongo import MongoClient
from prosaic.nyarlathotep import process_text
from prosaic.cthulhu import poem_from_template

db = MongoClient().my_corpus_db.phrases
process_text("some very long string of text", "a name for this long string of text", db)

# poem_from_template returns raw line dictionaries from the database:
haiku = [{'syllables': 5}, {'syllables': 7}, {'syllables': 5}]
poem_lines = poem_from_template(haiku, db)

# pull raw text out of each line dictionary and print it:
print([line['raw'] for line in poem_lines])

write a template

Templates are currently stored as json files (or passed from within code as python lists of dictionaries). Each template is an array of json objects, and each object describes one line of poetry.

A template describes a "desired" poem. Prosaic uses the template to approximate a piece given what text it has in its database. Running prosaic repeatedly with the same template will almost always yield different results.

You can see available templates with prosaic template ls, edit them with prosaic template edit <template name>, and add your own with prosaic template new <template name>.

The rules available are:

  • syllables: integer number of syllables you'd like on a line
  • alliteration: true or false; whether you'd like to see alliteration on a line
  • keyword: string containing a word you want to see on a line
  • fuzzy: string containing a keyword; you'd like a line that occurs near a source sentence containing this keyword
  • rhyme: define a rhyme scheme. For example, a couplet template would be: [{"rhyme":"A"}, {"rhyme":"A"}]
  • blank: if set to true, makes a blank line in the output. for making stanzas.
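
Because a template is plain data, it can be checked before it is handed to prosaic. The following is a minimal sketch and not part of prosaic itself: `RULE_TYPES` and `validate_template` are hypothetical names, and the expected value types are inferred from the rule descriptions above.

```python
# Hypothetical helper, not part of prosaic: checks a template against
# the rule names and value types described in the list above.
RULE_TYPES = {
    "syllables": int,
    "alliteration": bool,
    "keyword": str,
    "fuzzy": str,
    "rhyme": str,
    "blank": bool,
}


def validate_template(template):
    """Return a list of problems found; an empty list means none were found."""
    if not isinstance(template, list):
        return ["template must be a list of line descriptions"]
    problems = []
    for number, line in enumerate(template, start=1):
        if not isinstance(line, dict):
            problems.append(f"line {number}: expected an object")
            continue
        for rule, value in line.items():
            expected = RULE_TYPES.get(rule)
            if expected is None:
                problems.append(f"line {number}: unknown rule {rule!r}")
            # bool is a subclass of int in Python, so reject True/False
            # explicitly where an integer is expected.
            elif not isinstance(value, expected) or (
                expected is int and isinstance(value, bool)
            ):
                problems.append(
                    f"line {number}: {rule!r} should be {expected.__name__}"
                )
    return problems


haiku = [{"syllables": 5}, {"syllables": 7}, {"syllables": 5}]
print(validate_template(haiku))  # []
```

A check like this only catches misspelled rule names and wrong value types; whether the corpus can actually satisfy the template is only known once prosaic runs.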

example template

[{"syllables": 10, "keyword": "death", "rhyme": "A"},
 {"syllables": 12, "fuzzy": "death", "rhyme": "B"},
 {"syllables": 10, "rhyme": "A"},
 {"syllables": 10, "rhyme": "B"},
 {"syllables": 8, "fuzzy": "death", "rhyme": "C"},
 {"syllables": 10, "rhyme": "C"}]
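
To see at a glance what the example template asks for, it can be loaded with Python's `json` module and summarized. This is only a sketch for inspecting a template; `rhyme_scheme` is a hypothetical helper, not something prosaic provides.

```python
import json

# The example template above, as it would appear in a json file.
TEMPLATE_JSON = """
[{"syllables": 10, "keyword": "death", "rhyme": "A"},
 {"syllables": 12, "fuzzy": "death", "rhyme": "B"},
 {"syllables": 10, "rhyme": "A"},
 {"syllables": 10, "rhyme": "B"},
 {"syllables": 8, "fuzzy": "death", "rhyme": "C"},
 {"syllables": 10, "rhyme": "C"}]
"""


def rhyme_scheme(template):
    """Hypothetical helper: join each line's rhyme label, '-' if it has none."""
    return "".join(line.get("rhyme", "-") for line in template)


template = json.loads(TEMPLATE_JSON)
print(rhyme_scheme(template))                              # ABABCC
print(sum(line.get("syllables", 0) for line in template))  # 60
```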

full CLI reference

  • prosaic corpus ls: list all the databases in your mongo server
  • prosaic corpus rm <database name>: delete (drop) a corpus
  • prosaic corpus loadfile <filename> -d <dbname>: add a new file of text to the corpus db specified with -d. dbname defaults to prosaic
  • prosaic poem new -t <template name> -d <dbname>: generate a poem using the template specified by -t and the corpus db specified by -d
  • prosaic template ls: list the templates prosaic knows about
  • prosaic template rm <template name>: delete a template
  • prosaic template edit <template name>: edit existing template using $EDITOR
  • prosaic template new <template name>: write new template using $EDITOR
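
For scripting, the commands above can be assembled in Python and run with `subprocess`. This is a sketch under a few assumptions: the helper names are hypothetical, only subcommands and flags from the list above are used, and actually calling `run` requires prosaic on your PATH and a reachable mongodb.

```python
import subprocess


def loadfile_command(filename, dbname=None):
    """Build the argument list for `prosaic corpus loadfile`."""
    command = ["prosaic", "corpus", "loadfile", filename]
    if dbname is not None:
        command += ["-d", dbname]
    return command


def poem_command(template=None, dbname=None):
    """Build the argument list for `prosaic poem new`."""
    command = ["prosaic", "poem", "new"]
    if template is not None:
        command += ["-t", template]
    if dbname is not None:
        command += ["-d", dbname]
    return command


def run(command):
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(
        command, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout


# Only print the commands here; pass them to run() once prosaic is installed.
for name in ["pride_and_prejudice.txt", "call_of_cthulhu.txt"]:
    print(" ".join(loadfile_command(name, dbname="gothic")))
print(" ".join(poem_command(template="sonnet", dbname="gothic")))
```

Passing arguments as a list avoids shell quoting problems with filenames that contain spaces.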

how does prosaic work?

prosaic is two parts: a text parser and a poem writer. a human selects text files to feed to prosaic, who will chunk the text up into phrases and tag them with metadata. these phrases all go into a corpus (stored as a mongodb collection).
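
to make "chunk the text up into phrases and tag them with metadata" concrete, here is a toy sketch. it is not prosaic's parser (which uses nltk): the regex sentence split, the vowel-run syllable estimate, and every field name other than `raw` are assumptions made for illustration.

```python
import re


def estimate_syllables(phrase):
    """Crude estimate: count runs of vowels in each word (often off by one)."""
    return sum(
        len(re.findall(r"[aeiouy]+", word)) for word in phrase.lower().split()
    )


def chunk_text(text, source_name):
    """Split text into sentence-like phrases and tag each with metadata."""
    # Naive split after ., !, ? or ; followed by whitespace. Abbreviations
    # such as "Mr." would be split incorrectly by this rule.
    pieces = re.split(r"(?<=[.!?;])\s+", text.strip())
    phrases = []
    for position, piece in enumerate(pieces):
        if not piece:
            continue
        phrases.append({
            "raw": piece,                            # the phrase text itself
            "source": source_name,                   # which text it came from
            "position": position,                    # order within the source
            "syllables": estimate_syllables(piece),  # rough count only
        })
    return phrases


sample = "Her colour changed, and she said no more. She could not bear such suspense."
for phrase in chunk_text(sample, "pride_and_prejudice"):
    print(phrase)
```

each dictionary is the kind of document that could be stored in a mongodb collection and later matched against template rules.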

once a corpus is prepared, a human then writes (or reuses) a poem template (in json) that describes a desired poetic structure (number of lines, rhyme scheme, topic) and provides it to prosaic, who then uses the weltanschauung algorithm to randomly approximate a poem according to the template.
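
the details of the weltanschauung algorithm are not spelled out here, so the following is an illustration only of what "randomly approximate" could mean, not prosaic's implementation. it works on an in-memory list instead of mongodb, handles only the `syllables`, `keyword` and `blank` rules, and relaxes the syllable count step by step when nothing matches exactly. all names, and the hand-tagged syllable counts, are hypothetical.

```python
import random

# Hand-tagged sample phrases standing in for a corpus.
PHRASES = [
    {"raw": "Mr. Darcy smiled", "syllables": 5},
    {"raw": "-- and so I warn you.", "syllables": 5},
    {"raw": "She could not bear such suspense", "syllables": 7},
    {"raw": "It had not been very great", "syllables": 7},
]


def matches(phrase, rules, slack):
    """True if the phrase satisfies the rules, allowing `slack` syllables of error."""
    if "syllables" in rules:
        if abs(phrase["syllables"] - rules["syllables"]) > slack:
            return False
    if "keyword" in rules:
        if rules["keyword"].lower() not in phrase["raw"].lower():
            return False
    return True


def approximate_poem(template, phrases, rng=random, max_slack=20):
    """Pick a random matching phrase per template line, relaxing as needed."""
    if not phrases:
        raise ValueError("the corpus is empty")
    poem = []
    for rules in template:
        if rules.get("blank"):
            poem.append("")
            continue
        for slack in range(max_slack + 1):
            candidates = [p for p in phrases if matches(p, rules, slack)]
            if candidates:
                poem.append(rng.choice(candidates)["raw"])
                break
        else:
            # Nothing matched even with full slack (for example, the keyword
            # never occurs), so fall back to any phrase at all.
            poem.append(rng.choice(phrases)["raw"])
    return poem


haiku = [{"syllables": 5}, {"syllables": 7}, {"syllables": 5}]
print("\n".join(approximate_poem(haiku, PHRASES)))
```

because each line is a random choice among the candidates, repeated runs with the same template give different poems.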

my personal workflow is to build a highly thematic corpus (for example, thirty-one cyberpunk novels) and, for each poem, a custom template. I then run prosaic between five and twenty times, each time saving and discarding lines or whole stanzas. finally, I augment the piece with original lines and then clean up any grammar / pronoun agreement from what prosaic emitted. the end result is a human-computer collaborative work. you are, of course, welcome to use prosaic however you see fit.

developing

Patches are more than welcome if they come with tests. Tests should always be green in master; if not, please let me know! To run the tests:

cd test
py.test

changelog

  • 3.5.2 - handle weird double escaping issues
  • 3.5.1 - fix stupid typo
  • 3.5.0 - prosaic now respects environment variables PROSAIC_DBNAME, PROSAIC_DBPORT and PROSAIC_DBHOST. These are used if not overridden from the command line. If neither environment variables nor CLI args are provided, static defaults are used (these are unchanged).
  • 3.4.0 - flurry of improvements to text pre-processing which makes output much cleaner.
  • 3.3.0 - blank rule; can now add blank lines to output for marking stanzas.
  • 3.2.0 - alliteration support!
  • 3.1.0 - can now install prosaic as a command line tool!! also docs!
  • 3.0.0 - lateral port to python (sorry hy), but there are some breaking naming changes.
  • 2.0.0 - shiny new CLI UI. run hy __init__.hy -h to see/explore the subcommands.
  • 1.0.0 - it works

why mongodb?

MongoDB is almost always the wrong answer to a given architectural question, but it is particularly well suited for prosaic's needs: no relational data (and none likely to crop up), no concerns about HA/consistency, and a well defined document structure.

further reading
