Visualization and natural language processing project for understanding the "nature of podcasts" and seeing their human listeners through topic analysis.
Note that this project is now effectively archived.
Podcasts are typically periodic, audio-only Internet 'radio shows' that, like any media, act as a lens into the cultures producing and consuming them. Once a niche medium, podcasts now reach around 25% of Internet users according to the Pew Research Center. Because they are online, they also reach an international community, and some also air on traditional radio stations.
What is the nature of these podcasts? Can they say anything about their audiences? Can they let us see into emerging cultural trends? This project tries to take a first stab at answering those questions through a mixture of natural language processing and data visualization.
This project is made up of a pipeline of data parsing, processing, and visualization. Each piece is contained within a directory inside of this repository. Each component has its own README with more specific documentation.
- parse: Python scripts to download and process podcast episode information including some natural language processing.
- combine: Python scripts to combine the results of logic for individual podcasts into a single dataset suitable for processing and visualization.
- podcast_viz_local: Prototype of the podcast anthropology visualization. This desktop tool written in Processing has been superseded by its p5js equivalent.
- podcast_viz_web: Podcast anthropology visualization written in p5js that represents the "end product" of this project.
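As a rough illustration of what the combine step does conceptually, here is a minimal sketch that merges per-podcast results into one flat dataset. It is not the repository's actual code: the function name `combine_podcasts` and the `title`, `topics`, and `podcast` keys are assumptions made up for this example. It uses only the standard library and should run under Python 2.7 or 3.

```python
def combine_podcasts(per_podcast_episodes):
    """Merge per-podcast episode lists into one flat list of records.

    Hypothetical helper for illustration only.

    @param per_podcast_episodes: Mapping from podcast name to a list of
        episode dicts (assumed keys: 'title' and 'topics').
    @type per_podcast_episodes: dict
    @return: Flat list of episode dicts, each tagged with its podcast name.
    @rtype: list of dict
    """
    combined = []
    # Sort podcast names so the combined output order is deterministic.
    for podcast_name in sorted(per_podcast_episodes):
        for episode in per_podcast_episodes[podcast_name]:
            record = dict(episode)  # shallow copy; the input is not modified
            record['podcast'] = podcast_name
            combined.append(record)
    return combined


# Made-up sample data standing in for the output of the parse step.
example = {
    'Show B': [{'title': 'Pilot', 'topics': ['design']}],
    'Show A': [{'title': 'Intro', 'topics': ['history', 'radio']}],
}
combined = combine_podcasts(example)
```

Tagging each record with its podcast name keeps the combined dataset self-describing, which is convenient when a visualization needs to filter or group by show.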
Each component of the pipeline has different local development environment setup instructions:
- The parsing scripts require some Python modules which can be installed via pip. See the parse directory for additional details.
- The combine scripts use the Python standard library and only require the standard Python 2.7 distribution.
- The local podcast visualization (podcast_viz_local) only requires the standard Processing distribution. It can run under both the 2.x and 3.x series.
- The web-based podcast visualization requires both JS and Python libraries. See the podcast_viz_web directory for additional details.
Podcast parsing has automated unit tests written with the standard Python unit testing module. Unfortunately, this was a personal project done on vacation between jobs, so no other pipeline components are currently covered by automated tests. See the parse directory for additional details.
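For readers unfamiliar with the standard `unittest` module, a test case looks roughly like the sketch below. The helper `count_topic_mentions` is hypothetical and exists only to give the tests something to exercise; the real tests live in the parse directory.

```python
import unittest


def count_topic_mentions(text, topic):
    """Count case-insensitive, whitespace-delimited mentions of a topic.

    Hypothetical helper used only to demonstrate the test layout.
    """
    words = text.lower().split()
    return words.count(topic.lower())


class CountTopicMentionsTests(unittest.TestCase):
    """Example test case built on the standard unittest module."""

    def test_counts_case_insensitively(self):
        count = count_topic_mentions('Radio radio RADIO show', 'radio')
        self.assertEqual(count, 3)

    def test_missing_topic_counts_zero(self):
        count = count_topic_mentions('a show about design', 'radio')
        self.assertEqual(count, 0)
```

Saved in a file whose name matches `test*.py`, a case like this can be run with `python -m unittest discover` from the directory containing it.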
All Python logic should follow PEP 8 with Epydoc strings on all modules, classes, and functions / methods. Furthermore, all parsing logic should have 80% or more coverage via automated tests.
All JavaScript should include JSDoc strings and should follow Google JS guidelines except that Singleton Object Classes are allowed.
All code is released under the MIT License.
Note that all of the podcasts listed are external services. We love our podcasters and you should too. This project believes our media, including podcasts and radio shows, are an important lens into the cultures producing them. This is a tool meant for anthropological research, not scraping, aggregation, freebooting, etc. Please use with the utmost love and care. <3
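As a small illustration of the docstring convention, the sketch below shows Epydoc (Epytext) fields on a module, a class, and a method. The module purpose, class name, and method are invented for this example and do not come from the repository.

```python
"""Hypothetical helpers for cleaning episode titles.

@author: Example Author (placeholder)
"""


class TitleNormalizer(object):
    """Normalize episode titles so they can be compared reliably."""

    def normalize(self, title):
        """Collapse whitespace and lowercase an episode title.

        @param title: The raw episode title.
        @type title: str
        @return: The title with surrounding whitespace removed, internal
            whitespace collapsed to single spaces, and letters lowercased.
        @rtype: str
        """
        return ' '.join(title.split()).lower()


normalized = TitleNormalizer().normalize('  Hello   Internet ')
```

The `@param`, `@type`, `@return`, and `@rtype` fields are the ones most commonly needed for functions and methods.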
This project gave money to all of the podcasts included (either as Sam Pottinger or as Podcast Anthropology). If you like podcasts, consider throwing them some spare change too. Here are the links:
I <3 Open Source. Here's what the project uses:
- Colebourne, Stephen: Joda-Time under the Apache 2.0 License.
- Denicola, Domenic: Domenic/dict under the DWTFYW Public License.
- jQuery Foundation: jQuery under the MIT License.
- NLTK Project: NLTK under the Apache 2.0 License.
- Processing Foundation: p5js under the GNU LGPL v2 License.
- Processing Foundation: Processing core under the GNU LGPL v2 License.
- Rich, Micah, Caroline Hadilaksono, and Tyler Finck: League Spartan under the Open Font License.
- Richardson, Leonard: Beautiful Soup under the MIT License.
- Ronacher, Armin: Flask under the BSD License.
- Schwartz, Barry: Fanwood under the Open Font License.
- Vepsäläinen, Juho: Bebraw/setjs under the MIT License.
- Wood, Tim, and Iskren Chernev: Moment.js under the MIT License.
- Colebourne, Stephen. "Joda-Time." Joda.org. Joda Project, n.d. Web. 15 Apr. 2015.
- Denicola, Domenic. "Domenic/dict." Github. N.p., n.d. Web. 15 Apr. 2015.
- DiMeo, Nate. "Episodes." The Memory Palace. Nate DiMeo, n.d. Web. 15 Apr. 2015.
- Glass, Ira. "Radio Archive by Date." This American Life. Chicago Public Media, n.d. Web. 15 Apr. 2015.
- Grey, CGP, and Brady Haran. "Hello Internet." Hello Internet RSS Feed. N.p., n.d. Web. 15 Apr. 2015.
- Harris, Jonathan. "We Feel Fine." Number 27. N.p., 2006. Web. 15 Apr. 2015.
- "jQuery." jQuery. jQuery Foundation, n.d. Web. 15 Apr. 2015.
- Kamvar, Sep, and Jonathan Jennings Harris. We Feel Fine: An Almanac of Human Emotion. New York: Scribner, 2009. Print.
- Mars, Roman. "Episode." 99% Invisible. PRX, n.d. Web. 15 Apr. 2015.
- Munzner, Tamara. "15 Views of a Node Link Graph." YouTube. Google Inc, 22 Aug. 2012. Web. 15 Apr. 2015.
- "NLTK 3.0 Documentation." Natural Language Toolkit. NLTK Project, n.d. Web. 15 Apr. 2015.
- "P5.js." P5.js. Processing Foundation, n.d. Web. 15 Apr. 2015.
- "Podcasts." Radiolab. WNYC, n.d. Web. 15 Apr. 2015.
- "Processing.org." Processing.org. Processing Foundation, n.d. Web. 15 Apr. 2015.
- "Radiolab Archive." Radiolab. WNYC, n.d. Web. 15 Apr. 2015.
- Rees, Kim. "Living, Breathing Data." YouTube. Bocoup LLC, 5 June 2013. Web. 15 Apr. 2015.
- Rich, Micah, Caroline Hadilaksono, and Tyler Finck. "League Spartan." The League of Moveable Type. A Good Company, n.d. Web. 15 Apr. 2015.
- Richardson, Leonard. "Beautiful Soup." Beautiful Soup. N.p., n.d. Web. 15 Apr. 2015.
- Sanchez, Gaston. "Star Wars Arc Diagram." N.p., 3 Feb. 2013. Web. 15 Apr. 2015.
- Schwartz, Barry. "Fanwood." The League of Moveable Type. A Good Company, n.d. Web. 15 Apr. 2015.
- Shiffman, Daniel, Shannon Fry, and Zannah Marsh. The Nature of Code. New York: Interactive Telecommunications Program at New York U, 2012. Print.
- Stefaner, Moritz. "Elastic Lists." Http://archive.stefaner.eu. N.p., n.d. Web. 15 Apr. 2015.
- Vepsäläinen, Juho. "Bebraw/setjs." GitHub. N.p., n.d. Web. 15 Apr. 2015.
- Victor, Bret. "Media for Thinking the Unthinkable." Vimeo. MIT Media Lab, 4 Apr. 2013. Web. 15 Apr. 2015.
- Wood, Tim, and Iskren Chernev. "Moment.js." Moment.js. N.p., n.d. Web. 15 Apr. 2015.
- Zickuhr, Kathryn. "Over a Quarter of Internet Users Download or Listen to Podcasts." Fact Tank. Pew Research Center, 27 Dec. 2013. Web. 15 Apr. 2015.
I love this project dearly but, unfortunately, I wrote it while between jobs and my new gig precludes my continued involvement. I uploaded lingering changes to help future work but, alas, I have written my last for now. If you are interested in picking up the baton, shoot me a note or fork the repository. :)