What is the Census? According to the Australian Bureau of Statistics:
> The Census of Population and Housing (Census) is Australia’s largest statistical collection undertaken by the Australian Bureau of Statistics (ABS). For more than 100 years, the Census has provided a snapshot of Australia, showing how our nation has changed over time, allowing us to plan for the future.

Analysis and the interactive webapp are bolted together as follows:
- Twitter 'tweet' data is streamed via the Twitter Streaming API using Python and Tweepy
- Streamed tweets are written to and stored in the NoSQL database MongoDB
- Tweets are 'enriched' with sentiment analysis via the Aylien Python API
- Tweets are geocoded (where latitude/longitude info is not available via Twitter) using the user's location tag via the Google Maps API
- R is then used to produce the following:
  - Sentiment-coloured map pins using maps from Leaflet for R and Mapbox
  - Text mining, largely resulting in a topic model, using Plotly
  - Various Twitter insight visualisations using Plotly
- All this is bolted together as an interactive visualisation using the amazeballs ShinyApps from RStudio
- The MongoDB instance and Python-hosting server are hosted on Amazon Web Services EC2
- The Python Twitter Streaming API script is managed on the AWS EC2 instance by a cronjob
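
The geocoding fallback described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: `resolve_location` and `fake_geocode` are hypothetical names, and the stand-in geocoder replaces the real Google Maps API call so the example runs offline. It assumes tweets are stored as the raw Twitter JSON, where `coordinates` is a GeoJSON point and `user.location` is the free-text location tag.

```python
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]


def resolve_location(
    tweet: dict, geocode: Callable[[str], Optional[LatLon]]
) -> Optional[LatLon]:
    """Return (lat, lon) for a tweet, geocoding the user's location only when needed."""
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        # GeoJSON points are ordered [longitude, latitude]
        lon, lat = coords["coordinates"]
        return (lat, lon)

    # No coordinates from Twitter: fall back to the user's location tag
    location = (tweet.get("user") or {}).get("location")
    if location and location.strip():
        return geocode(location.strip())

    return None


def fake_geocode(place: str) -> Optional[LatLon]:
    """Hypothetical stand-in for a Google Maps geocoding request."""
    lookup = {"Canberra, ACT": (-35.28, 149.13)}
    return lookup.get(place)


if __name__ == "__main__":
    geotagged = {"coordinates": {"type": "Point", "coordinates": [151.2, -33.9]}}
    profile_only = {"coordinates": None, "user": {"location": "Canberra, ACT"}}
    print(resolve_location(geotagged, fake_geocode))
    print(resolve_location(profile_only, fake_geocode))
```

Passing the geocoder in as a function keeps the fallback logic testable without network access; in a real pipeline that argument would wrap whichever Google Maps client is in use.
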
####The visualisation consists of three main tabs:
ShinyApp Tab | Content |
---|---|
Geocoded Tweets | Map pins shaded by polarity or subjectivity |
Topic Model | Topic model showing topic development over time |
Other Twitter-y Stuff | Various visualisations inc. wordcloud, top words, top users |
####Definitions for Interactive Census Tweet Explorer text analytics components:
Attribute | Description | Visualisation Use |
---|---|---|
polarity | Natural language processing was used to determine the overall polarity of the tweet: positive, negative or neutral. Polarity can be considered an indicator of the emotional state expressed in the tweet, such as angry, happy or indifferent | colouring |
subjectivity | Natural language processing was used to determine the overall subjectivity of the tweet: subjective or objective. This can be challenging, as a tweet may contain both subjective and objective terms. Subjectivity can be considered a measure of whether a statement is fact or opinion | colouring |

Note: sentiment and subjectivity analysis was undertaken using the Python API from Aylien.
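
To make the "colouring" use of these attributes concrete, here is a small sketch of how a polarity or subjectivity label might be turned into a map pin colour. The app itself does this in R with Leaflet; the Python below is only an illustration of the idea. The colour choices, the `pin_colour` function, and the assumption that each enriched tweet stores its labels under the keys `polarity` and `subjectivity` are all hypothetical.

```python
# Assumed palettes: the actual colours used in the app are not documented here
POLARITY_COLOURS = {"positive": "green", "neutral": "grey", "negative": "red"}
SUBJECTIVITY_COLOURS = {"subjective": "orange", "objective": "blue"}

PALETTES = {"polarity": POLARITY_COLOURS, "subjectivity": SUBJECTIVITY_COLOURS}


def pin_colour(tweet: dict, shade_by: str = "polarity") -> str:
    """Pick a pin colour from the tweet's polarity or subjectivity label."""
    if shade_by not in PALETTES:
        raise ValueError("shade_by must be 'polarity' or 'subjectivity'")
    label = tweet.get(shade_by)
    # Tweets that were never enriched, or carry an unknown label, get a fallback colour
    return PALETTES[shade_by].get(label, "black")


if __name__ == "__main__":
    tweet = {"text": "Filled in my census form", "polarity": "neutral", "subjectivity": "objective"}
    print(pin_colour(tweet))
    print(pin_colour(tweet, shade_by="subjectivity"))
```
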