Log analysis samples demonstrating data analysis with BigQuery and IPython, as presented at Google I/O 2014.
## Anomaly Detection with Request Logs

This demonstrates working with HTTP logs to create metrics (such as 99th percentile latency or request volume) using BigQuery, running the resulting time-series through an anomaly detector implemented in Python, and plotting both the time-series and the detected anomalies.
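The detector shipped with the sample code isn't reproduced here; as a minimal sketch of the idea, a rolling z-score test over the metric series might look like the following. The series values, window size, and threshold are illustrative assumptions, not taken from the demo.

```python
import statistics

def detect_anomalies(series, window=10, threshold=3.0):
    """Flag indices whose value deviates from the mean of the trailing
    `window` readings by more than `threshold` standard deviations
    (a simple rolling z-score test)."""
    anomalies = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mean = statistics.mean(trailing)
        stdev = statistics.stdev(trailing)
        if stdev > 0 and abs(series[i] - mean) > threshold * stdev:
            anomalies.append(i)
    return anomalies

# A mostly flat 99th percentile latency series (ms) with one spike.
latency = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101,
           99, 100, 102, 98, 100, 400, 101, 99, 100, 102]
anomalous = detect_anomalies(latency)  # flags index 15, the spike
```

A window that includes a past spike inflates the trailing standard deviation, so this approach deliberately under-reports immediately after an anomaly; production detectors typically handle that with robust statistics or by excluding flagged points from the window.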
## Hotspot Detection from GPS Logs

This demonstrates working with a GPS stream (in particular, sample data from Uber taxis in San Francisco over a week) and aggregating the readings spatially to determine and render activity hotspots.
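As a minimal sketch of the spatial aggregation idea (the demo performs the equivalent grouping in a BigQuery query rather than in Python), each reading can be snapped to a fixed-size grid cell and the readings counted per cell. The coordinates, cell size, and count threshold below are illustrative assumptions.

```python
from collections import Counter

def hotspots(points, cell_size=0.01, min_count=3):
    """Bin (lat, lon) points into square cells of `cell_size` degrees
    and return cells with at least `min_count` readings, busiest first."""
    cells = Counter(
        (int(lat // cell_size), int(lon // cell_size)) for lat, lon in points
    )
    return [(cell, n) for cell, n in cells.most_common() if n >= min_count]

# Four readings clustered near downtown San Francisco plus two outliers;
# the clustered readings land in a single grid cell.
readings = [(37.7749, -122.4194), (37.7751, -122.4190), (37.7748, -122.4199),
            (37.7752, -122.4195), (37.8044, -122.2712), (37.6879, -122.4702)]
result = hotspots(readings)
```

Fixed-degree cells are the simplest choice; because a degree of longitude shrinks toward the poles, real hotspot analyses often use a projected grid or geohashes instead.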
You can execute the notebooks locally using IPython and the included sample code, once you've uploaded the sample data into BigQuery within your cloud project.
- Unzip the sample data into the `data` directory.
- Upload the files into a Google Cloud Storage bucket within your cloud project.
- Go to the BigQuery console.
- Create a dataset named `requestlogs` and a table named `logs` within it. Use the request logs data to populate the table.
- Repeat for a dataset named `uberlogs` and a table named `logs` within it. Use the Uber logs sample data to populate the table.
- Install the Google Cloud SDK (which installs the `gcloud` tool).
- Run `gcloud auth login` to perform a login operation and authorize your local development machine.
- Run `gcloud config set project <cloud project name>` to configure the active project.
- Install node.js if you don't already have it installed locally.
- Run `node misc/metadata.js` - this runs a local emulation of the Google Cloud Metadata service, which the sample Python code uses to authorize queries issued to BigQuery.
- Install IPython if you don't already have it. I used the Anaconda distribution.
- Start IPython using the included `run.sh` script.
- Within the browser, select the notebook, or create a new one.
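The console steps above can also be scripted with the `gsutil` and `bq` tools that ship with the Cloud SDK. This is a hypothetical transcript, not part of the sample code: the project name, bucket name, CSV file names, and the use of schema auto-detection are all assumptions; substitute your own values and the actual sample file layout.

```shell
# Placeholder names throughout: my-cloud-project, my-logs-bucket, and
# both CSV file names stand in for your own values.
gcloud auth login
gcloud config set project my-cloud-project

# Stage the unzipped sample data in Cloud Storage.
gsutil mb gs://my-logs-bucket
gsutil cp data/*.csv gs://my-logs-bucket/

# Create each dataset and populate its logs table.
bq mk requestlogs
bq load --autodetect requestlogs.logs gs://my-logs-bucket/request_logs.csv
bq mk uberlogs
bq load --autodetect uberlogs.logs gs://my-logs-bucket/uber_logs.csv
```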