AWS Lambda Scala Example Project

Introduction

This is an example AWS Lambda application, written in Scala, for processing a Kinesis stream of events (see the introductory blog post). It reads a stream of simple JSON events produced by our event generator. The Lambda function counts the events in time buckets and stores the aggregates in DynamoDB.

This was built by the Data Science team at Snowplow Analytics, who use AWS Lambda in their projects.

Running this requires an AWS account and will incur charges.

See also: AWS Lambda Node.js Project | Spark Streaming Example Project

Overview

We have implemented a super-simple analytics-on-write stream processing job using AWS Lambda. Our AWS Lambda function, written in Scala and running on the Java 8 runtime, reads a Kinesis stream containing events in JSON format:

{
  "timestamp": "2015-06-05T12:54:43.064528",
  "eventType": "Green",
  "id": "4ec80fb1-0963-4e35-8f54-ce760499d974"
}

Our job counts the events by eventType and aggregates these counts into 1-minute buckets. The job then saves these aggregates to a table in DynamoDB:

[Image: DynamoDB table]
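The bucketing described above can be sketched in a few lines. This is a minimal illustration in Python, not the project's Scala code: the function names `bucket_start` and `count_events` are hypothetical, and we assume that a bucket is identified by the event timestamp truncated to the start of its minute.

```python
from collections import Counter
from datetime import datetime


def bucket_start(timestamp: str) -> str:
    """Truncate an ISO-8601 timestamp to the start of its 1-minute bucket."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(second=0, microsecond=0).isoformat()


def count_events(events):
    """Count events per (bucket start, event type) pair."""
    return Counter((bucket_start(e["timestamp"]), e["eventType"]) for e in events)


# Illustrative events in the format shown above (ids shortened for readability).
events = [
    {"timestamp": "2015-06-05T12:54:43.064528", "eventType": "Green", "id": "a"},
    {"timestamp": "2015-06-05T12:54:44.295972", "eventType": "Green", "id": "b"},
    {"timestamp": "2015-06-05T12:55:01.000001", "eventType": "Red", "id": "c"},
]
counts = count_events(events)
for (bucket, event_type), count in sorted(counts.items()):
    print(bucket, event_type, count)
```

Each key of `counts` corresponds to one row of the DynamoDB table: one bucket start, one event type, one count.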

Developer Quickstart

Assuming git, Vagrant and VirtualBox are installed:

 host$ git clone https://github.com/snowplow/aws-lambda-scala-example-project.git
 host$ cd aws-lambda-scala-example-project
 host$ vagrant up && vagrant ssh
guest$ cd /vagrant
guest$ sbt assembly

Tutorial

You can follow along in the release blog post to get the project up and running yourself.

The following steps assume that you are running inside Vagrant, as per the Developer Quickstart above.

1. Setting up AWS credentials

First we need to configure a default AWS profile:

$ aws configure
AWS Access Key ID [None]: ...
AWS Secret Access Key [None]: ...
Default region name [None]: us-east-1
Default output format [None]: json

2. Set up Amazon Kinesis, DynamoDB, and the IAM role

Now we create our Kinesis event stream:

$ inv create_kinesis_stream my-stream
Kinesis Stream [my-stream] not active yet
Kinesis Stream [my-stream] not active yet
Kinesis Stream [my-stream] not active yet
Kinesis successfully created.
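The repeated "not active yet" lines come from polling the stream status until it becomes active. A minimal sketch of such a wait loop follows; it is not the project's task code. The name `wait_until_active` and its parameters are hypothetical, and the status lookup is passed in as a function so the example runs without AWS access. With boto3, one way to supply it would be `lambda: client.describe_stream(StreamName="my-stream")["StreamDescription"]["StreamStatus"]`.

```python
import time


def wait_until_active(get_status, poll_seconds=1.0, max_attempts=30, sleep=time.sleep):
    """Poll get_status() until it returns "ACTIVE"; return the number of polls made."""
    for attempt in range(1, max_attempts + 1):
        if get_status() == "ACTIVE":
            return attempt
        print("Kinesis stream not active yet")
        sleep(poll_seconds)
    raise TimeoutError("stream did not become active in time")


# Stand-in for the real status call: the stream reports CREATING twice, then ACTIVE.
statuses = iter(["CREATING", "CREATING", "ACTIVE"])
polls = wait_until_active(lambda: next(statuses), sleep=lambda seconds: None)
print("Stream active after", polls, "polls")
```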

Now create our DynamoDB table:

$ inv create_dynamodb_table default us-east-1 my-table

Now we can create our IAM role. We use CloudFormation to create the role, via inv create_role:

$ inv create_role
arn:aws:cloudformation:us-east-1:84412349716:stack/LambdaStack/23a341eb0-4162-11e5-9d4f-0150b34c7c
Creating roles
Still creating
Giving Lambda proper permissions
Trying...
Created role

3. Build the Scala project jar

Let's build our Scala project into a fully self-contained jar file.

$ sbt assembly
[info] Loading project definition from /aws-lambda-scala-example-project/project
[info] Set current project to aws-lambda-scala-example-project (in build file:/aws-lambda-scala-example-project/)
[info] Including from cache: scala-reflect-2.11.4.jar
...
[warn] Merging 'rootdoc.txt' with strategy 'first'
[warn] Strategy 'discard' was applied to 62 files
[warn] Strategy 'first' was applied to a file
[info] SHA-1: 96401bbad71968267ccea4c479a7d39093ef8988
[info] Packaging /Volumes/DataDrive/dev/aws-lambda-scala-example-project/target/scala-2.11/aws-lambda-scala-example-project-0.2.0.jar ...
[info] Done packaging.
[success] Total time: 59 s, completed 13-Aug-2015 10:40:05 AM

4. Upload project jar to Amazon S3

We will create an S3 bucket from which AWS Lambda can pick up the jar file, and upload the jar to it using our custom uploader, inv upload_s3.

$ inv upload_s3
Jar uploaded to S3 aws_scala_lambda_bucket

5. Configure AWS Lambda service

Now that we have built the project and uploaded the jar file to S3, we need to create the Lambda function and then configure it to read events from our Kinesis stream named my-stream. First we create the function:

$ inv create_lambda
Creating AWS Lambda function.
{
    "FunctionName": "ProcessingKinesisLambdaDynamoDB",
    "CodeSize": 38042279,
    "MemorySize": 1024,
    "FunctionArn": "arn:aws:lambda:us-east-1:842349429716:function:ProcessingKinesisLambdaDynamoDB",
    "Handler": "com.snowplowanalytics.awslambda.LambdaFunction::recordHandler",
    "Role": "arn:aws:iam::842340234716:role/LambdaStack-LambdaExecRole-7G57P4M2VV5P",
    "Timeout": 60,
    "LastModified": "2015-08-13T19:39:46.730+0000",
    "Runtime": "java8",
    "Description": ""
}

Now we can associate our Lambda with our Kinesis stream:

$ inv configure_lambda my-stream
Configured AWS Lambda service.
Added Kinesis as event source for Lambda function.
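Once Kinesis is registered as an event source, Lambda invokes the function with batches of stream records. In the JSON form of that invocation event, each record carries its payload base64-encoded under `Records[].kinesis.data`. The sketch below shows that shape in Python for illustration only; the Scala handler in this project receives typed objects from the AWS Java libraries instead, and the name `decode_kinesis_records` is hypothetical.

```python
import base64
import json


def decode_kinesis_records(lambda_event):
    """Return the JSON events carried by a Kinesis-triggered Lambda invocation."""
    events = []
    for record in lambda_event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        events.append(json.loads(raw.decode("utf-8")))
    return events


# Build a sample invocation payload containing one event from the Overview.
sample_event = {
    "timestamp": "2015-06-05T12:54:43.064528",
    "eventType": "Green",
    "id": "4ec80fb1-0963-4e35-8f54-ce760499d974",
}
encoded = base64.b64encode(json.dumps(sample_event).encode("utf-8")).decode("ascii")
sample_invocation = {"Records": [{"kinesis": {"data": encoded}}]}

decoded = decode_kinesis_records(sample_invocation)
print(decoded)
```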

6. Sending events to Kinesis

We need to start sending events to our new Kinesis stream. We have created a helper task to do this; run the command below and leave it running:

$ inv generate_events default us-east-1 my-stream
Event sent to Kinesis: {"timestamp": "2015-06-05T12:54:43.064528", "type": "Green", "id": "4ec80fb1-0963-4e35-8f54-ce760499d974"}
Event sent to Kinesis: {"timestamp": "2015-06-05T12:54:43.757797", "type": "Red", "id": "eb84b0d1-f793-4213-8a65-2fb09eab8c5c"}
Event sent to Kinesis: {"timestamp": "2015-06-05T12:54:44.295972", "type": "Yellow", "id": "4654bdc8-86d4-44a3-9920-fee7939e2582"}
...
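For reference, an event of this shape can be produced with a few lines of Python. This is a sketch under stated assumptions, not the project's generator: `make_event` is a hypothetical name, the event-type key follows the Overview example (`eventType`) whereas the sample output above shows `type`, and only Red, Yellow and Green appear in this README, so the other two event types are placeholders. The sketch builds the event only and does not send it to Kinesis.

```python
import datetime
import json
import random
import uuid

# Red, Yellow and Green appear in the sample output; Blue and Orange are assumed.
EVENT_TYPES = ["Red", "Yellow", "Green", "Blue", "Orange"]


def make_event(now=None, rng=random):
    """Build one event with a timestamp, a random event type and a unique id."""
    now = now or datetime.datetime.now()
    return {
        "timestamp": now.isoformat(),
        "eventType": rng.choice(EVENT_TYPES),
        "id": str(uuid.uuid4()),
    }


event = make_event()
print(json.dumps(event))
```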

7. Monitoring your job

First head over to the AWS Lambda service console, then review the logs in CloudWatch.

Finally, let's check the data in our DynamoDB table. Make sure you are in the correct AWS region, then click on my-table and hit the Explore Table button:

[Image: DynamoDB table]

For each BucketStart and EventType pair, we see a Count, plus some CreatedAt and UpdatedAt metadata for debugging purposes. Our bucket size is 1 minute, and we have 5 discrete event types, hence the matrix of rows that we see.
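The row layout described above can be sketched with an in-memory dictionary standing in for the DynamoDB table. The attribute names come from the table itself; everything else is an assumption for illustration: `upsert_count` is a hypothetical name, and we assume that a later batch adds to an existing count rather than overwriting it.

```python
def upsert_count(table, bucket_start, event_type, count, now):
    """Add count to the row for (bucket_start, event_type), creating it if needed."""
    key = (bucket_start, event_type)
    row = table.get(key)
    if row is None:
        # First write for this pair: CreatedAt and UpdatedAt start out equal.
        row = {
            "BucketStart": bucket_start,
            "EventType": event_type,
            "Count": count,
            "CreatedAt": now,
            "UpdatedAt": now,
        }
        table[key] = row
    else:
        # Later writes (assumed additive) bump the count and touch UpdatedAt only.
        row["Count"] += count
        row["UpdatedAt"] = now
    return row


table = {}
upsert_count(table, "2015-06-05T12:54:00", "Green", 2, now="2015-06-05T12:54:50")
upsert_count(table, "2015-06-05T12:54:00", "Green", 3, now="2015-06-05T12:55:10")
upsert_count(table, "2015-06-05T12:54:00", "Red", 1, now="2015-06-05T12:55:10")
for row in table.values():
    print(row)
```

With 1-minute buckets and a fixed set of event types, each minute contributes at most one row per event type, which gives the matrix of rows seen in the table.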

Roadmap

Credits

Copyright and license

AWS Lambda Scala Example Project is copyright 2015 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
