Saves selected conversations as searchable, analyzable text.
Conversations are modeled as Sessions involving two speakers. Each Session has a start time and is uniquely identified by its timestamp. The following data is stored for each Session:
- Record in DynamoDB. The timestamp is the partition key; there is no range key. For now the record includes the start time and duration, but it will likely gain other properties in the future.
- Raw audio files in S3: one in MP3 format and one in Ogg format using the Opus codec. Both are duplicated in Google Drive.
- Transcript as JSON in S3. The format of this file is described in greater detail below.
- Nicely-formatted transcript in S3 and Google Drive.
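Putting the pieces above together, a Session record might look like the following sketch. Only the timestamp key, duration, Watson job ID, and finished flag are mentioned in this document; the attribute names themselves are illustrative assumptions.

```python
# Hypothetical shape of a Session record in DynamoDB. Attribute names
# are assumptions; only the fields themselves come from this document.
session_item = {
    "timestamp": "2018-03-14T19:02:11Z",  # partition key (no range key)
    "duration_seconds": 1834,             # session length
    "watson_job_id": "a1b2c3",            # set by AudioUploadResponder
    "finished": False,                    # flipped by SpeechToTextCallback
}
```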
After recording an MP3, a script (runlocal.py) is run locally to upload the file to S3, using credentials that grant write-only access.
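The upload step can be sketched with boto3. The bucket name and key scheme below are assumptions, not taken from runlocal.py; the point is that the credentials in the environment should permit s3:PutObject only, so a leaked key cannot read or delete recordings.

```python
def audio_key(timestamp: str) -> str:
    # Hypothetical S3 key scheme for raw recordings; the real layout
    # used by runlocal.py is not specified in this document.
    return f"raw/{timestamp}.mp3"

def upload_recording(path: str, timestamp: str,
                     bucket: str = "session-audio") -> None:
    # boto3 is imported lazily so audio_key() works without AWS
    # credentials configured.
    import boto3
    s3 = boto3.client("s3")  # picks up write-only credentials from env
    s3.upload_file(path, bucket, audio_key(timestamp))
```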
Once the file is uploaded to S3, the AudioUploadResponder Lambda function (audio_upload_responder.py) runs, which:
- Converts the MP3 to two-channel Ogg with the Opus codec using ffmpeg.
- Calls the speech-to-text service (currently Watson), passing the SpeechToTextCallback Lambda endpoint as a callback.
- Inserts a record into DynamoDB. This record contains the job ID from Watson.
- Saves raw audio to Google Drive.
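The conversion step amounts to a single ffmpeg invocation. The flags below are one reasonable choice (libopus encoder, forced two channels), not necessarily the exact ones audio_upload_responder.py uses; the bitrate is an assumption.

```python
import subprocess

def ffmpeg_cmd(mp3_path: str, ogg_path: str) -> list:
    # Build the ffmpeg command: -ac 2 forces two channels (one per
    # speaker); -c:a libopus selects ffmpeg's Opus encoder. The 64k
    # bitrate is an assumed value, not taken from the document.
    return ["ffmpeg", "-y", "-i", mp3_path,
            "-ac", "2", "-c:a", "libopus", "-b:a", "64k", ogg_path]

def mp3_to_ogg_opus(mp3_path: str, ogg_path: str) -> None:
    # check=True raises if ffmpeg exits nonzero, so a failed
    # conversion surfaces in the Lambda logs instead of passing silently.
    subprocess.run(ffmpeg_cmd(mp3_path, ogg_path), check=True)
```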
The SpeechToTextCallback Lambda function is executed by the callback.
- On execution, check the "finished" state of the DynamoDB record. If finished, exit; if not, proceed.
- Create the JSON transcript and the nicely-formatted transcript, and add both to S3 and Google Drive.
- Mark the DynamoDB record as finished.
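The finished-flag check makes the callback idempotent if Watson invokes it more than once. A minimal sketch of that guard, using a plain dict as a stand-in for the DynamoDB table and a hypothetical build_transcripts callable:

```python
def handle_callback(table: dict, timestamp: str, build_transcripts) -> bool:
    # table stands in for the DynamoDB table, keyed by Session timestamp.
    # Returns True if transcripts were built, False if already finished.
    record = table[timestamp]
    if record.get("finished"):
        return False           # duplicate callback: exit without redoing work
    build_transcripts(record)  # create JSON + formatted transcripts, upload
    record["finished"] = True  # later callbacks now no-op
    return True
```

A real implementation would want the check-and-set against DynamoDB itself (e.g. a conditional update) rather than a read-then-write, but the control flow is the same.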