This app transcribes audio files with the Microsoft, Amazon and Google speech-to-text services and scores the results, to help compare their accuracy and performance.
Select your service by navigating to the appropriate folder:
cd Amazon-Transcribe
cd Google-Speech-to-Text-API
cd Microsoft-Speech-SDK
Each service has its own README, found in its directory, containing setup and usage instructions.
Make sure the paths to the results files listed below are correct in settings.py before running.
In each service's directory, there is a results folder, which contains the following files:
- ref.txt: stores the original transcript of the audio file you want to transcribe. Enter the original transcript before running the app
- hyp.txt: result of the transcription is stored here once the app is run and transcription is generated
- results.csv: results are stored here once generated (transcripts, WER & word error count)
- table.txt: WER and the word error count results are stored here
- alltranscriptions.txt: all text that has been transcribed is stored here
This app measures the accuracy of transcriptions using word error rate (WER).
Word error rate (WER) is a metric for measuring the performance of automated speech recognition (ASR). It compares the original transcript (the reference) with the transcribed text (the hypothesis) from a speech-to-text service, counting the word substitutions, deletions and insertions needed to turn one into the other relative to the number of words in the reference.
WER has its pros and cons, but overall it provides a baseline accuracy metric for general use, expressed as a percentage.
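As a rough illustration of the metric, here is a minimal sketch of the conventional WER calculation: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function names are hypothetical and this is not necessarily how the apps in this repository compute it.

```python
def word_error_count(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()

    # previous[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words (initially: j insertions).
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER as a percentage of the reference length."""
    reference_length = len(reference.split())
    if reference_length == 0:
        raise ValueError("reference transcript must contain at least one word")
    return 100.0 * word_error_count(reference, hypothesis) / reference_length


# One deleted word out of six reference words, so roughly 16.7%.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.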
Each app supports single-file and batch processing. In batch mode, an average of the results is calculated automatically.
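The README does not say how the batch average is formed. The sketch below assumes the simplest reading, an unweighted mean of the per-file WER values; the function name and the example file names are hypothetical.

```python
from statistics import mean


def average_wer(wer_by_file: dict) -> float:
    """Unweighted mean of per-file WER percentages (an assumed definition)."""
    if not wer_by_file:
        raise ValueError("no results to average")
    return mean(wer_by_file.values())


# Hypothetical per-file results, in percent.
batch_results = {"sample1.wav": 10.0, "sample2.wav": 20.0}
print(average_wer(batch_results))
```

An unweighted mean gives a short clip the same influence as a long one. Dividing the total word error count by the total number of reference words is a common alternative and can give a different figure.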
- Select your service by navigating to the appropriate folder. You can find READMEs with specific information there
- Make sure you have understood and completed the prerequisites
- Gather audio samples. I recommend creating a sounds folder and placing audio files there
- Install all required dependencies by executing npm run setup
- Run each app by executing npm start
- Analyse results (table.txt & results.csv)
Further info can be found in each service's README.
- To use each tool, you will need an account with your service of choice. All of the services are paid, but each offers a free trial period
- For each service, audio files are required to be in a specific format. Details can be found in each project's README
- Both the original transcript (ref.txt) and the transcribed text (hyp.txt) are 'cleaned' to have a consistent style before WER is calculated. For example, digits like 1, 64 and 3000 are converted to their corresponding words: one, sixty-four and three thousand, respectively. Punctuation and unnecessary whitespace are also removed
- You may have to resolve stylistic differences like "street" and "st" yourself so that the reference transcript is consistent with the transcription service
- Find out more about this project and our findings in our blog: Applied Innovation - Kainos
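The cleaning step mentioned in the notes above (digits to words, punctuation and extra whitespace removed) could look roughly like the sketch below. It is an illustration only, not the project's code: the function names are hypothetical, lower-casing is an added assumption, hyphens are kept so that "sixty-four" survives, and only plain integers below one million are converted.

```python
import re
import string

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        tail = " " + number_to_words(rest) if rest else ""
        return number_to_words(thousands) + " thousand" + tail
    raise ValueError("only integers below one million are supported")


def _spell_out(match: re.Match) -> str:
    value = int(match.group())
    # Leave very large numbers untouched rather than guessing.
    return number_to_words(value) if value < 1_000_000 else match.group()


def clean_transcript(text: str) -> str:
    """Normalise a transcript before WER is calculated."""
    text = text.lower()  # assumption: case is not meant to count as an error
    text = re.sub(r"\d+", _spell_out, text)
    # Remove punctuation, keeping hyphens inside spelled-out numbers.
    punctuation = string.punctuation.replace("-", "")
    text = text.translate(str.maketrans("", "", punctuation))
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


print(clean_transcript("I have 1 cat,  64 dogs and 3000 fish!"))
```

Applying the same function to both ref.txt and hyp.txt keeps formatting differences from being counted as word errors.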
This project was developed using:
- Python 3.7.4, with Python module versions as described in requirements.txt
- Node.js v10.16.0, with npm packages as described in package.json
Software versions are subject to change with new releases. To ensure the project runs smoothly without alteration, use the versions above. This software was last run in 09/2019.