Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning Demo
This repository contains demo code for the paper Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning. It consists of the trained models, the Python inference code, a simple frontend webpage, and a backend Node.js server.
Use the demo with the frontend webpage
- clone from https://github.com/Porridge144/sup-mlt-demo.git
- cd model_export and run python pyonnxrt.py (it blocks, so you might need to run it in the background or in a tmux window)
- cd server and run node server.js (you can change the listening port to an arbitrary one in server.js)
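The two long-running processes above could also be started from one script instead of separate terminals. The following is only a sketch: `build_commands` and `launch_demo` are hypothetical helper names that are not part of the repository, and it assumes `python` and `node` are on your PATH and that the repository was cloned into the directory you pass in.

```python
import subprocess
from pathlib import Path


def build_commands(repo_root):
    """Return (argv, working_directory) pairs for the two demo processes.

    Hypothetical helper; the commands mirror the manual steps in this README.
    """
    root = Path(repo_root)
    return [
        # Inference process; it blocks, so it is started without waiting on it.
        (["python", "pyonnxrt.py"], root / "model_export"),
        # Node.js backend for the frontend webpage.
        (["node", "server.js"], root / "server"),
    ]


def launch_demo(repo_root):
    """Start both processes in the background and return their Popen handles."""
    return [
        subprocess.Popen(argv, cwd=str(cwd))
        for argv, cwd in build_commands(repo_root)
    ]
```

Calling `launch_demo("sup-mlt-demo")` from the directory that contains the clone would start both processes; calling `terminate()` on each returned handle stops them.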
Direct inference without using the frontend webpage
- clone from https://github.com/Porridge144/sup-mlt-demo.git
- put the mp3/wav files you want to process into model_export/feat_extract/preprocdir/rawmp3
- cd model_export and run python pyonnxrt.py (it blocks, so you might need to run it in the background or in a tmux window)
- cd model_export/feat_extract and run bash run.sh
- the output will be saved in the server and also printed in the terminal in which pyonnxrt.py is running
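Copying the input audio into place can be scripted. The sketch below is an illustration rather than part of the repository: `stage_audio` is a hypothetical helper, and it assumes that only files with an `.mp3` or `.wav` extension should be copied and that creating the `rawmp3` directory when it is missing is acceptable.

```python
import shutil
from pathlib import Path

# Extensions treated as audio input (an assumption based on the step above).
AUDIO_SUFFIXES = {".mp3", ".wav"}


def stage_audio(source_dir, repo_root):
    """Copy mp3/wav files from source_dir into the demo's input directory.

    Returns the names of the copied files in sorted order.
    """
    target = (
        Path(repo_root) / "model_export" / "feat_extract" / "preprocdir" / "rawmp3"
    )
    # Create the input directory if the clone does not already contain it.
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        # Skip subdirectories and files that are not mp3/wav.
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

After staging the files, the remaining steps are unchanged: start pyonnxrt.py, then run bash run.sh in model_export/feat_extract.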