Listener (v2) Voice Dictation as a (Docker) Service for IBus

Listener is a voice dictation service for Linux desktops which uses the Mozilla DeepSpeech engine for the basic recognition service and focuses on providing sufficient accuracy and tooling to allow for coding in common programming languages.

My goal with this project is to create an input method for those who have difficulty typing with their hands (such as myself), with a focus on allowing coding by voice. My personal focus is not to allow for hands-free operation of the machine.

Current Status of the Project

The project is currently a proof of concept. What works:

  • typing content into Visual Studio Code, Kate, and Google Chrome
  • the start of basic punctuation, capitalisation, etc., driven by user-editable rules files (see the sketch after this list)
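
The rules-file format is still evolving; as a rough illustration only (the phrase names and the Python layout below are hypothetical, not the project's actual file format), a user-editable rule set boils down to mapping spoken phrases onto literal text or onto transforms of the next word:

# Hypothetical illustration of phrase -> action rules; the real
# user-editable rules files use their own format, this only sketches the idea.

# Phrases that become literal text.
TEXT_RULES = {
    'full stop': '.',
    'comma': ',',
    'new line': '\n',
}

# Phrases that change how the *next* word is emitted.
MODIFIER_RULES = {
    'cap': lambda word: word.capitalize(),   # "cap hello" -> "Hello"
    'all caps': lambda word: word.upper(),   # "all caps http" -> "HTTP"
}

def interpret(words):
    """Turn a list of recognised phrases/words into the text to type."""
    out = []
    pending = None            # modifier waiting for its next word
    for word in words:
        if pending is not None:
            out.append(pending(word))
            pending = None
        elif word in TEXT_RULES:
            out.append(TEXT_RULES[word])
        elif word in MODIFIER_RULES:
            pending = MODIFIER_RULES[word]
        else:
            out.append(word)
    return ' '.join(out)

print(interpret(['cap', 'hello', 'world', 'full stop']))  # -> 'Hello world .'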

Roadmap

  • create a docker container with a working DeepSpeech release [done]
  • get basic dictation into arbitrary applications working [done]
  • create a control-panel application [started]
  • create punctuation and control shortcuts and phrases [mostly done]
  • create language models which are dictation-aware, so that common dictation shortcuts such as cap X have higher priority [started]
  • maybe create a DBus service for the core code [started]
  • allow for switching language models for different programming contexts and providing current-context hints such as class methods, modules, etc. from the language server
  • track interaction and key-press events to allow for pauses in dictation without extra spaces (this will have to happen in the IBus component in order to get proper notification)
  • send special keys (tab, enter, and modifiers to start with) [proof of concept done] (see the sketch after this list)
  • create a "correct that" GUI (with other predictions and free-form editing)
  • create a control panel allowing for one click toggling of listening
  • cut down the container to a more reasonable size
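
The special-keys item relies on uinput for injection. As a minimal sketch only, using the third-party python-evdev library rather than anything shipped in this repository (the project's own injection path runs through its DBus/IBus components), sending Tab or Alt-Tab looks roughly like this (it requires write access to /dev/uinput):

# Sketch of special-key injection via uinput, using python-evdev.
from evdev import UInput, ecodes as e

ui = UInput()  # virtual keyboard; defaults to all key capabilities

def tap(key):
    """Press and release a single key."""
    ui.write(e.EV_KEY, key, 1)   # key down
    ui.write(e.EV_KEY, key, 0)   # key up
    ui.syn()                     # flush the events to the kernel

def combo(modifier, key):
    """Hold a modifier while tapping another key (e.g. Alt-Tab)."""
    ui.write(e.EV_KEY, modifier, 1)
    tap(key)
    ui.write(e.EV_KEY, modifier, 0)
    ui.syn()

tap(e.KEY_TAB)                    # plain Tab
combo(e.KEY_LEFTALT, e.KEY_TAB)   # Alt-Tab
ui.close()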

Architecture

  • listener-audio runs pacat to send raw audio to a named socket

  • a Docker container runs Mozilla DeepSpeech, hardware-accelerated by your host OS's (NVIDIA) graphics card

    • the container reads the audio from a pipe and reports results to a user-local event-socket (see the recognition sketch after this list)
  • a listener-interpreter process listens on the event socket and attempts to interpret the results according to the user's rules, and eventually custom language models and contextual biasing/hinting (think autocomplete)

  • a DBus service takes the results of the recognition and converts them to regular input to the (Linux) host operating system, using uinput for special character injection (think Alt-Tab, navigation and the like)
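
As a rough sketch of the recognition loop inside the container (the model filename, pipe path, chunk size and printed output are placeholders, not the project's actual event protocol), the core of it is reading 16 kHz, 16-bit mono samples from the pipe and feeding them into a DeepSpeech stream:

# Sketch of the recognition loop: raw 16 kHz 16-bit mono audio in,
# incremental transcripts out.  All paths are placeholders.
import numpy as np
import deepspeech

MODEL = 'deepspeech-0.9.3-models.pbmm'   # placeholder model filename
AUDIO_PIPE = '/tmp/listener/audio-pipe'  # placeholder named-pipe path

model = deepspeech.Model(MODEL)
stream = model.createStream()

with open(AUDIO_PIPE, 'rb') as pipe:
    while True:
        chunk = pipe.read(3200)  # 1600 samples = 0.1 s at 16 kHz
        if not chunk:
            break
        stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
        partial = stream.intermediateDecode()
        if partial:
            print('partial:', partial)   # the real service reports these to the event-socket

print('final:', stream.finishStream())   # final transcript for the utterance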

Quick Start

Since there is not yet a working graphical user interface, the setup is not as friendly as commercial voice dictation solutions.

sudo apt install $(cat dependencies.txt)
virtualenv -p python3 listener-env
source listener-env/bin/activate
pip install -r requirements.txt
# The following will download the (large) language model to cache
# before starting the docker container
listener-docker
# Feed raw audio into the recognition daemon
listener-audio &
# Use the default contexts (Note: need to make these available)
listener-default-contexts
# Interpret the raw recognition events as commands and text
listener-interpreter --context english-python -v &
# Send the commands and text to the Linux Desktop via IBus
listener-ibus &

Installation/Setup

See Installation Docs for full installation instructions...

Reference Docs for Devs

Research to Explore

