Background
This project was developed specifically for Lin Pei-Yao's solo exhibition Who is the speaker? (2025).
The exhibition required real-time recognition of selected spoken keywords, each triggering specific actions such as smart light control and video playhead control. The speech recognition runs locally on a Raspberry Pi.
For example, the command “Drink Tea” blinks one set of lights, seeks the video to a specific time (00:25), and jumps back to the original position after 10 seconds.
Different voice commands trigger different actions, and some of them may depend on each other.
To keep these concurrent events manageable, I used the Reactive Programming design pattern via RxPy.
Structure
The program is divided into three parts: Events, Commands, and Actions.
Events are the inputs to the system, including the microphone and WebSocket inputs. Each input is transformed into an Observable stream.
Actions are the output behaviours, including light control and video playhead control.
Commands are the business logic: they freely connect, compose, and mix the inputs to produce an output, and can be easily customised to the user's needs.
- Events (inputs)
- Microphone -> Vosk -> Keyword extraction
- WebSocket -> Current timecode
- Commands
- Define the pipeline logic for every command
- Written in Reactive Programming styles
- No hidden state management. Easy to update
- Actions (outputs)
- Light control -> LIFX LAN API
- Video playhead control -> HTTP request
Keywords Recognition
I used Vosk as the offline speech recognition model, because it is small enough to run on a Raspberry Pi.
Out of the box, the model's accuracy is not good for this task: it is designed as a general speech-to-text model, not for recognising specific keywords.
I customised the vocabulary list so the model only selects tokens that appear in the keyword list.
It is also important to include [unk] in the list, so that out-of-vocabulary speech decodes to [unk] instead of being forced onto one of the keywords.
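As a sketch of how this constraint can be set up: Vosk's `KaldiRecognizer` accepts a JSON list of phrases as a grammar, which restricts decoding to those phrases. The keyword list and model path below are illustrative, not the exhibition's actual configuration.

```python
import json

# Illustrative keyword list; the real exhibition keywords differ.
KEYWORDS = ["drink tea", "hello"]

def build_grammar(keywords):
    # Append [unk] so out-of-vocabulary speech decodes to [unk]
    # instead of being forced onto the nearest keyword.
    return json.dumps(keywords + ["[unk]"])

def make_recognizer(model_path, sample_rate=16000):
    # Deferred import so the module loads without Vosk installed.
    from vosk import Model, KaldiRecognizer
    model = Model(model_path)
    # Passing the grammar JSON as the third argument restricts
    # the recogniser to the listed phrases plus [unk].
    return KaldiRecognizer(model, sample_rate, build_grammar(KEYWORDS))
```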
Synchan Integration
The video playback system is Synchan, a multichannel, multi-device synchronised video player. It can be controlled via HTTP requests, and it pushes the current timecode to all clients via WebSocket. The timecode is parsed into an Observable stream and used to perform actions according to the video timecode.
For example, at the beginning of the video, the exhibition lights turn on at the same moment a light is turned on in the video. And for the command “Drink Tea”, the video seeks back to 00:25, where the performer asks “Would you like some tea?”, then returns to the original playhead after 10 seconds.
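The seek-and-return behaviour can be sketched as below. Synchan's actual HTTP endpoints are not shown here, so `seek_fn` is a hypothetical stand-in for whatever request performs the seek:

```python
import threading

def seek_and_return(seek_fn, current_pos, target=25.0, hold_seconds=10.0):
    """Jump the playhead to `target`, then restore `current_pos`
    after `hold_seconds`. `seek_fn` is any callable performing the
    actual seek (e.g. an HTTP request to Synchan)."""
    seek_fn(target)
    timer = threading.Timer(hold_seconds, seek_fn, args=[current_pos])
    timer.start()
    return timer  # caller may cancel() if a newer command supersedes this one
```

Returning the timer lets a later command cancel the pending jump-back, which matters when commands can depend on each other.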
Tech Stack
Gallery



Want to Try?
Currently available by invitation only. For inquiries, please contact [email protected]