Recognize speech in more than 15 languages without relying on any cloud service or subscription. Instead, recognition runs in a language server, a separate process on your machine that talks to your game. The language server app is publicly available ( https://github.com/IlgarLunin/vosk-language-server ): you can fork and customize it, distribute it with your game, and run it without any user interface.
The Unreal Engine client is a dead simple layer for communicating with the language server. It connects to the server, records your voice, and feeds the audio to it; the server sends the recognized speech back to Unreal as text.
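To make the record, feed, and receive loop concrete, here is a minimal Python sketch of the client side. It is not the plugin's actual code. It assumes the server replies with JSON shaped like Vosk's recognizer output (a "partial" key for in-progress results, a "text" key for final ones); whether the language server uses exactly this shape is an assumption. The `exchange` callable is a hypothetical stand-in for the real socket: it sends one audio chunk and returns the server's reply.

```python
import json
from typing import Callable, Iterator, List


def pcm_chunks(pcm: bytes, chunk_bytes: int = 8000) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def stream_audio(
    pcm: bytes,
    exchange: Callable[[bytes], str],  # hypothetical transport: chunk in, JSON reply out
    chunk_bytes: int = 8000,
) -> List[str]:
    """Feed audio to the server chunk by chunk and collect final results."""
    recognized = []
    for chunk in pcm_chunks(pcm, chunk_bytes):
        reply = json.loads(exchange(chunk))
        # Assumed message shape: replies carrying only "partial" are
        # in-progress guesses and are skipped; "text" marks a finished phrase.
        text = reply.get("text", "")
        if text:
            recognized.append(text)
    return recognized
```

In a game, each finished phrase would be handed to gameplay code as soon as it arrives rather than collected into a list.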
Recognition is streaming, so you can implement simple conversations with your NPCs using no input other than voice: "Ok robot, do this", "Ok robot, do that", etc.
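One way to turn recognized text into "Ok robot" style commands is to look for a wake phrase and treat whatever follows it as the command. The Python sketch below shows only that matching logic. The function name `extract_command` and the constant `WAKE_PHRASE` are hypothetical and not part of the plugin; in a real project the same check would live in your Blueprint or C++ handler for recognized text.

```python
from typing import Optional

WAKE_PHRASE = "ok robot"  # hypothetical default, taken from the examples above


def extract_command(recognized: str, wake_phrase: str = WAKE_PHRASE) -> Optional[str]:
    """Return the words spoken after the wake phrase, or None if there is no command."""
    words = recognized.lower().split()
    wake = wake_phrase.lower().split()
    # Slide over the recognized words looking for the wake phrase as a
    # run of consecutive words, so leading filler words are tolerated.
    for i in range(len(words) - len(wake) + 1):
        if words[i:i + len(wake)] == wake:
            command = " ".join(words[i + len(wake):])
            return command or None  # wake phrase alone is not a command
    return None
```

Matching whole words rather than substrings avoids false triggers on words that merely contain the wake phrase.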
Download latest language server: https://github.com/IlgarLunin/vosk-language-server/releases
Video demonstration: https://youtu.be/iJVCsuuC5A4
Example project for Unreal 4.27: here
Example project for Unreal 5.1: here