Your voice assistant commands
We are part of a startup building a living-space management system with access control for permitted premises. The project includes a voice control system with user verification elements. Our objective was to deliver a prototype quickly, one whose functionality could later be expanded. The prototype had to support basic spoken commands and offer a simple interface for managing interaction scenarios.
Amazon Echo, a smart speaker developed by Amazon.com, has gained considerable popularity in recent years. Once the wake word is spoken, the device records the user's speech and sends it to the cloud, where Amazon Alexa, Amazon's intelligent personal assistant, analyzes it and reacts.
Amazon Alexa has a well-documented API that independent developers can use in their applications.
To master interaction with Alexa, our team developed a small script that controls a prototype system in which emulators stand in for the devices that interact with the environment.
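A prototype of this kind can route recognized commands to software emulators instead of real hardware. The following is a minimal sketch, not our actual script: the classes `LightEmulator`, `LockEmulator`, and `ControlSystem`, the action names, and the per-user permission check are all hypothetical, chosen only to illustrate how device emulators and access control might fit together.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Device(Protocol):
    """Common interface every (hypothetical) device emulator implements."""

    def apply(self, action: str, user: str) -> str: ...


@dataclass
class LightEmulator:
    """Hypothetical emulator standing in for a real lighting controller."""
    is_on: bool = False

    def apply(self, action: str, user: str) -> str:
        # Lighting is not access-controlled in this sketch, so `user` is ignored.
        if action not in ("on", "off"):
            raise ValueError(f"unsupported light action: {action!r}")
        self.is_on = action == "on"
        return f"light turned {action}"


@dataclass
class LockEmulator:
    """Hypothetical emulator for a door lock guarding restricted premises."""
    allowed_users: set[str] = field(default_factory=set)
    locked: bool = True

    def apply(self, action: str, user: str) -> str:
        if action not in ("lock", "unlock"):
            raise ValueError(f"unsupported lock action: {action!r}")
        # Access control: only permitted users may unlock the door.
        if action == "unlock" and user not in self.allowed_users:
            return f"access denied for {user}"
        self.locked = action == "lock"
        return f"door {action}ed"


class ControlSystem:
    """Routes named commands to registered device emulators."""

    def __init__(self) -> None:
        self.devices: dict[str, Device] = {}

    def register(self, name: str, device: Device) -> None:
        self.devices[name] = device

    def execute(self, name: str, action: str, user: str) -> str:
        device = self.devices.get(name)
        if device is None:
            return f"unknown device {name}"
        return device.apply(action, user)


if __name__ == "__main__":
    system = ControlSystem()
    system.register("hall light", LightEmulator())
    system.register("office door", LockEmulator(allowed_users={"alice"}))
    print(system.execute("hall light", "on", "bob"))         # light turned on
    print(system.execute("office door", "unlock", "bob"))    # access denied for bob
    print(system.execute("office door", "unlock", "alice"))  # door unlocked
```

Keeping every emulator behind the same `apply(action, user)` method means real device drivers could later replace the emulators without changing the dispatch code.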
Speech recognition is a non-trivial problem that many vendors face. Its challenges include, in particular, extraneous noise and other voices in the background. We evaluated a few APIs but found their recognition results unsatisfactory, so we decided to use Amazon Transcribe, Amazon's well-known speech-to-text service, which relies on deep learning for automatic speech recognition (ASR).
In this pipeline, the user's speech is passed through Amazon Alexa and converted into text, from which intents are identified; our script receives these intents and processes them. The script analyzes each intent, prepares a response for the user, and passes the corresponding command to the control system. The response is then converted back into speech and sent to the user's device as confirmation feedback.
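The intent-handling step can be sketched as a plain Python function that receives the JSON request Alexa sends to a custom skill (for example, as the `event` passed to an AWS Lambda function) and returns a JSON response carrying the text to be spoken. By the time the request arrives, speech recognition and intent extraction have already happened, so the script only sees an intent name and slot values. This is a minimal sketch assuming the standard Alexa custom-skill request/response format; the intent names `TurnOnIntent`/`TurnOffIntent`, the `Device` slot, and the function names are hypothetical and would have to match the skill's own interaction model. The official ASK SDK for Python provides handler abstractions for the same job.

```python
# Hypothetical mapping from custom intent names (defined in the skill's
# interaction model) to device actions; these names are illustrative only.
INTENT_ACTIONS = {
    "TurnOnIntent": "on",
    "TurnOffIntent": "off",
}


def speech_response(text: str, end_session: bool = True) -> dict:
    """Build a reply in the Alexa custom-skill JSON response format."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: dict) -> dict:
    """Dispatch an incoming Alexa request envelope by type and intent name."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # Keep the session open so the user can name a device right away.
        return speech_response("Which device should I control?", end_session=False)
    if request.get("type") != "IntentRequest":
        return speech_response("Sorry, I cannot handle that request.")

    intent = request.get("intent", {})
    action = INTENT_ACTIONS.get(intent.get("name"))
    if action is None:
        return speech_response("Sorry, I did not understand that command.")

    # Slot values arrive as already-recognized text; a slot may lack "value".
    device = intent.get("slots", {}).get("Device", {}).get("value")
    if not device:
        return speech_response("Which device do you mean?", end_session=False)

    # In the prototype, this is where the command would be forwarded to the
    # control system; here the sketch only confirms it back to the user.
    return speech_response(f"Turning {action} the {device}.")


if __name__ == "__main__":
    event = {
        "version": "1.0",
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "TurnOnIntent",
                "slots": {"Device": {"name": "Device", "value": "hall light"}},
            },
        },
    }
    print(handle_request(event)["response"]["outputSpeech"]["text"])
    # Turning on the hall light.
```

Returning `shouldEndSession: false` for incomplete commands lets Alexa keep listening for a follow-up answer instead of closing the conversation.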
Besides Amazon services, we considered integrating other solutions, such as Google Nest smart speakers (formerly Google Home) powered by Google Assistant, or Yandex's Alice personal assistant. However, since Russian language support was not a requirement, we opted for Alexa, a system with multilingual support.