Controlling Games with Your Eyes

This case study describes the development of a product that uses eye- and head-tracking technology for remote device control. By leveraging image recognition methods, we aim to improve the user experience, particularly in terms of latency and positioning accuracy.

Business Challenge

The main challenge is to launch a unique product that combines simple game mechanics with remote device control that requires no tactile input. By targeting a loyal gaming audience, the project also aims to advance image recognition methods. Gaming was chosen as the application area because it is the most demanding in terms of latency and positioning accuracy.

Potential applications of the technology span multiple areas:

UI design for people with special needs
navigation in apps and games
psychological and medical research
marketing and usability
security and monitoring systems (cars and traffic control)

The research also covers new approaches to app navigation. The study showed that only 15% of users operate a mobile device with both hands, and almost half of smartphone users often hold the phone in their palm and interact with the interface using only one hand. We wanted to show that head positioning and eye tracking can be used successfully for app navigation.

Solution Overview

Industry leaders are exploring alternative ways to control personal devices; Google, for example, offers voice control through Google Assistant. Our team focused on camera-based control. Face recognition is well studied, and several fast algorithms are available. We set out to verify in practice what studies of head positioning and eye tracking suggest about app control, and used a game prototype to demonstrate hands-free app control.

Solution features:

runs on consumer-grade hardware
does not require additional training
works universally regardless of localization
insensitive to external factors except lighting
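Because lighting is the one external factor the system remains sensitive to, a pipeline like this could check each frame before running the trackers. The following is a minimal sketch under stated assumptions: the function name `classifyLighting`, the plain byte buffer standing in for a grayscale frame, and both thresholds are hypothetical. In an OpenCV pipeline the mean could instead be taken with `cv::mean` on the grayscale `cv::Mat`.

```cpp
#include <cstdint>
#include <vector>

// Coarse lighting verdict for one frame.
enum class Lighting { TooDark, Ok, TooBright };

// Hypothetical pre-tracking check: classifies a grayscale frame (one byte
// per pixel) by its mean intensity on the 0-255 scale. The default
// thresholds are illustrative assumptions, not values from the project.
Lighting classifyLighting(const std::vector<std::uint8_t>& gray,
                          double darkThreshold = 40.0,
                          double brightThreshold = 220.0) {
    if (gray.empty()) {
        return Lighting::TooDark;  // no pixels: treat the frame as unusable
    }
    double sum = 0.0;
    for (std::uint8_t px : gray) {
        sum += px;
    }
    const double mean = sum / static_cast<double>(gray.size());
    if (mean < darkThreshold) return Lighting::TooDark;
    if (mean > brightThreshold) return Lighting::TooBright;
    return Lighting::Ok;
}
```

A frame flagged as too dark or too bright could be skipped, or the player prompted to adjust the lighting, instead of passing unreliable measurements on to the game.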

Technical Details

The project was implemented as a game prototype that receives a video stream of the player's face from the camera. Individual frames were analyzed to determine the head position and the point of gaze, and these parameters were used to generate a command that was passed to the main game engine.

The open-source OpenCV library served as the base framework. OpenCV is a well-documented computer vision library with more than 2500 algorithms. It is cross-platform, so an app can be built for both desktop (Windows, macOS, Linux) and mobile (Android, iOS) operating systems. Bindings are also available for many programming languages, such as Java, Python, and the .NET languages.
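The step that turns per-frame measurements into game commands can be sketched as follows. This is a hypothetical illustration, not the project's code: the structs `HeadPose` and `GazePoint`, the `Command` set, the 10-degree dead zone, and the use of gaze to pick a cell in a grid of on-screen targets are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical per-frame measurements; names and units are assumptions.
struct HeadPose {
    double yawDeg;    // positive = head turned to the player's right
    double pitchDeg;  // positive = head tilted up
};

struct GazePoint {
    double x;  // normalized: 0 = left edge of the screen, 1 = right edge
    double y;  // normalized: 0 = top edge of the screen, 1 = bottom edge
};

enum class Command { None, Left, Right, Up, Down };

// Converts one frame's head pose into a discrete movement command.
// Rotations inside the dead zone are ignored so natural jitter does not
// trigger input; the 10-degree default is an illustrative guess.
Command headPoseToCommand(const HeadPose& pose, double deadZoneDeg = 10.0) {
    const double absYaw = std::fabs(pose.yawDeg);
    const double absPitch = std::fabs(pose.pitchDeg);
    if (absYaw < deadZoneDeg && absPitch < deadZoneDeg) {
        return Command::None;
    }
    // The dominant axis wins, so a diagonal tilt yields a single command.
    if (absYaw >= absPitch) {
        return pose.yawDeg > 0 ? Command::Right : Command::Left;
    }
    return pose.pitchDeg > 0 ? Command::Up : Command::Down;
}

// Maps a normalized gaze point to a cell index in a cols x rows grid of
// on-screen targets (row-major), clamping points just outside the screen.
int gazeToCell(const GazePoint& gaze, int cols, int rows) {
    const double x = std::min(1.0, std::max(0.0, gaze.x));
    const double y = std::min(1.0, std::max(0.0, gaze.y));
    const int col = std::min(static_cast<int>(x * cols), cols - 1);
    const int row = std::min(static_cast<int>(y * rows), rows - 1);
    return row * cols + col;
}
```

The dead zone trades a little responsiveness for stability: small involuntary head movements produce no command, which matters in a game where a false input is costly.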

The solution uses LBF (Local Binary Features), one of the leading facial landmark detection methods; we also considered the ERT (Ensemble of Regression Trees) method from the dlib library.
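Both LBF and ERT are face alignment methods: they output facial landmark coordinates rather than a head pose or gaze direction directly. The sketch below is a hypothetical illustration of how landmarks could feed the head-position and gaze estimates. It assumes the widely used 68-point annotation scheme (0-based jaw-line ends at indices 0 and 16, nose tip at 30, eye corners at 36/39 and 42/45) and a pupil centre found by a separate step, since that scheme has no pupil landmark; the function names are made up for the example.

```cpp
#include <algorithm>

// A 2D landmark in image pixel coordinates.
struct Point2 {
    double x;
    double y;
};

// Position of the pupil projected onto the line between two eye corners:
// 0 = at cornerA, 1 = at cornerB, about 0.5 = looking straight ahead.
double horizontalGazeRatio(Point2 cornerA, Point2 cornerB, Point2 pupil) {
    const double dx = cornerB.x - cornerA.x;
    const double dy = cornerB.y - cornerA.y;
    const double len2 = dx * dx + dy * dy;
    if (len2 == 0.0) {
        return 0.5;  // degenerate landmarks: report the centre
    }
    const double t =
        ((pupil.x - cornerA.x) * dx + (pupil.y - cornerA.y) * dy) / len2;
    return std::min(1.0, std::max(0.0, t));
}

// Rough head-turn score in [-1, 1] from the nose tip's horizontal position
// between the two jaw-line end points; 0 means a frontal face.
double headTurnScore(Point2 jawLeft, Point2 jawRight, Point2 noseTip) {
    const double width = jawRight.x - jawLeft.x;
    if (width == 0.0) {
        return 0.0;
    }
    const double r = (noseTip.x - jawLeft.x) / width;
    return std::min(1.0, std::max(-1.0, 2.0 * r - 1.0));
}
```

Both values are ratios measured within the face itself, so they do not change with how far the player sits from the camera.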

Technology Stack

C++
OpenCV
Haar cascade classifiers
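Haar cascade classifiers, available in OpenCV as `cv::CascadeClassifier`, are a common choice for fast face detection ahead of landmark fitting. Their speed comes from evaluating Haar-like rectangle features on an integral image, where any rectangle sum costs four lookups. The following is a minimal from-scratch sketch of that idea, not OpenCV's implementation; the class and function names are hypothetical, and OpenCV itself provides `cv::integral` for the first step.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Summed-area table with an extra leading row and column of zeros:
// entry (x, y) holds the sum of all pixels with column < x and row < y.
class IntegralImage {
public:
    IntegralImage(const std::vector<std::uint8_t>& gray, int width, int height)
        : stride_(width + 1),
          sums_(static_cast<std::size_t>(width + 1) * (height + 1), 0LL) {
        for (int y = 0; y < height; ++y) {
            long long rowSum = 0;  // sum of row y from column 0 to x
            for (int x = 0; x < width; ++x) {
                rowSum += gray[static_cast<std::size_t>(y) * width + x];
                at(x + 1, y + 1) = at(x + 1, y) + rowSum;
            }
        }
    }

    // Sum of pixels in [x, x + w) x [y, y + h), in constant time.
    long long rectSum(int x, int y, int w, int h) const {
        return get(x + w, y + h) - get(x, y + h) - get(x + w, y) + get(x, y);
    }

private:
    long long& at(int x, int y) {
        return sums_[static_cast<std::size_t>(y) * stride_ + x];
    }
    long long get(int x, int y) const {
        return sums_[static_cast<std::size_t>(y) * stride_ + x];
    }

    int stride_;
    std::vector<long long> sums_;
};

// Two-rectangle "edge" Haar-like feature over a w x h window at (x, y):
// sum of the left half minus sum of the right half (w assumed even).
long long twoRectEdgeFeature(const IntegralImage& ii, int x, int y, int w, int h) {
    const int half = w / 2;
    return ii.rectSum(x, y, half, h) - ii.rectSum(x + half, y, half, h);
}
```

Because every rectangle sum costs the same four lookups regardless of its size, a detector can evaluate many such features per window quickly, which is what makes cascade-based face detection practical in real time.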