Video in the smart home: The problem or the solution?

Don’t you wish your home were truly efficient and ‘smart’? The current vision of the smart home is a collection of networked devices, each operating essentially independently with its own app, with no sense of who the occupants are, what their behaviours, habits or authority might be, or what needs to be done to accommodate them.

Your house operates much the same as everyone else’s. It’s only you and your family that make the house into a home. In our everyday lives we communicate through natural and controlling movements and gestures. These behaviours are subtle, accurate and predictable, and we derive actions from them.

Without this philosophy at the heart of smart homes, we will remain stuck with shouting at a few clever devices that might connect with each other but not us. The solution isn’t trivial – it’s radical and it’s arriving.

What we want is for the devices in our homes – everything from lighting, heating and security to entertainment and communication systems – to understand and adapt to us in real time, learning our behaviours as they go.

>See also: Smart homes and cities – when will it happen?

Current so-called ‘smart’ solutions in the marketplace are based on infrared motion sensors, voice recognition, programmable timers and so on, but they sit as lonely islands, unable to communicate intuitively with the residents they serve. Moreover, they are not accurate or predictable enough for reliable control.

A passive infrared (PIR) sensor can detect a change in the room, but not necessarily a change that’s due to a person, let alone who or where they are, or how they move in a room, and false triggers when Bozo the dog is around are likely.

It’s unnatural to shout at a light to come on – and, in reality, voice recognition is still immature. Gesture recognition devices are constrained to narrow use cases and don’t mirror how people tend to react in their environments.

Wearables might form part of the answer, but requiring them to be worn continuously at home seems unlikely to gain acceptance. Therefore, if we accept that putting people, doing normal things, at the centre of our smart home is the way forward, how do we do it?

Our primary sense of someone’s behaviour is visual, so an immediate and obvious solution is to use video captured with cameras connected to massive analytical servers in the cloud. This could be regarded as a natural extension of home monitoring and/or baby cameras, which are already finding acceptance within the home.

Video is certainly rich enough to contain useful data about our behaviour, but it contains too much information, is expensive to store and transmit, and requires heavy computational analysis that can’t be done quickly enough. In fact, using video in this way is totally unscalable and would stall the internet. Moreover, do we really want surveillance in our houses and all the privacy issues it creates?

So video itself poses a problem of bandwidth and privacy, but therein also lies the solution. All we need to know is people’s movements and gestures, and their combinations – not that the piece of art and the furniture are still in the same place.

If only we could extract the information that is the essence of our identities and behaviours in real time. What is surprising is that we can, and in the process reduce the bandwidth of the system from around 5 gigapixels/sec (raw video) to around 50kbit/sec of virtualised data – an effective compression of 100,000:1.
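As a rough back-of-the-envelope check – a sketch using only the figures quoted above, and counting raw pixels against output bits – the arithmetic behind that ratio looks like this:

```python
# Back-of-the-envelope check of the claimed data reduction,
# using the figures quoted in the article (assumed, not measured here).

raw_pixels_per_sec = 5e9          # ~5 gigapixels/sec of raw video
virtualised_bits_per_sec = 50e3   # ~50 kbit/sec of virtualised behaviour data

# Comparing raw pixel count with output bits gives the quoted ratio.
reduction = raw_pixels_per_sec / virtualised_bits_per_sec
print(f"Effective reduction: {reduction:,.0f}:1")   # -> 100,000:1

# Measured bit-for-bit (e.g. 24-bit colour pixels) the factor is far higher.
print(f"Bit-for-bit (24 bpp): {raw_pixels_per_sec * 24 / virtualised_bits_per_sec:,.0f}:1")
```

However the figure is counted, the point stands: behaviour data is tiny compared with the pictures it is derived from.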

Here’s the new philosophy: forget about video capture – extract all the information that you need at source. Think of it as the spirit or essence of what is happening. If we can digitise the people and the context, we can also extract the spirit of the picture and distil the core behaviours of what, exactly, is going on in the scene.

When we have behaviours and activity, very small but rich shareable data streams can be applied to the control functions of our home’s machines and processes. To be useful, we need to extract and combine the four elements of identity, pose, trajectory and gesture to understand the meaning of movements around the house. Over time, capabilities will evolve, with even mood recognition being incorporated.

It means figuring out a way to digitise ourselves – our virtual spirit, if you will – to create in real time a software model of us with just the essential ingredients to define us, and to extract our movements and gestures – yet do this without a video stream.

The answer lies in a process that uses visual sensors to provide a pixel stream to an engine alongside the sensor (at the edge), not in a downstream server. The sensor and engine virtualise the people in a given scene, generate data on the four behavioural elements, and share that information with any (or all) connected devices and systems.
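To make the idea concrete, here is a minimal, hypothetical sketch – not Apical’s actual data format – of the kind of compact event an edge engine might emit instead of video, carrying only the four behavioural elements described above:

```python
from dataclasses import dataclass, asdict
from enum import Enum
import json, time

class Gesture(Enum):
    NONE = "none"
    WAVE = "wave"
    POINT_UP = "point_up"
    POINT_DOWN = "point_down"

@dataclass
class BehaviourEvent:
    """One compact 'virtualised person' update emitted at the edge.

    Hypothetical schema: only the four behavioural elements plus a
    timestamp are carried -- no pixels ever leave the sensor."""
    timestamp: float
    identity: str        # e.g. "adult_1", "child_2" (anonymous label)
    pose: str            # e.g. "standing", "sitting", "lying"
    trajectory: tuple    # (x, y) position in metres and heading in degrees
    gesture: Gesture

def publish(event: BehaviourEvent) -> str:
    """Serialise the event for sharing with connected devices.

    A real system might push this over a local bus; here we simply
    return the JSON payload (a few hundred bytes at most)."""
    payload = asdict(event)
    payload["gesture"] = event.gesture.value
    return json.dumps(payload)

if __name__ == "__main__":
    evt = BehaviourEvent(
        timestamp=time.time(),
        identity="adult_1",
        pose="standing",
        trajectory=(2.4, 1.1, 90.0),
        gesture=Gesture.POINT_UP,
    )
    print(publish(evt))
```

At a few such events per second, a stream like this sits comfortably within the ~50kbit/sec envelope mentioned earlier, while still telling connected devices who is doing what, and where.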

>See also: Why the Internet of Things is more than just a smart fridge

It neither creates nor distributes any video that could be watched by anyone – no chance of the accidental YouTube appearance – while bringing people-driven, accurate and predictable solutions to the smart home.

Imagine a home that adapts lighting and heating to your personal habits, without you ever having to turn a knob or adjust an app; one which can distinguish between a control gesture made by a parent and one made by a child, and which knows if a toddler is at the top of the stairs or an elderly person is lying on the floor. What is needed is the combination of known behaviours, appropriate responses and, critically, real-time behaviour detection.
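As a purely illustrative sketch – the device names, identity labels and thresholds here are invented, not drawn from any real product – behaviour events like the one above could drive exactly those kinds of household rules:

```python
# Illustrative only: hypothetical rules mapping behaviour events to actions.

def decide_actions(event) -> list[str]:
    """Turn one BehaviourEvent (see the earlier sketch) into device commands."""
    actions = []

    # Safety rules take priority over comfort and control gestures.
    if event.identity.startswith("child") and near_stairs(event.trajectory):
        actions.append("stair_gate:close")
        actions.append("parent_phone:alert")
    if event.pose == "lying" and event.identity.startswith("elderly"):
        actions.append("carer_phone:alert")

    # Only adults may use gestures as controls.
    if event.identity.startswith("adult"):
        if event.gesture.value == "point_up":
            actions.append("lights:brighten")
        elif event.gesture.value == "point_down":
            actions.append("lights:dim")

    return actions

def near_stairs(trajectory, stairs_xy=(0.0, 3.0), radius=1.0) -> bool:
    """Crude proximity check against an assumed stair location (metres)."""
    x, y = trajectory[0], trajectory[1]
    return (x - stairs_xy[0]) ** 2 + (y - stairs_xy[1]) ** 2 <= radius ** 2

# Example, reusing the event from the earlier sketch:
#   decide_actions(evt)  ->  ['lights:brighten']
```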

This future lies in private, predictable and accurate detection of human behaviour. With the new philosophy of human digital virtualisation and distillation – creating, in essence, an ‘Internet of Behaviour’ – a truly responsive smart home is possible for the first time.

 

Sourced from Michael Tusch, CEO, Apical
