How multimodal learning is set to transform AI

Tue, 15th Oct 2019

The total installed base of devices with Artificial Intelligence (AI) will grow from 2.7 billion in 2019 to 4.5 billion in 2024, according to a forecast from global tech market advisory firm ABI Research.

There are billions of petabytes of data flowing through these AI devices every day; the challenge now facing both technology companies and implementers is getting all these devices to learn, think, and work together.

According to a recent whitepaper from ABI Research, Artificial Intelligence Meets Business Intelligence, multimodal learning is the key to making this happen, and it's fast becoming one of the most exciting — and potentially transformative — fields of artificial intelligence.

“Multimodal learning consolidates disconnected, heterogeneous data from various sensors and data inputs into a single model,” says ABI Research chief research officer Stuart Carlaw.

“Learning-based methods that combine signals from different modalities can generate more robust inference, or even new insights, which would be impossible in a unimodal system.”
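To make the idea of combining signals from different modalities concrete, here is a minimal Python sketch of one simple approach, often called late fusion. It is an illustration only, not ABI Research's method: the function name `late_fusion`, the two example modalities and their probability values are all hypothetical, and the sketch assumes the modalities are conditionally independent given the class, with equal prior probability for each class.

```python
from math import prod


def late_fusion(per_modality_probs):
    """Fuse class probabilities from several modalities.

    Hypothetical helper: multiplies the per-class probabilities reported
    by each modality and renormalises so the result sums to 1. This
    assumes conditionally independent modalities and a uniform class prior.
    """
    n_classes = len(per_modality_probs[0])
    if any(len(p) != n_classes for p in per_modality_probs):
        raise ValueError("every modality must score the same set of classes")

    # Unnormalised joint score for each class across all modalities.
    joint = [prod(p[c] for p in per_modality_probs) for c in range(n_classes)]
    total = sum(joint)
    if total == 0:
        raise ValueError("modalities assign zero probability to every class")
    return [score / total for score in joint]


# Made-up scores for two classes, e.g. ("person speaking", "background noise").
audio_probs = [0.6, 0.4]  # what a hypothetical audio-only model reports
video_probs = [0.7, 0.3]  # what a hypothetical camera-only model reports

fused = late_fusion([audio_probs, video_probs])
print(fused)
```

When both modalities lean the same way, the fused probability for that class ends up higher than either model reports alone, which is one way to read "more robust inference". Systems that feed all inputs into a single model, as the quote describes, typically fuse earlier, at the feature level, rather than at the output.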

Multimodal learning is well placed to scale, as underlying supporting technologies like Deep Neural Networks (DNNs), a giant leap forward over rules-based software, have already done so in unimodal applications like image recognition in camera surveillance or voice recognition and Natural Language Processing (NLP) in virtual assistants like Amazon's Alexa.

At the same time, organisations are recognising the need for multimodal learning to manage and automate processes that span the entirety of their operations. Given these factors, ABI Research estimates that the total number of devices shipped with multimodal learning applications will grow from 3.9 million in 2017 to 514 million in 2023.
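For readers who want to compare the forecasts quoted in this article, the short Python sketch below works out the compound annual growth rate each one implies. The `cagr` helper is a hypothetical name introduced here, and the rates are derived from the quoted start and end figures rather than published by ABI Research. The two series also measure different things (installed base versus annual shipments) over different periods, so the comparison is only indicative.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by a start value, end value and span."""
    return (end / start) ** (1 / years) - 1


# Installed base of AI devices: 2.7 billion (2019) to 4.5 billion (2024).
installed_base_rate = cagr(2.7e9, 4.5e9, 2024 - 2019)

# Devices shipped with multimodal learning: 3.9 million (2017) to 514 million (2023).
multimodal_rate = cagr(3.9e6, 514e6, 2023 - 2017)

print(f"AI installed base: {installed_base_rate:.1%} per year")
print(f"Multimodal shipments: {multimodal_rate:.1%} per year")
```

On these figures, the multimodal forecast implies shipments more than doubling every year, against growth of roughly a tenth per year for the overall AI installed base.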

“There is impressive momentum driving multimodal applications into devices, with five key end-market verticals most aggressively adopting multimodal learning: automotive, robotics, consumer, healthcare, and media and entertainment,” Carlaw adds.

In the automotive space, multimodal learning is being introduced to Advanced Driver Assistance Systems (ADAS), In-Vehicle Human Machine Interface (HMI) assistants and Driver Monitoring Systems (DMSs) for real-time inferencing and prediction.

Robotics vendors are incorporating multimodal learning systems into robotics HMIs and movement automation to broaden consumer appeal and provide greater collaboration between workers and robots in the industrial space.

Consumer device companies, particularly those in the smartphone and smart home markets, are competing intensely to demonstrate the value of their products over those of their competitors. New features and refined systems are critical to gaining a marketing edge, making consumer electronics companies good candidates for building multimodal learning-enabled systems into their products. Growing use cases include security and payment authentication, recommendation and personalisation engines, and personal assistants.

Medical companies and hospitals are still relatively early in their exploration of multimodal learning techniques, but there are already some promising emerging applications in medical imaging. The value of multimodal learning to patients and doctors will be difficult for health services to resist, even if adoption is initially slow.

Media and entertainment companies are already using multimodal learning to help with structuring their content into labelled metadata, so they can improve content recommendation systems, personalised advertising, and automated compliance marking. So far, deployments of metadata tagging systems have been limited, as the technology has only recently been made available to the industry.

“The most extensive application of multimodal learning today is for behaviour and language modelling in smartphones. Classification, decision-making, and HMI systems are going to play a significant role in driving adoption of multimodal learning, providing a catalyst to refine and standardise some of the technical approaches,” Carlaw says.