
Why businesses need to gain better insights from customers’ photos, videos and audio

24 November 2022

The article at a glance

Marketers are not adequately tapping audio and visual (AV) data posted on TikTok, Facebook and other platforms that can be more revealing than reviews or ratings, says a new study co-authored by Shasha Lu of Cambridge Judge Business School.

Dr Shasha Lu

Business marketers pay close attention to comments, good and bad, posted online by customers ranging from book buyers (“great read”) to moviegoers (“action packed”) to travellers (“avoid this tourist trap”). Yet businesses are not adequately analysing the huge surge in audio and visual data generated by consumers, which can be even more revealing than reviews or ratings, says a new study.

With the ability to take a photo or hit record on your mobile phone, and the popularity of digital platforms such as TikTok, Facebook and YouTube, more and more images, video and audio are being created every day. But businesses have taken an uncoordinated approach to using this AV data, so the new study recommends ways marketers can better tap audio and visual analytics to improve business practices in communication, decision making, employee recruitment and other areas.

“Businesses are missing out on the opportunities to use this information to gain better customer insights, understand customer preference, improve customer experience, discover unmet needs and optimise marketing effectiveness”, says the research published in the journal Foundations and Trends in Marketing.

“In one minute, there are 700,000 hours of videos watched and 500 hours of videos uploaded on YouTube, 243,000 photos uploaded on Facebook and 1 million swipes, and 400,000 hours of music listened to on Spotify.”

The opportunities offered by AV data analytics

It’s almost impossible for human beings to process this amount of information. “With the increasing use of online channels in the business sector, more and more customer interactions happen in an environment where firms have less control. The access to AV data and analytical tools not only gives the firms ‘eyes’ and ‘ears’ but also ‘keys’ to unlock the benefits of analytics-based decision making,” says study co-author Shasha Lu, Associate Professor in Marketing at Cambridge Judge Business School.

The main objective of AV data analytics is to convert audio or visual data into a structured form in order to extract useful information. Digitalised audio and video signals can include firms’ marketing-mix designs, such as product, packaging and brand visuals, promotional and social media content, and store layout. The images and videos consumers share about their profiles, experiences and thoughts on social media platforms such as Instagram, YouTube, LinkedIn, Tumblr and Flickr are also particularly valuable. For example, the images or video posted by a customer about their experience in a hotel can be more revealing about their preferences than a rating alone.
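The core idea of converting unstructured pixels into a structured record can be sketched in a few lines. This toy example is illustrative only, not from the study: the “photo” is a handful of RGB tuples and the two feature names are invented stand-ins for the far richer features real computer-vision models would extract.

```python
def extract_features(pixels):
    """Map raw RGB pixels to a structured feature record a marketer could aggregate."""
    n = len(pixels)
    # average brightness across pixels (simple mean of the three channels)
    brightness = sum(sum(p) / 3 for p in pixels) / n
    # share of warm-toned pixels (red channel dominates blue)
    warm = sum(1 for (r, g, b) in pixels if r > b) / n
    return {"avg_brightness": round(brightness, 1), "warm_ratio": round(warm, 2)}

# Toy 2x2 "hotel photo": two warm (reddish) pixels, two cool (bluish) pixels
photo = [(200, 120, 80), (210, 130, 90), (40, 60, 150), (50, 70, 160)]
print(extract_features(photo))  # → {'avg_brightness': 113.3, 'warm_ratio': 0.5}
```

Once every photo is reduced to a structured record like this, standard analytics (aggregation, segmentation, prediction) can be applied at scale, which is exactly what raw pixels do not allow.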

Analysing data using Artificial Empathy

One framework for analysing AV data is Artificial Empathy (AE). Empathy is challenging for automated systems to replicate because a key characteristic is understanding the internal states of others without any explicit explanations from them. Humans use the facial expressions, body language, voice and words emitted by others to assess their emotions, feelings and thoughts, based on observation, memory, knowledge and reasoning. In business interactions, the ability to make these interpersonal inferences is critical, especially as it has been found that customer perception of the emotional states of salespeople or service providers can have a significant impact on purchase intentions and customer satisfaction levels.

How Artificial Empathy works

“At the centre of the Artificial Empathy (AE) framework is the human brain which processes input signals (bottom-up information processing) and regulates cognitive, emotional and behavioural responses (output signals) based on the information it inferred from the signals (top-down information processing)”, the research explains. “The object of AE models is to predict or infer the true internal states of an individual either based on the input signals (ex-ante prediction) received by the individual or based on the output signals (ex-post inference) emitted by the individual.”

The ex-ante prediction focuses on the top-down information processing and aims to predict how a person would respond to different signals, such as how a consumer would respond to the voice in an advertisement. The ex-post inference focuses on bottom-up information processing and aims to infer the focal person’s internal state from the signals emitted by them, such as deciphering whether a consumer is happy or not from his or her voice on the phone.
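The two directions can be made concrete with a minimal sketch. The rule-based functions below are hypothetical stand-ins for what would in practice be trained machine-learning models; the signal names (`voice_warmth`, `pitch_variability`) and thresholds are invented for illustration.

```python
def ex_ante_predict(ad_signal):
    """Ex-ante (top-down): predict how a consumer would respond to an input
    signal, such as the voice style of an advertisement."""
    return "positive" if ad_signal["voice_warmth"] > 0.5 else "neutral"

def ex_post_infer(output_signal):
    """Ex-post (bottom-up): infer a consumer's internal state from signals
    they emit, such as their voice on a phone call."""
    return "happy" if output_signal["pitch_variability"] > 0.3 else "unhappy"

# A warm advertising voice is predicted to elicit a positive response...
print(ex_ante_predict({"voice_warmth": 0.8}))      # → positive
# ...while a flat customer voice on a call suggests dissatisfaction.
print(ex_post_infer({"pitch_variability": 0.1}))   # → unhappy
```

The distinction matters in practice: ex-ante models help firms design content before release, while ex-post models help them monitor and respond to customers in real time.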

The distinction between static and dynamic internal states is also important in the context of AE because it determines the sources of data and external validity of AE models. “An example of dynamic empathy is when a person or computer model attempts to infer a consumer’s emotional states from his/her transient voice or facial expressions,” the study says. “An example of static empathy is when a person or computer model attempts to infer how trustworthy a person is from his/her non-transient facial appearance.”

How AV analytics can benefit business

The study finds the use of AV analytics benefits businesses in at least four ways:

  1. Improving the effectiveness of communication. Firms can develop ex-ante artificial empathy models to predict and optimise customers’ responses to marketing content. With the audio and visual information of customer responses, firms can develop ex-post artificial empathy models to infer customers’ internal states. For example, using speech analysis and computer vision techniques, Affectiva helps brands optimise video content by inferring the emotional states of viewers as they watch footage or movie trailers. It has been used by more than 1,400 brands, including Kellogg’s.
  2. Extracting useful information more efficiently and effectively to support business decision making. This is particularly important for both online and offline businesses amid rising labour costs and growing data volume. For example, a speech synthesis system called WinkTalk adapts synthetic voice styles according to a listener’s facial expressions, which could prove useful for computer-assisted customer service.
  3. Improving the efficiency and effectiveness of employee recruitment and allocation. For example, HireVue developed machine learning models to assess applicants’ skills and match them with job positions by analysing the audio and visual information from their interviews.
  4. Providing new tools for creating customer value. In the fashion and beauty industry, face and video analytics are used to build Augmented Reality applications, which allow customers to virtually try on beauty products. An app called “Makeup Genius” launched by L’Oréal brought the brand 20 million app users and 60 million virtual product trials in one year.

Analysing hand movement and body language could be next

Understanding that people may interpret the same audio and visual stimuli differently also gives firms important insights for customising elements of their marketing activities or service offerings. Areas of future study include inferring internal states from new types of audio and visual cues such as hand movement, body language and gaze, and exploring how the design of audio and visual elements of marketing communication or the service delivery process varies across different types of products or brand images.

The study

The study – entitled “Audio and Visual Analytics in Marketing and Artificial Empathy”– is co-authored by Shasha Lu of Cambridge Judge Business School; Hye-Jin Kim of the Korea Advanced Institute of Science and Technology; Yinghui Zhou of Shenzhen University; Li Xiao of Fudan University; and Min Ding of Pennsylvania State University.