What if AI assistants like Siri, Cortana, or Alexa had a human face? What would they look like? In an episode of The Big Bang Theory, Rajesh daydreamed about what it would be like to date Siri in person.

Rajesh’s example, while an exaggerated one, reveals that humans are wired to look for a physical, visual character “behind” a voice or text during conversation. This is because, among all our sensory channels — sight, touch, hearing, smell, and taste — humans rely most heavily on visual input. One neurological estimate holds that roughly 30% of the neurons in our brain are devoted to sight, compared with only 8% for touch and 2% for hearing (Grady, 1993). Moreover, more than 60% of the brain is involved with vision in some way, including neurons that combine vision with touch, motor control, attention, or spatial navigation (Goldstein, 2013).

Consciously or not, available visual cues help us interact with others in a smoother and more enjoyable way. This holds true for human-to-human as well as human-to-robot interactions. In a 2013 IEEE study, a fluffy dog tail was attached to a Roomba vacuum robot to signal its working status through different motion patterns. For instance, when the robot was cleaning a room smoothly, the tail would wag to indicate its happiness. Participants in the study found it easier to understand what the robot was “feeling” and found themselves amused by it.

Similarly, an AI assistant with a physical body would be able to convey much more information than today’s voice-only agents. For instance, the assistant’s facial expressions could show emotion, gestures like a simple nod could provide feedback, and movements like pointing in a direction could suggest what to do next. We also become more engaged in a conversation when we can see the other party: people spend more than 30% of the time trying to make eye contact during face-to-face interactions (Schwind & Jäger, 2016).

Adding visual cues can aid communication with robots: a Roomba with a fluffy tail

ObEN’s artificial intelligence technology quickly and easily creates lively personal avatars that look and talk like you, from a single selfie and a brief voice recording. Imagine listening to songs recommended by Adele’s avatar, or catching up on today’s NBA news as reported by LeBron James’ avatar. Advances in AI technology are making this a real possibility. As depicted in an episode of Black Mirror, you could even have yourself as the AI assistant — the one who knows and understands you best.

In the not-so-distant future, we’ll be able to have our personalized avatars interact with each other in mobile, virtual reality, or augmented reality worlds — and deliver the famous line from the blockbuster movie Avatar (2009) to our loved ones: “I see you.”

About the Author: Jackie is a researcher on ObEN’s computer vision team.

Having a digital self who knows you best as your AI assistant. From Black Mirror

ObEN's proprietary artificial intelligence technology quickly combines a person's 2D image and voice to create a personal 3D avatar. Transport your personal avatar into virtual reality and augmented reality environments and enjoy deeper, more social, and more memorable experiences. Founded in 2014, ObEN is an HTC VIVE X portfolio company located in Pasadena, California, at leading technology incubator Idealab.