DEMOS

MOTION ESTIMATION

The ObEN research team is developing AI tools to identify, capture, and mimic human motion from any video and automatically apply those movements to a PAI.

FACE TRACKING

Bringing facial motions to life on your PAI avatar, the ObEN team is researching better, more accurate facial tracking technology using smartphones or webcams.

NIKHIL’S PAI (ObEN CEO)

ObEN’s PAI technology creates an avatar that looks and sounds like you, capable of speaking multiple languages including Chinese, Japanese, Korean, and English.

ADAM’S PAI (ObEN COO)

ObEN’s PAIs can be used by people all over the world – here our co-founder and COO demonstrates PAIs created for native Mandarin speakers.

AIJIA’S PAI (SNH48 BAND MEMBER)

The first of ObEN’s celebrity PAI collaborations, Aijia’s PAI debuted at the 2019 Shanghai World AI Conference.

LUCAS’ PAI (DISCOVERY CHANNEL HOST)

ObEN’s PAI technology was used to create a life-like avatar for Lucas Cochran, Tech Correspondent of the Discovery Channel’s Daily Planet.

ADRIAN’S PAI (K11 Chairman)

The world’s first PAI retail concierge, developed for K11 Shanghai Art Mall featuring K11 Founder and Chairman Adrian Cheng – personalizing the retail experience with AI.

PAI DANCE

ObEN’s PAIs are capable of performing a variety of movements, generated using our AI technology. What you can dream up, your PAI can do.

VIRTUAL SINGER

A short voice sample is all we need to take any speaking voice and transform it into a pitch-perfect singing voice. Can you tell which voice is human and which is AI? A simplified sketch of the idea follows the samples below.

Aijia's Original Voice
TTS Voice
PAI Singing
PAI/Human Duet
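
The page does not describe the method, but one classical route from speech to singing is to keep the speaker's timbre and resynthesize the voice with a melody's pitch contour. A minimal sketch using the WORLD vocoder via the pyworld and soundfile packages; the file names and the constant-pitch "melody" are hypothetical placeholders, not ObEN's system.

```python
# Minimal speech-to-singing sketch: keep the speaker's timbre (spectral
# envelope) but replace the natural pitch contour with a target melody.
# Assumes the pyworld and soundfile packages; file names are examples.
import numpy as np
import pyworld as pw
import soundfile as sf

speech, fs = sf.read("speaking_voice.wav")  # mono speech sample
speech = speech.astype(np.float64)

# WORLD analysis: fundamental frequency, spectral envelope, aperiodicity.
f0, sp, ap = pw.wav2world(speech, fs)

# Hypothetical target melody, one Hz value per analysis frame
# (here a constant A3 on voiced frames; unvoiced frames stay silent).
melody_f0 = np.where(f0 > 0, 220.0, 0.0)

# Resynthesize with the melody as the new pitch contour.
singing = pw.synthesize(melody_f0, sp, ap, fs)
sf.write("singing_voice.wav", singing, fs)
```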

EXPRESSIVE TTS

Tackling a major challenge in TTS technology, we are creating digital voices that can “speak” with a variety of emotions, making them more human and more capable of connecting us with our audio experiences. A simplified sketch of one conditioning approach follows the samples below.

Original Voice
TTS Voice
TTS Happy
TTS Angry
TTS Sad
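
The demo does not reveal the architecture; a common way to obtain emotional TTS is to condition the acoustic model on a learned emotion embedding. A minimal PyTorch sketch of that idea, with all module names, dimensions, and the emotion inventory chosen purely for illustration.

```python
# Sketch of emotion-conditioned TTS acoustics: the decoder receives a
# learned emotion embedding alongside the phoneme encoding, so the same
# text can be rendered neutral, happy, angry, or sad. All names and
# dimensions here are illustrative, not ObEN's actual architecture.
import torch
import torch.nn as nn

EMOTIONS = {"neutral": 0, "happy": 1, "angry": 2, "sad": 3}

class ExpressiveTTS(nn.Module):
    def __init__(self, n_phonemes=64, emb_dim=128, n_mels=80):
        super().__init__()
        self.phoneme_emb = nn.Embedding(n_phonemes, emb_dim)
        self.emotion_emb = nn.Embedding(len(EMOTIONS), emb_dim)
        self.encoder = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.to_mel = nn.Linear(2 * emb_dim, n_mels)  # frame-level mels

    def forward(self, phoneme_ids, emotion_id):
        x = self.phoneme_emb(phoneme_ids)              # (B, T, emb)
        h, _ = self.encoder(x)                         # (B, T, emb)
        e = self.emotion_emb(emotion_id)               # (B, emb)
        e = e.unsqueeze(1).expand(-1, h.size(1), -1)   # broadcast over time
        return self.to_mel(torch.cat([h, e], dim=-1))  # (B, T, n_mels)

model = ExpressiveTTS()
phonemes = torch.randint(0, 64, (1, 12))  # dummy phoneme ids
mel_happy = model(phonemes, torch.tensor([EMOTIONS["happy"]]))
mel_sad = model(phonemes, torch.tensor([EMOTIONS["sad"]]))
```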

OUR PATENTS

Text to speech synthesis using deep neural network with constant unit length spectrogram (10,186,252)

A system and method for converting text to speech is disclosed. The text is decomposed into a sequence of phonemes and a text feature matrix is constructed to define the manner in which the phonemes are pronounced and accented. A spectrum generator then queries a neural network to produce normalized spectrograms based on the input of the sequence of...
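
As a reading aid for the abstract, here is a toy sketch of the described data flow: text to phonemes, phoneme feature matrix, then network-generated normalized spectrogram frames of constant unit length. The phoneme inventory, features, and stand-in "network" are invented for illustration, not the patented system.

```python
# Data-flow sketch of the pipeline described in the abstract above.
# The phoneme set, features, and network here are placeholders.
import numpy as np

PHONEMES = {"HH": 0, "AH": 1, "L": 2, "OW": 3}  # toy inventory

def text_to_phonemes(text):
    # Real systems use a pronunciation lexicon / G2P model; this is a stub.
    return ["HH", "AH", "L", "OW"] if text.lower() == "hello" else []

def feature_matrix(phonemes):
    # One row per phoneme: identity (one-hot) plus a stress/accent flag.
    mat = np.zeros((len(phonemes), len(PHONEMES) + 1))
    for i, ph in enumerate(phonemes):
        mat[i, PHONEMES[ph]] = 1.0
        mat[i, -1] = 1.0 if ph in ("AH", "OW") else 0.0  # toy accent mark
    return mat

def spectrum_generator(features, n_bins=128, frames_per_unit=10):
    # Stand-in for the neural network: maps each unit to a fixed number
    # of normalized spectrogram frames in [0, 1] ("constant unit length").
    rng = np.random.default_rng(0)
    W = rng.standard_normal((features.shape[1], n_bins))
    units = 1.0 / (1.0 + np.exp(-features @ W))        # (units, bins)
    return np.repeat(units, frames_per_unit, axis=0)   # (frames, bins)

spec = spectrum_generator(feature_matrix(text_to_phonemes("hello")))
print(spec.shape)  # (40, 128)
```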


Voice conversion using deep neural network with intermediate voice training (10,186,251)

A subject voice is characterized and altered to mimic a target voice while maintaining the verbal message of the subject voice. Thus, the words and message are the same as in the original voice, but the voice that conveys the words and message in the altered voice is different. Audio signals corresponding to the altered voice are output, for...
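
For orientation, a minimal sketch of DNN-based voice conversion in the spirit of the abstract: a small network learns a frame-wise mapping from subject to target spectral features, leaving the word sequence intact. The feature size, network, and dummy data are illustrative assumptions, not the patented method.

```python
# Sketch of DNN voice conversion: train a network on time-aligned frame
# pairs to map subject spectral features to target spectral features.
# Everything here is illustrative; real systems align utterances with
# dynamic time warping before training and resynthesize with a vocoder.
import torch
import torch.nn as nn

N_FEATS = 60  # e.g., mel-cepstral coefficients per frame

mapper = nn.Sequential(
    nn.Linear(N_FEATS, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_FEATS),
)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

# Dummy time-aligned frame pairs (subject -> target).
subj_frames = torch.randn(1024, N_FEATS)
tgt_frames = torch.randn(1024, N_FEATS)

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mapper(subj_frames), tgt_frames)
    loss.backward()
    opt.step()

# At conversion time each subject frame is mapped, so the words stay the
# same but the voice that carries them sounds like the target.
converted_frames = mapper(subj_frames)
```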


Creation and application of audio avatars from human voices (9,324,318)

A subject voice is characterized and altered to mimic a target voice while maintaining the verbal message of the subject voice. Thus, the words and message are the same as in the original voice, but the voice that conveys the words and message in the altered voice is different. Audio signals corresponding to the altered voice are output, for...


OUR LATEST PUBLICATIONS

Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses

This work investigates techniques that select training data from small, found corpuses in order to improve the naturalness of synthesized text-to-speech voices. The approach outlined in this paper examines different metrics to detect and reject segments of training data that can degrade the performance of the system. We conducted experiments on...
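
A toy illustration of the selection idea: score each found-speech utterance with a quality metric and reject outliers before training. The specific metric below (distance from the corpus's mean spectral profile) is a stand-in, not one of the paper's actual metrics.

```python
# Illustrative data selection for found speech: keep the most typical
# utterances by a simple outlier score, drop the rest before training.
import numpy as np

def spectral_profile(utt):
    # utt: (frames, bins) magnitude spectrogram -> average spectrum
    return utt.mean(axis=0)

def select_training_data(utterances, keep_fraction=0.8):
    profiles = np.stack([spectral_profile(u) for u in utterances])
    centroid = profiles.mean(axis=0)
    scores = np.linalg.norm(profiles - centroid, axis=1)  # outlier score
    n_keep = int(len(utterances) * keep_fraction)
    keep_idx = np.argsort(scores)[:n_keep]  # keep most typical segments
    return [utterances[i] for i in keep_idx]

rng = np.random.default_rng(0)
corpus = [rng.random((rng.integers(50, 200), 128)) for _ in range(20)]
selected = select_training_data(corpus)
print(f"kept {len(selected)} of {len(corpus)} utterances")
```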


A Spectrally Weighted Mixture of Least Square Error and Wasserstein Discriminator Loss for Generative SPSS

Generative networks can create an artificial spectrum based on a conditional distribution estimate instead of predicting only the mean value, as the Least Square (LS) solution does. This is promising since the LS predictor is known to oversmooth features, leading to muffling effects. However, modeling a whole distribution instead of a single mean...
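
A hedged PyTorch sketch of such a mixed objective: a per-bin weighted least-square term blended with a Wasserstein critic term on the generated spectrum. The weighting scheme, critic, and mixture weight are illustrative, not the paper's exact formulation.

```python
# Sketch of a spectrally weighted mixture of LS and Wasserstein losses:
# trust the mean (LS) term more in some bands, the adversarial term
# elsewhere, to counter oversmoothing. All weights are illustrative.
import torch
import torch.nn as nn

N_BINS = 80
critic = nn.Sequential(nn.Linear(N_BINS, 128), nn.ReLU(), nn.Linear(128, 1))

# Hypothetical per-bin weights: LS dominates low bands, the critic
# matters more in high bands, where oversmoothing muffles the voice.
w = torch.linspace(0.0, 1.0, N_BINS)

def generator_loss(fake_spec, real_spec, adv_weight=0.1):
    ls = ((1.0 - w) * (fake_spec - real_spec) ** 2).mean()  # weighted LS
    adv = -critic(fake_spec).mean()  # Wasserstein generator term
    return ls + adv_weight * adv

fake = torch.randn(16, N_BINS, requires_grad=True)
real = torch.randn(16, N_BINS)
loss = generator_loss(fake, real)
loss.backward()
```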


Show, Attend and Translate: Unsupervised Image Translation with Self-Regularization and Attention

Image translation between two domains is a class of problems aiming to learn a mapping from an input image in the source domain to an output image in the target domain. It has been applied to numerous domains, such as data augmentation, domain adaptation and unsupervised training. When paired training data is not accessible, image translation...
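
To make the title concrete, a minimal PyTorch sketch of the two named ingredients: an attention mask that confines translation to relevant regions, and a self-regularization term that keeps the output close to the input when no paired data is available. Architectures and loss weights are illustrative.

```python
# Sketch of unsupervised image translation with attention and
# self-regularization; the networks here are toy stand-ins.
import torch
import torch.nn as nn

class AttentiveTranslator(nn.Module):
    def __init__(self):
        super().__init__()
        self.translate = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
        self.attend = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        mask = self.attend(x)               # (B, 1, H, W) in [0, 1]
        out = self.translate(x)             # candidate translation
        return mask * out + (1 - mask) * x  # translate only attended regions

model = AttentiveTranslator()
x = torch.rand(2, 3, 64, 64) * 2 - 1  # inputs in [-1, 1]
y = model(x)

# Self-regularization: the output should stay perceptually close to the
# source (a VGG-feature distance in the paper; plain L1 here for brevity).
self_reg = (y - x).abs().mean()
# The full generator objective would add an adversarial term on y:
# loss = adversarial(y) + lambda_reg * self_reg
```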


A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

Voice conversion (VC) aims at conversion of speaker characteristics without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC, therefore, usually involves both speaker similarity and quality...


The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

We present the Voice Conversion Challenge 2018, designed as a follow-up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems. The objective of the challenge was to perform speaker conversion (i.e., transform the vocal identity) of a source speaker to...


ESTHER: Extremely Simple Image Translation Through Self-Regularization

Image translation between two domains is a class of problems where the goal is to learn the mapping from an input image in the source domain to an output image in the target domain. It has important applications such as data augmentation, domain adaptation, and unsupervised training. When paired training data are not accessible, the mapping...


Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion

This blog post presents a one-shot voice conversion technique, in which a variational autoencoder (VAE) is used to disentangle speech factors. We show that VAEs are able to disentangle the speaker identity and linguistic content from speech acoustic features. Modification of these factors allows transformation of the voice. We show that the...
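
A minimal sketch of the disentanglement idea: an autoencoder splits each utterance's latent code into a speaker part and a content part, and swapping in a speaker code estimated from a single target utterance converts the voice. All shapes and modules are illustrative, not the paper's model.

```python
# Sketch of one-shot VC via a disentangled (V)AE: encode speaker and
# content separately, then decode the source content with a speaker
# code taken from ONE target utterance. Toy dimensions throughout;
# training would add reconstruction and KL terms.
import torch
import torch.nn as nn

N_FEATS, SPK_DIM, CON_DIM = 80, 16, 64

class DisentangledVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(N_FEATS, 2 * (SPK_DIM + CON_DIM))  # mu, logvar
        self.decoder = nn.Linear(SPK_DIM + CON_DIM, N_FEATS)

    def encode(self, frames):
        mu, logvar = self.encoder(frames).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z[..., :SPK_DIM], z[..., SPK_DIM:]             # speaker, content

    def decode(self, speaker, content):
        return self.decoder(torch.cat([speaker, content], dim=-1))

vae = DisentangledVAE()
source = torch.randn(200, N_FEATS)  # source speaker's utterance frames
target = torch.randn(50, N_FEATS)   # a single target utterance ("one-shot")

_, content = vae.encode(source)                  # keep the words
spk_target, _ = vae.encode(target)
spk_code = spk_target.mean(dim=0, keepdim=True)  # utterance-level speaker code
converted = vae.decode(spk_code.expand(len(content), -1), content)
```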


One-shot Voice Conversion using Variational Autoencoders

This blog post presents a one-shot voice conversion technique, in which a variational autoencoder (VAE) is used to disentangle speech factors. We show that VAEs are able to disentangle the speaker identity and linguistic content from speech acoustic features. Modification of these factors allows transformation of the voice. We show that the...


Voice Approximation for Inter-Gender Voice Personalization

One of the key technologies at ObEN is the personalization of voice identity, consisting of a transformation of an input voice (e.g., from a Text-To-Speech system) to render it perceptually similar to a target one (e.g., a celebrity’s or a user’s voice). Although some existing technologies, known as Voice Conversion and based on a statistical...


ObEN is an artificial intelligence company that is building a decentralized AI platform for Personal AI (PAI), intelligent 3D avatars that look, sound, and behave like the individual user. Deployed on the Project PAI blockchain, ObEN’s technology enables users to create, use, and manage their own PAI on a secure, decentralized platform, enabling never-before-possible social and virtual interactions. Founded in 2014, ObEN is a K11, Tencent, Softbank Ventures Korea, and HTC Vive X portfolio company and is located at Idealab in Pasadena, California.


130 West Union Street, Pasadena, CA 91103 |
contact@oben.com
© 2017 ObEN, Inc. All rights reserved

Privacy Policy and Terms of Use
