ObEN’s mission is to enable everyone in the world to create their own Personal AI (PAI), intelligent 3D avatars that look, sound, and behave like the individual user. Secured and authenticated on the Project PAI blockchain, ObEN’s technology creates more productive, more personalized digital interactions. ObEN is a K11, Tencent, Softbank Ventures Korea and HTC Vive X portfolio company, and we work with our strategic investors to expand PAI technology across multiple verticals including hospitality, retail, healthcare, and entertainment.

Working at ObEN means taking on extraordinary transformations every day, in an environment that celebrates and encourages innovation. You’ll be working in small, agile teams that include world-class researchers in speech, computer vision, machine learning, NLP, and blockchain. We are blazing new trails in AI and blockchain technology, and we encourage and support publications at top conferences and in top journals. Learn more about working at ObEN in our blog post.

Job Description:

As a Speech Research Scientist specializing in Text-to-Speech, you will develop cutting-edge deep learning algorithms for voice personalization. This includes developing structured acoustic models for synthesis that allow control of factors such as voice timbre, voice quality, language, accent, expressiveness, and speaking style, as well as adaptation/conversion towards a target voice using a reduced amount of data.


Responsibilities:

  • Develop and extend ObEN’s proprietary TTS system to improve the quality and naturalness of the synthesized voice and its similarity to the target voice, and to reduce the amount of data needed for speaker adaptation;
  • Develop deep generative models of raw speech waveforms;
  • Develop cross-lingual approaches (e.g., phonetic posteriorgrams).


Requirements:

  • PhD with strong research experience in adaptation of DNN-based TTS systems, demonstrated by publications in top speech journals and conferences (ICASSP, Interspeech, etc.);
  • Strong machine learning background and familiarity with standard statistical modeling techniques applied to speech;
  • Research experience in deep generative models of raw audio (e.g., WaveNet) and generative adversarial networks (e.g., WGAN);
  • Fluency in Python and C++, and expert knowledge of deep learning packages (TensorFlow, Theano, Keras, etc.);
  • Familiarity with linguistic phonetics;
  • Knowledge of basic digital signal processing techniques for audio.

Application requirements:

Please send the following to careers@oben.com:

  • Detailed resume and/or LinkedIn profile
  • Links to any research or papers you have been an instrumental part of and are proud of
  • Name of your instructor / adviser, if any, along with a link to their profile
  • Cover letter identifying your five favorite apps on your phone

Introduction to ObEN: https://goo.gl/gxpxwT

Interview process:

STAGE 1: Phone Interview
STAGE 2: In-person Interview at Idealab (we cover travel expenses for the day)
STAGE 3: We require a sample project submission and a candidate proposal submission (to learn more about what an ObEN candidate proposal is, click here)
STAGE 4: Spend a day at our office and participate in all team activities.
STAGE 5: Offer Letter



ObEN is an artificial intelligence company that creates complete virtual identities for consumers and celebrities in the emerging digital world. ObEN provides Personal AI that simulates a person’s voice, face, and personality, enabling never-before-possible social and virtual interactions. Founded in 2014, ObEN is a Softbank Ventures Korea and HTC Vive X portfolio company and is located at Idealab in Pasadena, California.