SPEECH RESEARCH SCIENTIST – TTS (ENGLISH)

JOB DESCRIPTION

Your responsibilities:

ObEN is looking for a passionate, talented, and self-driven speech synthesis scientist with a strong machine learning background. The work will have a particular focus on the development of cutting-edge deep learning algorithms for voice personalization. This will include the development of structured acoustic models for synthesis allowing the control of factors such as voice timbre, voice quality, language, accent, expressiveness and speaking style and the adaptation/conversion towards a target voice using a reduced amount of data.

You will have a PhD in speech processing, computer science, cognitive science, linguistics, engineering, mathematics, or a related discipline. You will have the necessary programming ability to conduct research in this area, a strong background in machine learning including deep learning approaches, speech signal processing, and research experience in speech synthesis.

Knowledge in one or more of the following areas is also desirable:

  • HMM/DNN-based speech synthesis
  • Glottal source modeling
  • Speech signal modeling
  • Speaker adaptation
  • Familiarity with speech toolkits including SPTK, HTK, HTS, Merlin, Festival
  • Deep learning libraries including Theano, TensorFlow

You will:

  • Develop and extend speech synthesis technologies in ObEN’s proprietary speech synthesis system, with a view to improving the quality and naturalness of the synthesized voice and its similarity to the target voice, and to reducing the amount of data needed for speaker adaptation
  • Develop new machine learning techniques for vocoding to improve the control and the quality of the synthesized voice
  • Develop tools for the preparation and annotation of training data
  • Carry out a listener evaluation study of synthetic speech

You must have:

  • PhD (preferred) or M.Sc. in Computer Science or Electrical Engineering
  • High proficiency in C++, Python, Java, Matlab
  • Experience with data-driven statistical or machine learning methods for Speech Synthesis
  • Enjoyment of a highly collaborative environment with minimal supervision

Great to have:

  • Proficiency in one or more of the following languages: Chinese (Mandarin), Japanese, Korean, or Spanish
  • Familiarity with linguistic phonetics
  • Knowledge of basic digital signal processing techniques for audio
  • Experience with software engineering best practices including unit testing, continuous integration, and source control
  • Proficiency in Java/Android, Objective-C/iOS, JavaScript, C#/Unity3D

Application requirements:

Please send the following to careers@oben.com:

  • Detailed resume and/or LinkedIn profile
  • Links to any research / papers you have been an instrumental part of and are proud of
  • Name of your instructor / adviser, if any, along with a link to their profile
  • Cover letter identifying your five favorite apps on your phone

Introduction to ObEN: https://goo.gl/gxpxwT

Interview process:

STAGE 1: Phone Interview
STAGE 2: In-person Interview at Idealab (we cover travel expenses for the day)
STAGE 3: Sample project submission and ObEN candidate proposal submission
STAGE 4: Spend a day at our office and participate in all team activities.
STAGE 5: Offer Letter

ObEN is an artificial intelligence company that creates complete virtual identities for consumers and celebrities in the emerging digital world. ObEN provides Personal AI that simulates a person’s voice, face, and personality, enabling never-before-possible social and virtual interactions. Founded in 2014, ObEN is a SoftBank Ventures Korea and HTC Vive X portfolio company and is located at Idealab in Pasadena, California.