Speaker embeddings: from i-vector to x-vector and beyond

old_uid: 17060
title: Speaker embeddings: from i-vector to x-vector and beyond
start_date: 2019/01/16
schedule: 14h
online: no
location_info: room A008
summary: Speaker recognition is the task of recognizing a person from their voice. State-of-the-art speaker recognition systems use speaker embeddings to represent a speech utterance of arbitrary length as a fixed-dimensional vector. Recent advances in deep neural network (DNN) research have enabled the development of robust and efficient speaker embedding techniques. In this talk, I will first give a brief overview of speaker recognition basics, followed by a description of the conventional speaker embedding method known as the i-vector. I will then present various attempts to learn speech signal representations with DNN-based discriminative training, and explain the recently introduced x-vector embedding, which has shown promising speaker recognition performance. The talk will end with a discussion of potential future directions in speaker embedding research, including our ongoing work.
responsibles: Dutech