A Bilingual Adversarial Autoencoder for Unsupervised Bilingual Lexicon Induction

Abstract

Unsupervised bilingual lexicon induction aims to generate bilingual lexicons without any cross-lingual signals. Solving this problem would benefit many downstream tasks, such as unsupervised machine translation and transfer learning. In this work, we propose an unsupervised framework, named bilingual adversarial autoencoder, which automatically generates a bilingual lexicon for a pair of languages from their monolingual word embeddings. In contrast to existing frameworks that learn a direct cross-lingual mapping of word embeddings from the source language to the target language, we jointly train two autoencoders to transform the source and target monolingual word embeddings into a shared embedding space, where a word and its translation are close to each other. In this way, we capture the cross-lingual features of word embeddings from different languages and use them to induce bilingual lexicons. Through extensive experiments across eight language pairs, we demonstrate that the proposed method significantly outperforms existing adversarial methods and achieves the best published results on most language pairs.
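
The shared-space idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two encoders below are hypothetical fixed linear maps standing in for the trained autoencoders, the toy embeddings are random, and retrieval by cosine nearest neighbor is an assumption (the abstract only states that a word and its translation are close to each other).

```python
import numpy as np


def induce_lexicon(src_emb, tgt_emb, encode_src, encode_tgt):
    """For each source word, return the index of the nearest target word
    (by cosine similarity) after both sides are mapped into a shared space."""
    zs = encode_src(src_emb)
    zt = encode_tgt(tgt_emb)
    # Normalize rows so that the dot product equals cosine similarity.
    zs = zs / np.linalg.norm(zs, axis=1, keepdims=True)
    zt = zt / np.linalg.norm(zt, axis=1, keepdims=True)
    sim = zs @ zt.T  # shape: (n_source_words, n_target_words)
    return sim.argmax(axis=1)


# Toy data (assumption): target embeddings are a rotated, shuffled copy of
# the source embeddings, so a perfect shared space exists by construction.
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(5, 4))
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix
perm = np.array([2, 0, 4, 1, 3])  # source word i translates to target word perm[i]
tgt_emb = np.empty_like(src_emb)
tgt_emb[perm] = src_emb @ rotation


# Hypothetical stand-ins for the two trained encoders: the source side is
# left unchanged and the target side undoes the rotation.
def encode_src(x):
    return x


def encode_tgt(x):
    return x @ rotation.T


lexicon = induce_lexicon(src_emb, tgt_emb, encode_src, encode_tgt)
print(lexicon)
```

In this toy setup the encoders are given rather than learned; in the proposed framework they would come from the jointly trained autoencoders.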

Publication
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP, SCI-Q1, JCR-Q1)
Xuefeng Bai
Ph.D. candidate

My research interests include semantics, dialogues and generation.