Paper notes: Word translation without parallel data (unsupervised word translation). Cognition-inspired Cross-modal Intelligent Group (CogModal Group). Team homepage: MMLab-IIE. Zhihu column: 认知启发的跨模态智能 (Cognition-Inspired Cross-Modal Intelligence). Team introduction: Cognition-Inspired Cross-Modal Intellig...
2 Normalized word embedding and orthogonal transform for bilingual word translation
Word translation without parallel data. Preprint at arXiv https://arxiv.org/abs/1710.04087 (2017). van Buuren, S. & Groothuis-Oudshoorn, K. mice: multivariate imputation by chained equations in R. J. Stat. Soft. 1–68 (2010). Huson, D. H. & Bryant, D. Application of phylogenetic ...
We include two methods, one supervised that uses a bilingual dictionary or identical character strings, and one unsupervised that does not use any parallel data (see Word Translation without Parallel Data for more details). Dependencies: Python 2/3 with NumPy/SciPy, PyTorch, Faiss (recommended) for ...
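As a rough illustration of the supervised setting described above, the sketch below builds a seed dictionary from identical character strings and then fits an orthogonal mapping with the closed-form Procrustes solution. It is a minimal NumPy sketch under stated assumptions, not the package's own code: the function names `identical_string_pairs` and `procrustes_mapping`, and the toy vocabularies in the usage note, are hypothetical.

```python
import numpy as np


def identical_string_pairs(src_vocab, tgt_vocab):
    """Seed dictionary from words spelled identically in both vocabularies.

    Returns a list of (source_index, target_index) pairs.
    Hypothetical helper for illustration only.
    """
    tgt_index = {word: j for j, word in enumerate(tgt_vocab)}
    return [(i, tgt_index[w]) for i, w in enumerate(src_vocab) if w in tgt_index]


def procrustes_mapping(src_vecs, tgt_vecs):
    """Orthogonal W minimising ||src_vecs @ W.T - tgt_vecs||_F.

    Rows of src_vecs and tgt_vecs are the embeddings of dictionary pairs.
    Closed form: W = U @ Vt, where U, S, Vt = SVD(tgt_vecs.T @ src_vecs).
    """
    u, _, vt = np.linalg.svd(tgt_vecs.T @ src_vecs)
    return u @ vt
```

A possible usage, with made-up vocabularies and random vectors standing in for real embeddings:

```python
rng = np.random.default_rng(0)
src_vocab = ["taxi", "chat", "radio", "maison"]
tgt_vocab = ["radio", "house", "taxi"]
src_emb = rng.standard_normal((4, 5))
tgt_emb = rng.standard_normal((3, 5))

pairs = identical_string_pairs(src_vocab, tgt_vocab)
src_idx, tgt_idx = zip(*pairs)
W = procrustes_mapping(src_emb[list(src_idx)], tgt_emb[list(tgt_idx)])
mapped_src = src_emb @ W.T  # source embeddings expressed in the target space
```

Constraining the mapping to be orthogonal preserves dot products and norms of the source embeddings, which is one reason this family of solutions is popular for aligning embedding spaces.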
The package includes a script to build cross-lingual word embeddings with or without parallel data as described in the papers, as well as evaluation tools in word translation induction, word similarity/relatedness and word analogy. If you use this software for academic research, please cite the rel...
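To make the word translation induction evaluation mentioned above more concrete, here is a minimal sketch of one common way to score it: retrieve, for each mapped source word, the target word with the highest cosine similarity, and report precision at 1 against a gold dictionary. This is an assumption-laden illustration, not the package's evaluation tool; the function names are hypothetical, and the real tools may use a different retrieval criterion.

```python
import numpy as np


def translate_nearest_neighbor(mapped_src, tgt_vecs):
    """For each source row, return the index of the most cosine-similar target row."""
    src = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    # After length normalisation, the dot product equals cosine similarity.
    return np.argmax(src @ tgt.T, axis=1)


def precision_at_1(predicted, gold):
    """Fraction of source words whose top-ranked translation matches the gold index."""
    return float(np.mean(np.asarray(predicted) == np.asarray(gold)))
```

Here `mapped_src` is assumed to hold source embeddings already projected into the target space, one row per test word, and `gold` holds the index of the reference translation for each row.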
However, qualitative studies of language bias are work-intensive, and often limited to small datasets or concepts. This problem is further aggravated in settings where it is necessary to examine several years of data, i.e., a diachronic analysis. It is not only the large amounts of data tha...
We describe an approach to the automatic creation of a sense tagged corpus intended to train a word sense disambiguation (WSD) system for English-Portuguese machine translation. The approach uses parallel corpora, translation dictionaries and a set of straightforward heuristics. In an evaluation...
Machine Translation; parallel data; user-generated content; word embeddings; text similarity; comparable corpora. Building a robust MT system requires a sufficiently large parallel corpus to be available as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora ...
We focus on English to Dutch translation, and we use the Dutch Parallel Corpus (Macken et al. (2011); DPC) as our parallel dataset. Unlike the work done in the Translation Process Research Database (Carl et al., 2016) which uses multiple translations of the same text, we calculate a ...
Authors: Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou. (Note: the order of the first two authors was decided by a coin flip.) Source: ICLR 2018. Affiliation: Facebook AI Research (FAIR). Source code: h…