Domain: Representation Learning

Overview

Word, sentence, and document representation learning.

Dataset List

The word analogy task was introduced by Mikolov et al. (2013) to quantitatively evaluate the linguistic regularities between pairs of word representations. The task consists of questions like “a is to b as c is to __”, where the final word is missing and must be guessed from the entire vocabulary....
BASELINE  SEMANTIC  SYNTACTIC  TOTAL
CBOW      73.5800   65.9500    69.5000
PDC       72.7700   67.6800    70.3500
GloVe     71.3900   53.7200    61.5700
HDC       69.5700   63.7500    66.6700
SG        65.6200   56.6100    60.6400
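The analogy questions described above are often answered with the vector-offset method: take vec(b) - vec(a) + vec(c) and return the vocabulary word whose vector is closest by cosine similarity, excluding the three question words. The following is a minimal sketch under that assumption; the source does not state which procedure produced the scores in the table. The names `solve_analogy`, `analogy_accuracy`, `toy_vectors`, and `toy_questions` are hypothetical, and the 2-d vectors are hand-made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer "a is to b as c is to __" with the vector-offset method.

    Searches the entire vocabulary except the three question words.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Percentage of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(vectors, a, b, c) == expected
                  for a, b, c, expected in questions)
    return 100.0 * correct / len(questions)


# Hypothetical toy embeddings, not taken from any trained model.
toy_vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}
toy_questions = [("man", "woman", "king", "queen")]

print(solve_analogy(toy_vectors, "man", "woman", "king"))
print(analogy_accuracy(toy_vectors, toy_questions))
```

Under this reading, the SEMANTIC and SYNTACTIC columns would be the same accuracy computed over the semantic and syntactic question subsets, and TOTAL the accuracy over all questions.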
Minh-Thang Luong et al. introduced a new dataset focusing on rare words. Its 2034 word pairs contain more morphological complexity than those in other well-established word similarity datasets, e.g. the pair (crudeness, impoliteness). Details can be found in the referenced paper. Reference: Minh-Thang Luong, Richard Socher, a...
BASELINE  SPEARMAN_RANK_CORRELATION
PDC       47.5400
CBOW      45.9200
HDC       44.2600
GloVe     42.8600
SG        42.6800
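Word similarity benchmarks such as this one are typically scored by computing the cosine similarity of each word pair under the model and then measuring the Spearman rank correlation between those similarities and the human ratings. The sketch below illustrates that procedure in plain Python. It assumes the table values are the correlation coefficient multiplied by 100, which the source does not state explicitly. The helper names and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def similarity_score(vectors, rated_pairs):
    """Spearman correlation (times 100, an assumed scaling) between
    model cosine similarities and human ratings."""
    model = [cosine(vectors[w1], vectors[w2]) for w1, w2, _ in rated_pairs]
    human = [rating for _, _, rating in rated_pairs]
    return 100.0 * spearman(model, human)


# Hypothetical toy embeddings and ratings, for illustration only.
toy_vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "bus": [0.7, 0.7],
    "tree": [0.0, 1.0],
}
toy_pairs = [("car", "automobile", 9.0), ("car", "bus", 6.0),
             ("car", "tree", 1.0)]

print(similarity_score(toy_vectors, toy_pairs))
```

Because only the rank order matters, a model is not penalized for producing similarities on a different scale from the human ratings. The sketch does not handle out-of-vocabulary words, which a real evaluation on rare words would need to address.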
WordSim 353 is a standard dataset for evaluating vector-space models. It consists of 353 pairs of words. Each pair is presented without context and rated by 13 or 16 human annotators for similarity or relatedness on a scale from 0 (totally unrelated words) to 10 (very closely related or identical words). Details...
BASELINE  SPEARMAN_RANK_CORRELATION
PDC       74.1200
CBOW      73.2500
HDC       70.2500
GloVe     68.9300
SG        68.6900
Huang et al. (2012) introduced a new dataset with human judgments on pairs of words in sentential context, Stanford’s Contextual Word Similarities (SCWS). The dataset consists of 2003 word pairs and their sentential contexts. It comprises 1328 noun-noun pairs, 399 verb-verb pairs, 140 verb-noun, ...
BASELINE  SPEARMAN_RANK_CORRELATION
PDC       66.5900
CBOW      64.8200
HDC       62.7600
SG        62.4700
GloVe     62.3700