PDC

Description

The Parallel Document Context (PDC) model is an unsupervised learning algorithm for obtaining vector representations of words.

In this model, a target word is predicted both from its surrounding context and from the document it occurs in. The former prediction task captures paradigmatic relations, since words with similar contexts tend to have similar representations. The latter models syntagmatic relations, since words that co-occur in the same document tend to have similar representations.
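One way to write the joint objective described above, as a hedged sketch. The notation is introduced here for illustration and is not copied from the paper. Let document $d_n$ contain the words $w_{n,1}, \dots, w_{n,N_n}$. Let $\mathbf{h}_{n,i}$ be the average of the vectors of the context words inside the window around $w_{n,i}$, and let $\mathbf{d}_n$ be the document vector. Let $\mathbf{u}_w$ be an output vector for word $w$ in the vocabulary $V$. Under these definitions, training maximizes the sum of the two log-likelihoods:

```latex
\mathcal{L} = \sum_{n=1}^{M} \sum_{i=1}^{N_n}
  \Big[ \log p\big(w_{n,i} \mid \mathbf{h}_{n,i}\big) + \log p\big(w_{n,i} \mid \mathbf{d}_n\big) \Big],
\qquad
p(w \mid \mathbf{x}) = \frac{\exp\big(\mathbf{u}_w^{\top} \mathbf{x}\big)}{\sum_{w' \in V} \exp\big(\mathbf{u}_{w'}^{\top} \mathbf{x}\big)}
```

The first term is the CBOW-style context prediction and the second is the document prediction. The tool exposes a -negative option, so the full softmax is presumably approximated with negative sampling. Whether both terms share the same output vectors $\mathbf{u}_w$ is an assumption of this sketch.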

The model can be viewed as an extension of the CBOW model with an extra document branch.
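Below is a minimal Python sketch of one training step that uses both branches. It assumes negative sampling with the usual unigram^0.75 noise distribution and word2vec-style updates. All names (init_model, pdc_step, and so on) are hypothetical and do not describe the internals of the released C code. The sketch also keeps separate output vectors for the context and document branches so that the word and document dimensions (-word_size, -doc_size) may differ; that separation is an assumption.

```python
import math
import random


def sigmoid(x):
    # Clip so math.exp cannot overflow for large |x|.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def init_model(vocab, n_docs, word_size, doc_size, rng):
    """Small random input vectors and zero output vectors (word2vec-style init)."""
    def rand_vec(dim):
        return [(rng.random() - 0.5) / dim for _ in range(dim)]
    return {
        "word_in": {w: rand_vec(word_size) for w in vocab},
        "doc_in": [rand_vec(doc_size) for _ in range(n_docs)],
        # Assumption: each branch has its own output vectors.
        "ctx_out": {w: [0.0] * word_size for w in vocab},
        "doc_out": {w: [0.0] * doc_size for w in vocab},
    }


def make_neg_sampler(counts, rng, power=0.75):
    """Draw negative words from the unigram distribution raised to `power`."""
    words = list(counts)
    weights = [counts[w] ** power for w in words]
    return lambda: rng.choices(words, weights=weights, k=1)[0]


def neg_sampling(inp, target, negatives, out_vecs, alpha):
    """One negative-sampling update for input vector `inp` predicting `target`.

    Updates the output vectors in place and returns the gradient for `inp`.
    """
    grad = [0.0] * len(inp)
    pairs = [(target, 1.0)] + [(n, 0.0) for n in negatives if n != target]
    for word, label in pairs:
        out = out_vecs[word]
        g = alpha * (label - sigmoid(dot(inp, out)))
        for k in range(len(inp)):
            grad[k] += g * out[k]  # uses the output vector before its update
            out[k] += g * inp[k]
    return grad


def pdc_step(model, doc_id, doc, i, window, negative, sample_neg, alpha):
    """Train on target word doc[i] with both the context and the document branch."""
    target = doc[i]
    # Context branch (as in CBOW): average the surrounding words' input vectors.
    lo, hi = max(0, i - window), min(len(doc), i + window + 1)
    ctx = [doc[j] for j in range(lo, hi) if j != i]
    if ctx:
        dim = len(model["word_in"][target])
        h = [sum(model["word_in"][w][k] for w in ctx) / len(ctx) for k in range(dim)]
        negs = [sample_neg() for _ in range(negative)]
        grad = neg_sampling(h, target, negs, model["ctx_out"], alpha)
        for w in ctx:
            vec = model["word_in"][w]
            for k in range(dim):
                vec[k] += grad[k]
    # Document branch: the document's own vector also predicts the target word.
    d = model["doc_in"][doc_id]
    negs = [sample_neg() for _ in range(negative)]
    grad = neg_sampling(d, target, negs, model["doc_out"], alpha)
    for k in range(len(d)):
        d[k] += grad[k]
```

A toy version of the training loop calls pdc_step for every position of every document, over several passes.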

More details can be found in the paper listed under Reference.

Software

The software can be downloaded from this page.


Usage

./w2v -train data.txt -word_output vec.txt -size 200 -window 5 -subsample 1e-4 -negative 5 -model pdc -binary 0 -iter 5
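The command above reads data.txt with one document per line. Because of -binary 0, it writes vec.txt as text. Below is a minimal sketch of helpers for both ends. It assumes whitespace-tokenized input. It also assumes vec.txt uses the word2vec text layout: a header line with the vocabulary size and dimension, then one word followed by its values on each line. The function names are hypothetical.

```python
import math


def write_corpus(docs, path):
    """Write one document per line, as -train expects.

    Internal newlines and runs of whitespace in each document string are
    collapsed to single spaces so every document stays on one line.
    """
    with open(path, "w", encoding="utf-8") as f:
        for text in docs:
            f.write(" ".join(text.split()) + "\n")


def load_text_vectors(path):
    """Read word vectors saved with -binary 0 (assumed word2vec text format)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        dim = int(f.readline().split()[1])  # header: "<vocab_size> <dim>"
        for line in f:
            parts = line.split()
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def most_similar(vectors, word, topn=5):
    """Rank the other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

For example, most_similar(load_text_vectors("vec.txt"), "king") lists the nearest neighbours of a word after training, provided that word survived the -min-count cutoff.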

 -train, the input corpus file, one document per line;
 -word_output, the output file for the word embeddings;
 -binary, whether to save the output file in binary mode; the default is 0 (off);
 -word_size, the dimension of the word embeddings; the default is 100;
 -doc_size, the dimension of the document embeddings; the default is 100;
 -window, the maximum skip length between words; the default is 5;
 -negative, the number of negative samples used in negative sampling; the default is 5;
 -subsample, the threshold for subsampling frequent words; the default is 1e-4;
 -threads, the total number of threads used; the default is 1;
 -alpha, the starting learning rate; the default is 0.025 for HDC and 0.05 for PDC;
 -model, the model used to learn the word embeddings; the default is the Parallel Document Context model (pdc); use hdc for the Hierarchical Document Context model;
 -min-count, the minimum number of occurrences for a word to be kept; the default is 5;
 -iter, the number of training iterations; the default is 5.
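Several of these options resemble those of the original word2vec tool. Assuming ./w2v implements them the same way as word2vec.c (this page does not confirm it), the three rules can be sketched as follows. The first is the subsampling rule behind -subsample. The second is the linear decay of the learning rate from -alpha across all -iter passes. The third is the randomly shrunk context window behind -window. The function names are hypothetical.

```python
import math


def keep_probability(count, total_words, sample=1e-4):
    """word2vec-style subsampling: probability of keeping one occurrence of a word.

    Words much more frequent than sample * total_words are kept less often;
    the value is capped at 1.0 for rare words. `count` must be at least 1.
    """
    threshold = sample * total_words
    p = (math.sqrt(count / threshold) + 1.0) * threshold / count
    return min(1.0, p)


def subsample(doc, counts, total_words, sample, rng):
    """Randomly drop frequent words from a tokenized document."""
    return [w for w in doc
            if rng.random() < keep_probability(counts[w], total_words, sample)]


def learning_rate(start_alpha, words_processed, iters, train_words):
    """Linear decay over all iterations, floored at 1e-4 of the starting rate."""
    alpha = start_alpha * (1.0 - words_processed / (iters * train_words + 1))
    return max(alpha, start_alpha * 1e-4)


def reduced_window(window, rng):
    """Effective half-width drawn uniformly from 1..window for each target word."""
    return window - rng.randrange(window)
```

Shrinking the window at random weights nearby context words more heavily than distant ones, because nearby words fall inside the window more often.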

Reference

  • Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu, and Xueqi Cheng. Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations. The 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015).

Download

WordRep-master.zip