MQ2007-list

Descriptions

Dataset Introductions

MQ2007 is a query set from the Million Query track of TREC 2007. It contains about 1700 queries with labeled documents. MQ2007 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package.
The data format of MQ2007-list is the same as that of MQ2007. The difference is that the ground truth in this setting is a permutation of the documents for each query rather than multi-level relevance judgments.

=====================================
   Folds     Training set   Validation set   Test set
   Fold1     {I1,I2,I3}     I4               I5
   Fold2     {I2,I3,I4}     I5               I1
   Fold3     {I3,I4,I5}     I1               I2
   Fold4     {I4,I5,I1}     I2               I3
   Fold5     {I5,I1,I2}     I3               I4
=====================================
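
The rotation in the table above can be reproduced programmatically. The following is a minimal sketch, not part of the dataset package; the function name fold_partitions and the dictionary keys are hypothetical and chosen only for illustration.

```python
def fold_partitions():
    """Rebuild the 5-fold assignment of the parts I1..I5 shown in the table.

    Hypothetical helper: for fold k, three consecutive parts (with wrap-around)
    form the training set, the next part is the validation set, and the
    remaining part is the test set.
    """
    parts = ["I1", "I2", "I3", "I4", "I5"]
    n = len(parts)
    folds = {}
    for k in range(n):
        folds[f"Fold{k + 1}"] = {
            "train": [parts[(k + j) % n] for j in range(3)],
            "validation": parts[(k + 3) % n],
            "test": parts[(k + 4) % n],
        }
    return folds


if __name__ == "__main__":
    for name, split in fold_partitions().items():
        print(name, split["train"], split["validation"], split["test"])
```

Each part serves as the test set exactly once and as the validation set exactly once across the five folds.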

Dataset Descriptions

The first column is the relevance degree of a document in the ground-truth permutation. A larger relevance degree means a position closer to the top of the permutation. The other columns are the same as in the supervised ranking setting. An example is shown below.

=====================================
   1008 qid:10 1:0.004356 2:0.080000 3:0.036364 4:0.000000 ... 46:0.000000 #docid = GX057-59-4044939 inc = 1 prob = 0.698286
   1007 qid:10 1:0.004901 2:0.000000 3:0.036364 4:0.333333 ... 46:0.000000 #docid = GX235-84-0891544 inc = 1 prob = 0.567746
   1006 qid:10 1:0.019058 2:0.240000 3:0.072727 4:0.500000 ... 46:0.000000 #docid = GX016-48-5543459 inc = 1 prob = 0.775913
   1005 qid:10 1:0.004901 2:0.160000 3:0.018182 4:0.666667 ... 46:0.000000 #docid = GX068-48-12934837 inc = 1 prob = 0.659932

=====================================
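
A line in this format can be parsed, and the ground-truth permutation for a query recovered, roughly as follows. This is a sketch under stated assumptions: the names parse_line and permutation_for_query are hypothetical, the first column is assumed to be an integer as in the example, and the trailing comment is assumed to consist of "key = value" triples. The lines in the usage part are shortened to two features for brevity and are not real dataset rows.

```python
def parse_line(line):
    """Parse one line into its relevance degree, query id, features and comment fields.

    Assumes an integer first column and 'key = value' triples after '#'.
    """
    body, _, comment = line.partition("#")
    tokens = body.split()
    degree = int(tokens[0])                 # relevance degree in the permutation
    qid = tokens[1].split(":", 1)[1]        # "qid:10" -> "10"
    features = {}
    for tok in tokens[2:]:
        index, value = tok.split(":", 1)    # "1:0.004356" -> (1, 0.004356)
        features[int(index)] = float(value)
    meta = {}
    parts = comment.split()
    for i in range(0, len(parts) - 2, 3):   # walk "key", "=", "value" triples
        if parts[i + 1] == "=":
            meta[parts[i]] = parts[i + 2]
    return {"degree": degree, "qid": qid, "features": features, "meta": meta}


def permutation_for_query(records):
    """Order document ids from top to bottom: larger degree means higher position."""
    ranked = sorted(records, key=lambda r: r["degree"], reverse=True)
    return [r["meta"].get("docid") for r in ranked]


if __name__ == "__main__":
    # Shortened, illustrative lines (two features only).
    lines = [
        "1007 qid:10 1:0.004901 2:0.000000 #docid = GX235-84-0891544 inc = 1 prob = 0.567746",
        "1008 qid:10 1:0.004356 2:0.080000 #docid = GX057-59-4044939 inc = 1 prob = 0.698286",
    ]
    records = [parse_line(l) for l in lines]
    print(permutation_for_query(records))
```

Sorting by the first column in descending order is all that is needed to read off the permutation, since the features and comment fields play no role in the ground truth.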

Download

MQ2007-list