Domain: Information Retrieval

Overview

Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing.

Dataset List

MQ2007 is a query set from the Million Query track of TREC 2007. It contains about 1700 queries with labeled documents. MQ2007 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. In each fold, there are three subsets fo...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
LambdaMART 0.4811 0.4550 0.4395 0.4255 0.4193 0.4658 0.4122 Detail
RankBoost 0.4799 0.4578 0.4440 0.4252 0.4113 0.4624 0.4126 Detail
svm_struct 0.4746 0.4495 0.4315 0.4193 0.4135 0.4644 0.4096 Detail
RankNet 0.4515 0.4303 0.4129 0.4004 0.3915 0.4500 0.3893 Detail
AdaRank 0.4480 0.4335 0.4253 0.4190 0.4073 0.4602 0.3913 Detail
ListNet 0.4456 0.4311 0.4129 0.4031 0.3935 0.4461 0.3822 Detail
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
LambdaMART 0.4085 0.4115 0.4141 0.4185 0.4985 Detail
RankBoost 0.4147 0.4185 0.4191 0.4191 0.4995 Detail
svm_struct 0.4073 0.4062 0.4084 0.4142 0.4966 Detail
RankNet 0.3895 0.3907 0.3924 0.3966 0.4821 Detail
AdaRank 0.3963 0.4021 0.4091 0.4125 0.4922 Detail
ListNet 0.3868 0.3894 0.3944 0.3974 0.4798 Detail
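The column names in the tables above (P@k, MAP, NDCG@k) can be illustrated with a minimal Python sketch. It assumes graded relevance labels listed in ranked order, treats any label above 0 as relevant, and uses the common 2^rel - 1 gain with a log2 discount. These conventions are assumptions made for illustration; the evaluation script shipped with the dataset may use different ones, so the sketch is not expected to reproduce the numbers above exactly.

```python
import math

def precision_at_k(rels, k):
    """Fraction of the top-k results whose relevance label is above 0."""
    return sum(1 for r in rels[:k] if r > 0) / k

def average_precision(rels):
    """Mean of P@i over the ranks i that hold a relevant document.

    Assumes the ranked list contains every judged document for the query.
    """
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def dcg_at_k(rels, k):
    """Discounted cumulative gain with gain 2^rel - 1 and log2(rank + 1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels for one query, in the order a ranker returned the documents.
ranked_labels = [2, 0, 1, 0, 0]
print(precision_at_k(ranked_labels, 3))
print(average_precision(ranked_labels))
print(ndcg_at_k(ranked_labels, 3))
```

MAP is the mean of average_precision over all queries, and a table entry such as NDCG@5 is the mean of ndcg_at_k(labels, 5) over all queries.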
MQ2008 is a query set from the Million Query track of TREC 2008. It contains about 800 queries with labeled documents. MQ2008 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. In each fold, there are three subsets for l...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
LambdaMART 0.4489 0.4113 0.3843 0.3596 0.3405 0.4753 0.3741 Detail
RankBoost 0.4413 0.4094 0.3962 0.3689 0.3480 0.4758 0.3665 Detail
AdaRank 0.4374 0.4081 0.3848 0.3657 0.3431 0.4783 0.3720 Detail
svm_struct 0.4273 0.4068 0.3903 0.3695 0.3474 0.4695 0.3626 Detail
ListNet 0.4107 0.3865 0.3656 0.3511 0.3342 0.4555 0.3486 Detail
RankNet 0.4068 0.3839 0.3622 0.3466 0.3280 0.4514 0.3422 Detail
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
LambdaMART 0.4056 0.4288 0.4486 0.4689 0.4852 Detail
RankBoost 0.3919 0.4281 0.4487 0.4680 0.4820 Detail
AdaRank 0.4064 0.4289 0.4585 0.4759 0.4885 Detail
svm_struct 0.3984 0.4285 0.4508 0.4695 0.4832 Detail
ListNet 0.3808 0.4061 0.4322 0.4546 0.4682 Detail
RankNet 0.3776 0.4042 0.4281 0.4475 0.4642 Detail
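The 5-fold setup mentioned in the MQ2007 and MQ2008 descriptions can be sketched as a rotation over five partitions. The sketch assumes that the three subsets in each fold are a training set, a validation set, and a test set, and that the partitions are called S1 to S5. Both are assumptions, since the descriptions above are truncated; the package itself defines the real partitions.

```python
def make_folds(parts):
    """Rotate n partitions into (train, validation, test) splits.

    For fold i, the partition list is rotated by i positions; the last
    partition becomes the test set, the one before it the validation set,
    and the rest the training set.
    """
    n = len(parts)
    folds = []
    for i in range(n):
        rotated = parts[i:] + parts[:i]
        folds.append({
            "train": rotated[:n - 2],
            "validation": rotated[n - 2],
            "test": rotated[n - 1],
        })
    return folds

# Hypothetical partition names; the package ships its own files per fold.
folds = make_folds(["S1", "S2", "S3", "S4", "S5"])
for number, fold in enumerate(folds, start=1):
    print(number, fold)
```

With this rotation every partition serves as the test set exactly once, so a reported score is typically the average of the five per-fold test scores.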
MQ2007 is a query set from the Million Query track of TREC 2007. It contains about 1700 queries with labeled documents. MQ2007 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. The data format in...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
MQ2008 is a query set from the Million Query track of TREC 2008. It contains about 800 queries with labeled documents. MQ2008 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. The data format...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
MQ2007 is a query set from the Million Query track of TREC 2007. It contains about 1700 queries with labeled documents. MQ2007 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. In each fold, there are three subsets for le...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
MQ2008 is a query set from the Million Query track of TREC 2008. It contains about 800 queries with labeled documents. MQ2008 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. In each fold, there are three subsets for lea...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
MQ2007 is a query set from the Million Query track of TREC 2007. It contains about 1700 queries with labeled documents. MQ2007 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. The data format in MQ2007-list is the same a...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
MQ2008 is a query set from the Million Query track of TREC 2008. It contains about 800 queries with labeled documents. MQ2008 adopts a 5-fold cross-validation strategy, and the 5-fold partitions are included in the package. The data format in MQ2008-list is the same as...
BASELINE P@1 P@2 P@3 P@4 P@5 MAP NDCG@1 Evaluation
BASELINE NDCG@2 NDCG@3 NDCG@4 NDCG@5 MeanNDCG Evaluation
Welcome to the TREC 2009 Web Track. Our goal is to explore and evaluate Web retrieval technologies over the new billion-page ClueWeb09 Dataset. The dataset was crawled from the Web during January and February 2009, and 50 topics will be used. For the purposes of the diversity track, each topi...
BASELINE ALPHA_NDCG@5 ALPHA_NDCG@10 ALPHA_NDCG@20 ERR_NDCG@5 Evaluation
PAMM(α-NDCG) 0.6370 0.4080 0.4270 0.5640 Detail
PAMM(ERR-IA) 0.5210 0.4250 0.4220 0.5670 Detail
BASELINE ERR_NDCG@10 ERR_NDCG@20 Evaluation
PAMM(α-NDCG) 0.2910 0.2840 Detail
PAMM(ERR-IA) 0.3400 0.2940 Detail
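The ALPHA_NDCG columns above report a diversity-aware variant of NDCG in which a document earns less gain for a subtopic that earlier documents already covered. The following Python sketch shows one common formulation. It assumes binary subtopic judgments, alpha = 0.5, and an ideal ranking approximated greedily over the documents in the ranking itself. These are illustrative assumptions; the official track evaluation tool may differ in detail.

```python
import math

def alpha_dcg(ranking, k, alpha=0.5):
    """alpha-DCG@k; `ranking` is a list of sets of covered subtopics."""
    seen = {}  # subtopic -> number of earlier documents covering it
    score = 0.0
    for rank, subtopics in enumerate(ranking[:k], start=1):
        # Each repeat of a subtopic is worth (1 - alpha) times the previous one.
        gain = sum((1 - alpha) ** seen.get(s, 0) for s in subtopics)
        score += gain / math.log2(rank + 1)
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return score

def greedy_ideal(docs, k, alpha=0.5):
    """Greedy approximation of the ideal ordering (the exact one is NP-hard)."""
    remaining = list(docs)
    seen = {}
    ideal = []
    while remaining and len(ideal) < k:
        best = max(remaining,
                   key=lambda d: sum((1 - alpha) ** seen.get(s, 0) for s in d))
        remaining.remove(best)
        ideal.append(best)
        for s in best:
            seen[s] = seen.get(s, 0) + 1
    return ideal

def alpha_ndcg(ranking, k, alpha=0.5):
    """alpha-DCG@k normalized by the alpha-DCG@k of the greedy ideal ordering."""
    ideal = alpha_dcg(greedy_ideal(ranking, k, alpha), k, alpha)
    return alpha_dcg(ranking, k, alpha) / ideal if ideal > 0 else 0.0

# Hypothetical topic with subtopics "a" and "b"; the second document is redundant.
redundant_first = [{"a"}, {"a"}, {"b"}]
diverse_first = [{"a"}, {"b"}, {"a"}]
print(alpha_ndcg(redundant_first, 3))
print(alpha_ndcg(diverse_first, 3))
```

The ranking that covers both subtopics early scores higher than the one that repeats subtopic "a" at rank 2, which is the behavior a diversity metric is meant to reward.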
Welcome to the TREC 2010 Web Track. Our goal is to explore and evaluate Web retrieval technologies over the billion-page ClueWeb09 Dataset. The dataset was crawled from the Web during January and February 2009, and 48 topics will be used. For the purposes of the diversity track, each topic w...
BASELINE ALPHA_NDCG@5 ALPHA_NDCG@10 ALPHA_NDCG@20 ERR_NDCG@5 Evaluation
PAMM(α-NDCG) 0.7290 0.6640 0.5250 0.4130 Detail
PAMM(ERR-IA) 0.6330 0.6250 0.5100 0.4190 Detail
BASELINE ERR_NDCG@10 ERR_NDCG@20 Evaluation
PAMM(α-NDCG) 0.3810 0.3820 Detail
PAMM(ERR-IA) 0.4090 0.3860 Detail
Welcome to the TREC 2011 Web Track. Our goal is to explore and evaluate Web retrieval technologies over the new billion-page ClueWeb09 Dataset. The dataset was crawled from the Web during January and February 2009, and 50 topics will be used. For the purposes of the diversity track, each topi...
BASELINE ALPHA_NDCG@5 ALPHA_NDCG@10 ALPHA_NDCG@20 ERR_NDCG@5 Evaluation
PAMM(α-NDCG) 0.8290 0.6790 0.6460 0.5610 Detail
PAMM(ERR-IA) 0.6870 0.6510 0.6360 0.5970 Detail
BASELINE ERR_NDCG@10 ERR_NDCG@20 Evaluation
PAMM(α-NDCG) 0.5710 0.5380 Detail
PAMM(ERR-IA) 0.5950 0.5470 Detail
Robust04 is a small news dataset whose topics are collected from the TREC Robust Track 2004. Robust04-Desc means that the description of each topic is used as the query. The collection consists of 0.5M documents and 250 queries. The vocabulary size is 0.6M, and the collection length is 252M. Thes...
BASELINE MAP NDCG@20 P@20 Evaluation
DRMM(LCH-IDF) 0.2750 0.4370 0.3710 Detail
NWT 0.2680 0.4130 0.3530 Detail
QL 0.2460 0.3910 0.3340 Detail
BM25 0.2410 0.3990 0.3370 Detail
MatchPyramid(COS) 0.1900 0.3300 0.1620 Detail
MatchPyramid(IND) 0.1420 0.3190 0.1180 Detail
MatchPyramid(DOT) 0.1040 0.1590 0.0920 Detail
DSSM-D 0.0780 0.1690 0.1450 Detail
CDSSM-D 0.0500 0.1130 0.0930 Detail
ARC-II 0.0420 0.0860 0.0740 Detail
ARC-I 0.0300 0.0470 0.0450 Detail
ClueWeb09B is a large Web collection whose topics are accumulated from TREC Web Tracks 2009, 2010, and 2011. ClueWeb09B is filtered to the set of documents with spam scores in the 60th percentile, using the Waterloo Fusion spam scores [1]. The collection consists of 34M documents and 150 querie...
BASELINE MAP NDCG@20 P@20 Evaluation
DRMM(LCH-IDF) 0.1130 0.2580 0.3650 Detail
NWT 0.1070 0.2360 0.3410 Detail
BM25 0.1010 0.2250 0.3260 Detail
QL 0.1000 0.2240 0.3280 Detail
MatchPyramid(COS) 0.0660 0.2220 0.2900 Detail
CDSSM-T 0.0640 0.1530 0.2140 Detail
MatchPyramid(IND) 0.0560 0.2080 0.2810 Detail
DSSM-T 0.0540 0.1320 0.1850 Detail
CDSSM-D 0.0540 0.1340 0.1770 Detail
MatchPyramid(DOT) 0.0440 0.1580 0.1550 Detail
DSSM-D 0.0390 0.0990 0.1310 Detail
ARC-II 0.0330 0.0870 0.1230 Detail
ARC-I 0.0240 0.0730 0.0890 Detail
Robust04 is a small news dataset whose topics are collected from the TREC Robust Track 2004. Robust04-Title means that the title of each topic is used as the query. The collection consists of 0.5M documents and 250 queries. The vocabulary size is 0.6M, and the collection length is 252M. These dat...
BASELINE MAP NDCG@20 P@20 Evaluation
DRMM(LCH-IDF) 0.2760 0.4310 0.3820 Detail
NWT 0.2740 0.4260 0.3800 Detail
BM25 0.2550 0.4180 0.3700 Detail
QL 0.2530 0.4150 0.3690 Detail
MatchPyramid(COS) 0.1890 0.3300 0.2900 Detail
MatchPyramid(IND) 0.1690 0.3190 0.2810 Detail
DSSM-D 0.0950 0.2010 0.1710 Detail
MatchPyramid(DOT) 0.0830 0.1590 0.1550 Detail
CDSSM-D 0.0670 0.1460 0.1250 Detail
ARC-II 0.0670 0.1470 0.1280 Detail
ARC-I 0.0410 0.0660 0.0650 Detail
The WordEmbedding dataset contains the word embeddings used with the Robust04 and ClueWeb09B datasets. The word embeddings are trained on the corresponding corpus with the word2vec toolkit. These data can only be used for academic research purposes....
BASELINE MAP NDCG@20 P@20 Evaluation
ClueWeb09B is a large Web collection whose topics are accumulated from TREC Web Tracks 2009, 2010, and 2011. ClueWeb09B is filtered to the set of documents with spam scores in the 60th percentile, using the Waterloo Fusion spam scores [1]. The collection consists of 34M documents and 150 querie...
BASELINE MAP NDCG@20 P@20 Evaluation
DRMM(LCH-IDF) 0.0870 0.2270 0.2940 Detail
NWT 0.0800 0.2040 0.2640 Detail
BM25 0.0800 0.1960 0.2550 Detail
QL 0.0750 0.1830 0.2340 Detail
MatchPyramid(COS) 0.0570 0.1400 0.1710 Detail
CDSSM-T 0.0550 0.1390 0.1710 Detail
CDSSM-D 0.0490 0.1250 0.1600 Detail
DSSM-T 0.0460 0.1190 0.1430 Detail
MatchPyramid(IND) 0.0430 0.1180 0.1580 Detail
DSSM-D 0.0340 0.0780 0.1030 Detail
MatchPyramid(DOT) 0.0330 0.0730 0.1020 Detail
ARC-II 0.0240 0.0560 0.0750 Detail
ARC-I 0.0170 0.0360 0.0510 Detail