Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Skip to content

TREC 2022 Deep Learning test collection

Metadata Updated: September 30, 2025

This is a test collection for passage and document retrieval, produced in the TREC 2023 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).Certain machine learning based methods, such as methods based on deep learning are known to require very large datasets for training. Lack of such large scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large scale datasets to TREC, and create a focused research effort with a rigorous blind evaluation of ranker for the passage ranking and document ranking tasks.Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought in to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.

Downloads & Resources

References

https://trec.nist.gov/pubs/trec31/papers/Overview_deep.pdf

Dates

Metadata Created Date September 30, 2025
Metadata Updated Date September 30, 2025
Data Update Frequency irregular

Metadata Source

Harvested from Commerce Non Spatial Data.json Harvest Source

Additional Metadata

Resource Type Dataset
Metadata Created Date September 30, 2025
Metadata Updated Date September 30, 2025
Publisher National Institute of Standards and Technology
Maintainer
Identifier ark:/88434/mds2-2974
Data First Published 2023-04-07
Language en
Data Last Modified 2023-03-01 00:00:00
Category Information Technology:Data and informatics
Public Access Level public
Data Update Frequency irregular
Bureau Code 006:55
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 9531f20e-790c-476b-907a-5dab4814905c
Harvest Source Id bce99b55-29c1-47be-b214-b8e71e9180b1
Harvest Source Title Commerce Non Spatial Data.json Harvest Source
Homepage URL https://data.nist.gov/od/id/mds2-2974
License https://www.nist.gov/open/license
Program Code 006:045
Related Documents https://trec.nist.gov/pubs/trec31/papers/Overview_deep.pdf
Source Datajson Identifier True
Source Hash 020343f9c4b2b91fb8883781b315e02d10fc06013aae904d967259bde03a2260
Source Schema Version 1.1

Didn't find what you're looking for? Suggest a dataset here.