- Federal
Complex Document Information Processing (CDIP) dataset
National Institute of Standards and Technology —
This dataset is called the "IIT CDIP collection". "CDIP" stands for "Complex Document Information Processing" and "IIT" stands for "Illinois Institute of Technology"...
- Federal
TREC 2022 Deep Learning test collection
National Institute of Standards and Technology —
This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in...
- Federal
NIST TREC Disks 4 and 5: Retrieval Test Collections Document Set
National Institute of Standards and Technology —
A collection of full-text documents from various sources including the Financial Times Limited (1991, 1992, 1993, 1994), the Congressional Record of the 103rd...
- Federal
STI Tagging Models
National Aeronautics and Space Administration —
Keyword models for a subset of the NASA Thesaurus (https://www.sti.nasa.gov/nasa-thesaurus/). These models were trained on the NASA Technical Reports Server (NTRS)....