TREC 2001 CROSS LANGUAGE DATASET

Published by National Institute of Standards and Technology | Catalog Last Checked: August 02, 2025 at 03:47 PM | Dataset Last Updated: October 02, 2024
Ten groups participated in the TREC-2001 cross-language information retrieval track, which focused on retrieving Arabic-language documents using 25 queries originally prepared in English. French and Arabic translations of the queries were also available. This was the first year a large Arabic test collection was available, so participants tried a wide variety of approaches and ran a rich set of experiments using resources such as machine translation, parallel corpora, several approaches to stemming and/or morphological analysis, and both pre-translation and post-translation blind relevance feedback. On average, forty percent of the relevant documents found by a participating team were found by no other team, a higher rate than is normally observed at TREC. This raises some concern that the relevance judgment pools may be less complete than has historically been the case.
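The description does not say exactly how the forty-percent figure was computed. As one plausible reading, the sketch below treats each team's contribution as the set of (topic, document) pairs it retrieved that were judged relevant, and reports, per team, the fraction of those pairs that no other team retrieved. The function names, team names, and document ids are hypothetical; this is an illustration of the "unique relevant" idea, not the track's official analysis.

```python
from typing import Dict, Set, Tuple

# A judged-relevant document for a given topic: (topic id, document id).
Pair = Tuple[str, str]


def unique_relevant_fraction(found: Dict[str, Set[Pair]]) -> Dict[str, float]:
    """For each team, the fraction of its relevant (topic, doc) pairs that
    no other team retrieved. Hypothetical helper, not TREC's own tooling."""
    fractions: Dict[str, float] = {}
    for team, pairs in found.items():
        # Union of everything the other teams found.
        others: Set[Pair] = set().union(
            *(p for t, p in found.items() if t != team)
        )
        unique = pairs - others
        fractions[team] = len(unique) / len(pairs) if pairs else 0.0
    return fractions


def mean_unique_fraction(found: Dict[str, Set[Pair]]) -> float:
    """Average the per-team fractions (one reading of "on average")."""
    fractions = unique_relevant_fraction(found)
    return sum(fractions.values()) / len(fractions) if fractions else 0.0


if __name__ == "__main__":
    # Toy data with made-up team names and document ids.
    found = {
        "team_a": {("1", "DOC-0001"), ("1", "DOC-0002")},
        "team_b": {("1", "DOC-0002"), ("1", "DOC-0003")},
    }
    print(unique_relevant_fraction(found))  # {'team_a': 0.5, 'team_b': 0.5}
    print(mean_unique_fraction(found))      # 0.5
```

Keying on (topic, document) pairs rather than bare document ids matters because relevance is judged per topic: the same document can be relevant to one query and not to another.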

Resources

5 resources available

  • LDC2001T55 document collection

    FILE
  • TREC 2001 cross language topics in English

    TRADITIONAL TREC SGML TOPIC FORMAT
  • TREC 2001 cross language topics in Arabic

    TRADITIONAL TREC SGML TOPIC FORMAT
  • TREC 2001 cross language topics in French

    TRADITIONAL TREC SGML TOPIC FORMAT
  • TREC 2001 CLIR Relevance judgments

    WHITESPACE-SEPARATED: TOPIC, "0", DOCUMENT, RELEVANCE LEVEL
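The relevance judgments are described only as four whitespace-separated fields: topic, the constant "0", document, and relevance level. A minimal Python sketch for loading them, assuming one judgment per line and an integer relevance level where 1 or more counts as relevant (the threshold, the function names, and the sample ids are assumptions, not part of the dataset description):

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse lines of the form: TOPIC 0 DOCUMENT RELEVANCE_LEVEL.

    Returns topic -> {document id -> relevance level}. The constant "0"
    column is read and discarded.
    """
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # any run of whitespace separates fields
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        topic, _zero, doc_id, level = fields
        qrels[topic][doc_id] = int(level)
    return dict(qrels)


def relevant_docs(
    qrels: Dict[str, Dict[str, int]], topic: str, min_level: int = 1
) -> Set[str]:
    """Documents judged at or above min_level for one topic (assumed threshold)."""
    return {doc for doc, level in qrels.get(topic, {}).items() if level >= min_level}


if __name__ == "__main__":
    # Made-up lines; real topic and document ids will differ.
    sample = [
        "1 0 DOC-0001 1",
        "1 0 DOC-0002 0",
        "2 0 DOC-0003 1",
    ]
    qrels = parse_qrels(sample)
    print(sorted(qrels))              # ['1', '2']
    print(relevant_docs(qrels, "1"))  # {'DOC-0001'}
```

Because `parse_qrels` accepts any iterable of strings, an open file handle works directly, e.g. `with open("qrels.txt", encoding="utf-8") as f: qrels = parse_qrels(f)`, where `qrels.txt` is a hypothetical local filename. The second column is conventionally unused by TREC evaluation tools, which is why the sketch ignores it.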
