SIAM 2007 Text Mining Competition dataset

Metadata Updated: December 7, 2023

Subject Area: Text Mining

Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining algorithms for document classification. The documents in question were aviation safety reports that documented one or more problems that occurred during certain flights. The goal was to label the documents with respect to the types of problems that were described. This is a subset of the Aviation Safety Reporting System (ASRS) dataset, which is publicly available.

How Data Was Acquired: The data for this competition came from human-generated reports on incidents that occurred during a flight.

Sample Rates, Parameter Description, and Format: There is one document per incident. The datasets are in raw text format. All documents for each set are contained in a single file, and each row in this file corresponds to a single document. Each line begins with the document number, and a tilde (~) separates the document number from the text itself.
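
The line format described above can be read with a short parser. The following is a minimal sketch in Python, not an official loader: the function name `parse_documents` and the sample lines are hypothetical, and it assumes that only the first tilde on a line is the separator and that the document number is best kept as a string, since the description does not specify its exact form.

```python
from typing import Iterable, Iterator, Tuple


def parse_documents(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (document_number, text) pairs from tilde-delimited lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumption: only the first tilde separates the number from the text,
        # so any later tildes are kept as part of the document text.
        doc_id, sep, text = line.partition("~")
        if not sep:
            raise ValueError(f"no tilde separator in line: {line[:40]!r}")
        yield doc_id.strip(), text.strip()


# Hypothetical sample lines, made up for illustration only.
sample = ["1~first report text", "2~second report ~ with a tilde"]
docs = dict(parse_documents(sample))
```

To read a real file, pass an open file handle instead of the sample list, for example `parse_documents(open(path, encoding="utf-8"))`, where `path` points to whichever data file was downloaded.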

Anomalies/Faults: This is a document category classification problem.

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties, it is considered a U.S. Government Work.


Metadata Created Date November 12, 2020
Metadata Updated Date December 7, 2023
Data Update Frequency irregular

Metadata Source

Harvested from NASA Data.json

Additional Metadata

Resource Type Dataset
Publisher Dashlink
Identifier DASHLINK_138
Data First Published 2010-09-22
Data Last Modified 2020-01-29
Public Access Level public
Bureau Code 026:00
Metadata Context
Metadata Catalog ID
Schema Version
Catalog Describedby
Harvest Object Id bee69eb9-ad35-42f7-b26c-190b44d3f7e2
Harvest Source Id 58f92550-7a01-4f00-b1b2-8dc953bd598f
Harvest Source Title NASA Data.json
Homepage URL
Program Code 026:029
Source Datajson Identifier True
Source Hash ff3e6b8c08175cc932e888eae5a004df8573782b2c8d56e40ede5cf0a5da6399
Source Schema Version 1.1
