Anomaly Detection in Sequences

Metadata Updated: May 2, 2019

We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences arising from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach clusters the sequences without supervision, using the normalized length of the longest common subsequence (nLCS) as the similarity measure, and then analyzes the outliers in detail to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators of why a particular sequence is deemed an outlier. These indicators give an analyst a coherent description of the anomalies in the sequence relative to more normal sequences.
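As a rough illustration of the similarity measure named above, the following Python sketch computes an nLCS score and a simple distance-based outlier score. The abstract does not state the normalization, so dividing the LCS length by the geometric mean of the two sequence lengths is an assumption here, as are the function names, the use of a single representative sequence per cluster, and the made-up switch labels. It is not the sequenceMiner implementation.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a, b):
    """Normalized LCS similarity in [0, 1].

    Assumption: normalize by sqrt(len(a) * len(b)); the source only says
    "normalized length" without giving the formula.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))


def outlier_scores(sequences, representative):
    """Distance (1 - nLCS) of each sequence from a cluster representative.

    Hypothetical helper: a larger score means the sequence is farther
    from the cluster and therefore a stronger outlier candidate.
    """
    return [1.0 - nlcs(s, representative) for s in sequences]


# Hypothetical switch-event sequences, for illustration only.
typical = ["gear_down", "flaps_15", "flaps_30", "autopilot_off"]
flights = [
    ["gear_down", "flaps_15", "flaps_30", "autopilot_off"],
    ["flaps_15", "gear_down", "autopilot_off", "flaps_30", "flaps_15"],
]
scores = outlier_scores(flights, typical)
```

With this normalization, two identical sequences score 1.0 and sequences sharing no symbols score 0.0, so `1 - nlcs` behaves as a distance for ranking outlier candidates. The quadratic-time dynamic program is adequate for a sketch, though a large fleet data set would likely need something faster.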

The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners.

We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models and show that our methods are superior.

Access & Use Information

Public: This dataset is intended for public access and use. License: U.S. Government Work

Downloads & Resources


Metadata Created Date August 1, 2018
Metadata Updated Date May 2, 2019
Data Update Frequency irregular

Metadata Source

Harvested from NASA Data.json

Additional Metadata

Resource Type Dataset
Publisher Dashlink
Unique Identifier DASHLINK_3
Maintainer Ashok Srivastava
Maintainer Email
Public Access Level public
Bureau Code 026:00
Metadata Context
Metadata Catalog ID
Schema Version
Catalog Describedby
Datagov Dedupe Retained 20190501230127
Harvest Object Id 8ee42446-f85b-499c-9bf9-1a616cd72c1f
Harvest Source Id 39e4ad2a-47ca-4507-8258-852babd0fd99
Harvest Source Title NASA Data.json
Data First Published 2010-09-09
Homepage URL
Data Last Modified 2018-07-19
Program Code 026:029
Source Datajson Identifier True
Source Hash 0d0b76e2af5bb5862b116886df69d0e299e25ba4
Source Schema Version 1.1
