Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Skip to content

Mining Distance-Based Outliers in Near Linear Time

Metadata Updated: December 6, 2023

Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.

Downloads & Resources

Dates

Metadata Created Date November 12, 2020
Metadata Updated Date December 6, 2023
Data Update Frequency irregular

Metadata Source

Harvested from NASA Data.json

Additional Metadata

Resource Type Dataset
Metadata Created Date November 12, 2020
Metadata Updated Date December 6, 2023
Publisher Dashlink
Maintainer
Identifier DASHLINK_191
Data First Published 2010-09-22
Data Last Modified 2020-01-29
Public Access Level public
Data Update Frequency irregular
Bureau Code 026:00
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://data.nasa.gov/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id d4f7303f-291c-44f7-8523-81584659e4ea
Harvest Source Id 58f92550-7a01-4f00-b1b2-8dc953bd598f
Harvest Source Title NASA Data.json
Homepage URL https://c3.nasa.gov/dashlink/resources/191/
Program Code 026:029
Source Datajson Identifier True
Source Hash ecb0916407f4ce87b29fe420e0e72789579bd1bb45b11b16ab58fa335dc389a0
Source Schema Version 1.1

Didn't find what you're looking for? Suggest a dataset here.