Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Skip to content

Mining Distance-Based Outliers in Near Linear Time

Metadata Updated: April 11, 2025

Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.

Downloads & Resources

Dates

Metadata Created Date November 12, 2020
Metadata Updated Date April 11, 2025
Data Update Frequency irregular

Metadata Source

Harvested from NASA Data.json

Additional Metadata

Resource Type Dataset
Metadata Created Date November 12, 2020
Metadata Updated Date April 11, 2025
Publisher Dashlink
Maintainer
Identifier DASHLINK_191
Data First Published 2010-09-22
Data Last Modified 2025-03-31
Public Access Level public
Data Update Frequency irregular
Bureau Code 026:00
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id c97a52b2-1852-432a-a9c4-1ecb273786b3
Harvest Source Id 58f92550-7a01-4f00-b1b2-8dc953bd598f
Harvest Source Title NASA Data.json
Homepage URL https://c3.nasa.gov/dashlink/resources/191/
Program Code 026:029
Source Datajson Identifier True
Source Hash 2d6663671799c6b8b77a2cde96b4e7a16c507203057995c59dd0862bc264a266
Source Schema Version 1.1

Didn't find what you're looking for? Suggest a dataset here.