Examples‎ > ‎

Outlier Detection

The gadget spec URL could not be found
This page shows an example on outlier detection with the LOF (Local Outlier Factor) algorithm.

The LOF algorithm

LOF (Local Outlier Factor) is an algorithm for identifying density-based local outliers [Breunig et al., 2000]. With LOF, the local density of a point is compared with that of its neighbors. If the former is signi.cantly lower than the latter (with an LOF value greater than one), the point is in a sparser region than its neighbors, which suggests it be an outlier.

Function lofactor(data, k) in packages DMwR and dprep calculates local outlier factors using the LOF algorithm, where k is the number of neighbors used in the calculation of the local outlier factors.

Calculate Outlier Scores

> library(DMwR)
> # remove "Species", which is a categorical column
> iris2 <- iris[,1:4]
> outlier.scores <- lofactor(iris2, k=5)
> plot(density(outlier.scores))

A density plot of outlier scores

> # pick top 5 as outliers
> outliers <- order(outlier.scores, decreasing=T)[1:5]
> # who are outliers
> print(outliers)
[1] 42 107 23 110 63

Visualize Outliers with Plots

Next, we show outliers with a biplot of the first two principal components.

> n <- nrow(iris2)
> labels <- 1:n
> labels[-outliers] <- "."
> biplot(prcomp(iris2), cex=.8, xlabs=labels)

Visualization of outliers in a biplot of the 1st two principal components

We can also show outliers with a pairs plot as below, where outliers are labeled with "+" in red.

> pch <- rep(".", n)
> pch[outliers] <- "+"
> col <- rep("black", n)
> col[outliers] <- "red"
> pairs(iris2, pch=pch, col=col)

Visualization of outliers with a matrix of scatter plots

Parallel Computation of LOF Scores

Package Rlof provides function lof(), a parallel implementation of the LOF algorithm. Its usage is similar to the above lofactor(), but lof() has two additional features of supporting multiple values of k and several choices of distance metrics. Below is an example of lof().

> library(Rlof)
> outlier.scores <- lof(iris2, k=5)
> # try with different number of neighbors (k = 5,6,7,8,9 and 10)
> outlier.scores <- lof(iris2, k=c(5:10))

The gadget spec URL could not be found