This page shows an example of outlier detection with the LOF (Local Outlier Factor) algorithm.

The LOF algorithm

LOF (Local Outlier Factor) is an algorithm for identifying density-based local outliers [Breunig et al., 2000]. With LOF, the local density of a point is compared with that of its neighbors. If the former is significantly lower than the latter (giving an LOF value greater than one), the point lies in a sparser region than its neighbors, which suggests that it is an outlier.
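
To make the density comparison concrete, here is a minimal base-R sketch of the LOF computation as defined by Breunig et al. It is an illustration only, not the code used by any package: the function name lof_sketch and the toy data pts are hypothetical, Euclidean distance is assumed, and points tied at the k-distance are all counted as neighbors.

```r
# Illustrative LOF sketch (hypothetical helper, not a package implementation).
lof_sketch <- function(x, k) {
  d <- as.matrix(dist(x))      # pairwise Euclidean distances
  n <- nrow(d)
  diag(d) <- Inf               # a point is not its own neighbor
  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- apply(d, 1, function(row) sort(row)[k])
  # neighborhood: every point within the k-distance (ties included)
  nbrs <- lapply(seq_len(n), function(i) which(d[i, ] <= kdist[i]))
  # local reachability density: inverse of the mean reachability distance,
  # where reach-dist(p, o) = max(k-distance(o), d(p, o))
  lrd <- vapply(seq_len(n), function(i) {
    o <- nbrs[[i]]
    1 / mean(pmax(kdist[o], d[i, o]))
  }, numeric(1))
  # LOF: mean density of the neighbors divided by the point's own density
  vapply(seq_len(n), function(i) mean(lrd[nbrs[[i]]]) / lrd[i], numeric(1))
}

# Toy data (made up): a 3 x 3 grid of points plus one isolated point in row 10
pts <- rbind(as.matrix(expand.grid(x = 0:2, y = 0:2)), c(10, 10))
scores <- lof_sketch(pts, k = 3)
print(round(scores, 2))
```

In this toy data the grid points get scores close to one, while the isolated point gets a score well above one, which matches the interpretation given above.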

Function lofactor(data, k) in packages DMwR and dprep calculates local outlier factors using the LOF algorithm, where k is the number of neighbors used in the calculation of the local outlier factors.

Calculate Outlier Scores

> library(DMwR)
> # remove "Species", which is a categorical column
> iris2 <- iris[,1:4]
> outlier.scores <- lofactor(iris2, k=5)
> plot(density(outlier.scores))

> # pick the top 5 as outliers
> outliers <- order(outlier.scores, decreasing=TRUE)[1:5]
> # which rows are the outliers?
> print(outliers)
[1]  42 107  23 110  63
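
The top-k selection above relies on order(), which returns row indices rather than the scores themselves. A small sketch with made-up scores shows the idiom; the vector scores and the names top2 and top2_scores are hypothetical and not taken from the iris run.

```r
# Hypothetical scores for six points, made up for illustration
scores <- c(1.02, 0.98, 2.40, 1.10, 1.75, 0.95)

# order() returns indices sorted by score; decreasing = TRUE puts the largest first
top2 <- order(scores, decreasing = TRUE)[1:2]

# Index back into the score vector to see how extreme the selected points are
top2_scores <- scores[top2]
print(top2)
print(top2_scores)
```

Inspecting the selected scores this way helps when deciding whether a fixed count of five is reasonable or whether a cutoff read from the density plot would suit the data better.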

Visualize Outliers with Plots

Next, we show outliers with a biplot of the first two principal components.

> n <- nrow(iris2)
> labels <- 1:n
> labels[-outliers] <- "."
> biplot(prcomp(iris2), cex=.8, xlabs=labels)

We can also show outliers with a pairs plot, where outliers are labeled with "+" in red.
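
The code for this plot is not shown on the page, so the following is one possible sketch rather than the original. It assumes the outlier row numbers printed earlier (42, 107, 23, 110, 63) and rebuilds iris2 so that it runs on its own; the variable names pch and col are choices made here.

```r
# Rebuild the inputs so the sketch is self-contained
iris2 <- iris[, 1:4]
n <- nrow(iris2)
outliers <- c(42, 107, 23, 110, 63)  # row numbers copied from the output above

# Plot symbol per row: "." for ordinary points, "+" for outliers
pch <- rep(".", n)
pch[outliers] <- "+"

# Color per row: black for ordinary points, red for outliers
col <- rep("black", n)
col[outliers] <- "red"

pairs(iris2, pch = pch, col = col)
```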

Package Rlof provides function lof(), a parallel implementation of the LOF algorithm. Its usage is similar to that of lofactor() above, but lof() adds two features: it accepts multiple values of k, and it offers several choices of distance metric. Below is an example of lof().

> library(Rlof)
> outlier.scores <- lof(iris2, k=5)
> # try with different number of neighbors (k = 5,6,7,8,9 and 10)
> outlier.scores <- lof(iris2, k=c(5:10))