### Outlier Detection

This page shows an example of outlier detection with the LOF (Local Outlier Factor) algorithm.

### The LOF algorithm

LOF (Local Outlier Factor) is an algorithm for identifying density-based local outliers [Breunig et al., 2000]. With LOF, the local density of a point is compared with that of its neighbors. If the former is significantly lower than the latter (giving an LOF value greater than one), the point lies in a sparser region than its neighbors, which suggests that it is an outlier.
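The definition above can be sketched directly in base R. This is a minimal illustration under stated assumptions, not the implementation used by any package: `lof_sketch` is a hypothetical helper name, distances are Euclidean, and the toy data (a 3-by-3 grid plus one distant point) is made up for the example.

```r
# Minimal base-R sketch of LOF; lof_sketch is a hypothetical helper name.
lof_sketch <- function(data, k) {
  d <- as.matrix(dist(data))   # pairwise Euclidean distances
  n <- nrow(d)
  diag(d) <- Inf               # a point is not its own neighbor
  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- apply(d, 1, function(row) sort(row)[k])
  # k-neighborhood: all points within the k-distance (ties included)
  nbrs <- lapply(seq_len(n), function(i) which(d[i, ] <= kdist[i]))
  # local reachability density: inverse of the mean reachability distance
  lrd <- sapply(seq_len(n), function(i) {
    o <- nbrs[[i]]
    1 / mean(pmax(kdist[o], d[i, o]))
  })
  # LOF: mean ratio of the neighbors' density to the point's own density
  sapply(seq_len(n), function(i) mean(lrd[nbrs[[i]]]) / lrd[i])
}

# Toy data (made up): a 3-by-3 grid plus one distant point in row 10
toy <- rbind(expand.grid(x = 0:2, y = 0:2), data.frame(x = 10, y = 10))
scores <- lof_sketch(toy, k = 3)
print(round(scores, 2))
print(which.max(scores))
```

The grid points receive scores close to one, while the distant point receives a much larger score because its neighbors are far denser than it is. The sketch assumes no point has k or more exact duplicates, which would make a density infinite.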

Function lofactor(data, k) in packages DMwR and dprep calculates local outlier factors using the LOF algorithm, where k is the number of neighbors used in the calculation of the local outlier factors.

### Calculate Outlier Scores

`> library(DMwR)`
`> # remove "Species", which is a categorical column`
`> iris2 <- iris[,1:4]`
`> outlier.scores <- lofactor(iris2, k=5)`
`> plot(density(outlier.scores))`

`> # pick top 5 as outliers`
`> outliers <- order(outlier.scores, decreasing=TRUE)[1:5]`
`> # which observations are outliers`
`> print(outliers)`
` 42 107 23 110 63`

### Visualize Outliers with Plots

Next, we show outliers with a biplot of the first two principal components.

`> n <- nrow(iris2)`
`> labels <- 1:n`
`> labels[-outliers] <- "."`
`> biplot(prcomp(iris2), cex=.8, xlabs=labels)`
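As a complementary view, the same projection can be drawn as a plain scatter plot of the first two principal component scores, which keeps the variable arrows of the biplot out of the way. This is a sketch that hard-codes the five row indices printed above so that it runs without DMwR; the colors and plotting symbols are arbitrary choices.

```r
# Row indices taken from the printed output of the LOF example above
outliers <- c(42, 107, 23, 110, 63)
iris2 <- iris[, 1:4]

# Scores of every observation on the principal components
pc <- prcomp(iris2)$x

# All points in grey, outliers in red with their row numbers
plot(pc[, 1], pc[, 2], pch = 20, col = "grey60", xlab = "PC1", ylab = "PC2")
points(pc[outliers, 1], pc[outliers, 2], pch = 3, col = "red")
text(pc[outliers, 1], pc[outliers, 2], labels = outliers, pos = 3, col = "red")
```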

We can also show outliers with a pairs plot as below, where outliers are labeled with "+" in red.

`> pch <- rep(".", n)`
`> pch[outliers] <- "+"`
`> col <- rep("black", n)`
`> col[outliers] <- "red"`
`> pairs(iris2, pch=pch, col=col)`

### Parallel Computation of LOF Scores

Package Rlof provides function lof(), a parallel implementation of the LOF algorithm. Its usage is similar to that of lofactor() above, but lof() adds two features: it supports multiple values of k and several choices of distance metric. Below is an example of lof().

`> library(Rlof)`
`> outlier.scores <- lof(iris2, k=5)`
`> # try different numbers of neighbors (k = 5, 6, ..., 10)`
`> outlier.scores <- lof(iris2, k=c(5:10))`
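When several values of k are used, the scores have to be combined before ranking. The sketch below takes the largest score each observation receives over all k, which is one simple option among several. It assumes that package Rlof is installed and that lof() returns one column of scores per value of k; the name `combined` is introduced here for illustration.

```r
# Assumes package Rlof is installed; the block is skipped otherwise.
if (require(Rlof, quietly = TRUE)) {
  iris2 <- iris[, 1:4]
  scores <- lof(iris2, k = 5:10)
  # Assumption: one column of scores per value of k
  scores <- as.matrix(scores)
  # Keep the largest score each observation receives over all k
  combined <- apply(scores, 1, max)
  # Rank observations by the combined score and show the top 5
  print(order(combined, decreasing = TRUE)[1:5])
}
```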