Association Rules


This page walks through an example of association rule mining with R: mining the rules, pruning redundant ones, and visualizing the result.

The Titanic Dataset

This example uses the Titanic dataset, which can be downloaded as "titanic.raw.rdata" from the Data page.

> str(titanic.raw)
'data.frame': 2201 obs. of 4 variables:
$ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
$ Age : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...
$ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
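If the downloaded file is not at hand, a comparable data frame can probably be rebuilt from the Titanic contingency table that ships with base R. This is a minimal sketch under the assumption that the built-in table holds the same 2201 records; the factor level order it produces may differ from the str() output above.

```r
# Assumption: the built-in Titanic table (datasets package) matches titanic.raw.
df <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

# Repeat each combination Freq times so that every person becomes one row.
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]
rownames(titanic.raw) <- NULL

str(titanic.raw)
```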


Association Rule Mining

> library(arules)
> # find association rules with default settings
> rules <- apriori(titanic.raw)
> inspect(rules)
  lhs               rhs         support   confidence lift
1 {}             => {Age=Adult} 0.9504771 0.9504771  1.0000000
2 {Class=2nd}    => {Age=Adult} 0.1185825 0.9157895  0.9635051
3 {Class=1st}    => {Age=Adult} 0.1449341 0.9815385  1.0326798
4 {Sex=Female}   => {Age=Adult} 0.1930940 0.9042553  0.9513700
5 {Class=3rd}    => {Age=Adult} 0.2848705 0.8881020  0.9343750
6 {Survived=Yes} => {Age=Adult} 0.2971377 0.9198312  0.9677574
7 {Class=Crew}   => {Sex=Male}  0.3916402 0.9740113  1.2384742

...
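The three quality measures in this table can be checked by hand. Support is the fraction of records that contain both lhs and rhs, confidence is that count divided by the number of records containing the lhs, and lift is the confidence divided by the overall frequency of the rhs. The sketch below recomputes them for rule 7, {Class=Crew} => {Sex=Male}, from the Titanic table built into base R, which is assumed here to hold the same counts as titanic.raw.

```r
# Assumption: the built-in Titanic table has the same counts as titanic.raw.
n         <- sum(Titanic)                      # all records
crew      <- sum(Titanic["Crew", , , ])        # records matching the lhs
male      <- sum(Titanic[, "Male", , ])        # records matching the rhs
crew.male <- sum(Titanic["Crew", "Male", , ])  # records matching lhs and rhs

support    <- crew.male / n
confidence <- crew.male / crew
lift       <- confidence / (male / n)

round(c(support = support, confidence = confidence, lift = lift), 7)
```

A lift above 1 means the rhs is more frequent among records matching the lhs than in the data as a whole; a lift below 1 means it is less frequent.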

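We then set rhs=c("Survived=No", "Survived=Yes") in appearance so that only "Survived=No" and "Survived=Yes" can appear in the rhs of rules, while default="lhs" lets all other items appear in the lhs. The parameter list also requires at least two items per rule (minlen=2, which excludes rules with an empty lhs such as rule 1 above), a minimum support of 0.005 and a minimum confidence of 0.8.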

> # rules with rhs containing "Survived" only
> rules <- apriori(titanic.raw,
+     parameter = list(minlen=2, supp=0.005, conf=0.8),
+     appearance = list(rhs=c("Survived=No", "Survived=Yes"),
+                       default="lhs"),
+     control = list(verbose=FALSE))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)

 
[Figure: Association rules, as listed by inspect(rules.sorted)]

Pruning Redundant Rules

In the above result, rule 2 provides no extra knowledge beyond rule 1, since rule 1 already tells us that all 2nd-class children survived. Generally speaking, when a rule (such as rule 2) is a super rule of another rule (such as rule 1), that is, it contains all the items of the other rule plus more, and it has the same or a lower lift, the super rule (rule 2) is considered redundant. Below we prune redundant rules.
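The redundancy argument can be checked with a few counts. The sketch below compares the rule about 2nd-class children with a super rule whose lhs also contains Sex=Female. That extra item is chosen here for illustration only and is not taken from the output above, and the counts come from the Titanic table built into base R, assumed to match titanic.raw.

```r
# Assumption: the built-in Titanic table has the same counts as titanic.raw.
n        <- sum(Titanic)
survived <- sum(Titanic[, , , "Yes"])

# {Class=2nd, Age=Child} => {Survived=Yes}
kids     <- sum(Titanic["2nd", , "Child", ])
kids.yes <- sum(Titanic["2nd", , "Child", "Yes"])

# Illustrative super rule: {Class=2nd, Sex=Female, Age=Child} => {Survived=Yes}
girls     <- sum(Titanic["2nd", "Female", "Child", ])
girls.yes <- sum(Titanic["2nd", "Female", "Child", "Yes"])

# lift = confidence / overall frequency of the rhs
lift <- function(both, lhs) (both / lhs) / (survived / n)

c(general = lift(kids.yes, kids), super = lift(girls.yes, girls))
```

Both rules have a confidence of 1 and therefore the same lift, so the longer rule adds a condition without adding information.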

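> # find redundant rules
> # sparse=FALSE asks for a dense logical matrix (recent arules versions return a sparse one by default)
> subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse=FALSE)
> subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
> redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1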
> which(redundant)
[1] 2 4 7 8
> # remove redundant rules
> rules.pruned <- rules.sorted[!redundant]
> inspect(rules.pruned)
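To see why the pruning steps above work, it may help to run the same matrix manipulation on a toy example in base R. The four rules below are hypothetical, written as plain item sets and assumed to be sorted by decreasing lift already; the subset matrix is built by hand rather than with is.subset().

```r
# Hypothetical rules (lhs and rhs items together), sorted by decreasing lift.
rules.items <- list(
  r1 = c("Class=2nd", "Age=Child", "Survived=Yes"),
  r2 = c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),
  r3 = c("Class=1st", "Sex=Female", "Survived=Yes"),
  r4 = c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes")
)

# m[i, j] is TRUE when every item of rule i also occurs in rule j.
m <- sapply(rules.items, function(y)
       sapply(rules.items, function(x) all(x %in% y)))

# Keep only the pairs where rule i ranks above rule j (upper triangle).
m[lower.tri(m, diag = TRUE)] <- NA

# Rule j is redundant if some higher-ranked rule is a subset of it.
redundant <- colSums(m, na.rm = TRUE) >= 1
which(redundant)
```

Because the rules are sorted by lift, any subset found in the upper triangle belongs to a rule with the same or a higher lift, which is the redundancy condition described earlier.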

[Figure: Association rules with redundancy removed]

Visualizing Association Rules

The arulesViz package supports visualizing association rules as scatter plots, balloon plots, graphs, parallel-coordinates plots, and more.

> library(arulesViz)
> plot(rules)

[Figure: A scatter plot of association rules]

> plot(rules, method="graph", control=list(type="items"))

[Figure: A graph of association rules]

> plot(rules, method="paracoord", control=list(reorder=TRUE))

[Figure: A parallel-coordinates plot of association rules]
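By default, the arulesViz scatter plot places support on the x axis and confidence on the y axis, and shades each point by lift. As a rough illustration of that encoding without the package, the base R sketch below plots the seven rules printed in the first inspect() output on this page. The data frame name rules.df and the gray shading scheme are choices made for this sketch only.

```r
# Quality measures copied from the first inspect(rules) output above.
rules.df <- data.frame(
  support    = c(0.9504771, 0.1185825, 0.1449341, 0.1930940,
                 0.2848705, 0.2971377, 0.3916402),
  confidence = c(0.9504771, 0.9157895, 0.9815385, 0.9042553,
                 0.8881020, 0.9198312, 0.9740113),
  lift       = c(1.0000000, 0.9635051, 1.0326798, 0.9513700,
                 0.9343750, 0.9677574, 1.2384742)
)

# Map lift to a gray level: the higher the lift, the darker the point.
scaled <- (rules.df$lift - min(rules.df$lift)) / diff(range(rules.df$lift))
shade  <- gray(0.8 - 0.8 * scaled)

plot(rules.df$support, rules.df$confidence, pch = 19, col = shade,
     xlab = "support", ylab = "confidence",
     main = "Rules from the default apriori run")

# Label each point with its rule number from the printed output.
text(rules.df$support, rules.df$confidence,
     labels = seq_len(nrow(rules.df)), pos = 3, cex = 0.8)
```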