### Association Rules

This page shows an example of association rule mining with R. It demonstrates how to mine association rules with the arules package, prune redundant rules, and visualize the rules with the arulesViz package.

### The Titanic Dataset

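This example uses the Titanic dataset, which can be downloaded as "titanic.raw.rdata" from the Data page and loaded with `load("titanic.raw.rdata")`. It records the class, sex, age and survival status of each of the 2201 passengers and crew members.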

`> str(titanic.raw)`
`'data.frame': 2201 obs. of 4 variables:`
`$ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...`
`$ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...`
`$ Age : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...`
`$ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...`
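If the .rdata file is not at hand, a data frame with the same 2201 records can probably be rebuilt from R's built-in `Titanic` contingency table. This is a sketch under that assumption: the row order and the factor level order (e.g. "Male" before "Female") differ from the downloaded file, which should not matter for rule mining because rules depend only on item values.

```r
# Assumption: base R's Titanic table holds the same 2201 records as titanic.raw.rdata
df <- as.data.frame(Titanic)   # one row per Class/Sex/Age/Survived combination, plus a Freq count
# Repeat each combination Freq times to get one row per person (Freq = 0 rows disappear)
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq), c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL
str(titanic.raw)
```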

### Association Rule Mining

`> library(arules)`
`> # find association rules with default settings`
`> rules <- apriori(titanic.raw)`
`> inspect(rules)`
`  lhs               rhs         support   confidence lift`
`1 {}             => {Age=Adult} 0.9504771 0.9504771  1.0000000`
`2 {Class=2nd}    => {Age=Adult} 0.1185825 0.9157895  0.9635051`
`3 {Class=1st}    => {Age=Adult} 0.1449341 0.9815385  1.0326798`
`4 {Sex=Female}   => {Age=Adult} 0.1930940 0.9042553  0.9513700`
`5 {Class=3rd}    => {Age=Adult} 0.2848705 0.8881020  0.9343750`
`6 {Survived=Yes} => {Age=Adult} 0.2971377 0.9198312  0.9677574`
`7 {Class=Crew}   => {Sex=Male}  0.3916402 0.9740113  1.2384742`

`...`
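The three measures in this output can be checked by hand. For a rule X => Y, support is the fraction of records containing both X and Y, confidence is support(X and Y) / support(X), and lift is confidence / support(Y). The base-R sketch below is an illustration, not the arules implementation; it assumes the data can be rebuilt from R's built-in `Titanic` table, and `rule_measures()` is a hypothetical helper.

```r
# Assumption: rebuild titanic.raw from base R's Titanic table
df <- as.data.frame(Titanic)
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq), c("Class", "Sex", "Age", "Survived")]

# Hypothetical helper: support, confidence and lift of lhs => rhs,
# where lhs and rhs are named character vectors such as c(Class = "2nd")
rule_measures <- function(data, lhs, rhs) {
  matches <- function(items) {
    keep <- rep(TRUE, nrow(data))          # an empty itemset matches every record
    for (col in names(items)) keep <- keep & data[[col]] == items[[col]]
    keep
  }
  in_lhs <- matches(lhs)
  in_rhs <- matches(rhs)
  support <- mean(in_lhs & in_rhs)         # P(X and Y)
  confidence <- support / mean(in_lhs)     # P(Y | X)
  lift <- confidence / mean(in_rhs)        # P(Y | X) / P(Y)
  c(support = support, confidence = confidence, lift = lift)
}

# Rule 2 above: {Class=2nd} => {Age=Adult}
m <- rule_measures(titanic.raw, c(Class = "2nd"), c(Age = "Adult"))
round(m, 7)   # support 0.1185825, confidence 0.9157895, lift 0.9635051
```

The rule with an empty lhs, `rule_measures(titanic.raw, character(0), c(Age = "Adult"))`, has support and confidence both equal to the share of adults and a lift of exactly 1, which is why it carries no information.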

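The rules above have items such as "Age=Adult" or "Sex=Male" in the rhs, which says little about survival. We therefore set `rhs=c("Survived=No", "Survived=Yes")` in `appearance` so that only "Survived=No" and "Survived=Yes" can appear in the rhs of rules, while `default="lhs"` allows all other items to appear only in the lhs. We also set `minlen=2` to exclude rules with an empty lhs, lower the minimum support to 0.005, keep the minimum confidence at 0.8, and then sort the rules by lift.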

`> # rules with rhs containing "Survived" only`
`> rules <- apriori(titanic.raw,`
`  + parameter = list(minlen=2, supp=0.005, conf=0.8),`
`  + appearance = list(rhs=c("Survived=No", "Survived=Yes"),`
`  + default="lhs"),`
`  + control = list(verbose=FALSE))`
`> rules.sorted <- sort(rules, by="lift")`
`> inspect(rules.sorted)`
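To make the effect of these settings concrete, the sketch below enumerates candidate rules by brute force in base R instead of running the Apriori algorithm. It assumes the data can be rebuilt from R's built-in `Titanic` table; `grid`, `rows` and `result` are illustrative names. With only three lhs attributes, checking every combination is cheap, and the filters mirror `minlen=2`, `supp=0.005`, `conf=0.8` and the rhs restriction in `appearance`.

```r
# Assumption: rebuild titanic.raw from base R's Titanic table
df <- as.data.frame(Titanic)
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq), c("Class", "Sex", "Age", "Survived")]

lhs_cols <- c("Class", "Sex", "Age")
# Each lhs attribute is either absent (NA) or fixed to one of its levels
choices <- lapply(lhs_cols, function(col) c(NA, levels(titanic.raw[[col]])))
names(choices) <- lhs_cols
grid <- expand.grid(choices, stringsAsFactors = FALSE)
# minlen = 2: at least one lhs item in addition to the rhs item
grid <- grid[rowSums(!is.na(grid)) >= 1, , drop = FALSE]

rows <- list()
for (i in seq_len(nrow(grid))) {
  used <- lhs_cols[!is.na(unlist(grid[i, ]))]
  in_lhs <- rep(TRUE, nrow(titanic.raw))
  for (col in used) in_lhs <- in_lhs & titanic.raw[[col]] == grid[i, col]
  lhs_label <- paste0(used, "=", unlist(grid[i, used]), collapse = ",")
  # appearance: only Survived=No or Survived=Yes may appear in the rhs
  for (rhs in levels(titanic.raw$Survived)) {
    in_rhs <- titanic.raw$Survived == rhs
    supp <- mean(in_lhs & in_rhs)
    if (supp < 0.005) next                 # supp = 0.005
    conf <- supp / mean(in_lhs)
    if (conf < 0.8) next                   # conf = 0.8
    rows[[length(rows) + 1]] <- data.frame(
      rule = paste0("{", lhs_label, "} => {Survived=", rhs, "}"),
      support = supp, confidence = conf, lift = conf / mean(in_rhs))
  }
}
result <- do.call(rbind, rows)
result <- result[order(result$lift, decreasing = TRUE), ]   # like sort(rules, by="lift")
rownames(result) <- NULL
print(result, digits = 4)
```

We would expect `result` to hold the same rules as `inspect(rules.sorted)`, possibly with tied rules in a different order; that equivalence is an assumption worth checking against your own output. Brute force is only feasible here because the item space is tiny; on larger data, Apriori's pruning of infrequent itemsets is what keeps the search tractable.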

### Pruning Redundant Rules

In the result of `inspect(rules.sorted)`, rule 2 provides no extra knowledge beyond rule 1, since rule 1 already tells us that all 2nd-class children survived. Generally speaking, when a rule (such as rule 2) is a super rule of another rule (such as rule 1), i.e., it has the same rhs and its lhs is a superset of the other rule's lhs, and it has the same or a lower lift, the super rule (rule 2) is considered redundant. Below we prune redundant rules.
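The definition can be checked directly on a few rules computed from the data. In this base-R sketch, which assumes the data can be rebuilt from R's built-in `Titanic` table, `lift_yes()` is a hypothetical helper and the four lhs sets are chosen for illustration. A rule is flagged when another rule with the same rhs has a strictly smaller lhs and a lift at least as high.

```r
# Assumption: rebuild titanic.raw from base R's Titanic table
df <- as.data.frame(Titanic)
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq), c("Class", "Sex", "Age", "Survived")]

# Hypothetical helper: lift of lhs => {Survived=Yes}; lhs is a named character vector
lift_yes <- function(lhs) {
  in_lhs <- rep(TRUE, nrow(titanic.raw))
  for (col in names(lhs)) in_lhs <- in_lhs & titanic.raw[[col]] == lhs[[col]]
  in_rhs <- titanic.raw$Survived == "Yes"
  (mean(in_lhs & in_rhs) / mean(in_lhs)) / mean(in_rhs)
}

lhs_list <- list(
  c(Class = "2nd", Age = "Child"),
  c(Class = "2nd", Sex = "Female", Age = "Child"),
  c(Class = "1st", Sex = "Female"),
  c(Class = "1st", Sex = "Female", Age = "Adult")
)
lifts <- sapply(lhs_list, lift_yes)
items <- lapply(lhs_list, function(x) paste0(names(x), "=", x))  # e.g. "Class=2nd"

# Rule j is redundant if some other rule i has a proper-subset lhs and a lift
# at least as high (small tolerance guards against floating-point ties)
redundant <- sapply(seq_along(items), function(j) {
  any(sapply(seq_along(items), function(i) {
    i != j &&
      all(items[[i]] %in% items[[j]]) &&
      length(items[[i]]) < length(items[[j]]) &&
      lifts[i] >= lifts[j] - 1e-9
  }))
})
redundant   # FALSE TRUE FALSE TRUE
```

Adding "Sex=Female" to {Class=2nd, Age=Child} leaves the lift unchanged, and adding "Age=Adult" to {Class=1st, Sex=Female} slightly lowers it, so both longer rules are flagged.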

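`> # find redundant rules`
`> # subset.matrix[i, j] is TRUE if rule i is a subset of rule j (dense matrix)`
`> subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse=FALSE)`
`> # keep only pairs where rule i is ranked above rule j (i < j)`
`> subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA`
`> # rule j is redundant if at least one higher-ranked rule is a subset of it`
`> redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1`
`> which(redundant)`
` 2 4 7 8`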
`> # remove redundant rules`
`> rules.pruned <- rules.sorted[!redundant]`
`> inspect(rules.pruned)`
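The matrix steps can be replayed in base R to see why the diagonal and the lower triangle are masked. The four rules below are illustrative and written out by hand as item vectors (lhs plus rhs), assumed to be already sorted by decreasing lift; the `rule_items` list and the `%in%`-based subset test are stand-ins for the rules object and `is.subset()`.

```r
# Illustrative rules as item vectors, assumed sorted by decreasing lift
rule_items <- list(
  c("Class=2nd", "Age=Child", "Survived=Yes"),
  c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes")
)
n <- length(rule_items)

# subset.matrix[i, j] is TRUE when every item of rule i also occurs in rule j
is_sub <- Vectorize(function(i, j) all(rule_items[[i]] %in% rule_items[[j]]))
subset.matrix <- outer(seq_len(n), seq_len(n), is_sub)

# Diagonal: every rule is a subset of itself. Lower triangle: rule i ranks below rule j.
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)   # 2 4
```

Because only the upper triangle is kept, the result depends on the sort order: if two rules tie on lift and the longer one happens to be sorted first, its redundancy is not detected.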

### Visualizing Association Rules

Package arulesViz supports visualization of association rules with scatter plots, balloon plots, graphs, parallel coordinates plots, etc. Below we draw a scatter plot, a graph and a parallel coordinates plot of the rules.

`> library(arulesViz)`
`> plot(rules)`
`> plot(rules, method="graph", control=list(type="items"))`

`> plot(rules, method="paracoord", control=list(reorder=TRUE))`