Association Rules

This page shows an example of association rule mining with R. It demonstrates how to mine association rules, prune redundant rules and visualize association rules.

The Titanic Dataset

The Titanic dataset is used in this example. It can be downloaded as "titanic.raw.rdata" from the Data page. The data frame has 2201 rows, one per person on board, and four categorical variables: Class, Sex, Age and Survived.

> str(titanic.raw)
'data.frame': 2201 obs. of 4 variables:
 $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Sex     : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age     : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...
 $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...

Association Rule Mining

> library(arules)
> # find association rules with default settings
> rules <- apriori(titanic.raw)
> inspect(rules)
  lhs               rhs         support   confidence lift
1 {}             => {Age=Adult} 0.9504771 0.9504771  1.0000000
2 {Class=2nd}    => {Age=Adult} 0.1185825 0.9157895  0.9635051
3 {Class=1st}    => {Age=Adult} 0.1449341 0.9815385  1.0326798
4 {Sex=Female}   => {Age=Adult} 0.1930940 0.9042553  0.9513700
5 {Class=3rd}    => {Age=Adult} 0.2848705 0.8881020  0.9343750
6 {Survived=Yes} => {Age=Adult} 0.2971377 0.9198312  0.9677574
7 {Class=Crew}   => {Sex=Male}  0.3916402 0.9740113  1.2384742
...

The default settings use a minimum support of 0.1 and a minimum confidence of 0.8. Many of the rules they produce are uninteresting for our purpose. Rule 1, for example, has an empty left-hand side (lhs) and a lift of 1. Suppose we are interested only in rules whose right-hand side (rhs) indicates survival. We then set rhs=c("Survived=No", "Survived=Yes") in appearance to make sure that only "Survived=No" and "Survived=Yes" will appear in the rhs of rules, while default="lhs" lets all other items appear in the lhs. Setting minlen=2 requires at least two items per rule, which excludes rules with an empty lhs such as rule 1. We also lower the minimum support to 0.005 and suppress progress messages with verbose=FALSE. Finally, we sort the resulting rules by lift so that high-lift rules appear first.

> # rules with rhs containing "Survived" only
> rules <- apriori(titanic.raw,
+                  parameter = list(minlen=2, supp=0.005, conf=0.8),
+                  appearance = list(rhs=c("Survived=No", "Survived=Yes"),
+                                    default="lhs"),
+                  control = list(verbose=FALSE))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)
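To make the support, confidence and lift columns printed by inspect() concrete, the sketch below recomputes them by hand in base R for the rule {Class=2nd, Age=Child} => {Survived=Yes}. It assumes that titanic.raw can be rebuilt from R's built-in Titanic table (one row per person; the factor level order may differ from the downloadable file). rule_measures() is a hypothetical helper written only for this illustration and is not part of arules. Support is the fraction of people matching both lhs and rhs. Confidence is support divided by the fraction matching the lhs. Lift is confidence divided by the fraction matching the rhs.

```r
# Assumption: titanic.raw can be rebuilt from the built-in Titanic table
# (package 'datasets') by repeating each combination Freq times.
df <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL

# Hypothetical helper: support, confidence and lift of the rule lhs => rhs,
# where lhs and rhs are named character vectors such as c(Class = "2nd").
rule_measures <- function(data, lhs, rhs) {
  # TRUE for rows that match every variable = value pair in 'items'
  matches <- function(items) {
    ok <- rep(TRUE, nrow(data))
    for (v in names(items)) ok <- ok & data[[v]] == items[[v]]
    ok
  }
  in_lhs <- matches(lhs)
  in_rhs <- matches(rhs)
  support    <- mean(in_lhs & in_rhs)        # P(lhs and rhs)
  confidence <- support / mean(in_lhs)       # P(rhs | lhs)
  lift       <- confidence / mean(in_rhs)    # P(rhs | lhs) / P(rhs)
  c(support = support, confidence = confidence, lift = lift)
}

m <- rule_measures(titanic.raw,
                   lhs = c(Class = "2nd", Age = "Child"),
                   rhs = c(Survived = "Yes"))
print(round(m, 6))
```

For this rule the sketch gives support 24/2201 (about 0.0109), confidence 1 and lift 2201/711 (about 3.096). All 24 second-class children survived, and 711 of the 2201 people survived overall.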

Pruning Redundant Rules

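In the sorted result above, rule 2 ({Class=2nd, Sex=Female, Age=Child} => {Survived=Yes}) provides no extra knowledge in addition to rule 1 ({Class=2nd, Age=Child} => {Survived=Yes}). Rule 1 already tells us that all 2nd-class children survived. Generally speaking, a rule (such as rule 2) is considered redundant when two conditions hold. First, it is a super rule of another rule (such as rule 1): its lhs contains all items of the other rule's lhs, and both rules have the same rhs. Second, it has the same or a lower lift. Below we prune redundant rules.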

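> # find redundant rules
> # subset.matrix[i, j] is TRUE if rule i is a subset of rule j;
> # as.matrix() is needed with newer arules versions, where is.subset()
> # returns a sparse matrix
> subset.matrix <- as.matrix(is.subset(rules.sorted, rules.sorted))
> # keep only entries with i < j, i.e. compare each rule with higher-ranked rules
> subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
> # rule j is redundant if a rule ranked above it is a subset of it
> redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1
> which(redundant)
[1] 2 4 7 8
> # remove redundant rules
> rules.pruned <- rules.sorted[!redundant]
> inspect(rules.pruned)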
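The pruning logic can also be traced without arules. The base-R sketch below is a minimal illustration, assuming each rule is represented simply as a character vector of its lhs and rhs items. toy.rules is a hypothetical list modelled on the Titanic rules and already sorted by decreasing lift, as rules.sorted would be. Computed from the Titanic counts, their lifts are about 3.096, 3.096, 3.010 and 3.010. Entry [i, j] of toy.subset is TRUE when rule i is contained in rule j. After the diagonal and lower triangle are blanked out, a rule is flagged as redundant when a rule ranked above it is contained in it.

```r
# Hypothetical toy rules: each rule is the set of its lhs items plus its rhs
# item, listed in decreasing order of lift (as in rules.sorted).
toy.rules <- list(
  c("Class=2nd", "Age=Child", "Survived=Yes"),
  c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes")
)
n <- length(toy.rules)

# is_contained(i, j) is TRUE when every item of rule i also occurs in rule j
is_contained <- function(i, j) all(toy.rules[[i]] %in% toy.rules[[j]])
toy.subset <- outer(seq_len(n), seq_len(n), Vectorize(is_contained))

# Compare each rule only with the rules ranked above it (i < j):
# blank out the diagonal and the lower triangle
toy.subset[lower.tri(toy.subset, diag = TRUE)] <- NA

# Rule j is redundant if some higher-ranked rule i is contained in it
toy.redundant <- colSums(toy.subset, na.rm = TRUE) >= 1
print(which(toy.redundant))

# Keep only the non-redundant rules
toy.pruned <- toy.rules[!toy.redundant]
```

Here rule 2 is flagged because rule 1 is contained in it, and rule 4 is flagged because rule 3 is contained in it. toy.pruned keeps rules 1 and 3.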

Visualizing Association Rules

The arulesViz package supports visualization of association rules as scatter plots, balloon plots, graphs, parallel coordinates plots, etc.

> library(arulesViz)
> # scatter plot of support against confidence, shaded by lift
> plot(rules)
> plot(rules, method="graph", control=list(type="items"))
> plot(rules, method="paracoord", control=list(reorder=TRUE))