This is a short course on R and Data Mining.

### Prerequisites

- Knowledge and experience of R, or similar programming languages
- Basic knowledge of data mining and machine learning

### Requirements

You will need to bring your own laptop. Please install the required software and R packages and download the datasets, slides and scripts below before coming to the course.

- Software and Packages
  - R: http://www.r-project.org/
  - RStudio (desktop edition): http://www.rstudio.com/products/rstudio/download/
  - R packages (please run the R script to install required R packages): http://www.rdatamining.com/books/rdm/code/Install-R-packages.R
- **RStudio project archive [RDM-course.zip], which contains all datasets, slides and scripts below.** Alternatively, you may download individual files separately at the links below.
- Datasets
  - Titanic dataset: http://www.rdatamining.com/data/titanic.raw.rdata
  - Twitter dataset: http://www.rdatamining.com/data/RDataMining-Tweets-20160212.rds
  - Graph dataset: http://www.rdatamining.com/data/graph.rdata
- Slides
  - R Programming [PDF]
  - Data Exploration and Visualisation with R [PDF]
  - Regression and Classification with R [PDF]
  - Data Clustering with R [PDF]
  - Association Rule Mining with R [PDF]
  - Text Mining with R [PDF]
  - Time Series Analysis with R [PDF]
  - Network Analysis and Graph Mining with R [PDF]
  - Hadoop, Spark and R [PDF]
  - R Reference Card for Data Mining [PDF]
- R scripts [ZIP]

### Outline

This course consists of the 9 sessions below. Each session is one hour long, composed of a 40-minute tutorial and a 20-minute exercise.

Part I:

- R Programming: basics of R language and programming, parallel computing, and data import and export
- Data Exploration and Visualisation: summary, stats and various charts
- Regression and Classification: linear regression and logistic regression, decision trees and random forest

Part II:

- Data Clustering: k-means clustering, k-medoids clustering, hierarchical clustering and density-based clustering
- Time Series Analysis: time series decomposition, forecasting, classification and clustering
- Network Analysis and Graph Mining: graph construction, graph query, centrality measures, and graph visualisation

Part III:

- Association Rule Mining: mining and selecting interesting association rules, redundancy removal, and rule visualisation
- Text Mining: text mining, word cloud, topic modelling, and sentiment analysis
- Big Data: Hadoop, Spark and R

If you have any questions or feedback, please do not hesitate to contact me on yanchang <at> RDataMining.com. Thanks.