This is a short course on R and Data Mining.



You will need to bring your own laptop. Please install the required software and R packages and download the datasets, slides and scripts below before coming to the course.


This course consists of the nine sessions listed below. Each session runs for one hour, with a 40-minute tutorial followed by a 20-minute exercise.

Part I:
  • R Programming
    basics of the R language and programming, parallel computing, and data import and export
  • Data Exploration and Visualisation
    summary statistics and various charts
  • Regression and Classification
    linear and logistic regression, decision trees, and random forests
Part II:
  • Data Clustering
    k-means clustering, k-medoids clustering, hierarchical clustering and density-based clustering
  • Time Series Analysis
    time series decomposition, forecasting, classification and clustering
  • Network Analysis and Graph Mining
    graph construction, graph query, centrality measures, and graph visualisation
Part III:
  • Association Rule Mining 
    mining and selecting interesting association rules, redundancy removal, and rule visualisation
  • Text Mining
    text mining, word cloud, topic modelling, and sentiment analysis
  • Big Data
    Hadoop, Spark and R


