RDataMining LinkedIn group has over 20,000 members

posted May 19, 2016, 12:58 PM by Yanchang Zhao   [ updated May 20, 2016, 5:12 AM ]

The RDataMining group on LinkedIn has reached 20,000 members.

Created in August 2011, the group attracted wide interest and expanded very quickly. In less than 5 years, its size reached 20,000.
It is now one of the top LinkedIn groups for data mining and data science. It is one of the fastest growing and most active groups in this area (KDnuggets, 2015).

Thanks to all members for their contributions to the RDataMining group. As a big group on data mining and analytics, we will continue our journey of sharing knowledge and experiences in R and data mining, and will keep making our group better.

If you are not a member yet, please join us at:

CFP: AusDM 2016. Submission due 19 Aug

posted Apr 28, 2016, 7:19 AM by Yanchang Zhao   [ updated Apr 28, 2016, 7:21 AM ]

14th Australasian Data Mining Conference (AusDM 2016)
Canberra, Australia,
6-8 December 2016
Join us on LinkedIn:

The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. AusDM'16 seeks to showcase: Research Prototypes; Industry Case Studies; Practical Analytics Technology; and Research Student Projects.

Publication and topics
We are calling for papers, both research and applications, and from both academia and industry, for presentation at the conference. Accepted papers will be published in an upcoming volume (Data Mining and Analytics 2016) of the Conferences in Research and Practice in Information Technology (CRPIT) series by the Australian Computer Society, which is also held in full text on the ACM Digital Library. AusDM invites contributions addressing current research in data mining and knowledge discovery as well as experiences, novel applications and future challenges.

Submission of papers
- Academic submissions: Regular academic submissions can be made in the Research Track, reporting on research progress, with a paper length of between 8 and 12 pages in CRPIT style.
- Industry submissions: Submissions can be made in the Application Track to report on specific data mining implementations and experiences in government and industry projects. Submissions in this category can be between 4 and 8 pages in CRPIT style.
- Industry Showcase submissions: Submissions from industry and government on an analytics solution that has raised profits, reduced costs and/or achieved other important policy and/or business outcomes can be made in this track with a one-page abstract only.

Important Dates
Paper Submission Closed: Friday 19 August 2016
Authors Notified: Monday 24 October 2016
Camera Ready Submission: Monday 7 November 2016
Conference Dates: 6-8 December 2016

Upcoming Training

posted Apr 24, 2016, 3:01 PM by Yanchang Zhao   [ updated Apr 24, 2016, 3:02 PM ]

I will run a one-week training course on Big Data and Analytics at the S P Jain School of Global Management, Mumbai, India, 30 May - 5 June 2016.

Hadoop, Spark, NoSQL and Data Science Training Courses with exclusive 15% discount

posted Apr 6, 2016, 3:53 AM by Yanchang Zhao   [ updated Apr 6, 2016, 3:53 AM ]

DeZyre provides webinars and training courses on big data and data science, as listed below. Enrol at the links below to get an exclusive 15% discount.

Twitter Data Analysis with R

posted Feb 15, 2016, 1:22 AM by Yanchang Zhao   [ updated Feb 15, 2016, 1:24 AM ]

Slides of my invited talk on Twitter Data Analysis with R at the Making Data Analysis Easier Workshop Organised by the Monash Business Analytics Team (WOMBAT 2016) are available as file RDataMining-slides-twitter-analysis.pdf at the Documents page.


A Twitter dataset for text mining

posted Feb 9, 2016, 4:22 AM by Yanchang Zhao   [ updated Feb 9, 2016, 4:23 AM ]

A dataset of @RDataMining tweets extracted on 3 February 2016 is now available at Datasets. The dataset can be used for text mining.

Seminar on Cyberbullying Detection by Applying AI Approaches, University of Canberra, 4:30pm Tuesday 2 Feb 2016

posted Jan 27, 2016, 12:39 AM by Yanchang Zhao   [ updated Jan 27, 2016, 12:39 AM ]

Topic: Cyberbullying Detection by Applying AI Approaches
Speakers: Prof. Chengqi Zhang and Dr. Guodong Long, University of Technology, Sydney
Organisers: The Canberra Data Scientists group and the Information Technology & Engineering Program, University of Canberra
Date and time: 4:30-6:00pm, Tuesday, 2 February 2016
Location: Teal Room, Inspire Centre (Building 25 on the map at

Cyberbullying is the use of technology to bully a person or group with the intent to hurt them socially, psychologically or even physically. Currently, many young people are being cyberbullied or are involved in cyberbullying activities, and cyberbullying has been defined as a criminal offence by law. To avoid severe consequences (e.g. psychological trauma or criminal charges), cyberbullying detection has emerged as a way to proactively prevent cyberbullying in real time by generating early warnings. Most studies on cyberbullying detection focus on keyword search and sentiment filtering of textual content. They neglect the rich context information of online conversations, including text, networks, time and demographics. In this talk, we will introduce a novel solution for applying AI approaches to detect cyberbullying by exploiting rich heterogeneous context information.

Prof. Chengqi Zhang is a Research Professor of Information Technology at The University of Technology Sydney (UTS), an Honorary Professor of the University of Queensland (UQ), and Director of the UTS Priority Research Centre for Quantum Computation & Intelligent Systems (QCIS). He is Alternative Dean of UTS Graduate Research School, Chairman of the Australian Computer Society National Committee for Artificial Intelligence and Chairman of IEEE Computer Society Technical Committee of Intelligent Informatics (TCII). Chengqi Zhang obtained his PhD degree from the University of Queensland in 1991, followed by a Doctor of Science (DSc – Higher Doctorate) from Deakin University in 2002. His key areas of research are Distributed Artificial Intelligence, Data Mining and its applications. He has published more than 200 refereed research papers and six monographs and edited 16 books. He has attracted 12 ARC grants of $4.7M. He is a Fellow of the Australian Computer Society (ACS) and a Senior Member of the IEEE Computer Society (IEEE).

Dr. Guodong Long has over 10 years' experience in leading, developing and coordinating industry and research projects. Since joining UTS in 2010, he has led a total of five industry projects, including three ARC Linkage projects. Before joining UTS in 2010, Dr Long had over 6 years of industry experience in an IT company. He has strong system-wide knowledge of computer-related fields, especially the architecture and design of artificial-intelligence-based systems, and strong creativity in research methodology and real application systems. He currently leads a research team conducting application-driven research in collaboration with industry partners. He obtained his BSc and MSc degrees from National University of Defence Technology (NUDT) in 2002 and 2008 respectively, and his PhD degree from University of Technology Sydney (UTS) in 2014, all in computer science.

Slides of 10+ excellent tutorials at KDD 2015: Spark, graph mining and many more

posted Aug 17, 2015, 4:01 AM by Yanchang Zhao   [ updated Aug 17, 2015, 4:02 AM ]

See slides of 10+ excellent tutorials at KDD 2015 at, incl.
  • VC-Dimension and Rademacher Averages: From Statistical Learning Theory to Sampling Algorithms
  • Graph-Based User Behavior Modeling: From Prediction to Fraud Detection
  • A New Look at the System, Algorithm and Theory Foundations of Large-Scale Distributed Machine Learning
  • Dense subgraph discovery (DSD)
  • Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach
  • Big Data Analytics: Optimization and Randomization
  • Big Data Analytics: Social Media Anomaly Detection: Challenges and Solutions
  • Diffusion in Social and Information Networks: Problems, Models and Machine Learning Methods
  • Medical Mining
  • Large Scale Distributed Data Science using Apache Spark
  • Data-Driven Product Innovation
  • Web Personalization and Recommender Systems

a mirror site for Chinese users

posted Aug 5, 2015, 5:30 AM by Yanchang Zhao   [ updated Aug 5, 2015, 5:31 AM ]

now has a mirror website at
Users in China can download RDataMining documents, code and data at above mirror site, if no access to

Note that will still be the primary site and please visit only when you have no access to the primary site.

Please let me know if you have access to neither of the two sites. Thanks.

CIKM Machine Learning Competition 2015

posted Jul 28, 2015, 12:29 PM by Yanchang Zhao   [ updated Jul 28, 2015, 12:29 PM ]

The CIKM Machine Learning Competition 2015 is centered around the AFL. Participants are required to predict the outcomes of every match in the 2015 AFL season in two phases:

- the Leaderboard phase, where contestants predict the outcome of each regular-season match in the 2015 AFL season. The corresponding leaderboard will be updated as the season progresses. This phase will be based on an honour system since the results of matches will already be known.

- the Finals phase, where contestants predict the outcome of each match in the 2015 AFL Finals Series. Submissions will close prior to the commencement of the first finals series match. The final leaderboard of the competition will be determined from these matches and a competition winner will be announced after the 2015 AFL Grand Final.

The winner of the competition will be awarded $5,000 (AUD) and will be required to provide a satisfactory description of their approach.

Competition opens: 24 July 2015
Submissions close: 10 Sept 2015

