This page shows an example of text mining Twitter data with the R packages twitteR, tm and wordcloud. Package twitteR provides access to Twitter data, tm provides functions for text mining, and wordcloud visualizes the result as a word cloud.
If you have no access to Twitter, the tweets data can be downloaded as file "rdmTweets.RData" at the Data page, and then you can skip the first step below.
Follow the twitteR vignettes on CRAN or this link to complete authentication before running the code below.
After that, the corpus needs a few transformations: changing letters to lower case, removing punctuation and numbers, and removing stop words. The general English stop-word list is tailored by adding "available" and "via" and removing "r", so that "r" is kept in the corpus.
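The transformations described above can be sketched with tm as follows. This is a minimal sketch, not necessarily the exact code used for this page: the two texts are made-up stand-ins for the tweets, and the names `tweets`, `myCorpus`, `myStopwords` and `cleaned` are chosen here for illustration. It assumes a tm version that provides `content_transformer()`.

```r
library(tm)

# Made-up stand-ins for the tweet texts (assumption, not real tweets).
tweets <- c("Slides on Text Mining with R, available via RDataMining!",
            "R is used in 3 new data mining examples.")

myCorpus <- VCorpus(VectorSource(tweets))

# Lower case first, so that stop-word matching is case-insensitive.
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)

# Tailor the stop-word list: add "available" and "via", keep "r".
# setdiff() is safe even when "r" is not in the list to begin with.
myStopwords <- setdiff(c(stopwords("english"), "available", "via"), "r")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpus <- tm_map(myCorpus, stripWhitespace)

# Extract the cleaned texts as a plain character vector and print them.
cleaned <- trimws(sapply(myCorpus, as.character))
print(cleaned)
```

Using `setdiff()` rather than indexing with `-which(...)` avoids emptying the list by accident: a negative index built from an empty match selects nothing at all.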
Print the first three documents in the built corpus.
Something unexpected in the above stemming and stem completion is that the word "mining" is first stemmed to "mine" and then completed to "miners" instead of "mining", even though "mining" occurs many times in the tweets and "miners" only once.
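A likely explanation, assuming that stem completion picks candidates by matching the stem as a prefix of dictionary words (which is my understanding of how tm's stemCompletion works), is that "mining" does not start with "mine" while "miners" does. The base R sketch below illustrates that prefix matching on made-up data, followed by one possible workaround; the variable names are illustrative only.

```r
# Words seen in the corpus, used as the completion dictionary (toy data).
dictionary <- c("mining", "mining", "mining", "miners")
stem <- "mine"

# Prefix matching: only dictionary words that begin with the stem qualify,
# no matter how frequent the other words are.
candidates <- grep(paste0("^", stem), dictionary, value = TRUE)
print(candidates)

# Possible workaround: map the wrong completion back to the intended word.
completed <- c("data", "miners", "r")
fixed <- gsub("\\bminers\\b", "mining", completed)
print(fixed)
```

Note that this workaround also rewrites any genuine occurrence of "miners", which is acceptable here only because that word is rare in the tweets.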
Based on the above matrix, many data mining tasks can be done, for example, clustering, classification and association analysis.
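As one illustration of such a task, the sketch below clusters terms hierarchically. It uses a small made-up matrix in place of the real one, with terms as rows and documents as columns; with tm, a comparable matrix could be obtained by calling `as.matrix()` on a term-document matrix.

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
m <- matrix(c(3, 2, 0, 0,
              2, 3, 0, 1,
              0, 0, 4, 3,
              0, 1, 3, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("data", "mining", "slides", "tutorial"),
                            paste0("doc", 1:4)))

# Euclidean distances between term rows, then Ward hierarchical clustering.
d <- dist(m)
fit <- hclust(d, method = "ward.D2")

# Cut the tree into two groups of terms.
groups <- cutree(fit, k = 2)
print(groups)
```

Terms that tend to occur in the same documents end up in the same group; `plot(fit)` draws the corresponding dendrogram.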
The above word cloud shows that "r", "data" and "mining" are the three most frequent words, which confirms that the @RDataMining tweets are about R and data mining. Other frequent words are "analysis", "examples", "slides", "tutorial" and "package", which shows that the tweets focus on documents and examples of analysis and R packages.
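A word cloud like the one discussed above is driven by the total frequency of each term. The sketch below computes those frequencies from a small made-up matrix (not the real tweet data) and draws the cloud only if the wordcloud package is installed; the seed value is arbitrary.

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
m <- matrix(c(5, 4, 6,
              3, 4, 2,
              2, 3, 3,
              1, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("r", "data", "mining", "slides"),
                            paste0("doc", 1:3)))

# Total frequency of each term across all documents, most frequent first.
wordFreq <- sort(rowSums(m), decreasing = TRUE)
print(wordFreq)

# Draw the cloud only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(375)  # word placement is random; a fixed seed gives a reproducible layout
  wordcloud::wordcloud(words = names(wordFreq), freq = wordFreq,
                       min.freq = 1, random.order = FALSE)
}
```

With `random.order = FALSE`, the most frequent words are plotted first and so tend to appear near the centre of the cloud.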
More examples of text mining with R and other data mining techniques can be found in my book "R and Data Mining: Examples and Case Studies", which can be downloaded as a PDF file at the link.