Remove URLs from text
Post date: Mar 24, 2015 7:55:56 PM
Q: The function below does not remove URLs completely.
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
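Running it on a made-up sample string shows what is left behind. The string is only an illustration; any URL with "://" behaves the same way.

```r
# Function from the question: "http" followed by alphanumeric characters only
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)

# Hypothetical sample text containing a URL
txt <- "see http://www.example.com/page now"

# The match stops at ":", which is not alphanumeric, so only "http" is removed
print(removeURL(txt))
```

The result is "see ://www.example.com/page now": everything from the colon onward survives.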
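A: The pattern above stops at the first character after "http" that is not a letter or digit, which is the ":" in "http://", so most of the URL is left in place. Use the code below instead. Inside a bracket expression, "[:alnum:]" matches any alphanumeric character (letters and digits) and "[:punct:]" matches any punctuation character, so ":", "/", ".", "?" and "=" are all consumed. For details, run "?regex" in R or search online for "regular expression".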
removeURL <- function(x) gsub("http[[:alnum:][:punct:]]*", "", x)
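A quick check on a few made-up strings (assumed examples, not from the original post) shows that the whole URL is now removed, including "https" links, since "s" is alphanumeric. It also shows a side effect worth knowing: punctuation that directly follows a URL, such as a sentence-ending period, is removed along with it.

```r
# "http" followed by any mix of alphanumeric and punctuation characters
removeURL <- function(x) gsub("http[[:alnum:][:punct:]]*", "", x)

# Hypothetical sample strings
txt <- c("see http://www.example.com/page?id=1 now",
         "secure link: https://example.org/a-b_c",
         "read http://example.com.")

print(removeURL(txt))
```

This returns "see  now", "secure link: " and "read ". Note the double space left in the first string and the lost period in the third.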
If a URL contains non-ASCII characters, use the function below instead. It removes any string that starts with "http" and is followed by any number of non-space characters.
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
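As a sketch, the example below applies this version to a made-up URL containing an accented character (written as the escape "\u00e9" so the script does not depend on the file's encoding). The second gsub() call, which collapses the leftover run of spaces into one, is an optional clean-up step added here for illustration and is not part of the original answer.

```r
# "http" followed by any number of non-whitespace characters
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)

# Hypothetical sample text with a non-ASCII character in the URL
txt <- "see http://example.com/caf\u00e9 now"

out <- removeURL(txt)              # "see  now" (two spaces remain)
out <- gsub("[[:space:]]+", " ", out)  # optional: collapse repeated whitespace
print(out)
```

Because the class is "anything except whitespace", the accented character is matched without relying on how the current locale classifies it.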