Q: The function below does not remove URLs completely.
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
A: The pattern stops matching at the first non-alphanumeric character, so for a URL such as "http://..." only "http" is removed and the rest is left behind. Use the code below, where "[:alnum:]" matches alphanumeric characters (letters and digits) and "[:punct:]" matches punctuation characters. For details, run "?regex" in R or search for "regular expression".
removeURL <- function(x) gsub("http[[:alnum:][:punct:]]*", "", x)
If the URL contains non-ASCII characters, use the function below, which removes any string that starts with "http" and is followed by any number of non-space characters.
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
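The difference between the three patterns is easiest to see by running them on the same input. The following is a minimal sketch; the sample strings and the names removeURL_alnum, removeURL_punct, and removeURL_nonspace are made up for illustration and are not part of any package.

```r
# Three variants of the pattern, named here only for comparison (hypothetical names).
removeURL_alnum    <- function(x) gsub("http[[:alnum:]]*", "", x)
removeURL_punct    <- function(x) gsub("http[[:alnum:][:punct:]]*", "", x)
removeURL_nonspace <- function(x) gsub("http[^[:space:]]*", "", x)

# Made-up sample text containing an ASCII-only URL.
x <- "Check this out http://example.com/page.html now"

removeURL_alnum(x)     # "Check this out ://example.com/page.html now"
removeURL_punct(x)     # "Check this out  now"
removeURL_nonspace(x)  # "Check this out  now"

# Made-up sample text whose URL contains a non-ASCII character.
y <- "see http://example.com/caf\u00e9 here"
removeURL_nonspace(y)  # everything up to the next space is removed
```

Note that removing a URL leaves the spaces on both sides of it in place, so the result contains a double space. All three patterns also match "http" wherever it occurs, including inside a word that is not a URL.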