Wed 6 May 2009
R is a wonderful platform for statistics, not least because of the large number of resources out there. There are packages and notes on just about anything you'd want to do. But given the volume of stuff, it's sometimes very hard to find exactly what you need.
For example, yesterday I was looking for the best way to randomize a vector. I basically wanted the equivalent of the shuffle function in PHP. I had a data set, and I wanted to sample randomly from the observed distribution N times. This is a common thing if you're doing any kind of bootstrapping.
Anyway, Google totally let me down, although I did eventually figure out the best way to create a random vector. The first thing I tried was generating an array of N random indexes using the runif function. This was fine, except that to apply them I had to write a huge for-loop, which is very, very slow. It would literally have taken 10 days to finish.
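For what it's worth, the index approach doesn't strictly need a loop, because R can index a vector with a whole vector of positions at once. Here is a minimal sketch of that idea. The data in x and the value of N are made up for illustration, and this is not the code from my original attempt.

```r
set.seed(42)                      # only so the sketch is reproducible
x <- c(2.3, 4.1, 5.7, 3.3, 9.8)   # hypothetical observed data
N <- 10                           # hypothetical number of draws

# runif() returns values strictly between 0 and 1, so scaling by
# length(x) and rounding up gives whole-number indexes in 1..length(x)
idx <- ceiling(runif(N) * length(x))

# Index the vector once with all N positions instead of looping
draws <- x[idx]
```

Every index is equally likely and can repeat, so this amounts to sampling with replacement.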
Then I took a cue from the description above, and figured out that the best (and by far the fastest) way is to use the sample function. It's very, very fast. Usage is:
sample(x, size, replace = FALSE, prob = NULL)
x: vector to sample from
size: number of values to draw (if you leave it out, it defaults to the length of x)
replace: sample with or without replacement (sampling with replacement means a value is NOT removed from the pool once it's been randomly selected, so it could be selected many times)
prob: a vector of probability weights for x (I didn't use this)
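To make the arguments concrete, here is a rough sketch of the two things I was after: a PHP-style shuffle, and resampling from the observed values for a bootstrap. The data vector, N, and the variable names are made up for illustration, and the bootstrap of the mean is just one example of what you might do with the resamples.

```r
set.seed(1)                       # only so the sketch is reproducible
x <- c(12, 15, 9, 22, 17, 30)     # hypothetical observed data
N <- 1000                         # hypothetical number of draws

# Shuffle: with size omitted and replace = FALSE (the default),
# sample() returns a random permutation of x
shuffled <- sample(x)

# Draw N values from the observed distribution, with replacement
boot <- sample(x, size = N, replace = TRUE)

# Bootstrap the mean: resample x at its full length N times
# and record the mean of each resample
boot_means <- replicate(N, mean(sample(x, replace = TRUE)))
```

One thing to watch out for: if x is a single number of 1 or more, sample(x) samples from 1:x rather than returning that one value, so be careful when the vector you pass in might have length one.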
Hopefully I've filled this post with enough keywords so others can find it!