Wed 6 May 2009

R is a wonderful platform for statistics, not least because of the large number of resources out there. There are packages and notes on just about anything you'd want to do. But given the sheer volume of material, it's sometimes very hard to find exactly what you need.

For example, yesterday I was looking for the best way to randomize a vector. I basically wanted the equivalent of the shuffle function in PHP. I had a data set, and I wanted to sample randomly from the observed distribution N times. This is a common thing if you're doing any kind of bootstrapping.

Anyway, Google totally let me down, although I did eventually figure out the best way to create a random vector. The first thing I tried was generating an array of N random indexes using the runif command. This was fine, except that to apply them I had to write a huge for-loop, which is very, very slow. It would have taken 10 days to finish, literally.
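For the curious, the slow approach looked roughly like the sketch below. This is a reconstruction for illustration, not the original code: the data in x, the value of N, and the variable names idx and draws are all made up.

```r
# Hypothetical data; the original data set is not shown in this post.
set.seed(1)                        # only so the sketch is reproducible
x <- c(2.1, 3.5, 4.0, 5.2, 6.8)
N <- 10

# N random indexes between 1 and length(x), built from runif
idx <- floor(runif(N, min = 1, max = length(x) + 1))

# Apply the indexes one element at a time.
# This loop is the part that gets painfully slow when N is large.
draws <- numeric(N)
for (i in seq_len(N)) {
  draws[i] <- x[idx[i]]
}
```

With a handful of values this finishes instantly; the cost only shows up once N gets very large.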

Then I took a cue from the description above, and figured out that the best (and by far the fastest) way is to use the sample command. Very, very fast. Usage is:

sample(x, size, replace = FALSE, prob = NULL)

x: vector to sample from

size: number of items to draw (if you leave it out, it defaults to the length of x, so sample(x) simply shuffles x)

replace: sample with or without replacement (sampling with replacement means a value is NOT removed from the pool once it's been randomly selected, so it could be selected many times)

prob: a vector of probability weights for x (I didn't use this)
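Here is a minimal sketch of the two uses mentioned in this post, shuffling a vector and resampling it N times. The values in x, the choice of N, and the variable names are invented for illustration.

```r
set.seed(42)                       # only so the sketch is reproducible
x <- c(12, 15, 11, 19, 14)         # hypothetical observed values

# Shuffle: with size left out, sample() returns a random permutation of x
shuffled <- sample(x)

# Bootstrap-style resample: N draws from x with replacement
N <- 1000
resample <- sample(x, size = N, replace = TRUE)

# Without replacement, size cannot be larger than length(x)
subset3 <- sample(x, size = 3)
```

One quirk to watch for: if x is a single number n (1 or greater), sample(x) draws from 1:n rather than returning x itself.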

Hopefully I've filled this post with enough keywords so others can find it!

It definitely helped. Thanks

I'm new to programming… this was unbelievably clear and helpful.. thank you.

Thanks a lot for the help.

Thanks. Simple and helpful

Thanks for the help

This was really helpful as it's one of the top hits for "shuffle array in R". Thanks!

Could I use this to randomize two vectors consisting of Julian dates for the emergence of insects? I need to randomize to try to normalize the data, or to identify how many SDs the data is from a possible normal set.