Fri 25 Feb 2011
More random R tips that I had a hard time Googling.
When you're writing SQL queries you will often want to do something like
SELECT * FROM table1 WHERE id IN (SELECT id FROM table2). Of course you don't need to do it with a sub-query. But the point is, often you will want to pull out only those rows that have an id which appears in another result set.
I need to do this all the time in R. For example, I have a dataframe in which each row has a unique ID. I do a bunch of processing and end up with a vector if ids that meet my criteria, and I would like to pull out only those rows from the original dataframe which are in this new vector of ids. Of course there are many ways to do this, but I think the SQL-like way is straightforward, and it turns out it's easy to do it in R, if only you can find the damned documentation! Well, here you go: use the %in% operator. It works just like it does in a SQL query:
result <- data[data$id %in% subset.ids,]
This says select all columns from data where the data$id field is in the vector called subset.ids. I find this to be super useful, hopefully you will too!