May 2009


I just read through Oded Nov's paper from Communications of the ACM:

Nov, O. (2007). What motivates Wikipedians? Commun. ACM, 50(11), 60-64.

Two things occur to me. First, Nov explains away the potential influence of social desirability in about two sentences, but I'm not buying it. When you ask people why they do something, a huge number of social factors come into play. In the case of Wikipedia I also think there are likely to be lots of soft and unconscious (implicit) attitudes. Soft attitudes are expressions that don't reflect beliefs, but rather answers to questions someone might not have thought about previously. For example, if I asked you "How do you feel about Kobe Bryant elbowing Ron Artest in the neck last night?", you might respond by saying it's abhorrent. If I took that at face value, I'd be ignoring the fact that many people don't know about basketball, don't know who Bryant or Artest are, don't know the context, or don't care. Unconscious attitudes, on the other hand, are attitudes that we hold and act on but can't express. To me, neither of these things makes survey research of this type invalid – I do similar surveys myself! But they're important issues, too often left out of discussions.

The second issue, maybe more important, is about scope. There are a fair number of studies now about motivations for contributing to various online collective actions. But they almost always focus on people who contribute a lot. These papers, like Nov's, usually don't make that distinction, though. They make claims about motivations for all contributors. In reality, the motivations of casual or infrequent contributors are likely to be very, very different. Harder to study, though! By studying the heavy contributors we capture motivations for the majority of the work that gets done, but we do that at the expense of attention to the vast majority of people who contribute.

In sum: Social desirability, soft attitudes, etc. need more consideration when we talk about motivation. Studies that focus on heavy contributors should say as much, and more studies should look at casual contributors' motivations.

I wanted to share the solution to a problem with Dreamhost that I just ran into. A site I designed for a client a few months back stopped working all of a sudden. No one had touched the code. I could access the root page, but none of the interior pages were working. Clicking on an interior link returned a 'No Input File Specified' error in Firefox, and a 404 in IE.

The problem was with the .htaccess rewrite I use to create SEO-friendly URLs. I use the method that I wrote about here, and which works on many, many other sites I host on Dreamhost. So, I'm thinking WTF?

Of course, the Dreamhost support people were no help. I figured there must have been some kind of configuration change on the server side… no other way to explain this. And I noticed that the server we're on was recently upgraded to Apache 2.2.11. Then I found a forum thread started by someone who was having my exact problem. And based on a tip in one of the posts, I made a one-line change in my .htaccess that solved the problem:

Before: RewriteRule ^(.*)$ index.php/$1 [L]

After: RewriteRule ^(.*)$ index.php?/$1 [L]
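
For context, here's a minimal sketch of how that rule might sit in a complete .htaccess. The RewriteEngine and RewriteCond lines are assumptions about a typical front-controller setup, not a copy of the actual file. The only thing the change itself does is hand the requested path to index.php as the query string rather than as extra path information after the script name.

```apache
# Sketch only: the RewriteEngine and RewriteCond lines are assumed, not taken from the real file.
RewriteEngine On

# Leave requests for real files and directories alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Send everything else to index.php, with the requested path as the query string.
RewriteRule ^(.*)$ index.php?/$1 [L]
```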

Adding that question mark did the trick. I'm still not sure why it's now required, or why it changed suddenly. I'd be grateful to anyone who can explain that to me!

R is a wonderful platform for statistics, not least because of the large number of resources out there. There are packages and notes on just about anything you'd want to do. But given the volume of stuff, it's sometimes very hard to find exactly what you need.

For example, yesterday I was looking for the best way to randomize a vector. I basically wanted the equivalent of the shuffle function in PHP. I had a data set, and I wanted to sample randomly from the observed distribution N times. This is a common thing if you're doing any kind of bootstrapping.

Anyway, Google totally let me down, although I did eventually figure out the best way to create a random vector. The first thing I tried was generating an array of N random indexes using the runif function. This was fine, except to apply them I had to create a huge for-loop, which is very, very slow. It would have taken 10 days to finish, literally.

Then I took a cue from the description above, and figured out that the best (and by far the fastest) way is to use the sample function. Very, very fast. Usage is:

sample(x, size, replace = FALSE, prob = NULL)

x: vector to sample from
size: number of items to draw (defaults to the length of x)
replace: sample with or without replacement (sampling with replacement means a value is NOT removed from the pool once it's been randomly selected, so it could be selected many times)
prob: a vector of probability weights for x (I didn't use this)
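
To make that concrete, here's a minimal sketch of the shuffle and bootstrap cases. The observed vector and n_boot are made-up stand-ins for a real data set and resample count, and set.seed is there only so the sketch is repeatable. The last two lines show how the runif indexes could have been applied in one step, without the for-loop.

```r
# Hypothetical data: 'observed' stands in for the real data set.
set.seed(42)  # only so the sketch is repeatable
observed <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8)

# Shuffle: a random permutation of the whole vector (like PHP's shuffle).
shuffled <- sample(observed)

# One bootstrap resample: same length as the data, drawn with replacement.
resample <- sample(observed, size = length(observed), replace = TRUE)

# Many resamples without an explicit for-loop.
# Each column of 'boot' is one resample of the data.
n_boot <- 1000
boot <- replicate(n_boot, sample(observed, size = length(observed), replace = TRUE))
boot_means <- colMeans(boot)  # one mean per resample

# The runif idea, vectorized: build all the indexes, then subset once.
idx <- ceiling(runif(n_boot, min = 0, max = length(observed)))
draws <- observed[idx]
```

One thing to watch for: if x is a single number of 1 or more, sample(x) samples from 1:x instead of returning x, so this works best when the vector has at least two elements.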

Hopefully I've filled this post with enough keywords so others can find it!

Our internet has been down at home this week – what a drag. We have Vonage, too, which means no internet, no phone. We're cut off from the modern world! Except for our cell phones, and the TV, and the radio, and the friendly neighbor who (unwittingly) provides us with a shaky wireless signal.

What's so interesting about the neighborhood signal is how variable it is. Some days I can get 4 bars of signal in my office, and some days I have to carry my laptop out onto the back porch to get just 1 bar. Not only that, but sometimes 4 bars will support streaming video on ESPN (for example), and other times it'll barely fetch my email headers. I'm so confused about why it's so up and down. Is it weather conditions? Signal interference in the area? Is it physical? I mean, can something like a door, whether it's open or shut, influence the signal that much when you're far away and it's comparatively weak? I know that physical barriers like walls, etc. are a big factor in signal quality, but I would expect that stuff to be largely constant – either it can make it through the walls or not. I wouldn't expect such big swings from day to day. Sometimes from hour to hour. Does anyone know what sorts of things can affect WiFi at a distance?

Anyway, the repair tech comes today. Let's hope we get our own internet back!