Archive for the Trends Category

The Three Day Puzzle

| August 7th, 2008

I came across this rather interesting paper – “Terror attacks influence driving behavior in Israel”, authored by Guy Stecklov and Joshua R. Goldstein (Carnegie Mellon University), which reports interesting correlations between ‘terror attacks’ and ‘traffic accidents’. So what do ‘terror attacks’ and ‘traffic accidents’ have to do with a ‘3-day puzzle’? Some interesting details about the 3-day mystery from the paper are as follows:

  • No day-0, day-1, or day-2 effects of terror are observed on traffic fatalities, but there is an increase of almost 35% in the rate of traffic fatalities 3 days after terror attacks.
  • The findings suggest that the third-day effect of large terror attacks is even larger, with a 69% increase in traffic fatalities.
  • Interestingly, the 3-day lag observed is similar to that found in studies on imitative suicides, in which well-publicized suicides are followed 3 days later by a rise in traffic fatalities. [reference]
  • A similar 3-day spike in homicides is also found after major boxing matches. [reference]
  • Some fraction of the increase in traffic fatalities after terror attacks may be attributable to covert suicides and/or increased aggression on the road.
  • There is a notable lack of longer-term effects beyond the 3-day spike in fatal accidents. Days four and beyond have normal levels of traffic volume and accidents and suggest that the effects of terror are transient.
  • Why traffic fatalities increase on the third day after a terror attack remains a puzzle.

Possible explanations for the 3-day lag are:

  • The day-three increase in fatalities coincides with the time when those exposed to terror may try to return to their normal routines but are not yet psychologically, and perhaps physiologically, sufficiently recovered.
  • Yet another explanation for the 3-day lag is that it is a counterreaction to the collective bonding that occurs immediately after the terror event, similar to the “post-suppression rebound” found in experimental psychology.

The interesting point to look for in the graphs below is the 3rd day after the attack.

Terror attacks - Traffic accidents
Figure: Model estimates of proportional effects of terror attacks on traffic volume and accident rates by number of days after attack. (Upper) Results for all attacks. (Lower) Large attacks only.

Other similar interesting reads are in the references of the paper. I am listing a few of them below:

I started this project as a weekend thingy a day back but hit a hurdle due to the Twitter rate limits. Since I couldn’t complete this project, I thought it would be rather nice to write a post explaining what I had in mind.

Twitterers, like bloggers, are driven/motivated by their *follower* counts (who doesn’t like attention?). But unlike blogs, Twitter has no mechanism for determining the visibility of one’s tweets or for gaining insight into how to maximize that visibility. By visibility I mean the chance of my tweet being read by my followers. I’ll try to explain my approach to estimating this *visibility* with the following example:


The image above depicts the user (imagine yourself) at the center along with his followers and the users they are following (the users your followers are following – this can get crazy). The idea is dead simple – the visibility of your tweet depends on the twittering habits of your followers’ followings. Let me slow down – imagine a queue of size 20 assigned to each of your followers. This is the list of recent tweets displayed on Twitter for every user. Your tweet’s lifetime lasts as long as it doesn’t get pushed out of this queue – which depends solely on the twittering frequency of your followers’ followings. The faster they tweet, the faster your tweet vanishes off this list, and vice versa. This lifetime/visibility of your tweet, averaged over all your followers, can be figured out algorithmically as follows:

  • Fetch all your followers and their followings (let’s call this FF).
  • For each FF – calculate the average time between their tweets (do this by analyzing their last 50-100 tweets).
  • Using the times calculated above, calculate the time for your tweet to get pushed off the queue for each of your followers. This can be done by simulating tweets from each of that follower’s followings (FF) at the frequency calculated above and pushing them into the queue.
  • Average this value across all your followers.
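The steps above can be sketched roughly as follows. This is only a minimal simulation under simplifying assumptions, not a finished app: it skips the Twitter API entirely, takes tweet times as plain numbers (seconds), and assumes every account tweets at perfectly regular intervals. The function names and the input layout (a dict mapping each follower to the average intervals of the accounts they follow, excluding you) are made up for illustration.

```python
import heapq

QUEUE_SIZE = 20  # size of the 'recent tweets' list assumed in the post


def average_interval(timestamps):
    """Average seconds between consecutive tweets, given tweet times in seconds."""
    if len(timestamps) < 2:
        return None  # not enough tweets to estimate a frequency
    ordered = sorted(timestamps)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


def time_until_pushed_out(intervals, queue_size=QUEUE_SIZE):
    """Seconds until `queue_size` newer tweets have arrived in one follower's list.

    `intervals` holds the average seconds between tweets for each account
    this follower follows. Assumption: each account tweets at a perfectly
    regular pace, starting one interval after your tweet.
    """
    active = [i for i in intervals if i and i > 0]
    if not active:
        return float("inf")  # nobody else tweets, so your tweet never leaves
    # Heap of (time of next tweet, interval) for every followed account.
    heap = [(i, i) for i in active]
    heapq.heapify(heap)
    t = 0.0
    for _ in range(queue_size):
        t, interval = heapq.heappop(heap)
        heapq.heappush(heap, (t + interval, interval))
    return t  # arrival time of the tweet that pushes yours off the list


def average_visibility(followers):
    """Average lifetime over all followers, ignoring those with no activity."""
    lifetimes = [time_until_pushed_out(v) for v in followers.values()]
    finite = [x for x in lifetimes if x != float("inf")]
    return sum(finite) / len(finite) if finite else float("inf")
```

For example, a follower who follows a single account tweeting once a minute would keep your tweet in their list for 20 minutes (`time_until_pushed_out([60])` gives 1200 seconds); with two such accounts it drops to 10 minutes.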

The above not only gives you insight into how long your tweet would be visible but also into which of your followers could potentially read it. For example, you could answer queries like – ‘if I tweet now, which of my followers could still read this tweet after 5 hours’ or ‘if I tweet now, which of my followers will have already read/missed this tweet after 3 hours’.
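Queries like these boil down to a simple filter once a per-follower lifetime is known. A small sketch, assuming a hypothetical `lifetimes` dict that maps each follower's name to the estimated number of seconds your tweet stays in their list:

```python
def split_by_visibility(lifetimes, hours):
    """Split followers into those who can still see the tweet after `hours`
    and those for whom it has already been pushed off the list."""
    cutoff = hours * 3600  # lifetimes are assumed to be in seconds
    still_visible = sorted(n for n, s in lifetimes.items() if s > cutoff)
    pushed_out = sorted(n for n, s in lifetimes.items() if s <= cutoff)
    return still_visible, pushed_out
```

Calling `split_by_visibility(lifetimes, 5)` would then answer the first question above, and `split_by_visibility(lifetimes, 3)` the second.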

For more accurate results, one could also analyze the twittering habits of the FFs (followers’ followings) and generate a probabilistic model of their twittering frequency relative to the time of day. This could help generate dynamic statistics depending on the time you intend to twitter.
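One very simple way to approximate such a time-of-day model is an hourly histogram. The sketch below is an assumption-laden illustration, not a proper probabilistic model: it estimates tweets per hour for each hour of the day from past timestamps (counting only days on which the account tweeted at all), and then walks forward hour by hour until the expected number of newer tweets reaches the queue size. All function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_rates(timestamps):
    """Estimated tweets per hour for each hour of the day (index 0-23).

    Assumption: the number of distinct dates seen in the sample stands in
    for the number of days observed.
    """
    if not timestamps:
        return [0.0] * 24
    days = len({ts.date() for ts in timestamps})
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(h, 0) / days for h in range(24)]


def hours_until_pushed_out(combined_rates, start_hour, queue_size=20,
                           max_hours=24 * 7):
    """Hours until the expected count of newer tweets reaches `queue_size`.

    `combined_rates` is the element-wise sum of hourly_rates() over all the
    accounts one follower follows.
    """
    expected = 0.0
    for elapsed in range(max_hours):
        rate = combined_rates[(start_hour + elapsed) % 24]
        if rate > 0 and expected + rate >= queue_size:
            # Interpolate inside the hour in which the queue fills up.
            return elapsed + (queue_size - expected) / rate
        expected += rate
    return float("inf")  # too little activity within the search window
```

With a constant combined rate of 2 tweets per hour this gives 10 hours regardless of the start hour; with activity concentrated in a few hours of the day, the answer depends heavily on when you tweet, which is exactly the effect the time-of-day model is meant to capture.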

Another insight these statistics could provide is how long one should wait before posting another tweet. I, for example, sometimes twitter in bursts (with just seconds between consecutive tweets). Followers usually give the most attention to your most recent tweet. The more consecutive tweets you have, the higher the chance that the older ones get less attention. If one can figure out the average time it takes for one’s older tweet to get pushed off the ‘recent 20’ list of one’s followers, it would give a better idea of how long to wait before twittering again.

I’m sure that there are many more cool statistics/insights one can figure out from one’s network on Twitter. I would be glad to hear your take on this as well as any improvements you might have to suggest. I would love to make this an app once I can get Twitter to raise the hourly request limit for me – Twitter, are you listening?

I started working on my second weekend project; I guess I’ll do something small every week. This one is an extension to LifeLogger. The aim is to analyze one’s daily and weekly browsing history and extract themes which could aid in recommendations. It is still a ‘work in progress’ – so far I have been able to generate the following visualizations:

The following visualization depicts the dominant keywords/topics for one day (the terms are stemmed):

I had been reading a couple of Yahoo!-related articles and visualization blogs. This is captured by the above visualization – but there is still a lot of noise which I need to get rid of.

The next visualization depicts the linkages and clusters for the keywords. There exists a link between two terms if they occur in the same document. [may take some time to load - you'll need to zoom in to get a better look - click on 'compute layout' if the clusters don't show]

Both of the above visualizations depict important metrics that could be used to extract dominant themes from the browsing history. Dominance should be inferred not just from a term’s frequency but also from its prevalence across multiple pages. I still need to work on removing noise and running this on larger datasets, like the browsing history for a week or so. If you have any ideas or good papers to recommend, that would be nice.
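To make the two metrics concrete, here is a minimal sketch of how they could be computed from a set of pages, assuming each page has already been reduced to a list of stemmed terms. The scoring formula (frequency weighted by the fraction of pages containing the term) is just one hypothetical way of combining frequency and prevalence, not what the visualizations above actually use.

```python
from collections import Counter
from itertools import combinations


def term_stats(documents):
    """Return (frequency, document frequency, links) for lists of terms.

    A link between two terms is counted once for every document in which
    both of them occur.
    """
    freq = Counter()      # total occurrences of each term
    doc_freq = Counter()  # number of documents containing each term
    links = Counter()     # co-occurrence counts for term pairs
    for terms in documents:
        freq.update(terms)
        unique = sorted(set(terms))
        doc_freq.update(unique)
        links.update(combinations(unique, 2))
    return freq, doc_freq, links


def dominance(freq, doc_freq, n_docs):
    """Hypothetical score: frequency scaled by the share of pages with the term."""
    return {t: freq[t] * doc_freq[t] / n_docs for t in freq}
```

A term that appears ten times on a single page then scores lower than one that appears ten times spread across most pages, and the `links` counter gives the weighted edges for the keyword graph.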

What’s up with July?

| July 14th, 2008

In my search for interesting datasets/visualizations I came across this graph depicting the average daily vehicle-distance traveled across urban highways in the United States. This graph lays out the trends for 2006, 2007 and 2008.

There is an interesting pattern that is consistent for 2006 and 2007, with 2008 probably following suit. I’ve been trying hard to make sense of this graph, and here are my interpretations at first glance:

  • The dip around January and around December is probably due to Christmas and New Year’s – people staying at home celebrating with close family and friends. The cold season may also be a contributing factor.
  • The steep increase from February to June is probably due to the onset of summer – summer holidays.

What somehow doesn’t make sense to me is the sudden dip in July. Is there any explanation for that? People opting to participate in the 4th of July celebrations at home? Hmm plausible? If you are based in the US, I would love to read your interpretation of the graph.