I started working on my second weekend project; I guess I’ll do something small every week. This one is an extension to LifeLogger. The aim is to analyze one’s daily and weekly browsing history and extract themes that could aid in recommendations. It is still a ‘work in progress’; so far I have been able to generate the following visualizations:

The following visualization depicts the dominant keywords/topics for one day (the terms are stemmed):

I had been reading a couple of Yahoo!-related articles and visualization blogs. This is captured by the above visualization, but there is still a lot of noise that I need to get rid of.

The next visualization depicts the linkages and clusters for the keywords. A link exists between two terms if they occur in the same document. [may take some time to load - you'll need to zoom in to get a better look - click on 'compute layout' if the clusters don't show]
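
As a rough illustration of the linking rule (two terms are connected if they occur in the same document), here is a minimal Python sketch. The function name `cooccurrence_edges` and the sample pages are made up for illustration, and it assumes each page has already been reduced to a list of stemmed terms; it is not the actual LifeLogger code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for each unordered term pair, how many documents contain both terms."""
    edges = Counter()
    for terms in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        unique_terms = sorted(set(terms))
        for a, b in combinations(unique_terms, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample: each inner list is the stemmed terms of one page.
pages = [
    ["yahoo", "search", "visual"],
    ["visual", "graph", "cluster"],
    ["yahoo", "visual"],
]
edges = cooccurrence_edges(pages)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The pair counts can serve as edge weights, so terms that share many pages end up more strongly linked than terms that met only once.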

Both of the above visualizations depict important metrics that could be used to extract dominant themes from the browsing history. Dominance should be inferred not just from frequency but also from the prevalence of a term across multiple pages. I still need to work on removing noise and running this on larger datasets, like the browsing history for a week or so. If you have any ideas or good papers to recommend, that would be nice.
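
One possible way to combine frequency with prevalence across pages is sketched below in Python. The scoring formula (total occurrences multiplied by the fraction of pages containing the term) is only an assumption chosen for illustration, as are the function name `dominance_scores` and the sample data; other weightings may well work better.

```python
from collections import Counter


def dominance_scores(documents):
    """Score each term by total frequency times the fraction of pages it appears on."""
    total_counts = Counter()  # how many times each term occurs overall
    page_counts = Counter()   # how many pages each term appears on
    for terms in documents:
        total_counts.update(terms)
        page_counts.update(set(terms))
    n_pages = len(documents)
    return {
        term: total_counts[term] * page_counts[term] / n_pages
        for term in total_counts
    }


# Hypothetical sample: "ad" is repeated a lot, but only on one page.
pages = [
    ["ad"] * 6,
    ["visual", "yahoo"],
    ["visual"],
    ["visual", "yahoo"],
]
scores = dominance_scores(pages)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(term, score)
```

With this weighting, a term repeated many times on a single page ("ad" here) ranks below a term that shows up on most pages ("visual"), which is one way of damping the noise.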