I started working on my second weekend project; I guess I’ll do something small every week. This one is an extension to LifeLogger. The aim is to analyze one’s daily and weekly browsing history and extract themes which could aid in recommendations. It is still a ‘work in progress’. So far I have been able to generate the following visualizations:
The following visualization depicts the dominant keywords/topics for one day (the terms are stemmed):
I had been reading a couple of Yahoo!-related articles and visualization blogs. This is captured by the above visualization, but there is still a lot of noise which I need to get rid of.
The next visualization depicts the linkages and clusters for the keywords. A link exists between two terms if they occur in the same document. [may take some time to load - you'll need to zoom in to get a better look - click on 'compute layout' if the clusters don't show]
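The linking rule can be sketched in a few lines of Python. This is only a minimal illustration, not the LifeLogger code: the function name `cooccurrence_edges` and the sample `pages` are made up, and each page is assumed to have already been reduced to a list of stemmed terms.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count, for each unordered term pair, how many documents contain both terms."""
    edges = Counter()
    for terms in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        unique_terms = sorted(set(terms))
        for a, b in combinations(unique_terms, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical pages, each already reduced to stemmed terms.
pages = [
    ["yahoo", "search", "visual"],
    ["visual", "graph", "cluster"],
    ["yahoo", "visual"],
]
edges = cooccurrence_edges(pages)
for pair, weight in edges.most_common():
    print(pair, weight)
```

The count on each edge can serve as a link weight, so pairs that show up together on many pages pull closer in the layout than pairs that met only once.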
Both of the above visualizations depict important metrics that could be used to extract dominant themes from the browsing history. Dominance should be inferred not just from frequency but also from the prevalence of a term across multiple pages. I still need to work on removing noise and running this on larger datasets, such as the browsing history for a week or so. If you have any ideas or good papers to recommend, I would love to hear them.
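One way to combine the two signals, sketched here under my own assumptions: multiply a term's total frequency by the fraction of pages it appears on. The scoring formula, the name `dominance_scores`, and the sample data are all hypothetical; other weightings would be just as reasonable.

```python
from collections import Counter


def dominance_scores(documents):
    """Score each term by total frequency times the fraction of documents containing it."""
    total_freq = Counter()  # how often a term occurs overall
    doc_freq = Counter()    # how many documents a term occurs in
    for terms in documents:
        total_freq.update(terms)
        doc_freq.update(set(terms))
    n_docs = len(documents)
    # Assumed formula: frequency weighted by prevalence across pages.
    return {t: total_freq[t] * doc_freq[t] / n_docs for t in total_freq}


# Hypothetical pages: "login" is repeated a lot, but only on one page.
pages = [
    ["login"] * 6,
    ["visual", "graph"],
    ["visual", "yahoo"],
    ["visual"],
]
scores = dominance_scores(pages)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, score)
```

Here "visual" outranks "login" even though "login" has the higher raw count, because "visual" is spread across three of the four pages. That is the kind of behaviour that might help push down single-page noise.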
The LifeLogger project has been at a standstill for a couple of months now. Either I have been too busy (personal reasons) or I didn’t have the required knowledge to tackle the challenging problems in the project. But I haven’t just been sitting idle during the last few months. I figured that the best way to go about the project would be to first get the relevant knowledge. So I have been busy reading AI literature and familiarizing myself with the algorithms and theory.
I intend to scrap the current code base and start afresh, this time with clear-cut goals and a fast-paced development cycle. I’m seeking to collaborate with interested folks so that we can jointly build this ambitious project. If you are interested, send me an email at anand [at] semanticvoid.com or just write a comment.
I received an email from Koders a few hours back, saying that LifeLogger had finally been approved and added to their index. This is a great step for LifeLogger as it joins numerous other open source projects on Koders. This way the project will gain more visibility and contribute to code reusability as well.
I still need to make a dozen check-ins for the code changes and feature implementations I made for BarCamp last week. So you’ll still find old code at Koders and in the svn repository. I’ll try to clear up the pending check-ins by next week.
I found some cool stats on the Koders page:
|Lines of code:
|Person months (PM):
|Effort per KLOC:
>> Click here to head over to LifeLogger at Koders <<
Update: I had to publish this post again using WordPress as Google Docs did not use the document name as the title of the blog post.
Note: This post was written and published using Google Docs. I’m testing out Google Docs blog integration, so if the display is all messed up blame them :-) .