LifeLogger
  Author: Anand Kishore (anand@semanticvoid.com)
  Logging my life. Inspired by MyLifeBits.



Screenshots
Screenshot 1

Main interface of the LifeLogger UI.
Screenshot 2

Search results from the user's data. The results are shown in different colors, with each color representing one of the disparate data sources.
Screenshot 3

Various graphs help the user visualize their browsing/reading/search patterns.


Screenshot 4

Browse results for a given date range.

Screenshot 5

Interesting picks of the day.


Gordon Bell has been recording every bit of his life for the past seven years. His custom-designed software, "MyLifeBits", saves everything it can: every email he sends and receives, every document he types, every chat session he engages in, every Web page he surfs. The advantage of such software is obvious: total recall. It gives one the ability to search one's life for any reference to a person or thing.

Inspired by it, I have decided to start logging my life as well. As of now it's restricted to my online life, as I do not have resources like the SenseCam. The data collected in this process could be used in numerous ways: total recall, recommendations, predictions, and so on. As Peter Norvig says, "It's about the data and not the algorithm".

In fact, I have been doing this for the past two years or so, although I didn't realize its advantages back then (all thanks to Google's copious amounts of storage). The following are the different aspects of my online life that I have been logging/stashing away:
  • Email: I primarily use Gmail for all my email correspondence. In fact, I have set up filters to forward emails from all my other mailboxes to the primary Gmail account, where they are archived.
  • Chat Sessions (IM): Here again I rely on another Google service, GTalk. Most of my IM conversations are on GTalk with the record feature turned on. This archives all my chat sessions in my Gmail Chats folder. Meebo also allows one to save chat sessions, but it does not provide any API to retrieve them.
  • Search: I opted in to Google Search History two years ago. My search history has 4500+ searches logged as of today. This forms my database of intentions.
  • Browsing History: I have been keeping track of all the sites I visit. There are two ways to log such data:
    • Slogger: This Firefox extension saves every page you visit into a designated folder, ordered by day, and is highly configurable. It supports various log formats, from XML and text to HTML. Its ability to save text as well as HTML versions of the page locally facilitates very fast recall. The only problem one has to deal with here is storage (that is, once you get to the point where you have terabytes of browsing history).
    • Google Web History: This feature launched just a few days ago. Coupled with the Google Toolbar, Google logs every page you visit (if you have the PageRank feature turned on in the toolbar). I have just begun exploring this feature, but it certainly relieves me of the burden of storage. (Note: Don't try this if you are paranoid about your privacy.)
  • Online Reading: Among all the feed readers, Google Reader provides an interesting view of the trends in your reading history. The data recorded by the reader is not trapped inside Google but is very much accessible. Thus all my feed reading history and patterns are logged without much hassle.
  • Bookmarks (things I find interesting): I usually bookmark pages which I find interesting or which I think I will refer to in the future. I use del.icio.us and Simpy for this. Both services provide easy APIs and feeds for retrieving the data from their servers. This forms my database of interest.
#  How do I aggregate all this data?

Although most of the data logged resides on remote servers (with privacy not being an issue for me), it can all be aggregated into a unified database. The different tools and ways to retrieve the data are as follows:
  • Email: To retrieve all the emails from Gmail one can use g4j (a Java library) or Gmail-APIc (.NET). There are APIs available for other languages as well.
  • Chat Sessions (IM): Chat sessions can be retrieved from the Chats folder in Gmail. This can be done using the APIs mentioned above.
  • Search: One can get hold of all their logged searches through the following URL (items in the resulting feed can be identified by the category type: 'web result' for clickthroughs and 'web query' for search queries):
    • RSS: https://www.google.com/searchhistory/?output=rss&num=some large number
  • Browsing History: If you use Slogger for logging your browsing history, you can access all your logs in the configured folder. But if you have dared to let Google save your browsing history, it can be accessed using the following URL (items in the resulting feed can be identified by the category type 'browser result'):
    • RSS: https://www.google.com/searchhistory/?output=rss&num=some large number
  • Online Reading: Every user on Google Reader has a unique id, which is visible in every Google Reader URL, e.g. http://www.google.com/reader/view/user/unique id/state/com.google/reading-list. Reading history can be retrieved from Google Reader using the following URL:
    • Atom: http://www.google.com/reader/atom/user/unique id/state/com.google/read?n=some large number
  • Bookmarks (things I find interesting): Both del.icio.us and Simpy provide numerous ways to access the data off their servers.
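As a rough illustration of the Search and Browsing History items above, the sketch below fills in the URL templates and splits feed items by their category. The sample feed is made up for the example, and the assumption that each item carries a plain RSS `<category>` element is mine; the real feeds may be structured differently and need an authenticated session to fetch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# URL templates copied from the list above; `num` and `user_id` stand in for
# the "some large number" and "unique id" placeholders.
SEARCH_HISTORY_RSS = "https://www.google.com/searchhistory/?output=rss&num={num}"
READER_READ_ATOM = (
    "http://www.google.com/reader/atom/user/{user_id}"
    "/state/com.google/read?n={num}"
)

# Hypothetical sample feed, invented for this example only.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>History</title>
    <item>
      <title>java swing tutorial</title>
      <category>web query</category>
    </item>
    <item>
      <title>Swing Tutorial</title>
      <link>http://example.com/swing</link>
      <category>web result</category>
    </item>
    <item>
      <title>Example page</title>
      <link>http://example.com/page</link>
      <category>browser result</category>
    </item>
  </channel>
</rss>"""


def group_items_by_category(rss_text):
    """Return {category: [item titles]} for every <item> in an RSS document."""
    groups = defaultdict(list)
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        # Assumption: the category type is stored in a <category> child element.
        category = item.findtext("category", default="unknown")
        groups[category].append(item.findtext("title", default=""))
    return dict(groups)


if __name__ == "__main__":
    print(SEARCH_HISTORY_RSS.format(num=5000))
    print(READER_READ_ATOM.format(user_id="1234", num=5000))
    print(group_items_by_category(SAMPLE_RSS))
```

Grouping by category first keeps search queries, clickthroughs, and browsed pages separate even though they arrive in the same feed.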
#  Code for aggregating the data & the unified database schema

The code for importing such data can be found at the code repository. The code is GPL licensed, so feel free to modify and redistribute it. Alternatively, you can browse the source code here.

Currently, the code supports importing data from Google Web History, Google Reader and del.icio.us XML files. Support for Gmail and GTalk is in progress.

Follow the instructions on this page to deploy and run the program.

For the DB schema refer to the same document.
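The project's actual schema is in the document referenced above. Purely to illustrate what a unified database could look like, here is a minimal sketch using SQLite; the table name, columns, and source labels are hypothetical and are not taken from the LifeLogger code.

```python
import sqlite3

# Hypothetical unified table: one row per logged item, whatever its source.
SCHEMA = """
CREATE TABLE IF NOT EXISTS life_item (
    id        INTEGER PRIMARY KEY,
    source    TEXT NOT NULL,   -- e.g. 'web_history', 'reader', 'delicious'
    title     TEXT,
    url       TEXT,
    logged_at TEXT NOT NULL    -- ISO 8601 timestamp, sortable as text
);
CREATE INDEX IF NOT EXISTS idx_life_item_time ON life_item (logged_at);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_item(conn, source, title, url, logged_at):
    """Insert one logged item from any data source."""
    conn.execute(
        "INSERT INTO life_item (source, title, url, logged_at) "
        "VALUES (?, ?, ?, ?)",
        (source, title, url, logged_at),
    )
    conn.commit()


def items_between(conn, start, end):
    """Return (source, title) rows logged in [start, end], oldest first."""
    cur = conn.execute(
        "SELECT source, title FROM life_item "
        "WHERE logged_at BETWEEN ? AND ? ORDER BY logged_at",
        (start, end),
    )
    return cur.fetchall()
```

Keeping a `source` column on a single table is one simple way to support both cross-source search and per-source coloring of results; a real schema may well split sources into separate tables.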

You can browse the Javadocs here.

#  Algorithms for analyzing the data
 Time Damping Of Textual Relevance [PDF]
Abstract: When a user performs a search in the Life Logger application, they are interested in results that are both relevant and recent. If the results are ordered by textual relevance alone, the top results may include relevant but older records. If they are ordered by time alone, the top results may include recent but less relevant records. This algorithm provides an interesting way to rank results by both textual relevance and time.

To compare the result ordering with and without the TimeDamper, see:
  • Results without TimeDamper for keyword 'swing'
  • Results with TimeDamper for keyword 'swing'
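The exact formula is in the PDF above. As a hedged sketch of the general idea only, the example below multiplies a textual relevance score by an exponential decay of the record's age. The decay form, the `half_life_days` parameter, and the sample scores are assumptions made for illustration; this is not the TimeDamper algorithm itself.

```python
from datetime import date


def damped_score(relevance, record_date, today, half_life_days=180.0):
    """Halve the relevance score for every half_life_days of record age."""
    age_days = max((today - record_date).days, 0)
    return relevance * 0.5 ** (age_days / half_life_days)


def rank(results, today, half_life_days=180.0):
    """Sort (title, relevance, date) tuples by damped score, best first."""
    return sorted(
        results,
        key=lambda r: damped_score(r[1], r[2], today, half_life_days),
        reverse=True,
    )


if __name__ == "__main__":
    today = date(2007, 7, 21)
    # Made-up records: one old but highly relevant, one recent but less so.
    results = [
        ("old but very relevant", 0.9, date(2005, 7, 21)),
        ("recent and fairly relevant", 0.6, date(2007, 7, 1)),
    ]
    for title, _, _ in rank(results, today):
        print(title)
```

A larger `half_life_days` makes the ranking behave more like pure textual relevance, while a smaller one pushes it toward pure recency.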

This page was updated on 21st July 2007.