Archive for July, 2007

There has been speculation that Google plans to introduce an ‘unavailable_after’ meta tag. It would probably look something like this:

<META name='unavailable_after' content='Wed, 01 Aug 2007 00:00:01 GMT'>

By specifying this tag, webmasters can tell Google either to stop indexing a particular page after the specified date or to treat the page as stale from then on. This would suit promotional pages where the promotions expire after a given time, and it would help unclog search engine indexes of irrelevant data.
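To make the idea concrete, here is a minimal sketch of how a crawler might read the tag and decide whether a page has expired. It assumes the speculated tag name and the RFC 822 style date shown above; the class name `UnavailableAfterParser` and the function `is_unavailable` are my own illustrative names, not anything Google has published.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class UnavailableAfterParser(HTMLParser):
    """Collects the date from an 'unavailable_after' meta tag, if present."""

    def __init__(self):
        super().__init__()
        self.unavailable_after = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "unavailable_after":
            return
        try:
            expiry = parsedate_to_datetime(attributes.get("content") or "")
        except (TypeError, ValueError):
            return  # Unparseable date: behave as if the tag were absent.
        if expiry.tzinfo is None:
            # Assumption: treat dates without a usable zone as UTC.
            expiry = expiry.replace(tzinfo=timezone.utc)
        self.unavailable_after = expiry


def is_unavailable(html, now):
    """Return True if the page declares an expiry date that has passed."""
    parser = UnavailableAfterParser()
    parser.feed(html)
    expiry = parser.unavailable_after
    return expiry is not None and now > expiry


page = ("<html><head><META name='unavailable_after' "
        "content='Wed, 01 Aug 2007 00:00:01 GMT'></head></html>")
print(is_unavailable(page, datetime(2007, 8, 2, tzinfo=timezone.utc)))
```

A page without the tag is simply treated as never expiring, which seems like the safest default for a hint that webmasters opt into.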

This valuable piece of information, provided by the ‘unavailable_after’ tag, would not only be used to clean up Google’s index but could also make its way into Google’s ranking algorithms. There are two perspectives on how a search engine could use this data for ranking:

  • When a page specifies its expiry/unavailability date, it implicitly tells the search engine the period for which it would be most relevant. Hence, as the unavailability date of a page approaches, it should start becoming less relevant to a user’s query. For example: the user query for ‘fedora release notes’ currently has the top 3 results pointing to the notes of FC7, whereas the other results are a random mix of release notes for FC3, FC2, FC4 and FC5 (with the FC3 and FC2 pages ranked higher than FC5 and FC4 respectively). Let’s say that FC8 is going to be released this November [schedule]. Assume that the Release Notes page for FC7 has the unavailable_after tag set for sometime around December. As December approaches, the FC7 pages would start losing their relevancy for such queries and gradually move lower in the search rankings, making FC8 the most relevant result. This would resolve the current inaccurate ordering of results for the query ‘fedora release notes’ on Google. It could be achieved in a manner similar to that proposed in the following paper: Time Damping Of Textual Relevance.
  • This perspective (the inverse of the one above) would be very specific to promotional/shopping related pages. Most shopping promotions/offers are valid for a given period. Hence, such pages should become more relevant to a user’s query as they approach their expiry date. For example: consider a user query for ‘20% discount shoes’. Let’s assume that the results include pages from Zappos as well as Shoebuy, both offering a 20% discount on shoes. The Shoebuy sale is going to last for about two more weeks from today (as specified by the unavailable_after tag), whereas the Zappos sale ends in another two days. Since both stores are offering the same percentage discount, it would be more appropriate to rank the Zappos page higher, as its offer ends sooner. A shopper would be more interested in offers ending soon, since the longer lasting offers can always be checked out at some later time.
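The two perspectives above can be sketched as simple score adjustments. This is only a toy illustration under my own assumptions: a linear ramp, a 30 day damping period, a 14 day urgency window and a 50% maximum boost are all made-up parameters, and this is not the method from the Time Damping paper nor anything a search engine is known to use.

```python
from datetime import datetime, timedelta, timezone


def damped_score(base_score, now, unavailable_after, ramp=timedelta(days=30)):
    """Perspective 1: relevance fades linearly during the last `ramp` before expiry."""
    remaining = unavailable_after - now
    if remaining <= timedelta(0):
        return 0.0                      # already expired
    if remaining >= ramp:
        return base_score               # expiry is still far away
    return base_score * (remaining / ramp)


def urgency_score(base_score, now, unavailable_after,
                  window=timedelta(days=14), max_boost=0.5):
    """Perspective 2: offers that end sooner get a larger boost."""
    remaining = unavailable_after - now
    if remaining <= timedelta(0):
        return 0.0                      # the offer is over
    if remaining >= window:
        return base_score               # no urgency yet
    closeness = 1 - remaining / window  # 0 at the window edge, near 1 at expiry
    return base_score * (1 + max_boost * closeness)


now = datetime(2007, 7, 20, tzinfo=timezone.utc)
zappos = urgency_score(1.0, now, now + timedelta(days=2))    # sale ends in two days
shoebuy = urgency_score(1.0, now, now + timedelta(days=14))  # sale ends in two weeks
print(zappos > shoebuy)
```

With equal base scores, the Zappos page outranks the Shoebuy page because its offer ends sooner, while `damped_score` does the opposite for pages such as old release notes.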

The perspectives above represent only a few of the conceivable uses of the unavailable_after tag. There could be numerous other ways in which this data could be used to improve search rankings.

I would love to hear your take on the unavailable_after tag, particularly if you can offer another perspective on using this data.

Awaken The Hacker Within

| July 19th, 2007

My good friend and fellow BarCamp organizer, Rohit Srivastwa, came up with this brilliant idea at the last BarCamp. He has been attending various hacker conventions like Black Hat and thought that it would be appropriate to host an event along similar lines in India. ClubHack is scheduled to be hosted in Pune (India) sometime around this December. You can head over and register for the following events:

Or you can register to attend the convention.

Although this event will be moderated, we will still try to keep it as close as possible to the flavor of a BarCamp. So fellas, awaken the hacker within you and be there to present your exploits, mods and software hacks, or to network with fellow nerds.
You can stay updated about ClubHack here as well as check out the blog.

LifeLogger: Now At Koders

| July 13th, 2007

I received an email from Koders a few hours back, saying that LifeLogger had finally been approved and added to their index. This is a great step for LifeLogger as it joins numerous other open source projects on Koders. This way the project will gain more visibility and contribute to code reusability as well.

I still need to make a dozen check-ins for the code changes/feature implementations I made for the BarCamp last week. So you’ll still find old code at Koders and in svn. I’ll try to clear up the pending check-ins by next week.

I found some cool stats on the Koders page:

Development Cost: $7,715
Assumptions
Lines of code: 1,543
Person months (PM): 1.54
Functions required: 100%
Effort per KLOC: 1.00 PM
Labor Cost/Month: $5000
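The numbers above appear to be a straightforward product of the listed assumptions. Here is a small sketch that reproduces the figure; the formula is my own reading of the stats, since the Koders page does not spell out its cost model, and `estimate_cost` is a hypothetical helper name.

```python
def estimate_cost(lines_of_code, effort_per_kloc_pm=1.00,
                  labor_cost_per_month=5000, functions_required=1.0):
    """Estimate effort (person months) and cost from lines of code.

    Assumed formula: person months = KLOC * effort per KLOC * share of
    functions required; cost = person months * monthly labor cost.
    """
    kloc = lines_of_code / 1000
    person_months = kloc * effort_per_kloc_pm * functions_required
    cost = person_months * labor_cost_per_month
    return person_months, cost


person_months, cost = estimate_cost(1543)
print(f"Person months: {person_months:.2f}")
print(f"Development cost: ${round(cost):,}")
```

For LifeLogger's 1,543 lines this gives about 1.54 person months, which at $5000 per month matches the $7,715 shown on the page.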

>> Click here to head over to LifeLogger at Koders <<

Update: I had to publish this post again using WordPress as Google Docs did not use the document name as the title of the blog post.
Note: This post was written and published using Google Docs. I’m testing out Google Docs blog integration, so if the display is all messed up blame them :-) .

LifeLogger @ BarCamp Pune 3

| July 9th, 2007

A couple of things have kept me away from blogging in the last few weeks. One was organizing BarCamp Pune 3 and the other was my latest ‘weekend project’ – LifeLogger. I presented the concept behind LifeLogger, along with a small demo, at the BarCamp yesterday. I received some really good feedback along with a dozen questions about the project. The audience appreciated the idea and I hope I got them thinking about how powerful their data (however trivial) could be.

This time around we crossed the 250 registrations mark at the camp. There was a considerable improvement this time in overall planning and the level of the sessions. Thankfully, technical sessions dominated the event as compared to advertising sessions in the last camp. We are definitely getting better with each event.

This event couldn’t have been successful without the help of the sponsors: ThoughtWorks for the T-shirts, Persistent for the venue and hospitality, and Codewalla, Bookeazy and AsianLaws.org for their support.

Coming back to LifeLogger, here are a few screenshots of what the application looks like as of today:



The screenshot above displays the main interface of the LifeLogger UI. Along with the option to search/browse your data, it also provides a graphical summary of the user’s activity (i.e. the number of searches, pages browsed, blogs read, etc.).



The screenshot above displays the search results from the user’s data. The results are shown in different colors, with each color representing one of the disparate sources of the data (bookmarks/blog/browse URL/search).


The screenshot above displays the date range browse feature. Various graphs help the user visualize his browsing/reading/search patterns along with the results.



The screenshot above shows the browse results for the given date range, in descending order of time.



One of the cool features of LifeLogger is that when a user browses his data for a date range equivalent to a single day, LifeLogger looks back at the web history data for the seven days prior and estimates which URLs visited during that day deviate from the user’s daily browsing pattern. These results are then displayed separately as ‘Interesting picks of the day’ along with the browse results. They summarize the key events of the day.
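To give a feel for the idea, here is a simplified sketch of such a lookback. It is not the actual LifeLogger code: I am assuming, purely for illustration, that a URL counts as out of pattern when its host was not visited at all in the previous seven days, and the `history` layout (a dict mapping dates to lists of URLs) and the function name `interesting_picks` are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlparse


def interesting_picks(history, day, lookback_days=7):
    """Return URLs visited on `day` whose host was unseen in the lookback window.

    `history` maps a datetime.date to the list of URLs visited on that date.
    """
    # Hosts visited during the seven days before `day` form the usual pattern.
    usual_hosts = set()
    for offset in range(1, lookback_days + 1):
        for url in history.get(day - timedelta(days=offset), []):
            usual_hosts.add(urlparse(url).netloc)

    picks = []
    for url in history.get(day, []):
        if urlparse(url).netloc not in usual_hosts and url not in picks:
            picks.append(url)  # keep first-visit order, skip repeats
    return picks


history = {
    date(2007, 7, 6): ["http://news.example.com/", "http://mail.example.com/inbox"],
    date(2007, 7, 7): ["http://news.example.com/story"],
    date(2007, 7, 8): ["http://news.example.com/", "http://rare.example.org/page"],
}
print(interesting_picks(history, date(2007, 7, 8)))
```

A real implementation would probably weigh visit frequency rather than using a plain seen/unseen test, but the seven day window and the per-day comparison are the parts described above.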

If you are still reading this post you can learn more about LifeLogger at its homepage. I would love to hear your feedback/suggestions.