Archive for the Google Category

Google App Engine has its advantages (easy deployment, scalability), but does that really matter if you can’t write number-crunching code that takes advantage of Google’s architecture? The PARTICLE project I’m working on hit a stumbling block yesterday because Google kills processes that run for more than 3 seconds (dammit Google! not everyone can write good code). The only solution for me was to run the process in the background at regular intervals. However, Google App Engine does not allow you to run processes in the background or to schedule jobs the way cron does. The only way to interact with an application on GAE is via HTTP.

After giving it much thought, I came up with the following way to run cron jobs on Google App Engine.

Step 1: Create a datastore entity that represents a task the cron job has to perform:

```python
from google.appengine.ext import db

# Class representing a task the cron job has to perform
class CronJobTask(db.Model):
    # Put any properties here
    pass
```

Step 2: Create a request handler class representing the cron job:

```python
from google.appengine.ext import webapp

class CronJob(webapp.RequestHandler):
    def get(self):
        # Do your cron processing here by fetching the cron tasks from the datastore
        # Do not forget to remove completed cron tasks from the datastore (if they are not periodic)
        # Output the following code as a response
        self.response.out.write(
            "<script>function reload(){ document.location = '/cron?time=' + new Date().getTime() } "
            "setTimeout('reload()', 1000);</script>")
```

Step 3: Map a URL, such as “/cron” below, to the cron job class (MainPage is your application’s existing front-page handler):

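```python
import wsgiref.handlers
from google.appengine.ext import webapp
def main():
    application = webapp.WSGIApplication([('/cron', CronJob), ('/', MainPage)], debug=True)
    wsgiref.handlers.CGIHandler().run(application)
if __name__ == '__main__': main()
```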

The following code snippet is the key idea:

```python
self.response.out.write("<script>function reload(){ document.location = '/cron?time=' + new Date().getTime() } setTimeout('reload()', 1000);</script>")
```

This causes the browser requesting the page to poll the cron URL every second; the time parameter makes each request URL unique, so the browser does not reuse a cached response. You can start the cron by pointing your browser to http://yourapp.appspot.com/cron. Opening multiple browser sessions or tabs on the same URL spawns multiple cron processes, which complete the cron tasks in parallel. You will always need at least one browser tab pointing to this URL.
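
Because every /cron request still has to finish within App Engine’s time limit, each poll should do only a bounded slice of work and leave the rest for the next reload. The sketch below shows that idea in plain Python so it can run outside App Engine. It assumes tasks are dicts with a `run` callable and a `periodic` flag (hypothetical stand-ins for CronJobTask entities) and a 2-second budget chosen to stay under the 3-second limit; in the real handler the list would come from the datastore and the returned HTML would be passed to self.response.out.write.

```python
import time

# The polling snippet from Step 2, returned after every slice of work
RELOAD_SCRIPT = ("<script>function reload(){ document.location = '/cron?time=' + "
                 "new Date().getTime() } setTimeout('reload()', 1000);</script>")


def run_cron_slice(tasks, budget_seconds=2.0, clock=time.monotonic):
    """Run queued tasks until the time budget is used up, then ask the browser to poll again.

    tasks: list of dicts with a 'run' callable and a 'periodic' flag (hypothetical
    stand-ins for CronJobTask entities). The list is updated in place: completed
    one-off tasks are dropped, periodic tasks are kept, unstarted tasks stay queued.
    """
    start = clock()
    done_periodic = []
    for i, task in enumerate(tasks):
        if clock() - start >= budget_seconds:
            # Out of time: keep the unstarted tasks at the front for the next poll
            tasks[:] = tasks[i:] + done_periodic
            return RELOAD_SCRIPT
        task['run']()
        if task['periodic']:
            done_periodic.append(task)
    tasks[:] = done_periodic
    return RELOAD_SCRIPT
```

The budget is checked before each task starts, so any single task still has to be short enough to finish on its own within the limit.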

There has been speculation about Google planning to introduce an ‘unavailable_after’ meta tag. It would probably look something like this:

```html
<META name='unavailable_after' content='Wed, 01 Aug 2007 00:00:01 GMT'>
```

By specifying this tag, webmasters could tell Google not to index a particular page after the specified date, or to treat the page as stale from then on. This would be appropriate for promotional pages whose promotions expire after a given time, and it would help rid search engine indexes of irrelevant data.
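
To make the idea concrete, here is a sketch of how a crawler might read the speculated tag and decide whether a page has gone stale. It assumes the tag format shown above (itself only speculation) with an RFC 2822 style date that includes a time zone such as GMT; `UnavailableAfterParser` and `is_stale` are hypothetical names built only on Python’s standard html.parser and email.utils modules.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class UnavailableAfterParser(HTMLParser):
    """Records the date from a <meta name='unavailable_after'> tag, if the page has one."""

    def __init__(self):
        super().__init__()
        self.expiry = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too
        attrs = dict(attrs)
        content = attrs.get('content')
        if tag == 'meta' and attrs.get('name') == 'unavailable_after' and content:
            # Assumed RFC 2822 style date, e.g. 'Wed, 01 Aug 2007 00:00:01 GMT'
            self.expiry = parsedate_to_datetime(content)


def is_stale(html, now=None):
    """True if the page declares an unavailable_after date that has already passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    parser = UnavailableAfterParser()
    parser.feed(html)
    parser.close()
    return parser.expiry is not None and now >= parser.expiry
```

Under this sketch, dropping expired pages from the index amounts to skipping any page for which is_stale returns True.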

This valuable piece of information, provided by the ‘unavailable_after’ tag, could be used not only to clean up Google’s index but also in Google’s ranking algorithms. There are two perspectives on how a search engine could use this data for ranking:

  • When a page specifies its expiry/unavailability date, it implicitly tells the search engine the period for which it is most relevant. Hence, as a page’s unavailability date approaches, it should become less relevant to a user’s query. For example, the query ‘fedora release notes’ currently returns the FC7 release notes as the top three results, while the remaining results are a random mix of release notes for FC3, FC2, FC4 and FC5 (with the FC3 and FC2 pages ranked higher than FC5 and FC4 respectively). Let’s say FC8 is scheduled for release this November. Assume that the FC7 Release Notes page has the unavailable_after tag set for sometime around December. As December approaches, the FC7 pages would gradually lose relevance for such queries and slide lower in the search rankings, making FC8 the most relevant result. This would fix the current inaccurate ordering of results for the query ‘fedora release notes’ on Google. It could be achieved in a manner similar to that proposed in the paper ‘Time Damping Of Textual Relevance’.
  • This perspective (the inverse of the one above) would be specific to promotional and shopping pages. Most shopping promotions and offers are valid for a limited period, so such pages should become more relevant to a user’s query as they approach their expiry date. For example, consider the query ‘20% discount shoes’. Let’s assume the results include pages from both Zappos and Shoebuy offering a 20% discount on shoes. The Shoebuy sale will last about two more weeks (as specified by its unavailable_after tag), whereas the Zappos sale ends in two days. Since both stores offer the same percentage discount, it would be more appropriate to rank the Zappos page higher because its offer ends soon. From a shopper’s point of view, offers ending soon are the more interesting ones, since the longer-lasting offers can be checked out later.
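
Both perspectives boil down to adjusting a page’s score by the time left before its unavailable_after date. A minimal sketch follows, assuming a simple linear adjustment inside a fixed window; `damped_score`, `urgency_score`, the `window` parameter and the 2x ceiling on the boost are hypothetical choices for illustration, not the method of the Time Damping paper or anything Google has described.

```python
from datetime import datetime, timedelta


def damped_score(base, now, expiry, window):
    """Perspective 1: fade a page's relevance as its unavailable_after date approaches.

    Hypothetical linear damping: the full score until `window` before expiry,
    falling linearly to 0 at the expiry date.
    """
    remaining = expiry - now
    if remaining <= timedelta(0):
        return 0.0
    if remaining >= window:
        return base
    return base * (remaining / window)


def urgency_score(base, now, expiry, window):
    """Perspective 2: boost a still-valid offer as its end date approaches.

    Hypothetical linear boost: the full score until `window` before expiry,
    rising toward 2x just before the offer ends, and 0 once it has ended.
    """
    remaining = expiry - now
    if remaining <= timedelta(0):
        return 0.0
    if remaining >= window:
        return base
    return base * (2 - remaining / window)


if __name__ == '__main__':
    today = datetime(2007, 7, 1)  # hypothetical "today"
    two_weeks = timedelta(days=14)
    offers = {'Zappos': today + timedelta(days=2), 'Shoebuy': today + two_weeks}
    ranked = sorted(offers, key=lambda store: urgency_score(1.0, today, offers[store], two_weeks),
                    reverse=True)
    print(ranked)  # ['Zappos', 'Shoebuy']
```

Running it ranks the Zappos offer (two days left) above the Shoebuy offer (two weeks left), matching the shoe example. The linear ramps are just the simplest choice; an exponential decay could be swapped in without changing the rest of the scheme.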

The perspectives above represent only a few of the conceivable uses of the unavailable_after tag. There could be numerous other ways to utilize this data to improve search rankings.

I would love to hear your take on the unavailable_after tag, particularly if you can offer another perspective on utilizing this data.

Google Developer Day

April 14th, 2007

I was hoping to attend Google Developer Day 2007 at Google’s Bangalore office this May. But it seems that this time we (Indians) will have to settle for the webcast. When asked why India wasn’t chosen to host this event, this is what Google had to say:

We do not have the proper event planning resources in the Bangalore office to support an event such as this. However, we are webcasting the US general session and one of the main conference tracks. We’re also video recording the event and will be posting it online after the event.

Darn! There goes my chance of tasting the sumptuous Google food.

Edward de Bono, author of the book ‘Six Thinking Hats’, devised the Six Thinking Hats method. As he explains in the book:

It is not possible to be presensitized in different directions at the same time just as it would not be possible to design a golf club that was the best club for driving and at the same time the best club for putting. That is why the Six Hats method is essential. It allows the brain to maximize its sensitivity in different directions at different times….The essence of parallel thinking is that at any moment everyone is looking in the same direction — but the direction can be changed. An explorer might be asked to look north or to look east. Those are standard direction labels. So we need some direction labels for thinking. What are the different directions in which thinkers can be invited to look?

This is where the hats come in. Each hat symbolizes a thinking direction. De Bono’s six hats are:

  • White Hat: neutral and objective, concerned with facts and figures.
  • Red Hat: relates to intuition, emotions and opinions.
  • Black Hat: gloomy; covers the negatives, such as why things can’t be done.
  • Yellow Hat: symbolizes brightness and optimism, indicating hope and positive thinking.
  • Green Hat: focuses on creativity, that is, possibilities, alternatives and new ideas.
  • Blue Hat: concerned with the control and organization of the thinking process.

That’s all about the Six Hats. But where does Google fit in?

Let’s take a look at the Google logo: a blue G, a red o, a yellow o, a blue g, a green l and a red e.

If we apply the Six Hats approach to interpret the logo’s colors, we get:

  • 2 blues: signifying the unified and organized thinking process at Google.
  • 2 reds: signifying that they value the intuition, emotions and opinions of their users, building innovative products not just with their brains but with their hearts as well. This is evident from the simple and intuitive UI designs of Google Search and Gmail.
  • 1 yellow: symbolizing their optimistic outlook, the hope that every new product brings them one step closer to their ultimate aim of organizing the world’s information.
  • 1 green: new ideas, alternatives, innovation… Well, that’s Google :)

Conclusion

The above interpretation of the logo actually seems to be in sync with Google’s current image and the prevalent mindset of the average Googler. Did I just unearth Google’s success formula?

Hope so :)