Google App Engine has its advantages (easy deployment, scalability), but do they really matter if you can’t write number-crunching code that uses the Google architecture? The PARTICLE project I’m working on hit a stumbling block yesterday because Google kills processes that run over 3 seconds (dammit Google! not everyone can write good code). The only solution for me was to run the process in the background at regular intervals. But Google App Engine does not allow you to run processes in the background or schedule jobs as you would with cron. The only way you can interact with an application in GAE is via HTTP.

After giving it much thought, I figured out the following solution for running cron jobs in Google App Engine.

Step 1: Create a datastore entity that represents a task the cron job has to perform:

from google.appengine.ext import db

# Datastore entity representing one task for the cron job
class CronJobTask(db.Model):
    # Put any properties here
    pass

Step 2: Create a request handler class representing the cron job:

from google.appengine.ext import webapp

class CronJob(webapp.RequestHandler):
    def get(self):
        # Do your cron processing here by fetching the cron tasks from the datastore
        # Do not forget to remove completed cron tasks from the datastore (if they are not periodic)
        # Output the following code as a response
        self.response.out.write("<script>function reload(){ document.location = '/cron?time=' + new Date().getTime() } setTimeout('reload()', 1000); </script>")
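Since each request gets killed if it runs too long, the "cron processing" part has to stop on its own before the limit and leave the rest for the next reload. Here is a rough sketch of that idea in plain Python, with no App Engine imports. The names `run_pending_tasks`, `handle` and `budget_seconds` are made up for illustration, and the plain list stands in for whatever you fetch from the datastore.

```python
import time

def run_pending_tasks(tasks, handle, budget_seconds=2.0, clock=time.time):
    """Process tasks until the list is empty or the time budget runs out.

    `tasks` is a plain list standing in for the datastore query result.
    `handle` is a hypothetical callback that performs one task.
    Completed tasks are removed from `tasks` and returned.
    """
    deadline = clock() + budget_seconds
    completed = []
    while tasks and clock() < deadline:
        task = tasks[0]
        handle(task)      # if this raises, the task stays in the list
        tasks.pop(0)      # remove only after the task has finished
        completed.append(task)
    return completed
```

Whatever is still in `tasks` when the function returns gets picked up on the next poll. The budget is only checked between tasks, so a single task that takes longer than the limit will still be killed; such a task would have to be split into smaller ones.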

Step 3: Map a URL such as “/cron” to the cron job class:

import wsgiref.handlers

# MainPage is your application's existing handler for '/'
def main():
    application = webapp.WSGIApplication([('/cron', CronJob), ('/', MainPage)], debug=True)
    wsgiref.handlers.CGIHandler().run(application)

The following code snippet is the key idea:

self.response.out.write("<script>function reload(){ document.location = '/cron?time=' + new Date().getTime() } setTimeout('reload()', 1000); </script>")
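If you want to change the URL or the delay without editing the string by hand, the snippet can be generated by a small helper. This is only a sketch: `reload_snippet`, `path` and `delay_ms` are names I made up, and it passes the function itself to `setTimeout` instead of the string `'reload()'`, which behaves the same here. The `time` parameter presumably just makes every URL unique so the browser does not serve the page from its cache.

```python
import json

def reload_snippet(path="/cron", delay_ms=1000):
    """Return the HTML/JavaScript that makes the browser re-request `path`."""
    # json.dumps yields a quoted string that is also a valid JavaScript literal
    target = json.dumps(path + "?time=")
    return (
        "<script>"
        "function reload(){ document.location = %s + new Date().getTime(); } "
        "setTimeout(reload, %d);"
        "</script>" % (target, int(delay_ms))
    )
```

In the handler you would then write `self.response.out.write(reload_snippet())`. The helper assumes `path` is a constant you control, not user input.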

This makes the browser that requested the page request the cron URL again one second later, and so on for as long as the page stays open. Start the cron by pointing your browser to http://yourapp.appspot.com/cron. Opening multiple browser sessions/tabs to the same URL spawns multiple cron processes, which complete the cron tasks in parallel. You will need to keep at least one browser tab pointing to this URL at all times.
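The browser tab is only there to keep sending requests, so in principle any HTTP client could do the same job. Below is a minimal sketch of a script that polls the cron URL from your own machine (not from inside App Engine). It is a variation on the idea, not what I actually run: `poll_cron` and its parameters are invented for this example, and `fetch` and `sleep` can be swapped out so the loop can be tried without touching the network.

```python
import time
import urllib.request

def poll_cron(url, rounds, interval_seconds=1.0, fetch=None, sleep=time.sleep):
    """Request `url` `rounds` times, pausing between requests.

    Returns the list of URLs that were requested.
    """
    if fetch is None:
        # Default: a real HTTP GET using the standard library
        fetch = lambda u: urllib.request.urlopen(u).read()
    requested = []
    for i in range(rounds):
        # Same trick as the JavaScript: a timestamp makes each URL unique
        stamped = "%s?time=%d" % (url, int(time.time() * 1000))
        fetch(stamped)
        requested.append(stamped)
        if i < rounds - 1:
            sleep(interval_seconds)
    return requested
```

For example, `poll_cron("http://yourapp.appspot.com/cron", rounds=60)` would hit the cron URL about once a second for a minute. Running several copies at once has the same effect as opening several tabs.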