Reading Less Is Reading More
Posted by Anand Kishore in Natural Language Processing, Project on October 7th, 2009
If information is what drives you to the internet, as it does me, you might be spending roughly 60-70% of your time online reading blogs, news and feeds (not to forget twitter). For me at least, reading online has superseded email (and updating social networks) as the most time-consuming activity. And yet everyone is busy generating more content rather than finding a way to consume all this information. This is precisely the problem we are trying to tackle with Dygest. At its core Dygest is a summarization engine that tries to sift through all the noise and present only the *real* content/news contained in any (news) article/text. Recently, we released an experimental version of a feed summarizer that uses the Dygest engine to summarize blog posts/news for any RSS/ATOM feed. You can subscribe to this summarized feed in any feed reader such as Bloglines, Google Reader etc.
NOTE: A feed that our system has never encountered before should be summarized within a couple of minutes.
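To make the idea of a summarized feed concrete, here is a minimal sketch of rewriting an RSS 2.0 document so that every item carries a shortened description. It is not the Dygest engine: `first_sentences` is a hypothetical stand-in summarizer that simply keeps the opening sentences, and `summarize_rss` is a name made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET


def first_sentences(text, n=2):
    """Stand-in summarizer (an assumption, not Dygest): keep the first n sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


def summarize_rss(rss_xml, summarizer=first_sentences):
    """Return the RSS 2.0 document with every item description summarized."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        description = item.find("description")
        # Skip items that have no description text to summarize.
        if description is not None and description.text:
            description.text = summarizer(description.text)
    return ET.tostring(root, encoding="unicode")
```

Any function that maps text to shorter text can be passed as `summarizer`, so a real summarization engine could be swapped in without touching the feed handling.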

On the whole, with Dygest reading blogs becomes much faster and more concise, and consuming information becomes a great deal easier. Imagine the time saved by reading the summarized version instead of the original post (and you are not overwhelmed with useless information). See for yourself below:
Original Post
Summarized Post
While you might have the urge to head over to Dygest and summarize your entire subscription list on Google Reader, I would recommend reading this post a bit further for some real cool stuff we have in store. If you must though – click here to Dygest.
Summarizing Your Twitter Links
Readtwit is a really cool service launched recently, which extracts links from your twitter feed and packages them in a clean RSS format. The awesome combination of Readtwit along with Dygest yields a summarized twitter feed delivered to your favorite feed reader.
Steps to get a summarized twitter feed:
(1) Sign into Readtwit.
(2) Copy the link on the ‘Get me the feed’ button:

(3) Paste this link into the Dygest interface and subscribe to the summarized feed returned in your favorite feed reader.

More To Come
This is just an experimental release of Dygest, so do send in your feedback on the summaries and help us improve. Over the coming months we will be improving the algorithms and churning out other great applications of Dygest (there is something really cool in the works). So while we are busy teaching computers to read, Dygest your feeds, because reading less is reading more.
Follow us on twitter – @dygest
Dygest Your Search
Posted by Anand Kishore in Hacking, Natural Language Processing, Project, Search, Web, Yahoo! on March 19th, 2009
Update: This hack won the coveted ‘Search’ category award.
For the last couple of days, @sudheer_624 and I have been busy working on this hack for a Yahoo! Hackday. Although still a prototype, the hack has turned out to be interesting, so we thought of putting it out for others to play around with.
Dygest (pronounced as ‘digest’, thanks to @bluesmoon) is aimed at changing the conventional way of displaying search context, moving from a snippet to a more informative, machine-generated document summary. There are two kinds of relevance to consider when evaluating search results:
- Vertical relevance: determined by the ranking algorithms.
- Horizontal relevance: the contextual information made available to the user about the result – Searchmonkey is a good initiative on this front.
The current way of displaying this context is via a snippet of text under every result. This snippet shows the neighborhood of the occurrence of the query terms. Usually this information is not rich enough for a searcher to make the right judgement about the result. The searcher then has to switch back and forth between the documents and the search results whenever a page turns out not to be relevant. This can be frustrating at times.
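For reference, a conventional snippet of this kind can be sketched in a few lines. This is only an illustration of the "neighborhood of the query terms" idea, not how any particular search engine builds its snippets; the function name `query_snippet` and the `window` parameter are assumptions made for the example.

```python
import re


def query_snippet(text, query, window=8):
    """Return the words surrounding the first occurrence of any query term."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    for i, word in enumerate(words):
        # Strip punctuation so "Bartz," still matches the term "bartz".
        if re.sub(r"\W+", "", word).lower() in terms:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    # No query term found: fall back to the opening words of the page.
    return " ".join(words[: 2 * window + 1])
```

The output depends only on where the query term happens to appear, which is why such a snippet often says little about what the page as a whole is about.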
Dygest aims to solve this by either replacing or enhancing the current search snippet with a summary of the result page. At its core lies a summarization engine which figures out what the *real* content of the page is (distinguishing it from the other junk like surrounding text, navigational text, comments etc) and then performs text summarization on this content. The summary of the page is then displayed to the user via the appropriate interface. How cool is that?
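The two stages described above (isolate the real content, then summarize it) can be sketched as follows. This is not Dygest's algorithm, which the post does not describe; it swaps in two simple, well-known stand-ins: a block-length heuristic for content extraction and word-frequency sentence scoring for extractive summarization. All names, the stopword list, and the thresholds are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "for",
             "on", "that", "this", "with", "as", "was", "are"}


def extract_content(blocks, min_words=8):
    """Crude content filter: keep text blocks long enough to be article prose."""
    return [b for b in blocks if len(b.split()) >= min_words]


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Pick the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

A page would first be split into text blocks, passed through `extract_content` to drop short navigational fragments, and the surviving text handed to `summarize`.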
The user no longer needs to click on irrelevant links and can grasp the theme and important facts of a page from right within the results page. The other advantage is that it gives the user a good overview of the query topic: rather than spending time reading many long documents, the user can read a few summaries from the top results. This is particularly well suited to mobile devices, where it's frustrating to switch back and forth between pages and the search results. It is also a good fit for news articles, where we just need the important facts about the story.
Well, here is an example to convince you. A search for ‘Carol Bartz’ yields the following result, which at first glance is not at all informative.
Enhancing the existing view with an abstract of the page helps gauge the content and theme of the document. This would now look like:
Dygest outputs the following summaries for the query ‘Iran’ restricted to Yahoo! News:

And the following for ‘Obama stimulus plan’:

Currently, Dygest has two interfaces: (1) a search interface powered by Yahoo! BOSS and (2) a SearchMonkey plugin. It's just a prototype, so be kind and don’t be too judgmental.
Start dygesting here.
