That certainly was a tough question to answer, but not anymore. Whatsbuzzing, which released a few days back (reminds me of the sleepless nights :-)), helps you do just that. The brainchild of Anand Jagannathan, Whatsbuzzing is aimed at solving the online shopping woes of users. As described in Anand’s blog:
Whatsbuzzing is a destination site for online shopping. The site offers a one-stop service where consumers can browse across hundreds of storefronts, view the latest trends and find the hottest deals. In contrast to comparison shopping or product information sites, Whatsbuzzing provides visitors with the experience of a shopping mall. Visitors can browse storefronts by content, category or store name. A visitor can also tag storefronts so other consumers can find storefronts that are interesting. The storefronts are fully interactive and are constantly being updated with fresh content and timely offers.
As stated above, what makes it different from the plethora of shopping services is the unique content. Instead of just showcasing product details and prices, it also helps you keep track of the latest deals/discounts/offers – capturing the buzz in its true essence.
Another factor that sets it apart is its foray into being a browsing engine, as compared to the omnipotent search engines. Although search is an integral part of Whatsbuzzing, it is just another feature to help users find products quickly.
It is surely the panacea to all my shopping woes. With the Christmas season setting in, why don’t you give it a try and come back with some feedback?
Posted in Search, Social Networks, Web 2.0, shopping | No Comments »
Many a time we index fields in a document that serve only to classify or distinguish documents and do not contribute to their relevance. An analogy would be documents in a library. Here ‘category‘ could be the field that identifies the domain to which the document belongs. A typical text search query would therefore go as:
content:"neural network" AND (category:biology OR category:AI)
Everything seems to work fine. Well, not yet. In the above query we are trying to retrieve documents containing the words ‘neural network‘. But if you look closely (try getting an explanation of the score in Lucene), the category sub-query, although it seems to be used only for limiting the documents to particular domains, contributes to the relevance as well.
So you must be wondering, “How do I get documents from Biology or AI ranked by their relevance to ‘neural network’?” Here is how. You don’t need to hack around the Lucene source code. All you have to do is give a nullifying boost (that’s a cool oxymoron (-;) to the respective sub-query. By nullifying boost I mean a boost value so small (something like 0.00001) that it in effect nullifies the score of the sub-query. Therefore the revised query would look like:
content:"neural network" AND (category:biology OR category:AI)^0.00001
Thus, although a document must match the category sub-query in order to be part of the result set, the sub-query does not contribute to the score of the document. I like to term such queries non-relevant booleans: non-relevant as they do not contribute to relevance, and boolean as in the condition (AND or OR) by which they must match.
This lets us harness the querying capabilities of a database from within a search engine.
[UPDATE] A nullifying boost of zero would be the ideal case wherein you don’t want the subquery to contribute to the score at all. A non-zero value for the same would give you more control over the subquery’s contribution to the score.
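To see why a tiny boost changes the ranking, here is a minimal sketch in Python. It is a toy additive model, not Lucene's actual scoring formula, and the `Doc` class, the document names and all the scores are made up for illustration. It only assumes that the total score of a document is the content clause's score plus the boosted category clause's score.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    content_score: float   # hypothetical score of the content clause
    category_score: float  # hypothetical score of the category clause


def total_score(doc, category_boost=1.0):
    """Toy model: add the clause scores, scaling the category clause by its boost."""
    return doc.content_score + category_boost * doc.category_score


def rank(docs, category_boost=1.0):
    """Return document names ordered by descending total score."""
    ordered = sorted(docs, key=lambda d: total_score(d, category_boost), reverse=True)
    return [d.name for d in ordered]


# Both documents match the category clause; only their scores differ.
docs = [
    Doc("strong-match", content_score=0.9, category_score=0.2),
    Doc("weak-match", content_score=0.4, category_score=1.5),
]

print(rank(docs))                          # category clause influences the order
print(rank(docs, category_boost=0.00001))  # order follows content relevance
print(rank(docs, category_boost=0.0))      # the zero-boost case from the update
```

With the default boost the document with the weaker content match comes out on top, purely because its category clause scores higher. With the nullifying boost (or zero) the order follows the content scores alone, while both documents still have to match the category clause to be in the list at all.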
Posted in Lucene, Search | 2 Comments »
You’ll be quite surprised to find out how Lucene actually expands your range queries. As pointed out by Simon, a range query is expanded by enumerating every indexed value that falls within the given range. Now ain’t that naive >-:. Since each enumerated value becomes one clause of a boolean query, and a boolean query accepts at most 1024 clauses by default, this limits the range to about 1024 values. Simon also points out a possible solution for dates: indexing them as strings of the form ‘yyyymmdd’.
I tried doing the same on one of my recent projects where I was indexing dates as strings ‘yyyymmdd’. But when I actually had a look at my expanded query via Limo, I found Lucene enumerating for string range queries as well.
Apparently this is neither a bug nor even a feature, but a “known behaviour”.
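A rough sketch of what that enumeration amounts to, in Python rather than Lucene's own Java. The helper names (`daily_terms`, `expand_range`, `build_range_clauses`) are hypothetical and only mimic the behaviour described above: every indexed term inside the range becomes one clause, and the expansion fails once the clause count passes the limit. The 1024 figure is the limit mentioned in this post.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

MAX_CLAUSES = 1024  # the clause limit discussed above


def daily_terms(start, end):
    """Build the sorted 'yyyymmdd' terms an index would hold for one document per day."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def expand_range(sorted_terms, lower, upper):
    """Enumerate every indexed term in the inclusive range [lower, upper]."""
    lo = bisect_left(sorted_terms, lower)
    hi = bisect_right(sorted_terms, upper)
    return sorted_terms[lo:hi]


def build_range_clauses(sorted_terms, lower, upper, max_clauses=MAX_CLAUSES):
    """Return one clause per enumerated term, or fail if there are too many."""
    terms = expand_range(sorted_terms, lower, upper)
    if len(terms) > max_clauses:
        raise ValueError(f"too many clauses: {len(terms)} > {max_clauses}")
    return terms


terms = daily_terms(date(2005, 1, 1), date(2008, 12, 31))

# One year of daily dates stays under the limit.
print(len(build_range_clauses(terms, "20060101", "20061231")))

# Four years of daily dates do not.
try:
    build_range_clauses(terms, "20050101", "20081231")
except ValueError as err:
    print(err)
```

Note that the string form of the date does not help here: what counts is the number of distinct terms inside the range, so a ‘yyyymmdd’ index with a term for every day still blows past the limit after about three years.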
Posted in Lucene, Search | No Comments »
One thing I discovered recently (coz Google always seems to satisfy my search query in about the first 10 pages) was that Google doesn’t let you browse search hits beyond the first 1000, i.e. about 100 search result pages. Google probably thinks that if it ain’t in the first 1000, it ain’t worth searching for, eh!

Posted in Google, Search | No Comments »