Archive for the 'Tagging' Category

Particle - on the way to a Findory

Friday, July 11th, 2008


Although I started this project as an experimental weekend thingy (to play around with Google App Engine), the project has shaped up well. Before you surf over to another blog, wondering what the hell I’m talking about, let me introduce you to “Personalized ARTICLE” aggregator (read as PARTICLE). The aim is to personalize a users online reading (just like what Findory did). Findory was an excellent service and I’ll be glad if I can achieve even an iota of what Greg created. This project is at very rudimetary and experimental stage. Rather than tapping into the users reading history on the site (monitored by the links clicked), the idea is to study how a users *interests*, scattered around at various “databases of interest” like del.icio.us, could be used to personalize online reading (news articles, blogs and more). This way the user could merrily browse the world wide web, bookmarking pages, doing his usual stuff and let PARTICLE worry about making this data useful.

Click here to try PARTICLE

Presently you need to provide PARTICLE with your del.icio.us username, which it uses to analyze your *interests* and present you with recent news stories you may like. It works well if you have a decent number of bookmarks in del.icio.us. As I mentioned, the project is at a very rudimentary stage, so don’t feel disappointed by the results (ah! the unlucky few). I encourage you to play around with the app and recommend it to others to try. I’ll be making many changes/additions in the coming weeks.

Test drive PARTICLE at http://particle.semanticvoid.com. Kindly leave your feedback/comments/suggestions in the comments or send me an email at ‘anand at semanticvoid.com’.

[UPDATE] Yahoo! Research has a similar project called Garçon.

Tag Search: Inferring Relevance From User Authority

Sunday, March 4th, 2007

Search has always been an integral part of any tagging system. Such systems need to make sense out of the abundant user generated metadata such that the documents/items can be ranked in some order. However, very little has been said or written openly about such ranking algorithms for tagging systems.

Conventional Methods

Most systems, that allow tag search, base their rankings on factors like simply the ‘number of unique users’ or on ratios like ‘number of unique users for tag t / number of unique users for all tags’ etc. These conventional algorithms do work, but not quite so well for large datasets where they can be exploited. They also often do not represent the true relevance. Reminds me often of the pre-PageRank era of information retrieval systems.

So, which relevance algorithm do I use?

Well, you can always use the conventional methods, but then you can always try the algorithm I devised. This algorithm seems to capture the true essence of relevance in tagging systems. I call it the WisdomRank as it is truly based on the ‘wisdom’ of the crowds, the fundamental part of any social system. Read along to understand it in detail (or download the pdf).



Inferring relevance for tag search

from user authority – Abstract

Tagging is an act of imparting human knowledge/wisdom to objects. Thus a tag, a one word interpretation/categorization of the object by the user, fundamentally represents the basic unit of human wisdom for any object. This wisdom is difficult to quantify as it is relative for every user. One approach to quantify this would be to use the wisdom of the other users to define this for us. This can be done by assuming that every tag corresponds to a topic for which every user has some authority. Also, every tag added to an object corresponds to a vote, similar to the Digg model, asserting that the object belongs to that topic (tag).

Let us consider a user Ui who has tagged object Oj with the tag Tk. Whenever other users in the system tag Oj with Tk, they are implicitly affirming Ui’s wisdom for tag Tk.

Thus, we define the function affirmation for the tuple(u, d, t) as the number of other users who have also tagged document d with tag t:

affirmation(u, d, t) = ∑i=All users except ‘u’ tagged(ui, d, t)

where,

u – the user
d – the document/object
t – the tag
tagged – 1 if the user Ui has tagged d with t
- 0 otherwise

Hence, we can proceed to define the wisdom of the user for a topic (tag) t as the sum of all such assertions by other users,

wisdom(u, t) = ∑x=For all documents d tagged with tag t by U affirmation(u, d, t)

Likewise, we can now define the authority of a user for the topic t, as the ratio of the user’s wisdom to the collective wisdom for t. Hence,

authority(u, t) = wisdom(u, t) / ∑ wisdom(ui, t)

For example: Let us determine the authority of user u1 for tag t1

Object d1: Object d2: Object d3:
t1 by u1
t1 by u1 t1 by u2
t1 by u2
t3 by u1 t1 by u3
t1 by u3
t3 by u1
t2 by u1

affirmation(u1, d1, t1) = 2 affirmation(u1, d2, t1) = 0
Hence, wisdom(u1, t1) = 2

Likewise for other users,

wisdom(u2, t1) = 3
wisdom(u3, t1) = 3

Hence the authority of user u1 for t1 is as follows:

authority(u1, t1) = 2 / (2 + 3 + 3) = 2 / 8 = 0.25

Whenever a user tags an object with a tag, he does so with the authority he possesses for that tag. Thus as compared to conventional methods, where the objects are usually ranked on the number of instances of the tags, in this method the measure of the relevance of a tag for an object is equivalent to the sum of all such user authorities. Thus,

relevance_metric(d, t) = ∑i= all user who have tagged document d with t authority(u, t)

This relevance score, when calculated for every tag would provide an accurate measure for ranking the objects. As compared to the conventional methods where more number of instances of a tag for an object ensured a higher relevance for that tag, here the number of authoritative users counts.

Let us consider the following example:

Object d1: Object d2:
t1 by u1
t1 by u2
t2 by u5
t1 by u3
t1 by u4

Let us assume that u1 has a very high authority for tag t1. Hence in the above scenario, a search for tag t1 may rank d1 higher than d2, if

authority(u1, t1) > authority(u2, t1) + authority(u3, t1) + authority(u4, t1)

This result is with the assumption that u1’s authority is greater than those of u2,u3 and u4 combined.

On the other hand, d2 would be ranked higher than d1 if the combined authorities of u2, u3 and u4 exceed that of u1. If the majority of the users are suggesting something, it indicates that their suggestion is far more valuable than that of an individual user or a subset of users.

Future Enhancements

While calculating the user assertions this algorithm currently considers all such users as equal even though they may have varying authorities for the corresponding tag. As a future enhancement, I plan to incorporate the authorities of the users as well into the affirmation calculations.