Archive for the Tagging Category

The Evolution Of Tagging

| May 18th, 2006

The Present And The Future

Tagging has been around for quite some time now, although it seems to have picked up momentum with the Web 2.0 meme. But the question one needs to answer is: “Has tagging evolved?” It has yet to evolve out of its stone age.

Tagging is basically about organizing information for retrieval. Yet current systems don’t seem to apply any information retrieval (IR) optimizations to it. It could prove more useful and relevant if it were treated as just another form of search rather than a whole new concept (the very reason non-techie users are not lured into tagging). It would also be more accurate if IR preprocessing such as stemming and synonym matching were applied to it. The knowledge acquired in the process is very valuable because of the human intelligence behind it, and it can be exploited in many useful ways.
But tagging has evolved to some extent. It has evolved from single word tags to multi-word tags. It has evolved in terms of granularity from the Document ( to the Content (recoja, Google Notebook).
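
To make the IR preprocessing idea concrete, here is a minimal sketch of normalizing tags before they are stored or matched. Everything in it is an assumption for illustration: the synonym table is hypothetical, and the crude plural-stripping rule only stands in for a real stemmer.

```python
# Hypothetical synonym table; a real system would curate or learn this.
SYNONYMS = {"pic": "photo", "picture": "photo", "image": "photo"}


def naive_stem(word: str) -> str:
    """Strip a trailing plural 's'. A crude stand-in for a real stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize_tag(tag: str) -> str:
    """Lowercase the tag, stem it, then map synonyms to one canonical tag."""
    word = naive_stem(tag.strip().lower())
    return SYNONYMS.get(word, word)


for raw in ["Pictures", "pics", "photos", "glass"]:
    print(raw, "->", normalize_tag(raw))
```

With something like this in place, differently spelled tags from different users could collapse into one tag, which is the accuracy gain hinted at above.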

What we need to focus on is what more can be done with tagging, rather than just replicating what can already be done with it. What do you think the next evolutionary steps in tagging could be?

Something which was bound to happen. After all, even Google has to rely on human intelligence to do some of its work. Google Coop helps people contribute their expertise (in other words, bookmark links) by adding labels (categories) and annotations (descriptions) to them. But it goes well beyond the norm by using this information to improve search.

In all the forums I visited, I came across this one point again and again: “Google Coop is susceptible to spammers”. I don’t agree. Like any other social app that is fuelled by people, at first glance it does seem susceptible. But it is the social factor that seems to ward off spammers. Quoting a FAQ from Google Coop:

Who will see my labels?

Users who subscribe to you will see your labels for relevant searches. As your labels become higher quality and more comprehensive, and as more users subscribe to you, your labels may start surfacing to more Google users than just those who explicitly subscribed. A number of factors help determine how broadly your labels appear — such as the number of subscribers you have, how many websites you’ve labeled, and, most importantly, how often your labels make it easier for users to find what they’re looking for.

Google seems to have realised that it can achieve a lot more by utilizing human intelligence, the intelligence at the core of Web 2.0. After all, the ubiquitous search is also based on human knowledge (the creation of links between pages).

Ever wondered how the collection of tags with varying font sizes (known as a Tag Cloud) is populated? As I say, ‘there’s an algorithm for everything’, and there’s an algorithm for this too. Assuming you know all about tag popularity (if not, refer to the previous post), I’ll go ahead and explain it.

The distinctive feature of a tag cloud is its different groups of font sizes. The number of such groups is entirely up to the developer. Usually, having six such size groups works well.

Assume any suitable metric for measuring popularity (for instance, ‘number of users using the tag’). We can always obtain the maximum and minimum values of that metric. For example:

max(Popularity) = 130
min(Popularity) = 35

Therefore we can define the width of one font-size block as:
( max(Popularity) - min(Popularity) ) / 6

For the above values, one such block is (130 - 35) / 6 = 15.83 ~ 16
The font sizes could therefore be bound as follows (each range runs from its lower bound to that bound plus 16, and the next range starts one above):

Range         Font-Size
35 to 51      1
52 to 68      2
69 to 85      3
86 to 102     4
103 to 119    5
120 to 136    6
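
The bucketing in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, popularity values are taken to be integers, and the block width is rounded up (15.83 becomes 16) so that the last range always covers the maximum.

```python
import math


def font_size_ranges(min_pop: int, max_pop: int, groups: int = 6):
    """Return (low, high, font_size) tuples with inclusive integer bounds."""
    # Assumption: round the block width up, e.g. 95 / 6 = 15.83 -> 16.
    block = math.ceil((max_pop - min_pop) / groups)
    ranges = []
    low = min_pop
    for size in range(1, groups + 1):
        high = low + block
        ranges.append((low, high, size))
        low = high + 1  # the next range starts just past this one
    return ranges


def font_size(popularity: int, min_pop: int, max_pop: int, groups: int = 6) -> int:
    """Look up the font-size group that a popularity value falls into."""
    for low, high, size in font_size_ranges(min_pop, max_pop, groups):
        if low <= popularity <= high:
            return size
    raise ValueError("popularity is outside the covered range")


for low, high, size in font_size_ranges(35, 130):
    print(f"{low} to {high}: {size}")
```

The returned group number can then be mapped to whatever actual font sizes or CSS classes the page uses.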

That’s as easy as it can get.

Calculation Of Tag Popularity

| January 2nd, 2006

Determining the popularity of tags has very fluid solutions, which change from application to application. In general, one metric that can be used is the number of unique items tagged with the particular tag. A second metric is the number of unique users using the tag. I’ve come up with a formula that combines both:

( Usage Count / Number of tagged Items ) * ( User Count / Number of Taggers )

Usage Count (UsgCnt) : the number of unique items having the tag.
Number of tagged Items (NTI) : the total number of items having at least one tag (i.e. items participating in tagging)
User Count (UsrCnt) : the number of users using this tag.
Number of Taggers (NOT) : the total number of users participating in tagging.

Case 1:
UsgCnt = 15, NTI = 40, UsrCnt = 2, NOT = 20
Popularity = 0.0375
Note: This represents a case in which the two users may be trying to spam the system by tagging many items with the specific tag.

Case 2:
UsgCnt = 15, NTI = 40, UsrCnt = 9, NOT = 20
Popularity = 0.16875
Note: Here we clearly see that as the number of users using this tag increases, the popularity increases as well (suggesting folksonomy rather than spam).

Case 3:
UsgCnt = 15, NTI = 40, UsrCnt = 1, NOT = 1
Popularity = 0.375
Note: If there is only one user in the system, the popularity becomes independent of the user ratio and depends entirely on the tagged-items ratio.

Case 4:
UsgCnt = 40, NTI = 40, UsrCnt = 10, NOT = 20
Popularity = 0.5
Note: In this case, if all the items in the system are tagged with the specific tag (UsgCnt = NTI), the popularity depends entirely on the fraction of users using this tag.
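
For anyone who wants to check the four cases, here is a minimal sketch of the formula in Python. The function and parameter names are my own for illustration, and the guard against zero denominators is an added assumption.

```python
def tag_popularity(usage_count: int, tagged_items: int,
                   user_count: int, taggers: int) -> float:
    """(UsgCnt / NTI) * (UsrCnt / NOT), as defined in the text."""
    # Assumption: an empty system has no meaningful popularity, so reject it.
    if tagged_items <= 0 or taggers <= 0:
        raise ValueError("tagged_items and taggers must be positive")
    return (usage_count / tagged_items) * (user_count / taggers)


# The four cases from the text, as (UsgCnt, NTI, UsrCnt, NOT).
cases = [(15, 40, 2, 20), (15, 40, 9, 20), (15, 40, 1, 1), (40, 40, 10, 20)]
for number, args in enumerate(cases, start=1):
    print(f"Case {number}: {tag_popularity(*args):.5f}")
```

Because both factors are ratios between 0 and 1, the result always lies between 0 and 1, which makes it easy to compare tags within one system.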

This gives a fairly rough idea of how tag popularity can be calculated.