Archive for the Social Networks Category

A great set of slides from Bryce Glass’s talk on Reputation System Designs at the IA Summit. Bryce Glass is a senior interaction designer on the Y! OS social team.

Search has always been an integral part of any tagging system. Such systems need to make sense of the abundant user-generated metadata so that the documents/items can be ranked in some order. However, very little has been said or written openly about ranking algorithms for tagging systems.

Conventional Methods

Most systems that allow tag search base their rankings on simple factors such as the ‘number of unique users’ or on ratios such as ‘number of unique users for tag t / number of unique users for all tags’. These conventional algorithms do work, but not as well on large datasets, where they can be exploited. They also often fail to reflect true relevance. They remind me of the pre-PageRank era of information retrieval systems.
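As a rough illustration, the two conventional scores can be sketched in Python. The sketch assumes tagging events are stored as (user, document, tag) tuples, and it reads the ratio as the unique users who applied tag t to a document divided by the unique users who applied any tag to that document. Both the storage format and that reading are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed storage format: one (user, document, tag) tuple per tagging event.
Tagging = Tuple[str, str, str]


def unique_user_count(taggings: Iterable[Tagging], doc: str, tag: str) -> int:
    """Number of distinct users who tagged `doc` with `tag`."""
    return len({u for (u, d, t) in taggings if d == doc and t == tag})


def unique_user_ratio(taggings: Iterable[Tagging], doc: str, tag: str) -> float:
    """Users who applied `tag` to `doc` / users who applied any tag to `doc`.

    This is one possible reading of the ratio described in the text.
    """
    events: List[Tagging] = list(taggings)
    all_users = {u for (u, d, _) in events if d == doc}
    if not all_users:
        return 0.0
    return unique_user_count(events, doc, tag) / len(all_users)


events = [("u1", "d1", "t1"), ("u2", "d1", "t1"), ("u2", "d1", "t1"), ("u3", "d1", "t2")]
print(unique_user_count(events, "d1", "t1"))  # 2 (u2's duplicate counts once)
print(unique_user_ratio(events, "d1", "t1"))
```

Because every account counts as one equal vote, anyone controlling many accounts can inflate both scores, which is one way such schemes get exploited.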

So, which relevance algorithm do I use?

You can always use the conventional methods, or you can try the algorithm I devised. This algorithm seems to capture the essence of relevance in tagging systems. I call it WisdomRank, as it is truly based on the ‘wisdom’ of the crowds, the fundamental part of any social system. Read on for the details (or download the PDF).


Inferring relevance for tag search from user authority – Abstract

Tagging is an act of imparting human knowledge/wisdom to objects. Thus a tag, a one-word interpretation/categorization of the object by the user, fundamentally represents the basic unit of human wisdom for any object. This wisdom is difficult to quantify, as it is relative for every user. One way to quantify it is to let the wisdom of the other users define it for us. This can be done by assuming that every tag corresponds to a topic for which every user has some authority. Also, every tag added to an object corresponds to a vote, similar to the Digg model, asserting that the object belongs to that topic (tag).

Let us consider a user $u_i$ who has tagged object $d_j$ with the tag $t_k$. Whenever other users in the system tag $d_j$ with $t_k$, they are implicitly affirming $u_i$’s wisdom for tag $t_k$.

Thus, we define the function affirmation for the tuple (u, d, t) as the number of other users who have also tagged document d with tag t:

$$\mathrm{affirmation}(u, d, t) = \sum_{u_i \neq u} \mathrm{tagged}(u_i, d, t)$$

where,

u – the user
d – the document/object
t – the tag
tagged(u_i, d, t) – 1 if the user u_i has tagged d with t, 0 otherwise
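A minimal Python sketch of `tagged` and `affirmation` as defined above, assuming the taggings are held as a set of (user, document, tag) triples, so a user can apply a given tag to a given document at most once. The storage format and the `taggings` parameter are assumptions of this sketch.

```python
from typing import Set, Tuple

# Assumed storage: a set of (user, document, tag) triples.
Taggings = Set[Tuple[str, str, str]]


def tagged(taggings: Taggings, user: str, doc: str, tag: str) -> int:
    """1 if `user` has tagged `doc` with `tag`, 0 otherwise."""
    return 1 if (user, doc, tag) in taggings else 0


def affirmation(taggings: Taggings, user: str, doc: str, tag: str) -> int:
    """Sum of tagged(u_i, doc, tag) over every user u_i other than `user`."""
    other_users = {u for (u, _, _) in taggings if u != user}
    return sum(tagged(taggings, u_i, doc, tag) for u_i in other_users)


taggings = {("u1", "d1", "t1"), ("u2", "d1", "t1"), ("u3", "d1", "t1"), ("u1", "d2", "t1")}
print(affirmation(taggings, "u1", "d1", "t1"))  # 2
print(affirmation(taggings, "u1", "d2", "t1"))  # 0
```

Iterating over every other user mirrors the summation literally; a production system would more likely keep a per-(document, tag) counter and subtract one.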

Hence, we can define the wisdom of the user for a topic (tag) t as the sum of all such affirmations by other users, taken over every document d that u has tagged with t:

$$\mathrm{wisdom}(u, t) = \sum_{d \,:\, \mathrm{tagged}(u, d, t) = 1} \mathrm{affirmation}(u, d, t)$$

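Likewise, we can now define the authority of a user for the topic t as the ratio of the user’s wisdom to the collective wisdom of all users for t. Hence,

$$\mathrm{authority}(u, t) = \frac{\mathrm{wisdom}(u, t)}{\sum_{i} \mathrm{wisdom}(u_i, t)}$$

where the sum in the denominator runs over all users u_i.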
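The wisdom and authority definitions can be sketched the same way, again assuming a set of (user, document, tag) triples. Returning 0.0 when no user's tags for t have been affirmed (an empty denominator) is a choice of this sketch, not part of the definition.

```python
from typing import Set, Tuple

Taggings = Set[Tuple[str, str, str]]  # assumed: (user, document, tag) triples


def affirmation(taggings: Taggings, user: str, doc: str, tag: str) -> int:
    """Number of other users who also tagged `doc` with `tag`."""
    return sum(1 for (u, d, t) in taggings if u != user and d == doc and t == tag)


def wisdom(taggings: Taggings, user: str, tag: str) -> int:
    """Sum of affirmations over every document `user` has tagged with `tag`."""
    docs = {d for (u, d, t) in taggings if u == user and t == tag}
    return sum(affirmation(taggings, user, d, tag) for d in docs)


def authority(taggings: Taggings, user: str, tag: str) -> float:
    """wisdom(user, tag) divided by the summed wisdom of all users for `tag`."""
    users = {u for (u, _, _) in taggings}
    total = sum(wisdom(taggings, u, tag) for u in users)
    return wisdom(taggings, user, tag) / total if total else 0.0
```

Users who never applied the tag have zero wisdom for it, so summing over all users or only over the tag's users gives the same denominator.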

For example, let us determine the authority of user u1 for tag t1:

Object d1: t1 by u1, t1 by u2, t1 by u3, t3 by u1, t2 by u1
Object d2: t1 by u1, t3 by u1
Object d3: t1 by u2, t1 by u3

affirmation(u1, d1, t1) = 2 (u2 and u3 also tagged d1 with t1)
affirmation(u1, d2, t1) = 0 (nobody else tagged d2 with t1)
Hence, wisdom(u1, t1) = 2 + 0 = 2

Likewise for the other users (each is affirmed twice on d1 and once on d3):

wisdom(u2, t1) = 3
wisdom(u3, t1) = 3

Hence the authority of user u1 for t1 is as follows:

authority(u1, t1) = 2 / (2 + 3 + 3) = 2 / 8 = 0.25
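The worked example can be checked mechanically. This Python sketch encodes objects d1, d2 and d3 as (user, document, tag) triples and uses a shortcut that follows from the definitions: for a user who tagged d with t, affirmation(u, d, t) equals the number of users who tagged d with t, minus one.

```python
from collections import Counter

# Tagging events from the example: (user, document, tag).
# Only the t1 entries affect the t1 results below.
taggings = {
    ("u1", "d1", "t1"), ("u2", "d1", "t1"), ("u3", "d1", "t1"),
    ("u1", "d1", "t3"), ("u1", "d1", "t2"),
    ("u1", "d2", "t1"), ("u1", "d2", "t3"),
    ("u2", "d3", "t1"), ("u3", "d3", "t1"),
}

# How many users applied each (document, tag) pair.
taggers = Counter((d, t) for (_, d, t) in taggings)


def wisdom(user: str, tag: str) -> int:
    # Each document the user tagged contributes (taggers - 1) affirmations.
    return sum(taggers[(d, t)] - 1 for (u, d, t) in taggings if u == user and t == tag)


users = sorted({u for (u, _, _) in taggings})
wisdoms = {u: wisdom(u, "t1") for u in users}
total = sum(wisdoms.values())
authorities = {u: w / total for u, w in wisdoms.items()}

print(wisdoms)      # {'u1': 2, 'u2': 3, 'u3': 3}
print(authorities)  # {'u1': 0.25, 'u2': 0.375, 'u3': 0.375}
```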

Whenever a user tags an object, he does so with the authority he possesses for that tag. Thus, whereas conventional methods usually rank objects by the number of instances of a tag, this method measures the relevance of a tag for an object as the sum of the authorities of all users who applied it. Thus,

$$\mathrm{relevance\_metric}(d, t) = \sum_{u_i \,:\, \mathrm{tagged}(u_i, d, t) = 1} \mathrm{authority}(u_i, t)$$

Calculated for every tag, this relevance score should provide a more accurate basis for ranking the objects. Whereas in conventional methods more instances of a tag on an object guarantee a higher relevance for that tag, here what counts is how authoritative the tagging users are.
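A possible Python sketch of relevance_metric and of ranking by it. To stay short, it takes each user's authority for the tag as a precomputed dictionary (for instance, produced by an authority function as defined earlier). That interface, the hypothetical `rank` helper, and treating unknown users as having zero authority are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Tagging = Tuple[str, str, str]  # assumed: (user, document, tag)


def relevance_metric(taggings: Iterable[Tagging], authority: Dict[str, float],
                     doc: str, tag: str) -> float:
    """Sum of authority(u, tag) over every user u who tagged `doc` with `tag`.

    `authority` maps each user to their authority for `tag`; users missing
    from it are treated as having zero authority.
    """
    users = {u for (u, d, t) in taggings if d == doc and t == tag}
    return sum(authority.get(u, 0.0) for u in users)


def rank(taggings: Iterable[Tagging], authority: Dict[str, float],
         docs: Iterable[str], tag: str) -> List[str]:
    """Documents sorted by decreasing relevance for `tag`."""
    events = list(taggings)
    return sorted(docs, key=lambda d: relevance_metric(events, authority, d, tag),
                  reverse=True)
```

Since `sorted` is stable, documents with equal relevance keep the order in which they were passed in.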

Let us consider the following example:

Object d1: t1 by u1, t2 by u5
Object d2: t1 by u2, t1 by u3, t1 by u4

Let us assume that u1 has very high authority for tag t1. In the above scenario, a search for t1 would then rank d1 above d2 if

authority(u1, t1) > authority(u2, t1) + authority(u3, t1) + authority(u4, t1)

that is, if u1’s authority exceeds the combined authority of u2, u3 and u4.

On the other hand, d2 would rank above d1 if the combined authority of u2, u3 and u4 exceeds that of u1. If the majority of users suggest something, their suggestion is more valuable than that of an individual user or a small subset of users.
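To make the two cases concrete, this Python sketch scores the two example objects for t1 under two sets of made-up authority values. The numbers are purely hypothetical; they are only chosen so that each set sums to at most 1, as authorities for a single tag must.

```python
from typing import Dict

# Taggers of t1 in the example: d1 was tagged by u1, d2 by u2, u3 and u4.
T1_TAGGERS = {"d1": ["u1"], "d2": ["u2", "u3", "u4"]}


def top_result(authority: Dict[str, float]) -> str:
    """Object with the highest relevance_metric for t1."""
    scores = {d: sum(authority[u] for u in users) for d, users in T1_TAGGERS.items()}
    return max(scores, key=scores.get)


# One very authoritative tagger outweighs three weaker ones...
print(top_result({"u1": 0.5, "u2": 0.1, "u3": 0.1, "u4": 0.1}))  # d1
# ...but three taggers whose combined authority exceeds u1's win.
print(top_result({"u1": 0.2, "u2": 0.1, "u3": 0.1, "u4": 0.1}))  # d2
```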

Future Enhancements

When calculating the affirmations, the algorithm currently treats all affirming users as equal, even though they may have varying authority for the corresponding tag. As a future enhancement, I plan to incorporate the authorities of these users into the affirmation calculations as well.

What's The Buzz Of The Shoposphere?

December 11th, 2006

That certainly was a tough question to answer, but not anymore. Whatsbuzzing, which was released a few days ago (reminds me of the sleepless nights :-)), helps you do just that. The brainchild of Anand Jagannathan, Whatsbuzzing aims to solve users' online shopping woes. As described in Anand’s blog:

Whatsbuzzing is a destination site for online shopping. The site offers a one-stop service where consumers can browse across hundreds of storefronts, view the latest trends and find the hottest deals. In contrast to comparison shopping or product information sites, Whatsbuzzing provides visitors with the experience of a shopping mall. Visitors can browse storefronts by content, category or store name. A visitor can also tag storefronts so other consumers can find storefronts that are interesting. The storefronts are fully interactive and are constantly being updated with fresh content and timely offers.

As stated above, what makes it different from the plethora of shopping services is the unique content. Instead of just showcasing product details and prices, it also helps you keep track of the latest deals/discounts/offers – capturing the buzz in its true essence.

Another factor that sets it apart is that it is a browsing engine rather than yet another omnipotent search engine. Although search is an integral part of Whatsbuzzing, it is just one more feature to help users find products quickly.

It is surely a panacea for all my shopping woes. With the Christmas season setting in, why don’t you give it a try and come back with some feedback?

Network Laws Roundup

August 21st, 2006

Sarnoff’s Law

Sarnoff’s Law states that the value of a broadcast network is proportional to the number of receivers (viewers/readers). Hence a broadcast network of n users has a value of n:

n

Metcalfe’s Law

Originally formulated for Ethernet, Metcalfe’s Law states that the value of a network is proportional to the square of the number of nodes (users) in the network. That is, the value of a network having n users is proportional to $n^2$. Consider a network having 4 users:

[Figure 1: Metcalfe's Law for a network of 4 users; (a) every directed link, (b) duplicate links removed]

As figure 1a shows, each user can connect to every other user in the network, which amounts to a total of n(n - 1), i.e. 4 × (4 - 1) = 12 links.

Removing the duplicate links (a link from A to B is the same as one from B to A) leaves n(n - 1)/2, i.e. 4 × (4 - 1)/2 = 6 links, as figure 1b shows.

In effect, this amounts to counting the sides plus the diagonals of an n-gon: $n + n(n-3)/2 = n(n-1)/2$, which grows as $n^2$ for large n.
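The link counts can be verified by brute force. This Python sketch enumerates ordered and unordered pairs of users with `itertools` and checks them against the closed forms, including the sides-plus-diagonals count of an n-gon.

```python
from itertools import combinations, permutations


def directed_links(n: int) -> int:
    """Ordered pairs of distinct users: n(n - 1)."""
    return len(list(permutations(range(n), 2)))


def unique_links(n: int) -> int:
    """Unordered pairs of distinct users: n(n - 1) / 2."""
    return len(list(combinations(range(n), 2)))


def sides_plus_diagonals(n: int) -> int:
    """n sides of an n-gon plus its n(n - 3)/2 diagonals."""
    return n + n * (n - 3) // 2


for n in (4, 10, 100):
    assert unique_links(n) == n * (n - 1) // 2 == sides_plus_diagonals(n)
    print(n, directed_links(n), unique_links(n))
```

For n = 4 the loop prints `4 12 6`, matching figures 1a and 1b.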

Reed’s Law

Reed’s Law states that the value of a network grows exponentially with the size of the network. It is based on the assertion that the value of a network is not measured by the number of users in it but rather by the number of sub-groups that can be formed by the users. Let us consider a network consisting of 3 users:

[Figure 2: Reed's Law, the possible sub-groups of a network of 3 users]

As the figure above shows, the number of possible sub-groups in a network of n users is $2^n$. Hence a network of 3 users has $2^3 = 8$ possible sub-groups. This value breaks down into:

  • sets with (number of users) > 1
  • sets with (number of users) = 1
  • empty set

Since the empty set and sets with only one user don’t count as valid groups, the final estimate according to Reed’s Law is:

$$2^n - n - 1$$
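Reed's count can be checked the same way, by enumerating every subset of a small set of users with `itertools.combinations` and keeping only those with at least two members.

```python
from itertools import combinations


def all_subsets(n: int) -> int:
    """Every subset of n users, including the empty set: 2**n."""
    return sum(1 for k in range(n + 1) for _ in combinations(range(n), k))


def reed_groups(n: int) -> int:
    """Subsets with at least two members: 2**n - n - 1."""
    return sum(1 for k in range(2, n + 1) for _ in combinations(range(n), k))


for n in range(1, 11):
    assert all_subsets(n) == 2 ** n
    assert reed_groups(n) == 2 ** n - n - 1

print(all_subsets(3), reed_groups(3))  # 8 4
```

For 3 users the 4 valid groups are the three pairs and the single group of all three.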

Refutation Of Metcalfe’s Law

The value of real-world communication networks generally grows faster than linearly (Sarnoff’s Law) but much more slowly than quadratically (Metcalfe’s Law). Hence this law states that the value of a communication network of size n grows like:

n log(n)

This law rests on the observation that Metcalfe’s and Reed’s Laws are based on the false assumption that all connections, or all groups, are equally valuable.

The defect in this assumption was pointed out a century and a half ago by Henry David Thoreau when he wrote:

“We are in great haste to construct a magnetic telegraph from Maine to Texas; but Maine and Texas, it may be, have nothing important to communicate.”

Zipf’s Law states that if we order some large collection by size or popularity, the value of the k-th ranked item will be about 1/k of the first one. Hence, by Zipf’s Law, the total value of a collection of n items is proportional to 1 + 1/2 + ... + 1/n, which grows like log(n).
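A quick numeric check of the Zipf claim in Python: summing 1/k over the top n items gives the harmonic number H(n), and H(n) minus the natural log of n settles toward a constant (the Euler-Mascheroni constant, about 0.5772), so the total value indeed grows like log(n). Setting the top item's value to 1 is an assumption that only fixes the scale.

```python
import math


def zipf_collection_value(n: int, top_value: float = 1.0) -> float:
    """Total value of n items when the k-th ranked item is worth top_value / k."""
    return sum(top_value / k for k in range(1, n + 1))


for n in (10, 1_000, 1_000_000):
    h = zipf_collection_value(n)
    # H(n) - ln(n) approaches the Euler-Mascheroni constant as n grows.
    print(n, round(h, 4), round(h - math.log(n), 4))
```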

Now let us suppose that the incremental value that a person gets from other people being part of a network varies as Zipf’s Law predicts. Let’s further assume that for most people their most valuable communications are with friends and family, and the value of those communications is relatively fixed – it is set by the medium and our makeup as social beings. Then each member of a network with n participants derives value proportional to log(n), for n log(n) total value.
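For a feel of how far apart these laws are, this Python sketch tabulates each of them for a few network sizes. Every constant of proportionality is assumed to be 1, so only the growth rates are meaningful. `zipf_network` is a hypothetical helper that applies the argument above literally, with each member valuing the k-th most valuable of the other n - 1 members at 1/k.

```python
import math

# Value of a network of n members under each law, with every constant of
# proportionality assumed to be 1 (only the growth rates are meaningful).


def sarnoff(n: int) -> float:
    return float(n)


def metcalfe(n: int) -> float:
    return n * (n - 1) / 2  # unique links


def reed(n: int) -> float:
    return 2.0 ** n - n - 1  # groups of two or more members


def n_log_n(n: int) -> float:
    return n * math.log(n)  # natural log; the base only rescales the value


def zipf_network(n: int) -> float:
    # Each member values the k-th most valuable of the other n - 1 members at 1/k.
    return n * sum(1 / k for k in range(1, n))


for n in (10, 100, 1000):
    print(f"n={n}: Sarnoff={sarnoff(n):.0f}, n log n={n_log_n(n):.0f}, "
          f"Zipf sum={zipf_network(n):.0f}, Metcalfe={metcalfe(n):.0f}, Reed={reed(n):.3g}")
```

The Zipf-based total tracks n log(n) closely, while Metcalfe's and especially Reed's values quickly run far ahead of both.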

If we lose the fight for net neutrality, this law would be more appropriate than Metcalfe’s Law, since connections between certain nodes in the network would then become more valuable than others.

I wonder which law Yahoo! and Microsoft used to evaluate the value of merging their IM networks. Any guesses?