Search Engine-inspired Ranking Algorithm for Trading Networks

ABSTRACT


shown in the next section, the trading networks are similar to each other. So by choosing much smaller networks with a clear classification of goods and an almost fixed price for the same product, the analysis of the adjacency matrices of the networks becomes much easier. Moreover, the matrices are not sparse, so some of the manipulation steps needed to ensure convergence can be avoided (even though the proposed model is written under the assumption that the matrix is sparse). For the purpose of testing the algorithm, this provides the best conditions.
There are also issues in online trading networks that prevent them from being good test datasets. For example, in an auction network, which is the best example of a trading network where users are free to both buy and sell goods (so each user can have both inlinks and outlinks), the goods sold are too diverse in both type and price to allow any classification. Consequently, it is difficult to infer that goods in the same class are more similar to each other than to goods from other classes, and in that case there is no point in using the classification as the basis for clustering users. If classification cannot be used, an adjacency matrix must be constructed for each kind of goods, which is a very difficult task because a single network would yield too many sparse matrices with only one or two nonzero entries (the number of identical goods bought by a user). For other types of trading networks such as online shopping, where there are two kinds of nodes, buyers and sellers, the networks become bipartite graphs, so ranking and clustering become a different problem, which is beyond the scope of this paper.

TRADING NETWORKS
The usual way to calculate the degree of importance of nodes in a trading network is to use the total amount of export/import of particular goods. This method, however, fails to capture the link structure of the network: which nodes a node connects to and which nodes connect to it. For example, the same amount of export to an insignificant country and to an important country contributes the same weight to the ranking score. A similar problem once occurred in the WWW network, where methods that only calculated content scores of web pages were no longer adequate to satisfy users or to answer queries accurately in the fast-growing WWW environment. Solutions to this problem were proposed independently by Brin and Page [5,6] and Kleinberg [7]. Both solutions use the link structure of the WWW network to improve the quality of web search.
In PageRank, the important pages are the pages with many inlinks and few or no outlinks [8, pp. 32]. HITS, instead of producing only one score, proposes two scores: authority and hub scores. Good authorities are pointed to by good hubs, and good hubs point to good authorities [8, pp. 115]. The final link structure scores are obtained by combining these scores (for web search, usually only the authority scores are used).
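The mutual reinforcement between authorities and hubs can be illustrated with a minimal sketch of the standard HITS iteration. The toy graph, the function name `hits`, the fixed iteration count and the 1-norm normalization are assumptions made for illustration only; they are not taken from the paper or from [8].

```python
def hits(adj, iters=50):
    """Plain HITS iteration; adj[i][j] = 1 means node i links to node j.

    Illustrative sketch: fixed number of iterations, 1-norm normalization.
    """
    n = len(adj)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iters):
        # Authority of i: sum of the hub scores of the nodes pointing to i.
        new_auth = [sum(adj[j][i] * hub[j] for j in range(n)) for i in range(n)]
        # Hub of i: sum of the authority scores of the nodes i points to.
        new_hub = [sum(adj[i][j] * new_auth[j] for j in range(n)) for i in range(n)]
        # Normalize so that each score vector sums to one.
        total_auth = sum(new_auth) or 1.0
        total_hub = sum(new_hub) or 1.0
        auth = [a / total_auth for a in new_auth]
        hub = [h / total_hub for h in new_hub]
    return auth, hub


# Hypothetical toy graph: 0 -> 2, 1 -> 2, 2 -> 3.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
auth, hub = hits(adj)
```

In this toy graph node 2 collects the authority score because two hubs point to it, while nodes 0 and 1 share the hub score because they point to the strongest authority.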
Even though there are already good ranking algorithms that deal with the link structure of networks, PageRank and HITS cannot simply be reused, because the nature of trading networks differs from that of the WWW network. Each node in a trading network must hold at least one type of resource before any transaction can occur. A link is added when two nodes with different types of resources exchange them. Thus, the amount of resources limits the number and weight of the links a node can have. In the WWW network, a link is added simply by putting a new hyperlink on a web page, so no resource needs to be allocated to create it. Another important difference is that link addition in trading networks is a mutual process: if the first node creates a new link to the second node, the second node also creates a new link to the first node. This is not the case in the WWW network. Further, the purpose of link attachment in trading networks is to maximize the benefit of the transactions. Thus, on the export side, each node competes to get transactions from other nodes that lack the resource it offers, and on the import side, it competes to get resources from other nodes that have an abundance of the resource it needs. In the WWW network, the purpose of link attachment is to get inlinks from popular pages (pages with many inlinks), and popular pages are likely to get even more inlinks. Figure 1 shows the differences between a trading network and the WWW network: in a trading network, link addition is mutual and the links differ in type and weight, which describes the nature of the transaction. In the WWW network the links that connect pages A and B are hyperlinks, and when A has a hyperlink to B, B does not necessarily have a hyperlink to A.

PROPOSED ALGORITHM
In trading networks, every node must be careful in making new inlinks and outlinks because of the resources they require. Each node competes to get inlinks from other important nodes (nodes with an abundance of the resources that the competing nodes need) and competes to make outlinks to other important nodes (nodes that need resources from the competing nodes), while taking the cost into account.
Due to the nature of trading networks, none of the web ranking algorithms discussed above is suitable. PageRank, which focuses on inlinks, clearly cannot be used in an environment where inlinks and outlinks are both highly regarded. HITS is more interesting than PageRank because it accommodates both inlinks and outlinks. By definition, however, in HITS a node should establish new outlinks to nodes with many inlinks and should receive inlinks from nodes with many outlinks. In the context of our problem, where making and receiving new links can be expensive, it is more appropriate to make new outlinks to nodes that have many outlinks and to receive inlinks from nodes that have many inlinks, because receiving inlinks means getting resources and creating outlinks means giving up resources. Figure 2 shows the link addition process: in a trading network, A prefers B (a node with many outlinks and therefore lacking resources) when making a new outlink, and C (a node with many inlinks and therefore rich in resources) when looking for a new inlink. This preference is the opposite of that in the WWW network. The proposed algorithm is defined by the following statement: a node becomes more important if it is pointed to by nodes that have many inlinks and points to nodes that have many outlinks. Further, by comparing the link addition process shown in Figure 2 with the HITS model [8, pp. 115], this definition can be written as eq. (1).

The logic behind eq. (1) is as follows: the ranking score of a node, $r(n_i)$, depends on the ranking scores of the nodes that point to it ($r(n_j)$ where $j \to i$, the first term on the right-hand side) and of the nodes that it points to ($r(n_j)$ where $i \to j$, the second term on the right-hand side). The remaining factors on the right-hand side act as constants that depend on the number of inlinks and outlinks of each node: for the authority / hub part, the larger the number of inlinks / outlinks and the smaller the number of outlinks / inlinks, the larger the constant becomes. So eq. (1) agrees with the definition of the proposed algorithm. The ranking scores can be calculated in two different ways. The first is a direct method that exploits the linear system property of the equation [8, pp. 71-74], and the second is an iterative process (the power method), a common method for calculating the ranking vector of web pages. For small networks the first method is preferable because it is much faster than the power method. As the network gets bigger, only the second method is viable. In this paper, however, the second method is used because we want to compare the convergence properties of the proposed algorithm with those of PageRank and HITS.
We rewrite eq. (1) in matrix form, not only so that the properties of the network can be viewed from a linear algebra perspective, but also so that the power method applied to the adjacency matrix can be made to converge by adjusting it into a stochastic and primitive matrix. Let $M = \beta F + (1-\beta)G$, where $F = K D^{-1} D_i L$ is the authority part, which describes the fraction of scores a node receives from its inlinks, $G = K^{-1} D^{-1} D_o L^T$ is the hub part, which describes the fraction of scores a node receives from its outlinks, and $L$ is the $N \times N$ adjacency matrix of the network. Thus, eq. (1) can be rewritten as:

$$r^T(k+1) = r^T(k)\,M \qquad (2)$$

where $k = 0, 1, 2, \ldots$ denotes the iteration step of the algorithm, the diagonal matrices $D_i$, $D_o$ and $D$ are defined as: (3) and $K$ is a diagonal matrix with diagonal entries defined as: T is outlink vector of node i.

To ensure that the power method [9] converges to a unique positive dominant eigenvector of $M$, two adjustments are needed. The first is the stochasticity adjustment: normalize all nonzero rows of $M$ and then replace each zero row by a $1 \times N$ positive real vector whose 1-norm equals one. Usually, each entry of these vectors is set to $1/N$. Let $e^T$ be the $1 \times N$ row vector whose entries are all one, and let $c$ be the $N \times 1$ column vector whose $i$-th entry is 1 if row $i$ of $M$ is a zero row and 0 otherwise. Then the stochastic version of $M$ (with its nonzero rows normalized) is $S = M + (1/N)\,c\,e^T$. The second, the primitivity adjustment, replaces each zero entry of $S$ with a small positive number: $P = \alpha S + (1/N)(1-\alpha)\,e\,e^T$, where $0 < \alpha < 1$ is a parameter that controls the amount of error ($e\,e^T$) introduced into $P$. Thus, eq. (1) can be written in a more compact form as:

$$r^T(k+1) = r^T(k)\,P$$

with initial condition $r^T(0) = (1/N)\,e^T$, iterated until the error $\|r^T(k+1) - r^T(k)\|_1$ is smaller than the desired error. Note that instead of the 1-norm termination criterion, a comparison between the previous and current rank orders can also be used to terminate the iteration [10][11][12][13][14][15].
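The two adjustments and the power iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the input `M` is a hypothetical nonnegative matrix standing in for $\beta F + (1-\beta)G$ (the construction of $F$ and $G$ is not reproduced here), the function names are invented for this example, and the default `alpha=0.85` is an arbitrary choice within $0 < \alpha < 1$.

```python
def make_stochastic(M):
    """Stochasticity adjustment: normalize nonzero rows, set zero rows to 1/N."""
    n = len(M)
    S = []
    for row in M:
        total = sum(row)
        if total == 0:
            S.append([1.0 / n] * n)          # zero row -> uniform vector
        else:
            S.append([x / total for x in row])
    return S


def make_primitive(S, alpha=0.85):
    """Primitivity adjustment: P = alpha*S + (1/N)(1 - alpha) e e^T."""
    n = len(S)
    return [[alpha * x + (1.0 - alpha) / n for x in row] for row in S]


def power_method(P, tol=1e-8, max_iter=10000):
    """Iterate r^T(k+1) = r^T(k) P from r^T(0) = (1/N) e^T.

    Stops when the 1-norm of the change drops below tol and returns the
    ranking vector together with the number of iterations used.
    """
    n = len(P)
    r = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        r_next = [sum(r[j] * P[j][i] for j in range(n)) for i in range(n)]
        err = sum(abs(a - b) for a, b in zip(r_next, r))
        r = r_next
        if err < tol:
            return r, k
    return r, max_iter


# Hypothetical 3-node matrix; the second row is a zero row.
M = [
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 1.0],
]
S = make_stochastic(M)
P = make_primitive(S)
r, iterations = power_method(P)
```

Returning the iteration count alongside the ranking vector mirrors the way performance is compared in the experiments, where the number of iterations rather than computational time is reported.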

NUMERICAL RESULTS
Because $P$ is stochastic and primitive, the power method applied to it converges, for any starting vector, to a unique positive vector called the stationary vector [8, pp. 36]. The remaining question is whether it converges to something that makes sense in the context of trading networks. We try to answer this question by measuring the similarity between the ranking vector of the proposed algorithm and a standard measure, the vector of total export and import.
The data used in the experiments are international trade data from the United Nations [3,4], where the nodes are the countries involved in export and import activities and the links are the flows of products. The computational performance of the proposed algorithm is measured by comparing the number of iterations it needs to reach a desired error with the numbers needed by HITS and PageRank (note that these two algorithms are used only for performance comparison, not for comparison of results). In the experiments the termination criterion is set to $10^{-8}$ and $\beta$ is set to 0.5. The number of iterations is chosen instead of computational time because the trading networks are very small, so the power method applied to the data takes negligible computational time. Then similarity measures, (1) cosine of the angle between the ranking vector of the proposed algorithm ($u$) and the vector of total export import ($v$),
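The cosine measure between a ranking vector $u$ and the total export import vector $v$ can be sketched as below. The weighted matrix `W`, the convention that `W[i][j]` is the flow exported from node `i` to node `j`, the sample ranking vector `u` and the function names are all assumptions made for illustration; none of them come from the paper's data.

```python
import math


def total_trade(W):
    """Total export plus import per node.

    Assumes W[i][j] is the flow exported from node i to node j, so row sums
    are exports and column sums are imports.
    """
    n = len(W)
    exports = [sum(W[i]) for i in range(n)]
    imports = [sum(W[j][i] for j in range(n)) for i in range(n)]
    return [e + m for e, m in zip(exports, imports)]


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-country trade flows and a made-up ranking vector.
W = [
    [0, 5, 1],
    [2, 0, 0],
    [4, 3, 0],
]
v = total_trade(W)          # exports [6, 2, 7] plus imports [6, 8, 1]
u = [0.40, 0.35, 0.25]      # placeholder ranking scores, not real results
similarity = cosine_similarity(u, v)
```

A value close to one would indicate that the ranking vector points in nearly the same direction as the total trade vector, whereas orthogonal vectors give zero.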


Figure 1. The Differences Between Trading Network (Left) and WWW Network (Right)

Figure 2. Links Addition Process in Trading Network (Left) and WWW Network (Right).
In eq. (1), $r(n_i)$ is the ranking score of node $i$, $|*|$ denotes the absolute value of $*$, $i \to j$ denotes that node $i$ links to node $j$, and $\sum n_j$ inlinks / outlinks / links denotes the number of inlinks / outlinks / links that node $j$ has. The first term on the right-hand side is defined as the authority part and the second as the hub part of the corresponding node. The parameter $\beta$ ($0 \le \beta \le 1$) determines which links are more important: if outlinks and inlinks are equally important, set $\beta = 0.5$; if outlinks are more important, set $\beta < 0.5$; otherwise set $\beta > 0.5$.

Figure 3. A Schematic Explanation of the Differences Among Algorithms: PageRank (Left), HITS (Center) and Proposed Algorithm (Right).

Table 1 gives a summary of the results.

Table 1. The Performance of the Proposed Algorithm

Search Engine-inspired Ranking Algorithm for Trading Networks (Andri Mirzal)