A PARALLEL LOOP SCHEDULING ALGORITHM BASED ON THE SMARANDACHE f-INFERIOR PART FUNCTION

This article presents an application of the Smarandache f-inferior part function to a particular parallel loop-scheduling problem. The product between an upper diagonal matrix and a vector is analysed from the parallel computation point of view. An efficient solution for this problem is given by using the Smarandache f-inferior part function. Finally, the efficiency of our solution is demonstrated experimentally by presenting some computational results.

Parallel programming has been intensely developed in order to solve difficult problems that involve either a large number of computations or a large volume of data. These often occur both in real-world applications (e.g. weather prediction) and in theoretical problems (e.g. differential equations). Unfortunately, there is no standard for writing parallel programs; this depends on the parallel language used or on the parallel platform on which the computation is performed. A common feature of this diversity is the ease of parallelising loops. Loops represent an important source of parallelism, occurring in almost all scientific applications. Many algorithms dealing with the scheduling of loop iterations to processors have been proposed so far.

1. Introduction
Consider that there are p processors, denoted in the following by $P_1, P_2, \ldots, P_p$, and a single parallel loop (see Figure 1). We also assume that the work of the routine loop_body(i) can be evaluated and is given by the function $w: \mathbb{N} \to \mathbb{R}$, where $w(i) = w_i$ represents either the number of the routine's operations or its running time (presume that $w(0) = 0$). The total amount of work for the parallel loop is $\sum_{i=1}^{N} w(i)$. The efficient loop-scheduling algorithm distributes this total amount of work equally on the processors, such that each processor receives a quantity of work equal to $\frac{1}{p} \sum_{i=1}^{N} w(i)$.
Let $l_j$ and $h_j$ be the lower and upper loop iteration bounds, $j = 1, 2, \ldots, p$, such that processor j executes all the iterations between $l_j$ and $h_j$. These bounds are found by distributing the work equally on the processors by using

$$\sum_{i=l_j}^{h_j} w(i) \approx \frac{1}{p} \sum_{i=1}^{N} w(i), \quad j = 1, 2, \ldots, p. \tag{1}$$

Moreover, they satisfy the following conditions:

$$l_1 = 1, \tag{2.a}$$

if we know $l_j$, then $h_j$ is given by

$$\sum_{i=l_j}^{h_j} w(i) \cong \frac{1}{p} \sum_{i=1}^{N} w(i) = W, \tag{2.b}$$

$$l_{j+1} = h_j + 1. \tag{2.c}$$

Suppose that Equation (2.b) is computed by an approximation from below. This means that if we have the value $l_j$, then we find $h_j$ as follows:

$$h_j = h \iff \sum_{i=l_j}^{h} w(i) \le W < \sum_{i=l_j}^{h+1} w(i). \tag{3}$$

In the following, we present an optimal parallel solution for the product between an upper diagonal matrix and a vector. This is an important problem that occurs in many algorithms for solving linear systems. The Smarandache inferior part function is used to distribute the work equally on the processors.

Proof. The proof is obtained by starting from the double inequality $\sum_{i=1}^{k} i \le x < \sum_{i=1}^{k+1} i$.

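2. The Smarandache Inferior Part Function
Observe that the equation $\frac{k(k+1)}{2} = x > 0$ has only one positive root, given by $k = \frac{-1 + \sqrt{1 + 8x}}{2}$. Thus, the equation for the Smarandache f-inferior part is

$$f_{[\,]}(x) = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right].$$

Theorem 2. If $f(k) = \sum_{i=1}^{k} i^2$, then the Smarandache f-inferior part is given by

$$f_{[\,]}(x) = \left[ -\frac{1}{2} + \sqrt[3]{\frac{3x}{2} + \sqrt{\frac{9x^2}{4} - \frac{1}{1728}}} + \sqrt[3]{\frac{3x}{2} - \sqrt{\frac{9x^2}{4} - \frac{1}{1728}}} \right]. \tag{6}$$

Proof. We use the Cardano equation for solving $x^3 + px + q = 0$. A real root of this equation is given by

$$x = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}. \tag{7}$$

The equation $\frac{k(k+1)(2k+1)}{6} = x > 0$ is transformed as follows:

$$2k^3 + 3k^2 + k - 6x = 0 \iff \left(\text{apply the transformation } k = y - \tfrac{1}{2}\right) \iff y^3 - \frac{1}{4} y - 3x = 0.$$

Applying Equation (7) with $p = -\frac{1}{4}$ and $q = -3x$, we find that

$$y = \sqrt[3]{\frac{3x}{2} + \sqrt{\frac{9x^2}{4} - \frac{1}{1728}}} + \sqrt[3]{\frac{3x}{2} - \sqrt{\frac{9x^2}{4} - \frac{1}{1728}}}.$$

The Smarandache f-inferior part is given by $f_{[\,]}(x) = \left[ y - \frac{1}{2} \right]$, which is Equation (6).

3. An Efficient Algorithm for the Upper Diagonal Matrix-Vector Product
In this section, we present an efficient algorithm for the product $y = a \cdot x$ between an upper diagonal matrix $a = (a_{i,j})_{i,j=1,\ldots,n} \in M_n(\mathbb{R})$ and a vector $x \in \mathbb{R}^n$. This problem is quite important, occurring in several other important problems such as solving linear systems or LUP matrix decomposition.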
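Because a is an upper diagonal matrix, the product $y = a \cdot x$ is given by

$$y_i = \sum_{j=1}^{i} a_{i,j} \cdot x_j, \quad i = 1, 2, \ldots, n. \tag{8}$$

The product can be computed in parallel by using the simple computation shown below.

DO PARALLEL i=1,n
   y_i = 0
   DO j=1,i
      y_i = y_i + a_{i,j} * x_j
   END DO
END DO
Figure 2. Parallel Computation for the Upper Matrix-Vector Product.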
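For this parallel loop we have the following elements:
• The work of iteration i is $w(i) = i$, $i = 1, 2, \ldots, n$; the total work is $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.
• The quantity of work received by a processor should be approximately equal to $W = \frac{n(n+1)}{2p}$.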
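As a minimal sequential sketch in Python (not the authors' implementation), the parallel loop of Figure 2 can be emulated as below. It assumes, consistently with $w(i) = i$, that iteration i combines exactly the i entries $a_{i,1}, \ldots, a_{i,i}$ of row i; the function names `row_product` and `matvec` are illustrative only. The code also counts the multiplications per iteration, so the total work $n(n+1)/2$ can be checked.

```python
def row_product(a, x, i):
    """Loop body for iteration i (1-based): y_i = sum_{j=1..i} a[i][j] * x[j].

    Assumption: iteration i uses i entries of row i, matching w(i) = i.
    Returns the value y_i and the number of multiplications performed.
    """
    total = 0.0
    ops = 0
    for j in range(1, i + 1):
        total += a[i - 1][j - 1] * x[j - 1]
        ops += 1
    return total, ops


def matvec(a, x):
    """Sequential emulation of the DO PARALLEL loop (iterations are independent)."""
    n = len(x)
    y, work = [], []
    for i in range(1, n + 1):
        y_i, ops = row_product(a, x, i)
        y.append(y_i)
        work.append(ops)
    return y, work


# Small example with n = 4: row i holds i non-zero entries.
a = [[1, 0, 0, 0],
     [2, 3, 0, 0],
     [4, 5, 6, 0],
     [7, 8, 9, 10]]
x = [1, 1, 1, 1]
y, work = matvec(a, x)
```

For this example the per-iteration work is 1, 2, 3 and 4 multiplications, which sums to $4 \cdot 5 / 2 = 10$.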
The difficult problem for the efficient loop-scheduling algorithm is how Equation (1) is implemented. Finding the upper bounds from this equation is quite expensive and can be done in $O(\log n + \frac{n}{p})$ [Jaja]. However, we want to find the upper bounds in at most $O(p)$ complexity, and we show that this is possible for our problem. For that we use the following theorem.

Theorem 3. The upper bounds for the efficient scheduling of the parallel loop are given by

$$h_j = \left[ \frac{-1 + \sqrt{1 + \frac{4 \cdot j \cdot n (n+1)}{p}}}{2} \right], \quad j = 1, 2, \ldots, p. \tag{9}$$

Proof. The Smarandache f-inferior part function presented in Theorem 1 is used to obtain the proof. We found that if $f(k) = \sum_{i=1}^{k} i$, then $f_{[\,]}(x) = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right]$. Since each processor receives a quantity of work equal to $W = \frac{n(n+1)}{2p}$, the first $j - 1$ processors have received approximately $(j - 1) \cdot W$. Thus, the upper bound of processor j is the biggest number such that all the work done by processors $1, 2, \ldots, j$ is approximately equal to $j \cdot W$. Mathematically, this can be written as follows:

$$1 + 2 + \ldots + h_j \le j \cdot W < 1 + 2 + \ldots + h_j + (h_j + 1) \iff h_j = f_{[\,]}(j \cdot W) = \left[ \frac{-1 + \sqrt{1 + \frac{4 \cdot j \cdot n (n+1)}{p}}}{2} \right], \quad j = 1, 2, \ldots, p.$$

A more rigorous and technical explanation can be found in [Tabi].
According to this theorem, the efficient scheduling is obtained by using the upper bounds from Equation (9). These bounds give the best approximation of Equation (1). Thus, the part of the parallel loop scheduled on processor j is presented in Figure 3. This processor computes all the sums of Equation (8) between $h_{j-1} + 1$ and $h_j$.
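The computation of the upper bounds can be sketched in Python as follows, using the closed form $h_j = [(-1 + \sqrt{1 + 4 j n (n+1)/p})/2]$, that is, the largest k with $1 + 2 + \ldots + k \le j \cdot W$. This is an illustrative sketch rather than the authors' code: the names `upper_bounds` and `chunk_work` are hypothetical, and the exact integer correction after the floating-point square root is an addition not described in the paper, included only to guard against rounding.

```python
import math


def upper_bounds(n, p):
    """Upper bounds h_1..h_p: h_j is the largest k with 1+2+...+k <= j*W,
    where W = n(n+1)/(2p). Each bound is O(1) in practice, so all cost O(p)."""
    bounds = []
    for j in range(1, p + 1):
        # Closed form: floor((-1 + sqrt(1 + 4*j*n*(n+1)/p)) / 2).
        k = int((-1.0 + math.sqrt(1.0 + 4.0 * j * n * (n + 1) / p)) / 2.0)
        # Exact integer correction against rounding (not in the paper):
        # k(k+1)/2 <= j*W  <=>  p*k*(k+1) <= j*n*(n+1).
        while p * (k + 1) * (k + 2) <= j * n * (n + 1):
            k += 1
        while k > 0 and p * k * (k + 1) > j * n * (n + 1):
            k -= 1
        bounds.append(k)
    return bounds


def chunk_work(bounds):
    """Work of each processor: the sum of i over h_{j-1}+1 .. h_j (h_0 = 0)."""
    work, low = [], 1
    for h in bounds:
        work.append(sum(range(low, h + 1)))
        low = h + 1
    return work


bounds = upper_bounds(8, 2)   # n = 8 iterations, p = 2 processors, W = 18
work = chunk_work(bounds)
```

For n = 8 and p = 2 the bounds are 5 and 8, so the first processor executes iterations 1 to 5 (work 15) and the second executes iterations 6 to 8 (work 21), against an ideal share of W = 18.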

4. Computational Results and Final Conclusions
This section presents some computational results of scheduling the parallel loop from Figure 3.

The inferior part function (sometimes named the floor function) $[\,\cdot\,]: \mathbb{R} \to \mathbb{Z}$, defined by $[x] = k \iff k \le x < k + 1$, is one of the most used elementary functions. The Smarandache inferior part function represents a natural generalisation of the floor function [Smara1]. Smarandache proposed and studied this generalisation especially in connection with Number Theory functions [Smara1, Smara2]. In the following, we present equations for some Smarandache inferior part functions.

Consider $f: \mathbb{Z} \to \mathbb{R}$ a function that is strictly increasing and satisfies $\lim_{n \to -\infty} f(n) = -\infty$ and $\lim_{n \to \infty} f(n) = \infty$. The Smarandache f-inferior part function, denoted by $f_{[\,]}: \mathbb{R} \to \mathbb{Z}$, is defined by

$$f_{[\,]}(x) = k \iff f(k) \le x < f(k + 1). \tag{4}$$

The function $f_{[\,]}$ is well defined because f is strictly increasing and unbounded in both directions. When $f(k) = k$, the floor function $[x]$ is obtained. In the following, we study the Smarandache f-inferior part function when $f(k) = \sum_{j=1}^{k} j^a$.

Remark. Sometimes, we will study only the positive inferior part by considering a function $f: \mathbb{N} \to \mathbb{R}$ with $f(0) = 0$. In this case, we only consider $f_{[\,]}: [0, \infty) \to \mathbb{Z}$.

Theorem 1. If $f(k) = \sum_{j=1}^{k} j$, then the Smarandache f-inferior part is given by

$$f_{[\,]}(x) = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right], \quad x \ge 0. \tag{5}$$
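To illustrate the definition, the following Python sketch evaluates the positive f-inferior part directly, by searching for the k with $f(k) \le x < f(k+1)$, and compares it with two closed forms: $[(-1 + \sqrt{1 + 8x})/2]$ for $f(k) = \sum j$, and the Cardano-based expression $[y - 1/2]$, where y is the real root of $y^3 - y/4 - 3x = 0$, for $f(k) = \sum j^2$. All function names are illustrative and not taken from the paper. The linear search is chosen for clarity rather than efficiency, and the floating-point closed forms are assumed to be evaluated away from the extreme range where rounding could matter.

```python
import math


def f_inferior(f, x):
    """Positive f-inferior part by definition: the k >= 0 with f(k) <= x < f(k+1).

    Assumes f is strictly increasing on the naturals with f(0) = 0, and x >= 0.
    """
    k = 0
    while f(k + 1) <= x:
        k += 1
    return k


def sum_of_integers(k):
    """f(k) = 1 + 2 + ... + k."""
    return k * (k + 1) // 2


def sum_of_squares(k):
    """f(k) = 1^2 + 2^2 + ... + k^2."""
    return k * (k + 1) * (2 * k + 1) // 6


def closed_form_a1(x):
    """Closed form for f(k) = sum of j: floor((-1 + sqrt(1 + 8x)) / 2)."""
    return math.floor((-1.0 + math.sqrt(1.0 + 8.0 * x)) / 2.0)


def closed_form_a2(x):
    """Closed form for f(k) = sum of j^2 via Cardano, used here for x >= 1.

    y is the real root of y^3 - y/4 - 3x = 0; the result is floor(y - 1/2).
    """
    root = math.sqrt(9.0 * x * x / 4.0 - 1.0 / 1728.0)
    small = max(1.5 * x - root, 0.0)  # clamp tiny negative rounding residue
    y = (1.5 * x + root) ** (1.0 / 3.0) + small ** (1.0 / 3.0)
    return math.floor(y - 0.5)


by_definition = f_inferior(sum_of_integers, 18)   # f(5) = 15 <= 18 < 21 = f(6)
by_formula = closed_form_a1(18)
```

Both evaluations give 5 for x = 18, since $15 \le 18 < 21$.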

Figure 3.
In order to show that the proposed method is efficient from the practical point of view, two other scheduling algorithms are used. The first scheduling algorithm, named uniform scheduling, divides the parallel loop into p chunks with the same size.
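To see why equal-size chunks can be unbalanced for a loop whose iteration work grows with i, the following Python sketch compares the per-processor work under uniform chunks with the work under bounds chosen so that the cumulative work up to processor j does not exceed $j \cdot W$. It is only an illustration under the assumption $w(i) = i$: the chunk size $\lceil n/p \rceil$ used for uniform scheduling and all function names are assumptions, the balanced bounds are found here by a simple exact integer search rather than by a closed form, and no measured running times are reproduced.

```python
import math


def uniform_bounds(n, p):
    """Upper bounds for p equal-size chunks (assumed chunk size: ceil(n/p))."""
    size = math.ceil(n / p)
    return [min(j * size, n) for j in range(1, p + 1)]


def balanced_bounds(n, p):
    """Upper bounds h_j: the largest k with 1+...+k <= j*W, W = n(n+1)/(2p).

    The test p*k*(k+1) <= j*n*(n+1) is the same condition in exact integers.
    """
    bounds, k = [], 0
    for j in range(1, p + 1):
        while p * (k + 1) * (k + 2) <= j * n * (n + 1):
            k += 1
        bounds.append(k)
    return bounds


def loads(bounds):
    """Work per processor when iteration i costs i units."""
    work, low = [], 1
    for h in bounds:
        work.append(sum(range(low, h + 1)))
        low = h + 1
    return work


n, p = 100, 4
uniform_load = loads(uniform_bounds(n, p))
balanced_load = loads(balanced_bounds(n, p))
```

For n = 100 and p = 4 the ideal share is W = 1262.5. Uniform chunks give loads of 325, 950, 1575 and 2200, whereas the balanced bounds give 1225, 1260, 1256 and 1309, so the most heavily loaded processor carries far less work in the balanced case.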

Table 1. Computational Times for three Scheduling Algorithms.
The first important remark that can be outlined is that there is no way to develop efficient methods in Computer Science without Mathematics, and this article is a proof of that. Using a special function named the Smarandache inferior part, it has