University College, Dublin

Mathematical Physics Department

Master of Computational Science Degree 2004-2005

NUMERICAL ALGORITHMS
Dr Derek O’Connor

Laboratory Exercise No. 6: Complexity of Matrix Operations.

The standard mathematical definition of matrix multiplication, C = AB, where A is m × n, B is n × p, and
C is m × p, is as follows:

$$ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p. \tag{1} $$

Do the following

1. Using this definition, derive an expression, $N_{\mathrm{ops}}(m, n, p)$, for the number of floating-point additions
and multiplications needed to form the matrix product C = AB.
2. Derive a similar expression $N_{\mathrm{ops}}(m, n, p, q)$ for D = ABC, where A is m × n, B is n × p, C is
p × q, and D is m × q.
Although matrix multiplication is associative, i.e., A(BC) = (AB)C, show that

$$ N_{\mathrm{ops}}(A(BC)) \neq N_{\mathrm{ops}}((AB)C), \quad \text{in general.} \tag{2} $$

That is, $N_{\mathrm{ops}}(m, n, p, q)$ depends on the order in which the product ABC is formed or parenthesized.
3. Write a Matlab function, function C = MatMult(A,B), that implements the definition above.
Test and compare this function with Matlab’s C = A*B for random square matrices of size
n = 250, 500, 1000.
4. Find 3 sets of values {m, n, p, q} that demonstrate clearly that the inequality (2) above is true
in general. Use both your function MatMult and Matlab’s C = A*B in this demonstration and
compare the results.
5. Calculate the Mflop/s rate for each of the tests above. Remember to give the machine parameters
with these rates.
Note: When timing the operations above use Matlab’s cputime. Here is the help for this function:

CPUTIME returns the CPU time in seconds that has been used
by the MATLAB process since MATLAB started.

For example:
t=cputime; your_operation; cputime-t
returns the cpu time used to run your_operation.

The return value may overflow the internal representation
and wrap around.

See also ETIME, TIC, TOC, CLOCK

derek o’connor – October 10, 2008.


Lab. Exer. 6. Complexity of Matrix Operations 6.2

Solution Notes for Lab. Exercise No. 6.

Analysis
Standard matrix multiplication, C = AB, where A is m × n, B is n × p, and C is m × p, is as follows:

$$ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p. $$

There are m × p elements $c_{ij}$ and each requires the summation $a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}$, which
requires n mults and n − 1 adds. Hence we get a total of mp(2n − 1) = 2mnp − mp = O(mnp) operations.
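
The count can be checked by tallying the operations directly. The script below is only a sketch: the sizes m, n, p are arbitrary illustrative values, and the variable names mults, adds and nops are not part of the original notes.

```matlab
% Sketch: count the mults and adds in the definition, compare with 2mnp - mp.
m = 3; n = 4; p = 5;              % small illustrative sizes (assumption)
mults = 0; adds = 0;
for i = 1:m
    for j = 1:p
        mults = mults + 1;        % first term a(i,1)*b(1,j) needs no add
        for k = 2:n
            mults = mults + 1;    % a(i,k)*b(k,j)
            adds  = adds  + 1;    % accumulate into the running sum
        end
    end
end
nops = mults + adds;
fprintf('mults = %d, adds = %d, total = %d, formula = %d\n', ...
        mults, adds, nops, 2*m*n*p - m*p);
```

The loop version of MatMult given later in these notes starts each sum at 0.0 and so performs n adds per element rather than n − 1, giving 2mnp operations; the difference does not affect the O(mnp) result.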
The matrix triple multiplication operation D = ABC, where A is m × n, B is n × p, C is p × q, and
D is m × q, is defined in terms of the matrix pair multiplication above. This gives two possible orders of
multiplication:

$$ D_1 = (AB)C \quad \text{or} \quad D_2 = A(BC). $$

Mathematically, $D_1$ and $D_2$ are identical, but computationally they are not. Using O(mnp) for matrix pair
multiplication we have

$$ N_{\mathrm{ops}}(A(BC)) = N_{\mathrm{ops}}(R = BC) + N_{\mathrm{ops}}(D = AR) = O(npq) + O(mnq) $$

$$ N_{\mathrm{ops}}((AB)C) = N_{\mathrm{ops}}(R = AB) + N_{\mathrm{ops}}(D = RC) = O(mnp) + O(mpq) $$

We will drop the O(·) formalism and simply say that

$$ N_{\mathrm{ops}}(A(BC)) = N_1 = npq + mnq \quad \text{and} \quad N_{\mathrm{ops}}((AB)C) = N_2 = mnp + mpq. $$

It is very difficult to say in general when these two functions have different or equal values. A crude way of
getting some idea is to run this program

%=============== [kl,ke,kg, Equal] = MNPQ(low, high) ===============%
% When do R1 = A*(B*C) and R2 = (A*B)*C have different op counts?
% N1 = npq+mnq   N2 = mnp+mpq
% The matrices A,B,C are m*n, n*p, p*q.
% kl, ke, kg count the 4-tuples with N1 < N2, N1 == N2, N1 > N2.
%===================================================================%
function [kl,ke,kg, Equal] = MNPQ(low, high)
%===================================================================%
Less  = zeros((high-low+1)^4,4);  Equal = zeros((high-low+1)^4,4);
Great = zeros((high-low+1)^4,4);  kl = 0; ke = 0; kg = 0;

for m = low:high
    for n = low:high
        for p = low:high
            for q = low:high
                N1 = n*p*q + m*n*q;
                N2 = m*n*p + m*p*q;
                if N1 < N2
                    kl = kl + 1;
                    Less(kl,:) = [m n p q];
                elseif N1 == N2
                    ke = ke + 1;
                    Equal(ke,:) = [m n p q];
                else
                    kg = kg + 1;
                    Great(kg,:) = [m n p q];
                end;
            end;
        end;
    end;
end;
Equal = Equal(1:ke,:);   % return only the rows that were filled
%-------------------------- End of MNPQ(low, high) ------------------%


Running this program for low = 1 and high = 5 gives a total of $5^4 = 625$ 4-tuples (m, n, p, q), of which 290
give $N_1 < N_2$, 45 give $N_1 = N_2$, and 290 give $N_1 > N_2$. Here are the 45 for which $N_1 = N_2$.
m n p q     m n p q     m n p q     m n p q     m n p q
1 1 1 1     2 1 1 2     3 1 1 3     4 1 1 4     5 1 1 5
1 1 2 2     2 2 1 1     3 2 2 3     4 2 2 4     5 2 2 5
1 1 3 3     2 2 2 2     3 3 1 1     4 3 3 4     5 3 3 5
1 1 4 4     2 2 3 3     3 3 2 2     4 4 1 1     5 4 4 5
1 1 5 5     2 2 4 4     3 3 3 3     4 4 2 2     5 5 1 1
1 2 2 1     2 2 5 5     3 3 4 4     4 4 3 3     5 5 2 2
1 3 3 1     2 3 3 2     3 3 5 5     4 4 4 4     5 5 3 3
1 4 4 1     2 4 4 2     3 4 4 3     4 4 5 5     5 5 4 4
1 5 5 1     2 5 5 2     3 5 5 3     4 5 5 4     5 5 5 5
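
As a side remark, the pattern in this list can be read off from the two counts themselves. The short derivation below is a sketch that uses only the definitions of N1 and N2 given earlier; dividing by mnpq is valid because all four dimensions are positive.

```latex
\begin{aligned}
N_1 < N_2
  &\iff nq\,(p + m) < mp\,(n + q) \\
  &\iff \frac{p + m}{mp} < \frac{n + q}{nq}
        \qquad \text{(divide both sides by } mnpq > 0\text{)} \\
  &\iff \frac{1}{m} + \frac{1}{p} < \frac{1}{n} + \frac{1}{q}.
\end{aligned}
```

So the two orders tie exactly when 1/m + 1/p = 1/n + 1/q. Every tuple in the list has either (m, p) = (n, q) or (m, p) = (q, n), which satisfies this condition; for a larger range other ties can appear, for example (m, n, p, q) = (3, 2, 3, 6).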

The main point here is that for most 4-tuples (m, n, p, q) we have $N_1 \neq N_2$, which prompts the question:
How do we decide on A(BC) or (AB)C? The obvious answer is to calculate $N_1$ and $N_2$ before we do the
computations.
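
A minimal sketch of that idea follows. The sizes are those of one row of the Part 4 tests later in these notes; the random matrices and the variable names are illustrative assumptions, not part of the original solution.

```matlab
% Sketch: pick the cheaper order for D = A*B*C from the sizes alone.
m = 50; n = 2; p = 1000; q = 600;    % sizes from one row of the Part 4 tests
A = rand(m,n); B = rand(n,p); C = rand(p,q);
N1 = n*p*q + m*n*q;                  % cost of A*(B*C)
N2 = m*n*p + m*p*q;                  % cost of (A*B)*C
if N1 <= N2
    D = A*(B*C);                     % form the n x q product B*C first
else
    D = (A*B)*C;                     % form the m x p product A*B first
end
fprintf('N1 = %d, N2 = %d\n', N1, N2);
```

Computing N1 and N2 costs a handful of scalar operations, which is negligible next to either matrix product.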

Calculating $A^k$. The naïve calculation $R = A \times (A \times (\cdots \times A(A \times A)) \cdots)$ requires $(k-1)n^3$ ops. If
$k = 2^p$ there is a better way:

$$ A^2 = A \times A, \quad A^4 = A^2 \times A^2, \quad \ldots, \quad A^{2^p} = A^{2^{p-1}} \times A^{2^{p-1}}, $$

which requires p multiplications or $p n^3 = (\log_2 k)\, n^3$ ops. The Matlab program is trivial

R = A; for i = 1 : log2(k), R = R * R; end;

The naïve program is

R = A; for i = 2 : k, R = A * R; end;

Exercise 6.0.2 : Re-write the first program to handle the general case, i.e., when k is not a power of 2.
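
One possible approach to this exercise is repeated squaring driven by the binary digits of k. The script below is a sketch, not the author's solution; the test matrix, the value k = 13 and the names S, e and nmults are illustrative assumptions.

```matlab
% Sketch: R = A^k for any integer k >= 1 by repeated squaring.
A = [1 1; 0 1];  k = 13;             % illustrative inputs (assumption)
R = eye(size(A));                    % running result
S = A;                               % holds A^(2^j) at step j
e = k;                               % remaining exponent
nmults = 0;                          % matrix multiplications performed
while e > 0
    if mod(e,2) == 1
        R = R*S;  nmults = nmults + 1;   % this binary digit of k is 1
    end
    e = floor(e/2);
    if e > 0
        S = S*S;  nmults = nmults + 1;   % square for the next digit
    end
end
disp(R); disp(nmults);
```

This uses floor(log2(k)) squarings plus one product for each binary digit of k that is 1, so at most 2*floor(log2(k)) + 1 matrix products (the first of which is a product with the identity), compared with k − 1 for the naïve program.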

Matrix-Chain Multiplication.
Calculating $A_1 A_2 \cdots A_k$, where $A_i$ is an $m_{i-1} \times m_i$ matrix, is not an easy extension of the 3-matrix case.
Consider $A_1 A_2 A_3 A_4$. This can be parenthesized in 5 different ways:

$$ ((A_1 A_2)(A_3 A_4)) \quad (A_1((A_2 A_3)A_4)) \quad (A_1(A_2(A_3 A_4))) \quad (((A_1 A_2)A_3)A_4) \quad ((A_1(A_2 A_3))A_4). $$

Let C(k) be the number of ways to parenthesize the matrix-chain $A_1 A_2 \cdots A_k$, with C(1) = 1. Let us put the outermost
split between $A_i$ and $A_{i+1}$: $(A_1 A_2 \ldots A_i)(A_{i+1} \ldots A_k)$. There are C(i) ways to parenthesize the left part, which has i matrices, and
C(k − i) ways to parenthesize the right part, which has k − i matrices. Now any parenthesization of the left part may be combined
with any parenthesization of the right part and so there are C(i)C(k − i) ways of doing this. Now i can have
any value between 1 and k − 1. Hence we must sum C(i)C(k − i) for all i to get, for k ≥ 2,

$$ C(k) = \sum_{i=1}^{k-1} C(i)\, C(k-i). $$

The number of parenthesizations of the matrix chain $A_1 A_2 \cdots A_k$ is the Catalan Number

$$ C(k) = \frac{1}{k} \binom{2k-2}{k-1} = \Omega(4^k / k^2). $$

Catalan Numbers

k      1   2   3   4   5    6    7     8     9      10     15
C(k)   1   1   2   5   14   42   132   429   1430   4862   2674440
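
The table can be regenerated from the recurrence and cross-checked against the closed form. This is a small sketch; kmax and the array names are illustrative, and nchoosek is Matlab's built-in binomial coefficient.

```matlab
% Sketch: C(k) from the recurrence, cross-checked against the closed form.
kmax = 15;
C = zeros(1,kmax);
C(1) = 1;                            % a single matrix has one parenthesization
for k = 2:kmax
    for i = 1:k-1
        C(k) = C(k) + C(i)*C(k-i);   % split after the i-th matrix
    end
end
closed = zeros(1,kmax);
for k = 1:kmax
    closed(k) = round(nchoosek(2*k-2, k-1)/k);
end
disp([1:kmax; C; closed]');
```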


Consider the following example, taken from Cormen, Leiserson and Rivest, page 307,

Matrix-Chain Multiplication

$A_1$: 30 × 35    $A_2$: 35 × 15    $A_3$: 15 × 5    $A_4$: 5 × 10    $A_5$: 10 × 20    $A_6$: 20 × 25

There are 42 different ways to parenthesize $A_1 A_2 \cdots A_6$. The optimum can be found by dynamic programming
in $O(k^3)$ time. The optimum parenthesization is

$$ ((A_1 (A_2 A_3)) ((A_4 A_5) A_6)), $$

which gives

$$ (30 \times 35 \times 5 + 35 \times 15 \times 5) + 30 \times 5 \times 25 + (5 \times 10 \times 20 + 5 \times 20 \times 25) = 15{,}125 \text{ ops.} $$

Most compilers/interpreters do not optimally parenthesize and would calculate

$$ (((((A_1 A_2) A_3) A_4) A_5) A_6), $$

which requires 40,500 ops.
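
The dynamic-programming calculation can be sketched in a few lines. This follows the standard textbook formulation rather than any code from these notes; the names d, cost, split and left are illustrative, and "ops" here counts scalar multiplications only, as in the figures above.

```matlab
% Sketch: minimum-cost parenthesization by dynamic programming.
d = [30 35 15 5 10 20 25];           % A_i is d(i) x d(i+1)
k = numel(d) - 1;                    % number of matrices in the chain
cost  = zeros(k,k);                  % cost(i,j): min mults for A_i ... A_j
split = zeros(k,k);                  % split(i,j): best position of outer split
for len = 2:k                        % length of the sub-chain
    for i = 1:k-len+1
        j = i + len - 1;
        cost(i,j) = Inf;
        for s = i:j-1                % try (A_i..A_s)(A_{s+1}..A_j)
            c = cost(i,s) + cost(s+1,j) + d(i)*d(s+1)*d(j+1);
            if c < cost(i,j)
                cost(i,j) = c;  split(i,j) = s;
            end
        end
    end
end
left = 0;                            % cost of plain left-to-right evaluation
for i = 2:k
    left = left + d(1)*d(i)*d(i+1);
end
fprintf('optimal = %d, left-to-right = %d, outer split after A%d\n', ...
        cost(1,k), left, split(1,k));
```

The three nested loops over len, i and s give the O(k^3) running time; the split table lets the optimal parenthesization be read back recursively.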

Tests, Part 3
The test below gave the following results :

Matlab 6.5 vs MatMult. (P III Xeon 800MHz)

               n = 250   n = 500   n = 1000
Matlab          0.046     0.5        3.812
MatMult         1.843    16.235    130.125
ratio mm/ml    40.0      32.5       34.0

%==================== TestMult(sizes) =============%
% Tests and compares Matlab's matrix multiplication
% against the loop implementation of the standard
% definition of matrix multiplication
%==================================================%
function [tmatlab, tmatmult] = TestMult(sizes)
%==================================================%
[m ndims] = size(sizes);
tmatlab = zeros(1,ndims);
tmatmult = zeros(1,ndims);
for n = 1:ndims
    A = rand(sizes(n),sizes(n));
    B = rand(sizes(n),sizes(n));
    C = zeros(sizes(n),sizes(n));
    tstart = cputime;
    C = A*B;
    tmatlab(n) = cputime - tstart;
end;
for n = 1:ndims
    A = rand(sizes(n),sizes(n));
    B = rand(sizes(n),sizes(n));
    C = zeros(sizes(n),sizes(n));
    tstart = cputime;
    C = MatMult(A,B);
    tmatmult(n) = cputime - tstart;
end;
%---------------- End of TestMult ----------------%

Three versions of MatMult were used in these tests to see the effect of vectorization. These are shown in
one function below with the variations as comments.

%============= C = MatMult 0,1,2(A,B) ============%
%
% Variations of the original are shown in comments
%
%==================================================%
function C = MatMult(A,B)
%==================================================%
[m,n] = size(A); [p,q] = size(B);
if n ~= p
    error('Matrix sizes incompatible');
end;
p = q; % makes parameters same as notes.
C = zeros(m,p);
for i = 1:m
    for j = 1:p
        sum = 0.0;
        for k = 1:n
            sum = sum + A(i,k)*B(k,j);
        end;
        C(i,j) = sum;
    end;
end;
%------------------ Version 1 --------------------%
% for i = 1:m
%     for j = 1:p
%         C(i,j) = C(i,j) + A(i,:)*B(:,j);
%     end;
% end;
%------------------ Version 2 --------------------%
% for i = 1:m
%     C(i,:) = C(i,:) + A(i,:)*B;
% end;
%---------------- End of MatMult1 ----------------%
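
Part 5 of the exercise asks for Mflop/s rates. A rough sketch of that calculation, using the times reported in the first Part 3 table and the count 2mnp − mp with m = n = p, is given here; the variable names are illustrative and the rates are only as accurate as the reported times.

```matlab
% Sketch: Mflop/s rates from the times reported in the first Part 3 table.
n        = [250 500 1000];
tmatlab  = [0.046 0.5 3.812];        % seconds, Matlab's A*B (from the table)
tmatmult = [1.843 16.235 130.125];   % seconds, loop-based MatMult (from the table)
nops     = 2*n.^3 - n.^2;            % 2mnp - mp with m = n = p
rate_matlab  = nops ./ tmatlab  / 1e6;   % Mflop/s
rate_matmult = nops ./ tmatmult / 1e6;   % Mflop/s
disp([n; rate_matlab; rate_matmult]');
```

For n = 1000 this gives roughly 524 Mflop/s for Matlab's built-in product and roughly 15 Mflop/s for the loop version, on the Pentium III Xeon 800MHz machine quoted with the table.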


The tables below give the results :

Matlab 6.5 vs MatMult.

               n = 250   n = 500   n = 1000
Matlab          0.062     0.500      3.828
MatMult 0       1.875    16.250    130.270
MatMult 1       2.156    11.343     66.157
MatMult 2       0.313     2.375     18.625
ratio m0/ml    30.242    33.574     34.030
ratio m1/ml    45.872    23.388     17.282
ratio m2/ml     5.048     4.907      4.865

Time in secs. on a Pentium III Xeon 800MHz 640MB Ram Dell Precision Workstation 620 Windows 2000

These test results show two things : (i) vectorization is very important in Matlab, as can be seen
in the reduction of the MatMult/Matlab ratio from 34 to 4.865; (ii) you cannot beat Matlab when
programming in Matlab, so we should not be too surprised by these ratios.

Matlab 6.5 now uses the latest linear algebra kernels (Atlas, I believe) which are highly-tuned
assembly language primitives for matrix-vector and matrix-matrix multiplication. Inside Matlab we do
not have access to these kernels, except indirectly through vectorization.

Can Matlab be beaten? Yes. O-Matrix at €50 beats Matlab by a good margin (I bet they use
the same LA kernels). Any properly-written Fortran program compiled with DVF (CVF, H-PVF)
and linked with Intel’s latest Math Kernel Library should come close to or better than Matlab.

Benchmarks on 1000 × 1000 Matrix A

Operations on A     Matlab    O-Matrix
1000 × 1000         6.5       5.5
Multiply A ∗ A        3.9       3.1
Invert A^(−1)         5.0       3.4
LU Decomp             1.8       1.3
SVD(A)              106.2      70.4
QR(A)                11.9      13.5
Eigen(A)            113.8      60.0
Det(A)                1.7       1.3
Cond(A)              16.3      15.2
Rank(A)              16.3      15.2

Time in secs. Pentium III Xeon 800MHz 640MB RAM. Dell Precision Workstation 620. Windows 2000

Tests, Part 4

%================ Test3MatChain(dims) =============%
% Tests and compares two different orders of
% multiplying 3 matrices.
% R = ((A*B)*C) and R = (A*(B*C))
% It uses Matlab's and MatMult2's matrix mults.
%==================================================%
function [tmatlab, tmatmult] = Test3MatChain(sizes)
%==================================================%
tmatlab = zeros(1,3);
tmatmult = zeros(1,3);
A = rand(sizes(1),sizes(2));
B = rand(sizes(2),sizes(3));
C = rand(sizes(3),sizes(4));
R = zeros(sizes(1),sizes(4));
%---------- Do Matlab Test ------------------------%
tstart = cputime;
R = (A*B)*C;
tmatlab(1) = cputime - tstart;
tstart = cputime;
R = A*(B*C);
tmatlab(2) = cputime - tstart;
tmatlab(3) = tmatlab(1)/tmatlab(2);
%---------- Do MatMult2 Test ----------------------%
tstart = cputime;
R = MatMult2(MatMult2(A,B),C);
tmatmult(1) = cputime - tstart;
tstart = cputime;
R = MatMult2(A, MatMult2(B,C));
tmatmult(2) = cputime - tstart;
tmatmult(3) = tmatmult(1)/tmatmult(2);
%---------------- End of Test3MatChain ------------%

Matrix Chain Multiplication

m     n      p      q                  (A*B)*C   A*(B*C)   ratio
50    250    1000   600    Matlab       0.187     0.610     0.307
                           MatMult2     0.781     2.766     0.282
50    2      1000   600    Matlab       0.141     0.031     4.548
                           MatMult2     0.531     0.032    16.594
500   2      1000   600    Matlab       1.188     0.046    25.826
                           MatMult2     5.375     0.063    85.317
5     2000   1000   60     Matlab       0.109     0.531     0.205
                           MatMult2     0.188     2.656     0.071

Time in secs. on a Pentium III Xeon 800MHz 640MB Ram Dell Precision Workstation 620 Windows 2000

The table above demonstrates the importance of order in a chain of matrix multiplications.
Incidentally, a test showed that in Matlab, R = A*B*C is evaluated as R = (A*B)*C. This left-to-right
evaluation is standard in most languages.
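
For comparison with the measured ratios, the ratio N2/N1 predicted by the operation counts can be computed for the same four sets of sizes. This is a sketch under the assumption that run time is proportional to the op count, which ignores memory and cache effects; the variable names are illustrative.

```matlab
% Sketch: op-count prediction of time((A*B)*C) / time(A*(B*C)) for the Part 4 sizes.
dims = [ 50  250 1000 600;
         50    2 1000 600;
        500    2 1000 600;
          5 2000 1000  60];
m = dims(:,1); n = dims(:,2); p = dims(:,3); q = dims(:,4);
N2 = m.*n.*p + m.*p.*q;              % ops for (A*B)*C
N1 = n.*p.*q + m.*n.*q;              % ops for A*(B*C)
predicted = N2 ./ N1;
disp([dims predicted]);
```

In each of the four cases the predicted ratio falls on the same side of 1 as both measured ratios, so the op counts pick the faster order, although the measured magnitudes differ from the predictions.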
