Problem Set 1 -- Matrix Multiplication
This assignment has two parts. The first part
involves programming and is due at 11:30PM on Tuesday, September 12.
The second does not involve programming and
is due at the beginning of class at 2:45PM on Wednesday, September 13.
For the first part of this problem set you will write optimized code
for doing matrix multiplication:
C = C + A*B,
where A, B, and C are all square matrices. (If you don't remember
how matrix multiplication works, this is a good time to review your
linear algebra notes.)
The obvious matrix multiplication code (consisting of three nested loops)
has a peak performance of under 140 MFlops on lucy, a machine with a
3.06 GHz Intel Xeon processor (the same processor as in
this supercomputer at the
US Army Research Lab). Furthermore, the performance seems to decrease
(slightly) as the problem size increases, and for some matrix sizes the
performance falls to about 20 MFlops (see the figure). This is terrible.
Fortunately, with your knowledge of uniprocessors, you have reason to
believe that you are personally capable of optimizing the code so it
achieves significantly better performance (while still computing the
right thing, of course).
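For reference, the "obvious" three-loop code looks roughly like the sketch below. This is not the file from the course distribution: the function names naive_dgemm and mflops are made up for illustration, and the sketch assumes the matrices are stored as row-major arrays of doubles, which may differ from the layout the test program uses. The MFlops helper assumes the usual count of 2*n^3 floating point operations (one multiply and one add per inner iteration) for an n-by-n multiply.

```c
#include <stddef.h>

/* Hypothetical name, not the provided implementation.
 * Computes C = C + A*B for n-by-n matrices.
 * Assumption: row-major storage, so element (i,j) is at index i*n + j. */
void naive_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Accumulate the dot product of row i of A and column j of B. */
            double cij = C[i * n + j];
            for (size_t k = 0; k < n; ++k)
                cij += A[i * n + k] * B[k * n + j];
            C[i * n + j] = cij;
        }
    }
}

/* Hypothetical helper: convert a measured time into MFlops,
 * assuming 2*n^3 floating point operations per multiply. */
double mflops(size_t n, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1.0e6;
}
```

Note that the innermost loop walks down a column of B with stride n, so for large n nearly every access to B touches a different cache line. That is one plausible reason the simple code slows down as the problem size grows, and it is a natural place to start thinking about optimizations.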
You can find a makefile, the test program, a basic 3-loop
implementation of matrix multiplication, and a gnuplot file for
generating plots of the timings here. That should
be enough to get you started on writing your own optimized matrix
multiplication code.
Many variations on this problem come up in practice. For this part
of the assignment you'll consider two that come up relatively
frequently. Describe how you
would change your optimization strategy from part I in each of
the following situations: