Uzi Vishkin wrote these ideas on how to teach a parallel computing course as a comment on my earlier parallelism post.
The basic claim is that:
It does not make sense to have a new platform of general-purpose parallel computing succeed the established serial platform
without having a one-to-one match of EVERYTHING, including algorithms and data structures.
In particular, it does not make sense to teach parallel programming without teaching parallel algorithms and data
structures. The gap between programming and algorithms must be bridged, so that the continuum from algorithms and data-structures
to programming will resemble as much as possible the continuum in serial computing.
Since the PRAM theory is the only serious candidate developed in nearly 3 decades of research, PRAM algorithms have got to
be taught.
I expect theorists to endorse this argument and use it to convince their colleagues that PRAM algorithms need to be taught. But,
I have to be frank. I am concerned that some of us will do the following: teach a course on parallel algorithms as a purely
theory course WITHOUT any connection to programming. This will miss the point as it ignores the need to relate algorithms to
programming. The Q&A at the end of this text elaborate further on the programming issue.
As others have implied, you can find several fine sources for PRAM algorithms. For this reason, my comments below mostly focus on
a way to address the parallel programming issue:
In class presentation.
Read Section 2.1 entitled XMTC in FPGA-Based
Prototype of a PRAM-On-Chip Processor.
It reviews a modest extension to the C programming language called XMTC that allows PRAM-like programming. XMTC essentially adds
only 2 basic commands to C: Spawn and PS (for prefix-sum).
Devote a total of around 15-20 minutes similar to slides 37-39 in these slides to
present XMTC. Slide 40 can guide a discussion.
Supporting documentation. The students should then be referred to:
the XMTC Manual and the
XMTC tutorial.
Programming assignments. Please look up under assignments on this course page.
Running programming assignments. The UMD PRAM-On-Chip project is on track for public release by the end of June 2008 of:
a cycle accurate simulator of the PRAM-On-Chip machine, and
a compiler from XMTC to that machine.
The will allow your students to run XMTC code on an emulated 64-processor PRAM-On-Chip machine. To remind you, a hardware
prototype of such a machine (using FPGA technology) has been in use at UMD since January 2007. A compiler that translates XMTC to
OpenMP will also be released, giving your students an alternative way to run their assignments.
Finally, please note that this type of programming cannot be too difficult. I have given a 1-day parallel algorithms tutorial to
a dozen high school students in Fall 2007 and subsequently some of them managed to do on their own 8 programming assignments. In
fact, the above link to programming assignments gives these 8 programming assignments. The only help the high school student got
was one office hour per week by an undergraduate teaching assistant. They did not get any school credit for their work. Their
participation was in the context of a computer club after completing their regular school work (8 periods per day).
If you are looking for code examples, you are welcome to write to me.
Here are some Q&A:
Q: I never learned parallel programming formally, but I picked up some ideas in my free time from Java/MPI/OpenMP/etc. How do any
of these relate to XMTC parallel programming?
A: XMTC parallel programming is simpler and different.
Q: The problem of algorithms being taught independently of programming is present within the exclusively serial world. What would
you say to the many theorists who are resistant to the idea of having a heavy programming component in their courses?
A: IMHO the serial case is completely different. Most students have experienced/learned serial programming BEFORE taking the
serial algorithms course. This is NOT the case for parallel programming. My experience is that students learn best if parallel
programming is coupled with parallel algorithms. The main difference is that the parallel algorithms course is where parallel
programming should be FIRST taught. The reason is that parallelism requires introduction of some first principles representing an
"alien culture" to students. In contrast, serial computing is related to: (i) mathematical induction, (ii) the way our brain
instructs our body (as a single processor), etc. There is nothing out there that prepares us for parallel computing.
Q: What text do you use for teaching parallel algorithms?
I'm excited to see someone suggesting that teaching theory with a programming component is a good idea, at least for parallel programming. Though I think that some of the same arguments (and other good ones) also suggest doing at least some programming in a "serial" theory class also.
PRAM doesn't scale. Even serial machines today have memory access distributed into banks. Don't spend too much time on it.
The backbone of most parallel algorithms today is parallel prefix. Aggregating and distributing information in time logarithmic to your number of processors is crucial.
As for programming libraries MPI is *the* standard.
As for theory, cover the classes NC and P-complete. Very few texts today point out that there are problems serial in nature.
I would probably assign Vipin Kumar's book and Pacheco's MPI book.
There has unfortunately been a disconnect between the theory community and the community of people who actually engage in large-scale parallel computing applications. The primary reason for doing parallelism is for doing large-scale problems, and the architectures that have proved most successful in practice are the shared-nothing architectures. The reasons are primarily economic and physical - nobody has figured out how to build anything resembling a PRAM at large scale that will compete with shared-nothing architectures on price/performance or peak performance on real problems.
Now that we are starting to see a lot of multi-core chips, there should be renewed interest in discussion of parallel algorithms in the PRAM model. This won't come close to the needs of the truly high performance computing world, but that doesn't mean it should be taught. It would probably be more useful to focus on aspects of parallel computing that downplay asymptotic analysis on PRAMS with large numbers of processors, because this is the area where the PRAM has been displaced by shared-nothing architectures.
Perhaps there are opportunities for theoreticians to explain whether there are fundamental physical limitations on RAM architectures, or whether we just haven't invented the right hardware. I don't know the answer there, but there seems to be a disconnect between theory and practice on this subject.
There are important reasons why PRAMs failed as a model of parallelism, and Uzi fails to address them. One only needs to read the long of list of papers on alternatives to the PRAM to see why the PRAM was not implementable in practice.
Twenty years later finally we have parallelism in practice in the form of multicore computers and they look nothing like the PRAMs of yore.