Sunday, January 03, 2010

Almost MapReduce in 1992

Paralex: An Environment for Parallel Programming in Distributed Systems (1992)

by Özalp Babaoglu , Lorenzo Alvisi , Alessandro Amoreso , Renzo Davoli , Davoli Luigi , Luigi Alberto Giachini

Just found this paper and read it to the end since I noticed some similarities of what they have proposed in 1992 and the current MapReduce programming model and some of the observations are still true for today as well. I will list few of the observations/assumptions they have made showing the similarity of their work and the current MapReduce programming model.

  • Large-grain data flow model suitable for high-latency low bandwidth networks
  • Only by keeping the communication-computation ratio to reasonable levels can we expect reasonable performance from parallel applications in such system. – We noticed a similar thing with performing parallel computing in Cloud infrastructures [paper]
  • Paralex functions mush be “pure” – no side effects
  • Node corresponds to computations (functions, procedures, programs) and links indicate flow of typed data – Compare this with Microsoft Dryad’s DAG based programming model.
Dryad paper has referred to their work. - I haven't noticed it before ;)

Some of their performance measures had issues with 16MB data set because the memory they had in one of the machines was only 16 MB. Today we have the luxury of using large memories but our data sets are also grown into petabytes. What they did with NFS is now done in HDFS in Hadoop.