by Özalp Babaoglu , Lorenzo Alvisi , Alessandro Amoreso , Renzo Davoli , Davoli Luigi , Luigi Alberto Giachini
Just found this paper and read it to the end since I noticed some similarities of what they have proposed in 1992 and the current MapReduce programming model and some of the observations are still true for today as well. I will list few of the observations/assumptions they have made showing the similarity of their work and the current MapReduce programming model.
- Large-grain data flow model suitable for high-latency low bandwidth networks
- Only by keeping the communication-computation ratio to reasonable levels can we expect reasonable performance from parallel applications in such system. – We noticed a similar thing with performing parallel computing in Cloud infrastructures [paper]
- Paralex functions mush be “pure” – no side effects
- Node corresponds to computations (functions, procedures, programs) and links indicate flow of typed data – Compare this with Microsoft Dryad’s DAG based programming model.
Some of their performance measures had issues with 16MB data set because the memory they had in one of the machines was only 16 MB. Today we have the luxury of using large memories but our data sets are also grown into petabytes. What they did with NFS is now done in HDFS in Hadoop.