Friday, September 11, 2009

MSR Internship is over - Going back to IU

Today I finished my three-month internship at Microsoft Research. It was quite a wonderful experience for me, and I was able to accomplish most of my internship goals.

At the beginning of my internship I was given the following goals.
Evaluate the usability of DryadLINQ for scientific analyses
– Develop a series of scientific applications using DryadLINQ
– Compare them with similar MapReduce implementations (e.g. Hadoop)
– Run the above DryadLINQ applications on the cloud

During the internship, I developed four DryadLINQ applications, optimized them for performance, and identified several improvements to the current DryadLINQ code base.

I did a detailed performance analysis of the Cap3, HEP, and Kmeans applications developed using DryadLINQ, comparing them with Hadoop implementations of the same applications. The performance of the pairwise distance calculation application was compared with an MPI implementation of the same application. These findings were all included in the following two papers.
DryadLINQ for Scientific Analyses
Cloud Technologies for Bioinformatics Applications

My fellow intern, Atilla Balkir, and I were able to deploy a Windows HPC cluster on the GoGrid cloud. I was able to run the Cap3 application on the cloud, but the other applications did not work due to limitations of the GoGrid infrastructure.

Overall, we reached the following conclusions regarding the DryadLINQ runtime.
  • We developed six DryadLINQ applications with various computation, communication, and data access requirements
    All DryadLINQ applications work, and in many cases perform better than Hadoop
  • We can definitely use DryadLINQ for scientific analyses
  • We did not find (or implement) any applications that can be implemented using DryadLINQ but not with typical MapReduce
  • The current release of DryadLINQ has some performance limitations
  • DryadLINQ hides many aspects of parallel computing from the user
    Coding is much simpler in DryadLINQ than in Hadoop (provided that the performance issues are fixed)
  • More simplicity comes with less control, and sometimes it is hard to fine-tune
  • We showed that it is possible to run DryadLINQ on the cloud

I got all the necessary support from my mentor (Nelson Araujo), Christophe, and the ARTS team @ MSR in accomplishing the objectives of my internship. I would also like to thank the Dryad team at Silicon Valley for their dedicated support. Last but not least, the support from my advisor (Prof. Geoffrey Fox) and the SALSA team at Pervasive Technology Labs was a tremendous encouragement to me.

On Sunday we are planning to head back to Indiana with our two-week-old baby - our small miracle - in our arms.
