Twister is a lightweight MapReduce runtime we have developed by incorporating these enhancements. We have published several scientific papers [1-5] explaining the key concepts and comparing it with other MapReduce implementations such as Hadoop and DryadLINQ. Today we would like to announce its first release.
Key Features of Twister are:
- Distinction between static and variable data
- Configurable long running (cacheable) map/reduce tasks
- Pub/sub messaging based communication/data transfers
- Combine phase to collect all reduce outputs
- Efficient support for iterative MapReduce computations (significantly faster than Hadoop or DryadLINQ)
- Data access via local disks
- Lightweight (5600 lines of code)
- Tools to manage data
Thank you,
SALSAHPC Team.
[1]. Jaliya Ekanayake, (Advisor: Geoffrey Fox) Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing, Doctoral Showcase, SuperComputing2009.
[2]. Jaliya Ekanayake, Atilla Soner Balkir, Thilina Gunarathne, Geoffrey Fox, Christophe Poulain, Nelson Araujo, Roger Barga, DryadLINQ for Scientific Analyses, Fifth IEEE International Conference on e-Science (eScience2009), Oxford, UK.
[3]. Jaliya Ekanayake, Xiaohong Qiu, Thilina Gunarathne, Scott Beason, Geoffrey Fox, High Performance Parallel Computing with Clouds and Cloud Technologies, Technical Report, August 25, 2009, to appear as a book chapter.
[4]. Geoffrey Fox, Seung-Hee Bae, Jaliya Ekanayake, Xiaohong Qiu, and Huapeng Yuan, Parallel Data Mining from Multicore to Cloudy Grids, High Performance Computing and Grids workshop, 2008. – An extended version of this paper goes to a book chapter.
[5]. Jaliya Ekanayake, Shrideep Pallickara, Geoffrey Fox, MapReduce for Data Intensive Scientific Analyses, Fourth IEEE International Conference on eScience, 2008, pp.277-284.
4 comments:
Just saw this mailing list discussion in the Apache Mahout project regarding Twister.
http://old.nabble.com/Fwd:-Twister:-Iterative-MapReduce-td27535589.html
They thought we were competing. Cool.
Just thought I had better clarify some of the issues they mentioned regarding Twister.
1. No fault tolerance.
Yes, but this applies only to this release. Fault tolerance will be ready soon for iterative MapReduce applications.
2. Clusters are small.
We have tried Twister on clusters of up to 32 nodes with 256 CPU cores. Soon we will try it on a 768-core cluster. These may be small compared to Google or Yahoo, but when we look at the spectrum of parallel applications, I would say these are modest clusters.
Hi,
I've set up Twister 0.9 on the cluster at my school. I am trying to figure out the case where the fault tolerance kicks in. As far as I can see, there is no fault tolerance for the case where one of the mappers dies after starting execution. The job just hangs. Is this correct? If so, what does the fault tolerance that you talk about in the paper refer to? By the way, I tried 2, 4 and 16 replicas of each split for a 16-node job. The job still does not complete.
thanks
To Elif,
First of all, let me verify that the setup is correct.
1. Did you enable fault tolerance using JobConf?
jobConf.setFaultTolerance();
2. Did you try an iterative application or a normal MapReduce application?
Twister provides fault tolerance only for iterative applications. What it does is restart the failed iteration.
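To illustrate what "restart the failed iteration" means, here is a minimal sketch in plain Java. It is not Twister code and uses no Twister API: the class `IterationRetrySketch`, the method `runWithRetry`, and the `maxRetries` parameter are all hypothetical names made up for this example. The idea it shows is that the driver keeps the state from the last successful iteration and simply reruns the iteration that failed.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch only: none of these names come from the Twister API.
final class IterationRetrySketch {

    // Runs `iterations` steps. If a step throws, the same step is rerun from
    // the state produced by the last successful iteration, up to `maxRetries`
    // times. The step must return a new array rather than modify its input,
    // so that a failed attempt leaves the saved state untouched.
    static double[] runWithRetry(double[] initial, int iterations, int maxRetries,
                                 UnaryOperator<double[]> step) {
        double[] state = initial;
        for (int i = 0; i < iterations; i++) {
            int failures = 0;
            while (true) {
                try {
                    state = step.apply(state); // committed only on success
                    break;
                } catch (RuntimeException e) {
                    failures++;
                    if (failures > maxRetries) {
                        throw e; // give up on this iteration
                    }
                    // otherwise fall through and rerun the same iteration
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Each iteration doubles the value; the second call simulates a failure.
        double[] result = runWithRetry(new double[] {1.0}, 3, 2, s -> {
            calls[0]++;
            if (calls[0] == 2) {
                throw new IllegalStateException("simulated worker failure");
            }
            return new double[] {s[0] * 2};
        });
        System.out.println(result[0] + " after " + calls[0] + " calls");
    }
}
```

In this sketch, three iterations take four calls because the second iteration is run twice. A non-iterative job has no earlier iteration state to fall back on, which is consistent with fault tolerance being offered only for iterative applications.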
Hi,
I am currently comparing two implementations of PageRank: the one from Twister and another one.
My concern with Twister: at the end of the calculation, if I sum up all the ranks, shouldn't I get SUM = 1?
Instead I get '0.15663257775113015'