Monday, October 01, 2007

October 3rd Report

Had a meeting with Prof. Geoffrey about the Ph.D. topic that I should select. He advised me to find more use cases of data analysis tasks that are similar to the particle physics data analysis that we are doing using Clarens and ROOT.

The data analysis tasks that we handled have the following characteristics.

  • Data is in large files and these files are distributed across the globe.
  • One or more analysis techniques (in our case, one analysis script) can be applied to all the data to identify patterns.
  • The outcome of an analysis of a single file is a histogram.
  • These outcomes (histograms) can be merged to produce the final results.
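
The characteristics above amount to a "analyze each file, then merge" pattern. Here is a minimal sketch of it in plain Python, not our actual ROOT script: the file names, the sample values, the bin width, and the functions analyze_file and merge_histograms are all made up for illustration, and a histogram is modelled simply as a Counter mapping a bin's lower edge to a count.

```python
from collections import Counter


def analyze_file(values, bin_width=10):
    """Hypothetical per-file analysis: bin the values into a histogram.

    The histogram is a Counter mapping each bin's lower edge to its count.
    """
    hist = Counter()
    for v in values:
        hist[int(v // bin_width) * bin_width] += 1
    return hist


def merge_histograms(histograms):
    """Merge per-file histograms by adding the counts bin by bin."""
    total = Counter()
    for h in histograms:
        total.update(h)  # Counter.update adds counts, it does not overwrite
    return total


# Made-up stand-ins for data files stored at different sites.
files = {
    "site_a.dat": [3, 12, 18, 25],
    "site_b.dat": [7, 14, 33],
}

partials = [analyze_file(values) for values in files.values()]
final = merge_histograms(partials)
print(sorted(final.items()))
```

Because the merge is just an addition of counts, the per-file results can be produced in any order and at any site, which is what makes this kind of analysis easy to distribute.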

So far I have found one strong use case of this nature:
Astronomical Image Processing - mainly for identifying features in astronomical images.

There are a few candidate areas that I found interesting:
Analysis of Earthquake Data
Microarray Analysis for Genes
Pattern Matching in Financial Data

Currently I am reading to find out the exact data analysis requirements of these fields. The target is to find more use cases for "Distributed Composable Data Analysis".

September 19th Report

Conrad tested the demo from CERN and it worked well, so now I can focus on the next step of the project.
Conrad also sent me a link to more ROOT data files, so as the next step I will test the demo with those new files. The first demo uses only a single rootlet, mainly because of the way the Clarens client is written. Each analysis request is processed synchronously, so the client sends requests to the server one by one, one for each ROOT data file to be analyzed.

As the next step of the project, I am planning to run the client with multiple processes and allow it to create multiple rootlets so that the analysis can be performed simultaneously, utilizing the full CPU power. Hope to get the results soon.
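
A rough sketch of the multi-process idea, using only the Python standard library and not the real Clarens client API. Here analyze_file is a hypothetical stand-in for "send one analysis request and wait for the histogram", and analyze_all, the workers count, and the executor_cls parameter are names I made up for illustration.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def analyze_file(values, bin_width=10):
    """Hypothetical stand-in for one synchronous analysis request.

    In the real setup this would ask the server to analyze one data file;
    here it just bins a list of numbers into a Counter histogram.
    """
    hist = Counter()
    for v in values:
        hist[int(v // bin_width) * bin_width] += 1
    return hist


def analyze_all(datasets, workers=4, executor_cls=ProcessPoolExecutor):
    """Run analyze_file on every dataset concurrently and merge the results.

    executor_cls is assumed to be ProcessPoolExecutor (one worker process
    per request) or ThreadPoolExecutor (a drop-in alternative when the
    workers mostly wait on a remote server).
    """
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        # map() hands out one dataset per worker and yields results in order.
        for partial in pool.map(analyze_file, datasets):
            total.update(partial)
    return total
```

With the default process pool, the call to analyze_all should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Since each request is still synchronous inside its worker, the client code itself barely changes; only the loop over files is replaced by the pool.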