The data analysis tasks that we handled have the following characteristics:
- Data is in large files and these files are distributed across the globe.
- One or more analysis techniques (in our case, one analysis script) can be applied to all the data to identify patterns.
- The outcome of an analysis of a single file is a histogram.
- These outcomes (histograms) can be merged to produce the final results.
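
To make the last two points concrete, here is a minimal Python sketch of why histograms compose. It is an illustration only, not our actual analysis script: `analyze_file`, `merge_histograms`, the bin width, and the sample values are all hypothetical. The one real assumption is that every site bins its data the same way, so that partial histograms can be merged by adding counts bin by bin.

```python
from collections import Counter
from functools import reduce

BIN_WIDTH = 10  # hypothetical bin width, assumed identical at every site


def analyze_file(values, bin_width=BIN_WIDTH):
    """Stand-in for the analysis script: histogram the values of one file.

    Each value is assigned to the bin whose lower edge is the largest
    multiple of bin_width that does not exceed it.
    """
    return Counter(int(v // bin_width) * bin_width for v in values)


def merge_histograms(a, b):
    """Merge two partial histograms by adding their counts bin by bin."""
    return a + b


# Three hypothetical "files", each imagined to live at a different site.
files = [
    [1, 4, 12, 15, 27],
    [3, 11, 19, 22],
    [8, 25, 29, 31],
]

# Each file is analyzed where it lives; only the small histograms travel.
partials = [analyze_file(f) for f in files]

# Merging is associative and commutative, so the order does not matter.
final = reduce(merge_histograms, partials, Counter())

print(sorted(final.items()))  # [(0, 4), (10, 4), (20, 4), (30, 1)]
```

Because the merge is just addition, the final histogram is the same as the one we would get by analyzing all the data in one place, which is what lets the analysis run next to the data instead of moving the large files.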
So far I have found one strong use case of this nature:
- Astronomical Image Processing - mainly for identifying features in astronomical images.
There are a few candidate areas that I found interesting:
- Analysis of Earthquake Data
- Microarray Analysis for Genes
- Pattern Matching in Financial Data
Currently I am reading up on these fields to find out their exact data analysis requirements. The goal is to find more use cases for "Distributed Composable Data Analysis".