Plant phenotyping is the observation of physical and biochemical traits of plant genotypes in response to environmental conditions. The challenges, in particular in the context of climate change and food security, are numerous. High-throughput platforms have been introduced to observe the dynamic growth of a large number of plants under different environmental conditions. Instead of considering a few genotypes at a time (as is the case when phenomic traits are measured manually), such platforms make it possible to use completely new kinds of approaches. However, the data sets produced by such widely instrumented platforms are huge, constantly growing, and produced by increasingly complex experiments, reaching a point where distributed computation is mandatory to extract knowledge from the data.
In this paper, we introduce InfraPhenoGrid, the infrastructure we designed and deployed to efficiently manage the data sets produced by the PhenoArch plant phenomics platform in the context of the French Phenome Project. Our solution deploys OpenAlea scientific workflows on a grid, using the SciFloware middleware to pilot workflow executions. Our approach is user-friendly in the sense that, despite the intrinsic complexity of the infrastructure, running scientific workflows and understanding the results obtained (using provenance information) remains as simple as possible for end users.
Then, both openalea.opencv and infraphenogrid need to be installed:
- Fetch the sources from GitHub.
- Run python setup.py install from the root directory of each package.
Fetch the data set data_set_0962_A310_ARCH2013-05-13.zip and extract it into any directory.
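The fetch-and-extract step can also be scripted. Below is a minimal sketch using only the Python standard library; the function name and the destination directory are our own choices, not part of the infraphenogrid API.

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path, dest_dir):
    """Extract a data set archive into dest_dir.

    dest_dir is created if it does not exist and the archive layout is
    preserved. Returns the list of extracted member names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()

# Example (adjust the paths to where you downloaded the archive):
# extract_dataset("data_set_0962_A310_ARCH2013-05-13.zip", "phenoarch_data")
```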
The directory 'openalea/infraphenogrid/demo' contains two subdirectories:
- some examples of algorithms that evaluate the leaf area of binarized images
- some examples of algorithms that binarize pictures taken on the PhenoArch platform.
Two additional workflows compare the results of each category of algorithms.
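To give an idea of what the two categories of algorithms do, here is a deliberately simplified pure-Python sketch: a fixed-threshold binarization followed by a pixel-count leaf-area estimate. The real demo workflows use openalea.opencv on PhenoArch images; the function names, threshold value, and synthetic image below are illustrative only.

```python
def binarize(image, threshold=128):
    """Binarize a grayscale image (list of rows of 0-255 values):
    pixels above the threshold become 1 (plant), others 0 (background).

    A fixed threshold is a stand-in for the demo's OpenCV-based
    binarization algorithms.
    """
    return [[1 if px > threshold else 0 for px in row] for row in image]

def leaf_area(binary_image, pixel_area=1.0):
    """Estimate leaf area as the number of foreground pixels times the
    area covered by one pixel (in arbitrary units)."""
    return pixel_area * sum(sum(row) for row in binary_image)

# Tiny synthetic "image": bright plant pixels on a dark background.
image = [
    [10, 200, 210],
    [12, 180, 15],
    [11, 14, 13],
]
mask = binarize(image)   # [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
print(leaf_area(mask))   # 3 foreground pixels -> 3.0
```

Comparing two binarization algorithms, as the comparison workflows do, then amounts to running each on the same images and comparing the resulting leaf-area estimates.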
- Open a workflow in 'visualea' (double-click it in the package explorer view).
- If the workflow contains an 'import_images' node, you need to open it (double-click on the node) to point it to the directory where the data set has been unpacked.
- Run the workflow to display the results (Ctrl+R, or right-click on a specific node to run it).