Feature #702
matplotlib histogram reducer
100%
Description
Reduce worker that uses matplotlib to create histograms, with associated tests.
Updated by Jackson, Mike about 12 years ago
From MAUS SSI Component Design - Online Reconstruction:
- Histograms allow detector performance to be monitored.
- Converts JSON document into JPG or EPS (or other image file).
- May be 100s of plots per output.
- Should be easy to configure/extend by non-Python programmer.
- Needs to allow aggregation of data to incrementally update the plot (so include a test for this!)
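The aggregation requirement in the last bullet can be illustrated without any plotting library. The following is a minimal sketch only: `RunningHistogram`, `bin_width` and the sample values are hypothetical names and data, not part of MAUS. It is written for a current Python 3 rather than the Python 2.7 used in this thread.

```python
from collections import Counter


class RunningHistogram:
    """Accumulates binned counts across successive batches of values."""

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.counts = Counter()  # maps bin index -> number of entries

    def update(self, values):
        """Add one batch (e.g. one spill's worth of values) to the totals."""
        for value in values:
            self.counts[int(value // self.bin_width)] += 1

    def bins(self):
        """Return (lower bin edge, count) pairs in ascending order."""
        return [(index * self.bin_width, self.counts[index])
                for index in sorted(self.counts)]


hist = RunningHistogram(bin_width=10)
hist.update([3, 12, 17])  # first batch
hist.update([5, 14, 31])  # second batch is added to the first
print(hist.bins())
```

A test for incremental updates would then compare the bins after each `update` call against the expected running totals.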
Updated by Jackson, Mike about 12 years ago
Installing matplotlib on maus.epcc:
    yum install python-matplotlib
    # Now check...
    python
    import matplotlib.pyplot as plt
    plt.plot([1,2,3])
    plt.show()
Installing a copy to run under MAUS's bundled Python:
    # Set up MAUS environment.
    cd maus
    source env.sh
    # Get and unpack matplotlib.
    cd
    wget http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-1.0.1/matplotlib-1.0.1.tar.gz/download
    gunzip matplotlib-1.0.1.tar.gz
    tar -xf matplotlib-1.0.1.tar
    cd matplotlib-1.0.1
    # Build and install using instructions in INSTALL.
    python setup.py build
    python setup.py install
    # Now check...
    python
    import matplotlib.pyplot as plt
    plt.plot([1,2,3])
    plt.show()
Error:
    /home/michaelj/maus/third_party/install/lib/python2.7/site-packages/matplotlib/backends/__init__.py:41: UserWarning: Your currently selected backend, 'agg' does not support show().
    Please select a GUI backend in your matplotlibrc file ('/home/michaelj/maus/third_party/install/lib/python2.7/site-packages/matplotlib/mpl-data/matplotlibrc') or with matplotlib.use()
    (backend, matplotlib.matplotlib_fname()))
Try with a different "backend":
    python
    import matplotlib
    matplotlib.use('GTKAgg')
    import matplotlib.pyplot as plt
Error:
... raise ImportError("Gtk* backend requires pygtk to be installed.")
The default in the yum version is GTKAgg, as listed in /usr/lib/python2.4/site-packages/matplotlib/mpl-data/matplotlibrc
Try to save to a file, as in the matplotlib HOWTO:
    python
    import matplotlib.pyplot as plt
    plt.plot([1,2,3])
    plt.savefig('tmp')
    CTRL-D
    ls
    tmp.png
Updated by Tunnell, Christopher about 12 years ago
I can repeat this behaviour. Also note:
    source env.sh
    easy_install matplotlib
should install matplotlib. I suggest using 'gv' and '.eps' files to start.
Do you think it's easy to enable the GUI feature? What do we need to add to the dependencies? The main feature we need is files I guess so that may just be a distraction.
Updated by Jackson, Mike about 12 years ago
Committed work-in-progress to branch, commit 634. Supports any of the file formats (default .eps) supported by matplotlib's FigureCanvas.print_figure function.
Updated by Jackson, Mike about 12 years ago
I made attempts to get the GUI features going but got bogged down in tracking down dependencies-of-dependencies. One needs PyGTK which needs GTK and PyGobject. Another needs wxPython which needs wxWidgets. And, different sites list different instructions. These are handled by matplotlib "backends".
One "backend" I did get going was the Tcl/Tk based one which is based on Tkinter bundled with Python 2.7 but which needs Tcl/Tk, so:
    yum install tcl tk tclsh
    yum install tcl-devel
    yum install tk-devel
Now, rebuild Python:
    ./configure --enable-shared --prefix="/home/michaelj/maus/third_party/install"
    make
    make install
Trying matplotlib yielded a new error:
    python
    import matplotlib
    matplotlib.use('TkAgg')
    import matplotlib.pyplot as plt
    ImportError: No module named _tkagg
matplotlib needs a rebuild, so:
    rm -rf third_party/install/lib/python2.7/site-packages/matplotlib*
    easy_install matplotlib
And, test again:
    import matplotlib
    matplotlib.use('TkAgg')
    import matplotlib.pyplot as plt
    plt.plot([1,2,3])
    [<matplotlib.lines.Line2D object at 0xa1abf0c>]
    plt.show()
And a Tcl/Tk window appears.
Updated by Tunnell, Christopher about 12 years ago
Why do you need the GUI at all? Sucks that we can't do it, but for now can't we just spit out .eps and use gv? Then eventually change that to .png for the web?
That'll work out of the box with easy_install matplotlib
Updated by Tunnell, Christopher about 12 years ago
I say this mainly because if you're generating 100 plots, you won't want them all popping up to the local terminal...
Updated by Jackson, Mike about 12 years ago
As it stands the reducer spits out .eps by default, or other file formats if requested. It doesn't, nor was it going to be changed to, render these.
Updated by Tunnell, Christopher about 12 years ago
Save PNG to memory:
    import matplotlib.pyplot as plt
    plt.plot([1,2,3])
    import StringIO
    file = StringIO.StringIO()  # file is a Python file object (like open('test')) but in memory
    plt.savefig(file)
    file.seek(0)
    data = file.read()
    file2 = open('my.png', 'wb')  # wb for binary write
    file2.write(data)
    file2.close()
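For reference, a rough equivalent of the snippet above on a current Python 3 and matplotlib is sketched below. It assumes matplotlib is installed, selects the non-GUI `Agg` backend explicitly (so no GTK/Tk dependencies are needed), and uses `io.BytesIO` in place of the Python 2 `StringIO` module. The function name `render_png_bytes` is made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # file-only backend, so no GUI toolkit is needed
import matplotlib.pyplot as plt


def render_png_bytes(values):
    """Plot values and return the PNG image as a bytes object."""
    fig, ax = plt.subplots()
    ax.plot(values)
    buffer = io.BytesIO()  # in-memory binary file
    fig.savefig(buffer, format="png")
    plt.close(fig)  # release the figure so many plots do not pile up
    return buffer.getvalue()


png_data = render_png_bytes([1, 2, 3])
print(len(png_data), "bytes of PNG data")
```

The returned bytes can be written out with `open('my.png', 'wb')` or passed on to another worker without touching the disk.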
Updated by Jackson, Mike about 12 years ago
Revision 634: Refactored ReducePyMatplotlibHistogram so it plots ADC against TDC counts, outputs JSON doc with summary information, errors and binary plot data. Wrote OutputPyImage worker to save images to files. Works for EPS. Problem with PNG for now to do with string encodings.
Updated by Tunnell, Christopher about 12 years ago
If you can't get PNG to work, then (if it's quicker) you may want to consider just using EPS and doing the conversion to PNG somewhere else. People use EPS for talks etc since they are vector images thus scale well. What I'm thinking is a web display of the PNG but if you click on it or something you get the EPS.
Or rather: not having PNGs shouldn't block this issue too long. For instance, there's the Python Image Library:
http://mail.python.org/pipermail/image-sig/2004-September/002947.html
that can do this.
Updated by Jackson, Mike about 12 years ago
Encoding issue with PNG and other matplotlib file types (e.g. PDF) sorted by use of base64 module by ReducePyMatplotlibHistogram to encode the data when it's in the JSON doc and by OutputPyImage after extracting the data prior to saving.
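The base64 round trip described here can be sketched as follows, using only the standard library. This is an illustration under assumptions: the key names (`tag`, `image_type`, `data`) and the helper functions are hypothetical and are not claimed to match what ReducePyMatplotlibHistogram or OutputPyImage actually use. It is written for Python 3.

```python
import base64
import json


def wrap_image(tag, image_bytes, image_type="png"):
    """Build a JSON string carrying binary image data as base64 text."""
    doc = {
        "tag": tag,  # hypothetical key names throughout
        "image_type": image_type,
        "data": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(doc)


def unwrap_image(json_text):
    """Recover (file name, raw bytes) from a document made by wrap_image."""
    doc = json.loads(json_text)
    file_name = "%s.%s" % (doc["tag"], doc["image_type"])
    return file_name, base64.b64decode(doc["data"])


raw = b"\x89PNG\r\n\x1a\n\x00\xff"  # bytes that are not valid text
name, recovered = unwrap_image(wrap_image("spill1", raw))
print(name, recovered == raw)
```

Encoding on the reducer side and decoding just before the file is saved keeps the JSON document plain text regardless of the image format.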
Code and tests are ready for review (652). Tests give 100% "coverage":
    src/output/OutputPyImage/
        OutputPyImage.py
        test_OutputPyImage.py
    src/reduce/ReducePyMatplotlibHistogram/
        ReducePyMatplotlibHistogram.py
        test_ReducePyMatplotlibHistogram.py
    bin/simple_histogram_example.py # Simple client example using hard-coded spill docs.
    bin/simulate_mice_histogram.py # Copy of simulate_mice.py which invokes the histogram reducer.
    third_party/bash/40python_extras.bash # Now invokes easy_install matplotlib.
We will likely want to revisit these, especially OutputPyImage and how it handles inputs and errors, depending on changes to how Go.py works.
Updated by Jackson, Mike about 12 years ago
654 fixes pylint errors that were causing test run to fail.
Updated by Tunnell, Christopher about 12 years ago
This looks done and great. Minor request: can you refactor the code to take into account the API change in:
https://code.launchpad.net/~maus-dev/maus/main
where the changes are:
http://bazaar.launchpad.net/~maus-dev/maus/main/view/head:/src/common_py/Go.py
and tell me if there is a better way the API could work for you? I assume you prefer it to the old api.
Updated by Jackson, Mike about 12 years ago
656, refactored as requested. This API makes more sense to me - the merger takes in a sequence of 1..N JSON documents and outputs 1..N documents (as in ReducePyMatplotlibHistogram's 1-2 mapping).
Updated by Tunnell, Christopher about 12 years ago
Merge into trunk.
@Rogers: what's a good example of using the error handler?
@Jackson: Small comment that didn't block the merge but worth fixing at some point... there's an ErrorHandler class:
http://micewww.pp.rl.ac.uk/embedded/maus/doxygen_framework/html/ErrorHandler_8py-source.html
Updated by Tunnell, Christopher about 12 years ago
Oh: can you make it so only one plot gets spit out by default? Instead of one per spill (which could be an option)? If we run a million events, then that will get unmanageable.
Updated by Tunnell, Christopher about 12 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Applied in changeset commit:c.tunnell1@physics.ox.ac.uk-20111013144251-g4gufii1sti2zeqs.
Updated by Jackson, Mike about 12 years ago
Could introduce an "N spills" configuration value that determines the number of spills to process before a histogram is output, or just assume 1 histogram is to be output if this value is not specified.
But in both cases (output just 1 histogram or output the final histogram when all spills have been processed) the problem is how to signal to the worker when it can output this histogram. It won't know it won't be receiving more spills until death() is called, which is too late as the output may need to be processed by downstream workers. Two options:
- A post_process operation on mergers which has the same signature as process but which means, this is the last spill.
- Go.py sends down an "end-of-spills" JSON document which signals to them that no more spills are due. Each reducer can either handle this or ignore it as required.
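The second option might look roughly like the sketch below. Everything here is hypothetical: the marker key `end_of_spills`, the class `FinalOnlyReducer` and the shape of the documents are invented for illustration and do not describe an existing Go.py feature.

```python
import json

END_MARKER_KEY = "end_of_spills"  # hypothetical key, not a MAUS name


class FinalOnlyReducer:
    """Accumulates spills and emits one summary when the marker arrives."""

    def __init__(self):
        self.spill_count = 0

    def process(self, json_text):
        """Return a list of output documents for one input document."""
        doc = json.loads(json_text)
        if doc.get(END_MARKER_KEY):
            # No more spills are due, so the summary can go downstream.
            return [json.dumps({"spills_seen": self.spill_count})]
        self.spill_count += 1
        return []  # nothing to output until the marker is seen


reducer = FinalOnlyReducer()
outputs = []
for text in ['{"hits": 3}', '{"hits": 5}', json.dumps({END_MARKER_KEY: True})]:
    outputs.extend(reducer.process(text))
print(outputs)
```

A reducer that does not care about the marker could simply pass such a document through unchanged.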
Updated by Jackson, Mike about 12 years ago
I think that this would be a good idea given the motivation you state.
Updated by Tunnell, Christopher about 12 years ago
What's wrong with just outputting the histogram each spill? But to the same filename and where it is a sum of all previous spills?
Updated by Jackson, Mike about 12 years ago
OK I see what you mean. At present ReducePyMatplotlibHistogram creates a tag "spillN" where N is the spill. This is used by OutputPyImage to be the file name. Could change this to be configurable so that the tag is just "spill" and, only if requested, the "N" auto-number is added. Nx2 documents will still be output but OutputPyImage would just overwrite the existing files each time.
Actually, it might be worth adding another configuration parameter to specify whether the user wants the plot for that specific spill or just the summary plot.
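The configurable tag could be as small as the sketch below. The function `image_tag` and its `auto_number` parameter are hypothetical names chosen for illustration, not the actual configuration keys.

```python
def image_tag(prefix, spill_number, auto_number=False):
    """Return the tag that the output worker would use as the file name."""
    if auto_number:
        return "%s%d" % (prefix, spill_number)  # e.g. spill1, spill2, ...
    return prefix  # same name every time, so the file is overwritten


fixed = [image_tag("spill", n) for n in (1, 2, 3)]
numbered = [image_tag("spill", n, auto_number=True) for n in (1, 2, 3)]
print(fixed)
print(numbered)
```

With the default, every spill maps to the same file name, so a viewer always sees the latest running total.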
Updated by Tunnell, Christopher about 12 years ago
Maybe this should be another issue, but it may also be worth thinking about how we explain to people extending this. People will want plots of arbitrary things, so figuring out how we suggest they do it will be important at some point. For instance, do we have a base class that hides some of the machinery underneath?
I think that you've moved a lot out into helper functions which is great. I don't think it'll be that hard to extend it. Just wanted to make sure that that use case was written down somewhere.
Updated by Jackson, Mike about 12 years ago
I'll restructure the class and comment it in a way that tries to pull out the elementary aspects (the histogram object creation, base64 encoding, outputting the JSON) from the aspects that a user may want to alter (e.g. the actual plots done, labels, rescaling etc).
I could also add a section to doc_src/maus_user_guide.tex
Updated by Rogers, Chris about 12 years ago
I could also add a section to doc_src/maus_user_guide.tex
I think that's a good idea.
Updated by Jackson, Mike about 12 years ago
At present the reducer can output the histogram for just the current spill and/or the running total. I think it might just be worth simplifying it to output the summary only (with or without autonumbering) - it would make the reducer code clearer, easier to modify and less full of if-thens. If a reducer that outputs the plot just for the current spill is needed it can easily be customised from the summary one.
Updated by Tunnell, Christopher about 12 years ago
I wish there was a facebook type 'like button', but I like your last comment.
I've never seen anything other than running totals over some interval in particle physics, so that's pretty standard.
Updated by Jackson, Mike about 12 years ago
Completed the removal of the spill-by-spill histogram. Started on the extraction of commonality to a super-class.
Updated by Jackson, Mike about 12 years ago
Commit "667": Pulled out TDC/ADC histogram specifics into ReducePyHistogramTDCADCCounts. ReducePyMatplotlibHistogram keeps the commonality.
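The general shape of such a split is sketched below: a base class holds the accumulation, encoding and JSON packaging, while a subclass supplies only what to extract and how to draw it. The class and key names are hypothetical, and a text "chart" stands in for matplotlib so the sketch runs with the standard library alone. It is not the actual ReducePyMatplotlibHistogram code.

```python
import base64
import json


class HistogramReducerBase:
    """Shared machinery: accumulate, encode, and package the output."""

    def __init__(self):
        self.values = []

    def extract(self, spill):
        """Return the values to histogram from one spill (override me)."""
        raise NotImplementedError

    def render(self, values):
        """Return image bytes for the accumulated values (override me)."""
        raise NotImplementedError

    def process(self, json_text):
        """Add one spill and return a JSON doc with the running total."""
        self.values.extend(self.extract(json.loads(json_text)))
        image = self.render(self.values)
        return json.dumps({
            "entries": len(self.values),
            "data": base64.b64encode(image).decode("ascii"),
        })


class TextBarReducer(HistogramReducerBase):
    """Toy subclass that draws a text bar chart instead of a real plot."""

    def extract(self, spill):
        return spill.get("counts", [])

    def render(self, values):
        lines = ["%d %s" % (v, "#" * values.count(v))
                 for v in sorted(set(values))]
        return "\n".join(lines).encode("ascii")


reducer = TextBarReducer()
reducer.process('{"counts": [1, 2, 2]}')
result = json.loads(reducer.process('{"counts": [2, 3]}'))
chart = base64.b64decode(result["data"]).decode("ascii")
print(chart)
```

Someone wanting a different plot would then only write a new subclass with its own `extract` and `render`.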
Updated by Tunnell, Christopher about 12 years ago
- Status changed from Closed to In Progress
- % Done changed from 100 to 50
I'll let you close when you want.
Updated by Jackson, Mike about 12 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100
I am done with it now.
Updated by Tunnell, Christopher about 12 years ago
Okay, reviewing it (won't bother making a new issue, shouldn't take long).
Updated by Tunnell, Christopher about 12 years ago
Also: let me know when you want some students thrown at this to try making their own plots. If it's useful for you to get input, or you think I should hold off (I imagine I'll be answering questions) until something finishes, let me know.
Updated by Jackson, Mike about 12 years ago
I'm OK with students trying to customise it to make their own plots. There are notes at #747. Their experiences would be useful as it would indicate what we'd need to document for future users.
Updated by Rogers, Chris about 12 years ago
- Target version changed from Future MAUS release to MAUS-v0.0.9
Updated by Jackson, Mike almost 12 years ago
706 ReducePyMatplotlibHistogram and ReducePyHistogramTDCADCCounts now use ErrorHandler.