Feature #702

matplotlib histogram reducer

Added by Jackson, Mike about 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
Assignee: Jackson, Mike
Category: Online reconstruction
Target version:
Start date: 20 September 2011
Due date:
% Done: 100%
Estimated time:
Workflow:

Description

Reduce worker that uses matplotlib to create histograms, with associated tests.

#1

Updated by Jackson, Mike about 10 years ago

From MAUS SSI Component Design - Online Reconstruction:

  • Histograms allow detector performance to be monitored.
  • Converts JSON document into JPG or EPS (or other image file).
  • May be 100s of plots per output.
  • Should be easy to configure/extend by non-Python programmer.
  • Needs to allow aggregation of data to incrementally update the plot (so include a test for this!)
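
The aggregation requirement in the last bullet can be sketched without any plotting code. This is a minimal illustration in Python 3, not the reducer itself: the RunningHistogram class, its method names and the bin width are all hypothetical. The final assertion is the kind of test the bullet asks for: feeding spills one at a time must give the same bins as feeding all the data at once.

```python
from collections import Counter


class RunningHistogram:
    """Accumulates bin counts across spills (hypothetical helper)."""

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.counts = Counter()

    def add_spill(self, values):
        """Fold one spill's values into the running bin counts."""
        for value in values:
            # Integer index k of the bin [k * width, (k + 1) * width).
            self.counts[int(value // self.bin_width)] += 1

    def bins(self):
        """Return (bin_lower_edge, count) pairs sorted by bin."""
        return [(k * self.bin_width, n) for k, n in sorted(self.counts.items())]


# Two spills fed one at a time...
incremental = RunningHistogram(bin_width=10)
incremental.add_spill([1, 5, 12])
incremental.add_spill([14, 27])

# ...must give the same bins as all the data fed at once.
one_shot = RunningHistogram(bin_width=10)
one_shot.add_spill([1, 5, 12, 14, 27])
assert incremental.bins() == one_shot.bins()
```

Only the counts are kept between spills, so the plot can be redrawn from them at any point without storing the raw values.
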
#2

Updated by Jackson, Mike about 10 years ago

Installing matplotlib on maus.epcc:

yum install python-matplotlib
# Now check...
python
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.show()

Installing copy to run under MAUS's bundled Python:

# Set up MAUS environment.
cd maus
source env.sh
# Get and unpack matplotlib.
cd
wget -O matplotlib-1.0.1.tar.gz \
  http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-1.0.1/matplotlib-1.0.1.tar.gz/download
gunzip matplotlib-1.0.1.tar.gz
tar -xf matplotlib-1.0.1.tar
cd matplotlib-1.0.1
# Build and install using instructions in INSTALL.
python setup.py build
python setup.py install
# Now check...
python
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.show()

Error:
/home/michaelj/maus/third_party/install/lib/python2.7/
 site-packages/matplotlib/backends/__init__.py:41: UserWarning: 
Your currently selected backend, 'agg' does not support show().
Please select a GUI backend in your matplotlibrc file 
('/home/michaelj/maus/third_party/install/lib/python2.7/site-packages/
 matplotlib/mpl-data/matplotlibrc')
or with matplotlib.use()
  (backend, matplotlib.matplotlib_fname()))

Try with a different "backend"
python
import matplotlib
matplotlib.use('GTKAgg')
import matplotlib.pyplot as plt

Error:
...
    raise ImportError("Gtk* backend requires pygtk to be installed.")

Default in yum version is GTKAgg as listed in:
/usr/lib/python2.4/site-packages/matplotlib/mpl-data/matplotlibrc 

Try to save to file as in matplotlib HOWTO
python
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.savefig('tmp')
CTRL-D
ls
tmp.png

#3

Updated by Tunnell, Christopher about 10 years ago

I can repeat this behaviour. Also note:

source env.sh
easy_install matplotlib

should install matplotlib. I suggest using 'gv' and '.eps' files to start.

Do you think it's easy to enable the GUI feature? What do we need to add to the dependencies? The main feature we need is files I guess so that may just be a distraction.

#4

Updated by Jackson, Mike about 10 years ago

Committed work-in-progress to branch, commit 634. Supports any of the file formats (default .eps) supported by matplotlib's FigureCanvas.print_figure function.

#5

Updated by Jackson, Mike about 10 years ago

I made attempts to get the GUI features going but got bogged down in tracking down dependencies-of-dependencies. One needs PyGTK which needs GTK and PyGobject. Another needs wxPython which needs wxWidgets. And, different sites list different instructions. These are handled by matplotlib "backends".

One "backend" I did get going was the Tcl/Tk based one which is based on Tkinter bundled with Python 2.7 but which needs Tcl/Tk, so:

yum install tcl tk tclsh
yum install tcl-devel
yum install tk-devel

Now, rebuild Python:
./configure  --enable-shared --prefix="/home/michaelj/maus/third_party/install" 
make
make install

Trying matplotlib yielded a new error:
python
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
ImportError: No module named _tkagg

matplotlib needs a rebuild, so
rm -rf third_party/install/lib/python2.7/site-packages/matplotlib*
easy_install matplotlib

And, test again:
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
plt.plot([1,2,3])
[<matplotlib.lines.Line2D object at 0xa1abf0c>]
plt.show()

And a Tcl/Tk window appears.

#6

Updated by Tunnell, Christopher about 10 years ago

Why do you need the GUI at all? Sucks that we can't do it, but for now can't we just spit out .eps and use gv? Then eventually change that to .png for the web?

That'll work out of the box with easy_install matplotlib

#7

Updated by Tunnell, Christopher about 10 years ago

I say this mainly because if you're generating 100 plots, you won't want them all popping up to the local terminal...

#8

Updated by Jackson, Mike about 10 years ago

As it stands the reducer spits out .eps by default, or other file formats if requested. It doesn't, nor was it going to be changed to, render these.

#9

Updated by Tunnell, Christopher about 10 years ago

Save PNG to memory:

import matplotlib.pyplot as plt
plt.plot([1,2,3])

import StringIO
buf = StringIO.StringIO()  # in-memory file object (like open('test'), but never touches disk)
plt.savefig(buf, format='png')  # a file object has no extension, so name the format
buf.seek(0)

data = buf.read()

out = open('my.png', 'wb')  # 'wb': PNG is binary data
out.write(data)
out.close()
#10

Updated by Tunnell, Christopher about 10 years ago

Per call.

#11

Updated by Jackson, Mike almost 10 years ago

Revision 634: Refactored ReducePyMatplotlibHistogram so it plots ADC against TDC counts and outputs a JSON doc with summary information, errors and binary plot data. Wrote OutputPyImage worker to save images to files. Works for EPS. PNG does not work yet because of a problem with string encodings.

#12

Updated by Tunnell, Christopher almost 10 years ago

If you can't get PNG to work, then (if it's quicker) you may want to consider just using EPS and doing the conversion to PNG somewhere else. People use EPS for talks etc since they are vector images thus scale well. What I'm thinking is a web display of the PNG but if you click on it or something you get the EPS.

Or rather: not having PNGs shouldn't block this issue too long. For instance, there's the Python Image Library:

http://mail.python.org/pipermail/image-sig/2004-September/002947.html

that can do this.

#13

Updated by Jackson, Mike almost 10 years ago

Encoding issue with PNG and other matplotlib file types (e.g. PDF) is sorted. ReducePyMatplotlibHistogram now uses the base64 module to encode the image data before putting it in the JSON doc, and OutputPyImage decodes it after extracting the data and before saving.
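
For anyone following along, the round trip described here can be sketched as below. This is a minimal Python 3 illustration, not the code in ReducePyMatplotlibHistogram or OutputPyImage: the function names and the JSON field names ("tag", "image_type", "data") are assumptions made for the example.

```python
import base64
import json


def wrap_image(tag, image_bytes, image_type):
    """Pack binary image data into a JSON string (field names are hypothetical)."""
    doc = {
        "tag": tag,
        "image_type": image_type,
        # JSON cannot carry raw bytes, so store base64-encoded ASCII text.
        "data": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(doc)


def unwrap_image(json_doc):
    """Recover (file name, original bytes) from a document made by wrap_image."""
    doc = json.loads(json_doc)
    file_name = "%s.%s" % (doc["tag"], doc["image_type"])
    return file_name, base64.b64decode(doc["data"])


# The 8-byte PNG signature stands in for real plot data. It contains
# non-ASCII bytes, which is what breaks a plain string encoding.
payload = b"\x89PNG\r\n\x1a\n"
name, recovered = unwrap_image(wrap_image("spill1", payload, "png"))
```

EPS is plain text, which would explain why it worked before the change while PNG and PDF did not.
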

Code and tests are ready for review (652). Tests give 100% "coverage":

src/output/OutputPyImage/
 OutputPyImage.py  
 test_OutputPyImage.py
src/reduce/ReducePyMatplotlibHistogram/
 ReducePyMatplotlibHistogram.py  
 test_ReducePyMatplotlibHistogram.py

bin/simple_histogram_example.py # Simple client example using hard-coded spill docs.

bin/simulate_mice_histogram.py # Copy of simulate_mice.py which invokes the histogram reducer.

third_party/bash/40python_extras.bash # Now invokes easy_install matplotlib.

We will likely want to revisit these, especially OutputPyImage and how it handles inputs and errors, depending on how Go.py changes.

#14

Updated by Jackson, Mike almost 10 years ago

654 fixes pylint errors that were causing test run to fail.

#15

Updated by Tunnell, Christopher almost 10 years ago

This looks done and great. Minor request: can you refactor the code to take into account the API change in:

https://code.launchpad.net/~maus-dev/maus/main

where the changes are:

http://bazaar.launchpad.net/~maus-dev/maus/main/view/head:/src/common_py/Go.py

and tell me if there is a better way the API could work for you? I assume you prefer it to the old api.

#16

Updated by Jackson, Mike almost 10 years ago

656, refactored as requested. This API makes more sense to me - the merger takes in a sequence of 1..N JSON documents and outputs 1..N documents (as in ReducePyMatplotlibHistogram's 1-2 mapping).

#17

Updated by Tunnell, Christopher almost 10 years ago

Merge into trunk.

@Rogers: what's a good example of using the error handler?

@Jackson: Small comment that didn't block the merge but worth fixing at some point... there's an ErrorHandler class:

http://micewww.pp.rl.ac.uk/embedded/maus/doxygen_framework/html/ErrorHandler_8py-source.html

#18

Updated by Tunnell, Christopher almost 10 years ago

*Merged

#19

Updated by Tunnell, Christopher almost 10 years ago

Oh: can you make it so only one plot gets spit out by default? Instead of one per spill (which could be an option)? If we run a million events, then that will get unmanageable.

#20

Updated by Tunnell, Christopher almost 10 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100
#21

Updated by Jackson, Mike almost 10 years ago

Could introduce an "N spills" configuration value that determines the number of spills to process before a histogram is output, or just assume that 1 histogram is to be output if this value is not specified.

But in both cases (output just 1 histogram or output the final histogram when all spills have been processed) the problem is how to signal to the worker when it can output this histogram. It won't know it won't be receiving more spills until death() is called, which is too late as the output may need to be processed by downstream workers. Two options:

  • A post_process operation on mergers which has the same signature as process but which means, this is the last spill.
  • Go.py sends down an "end-of-spills" JSON document which signals to them that no more spills are due. Each reducer can either handle this or ignore it as required.
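
The second option can be sketched as follows. This is a toy illustration in Python 3 under stated assumptions, not Go.py or any MAUS worker: the "END_OF_SPILLS" key, the "count" spill field, the SummingReducer class and the run() driver are all made up for the example.

```python
import json

END_OF_SPILLS = {"END_OF_SPILLS": True}  # hypothetical sentinel document


class SummingReducer:
    """Toy reducer: sums a per-spill count and emits once at the sentinel."""

    def __init__(self):
        self.total = 0
        self.spills = 0

    def process(self, json_doc):
        """Return a list of output documents (empty until the sentinel)."""
        doc = json.loads(json_doc)
        if doc.get("END_OF_SPILLS"):
            return [json.dumps({"spills": self.spills, "total": self.total})]
        self.total += doc["count"]  # "count" is a made-up spill field
        self.spills += 1
        return []


def run(reducer, spill_docs):
    """Stand-in for the driver: feed all spills, then the sentinel."""
    outputs = []
    for doc in list(spill_docs) + [json.dumps(END_OF_SPILLS)]:
        outputs.extend(reducer.process(doc))
    return outputs


results = run(SummingReducer(), [json.dumps({"count": n}) for n in (3, 4, 5)])
```

Because the final document comes out of process() rather than death(), downstream workers still get a chance to handle it. A reducer that does not care about the sentinel would need to pass it through or drop it explicitly.
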
#22

Updated by Jackson, Mike almost 10 years ago

I think that this would be a good idea given the motivation you state.

#23

Updated by Tunnell, Christopher almost 10 years ago

What's wrong with just outputting the histogram each spill? But to the same filename and where it is a sum of all previous spills?

#24

Updated by Jackson, Mike almost 10 years ago

OK I see what you mean. At present ReducePyMatplotlibHistogram creates a tag "spillN" where N is the spill. This is used by OutputPyImage to be the file name. Could change this to be configurable so that the tag is just "spill" and, only if requested, the "N" auto-number is added. Nx2 documents will still be output but OutputPyImage would just overwrite the existing files each time.

Actually, it might be worth adding another configuration parameter to specify whether the user wants the plot for that specific spill or just the summary plot.
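
A minimal sketch of the configurable tag, in Python 3. The function name and the auto_number parameter are hypothetical and only illustrate the idea; they are not the reducer's actual configuration.

```python
def image_tag(prefix, spill_count, auto_number):
    """Build the tag the output worker would use as the file name stem."""
    if auto_number:
        return "%s%d" % (prefix, spill_count)  # one file per spill
    return prefix  # same tag every time, so the file is overwritten


numbered = [image_tag("spill", n, auto_number=True) for n in (1, 2, 3)]
fixed = [image_tag("spill", n, auto_number=False) for n in (1, 2, 3)]
```

With auto-numbering off, every spill yields the same tag, so a worker that writes to a file named after the tag keeps replacing a single image.
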

#25

Updated by Tunnell, Christopher almost 10 years ago

agreed

#26

Updated by Tunnell, Christopher almost 10 years ago

Maybe this should be another issue, but it may also be worth thinking about how we explain this to people who want to extend it. People will want plots of arbitrary things, so figuring out how we suggest they do it will be important at some point. For instance, do we have a base class that hides some of the machinery underneath?

I think that you've moved a lot out into helper functions which is great. I don't think it'll be that hard to extend it. Just wanted to make sure that that use case was written down somewhere.

#27

Updated by Jackson, Mike almost 10 years ago

I'll restructure the class and comment it in a way that tries to pull out the elementary aspects (the histogram object creation, base64 encoding, outputting the JSON) from the aspects that a user may want to alter (e.g. the actual plots done, labels, rescaling etc).

I could also add a section to doc_src/maus_user_guide.tex

#28

Updated by Rogers, Chris almost 10 years ago

> I could also add a section to doc_src/maus_user_guide.tex

I think that's a good idea.

#29

Updated by Jackson, Mike almost 10 years ago

At present the reducer can output the histogram for just the current spill and/or the running total. I think it might just be worth simplifying it to output the summary only (with or without autonumbering) - it would make the reducer code clearer, easier to modify and less full of if-thens. If a reducer that outputs the plot just for the current spill is needed it can easily be customised from the summary one.

#30

Updated by Tunnell, Christopher almost 10 years ago

I wish there was a facebook type 'like button', but I like your last comment.

I've never seen anything other than running totals over some interval in particle physics, so that's pretty standard.

#31

Updated by Jackson, Mike almost 10 years ago

Completed the removal of the spill-by-spill histogram. Started on the extraction of commonality to a super-class.

#32

Updated by Jackson, Mike almost 10 years ago

Commit "667": Pulled out TDC/ADC histogram specifics into ReducePyHistogramTDCADCCounts. ReducePyMatplotlibHistogram keeps the commonality.
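
The shape of that split can be sketched as below. This is a toy in Python 3, not the committed classes: HistogramReducerBase, HitCountReducer, the update/render hooks and the JSON fields are hypothetical names, and the "image" is plain text so the sketch runs without matplotlib.

```python
import base64
import json


class HistogramReducerBase:
    """Hypothetical base class holding the machinery common to all plots."""

    def process(self, json_doc):
        """Parse a spill, let the subclass update and draw, wrap the result."""
        spill = json.loads(json_doc)
        self.update(spill)
        image = self.render()
        return json.dumps({
            "tag": self.tag,
            "data": base64.b64encode(image).decode("ascii"),
        })

    def update(self, spill):
        """Subclass hook: fold one spill into the running totals."""
        raise NotImplementedError

    def render(self):
        """Subclass hook: return the plot as bytes."""
        raise NotImplementedError


class HitCountReducer(HistogramReducerBase):
    """Example subclass: only the plot-specific parts live here."""

    tag = "hits"

    def __init__(self):
        self.hits = 0

    def update(self, spill):
        self.hits += len(spill.get("hits", []))

    def render(self):
        # A real subclass would draw with matplotlib; text stands in here.
        return ("hits=%d" % self.hits).encode("ascii")


reducer = HitCountReducer()
reducer.process(json.dumps({"hits": [1, 2]}))
out = json.loads(reducer.process(json.dumps({"hits": [3]})))
```

Someone adding a new plot would then only write the two hooks and never touch the JSON or base64 handling.
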

#33

Updated by Jackson, Mike almost 10 years ago

#34

Updated by Tunnell, Christopher almost 10 years ago

  • Status changed from Closed to In Progress
  • % Done changed from 100 to 50

I'll let you close when you want.

#35

Updated by Jackson, Mike almost 10 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 50 to 100

I am done with it now.

#36

Updated by Tunnell, Christopher almost 10 years ago

Okay, reviewing it (won't bother making a new issue, shouldn't take long).

#37

Updated by Tunnell, Christopher almost 10 years ago

merged

#38

Updated by Tunnell, Christopher almost 10 years ago

Also: let me know when you want some students thrown at this to try making their own plots. If it's useful for you to get input or you think I should hold off (I imagine I'll be answering questions) until something finishes, let me know

#39

Updated by Jackson, Mike almost 10 years ago

I'm OK with students trying to customise it to make their own plots. There are notes at #747. Their experiences would be useful as it would indicate what we'd need to document for future users.

#40

Updated by Rogers, Chris almost 10 years ago

  • Target version changed from Future MAUS release to MAUS-v0.0.9
#41

Updated by Jackson, Mike almost 10 years ago

706 ReducePyMatplotlibHistogram and ReducePyHistogramTDCADCCounts now use ErrorHandler.
