How to write histogram reducers

Reducers can be written to create histograms which are then updated when successive spills are received by the reducer. MAUS ships with reducers which create histograms using matplotlib and PyROOT. These reducers are:

  • ReducePyHistogramTDCADCCounts
    • This takes in a spill, extracts the TDC and ADC counts, updates a matplotlib histogram and outputs an image of this embedded in a JSON document.
    • Source file: src/reduce/ReducePyHistogramTDCADCCounts/ReducePyHistogramTDCADCCounts.py
  • ReducePyTOFPlot
    • This takes in a spill, extracts slab hits and space points information, updates a collection of PyROOT histograms and outputs images of these embedded in JSON documents. The histograms are output for every N spills successfully handled (N is configurable by the user).
    • Source file: src/reduce/ReducePyTOFPlot/ReducePyTOFPlot.py

Histogram reducers do not (and should not) save their histogram images to files. Instead, they create JSON documents containing the image data:

{"image": {"keywords": [...list of image keywords...],
           "description":"A textual description of the image",
           "tag": "TAG",
           "image_type": "EXTENSION", 
           "data": "...base 64 encoded image..."}}

where:

  • TAG is a simple name or tag that can be used to create an image file name.
  • EXTENSION is the desired file extension, usually just the image type e.g. eps or png.

For example,

{"image": {"keywords":["TDC", "ADC", "counts"],
           "description":"Total TDC and ADC counts to spill 2",
           "tag": "tdcadc",
           "image_type": "eps", 
           "data": "...base 64 encoded image..."}}

MAUS has an output worker, OutputPyImage, which saves image files when given these documents. It is described below.

MAUS also provides super-classes for histogram reducers which provide functions to build these documents for you.
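To make the document layout concrete, the following is a minimal sketch that assembles one using only the Python standard library. It is not the MAUS implementation: build_image_doc is a hypothetical helper, and the image bytes are a placeholder rather than real EPS output.

```python
import base64
import json


def build_image_doc(keywords, description, tag, image_type, image_bytes):
    """Return an image document in the layout shown above.

    Hypothetical helper for illustration only; not part of MAUS.
    """
    return {"image": {"keywords": keywords,
                      "description": description,
                      "tag": tag,
                      "image_type": image_type,
                      # b64encode returns bytes; decode to get a JSON-safe string.
                      "data": base64.b64encode(image_bytes).decode("ascii")}}


# Placeholder bytes standing in for a real EPS image.
doc = build_image_doc(["TDC", "ADC", "counts"],
                      "Total TDC and ADC counts to spill 2",
                      "tdcadc", "eps", b"%!PS-Adobe-3.0 EPSF-3.0\n")
doc_string = json.dumps(doc)
```

The resulting doc_string is plain text and can be passed between workers like any other JSON document.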

Before you write your own histogram reducer, or customise the examples listed above, it is useful to understand these histogram reducer super-classes, which the reducers above sub-class. They handle certain actions for you and provide functions you can use.

Histogram reducer super-classes

There are two histogram reducer super-classes, one for matplotlib and one for PyROOT:

  • ReducePyMatplotlibHistogram
    • Source file: src/reduce/ReducePyMatplotlibHistogram/ReducePyMatplotlibHistogram.py
  • ReducePyROOTHistogram
    • Source file: src/reduce/ReducePyROOTHistogram/ReducePyROOTHistogram.py

You do not need to look at the source code for these, but an understanding of what they offer and what they do will help you write your own reducers.

Each of these takes care of the following operations for you.

Initialisation - the __init__ function

  • Spill count - count of spills read to date, initially 0.
  • Image type - initially eps (Enhanced PostScript).
  • Auto-numbering - should image names be auto-numbered using the spill count, initially False.
  • ROOT batch mode (ReducePyROOTHistogram only) - should PyROOT be run in interactive mode, initially 0 (False).
  • Supported image types (ReducePyROOTHistogram only) - a list of image types supported by PyROOT (currently ["ps", "eps", "gif", "jpg", "jpeg", "pdf", "svg", "png"]).

Birth and configuration - the birth function

  • Reading and validation of configuration parameters.
    • histogram_auto_number which determines if image names are auto-numbered using the spill count.
    • histogram_image_type which specifies the data format of histogram images output.
      • If omitted then the default of eps is used.
      • For ReducePyROOTHistogram, if a value is provided by the user then it will be validated using the supported image types (currently ["ps", "eps", "gif", "jpg", "jpeg", "pdf", "svg", "png"]).
      • For ReducePyMatplotlibHistogram, if a value is provided by the user then it will be validated using a matplotlib FigureCanvas to see if that file type is supported by matplotlib (currently [svg, ps, emf, rgba, raw, svgz, pdf, eps, png]).
    • root_batch_mode (ReducePyROOTHistogram only) which determines whether PyROOT is run in interactive mode.
  • Sub-class-specific configuration, via invocation of _configure_at_birth, see below.
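The image type check described above can be sketched as follows. This is an illustration under stated assumptions rather than the MAUS code: read_image_type is a hypothetical helper, and raising ValueError for an unsupported type is an assumption.

```python
# Image types listed above as supported by ReducePyROOTHistogram.
SUPPORTED_IMAGE_TYPES = ("ps", "eps", "gif", "jpg", "jpeg", "pdf", "svg", "png")


def read_image_type(config_doc, supported=SUPPORTED_IMAGE_TYPES, default="eps"):
    """Return the configured image type, or the default if none is given.

    Hypothetical helper; the choice of ValueError is an assumption.
    """
    if "histogram_image_type" not in config_doc:
        return default
    image_type = config_doc["histogram_image_type"]
    if image_type not in supported:
        raise ValueError("Unsupported histogram image type: %s" % image_type)
    return image_type


default_type = read_image_type({})
png_type = read_image_type({"histogram_image_type": "png"})
```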

Processing of spills - the process function

  • Converting a spill from a string to a JSON document.
  • Sub-class-specific processing, via invocation of _update_histograms, see below.
  • Converting a list of output JSON documents to a string.
  • Handling of errors occurring in _update_histograms.
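The four steps above can be sketched as a single function. This is a rough illustration only: the update_histograms argument stands in for the sub-class's _update_histograms, and both the "errors" field name and the newline separator between output documents are assumptions, not confirmed MAUS behaviour.

```python
import json


def process(spill_string, update_histograms):
    """Sketch of the process steps for a single spill.

    The "errors" field and the newline separator are assumptions.
    """
    # Step 1: convert the spill from a string to a JSON document.
    spill = json.loads(spill_string)
    try:
        # Step 2: sub-class-specific processing.
        result = update_histograms(spill)
    except Exception as error:
        # Step 4: record the error in the spill instead of propagating it.
        spill.setdefault("errors", {})[type(error).__name__] = str(error)
        result = [spill]
    # Step 3: convert the list of output documents to a string.
    return "\n".join(json.dumps(doc) for doc in result)


def needs_digits(spill):
    """Example callback that rejects spills without a digits field."""
    if "digits" not in spill:
        raise KeyError("digits field is not in spill")
    return [spill]


good = process('{"digits": []}', needs_digits)
bad = process('{}', needs_digits)
```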

Death and clean-up - the death function

  • Cleaning up.
  • Sub-class-specific clean-up, via invocation of _cleanup_at_death, see below.
  • For ReducePyROOTHistogram, cleaning up of any zombie PyROOT objects.

What the spill count counts

Both histogram reducer super-classes keep a count of the spills received by the reducer in the attribute self.spill_count. This is not the same as the number of spills used to update the histograms, since some spills received may contain errors or lack information required to update the histogram.

Other useful functions

Both ReducePyROOTHistogram and ReducePyMatplotlibHistogram provide other functions which you may find useful.

ReducePyROOTHistogram provides:

  • get_image_doc(self, keywords, description, tag, canvas) which can be used to create JSON documents with image data in a form suitable for OutputPyImage. It:
    • Prints the contents of the given PyROOT canvas in the form of data in the current image type and saves this into a temporary file.
    • Reloads this temporary file.
    • Creates a JSON image document with the image data base 64 encoded, the given keywords (a list of strings) and description string (a simple textual description of the image content) and the given image tag.
    • If auto numbering of images has been enabled then the current spill number will be added to the tag zero-padded to make a 6 digit number (e.g. 000123).
    • The JSON document is then returned.
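The steps listed for get_image_doc can be sketched without PyROOT as follows. This is an illustration, not the MAUS source: save_canvas is a stand-in for whatever call prints the canvas to a file, image_doc_from_canvas is a hypothetical name, and appending the zero-padded spill count directly to the tag (with no separator) is an assumption.

```python
import base64
import os
import tempfile


def image_doc_from_canvas(save_canvas, keywords, description, tag,
                          image_type="eps", auto_number=False, spill_count=0):
    """Sketch of the steps above; save_canvas(path) writes the image file."""
    if auto_number:
        # Assumption: the 6-digit zero-padded count is appended directly.
        tag = "%s%06d" % (tag, spill_count)
    handle, path = tempfile.mkstemp(suffix="." + image_type)
    os.close(handle)
    try:
        # Print the canvas into the temporary file, then reload it.
        save_canvas(path)
        with open(path, "rb") as image_file:
            data = image_file.read()
    finally:
        os.remove(path)
    return {"image": {"keywords": keywords,
                      "description": description,
                      "tag": tag,
                      "image_type": image_type,
                      "data": base64.b64encode(data).decode("ascii")}}


def fake_canvas(path):
    """Write placeholder bytes where a real canvas would write an image."""
    with open(path, "wb") as image_file:
        image_file.write(b"placeholder image bytes")


doc = image_doc_from_canvas(fake_canvas, ["TOF"], "Example", "tof",
                            auto_number=True, spill_count=123)
```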

ReducePyMatplotlibHistogram provides:

  • _get_image_doc(self, keywords, description, tag, canvas) which can be used to create JSON documents with image data in a form suitable for OutputPyImage. It:
    • Prints the contents of the matplotlib FigureCanvas in the form of data in the current image type and saves this into a string buffer.
    • Creates a JSON image document with the image data base 64 encoded, the given keywords (a list of strings) and description string (a simple textual description of the image content) and the given image tag.
    • If auto numbering of images has been enabled then the current spill number will be added to the tag zero-padded to make a 6 digit number (e.g. 000123).
    • The JSON document is then returned.
  • _create_histogram(self) which creates and returns a matplotlib FigureCanvas object, with figure size 6x6, axes and a grid.
  • _rescale_axes(self, histogram, xmin, xmax, ymin, ymax, xfudge = 0.5, yfudge = 0.5) which rescales the X and Y axes of a histogram in a FigureCanvas to ensure that the given X and Y ranges are visible.
    • The fudge factors can be provided to avoid the matplotlib warning "Attempting to set identical bottom==top", which arises if the axes are set to be exactly the maximum of the data.

Sub-classing histogram reducer super-classes - what you need to implement

Your reducer sub-class needs to provide three functions, and can optionally provide a fourth for clean-up.

Initialisation - __init__(self)

  • Your class constructor.
  • This should first invoke the super-class constructor to do super-class-specific initialisation e.g.
    • ReducePyROOTHistogram.__init__(self)
    • or
    • ReducePyMatplotlibHistogram.__init__(self)
  • Then it should perform initialisation of attributes specific to your class. For example:
    • ReducePyTOFPlot initialises the refresh rate (the number of spills to process before outputting a histogram).
    • ReducePyHistogramTDCADCCounts initialises the TDC and ADC counts.

Birth and configuration - _configure_at_birth(self, config_doc)

  • Called by birth, this function takes a JSON configuration document.
  • It should extract any additional sub-class-specific configuration from this. For example:
    • ReducePyTOFPlot checks for a refresh_rate configuration parameter.
    • ReducePyHistogramTDCADCCounts initialises the TDC and ADC counts.
  • It should create the histogram plot objects. For example:
    • ReducePyTOFPlot creates ROOT.TH1F and ROOT.TCanvas objects.
    • ReducePyHistogramTDCADCCounts creates a matplotlib FigureCanvas.
  • If configuration and creation is successful it should return True.
  • Any errors should be raised as exceptions e.g. if there is a missing mandatory configuration parameter then ValueError could be thrown.

Processing of spills - _update_histograms(self, spill)

  • Called by process, this function should extract information from the spill and update the histograms.
  • It should check that the spill has the information needed.
    • If not it can either ignore the spill or raise an error. The super-class will manage the insertion of the error into the spill. For example:
    • ReducePyTOFPlot does:
              if not self.get_slab_hits(spill):
                  raise ValueError("slab_hits not in spill")
      
    • ReducePyHistogramTDCADCCounts does:
              if "digits" not in spill:
                  raise KeyError("digits field is not in spill")
      
  • The function can then update the histograms.
  • The function must return a list of one or more spills. This can be one of:
    • [{}] - a list with an empty spill. You may want to return this when handling end_of_run spills, see below.
    • [spill] - a list with the input spill. You may want to do this if you only output histograms after every N spills have been read, so when the spill count isn't divisible by N you can just return the input spill. This is done by ReducePyTOFPlot:
              # Refresh canvases at requested frequency.
              if self.spill_count % self.refresh_rate == 0:
                  self.update_histos()
                  return self.get_histogram_images()
              else:
                  return [spill]
      
    • [image,...] - a list of one or more JSON image documents. How you build this is up to you but you can use the super-class utility functions. For example:
      • ReducePyTOFPlot calls the following, where self.canvas_nsp contains a PyROOT ROOT.TCanvas:
                image_list = []
                ...
                doc = ReducePyROOTHistogram.get_image_doc( \
                    self, keywords, description, tag, self.canvas_nsp)
                image_list.append(doc)
        
      • ReducePyHistogramTDCADCCounts does the following, where self._tdcadchistogram contains a matplotlib FigureCanvas:
                image_doc = ReducePyMatplotlibHistogram._get_image_doc( \
                    self, self._keywords, self._description, self._tag, \
                    self._tdcadchistogram)
                return [image_doc]
        
  • The function must also handle end_of_run spills
    • At the end of a run, reducers receive an end_of_run spill. This is a spill with a daq_event_type field with value end_of_run. This is so that, in cases where a reducer only takes action every N spills, it can take any final actions (e.g. output the final histograms).
    • You need to detect and handle this spill. If your reducer outputs the histogram for every spill then it can just return an empty spill e.g. ReducePyHistogramTDCADCCounts does this:
          def _update_histograms(self, spill):
              ...
              if (spill.has_key("daq_event_type") and
                  spill["daq_event_type"] == "end_of_run"):
                  return [{}]
              ...
      
    • If however it only outputs histograms for every N spills then this is where the final histograms should be output e.g. ReducePyTOFPlot does this:
          def _update_histograms(self, spill):
              ...
              if (spill.has_key("daq_event_type") and
                  spill["daq_event_type"] == "end_of_run"):
                  if (not self.run_ended):
                      self.update_histos()
                      self.run_ended = True
                      return self.get_histogram_images()
                  else:
                      return [{}]
              ...
      

Death and clean-up - _cleanup_at_death(self)

If your reducer needs to do its own clean-up then it can also implement this function.

  • Called by death, this does any sub-class-specific cleanup.
  • This should first invoke the super-class function to do super-class-specific clean-up e.g.
    • ReducePyROOTHistogram._cleanup_at_death(self)
    • or
    • ReducePyMatplotlibHistogram._cleanup_at_death(self)
  • If clean-up is successful it should return True.
  • If there is no sub-class specific clean-up required then you don't need to provide this function.
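Putting the pieces together, a sub-class might look like the sketch below. So that it runs without MAUS installed, it uses a stand-in base class that only mimics the hooks described above and passes spills as dictionaries rather than strings. StandInHistogramReducer, ReduceExampleCounts, digit_total and _summary are hypothetical names, and the reducer returns a small summary document where a real one would return image documents.

```python
class StandInHistogramReducer(object):
    """Stand-in for a MAUS super-class; the real classes do much more."""

    def __init__(self):
        self.spill_count = 0

    def birth(self, config_doc):
        return self._configure_at_birth(config_doc)

    def process(self, spill):
        self.spill_count += 1
        return self._update_histograms(spill)

    def death(self):
        return self._cleanup_at_death()

    def _cleanup_at_death(self):
        return True


class ReduceExampleCounts(StandInHistogramReducer):
    """Hypothetical reducer that outputs every refresh_rate spills."""

    def __init__(self):
        # Super-class initialisation first, then our own attributes.
        StandInHistogramReducer.__init__(self)
        self.refresh_rate = 5
        self.digit_total = 0
        self.run_ended = False

    def _configure_at_birth(self, config_doc):
        self.refresh_rate = int(config_doc.get("refresh_rate", 5))
        if self.refresh_rate < 1:
            raise ValueError("refresh_rate must be at least 1")
        return True

    def _update_histograms(self, spill):
        if spill.get("daq_event_type") == "end_of_run":
            if self.run_ended:
                return [{}]
            # Output the final result once, at the end of the run.
            self.run_ended = True
            return [self._summary()]
        if "digits" not in spill:
            raise KeyError("digits field is not in spill")
        self.digit_total += len(spill["digits"])
        if self.spill_count % self.refresh_rate == 0:
            return [self._summary()]
        return [spill]

    def _summary(self):
        # A real reducer would build and return image documents here.
        return {"digit_total": self.digit_total}

    def _cleanup_at_death(self):
        # Super-class clean-up first, then our own.
        StandInHistogramReducer._cleanup_at_death(self)
        self.digit_total = 0
        return True


reducer = ReduceExampleCounts()
reducer.birth({"refresh_rate": 2})
outputs = [reducer.process({"digits": [1, 2]}),
           reducer.process({"digits": [3]}),
           reducer.process({"daq_event_type": "end_of_run"})]
```

With a refresh rate of 2, the first call returns the input spill unchanged, the second returns a summary, and the end_of_run spill triggers one final summary.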

Remember, do NOT save image files

Reducers should not save files; that is the responsibility of output workers. Histogram reducers should output a JSON document with the base 64 encoded image data in the form described above.

These can be saved using the OutputPyImage worker.

How to save the images to files - OutputPyImage

As histogram reducers output JSON documents with histogram image data and are not meant to save histograms themselves, how then do you save the images?

This is the role of the OutputPyImage output worker. OutputPyImage takes in JSON documents of form:

{"image": {"keywords": [...list of image keywords...],
           "description":"...a description of the image...",
           "tag": "TAG",
           "image_type": "EXTENSION", 
           "data": "...base 64 encoded image..."}}

It decodes the base 64 encoded image data and saves it in a file DIRECTORY/PREFIXTAG.EXTENSION where:

  • DIRECTORY is a directory specified in an image_directory configuration parameter ("data card") provided when OutputPyImage is first created.
    • If this directory does not exist then it will be created.
    • If no such configuration parameter is given then the current directory is used.
  • PREFIX is a file prefix specified in an image_file_prefix configuration parameter provided when OutputPyImage is first created.
    • If no such configuration parameter is given then the default of image is used.
  • TAG is the value of the tag field in the JSON document.
  • EXTENSION is the value of the image_type field in the JSON document.

In addition, a file DIRECTORY/PREFIXTAG.json will also be saved with the image meta-data. This will be the image JSON document but without the data field i.e.:

{"image": {"keywords": [...list of image keywords...],
           "description":"...a description of the image...",
           "tag": "TAG",
           "image_type": "EXTENSION"}}

So, for example, if OutputPyImage is configured with parameters:

image_directory="/home/user/plots" 
image_file_prefix="histogram_" 

and receives a JSON document of form:

{"image": {"keywords":["TDC", "ADC", "counts"],
           "description":"Total TDC and ADC counts to spill 2",
           "tag": "tdcadc",
           "image_type": "eps", 
           "data": "...base 64 encoded image..."}}

then it will save the image data into a file /home/user/plots/histogram_tdcadc.eps and the JSON file, /home/user/plots/histogram_tdcadc.json will contain:

{"image": {"keywords":["TDC", "ADC", "counts"],
           "description":"Total TDC and ADC counts to spill 2",
           "tag": "tdcadc",
           "image_type": "eps"}}

OutputPyImage does not validate that the image data is consistent with the image type - it just decodes the base 64 encoded image and saves it.
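The saving behaviour described in this section can be sketched with the standard library alone. This is not the OutputPyImage source: save_image_doc is a hypothetical function, and the example writes into a temporary directory with placeholder image bytes.

```python
import base64
import json
import os
import tempfile


def save_image_doc(doc, directory=".", prefix="image"):
    """Save DIRECTORY/PREFIXTAG.EXTENSION and DIRECTORY/PREFIXTAG.json.

    Hypothetical function mirroring the behaviour described above.
    """
    image = doc["image"]
    if not os.path.exists(directory):
        os.makedirs(directory)
    base_path = os.path.join(directory, prefix + image["tag"])
    image_path = base_path + "." + image["image_type"]
    # The data is decoded and written as-is; its format is not checked.
    with open(image_path, "wb") as image_file:
        image_file.write(base64.b64decode(image["data"]))
    # The meta-data file is the same document without the data field.
    meta = {"image": {key: value for key, value in image.items()
                      if key != "data"}}
    json_path = base_path + ".json"
    with open(json_path, "w") as json_file:
        json.dump(meta, json_file)
    return image_path, json_path


example_doc = {"image": {"keywords": ["TDC", "ADC", "counts"],
                         "description": "Total TDC and ADC counts to spill 2",
                         "tag": "tdcadc",
                         "image_type": "eps",
                         "data": base64.b64encode(b"placeholder").decode("ascii")}}
plot_dir = os.path.join(tempfile.mkdtemp(), "plots")
image_path, json_path = save_image_doc(example_doc, plot_dir, "histogram_")
```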

Why is the image data base 64 encoded in the JSON document?

JSON documents are converted to and from strings as they pass through MAUS workers. This can cause exceptions to be thrown if passing raw image data for certain formats (e.g. PNG). Base 64 encoding the image data prevents such exceptions arising, and allows any sort of image data to be passed around MAUS in a JSON document.
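The effect can be illustrated with a short sketch. Note the assumption: this shows the behaviour of the Python 3 json module, which rejects raw bytes with a TypeError. The exact exception seen inside MAUS may differ, but the remedy is the same.

```python
import base64
import json

# First bytes of a PNG file; 0x89 is not valid ASCII or UTF-8 text.
raw = b"\x89PNG\r\n\x1a\n"

try:
    json.dumps({"data": raw})
    raw_ok = True
except TypeError:
    # Python 3's json module refuses to serialise bytes.
    raw_ok = False

# Base 64 text contains only ASCII characters, so it serialises cleanly.
encoded = base64.b64encode(raw).decode("ascii")
round_trip = base64.b64decode(json.loads(json.dumps({"data": encoded}))["data"])
```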

Updated by Jackson, Mike about 11 years ago · 32 revisions