Data representation conversion
The aim is to implement implicit conversion from one data representation to another at runtime. At the moment we have, say, Module A, Module B and Module C, each of which interfaces the data in its own way. Say Module A and Module B represent the data using JsonCpp and Module C represents the data using native C++. I would like to be able to determine at runtime the data representation used by each module and call a converter class to handle the interface between the modules. At the moment the model is that everything interfaces by a string, like:
Module A -> string -> Module B -> string -> Module C -> ...
What I would like is
Module A -> JsonCpp -> Converter<JsonCpp, JsonCpp> -> JsonCpp -> Module B -> JsonCpp -> Converter<JsonCpp, NativeCpp> -> NativeCpp -> Module C -> ...
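As a rough illustration of the chain above, here is a minimal Python sketch of a converter lookup keyed by (source, target) representation. Everything in it is an assumption made for illustration: the tags "string" and "py_json", the CONVERTERS table and the convert() helper are hypothetical names, and the standard json module stands in for the real JsonCpp and native C++ representations. The case where source and target match (the Converter<JsonCpp, JsonCpp> step) is treated as a no-op.

```python
import json

# Hypothetical representation tags, stand-ins for JsonCpp / NativeCpp etc.
STRING = "string"
PY_JSON = "py_json"

# Hypothetical registry: (source, target) -> conversion function.
CONVERTERS = {
    (STRING, PY_JSON): json.loads,
    (PY_JSON, STRING): json.dumps,
}


def convert(data, source, target):
    """Convert data from one representation to another.

    If both representations match, the data is passed through untouched,
    which mirrors the Converter<JsonCpp, JsonCpp> step in the chain.
    """
    if source == target:
        return data
    try:
        converter = CONVERTERS[(source, target)]
    except KeyError:
        raise TypeError("no converter from %s to %s" % (source, target))
    return converter(data)
```

With this shape, supporting a new representation means registering new (source, target) pairs rather than touching the modules themselves.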
I would like to keep the API essentially the same however. So for example, we have in bin/simulate_mice.py
    # This input generates empty spills, to be filled by the beam maker later on
    my_input = MAUS.InputPySpillGenerator()

    # Create an empty array of mappers, then populate it
    # with the functionality you want to use.
    my_map = MAUS.MapPyGroup()
    my_map.append(MAUS.MapPyBeamMaker())  # beam construction
    my_map.append(MAUS.MapCppSimulation())  # geant4 simulation
    my_map.append(MAUS.MapCppTrackerMCDigitization())  # SciFi electronics model

    # can specify datacards here or by using appropriate command line calls
    datacards = io.StringIO(u"")

    # Then construct a MAUS output component - filename comes from datacards
    my_output = MAUS.OutputPyJSON()

    # The Go() drives all the components you pass in, then check the file
    # (default simulation.out) for output
    MAUS.Go(my_input, my_map, MAUS.ReducePyDoNothing(), my_output, datacards)
I don't mind cosmetic changes, but basically I think this is a nice setup and I would like to keep it.
Updated by Rogers, Chris over 9 years ago
So this necessitates a couple of things:
1. New Converter abstract type and python bindings (some implementation already exists for this).
2. Implementation of the converter type for Converter<PyJson, CppJson>, Converter<NativeCpp, CppJson>, Converter<NativeCpp, PyJson>. I consider PyJson and CppJson to be different representations of the data (they look different in memory).
3. Add the converter calls at runtime, determining the interface type of each module dynamically.
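The three items above could look roughly like this on the Python side. This is only a sketch under stated assumptions, not the existing implementation: the Converter base class, the source/target and input_type/output_type attributes, and the find_converter() and interface_types() helpers are all hypothetical names, and plain strings and dicts stand in for the CppJson and NativeCpp representations.

```python
import abc
import json


class Converter(abc.ABC):
    """Hypothetical abstract converter between two data representations."""

    source = None  # tag of the representation accepted
    target = None  # tag of the representation produced

    @abc.abstractmethod
    def convert(self, data):
        """Return data re-expressed in the target representation."""


class StringToPyJson(Converter):
    source, target = "string", "py_json"

    def convert(self, data):
        return json.loads(data)


class PyJsonToString(Converter):
    source, target = "py_json", "string"

    def convert(self, data):
        return json.dumps(data)


def find_converter(source, target):
    """Pick a concrete converter by scanning direct Converter subclasses."""
    for cls in Converter.__subclasses__():
        if (cls.source, cls.target) == (source, target):
            return cls()
    raise LookupError("no converter from %s to %s" % (source, target))


def interface_types(module):
    """Determine a module's interface at runtime.

    Assumes modules may declare input_type / output_type attributes;
    modules that declare nothing fall back to the current string interface.
    """
    return (getattr(module, "input_type", "string"),
            getattr(module, "output_type", "string"))
```

Falling back to "string" for undeclared modules is one way to leave existing mappers untouched while new ones opt in.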
The natural order is to start with 1 and end at 3, but we probably want to think about how to do 3 first because it may affect the abstraction layer somewhat, even if the actual implementation is done last.
The converter call needs to happen in a few places. For single-threaded operation:
- src/common_py/Go.py - PipelineSingleThreadDataflowExecutor.execute
- src/common_py/Go.py - DataflowUtilities.buffer_input
- src/map/MapPyGroup/MapPyGroup.py - MapPyGroup.process
For multiprocess operation:
- src/common_py/Go.py - MultiProcessInputTransformDataflowExecutor.execute
- src/common_py/Go.py - MultiProcessMergeOutputDataflowExecutor.execute
- src/map/MapPyGroup/MapPyGroup.py - MapPyGroup.process (same as above; this used to be handled differently, but I think multiprocess and single-process operation are now handled the same way)
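To make the idea of a converter call at each of these points concrete, here is a small sketch of a loop that converts the data to whatever each module expects before calling process. It is an assumption-laden toy, not the code in Go.py or MapPyGroup.py: run_chain(), to_type(), the input_type/output_type attributes and the two toy mappers are hypothetical, and json again stands in for the real representations.

```python
import json

# Hypothetical converter table: (source, target) -> function.
CONVERTERS = {
    ("string", "py_json"): json.loads,
    ("py_json", "string"): json.dumps,
}


def to_type(data, source, target):
    """Convert only when the representations differ."""
    if source == target:
        return data
    return CONVERTERS[(source, target)](data)


class AddOne(object):
    """Toy mapper that works on decoded (dict) data."""
    input_type = output_type = "py_json"

    def process(self, spill):
        spill = dict(spill)
        spill["n"] = spill.get("n", 0) + 1
        return spill


class LegacyStringMapper(object):
    """Toy mapper that keeps the string interface and declares nothing."""

    def process(self, spill):
        doc = json.loads(spill)
        doc["legacy"] = True
        return json.dumps(doc)


def run_chain(modules, data, data_type="string"):
    """Call each module in turn, converting at every boundary as needed."""
    for module in modules:
        wanted = getattr(module, "input_type", "string")
        data = module.process(to_type(data, data_type, wanted))
        data_type = getattr(module, "output_type", "string")
    return data, data_type
```

For example, run_chain([AddOne(), AddOne(), LegacyStringMapper()], '{"n": 0}') decodes the string once, passes the dict straight between the two AddOne mappers, and only re-encodes it for the legacy mapper.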
For testing, you will need to have Celery explicitly set up in order to check the distributed processing operation. I will try to get you a login for the test machines and give you sudo access, so we can install Celery there (or maybe ask Matt Robinson to do it, as he may end up being in charge of that machine).
Note that this has probably changed in recent releases, so I think you really want the latest release. I added Mike as a watcher so he can comment.