Feature #589
ROOT io
100%
Description
Convert from MAUS json schema to ROOT event
Files
Updated by Rogers, Chris over 12 years ago
Email from Dave Colling:
My UROP student, Prash, has a first draft of the code to turn MAUS JSON output into events analysable in ROOT. He has done this by generating an event class which the user then needs to load; this keeps the full structure of the JSON file. So far it has only been tested on a file with some MC tracker data in it, and it needs a lot more testing, including testing with data with different structures. Can you help here, i.e. do you have data with different levels of information in them for testing?
Updated by Tunnell, Christopher over 12 years ago
- Assignee deleted (Tunnell, Christopher)
does that mean you want to take ownership of getting it into MAUS?
Updated by Santos, Edward over 12 years ago
No, I meant that I would like to have a copy of this code, so that I can use it. But I can do what you said too, if you wish (or at least try).
Updated by Tunnell, Christopher over 12 years ago
He's at Imperial and says he has a prototype. You could try to use it and if it's good, work with him to incorporate it into MAUS? So just ask him for a copy :)
Updated by Kumar, Prashant over 12 years ago
- File Maus-JSONtoROOT.tar Maus-JSONtoROOT.tar added
Hello,
I am attaching the first draft of my program, documentation etc. I have done some testing on the program but would like to have a look at other JSON files that have a different data structure, to see if I can incorporate that into the program. At the moment this program only works for MC tracker data created by simulation.py in the MAUS v 0.03 release. Please see the documentation in the folder named 'JSON-ROOT Documentation'. I have covered the instructions needed to run the program, but if there are issues please let me know. Meanwhile, I will continue to carry out further tests and look for other bugs.
Cheers
Updated by Rogers, Chris over 12 years ago
- Assignee changed from Kumar, Prashant to Santos, Edward
Edward, do you want to have a look through? Else bump the tracker to me and I can have a look...
Updated by Santos, Edward over 12 years ago
I have my hands full, don't think I can look at it before Monday... If it can wait till then, I'll do it, otherwise you can have it. But it will take an extra interaction with me after, because I changed the hit properties for the tracker: channel_id[channel_number] is no longer in the "mc" branch, it is only created during the digitization. Instead, in "mc" we have a channel_id["fiber_number"].
I'm filling the tracker schema right now, I guess this is a discussion for another issue. But it definitely affects this tool Kumar gave us.
Updated by Rogers, Chris over 12 years ago
- Assignee changed from Santos, Edward to Rogers, Chris
If Kumar's tool needs to know the json tree format then it should use the schema. Should definitely not be hard coded - we will never be able to maintain it.
Kumar: the json tree is described in the file src/common_py/SpillSchema.py. You can read this description to figure out what the json schema looks like; or alternately just set up the root tree dynamically depending on what is needed (may require two passes through the data; one to figure out what the tree looks like, another to actually store data).
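The two-pass idea above can be sketched roughly as follows. This is only an illustration in Python using the standard json module, not Kumar's tool: the function names discover_structure and fill_columns, the "[]" marker for array levels and the sample spill contents are all assumptions made for the example. The first pass records every leaf path and the value types seen there, so no top-level key has to be named in the code; the second pass collects values per spill against that structure.

```python
import json


def discover_structure(lines):
    """First pass: record every leaf path seen and the value types found there."""
    structure = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, path + (key,))
        elif isinstance(value, list):
            for child in value:
                walk(child, path + ("[]",))  # "[]" marks an array level
        else:
            structure.setdefault(path, set()).add(type(value).__name__)

    for line in lines:
        if line.strip():
            walk(json.loads(line), ())
    return structure


def fill_columns(lines, structure):
    """Second pass: per spill, collect the values for every known leaf path."""

    def collect(value, path, row):
        if isinstance(value, dict):
            for key, child in value.items():
                collect(child, path + (key,), row)
        elif isinstance(value, list):
            for child in value:
                collect(child, path + ("[]",), row)
        else:
            row[path].append(value)

    rows = []
    for line in lines:
        if not line.strip():
            continue
        row = {path: [] for path in structure}  # absent branches stay empty
        collect(json.loads(line), (), row)
        rows.append(row)
    return rows


# Hypothetical input: one JSON document per line, made up for this sketch.
spills = [
    '{"mc": [{"energy": 210.0}], "digits": [{"adc_counts": 94}]}',
    '{"mc": [{"energy": 198.5}, {"energy": 201.2}]}',
]
structure = discover_structure(spills)
rows = fill_columns(spills, structure)
top_level = sorted({path[0] for path in structure})
```

Here top_level holds the top branch names found in the data itself, and a path that collects more than one type name in structure would flag the kind of inconsistency that needs handling before a tree is booked.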
Updated by Kumar, Prashant over 12 years ago
I have seen what the JSON schema looks like but still don't see how to create the class structure in the Event Class header file automatically. The issue lies with the code not being able to find the 'root' values automatically in the JSON file. If the data structure of all the other detectors (of which there are 7?) is defined in the EventClass header file then the program could look for the root values in the JSON file and fill the tree accordingly, but every element of each entity would still have to be fully defined there in order to be filled in the tree as it exists in the JSON file. Could you please point me in the direction of how to create a dynamically structured ROOT tree which doesn't need to know the root value(s) in the JSON file and replicates the structure of the file before filling itself with data on the second run?
Updated by Kumar, Prashant over 12 years ago
Dear Chris,
I have tried another method of obtaining the data structure on the first run through the file, but it doesn't seem to work. Could you please give me some ideas to get me started on making this work?
Updated by Tunnell, Christopher over 12 years ago
Do you mean generating the dictionary (in the ROOT sense) on the fly? ie. upon program load?
What do you mean when you say 'root' values? You mean like at the root of the tree (in the computer science sense)?
I think that it's going to be really tough (though not impossible) to make the ROOT file look like the JSON output. If you want to call, we can brainstorm and clarify. There's also the ability to convert XML to ROOT files within ROOT, so maybe looking at that could provide inspiration?
Updated by Kumar, Prashant over 12 years ago
Sorry, by 'root' values, I mean the top branch values in the JSON file, e.g. Digits, mc, Spacepoints etc.; nothing to do with 'ROOT'. I can already create the dictionaries and make them work in ROOT for a specific structure, but then again it has to be defined in the 'Linkdef.h' file. I need to be able to read in the entire data structure of the JSON input file and then fill it with data automatically. How do I call?
Updated by Tunnell, Christopher over 12 years ago
Our JSON schema is a JSON file. What language are you wanting to parse the schema in? It looks like C++, so I'd recommend using jsoncpp (google it). Does that answer your question?
Updated by Kumar, Prashant over 12 years ago
I am already using that at the moment. My question is this: in my program currently, I have to specify the 'root' value in order to read a particular JSON file and then drill down the structure all the way to write those values into a ROOT tree. I want to be able to not specify that root value at all in the program, and have it map out the structure of the JSON file on the first run through the file and then fill the tree on the second run.
Updated by Kumar, Prashant over 12 years ago
i.e the program should be able to decide for itself what the root values in any JSON file should be.
Updated by Tunnell, Christopher over 12 years ago
http://jsoncpp.sourceforge.net/class_json_1_1_value.html#30fa08af88f2d0a038b22ba9f4e88b2a
I think this returns an STL vector of strings.
Updated by Kumar, Prashant over 12 years ago
Could you call me on 020758 41626 please, I need to discuss this in a bit more detail. Thank you
Updated by Kumar, Prashant over 12 years ago
Or alternatively, if you could please give me your contact number and a time when you will be available, I can call you. Thanks
Updated by Kumar, Prashant over 12 years ago
Sorry, I missed your call. I'm available now! Thanks
Updated by Kumar, Prashant over 12 years ago
Notes and Observations
The current suggestion is to create a flat ROOT branch that works through an Ntuple class, which would be a first step towards making a fully automated system, since automatically generating a structured Event Class is not currently working.
I will be looking into using TMap in ROOT as well to see if that can provide a good solution.
I have also posted this issue on ROOT Talk. If you could also please ask other people for ideas to this, and let me know if you come across something that may help.
Another suggestion is to look into ROOT XML since ROOT has some XML readability built into it.
Updated by Rogers, Chris over 12 years ago
Right, so the work breaks down:
import flat json spill to root (Prashant)
Then, potentially someone else (Prashant if interested is welcome):
flatten json spill
unflatten json spill
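The flatten and unflatten steps in the breakdown above could look something like the sketch below. It is only one possible convention, not an agreed MAUS format: the function names, the "." separator and the "[i]" notation for array indices are assumptions, and it relies on keys that neither contain "." nor start with "[". Empty objects and arrays are dropped by this scheme, so a real implementation would need to decide how to represent them.

```python
def flatten(value, prefix=""):
    """Turn a nested spill into a flat {dotted_name: leaf_value} mapping."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]."))
    else:
        flat[prefix[:-1]] = value  # strip the trailing separator
    return flat


def _restore_lists(node):
    """Convert dicts whose keys are all "[i]" markers back into lists."""
    if not isinstance(node, dict):
        return node
    restored = {key: _restore_lists(child) for key, child in node.items()}
    if restored and all(key.startswith("[") for key in restored):
        ordered = sorted(restored, key=lambda key: int(key[1:-1]))
        return [restored[key] for key in ordered]
    return restored


def unflatten(flat):
    """Rebuild the nested spill from the mapping produced by flatten()."""
    root = {}
    for name, value in flat.items():
        parts = name.split(".")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _restore_lists(root)


# Hypothetical spill fragment, loosely modelled on tracker digits.
spill = {
    "digits": [
        {"adc_counts": 94, "channel_id": {"fiber_number": 107}},
        {"adc_counts": 25, "channel_id": {"fiber_number": 107}},
    ]
}
flat = flatten(spill)
round_trip = unflatten(flat)
```

Each key of flat is then a candidate name for a flat ntuple column, and round_trip equals the original spill for inputs that satisfy the assumptions above.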
Updated by Rogers, Chris about 12 years ago
- Status changed from Open to Rejected
I think we have to cross this bridge only if the file size gets unwieldy. Reject for now.
Updated by Rogers, Chris about 12 years ago
- Due date set to 01 November 2011
- Status changed from Rejected to Open
- Assignee deleted (Rogers, Chris)
Will assign to Alex Richards when he gets a login
Updated by Richards, Alexander about 12 years ago
Hi Chris,
Due to deadlines elsewhere, this is currently something that I am tinkering with every now and then. It is looking good though, and hopefully still on track for having, if not a finished version, then at least some form of beta by the end of the month.
Cheers
Alex
Updated by Rogers, Chris about 12 years ago
Great that you made progress. Just a warning about code quality stuff... What's your testing and code commenting like? Did you try running the unit tests yet? I guess this is C++ - did you try running the style test cpplint stuff?
Should I slip the due date?
Updated by Richards, Alexander about 12 years ago
Well, I haven't touched any of the MAUS code yet; this is purely a JSON->ROOT streamer at the moment. I will of course have a look at the C++ style tests and so forth before committing anything. Keep the date as it is for now; I may have to revise it by next Monday. That way it keeps the pressure up. If the date slips I'm more likely to start working on the NUMEROUS other tasks that require my attention from the Ganga side ;-)
--Alex
Updated by Richards, Alexander about 12 years ago
One question that perhaps you can answer for me... I am looking at an example JSON file which contains two members, 'digits' and 'mc', at the root node (poor choice of words given the task, but not my choice ;-) ). These members appear to be themselves arrays. Looking VERY briefly at the digits one, for example, I see a load of objects that look like the below. As someone completely not in the know, do I interpret the below as coming from 2 events, i.e. one object per event? Also, some insight into the structure of the mc array (or any other arrays that may be found) would be nice. I'm working at the moment on the assumption that the reason the root members contain arrays is that we have one element per event. Please correct me here if I'm wrong.
Cheers
Alex
{
  "adc_counts": 94,
  "channel_id": {
    "fiber_number": 107,
    "plane_number": 0,
    "station_number": 1,
    "tracker_number": 0,
    "type": "Tracker"
  },
  "tdc_counts": 0
}, {
  "adc_counts": 25,
  "channel_id": {
    "fiber_number": 107,
    "plane_number": 0,
    "station_number": 1,
    "tracker_number": 0,
    "type": "Tracker"
  },
  "tdc_counts": 0
}
Updated by Richards, Alexander about 12 years ago
Hi Chris,
I've consulted the schema src/common_py/SpillSchema.py, which nicely describes what I can see for the mc object; however, digits is not defined there. Is the schema incomplete, or is the digits object defined elsewhere? It would be useful to know what types of objects one would expect to find in the JSON file.
--Alex
Updated by Rogers, Chris about 12 years ago
Sorry for the delay I didn't see your response. The SpillSchema.py is all we have. I asked devs to write up the spill schema, I haven't chased hard enough. The schema is machine readable, you could make your code read in and parse it to set up the ROOT tree structure? Maybe for now use the schema hard coded but prepare then for a next stage to read the schema dynamically?
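As a rough illustration of reading the schema to set up the tree structure, the sketch below walks a schema and lists the leaf branches it describes. It assumes a JSON-Schema-like layout with "type", "properties" and "items" keys; the actual layout of SpillSchema.py is not shown in this thread, so the example schema, the branches_from_schema name and the "[]" array marker are all hypothetical.

```python
def branches_from_schema(schema, prefix=""):
    """Return (branch_name, declared_type) pairs for every leaf in the schema."""
    kind = schema.get("type")
    if kind == "object":
        branches = []
        for name, child in schema.get("properties", {}).items():
            branches += branches_from_schema(child, f"{prefix}{name}.")
        return branches
    if kind == "array":
        # "[]" marks an array level; one entry per element is expected here.
        return branches_from_schema(schema.get("items", {}), f"{prefix}[].")
    return [(prefix.rstrip("."), kind)]


# Hypothetical schema fragment, NOT copied from SpillSchema.py.
example_schema = {
    "type": "object",
    "properties": {
        "digits": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "adc_counts": {"type": "integer"},
                    "tdc_counts": {"type": "integer"},
                },
            },
        },
    },
}
branches = branches_from_schema(example_schema)
```

The resulting list could drive the booking of branches, first from a hard-coded schema and later from one loaded at run time.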
Updated by Rogers, Chris about 12 years ago
Sorry I'm being crap - email is slow and being in the wrong time zone doesn't help. So the rule is 1 line per event (which corresponds to a MICE spill). So each call to readline() or getline() [python/c++] should return one event. Note that each event contains multiple particles going through the detector system on different particle events.
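The one-line-per-spill rule can be sketched as below, using Python's standard json module. The read_spills name and the sample contents (an "mc" array with a "particle_id" field per entry) are assumptions for illustration only; real MAUS output will have a richer structure.

```python
import io
import json


def read_spills(stream):
    """Yield one parsed spill per non-empty line of a line-delimited JSON stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            raise ValueError(f"line {line_number} is not valid JSON") from error


# Stand-in for an open file such as simulation.out; contents are made up.
example = io.StringIO(
    '{"mc": [{"particle_id": 13}, {"particle_id": -13}]}\n'
    '{"mc": [{"particle_id": 13}]}\n'
)
particles_per_spill = [len(spill["mc"]) for spill in read_spills(example)]
```

With a real file, the same generator would be used as read_spills(open(path)), giving one dictionary per spill with several particles inside each.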
Updated by Rogers, Chris about 12 years ago
- File simulation.out simulation.out added
Email from Alex:
As I understand it, then, the file (attached) has only the one event in it. Do you have an example of a file with more than one event in it?
I attached a simulation file with 10 lines, i.e. 10 events (spills). Each event has about 20-30 particles in it.
Updated by Rogers, Chris about 12 years ago
Following phone conversation with Alex:
Pointed out (python) API in e.g. src/output/OutputPyJson/
Discussed a few implementation issues
Updated by Richards, Alexander about 12 years ago
After discussions with Chris, it seems that the only way to preserve the structure of the data in the ROOT tree is to define the schema in C++, with the Python version parsed from it. I will also investigate the possibility of increasing the number of levels of the data structure within the ROOT tree.
-Alex
Updated by Richards, Alexander about 12 years ago
- Due date changed from 01 November 2011 to 01 December 2011
Updated by Rogers, Chris almost 12 years ago
- File ROOT_IO.odt ROOT_IO.odt added
- Workflow set to New Issue
Added work specification document
Updated by Rogers, Chris almost 12 years ago
I spent a bit more time browsing your code. Couple of comments:
1. MsgStream is essentially a duplicate functionality of the code in
src/legacy/Interface/Squeak.hh
Please use this instead (just takes a bit of time to get to know the MAUS code).
2. You should use .hh and .cc for the file endings, need to comment everything up. I think you're just trying to set out the framework, so that's fine.
3. In DigitsProcessor, you have a lot of lines which are essentially cut and paste, like
Json::Value sub_node = node.get("channel_number", Json::Value::null);
if (sub_node.isNull()) { /* throw an error */ }
/* set channel number value from sub_node */
This seems to be very messy. Would much prefer that we define a mapping of say, string to Json type and string to Set functions (urk, function pointers, maybe a better way?) and then write one generic function in the base class of the Processor rather than cutting and pasting everywhere. I think that is possible, what do you reckon?
4. Prefer to throw exceptions rather than return an error code. Use the Squeal exception class in src/legacy/Interface (this allows us to change the way errors are handled at runtime).
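Points 3 and 4 above can be illustrated with a table-driven sketch. It is written in Python for brevity rather than in the C++ of the actual processors, and everything named here (Digit, DIGIT_FIELDS, fill_from_json, MissingFieldError) is hypothetical; in MAUS the exception would be the Squeal class rather than this stand-in. The point is only that one generic function plus a mapping of field name to expected type and setter replaces the repeated get, check and set lines.

```python
class MissingFieldError(Exception):
    """Stand-in for a project exception class (hypothetical)."""


class Digit:
    """Minimal data holder with Set/Get style accessors (hypothetical)."""

    def __init__(self):
        self._adc_counts = None
        self._tdc_counts = None

    def set_adc_counts(self, value):
        self._adc_counts = value

    def get_adc_counts(self):
        return self._adc_counts

    def set_tdc_counts(self, value):
        self._tdc_counts = value

    def get_tdc_counts(self):
        return self._tdc_counts


# Field name -> (expected JSON type, setter to call with the value).
DIGIT_FIELDS = {
    "adc_counts": (int, Digit.set_adc_counts),
    "tdc_counts": (int, Digit.set_tdc_counts),
}


def fill_from_json(target, node, fields):
    """Generic replacement for the cut-and-paste blocks: check, then set."""
    for name, (expected_type, setter) in fields.items():
        if name not in node:
            raise MissingFieldError(f"missing field '{name}'")
        value = node[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"field '{name}' should be {expected_type.__name__}")
        setter(target, value)
    return target


digit = fill_from_json(Digit(), {"adc_counts": 94, "tdc_counts": 0}, DIGIT_FIELDS)
```

Adding a new field then means adding one table entry instead of another pasted block, and errors surface as exceptions rather than return codes.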
Updated by Rogers, Chris almost 12 years ago
Oh and another thing - better to use Set/Get functions instead of raw structs - it allows us to change the internal data representation (Oh sugar, I wanted to store my vector in cylindrical coordinates, don't want to change everything now!)
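A small sketch of why accessors help, assuming a hypothetical Position class (not part of MAUS): the class stores cylindrical coordinates internally, yet callers keep using Cartesian Set/Get functions and never notice the representation.

```python
import math


class Position:
    """Stores (r, phi, z) internally but exposes Cartesian accessors."""

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self._z = z
        self.set_xy(x, y)

    def set_xy(self, x, y):
        # Internal representation is cylindrical; callers never see it.
        self._r = math.hypot(x, y)
        self._phi = math.atan2(y, x)

    def get_x(self):
        return self._r * math.cos(self._phi)

    def get_y(self):
        return self._r * math.sin(self._phi)

    def get_z(self):
        return self._z

    def get_r(self):
        return self._r


position = Position(3.0, 4.0, 1.0)
```

Had x and y been public struct members, switching the storage to cylindrical coordinates would have required touching every piece of code that read them.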
Updated by Richards, Alexander almost 12 years ago
Of course I agree, Set/Get is far better than public members (which break encapsulation). As you said I'm setting out the framework first as the data structure will surely be setup by someone who know more what they want. It's a proof of principle I guess ;-). I will look into the squeak and squeal classes and add their functionality. More to come...
3) Mapping of string to templated function pointer sounds like a sensible approach and certainly doable; it is after all what I have already used for the framework, and it gives the nice ability of using the STL algorithms to manipulate them.
Updated by Rogers, Chris almost 12 years ago
mapping of string to templated function pointer sounds like a sensible approach and certainly doable
I guess the problem comes because branches can be either objects (string:value mapping) or arrays. I fear the case where, for example, some data structure has different types in the same array. I hope there's some reasonable way to do it...
Updated by Rogers, Chris almost 12 years ago
So back after holidays and I'm getting back up to speed with things. I was concerned however when I discovered that within a given event in the sample datafile you gave me the names of the data members change. I don't see how this has arisen given that, as I say, this was in the same event and so presumably was dumped all together at the same time and with the same code. I'm thinking in particular of the pid variable, which in some places is "pid" and in others "particle_id".
Right, it's a bug. Looks like the SpecialVirtual hits are writing as "particle_id" and the TOF hits are writing as "pid". We should make a proper inheritance tree to stop this sort of stuff happening (i.e. SpecialVirtual and TOF hits both inherit from an MCHit data type).
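The inheritance idea can be sketched as follows, in Python for brevity. The class layout and every field other than "particle_id" (energy, plane, station_id) are made up for illustration and are not the real MAUS hit types. Because the shared key is written in exactly one place, the base class, the two hit types cannot drift apart in how they name it.

```python
class MCHit:
    """Base hit type: the only place where shared JSON keys are written."""

    def __init__(self, particle_id, energy):
        self.particle_id = particle_id
        self.energy = energy

    def to_json(self):
        return {"particle_id": self.particle_id, "energy": self.energy}


class TOFHit(MCHit):
    def __init__(self, particle_id, energy, plane):
        super().__init__(particle_id, energy)
        self.plane = plane

    def to_json(self):
        node = super().to_json()  # shared keys come from MCHit
        node["plane"] = self.plane
        return node


class SpecialVirtualHit(MCHit):
    def __init__(self, particle_id, energy, station_id):
        super().__init__(particle_id, energy)
        self.station_id = station_id

    def to_json(self):
        node = super().to_json()
        node["station_id"] = self.station_id
        return node


tof_node = TOFHit(13, 226.0, 0).to_json()
virtual_node = SpecialVirtualHit(13, 226.0, 1).to_json()
```

Both output nodes carry "particle_id", so a converter reading them needs only one name for the field.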
Updated by Rogers, Chris over 11 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Now writing ROOT data as default data output...
Updated by Rogers, Chris over 11 years ago
- Target version changed from Future MAUS release to MAUS-v0.2.3