For large data files, we will need to chunk the data on both input and output. Otherwise the output will be a single large data file, which is awkward for, e.g., analysis users. For the same reason, it would also be useful to be able to reconstruct a range of spills.
So I propose that we add spill start and spill end parameters to InputCppDAQData, and add a wrapper around execute_against_data that splits the run into chunks of, e.g., 1000 spills and reconstructs each chunk in turn.
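A minimal sketch of the proposed chunking logic is below. It assumes inclusive spill ranges and a caller-supplied `reconstruct(start, end)` callback. Both `spill_chunks` and `reconstruct_in_chunks` are hypothetical names, not existing MAUS functions. In MAUS the callback would presumably set the proposed spill start/end parameters on InputCppDAQData and then call execute_against_data, but that wiring is not shown here.

```python
from typing import Callable, Iterator, Tuple


def spill_chunks(first_spill: int, last_spill: int,
                 chunk_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) spill ranges covering first_spill..last_spill.

    Hypothetical helper: the final chunk may be shorter than chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    start = first_spill
    while start <= last_spill:
        end = min(start + chunk_size - 1, last_spill)
        yield start, end
        start = end + 1


def reconstruct_in_chunks(first_spill: int, last_spill: int,
                          reconstruct: Callable[[int, int], None],
                          chunk_size: int = 1000) -> int:
    """Call reconstruct(start, end) once per chunk and return the chunk count.

    Hypothetical wrapper: reconstruct stands in for whatever would run the
    reconstruction over a single spill range.
    """
    n_chunks = 0
    for start, end in spill_chunks(first_spill, last_spill, chunk_size):
        reconstruct(start, end)
        n_chunks += 1
    return n_chunks
```

For example, `list(spill_chunks(1, 2500))` gives `[(1, 1000), (1001, 2000), (2001, 2500)]`, so a 2500-spill run would be reconstructed as three separate outputs.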
Updated by Nebrensky, Henry over 8 years ago
Presumably all the RECO chunks would still be gathered into one RECO tarball per run; otherwise we would have a significant change in the data management and tracking.
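Gathering the chunks into one tarball per run could look roughly like the sketch below, using Python's standard `tarfile` module. The function name `bundle_run` and the `<run>_reco.tar` naming scheme are illustrative assumptions, not the actual MICE data-management convention.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def bundle_run(run_number: int, chunk_files: Iterable[Path], out_dir: Path) -> Path:
    """Pack all reconstructed chunk files of one run into a single tarball.

    Hypothetical helper: the output name <run>_reco.tar is an assumed convention.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / f"{run_number:05d}_reco.tar"
    with tarfile.open(tar_path, "w") as tar:
        # Sort so the archive order follows the chunk file names
        # (assumes the names sort in chunk order).
        for chunk in sorted(chunk_files):
            tar.add(chunk, arcname=chunk.name)
    return tar_path
```

Keeping one archive per run means the downstream bookkeeping still sees one RECO object per run, whatever the number of chunks inside it.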
MAUS is probably the easiest place to implement this chunking, but note we do have the long-standing plan to limit the size of the input data by limiting the run length (#1368). Is it worth pushing the latter to solve both, or is a (say) 2GB RECO chunk still too big for the users?
Updated by Rogers, Chris almost 8 years ago
- Status changed from Open to Rejected
This is not an issue: data chunking occurs at the operations level, and MAUS recon output looks to be about an order of magnitude smaller than the raw data. We can fix it if this becomes a problem, e.g. with the tracker data.