For large data files it will be necessary to chunk the data on both input and output. Otherwise we will get a single large output file, which will be unwieldy for e.g. analysis users. For the same reason, it would also be useful to be able to reconstruct a specified range of spills.
So I propose that we add spill start and spill end parameters to InputCppDAQData, and add a wrapper to execute_against_data that splits the run into chunks of e.g. 1000 spills and reconstructs each chunk separately.
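As a rough illustration of the chunking step only, here is a minimal Python sketch. The names spill_chunks and reconstruct_in_chunks, the inclusive (start, end) convention, and the reconstruct callable are all hypothetical and chosen for illustration; none of them are existing MAUS functions or InputCppDAQData parameters.

```python
from typing import Callable, Iterator, List, Tuple


def spill_chunks(first_spill: int, last_spill: int,
                 chunk_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) spill ranges covering first_spill..last_spill.

    Hypothetical helper: the inclusive-range convention is an assumption,
    not something fixed by the proposal.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = first_spill
    while start <= last_spill:
        # The final chunk may be shorter than chunk_size.
        end = min(start + chunk_size - 1, last_spill)
        yield (start, end)
        start = end + 1


def reconstruct_in_chunks(first_spill: int, last_spill: int,
                          reconstruct: Callable[[int, int], str],
                          chunk_size: int = 1000) -> List[str]:
    """Call reconstruct(start, end) once per chunk and collect the results.

    `reconstruct` stands in for whatever would actually run the
    reconstruction on a spill range (an assumption for this sketch);
    here it is expected to return e.g. an output file name.
    """
    return [reconstruct(start, end)
            for start, end in spill_chunks(first_spill, last_spill, chunk_size)]
```

For example, reconstruct_in_chunks(0, 2499, run_one_chunk) would call a user-supplied run_one_chunk on the ranges (0, 999), (1000, 1999) and (2000, 2499) in turn. The same range generator would also cover the "reconstruct a range of spills" case, by passing only the spills of interest as first_spill and last_spill.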
Updated by Nebrensky, Henry almost 9 years ago
Presumably, all the RECO chunks would still be gathered into one RECO tarball per run, else we have a significant change in the data management and tracking.
MAUS is probably the easiest place to implement this chunking, but note we do have the long-standing plan to limit the size of the input data by limiting the run length (#1368). Is it worth pushing the latter to solve both, or is a (say) 2GB RECO chunk still too big for the users?