
Feature #1555

Data chunking

Added by Rogers, Chris almost 9 years ago. Updated about 8 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Data Structure
Target version:
Start date: 01 October 2014
Due date:
% Done: 0%
Estimated time:
Workflow: New Issue

Description

For large data files it will be necessary to chunk the data on input and output. Otherwise we will produce one large output data file, which will be unwieldy for e.g. analysis users. For the same reason, it would also be useful to be able to reconstruct a range of spills.

So I propose that we add spill-start and spill-end parameters to InputCppDAQData, and add a wrapper around execute_against_data that chunks the data into e.g. 1000-spill chunks and reconstructs each set of 1000 spills.
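The proposed wrapper could be sketched as below. This is only an illustration of the chunking logic, not MAUS code: `reconstruct_spills` stands in for a call to execute_against_data, and the spill-range parameters proposed for InputCppDAQData are modelled as plain start/end arguments. All names here are hypothetical.

```python
def chunk_ranges(first_spill, last_spill, chunk_size=1000):
    """Yield (start, end) spill ranges covering [first_spill, last_spill)."""
    for start in range(first_spill, last_spill, chunk_size):
        yield start, min(start + chunk_size, last_spill)

def reconstruct_in_chunks(reconstruct_spills, first_spill, last_spill,
                          chunk_size=1000):
    """Run the reconstruction once per spill chunk.

    reconstruct_spills is a callable taking (spill_start, spill_end);
    one reconstruction output is produced per chunk rather than one
    large output for the whole run.
    """
    return [reconstruct_spills(start, end)
            for start, end in chunk_ranges(first_spill, last_spill,
                                           chunk_size)]
```

For a 2500-spill run with the default chunk size, this would drive three reconstruction passes over spills [0, 1000), [1000, 2000) and [2000, 2500).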

#1

Updated by Nebrensky, Henry almost 9 years ago

Presumably, all the RECO chunks would still be gathered into one RECO tarball per run; otherwise we have a significant change in the data management and tracking.

MAUS is probably the easiest place to implement this chunking, but note we do have the long-standing plan to limit the size of the input data by limiting the run length (#1368). Is it worth pushing the latter to solve both, or is a (say) 2GB RECO chunk still too big for the users?

#2

Updated by Rogers, Chris about 8 years ago

  • Status changed from Open to Rejected

This is not an issue - data chunking occurs at the operations level, and the MAUS recon output looks to be smaller than the raw data by an order of magnitude. We can fix it if this becomes a problem with e.g. the tracker data.
