Store the collection information in a struct instead of a tuple #711

Open
wants to merge 6 commits into
base: master
Collaborator

@tmadlener tmadlener commented Nov 20, 2024

BEGINRELEASENOTES

  • Store the collection information in a proper struct instead of a tuple to facilitate access for non-podio-based backends (e.g. Julia)
  • Harmonize the format of the RNTuple- and TTree-based ROOT backends
    • Both now store the collection information into the podio_metadata TTree / Model as a vector<podio::root_utils::CollectionWriteInfo>, which contains all the necessary information for reading a collection.
    • The collection ID and the name are part of this struct, so the CollectionIDTable is no longer written separately.
  • This is a breaking change in the format if you rely on reading the metadata about the stored collections.
  • The changes are completely transparent for the TTree-based backend.

ENDRELEASENOTES
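
For readers who do not use podio, here is a rough sketch of what such a struct could look like. It is only a stand-in for podio::root_utils::CollectionWriteInfo, not the actual definition: the field names and types are read off the podio_metadata branches shown in the storage dump later in this conversation, and the lookup helper findByName is a hypothetical name introduced purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for podio::root_utils::CollectionWriteInfo (assumption: the real
// struct may contain more or differently typed members).
struct CollectionWriteInfo {
  uint32_t collectionID{};  // ID that used to live in the CollectionIDTable
  std::string dataType{};   // fully qualified collection type
  bool isSubset{};          // whether this is a subset collection
  uint32_t schemaVersion{}; // schema version of the stored data
  std::string name{};       // collection name
};

// Hypothetical helper: find the entry for a collection name. Because name and
// ID are stored together, no separate ID table is needed for this lookup.
const CollectionWriteInfo* findByName(const std::vector<CollectionWriteInfo>& infos,
                                      const std::string& name) {
  for (const auto& info : infos) {
    if (info.name == name) {
      return &info;
    }
  }
  return nullptr; // no collection with this name
}
```

A reader would then get the collection ID via something like findByName(infos, "hits")->collectionID after checking for nullptr.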

Fixes #705

@peremato This addresses the TTree part of the problem. I just realized that RNTuple stores this in several vectors that run in parallel. I suppose it would be easiest for you (and also for us) to have similar layouts for TTree and RNTuple?
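
To illustrate what a common layout would mean, the following sketch turns parallel vectors into a single vector of structs. It assumes one vector per field, which may not match the actual old RNTuple layout; the struct is a stand-in for the podio one, and zipParallelVectors is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the struct stored per collection (assumed field set).
struct CollectionWriteInfo {
  uint32_t collectionID{};
  std::string dataType{};
  bool isSubset{};
  uint32_t schemaVersion{};
  std::string name{};
};

// Combine vectors that "run in parallel" (element i of each vector describes
// the same collection) into one vector of structs.
std::vector<CollectionWriteInfo> zipParallelVectors(const std::vector<uint32_t>& ids,
                                                    const std::vector<std::string>& types,
                                                    const std::vector<bool>& subsetFlags,
                                                    const std::vector<uint32_t>& schemaVersions,
                                                    const std::vector<std::string>& names) {
  // Parallel vectors are only meaningful if they all have the same length
  assert(ids.size() == types.size() && ids.size() == subsetFlags.size() &&
         ids.size() == schemaVersions.size() && ids.size() == names.size());

  std::vector<CollectionWriteInfo> infos;
  infos.reserve(ids.size());
  for (std::size_t i = 0; i < ids.size(); ++i) {
    infos.push_back({ids[i], types[i], subsetFlags[i], schemaVersions[i], names[i]});
  }
  return infos;
}
```

With the struct-based layout a reader in another language only has to understand one record type instead of keeping several vectors in sync by index.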

@tmadlener
Collaborator Author

The format for RNTuple has changed quite a bit with this PR, but I have not implemented anything yet to handle the older format transparently. This is mainly because I think it's not necessary yet, since I don't think any actual data has been stored in RNTuple yet.

@tmadlener tmadlener changed the title from "[WIP] Store the collection information in a struct instead of a tuple" to "Store the collection information in a struct instead of a tuple" on Mar 6, 2025
@tmadlener
Collaborator Author

Not adding backwards compatibility for the RNTuple backend breaks the EDM4hep backwards compatibility tests, because the files used there were actually written with podio v1.2 and still use the old format.
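
One possible way to handle this is to dispatch on the podio build version stored in the file. The sketch below is only an illustration of that idea and not what this PR implements: the Version struct mirrors the major / minor / patch fields of PodioBuildVersion, while MetadataLayout, selectLayout and the cutoff version passed in as firstStructVersion are hypothetical, since the release that first writes the new layout is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Mirrors the major / minor / patch fields stored as PodioBuildVersion.
struct Version {
  uint16_t major{};
  uint16_t minor{};
  uint16_t patch{};
};

// Lexicographic comparison: major first, then minor, then patch.
bool operator<(const Version& a, const Version& b) {
  return std::tie(a.major, a.minor, a.patch) < std::tie(b.major, b.minor, b.patch);
}

// Hypothetical tag for the two metadata layouts a reader has to support.
enum class MetadataLayout { Legacy, StructBased };

// Pick the layout based on the version that wrote the file. The cutoff is a
// parameter because the first release with the new layout is an assumption.
MetadataLayout selectLayout(const Version& fileVersion, const Version& firstStructVersion) {
  return fileVersion < firstStructVersion ? MetadataLayout::Legacy : MetadataLayout::StructBased;
}
```

A reader would call selectLayout once after reading PodioBuildVersion and then choose the matching code path for the collection information.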

@tmadlener
Collaborator Author

tmadlener commented Mar 7, 2025

After these changes the storage details look like the following. The main difference is that the RNTuple version has one additional field, availableCategories, which is not present in the TTree version because there it is reconstructed from the available branch names in the podio_metadata tree.

TTree
root [9] podio_metadata->Print()
******************************************************************************
*Tree    :podio_metadata: metadata tree for podio I/O functionality              *
*Entries :        1 : Total =           34897 bytes  File  Size =      11109 *
*        :          : Tree compression factor =   2.33                       *
******************************************************************************
*Br    0 :other_events___CollectionTypeInfo :                                *
*         | Int_t other_events___CollectionTypeInfo_                         *
*Entries :        1 : Total  Size=       5158 bytes  File Size  =        126 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :other_events___CollectionTypeInfo.collectionID :                   *
*         | UInt_t collectionID[other_events___CollectionTypeInfo_]          *
*Entries :        1 : Total  Size=       1044 bytes  File Size  =        255 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :other_events___CollectionTypeInfo.dataType :                       *
*         | string dataType[other_events___CollectionTypeInfo_]              *
*Entries :        1 : Total  Size=       1925 bytes  File Size  =        517 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   2.23     *
*............................................................................*
*Br    3 :other_events___CollectionTypeInfo.isSubset :                       *
*         | Bool_t isSubset[other_events___CollectionTypeInfo_]              *
*Entries :        1 : Total  Size=        934 bytes  File Size  =        156 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.03     *
*............................................................................*
*Br    4 :other_events___CollectionTypeInfo.schemaVersion :                  *
*         | UInt_t schemaVersion[other_events___CollectionTypeInfo_]         *
*Entries :        1 : Total  Size=       1049 bytes  File Size  =        170 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.51     *
*............................................................................*
*Br    5 :other_events___CollectionTypeInfo.name :                           *
*         | string name[other_events___CollectionTypeInfo_]                  *
*Entries :        1 : Total  Size=       1347 bytes  File Size  =        443 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.33     *
*............................................................................*
*Br    6 :events___CollectionTypeInfo : Int_t events___CollectionTypeInfo_   *
*Entries :        1 : Total  Size=       4996 bytes  File Size  =        120 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    7 :events___CollectionTypeInfo.collectionID :                         *
*         | UInt_t collectionID[events___CollectionTypeInfo_]                *
*Entries :        1 : Total  Size=        970 bytes  File Size  =        217 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    8 :events___CollectionTypeInfo.dataType :                             *
*         | string dataType[events___CollectionTypeInfo_]                    *
*Entries :        1 : Total  Size=       1544 bytes  File Size  =        403 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   2.00     *
*............................................................................*
*Br    9 :events___CollectionTypeInfo.isSubset :                             *
*         | Bool_t isSubset[events___CollectionTypeInfo_]                    *
*Entries :        1 : Total  Size=        884 bytes  File Size  =        147 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   10 :events___CollectionTypeInfo.schemaVersion :                        *
*         | UInt_t schemaVersion[events___CollectionTypeInfo_]               *
*Entries :        1 : Total  Size=        975 bytes  File Size  =        159 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.37     *
*............................................................................*
*Br   11 :events___CollectionTypeInfo.name :                                 *
*         | string name[events___CollectionTypeInfo_]                        *
*Entries :        1 : Total  Size=       1121 bytes  File Size  =        350 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.14     *
*............................................................................*
*Branch  :PodioBuildVersion                                                  *
*Entries :        1 : BranchElement (see below)                              *
*............................................................................*
*Br   12 :major     : UShort_t                                               *
*Entries :        1 : Total  Size=        577 bytes  File Size  =         84 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   13 :minor     : UShort_t                                               *
*Entries :        1 : Total  Size=        577 bytes  File Size  =         84 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   14 :patch     : UShort_t                                               *
*Entries :        1 : Total  Size=        577 bytes  File Size  =         84 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   15 :EDMDefinitions : Int_t EDMDefinitions_                             *
*Entries :        1 : Total  Size=       2421 bytes  File Size  =        107 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   16 :EDMDefinitions._1 : string _1[EDMDefinitions_]                     *
*Entries :        1 : Total  Size=      14392 bytes  File Size  =       4299 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   3.20     *
*............................................................................*
*Br   17 :EDMDefinitions._0 : string _0[EDMDefinitions_]                     *
*Entries :        1 : Total  Size=        799 bytes  File Size  =        164 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Branch  :datamodel___Version                                                *
*Entries :        1 : BranchElement (see below)                              *
*............................................................................*
*Br   18 :major     : UShort_t                                               *
*Entries :        1 : Total  Size=        577 bytes  File Size  =         84 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   19 :minor     : UShort_t                                               *
*Entries :        1 : Total  Size=        577 bytes  File Size  =         84 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   20 :patch     : UShort_t                                               *
*Entries :        1 : Total  Size=        577 bytes  File Size  =         84 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
RNTuple
root [7] r->PrintInfo(ROOT::Experimental::ENTupleInfo::kStorageDetails)
============================================================
NTUPLE:      podio_metadata
Compression: 505
------------------------------------------------------------
  # Entries:        1
  # Fields:         25
  # Columns:        28
  # Alias Columns:  0
  # Pages:          28
  # Clusters:       1
  Size on storage:  4478 B
  Compression rate: 3.89
  Header size:      590 B
  Footer size:      359 B
  Meta-data / data: 0.212
------------------------------------------------------------
CLUSTER DETAILS
------------------------------------------------------------
  #     0   Entry range:     [0..0]  --  1
            # Pages:         28
            Size on storage: 4478 B
            Compression:     3.89
------------------------------------------------------------
COLUMN DETAILS
------------------------------------------------------------
  EDMDefinitions [#0]  --  SplitIndex64               {id:4}
    # Elements:          1
    # Pages:             1
    Avg elements / page: 1
    Avg page size:       8 B
    Size on storage:     8 B
    Compression:         1.00
............................................................
  EDMDefinitions._0._0 [#0]  --  SplitIndex64         {id:5}
    # Elements:          3
    # Pages:             1
    Avg elements / page: 3
    Avg page size:       24 B
    Size on storage:     24 B
    Compression:         1.00
............................................................
  EDMDefinitions._0._0 [#1]  --  Char                 {id:6}
    # Elements:          49
    # Pages:             1
    Avg elements / page: 49
    Avg page size:       49 B
    Size on storage:     49 B
    Compression:         1.00
............................................................
  EDMDefinitions._0._1 [#0]  --  SplitIndex64         {id:7}
    # Elements:          3
    # Pages:             1
    Avg elements / page: 3
    Avg page size:       24 B
    Size on storage:     24 B
    Compression:         1.00
............................................................
  EDMDefinitions._0._1 [#1]  --  Char                 {id:8}
    # Elements:          13630
    # Pages:             1
    Avg elements / page: 13630
    Avg page size:       2854 B
    Size on storage:     2854 B
    Compression:         4.78
............................................................
  PodioBuildVersion [#0]  --  SplitIndex64            {id:0}
    # Elements:          1
    # Pages:             1
    Avg elements / page: 1
    Avg page size:       8 B
    Size on storage:     8 B
    Compression:         1.00
............................................................
  PodioBuildVersion._0 [#0]  --  SplitUInt16          {id:1}
    # Elements:          3
    # Pages:             1
    Avg elements / page: 3
    Avg page size:       6 B
    Size on storage:     6 B
    Compression:         1.00
............................................................
  availableCategories [#0]  --  SplitIndex64          {id:9}
    # Elements:          1
    # Pages:             1
    Avg elements / page: 1
    Avg page size:       8 B
    Size on storage:     8 B
    Compression:         1.00
............................................................
  availableCategories._0 [#0]  --  SplitIndex64      {id:10}
    # Elements:          2
    # Pages:             1
    Avg elements / page: 2
    Avg page size:       16 B
    Size on storage:     16 B
    Compression:         1.00
............................................................
  availableCategories._0 [#1]  --  Char              {id:11}
    # Elements:          18
    # Pages:             1
    Avg elements / page: 18
    Avg page size:       18 B
    Size on storage:     18 B
    Compression:         1.00
............................................................
  datamodel___Version [#0]  --  SplitIndex64          {id:2}
    # Elements:          1
    # Pages:             1
    Avg elements / page: 1
    Avg page size:       8 B
    Size on storage:     8 B
    Compression:         1.00
............................................................
  datamodel___Version._0 [#0]  --  SplitUInt16        {id:3}
    # Elements:          3
    # Pages:             1
    Avg elements / page: 3
    Avg page size:       6 B
    Size on storage:     6 B
    Compression:         1.00
............................................................
  events___CollectionTypeInfo [#0]  --  SplitIndex64 {id:20}
    # Elements:          1
    # Pages:             1
    Avg elements / page: 1
    Avg page size:       8 B
    Size on storage:     8 B
    Compression:         1.00
............................................................
  events___CollectionTypeInfo._0.collectionID [#0]  --  SplitUInt32{id:21}
    # Elements:          22
    # Pages:             1
    Avg elements / page: 22
    Avg page size:       88 B
    Size on storage:     88 B
    Compression:         1.00
............................................................
  events___CollectionTypeInfo._0.dataType [#0]  --  SplitIndex64{id:22}
    # Elements:          22
    # Pages:             1
    Avg elements / page: 22
    Avg page size:       48 B
    Size on storage:     48 B
    Compression:         3.67
............................................................
  events___CollectionTypeInfo._0.dataType [#1]  --  Char{id:23}
    # Elements:          654
    # Pages:             1
    Avg elements / page: 654
    Avg page size:       231 B
    Size on storage:     231 B
    Compression:         2.83
............................................................
  events___CollectionTypeInfo._0.isSubset [#0]  --  Bit{id:24}
    # Elements:          22
    # Pages:             1
    Avg elements / page: 22
    Avg page size:       3 B
    Size on storage:     3 B
    Compression:         7.33
............................................................
  events___CollectionTypeInfo._0.name [#0]  --  SplitIndex64{id:26}
    # Elements:          22
    # Pages:             1
    Avg elements / page: 22
    Avg page size:       48 B
    Size on storage:     48 B
    Compression:         3.67
............................................................
  events___CollectionTypeInfo._0.name [#1]  --  Char {id:27}
    # Elements:          251
    # Pages:             1
    Avg elements / page: 251
    Avg page size:       177 B
    Size on storage:     177 B
    Compression:         1.42
............................................................
  events___CollectionTypeInfo._0.schemaVersion [#0]  --  SplitUInt32{id:25}
    # Elements:          22
    # Pages:             1
    Avg elements / page: 22
    Avg page size:       34 B
    Size on storage:     34 B
    Compression:         2.59
............................................................
  other_events___CollectionTypeInfo [#0]  --  SplitIndex64{id:12}
    # Elements:          1
    # Pages:             1
    Avg elements / page: 1
    Avg page size:       8 B
    Size on storage:     8 B
    Compression:         1.00
............................................................
  other_events___CollectionTypeInfo._0.collectionID [#0]  --  SplitUInt32{id:13}
    # Elements:          30
    # Pages:             1
    Avg elements / page: 30
    Avg page size:       120 B
    Size on storage:     120 B
    Compression:         1.00
............................................................
  other_events___CollectionTypeInfo._0.dataType [#0]  --  SplitIndex64{id:14}
    # Elements:          30
    # Pages:             1
    Avg elements / page: 30
    Avg page size:       56 B
    Size on storage:     56 B
    Compression:         4.29
............................................................
  other_events___CollectionTypeInfo._0.dataType [#1]  --  Char{id:15}
    # Elements:          985
    # Pages:             1
    Avg elements / page: 985
    Avg page size:       294 B
    Size on storage:     294 B
    Compression:         3.35
............................................................
  other_events___CollectionTypeInfo._0.isSubset [#0]  --  Bit{id:16}
    # Elements:          30
    # Pages:             1
    Avg elements / page: 30
    Avg page size:       4 B
    Size on storage:     4 B
    Compression:         7.50
............................................................
  other_events___CollectionTypeInfo._0.name [#0]  --  SplitIndex64{id:18}
    # Elements:          30
    # Pages:             1
    Avg elements / page: 30
    Avg page size:       56 B
    Size on storage:     56 B
    Compression:         4.29
............................................................
  other_events___CollectionTypeInfo._0.name [#1]  --  Char{id:19}
    # Elements:          427
    # Pages:             1
    Avg elements / page: 427
    Avg page size:       234 B
    Size on storage:     234 B
    Compression:         1.82
............................................................
  other_events___CollectionTypeInfo._0.schemaVersion [#0]  --  SplitUInt32{id:17}
    # Elements:          30
    # Pages:             1
    Avg elements / page: 30
    Avg page size:       40 B
    Size on storage:     40 B
    Compression:         3.00
............................................................
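
As a rough illustration of how the categories can be recovered from the TTree branch names above, here is a minimal sketch. It assumes the top-level branch names have already been collected into a vector of strings (in ROOT they could be obtained from TTree::GetListOfBranches()) and that every category contributes a branch ending in ___CollectionTypeInfo, as in the dump. The function name categoriesFromBranchNames is hypothetical and this is not necessarily how podio does it internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Derive the category names from the top-level branch names of the
// podio_metadata tree by stripping the "___CollectionTypeInfo" suffix.
std::vector<std::string> categoriesFromBranchNames(const std::vector<std::string>& branchNames) {
  static const std::string suffix = "___CollectionTypeInfo";

  std::vector<std::string> categories;
  for (const auto& name : branchNames) {
    // Only branches that are longer than the suffix and end in it qualify
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
      categories.push_back(name.substr(0, name.size() - suffix.size()));
    }
  }
  return categories;
}
```

For the tree shown above this yields other_events and events, which are the same two entries that the RNTuple version stores explicitly in availableCategories.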

Development

Successfully merging this pull request may close these issues.

Change from tuple to struct for storing metadata of stored collections