|
| 1 | +--- |
| 2 | +title: "N5 API Basics" |
| 3 | +description: "Basics of the N5 API for Java developers. This tutorial shows how to read and write n-dimensional image data and structured metadata into HDF5, N5, and Zarr containers using the N5 API." |
| 4 | +author: |
| 5 | + - name: John Bogovic |
| 6 | + - name: Caleb Hulbert |
| 7 | +date: "2/27/2024" |
| 8 | +date-modified: "4/23/2024" |
| 9 | +notebook-links: global |
| 10 | +image: n5-basic-tutorial-thumbnail.png |
| 11 | +categories: |
| 12 | + - hdf5 |
| 13 | + - n5 |
| 14 | + - zarr |
| 15 | + - imglib2 |
| 16 | + - tutorial |
| 17 | +format: |
| 18 | + html: |
| 19 | + toc: true |
| 20 | +--- |
| 21 | + |
| 22 | +This tutorial for Java developers covers the most basic functionality of the [N5 API](https://github.com/saalfeldlab/n5) |
| 23 | +for storing large, chunked n-dimensional image data and structured metadata. The N5 API and documentation refer to n-dimensional images as |
| 24 | +"datasets", [terminology inherited from HDF5](https://docs.hdfgroup.org/hdf5/develop/_g_l_s.html#title3). We will use this terminology in this tutorial. |
| 25 | +If you are used to working with Python and NumPy, an n-dimensional image or dataset is what you know as an [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html). |
| 26 | +We will learn about: |
| 27 | + |
| 28 | +* creating readers and writers |
| 29 | +* modifying and inspecting the hierarchy ("folder structure") |
| 30 | +* saving and loading datasets |
| 31 | +* saving and loading metadata |
| 32 | + |
| 33 | +## Readers and writers |
| 34 | + |
| 35 | +[`N5Reader`](https://github.com/saalfeldlab/n5/blob/n5-3.2.0/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java)s and |
| 36 | +[`N5Writer`](https://github.com/saalfeldlab/n5/blob/n5-3.2.0/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java)s form |
| 37 | +the basis of the N5 API and allow you to read and write data, respectively. We generally recommend using an |
| 38 | +[`N5Factory`](https://github.com/saalfeldlab/n5-universe/blob/n5-universe-1.4.2/src/main/java/org/janelia/saalfeldlab/n5/universe/N5Factory.java) to create readers and writers: |
| 39 | + |
| 40 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#make-reader-writer echo=true >}} |
| 41 | + |
| 42 | +The N5 API gives you access to a number of different storage formats: HDF5, Zarr, and N5's own |
| 43 | +format. `N5Factory`'s convenience methods try to infer the storage format from the extension |
| 44 | +of the path you provide: |
| 45 | + |
| 46 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#factory-types echo=true >}} |
| 47 | + |
| 48 | +In fact, it is possible to read with `N5Writer`s since every `N5Writer` |
| 49 | +is also an `N5Reader`, so from now on we'll just be using the |
| 50 | +`n5Writer`. |
| 51 | + |
| 52 | +::: {.callout-tip} |
| 53 | +## Try it! |
| 54 | + |
| 55 | +We use the N5 storage format for the rest of the tutorial, but it will work just as well over either |
| 56 | +an HDF5 file or Zarr container. |
| 57 | +::: |
| 58 | + |
| 59 | +## Groups |
| 60 | + |
| 61 | +N5 containers form hierarchies of *groups* - think "nested folders on your file system." |
| 62 | +It's easy to create groups and test if they exist: |
| 63 | + |
| 64 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#make-groups echo=true >}} |
| 65 | + |
| 66 | +The `list` method lists groups that are children of the given group: |
| 67 | + |
| 68 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#list echo=true >}} |
| 69 | + |
| 70 | +and `deepList` recursively lists every descendant of the given group: |
| 71 | + |
| 72 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#deep-list echo=true >}} |
| 73 | + |
| 74 | +Notice that these methods *only* give information about what groups are |
| 75 | +present and do not provide information about metadata or datasets. |
| 76 | + |
| 77 | +::: {.callout-note} |
| 78 | +Some storage / access systems (e.g., AWS S3) separate permissions for reading and listing, meaning |
| 79 | +it may be possible to access data but not to list it. |
| 80 | +::: |
| 81 | + |
| 82 | +## Datasets |
| 83 | + |
| 84 | +N5 stores datasets (n-dimensional arrays) in particular groups in the hierarchy. |
| 85 | + |
| 86 | +::: {.callout-warning} |
| 87 | +Datasets must be terminal (leaf) nodes in the container hierarchy - i.e. a dataset cannot contain |
| 88 | +another group or dataset. (Is this strictly true? May be confusing with names like multiscale "datasets") |
| 89 | +::: |
| 90 | + |
| 91 | +We recommend using code from [n5-ij](https://github.com/saalfeldlab/n5-ij) or [n5-imglib2](https://github.com/saalfeldlab/n5-imglib2) |
| 92 | +to write datasets. The examples in this post will use the latter. |
| 93 | + |
| 94 | +The [`N5Utils`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java) |
| 95 | +class in n5-imglib2 has many useful methods, but in this post, we'll cover simple methods for reading and writing. First, |
| 96 | +[`N5Utils.save`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java#L1440) |
| 97 | +writes a dataset and required metadata to the container at a group that you specify. The group will be created if it does |
| 98 | +not already exist. The parameters will be discussed in more detail below. |
| 99 | + |
| 100 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-save echo=true >}} |
| 101 | + |
| 102 | +You can write in parallel by providing an [`ExecutorService`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html) to this variant of |
| 103 | +[`N5Utils.save`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java#L1514): |
| 104 | + |
| 105 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-save-exec echo=true >}} |
| 106 | + |
| 107 | +Reading the dataset from the container is also easy with |
| 108 | +[`N5Utils.open`](https://github.com/saalfeldlab/n5-imglib2/blob/241dc2b503d01007ec6aec72dacecc9706f023ab/src/main/java/org/janelia/saalfeldlab/n5/imglib2/N5Utils.java#L428): |
| 109 | + |
| 110 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-open echo=true >}} |
| 111 | + |
| 112 | +::: {.callout-warning} |
| 113 | + |
| 114 | +## Overwriting data is possible |
| 115 | + |
| 116 | +This save method *DOES NOT* perform any checks prior to writing data and will overwrite data that exists in the specified location. |
| 117 | +Be sure to check and take appropriate action if it is possible that data could already be at a particular location and |
| 118 | +container to avoid data loss or corruption. |
| 119 | +::: |
| 120 | + |
| 121 | +This example shows that data can be overwritten: |
| 122 | + |
| 123 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#n5-imglib2-overwrite echo=true >}} |
| 124 | + |
| 125 | +### Parameter details |
| 126 | + |
| 127 | +#### `groupPath` |
| 128 | + |
| 129 | +is the location inside the container that will store the dataset. You can store a dataset at the |
| 130 | +root of a container by specifying `""` or `"/"` as the `groupPath`. In this case, the container |
| 131 | +will only be able to store one dataset ([see the warning above](#datasets)). |
| 132 | + |
| 133 | +#### `blockSize` |
| 134 | + |
| 135 | +is a very important parameter. HDF5, N5, and Zarr all break up the datasets they store |
| 136 | +into equally sized blocks or "chunks". The block size parameter specifies the size of these blocks. |
| 137 | + |
| 138 | +For the example above, we stored an image of size `64 x 64` using blocks sized `32 x 32`. As a result, N5 uses |
| 139 | +four blocks to store the entire image: |
| 140 | + |
| 141 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#four-blocks echo=true >}} |
| 142 | + |
| 143 | +*Quiz:* How many blocks would there be if the block size was `64 x 8`? |
| 144 | + |
| 145 | +<details> |
| 146 | +<summary>Click here to show the answer.</summary> |
| 147 | + |
| 148 | +There would be eight blocks. |
| 149 | + |
| 150 | +One block covers the first dimension, but it takes 8 blocks to cover the second dimension ($8 \times 8 = 64$). |
| 151 | +This is also demonstrated by the code below: |
| 152 | + |
| 153 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#eight-blocks echo=true >}} |
| 154 | + |
| 155 | +</details> |
| 156 | + |
| 157 | +::: {.callout-tip} |
| 158 | +## Try it! |
| 159 | + |
| 160 | +N5 lets you store your image in a single file if you want - just provide a block size that |
| 161 | +is equal to or larger than the image size. |
| 162 | +::: |
| 163 | + |
| 164 | +#### `compression` |
| 165 | + |
| 166 | +Each block is compressed independently, using the specified compression. |
| 167 | +Use [`RawCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/RawCompression.java) |
| 168 | +to store blocks without compression. |
| 169 | + |
| 170 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#no-compression echo=true >}} |
| 171 | + |
| 172 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#no-compression-blk-sizes echo=true >}} |
| 173 | + |
| 174 | +Notice that blocks were previously ~1700-2000 bytes and are now ~4100 without compression. |
| 175 | + |
| 176 | +The available compression options at the time of this writing are: |
| 177 | + |
| 178 | +* [`BloscCompression`](https://github.com/saalfeldlab/n5-blosc/blob/n5-blosc-1.1.1/src/main/java/org/janelia/saalfeldlab/n5/blosc/BloscCompression.java) |
| 179 | +* [`Bzip2Compression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/Bzip2Compression.java) |
| 180 | +* [`GzipCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/GzipCompression.java) |
| 181 | +* [`Lz4Compression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/Lz4Compression.java) |
| 182 | +* [`RawCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/RawCompression.java) |
| 183 | +* [`XzCompression`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/XzCompression.java) |
| 184 | +* [`ZstandardCompression`](https://github.com/JaneliaSciComp/n5-zstandard/blob/n5-zstandard-1.0.2/src/main/java/org/janelia/scicomp/n5/zstandard/ZstandardCompression.java) |
| 185 | + |
| 186 | +## Metadata |
| 187 | + |
| 188 | +N5 can also store rich structured metadata in addition to array data. This tutorial will discuss basic, low-level metadata operations. |
| 189 | +Advanced operations and metadata standards may be described in a future tutorial. |
| 190 | + |
| 191 | +### Basics |
| 192 | + |
| 193 | +`N5Writer`s have a |
| 194 | +[`setAttribute`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L55) |
| 195 | +method for writing metadata to the storage backend. It takes three arguments: |
| 196 | + |
| 197 | +```java |
| 198 | +<T> void setAttribute(String groupPath, String attributePath, T attribute) |
| 199 | +``` |
| 200 | + |
| 201 | +* `groupPath` : the group in which to store this metadata |
| 202 | +* `attributePath` : the name of this attribute |
| 203 | +* `attribute` : the metadata attribute to be stored. Can be an arbitrary type (denoted `T`). |
| 204 | + |
| 205 | +::: {.callout-note} |
| 206 | +There are differences between an attribute "name" and an attribute "path", but attribute "paths" are an advanced topic |
| 207 | +and will be covered elsewhere. |
| 208 | +::: |
| 209 | + |
| 210 | +Similarly, `N5Reader`s have a |
| 211 | +[`getAttribute`](https://github.com/saalfeldlab/n5/blob/n5-3.1.3/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L241-L244) |
| 212 | +method: |
| 213 | + |
| 214 | +```java |
| 215 | +<T> T getAttribute(String groupPath, String attributePath, Class<T> clazz) |
| 216 | +``` |
| 217 | + |
| 218 | +The last argument (`Class<T>`) lets you specify the type that `getAttribute` should return. |
| 219 | +An `N5Exception` will be thrown if the requested type cannot be created from the requested attribute. |
| 220 | +If an attribute does not exist, `null` will be returned (see the last example of this section). |
| 221 | +Consider these examples: |
| 222 | + |
| 223 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#attributes-1 echo=true >}} |
| 224 | + |
| 225 | +Sometimes it is possible to interpret an attribute as multiple different types: |
| 226 | + |
| 227 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#attr-types echo=true >}} |
| 228 | + |
| 229 | +### Rich metadata |
| 230 | + |
| 231 | +It is possible to save attributes of arbitrary types, enabling you to structure your |
| 232 | +metadata into classes that are easy to save and load directly. For example, if we define a metadata class `FunWithMetadata`: |
| 233 | + |
| 234 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#fun-with-metadata echo=true >}} |
| 235 | + |
| 236 | +then make an instance and save it: |
| 237 | + |
| 238 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#rich-metadata echo=true >}} |
| 239 | + |
| 240 | +To retrieve all the metadata in a group as JSON: |
| 241 | + |
| 242 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#all-metadata echo=true >}} |
| 243 | + |
| 244 | +### Removing metadata |
| 245 | + |
| 246 | +You can remove attributes by their name as well. To return the element that was removed, just provide the class for that element |
| 247 | +(this mirrors the [remove method](https://docs.oracle.com/javase/8/docs/api/java/util/List.html#remove-int-) for `List`s in Java). |
| 248 | + |
| 249 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#remove-attrs echo=true >}} |
| 250 | + |
| 251 | +### Working with Dataset Metadata |
| 252 | + |
| 253 | +Metadata used to describe datasets can be read and written in the same way as all other metadata. |
| 254 | +However, there are special [`DatasetAttributes`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/DatasetAttributes.java) |
| 255 | +methods to safely work with dataset metadata. |
| 256 | +[`N5Reader.getDatasetAttributes`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L276) and |
| 257 | +[`N5Writer.setDatasetAttributes`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L134) |
| 258 | +ensure the metadata is always a valid representation of dataset metadata. |
| 259 | +Setting `DatasetAttributes`, however, should only be done when the dataset is initially saved. This ensures that the required metadata is tightly coupled with the data. |
| 260 | +For example, `set`ting dataset metadata should be done through the |
| 261 | +[`N5Writer.createDataset`](https://github.com/saalfeldlab/n5/blob/8e14d529276b57e1817ff21df9cac9fb1a517d59/src/main/java/org/janelia/saalfeldlab/n5/N5Writer.java#L200) |
| 262 | +methods (or indirectly through the `N5Utils.save` [methods mentioned above](#datasets)): |
| 263 | + |
| 264 | +{{< embed ../../_notebooks/N5-Basics-Tutorial.ipynb#array-metadata echo=true >}} |
| 265 | + |
| 266 | +::: {.callout-warning} |
| 267 | +## Warning |
| 268 | + |
| 269 | +The attributes that N5 uses to read datasets can be set with `setAttribute`, and modifying them could corrupt your data. |
| 270 | +**Do not manually set these attributes unless you absolutely know what you're doing!** |
| 271 | + |
| 272 | +* `dimensions` |
| 273 | +* `blockSize` |
| 274 | +* `dataType` |
| 275 | +* `compression` |
| 276 | + |
| 277 | +The attributes that describe datasets are also accessible using `getAttribute`; try running: |
| 278 | + |
| 279 | +```java |
| 280 | +n5Writer.getAttribute("data", "dimensions", long[].class); |
| 281 | +``` |
| 282 | + |
| 283 | +though using `getDatasetAttributes().getDimensions()` is generally recommended. |
| 284 | +::: |
| 285 | + |
| 286 | +## What to try next |
| 287 | + |
| 288 | +* [How to work with the N5 API and ImgLib2](https://imglib.github.io/imglib2-blog/posts/2022-09-27-n5-imglib2.html) |
| 289 | + |
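
The `blockSize` section of the tutorial above counts four blocks for a `64 x 64` image stored with `32 x 32` blocks, and eight blocks with `64 x 8` blocks. The arithmetic behind those counts can be sketched in plain Java. This is only an illustration: the class and method names (`BlockCount`, `blocksAlong`, `numBlocks`) are hypothetical, are not part of the N5 API, and the sketch assumes that edge blocks are counted as whole blocks when the image size is not a multiple of the block size.

```java
class BlockCount {

    /** Blocks needed to cover one dimension: the ceiling of size / blockSize. */
    static long blocksAlong(long size, int blockSize) {
        return (size + blockSize - 1) / blockSize;
    }

    /** Total number of blocks in the grid: the product over all dimensions. */
    static long numBlocks(long[] dimensions, int[] blockSize) {
        long n = 1;
        for (int d = 0; d < dimensions.length; d++) {
            n *= blocksAlong(dimensions[d], blockSize[d]);
        }
        return n;
    }

    public static void main(String[] args) {
        long[] dims = {64, 64};
        System.out.println(numBlocks(dims, new int[]{32, 32})); // 4
        System.out.println(numBlocks(dims, new int[]{64, 8}));  // 8
        System.out.println(numBlocks(dims, new int[]{64, 64})); // 1 (the whole image in one block)
    }
}
```

The last call corresponds to the tip in that section: a block size equal to or larger than the image size yields a single block.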
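
The `compression` section of the tutorial above notes that each block is compressed independently and that block files shrink when compression is enabled. The following sketch illustrates that idea using only `java.util.zip` from the JDK. It does not use N5's own `GzipCompression` class or write any N5 container, and the class name, method names, and the 8-bit gradient data are assumptions made for illustration, so the byte counts it prints will not match the ones reported in the tutorial.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

class BlockCompressionSketch {

    /** Gzip-compress the bytes of one block on their own, without reference to any other block. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** A hypothetical 32 x 32 block of 8-bit pixels holding a smooth diagonal gradient. */
    static byte[] exampleBlock() {
        byte[] block = new byte[32 * 32];
        for (int y = 0; y < 32; y++) {
            for (int x = 0; x < 32; x++) {
                block[y * 32 + x] = (byte) (x + y);
            }
        }
        return block;
    }

    public static void main(String[] args) {
        byte[] raw = exampleBlock();
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```

Because every block is compressed by itself, a reader can decompress a single block without touching its neighbours, which is what makes chunked access to large datasets practical.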