From 655b0893926fa3200aeddd73a102feaa4643db8f Mon Sep 17 00:00:00 2001
From: github-actions
Date: Mon, 9 Dec 2024 18:51:58 +0000
Subject: [PATCH] Update GitHub Pages

---
 README.md                 | 1 +
 learn/concepts/index.html | 2 +-
 sources.tar               | Bin 655360 -> 655360 bytes
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 730ba86..ce4a97c 100644
--- a/README.md
+++ b/README.md
@@ -18,3 +18,4 @@
 Wed Sep 18 11:13:08 UTC 2024
 Fri Nov 15 13:46:13 UTC 2024
 Mon Dec 2 21:30:54 UTC 2024
 Fri Dec 6 14:34:48 UTC 2024
+Mon Dec 9 18:51:57 UTC 2024
diff --git a/learn/concepts/index.html b/learn/concepts/index.html
index 5b6a7d7..9dc9f01 100644
--- a/learn/concepts/index.html
+++ b/learn/concepts/index.html
@@ -154,7 +154,7 @@

});

ZML Concepts

Model lifecycle

ZML is an inference stack that helps run Machine Learning (ML) models, particularly Neural Networks (NNs).

The lifecycle of a model consists of the following steps:

  1. Open the model file and read the shapes of the weights, but leave the weights on the disk.

  2. Using the loaded shapes and optional metadata, instantiate a model struct with Tensors, representing the shape and layout of each layer of the NN.

  3. Compile the model struct and its forward function into an accelerator-specific executable. The forward function describes the mathematical operations corresponding to the model inference.

  4. Load the model weights from disk into accelerator memory.

  5. Bind the model weights to the executable.

  6. Load some user inputs, and copy them to the accelerator.

  7. Call the executable on the user inputs.

  8. Fetch the returned model output from the accelerator into host memory, and finally present it to the user.

  9. When all user inputs have been processed, free the executable resources and the associated weights.

Some details:

Note that the compilation and weight loading steps are both bottlenecks to your model startup time, but they can be done in parallel. ZML provides asynchronous primitives to make that easy.

The compilation can be cached across runs, and if you always use the same model architecture with the same shapes, it's possible to bypass it entirely.

The accelerator is typically a GPU, but can be another chip, or even the CPU itself, churning vector instructions.
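The lifecycle above can be sketched in Zig. This is illustrative only: apart from zml.aio.loadBuffers and HostBuffer.toDevice, which this page names, every function and variable below (openWeightStore, initModel, startCompile, join, readUserInput, accelerator, present) is a hypothetical placeholder, not the actual ZML API.

```zig
// Hedged sketch of the nine lifecycle steps; placeholder names are
// marked in the lead-in and are NOT the real ZML API.
const zml = @import("zml");

pub fn main() !void {
    // 1. Read only the weight shapes; the weights stay on disk.
    var store = try openWeightStore("model.safetensors");

    // 2. Build the model struct of Tensors from those shapes.
    const model = try initModel(Model, store);

    // 3 + 4. Compilation and weight loading are both startup
    // bottlenecks, so run them in parallel with async primitives.
    var compiled = startCompile(model); // compiles `forward`
    var weights = zml.aio.loadBuffers(store); // disk -> accelerator

    // 5. Bind the loaded weights to the compiled executable.
    var exe = join(compiled).bind(join(weights));
    defer exe.deinit(); // 9. free the executable and the weights

    // 6, 7, 8. Copy inputs to the accelerator, run, fetch the output.
    const input = try readUserInput().toDevice(accelerator);
    const output = exe.call(input);
    present(try output.toHost());
}
```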

Tensor Bros.

In ZML, we leverage Zig's static type system to differentiate between a few concepts. Hence we not only have a Tensor to work with, like other ML frameworks, but also Buffer, HostBuffer, and Shape.

Let's explain all that.

  • Shape: describes a multi-dimensional array.

    • Shape.init(.{16}, .f32) represents a vector of 16 floats of 32-bit precision.
    • Shape.init(.{512, 1024}, .f16) represents a 512x1024 matrix of floats of 16-bit precision, i.e. a [512][1024]f16 array.

    A Shape is only metadata; it doesn't point to or own any memory. The Shape struct can also represent a regular number, aka a scalar: Shape.init(.{}, .i32) represents a 32-bit signed integer.

  • HostBuffer: is a multi-dimensional array whose memory is allocated on the CPU.

    • points to the slice of memory containing the array
    • typically owns the underlying memory - but has a flag to remember when it doesn't.
  • Buffer: is a multi-dimensional array whose memory is allocated on an accelerator.

    • contains a handle that the ZML runtime can use to convert it into a physical address, but there is no guarantee this address is visible from the CPU.
    • can be created by loading weights from disk directly to the device via zml.aio.loadBuffers
    • can be created by calling HostBuffer.toDevice(accelerator).
  • Tensor: is a mathematical object representing an intermediary result of a computation.

    • is basically a Shape with an attached MLIR value representing the mathematical operation that produced this Tensor.
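Concretely, the Shape examples from the list above look like this in code. The Shape.init calls and HostBuffer.toDevice come from this page; host_buffer and accelerator in the commented lines are hypothetical placeholders.

```zig
const zml = @import("zml");

// Shapes are pure metadata: none of these calls allocate array memory.
const vec = zml.Shape.init(.{16}, .f32); // vector of 16 x f32
const mat = zml.Shape.init(.{ 512, 1024 }, .f16); // a [512][1024]f16 array
const scalar = zml.Shape.init(.{}, .i32); // a lone 32-bit signed integer

// A HostBuffer pairs a Shape with CPU memory; a Buffer lives on the
// accelerator. `host_buffer` and `accelerator` are placeholders:
// const device_buffer: zml.Buffer = try host_buffer.toDevice(accelerator);
```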

The model struct

The model struct is the Zig code that describes your Neural Network (NN). Let's look at the following model architecture:

Multilayer perceptrons

This is how we can describe it in a Zig struct:

const Model = struct {
    input_layer: zml.Tensor,
    output_layer: zml.Tensor,
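The patch hunk cuts the struct off at this point. As a hedged sketch of where it is heading, a completed two-layer version with its forward function might look like the following; the forward signature and the op names (matmul, relu) are assumptions about the zml.Tensor API, not confirmed by this excerpt.

```zig
// Illustrative completion; `matmul` and `relu` are assumed helper names.
const Model = struct {
    input_layer: zml.Tensor,
    output_layer: zml.Tensor,

    pub fn forward(self: Model, input: zml.Tensor) zml.Tensor {
        // Each call records an MLIR operation on the Tensor's graph;
        // nothing is computed until the compiled executable runs.
        const hidden = self.input_layer.matmul(input).relu();
        return self.output_layer.matmul(hidden);
    }
};
```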
diff --git a/sources.tar b/sources.tar
index cdc6ea58d49351287b4cf041ecd238c7a100d478..4f48cf6d11e773e617580ae8775ea8f0ea958053 100755
GIT binary patch