From 655b0893926fa3200aeddd73a102feaa4643db8f Mon Sep 17 00:00:00 2001
From: github-actions
Date: Mon, 9 Dec 2024 18:51:58 +0000
Subject: [PATCH] Update GitHub Pages

---
 README.md                 | 1 +
 learn/concepts/index.html | 2 +-
 sources.tar               | Bin 655360 -> 655360 bytes
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 730ba86..ce4a97c 100644
--- a/README.md
+++ b/README.md
@@ -18,3 +18,4 @@
 Wed Sep 18 11:13:08 UTC 2024
 Fri Nov 15 13:46:13 UTC 2024
 Mon Dec 2 21:30:54 UTC 2024
 Fri Dec 6 14:34:48 UTC 2024
+Mon Dec 9 18:51:57 UTC 2024
diff --git a/learn/concepts/index.html b/learn/concepts/index.html
index 5b6a7d7..9dc9f01 100644
--- a/learn/concepts/index.html
+++ b/learn/concepts/index.html
@@ -154,7 +154,7 @@

});

ZML Concepts

Model lifecycle

ZML is an inference stack that helps run Machine Learning (ML) models, particularly Neural Networks (NNs).

The lifecycle of a model consists of the following steps:

  1. Open the model file and read the shapes of the weights, but leave the weights on the disk.

  2. Using the loaded shapes and optional metadata, instantiate a model struct with Tensors, representing the shape and layout of each layer of the NN.

  3. Compile the model struct and its forward function into an accelerator-specific executable. The forward function describes the mathematical operations corresponding to the model inference.

  4. Load the model weights from disk into accelerator memory.

  5. Bind the model weights to the executable.

  6. Load some user inputs, and copy them to the accelerator.

  7. Call the executable on the user inputs.

  8. Fetch the returned model output from the accelerator into host memory, and finally present it to the user.

  9. When all user inputs have been processed, free the executable resources and the associated weights.

Some details:

Note that the compilation and weight loading steps are both bottlenecks to your model startup time, but they can be done in parallel. ZML provides asynchronous primitives to make that easy.

The compilation can be cached across runs, and if you always use the same model architecture with the same shapes, it's possible to bypass it entirely.

The accelerator is typically a GPU, but can be another chip, or even the CPU itself, churning vector instructions.
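The lifecycle above can be sketched in Zig. This is illustrative only: apart from zml.aio.loadBuffers and HostBuffer.toDevice, which this page names, every function and variable below (openWeightStore, initModel, startCompile, join, readUserInput, accelerator, present) is a hypothetical placeholder, not the actual ZML API.

```zig
// Hedged sketch of the nine lifecycle steps; placeholder names are
// marked in the lead-in and are NOT the real ZML API.
const zml = @import("zml");

pub fn main() !void {
    // 1. Read only the weight shapes; the weights stay on disk.
    var store = try openWeightStore("model.safetensors");

    // 2. Build the model struct of Tensors from those shapes.
    const model = try initModel(Model, store);

    // 3 + 4. Compilation and weight loading are both startup
    // bottlenecks, so run them in parallel with async primitives.
    var compiled = startCompile(model); // compiles `forward`
    var weights = zml.aio.loadBuffers(store); // disk -> accelerator

    // 5. Bind the loaded weights to the compiled executable.
    var exe = join(compiled).bind(join(weights));
    defer exe.deinit(); // 9. free the executable and the weights

    // 6, 7, 8. Copy inputs to the accelerator, run, fetch the output.
    const input = try readUserInput().toDevice(accelerator);
    const output = exe.call(input);
    present(try output.toHost());
}
```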

Tensor Bros.

In ZML, we leverage Zig's static type system to differentiate between a few concepts. Hence we not only have a Tensor to work with, like other ML frameworks, but also Buffer, HostBuffer, and Shape.

Let's explain all that.

  • Shape: describes a multi-dimensional array.

    • Shape.init(.{16}, .f32) represents a vector of 16 floats of 32-bit precision.
    • Shape.init(.{512, 1024}, .f16) represents a 512x1024 matrix of floats of 16-bit precision, i.e. a [512][1024]f16 array.

    A Shape is only metadata; it doesn't point to or own any memory. The Shape struct can also represent a regular number, aka a scalar: Shape.init(.{}, .i32) represents a 32-bit signed integer.

  • HostBuffer: is a multi-dimensional array whose memory is allocated on the CPU.

    • points to the slice of memory containing the array
    • typically owns the underlying memory - but has a flag to remember when it doesn't.
  • Buffer: is a multi-dimensional array whose memory is allocated on an accelerator.

    • contains a handle that the ZML runtime can use to convert it into a physical address, but there is no guarantee this address is visible from the CPU.
    • can be created by loading weights from disk directly to the device via zml.aio.loadBuffers
    • can be created by calling HostBuffer.toDevice(accelerator).
  • Tensor: is a mathematical object representing an intermediary result of a computation.

    • is basically a Shape with an attached MLIR value representing the mathematical operation that produced this Tensor.
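Concretely, the Shape examples from the list above look like this in code. The Shape.init calls and HostBuffer.toDevice come from this page; host_buffer and accelerator in the commented lines are hypothetical placeholders.

```zig
const zml = @import("zml");

// Shapes are pure metadata: none of these calls allocate array memory.
const vec = zml.Shape.init(.{16}, .f32); // vector of 16 x f32
const mat = zml.Shape.init(.{ 512, 1024 }, .f16); // a [512][1024]f16 array
const scalar = zml.Shape.init(.{}, .i32); // a lone 32-bit signed integer

// A HostBuffer pairs a Shape with CPU memory; a Buffer lives on the
// accelerator. `host_buffer` and `accelerator` are placeholders:
// const device_buffer: zml.Buffer = try host_buffer.toDevice(accelerator);
```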

The model struct

The model struct is the Zig code that describes your Neural Network (NN). Let's look at the following model architecture:

Multilayer perceptrons

This is how we can describe it in a Zig struct:

const Model = struct {
    input_layer: zml.Tensor,
    output_layer: zml.Tensor,
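The patch hunk cuts the struct off at this point. As a hedged sketch of where it is heading, a completed two-layer version with its forward function might look like the following; the forward signature and the op names (matmul, relu) are assumptions about the zml.Tensor API, not confirmed by this excerpt.

```zig
// Illustrative completion; `matmul` and `relu` are assumed helper names.
const Model = struct {
    input_layer: zml.Tensor,
    output_layer: zml.Tensor,

    pub fn forward(self: Model, input: zml.Tensor) zml.Tensor {
        // Each call records an MLIR operation on the Tensor's graph;
        // nothing is computed until the compiled executable runs.
        const hidden = self.input_layer.matmul(input).relu();
        return self.output_layer.matmul(hidden);
    }
};
```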
diff --git a/sources.tar b/sources.tar
index cdc6ea58d49351287b4cf041ecd238c7a100d478..4f48cf6d11e773e617580ae8775ea8f0ea958053 100755
GIT binary patch