Commit

Update GitHub Pages
github-actions committed Jan 28, 2025
1 parent 655b089 commit d2e943e
Showing 5 changed files with 20 additions and 25 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -19,3 +19,4 @@ Fri Nov 15 13:46:13 UTC 2024
Mon Dec 2 21:30:54 UTC 2024
Fri Dec 6 14:34:48 UTC 2024
Mon Dec 9 18:51:57 UTC 2024
+Tue Jan 28 10:21:33 UTC 2025
2 changes: 1 addition & 1 deletion howtos/howto_torch2zml/index.html
@@ -284,7 +284,7 @@ Loading an individual layer

var ctx = try zml.Context.init();
defer ctx.deinit();
-const platform = ctx.autoPlatform();
+const platform = ctx.autoPlatform(.{});
const mlp_weights = try zml.aio.loadModelBuffers(Mlp, mlp_shape, model_weights, allocator, platform);

zml.testing.testLayer(platform, activations, "model.layers.0.mlp", mlp_shape, mlp_weights, 1e-3);
Binary file modified sources.tar
Binary file not shown.
25 changes: 11 additions & 14 deletions tutorials/getting_started/index.html
@@ -157,26 +157,23 @@

# Getting Started with ZML

In this tutorial, we will install ZML and run a few models locally.

## Prerequisites

First, let's check out the ZML codebase. In a terminal, run:

    git clone https://github.com/zml/zml.git
    cd zml/

We use bazel to build ZML and its dependencies. We recommend downloading it through bazelisk, a version manager for bazel.

### Install Bazel

macOS:

    brew install bazelisk

Linux:

-   curl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64'
+   curl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-linux-amd64'
    chmod +x /usr/local/bin/bazel
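As a quick check that the install worked (this step is an addition here, not from the original page), ask bazelisk to resolve and run the pinned Bazel release:

    # downloads the matching Bazel release on first use, then prints its version
    bazel version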
## Run a pre-packaged model

ZML comes with a variety of model examples. See also our reference implementations in the examples folder (https://github.com/zml/zml/tree/master/examples/).

### MNIST

The classic handwritten digits recognition task (https://en.wikipedia.org/wiki/MNIST_database). The model is tasked to recognize a handwritten digit, which has been converted to a 28x28 pixel monochrome image. Bazel will download a pre-trained model and the test dataset. The program will load the model, compile it, and classify a randomly picked example from the test dataset.

On the command line:

    cd examples
    bazel run -c opt //mnist
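If you want to browse what else is available before picking a model (again an addition, assuming the examples directory is its own Bazel workspace as the `cd examples` steps suggest), you can list every target it defines:

    cd examples
    # enumerate all targets in the examples workspace
    bazel query //...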
### Llama

Llama is a family of "Large Language Models", trained to generate text based on the beginning of a sentence/book/article. This "beginning" is generally referred to as the "prompt".

-#### TinyLlama, Stories 15M
-
-To start, you can use a small model trained specifically on children's history books. This model has been trained by Andrej Karpathy (https://x.com/karpathy); you can read more about it on his Github (https://github.com/karpathy/llama2.c).
-
-    cd examples
-    bazel run -c opt //llama:TinyLlama-Stories-15M
-    bazel run -c opt //llama:TinyLlama-Stories-15M -- --prompt="Once upon a time, there was a cute little dragon"
-
-#### OpenLLama 3B
-
-    cd examples
-    bazel run -c opt //llama:OpenLLaMA-3B
-    bazel run -c opt //llama:OpenLLaMA-3B -- --prompt="Once upon a time,"
-
-#### Meta Llama 3 8B
-
-This model has restrictions (see https://huggingface.co/meta-llama/Meta-Llama-3-8B): it requires approval from Meta on Huggingface, which can take a few hours to get granted.
-
-While waiting for approval, you can already generate your Huggingface access token (/howtos/huggingface_access_token/).
-
-Once you've been granted access, you're ready to download a gated model like Meta-Llama-3-8b!
+#### Meta Llama 3.1 8B
+
+This model has restrictions; see https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct. It requires approval from Meta on Huggingface, which can take a few hours to get granted.
+
+While waiting for approval, you can already generate your Huggingface access token (/howtos/huggingface_access_token/).
+
+Once you've been granted access, you're ready to download a gated model like Meta-Llama-3.1-8B-Instruct!

    # requires token in $HOME/.cache/huggingface/token, as created by the
    # `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
    cd examples
-   bazel run -c opt //llama:Meta-Llama-3-8b
-   bazel run -c opt //llama:Meta-Llama-3-8b -- --prompt="Once upon a time,"
-
-## Run Tests
-
-    bazel test //zml:test
+   bazel run -c opt //llama:Llama-3.1-8B-Instruct
+   bazel run -c opt //llama:Llama-3.1-8B-Instruct -- --prompt="What is the capital of France?"
+
+You can also try Llama-3.1-70B-Instruct if you have enough memory.
+
+### Meta Llama 3.2 1B
+
+Like the 8B model above, this model also requires approval; see https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct for access requirements.
+
+    cd examples
+    bazel run -c opt //llama:Llama-3.2-1B-Instruct
+    bazel run -c opt //llama:Llama-3.2-1B-Instruct -- --prompt="What is the capital of France?"
+
+For a larger 3.2 model, you can also try Llama-3.2-3B-Instruct.
+
+## Run Tests
+
+    bazel test //zml:test
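The gated Llama runs above assume a stored Huggingface token. A minimal setup sketch (added for this rewrite; the `pip install` line assumes a Python environment and is not part of the original page):

    # Option 1: log in once; the token is cached in $HOME/.cache/huggingface/token
    pip install -U huggingface_hub
    huggingface-cli login

    # Option 2: export the token for the current shell session only
    export HUGGINGFACE_TOKEN=hf_your_token_here   # placeholder, use your real token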
## Running Models on GPU / TPU

You can compile models for accelerator runtimes by appending one or more of the following arguments to the command line when compiling or running a model:

- NVIDIA CUDA: --@zml//runtimes:cuda=true
- AMD RoCM: --@zml//runtimes:rocm=true
- Google TPU: --@zml//runtimes:tpu=true
- AWS Trainium/Inferentia 2: --@zml//runtimes:neuron=true
- AVOID CPU: --@zml//runtimes:cpu=false

The latter, avoiding compilation for CPU, cuts down compilation time (see the combined sketch below).

So, to run one of the Llama models from above on your host sporting an NVIDIA GPU, run the following:

    cd examples
-   bazel run -c opt //llama:OpenLLaMA-3B \
-     --@zml//runtimes:cuda=true \
-     -- --prompt="Once upon a time,"
+   bazel run -c opt //llama:Llama-3.2-1B-Instruct \
+     --@zml//runtimes:cuda=true \
+     -- --prompt="What is the capital of France?"
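Since skipping CPU compilation cuts down compile time, the two runtime flags can be combined; here is a variant of the command above (an assumption that the flags compose as the list suggests, not a command taken from the page):

    cd examples
    # compile for CUDA only, skipping the CPU runtime entirely
    bazel run -c opt //llama:Llama-3.2-1B-Instruct \
      --@zml//runtimes:cuda=true \
      --@zml//runtimes:cpu=false \
      -- --prompt="What is the capital of France?"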
## Where to go next

In Deploying Models on a Server (/howtos/deploy_on_server/), we show how you can cross-compile and package for a specific architecture, then deploy and run your model. Alternatively, you can also dockerize your model (/howtos/dockerize_models/).

You might also want to check out the examples (https://github.com/zml/zml/tree/master/examples), read through the documentation (/), start writing your first model (/tutorials/write_first_model/), or read about more high-level ZML concepts (/learn/concepts/).
