Add blog post about PTPC FP8 on ROCm #38

Closed
wants to merge 11 commits
4 changes: 2 additions & 2 deletions Gemfile.lock
@@ -217,7 +217,7 @@ GEM
gemoji (>= 3, < 5)
html-pipeline (~> 2.2)
jekyll (>= 3.0, < 5.0)
-json (2.10.1)
+json (2.10.2)
kramdown (2.4.0)
rexml
kramdown-parser-gfm (1.1.0)
@@ -270,7 +270,7 @@ GEM
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
unicode-display_width (1.8.0)
-uri (1.0.2)
+uri (1.0.3)
webrick (1.9.1)

PLATFORMS
16 changes: 16 additions & 0 deletions README.md
@@ -15,6 +15,22 @@ To add a new blogpost, please refer to `_posts/2023-06-20-vllm.md` as an example

The blog is automatically built and deployed by GitHub Actions when `main` is pushed to.

## LaTeX Math

The blog supports LaTeX math via [MathJax](https://docs.mathjax.org/en/latest/index.html).

Enable it by adding `math: true` to the document's front matter. MathJax is configured to support the standard LaTeX-style math delimiters:

```latex
$ inline math $
```

```latex
$$
math block
$$
```
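
For example, a post that opts in to MathJax might start like this (an illustrative sketch; the layout, title, and formulas are placeholders, assuming the same front matter fields used by existing posts):

```markdown
---
layout: post
title: "Example post with math"
math: true
---

Inline math such as $E = mc^2$ renders inside running text, while

$$
\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$

renders as a display block.
```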

## Theme customization

The theme we are using is [Minima](https://github.com/jekyll/minima). If you need to customise anything from this theme, see [Overriding theme defaults](https://jekyllrb.com/docs/themes/#overriding-theme-defaults).
12 changes: 12 additions & 0 deletions _includes/custom-head.html
@@ -0,0 +1,12 @@
{% if page.math %}
<script>
MathJax = {
tex: {
inlineMath: [['$', '$'], ['\\(', '\\)']],
displayMath: [['$$', '$$'], ['\\[', '\\]']]
}
};
</script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
</script>
{% endif %}
2 changes: 1 addition & 1 deletion _posts/2023-06-20-vllm.md
@@ -108,7 +108,7 @@ This utilization of vLLM has also significantly reduced operational costs. With

### Get started with vLLM

-Install vLLM with the following command (check out our [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) for more):
+Install vLLM with the following command (check out our [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation.html) for more):

```bash
$ pip install vllm
```
2 changes: 1 addition & 1 deletion _posts/2024-09-05-perf-update.md
@@ -150,7 +150,7 @@ Importantly, we will also focus on improving the core of vLLM to reduce the comp

### Get Involved

-If you haven’t, we highly recommend you to update the vLLM version (see instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation/index.html)) and try it out for yourself\! We always love to learn more about your use cases and how we can make vLLM better for you. The vLLM team can be reached out via [[email protected]](mailto:[email protected]). vLLM is also a community project, if you are interested in participating and contributing, we welcome you to check out our [roadmap](https://roadmap.vllm.ai/) and see [good first issues](https://github.com/vllm-project/vllm/issues?q=is:open+is:issue+label:%22good+first+issue%22) to tackle. Stay tuned for more updates by [following us on X](https://x.com/vllm\_project).
+If you haven’t, we highly recommend you to update the vLLM version (see instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation.html)) and try it out for yourself\! We always love to learn more about your use cases and how we can make vLLM better for you. The vLLM team can be reached out via [[email protected]](mailto:[email protected]). vLLM is also a community project, if you are interested in participating and contributing, we welcome you to check out our [roadmap](https://roadmap.vllm.ai/) and see [good first issues](https://github.com/vllm-project/vllm/issues?q=is:open+is:issue+label:%22good+first+issue%22) to tackle. Stay tuned for more updates by [following us on X](https://x.com/vllm\_project).

If you are in the Bay Area, you can meet the vLLM team at the following events: [vLLM’s sixth meetup with NVIDIA(09/09)](https://lu.ma/87q3nvnh), [PyTorch Conference (09/19)](https://pytorch2024.sched.com/event/1fHmx/vllm-easy-fast-and-cheap-llm-serving-for-everyone-woosuk-kwon-uc-berkeley-xiaoxuan-liu-ucb), [CUDA MODE IRL meetup (09/21)](https://events.accel.com/cudamode), and [the first ever vLLM track at Ray Summit (10/01-02)](https://raysummit.anyscale.com/flow/anyscale/raysummit2024/landing/page/sessioncatalog?search.sessiontracks=1719251906298001uzJ2).

6 changes: 3 additions & 3 deletions _posts/2025-01-10-dev-experience.md
@@ -29,7 +29,7 @@ For those who prefer a faster package manager, [**uv**](https://github.com/astra
```sh
uv pip install vllm
```
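
If uv is not installed yet, a minimal setup could look like the following (a sketch based on uv's standard install script and `uv venv`; the Python version is only an example):

```sh
# Install uv, create and activate a virtual environment, then install vLLM
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm
```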

-Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html?device=cuda#create-a-new-python-environment) for more details on setting up [**uv**](https://github.com/astral-sh/uv). Using a simple server-grade setup (Intel 8th Gen CPU), we observe that [**uv**](https://github.com/astral-sh/uv) is 200x faster than pip:
+Refer to the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=cuda#create-a-new-python-environment) for more details on setting up [**uv**](https://github.com/astral-sh/uv). Using a simple server-grade setup (Intel 8th Gen CPU), we observe that [**uv**](https://github.com/astral-sh/uv) is 200x faster than pip:

```sh
# with cached packages, clean virtual environment
```

@@ -77,11 +77,11 @@ VLLM_USE_PRECOMPILED=1 pip install -e .

The `VLLM_USE_PRECOMPILED=1` flag instructs the installer to use pre-compiled CUDA kernels instead of building them from source, significantly reducing installation time. This is perfect for developers focusing on Python-level features like API improvements, model support, or integration work.
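
Put together, a Python-only editable install might look like this (a sketch; the clone location and branch are left at their defaults):

```sh
# Editable install that reuses pre-compiled CUDA kernels instead of building them locally
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install -e .
```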

-This lightweight process runs efficiently, even on a laptop. Refer to our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html?device=cuda#build-wheel-from-source) for more advanced usage.
+This lightweight process runs efficiently, even on a laptop. Refer to our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=cuda#build-wheel-from-source) for more advanced usage.

### C++/Kernel Developers

-For advanced contributors working with C++ code or CUDA kernels, we incorporate a compilation cache to minimize build time and streamline kernel development. Please check our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html?device=cuda#build-wheel-from-source) for more details.
+For advanced contributors working with C++ code or CUDA kernels, we incorporate a compilation cache to minimize build time and streamline kernel development. Please check our [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=cuda#build-wheel-from-source) for more details.
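
As an illustration, one common way to get such caching is ccache, which a CMake-based build like vLLM's can pick up automatically when it is on the PATH (an assumption here; see the linked documentation for the caches that are actually supported):

```sh
# Full source build; with ccache available, unchanged kernels are not recompiled on rebuilds
sudo apt-get install -y ccache   # or: conda install ccache
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .                 # re-running this reuses cached compilation results
```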

## Track Changes with Ease

2 changes: 1 addition & 1 deletion _posts/2025-01-27-intro-to-llama-stack-with-vllm.md
@@ -49,7 +49,7 @@ huggingface-cli login --token <YOUR-HF-TOKEN>
```
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir /tmp/test-vllm-llama-stack/.cache/huggingface/hub/models/Llama-3.2-1B-Instruct
```

-Next, let's build the vLLM CPU container image from source. Note that while we use it for demonstration purposes, there are plenty of [other images available for different hardware and architectures](https://docs.vllm.ai/en/latest/getting_started/installation/index.html).
+Next, let's build the vLLM CPU container image from source. Note that while we use it for demonstration purposes, there are plenty of [other images available for different hardware and architectures](https://docs.vllm.ai/en/latest/getting_started/installation.html).

```
git clone git@github.com:vllm-project/vllm.git /tmp/test-vllm-llama-stack
```