
Accelerate activation sparsity with activation compression #1920

Open
jcaip opened this issue Mar 18, 2025 · 1 comment

jcaip commented Mar 18, 2025

We've come up with a training recipe for 2:4 activation sparsity, which is outlined in this paper: https://openreview.net/pdf?id=O5feVk7p6Y

The gist of this approach is that:

  1. We find high levels of activation sparsity (>85%) when training Squared-ReLU-based FFNs instead of SwiGLU FFNs. These Squared-ReLU FFNs show minimal to no accuracy loss.
  2. We accelerate the sparse activation x dense weight matmul with 2:4 sparsity. For the forward pass we can naively sparsify, dropping values that do not fit the 2:4 constraint. For the backward pass, we need some special sauce to maintain accuracy. (A rough sketch of the forward-pass idea follows this list.)
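
As a rough illustration of (1) and the naive forward-pass sparsification in (2), here is a minimal PyTorch sketch. The names `SquaredReLUFFN` and `naive_24_sparsify` are hypothetical, not torchao APIs, and this is not the recipe from the paper, just the basic idea:

```python
# Minimal sketch (hypothetical names, not torchao APIs) of a Squared-ReLU FFN
# whose activations are naively pruned to the 2:4 pattern in the forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


def naive_24_sparsify(a: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every contiguous group of 4
    along the last dimension, zeroing the rest (assumes last dim % 4 == 0)."""
    orig_shape = a.shape
    groups = a.reshape(-1, 4)
    # indices of the 2 smallest-magnitude entries per group get dropped
    _, drop_idx = groups.abs().topk(2, dim=-1, largest=False)
    groups = groups.scatter(-1, drop_idx, 0.0)
    return groups.reshape(orig_shape)


class SquaredReLUFFN(nn.Module):
    """FFN using ReLU(x)^2 as the activation instead of SwiGLU."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = F.relu(self.up(x)) ** 2   # Squared-ReLU -> highly sparse activations
        a = naive_24_sparsify(a)      # drop values to satisfy the 2:4 constraint
        return self.down(a)


if __name__ == "__main__":
    ffn = SquaredReLUFFN(dim=256, hidden_dim=1024)
    y = ffn(torch.randn(8, 256))
    print(y.shape)  # torch.Size([8, 256])
```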

However, @janeyx99 pointed out to me that instead of accelerating the model with 2:4 sparsity, we could exploit (1) via activation compression instead. The idea is to use something like nvcomp to compress the sparse Squared-ReLU activations.

We should run some tests to determine what compression ratio, and thus what memory savings, we could achieve, as well as whether there is additional compression/decompression overhead to account for.
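
As a starting point, here is a back-of-the-envelope sketch of that measurement: it checks how sparse the Squared-ReLU activations actually are and estimates a compression ratio from a simple bitmap + nonzero-values layout. The helper name and the estimate are illustrative only; a real test would compress the captured activations with nvcomp and time the round trip.

```python
# Sketch: estimate sparsity and a rough compression-ratio upper bound for
# Squared-ReLU activations. This is a proxy, not an nvcomp benchmark.
import torch


def estimate_compression(acts: torch.Tensor, dtype_bytes: int = 2) -> dict:
    numel = acts.numel()
    nnz = acts.count_nonzero().item()
    dense_bytes = numel * dtype_bytes
    # bitmap (1 bit per element) plus the nonzero values themselves
    compressed_bytes = numel // 8 + nnz * dtype_bytes
    return {
        "sparsity": 1.0 - nnz / numel,
        "estimated_ratio": dense_bytes / compressed_bytes,
    }


if __name__ == "__main__":
    # stand-in for real Squared-ReLU activations captured from a model
    a = torch.relu(torch.randn(4096, 4096)) ** 2
    a = a * (torch.rand_like(a) > 0.7)  # push sparsity toward ~85%
    print(estimate_compression(a.half()))
```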

@jcaip added the good first issue label on Mar 18, 2025
@agrawal-aka
Contributor

Hi @jcaip, this seems like an interesting take on activation sparsity. I would like to know: if the model activations are highly sparse (>85%), won't restricting them to 2:4 (50%) sparsity create a hard upper bound? I think an unstructured sparse kernel makes more sense in such scenarios, and it also makes a case for CPU inference.
