I’m not sure if the LLM can handle a different number of visual tokens than what was used during training. If N visual tokens are discarded, is there a step to adjust the dimension before feeding them into the LLM?
In their code, redundant visual tokens are pruned inside the LLM (at decoder layers [2, 6, 15, 19]). There is no step to adjust the dimension: after pruning, the remaining tokens (hidden_states) are simply passed to the next layer.
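A minimal NumPy sketch of why no dimension adjustment is needed (hypothetical names and shapes, not the repo's actual code): a decoder layer's weights only depend on the hidden dimension, so the token axis can shrink between layers without any projection.

```python
import numpy as np

hidden_dim = 8
rng = np.random.default_rng(0)
W = rng.standard_normal((hidden_dim, hidden_dim))  # stand-in for one decoder layer's weights

def decoder_layer(hidden_states):
    # Acts per-token on the hidden dimension; works for any sequence length.
    return hidden_states @ W

def prune_tokens(hidden_states, scores, keep):
    # Keep the `keep` highest-scoring tokens (e.g. ranked by attention received),
    # preserving their original order.
    idx = np.argsort(scores)[-keep:]
    return hidden_states[np.sort(idx)]

x = decoder_layer(rng.standard_normal((100, hidden_dim)))  # 100 visual tokens
scores = rng.random(100)                                   # stand-in importance scores
x = prune_tokens(x, scores, keep=40)                       # drop 60 tokens, no reshape
x = decoder_layer(x)                                       # next layer accepts 40 tokens as-is
print(x.shape)                                             # (40, 8)
```

Since self-attention and the MLP are both applied token-wise over the hidden dimension, the LLM never "sees" a fixed sequence length; only positional information and the KV cache need to be handled consistently after pruning.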