I’m not sure if the LLM can handle a different number of visual tokens than what was used during training. If N visual tokens are discarded, is there a step to adjust the dimension before feeding them into the LLM?
In their code, redundant visual tokens are pruned inside the LLM (at decoder layers [2, 6, 15, 19]). There is no step to adjust the dimension: after pruning, the remaining tokens (hidden_states) are simply passed to the next layer.
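A minimal NumPy sketch of why no dimension adjustment is needed (hypothetical names and shapes, not the repo's actual code): a decoder layer's weights only depend on the hidden dimension, so the token axis can shrink between layers without any projection.

```python
import numpy as np

hidden_dim = 8
rng = np.random.default_rng(0)
W = rng.standard_normal((hidden_dim, hidden_dim))  # stand-in for one decoder layer's weights

def decoder_layer(hidden_states):
    # Acts per-token on the hidden dimension; works for any sequence length.
    return hidden_states @ W

def prune_tokens(hidden_states, scores, keep):
    # Keep the `keep` highest-scoring tokens (e.g. ranked by attention received),
    # preserving their original order.
    idx = np.argsort(scores)[-keep:]
    return hidden_states[np.sort(idx)]

x = decoder_layer(rng.standard_normal((100, hidden_dim)))  # 100 visual tokens
scores = rng.random(100)                                   # stand-in importance scores
x = prune_tokens(x, scores, keep=40)                       # drop 60 tokens, no reshape
x = decoder_layer(x)                                       # next layer accepts 40 tokens as-is
print(x.shape)                                             # (40, 8)
```

Since self-attention and the MLP are both applied token-wise over the hidden dimension, the LLM never "sees" a fixed sequence length; only positional information and the KV cache need to be handled consistently after pruning.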