simple_connector.py: more efficient use of GPU memory in send
This makes more efficient use of GPU memory by preallocating the `keys`
and `values` tensors and copying into them, rather than calling `torch.cat`
on sub-tensors, which would otherwise take double the GPU memory.
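As a rough illustration of the pattern this change describes, the sketch below contrasts gathering per-layer slices into a list and concatenating them with writing each slice into a buffer allocated once up front. The function names `gather_with_cat` and `gather_preallocated`, the `caches`/`slots` inputs, and the assumption that `slots` is a 1-D index tensor are all hypothetical; this is not the actual code in `simple_connector.py`.

```python
import torch


def gather_with_cat(caches, slots):
    # Hypothetical baseline: each advanced-indexing gather allocates a new
    # tensor, and all of them stay alive in the list until torch.cat copies
    # them into a second, equally sized output buffer.
    parts = [cache[slots].unsqueeze(0) for cache in caches]
    return torch.cat(parts, dim=0)


def gather_preallocated(caches, slots):
    # Hypothetical preallocating variant: allocate the final
    # (num_layers, num_slots, ...) buffer once, then copy each layer's
    # gathered slice into its row. Only one per-layer temporary exists
    # at a time, so peak memory stays close to the size of the output.
    first = caches[0]
    out = torch.empty(
        (len(caches), slots.numel(), *first.shape[1:]),
        dtype=first.dtype,
        device=first.device,
    )
    for i, cache in enumerate(caches):
        out[i].copy_(cache[slots])
    return out
```

Both functions return the same values; the difference is only in peak allocation, which matters on a GPU where the tensors are large. For example, with `caches = [torch.randn(10, 2, 3) for _ in range(4)]` and `slots = torch.tensor([1, 5, 7])`, both produce a tensor of shape `(4, 3, 2, 3)`.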
Signed-off-by: Dan Aloni <[email protected]>
Co-authored-by: Nick Hill <[email protected]>