
Commit

fix: reduce grad accumulation
ex3ndr committed Jul 11, 2024
1 parent f90b836 commit b14fb7a
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion train.py
@@ -39,7 +39,7 @@
# 6k tokens is roughly 3 rows, because a single row is 1500-2500 tokens
# We have MUCH faster GPUs and therefore, instead of gradient accumulation,
# we increase the batch size 4x and cut the number of accumulation steps by 4x
-train_grad_accum_every = 8
+train_grad_accum_every = 2
train_batch_size = 8

# We speculate that the learning rate is given for all GPUs, so we divide it by the number of GPUs
2 changes: 1 addition & 1 deletion train_ar.py
@@ -39,7 +39,7 @@
# 6k tokens is roughly 3 rows, because a single row is 1500-2500 tokens
# We have MUCH faster GPUs and therefore, instead of gradient accumulation,
# we increase the batch size 4x and cut the number of accumulation steps by 4x
-train_grad_accum_every = 8
+train_grad_accum_every = 2
train_batch_size = 8

# We speculate that the learning rate is given for all GPUs, so we divide it by the number of GPUs
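
For context, the trade-off the comments describe works like this: gradients from several micro-batches are summed before a single optimizer step, so the per-GPU effective batch per step is train_batch_size * train_grad_accum_every (8 * 8 = 64 before this commit, 8 * 2 = 16 after it, unless the batch size was raised in an earlier change as the comment suggests). Below is a minimal sketch of such a loop, assuming a PyTorch-style model that returns a scalar loss; train_step, model, optimizer, and loader_iter are hypothetical names, not code from this repository:

train_grad_accum_every = 2   # micro-batches accumulated per optimizer step
train_batch_size = 8         # rows per micro-batch (per GPU)

def train_step(model, optimizer, loader_iter):
    optimizer.zero_grad()
    running_loss = 0.0
    for _ in range(train_grad_accum_every):
        batch = next(loader_iter)                   # one micro-batch of train_batch_size rows
        loss = model(batch)                         # assumed to return a scalar loss tensor
        (loss / train_grad_accum_every).backward()  # scale so the accumulated gradients average
        running_loss += loss.item()
    optimizer.step()                                # one update over the accumulated gradients
    return running_loss / train_grad_accum_every

Fewer accumulation steps mean fewer forward/backward passes between optimizer updates, which is why faster GPUs that can fit a larger per-step batch can afford to lower train_grad_accum_every.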
