ValueError: offset must be non-negative and no greater than buffer length #5543

LiYixuan727 · 2024-09-23T15:40:21Z

Hi,
I'm training a fairseq model with the following command and get the error ValueError: offset must be non-negative and no greater than buffer length.

fairseq-train data-bin --arch transformer \
    --max-epoch 10 \
    --max-tokens 2048 \
    --num-workers 20 \
    --max-sentences 5000 \
    --fp16 \
    --optimizer adam --lr-scheduler inverse_sqrt --lr 0.0007 \
    --criterion label_smoothed_cross_entropy

LiYixuan727 · 2024-09-23T15:43:22Z

And here is the whole traceback:

2024-09-23 14:53:13 | INFO | fairseq_cli.train | task: TranslationTask
2024-09-23 14:53:13 | INFO | fairseq_cli.train | model: TransformerModel
2024-09-23 14:53:13 | INFO | fairseq_cli.train | criterion: LabelSmoothedCrossEntropyCriterion
2024-09-23 14:53:13 | INFO | fairseq_cli.train | num. shared model params: 22,480,862,208 (num. trained: 22,480,862,208)
2024-09-23 14:53:13 | INFO | fairseq_cli.train | num. expert model params: 0 (num. trained: 0)
2024-09-23 14:53:13 | INFO | fairseq.data.data_utils | loaded 51,352 examples from: data-bin/valid.en-es.en
2024-09-23 14:53:13 | INFO | fairseq.data.data_utils | loaded 51,352 examples from: data-bin/valid.en-es.es
2024-09-23 14:53:13 | INFO | fairseq.tasks.translation | data-bin valid en-es 51352 examples
2024-09-23 14:53:45 | INFO | fairseq.utils | CUDA enviroments for all 1 workers
2024-09-23 14:53:45 | INFO | fairseq.utils | rank 0: capabilities = 8.6 ; total memory = 47.431 GB ; name = NVIDIA RTX A6000
2024-09-23 14:53:45 | INFO | fairseq.utils | CUDA enviroments for all 1 workers
2024-09-23 14:53:45 | INFO | fairseq_cli.train | training on 1 devices (GPUs/TPUs)
2024-09-23 14:53:45 | INFO | fairseq_cli.train | max tokens per device = 4096 and max sentences per device = 5000
2024-09-23 14:53:45 | INFO | fairseq.trainer | Preparing to load checkpoint checkpoints/checkpoint_last.pt
2024-09-23 14:53:45 | INFO | fairseq.trainer | No existing checkpoint found checkpoints/checkpoint_last.pt
2024-09-23 14:53:45 | INFO | fairseq.trainer | loading train data for epoch 1
2024-09-23 14:53:49 | INFO | fairseq.data.data_utils | loaded 51,249,574 examples from: data-bin/train.en-es.en
2024-09-23 14:53:53 | INFO | fairseq.data.data_utils | loaded 51,249,574 examples from: data-bin/train.en-es.es
2024-09-23 14:53:53 | INFO | fairseq.tasks.translation | data-bin train en-es 51249574 examples
Traceback (most recent call last):
  File "/home/ag/.local/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq_cli/train.py", line 557, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq_cli/train.py", line 164, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/checkpoint_utils.py", line 272, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/trainer.py", line 719, in get_train_iterator
    self.reset_dummy_batch(batch_iterator.first_batch)
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/data/iterators.py", line 368, in first_batch
    return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/data/iterators.py", line 368, in <listcomp>
    return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/data/language_pair_dataset.py", line 305, in __getitem__
    tgt_item = self.tgt[index] if self.tgt is not None else None
  File "/home/ag/.local/lib/python3.10/site-packages/fairseq/data/indexed_dataset.py", line 523, in __getitem__
    np_array = np.frombuffer(
ValueError: offset must be non-negative and no greater than buffer length (6711936916)

Herostomo · 2024-10-05T09:24:24Z

I wanted to offer my assistance regarding the ValueError: offset must be non-negative and no greater than buffer length error you encountered while training with Fairseq.

Summary of the Issue:
The error occurs while fairseq builds the first training batch: np.frombuffer is asked to read a dataset item at a byte offset that is negative or beyond the end of the buffer. This typically points to a problem with how the dataset was binarized or indexed.
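
For reference, the message itself comes from np.frombuffer, and the number in parentheses appears to be the buffer length in bytes (here 6711936916, roughly 6.7 GB, which is more than a 32-bit integer can represent). The snippet below is a minimal sketch that reproduces the same message on a tiny in-memory buffer; the 8-byte buffer and the `try_read` helper are made up for illustration and are not part of fairseq.

```python
import numpy as np

# Stand-in for the memory-mapped .bin file: 8 bytes holding four uint16 token IDs.
buffer = np.arange(4, dtype=np.uint16).tobytes()


def try_read(offset):
    """Return the ValueError message np.frombuffer raises for this byte offset, or None."""
    try:
        np.frombuffer(buffer, dtype=np.uint16, count=2, offset=offset)
    except ValueError as err:
        return str(err)
    return None


# A valid byte offset reads without error.
msg_valid = try_read(4)

# An offset past the end of the buffer and a negative offset both trigger
# the error reported in this issue.
msg_past_end = try_read(16)
msg_negative = try_read(-4)

print(msg_past_end)
print(msg_negative)
```

Both failing calls report the same message, so the traceback alone does not tell you whether the stored offset was too large or had wrapped around to a negative value.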

Approach:
Verify Dataset Integrity
Check Data Loading and Indexing
Consistency Between Datasets
Adjust Worker Count
Check Configuration Parameters
Inspect Data Paths
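
As a rough illustration of the "Verify Dataset Integrity" step, the sketch below checks a list of per-example byte offsets against the buffer size. It assumes you have already obtained the offsets, the example sizes (in tokens), the token width in bytes, and the size of the data file by some means; `find_bad_offsets` and the toy values are hypothetical and not a fairseq API.

```python
import numpy as np


def find_bad_offsets(pointers, sizes, itemsize, buffer_len):
    """Return indices of examples whose byte range does not fit inside the buffer.

    pointers   -- byte offset of each example (hypothetical input)
    sizes      -- number of tokens in each example
    itemsize   -- bytes per token (e.g. 2 for uint16)
    buffer_len -- total size of the data buffer in bytes
    """
    # Do the arithmetic in int64 so the check itself cannot overflow.
    pointers = np.asarray(pointers, dtype=np.int64)
    sizes = np.asarray(sizes, dtype=np.int64)
    ends = pointers + sizes * itemsize
    bad = (pointers < 0) | (ends > buffer_len)
    return np.flatnonzero(bad)


# Toy data: three examples of 3, 2 and 4 uint16 tokens (18 bytes in total).
sizes = [3, 2, 4]
good_pointers = [0, 6, 10]
# The same offsets with the last one wrapped around, as a 32-bit overflow would do.
wrapped_pointers = [0, 6, 10 - 2**32]

print(find_bad_offsets(good_pointers, sizes, itemsize=2, buffer_len=18))
print(find_bad_offsets(wrapped_pointers, sizes, itemsize=2, buffer_len=18))
```

If a check like this flags entries only beyond some point in the file, that pattern would be consistent with an offset overflow rather than random corruption.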

dtamayo-nlp · 2024-10-08T10:41:43Z

Hi!

In my case, this problem was caused by an integer precision issue when binarizing long files. It can be solved by adding the following lines here:

sizes = [np.int64(el) for el in sizes]
address = np.int64(0)

Then reprocess the corpus with fairseq-preprocess.
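
To show why the cast to np.int64 matters, here is a minimal sketch of the general failure mode. It is not fairseq's code: `byte_offsets` and the toy sizes are invented for illustration. Byte offsets are a running sum of example sizes times the token width, and once that sum passes 2**31 - 1 a 32-bit accumulator silently wraps to a negative value.

```python
import numpy as np


def byte_offsets(sizes, itemsize, dtype):
    """Byte offset of each example: running sum of the preceding sizes * itemsize.

    All arithmetic is carried out in `dtype` to mimic a narrow accumulator.
    """
    sizes = np.asarray(sizes, dtype=dtype)
    nbytes = sizes * dtype(itemsize)
    offsets = np.zeros(len(sizes), dtype=dtype)
    offsets[1:] = np.cumsum(nbytes[:-1], dtype=dtype)
    return offsets


# Toy corpus: 3000 examples of 500,000 two-byte tokens each (about 3 GB in total).
sizes = np.full(3000, 500_000)

narrow = byte_offsets(sizes, 2, np.int32)  # wraps once the sum exceeds 2**31 - 1
wide = byte_offsets(sizes, 2, np.int64)    # stays correct

first_bad = int(np.flatnonzero(narrow != wide)[0])
print("first wrong offset at example", first_bad)
print("int32 offset:", int(narrow[first_bad]), "int64 offset:", int(wide[first_bad]))
```

With 64-bit accumulation every offset stays non-negative and inside the file, which is what np.frombuffer needs at read time.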

You could also avoid this problem by splitting your big files into smaller ones.

LiYixuan727 added needs triage question labels Sep 23, 2024
