
make build_batch_data_loader work better when dataset size is not multiple of batch size and num_workers #5035

Closed
wants to merge 1 commit

Conversation

wat3rBro
Contributor

Summary:
Previously, ToIterableDataset sharded the dataset for each worker in a round-robin fashion without considering the batch size. Combined with drop_last=True, this can cause more than one iteration to be dropped, i.e. the number of iterations is less than len(data_loader). Say the dataset size is 46 and the batch size is 8; with 3 DataLoader workers, the dataset would be sharded into:

  • worker 0: [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45]
  • worker 1: [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43]
  • worker 2: [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44]

Since batching happens per worker, the loaded batches would be: [0, 3, 6, 9, 12, 15, 18, 21], [1, 4, 7, 10, 13, 16, 19, 22], [2, 5, 8, 11, 14, 17, 20, 23], [24, 27, 30, 33, 36, 39, 42, 45]. This has a few issues (see the sketch after this list):

  • the data is not loaded in sequence
  • it potentially wastes data, e.g. there are 46 images here, enough for 5 full batches, yet only 4 are produced
  • len(dl) returns 5, but only 4 iterations actually run; len can be inaccurate for an iterable dataset, but this still causes confusion
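As a rough sketch of the old behavior under the same assumptions (46 indices, batch size 8, 3 workers), the snippet below mimics per-index round-robin sharding followed by per-worker batching with `drop_last=True`. The helpers `shard_round_robin` and `to_batches` are hypothetical and only illustrate the effect; they are not the actual detectron2 code.

```python
# Sketch of the OLD behavior: per-index round-robin sharding, then per-worker
# batching with drop_last=True. Helper names are hypothetical, for illustration only.
def shard_round_robin(indices, worker_id, num_workers):
    # worker k receives indices k, k + num_workers, k + 2 * num_workers, ...
    return indices[worker_id::num_workers]

def to_batches(indices, batch_size, drop_last=True):
    out = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
    if drop_last and out and len(out[-1]) < batch_size:
        out.pop()  # each worker drops its own incomplete tail batch
    return out

dataset = list(range(46))
batch_size, num_workers = 8, 3

per_worker = [to_batches(shard_round_robin(dataset, w, num_workers), batch_size)
              for w in range(num_workers)]

# The DataLoader drains workers round-robin, one batch at a time.
loaded = []
for i in range(max(len(batches) for batches in per_worker)):
    for batches in per_worker:
        if i < len(batches):
            loaded.append(batches[i])

print(len(loaded))  # 4 batches, even though len(data_loader) reports 46 // 8 == 5
```

In this setup workers 1 and 2 each drop an incomplete tail of 7 samples, which is where the "missing" fifth batch goes.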

This diff changes the sharding pattern so that, in the same case, the workers get:

  • worker 0: [0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31]
  • worker 1: [8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39]
  • worker 2: [16, 17, 18, 19, 20, 21, 22, 23, 40, 41, 42, 43, 44, 45]

This solves the issues above; the loaded data now becomes: [0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39]
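In other words, indices are grouped into contiguous chunks of `batch_size`, and the chunks, rather than the individual indices, are distributed round-robin across workers. A minimal sketch of that idea follows, assuming the same sizes as above; `shard_chunked` is a hypothetical helper, not the actual detectron2 implementation.

```python
# Sketch of the NEW behavior: contiguous chunks of batch_size are assigned to
# workers round-robin. `shard_chunked` is hypothetical, for illustration only.
def shard_chunked(indices, worker_id, num_workers, chunk_size):
    # position p belongs to chunk p // chunk_size; chunk c goes to worker c % num_workers
    return [x for p, x in enumerate(indices)
            if (p // chunk_size) % num_workers == worker_id]

dataset = list(range(46))
batch_size, num_workers = 8, 3

for w in range(num_workers):
    print(w, shard_chunked(dataset, w, num_workers, chunk_size=batch_size))
# 0 -> [0..7, 24..31], 1 -> [8..15, 32..39], 2 -> [16..23, 40..45]
```

With the chunk size equal to the batch size, every chunk either forms a full batch or is the single incomplete tail of the dataset, so at most one batch is lost to drop_last=True and the batches come out in dataset order.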

Differential Revision: D47529917

@facebook-github-bot added the CLA Signed and fb-exported labels on Jul 17, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D47529917

@wat3rBro wat3rBro requested a review from ppwwyyxx July 17, 2023 23:48
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D47529917

wat3rBro pushed a commit to wat3rBro/detectron2-1 that referenced this pull request Jul 24, 2023
…tiple of batch size and num_workers

Summary:
Pull Request resolved: facebookresearch#5035

Previously `ToIterableDataset` sharded the dataset for each worker in a round-robin fashion without considering the batch size. Combined with `drop_last=True`, this can cause more than one iteration to be dropped, i.e. the number of iterations is less than `len(data_loader)`. Say the dataset size is 46 and the batch size is 8; with 3 DataLoader workers, the dataset would be sharded into:
- worker 0: [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45]
- worker 1: [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43]
- worker 2: [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44]

Since batching happens per worker, the loaded batches would be: [0, 3, 6, 9, 12, 15, 18, 21], [1, 4, 7, 10, 13, 16, 19, 22], [2, 5, 8, 11, 14, 17, 20, 23], [24, 27, 30, 33, 36, 39, 42, 45]. This has a few issues:
- the data is not loaded in sequence
- it potentially wastes data, e.g. there are 46 images here, enough for 5 full batches, yet only 4 are produced
- `len(dl)` returns 5, but only 4 iterations actually run; `len` can be inaccurate for an iterable dataset, but this still causes confusion

This diff changes the sharding pattern so that, in the same case, the workers get:
- worker 0: [0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31]
- worker 1: [8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39]
- worker 2: [16, 17, 18, 19, 20, 21, 22, 23, 40, 41, 42, 43, 44, 45]

This solves the issues above; the loaded data now becomes: [0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39]

Reviewed By: zechenghe

Differential Revision: D47529917

fbshipit-source-id: 83b1843549b68f904b79435ea32a4e21a0cd1ae0
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D47529917

@facebook-github-bot
Contributor

This pull request has been merged in 57bdb21.
