Commit 892403b

Add pagination option for items_and_annotation_generator (#449)
* Add a page-size option to items_and_annotation_generator. The default of 10,000 items per page remains, but callers can now request smaller pages to reduce timeout errors.
* Bump the package version and update the changelog and pyproject.toml accordingly.
1 parent df07aa1 commit 892403b

3 files changed: +10 -2 lines changed

CHANGELOG.md

+6

@@ -5,6 +5,12 @@ All notable changes to the [Nucleus Python Client](https://github.com/scaleapi/n
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+
+## [0.17.10](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.17.10) - 2025-03-19
+
+### Added
+- Adding page size variable to `items_and_annotation_generator()` to reduce timeout errors for customers with large datasets
+
 ## [0.17.9](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.17.9) - 2025-03-11
 
 ### Added

nucleus/dataset.py

+3 -1

@@ -1518,13 +1518,15 @@ def items_and_annotation_generator(
         query: Optional[str] = None,
         use_mirrored_images: bool = False,
         only_most_recent_tasks: bool = True,
+        page_size=10000
     ) -> Iterable[Dict[str, Union[DatasetItem, Dict[str, List[Annotation]]]]]:
         """Provides a generator of all DatasetItems and Annotations in the dataset.
 
         Args:
             query: Structured query compatible with the `Nucleus query language <https://nucleus.scale.com/docs/query-language-reference>`_.
             use_mirrored_images: If True, returns the location of the mirrored image hosted in Scale S3. Useful when the original image is no longer available.
             only_most_recent_tasks: If True, only the annotations corresponding to the most recent task for each item is returned.
+            page_size: Number of items to fetch per page. Default is maximum ES page size of 10000.
 
         Returns:
             Generator where each element is a dict containing the DatasetItem
@@ -1548,7 +1550,7 @@ def items_and_annotation_generator(
             client=self._client,
             endpoint=f"dataset/{self.id}/exportForTrainingPage",
             result_key=EXPORT_FOR_TRAINING_KEY,
-            page_size=10000,  # max ES page size
+            page_size=page_size,  # default is max ES page size of 10000
             query=query,
             chip=use_mirrored_images,
             onlyMostRecentTask=only_most_recent_tasks,
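
The trade-off behind the new `page_size` argument can be illustrated with a small, self-contained sketch. It does not use the Nucleus client: `FakeExportEndpoint` and `paginated_generator` are hypothetical stand-ins, and the offset-based paging shown here is an assumption, since the diff does not show how the real pagination helper walks through pages. The point is only that a smaller page size means more, but lighter, requests.

```python
from typing import Any, Callable, Dict, Iterator, List


class FakeExportEndpoint:
    """Hypothetical in-memory stand-in for a paged export endpoint."""

    def __init__(self, total_items: int) -> None:
        self.total_items = total_items
        self.request_count = 0  # how many pages were requested

    def fetch_page(self, offset: int, limit: int) -> List[Dict[str, Any]]:
        # Each call models one HTTP request returning at most `limit` items.
        self.request_count += 1
        end = min(offset + limit, self.total_items)
        return [{"item_id": i} for i in range(offset, end)]


def paginated_generator(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    page_size: int = 10000,
) -> Iterator[Dict[str, Any]]:
    """Yield items one by one, fetching `page_size` items per request."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means the data set is exhausted.
        if len(page) < page_size:
            return
        offset += len(page)


if __name__ == "__main__":
    endpoint = FakeExportEndpoint(total_items=25)
    items = list(paginated_generator(endpoint.fetch_page, page_size=10))
    print(len(items), "items in", endpoint.request_count, "requests")
```

With the real client, the equivalent call per the signature in this diff would be `dataset.items_and_annotation_generator(page_size=1000)`; the value 1000 is only an illustrative choice, and callers who omit the argument keep the previous behaviour of 10000 items per page.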

pyproject.toml

+1 -1

@@ -25,7 +25,7 @@ ignore = ["E501", "E741", "E731", "F401"] # Easy ignore for getting it running
 
 [tool.poetry]
 name = "scale-nucleus"
-version = "0.17.9"
+version = "0.17.10"
 description = "The official Python client library for Nucleus, the Data Platform for AI"
 license = "MIT"
 authors = ["Scale AI Nucleus Team <[email protected]>"]
