
Commit a0d491e

Authored by srstevenson and Saaketh Narayan
Fix a few typos (#843)
Co-authored-by: Saaketh Narayan <[email protected]>
1 parent 69304c5 · commit a0d491e

File tree: 11 files changed, +12 −12 lines

CONTRIBUTING.md (+1 −1)

@@ -75,7 +75,7 @@ pytest -vv -s . # run all the unittests
 cd docs && make clean && make doctest # run doctests
 ```

-6\. [Optional] Compile and visualize the documentation locally. If you have a documentation changes, running the below commands is mandatory.
+6\. [Optional] Compile and visualize the documentation locally. If you have documentation changes, running the below commands is mandatory.

 <!--pytest.mark.skip-->
 ```bash

Makefile (+1 −1)

@@ -1,7 +1,7 @@
 # several pytest settings
 PYTHON ?= python # Python command
 PYTEST ?= pytest # Pytest command
-PYRIGHT ?= pyright # Pyright command. Pyright must be installed seperately -- e.g. `node install -g pyright`
+PYRIGHT ?= pyright # Pyright command. Pyright must be installed separately -- e.g. `node install -g pyright`
 EXTRA_ARGS ?= # extra arguments for pytest

 dirs := streaming tests docs

docs/source/_templates/base.html (+1 −1)

@@ -99,7 +99,7 @@
 version = fragments[1].split("/")[0]

 // NOTE: The version string will resolve to the PR number for RTD sites.
-// Checking whether first charater is a number.
+// Checking whether first character is a number.
 if (version[0] >= '0' && version[0] <= '9') {
     version = undefined
 }

docs/source/dataset_configuration/shuffling.md (+1 −1)

@@ -70,4 +70,4 @@ Samples within each shard are shuffled both before and after shards are split am
 Globally shuffles all samples. This is useful for single-node training on small data, where you want the most random shuffle possible, but is the least download-efficient of all shuffle algorithms. Training throughput is often much lower when using the `naive` shuffling algorithm.

-If you are having trouble with throughput, network downloads, or shuffle quality, please refer to the [perfomance tuning page](../distributed_training/performance_tuning.md).
+If you are having trouble with throughput, network downloads, or shuffle quality, please refer to the [performance tuning page](../distributed_training/performance_tuning.md).
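For context, the `naive` algorithm described in this hunk is selected through `StreamingDataset`'s `shuffle_algo` argument. Below is a minimal sketch (not part of this commit), assuming the argument names match the current `streaming` API; the bucket and local paths are hypothetical:

```python
from streaming import StreamingDataset

# Small single-node dataset: trade download efficiency for the most
# random shuffle possible by picking the `naive` algorithm.
dataset = StreamingDataset(
    remote='s3://my-bucket/my-dataset',  # hypothetical remote path
    local='/tmp/my-dataset',             # hypothetical local cache dir
    shuffle=True,
    shuffle_algo='naive',
)
```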

docs/source/distributed_training/performance_tuning.md (+1 −1)

@@ -23,7 +23,7 @@ $$L = 2 \cdot S \cdot \lceil\frac{C}{P}\rceil $$
 Where $L$ is the required minimum cache limit per node, in MB, $S$ is the average shard size, in MB, $C$ is the number of canonical nodes (see [here](../dataset_configuration/shuffling.md#how-shuffling-works) and [here](../distributed_training/elastic_determinism.md#requirements)), and $P$ is the number of physical nodes. This is because only a single shard, plus a potentially predownloaded subsequent shard, needs to be resident per canonical node to make progress during training.

-If using a shuffle-block-based algorithm such as [`'py1e'`](../dataset_configuration/shuffling.md#py1e-default) or [`'py1br'`](../dataset_configuration/shuffling.md#py1br), the required minumum cache limit per node will be approximately:
+If using a shuffle-block-based algorithm such as [`'py1e'`](../dataset_configuration/shuffling.md#py1e-default) or [`'py1br'`](../dataset_configuration/shuffling.md#py1br), the required minimum cache limit per node will be approximately:

 $$L = k \cdot S \lceil \frac{B}{Q} \rceil \cdot \lceil\frac{C}{P}\rceil $$
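The two cache-limit formulas quoted above are easy to sanity-check numerically. The sketch below is illustrative only and not part of the commit; the meanings of $k$, $B$, and $Q$ (an algorithm-dependent constant, the shuffle block size in samples, and the average samples per shard) are assumptions based on the surrounding performance tuning page.

```python
import math

def min_cache_limit_mb(shard_size_mb: float, canonical_nodes: int,
                       physical_nodes: int) -> float:
    """L = 2 * S * ceil(C / P): one resident shard plus one predownloaded
    shard per canonical node served by this physical node."""
    return 2 * shard_size_mb * math.ceil(canonical_nodes / physical_nodes)

def min_cache_limit_shuffle_block_mb(shard_size_mb: float, canonical_nodes: int,
                                     physical_nodes: int, k: float,
                                     block_size_samples: int,
                                     samples_per_shard: int) -> float:
    """L = k * S * ceil(B / Q) * ceil(C / P), for shuffle-block-based
    algorithms such as 'py1e' or 'py1br' (symbol meanings assumed, see above)."""
    return (k * shard_size_mb
            * math.ceil(block_size_samples / samples_per_shard)
            * math.ceil(canonical_nodes / physical_nodes))

# Example: 64 MB shards, 8 canonical nodes spread over 2 physical nodes.
print(min_cache_limit_mb(64, 8, 2))  # 2 * 64 * ceil(8/2) = 512 MB per node
```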

scripts/samples/bench_and_plot.py (+2 −2)

@@ -237,7 +237,7 @@ def bench(args: Namespace, bench_name: str, desc: str, generate: Callable,
         args (Namespace): Command-line arguments.
         bench_name (str): What to call this benchmark.
         desc (str): Brief description of the data.
-        generate (Callable): Method to genereate the dataset.
+        generate (Callable): Method to generate the dataset.
         formats (List[str]): List of shard formats to benchmark this data in.
     """
     print(f'Bench: {bench_name}')

@@ -373,7 +373,7 @@ def bench(args: Namespace, bench_name: str, desc: str, generate: Callable,
     y *= args.plot_bins
     y = y.astype(np.int64)

-    # Truncate the higest ``args.truncate_highest_frac`` timings because they get further
+    # Truncate the highest ``args.truncate_highest_frac`` timings because they get further
     # and further spaced as you ascend, which would ruin the plot.
     y = y[np.nonzero(y < args.plot_bins)[0]]

simulation/core/utils.py (+1 −1)

@@ -20,7 +20,7 @@ def get_batches_epochs(dataset: SimulationDataset, max_duration: Time) -> tuple[
     Returns:
         Tuple[int, int, int]: batches per epoch, epochs, and the total batches.
     """
-    # get epochs, batches_per_epoch, and total_batches from a Time obect
+    # get epochs, batches_per_epoch, and total_batches from a Time object
    dataset_batches = dataset.get_num_batches()
    batches_per_epoch = dataset_batches
    epochs = 1

streaming/base/batching/stratified.py (+1 −1)

@@ -115,7 +115,7 @@ def generate_work_stratified_batching(dataset: StreamingDataset, world: World, e
                 f'Number of samples for stream {stream_id} is {batch_portion} because the portion '
                 +
                 f'of this stream in the global batch, which is of size {global_batch_size}, is ' +
-                f'too low. Please increase the global batch size or increase the porportion of ' +
+                f'too low. Please increase the global batch size or increase the proportion of ' +
                 f'total samples that come from stream {stream_id}.')

     # We now merge the partitions from each stream to get our final partition over all
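For context on the error this hunk touches: with stratified batching, each stream's per-batch sample count is, in effect, its proportion of the global batch rounded to an integer, so a very small stream can round down to zero. A toy illustration of the arithmetic (not the library's actual code):

```python
global_batch_size = 512
stream_proportion = 0.001  # stream contributes 0.1% of all samples

# The stream's share of each global batch, truncated to whole samples.
batch_portion = int(stream_proportion * global_batch_size)  # int(0.512) == 0
print(batch_portion)  # 0 -> would trigger the ValueError quoted above
```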

streaming/text/convert/enwiki/mds/merge_shard_groups.py (+1 −1)

@@ -11,7 +11,7 @@
 def parse_args() -> Namespace:
-    """Parse commmand-line arguments.
+    """Parse command-line arguments.

     Returns:
         Namespace: Command-line arguments.

streaming/text/convert/enwiki/tfrecord/pick_eval_samples.py (+1 −1)

@@ -1,4 +1,4 @@
-"""Script for picking certain number of sampels.
+"""Script for picking certain number of samples.
 """

 import argparse

tests/test_streaming.py (+1 −1)

@@ -512,7 +512,7 @@ def test_stratified_batching_Exception(local_remote_dir: tuple[str, str], stream
     with pytest.raises(ValueError, match=f'Number of samples for stream*'):
         # When we iterate through the dataloader, the samples will be partitioned.
-        # This should thow ValueError since stream 2 is too small to be included in each batch.
+        # This should throw ValueError since stream 2 is too small to be included in each batch.
         for _ in dataloader:
             continue
