
Commit d2f83de

tensor_parallel_llama location change
1 parent 6394d79 commit d2f83de

File tree

  docsrc/index.rst
  examples/distributed_inference/tensor_parallel_llama3.py

2 files changed: +3 −4 lines changed


docsrc/index.rst (+1 −1)
@@ -88,7 +88,7 @@ Tutorials
 tutorials/_rendered_examples/dynamo/mutable_torchtrt_module_example
 tutorials/_rendered_examples/dynamo/weight_streaming_example
 tutorials/_rendered_examples/dynamo/pre_allocated_output_example
-tutorials/_rendered_examples/dynamo/tensor_parallel_llama
+tutorials/_rendered_examples/distributed_inference/tensor_parallel_llama

 Dynamo Frontend
 ----------------

examples/distributed_inference/tensor_parallel_llama3.py (+2 −3)
@@ -1,12 +1,11 @@
-# Taken and modified pytorch lightening
-# https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning
 """
 .. _tensor_parallel_llama:

 Torch distributed example for llama3-7B model
 ======================================================

-As model sizes are increasing, large models with billions of parameters are trained with many GPUs, where regular data parallel training is no longer possible. In this example, we illustrate the Llama3-7B model inference using Torch-TensorRT backend, split across multiple GPUs using a form of model parallelism called Tensor Parallelism. We make use of Pytorch Distributed Tensor Parallelism Module. Please refer to these tutorials- https://pytorch.org/tutorials/intermediate/TP_tutorial.html and https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning?section=featured"""
+As model sizes are increasing, large models with billions of parameters are trained with many GPUs, where regular data parallel training is no longer possible. In this example, we illustrate the Llama3-7B model inference using Torch-TensorRT backend, split across multiple GPUs using a form of model parallelism called Tensor Parallelism. We make use of Pytorch Distributed Tensor Parallelism Module. Please refer to these tutorials- https://pytorch.org/tutorials/intermediate/TP_tutorial.html and https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning?section=featured
+"""

 # %%
 # Imports and Model Definition
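
The relocated example's own code is not part of this diff beyond its docstring. As a rough illustration of the PyTorch Tensor Parallelism API that docstring refers to, here is a minimal sketch (not the tutorial's code, and without the Torch-TensorRT backend) that shards a made-up two-layer MLP across GPUs with parallelize_module, ColwiseParallel, and RowwiseParallel; the ToyMLP module, its dimensions, and the script name are assumptions for illustration only.

# Minimal tensor-parallel sketch (illustrative only, not the tutorial's code).
# Assumes a multi-GPU host and a torchrun launch; ToyMLP is a made-up module.
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyMLP(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim)
        self.down = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))


if __name__ == "__main__":
    # torchrun sets WORLD_SIZE and LOCAL_RANK; one process per GPU.
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    mesh = init_device_mesh("cuda", (int(os.environ["WORLD_SIZE"]),))

    model = ToyMLP().cuda()
    # Shard the first Linear column-wise and the second row-wise, so the
    # intermediate activation stays sharded and only the second layer's
    # output needs an all-reduce across the mesh.
    model = parallelize_module(
        model,
        mesh,
        {"up": ColwiseParallel(), "down": RowwiseParallel()},
    )

    with torch.no_grad():
        out = model(torch.randn(8, 1024, device="cuda"))
    print(out.shape)

Launched with, e.g., torchrun --nproc_per_node=2 toy_tp.py (hypothetical filename), each rank holds only its shard of the Linear weights, which is the same mechanism the Llama3-7B example relies on to fit a large model across several GPUs.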
