Benchmarks for Metis #10

Closed
YuMJie opened this issue Nov 9, 2024 · 11 comments

YuMJie commented Nov 9, 2024

"Metis: Fast Automatic Distributed Training on Heterogeneous GPUs" is excellent work; however, I have a couple of questions about the code:

  1. Why are the execution_memory values in the configuration files the same for different micro-batch sizes?
  2. Can you provide the profile files for different devices (e.g., RTX 3090)?
  3. I see that the profile file format described in the README.md includes activation memory, but this is not used in the code.

Could you provide the benchmarks for Metis?
Thank you!

@mgong-kang mgong-kang self-assigned this Nov 12, 2024
@mgong-kang
Collaborator

Thank you for your interest in this project.

  1. As you mentioned, memory usage varies depending on the micro-batch size. The files in profile_data_samples are based on a micro-batch size of 1. It appears there was an error in copying the sample data, resulting in incorrect values. I will update this with the correct values.

  2. While we do not have precise measurement data for the exact model provided in the sample, we will add some reference data that may be helpful. Additionally, as an alternative, you might regenerate the existing data so that it reflects approximate device performance.

  3. To measure pipeline communication costs more accurately, it is recommended to profile the activation size in advance and use it. The Metis code ships a GPT model and includes code for calculating its activation size, which is what the calculations use (a rough sketch of this kind of estimate is shown below).
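
For reference, here is a minimal sketch of the kind of inter-stage activation-size estimate mentioned in item 3, assuming the tensor exchanged between pipeline stages is a single (batch, seq, hidden) hidden state. The function name and formula are illustrative assumptions, not the actual Metis implementation.

```python
# Illustrative sketch, not Metis code: approximate size of the activation
# tensor sent between pipeline stages of a GPT-style model.

def gpt_boundary_activation_bytes(micro_batch_size: int,
                                  sequence_length: int,
                                  hidden_size: int,
                                  dtype_bytes: int = 2) -> int:
    """One (batch, seq, hidden) hidden-state tensor per micro-batch."""
    return micro_batch_size * sequence_length * hidden_size * dtype_bytes


if __name__ == "__main__":
    # Example: micro-batch 1, sequence length 2048, hidden size 4096, FP16.
    size = gpt_boundary_activation_bytes(1, 2048, 4096, dtype_bytes=2)
    print(f"{size / 2**20:.1f} MiB per micro-batch")  # 16.0 MiB
```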


YuMJie commented Nov 13, 2024

Thank you for your reply, but I have some questions:

  1. I have seen the code that calculates the activation size, but I cannot find any code that uses the "activation_parameters_bytes" item from the profile file.
  2. Could you provide the code for executing the config that Metis generates?
  3. The "parameters" item in the profile file means the size of the model, but with mixed-precision training there are two sets of model weights with different sizes; which weight size should be written?

Thank you!

@mgong-kang
Collaborator

  1. activation_parameter_bytes is not currently used in the code. It is a field reserved for models where calculating the activation size analytically is challenging.
  2. Are you referring to the code for generating the profile? If so, there is a generation guide in the README.md, and I kindly ask for your understanding as we cannot provide the code.
  3. Since it is important for the communication cost to reflect the actual weight size of the model, it is appropriate to use the FP32 weight size when measuring communication cost, even if some weights are converted to FP16 for computational efficiency in mixed-precision training (see the small arithmetic sketch below).
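
To make item 3 concrete, here is a small worked example, assuming the "parameters" field stores the model weight size in bytes; the helper function and the 1.3B-parameter figure are illustrative, not taken from Metis.

```python
# Illustrative sketch, not Metis code: weight size in bytes for FP32 master
# weights versus an FP16 copy, to show which value item 3 recommends.

def parameter_bytes(num_parameters: int, bytes_per_param: int) -> int:
    """Total weight size in bytes (4 bytes/param for FP32, 2 for FP16)."""
    return num_parameters * bytes_per_param


if __name__ == "__main__":
    n = 1_300_000_000  # e.g. a 1.3B-parameter GPT model
    print(f"FP32 weights: {parameter_bytes(n, 4) / 2**30:.2f} GiB")  # 4.84 GiB
    print(f"FP16 copy:    {parameter_bytes(n, 2) / 2**30:.2f} GiB")  # 2.42 GiB
```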

Thank you!


YuMJie commented Nov 13, 2024

I got it!
Thank you for your reply!
I will close this issue.

@YuMJie YuMJie closed this as completed Nov 13, 2024

YuMJie commented Nov 15, 2024

Hi, @mgong-kang
I notice that you implemented heterogeneity-aware data-parallel load balancing, but I cannot see any code or output results for it. Could you provide some information?
Also, could you explain how to use the Metis config to run Alpa?
Thank you!

@mgong-kang
Collaborator

The DataLoadBalancer is implemented at the following path:
https://github.com/SamsungLabs/Metis/blob/main/model/load_balancer.py#L147
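
For readers unfamiliar with the idea, here is an illustrative sketch of heterogeneity-aware data-parallel load balancing: a global batch is split across workers in proportion to their measured throughput. All names here are hypothetical; this is not the actual DataLoadBalancer implementation linked above.

```python
# Illustrative sketch, not the Metis DataLoadBalancer: assign per-worker batch
# sizes roughly proportional to each worker's throughput (samples/sec).

from typing import List


def split_batch_by_throughput(global_batch_size: int,
                              throughputs: List[float]) -> List[int]:
    """Return per-worker batch sizes that sum to global_batch_size."""
    total = sum(throughputs)
    shares = [global_batch_size * t / total for t in throughputs]
    sizes = [int(s) for s in shares]
    # Hand out the leftover samples to the workers with the largest
    # fractional remainders so the total stays exact.
    leftover = global_batch_size - sum(sizes)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Example: within one stage, one GPU about twice as fast as the other.
    print(split_batch_by_throughput(48, [2.0, 1.0]))  # [32, 16]
```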


mgong-kang commented Nov 15, 2024

@goeunee326
I would appreciate it if you could provide guidance on how to execute the Metis results in Alpa.


YuMJie commented Nov 15, 2024

The DataLoadBalancer is implemented at the following path: https://github.com/SamsungLabs/Metis/blob/main/model/load_balancer.py#L147

Thank you for your help, but it seems that the output of the Metis strategy does not reflect the per-GPU batch sizes when data parallelism is used across different GPUs.

What is more, I found that some identical strategies have different costs.

[screenshot attached]

@YuMJie YuMJie reopened this Nov 15, 2024

mgong-kang commented Nov 15, 2024

@YuMJie

  1. Data parallelism occurs when heterogeneous GPUs are allocated within a stage.

  2. If you could send the profile data you've worked on, I'll take a look.

Thank you.

@goeunee326
Collaborator

You can execute the process by modifying a specific part of the Alpa benchmark code.
https://github.com/alpa-projects/alpa/blob/main/benchmark/alpa/suite_auto_gpt.py#L31-L44

To run the results from Metis in Alpa, parameter mapping is required:

  • layer_partition -> forward_stage_layer_ids
  • device_group -> submesh_physical_shapes
  • strategies -> submesh_logical_shapes
  • node_sequence -> submesh_to_hosts (a concept not currently present in Alpa)

Since the concept of submesh_to_hosts does not exist in Alpa, you will need to add it by modifying Alpa's internal code, and make sure the GPU placement matches Metis's node_sequence by checking your Ray cluster status. With these adjustments, the run should work as expected.
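
To make the mapping above concrete, here is a rough sketch of how one might rename the Metis output fields to the Alpa-side names. The metis_result dict, the helper function, and the example values are all hypothetical; where these values are plugged into suite_auto_gpt.py, and the extra submesh_to_hosts handling, still have to be wired up by hand as described above.

```python
# Illustrative sketch only: rename Metis result fields to the Alpa-side names
# listed in the mapping above. Not an actual Alpa or Metis API.

def to_alpa_stage_config(metis_result: dict) -> dict:
    return {
        "forward_stage_layer_ids": metis_result["layer_partition"],
        "submesh_physical_shapes": metis_result["device_group"],
        "submesh_logical_shapes": metis_result["strategies"],
        # Not an existing Alpa concept; requires modifying Alpa internals.
        "submesh_to_hosts": metis_result["node_sequence"],
    }


if __name__ == "__main__":
    # Hypothetical two-stage plan: four layers per stage, one 1x2 mesh each.
    metis_result = {
        "layer_partition": [[0, 1, 2, 3], [4, 5, 6, 7]],
        "device_group": [(1, 2), (1, 2)],
        "strategies": [(1, 2), (2, 1)],
        "node_sequence": ["node-a", "node-b"],
    }
    print(to_alpa_stage_config(metis_result))
```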

@mgong-kang
Collaborator

If there are no further points to address, we will proceed to close this issue. Please feel free to reopen it at any time if further discussion or inquiries are needed, and do not hesitate to share any additional comments or questions.
