Cleanup examples #10097

Open · xnuohz wants to merge 25 commits into master

Conversation

@xnuohz (Contributor) commented Mar 5, 2025

  • Combine 15 examples related to Cora into 1 file
  • Update related docs, GitHub workflow, and README

@xnuohz xnuohz requested review from a team, wsad1, rusty1s and EdisonLeeeee as code owners March 5, 2025 12:51
@puririshi98 (Contributor) left a comment

Code changes look good at first review, but for such an extensive change I would ask that you run every one of the deleted examples, and then the new version with its corresponding args for this Planetoid example. I would also ask that you share those logs and check yourself to make sure that the new version is as good or better in terms of accuracy, speed, and readability, and if it is not, for any reason, try to understand why. This is a very valuable PR, but we want to be careful with something like this.

If compute is an issue, I would ask that you write a bash script for me that will automatically run the above experiments and save the logs to some shared cloud drive or a GitHub repo, so you can do the above-mentioned analysis after I run the script.

Regardless, once you have said analysis, I will look through it myself as well and make sure our analyses align. Then I think this is safe to merge.
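
For reference, a minimal sketch of how such an experiment runner could be structured, written here in Python for self-containment; the example filenames and the --gnn_choice argument below are assumptions for illustration, not the PR's actual interface:

import subprocess
from pathlib import Path

# Hypothetical lists: fill in the actual deleted example scripts and the args
# that reproduce each of them with the new unified script.
OLD_EXAMPLES = ['gcn.py', 'gat.py']             # assumed filenames
NEW_RUNS = {'gcn': ['--gnn_choice', 'gcn'],     # assumed arg names
            'gat': ['--gnn_choice', 'gat']}

LOG_DIR = Path('logs')
LOG_DIR.mkdir(exist_ok=True)

def run(cmd: list, log_name: str) -> None:
    # Run one example and capture stdout/stderr to logs/<log_name>.log so the
    # logs can be uploaded to a shared drive or committed to a repo afterwards.
    with open(LOG_DIR / f'{log_name}.log', 'w') as log:
        subprocess.run(['python', *cmd], stdout=log,
                       stderr=subprocess.STDOUT, check=False)

for script in OLD_EXAMPLES:
    run([script], f'old_{Path(script).stem}')
for name, args in NEW_RUNS.items():
    run(['planetoid_train.py', *args], f'new_{name}')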

@puririshi98 (Contributor) commented

Also, please update the changelog.

@akihironitta (Member) left a comment

Looks great so far! ✨

Tbh, I'd vote for having many separate 90-100-line code examples to allow users to immediately understand what each script does and easily modify the script that they're interested in (at the cost of maintaining duplicate code in the example directory), but I guess this is just my preference.

Could we also add type annotations?
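
For illustration, a hedged sketch of what type annotations on the training step could look like (the function body and names are generic Planetoid-style code, not the PR's actual implementation):

import torch
import torch.nn.functional as F
from torch_geometric.data import Data

def train(model: torch.nn.Module, data: Data,
          optimizer: torch.optim.Optimizer) -> float:
    # One full-batch training step on a Planetoid-style Data object.
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return float(loss)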

Comment on lines 37 to 38
if not WITH_TORCH_SPLINE_CONV:
    quit("This example requires 'torch-spline-conv'")
Member:

Does this script need the package in all cases?

@xnuohz (Contributor, Author):
Updated, and the check is now only performed when gnn_choice = splinegnn.
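
A minimal sketch of that gated check, assuming gnn_choice is exposed as an argparse option (the exact code in the PR may differ):

if args.gnn_choice == 'splinegnn' and not WITH_TORCH_SPLINE_CONV:
    quit("This example requires 'torch-spline-conv'")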

if not WITH_TORCH_SPLINE_CONV:
    quit("This example requires 'torch-spline-conv'")

# define args
Member:

Suggested change (remove the comment):
# define args

wall_clock_start = time.perf_counter()
seed_everything(123)

# detect device
Member:

I prefer to remove comments that don't add any information.

Suggested change (remove the comment):
# detect device

Comment on lines 486 to 493
total_loss = 0
for pos_rw, neg_rw in loader:
    optimizer.zero_grad()
    loss = model.loss(pos_rw.to(device), neg_rw.to(device))
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
return total_loss / len(loader)
Member:

Let's accumulate the loss as a tensor and not as a float, so that we don't need a CUDA sync every iteration:

Suggested change

Before:
total_loss = 0
for pos_rw, neg_rw in loader:
    optimizer.zero_grad()
    loss = model.loss(pos_rw.to(device), neg_rw.to(device))
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
return total_loss / len(loader)

After:
total_loss = torch.zeros(1, device=device).squeeze()
for pos_rw, neg_rw in loader:
    optimizer.zero_grad()
    loss = model.loss(pos_rw.to(device), neg_rw.to(device))
    loss.backward()
    optimizer.step()
    total_loss += loss
return total_loss.item() / len(loader)

times = []
best_val_acc = test_acc = 0.
for epoch in range(args.epochs):
    start = time.time()
Member:

nit for consistency:

Suggested change

Before: start = time.time()
After:  start = time.perf_counter()

@xnuohz (Contributor, Author) commented Mar 9, 2025

@puririshi98 @akihironitta Ready for review, thank you.

  1. In total, 14 models: 4 improved, 9 stayed the same, 1 got worse (see overall.csv).
  2. The new file aligns its config and params with the old files; when this is ready to merge, the redundant parts should be deleted, which may affect the metrics.
  3. The old files do not initialize model params explicitly, while I added that to the new file. It is commented out for now to stay aligned with the config and params of the old files.
  4. For the logs of the new file, refer to planetoid_train.sh (new_logs.zip).
  5. For the logs of the old files, refer to run_cora.sh (old_logs.zip).

@puririshi98 (Contributor) left a comment

Tbh, I'd vote for having many separate 90-100-line code examples to allow users to immediately understand what each script does and easily modify the script that they're interested in (at the cost of maintaining duplicate code in the example directory), but I guess this is just my preference.

I think that is a valid point. Instead of just deleting them all, @xnuohz, could you leave them in and update the README to say that

"planetoid_train.py combines (insert list of Python examples it covers) into a single unified framework that can all be run with planetoid_train.sh. (insert description of the Planetoid datasets it covers in general and explain why they can all be covered with a unified codebase)".

@akihironitta thoughts?

@puririshi98 (Contributor) left a comment

Also, @xnuohz, please figure out why that one example is worse and add some special condition to the framework that fixes the issue. This should be considered a strict requirement for merging.

@xnuohz (Contributor, Author) commented Mar 10, 2025

After adding explicit model parameter initialization, their results are consistent.
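
For context, a minimal sketch of what explicit parameter initialization could look like here (the helper name and the seeding are assumptions; the PR may do this differently):

import torch

def init_params(model: torch.nn.Module, seed: int = 123) -> None:
    # Explicitly re-initialize parameters so runs do not depend on whatever
    # state the module happened to be constructed with.
    torch.manual_seed(seed)
    for module in model.modules():
        # PyG conv layers (GCNConv, GATConv, SplineConv, ...) and plain
        # Linear layers all expose reset_parameters().
        if module is not model and hasattr(module, 'reset_parameters'):
            module.reset_parameters()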

@puririshi98 (Contributor) commented

Okay, let's wait till @akihironitta replies to my previous comment about restructuring this PR to satisfy his concerns.
