Collection of QoL changes #432

Open: wants to merge 5 commits into main

GMNGeoffrey (Contributor) commented Jan 29, 2025

This is a bad PR in the sense that it does a bunch of different things, but they're all pretty small. This is a collection of changes I made while developing kernels, mostly around improving debugging with more verbose and specific error messages. Most of these don't really seem sufficiently important to break out into their own PR, but it seemed a shame to just throw away things that seemed like improvements. I did try to clean things up a bit to make them actually upstreamable.

Changes:

  • Add a __str__ method to IndexingContext and __repr__ methods to ExpansionInfo and ReductionInfo. Maybe these should be dataclasses?
  • Add missing type annotations to some functions
  • Remove some unused variables
  • Fix the method signature for tkw.reshape
  • Make the callable passed to CapturedTrace.walk optional (if it is None, walk returns all nodes)
  • More verbose (or any) error messages for existing asserts and exceptions. Maybe this is too much, but the current messages frequently don't tell you what's going wrong. I think it's better to err on the side of too much information. If these get annoying for some reason we can trim them.
  • Convert some bare builtin KeyErrors into more specific error messages with context
  • Turn the validation that fires when get_custom is passed an op into an error. I think this more likely than not points to a bug, so it's better for it to be an error. I can't remember the specific case in which I hit it.
  • Add earlier validation for a type mismatch between a reduce op's init and return types.
  • Add earlier validation, when decomposing reduce ops, that the local reduction matches the accumulator reduction.
  • Report the argument that has an issue if there's a failure in decomposing reduce ops.
  • Print which node had an issue if there's a failure during codegen
  • Validate that the reduction and the generated for loop have the same number of arguments. A mismatch otherwise causes a failure later on, but we can be more helpful here. I filed [TKW] Bug: reduction expansion loses return values #384 for the bug that causes this check to fire.
  • Validate MMA shapes: m has to be in lhs and n in rhs. Locally, I actually have much more restrictive validation that lhs has to be [..., m, k] and rhs [..., n, k]. In theory it looks like Wave is supposed to figure things out if that isn't the setup, but I never had a case where it actually worked, so it seems like you need to walk some narrow path. This version is the less restrictive one, though.
  • Report more information if IREE invocation fails.
  • Don't assume that a reduction has users in get_users
  • Fix some errors in the Interpreter (using the wrong variable name etc.)
  • Name shape dimensions in some tests. I think this makes things much more readable.
  • Add torch references to some tests. Note that the numeric differences vs torch are quite suspicious here. There's another place where MI200 is special-cased for really bad numerics. I think this likely warrants more investigation.
  • Give testChainedGemmF8 better parametrized names.
  • Seed the RNG in testBatchedGemm
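On the "maybe these should be data classes?" question in the first bullet: a minimal sketch of what that could look like. The class below is a hypothetical stand-in for ExpansionInfo, and its fields are made up for illustration. @dataclass generates __init__, __repr__, and __eq__ from the field list, so a hand-written __repr__ would no longer be needed.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for ExpansionInfo; the real class's fields may differ.
@dataclass
class ExpansionInfo:
    dim_name: str
    expansion_factor: int
    # default_factory gives each instance its own list instead of a shared one.
    tags: list[str] = field(default_factory=list)


info = ExpansionInfo("M", 4)
# The generated __repr__ lists every field by name.
print(repr(info))  # ExpansionInfo(dim_name='M', expansion_factor=4, tags=[])
```

The trade-off is that the generated repr shows every field, which may be noisier than a hand-tuned one for classes holding large symbolic expressions.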
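The optional-callable change to CapturedTrace.walk could look roughly like the sketch below. CapturedTraceSketch, Node, and the filter_fn parameter name are hypothetical, and the sketch walks a flat node list, whereas the real trace presumably also has to visit subgraphs.

```python
from typing import Callable, Iterable, Optional


class Node:
    """Hypothetical minimal node with a name and an op kind."""

    def __init__(self, name: str, op: str) -> None:
        self.name = name
        self.op = op


class CapturedTraceSketch:
    """Hypothetical stand-in for CapturedTrace holding a flat list of nodes."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self.nodes = list(nodes)

    def walk(self, filter_fn: Optional[Callable[[Node], bool]] = None) -> list[Node]:
        # No predicate: return every node (as a copy, so callers can't mutate ours).
        if filter_fn is None:
            return list(self.nodes)
        # Otherwise keep only the nodes the predicate accepts.
        return [node for node in self.nodes if filter_fn(node)]


trace = CapturedTraceSketch([Node("a", "read"), Node("b", "mma"), Node("c", "write")])
all_nodes = trace.walk()                          # a, b, c
mma_nodes = trace.walk(lambda n: n.op == "mma")   # b
```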
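The less restrictive MMA shape check described above (m must appear in lhs, n in rhs) amounts to two membership tests with descriptive messages. This is a sketch only: validate_mma_shapes is a hypothetical helper, and dimensions are plain strings here, whereas Wave presumably uses symbolic dimensions.

```python
from typing import Sequence


def validate_mma_shapes(
    lhs_shape: Sequence[str], rhs_shape: Sequence[str], m: str, n: str
) -> None:
    """Hypothetical check: M must appear in the lhs shape and N in the rhs shape."""
    if m not in lhs_shape:
        raise ValueError(
            f"MMA: M dimension {m!r} not found in lhs shape {list(lhs_shape)}"
        )
    if n not in rhs_shape:
        raise ValueError(
            f"MMA: N dimension {n!r} not found in rhs shape {list(rhs_shape)}"
        )


# The batched [..., m, k] x [..., n, k] layout passes without raising.
validate_mma_shapes(["B", "M", "K"], ["B", "N", "K"], "M", "N")
```

Including both the missing dimension and the full shape in the message is the point of the change: the failure then says what was expected and what was found.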

One note is that I'm not really sure what the convention is for Exception types in the project, so a lot of these are just RuntimeError. That's not awesome, but I think it's still a lot more helpful than nothing. I tried to avoid raise ... from as in my experience these usually result in unhelpfully verbose stacks.
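One detail relevant to avoiding `raise ... from`: raising a new exception inside an except block without `from` still chains the original one (Python prints "During handling of the above exception, another exception occurred"). `raise ... from None` is the form that suppresses that chain. The sketch below converts a bare KeyError into a contextual error this way. WaveCompileError and lookup_index are hypothetical names, and a RuntimeError subclass is only one possible convention, not something the project is known to use.

```python
class WaveCompileError(RuntimeError):
    """Hypothetical project-wide error; subclassing RuntimeError keeps
    existing `except RuntimeError` handlers working."""


def lookup_index(index_map: dict[str, int], dim: str, node_name: str) -> int:
    try:
        return index_map[dim]
    except KeyError:
        # `from None` hides the implicit KeyError chain, so the traceback
        # shows only this message, which already carries the useful context.
        raise WaveCompileError(
            f"No index for dimension {dim!r} on node {node_name!r}; "
            f"known dimensions: {sorted(index_map)}"
        ) from None
```

For example, `lookup_index({"M": 0, "K": 1}, "N", "mma_0")` raises WaveCompileError naming the missing dimension, the node, and the dimensions that were available.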

Let me know if you'd like me to do this differently, if any of these changes warrant more discussion, or if you don't think they're worth this grab-bag PR. If this doesn't land before I head off, feel free to modify it into something uncontroversial and merge.

GMNGeoffrey (Contributor Author) commented:

Sorry, should've run tests after doing cleanup, obviously...

@GMNGeoffrey GMNGeoffrey marked this pull request as draft January 29, 2025 00:47
raikonenfnu (Contributor) commented:

I know it can get quite long, but it would be lovely to have the list of fixes in the PR description.

There's still something suspicious going on with the index for reduction ops, but my change causes breakage, so reverting it.

This requires changing too many tests to be part of this PR. Will try to submit separately.

I'm not sure that this is correct. I hit some weird thing here, but this may not be the fix.
GMNGeoffrey (Contributor Author) commented Jan 29, 2025

I removed some of the more substantive changes, fixed the test failures, and wrote down more detail about the changes.

@GMNGeoffrey GMNGeoffrey marked this pull request as ready for review January 29, 2025 23:00
I added these while debugging the test failures from this PR