[Wave] Fun projects for beginners #278

Open
5 of 16 tasks
raikonenfnu opened this issue Nov 19, 2024 · 18 comments
Comments

@raikonenfnu
Contributor

raikonenfnu commented Nov 19, 2024

Are you interested in learning more about GPU programming and developing cool optimizations? Do you want to help build next-generation, state-of-the-art machine learning models and layers? Do you want to define the future programming paradigm of machine learning and GPU layers? Look no further: come join us in building "Wave"!

Here are some fun starter tasks to look at:

Core infrastructure

Useful Integration/Deployment on LLM and GenAI models

  • Implement a PyTorch Linear layer (forward, backward) that uses a custom TKW kernel (see the sketch after this list)
  • Implement a PyTorch Conv2d layer (forward, backward) that uses a custom TKW kernel
  • Implement a PyTorch Attention layer (forward, backward) that uses a custom TKW kernel
  • Integrate Wave kernels into sharktank end-to-end, so that we can inject IR from TKW into the model IR.
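As a rough illustration of the Linear-layer task above, here is a minimal sketch built around a standard torch.autograd.Function wrapper. The `tkw_linear_forward` function is a hypothetical stand-in for a compiled TKW kernel (it falls back to plain torch ops so the sketch runs on its own); the real TKW kernel definition and launch API are not shown.

```python
import torch


def tkw_linear_forward(x, weight):
    # Hypothetical stand-in for a custom TKW kernel launch; real code would
    # dispatch to a compiled Wave kernel instead of torch.matmul.
    return x @ weight.t()


class WaveLinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return tkw_linear_forward(x, weight)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # The backward pass could likewise call dedicated TKW kernels.
        grad_x = grad_out @ weight
        grad_weight = grad_out.t() @ x
        return grad_x, grad_weight


class WaveLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        return WaveLinearFunction.apply(x, self.weight)
```

A quick sanity check: `WaveLinear(8, 4)(torch.randn(2, 8)).sum().backward()` should populate `weight.grad` with shape `(4, 8)`.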

Useful Operations for Quantized LLM and GenAI workloads

@NoumanAmir657
Contributor

Would this be open to someone who has almost no knowledge of GPU programming beyond a basic understanding?

@raikonenfnu
Contributor Author

Hi @NoumanAmir657, thanks for reaching out! There are plenty of tasks to go around; depending on your experience level, we would probably assign easier tasks to start with.

@NoumanAmir657
Contributor

> Hi @NoumanAmir657, thanks for reaching out! There are plenty of tasks to go around; depending on your experience level, we would probably assign easier tasks to start with.

Hi, yes, I would love any pointers to a task that would help me get started. You can assign me whichever you see fit and I will work towards it.

@raikonenfnu
Contributor Author

@NoumanAmir657 I think a good starter task is to implement tkw.abs and its lowering into math.absf (AbsFOp) or math.absi (AbsIOp) in the MLIR math dialect. Here is a sample of a similar task on reciprocal (82852d1).

LMK what you think. :)

Disclaimer: if this (or any task assigned to an external collaborator) becomes high priority, someone internal may take it on to get it out ASAP.
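To give a concrete feel for the lowering decision, here is a small self-contained sketch of choosing between math.absf and math.absi based on the operand's element type. The helper name is hypothetical and the actual iree-turbine emitter APIs are not shown; only the two MLIR math-dialect op names are taken from the discussion above.

```python
def mlir_abs_op_for(element_type: str) -> str:
    """Hypothetical helper: pick the math-dialect op a tkw.abs lowering would emit."""
    float_types = {"f16", "bf16", "f32", "f64"}
    int_types = {"i8", "i16", "i32", "i64"}
    if element_type in float_types:
        return "math.absf"
    if element_type in int_types:
        return "math.absi"
    raise ValueError(f"unsupported element type: {element_type}")


assert mlir_abs_op_for("f32") == "math.absf"
assert mlir_abs_op_for("i32") == "math.absi"
```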

@NoumanAmir657
Contributor

> @NoumanAmir657 I think a good starter task is to implement tkw.abs and its lowering into math.absf (AbsFOp) or math.absi (AbsIOp) in the MLIR math dialect. Here is a sample of a similar task on reciprocal (82852d1).
>
> LMK what you think. :)
>
> Disclaimer: if this (or any task assigned to an external collaborator) becomes high priority, someone internal may take it on to get it out ASAP.

Yes, this seems like a good starting point. I shall make a PR soon. Thanks!

@raikonenfnu
Contributor Author

Thanks @NoumanAmir657 for the awesome work with tkw.abs! Let me know if you're interested in picking up other issues. The next important thing for us is adding support for tkw.minimum (elementwise) and extending its support to tkw.min (reduction), but feel free to pick other things more aligned with your interests, of course.
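For readers new to the terminology, the difference between the two ops can be illustrated in plain PyTorch terms (this is only an analogy; the tkw signatures themselves are not shown):

```python
import torch

a = torch.tensor([[1.0, 5.0], [3.0, 2.0]])
b = torch.tensor([[4.0, 0.5], [2.5, 6.0]])

# Elementwise: combines two tensors value-by-value (the tkw.minimum case).
elementwise = torch.minimum(a, b)      # [[1.0, 0.5], [2.5, 2.0]]

# Reduction: collapses one dimension to its smallest value (the tkw.min case).
reduced = torch.min(a, dim=1).values   # [1.0, 2.0]
```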

@NoumanAmir657
Contributor

> Thanks @NoumanAmir657 for the awesome work with tkw.abs! Let me know if you're interested in picking up other issues. The next important thing for us is adding support for tkw.minimum (elementwise) and extending its support to tkw.min (reduction), but feel free to pick other things more aligned with your interests, of course.

Thanks for getting me started. I want to contribute more. Over the weekend, I will go over issues and decide on one. Thanks!

@NoumanAmir657
Contributor

@raikonenfnu Hi. You can assign me the tkw.minimum issue.
Thanks!

@raikonenfnu
Contributor Author

> Hi. You can assign me the tkw.minimum issue.

@NoumanAmir657 Sounds good! Thanks :)

@egebeysel

Hi, can I volunteer for the tkw.round_even support? Thanks :)

@raikonenfnu
Contributor Author

> Hi, can I volunteer for the tkw.round_even support? Thanks :)

Sounds great! :)

@raikonenfnu
Contributor Author

@egebeysel Here are some sample PRs that do something similar to what you'd be doing: (82852d1, 71eb1c8)

But in this case, you'd want to lower tkw.round_to_even to math.roundeven from the MLIR math dialect: https://mlir.llvm.org/docs/Dialects/MathOps/#mathroundeven-mathroundevenop. Do reach out if you have any questions! :)

@ziereis
Contributor

ziereis commented Jan 2, 2025

@raikonenfnu Hello, I have some questions regarding "Support more architectures in codegen aside from CDNA." Currently, the codegen directly emits ops from the AMDGPU dialect, like barriers, MMA ops, etc. What is the plan for supporting GPUs from different vendors? Are these ops going to be turned into a higher-level dialect, for instance a vector.contract instead of an amdgpu.mma, and let IREE handle the lowering? Or is the plan to directly emit the ops for the target architecture in codegen.py?

@raikonenfnu
Contributor Author

raikonenfnu commented Jan 28, 2025

Hey @ziereis, sorry for the late reply, just got back from holidays!

> What is the plan for supporting GPUs from different vendors?

We do plan on adding support for different GPUs. However, we are currently putting a little more priority into polishing variants of attention, low-bit precisions/quantizations, and performance. But if someone from the community is interested in adding support for different hardware, we are happy to talk RFCs and take PRs!

> Are these ops going to be turned into a higher-level dialect, for instance a vector.contract instead of an amdgpu.mma, and let IREE handle the lowering? Or is the plan to directly emit the ops for the target architecture in codegen.py?

We are planning to keep most of the op emission in iree-turbine (not IREE proper). Hence the latter, or a variant of the latter, would most likely be the path we go down. If you are interested in adding support for a new target architecture, do let us know! We would be happy to brainstorm over a GH issue or VC. :)
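Purely as an illustration of that second option (none of the names below exist in codegen.py), per-target emission could amount to a small dispatch table keyed on the hardware family, using only the op names already mentioned in this thread:

```python
# Hypothetical per-target table: which op the emitter would generate for a
# matrix-multiply-accumulate node, keyed on the hardware family.
MMA_OP_BY_TARGET = {
    "cdna": "amdgpu.mma",          # the AMDGPU-dialect path discussed above
    "generic": "vector.contract",  # higher-level fallback lowered by the backend
}


def mma_op_for_target(target: str) -> str:
    try:
        return MMA_OP_BY_TARGET[target]
    except KeyError:
        raise ValueError(f"no MMA emission rule for target: {target}") from None


assert mma_op_for_target("cdna") == "amdgpu.mma"
```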

@NoumanAmir657
Contributor

@raikonenfnu you can assign me that tkw.min issue. Thanks!

raikonenfnu pushed a commit that referenced this issue Feb 19, 2025
The TKW::MinOp is lowered using TKW::MinimumOp into arith::MinimumFOp,
arith::MinSIOp, and arith::MinUIOp for floats, signed integers, and unsigned
integers, respectively.

For #278

---------

Signed-off-by: nouman-10x <[email protected]>
@NoumanAmir657
Contributor

@raikonenfnu Can you help me pick another issue?

@raikonenfnu
Contributor Author

Hey @NoumanAmir657, sure thing! Now that you are quite familiar with the code base, I think there are two potentially interesting projects here:

  1. Refactor Wave kernel compilation from context-based to function-call-based. Super useful for future Wave caching and runtime performance, as well as code quality. (See the sketch after this list.)
  2. Scalar codegen: how do we pass scalar arguments as inputs to kernels? Super useful for quantization work.
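To make the distinction in item 1 concrete, here is a purely illustrative sketch; the names are invented and do not match the actual Wave APIs. A context-based flow compiles kernels implicitly inside a launch context, while a function-call-based flow makes compilation an explicit call whose result can be cached and reused:

```python
from contextlib import contextmanager


# Context-based style (illustrative): kernels get traced/compiled implicitly
# while a launch context is active.
@contextmanager
def launch_context(**options):
    yield {"options": options}


# Function-call-based style (illustrative): compilation is an explicit call,
# so the compiled artifact can be cached and reused across launches.
_COMPILE_CACHE = {}


def compile_kernel(kernel_fn, **options):
    key = (kernel_fn.__name__, tuple(sorted(options.items())))
    if key not in _COMPILE_CACHE:
        _COMPILE_CACHE[key] = f"compiled({kernel_fn.__name__}, {options})"
    return _COMPILE_CACHE[key]


def my_kernel():
    pass


with launch_context(block_size=64):
    # Context style: invoking my_kernel() here would trigger compilation
    # as a side effect of the active context.
    pass

artifact = compile_kernel(my_kernel, block_size=64)
assert compile_kernel(my_kernel, block_size=64) is artifact  # cache hit, no recompile
```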

@NoumanAmir657
Contributor

@raikonenfnu I would like to work on the 1st one. Can you point me to how to get started on this?
