mastershin/learn-cuda-101

Fundamentals

  1. L100: Vector Addition: The "Hello World" of GPU programming. Add two arrays element-wise on the GPU, illustrating data transfer and kernel execution.
    • C/C++: Use CPU AVX intrinsics, multithreading, and CUDA.
    • Python: Use the cuda-python library.
    • Rust: Explore rust-cuda or accelerate.
  2. L110: Matrix Multiplication: A step up in complexity, showcasing thread organization and memory access patterns for performance.
    • C/C++: Implement tiled matrix multiplication.
    • Python: NumPy-like syntax with CuPy.
    • Rust: Leverage the ndarray crate for matrix operations.
  3. L120: Array Reduction (Sum/Max/Min): Learn to efficiently combine data from multiple threads, introducing reduction techniques and atomic operations.
    • C/C++: Explore different reduction approaches.
    • Python: CuPy's reduction functions.
    • Rust: Use Rayon for parallel iteration and reduction.
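
The three fundamentals above can be sketched on the CPU before any GPU code is written. The block below is a pure-Python reference, not CUDA: it only mimics how a kernel would index its work (block and thread indices, tiles, and a pairwise reduction tree). The function names, the `block_dim` and `tile` parameters, and their default values are assumptions chosen for illustration and do not come from this repository.

```python
def vector_add(a, b, block_dim=4):
    """Element-wise sum, indexed the way a 1-D CUDA grid would index it."""
    n = len(a)
    out = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global thread index
            if i < n:  # bounds guard for the last, partly filled block
                out[i] = a[i] + b[i]
    return out


def tiled_matmul(a, b, tile=2):
    """C = A x B accumulated tile by tile (the loop order a tiled kernel uses)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for row0 in range(0, n, tile):
        for col0 in range(0, m, tile):
            for k0 in range(0, k, tile):  # one tile of A and one tile of B per step
                for i in range(row0, min(row0 + tile, n)):
                    for j in range(col0, min(col0 + tile, m)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def tree_reduce(values, op):
    """Pairwise (tree) reduction: the stride doubles until one value remains."""
    data = list(values)
    n = len(data)
    if n == 0:
        raise ValueError("cannot reduce an empty sequence")
    stride = 1
    while stride < n:
        # On a GPU, every iteration of this inner loop would be one thread.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
    return data[0]
```

For example, `tree_reduce(values, max)` gives the maximum and `tree_reduce(values, lambda x, y: x + y)` gives the sum. A reference like this is mainly useful for checking GPU results against a known-good answer.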

Image Processing

  1. Image Blur: Apply a convolution filter (like a Gaussian blur) to an image, highlighting parallel image processing concepts.
    • Python: Use libraries like OpenCV with CUDA support.
    • C/C++: Manipulate pixel data directly.
  2. Image Histogram: Calculate the color histogram of an image, introducing parallel reduction and memory access patterns for efficient data gathering.
    • JavaScript: Display the histogram results using libraries like Chart.js or D3.js.
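
Both image projects reduce to a per-pixel computation, which a short CPU reference can make concrete. The sketch below is pure Python, not CUDA. It assumes a grayscale image stored as a list of rows of integers in 0..255, a 3x3 Gaussian-like kernel, and clamped borders. The names `blur` and `histogram` and the `block_rows` parameter are hypothetical. The histogram builds one partial histogram per block of rows and then merges them, which imitates the common GPU pattern of per-block partial histograms.

```python
GAUSS_3X3 = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # integer weights, sum is 16


def blur(image):
    """3x3 Gaussian-like blur; out-of-range neighbors are clamped to the edge."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):  # on a GPU, each (y, x) pair would be one thread
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    sy = min(max(y + ky - 1, 0), h - 1)
                    sx = min(max(x + kx - 1, 0), w - 1)
                    acc += GAUSS_3X3[ky][kx] * image[sy][sx]
            out[y][x] = acc / 16
    return out


def histogram(image, bins=256, block_rows=2):
    """Count pixel values using per-block partial histograms that are merged."""
    total = [0] * bins
    for row0 in range(0, len(image), block_rows):
        local = [0] * bins  # partial histogram owned by one block of rows
        for row in image[row0:row0 + block_rows]:
            for value in row:
                local[value] += 1
        for b in range(bins):  # merge step (atomic adds in a real kernel)
            total[b] += local[b]
    return total
```

A color histogram would apply the same counting to each channel separately.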

Numerical Computation

  1. Mandelbrot Set: Generate a fractal image by iterating a complex function, demonstrating parallel computation of independent pixels.
    • JavaScript: Render the resulting Mandelbrot set on a canvas element.
  2. Monte Carlo Pi Estimation: Use random sampling to estimate the value of Pi, illustrating how CUDA can accelerate Monte Carlo simulations.
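
Both numerical projects are made of independent work items, which is why they map well to a GPU. The sketch below is a pure-Python CPU reference, not CUDA. The function names, the viewing window (real part in [-2, 1], imaginary part in [-1, 1]), the escape radius of 2, and the scheme of one seeded generator per simulated thread are assumptions made for illustration.

```python
import random


def mandelbrot_iterations(c, max_iter=50):
    """Number of iterations of z = z*z + c before |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter  # treated as inside the set


def mandelbrot_grid(width, height, max_iter=50):
    """Iteration counts per pixel; every pixel is independent of the others."""
    rows = []
    for py in range(height):
        row = []
        for px in range(width):
            # Sample at pixel centers inside the assumed viewing window.
            re = -2.0 + 3.0 * (px + 0.5) / width
            im = -1.0 + 2.0 * (py + 0.5) / height
            row.append(mandelbrot_iterations(complex(re, im), max_iter))
        rows.append(row)
    return rows


def estimate_pi(samples_per_thread, threads=8, seed=0):
    """Monte Carlo estimate of Pi from points in the unit square."""
    hits = []
    for t in range(threads):
        rng = random.Random(seed + t)  # one independent stream per "thread"
        inside = 0
        for _ in range(samples_per_thread):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:  # point falls inside the quarter circle
                inside += 1
        hits.append(inside)
    # The final sum is the reduction step of the GPU version.
    return 4.0 * sum(hits) / (threads * samples_per_thread)
```

The grid returned by `mandelbrot_grid` can be mapped to colors for rendering, and the accuracy of `estimate_pi` improves roughly with the square root of the total sample count.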

Slightly More Advanced

  1. Particle Simulation (Basic): Simulate a simple particle system with forces (e.g., gravity) and collisions. Introduce basic particle-particle interactions on the GPU.
    • JavaScript: Animate the particle movement in the browser.
  2. Game of Life (CUDA Kernel): Implement the core rule update for Conway's Game of Life on the GPU, highlighting parallel neighbor calculations.
    • Python/C++/Rust: Handle display and user interaction in the host language.
  3. Simple Ray Tracing (First Hit): A challenging but rewarding project. Calculate the intersection of rays with a scene (e.g., spheres) to render a basic image.
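
The per-element update behind each of these projects can be sketched on the CPU first. The block below is pure Python, not CUDA, and every name in it is hypothetical. It assumes a Game of Life grid with dead cells beyond the border (no wrap-around), a one-dimensional particle that bounces off a floor at height 0, and spheres given as `(center, radius)` pairs. Particle-particle interactions are left out to keep the sketch short.

```python
import math


def life_step(grid):
    """One Game of Life generation; cells outside the grid count as dead."""
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):  # on a GPU, each cell would be one thread
        for x in range(w):
            n = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w:
                        n += grid[y + dy][x + dx]
            nxt[y][x] = 1 if n == 3 or (grid[y][x] == 1 and n == 2) else 0
    return nxt


def particle_step(pos, vel, dt, gravity=-9.81):
    """Semi-implicit Euler step for one particle, reflecting at the floor."""
    vel = vel + gravity * dt
    pos = pos + vel * dt
    if pos < 0.0:  # collision with the floor: mirror position and velocity
        pos, vel = -pos, -vel
    return pos, vel


def ray_sphere_hit(origin, direction, center, radius):
    """Smallest positive ray parameter t at which the ray meets the sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None  # the sphere is entirely behind the ray origin


def first_hit(origin, direction, spheres):
    """(index, t) of the nearest sphere hit by the ray, or None."""
    best = None
    for idx, (center, radius) in enumerate(spheres):
        t = ray_sphere_hit(origin, direction, center, radius)
        if t is not None and (best is None or t < best[1]):
            best = (idx, t)
    return best
```

In a GPU version, `life_step` would read from one buffer and write to another, exactly as the `nxt` grid does here, so that no thread sees a partially updated neighborhood.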

Key Objectives

  • Gradual Progression: Start with the most straightforward examples (vector addition) and build up complexity.
  • Explain Concepts: Emphasize data parallelism, kernel execution, memory management, and how CUDA differs from traditional programming.
  • Language Focus: Choose one primary language (like Python with CuPy) and potentially show snippets or concepts in others to illustrate different approaches.
  • Visualization: Where applicable, link results to visual outputs (images, charts) to make the learning process more engaging.

cuDF Installation

pip, CUDA 12

# Install cuDF and the other RAPIDS 24.06 libraries using pip (CUDA 12)
pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.6.* dask-cudf-cu12==24.6.* cuml-cu12==24.6.* \
    cugraph-cu12==24.6.* cuspatial-cu12==24.6.* cuproj-cu12==24.6.* \
    cuxfilter-cu12==24.6.* cucim-cu12==24.6.* pylibraft-cu12==24.6.* \
    raft-dask-cu12==24.6.* cuvs-cu12==24.6.*

Conda, Python 3.11, CUDA 12.0

conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia  \
    rapids=24.06 python=3.11 cuda-version=12.0
