
Sudoku-RWKV

A specialized RWKV model for solving Sudoku puzzles.


Requirements

  • rwkv
  • tkinter

Quick Start

  • Run demo.py or minimum_inference.py to solve Sudoku puzzles
  • Run generate_sudoku_data.py to generate training data

Model

The current model, sudoku_rwkv_20241120.pth, is an RWKV-v6 model trained specifically to solve Sudoku puzzles, using 2M Sudoku samples (~39.2B tokens).

Model specifications:

  • Parameters: ~12.7M
  • Vocabulary size: 133
  • Architecture: 8 layers, 320 dimensions
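As a rough sanity check on these numbers, the vocabulary-dependent layers make up only a tiny share of the ~12.7M parameters. The sketch below assumes that, as in standard RWKV, the input embedding and the output head are separate (untied) matrices of shape 133 × 320; the per-layer figure is just the remaining budget averaged over the 8 layers, not an exact count.

```python
# Back-of-the-envelope parameter budget for the specification above.
# Assumption: untied input embedding and output head, each vocab_size x n_embd.
vocab_size = 133
n_embd = 320
n_layer = 8
total_params = 12.7e6  # "~12.7M", as stated above (approximate)

emb_params = vocab_size * n_embd       # input embedding table
head_params = n_embd * vocab_size      # output projection to logits
io_params = emb_params + head_params   # embedding + head together

# Whatever is left is spread over the 8 layers (plus small norms).
per_layer = (total_params - io_params) / n_layer

print(f"embedding + head: {io_params:,} ({io_params / total_params:.2%} of total)")
print(f"approx. params per layer: {per_layer / 1e6:.2f}M")
```

With a vocabulary of only 133 tokens, nearly all of the model's capacity sits in the 8 layers rather than in the embedding and head.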

The model includes a simple modification that improves performance (see model.py, line 372). Matching changes were made to the inference code (rwkv_model.py, lines 852 and 893-896).

Training

The model was trained using the RWKV-LM repository.

Hyperparameters:

  • M_BSZ: 48
  • CTX_LEN: 8192
  • LR: 12e-4 (initial) to 3e-5 (final)
  • ADAM_EPS: 1e-18
  • ADAM_BETA1: 0.9
  • ADAM_BETA2: 0.95
  • WEIGHT_DECAY: 0.1
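To make the LR range (12e-4 down to 3e-5) and the batch settings concrete, here is a minimal sketch. It assumes an exponential (log-linear) decay with no warmup, which is only one possible schedule; the schedule RWKV-LM actually applied for this run may differ. It also treats M_BSZ as the micro-batch size per device, so the token count below is per device per step.

```python
import math

LR_INIT = 12e-4
LR_FINAL = 3e-5
M_BSZ = 48      # micro-batch size (assumed per device)
CTX_LEN = 8192  # tokens per sequence


def lr_at(progress: float) -> float:
    """Learning rate at training progress in [0, 1].

    Assumption: exponential decay from LR_INIT to LR_FINAL with no warmup.
    Progress outside [0, 1] is clamped.
    """
    p = min(1.0, max(0.0, progress))
    return LR_INIT * math.exp(math.log(LR_FINAL / LR_INIT) * p)


# Tokens processed per micro-batch on one device.
tokens_per_micro_batch = M_BSZ * CTX_LEN

for p in (0.0, 0.5, 1.0):
    print(f"progress={p:.1f}  lr={lr_at(p):.2e}")
print(f"tokens per micro-batch: {tokens_per_micro_batch:,}")
```

Under this assumed schedule, the LR at the halfway point is the geometric mean of the two endpoints (about 1.9e-4), not their arithmetic mean.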

Loss curve: [figure: Training Loss Curve]

Experiments

  • The results below are from an earlier model. The current model appears to be able to solve any solvable Sudoku; if you find a case it fails on, please let me know.
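Before reporting a failed case, it helps to check the model's answer mechanically. The sketch below assumes a hypothetical format (81-character row-major strings, digits 1-9, with '0' or '.' marking empty cells in the puzzle) that is not necessarily what this repo uses; adapt the parsing to the actual output of demo.py or minimum_inference.py.

```python
def is_valid_solution(puzzle: str, solution: str) -> bool:
    """Return True if `solution` is a complete, valid grid consistent with `puzzle`.

    Assumed (hypothetical) format: 81-character row-major strings,
    digits 1-9, with '0' or '.' marking empty cells in the puzzle.
    """
    if len(puzzle) != 81 or len(solution) != 81:
        return False
    # Every given clue must be left unchanged.
    for p, s in zip(puzzle, solution):
        if p not in "0." and p != s:
            return False
    grid = [solution[r * 9:(r + 1) * 9] for r in range(9)]
    digits = set("123456789")
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[br + r][bc + c] for r in range(3) for c in range(3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Each row, column and 3x3 box must contain exactly the digits 1-9.
    return all(unit == digits for unit in rows + cols + boxes)
```

Because each unit has 9 cells, comparing its set of values with {1..9} checks both completeness and the no-repeat rule in one step.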

I tested the model on puzzles of varying difficulty; the results are shown below.

Note: difficulty is measured by the number of empty cells in the puzzle.

[figure: Accuracy Results]

[figure: Token Usage]
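Since difficulty here is simply the number of empty cells, it can be computed directly from a puzzle. This is a minimal sketch assuming the same hypothetical format as above (an 81-character row-major string with '0' or '.' for empty cells), which may not match the format produced by generate_sudoku_data.py.

```python
def difficulty(puzzle: str) -> int:
    """Difficulty as defined in this README: the number of empty cells.

    Assumed (hypothetical) format: an 81-character row-major string
    with '0' or '.' marking empty cells.
    """
    if len(puzzle) != 81:
        raise ValueError(f"expected 81 cells, got {len(puzzle)}")
    # Each empty-cell marker contributes 1 to the count.
    return sum(ch in "0." for ch in puzzle)
```

A fully solved grid has difficulty 0 and an empty grid has difficulty 81.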