Quantum-Enhanced Language Model (QELM) – Theoretical


QELM (Quantum-Enhanced Language Model) merges quantum computing and NLP to provide compact yet powerful language models, with advanced features such as multi-block quantum transformers, ring entanglement, data reuploading, and parameter-shift gradient training.

Important: QelmT.py is the new consolidated training/inference script. The older scripts (Qelm2.py, QelmGUI.py) remain functional but are no longer actively updated and should be considered outdated.


Table of Contents

  1. What’s New in QelmT.py?
  2. Quantum vs. Classical Size Comparison
  3. Features
  4. Installation
  5. Usage with QelmT.py (Recommended)
  6. (Outdated but Working) Legacy Scripts
  7. Project Structure
  8. License
  9. Contact

What’s New in QelmT.py?

QelmT.py is the newest codebase; it consolidates training and inference into one script and adds finer control, sub-bit encoding, and entropy control. It supports:

  • Training with either real or synthetic datasets
  • Parameter tuning (learning rate, epochs, advanced quantum ansatz, multi-threading, data re-uploading, sub-bit encoding, entropy factor, etc.)
  • Inference (prompt-based generation and conversation)
  • Resource monitoring (CPU/GPU usage)
  • Model checkpointing (save & load using a .qelm file)

Quantum vs. Classical Size Comparison

With the addition of sub-bit encoding and entropy-based qubit mixing, QELM has become even more space-efficient than our earlier comparisons indicated. The original table below gives a rough idea of how QELM’s quantum “compression” compares to typical classical LLMs, but these figures likely understate QELM’s potential: in recent tests, sub-bit encoding at around 13.69 bytes per qubit, combined with carefully tuned entropy factors during training, let us store more representational information in fewer qubits and shrink model size further.

Note: We retain the original table for historical/contextual reference. If anything, real-world deployments of sub-bit + entropy-optimized QELM will likely show even greater size reductions.

| Classical Size | Classical LLM (bits) | QELM (bits) | Relationship |
|----------------|----------------------|-------------|--------------|
| 1 MB           | ~8.39×10^6           | ~8.44×10^7  | QELM >> LLM  |
| 5 MB           | ~4.19×10^7           | ~9.84×10^7  | QELM > LLM   |
| 16.6 MB        | ~1.39×10^8           | ~1.39×10^8  | QELM ≈ LLM   |
| 50 MB          | ~4.19×10^8           | ~2.56×10^8  | QELM << LLM  |
| 100 MB         | ~8.39×10^8           | ~4.31×10^8  | QELM << LLM  |
| 1 GB           | ~8.59×10^9           | ~3.67×10^9  | QELM << LLM  |
| 100 GB         | ~8.59×10^11          | ~3.59×10^11 | QELM << LLM  |

With entanglement, sub-bit encoding, and entropy-mixed gates, QELM drastically reduces storage requirements. In practice, you can expect significantly smaller footprints than the ones listed here once you enable these advanced techniques.
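As a back-of-envelope illustration of the sub-bit figure quoted above (~13.69 bytes per qubit), the sketch below estimates qubit counts for a given classical model size. The helper functions are hypothetical and not part of QelmT.py; the constant comes from the text, and the table's own QELM figures come from the project's measurements rather than this formula:

```python
# Rough size comparison, assuming ~13.69 bytes of representational
# information per qubit (figure quoted in the text above).
BYTES_PER_QUBIT = 13.69

def classical_bits(size_mib: float) -> float:
    """Bits needed to store a model of `size_mib` mebibytes classically."""
    return size_mib * 1024 * 1024 * 8

def estimated_qubits(size_mib: float) -> int:
    """Qubits needed at the quoted sub-bit encoding density."""
    return int(size_mib * 1024 * 1024 / BYTES_PER_QUBIT)

# A 1 MiB classical model: ~8.39e6 bits classically,
# versus roughly 7.7e4 qubits at this encoding density.
print(classical_bits(1))      # 8388608.0
print(estimated_qubits(1))    # 76594
```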


Features

  • Quantum Circuit Transformers
    • Advanced ring entanglement, data reuploading, multi-block attention
    • Parameter-shift gradient training (supports multi-threading)
  • QelmT.py: One script for everything (training + inference + more)
  • Live Resource Monitoring
    • CPU usage, GPU usage if available
  • Lightweight Models
    • Potentially 10–100× smaller than classical LLMs of similar capacity
  • Wide Range of Tokenization
    • Exponential subword, BPE, WordPiece, dynamic vocab, etc.
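The parameter-shift training listed above can be illustrated in miniature. The following is a generic sketch of the parameter-shift rule, not QelmT.py's actual implementation, using a toy expectation value ⟨Z⟩ = cos(θ) (what a single RY rotation on |0⟩ would give):

```python
import math

def parameter_shift_grad(f, theta, shift=math.pi / 2):
    # Parameter-shift rule: for circuits whose expectation value is a
    # sinusoid in each parameter, the exact gradient is obtained from
    # two evaluations at theta +/- pi/2 -- no finite differences needed.
    return 0.5 * (f(theta + shift) - f(theta - shift))

# Toy "expectation value": <Z> = cos(theta) for an RY(theta) rotation on |0>
grad = parameter_shift_grad(math.cos, 0.3)
# matches the analytic derivative -sin(0.3)
```

Because each parameter's gradient needs only two independent circuit evaluations, the rule parallelizes naturally, which is why QelmT.py can multi-thread it.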

Installation

Prerequisites

  • Python 3.7+ (tested up to 3.11)
  • Qiskit + Qiskit Aer
  • NumPy, TensorFlow
  • Tkinter (usually included in Python)
  • psutil (optional, for CPU usage)
  • nltk (for tokenizing text data)

Cloning the Repository

git clone https://github.com/R-D-BioTech-Alaska/QELM.git
cd QELM

Virtual Environment Setup

python -m venv qiskit_env
# Activate virtualenv:
# Linux/Mac:
source qiskit_env/bin/activate
# Windows:
qiskit_env\Scripts\activate

Dependency Installation

pip install --upgrade pip
pip install -r requirements.txt

Usage with QelmT.py (Recommended)

Below are basic usage examples for QelmT.py. For more advanced options, use --help:

Basic Command Line Training

python QelmT.py --train \
                --dataset /path/to/data.txt \
                --vocab_size 8000 \
                --embed_dim 256 \
                --num_heads 4 \
                --hidden_dim 512 \
                --epochs 5 \
                --lr 0.001

Flags:

  • --train : Activates training mode
  • --dataset : Path to your .txt dataset
  • --vocab_size : Limit for vocabulary
  • --embed_dim : Embedding dimension (must be divisible by --num_heads)
  • --num_heads : Number of attention heads
  • --hidden_dim : Hidden dimension for feed-forward
  • --epochs : Number of training epochs
  • --lr : Learning rate

Performing Inference

python QelmT.py --inference \
                --input_token "hello" \
                --max_length 50 \
                --temperature 1.0 \
                --model /path/to/saved_model.qelm

Flags:

  • --inference : Inference mode
  • --input_token : Starting word or token
  • --max_length : Maximum output tokens
  • --temperature : Sampling temperature (higher => more random)
  • --model : Path to a .qelm checkpoint
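For intuition, --temperature behaves like standard softmax temperature sampling: logits are divided by the temperature before normalization, so values above 1.0 flatten the distribution and values below 1.0 sharpen it. A minimal, self-contained sketch (a hypothetical helper, not QelmT.py's code):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # Scale logits by 1/T, then softmax-normalize and sample.
    # T > 1 => flatter distribution (more random output);
    # T < 1 => sharper distribution (closer to greedy decoding).
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r <= acc:
            return i
    return len(logits) - 1
```

At a very low temperature this reduces to picking the highest-logit token almost every time.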

Advanced Options

  • --num_blocks N : Multi-block quantum transformers (default=1)
  • --use_advanced_ansatz : Enable advanced quantum gates
  • --use_data_reuploading : Use data reuploading technique
  • --sim_method [cpu|gpu|both|simulation] : Simulation approach
  • --threads N : For multi-threaded parameter-shift
  • --decimal_precision N : Force quantum channels to round to N decimals
  • --use_subbit_encoding : Sub-bit encoding to store more info per qubit

Check all available flags:

python QelmT.py --help

(Outdated but Working) Legacy Scripts

We continue to include the older scripts for users who want to examine or compare the original QELM approach. They are still functional but are no longer actively updated.

Qelm2.py

A simple command-line script for:

  • Training (--train)
  • Inference (--inference)
  • Basic model save/load

QelmGUI.py

A Tkinter-based GUI with:

  • Dataset selection, training hyperparams, real-time logs & progress bars
  • Inference tab for text generation

QELMChatUI.py

A ChatGPT-style chat interface with:

  • Multi-turn conversation with QELM
  • Model selection, load/save, conversation logs

Note: Both QelmGUI.py and QELMChatUI.py require a local Python environment with Tkinter.


Project Structure

QELM/
├── QelmT.py                # NEW: Unified training+inference script (Recommended)
├── Qelm2.py                # Legacy CLI script
├── QelmGUI.py              # Legacy GUI for training & inference
├── QELMChatUI.py           # Legacy Chat UI
├── requirements.txt
├── docs/
│   └── images/
│       ├── QELM_Diagram.png
│       ├── quantum.png
│       └── Qelm.png
└── README.md               # This documentation

License

This project is licensed under the MIT License. See the LICENSE file for details.


Contact

For additional guidance, collaboration, or bug reports: