Bouzyges

Bouzyges (pronounced boo-zee-jes) is a Python program to interactively generate semantic graphs of medical terms using SNOMED CT attribute-value pairs. The script can be interfaced with an LLM to generate graphs in an automated fashion. The end result is a set of SNOMED CT concepts that serve as the closest possible strict supertypes and together fully capture the meaning of the input term.
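As an illustration of what such a semantic graph might contain, the sketch below models a term as a set of supertype concepts plus attribute-value pairs. The class names, fields, and the example modeling of "Fracture of femur" are hypothetical. They use human-readable concept names instead of SNOMED CT identifiers and are not Bouzyges' internal data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AttributeValue:
    """One attribute-value pair, e.g. Finding site = <body structure concept>."""
    attribute: str
    value: str


@dataclass
class SemanticGraph:
    """Hypothetical container: an input term, its supertypes and its attribute-value pairs."""
    term: str
    supertypes: set[str] = field(default_factory=set)
    attributes: set[AttributeValue] = field(default_factory=set)

    def describe(self) -> str:
        """Render the graph as indented text, sorted for stable output."""
        lines = [f"{self.term}:"]
        lines += [f"  is a {concept}" for concept in sorted(self.supertypes)]
        for pair in sorted(self.attributes, key=lambda p: (p.attribute, p.value)):
            lines.append(f"  {pair.attribute} = {pair.value}")
        return "\n".join(lines)


# Illustrative modeling only, written with concept names rather than SCTIDs.
graph = SemanticGraph(
    term="Fracture of femur",
    supertypes={"Fracture of bone"},
    attributes={
        AttributeValue("Associated morphology", "Fracture"),
        AttributeValue("Finding site", "Bone structure of femur"),
    },
)
print(graph.describe())
```

Keeping attribute-value pairs as frozen (hashable) objects lets a graph be treated as a set, so two graphs can be compared regardless of the order in which the pairs were found.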

Intended use

In its current form, Bouzyges serves as a proof of concept of a novel approach to automating ontology mapping and standardization. Possible future applications include:

Mapping of medical terms to SNOMED CT concepts
SNOMED CT authoring support
Automated SNOMED CT quality assurance
Automated creation of custom local Standard concepts in OMOP CDM.

Installation

Bouzyges requires Python 3.12 or later. To install the script, clone the repository, initialize a virtual environment and install the required packages:

git clone https://github.com/OHDSI/Bouzyges.git
cd Bouzyges
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Prerequisites

SNOMED CT

The current implementation of Bouzyges relies on the Snowstorm REST API to interface with SNOMED CT. To use the API, you need to provide the endpoint and the API key either as environment variables or in a .env file in the root directory of the project (see below).

Bouzyges was tested with Snowstorm version 10 and the SNOMED International Edition (July 2024 release). We recommend running Snowstorm locally with the Docker image provided by SNOMED International and loading the SNOMED CT RF2 release archive via the Swagger UI.
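To check that a local Snowstorm instance answers before running Bouzyges, a single concept can be requested over plain HTTP. This is a minimal sketch using only the standard library. It assumes Snowstorm's `/{branch}/concepts/{conceptId}` endpoint on the default `MAIN` branch, and the helper names are hypothetical, not part of Bouzyges. Verify the exact paths in your instance's Swagger UI.

```python
import json
import urllib.request


def concept_url(endpoint: str, concept_id: str, branch: str = "MAIN") -> str:
    """Build the (assumed) Snowstorm URL for one concept on a branch."""
    return f"{endpoint.rstrip('/')}/{branch}/concepts/{concept_id}"


def fetch_concept(endpoint: str, concept_id: str, branch: str = "MAIN",
                  timeout: float = 10.0) -> dict:
    """GET a concept from Snowstorm and decode the JSON body."""
    request = urllib.request.Request(
        concept_url(endpoint, concept_id, branch),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # 404684003 is "Clinical finding"; requires a running Snowstorm instance.
    print(fetch_concept("http://localhost:8080/", "404684003"))
```

Stripping the trailing slash from the endpoint lets the same value work whether or not the configured `SNOWSTORM_ENDPOINT` ends with `/`.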

External links:

Snowstorm GitHub repository
Using Snowstorm with Docker
SNOMED International release in RF2 format (hosted by NLM)

LLM interface

Bouzyges works by outputting LLM prompts and parsing the responses; currently, three options are supported:

Manual input: the script displays each LLM prompt, and the user is expected to type in the response manually. This can be used to debug the script or to test different LLMs interactively. To use this option, set the PROMPTER_OPTION constant to "human" in the body of the script. A better configuration interface is coming soon.
OpenAI: to use this API, ensure that a valid OPENAI_API_KEY is set either as an environment variable or (recommended) in the .env file (see below). Then set PROMPTER_OPTION to "openai".
Azure: the Azure OpenAI API is also supported. To use it, provide the API information either by explicitly setting environment variables or (preferably) in the .env file, and set PROMPTER_OPTION to "azure".

Implementing new interfaces

Additional API interfaces (e.g. to locally available models) can be implemented by inheriting from the PromptFormat class, to generate prompts in the correct format, and from the Prompter class, to send prompts to the LLM.
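The split between formatting and sending can be sketched as follows. The `PromptFormat` and `Prompter` classes here are hypothetical stand-ins written only to show the pattern. Bouzyges' real base classes (and their method names such as `format`, `send` and `ask`) may differ, so check the script before subclassing.

```python
from abc import ABC, abstractmethod
from collections.abc import Callable


# Hypothetical stand-ins for Bouzyges' PromptFormat and Prompter classes;
# the real base classes may define different methods and signatures.
class PromptFormat(ABC):
    @abstractmethod
    def format(self, question: str, options: list[str]) -> str:
        """Render a question and its allowed answers as a prompt string."""


class Prompter(ABC):
    def __init__(self, prompt_format: PromptFormat) -> None:
        self.prompt_format = prompt_format

    @abstractmethod
    def send(self, prompt: str) -> str:
        """Deliver a prompt to the LLM and return its raw reply."""

    def ask(self, question: str, options: list[str]) -> str:
        """Format the question, send it, and return the reply."""
        return self.send(self.prompt_format.format(question, options))


class PlainTextFormat(PromptFormat):
    """Simple prompt layout for a local model (illustrative)."""

    def format(self, question: str, options: list[str]) -> str:
        lines = [question, "Options:"]
        lines += [f"- {option}" for option in options]
        lines.append("Answer with exactly one option.")
        return "\n".join(lines)


class LocalModelPrompter(Prompter):
    """Sends prompts to a locally hosted model through an injected callable."""

    def __init__(self, prompt_format: PromptFormat,
                 generate: Callable[[str], str]) -> None:
        super().__init__(prompt_format)
        self._generate = generate  # e.g. a wrapper around a local inference server

    def send(self, prompt: str) -> str:
        return self._generate(prompt).strip()
```

Injecting the `generate` callable keeps the prompter independent of any particular local runtime, and makes it easy to test with a stub such as `lambda prompt: "Fracture"`.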

`.env` file

To avoid accidental exposure of API keys, we strongly recommend using an .env file to manage environment variables. Bouzyges will try to automatically load the .env file in the working directory using the python-dotenv library.

Example content of the file:

# Snowstorm endpoint is always required
# This example is for the default local/Docker installation
export SNOWSTORM_ENDPOINT="http://localhost:8080/"

# OpenAI requirements
# Project API key created at https://platform.openai.com/api-keys
export OPENAI_API_KEY="sk-abc...def"

# Azure OpenAI interface requirements
# Obtainable from your organization's infrastructure team
export AZURE_OPENAI_API_KEY="123abcd...789"
export AZURE_OPENAI_API_VERSION="2024-06-01"  # Most recent version
export AZURE_OPENAI_ENDPOINT="https://example.openai.azure.com/"
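Which of these variables are needed depends on the chosen PROMPTER_OPTION. The sketch below checks that mapping before a run. The table of required variables is taken from this README, while the function name is hypothetical and the real script may read additional settings. It loads the .env file with python-dotenv's `load_dotenv()` when that library is installed.

```python
import os
from collections.abc import Mapping

# Required variables per PROMPTER_OPTION, as listed in this README.
# SNOWSTORM_ENDPOINT is always required.
REQUIRED_VARIABLES: dict[str, tuple[str, ...]] = {
    "human": (),
    "openai": ("OPENAI_API_KEY",),
    "azure": (
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_API_VERSION",
        "AZURE_OPENAI_ENDPOINT",
    ),
}


def missing_variables(prompter_option: str,
                      env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    if prompter_option not in REQUIRED_VARIABLES:
        raise ValueError(f"Unknown PROMPTER_OPTION: {prompter_option!r}")
    needed = ("SNOWSTORM_ENDPOINT",) + REQUIRED_VARIABLES[prompter_option]
    return [name for name in needed if not env.get(name)]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # python-dotenv
    except ImportError:
        load_dotenv = None
    if load_dotenv is not None:
        load_dotenv()  # by default, does not override variables already set
    option = "openai"  # hypothetical: set to match PROMPTER_OPTION
    missing = missing_variables(option)
    print("Missing:", ", ".join(missing) if missing else "none")
```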

Caching of results

Bouzyges caches all calls to LLM APIs in an SQLite database, prompt_cache.db. Responses to prompts sent to the same model with the same API options are reused across runs. The database file can be read and analyzed with any tool that supports SQLite. The schema DDL is stored in the init_prompt_cache.sql file.

Usage

Warning

Bouzyges is currently in an early development stage and is not yet ready for production use. The script makes many API calls and may consume a LOT of tokens: processing one concept currently takes on the order of 150,000 tokens (about 3 cents with gpt-4o-mini).
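To budget a larger run, the per-concept figure can be extrapolated with simple arithmetic. The sketch assumes gpt-4o-mini list prices of $0.15 per million input tokens and $0.60 per million output tokens (launch pricing; check current rates). The 140,000/10,000 input/output split is a hypothetical breakdown of the ~150,000 tokens quoted above.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Cost in USD given token counts and per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical split of ~150,000 tokens for one concept; assumed gpt-4o-mini prices.
per_concept = estimate_cost_usd(140_000, 10_000, 0.15, 0.60)
print(f"per concept:    ${per_concept:.3f}")
print(f"1,000 concepts: ${per_concept * 1_000:.2f}")
```

With these assumptions one concept costs roughly 2.7 cents, in line with the ~3 cents quoted above. Because cached prompts are reused, repeated runs over the same terms cost less.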

Currently, only the example usage inside the script is supported; a batch loading interface is planned for the near future. To run the script, execute the following command:

$ python main.py

License

The code is not yet licensed and is provided as-is, for educational purposes only; it is not intended for production use. Please refrain from distributing the code or using it in any commercial or production environment.
