This is the official repo for the paper ClinDiag: Grounding Large Language Model in Clinical Diagnostics.
- Demo website: https://clindiag.streamlit.app/
When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying the system state. We use conda as an example here.
Create and activate the environment, then install the dependencies:
$ conda create -n clindiag python==3.11.1
$ conda activate clindiag
(clindiag) pip install -r requirements.txt
To deactivate the environment later, run:
(clindiag) conda deactivate
Before running a script, open configs/OAI_Config_List.json and fill in your model and API key:
{
    "model": "gpt-4o-mini",
    "api_key": "[YOUR_API_KEY]",
    "base_url": "[YOUR_BASE_URL]",
    "tags": [
        "x_gpt4omini"
    ]
}
The tags are used to filter the selected model(s) for each stage; see parse_args() for details.
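For reference, here is a minimal sketch of how such a tag filter can be applied when loading the config list. It assumes the AutoGen-style config_list_from_json helper; the repo's actual filtering logic lives in each script's parse_args().

import autogen

# Load model configs and keep only entries tagged "x_gpt4omini".
# (Illustrative sketch, not the repo's exact code.)
config_list = autogen.config_list_from_json(
    "configs/OAI_Config_List.json",
    filter_dict={"tags": ["x_gpt4omini"]},
)
print(config_list)  # e.g. [{"model": "gpt-4o-mini", ...}]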
This script implements a human-LLM collaboration framework in which the LLM serves as an assistant that answers the physician's questions.
(clindiag) python code/test_human_llm.py --data_dir benchmark_dataset
This simulates the human-alone scenario, where a physician performs the clinical diagnostic procedure on their own within the ClinDiag framework.
(clindiag) python code/test_human_alone.py --data_dir benchmark_dataset
The following scripts were used for the ablation studies. We examined the effects on diagnostic performance of (1) multi-doctor collaboration, (2) introducing a critic agent, and (3) prompt engineering.
We tested the effect of having 2–3 doctor agents collaborate in the clinical decision-making process.
(clindiag) python code/trial_stepwise_multiagent_converse.py --data_dir benchmark_dataset --num_specialists 2
--num_specialists: number of doctor agents; defaults to 3
This framework incorporates a critic agent that suggests revisions to the doctor agent's questions.
(clindiag) python code/trial_stepwise_nochain_critic.py --data_dir benchmark_dataset --model_name_critic x_gpt4omini
--model_name_critic: model used for the critic agent; defaults to gpt-4o-mini
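As a rough illustration, the CLI flags above might be defined in parse_args() along these lines (a hedged sketch; the actual defaults and help strings are in each script):

import argparse

def parse_args():
    # Sketch of the command-line surface described above, not the repo's exact code.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, default="benchmark_dataset",
                        help="directory containing the benchmark cases")
    parser.add_argument("--num_specialists", type=int, default=3,
                        help="number of doctor agents")
    parser.add_argument("--model_name_critic", type=str, default="gpt-4o-mini",
                        help="model used for the critic agent")
    return parser.parse_args()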
This script adopts expert-generated prompts.
(clindiag) python code/trial_stepwise_nochain_expert_prompt.py --data_dir benchmark_dataset
./benchmark_dataset.zip
(To uncompress, run unzip benchmark_dataset.zip in the root directory.)
A comprehensive clinical dataset of 4,421 real-world cases, covering both rare and common diseases across 32 specialties.
./human_examiner_scripts/
A set of 35 patient scripts sourced from the hospital’s Objective Structured Clinical Examination (OSCE) test dataset for standardized patient training.
./finetune_data.zip
(To uncompress, run unzip finetune_data.zip in the root directory.)
The multi-turn chat dataset used for fine-tuning a chat model. Each conversation example was constructed from a quality-checked real-world case and structured to adhere to standard clinical diagnostic practice. The data is available in both jsonl and json formats.
finetune_data_messages.jsonl:
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
finetune_data_conversations.json:
{
    "conversations": [
        [
            {"from": "system", "value": "..."},
            {"from": "user", "value": "..."},
            {"from": "assistant", "value": "..."}
        ],
        [
            {"from": "system", "value": "..."},
            {"from": "user", "value": "..."},
            {"from": "assistant", "value": "..."}
        ]
    ]
}
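If your fine-tuning stack expects the messages-style jsonl instead, the json format can be converted in a few lines (a hedged sketch assuming "from"/"value" map directly to "role"/"content"; the output filename is hypothetical):

import json

with open("finetune_data_conversations.json", encoding="utf-8") as f:
    data = json.load(f)

# Write one {"messages": [...]} object per line, mirroring finetune_data_messages.jsonl.
with open("finetune_data_messages_converted.jsonl", "w", encoding="utf-8") as out:
    for conversation in data["conversations"]:
        messages = [{"role": m["from"], "content": m["value"]} for m in conversation]
        out.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")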