
FEAT: Create many shot jailbreak orchestrator #709

Open: wants to merge 14 commits into main

AdrGav941 commented Feb 12, 2025

Description

Added the Many Shot Jailbreak Orchestrator as a subclass of the Prompt Sending Orchestrator.
Background on the jailbreak technique: Many-Shot Jailbreaking research by Anthropic.

Issue #708
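The core idea of the technique can be sketched in a few lines: example question/answer pairs are laid out as a faux dialogue, and the real prompt is appended as the final, unanswered user turn. This is a minimal sketch, not the orchestrator's actual template. It assumes each example is a dict with hypothetical `user` and `assistant` keys and a plain `User:`/`Assistant:` transcript format; the helper name `build_many_shot_dialogue` is likewise illustrative.

```python
def build_many_shot_dialogue(examples: list[dict[str, str]], prompt: str) -> str:
    """Render example exchanges as a transcript, then append the target prompt.

    Assumes each example has "user" and "assistant" keys (an illustrative format).
    """
    turns = []
    for example in examples:
        turns.append(f"User: {example['user']}")
        turns.append(f"Assistant: {example['assistant']}")
    # The real prompt goes last, as a user turn the model is expected to answer.
    turns.append(f"User: {prompt}")
    turns.append("Assistant:")
    return "\n".join(turns)


# Benign placeholder examples; a real run would draw them from a dataset.
examples = [
    {"user": "Question A?", "assistant": "Answer A."},
    {"user": "Question B?", "assistant": "Answer B."},
]
print(build_many_shot_dialogue(examples, "Question C?"))
```

The more example turns precede the final prompt, the longer the "shot" context becomes, which is the lever the num_examples parameter controls.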

Tests and Documentation

Unit tests for this change can be found here: tests\unit\orchestrator\test_many_shot_orchestrator.py

The documentation that previously described the Many Shot Jailbreak now uses the new orchestrator and can be found here:
doc\code\orchestrators\many_shot_jailbreak.ipynb

I generated the notebook referenced above with Jupytext using the following command:
jupytext --execute --to notebook .\doc\code\orchestrators\many_shot_jailbreak.py

AdrGav941 changed the title from "[FEAT] Create many shot jailbreak orchestrator (#708)" to "[FEAT] Create many shot jailbreak orchestrator" on Feb 12, 2025
romanlutz (Contributor) left a comment

Thanks for this contribution! At first, I wasn't sure whether we wanted this at all. It doesn't make things much easier than using the plain PromptSendingOrchestrator, but every bit of removed friction helps, I think. It also highlights the technique more than the example alone would.

I left a few comments. Let me know what you think.

str: The constructed many shot dialogue
"""
# Fetch the Many Shot Jailbreaking dataset
examples = fetch_many_shot_jailbreaking_dataset()
Contributor

This should be configurable, IMO. You could add a many_shot_examples argument to the constructor that defaults to None, in which case we fetch the dataset this way. That lets people supply their own examples. Fetching should happen only once; the selection from the set of examples can be random at prompt-sending time.

Author

Thank you, this is a great point. I added the configurable example dataset logic to the constructor.
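A minimal sketch of the suggestion above, using a hypothetical `ManyShotSketch` class rather than the orchestrator added in this PR. Examples come in through an optional `many_shot_examples` argument, are fetched once in the constructor only when none are given, and are sampled randomly each time a dialogue is built. The `default_fetcher` callable is an assumed stand-in for `fetch_many_shot_jailbreaking_dataset` so the block runs without PyRIT, and capping `num_examples` at the dataset size is an illustrative choice.

```python
import random
from typing import Callable, Optional

Example = dict[str, str]


def default_fetcher() -> list[Example]:
    # Hypothetical stand-in for fetch_many_shot_jailbreaking_dataset(); benign placeholders.
    return [{"user": f"Q{i}?", "assistant": f"A{i}."} for i in range(10)]


class ManyShotSketch:
    """Hypothetical illustration, not the orchestrator added in this PR."""

    def __init__(
        self,
        num_examples: int = 3,
        many_shot_examples: Optional[list[Example]] = None,
        fetcher: Callable[[], list[Example]] = default_fetcher,
    ) -> None:
        # Fetch once, and only when the caller did not supply their own examples.
        self._examples = many_shot_examples if many_shot_examples is not None else fetcher()
        if not self._examples:
            raise ValueError("No many-shot examples were provided or fetched.")
        # Cap the count so random.sample never asks for more items than exist.
        self._num_examples = min(num_examples, len(self._examples))

    def _select_examples(self) -> list[Example]:
        # Meant to be called at prompt-sending time, so each call draws a fresh random selection.
        return random.sample(self._examples, self._num_examples)


custom = [{"user": "u1", "assistant": "a1"}, {"user": "u2", "assistant": "a2"}]
orchestrator = ManyShotSketch(num_examples=2, many_shot_examples=custom)
print(orchestrator._select_examples())
```

Injecting the fetcher as a parameter also makes the fetch-once behavior easy to verify in a unit test.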

self.num_examples = len(examples)

# Choose num_examples either static or random examples from the dataset
examples = examples[: self.num_examples] if (self.isTest) else random.sample(examples, self.num_examples)
Contributor

This can be done by mocking random.sample, without an isTest variable in the code.

Check test_ansi_attack_converter.py for an example of mocking random.choice, or test_char_swap_generator_converter.py for an example of mocking random.randint.

Author

Thank you! I changed the tests to mock the random.sample function and removed the isTest variable.
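A pytest-style sketch of the mocking approach discussed above: `unittest.mock.patch` temporarily replaces `random.sample` with a deterministic "first k items" slice, so the code under test needs no test-only flag. The function `pick_examples` is hypothetical. Patching `"random.sample"` works here because the function looks up `sample` on the `random` module at call time; if the module under test imported `sample` directly, the patch target would have to be that module's name instead.

```python
import random
from unittest.mock import patch


def pick_examples(examples: list[str], n: int) -> list[str]:
    # Hypothetical code under test: always random, no isTest-style flag.
    return random.sample(examples, n)


def test_pick_examples_with_mocked_sample() -> None:
    examples = ["e0", "e1", "e2", "e3", "e4"]
    # Swap random.sample for a deterministic slice only inside this block.
    with patch("random.sample", side_effect=lambda population, k: population[:k]) as mock_sample:
        result = pick_examples(examples, 3)
    assert result == ["e0", "e1", "e2"]
    # The code under test still called random.sample with the expected arguments.
    mock_sample.assert_called_once_with(examples, 3)
```

Outside the `with` block the real `random.sample` is restored, so other tests are unaffected.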

objective_target: PromptChatTarget,
scorers: Optional[list[Scorer]] = None,
verbose: bool = False,
num_examples: int = 3,
Contributor

I think the paper found that it works better with more examples. Maybe the default should be higher? @KutalVolkan may have thoughts.

Author (AdrGav941, Feb 13, 2025)

I saw this as well. I chose 3 only because the old notebook using the Prompt Sending Orchestrator took 3 examples (previously the static slice examples[4:7]), but if we should default to more examples, I am all for it.

self.num_examples = num_examples
self.isTest = isTest

async def construct_many_shot_dialogue(self, malicious_prompt: str) -> str:
Contributor

"prompt" will do, I think. It's more consistent with our terminology elsewhere.

Author

Done! Thanks

self.num_examples = num_examples
self.isTest = isTest

async def construct_many_shot_dialogue(self, malicious_prompt: str) -> str:
Contributor

Methods that we're not calling externally should be prefixed with _.

Author

Done! Thank you


return many_shot_dialogue

async def send_prompts_async( # type: ignore[override]
Contributor

What's that type hint override?

Author

I'm not quite sure why it works here, but it suppresses an error I was getting from the automated checks. I saw it used in pyrit\orchestrator\single_turn\flip_attack_orchestrator.py, so I assumed it was safe practice. Please let me know if that's not the case and I can find another way.
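For context on the question above: `# type: ignore[override]` silences mypy's `override` error code, which mypy reports when a subclass method's signature is incompatible with the method it overrides, for example when it no longer accepts a parameter the base class accepts. The sketch below uses hypothetical `BaseSender` and `ManyShotSender` classes, not PyRIT's real signatures, to show the pattern together with a private `_`-prefixed helper. The code runs the same either way; the comment only affects static type checking.

```python
import asyncio
from typing import Optional


class BaseSender:
    async def send_prompts_async(
        self, *, prompt_list: list[str], memory_labels: Optional[dict[str, str]] = None
    ) -> list[str]:
        # Stand-in for sending each prompt to a target.
        return [f"sent: {p}" for p in prompt_list]


class ManyShotSender(BaseSender):
    def __init__(self, examples: list[str]) -> None:
        self._examples = examples

    def _construct_many_shot_dialogue(self, prompt: str) -> str:
        # Private helper (leading underscore): not part of the public API.
        return "\n".join(self._examples + [prompt])

    # mypy flags this override as incompatible because it no longer accepts
    # memory_labels; the ignore comment suppresses only that error code.
    async def send_prompts_async(self, *, prompt_list: list[str]) -> list[str]:  # type: ignore[override]
        dialogues = [self._construct_many_shot_dialogue(p) for p in prompt_list]
        return await super().send_prompts_async(prompt_list=dialogues)


print(asyncio.run(ManyShotSender(["ex1"]).send_prompts_async(prompt_list=["hi"])))
```

An alternative that needs no suppression is to keep the base-class signature in the subclass (accept `memory_labels` and pass it through), which makes the override compatible.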

romanlutz changed the title from "[FEAT] Create many shot jailbreak orchestrator" to "FEAT: Create many shot jailbreak orchestrator" on Feb 13, 2025
AdrGav941 requested a review from romanlutz on February 13, 2025 at 16:24