FEAT: Create many shot jailbreak orchestrator #709

Open

AdrGav941 wants to merge 14 commits into Azure:main from AdrGav941:create-many-shot-jailbreak-orchestrator

+202 −100

AdrGav941 commented Feb 12, 2025 (edited)

Description

Added the Many Shot Jailbreak Orchestrator as a subclass of the Prompt Sending Orchestrator.
Information on the jailbreak can be found in the Many-Shot Jailbreaking research by Anthropic.

Tests and Documentation

Unit tests for this change can be found here: tests\unit\orchestrator\test_many_shot_orchestrator.py

The documentation that previously described the Many Shot Jailbreak now uses the new orchestrator and can be found here:
doc\code\orchestrators\many_shot_jailbreak.ipynb

I ran JupyText to generate the notebook referenced above using the following command:
jupytext --execute --to notebook .\doc\code\orchestrators\many_shot_jailbreak.py

Adrian Gavrila added 11 commits

February 10, 2025 16:36


          Adding Many Shot Jailbreak Orchestrator and initial testing

571d331


          Adding assertions for each use : assistant entry in test_construct_many_shot_dialogue_static_examples

fc894f4


          Adding Send prompt test for random examples and tests for init

74c91d4


          Removing optional tag on isTest, adding new jupyter notebook .py file as well as .ipynb generated by jupytext

776ea6a


          Fixing formatting and unused imports

7a6eb87


          Fixing long lines referenfced by commit hook

773249a


          Automatic changed via commit hooks

040c69a


          Automatic changes via commit hooks

dc54a90


          Merge remote-tracking branch 'refs/remotes/adrgav941/create-many-shot-jailbreak-orchestrator' into create-many-shot-jailbreak-orchestrator

fd39a5b


          More commit hook automatic changes

84a51de


          Removing captured notebook output

b7b05c7

AdrGav941 mentioned this pull request

Add Many Shot Jailbreak Orchestrator as a Subclass of Prompt Sending Orchestrator #708

Open


          Merge branch 'main' into create-many-shot-jailbreak-orchestrator

0b9231c

AdrGav941 changed the title ~~[FEAT] Create many shot jailbreak orchestrator (#708)~~ [FEAT] Create many shot jailbreak orchestrator

romanlutz reviewed

Contributor

romanlutz left a comment

Thanks for this contribution! At first, I wasn't quite sure if we want this at all. It doesn't make things a lot easier than with the plain PromptSendingOrchestrator, but every little bit of removed friction helps, I think. Plus, it highlights the technique somewhat more than the example alone would.

I left a few comments. Let me know what you think.

pyrit/orchestrator/single_turn/many_shot_jailbreak_orchestrator.py Outdated

+                          str: The constructed many shot dialogue
+                      """
+                      # Fetch the Many Shot Jailbreaking dataset
+                      examples = fetch_many_shot_jailbreaking_dataset()

Contributor

romanlutz Feb 13, 2025

This should be configurable IMO. You could have a many_shot_examples arg on the constructor that defaults to None, in which case we fetch the dataset this way. That allows people to specify their own. Fetching should only happen once. The selection from the set of examples can be random at prompt-sending time.

Author

AdrGav941 Feb 13, 2025

Thank you, this is a great point. I added the configurable example database logic to the constructor.
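
The suggestion in this thread can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `ManyShotExampleStore` and `fetch_examples_stub` are hypothetical names, the stub stands in for `fetch_many_shot_jailbreaking_dataset()`, and the `user`/`assistant` keys are an assumption about the example format.

```python
import random
from typing import Optional


def fetch_examples_stub() -> list[dict[str, str]]:
    """Hypothetical stand-in for fetch_many_shot_jailbreaking_dataset()."""
    return [{"user": f"question {i}", "assistant": f"answer {i}"} for i in range(10)]


class ManyShotExampleStore:
    """Hypothetical helper that holds the examples for a many-shot prompt."""

    def __init__(
        self,
        num_examples: int = 3,
        many_shot_examples: Optional[list[dict[str, str]]] = None,
    ) -> None:
        # Fetch exactly once, and only when the caller supplied no examples.
        if many_shot_examples is None:
            many_shot_examples = fetch_examples_stub()
        if not many_shot_examples:
            raise ValueError("many_shot_examples must not be empty")
        self._examples = many_shot_examples
        # Clamp so that random.sample never asks for more items than exist.
        self._num_examples = min(num_examples, len(self._examples))

    def select(self) -> list[dict[str, str]]:
        # The random selection is deferred to prompt-sending time.
        return random.sample(self._examples, self._num_examples)
```

With this shape, callers who pass their own list never trigger a fetch, and each call to `select()` draws a fresh random subset.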

pyrit/orchestrator/single_turn/many_shot_jailbreak_orchestrator.py Outdated

+                          self.num_examples = len(examples)
+                      # Choose num_examples either static or random examples from the dataset
+                      examples = examples[: self.num_examples] if (self.isTest) else random.sample(examples, self.num_examples)

Contributor

romanlutz Feb 13, 2025

This can be done with mocking random.sample and without this isTest variable in the code.

Check test_ansi_attack_converter.py for an example of mocking random.choice, or test_char_swap_generator_converter.py for an example of mocking random.randint

Author

AdrGav941 Feb 13, 2025

Thank you! I made the change to mock the random.sample function and removed the isTest variable.
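
For reference, mocking `random.sample` in a test might look like the sketch below. `pick_examples` is a hypothetical function under test, not the orchestrator's real method; only `unittest.mock.patch` and `random.sample` are real APIs here.

```python
import random
from unittest.mock import patch


def pick_examples(examples: list[str], k: int) -> list[str]:
    """Hypothetical function under test: picks k examples at random."""
    return random.sample(examples, k)


def test_pick_examples_is_deterministic_when_mocked() -> None:
    examples = ["a", "b", "c", "d"]
    # Replace random.sample so the "random" choice is always the first k items.
    with patch("random.sample", side_effect=lambda population, k: population[:k]) as mock_sample:
        result = pick_examples(examples, 2)
    assert result == ["a", "b"]
    mock_sample.assert_called_once_with(examples, 2)


test_pick_examples_is_deterministic_when_mocked()
```

The production code keeps a single code path, and the test controls the randomness from the outside, so no test-only flag such as isTest is needed.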

pyrit/orchestrator/single_turn/many_shot_jailbreak_orchestrator.py

+                      objective_target: PromptChatTarget,
+                      scorers: Optional[list[Scorer]] = None,
+                      verbose: bool = False,
+                      num_examples: int = 3,

Contributor

romanlutz Feb 13, 2025

I think the paper found that it works better with more examples. Maybe the default should be higher? @KutalVolkan may have thoughts.

Author

AdrGav941 Feb 13, 2025 (edited)

I saw this as well. I chose 3 only because the old notebook using the Prompt Sending Orchestrator took 3 examples (it used the static slice examples[4:7]), but if we should default to more examples, I am all for it.

pyrit/orchestrator/single_turn/many_shot_jailbreak_orchestrator.py Outdated

+                      self.num_examples = num_examples
+                      self.isTest = isTest
+                  async def construct_many_shot_dialogue(self, malicious_prompt: str) -> str:

Contributor

romanlutz Feb 13, 2025

"prompt" will do I think. It's more consistent with our terminology elsewhere

Author

AdrGav941 Feb 13, 2025

Done! Thanks

pyrit/orchestrator/single_turn/many_shot_jailbreak_orchestrator.py Outdated

+                      self.num_examples = num_examples
+                      self.isTest = isTest
+                  async def construct_many_shot_dialogue(self, malicious_prompt: str) -> str:

Contributor

romanlutz Feb 13, 2025

methods that we're not calling externally should be prefixed with _

Author

AdrGav941 Feb 13, 2025

Done! Thank you

pyrit/orchestrator/single_turn/many_shot_jailbreak_orchestrator.py


		return many_shot_dialogue

		async def send_prompts_async( # type: ignore[override]

Contributor

romanlutz Feb 13, 2025

what's that type hint override?

Author

AdrGav941 Feb 13, 2025

I am not quite sure why this works here, but it overrides an error that I was getting when running the automatic checks. I see it used in pyrit\orchestrator\single_turn\flip_attack_orchestrator.py so I figured it was safe practice. Please let me know if that is not the case and I can find another way.
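
As general background on that comment: mypy reports an error with the code `override` when a subclass method's signature is incompatible with the parent's, and `# type: ignore[override]` silences only that error on that line. The sketch below shows one common cause, an override that accepts fewer arguments than the parent. The classes and the `metadata` parameter are hypothetical, and whether this is the exact mismatch in this PR is an assumption.

```python
import asyncio
from typing import Optional


class BaseSender:
    """Hypothetical stand-in for a parent orchestrator."""

    async def send_prompts_async(
        self, *, prompt_list: list[str], metadata: Optional[dict[str, str]] = None
    ) -> list[str]:
        return list(prompt_list)


class ManyShotSender(BaseSender):
    """Hypothetical subclass whose override accepts fewer arguments."""

    # Without the ignore comment, mypy flags this line with an [override] error:
    # code holding a BaseSender may pass `metadata`, which this method rejects.
    async def send_prompts_async(self, *, prompt_list: list[str]) -> list[str]:  # type: ignore[override]
        return [f"[many-shot] {p}" for p in prompt_list]


result = asyncio.run(ManyShotSender().send_prompts_async(prompt_list=["hello"]))
```

The comment has no effect at runtime. The alternative is to keep the override compatible with the parent signature so that no ignore is needed.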

romanlutz changed the title ~~[FEAT] Create many shot jailbreak orchestrator~~ FEAT: Create many shot jailbreak orchestrator

Adrian Gavrila added 2 commits

February 13, 2025 11:13


          Addressing formatting comments, adding mock for random.examples

163cbe6


          Merge branch 'create-many-shot-jailbreak-orchestrator' of https://github.com/adrgav941/PyRIT into create-many-shot-jailbreak-orchestrator

e514381

AdrGav941 requested a review from romanlutz

February 13, 2025 16:24

Labels

None yet