FEAT: Create many shot jailbreak orchestrator #709
base: main
…ny_shot_dialogue_static_examples
… as well as .ipynb generated by jupytext
…-jailbreak-orchestrator' into create-many-shot-jailbreak-orchestrator
Thanks for this contribution! At first, I wasn't sure whether we wanted this at all. It doesn't make things much easier than the plain PromptSendingOrchestrator, but every bit of removed friction helps, I think. Plus, it highlights the technique more than the example alone would.
I left a few comments. Let me know what you think.
str: The constructed many shot dialogue
"""
# Fetch the Many Shot Jailbreaking dataset
examples = fetch_many_shot_jailbreaking_dataset()
This should be configurable, IMO. You could have a many_shot_examples arg on the constructor that defaults to None, in which case we fetch the dataset this way. That allows people to specify their own examples. Fetching should only happen once. The selection from the set of examples can be random at prompt-sending time.
Thank you, this is a great point. I added the configurable example database logic to the constructor.
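As a rough illustration of the suggestion above (not the code that was merged), the constructor could accept an optional example list, fetch only when none is given, and defer the random selection to send time. The class name `ManyShotSketch`, the `fetch` parameter, the `fake_fetch_examples` helper, and the `user`/`assistant` keys are all hypothetical stand-ins so that the sketch runs without PyRIT or network access.

```python
import random
from typing import Callable, Optional

Example = dict[str, str]


def fake_fetch_examples() -> list[Example]:
    # Hypothetical stand-in for fetch_many_shot_jailbreaking_dataset();
    # the real function fetches a dataset, which this sketch avoids.
    return [{"user": f"question {i}", "assistant": f"answer {i}"} for i in range(10)]


class ManyShotSketch:
    """Illustrative only; this is not the PyRIT orchestrator."""

    def __init__(
        self,
        num_examples: int = 3,
        many_shot_examples: Optional[list[Example]] = None,
        fetch: Callable[[], list[Example]] = fake_fetch_examples,
    ) -> None:
        # Fetch once, at construction time, and only if the caller supplied nothing.
        self._examples = many_shot_examples if many_shot_examples is not None else fetch()
        if not 0 < num_examples <= len(self._examples):
            raise ValueError("num_examples must be between 1 and the number of examples")
        self._num_examples = num_examples

    def select_examples(self) -> list[Example]:
        # Random selection happens at prompt-sending time, not in the constructor.
        return random.sample(self._examples, self._num_examples)
```

With this shape, callers who want their own examples pass `many_shot_examples=[...]`, and everyone else gets the fetched default without a repeated fetch on every prompt.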
self.num_examples = len(examples)

# Choose num_examples either static or random examples from the dataset
examples = examples[: self.num_examples] if (self.isTest) else random.sample(examples, self.num_examples)
This can be done with mocking random.sample and without this isTest variable in the code. Check test_ansi_attack_converter.py for an example of mocking random.choice, or test_char_swap_generator_converter.py for an example of mocking random.randint.
Thank you! I made the change to mock the random.sample function and removed the isTest variable.
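For readers unfamiliar with the pattern, here is a minimal sketch of mocking random.sample with the standard library's unittest.mock.patch, so that production code needs no test-only flag. The `pick_examples` function and the test name are hypothetical; the actual tests in this PR may be structured differently.

```python
import random
from unittest.mock import patch


def pick_examples(examples: list[str], num_examples: int) -> list[str]:
    # Production code always samples randomly; there is no test-only branch.
    return random.sample(examples, num_examples)


def test_pick_examples_is_deterministic_when_mocked() -> None:
    examples = ["ex0", "ex1", "ex2", "ex3", "ex4"]
    # Replace random.sample for the duration of the with-block with a stub
    # that returns the first k items of the population.
    with patch("random.sample", side_effect=lambda population, k: population[:k]) as mock_sample:
        result = pick_examples(examples, 3)
    assert result == ["ex0", "ex1", "ex2"]
    mock_sample.assert_called_once_with(examples, 3)
```

The patch works here because `pick_examples` looks up `random.sample` through the module at call time; code that does `from random import sample` would need the patch target adjusted to the importing module.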
objective_target: PromptChatTarget,
scorers: Optional[list[Scorer]] = None,
verbose: bool = False,
num_examples: int = 3,
I think the paper found that it works better with more examples. Maybe the default should be higher? @KutalVolkan may have thoughts.
I saw this as well. I chose 3 only because the old notebook using Prompt Sending Orchestrator took 3 examples (it used the static slice examples[4:7]), but if we should default to more examples, I am all for it.
self.num_examples = num_examples
self.isTest = isTest

async def construct_many_shot_dialogue(self, malicious_prompt: str) -> str:
"prompt" will do, I think. It's more consistent with our terminology elsewhere.
Done! Thanks
self.num_examples = num_examples
self.isTest = isTest

async def construct_many_shot_dialogue(self, malicious_prompt: str) -> str:
Methods that we're not calling externally should be prefixed with _.
Done! Thank you
return many_shot_dialogue

async def send_prompts_async(  # type: ignore[override]
What's that type hint override for?
I am not quite sure why this works here, but it suppresses an error that I was getting when running the automatic checks. I see it used in pyrit\orchestrator\single_turn\flip_attack_orchestrator.py, so I figured it was safe practice. Please let me know if that is not the case and I can find another way.
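Some background that may help here, assuming the automatic check in question is mypy (the thread does not say): mypy reports an `override` error when a subclass method's signature is incompatible with the method it overrides, and a `# type: ignore[override]` comment on the `def` line silences only that error code. The sketch below uses hypothetical classes (`BaseSender`, `ManyShotSender`) and a hypothetical `prefix` parameter; it is not the PyRIT signature.

```python
import asyncio


class BaseSender:
    async def send_prompts_async(self, *, prompt_list: list[str]) -> list[str]:
        return [f"sent: {p}" for p in prompt_list]


class ManyShotSender(BaseSender):
    # The subclass adds a required keyword argument, so callers written against
    # BaseSender could no longer call it safely. mypy would flag this as an
    # incompatible override; the comment suppresses only that error code.
    async def send_prompts_async(  # type: ignore[override]
        self, *, prompt_list: list[str], prefix: str
    ) -> list[str]:
        return await super().send_prompts_async(
            prompt_list=[prefix + p for p in prompt_list]
        )


if __name__ == "__main__":
    print(asyncio.run(ManyShotSender().send_prompts_async(prompt_list=["x"], prefix="p-")))
```

The comment has no effect at runtime. It trades type safety for convenience, so an alternative is to keep the subclass signature compatible with the base, for example by giving any new parameters default values.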
…hub.com/adrgav941/PyRIT into create-many-shot-jailbreak-orchestrator
Description
Added the Many Shot Jailbreak Orchestrator as a child of the Prompt Sending Orchestrator.
Information on the jailbreak can be found here: Many-Shot Jailbreaking Research by Anthropic.
Issue #708
Tests and Documentation
Unit tests for this change can be found here: tests\unit\orchestrator\test_many_shot_orchestrator.py
The documentation that previously described the Many Shot Jailbreak now uses the new orchestrator and can be found here:
doc\code\orchestrators\many_shot_jailbreak.ipynb
I ran Jupytext to generate the notebook referenced above using the following command:
jupytext --execute --to notebook .\doc\code\orchestrators\many_shot_jailbreak.py