FEAT: PDF Injection for RAG Vulnerabilities #541
Comments
@KutalVolkan I totally agree! I'd like to share what we already have first, and then we can compare and see what makes sense. We call this type of indirect prompt injection XPIA (short for cross-domain prompt injection attack), and there's an XPIAOrchestrator. In short, you have one target that plants the attack content and another that processes it, plus a scorer to evaluate the outcome.

Let's make it concrete with an example: in an applicant platform, you upload your resume as a PDF and it's stored in a blob store. Then, the recruiter can kick off analysis of all the available PDFs to assign scores for fit, let's say 1 to 10. A slight variation of this might be that the analysis is not manually kicked off by a human but rather triggered upon upload for each PDF individually. You'll see why I make this distinction in a second.

Now that we have a better overview of what exists, how does this work for the RAG scenario you mention?

[Not yet added: theoretically, one could use the scorer feedback to iterate on the initial XPIA. Would love to get to this someday!]

Needless to say, we have some work to do here to make this easier to understand, for example, documenting how RAG fits into this XPIA setup. Any recommendations are (as always) appreciated. Wdyt?
Hello @romanlutz,

Thank you for the detailed overview! I will focus on assessing the existing XPIAOrchestrator and PDF Converter capabilities in PyRIT once the PDF Converter is completed. After testing, I plan to document my findings, including detailed documentation of how RAG fits into the XPIA setup. This will help align the results with the feature supported by Garak and ensure a comprehensive comparison.
Sounds great! I should also mention that our existing XPIA work is by no means set in stone. We can modify it if it's useful to support more scenarios. Thanks!
Hello Roman,

The PDFConverter is ready! We can now generate new PDFs and embed invisible text by matching the font color to the background. However, because fpdf2 doesn't allow modifying existing PDFs, we'll need something like pypdf 5.1.0 for already-designed CVs. Here are our immediate options:
How does this fit with RAG?
Where XPIA Fits In
So, while the initial GPT-4 recruiter demo focuses on direct prompt injection, the same concept applies if we have an embedding-based retrieval layer. The orchestrator logic—upload, retrieval, and final GPT-4 call—shows how hidden instructions might manipulate a model in a more realistic, at-scale recruiting flow. Wdyt? Feel free to share any preferences or suggestions. I'm open to whichever direction you think is best!
Ah, we ran into the limits of fpdf2 faster than we had hoped... Looks like the license for pypdf is permissive, so we should be fine. Thanks for investigating that already!

Can you elaborate on what value the vector DB provides for the recruiter? I haven't used them so far, so I'm probably missing something obvious. Apart from that, this sounds like a pretty cool scenario that should illustrate the risks nicely! I can't wait to see this. If it requires some tweaks for XPIA, that's totally fine! It's built with just a single use case in mind and might need generalizing here and there.
Hello @romanlutz,

Regarding the value: the key advantage of using a vector database for the AI recruiter lies in its ability to perform semantic search. This allows for matching résumés to job descriptions conceptually, even when the exact keywords differ. For instance, it could link "experience with distributed systems" in a job description to "expertise in Kafka and microservices architecture" in a résumé. This ensures that candidates with relevant technical skills are accurately discovered, even if the terminology varies.

Here's a code example to demonstrate what I mean. While this explains how semantic search and embeddings work, we could extend it into a full demo where XPIA attacks the AI recruiter to test how hidden malicious text in PDFs impacts its behavior, if that's within scope for you. Let me know if this answers your question or if I've gone off-track! 😊

```python
import os
from pypdf import PdfReader
from openai import OpenAI
import pandas as pd
import chromadb
from dotenv import load_dotenv

load_dotenv()

# -------------------------
# Step 1: Initialize Chroma Client and Create Collection
# -------------------------
chroma_client = chromadb.Client()

# Create or get an existing collection
collection_name = "resume_collection"
collection = chroma_client.get_or_create_collection(name=collection_name)

# -------------------------
# Step 2: Extract Text from PDFs
# -------------------------
def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file."""
    text = ""
    with open(pdf_path, 'rb') as file:
        reader = PdfReader(file)
        for page in reader.pages:
            extracted = page.extract_text()
            if extracted:
                text += extracted + " "
    return text.strip()

pdf_directory = r'C:\Users\vkuta\projects\PyRIT\results\dbdata\urls'  # Replace with your PDF directory

resumes = []
for filename in os.listdir(pdf_directory):
    if filename.lower().endswith('.pdf'):
        pdf_path = os.path.join(pdf_directory, filename)
        extracted_text = extract_text_from_pdf(pdf_path)
        resumes.append({
            'id': str(len(resumes) + 1),  # Chroma requires string IDs
            'name': os.path.splitext(filename)[0],  # Assuming filename is the candidate's name
            'text': extracted_text
        })

# -------------------------
# Step 3: Generate Embeddings
# -------------------------
client = OpenAI(api_key=os.getenv('OPENAI_KEY'))

def get_embedding(text, model="text-embedding-3-small"):
    """Generates an embedding for the given text using OpenAI's API."""
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

# Generate embeddings for each résumé
for resume in resumes:
    resume['embedding'] = get_embedding(resume['text'])

# -------------------------
# Step 4: Store Embeddings in ChromaDB
# -------------------------
# Create a DataFrame for easier manipulation
df = pd.DataFrame(resumes)

# Prepare data for ChromaDB
documents = df['text'].tolist()
metadatas = df[['name']].to_dict(orient='records')
ids = df['id'].tolist()
embeddings = df['embedding'].tolist()

# Add documents to the ChromaDB collection
collection.add(
    documents=documents,
    metadatas=metadatas,
    ids=ids,
    embeddings=embeddings
)

print(f"Number of vectors in the ChromaDB collection: {collection.count()}")
print("Debug: Documents in Collection:", documents)

# -------------------------
# Step 5: Perform Semantic Search with ChromaDB
# -------------------------
def search_candidates(job_description_text, k=5):
    """Searches for the top k candidates that best match the job description."""
    # Generate embedding for the job description
    job_embedding = get_embedding(job_description_text)

    # Perform similarity search in ChromaDB
    results = collection.query(
        query_embeddings=[job_embedding],
        n_results=k,
        include=['documents', 'metadatas', 'distances']  # Ensure documents are included
    )
    print("Debug: Query Results:", results)

    if not results or not results.get('documents') or len(results['documents'][0]) == 0:
        print("No results found.")
        return []

    documents = results.get('documents', [[]])[0] or ["No content available"]
    metadatas = results.get('metadatas', [[]])[0]
    distances = results.get('distances', [[]])[0]

    print("Debug: Documents:", documents)
    print("Debug: Metadata:", metadatas)
    print("Debug: Distances:", distances)

    top_candidates = []
    for i in range(min(len(documents), k)):  # Ensure we don't exceed available results
        result = documents[i]
        metadata = metadatas[i]
        distance = distances[i]
        top_candidates.append({
            'name': metadata.get('name', 'Unknown'),
            'text': result[:100] + "..." if result != "No content available" else result,  # Snippet of the résumé
            'distance': distance
        })
    return top_candidates

# Example job description
job_description = "Looking for a software engineer with experience in machine learning and Python."

# Perform search
top_matches = search_candidates(job_description, k=3)

# Display top matches
print(f"Job Description: {job_description}\n")
print("Top Candidates:")
for match in top_matches:
    print(f"Name: {match['name']}")
    print(f"Résumé Snippet: {match['text']}")
    print(f"Distance: {match['distance']:.4f}\n")
```
Thank you! Yes, that actually makes perfect sense. I hadn't heard of Chroma and probably would have used Azure AI Search, which is similar, but that's a small detail. The overall flow makes sense to me.
Hello Roman,

Thank you for confirming! I'll proceed with the next steps as outlined. If anything else comes up or needs adjusting along the way, feel free to let me know. 😊 I will proceed with two separate PRs:
Proposal for PDF Injection Feature to Address RAG Vulnerabilities
Use Case
Testing how AI models react to hidden or indirect prompt injections embedded within PDF files. This capability can help identify vulnerabilities where models can be manipulated through subtle, non-visible text instructions, which is critical for evaluating AI robustness in automated document processing systems, such as in HR processes evaluating CVs. For example, see this demonstration where a GPT-4 recruiter is tricked.
Next Steps
My suggestion is to start with a simple PDF injection feature that allows users to embed invisible text into existing or new PDFs, with configurable parameters for font size and opacity. I suggest implementing this using a converter and utility/helper classes. Please let me know if you have any special preferences or best practices for how this should be approached.
Note: PyRIT’s PDF features likely do not support invisible text injection. If incorrect, please close this issue.
Parent Issue: PyRIT Issue #511