secret scanner #1068

Merged
merged 11 commits into from
Jan 30, 2025
13 changes: 13 additions & 0 deletions README.md
@@ -291,6 +291,19 @@ Write me a poem

---

### Pluggable Secret Scanning

Scan your chats for secrets using [secret scanning](/genaiscript/reference/scripts/secret-scanning).

```json
{
"secretPatterns": {
...,
"OpenAI API Key": "sk-[A-Za-z0-9]{32,48}"
}
}
```

### ⚙ Automate with CLI or API

Automate using the [CLI](https://microsoft.github.io/genaiscript/reference/cli) or [API](https://microsoft.github.io/genaiscript/reference/cli/api).
15 changes: 13 additions & 2 deletions docs/public/schemas/config.json
@@ -40,7 +40,7 @@
"modelAliases": {
"type": "object",
"patternProperties": {
"^[a-zA-Z0-9_:]+$": {
"^[a-zA-Z0-9_]+$": {
"oneOf": [
{
"type": "string",
@@ -64,7 +64,18 @@
}
},
"additionalProperties": true,
"description": "Aliases for model identifiers (provider:model:tag)"
"description": "Aliases for model identifiers (name)"
},
"secretPatterns": {
"type": "object",
"patternProperties": {
"^[a-zA-Z0-9_:\\-\\. ]+$": {
"type": ["string", "null"],
"description": "Secret regex"
}
},
"additionalProperties": true,
"description": "Secret scanners to use for scanning chat messages"
}
}
}
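The `patternProperties` entry above constrains the names of secret patterns. Because `additionalProperties` is `true`, a name that fails the key regex is not rejected; it simply escapes the `string | null` type check. A small sketch to see which names the key regex accepts (the helper `checkSecretPatternNames` is hypothetical, not part of GenAIScript):

```typescript
// Key regex copied from the "secretPatterns" patternProperties entry above.
// In the JSON schema it is written with doubled backslashes: "^[a-zA-Z0-9_:\\-\\. ]+$".
const SECRET_PATTERN_NAME = /^[a-zA-Z0-9_:\-\. ]+$/

// Hypothetical helper: splits pattern names into those matched by the key regex
// (and therefore type-checked as string | null) and those that are not.
function checkSecretPatternNames(patterns: Record<string, string | null>): {
  valid: string[]
  unmatched: string[]
} {
  const valid: string[] = []
  const unmatched: string[] = []
  for (const name of Object.keys(patterns)) {
    if (SECRET_PATTERN_NAME.test(name)) valid.push(name)
    else unmatched.push(name)
  }
  return { valid, unmatched }
}

const result = checkSecretPatternNames({
  "OpenAI API Key": "sk-[A-Za-z0-9]{32,48}",
  "my/secret": "abc", // "/" is not in the allowed character set
})
console.log(result.valid, result.unmatched)
```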
17 changes: 16 additions & 1 deletion docs/src/content/docs/index.mdx
@@ -360,7 +360,7 @@ to validate [content safety](/genaiscript/reference/scripts/content-safety).

```js wrap
script({ ...,
system: ["system.safety_harmful_content", ...],
systemSafety: "default",
contentSafety: "azure" // use azure content safety
})

@@ -486,6 +486,21 @@ importTemplate("poem.prompty", { something: "code " })

</Card>

<Card title="Pluggable Secret Scanning" icon="seti:license">

Scan your chats for secrets using [secret scanning](/genaiscript/reference/scripts/secret-scanning).

```json title="genaiscript.config.json"
{
"secretPatterns": {
...,
"OpenAI API Key": "sk-[A-Za-z0-9]{32,48}"
}
}
```

</Card>

<Card title="Automate with CLI" icon="github">

Automate using the [CLI](/genaiscript/reference/cli),
15 changes: 15 additions & 0 deletions docs/src/content/docs/reference/cli/commands.md
@@ -452,6 +452,7 @@ Commands:
jsonl2json Converts JSONL files to a JSON file
prompty [options] <file...> Converts .prompty files to genaiscript
jinja2 [options] <file> Renders Jinja2 or prompty template
secrets <file...> Applies secret scanning and redaction to files
```

### `parse data`
@@ -578,6 +579,20 @@ Options:
-h, --help display help for command
```

### `parse secrets`

```
Usage: genaiscript parse secrets [options] <file...>

Applies secret scanning and redaction to files

Arguments:
file input files

Options:
-h, --help display help for command
```

## `info`

```
8 changes: 8 additions & 0 deletions docs/src/content/docs/reference/scripts/content-safety.mdx
@@ -16,6 +16,14 @@ The following safety prompts are included by default when running a prompt, unle
- [system.safety_jailbreak](/genaiscript/reference/scripts/system#systemsafety_jailbreak), safety script to ignore prompting instructions in code sections, which are created by the `def` function.
- [system.safety_protected_material](/genaiscript/reference/scripts/system#systemsafety_protected_material) safety prompt against Protected material. See https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/safety-system-message-templates

You can ensure those safety prompts are always used by setting the `systemSafety` option to `default`.

```js
script({
systemSafety: "default",
})
```

Other system scripts can be added to the prompt by using the `system` option.

- [system.safety_ungrounded_content_summarization](/genaiscript/reference/scripts/system#systemsafety_ungrounded_content_summarization) safety prompt against ungrounded content in summarization
13 changes: 12 additions & 1 deletion docs/src/content/docs/reference/scripts/images.md
@@ -30,7 +30,7 @@ Local files are loaded and encoded as a data uri.

## Buffer, Blob, ReadableStream

The `defImages` function also supports [Buffer](https://nodejs.org/api/buffer.html),
[Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob), [ReadableStream](https://nodejs.org/api/stream.html).

This example takes a screenshot of bing.com and adds it to the images.
@@ -108,3 +108,14 @@ defImages(img, { maxWidth: 800 })
// and / or
defImages(img, { maxHeight: 800 })
```

## Tiling

When you specify the `tiled: true` option, all the images are tiled into a single image
after all the transformations are applied.

The resulting image will be further resized to fit into the maximum image size constraints.

```js "tiled: true"
defImages(env.files, { detail: "low", tiled: true })
```
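The tile layout itself is not documented here. As a rough sketch of the size arithmetic, assuming equal-size tiles in a near-square grid and a hypothetical 2048x2048 limit (neither detail is taken from GenAIScript; `tiledSize` is a made-up helper):

```typescript
interface Size {
  width: number
  height: number
}

interface TiledLayout extends Size {
  cols: number
  rows: number
}

// Hypothetical layout: equal-size tiles in a near-square grid, then one
// uniform scale factor (never enlarging) so the result fits within `max`.
function tiledSize(count: number, tile: Size, max: Size): TiledLayout {
  if (count <= 0) throw new Error("need at least one image")
  const cols = Math.ceil(Math.sqrt(count))
  const rows = Math.ceil(count / cols)
  const width = cols * tile.width
  const height = rows * tile.height
  // Keep the aspect ratio: the tighter of the two limits decides the scale
  const scale = Math.min(1, max.width / width, max.height / height)
  return {
    cols,
    rows,
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  }
}

// 5 images of 800x600: 3 columns x 2 rows = 2400x1200 before scaling
const layout = tiledSize(5, { width: 800, height: 600 }, { width: 2048, height: 2048 })
console.log(layout) // cols 3, rows 2, width 2048, height 1024
```

A single uniform scale factor keeps every tile's aspect ratio intact, which matters more for image understanding than filling the whole size budget.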

Consider adding a brief introduction or context to the "Tiling" section to help readers understand its relevance and purpose within the document.

AI-generated content by pr-docs-review-commit markdown_structure may be incorrect

77 changes: 77 additions & 0 deletions docs/src/content/docs/reference/scripts/secret-scanning.mdx
@@ -0,0 +1,77 @@
---
title: Secret Scanning
sidebar:
order: 10
---

One should not have secrets lying around in their codebase, but sometimes it happens.
To help you avoid leaking them, the secret scanning feature scans chat messages (and, from the CLI, files)
for secrets and redacts or reports any that are found.

:::note

The secret scanning feature is by no means exhaustive and should not be relied upon as the sole
method of securing your codebase. It is a best-effort feature that will help you avoid common mistakes.

:::

## Supported patterns

The default set of secret patterns
is defined in https://github.com/microsoft/genaiscript/tree/main/packages/core/src/config.json.

:::caution

This is not a complete list by design; you will likely need to extend it to match your needs.

:::

You can find examples of patterns at https://github.com/mazen160/secrets-patterns-db/.

## Scanning messages

By default, all messages sent to LLMs are scanned and redacted if they contain secrets.

You can disable secret scanning altogether by setting the `secretScanning` option to `false` in your script.

```js
script({
secretScanning: false,
})
```
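The CLI prints each detected pattern name with a count, which suggests the scanner tallies matches per pattern. A minimal sketch of such a redaction pass, assuming each pattern is applied as a global regular expression and each match is replaced with a placeholder (`redactSecretsSketch` and the `[REDACTED: name]` format are hypothetical):

```typescript
// Hypothetical sketch of a redaction pass; GenAIScript's own redactSecrets
// (packages/core/src/secretscanner) may work differently.
type SecretPatterns = Record<string, string | null | false>

interface RedactionResult {
  text: string
  // pattern name -> number of matches that were redacted
  found: Record<string, number>
}

function redactSecretsSketch(input: string, patterns: SecretPatterns): RedactionResult {
  const found: Record<string, number> = {}
  let text = input
  for (const [name, source] of Object.entries(patterns)) {
    if (!source) continue // null, false or empty entries are skipped as disabled
    // "g" so that every occurrence in the message is replaced, not just the first
    const regex = new RegExp(source, "g")
    text = text.replace(regex, () => {
      found[name] = (found[name] ?? 0) + 1
      return `[REDACTED: ${name}]` // hypothetical placeholder format
    })
  }
  return { text, found }
}

const key = "sk-" + "a".repeat(40)
const { text, found } = redactSecretsSketch(`token=${key} done`, {
  "OpenAI API Key": "sk-[A-Za-z0-9]{32,48}",
})
console.log(text) // token=[REDACTED: OpenAI API Key] done
console.log(found["OpenAI API Key"]) // 1
```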

## Configuring patterns

If you have a specific pattern that you want to scan for, you can configure it in your
[configuration file](/genaiscript/reference/scripts/configuration-files).

```json title="genaiscript.config.json"
{
"secretPatterns": {
...,
"my secret pattern": "my-secret-pattern-regex"
}
}
```

- Do not use the `^` or `$` anchors in your regex patterns.
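Secrets usually sit in the middle of a message, so an anchored pattern only matches when the entire text is the secret. A quick check, assuming patterns are applied to the whole message text without the `m` flag:

```typescript
// Assumption: patterns run over the full message text, without the "m" flag.
const message = "Use key sk-" + "A".repeat(40) + " to call the API."

const unanchored = new RegExp("sk-[A-Za-z0-9]{32,48}", "g")
const anchored = new RegExp("^sk-[A-Za-z0-9]{32,48}$", "g")

// String.prototype.match with a "g" regex returns all matches, or null if none
const unanchoredMatches = message.match(unanchored) ?? []
const anchoredMatches = message.match(anchored) ?? []

console.log(unanchoredMatches.length) // 1: the embedded key is found
console.log(anchoredMatches.length) // 0: "^" and "$" require the key to be the entire text
```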

### Disabling patterns

Set a pattern's value to `null` or `false` to disable it.

```json title="genaiscript.config.json"
{
"secretPatterns": {
"OpenAI API Key": null
}
}
```
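Your entries are meant to be layered on top of the defaults from `packages/core/src/config.json`, with a same-named entry overriding the default. A minimal sketch of the resulting set of active patterns, assuming a flat key-by-key overlay in which `null` and `false` values are dropped (`activePatterns` is hypothetical, not the `structuralMerge` used in `config.ts`):

```typescript
type PatternMap = Record<string, string | null | false>

// Hypothetical: overlays user patterns on the defaults, then drops disabled entries.
function activePatterns(defaults: PatternMap, user: PatternMap): Record<string, string> {
  const merged: PatternMap = { ...defaults, ...user } // user entries win on name clashes
  const active: Record<string, string> = {}
  for (const [name, source] of Object.entries(merged)) {
    if (typeof source === "string" && source.length > 0) active[name] = source
  }
  return active
}

const defaults: PatternMap = { "OpenAI API Key": "sk-[A-Za-z0-9]{32,48}" }
const user: PatternMap = {
  "OpenAI API Key": null, // disables the default pattern
  "my secret pattern": "my-secret-pattern-regex",
}
const active = activePatterns(defaults, user)
console.log(Object.keys(active)) // only "my secret pattern" remains
```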

## CLI

You can test your patterns against files using the CLI.

```sh
npx --yes genaiscript parse secrets *
```
3 changes: 2 additions & 1 deletion docs/src/content/docs/reference/scripts/system.mdx
@@ -3463,7 +3463,8 @@ system({
title: "Tools support",
})

$`Use tools if possible.
$`## Tools
Use tools if possible.
- **Do NOT invent function names**.
- **Do NOT use function names starting with 'functions.'**.
- **Do NOT respond with multi_tool_use**.`


The introduction of a new heading without context can be confusing. Ensure that the new section provides clear information and is well-integrated with the existing content.

AI-generated content by pr-docs-review-commit markdown_structure may be incorrect

6 changes: 6 additions & 0 deletions packages/cli/src/cli.ts
@@ -19,6 +19,7 @@ import {
parseHTMLToText,
parseJinja2,
parsePDF,
parseSecrets,
parseTokens,
prompty2genaiscript,
} from "./parse" // Parsing functions
@@ -479,6 +480,11 @@ export async function cli() {
"variables, as name=value passed to the template"
)
.action(parseJinja2)
parser
.command("secrets")
.description("Applies secret scanning and redaction to files")
.argument("<file...>", "input files")
.action(parseSecrets)

// Define 'info' command group for utility information tasks
const info = program.command("info").description("Utility tasks")
18 changes: 18 additions & 0 deletions packages/cli/src/parse.ts
@@ -25,6 +25,7 @@ import { splitMarkdown } from "../../core/src/frontmatter"
import { parseOptionsVars } from "./vars"
import { dataTryParse } from "../../core/src/data"
import { resolveFileContent } from "../../core/src/file"
import { redactSecrets } from "../../core/src/secretscanner"

/**
* This module provides various parsing utilities for different file types such
@@ -215,3 +216,20 @@ export async function prompty2genaiscript(
await writeText(gf, script)
}
}

export async function parseSecrets(files: string[]) {
const fs = await expandFiles(files)
let n = 0
for (const f of fs) {
const content = await readText(f)
const { found } = redactSecrets(content)
const entries = Object.entries(found)
if (entries.length) {
n++
console.log(
`${f}: ${entries.map(([k, v]) => `${k} (${v})`).join(", ")}`
)
}
}
if (n > 0) console.warn(`found secrets in ${n} of ${fs.length} files`)
}
6 changes: 6 additions & 0 deletions packages/core/src/config.json
@@ -0,0 +1,6 @@
{
"$schema": "../../../docs/public/schemas/config.json",
"secretPatterns": {
"OpenAI API Key": "sk-[A-Za-z0-9]{32,48}"
}
}
10 changes: 8 additions & 2 deletions packages/core/src/config.ts
@@ -17,6 +17,7 @@ import { resolveLanguageModel } from "./lm"
import { deleteEmptyValues } from "./cleaners"
import { errorMessage } from "./error"
import schema from "../../../docs/public/schemas/config.json"
import defaultConfig from "./config.json"

export async function resolveGlobalConfiguration(
dotEnvPath?: string
@@ -27,13 +28,14 @@ export async function resolveGlobalConfiguration(
const exts = ["yml", "yaml", "json"]

// import and merge global local files
let config: HostConfiguration = {}
let config: HostConfiguration = structuredClone(defaultConfig)
delete (config as any)["$schema"]
for (const dir of dirs) {
for (const ext of exts) {
const filename = resolve(`${dir}/${TOOL_ID}.config.${ext}`)
if (existsSync(filename)) {
const fileContent = readFileSync(filename, "utf8")
const parsed =
const parsed: HostConfiguration =
ext === "yml" || ext === "yaml"
? YAMLTryParse(fileContent)
: JSON5TryParse(fileContent)
@@ -63,6 +65,10 @@ export async function resolveGlobalConfiguration(
config?.modelEncodings || {},
parsed?.modelEncodings || {}
),
secretPatterns: structuralMerge(
config?.secretPatterns || {},
parsed?.secretPatterns || {}
),
})
}
}
8 changes: 2 additions & 6 deletions packages/core/src/expander.ts
@@ -17,10 +17,7 @@ import {
import { createPromptContext } from "./promptcontext"
import { evalPrompt } from "./evalprompt"
import { renderAICI } from "./aici"
import {
addToolDefinitionsMessage,
appendSystemMessage,
} from "./chat"
import { addToolDefinitionsMessage, appendSystemMessage } from "./chat"
import { importPrompt } from "./importprompt"
import { parseModelIdentifier } from "./models"
import { runtimeHost } from "./host"
@@ -355,8 +352,7 @@ export async function expandTemplate(
}

const { responseType, responseSchema } = finalizeMessages(messages, {
responseType: template.responseType,
responseSchema: template.responseSchema,
...template,
fileOutputs,
trace,
})
3 changes: 2 additions & 1 deletion packages/core/src/genaisrc/system.tools.genai.js
@@ -2,7 +2,8 @@ system({
title: "Tools support",
})

$`Use tools if possible.
$`## Tools
Use tools if possible.
- **Do NOT invent function names**.
- **Do NOT use function names starting with 'functions.'**.
- **Do NOT respond with multi_tool_use**.`
5 changes: 5 additions & 0 deletions packages/core/src/hostconfiguration.ts
@@ -23,4 +23,9 @@ export interface HostConfiguration {
* Model identifier to encoding mapping
*/
modelEncodings?: Record<string, string>

/**
* A map of secret names to their respective regex patterns
*/
secretPatterns?: Record<string, string>
}