-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(smart-apply): stripping of markdown code blocks #7105
Conversation
Add instructions in the chat preamble to close fenced code blocks correctly and introduce a transform that wraps code blocks starting with ```markdown in backticks to preserve their formatting in the UI when it contains other code blocks.
- After removing the filtering having these tests means they fail
… constant We need to check for PromptString as that is what prompt contains, not just simple strings
`${MARKDOWN_CODE_BLOCK_DELIMITER_START}\\s*([\\s\\S]*?)\\s*${MARKDOWN_CODE_BLOCK_DELIMITER_END}`, | ||
'g' | ||
) | ||
// const MARKDOWN_CODE_BLOCK_DELIMITER_START = '```(?:\\w+)?' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ichim-david!
Just trying to understand this PR more, do we need to remove these? Or are they just commented out for this branch.
We need these for typical edit commands, as often LLMs will give us code in Markdown blocks and in 99% of cases we do not want to apply those to the document.
Is this essentially:
- We get a chat response with some Markdown text, it contains code blocks within it.
- We try to apply this Markdown to the document (smart apply), but our edit logic removes these code blocks.
…mprove text stripping logic - Remove markdown tags from non markdown files - Avoid transformation until message is no longer in progress to avoid expensive mutations
@umpox Tests fail but cleaning logic is now sound and this work can now be checked for any potential fixes or regressions from this work |
…me that we perform the sanitation
:( tests were green before merge from master to fix conflict and adapt to how others referenced the preamble chats 58e9b46 |
Hey @ichim-david! |
let strippedText = text | ||
// Strip specific XML tags referenced in the prompt, e.g. <CODE511> | ||
.replaceAll(PROMPT_TOPIC_REGEX, '') | ||
// Strip Markdown syntax for code blocks, e.g. ```typescript. | ||
.replaceAll(MARKDOWN_CODE_BLOCK_REGEX, block => | ||
|
||
// Strip Markdown syntax for code blocks, e.g. ```typescript, leaving them for markdown files | ||
if (task?.document?.languageId !== 'markdown') { | ||
strippedText = strippedText.replaceAll(MARKDOWN_CODE_BLOCK_REGEX, block => | ||
block.replace(MARKDOWN_CODE_BLOCK_START, '').replace(MARKDOWN_CODE_BLOCK_END, '') | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to extract this into a function and add some more docs for future.
Then we can just do an early return if it's a Markdown file which makes this a bit easier to read too!
/**
* Strips the text of any unnecessary content.
* This includes:
* 1. Prompt topics, e.g. <CODE511>. These are used by the LLM to wrap the output code.
* 2. Markdown code blocks, e.g. ```typescript. Most LLMs are trained to produce Markdown-suitable responses.
*/
function stripText(text: string, task: FixupTask): string {
const strippedText = text
// Strip specific XML tags referenced in the prompt, e.g. <CODE511>
.replaceAll(PROMPT_TOPIC_REGEX, '')
if (task.document.languageId === 'markdown') {
// Return this text as is, we do not want to strip Markdown blocks as they may be valuable
// in Markdown files
return strippedText
}
// Strip Markdown syntax for code blocks, e.g. ```typescript.
return strippedText.replaceAll(MARKDOWN_CODE_BLOCK_REGEX, block =>
block.replace(MARKDOWN_CODE_BLOCK_START, '').replace(MARKDOWN_CODE_BLOCK_END, '')
)
}
The backport to
To backport this PR manually, you can either: Via the sg toolUse the sg backport -r M72 -p 7105 Via your terminalTo backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-M72 M72
# Navigate to the new working tree
cd .worktrees/backport-M72
# Create a new branch
git switch --create backport-7105-to-M72
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 74aeb4bf086670f28dca061e43be322b1e8cf850
# Push it to GitHub
git push --set-upstream origin backport-7105-to-M72
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-M72 If you encouter conflict, first resolve the conflict and stage all files, then run the commands below: git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-7105-to-M72
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-M72
Once the pull request has been created, please ensure the following:
|
Issue: https://linear.app/sourcegraph/issue/CODY-4428/markdown-code-blocks-render-incorrectly-due-to-unintended-triple Ask Cody to generate a readme that documents a file with code examples. Notice when applying the changes that the code blocks remain <!-- Required. See https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles. --> --------- Co-authored-by: Tom Ross <[email protected]> (cherry picked from commit 74aeb4b)
Description
Issue: https://linear.app/sourcegraph/issue/CODY-4428/markdown-code-blocks-render-incorrectly-due-to-unintended-triple
Test plan
Ask Cody to generate a readme that documents a file with code examples.
Notice when applying the changes that the code blocks remain