fix(smart-apply): stripping of markdown code blocks #7105

ichim-david · 2025-02-15T20:06:37Z

Description

Issue: https://linear.app/sourcegraph/issue/CODY-4428/markdown-code-blocks-render-incorrectly-due-to-unintended-triple

Test plan

Ask Cody to generate a README that documents a file with code examples.
Notice that the code blocks remain when the changes are applied.

Add instructions to the chat preamble to close fenced code blocks correctly, and introduce a transform that wraps code blocks starting with ```markdown in backticks so that their formatting is preserved in the UI when they contain other code blocks.
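
To illustrate the wrapping idea, here is a minimal sketch, not the transform this PR actually adds. The helper name `widenMarkdownFences` is hypothetical, and the sketch assumes that nested blocks always open with a language tag and close with a bare fence. It rewrites the outer markdown fence to four backticks so the nested three-backtick fences stay intact.

```typescript
// Hypothetical helper for illustration; not the transform added in this PR.
// Fences are built with repeat() so this snippet contains no literal triple backticks.
const FENCE = '`'.repeat(3)
const WIDE_FENCE = '`'.repeat(4)
const TAGGED_FENCE = new RegExp('^' + FENCE + '\\w+')

function widenMarkdownFences(text: string): string {
    const out: string[] = []
    let inOuter = false // inside a block opened with a `markdown` fence
    let inInner = false // inside a nested block within that outer block

    for (const line of text.split('\n')) {
        const trimmed = line.trim()
        if (!inOuter) {
            if (trimmed === FENCE + 'markdown') {
                inOuter = true
                out.push(WIDE_FENCE + 'markdown')
            } else {
                out.push(line)
            }
        } else if (inInner) {
            // A bare fence closes the nested block; everything else is content.
            if (trimmed === FENCE) inInner = false
            out.push(line)
        } else if (TAGGED_FENCE.test(trimmed)) {
            // Assumption: nested blocks always open with a language tag.
            inInner = true
            out.push(line)
        } else if (trimmed === FENCE) {
            // A bare fence outside any nested block closes the outer block.
            inOuter = false
            out.push(WIDE_FENCE)
        } else {
            out.push(line)
        }
    }
    return out.join('\n')
}
```

This relies on the CommonMark rule that a closing fence must be at least as long as the opening fence, so a four-backtick block can safely contain three-backtick fences.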

umpox · 2025-02-17T10:03:26Z

vscode/src/edit/output/response-transformer.ts

-    `${MARKDOWN_CODE_BLOCK_DELIMITER_START}\\s*([\\s\\S]*?)\\s*${MARKDOWN_CODE_BLOCK_DELIMITER_END}`,
-    'g'
-)
+// const MARKDOWN_CODE_BLOCK_DELIMITER_START = '```(?:\\w+)?'


Hey @ichim-david!

Just trying to understand this PR more: do we need to remove these, or are they just commented out for this branch?

We need these for typical edit commands, as often LLMs will give us code in Markdown blocks and in 99% of cases we do not want to apply those to the document.

Is this essentially:

We get a chat response with some Markdown text, it contains code blocks within it.

We try to apply this Markdown to the document (smart apply), but our edit logic removes these code blocks.
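
The failure mode described above can be reproduced with a small sketch. The opening-delimiter pattern is taken from the diff; the closing delimiter (a bare fence) and the function name `stripAllFences` are assumptions, so this is not the exact logic in response-transformer.ts.

```typescript
// Fences are built with repeat() so this snippet contains no literal triple backticks.
const FENCE = '`'.repeat(3)

// Opening pattern as in the diff above; the closing pattern is assumed to be a bare fence.
const BLOCK_REGEX = new RegExp(FENCE + '(?:\\w+)?\\s*([\\s\\S]*?)\\s*' + FENCE, 'g')

// Hypothetical stand-in for the edit logic: unwrap every fenced block unconditionally.
function stripAllFences(text: string): string {
    return text.replace(BLOCK_REGEX, (_block, body: string) => body)
}

// A README-style response whose code example should stay fenced.
const readme = ['# greet', '', 'Usage:', '', FENCE + 'ts', "greet('world')", FENCE, ''].join('\n')

// The fences are gone, so the example no longer renders as code in the README.
console.log(stripAllFences(readme))
```

For ordinary source files this unwrapping is what we want; for a Markdown document it silently removes formatting that is part of the content.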

ichim-david · 2025-02-17T22:02:54Z

@umpox Tests currently fail, but the cleaning logic is now sound, so this work can be checked for potential fixes or regressions.

ichim-david · 2025-02-18T19:56:27Z

:( Tests were green before the merge from master to fix a conflict and adapt to how others referenced the preamble chats 58e9b46

julialeex · 2025-02-19T07:34:07Z

Hey @ichim-david!
nit: Can you add a more detailed PR description covering what you fixed and why? You can use this guide for reference. For example, you can link the Linear ticket in the PR title.

vscode/webviews/components/MarkdownFromCody.tsx

umpox · 2025-02-19T09:06:30Z

vscode/src/edit/output/response-transformer.ts

+    let strippedText = text
        // Strip specific XML tags referenced in the prompt, e.g. <CODE511>
        .replaceAll(PROMPT_TOPIC_REGEX, '')
-        // Strip Markdown syntax for code blocks, e.g. ```typescript.
-        .replaceAll(MARKDOWN_CODE_BLOCK_REGEX, block =>
+
+    // Strip Markdown syntax for code blocks, e.g. ```typescript, leaving them for markdown files
+    if (task?.document?.languageId !== 'markdown') {
+        strippedText = strippedText.replaceAll(MARKDOWN_CODE_BLOCK_REGEX, block =>
            block.replace(MARKDOWN_CODE_BLOCK_START, '').replace(MARKDOWN_CODE_BLOCK_END, '')
        )
+    }



It would be useful to extract this into a function and add some more docs for the future.

Then we can just do an early return if it's a Markdown file which makes this a bit easier to read too!

/**
 * Strips the text of any unnecessary content.
 * This includes:
 * 1. Prompt topics, e.g. <CODE511>. These are used by the LLM to wrap the output code.
 * 2. Markdown code blocks, e.g. ```typescript. Most LLMs are trained to produce Markdown-suitable responses.
 */
function stripText(text: string, task: FixupTask): string {
    const strippedText = text
        // Strip specific XML tags referenced in the prompt, e.g. <CODE511>
        .replaceAll(PROMPT_TOPIC_REGEX, '')

    if (task.document.languageId === 'markdown') {
        // Return this text as is, we do not want to strip Markdown blocks as they may be valuable
        // in Markdown files
        return strippedText
    }

    // Strip Markdown syntax for code blocks, e.g. ```typescript.
    return strippedText.replaceAll(MARKDOWN_CODE_BLOCK_REGEX, block =>
        block.replace(MARKDOWN_CODE_BLOCK_START, '').replace(MARKDOWN_CODE_BLOCK_END, '')
    )
}
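
To see both branches of the suggested function in isolation, here is a self-contained sketch under stated assumptions. `TaskLike` is a hypothetical stand-in for `FixupTask`, the prompt-topic pattern is guessed from the `<CODE511>` example, and the block pattern assumes a bare closing fence. The real constants in response-transformer.ts may differ.

```typescript
// Fences are built with repeat() so this snippet contains no literal triple backticks.
const FENCE = '`'.repeat(3)

// Assumed patterns; the real PROMPT_TOPIC_REGEX and MARKDOWN_CODE_BLOCK_REGEX may differ.
const TOPIC_REGEX = /<\/?CODE\d+>/g
const CODE_BLOCK_REGEX = new RegExp(FENCE + '(?:\\w+)?\\s*([\\s\\S]*?)\\s*' + FENCE, 'g')

// Hypothetical minimal shape standing in for FixupTask.
interface TaskLike {
    document: { languageId: string }
}

function stripTextSketch(text: string, task: TaskLike): string {
    // Prompt topic tags are never wanted in the document.
    const withoutTopics = text.replace(TOPIC_REGEX, '')

    // Early return: fenced blocks are real content in a Markdown file.
    if (task.document.languageId === 'markdown') {
        return withoutTopics
    }

    // Elsewhere, unwrap each fenced block and keep only its body.
    return withoutTopics.replace(CODE_BLOCK_REGEX, (_block, body: string) => body)
}

const response = '<CODE511>' + FENCE + 'ts\nconst a = 1\n' + FENCE + '</CODE511>'
const forTypeScript = stripTextSketch(response, { document: { languageId: 'typescript' } })
const forMarkdown = stripTextSketch(response, { document: { languageId: 'markdown' } })
```

With this input, `forTypeScript` holds only the code line, while `forMarkdown` keeps the fenced block and loses only the topic tags.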

vscode/webviews/components/MarkdownFromCody.tsx

sourcegraph-release-bot · 2025-02-19T11:17:14Z

The backport to M72 failed at https://github.com/sourcegraph/cody/actions/runs/13411266762:

The process '/usr/bin/git' failed with exit code 1

To backport this PR manually, you can either:

Via the sg tool

Use the sg backport command to backport your commit to the release branch.

sg backport -r M72 -p 7105

Via your terminal

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-M72 M72
# Navigate to the new working tree
cd .worktrees/backport-M72
# Create a new branch
git switch --create backport-7105-to-M72
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 74aeb4bf086670f28dca061e43be322b1e8cf850
# Push it to GitHub
git push --set-upstream origin backport-7105-to-M72
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-M72

If you encounter a conflict, first resolve it and stage all files, then run the commands below:

git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-7105-to-M72
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-M72

Follow the instructions above to backport the commit.
Create a pull request where the base branch is M72 and the compare/head branch is backport-7105-to-M72.

Once the pull request has been created, please ensure the following:

Make sure to tag @sourcegraph/release in the pull request description.
Remove the release-blocker label from this pull request.

Issue: https://linear.app/sourcegraph/issue/CODY-4428/markdown-code-blocks-render-incorrectly-due-to-unintended-triple

Ask Cody to generate a readme that documents a file with code examples.
Notice when applying the changes that the code blocks remain

---------

Co-authored-by: Tom Ross <[email protected]>
(cherry picked from commit 74aeb4b)

ichim-david added 6 commits February 15, 2025 22:05

fix(smart-apply): stripping of markdown code blocks

b5b7cb0

fix(test-fixtures): comment out markdown syntax test cases

66b3bc4

- After removing the filtering having these tests means they fail

change(chat): export CHAT_PREAMBLE for use in prompt tests

41d97d2

Use static value of pre-prompt until code is finalized

6d7e708

fix(prompt): test to replace hardcoded prompt text with CHAT_PREAMBLE…

67b784f

… constant We need to check for PromptString as that is what prompt contains, not just simple strings

umpox reviewed Feb 17, 2025

ichim-david added 3 commits February 17, 2025 23:56

fix(preamble): revert SMART_APPLY_PREAMBLE change and export CHAT_PRE…

675047d

…AMBLE

fix(chat): enhance childrenTransform to handle '```markdown' replacement

7e6bb74

fix(response-transformer): restore markdown code block handling and i…

4dabc79

…mprove text stripping logic - Remove markdown tags from non markdown files - Avoid transformation until message is no longer in progress to avoid expensive mutations

ichim-david marked this pull request as ready for review February 17, 2025 22:01

ichim-david added 2 commits February 18, 2025 08:36

Revert premature optimization of responseTransformer since tests assu…

5724868

…me that we perform the sanitation

fix(response-transformer): add null checks for task and restore markd…

e77c63b

…own test cases

ichim-david requested a review from dominiccooney February 18, 2025 08:28

Merge branch 'main' into ichimdav/markdown_escape

58e9b46

julialeex reviewed Feb 19, 2025

vscode/webviews/components/MarkdownFromCody.tsx

umpox reviewed Feb 19, 2025

umpox added 2 commits February 19, 2025 10:02

Apply code suggestions

54963f5

revert

5f9cfa2

umpox approved these changes Feb 19, 2025

umpox merged commit 74aeb4b into main Feb 19, 2025
21 checks passed

umpox deleted the ichimdav/markdown_escape branch February 19, 2025 11:13

umpox added backported-to-M72 backported-to-M70 backport M72 and removed backported-to-M72 backported-to-M70 labels Feb 19, 2025

sourcegraph-release-bot added backports release-blocker failed-backport-to-M72 labels Feb 19, 2025
