Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added codeAction (extract subSchema to defs) #133

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

arpitkuriyal
Copy link

@arpitkuriyal arpitkuriyal commented Feb 20, 2025

#132
Point to Note in this:-

  1. Child Nodes of $defs Are Added from the Top
    • Instead of adding child elements from the bottom, they are inserted from the top.
    • This prevents inconsistencies caused by offset + textLength changes due to spaces and brackets
Screen.Recording.2025-02-20.at.10.47.01.PM.mov

Copy link
Collaborator

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great progress! But, it looks like there's still some work to do.

Adding the definitions to the top isn't good enough. Also, the sloppy formatting of the generated code isn't acceptable either. The biggest motivating factor for this project is to encourage best practices and good style. Two of those rules are that $schema and $id always come first and that $defs always goes last. It's important that the definitions are in the right place and reasonably formatted when moved. You might have to somehow run the moved code through a formatter once it's in its new location. jsonc-parser is used internally to parse the schema. It has a formatting feature that you might be able to use somehow.

I noticed that the schema you're using in your demo video has a problem. The "shipping_address" property has a property "address" that then has the actual schema in it. That subschema will do nothing at all. "address" is not JSON Schema keyword, so it gets ignored. You can tell something's wrong because the schema in "address" doesn't have the expected syntax highlighting. JSON Schema keywords should be highlighted differently than plain JSON properties.

I see you're using the property name as the definition name. That's a nice default, but that's not always going to work. Subschemas can be in quite a few places other than properties values. For example, this would work for the items keyword. I think what needs to happen is that the user needs to provide the definition name. Have a look at the if/then completion provider. There's a special syntax that allows you to set placeholders that the user will be prompted to fill in. That approach may also work here.

You'll need to add tests for this feature that cover whatever edge cases you can think of.

I appreciate the clean and well written code so far. I suggest installing the ESLint plugin in your editor to get linting feedback while developing. It will help avoid getting build errors in your PRs if you forget to run the linter before pushing.

@arpitkuriyal
Copy link
Author

Thank you for the detailed feedback. However, I encountered an issue while looking into what you mentioned:

I think what needs to happen is that the user needs to provide the definition name.
There's a special syntax that allows you to set placeholders that the user will be prompted to fill in. That approach may also work here.

I found that InsertTextFormat: InsertTextFormat.Snippet is coming in LSP 3.18 (see here). I’ll try using the command feature as an alternative and see how it goes.

@jdesrosiers
Copy link
Collaborator

Thanks for figuring that out. Yes, it looks like SnippetTextEdit is what we need. Although 3.18 isn't released yet, it looks like vscode-languageserver-node already supports it. That means that vscode's language server client probably supports it too. Other clients may not support it yet, but that's ok.

@arpitkuriyal
Copy link
Author

arpitkuriyal commented Feb 21, 2025

I think it's not supported by vscode-languageserver-node yet.

As you can see in the screenshots, the latest available version is still 3.17.5, and this feature is introduced in 3.18. I also checked the node_modules directory and couldn't find any trace of this feature there.
Could you please confirm this on your end as well?

version ScreenShot
Screenshot 2025-02-22 at 1 34 48 AM

nodeModule Screenshot
Screenshot 2025-02-22 at 1 20 05 AM
PR Screenshot that u mentioned here:-

Although 3.18 isn't released yet, it looks like vscode-languageserver-node already supports it.

Screenshot 2025-02-22 at 1 21 54 AM

@jdesrosiers
Copy link
Collaborator

Yes, you're right. Although the code was merged over a year ago, it hasn't made it to an official release yet. It's planned for the next major release (v10) and that only has pre-releases published so far. We could change the dependency to `"vscode-languageserver": "^10.0.0-next" to use the pre-release. But, that's only the server. It probably wouldn't work in vscode yet.

I guess that means we have to hold off on that detail. I was hoping this would give us a way to avoid having to come up with a more robust way to generate definition names, but it looks like we're going to have to find a temporary solution until this feature is released.

@arpitkuriyal
Copy link
Author

I think, for now, we should keep it simple and implement a basic defCounter that generates definition names like def1, def2, and so on.

Do you have any other solutions in mind? Please let me know your thoughts.

@jdesrosiers
Copy link
Collaborator

Number based naming can work, but there are some some edge cases.

Extract a schema and you'd get def1. Then restart the language server and extract another schema and you get def1 again. Ideally people will rename after extraction and this won't come up often, but it could be a problem.

Also, if you do refactorings in one schema and then go to another schema, it would be weird for it to generate def5 or something when there isn't def1 - def4 yet.

One thing that I think would work is the generate number-based names starting at def1 and check if the name already exists. If it does, increment and check again until you have a unique name.

Or, you could inspect all the definition names looking for the def{number} pattern and increment starting from the number you find or 1 if the pattern isn't found.

Or, you could generate a UUID and use that as the name and not have to check anything. But, a bulky UUID might not make for as good a user experience.

Any of those options would be ok.

@arpitkuriyal
Copy link
Author

Alright, I’ll work on it and let you know once it’s done.

@arpitkuriyal
Copy link
Author

Screen.Recording.2025-02-24.at.12.48.49.AM.mov

Please review it. If everything looks good, I will start writing the test cases.

@arpitkuriyal
Copy link
Author

arpitkuriyal commented Feb 23, 2025

When switching the dialect URI of the schema, we need to change $defs to definitions for older drafts. Therefore, I added a condition: if the dialect is 2020-12 or 2019-09, use $defs otherwise, use definitions. Is this correct, or is there anything else I should do?

@jdesrosiers
Copy link
Collaborator

Is this correct, or is there anything else I should do?

Use the getKeywordName function from @hyperjump/json-schema/experimental. It takes a keyword URI and a dialect URI and returns the right label for the dialect.

@arpitkuriyal
Copy link
Author

Got it! I'll use getKeywordName from @hyperjump/json-schema/experimental for the right label.

@arpitkuriyal
Copy link
Author

I just made the change, you can check it now. I'll write the test case as soon as possible.

Copy link
Collaborator

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have another try at the JSON formatting part. The other things I mentioned should be small and easy to address. I included a few code style suggestions that aren't caught by the linter. In general, I thought the code made better use of whitespace the last time I reviewed. Things are more compact now making the code harder to read.

@arpitkuriyal
Copy link
Author

arpitkuriyal commented Feb 26, 2025

I have used a library to detect the indentation of the current file correctly, and now everything is working fine. I also added this condition (range.start.line === range.end.line && range.start.character === range.end.character) in the codeAction to ensure that the extract option only appears when a subschema is selected. Previously, the bulb symbol would pop up even when we were just inside a subschema.

In the current CodeAction logic, it finds the parent definition node and places the new definition property at the beginning. However, we could modify it to find the nearest definition node and place the new definition property there. While this approach would be more precise, it would also make the implementation slightly more complex. Additionally, we need to determine the spaces before the $defs and align them with the user's indentation to maintain consistency in formatting.
What are your thoughts on this?

Additionally, in the formatNewDef function, I used:
jsoncParser.applyEdits(newDefText, edits).replace(/\n/g, \n${" ".repeat(detectedIndent?.size ?? defaultTabSize)}) since I couldn't find a better option to match the overall indentation.

if put only this jsoncParser.applyEdits(newDefText, edits)

Screen.Recording.2025-02-27.at.3.07.38.AM.mov

with this jsoncParser.applyEdits(newDefText, edits).replace(/\n/g, \n${" ".repeat(detectedIndent?.size ?? defaultTabSize)})

Screen.Recording.2025-02-27.at.3.08.26.AM.mov

If you have any better ideas, please share.

@jdesrosiers
Copy link
Collaborator

Your problem is that you're formatting the extracted definition in isolation. The idea is to format the definition within the context of the full JSON document. That context is how it knows how much to indent.

The the edits the format function returns applies to the whole document, but we can adjust the offsets so they can be applied to just the new definition. Here's an example.

import { applyEdits, format } from "jsonc-parser";

/**
 * @import { FormattingOptions, Edit } from "jsonc-parser"
 */


/** @type (text: string, edit: Edit, options: FormattingOptions) => Edit */
const withFormatting = (text, edit, options) => {
  const newText = applyEdits(text, [edit]);

  const range = { offset: edit.offset, length: edit.content.length };
  const formatEdits = format(newText, range, options);

  // Adjust the offsets so they can be applied to just the definition
  for (const formatEdit of formatEdits) {
    formatEdit.offset -= edit.offset;
  }

  return { ...edit, content: applyEdits(edit.content, formatEdits) };
};

const schema = `{
  "type": "object",
  "properties": {
    "foo": {
      "type": "number"
    },
    "bar": { "$ref": "#/$defs/bar" }
  },
  "$defs": {
    "bar": {
      "type": "boolean"
    }
  }
}`;

const offset = 51;
const length = 30;
const definition = schema.substring(offset, offset + length);

const formattingOptions = {
  keepLines: true,
  insertSpaces: true,
  tabSize: 2,
  eol: "\n"
};

/** @type Edit[] */
const edits = [
  {
    offset: 51,
    length: 30,
    content: `{ "$ref": "#/$defs/def1" }`
  },
  withFormatting(schema, {
    offset: 137,
    length: 0,
    content: `\n"def1": ${definition},`
  }, formattingOptions)
];
console.log(edits);

console.log(applyEdits(schema, edits));

You should be able to do something similar, but your version of the withFormatting function would take and return edits in the form the code action response expects.

@arpitkuriyal
Copy link
Author

Thanks for the insight!

@jdesrosiers
Copy link
Collaborator

we could modify it to find the nearest definition node and place the new definition property there.

The definition should go at the root of the schema resource it's in. A schema resource is a schema with $id or the root schema. The root on SchemaNode points to the schema resource, not necessarily the root of the document. So, I don't think there's anything special that needs to be done to get the right behavior. It should just work.

@arpitkuriyal
Copy link
Author

arpitkuriyal commented Mar 2, 2025

now it is working fine with all the correct indentation.

simple schema with indentation 2 spaces

Screen.Recording.2025-03-02.at.7.28.01.PM.mov

simple schema with indentation of 4 spaces

Screen.Recording.2025-03-02.at.7.29.10.PM.mov

now we can see it can correct it indentation according to the schema.

Screen.Recording.2025-03-02.at.7.32.42.PM.mov

Copy link
Collaborator

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove .DS_Store. It should be in .gitignore if you would like to add it.

The formatting code is sloppy. Please try to clean that up. For example, there's no reason to pass textDocument. You only use it to get the uri which you use to read the file, but you already have the text, so there's no need to read the file. It's all more complicated than it needs to be and needs more thought.

@arpitkuriyal
Copy link
Author

I have made the changes. Could you please check?

@jdesrosiers
Copy link
Collaborator

I did some clean up of the formatting code. Have a look.

Copy link
Collaborator

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have another look at the Configuration changes and try to clean things up. I noticed there was some duplication in that configs were being built in two places. I handled that problem, so you don't have to worry about it, but let's see if we can make the rest a little nicer.

@arpitkuriyal
Copy link
Author

arpitkuriyal commented Mar 6, 2025

I did some clean up of the formatting code. Have a look.

That's a lot of learning from you! I can clearly see the difference between my approach and yours in the withFormatting function. Thanks for that!
i'll update you with the changes in 2-3 days as i have my college exams going on.

Copy link
Collaborator

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few issues with the types.

Copy link
Collaborator

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking much better now! I pushed a little more clean up for the Configuration class and I updated the TestClient. It originally only supported the one configuration scope. Now it can support others. You'll need this when you write tests.

One more change I'd like to see is that I want the definition to get added to the bottom of $defs, not the top. Then go ahead with writing tests.

Thanks for sticking with this. I told you when you took this on that it could be a tough one. The hardest part is often writing the tests, so good luck as you move into that phase.

@arpitkuriyal
Copy link
Author

One more change I'd like to see is that I want the definition to get added to the bottom of $defs, not the top

Okay i'll do it,

Thanks for sticking with this. I told you when you took this on that it could be a tough one

Thanks for your guidance. It’s been a challenging task, but also a great learning experience for me!

@arpitkuriyal
Copy link
Author

One more change I'd like to see is that I want the definition to get added to the bottom of $defs, not the top.

The easiest approach I found is using definitionsNode?.children.at(-1) to get the last definition. Alternatively, we can use SchemaNode.keys() to iterate over the definition node and retrieve the last definition. Which approach do you think is better or you have something better in your mind

@jdesrosiers
Copy link
Collaborator

children.at(-1) is perfect. There's no reason to iterate. We just want the last one.

@arpitkuriyal
Copy link
Author

I wanted to point out an edge case, if the definitionsNode exists but is empty (defs: {}), the insertion position should default to definitionsNode.offset + 1. Otherwise, we should find the last definition and insert the new one after it.

Screen.Recording.2025-03-15.at.12.42.23.AM.mov

@jdesrosiers
Copy link
Collaborator

That's definitely something we'll want to have a test for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants