Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPIKE] Test validate dataset classes in catalog using LSP #159

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

SajidAlamQB
Copy link
Contributor

@SajidAlamQB SajidAlamQB commented Nov 25, 2024

Related to: #5

This is a spike to test using the lamguage server protocol from pygls to add validation for dataset classes.

How to Use the Feature

  1. Install or Update the Kedro VSCode Extension
  2. Open Your Kedro Project in VSCode
  3. The extension will automatically initialize and start scanning catalog files.
  4. View Diagnostics in the Problems Pane it will display diagnostics for any issues found in your catalog files, even if the files are not open.

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB SajidAlamQB self-assigned this Nov 25, 2024
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is ready for review, but I left some comments. Using diagnostic is the right direction.

Something to considered:

  • The current feature works on opened file, would it makes sense to work on closed file as well? i.e. when I start with a new project, I want to quickly identify all the missing dependencies.
  • Once this is done, we need to deprecated the old static YAML schema validation in package.json

image

  • Bonus: Is there a way to improve the error message to show what's the missing dependencies?

This is looking good already, you may want to add some documentation or screenshot at the end to show how to use this feature. This is definitely a very useful feature!

Final word: as a next step, is it possible to make this functions standalone, so it can be used as a CLI command? This way the VSCode extension can be a consumer of this function, but also keep it opened so it can be used as a CLI command or extend with other tool easily.

bundled/tool/lsp_server.py Show resolved Hide resolved
bundled/tool/lsp_server.py Outdated Show resolved Hide resolved
Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB
Copy link
Contributor Author

This is looking good already, you may want to add some documentation or screenshot at the end to show how to use this feature. This is definitely a very useful feature!

Hey @noklam, thanks for the feedback! I'm currently working on extending the feature to validate closed files as well. I agree that it would make sense to quickly identify all missing dependencies when starting a new project. I'll also add documentation and include screenshots to demonstrate how to use this new feature.

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB
Copy link
Contributor Author

I've implemented validation of closed files but there is a bit of a performance aspect to this. For projects with many files it may take some time to load them all.

@SajidAlamQB SajidAlamQB marked this pull request as ready for review November 29, 2024 12:29
@noklam
Copy link
Contributor

noklam commented Nov 29, 2024

Haven't read the code as I am on my phone. It only need to read the relevant files so it shouldn't need to read the entire project. It can make use of the configuration that it only scan the base /local or the selected env

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB SajidAlamQB requested a review from noklam December 5, 2024 14:09
@astrojuanlu
Copy link
Member

I'd love to do some QA, @SajidAlamQB do you think a .vsix can be generated from this PR?

@SajidAlamQB
Copy link
Contributor Author

I'd love to do some QA, @SajidAlamQB do you think a .vsix can be generated from this PR?

Absolutely please find it attached here.

kedro.vsix.zip

@astrojuanlu
Copy link
Member

Tested with a spaceflights project. It works, although the LSP doesn't immediately pick up changes in the environment. For example, starting from here:

image

(kedro-datasets and pandas not present in the environment)

I then do uv pip install kedro-datasets pandas and it didn't immediately update. Maybe it has to do with this error I'm observing in the Output tab:

2024-12-16 14:54:35.847 [info] Failed to handle request 12 textDocument/definition DefinitionParams(text_document=TextDocumentIdentifier(uri='file:///Users/juan_cano/Projects/QuantumBlackLabs/kedro-spaceflights-mlflow/conf/base/catalog.yml'), position=45:3, work_done_token=None, partial_result_token=None)
Traceback (most recent call last):
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/libs/pygls/protocol/json_rpc.py", line 266, in _handle_request
    self._execute_request(msg_id, handler, params)
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/libs/pygls/protocol/json_rpc.py", line 188, in _execute_request
    self._send_response(msg_id, handler(params))
                                ^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/tool/lsp_server.py", line 366, in definition
    result = _query_catalog(document, word)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/tool/lsp_server.py", line 333, in _query_catalog
    catalog_paths = _get_conf_paths(server, "catalog")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.vscode/extensions/kedro.kedro-0.2.2/bundled/tool/lsp_server.py", line 235, in _get_conf_paths
    config_loader: OmegaConfigLoader = server.config_loader
                                       ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'KedroLanguageServer' object has no attribute 'config_loader'

Also, something that just noticed (and I think it's happening with the current version too): somehow the extension is trying to parse pipeline.py as YAML

image

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @astrojuanlu said, the extension is parsing more files than it needed like pipeline.py.

image

In addition, it seems that it is still validating the entries starts with _

I am not sure why but when I rerun this I don't see these error again and it works fine.

@noklam
Copy link
Contributor

noklam commented Jan 2, 2025

I create a PR, it can be merged either with this PR or we can have a separate one to deprecate the old validation with Redhat's YAML extension.

#164

@noklam noklam self-requested a review January 9, 2025 15:08
@SajidAlamQB
Copy link
Contributor Author

I then do uv pip install kedro-datasets pandas and it didn't immediately update.

I've added periodic revalidation this should resolve this, also it should now check live while entering catalog entries if they are importable.

Signed-off-by: Sajid Alam <[email protected]>
@SajidAlamQB
Copy link
Contributor Author

Also, something that just noticed (and I think it's happening with the current version too): somehow the extension is trying to parse pipeline.py as YAML.

Added check in watchers for only catalog files now, this shouldn't happen anymore.

Signed-off-by: Sajid Alam <[email protected]>
@noklam noklam requested a review from astrojuanlu January 17, 2025 12:10
@noklam
Copy link
Contributor

noklam commented Jan 17, 2025

tagging @astrojuanlu in case you want to do another round of QA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spike] - Explore how to validate dataset classes in catalog using LSP
4 participants