Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

instrument NADE commands using metric spans #1844

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

sfc-gh-mchok
Copy link
Collaborator

@sfc-gh-mchok sfc-gh-mchok commented Nov 8, 2024

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually on MacOS.
  • I've confirmed that my changes are working by executing CLI's commands manually on Windows.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.

Changes description

  • Added metric spans to telemetry payload being sent
  • Instrumented native app commands with metric spans API
  • Added integ tests to verify the proper spans are being sent in telemetry for example commands (see tests_integration/nativeapp/test_metrics.py for examples of traces being sent)
  • Refactored test files to fit with convention, combining metrics counters + spans tests

An example trace of snow app run looks like the following (truncated for brevity):

{
        'message': {
            'command_execution_time': 30.307388332905248,
            'counters': {
                'features.global.pdf_templates': 1,
                'features.global.templates_processor': 1,
                'features.app.snowpark_processor': 0,
                'features.app.post_deploy_scripts': 1,
                'features.app.event_sharing': 0,
                'features.app.event_sharing_warning': 0,
                'features.app.event_sharing_error': 0
            },
            'spans': {
                'completed_spans': [{
                    'id': 'a45d527defc44719a0b0d8e8ab9d02b5',
                    'name': 'action.app.deploy',
                    'start_time': 0.6444029589183629,
                    'parent': None,
                    'parent_id': None,
                    'execution_time': 27.891165500041097,
                    'error': None,
                    'span_count_in_subtree': 10,
                    'span_depth': 1,
                    'trimmed': False
                }, {
                    'id': '8dd033f84f4f481b8542fff24a759ccf',
                    'name': 'action.app_pkg.deploy',
                    'start_time': 0.644967291969806,
                    'parent': 'action.app.deploy',
                    'parent_id': 'a45d527defc44719a0b0d8e8ab9d02b5',
                    'execution_time': 11.048065917100757,
                    'error': None,
                    'span_count_in_subtree': 7,
                    'span_depth': 2,
                    'trimmed': False
                }, {
                    'id': '088ae6c6ac0947e9ac8796fe83e60dd9',
                    'name': 'bundle',
                    'start_time': 0.6450087090488523,
                    'parent': 'action.app_pkg.deploy',
                    'parent_id': '8dd033f84f4f481b8542fff24a759ccf',
                    'execution_time': 0.0043289999011904,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 3,
                    'trimmed': False
                }, {
                    'id': 'ae49cdc6fe944579a057fa4850d92c1d',
                    'name': 'artifact_processors',
                    'start_time': 0.6495900419540703,
                    'parent': 'action.app_pkg.deploy',
                    'parent_id': '8dd033f84f4f481b8542fff24a759ccf',
                    'execution_time': 0.00230620801448822,
                    'error': None,
                    'span_count_in_subtree': 2,
                    'span_depth': 3,
                    'trimmed': False
                }, {
                    'id': 'b83c2257cedd41a38a89bc79b60f0214',
                    'name': 'templates_processor',
                    'start_time': 0.6496683340519667,
                    'parent': 'artifact_processors',
                    'parent_id': 'ae49cdc6fe944579a057fa4850d92c1d',
                    'execution_time': 0.0022297080140560865,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 4,
                    'trimmed': False
                }, {
                    'id': '399ba61144824d25861401573f9447c4',
                    'name': 'sync_deploy_root_with_stage',
                    'start_time': 2.6214112089946866,
                    'parent': 'action.app_pkg.deploy',
                    'parent_id': '8dd033f84f4f481b8542fff24a759ccf',
                    'execution_time': 5.49098225007765,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 3,
                    'trimmed': False
                }, {
                    'id': '483b5fe50c1c4e6ea4e81598a0bf926d',
                    'name': 'post_deploy_hooks',
                    'start_time': 8.112657208926976,
                    'parent': 'action.app_pkg.deploy',
                    'parent_id': '8dd033f84f4f481b8542fff24a759ccf',
                    'execution_time': 3.1565516660921276,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 3,
                    'trimmed': False
                }, {
                    'id': 'cd065a3c3148457d82d45f9b62d3ec62',
                    'name': 'validate_setup_script',
                    'start_time': 11.27043924992904,
                    'parent': 'action.app_pkg.deploy',
                    'parent_id': '8dd033f84f4f481b8542fff24a759ccf',
                    'execution_time': 0.422550291987136,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 3,
                    'trimmed': False
                }, {
                    'id': '99829e4b459b48fb92e111c951f7f58c',
                    'name': 'update_app_object',
                    'start_time': 11.693080459022895,
                    'parent': 'action.app.deploy',
                    'parent_id': 'a45d527defc44719a0b0d8e8ab9d02b5',
                    'execution_time': 16.84247687482275,
                    'error': None,
                    'span_count_in_subtree': 2,
                    'span_depth': 2,
                    'trimmed': False
                }, {
                    'id': '7f79381f00f442bc8d499e7950ca9ac0',
                    'name': 'post_deploy_hooks',
                    'start_time': 22.447485291864723,
                    'parent': 'update_app_object',
                    'parent_id': '99829e4b459b48fb92e111c951f7f58c',
                    'execution_time': 6.088065165793523,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 3,
                    'trimmed': False
                }, {
                    'id': '71938e6500ac427db621280c229bdf4e',
                    'name': 'get_snowsight_url_for_app',
                    'start_time': 28.535641624825075,
                    'parent': None,
                    'parent_id': None,
                    'execution_time': 1.8380501249339432,
                    'error': None,
                    'span_count_in_subtree': 1,
                    'span_depth': 1,
                    'trimmed': False
                }],
                'num_spans_past_depth_limit': 0,
                'num_spans_past_total_limit': 0
            }
        }

@sfc-gh-mchok sfc-gh-mchok self-assigned this Nov 8, 2024
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file renamed from test_feature_metrics since it was going to be too large with the spans, and also converted the tests to use the factories

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally the test directory mirrors the layout of the source directory so that tests are easy to locate. Breaking down the files like this because of some arbitrary size constraint makes things easier to manage, and makes it easier for us to review the changes because we can no longer see the diff. Please follow the convention instead.

Copy link
Contributor

@sfc-gh-fcampbell sfc-gh-fcampbell Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't follow this comment, did you mean to say it makes it harder if we split the test file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I merged the new integ tests and also the old unit tests to test_metrics.py in their respective folders, and also adjusted the codeowners file accordingly, let me know if this works for you!

@sfc-gh-mchok sfc-gh-mchok marked this pull request as ready for review November 11, 2024 22:21
@sfc-gh-mchok sfc-gh-mchok requested review from a team as code owners November 11, 2024 22:21
[
(
"app deploy",
lambda: ProjectV11Factory(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the best I could come up with to have parametrization with factories, if anyone has a better approach I'd be interested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we need parametrized tests at all? I don't think I can read this anymore. I think overall we make way, way too frequent use of parametrized tests in our codebase, but that's a different story.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the arguments to the factories are the same, you can parametrize just on the class and instantiate it in the test

@pytest.mark.parametrize("factory_class", [FooFactory, BarFactory])
def test_factory(factory_class):
    obj = factory_class(arg=123)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up getting rid of all the parametrization and separating the tests out since the factories didn't have the same arguments, I think it's more clear and separated this way, let me know what you think

@sfc-gh-bdufour
Copy link
Contributor

description + pre-review checklist aren't optional, please fill them in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally the test directory mirrors the layout of the source directory so that tests are easy to locate. Breaking down the files like this because of some arbitrary size constraint makes things easier to manage, and makes it easier for us to review the changes because we can no longer see the diff. Please follow the convention instead.

@@ -88,7 +88,7 @@ def compile_artifacts(self):
if not self._should_invoke_processors():
return

with cc.phase("Invoking artifact processors"):
with cc.phase("Invoking artifact processors", span_name="compile_artifacts"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no compilation happening. How about simply "artifact_processors"? We should probably also have individual processors as sub-spans.

@@ -213,6 +214,26 @@ def get_cli_context() -> _CliGlobalContextAccess:
return _CliGlobalContextAccess(get_cli_context_manager())


def start_cli_metrics_span(span_name: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "start" makes sense here, since this both starts and stops the span at the appropriate time. Let's workshop the name of this + the start_span method it calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I feel @span would suffice

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed both the method name and the decorator name to just span, let me know if this works for you

@@ -420,6 +421,7 @@ def get_objects_owned_by_application(self) -> List[ApplicationOwnedObject]:
).fetchall()
return [{"name": row[1], "type": row[2]} for row in results]

@start_cli_metrics_span("create_or_upgrade_app")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spans shouldn't be implementation-specific, but rather represent a logic step in the overall operation being performed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted the names and locations of all the spans to better encapsulate this, let me know if they work for you!

@@ -915,6 +916,7 @@ def validate_setup_script(
if validation_result["status"] == "FAIL":
raise SetupScriptFailedValidation()

@start_cli_metrics_span("get_validation_result")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, let's not name things according to the implementation. That can change over time.

@@ -68,6 +68,7 @@ def phase(
self,
enter_message: str,
exit_message: Optional[str] = None,
span_name: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is truly needed, since we can use multiple context managers on the same line. Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about that feature, I adjusted these instances, thanks for the reminder

@@ -304,6 +308,7 @@ def validation_item_to_str(item: dict[str, str | int]):
return s


@start_cli_metrics_span("drop_generic_object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

[
(
"app deploy",
lambda: ProjectV11Factory(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we need parametrized tests at all? I don't think I can read this anymore. I think overall we make way, way too frequent use of parametrized tests in our codebase, but that's a different story.

… un-parametrize tests, merge test files to keep with covention
@sfc-gh-mchok
Copy link
Collaborator Author

description + pre-review checklist aren't optional, please fill them in

My mistake, I usually do it at the same time as when I make the PR but I started it as a draft this time and forgot to do it when I set it as ready for review, should be populated now!

init=False, default_factory=lambda: time.monotonic()
)
# start time of the step from a performance counter in order to calculate execution time
_start_time: float = field(init=False, default_factory=time.perf_counter)
Copy link
Collaborator Author

@sfc-gh-mchok sfc-gh-mchok Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran into failing integ tests where the ordering of the spans was not deterministic, did some more research and found that time.perf_counter should just be a better version of time.monotonic with good granularity on Windows, so replaced those calls here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get that part - Why is time determining the order? The most accurate way is first call goes first. Or are we dealing with multithreads here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows, time.monotonic() does not have as much precision, so multiple spans would end up having the same start time, so when we sort them by start time (to make it so that the first call goes first) the order ends up not being the same as the order as when we start the spans. This is fixed with time.perf_counter()

Copy link
Contributor

@sfc-gh-bdufour sfc-gh-bdufour Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that the sorting function wasn't stable though. IMHO regardless of this "fix", we shouldn't rely on the uniqueness of timestamps and switch to a stable sort regardless.

def mock_time_monotonic():
with mock.patch("time.monotonic", side_effect=count()) as mocked_time_monotonic:
yield mocked_time_monotonic
def test_metrics_no_counters():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All counter tests moved from test_metrics_counters.py to consolidate the files and keep to the convention


@pytest.mark.integration
@mock.patch("snowflake.connector.telemetry.TelemetryClient.try_add_log_to_batch")
def test_feature_counters_v2_post_deploy_not_available_in_bundle(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counter tests moved from test_feature_metrics.py to consolidate them in one file and keep to the convention, and converted to use factories

@@ -649,6 +650,7 @@ def resolve_without_follow(path: Path) -> Path:
return Path(os.path.abspath(path))


@span("build_initial_bundle")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just bundle ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my head bundle is the higher level container that does both this step (populating the deploy root) and then applying all the processors on it, but if we're fine just having the first step be called bundle then that's good with me

@@ -355,6 +355,7 @@ def post_deploy_hooks(self) -> list[PostDeployHook] | None:
model = self._entity_model
return model.meta and model.meta.post_deploy

@span("deploy_app")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should consider some kind of standardized scheme for actions + entities, maybe something like action.app.deploy ?

@@ -544,6 +546,7 @@ def _bundle(self):
compiler.compile_artifacts()
return bundle_map

@span("deploy_app_package")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would rather do this on the action for consistency

@@ -68,7 +68,11 @@ def _format_message(self, message: str, output: Output) -> Text:
return text

@contextmanager
def phase(self, enter_message: str, exit_message: Optional[str] = None):
def phase(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're no longer touching this, I'd revert the file entirely

@@ -76,6 +76,7 @@ def _get_stage_paths_to_sync(
return stage_paths


@span("sync_remote_and_local_files")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the rare cases where I'd use the actual function name, since it's exactly the logical concept we're timing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants