Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: capture all W3C fields in ResourceEvents #489

Conversation

williazz
Copy link
Contributor

@williazz williazz commented Jan 9, 2024

Update the ResourceEvent schema to exactly match PerformanceResourceTiming in aws-rum-web 2.0. I also cleaned up some tech debt, as highlighted in the comments.

  1. All durations are converted into start and end timestamps (excluding overall duration)
  2. No experimental fields are included
  3. Non-W3C fields such as fileType and targetUrl are dropped

In addition:

  1. PerformanceResourceTimingEvents are sampled by first N FIFO, instead of random sampling. This also removes a bug in the random sampling logic when eventLimit is large.
  2. Derive fileType from the initiatorType only, not the file extension.

@williazz williazz changed the base branch from main to release/2.0.x January 9, 2024 02:01
@williazz williazz marked this pull request as draft January 9, 2024 02:27
@williazz williazz force-pushed the PerformanceResourceTimingEvent branch 2 times, most recently from d4754b7 to 95e3f04 Compare January 10, 2024 03:12
@williazz williazz changed the title fix: replace ResourceEvent with W3C compliant PerformanceResourceTimingEvent feat: capture all W3C fields in ResourceEvents Jan 10, 2024
@williazz williazz marked this pull request as ready for review January 10, 2024 03:39
@williazz williazz requested review from qhanam and limhjgrace January 10, 2024 03:39
@@ -49,73 +55,75 @@ export class ResourcePlugin extends InternalPlugin {
}

performanceEntryHandler = (list: PerformanceObserverEntryList): void => {
this.recordPerformanceEntries(list.getEntries());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove random sampling logic in favor of first N

@@ -144,51 +118,6 @@ export const getResourceFileType = (
return ext;
};

/* Helpers */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove massive unused object

* @param payload The main message that is usually also sent to PutRumEvents
* @param internalMessage If payload is a RUM event, then additional information can be communicated here that should not be sent to PutRumEvents
*/
export type Subscriber = (payload: any, internalMessage?: any) => void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why do we need to pass around this internalMessage object? Are there other ways we can solve this problem without modifying the signatures of this and the other functions that use internalMessage?

Copy link
Contributor Author

@williazz williazz Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the internalMessage interface after dropping our dependency on file extensions.

renderBlockingStatus?: string;
}

export interface OmittedResourceFields {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What happens if we don't propagate these fields? My intuition is that we can do without them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, removed.

@@ -13,7 +13,7 @@ export const CLS_EVENT_TYPE = `${RUM_AMZ_PREFIX}.cumulative_layout_shift_event`;

// Page load event schemas
export const PERFORMANCE_NAVIGATION_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_navigation_event`;
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_resource_event`;
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `{RUM_AMZ_PREFIX}.performance_resource_timing_event`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Shouldn't this bug cause a unit test failure?

Suggested change
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `{RUM_AMZ_PREFIX}.performance_resource_timing_event`;
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_resource_timing_event`;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, added unit tests

@williazz williazz force-pushed the PerformanceResourceTimingEvent branch from 6f2a989 to f5b5cd6 Compare January 10, 2024 21:56
@williazz williazz requested a review from qhanam January 10, 2024 22:43
export const performanceKey = (startTime: number, duration: number) =>
[startTime, duration].join('#');

export const getResourceFileType = (initiatorType: string): ResourceType => {
Copy link
Contributor Author

@williazz williazz Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed our dependency on file extension to determine fileType, which is convenient for RUM at cost of slightly lower accuracy.

It's less accurate because images are not limited to initatorType=image, img, and input. For example, an image could also be the background of an element, or the source of an embedded element. However, these sorts of images are typically not considered for web vitals, and should not be prioritized anyways.

This is convenient for RUM because the operation is significantly more lightweight, more closely aligns with W3C, and we can always add an alwaysRecord callback if RUM operators want to restore this feature.

const v = a[i];
a[i] = a[j];
a[j] = v;
export const performanceKey = (startTime: number, duration: number) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not using name it can be personally identifying information and omitted from RUM events

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration works reliably as is, since both startTime and duration are DOMHighResTimeStamps and highly unique. We could improve performanceKey() by using name if it exists, and if not then use the current setup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't see that we were talking about it in the context of just performanceKey() and thought we were talking about using the name field overall. I agree that startTime and duration should be unique enough!

@williazz williazz force-pushed the PerformanceResourceTimingEvent branch from 7030314 to a71a351 Compare January 10, 2024 23:53
@@ -127,6 +127,13 @@ test('when a plugin calls recordEvent then the event is recorded', async (t: Tes
});

test('when a plugin calls recordEvent x times then event is recorded x times', async (t: TestController) => {
// This test passes individually, but fails on Edge when run as apart of the suite
Copy link
Contributor Author

@williazz williazz Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I just couldn't fix this integ test. When run individually, the test is fine; however, it fails when run as apart of the suite. I also could not identify a connection between my changes to ResourcePlugin and this seemingly unrelated CustomPlugin test. To me, it reveals some underlying infrastructure problems in our integ tests run on TestCafe that needs further investigation, but could not resolve it in scope of this PR. For now, I have disabled these tests when run on Edge since I have confirmed that they work in quarantine and I have double checked through manual testing on Edge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be related to the order that the tests are run. You can confirm this by removing tests from the suite until this test passes; try to find a single test that, when run before this one, causes the failure.

A common cause is that a test resource isn't being cleaned up properly.

Copy link
Contributor Author

@williazz williazz Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Played around with the order like you suggested, but it doesn't look like a normal out-of-order issue. Seems like the issues prop up no matter what order. Also tried separating the tests into two separate files to keep the suites small, but that didn't fix anything. Very frustrating :/

* PerformanceResourceTiming with initiatorType=Input must be an image
* Per MDN docs: "if the request was initiated by an <input> element of type image.""
*/
// IMAGES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: there are other valid initiatorTypes listed in the doc. Are these not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many other initiatorTypes, but they do not align with our RUM-specific choices for fileType. There may be something that I missed since there are so many initatorTypes, but from my deep dive, these choices cover what we need.

I also left another comment that talks about this choice. Was also planning on updating documentation when all the discussions are done. #489 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit(blocking): Name this file "performance-resource-timing.json" so that the file name matches the event type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to performance-resource-timing.json

@@ -1,88 +1,106 @@
{
"$id": "com.amazon.rum.performance_resource_event",
"$id": "com.amazon.rum.performance_resource_timing_event",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: The event name should be "com.amazon.rum.performance_resource_timing", to match the name of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the _event naming convention, but I agree. The name is too long anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to ...performance_resource_timing

Comment on lines 108 to 112
serverTiming: r.serverTiming
? (r.serverTiming.map(
(e) => e as PerformanceServerTimingPolyfill
) as PerformanceServerTimingPolyfill[])
: undefined,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: In other cases where there are nested objects, we emit them as separate events, and link those events to their parent through the event Id.

My first thought is to simply omit this field for now. Doing so would allow us to merge in this change now, and work through how to support server timing in a later PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PerformanceServerTiming is unique because it does not have its own entryType for the PerformanceObserver to listen for, but only exists when nested in PerformanceResourceTiming and PerformanceNavigationTiming. Instead, PerformanceServerTimings are manually added by the server to breakdown the requests, sort of like a custom timing API. I'm not sure that we actually want to record this as a separate RUM event, but we can omit for now to avoid blocking.

For example: https://developer.mozilla.org/en-US/docs/Web/API/PerformanceServerTiming#example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed server timing from this PR.

type: RESOURCE,
buffered: true
});
this.onload();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The original seems more correct to me. We want to start the observer again, not load the plugin. I suppose we save a few characters but it costs readability.

Note that making refactoring changes like this inside another change make reviewing more difficult -- it's one more change a reviewer needs to trace through. Better to submit these in their own PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this probably violates single responsibility principle, will refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved observe logic to its own method observe()


export const isPutRumEventsCall = (url: string, endpointHost: string) => {
const pathRegex =
/.*\/application\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\/events/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: This looks like it accounts for proxy endpoints, correct? Do we have unit tests for both proxy and non-proxy enpoints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is old logic but I will deep dive on this topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the preexisting logic for isPutRumEvents() is wrong. For example, a valid url is https://dataplane.rum.us-west-2.amazonaws.com/appmonitors/265d8f93-d021-44e1-b0c5-41fcc660842f, which would fail the validation. Path should be appmonitors instead of application. And there is no events path. This deserves its own PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can preview this change in 3e29234, but I will omit from this PR.

// otherwise we end up in an infinite loop of recording PutRumEvents.
if (
this.config.ignore(e) ||
isPutRumEventsCall(name, this.context.config.endpointUrl.host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Aren't fetch and xhr requests already ignored by the default config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. This is old logic, but I'm not sure that this is needed either until we add support for Fetch/Xhr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize this was the original logic. I agree that, with respect to this PR, we should leave as-is.

Regarding the documentation change, "Ignore by custom callback" isn't clear. Please clean up the comment and move it back inside the if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored the original comment.

@@ -13,7 +13,7 @@ export const CLS_EVENT_TYPE = `${RUM_AMZ_PREFIX}.cumulative_layout_shift_event`;

// Page load event schemas
export const PERFORMANCE_NAVIGATION_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_navigation_event`;
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_resource_event`;
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_resource_timing_event`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_resource_timing_event`;
export const PERFORMANCE_RESOURCE_EVENT_TYPE = `${RUM_AMZ_PREFIX}.performance_resource_timing`;

Comment on lines +35 to +36
export const performanceKey = (startTime: number, duration: number) =>
[startTime, duration].join('#');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Is this related to the change, or is this just refactoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was refactoring but unintentional due to some other changes that I have removed. I will restore the exact old lines for clarity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change. I'm ok with leaving it in.

As I said, co-located changes (especially with features/bug fixes) do make it a little harder to review, so my advice is to do so sparingly, and default to creating separate PRs. Separate PRs are easy to review, preserve history, and more accurately reflect your contributions.

@williazz williazz force-pushed the PerformanceResourceTimingEvent branch from b9a5cf7 to 98f1dba Compare January 13, 2024 01:09
@williazz
Copy link
Contributor Author

Final touches:

  1. Remove version
  2. Add enum for initiatorType

Tech debt:

  1. Schema is too strict and will not accept new values that PerformanceAPI may put out in the future for the fields we are expecting. Only relevant for initiatorType. See feat: capture all W3C fields in NavigationEvents #494 (comment)
  2. isPutRumEvents() bug, see feat: capture all W3C fields in ResourceEvents #489 (comment)

domainLookupStart: r.domainLookupStart,
fetchStart: r.fetchStart,
encodedBodySize: r.encodedBodySize,
initiatorType: r.initiatorType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: with the schema update, the initiatorType has to be one of the 21 valid initiatorTypes. in the case that there are new initiatorTypes that are added, events with the new initiatorTypes will not be ingested. could be out of scope for this PR but we might need to handle this edge case by adding a catch-all value (e.g. "other") that is considered to be a valid initiatorType so that events aren't dropped until we update the schema with the new initiatorType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williazz williazz force-pushed the PerformanceResourceTimingEvent branch from 67e121a to 92b8ea4 Compare February 1, 2024 07:49
@williazz williazz merged commit ac3f0af into aws-observability:release/2.0.x Feb 9, 2024
3 checks passed
williazz added a commit to williazz/aws-rum-web that referenced this pull request Feb 20, 2024
* fix: replace ResourceEvent with W3C compliant PerformanceResourceTimingEvent (breaking)

* fix: firefox

* chore: add version 2.0.0

* chore: dispatch omitted resource fields to event bus

* chore: remove internalMessage

* chore: cleanup

* fix: flaky ms edge integ test

* chore: add unit tests for event type validation

* fix: integ test port

* fix: flaky ms edge integ test

* chore: add links to stylesheet filetype

* chore: test

* fix: disable flaky tests on edge

* chore: add integ test for value check

* fix: port

* chore: remove server timing for now

* chore: observe()

* chore: restore ignore comment

* chore: add length check to integ test

* Revert "chore: add length check to integ test"

This reverts commit 5061775.

* chore: change title to PerformanceResourceTimingEvent

* chore: remove version and add enum for initiatorType

* fix: isPutRumEvents()

* Revert "fix: isPutRumEvents()"

This reverts commit 3e29234.

* chore: add initiatorType "other"

* chore: add runtime check for resource

* chore: restore name

* chore: restore name

* Add a wait to see if it fixes integ tests

---------

Co-authored-by: Quinn Hanam <[email protected]>
williazz added a commit that referenced this pull request Feb 20, 2024
* fix: replace ResourceEvent with W3C compliant PerformanceResourceTimingEvent (breaking)

* fix: firefox

* chore: add version 2.0.0

* chore: dispatch omitted resource fields to event bus

* chore: remove internalMessage

* chore: cleanup

* fix: flaky ms edge integ test

* chore: add unit tests for event type validation

* fix: integ test port

* fix: flaky ms edge integ test

* chore: add links to stylesheet filetype

* chore: test

* fix: disable flaky tests on edge

* chore: add integ test for value check

* fix: port

* chore: remove server timing for now

* chore: observe()

* chore: restore ignore comment

* chore: add length check to integ test

* Revert "chore: add length check to integ test"

This reverts commit 5061775.

* chore: change title to PerformanceResourceTimingEvent

* chore: remove version and add enum for initiatorType

* fix: isPutRumEvents()

* Revert "fix: isPutRumEvents()"

This reverts commit 3e29234.

* chore: add initiatorType "other"

* chore: add runtime check for resource

* chore: restore name

* chore: restore name

* Add a wait to see if it fixes integ tests

---------

Co-authored-by: Quinn Hanam <[email protected]>
williazz added a commit that referenced this pull request Jul 16, 2024
* fix: replace ResourceEvent with W3C compliant PerformanceResourceTimingEvent (breaking)

* fix: firefox

* chore: add version 2.0.0

* chore: dispatch omitted resource fields to event bus

* chore: remove internalMessage

* chore: cleanup

* fix: flaky ms edge integ test

* chore: add unit tests for event type validation

* fix: integ test port

* fix: flaky ms edge integ test

* chore: add links to stylesheet filetype

* chore: test

* fix: disable flaky tests on edge

* chore: add integ test for value check

* fix: port

* chore: remove server timing for now

* chore: observe()

* chore: restore ignore comment

* chore: add length check to integ test

* Revert "chore: add length check to integ test"

This reverts commit 5061775.

* chore: change title to PerformanceResourceTimingEvent

* chore: remove version and add enum for initiatorType

* fix: isPutRumEvents()

* Revert "fix: isPutRumEvents()"

This reverts commit 3e29234.

* chore: add initiatorType "other"

* chore: add runtime check for resource

* chore: restore name

* chore: restore name

* Add a wait to see if it fixes integ tests

---------

Co-authored-by: Quinn Hanam <[email protected]>
williazz added a commit that referenced this pull request Jul 16, 2024
* fix: replace ResourceEvent with W3C compliant PerformanceResourceTimingEvent (breaking)

* fix: firefox

* chore: add version 2.0.0

* chore: dispatch omitted resource fields to event bus

* chore: remove internalMessage

* chore: cleanup

* fix: flaky ms edge integ test

* chore: add unit tests for event type validation

* fix: integ test port

* fix: flaky ms edge integ test

* chore: add links to stylesheet filetype

* chore: test

* fix: disable flaky tests on edge

* chore: add integ test for value check

* fix: port

* chore: remove server timing for now

* chore: observe()

* chore: restore ignore comment

* chore: add length check to integ test

* Revert "chore: add length check to integ test"

This reverts commit 5061775.

* chore: change title to PerformanceResourceTimingEvent

* chore: remove version and add enum for initiatorType

* fix: isPutRumEvents()

* Revert "fix: isPutRumEvents()"

This reverts commit 3e29234.

* chore: add initiatorType "other"

* chore: add runtime check for resource

* chore: restore name

* chore: restore name

* Add a wait to see if it fixes integ tests

---------

Co-authored-by: Quinn Hanam <[email protected]>
williazz added a commit that referenced this pull request Jul 16, 2024
* fix: replace ResourceEvent with W3C compliant PerformanceResourceTimingEvent (breaking)

* fix: firefox

* chore: add version 2.0.0

* chore: dispatch omitted resource fields to event bus

* chore: remove internalMessage

* chore: cleanup

* fix: flaky ms edge integ test

* chore: add unit tests for event type validation

* fix: integ test port

* fix: flaky ms edge integ test

* chore: add links to stylesheet filetype

* chore: test

* fix: disable flaky tests on edge

* chore: add integ test for value check

* fix: port

* chore: remove server timing for now

* chore: observe()

* chore: restore ignore comment

* chore: add length check to integ test

* Revert "chore: add length check to integ test"

This reverts commit 5061775.

* chore: change title to PerformanceResourceTimingEvent

* chore: remove version and add enum for initiatorType

* fix: isPutRumEvents()

* Revert "fix: isPutRumEvents()"

This reverts commit 3e29234.

* chore: add initiatorType "other"

* chore: add runtime check for resource

* chore: restore name

* chore: restore name

* Add a wait to see if it fixes integ tests

---------

Co-authored-by: Quinn Hanam <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants