Baggage #1600
Hmm, I'm not so sure. My rough reading online is that a user ID made up of non-PII is not necessarily considered PII. What would you recommend, given that Baggage could be exposed to third parties because it sits in HTTP headers?
I would assume that is rarely the case. It might be true if one uses "roles as users", like "admin" or something, but under GDPR even a pseudonym or pseudonymization (hashing) quickly becomes problematic. I read a great paper on that topic a few months back; let me see if I can find it.

Independent of that discussion, I agree with @truekonrads that providing examples that are not potential PII would be better (and would also give people some idea of what they could use instead of IPs, user IDs, etc.):
@truekonrads, are you open to providing a PR with a different list of examples?
This is one example I was referring to: https://developers.google.com/analytics/solutions/crm-integration#user_id
Thanks for sharing! Coming from GDPR-land and having had my share of customer conversations on the topic, I might be overthinking this, but "non-obfuscated alphanumeric database identifiers" and "encrypted identifiers based on PII" are both pseudonymised data that can become PII when combined with other data (e.g. after a breach of the database holding those identifiers). So I personally prefer not to recommend collecting that kind of data, and instead to offer alternatives that might be good enough (like a way to identify the cohort a user belongs to, which can help you troubleshoot a performance issue).

Here's the document I was referring to (a great bedtime read if you have trouble sleeping): https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
I also wanted to chip in that logging raw session IDs is not a good idea (hashed ones could be suitable): whoever views the logs can set the session ID in a cookie and act as the user. In no way an advert for the product, but Google Cloud DLP transformations, which do data masking and format-preserving encryption, could be a good processor.
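To make the hashing suggestion concrete, here is a minimal Python sketch. It assumes a *keyed* hash (HMAC-SHA256 from the standard library) rather than a plain hash, because a bare SHA-256 of a short or guessable identifier can be brute-forced. The key name `LOG_PSEUDONYM_KEY` and the function `pseudonymize_session_id` are hypothetical. Note that, per the GDPR point made earlier in this thread, the output is still pseudonymised data, not anonymous data.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; in practice load it from a secrets
# store and never hard-code it or log it.
LOG_PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize_session_id(session_id: str, key: bytes = LOG_PSEUDONYM_KEY) -> str:
    """Return an HMAC-SHA256 hex digest of a session ID for use in logs.

    The raw session ID never reaches the log, so a log reader cannot replay
    it as a cookie. The same input and key always produce the same token,
    so log lines for one session can still be correlated.
    """
    digest = hmac.new(key, session_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    token = pseudonymize_session_id("session-xyz")
    print(f"session={token}")
```

Using a key means that someone with read access to the logs but not the key cannot recompute tokens from candidate session IDs.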
Baggage is not the only place affected by this kind of "please handle with care" situation, and it's not the right place to answer all those questions. In the spec we have a callout for identity attributes:
So, here's my proposal:
@truekonrads, would you be open to providing a PR with these changes?
I think it's also important that examples of this kind of information are specifically called out (as they are today) in the Baggage doc, because Baggage sits in HTTP headers rather than in the body, which makes it far more "sniffable". The wording and examples can change, of course, but I want to preserve that emphasis in this doc specifically, since it's unique compared to attaching similar fields as attributes on spans, logs, span events, or span links.
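To illustrate why header placement matters, the sketch below hand-builds a W3C `baggage` header value in Python. Values are only percent-encoded, not encrypted, so anything placed in Baggage is readable by whoever can see the request headers. The keys `user.cohort` and `region` and the function `build_baggage_header` are hypothetical. A real application would normally rely on its OpenTelemetry SDK's baggage propagator rather than serializing the header itself.

```python
from urllib.parse import quote


def build_baggage_header(entries: dict[str, str]) -> str:
    """Serialize key/value pairs into a `baggage` header value.

    Each value is percent-encoded; nothing is hidden, so every entry is
    visible in plain text to anyone who can observe the request headers.
    """
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


if __name__ == "__main__":
    # Coarse, non-identifying values (e.g. a cohort) rather than user IDs or IPs.
    headers = {
        "baggage": build_baggage_header({"user.cohort": "beta-testers", "region": "eu west"})
    }
    print(headers)
```

Running it prints the header with the cohort and region readable as-is (the space in `eu west` becomes `%20`), which is exactly the exposure the Baggage doc warns about.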
@svrnm I'll give it a try.
Thanks, @truekonrads!
The documentation on Baggage provides examples of "non-sensitive data that you’re okay with potentially exposing to third parties", such as:
Under the GDPR, user information is likely to be considered personal data and should not be shared with third parties without the user's express permission.
Suggestion: rewrite this example.