Baggage #1600

Open
truekonrads opened this issue Aug 6, 2022 · 9 comments
Assignees
Labels
docs e0-minutes Effort: < 60 min help wanted Extra attention is needed stale

Comments

@truekonrads

The documentation on Baggage provides examples of "non-sensitive data that you’re okay with potentially exposing to third parties", such as:

  • Account Identification;
  • User Ids;
  • Product Ids; and
  • origin IPs.

Under GDPR, this user information is likely to be considered personal information and should not be shared with third parties without express permission from the user.

Suggestion: rewrite this example.

@cartermp cartermp added docs e0-minutes Effort: < 60 min labels Aug 7, 2022
@cartermp
Contributor

cartermp commented Aug 7, 2022

Hmm, I'm not so sure. My rough reading online is that a user ID made up of non-PII is not necessarily considered PII.

What would you recommend, given the context that Baggage is information that could be exposed to third-parties since it sits in HTTP headers?

@svrnm
Member

svrnm commented Aug 8, 2022

a user ID made up of non-PII is not necessarily considered PII.

I would assume that is rarely the case? It might be true if one uses "roles as user" like "admin" or something, but under GDPR even a pseudonym or a pseudonymization (hashing) quickly becomes problematic. I had a great paper on that topic a few months back, let me see if I can find it.

Independent of that discussion, I agree with @truekonrads that providing some examples that are not potentially PII might be better (and would also give people some idea of what they could use instead of IPs, user IDs, etc.):

  • User Segmentation (loyalty group, service level groups)
  • SaaS Tenant
  • high-level geo data (country, state, city) or specific data relevant for your company (office, store location, ...)
  • product / booking details (product IDs, flight details such as departure & destination, but maybe not the seat number)
  • any other information that might influence the behavior of a downstream service and with that lead to performance issues.
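
One way to apply a list like the one above is to allowlist the baggage keys a service is permitted to propagate. This is only a minimal sketch in plain Python: the key names (`user.segment`, `tenant`, `geo.country`, `product.id`) and the `filter_baggage` helper are hypothetical and not part of any OpenTelemetry API.

```python
# Hypothetical allowlist of coarse-grained baggage keys (names are assumptions).
ALLOWED_BAGGAGE_KEYS = {"user.segment", "tenant", "geo.country", "product.id"}


def filter_baggage(entries: dict[str, str]) -> dict[str, str]:
    """Keep only the entries whose key is on the allowlist."""
    return {k: v for k, v in entries.items() if k in ALLOWED_BAGGAGE_KEYS}


candidate = {
    "user.segment": "gold",
    "tenant": "acme",
    "geo.country": "DE",
    "user.id": "48213",          # identifies a person: not on the allowlist
    "client.ip": "203.0.113.7",  # origin IP: not on the allowlist
}

safe = filter_baggage(candidate)
print(safe)
```

An allowlist rather than a denylist means that a newly introduced key is dropped until someone decides it is safe to propagate.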

@truekonrads are you open to providing a PR with a different list of examples?

@cartermp
Contributor

cartermp commented Aug 8, 2022

I would assume that is rarely the case?

This is one example I was referring to: https://developers.google.com/analytics/solutions/crm-integration#user_id

@svrnm
Member

svrnm commented Aug 8, 2022

Thanks for sharing! Coming from GDPR-land and having had my share of customer conversations on that topic, I might be overthinking this: "non obfuscated alphanumeric database identifiers" and "encrypted identifier that is based on PII" are both pseudonymised data that can become PII when combined with other data (e.g. after a data breach of the database holding those identifiers). So I personally prefer not to recommend collecting that kind of data, and to provide alternatives that might be good enough instead (like a way to identify a cohort a user belongs to, which helps to troubleshoot your performance issue).

Here's the document I was referring to (a great good night read if you have trouble sleeping) https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf

A specific pitfall is to consider pseudonymised data to be equivalent to anonymised data.
The Technical Analysis section will explain that pseudonymised data cannot be equated to
anonymised information as they continue to allow an individual data subject to be singled out
and linkable across different data sets. Pseudonymity is likely to allow for identifiability, and
therefore stays inside the scope of the legal regime of data protection
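
To make the linkability point concrete, here is a small sketch using only the Python standard library. The two data sets, the `user.hash` key and the `pseudonymise` helper are made up for illustration; the sketch assumes an unsalted SHA-256 hash as the pseudonymisation step.

```python
import hashlib


def pseudonymise(user_id: str) -> str:
    """Unsalted SHA-256 of the identifier: a pseudonym, not an anonymisation."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# Two hypothetical, independently collected data sets.
trace_baggage = {"user.hash": pseudonymise("user-48213"), "endpoint": "/checkout"}
leaked_crm_row = {"user_id": "user-48213", "email": "jane@example.com"}

# Anyone holding the second data set can recompute the hash and join the two.
linked = pseudonymise(leaked_crm_row["user_id"]) == trace_baggage["user.hash"]
print(linked)  # True
```

Because the hash is deterministic, the same person maps to the same value everywhere, which is exactly what lets the records be joined.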

@truekonrads
Author

I also wanted to chip in that logging raw session IDs is not a good idea (a hashed value could be suitable). Whoever views the logs can set the session ID in a cookie and act as the user. This is in no way an advert for the product, but Google Cloud DLP transformations, which do data masking and format-preserving encryption, could be a good processor.
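
A rough sketch of the hashed variant, using only the Python standard library. Note that it uses a keyed hash (HMAC-SHA-256) rather than a plain hash, so that someone who sees the logs cannot confirm a guessed session ID without the key. The key, the `session_ref` helper and the sample session ID are all hypothetical.

```python
import hashlib
import hmac
import logging

# Hypothetical key; in practice it would come from a secret store, not source code.
LOG_HASH_KEY = b"replace-with-a-secret-key"


def session_ref(session_id: str) -> str:
    """Return a short keyed hash: log lines can be correlated, but the value is not a usable cookie."""
    digest = hmac.new(LOG_HASH_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")

raw_session_id = "s3ss10n-c00k13-v4lu3"  # made-up value
log.info("request handled session_ref=%s", session_ref(raw_session_id))
```

The reference is still a stable pseudonym for the session, so it does not make the log data anonymous; it only removes the ability to replay the value as a cookie.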

@svrnm
Member

svrnm commented Aug 9, 2022

Baggage is not the only place affected by this kind of "please handle with care" situation and it's not the right place to answer all those questions.

In the spec we have a call out for identity attributes:

Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by default and then provide a configuration parameter to turn on retention for use cases where the information is required and would not violate any policies or regulations.
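
The "drop by default, opt in to retain" behaviour described in that call out could look roughly like the sketch below. It is plain Python, not an actual SDK or exporter API: the `scrub_attributes` helper and its `retain_identity` parameter are hypothetical, and the attribute names are assumed for illustration.

```python
# Attribute names assumed for illustration; check the spec for the actual list.
IDENTITY_ATTRIBUTES = frozenset({"enduser.id", "enduser.role", "enduser.scope"})


def scrub_attributes(attributes: dict[str, object], retain_identity: bool = False) -> dict[str, object]:
    """Drop identity attributes unless retention has been explicitly enabled."""
    if retain_identity:
        return dict(attributes)
    return {k: v for k, v in attributes.items() if k not in IDENTITY_ATTRIBUTES}


span_attributes = {"http.method": "GET", "enduser.id": "jane", "enduser.role": "admin"}

print(scrub_attributes(span_attributes))                        # identity attributes dropped
print(scrub_attributes(span_attributes, retain_identity=True))  # explicit opt-in keeps them
```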

So, here's my proposal:

  • have some more non-PII examples for baggage
  • keep some of the PII (or potentially PII) examples and add a note that this data needs to be handled with care.

@truekonrads would you be open to providing a PR with changes?

@cartermp
Contributor

cartermp commented Aug 9, 2022

I think it's also important that examples of this kind of information are specifically called out (like they are today) in the baggage doc because they sit in HTTP headers rather than the body, making them eminently more "sniffable". Words and examples can change of course, but I want to preserve that dynamic in this doc specifically since it's a unique dynamic compared to attaching similar fields on attributes for spans/logs/span events/span links
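
To illustrate why header placement matters, here is a simplified sketch of how baggage entries end up as a plain-text `baggage` request header (comma-separated `key=value` pairs, as in the W3C Baggage format). It uses only the Python standard library; the `baggage_header` helper and the sample entries are hypothetical, and real propagators handle more cases such as properties and size limits.

```python
from urllib.parse import quote


def baggage_header(entries: dict[str, str]) -> str:
    """Serialise entries as a `baggage` header value: percent-encoded key=value pairs joined by commas."""
    return ",".join(f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in entries.items())


entries = {"user.id": "48213", "client.ip": "203.0.113.7"}
headers = {"baggage": baggage_header(entries)}
print(headers)  # {'baggage': 'user.id=48213,client.ip=203.0.113.7'}
```

Every downstream HTTP call that propagates context carries this header, so any proxy, third-party service or log line that records request headers sees the values as-is.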

@truekonrads
Author

@svrnm I'll give it a try

@svrnm svrnm assigned svrnm and truekonrads and unassigned svrnm Aug 16, 2022
@svrnm
Member

svrnm commented Aug 16, 2022

thanks @truekonrads

@svrnm svrnm added the help wanted Extra attention is needed label Aug 23, 2022

4 participants