I'm using the benchmarks evaluation function to test some privacy metrics.
The docs state that k-anonymity and l-diversity are ratios of the respective metrics between the original and synthetic datasets. However, in the benchmarks evaluation output I find both k-anonymity.gt and k-anonymity.syn, i.e., k-anonymity computed separately for the original and the synthetic dataset.
In my case, I get k-anonymity = 59 for the original and k = 58.5 for the synthetic dataset, but is this actually a ratio? And with respect to what?
Moreover, k-anonymity and l-diversity should report different information, yet for some reason they take the same values all the time. Is k-anonymity computed only on the sensitive attributes, without the quasi-identifiers?
It would be great to improve consistency between the documentation and the GitHub/README description of the metrics.
Screenshots
Screenshot uploaded below
System Information
AWS cloud environment
Language Version: Python 3.10.8, IPython: 8.29.0, jupyterlab: 1.2.21
Package Manager Version: conda 24.7.1
Browser: Firefox
Thanks for your help in cleaning up the documentation. The correct definition is the one in the README. I will update the documentation to match.
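For context, here is a minimal sketch of how per-dataset k-anonymity values and the README-style ratio could be computed. The column names, the groupby-based helper, and the quasi-identifier choice are illustrative assumptions, not the package's actual implementation:

```python
import pandas as pd

def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Smallest equivalence-class size over the quasi-identifier columns."""
    return int(df.groupby(quasi_identifiers).size().min())

# Hypothetical original (ground-truth) and synthetic datasets.
qi = ["age", "zip"]  # assumed quasi-identifier columns
gt = pd.DataFrame({"age": [30, 30, 40, 40], "zip": ["A", "A", "B", "B"],
                   "diag": ["flu", "cold", "flu", "asthma"]})
syn = pd.DataFrame({"age": [30, 30, 40, 40], "zip": ["A", "A", "B", "B"],
                    "diag": ["flu", "flu", "cold", "asthma"]})

k_gt = k_anonymity(gt, qi)    # analogous to k-anonymity.gt
k_syn = k_anonymity(syn, qi)  # analogous to k-anonymity.syn
ratio = k_syn / k_gt          # ratio of synthetic to original, per the README definition
print(k_gt, k_syn, ratio)
```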
Why does l = k?
The values of k-anonymity and l-diversity can be identical in specific scenarios where the conditions for both metrics align perfectly. Here's how this can happen:
Equivalence Classes with Exactly k Records:
Suppose the dataset is structured such that each equivalence class (group of records with identical QI values) contains exactly k records.
Implication: The dataset satisfies k-anonymity by definition, since each record is part of a group of size k.
Distinct Sensitive Attribute Values within Each Equivalence Class:
Within each of these equivalence classes, if every record has a unique value for the sensitive attribute(s), then:
Number of Distinct Sensitive Values (l) = Number of Records (k)
Implication: This satisfies l-diversity with l = k, as there are k distinct sensitive values within each equivalence class.
Resulting Equality:
Since each equivalence class has exactly k records and l = k, both the k-anonymity and l-diversity metrics will yield the same value for the dataset.
Example: If k = 3 and, within every group of 3 records sharing the same QIs, there are 3 distinct sensitive attribute values, then both k-anonymity and l-diversity will report a value of 3.
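A small illustration of that coincidence (the toy data and column names are assumptions, and the per-class logic below is the standard textbook definition rather than necessarily the package's exact code):

```python
import pandas as pd

# Toy dataset: two equivalence classes of exactly k = 3 records each,
# with 3 distinct sensitive values inside every class.
df = pd.DataFrame({
    "age":  [30, 30, 30, 40, 40, 40],        # quasi-identifier
    "zip":  ["A", "A", "A", "B", "B", "B"],  # quasi-identifier
    "diag": ["flu", "cold", "asthma",        # sensitive attribute
             "flu", "cold", "asthma"],
})

groups = df.groupby(["age", "zip"])
k_anon = int(groups.size().min())             # smallest class size -> k-anonymity
l_div = int(groups["diag"].nunique().min())   # fewest distinct sensitive values per class -> l-diversity

print(k_anon, l_div)  # 3 3 -> identical, because each class of size k holds k distinct sensitive values
```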