This is the code artifact of the paper "A Public and Reproducible Assessment of the Topics API on Real Data"

yohhaan/topics_api_analysis

topics_api_analysis

This is the code artifact of the paper A Public and Reproducible Assessment of the Topics API on Real Data

@inproceedings{topics_secweb24_beugin,
      title={A Public and Reproducible Assessment of the Topics API on Real Data},
      author={Yohan Beugin and Patrick McDaniel},
      booktitle={2024 IEEE Security and Privacy Workshops (SPW)},
      year={2024},
      month={may},
}

See also our other repository, topics_analysis.


Getting Started

  1. Clone this topics_api_analysis repository and the topics_classifier submodule at once with:
    • git clone --recurse-submodules git@github.com:yohhaan/topics_api_analysis.git (SSH)
    • git clone --recurse-submodules https://github.com/yohhaan/topics_api_analysis.git (HTTPS)

A Dockerfile is provided under .devcontainer/; for direct integration with VS Code or to manually build the image and deploy the Docker container, follow the instructions in this guide.

Reproduction Steps

Topics classification: run the bash script in the corresponding folder under ./data to classify each dataset with the Topics API:

  • CrUX: cd data/crux && ./crux.sh
  • Tranco: cd data/tranco && ./tranco.sh
  • Real Browsing Histories: cd data/web_data && ./web_data.sh

Topics evaluation: use the topics_simulator.py script to evaluate the Topics API (simulation of the API for users, denoising, and re-identification across epochs).
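As a rough illustration of what simulating the API for a user involves, the sketch below assumes the behavior described in the public Topics API proposal: in each epoch one topic is drawn from the user's top 5 topics and, with probability 0.05, replaced by a random topic from the taxonomy. The function names, the constants, and the toy taxonomy are hypothetical and are not taken from topics_simulator.py.

```python
import random

TOP_K = 5          # assumption: number of top topics kept per epoch
NOISE_PROB = 0.05  # assumption: probability of returning a random topic


def simulate_epoch(top_topics, taxonomy, rng):
    """Return (topic, is_noisy) as one caller might observe it for one epoch."""
    if rng.random() < NOISE_PROB:
        # Noisy answer: any topic of the taxonomy, uniformly at random.
        return rng.choice(taxonomy), True
    # Genuine answer: one of the user's top topics for this epoch.
    return rng.choice(top_topics[:TOP_K]), False


def simulate_user(top_topics_per_epoch, taxonomy, seed=0):
    """Simulate one observation per epoch for a single user."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [simulate_epoch(top, taxonomy, rng) for top in top_topics_per_epoch]


# Toy data: a hypothetical taxonomy of 100 topic ids and a user whose
# top topics are the same in each of 5 epochs.
toy_taxonomy = list(range(1, 101))
toy_user = [[3, 17, 42, 56, 99]] * 5
observations = simulate_user(toy_user, toy_taxonomy, seed=0)
```

Each entry of observations pairs the returned topic with a flag saying whether it was noise. A real observer does not see that flag, which is why the evaluation includes a denoising step.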

usage: python3 topics_simulator.py [-h]
                                   users_topics_tsv nb_epochs config_model_json top_list_tsv
                                   unobserved_topics_threshold repeat_each_user_n_times output_prefix

Simulate the Topics API and evaluate its privacy guarantees

positional arguments:
  users_topics_tsv
  nb_epochs
  config_model_json
  top_list_tsv
  unobserved_topics_threshold
  repeat_each_user_n_times
  output_prefix

Examples:

  • python3 topics_simulator.py data/web_data/users_topics_5_weeks.tsv 5 topics_classifier/chrome5/config.json data/crux/crux_202406_chrome5_topics-api.tsv 10 1 data/reidentification_exp/5_weeks_10_unobserved
  • python3 topics_simulator.py data/web_data/users_topics_5_weeks.tsv 5 topics_classifier/chrome5/config.json data/crux/crux_202406_chrome5_topics-api.tsv 10 100 data/denoise_exp/5_weeks_100_repetitions_10_unobserved
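When running many configurations, it can help to assemble these command lines programmatically. The sketch below is one possible way to do so: build_simulator_command is a hypothetical helper (not part of this repository) that places the seven positional arguments in the order given by the usage message and only builds the command without running it.

```python
import shlex


def build_simulator_command(users_topics_tsv, nb_epochs, config_model_json,
                            top_list_tsv, unobserved_topics_threshold,
                            repeat_each_user_n_times, output_prefix):
    """Return the argument list for one topics_simulator.py run.

    Hypothetical helper: the argument order follows the usage message.
    """
    return [
        "python3", "topics_simulator.py",
        str(users_topics_tsv),
        str(nb_epochs),
        str(config_model_json),
        str(top_list_tsv),
        str(unobserved_topics_threshold),
        str(repeat_each_user_n_times),
        str(output_prefix),
    ]


# Rebuild the first example from the list above.
cmd = build_simulator_command(
    "data/web_data/users_topics_5_weeks.tsv",
    5,
    "topics_classifier/chrome5/config.json",
    "data/crux/crux_202406_chrome5_topics-api.tsv",
    10,
    1,
    "data/reidentification_exp/5_weeks_10_unobserved",
)
command_line = shlex.join(cmd)  # shell-safe string form of the command
print(command_line)
```

The list form can be passed to subprocess.run(cmd, check=True) from the repository root to actually launch the run.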

Analysis: to extract statistics and plot the figures, refer to the analysis.py script.
