This code serves as a supplement to the paper titled Accessible Color Sequences for Data Visualization. It contains code for creating random color sets with a minimum perceptual distance enforced, for running a color-cycle aesthetic-preference survey, for creating a machine-learning aesthetic-preference model based on the survey results, and for creating final color cycles that balance accessibility with aesthetics. Additionally, it includes data resulting from the survey and the analysis.
The analysis was performed under Ubuntu 18.04 on a dual-socket system with Intel Xeon E5-2690 v4 CPUs and a Python 3.6.9 virtual environment defined by the requirements.txt file (installed with pip==21.1.1, setuptools==56.2.0, wheel==0.36.2).
Notebooks were executed with:
$ jupyter nbconvert --ExecutePreprocessor.timeout=-1 --ExecutePreprocessor.record_timing=False --to notebook --execute notebook.ipynb
The final results of the present analysis are located in the aesthetic-models/top-cycles.json file. The top six-color color cycle is:
["#5790fc", "#f89c20", "#e42536", "#964a8b", "#9c9ca1", "#7a21dd"]
The top eight-color color cycle is:
["#1845fb", "#ff5e02", "#c91f16", "#c849a9", "#adad7d", "#86c8dd", "#578dff", "#656364"]
The top ten-color color cycle is:
["#3f90da", "#ffa90e", "#bd1f01", "#94a4a2", "#832db6", "#a96b59", "#e76300", "#b9ac70", "#717581", "#92dadd"]
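Many plotting and analysis tools expect numeric RGB values rather than hex strings. The following is a minimal sketch of that conversion using only the Python standard library; the helper name hex_to_rgb is hypothetical and is not part of this repository.

```python
def hex_to_rgb(hex_color):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected '#rrggbb', got {hex_color!r}")
    # Parse each two-character hexadecimal channel in turn.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# The top six-color cycle listed above.
six_color_cycle = ["#5790fc", "#f89c20", "#e42536", "#964a8b", "#9c9ca1", "#7a21dd"]
rgb_cycle = [hex_to_rgb(color) for color in six_color_cycle]
```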
The set-generation directory contains scripts for generating color sets. The gen_color_sets.py script generates random color sets with perceptual-distance constraints. The sets used for the color-cycle survey can be regenerated with:
$ python3 gen_color_sets.py --num-colors 6 --min-light-dist 2 --min-color-dist 20 --include-bug
$ python3 gen_color_sets.py --num-colors 8 --min-light-dist 2 --min-color-dist 18 --include-bug
$ python3 gen_color_sets.py --num-colors 10 --min-light-dist 2 --min-color-dist 16 --include-bug
The final color sets can be regenerated with:
$ python3 gen_color_sets.py --num-colors 6 --min-light-dist 5.0 --min-color-dist 20 --max-j 80
$ python3 gen_color_sets.py --num-colors 8 --min-light-dist 4.2 --min-color-dist 18 --max-j 82
$ python3 gen_color_sets.py --num-colors 10 --min-light-dist 3.6 --min-color-dist 16 --max-j 84
Regenerating the color sets requires several thousand CPU hours. The --include-bug flag forces the script to include a bug that was present when the color sets used for the survey were generated, which affected how uniformly the color gamut was sampled.
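The idea of drawing random color sets subject to a minimum-distance constraint can be illustrated with simple rejection sampling. This is only a sketch under stated assumptions, not necessarily how gen_color_sets.py works: it uses plain Euclidean distance in an abstract 3-D cube as a stand-in for a perceptual color space, applies a single threshold rather than separate lightness and color thresholds, and all function names are hypothetical.

```python
import itertools
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_pairwise_distance(colors):
    """Smallest distance between any two colors in the set."""
    return min(euclidean(a, b) for a, b in itertools.combinations(colors, 2))


def random_color_set(num_colors, min_dist, rng, max_tries=100_000):
    """Draw random sets until one satisfies the minimum-distance constraint."""
    for _ in range(max_tries):
        # ASSUMPTION: a 0-100 cube stands in for the real color gamut.
        colors = [tuple(rng.uniform(0, 100) for _ in range(3))
                  for _ in range(num_colors)]
        if min_pairwise_distance(colors) >= min_dist:
            return colors
    raise RuntimeError("no valid set found; relax min_dist or raise max_tries")


rng = random.Random(0)
color_set = random_color_set(num_colors=6, min_dist=20, rng=rng)
```

Rejection sampling keeps every accepted set an unbiased draw from the constrained space, at the cost of discarding many candidates as the constraint tightens.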
The max_dist_seq.py script generates color cycles using the sequential-search method of Glasbey et al. (2007) extended to use CAM02-UCS and color-vision-deficiency simulations. The contents of Table 1 of the paper can be regenerated with:
$ python3 max_dist_seq.py --min-j 0 --max-j 100
$ python3 max_dist_seq.py --min-j 40 --max-j 90
This process is single-threaded, requires more than 60 GB of memory, and takes a day or two.
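A sequential search of this kind can be sketched as a greedy farthest-point selection: starting from a seed color, each step adds the candidate whose distance to its nearest already-chosen color is largest. The sketch below makes simplifying assumptions that differ from max_dist_seq.py: it uses Euclidean distance on a coarse grid rather than CAM02-UCS, omits the color-vision-deficiency simulations, and uses hypothetical function names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequential_search(candidates, seed, num_colors):
    """Greedily pick colors, each maximizing the distance to its nearest pick."""
    chosen = [seed]
    # For every candidate, track the distance to its nearest chosen color.
    nearest = [euclidean(c, seed) for c in candidates]
    while len(chosen) < num_colors:
        best = max(range(len(candidates)), key=lambda i: nearest[i])
        chosen.append(candidates[best])
        # Only the newly added color can shrink a candidate's nearest distance.
        nearest = [min(d, euclidean(c, candidates[best]))
                   for d, c in zip(nearest, candidates)]
    return chosen


# ASSUMPTION: a coarse 5x5x5 grid stands in for the sampled color gamut.
grid = [(x, y, z)
        for x in range(0, 101, 25)
        for y in range(0, 101, 25)
        for z in range(0, 101, 25)]
cycle = sequential_search(grid, seed=(0, 0, 0), num_colors=4)
```

Updating the nearest-distance list incrementally keeps each step linear in the number of candidates, although a fine grid still requires holding every candidate in memory.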
The survey directory contains the code used to run the color-cycle survey. The to_hcl.py script was used to sort the color sets generated in the previous step by hue, then chroma, then lightness to be used for the survey. The main.go file contains the server backend code, which records survey responses to a text-based log file. The static files for the survey's frontend can be regenerated using npm run build.
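Sorting by hue, then chroma, then lightness amounts to sorting on a three-part key. The sketch below illustrates this for the colors within one set, which is an assumption about what to_hcl.py sorts. It also substitutes the standard library's HLS conversion for a true HCL color space, using saturation as a rough proxy for chroma, so its ordering will not match the script's in general. The function name is hypothetical.

```python
import colorsys


def hls_sort_key(hex_color):
    """Return a (hue, saturation, lightness) key for a '#rrggbb' string."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
    # colorsys returns the components in hue, lightness, saturation order.
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    # ASSUMPTION: saturation approximates chroma for illustration only.
    return (hue, saturation, lightness)


colors = ["#0000ff", "#ff0000", "#00ff00"]
sorted_colors = sorted(colors, key=hls_sort_key)
# sorted_colors == ["#ff0000", "#00ff00", "#0000ff"]
```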
The survey-results directory contains the results of the color-cycle survey. The parse-log.ipynb notebook contains the code used to parse the survey results log file, along with summary statistics of the survey user sessions. The results.db SQLite database, which was created by that notebook, contains the survey responses. To protect the privacy of the survey respondents, neither the log file used to create the results database nor the database's sessions table is included in this data release. Country codes were derived from partial IP addresses using the 2019-12-24 release of the MaxMind GeoLite2 Country geolocation database. The SHA-256 hashes of the results.log and GeoLite2-Country_20191224.mmdb files (which are not included in this repository) are 4f4519dfb17e0cb459bb4f16d1656d46d893d56304c5ef7c136d81d851088cc8 and 9aedbe59990318581ec5d880716d3415e40bfc07280107112d6d7328c771e059, respectively.
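Someone who holds a copy of one of these files could compare it against the published hash. The following is a minimal sketch using the standard library's hashlib module; the function name and the chunk size are illustrative choices, not part of this repository.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, sha256_of_file("GeoLite2-Country_20191224.mmdb") should return the second hash listed above if the file matches the one used in the analysis.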
The validation-survey directory contains the code used to run the validation survey, which is a slightly modified version of the main color-cycle survey code. The main.go file contains the server backend code, which records survey responses to a text-based log file. The static files for the survey's frontend can be regenerated using npm run build.
The validation-survey-results directory contains the results of the validation survey. The parse-log.ipynb notebook contains the code used to parse the survey results log file, along with summary statistics of the survey user sessions. The results.db SQLite database, which was created by that notebook, contains the survey responses.
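The table layout of the results.db files is not described here, so a reasonable first step is to inspect the schema before writing queries. The following is a minimal sketch using Python's built-in sqlite3 module; the function name is hypothetical, and no table or column names are assumed.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the database."""
    with closing(sqlite3.connect(db_path)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA table_info yields one row per column; index 1 is the name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
```

Note that sqlite3.connect creates an empty database file if the path does not exist, so it is worth confirming the path first.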
The color-name-model directory contains a reanalysis of the probabilistic color-name model described in Heer & Stone (2012). The post-process-heer-stone.ipynb notebook contains this analysis, which finds and merges synonymous color names and eliminates rarely-used color names. The colornamemodel.npz file contains the data of the post-processed model.
The aesthetic-models directory contains the code and weights used to create and evaluate machine-learning models for aesthetic preference of color sets and cycles. The set-analysis.ipynb notebook contains the code used to create and train the color set model, while the set-evaluation.ipynb notebook contains the code used to evaluate it. The cycle-analysis.ipynb notebook contains the code used to create and train the color cycle model, while the cycle-evaluation.ipynb notebook contains the code used to evaluate it. The weights subdirectory contains the model weights for both models. The numpy-version subdirectory contains a script for converting the weights from the TensorFlow format to a NumPy-compatible format and an example implementation for evaluating the model in NumPy, without the need for TensorFlow. The set-scores.npz file contains the scores for each of the input color sets, while the top-sets.json file contains the color sets with the highest scores. The cycle-scores.npz file contains the scores for each ordering of the color sets with the highest scores, and the top-cycles.json file contains the color cycles with the highest scores, which are the final results of the present analysis. The additional-evaluation.ipynb notebook examines the various score ranges.
Note that the v1.0 release had a data loading bug, which affected the cycle model. Although it affected the model accuracy, it did not affect the final color cycles. The original model should not be used.
The other-analysis directory contains code for performing other analyses on the data. The max-min-dists.ipynb notebook contains code for finding the color sets with the maximum minimum-perceptual distances, and the min-max-dist-sets.json file contains the results of this analysis. The set-analysis-linear-model.ipynb notebook contains an analysis of the survey responses based on a linear model. The result-plots.ipynb notebook generates the tables and plots for this repository's companion paper. The compare-pair-preference-scores.ipynb notebook evaluates the effectiveness of the aesthetic pair-preference model described in Gramazio et al. (2017) on the survey data. The notebook can be executed from the src directory of Colorgorical using Git commit 90649656a57ce9743b00390473adce51a821cadc. Unfortunately, the Colorgorical repository does not include licensing information. The cycle-comparison.ipynb notebook evaluates the accessibility of a variety of existing color cycles and compiles the results into a table. The linear-ml-set-comparison.ipynb notebook performs a statistical comparison of the accuracies of the linear model and the machine-learning model for sets.
The lightness-analysis directory contains a reanalysis of the scatter plot accessibility data collected by Smart & Szafir (2019). The lightness-analysis.ipynb notebook contains the code used to perform this analysis, which looks at how scatter-plot-marker lightness affects response times. The color-shape_data_processed.csv file has a SHA-256 hash of c0fe510d1b673d26a9fa1085fc5da3017b27cc3814c6e587a1800a7a647c778d but is not included in this repository as it was not distributed with explicit licensing information.
- Added validation survey and results
- Fixed data loading bug in cycle model: this affected the model but did not affect the final cycles
- Added evaluation of score ranges
- Added statistical comparison of linear model and machine learning model for sets
- Moved NumPy model out of notebook
- Initial release.
The code contained in this repository is distributed under the MIT License.
The included color-vision-deficiency simulation and color-distance calculation implementations (set-generation/color_conversions.py) are based on Colorspacious, which is MIT licensed.
The color-name-model/c3_data.json file is from C3 (Categorical Color Components), which is distributed under a 3-Clause BSD License.
The resulting data files (including, but not limited to, the final color cycles and the survey-results/results.db file) are released into the public domain using the CC0 1.0 Public Domain Dedication.
This code was written by Matthew A. Petroff (ORCID:0000-0002-4436-4215).
The included scatter plot lightness analysis is based on data collected by:
Stephen Smart and Danielle Albers Szafir. 2019. Measuring the Separability of Shape, Size, and Color in Scatterplots. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM Press, New York, 14 pages. doi:10.1145/3290605.3300899
The included color naming model is based on work by:
Jeffrey Heer and Maureen Stone. 2012. Color naming models for color selection, image editing and palette design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas) (CHI '12). ACM Press, New York, 1007--1016. doi:10.1145/2207676.2208547
which used data collected by Randall Munroe's xkcd color survey.
A comparison is made to a pair-preference model developed by:
Connor C. Gramazio, David H. Laidlaw, and Karen B. Schloss. 2017. Colorgorical: Creating discriminable and preferable color palettes for information visualization. IEEE Transactions on Visualization and Computer Graphics 23, 1 (Jan. 2017), 521--530. doi:10.1109/tvcg.2016.2598918
The color-vision-deficiency simulation is based on:
Gustavo M. Machado, Manuel M. Oliveira, and Leandro A. F. Fernandes. 2009. A Physiologically-based Model for Simulation of Color Vision Deficiency. IEEE Transactions on Visualization and Computer Graphics 15, 6 (Nov. 2009), 1291--1298. doi:10.1109/tvcg.2009.113