v0.15.0 - The one with less collisions #99

justinvdm · 2022-09-20T10:38:28Z


Snaplet uses Copycat to turn Personally Identifiable Information (PII) in your production database into output values that resemble the originals but do not allow the original values to be inferred.

If you had a large database, though, collisions in these output values were quite likely: two different input values in your database could easily share the same output value returned by Copycat. For example, if you had a table with 77,000 rows and used copycat.uuid() for a particular column, there was about a 50% chance of at least two rows sharing the same output value for that column.
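To get a feel for where a figure like this comes from, here is a rough sketch of the birthday-problem approximation. It is not Copycat's code. The 32-bit output space is our own assumption, chosen because it reproduces the roughly 50% chance at 77,000 rows; the 128-bit case corresponds to the size of a full md5 digest and is only an upper bound on the usable output space.

```python
import math


def collision_probability(rows: int, output_space: int) -> float:
    """Approximate chance that at least two of `rows` distinct inputs share
    an output, assuming outputs are spread uniformly over `output_space`."""
    pairs = rows * (rows - 1) / 2
    # Birthday approximation: p ≈ 1 - exp(-pairs / output_space).
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / output_space)


# Assumption: a 32-bit output space (not stated in the release notes).
p_32_bit = collision_probability(77_000, 2**32)
# A full md5 digest has 128 bits.
p_128_bit = collision_probability(77_000, 2**128)

print(f"32-bit space:  {p_32_bit:.3f}")
print(f"128-bit space: {p_128_bit:.3e}")
```

Under these assumptions the first probability comes out close to one half, while the second is vanishingly small.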

This release uses a newer version of Copycat (0.6.0) that should make collisions significantly less likely: under the hood, Copycat now uses md5 alone for hashing.

Of course, this still depends on the data type. For example, collisions are significantly less likely for copycat.uuid() than for copycat.firstName(), simply because the range of possible output values is larger.
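The effect of the output range can be illustrated with a small experiment. This is a sketch only: `pick` is a hypothetical helper that maps an input to one of `pool_size` values via md5, and it is not how Copycat is implemented. The pool sizes and the example inputs are made up for illustration.

```python
import hashlib


def pick(input_value: str, pool_size: int) -> int:
    """Hypothetical helper: deterministically map an input to an index in a
    pool of `pool_size` possible outputs, using an md5 digest."""
    digest = hashlib.md5(input_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % pool_size


def count_collisions(inputs: list[str], pool_size: int) -> int:
    """Number of inputs whose output was already taken by another input."""
    outputs = {pick(value, pool_size) for value in inputs}
    return len(inputs) - len(outputs)


# Made-up input values standing in for a column of 1,000 distinct rows.
inputs = [f"user-{i}@example.com" for i in range(1_000)]

# A small pool (think: a fixed list of first names) versus a very large one.
small_pool = count_collisions(inputs, 500)
large_pool = count_collisions(inputs, 2**64)

print("collisions with 500 possible outputs:  ", small_pool)
print("collisions with 2**64 possible outputs:", large_pool)
```

With only 500 possible outputs, 1,000 inputs must collide at least 500 times regardless of the hash function, whereas a very large output range makes a collision extremely unlikely for the same inputs.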

You can expect some more updates ahead with more details about the new collision probabilities for copycat.

⚠️ Heads up: all transformed values in new snapshots will differ from those in earlier snapshots, because for any given input value, Copycat now generates a different output value than it previously did.
