This repository has been archived by the owner on Oct 9, 2023. It is now read-only.
Snaplet uses Copycat to turn Personally Identifiable Information (PII) from your production database into output data that resembles the original value, yet does not allow the original value to be inferred.
On a large database, though, collisions in these output values were quite likely: two different input values in your database could easily share the same output value returned by Copycat. For example, if you had a table with 77,000 rows in it and used copycat.uuid() for a particular column, there was about a 50% chance of at least two rows sharing the same output value for that column.
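To see where a figure like this can come from, here is a minimal sketch of the standard birthday-bound approximation. The 32-bit output space is an assumption chosen because it reproduces the 77,000-row figure; the release notes do not state the size of the old hash, and `collisionProbability` is a hypothetical helper, not part of copycat.

```typescript
// Birthday-bound approximation: probability that at least two of `rows`
// distinct inputs land on the same output, given `outputs` equally likely
// output values. Uses p ≈ 1 - exp(-n(n-1) / (2N)).
function collisionProbability(rows: number, outputs: number): number {
  return 1 - Math.exp(-(rows * (rows - 1)) / (2 * outputs));
}

// ASSUMPTION: an effective output space of 2^32 values. This is not stated
// in the release notes; it is picked because it matches the quoted figure.
const assumedOutputs = 2 ** 32;

const pAt77k = collisionProbability(77000, assumedOutputs);
console.log(`Collision chance at 77,000 rows: ${(pAt77k * 100).toFixed(1)}%`);
```

Under that assumption the result lands very close to 50%, which matches the example above.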
In this release, we're using a newer version of copycat (0.6.0) that should make collisions significantly less likely: under the hood, copycat now uses MD5 alone for hashing.
Of course, this still depends on the data type. For example, collisions are significantly less likely for copycat.uuid() than for copycat.firstName(), simply because the range of possible output values is larger.
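The effect of the output range can be sketched by inverting the same birthday approximation to ask how many rows it takes to reach a 50% collision chance. The range sizes below are illustrative assumptions only: the 5,000-name pool is made up, and 2^128 simply stands in for a very large range (it is the size of an MD5 digest, not a documented property of copycat.uuid()). `rowsForCollisionChance` is a hypothetical helper.

```typescript
// Approximate number of rows at which the chance of at least one collision
// reaches `chance`, for `outputs` equally likely output values.
// Inverts p ≈ 1 - exp(-n^2 / (2N)) to n ≈ sqrt(2N * ln(1 / (1 - p))).
function rowsForCollisionChance(outputs: number, chance: number = 0.5): number {
  return Math.sqrt(2 * outputs * Math.log(1 / (1 - chance)));
}

// ASSUMPTIONS: none of these sizes come from the release notes.
const hypotheticalNamePool = 5000; // made-up number of distinct first names
const thirtyTwoBitRange = 2 ** 32; // matches the 77,000-row example
const veryLargeRange = 2 ** 128;   // size of an MD5 digest, used as a stand-in

const rowsForNames = rowsForCollisionChance(hypotheticalNamePool);
const rowsFor32Bit = rowsForCollisionChance(thirtyTwoBitRange);
const rowsForLarge = rowsForCollisionChance(veryLargeRange);

console.log(Math.round(rowsForNames), Math.round(rowsFor32Bit), rowsForLarge.toExponential(2));
```

With a small pool of names, a collision becomes likely after fewer than a hundred rows, while a very large range pushes that point far beyond any realistic table size.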
You can expect some more updates ahead with more details about the new collision probabilities for copycat.
⚠️ Heads up: the transformed values in any new snapshots will be entirely different from what they were before. For any given input value, copycat now generates a different output value than it previously did.