import { copycat } from '@snaplet/copycat'
copycat.email('foo')
// => '[email protected]'
copycat.email('bar')
// => '[email protected]'
copycat.email('foo')
// => '[email protected]'
Many of the use cases we aim to solve with snaplet involve anonymizing sensitive information. In practice, this involves replacing each bit of sensitive data with something else that resembles the original value, yet does not allow the original value to be inferred.
To do this, we initially turned to faker for replacing the sensitive data with fake data. This approach took us quite far. However, we struggled with getting the replacement data to be deterministic: we found we did not have enough control over how results are generated to be able to easily ensure that for each value of the original data we wanted to replace, we'd always get the same replacement value out.
Faker allows one to seed a pseudo-random number generator (PRNG), such that the same sequence of values will be generated every time. While this means the sequence is deterministic, the problem was that we did not have enough control over where the next value in the sequence was going to be used. Changes to the contents or structure of the original data we're replacing, and changes to how we use faker, both affected how this sequence was consumed, which in turn affected the replacement value for any particular value in the original data. In other words, we had determinism, but not in a way that was useful for our purposes.
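To make the problem concrete, here is a minimal sketch of sequence-based replacement. It does not use faker itself: mulberry32 is a small seedable PRNG used as a stand-in, and anonymizeBySequence is a hypothetical helper written only for this illustration.

```javascript
// mulberry32: a small seedable PRNG, used here only as a stand-in for a seeded generator.
function mulberry32(seed) {
  let a = seed >>> 0
  return function () {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical helper: replace each row with the next value from a seeded sequence.
function anonymizeBySequence(rows, seed) {
  const next = mulberry32(seed)
  const result = new Map()
  for (const row of rows) {
    result.set(row, `user${Math.floor(next() * 1e6)}`)
  }
  return result
}

const before = anonymizeBySequence(['alice', 'bob'], 42)
const after = anonymizeBySequence(['zed', 'alice', 'bob'], 42)

// Same seed, same sequence, but the first value now goes to 'zed',
// and 'alice' receives the value that 'bob' had before.
console.log(before.get('alice') === after.get('zed')) // true
console.log(before.get('bob') === after.get('alice')) // true
```

Adding a single row ahead of 'alice' shifts every later replacement, even though the seed and the sequence itself are unchanged.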
What we really needed was not the same sequence of generated values every time, but the same mapping to generated values every time.
This is exactly what we designed copycat to do. For each method provided by copycat, a given input value will always map to the same output value.
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
// => '[email protected]'
copycat.email('bar')
// => '[email protected]'
copycat.email('foo')
// => '[email protected]'
Copycat works statelessly: for the same input, the same value will be returned regardless of the environment, process, call ordering, or any other external factors.
Under the hood, copycat hashes the input values (in part relying on md5), with the intention of making it computationally infeasible for the input values to be inferred from the output values.
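As a toy illustration of a stateless input-to-output mapping, the sketch below derives an email-like string from a hash of the input. This is not copycat's implementation: fakeEmail is a hypothetical function, and FNV-1a is a simple non-cryptographic hash used only as a stand-in, so this sketch does not make inputs infeasible to infer from outputs.

```javascript
// FNV-1a (32-bit): a simple non-cryptographic hash, used only as a stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Hypothetical maker: the output depends only on the input,
// never on call order or on any shared state.
function fakeEmail(input) {
  const hash = fnv1a(String(input))
  return `user${hash.toString(16)}@example.com`
}

const first = fakeEmail('foo')
fakeEmail('bar') // an unrelated call in between has no effect
const second = fakeEmail('foo')

console.log(first === second) // true
```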
It is still technically possible to make use of faker or similar libraries that offer a deterministic PRNG, with some modification. That said, these solutions come with practical limitations that we decided made them less viable for us:
- It is possible to simply seed the PRNG for every identifier, and then use it to generate only a single value. This seems to be a misuse of these libraries though: there is an up-front cost to seeding these PRNGs that can be expensive if done for each and every value to be generated. Here are benchmarks that point to this up-front cost.
- You can generate a sequence of N values, hash identifiers to some integer smaller than N, then simply use that as an index to look up a value in the sequence. This can even be done lazily. Still, you're now limiting the uniqueness of the values to N. The larger N is, the larger the cost of keeping these sequences in memory, or the more computationally expensive it is if you do not hold onto the sequences in memory. The smaller N is, the less unique your generated values are.
Note though that for either of these approaches, hashing might also still be needed to make it infeasible for the inputs to be inferred from the outputs.
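The second approach can be sketched as follows, to show where the uniqueness limit comes from. The pool contents, the lookup helper, and the use of FNV-1a as the hash are all assumptions made for illustration; a real setup would fill the pool from a seeded PRNG and would likely use a cryptographic hash.

```javascript
// FNV-1a (32-bit): a simple non-cryptographic hash, used only as a stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// A pre-generated pool of N replacement values, built once and kept in memory.
const N = 4
const pool = Array.from({ length: N }, (_, i) => `name-${i}`)

// Hypothetical helper: hash the identifier to an index smaller than N.
function lookup(identifier) {
  return pool[fnv1a(identifier) % N]
}

const ids = ['id-1', 'id-2', 'id-3', 'id-4', 'id-5', 'id-6']
const distinct = new Set(ids.map(lookup))

// However many identifiers come in, at most N distinct values can come out.
console.log(distinct.size <= N) // true
```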
All copycat functions take in an input value as their first parameter:
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
// => '[email protected]'
The given input can be any JSON-serializable value. For any two calls to the same function, if the input given in each call serializes down to the same value, the same output will be returned.
Note that unlike JSON.stringify(), object property ordering is not considered.
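One way to get order-insensitive serialization is to sort object keys before serializing. The stableStringify helper below is a hypothetical sketch of that idea, assuming JSON-serializable input; it is not taken from copycat's source.

```javascript
// Hypothetical helper: serialize JSON-like values with object keys sorted,
// so that property order does not affect the result.
function stableStringify(value) {
  if (Array.isArray(value)) {
    // Array order is part of the value, so it is preserved.
    return `[${value.map(stableStringify).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

const a = { id: 1, name: 'foo' }
const b = { name: 'foo', id: 1 }

console.log(JSON.stringify(a) === JSON.stringify(b)) // false
console.log(stableStringify(a) === stableStringify(b)) // true
```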
A re-export of faker from @faker-js/faker. We do not alter faker in any way, and do not seed it.
Takes in an input value and an array of values, and returns an item in values that corresponds to that input:
copycat.oneOf('foo', ['red', 'green', 'blue'])
// => 'red'
Takes in an input value and a function fn, calls that function repeatedly (each time with a unique input) for a number of times within the given range, and returns the results as an array:
copycat.times('foo', [4, 5], copycat.word)
// => [ 'Raeko', 'Vame', 'Kiyumo', 'Koviva', 'Kiyovami' ]
As shown above, range can be a tuple array of the minimum and maximum possible number of times fn should be called. It can also be given as a number, in which case fn will be called exactly that number of times:
copycat.times('foo', 2, copycat.word)
// => [ 'Raeko', 'Vame' ]
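The behaviour described above can be approximated with a short sketch. timesSketch and label are hypothetical names, FNV-1a is a stand-in hash, and the way the count and the per-call inputs are derived is an assumption for illustration rather than copycat's actual logic.

```javascript
// FNV-1a (32-bit): a simple non-cryptographic hash, used only as a stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Hypothetical version of a times-style helper.
function timesSketch(input, range, fn) {
  // A plain number means "exactly this many times".
  const [min, max] = Array.isArray(range) ? range : [range, range]
  // Pick a count in [min, max] from a hash of the input.
  const count = min + (fnv1a(`count:${input}`) % (max - min + 1))
  // Give each call its own derived input so the items can differ.
  return Array.from({ length: count }, (_, i) => fn(`${input}:${i}`))
}

// Hypothetical maker used only to produce some output per input.
const label = (input) => `item-${fnv1a(input).toString(16)}`

const items = timesSketch('foo', [4, 5], label)
console.log(items.length >= 4 && items.length <= 5) // true
console.log(timesSketch('foo', 2, label).length) // 2
```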
Takes in an input value and returns an integer.
copycat.int('foo')
// => 2196697842
- min=0 and max=Infinity: the minimum and maximum possible values for returned numbers
Takes in an input value and returns a boolean.
copycat.bool('foo')
// => false
Takes in an input value and returns a number value with both a whole and decimal segment.
copycat.float('foo')
// => 2566716916.329745
Takes in an input value and returns a string with a single character.
copycat.char('foo')
// => 'M'
The generated character will be an alphanumeric: lower and upper case ASCII letters and digits 0 to 9.
Takes in an input value and returns a string with a single digit value.
copycat.digit('foo')
// => '2'
Takes in an input value and returns a string with a single hex value.
copycat.hex('foo')
// => '2'
- min=0 and max=Infinity: the minimum and maximum possible values for returned numbers
Takes in an input value and returns a string representing a date in ISO 8601 format.
copycat.dateString('foo')
// => '1982-07-11T18:47:39.000Z'
- minYear=1980 and maxYear=2019: the minimum and maximum possible year values for returned dates
Takes in an input and returns a string value resembling a uuid.
copycat.uuid('foo')
// => '540b95dd-98a2-56fe-9c95-6e7123c148ca'
Takes in an input and returns a string value resembling an email address.
copycat.email('foo')
// => '[email protected]'
Takes in an input and returns a string value resembling a first name.
copycat.firstName('foo')
// => 'Alejandrin'
Takes in an input and returns a string value resembling a last name.
copycat.lastName('foo')
// => 'Keeling'
Takes in an input and returns a string value resembling a full name.
copycat.fullName('foo')
// => 'Zakary Hessel'
Takes in an input and returns a string value resembling a phone number.
copycat.phoneNumber('foo')
// => '+3387100418630'
Note: the strings resemble phone numbers, but will not always be valid. For example, the country dialing code may not exist, or for a particular country, the number of digits may be incorrect. Please let us know if you need valid phone numbers, and feel free to contribute :)
Takes in an input and returns a string value resembling a username.
copycat.username('foo')
// => 'Zakary.Block356'
Takes in an input value and returns a string value resembling a password.
copycat.password('foo')
// => 'uRkXX&u7^uvjX'
Note: not recommended for use as a personal password generator.
Takes in an input and returns a string value representing a city.
copycat.city('foo')
// => 'Garland'
Takes in an input and returns a string value representing a country.
copycat.country('foo')
// => 'Bosnia and Herzegovina'
Takes in an input and returns a string value representing a fictitious street name.
copycat.streetName('foo')
// => 'Courtney Orchard'
Takes in an input and returns a string value representing a fictitious street address.
copycat.streetAddress('foo')
// => '757 Evie Vista'
Takes in an input and returns a string value representing a fictitious postal address.
copycat.postalAddress('foo')
// => '178 Adaline Forge, Moreno Valley 8538, Haiti'
Takes in an input and returns a string value representing a country code.
copycat.countryCode('foo')
// => 'BV'
Takes in an input and returns a string value representing a time zone.
copycat.timezone('foo')
// => 'Asia/Tbilisi'
Takes in an input value and returns a string value resembling a fictitious word.
copycat.word('foo')
// => 'Kinkami'
- capitalize=true: whether or not the word should start with an upper case letter
- minSyllables=2 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain
copycat.word('id-2', {
  minSyllables: 1,
  maxSyllables: 6,
  unicode: 0.382
})
// => 'Rayuashira'
Takes in an input value and returns a string value resembling fictitious words.
copycat.words('foo')
// => 'Niko vichinashi'
- min=2 and max=3: the minimum and maximum possible number of words that returned strings will contain.
- capitalize='first': whether or not the words should start with upper case letters. If true or 'all' is given, each word in the returned string will start with an upper case letter. If 'first' is given, only the first word of each returned string will start with an upper case letter. If false is given, each returned string will contain only lower case letters.
- minSyllables=1 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain
Takes in an input value and returns a string value resembling a sentence of fictitious words.
copycat.sentence('foo')
// => 'Kiraevavi somani kihy viyoshi nihahyke kimeraeni.'
- minClauses=1 and maxClauses=2: the minimum and maximum possible number of clauses that a returned sentence will contain.
- minWords=5 and maxWords=8: the minimum and maximum possible number of words that each clause will contain.
- minSyllables=1 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain
Takes in an input value and returns a string value resembling a paragraph of fictitious words.
copycat.paragraph('foo')
// => 'Vakochiko ke rako kimuvachi hayuso mi vako kaichina, mishi mukaimo hakin va racea. Raechime miko kaimo keki shi navi makin yomehyha, na hya nano kin yokimo rae ra. Ke chi kakinaki kakorae machi. Raeva ka kaiko muvani ka racea kaichiyuchi muvinota, sokaiyu komechino shiso yuha raeraceaki kin chitavi. Kokaiashi chirako rae muyo vachi mukani nakoyuta kinmochikai, muhamuva hy mayushita ke shimo takinka notavi kinvayo.'
- minSentences=3 and maxSentences=7: the minimum and maximum possible number of sentences that a returned paragraph will contain.
- minClauses=1 and maxClauses=2: the minimum and maximum possible number of clauses that each sentence will contain.
- minWords=5 and maxWords=8: the minimum and maximum possible number of words that each clause will contain.
- minSyllables=1 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain
Takes in an input value and returns a string value resembling an IPv4 address.
copycat.ipv4('foo')
// => '166.164.23.159'
Takes in an input value and returns a string value resembling a MAC address.
copycat.mac('foo')
// => 'e1:2c:54:74:b7:80'
Takes in an input value and returns a string value resembling a browser User Agent string.
copycat.userAgent('foo')
// => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.3; Trident/3.1; .NET CLR 1.2.39149.4)'
Note: for simplicity, this currently works off a list of 500 pre-defined user agent strings. If this is too limiting for your needs and you need something more dynamic, please let us know, and feel free to contribute :)