Don't randomize the first characters #74

Closed
baloght opened this issue Jan 9, 2020 · 23 comments
Labels
good-first-issue A problem which is simple to fix and gets you to know the sources.

@baloght

baloght commented Jan 9, 2020

Hi,

great tool, works fine!
Unfortunately I am missing one feature: I would like to keep the first 3 chars/digits of a string.

5550123456789 --> 5551743025698

Is there any way to achieve this with the current code? If not, do you plan to add this feature later?

Cheers,
T

@realrolfje
Owner

Hi Balogh, thanks for trying out Anonimatron! The feature you are asking about can easily be implemented by writing a simple Anonymizer and putting it on the classpath of Anonimatron. I haven't documented this very well; this may be a good trigger to do just that.

If you have a bit of Java knowledge, you should be able to build a class like this which can generate the strings you want.

Let me know if this helps.
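
As an illustration of the kind of logic such a class could contain, here is a minimal, self-contained Java sketch that keeps the first few characters of a string and replaces the rest with random digits. The class name `PrefixKeepingDigitsSketch` and its `anonymize` method are hypothetical and not part of Anonimatron; wiring the logic into Anonimatron's own Anonymizer interface is not shown here.

```java
import java.util.Random;

/** Hypothetical helper, not part of Anonimatron. */
final class PrefixKeepingDigitsSketch {

    private PrefixKeepingDigitsSketch() {
    }

    /**
     * Returns a string of the same length as {@code original}. The first
     * {@code keep} characters are copied unchanged; every later character
     * is replaced by a random digit.
     */
    static String anonymize(String original, int keep, Random random) {
        if (keep < 0) {
            throw new IllegalArgumentException("keep must not be negative");
        }
        // Never copy more characters than the input actually has.
        int prefixLength = Math.min(keep, original.length());
        StringBuilder result = new StringBuilder(original.length());
        result.append(original, 0, prefixLength);
        for (int i = prefixLength; i < original.length(); i++) {
            result.append((char) ('0' + random.nextInt(10)));
        }
        return result.toString();
    }
}
```

Calling `PrefixKeepingDigitsSketch.anonymize("5550123456789", 3, new Random())` yields a 13-character string that starts with `555`; the remaining ten digits differ from call to call.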

@realrolfje added the good-first-issue label on Jan 13, 2020
@BElluu
Contributor

BElluu commented Jan 19, 2020

Hello. Can I take care of this task?

@realrolfje
Owner

You are more than welcome to. If you need help just give a shout!

@baloght
Author

baloght commented Jan 20, 2020

> Hello. Can I take care of this task?

Thank you that would be great. Let me know if I can help in testing.

@BElluu
Contributor

BElluu commented Jan 21, 2020

Hi @realrolfje
please give me advice: should I create a new class for that function or just add a new method to the CharacterStringAnonymizer class?

@realrolfje
Owner

Personally I'd start with a separate class, maybe use the CharacterStringAnonymizer as a superclass to get you started. In our case, I see we also need digits.

Maybe @baloght can tell us what type of data this is, like "phone number" or "address" or "name", so we know if we need only digits, only upper case characters, or mixed case.

@BElluu
Contributor

BElluu commented Jan 21, 2020

#77

@baloght Does it meet your needs?
@realrolfje Can you give me a code review?

@realrolfje
Owner

Great work @BElluu , I've added my comments in a code review.

@BElluu
Contributor

BElluu commented Jan 21, 2020

@realrolfje where can I see your comments? Because in my pull request I do not see any conversation. In the changed files I do not see any comments either :)

---EDIT---
nvm. I see your comments. I will check that tomorrow. Thanks! :)

@realrolfje
Owner

Yes sorry my fault, I posted the comment before I saved the review. Take your time, I haven't got a release schedule to keep :-)

@BElluu
Contributor

BElluu commented Jan 27, 2020

Hi @realrolfje Sorry but I only had time today to improve it. Could you verify if I did it correctly?

@baloght
Author

baloght commented Jan 28, 2020

> "phone number"

Hi, sorry for the late reply. The column contains phone numbers, so the data will always be digits.

@baloght
Author

baloght commented Jan 29, 2020

Hi,

thanks for working on it.
Just a note:
Correct me if I'm wrong, but based on the pull request the digits are changed to totally random values.
In that case, if we randomize the same phone number twice in the same run, we get different results.
I would like to keep the original RANDOMDIGITS behaviour, where you always get the same result with the same salt string.
Wouldn't it be easier to extend the RANDOMDIGITS type with parameter options?

@realrolfje
Owner

Hi Baloght, getting the same output for the same input is handled by Anonimatron, not by the anonymizer. This is because it knows (loads) its synonyms for each run.
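
To make that division of labour concrete, here is a minimal sketch of the general idea: a lookup table remembers the synonym produced for each original value, so the random generator is consulted only the first time a value is seen. The class `SynonymCacheSketch` and its method names are hypothetical; this is not Anonimatron's actual synonym handling, and persisting the table between runs is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical illustration, not Anonimatron's real synonym store. */
final class SynonymCacheSketch {

    // Maps each original value to the synonym generated for it.
    private final Map<String, String> synonyms = new HashMap<>();

    // Produces a fresh synonym; may be fully random.
    private final UnaryOperator<String> generator;

    SynonymCacheSketch(UnaryOperator<String> generator) {
        this.generator = generator;
    }

    /** Returns the stored synonym, generating one only on first use. */
    String synonymFor(String original) {
        return synonyms.computeIfAbsent(original, generator);
    }
}
```

With this arrangement the generator itself can return totally random values, and the same phone number still maps to the same synonym for as long as the table is kept.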

@realrolfje
Owner

Working on getting it working for you. It generates consecutive digit strings now, keeping the first x digits of the original string. I have two questions:

  1. What would you call this? That is, what are you anonymizing: is it a "Finland Phone Number", for instance?
  2. Is the format always consecutive digits, or are dashes and spaces possible, and do you want those in the output?

@baloght
Author

baloght commented Feb 10, 2020

I would like to call the function for MSISDNs (phone numbers from different countries), where the first x digits come from the country code and provider ID. I want to keep the original country codes and provider IDs.
e.g. 436701234567
36201234567

In my case the format is always digits only, but if you think it is worthwhile, other options for special characters could be added. Those might come in handy for others, and the tool would be even more customizable.

Maybe you won't like the idea, but I think it would be a nice feature if users could pass a pattern as a parameter for a given anonymization type.
For example, the pattern could be: XXXOOXXOOO
where X means keep the original character and O means replace it with an anonymized character.

With that pattern, the digits 5553433478 could become 5552133659 after anonymization. I think this would be far more customizable than a 'keep first or last 5 characters' method.
But again, this highly customizable feature is not needed for me, just an idea.
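
The pattern idea can be sketched in a few lines of Java. This is only an illustration of the proposal as described above, not the code that was eventually released: the class `MaskedDigitsSketch` is hypothetical, and the choice to replace any position that lies beyond the end of the pattern is an assumption made for this sketch.

```java
import java.util.Random;

/** Hypothetical illustration of the X/O pattern proposal. */
final class MaskedDigitsSketch {

    private MaskedDigitsSketch() {
    }

    /**
     * Applies {@code mask} to {@code original}: an 'X' keeps the original
     * character at that position, anything else replaces it with a random
     * digit. Positions past the end of the mask are replaced as well
     * (an assumption of this sketch).
     */
    static String anonymize(String original, String mask, Random random) {
        StringBuilder result = new StringBuilder(original.length());
        for (int i = 0; i < original.length(); i++) {
            boolean keep = i < mask.length() && mask.charAt(i) == 'X';
            if (keep) {
                result.append(original.charAt(i));
            } else {
                result.append((char) ('0' + random.nextInt(10)));
            }
        }
        return result.toString();
    }
}
```

For the input `5553433478` and the pattern `XXXOOXXOOO`, the result always has the form `555??33???`, where each `?` is a random digit. Marking a dash or space position with `X` leaves that separator untouched.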

@realrolfje
Owner

That is actually a brilliant idea! It makes it more flexible and also fixes my original question: If there is a dash or space in there, you can just mask it out and it will not be replaced with a number. I'll see if I can change it to your suggestions. I need to rename the class and type too, it will be worth it I think.

@realrolfje
Owner

I added it as a feature to the DigitStringAnonymizer we already have. Have a look at the javadoc of that method, does that look usable for you?

@baloght
Author

baloght commented Feb 12, 2020

Yes, it seems pretty fine.
I will try to build and test it in the next day.

@BElluu
Contributor

BElluu commented Feb 14, 2020

@realrolfje I saw you merged my branch. Sorry, but I was too busy to do anything ;/

@realrolfje
Owner

Hi @BElluu no problem, I was just a bit impatient, sorry ;-) Did you see Balogh's ideas about extending the DigitStringAnonymizer? It is a more flexible solution to Balogh's request. I may re-implement the loop to be faster, but I think this will do the trick. If you check out the feature/partial-character-anonymizer-cleanup branch you can play with it and let me know what you think.

@realrolfje
Owner

Re-implemented the loop: 3 times faster :-) Happy with that.

@realrolfje
Owner

realrolfje commented Feb 15, 2020

Released! Enjoy your new Anonymizer!
