
Obfuscating data from one database to create another.

Levent
Solved by Sauron

Hi,

I have a database that I extracted from a production database. It contains sensitive information, and I blanked out most of the sensitive data before even pulling it off the server. The issue I am facing is that I need some of the data to be obfuscated or mixed instead of blanked out, so that it can be used in the test environment.

Let's say I have two columns, Make and Model. I would be fine if the Make in one row is replaced with a random Model from another row.

Any ideas?


You could query each column separately and in random order, keeping track only of which indexes have already been queried and not which entries correspond to them. Then insert the results next to each other.
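A minimal sketch of that idea, assuming SQLite through Python's standard `sqlite3` module. The function name `shuffle_columns` and the table names `cars` and `cars_test` are made up for illustration. Each column is read on its own, shuffled independently, and the shuffled columns are written side by side into a new table.

```python
import random
import sqlite3


def shuffle_columns(conn, src, dst, columns, seed=None):
    """Copy `columns` from table `src` into a new table `dst`,
    shuffling every column independently of the others.

    Hypothetical helper. Table and column names are interpolated
    into the SQL, so they must come from trusted configuration.
    """
    rng = random.Random(seed)
    shuffled = []
    for col in columns:
        # Read one column at a time so the original pairing is never kept.
        values = [row[0] for row in conn.execute(f'SELECT "{col}" FROM "{src}"')]
        rng.shuffle(values)
        shuffled.append(values)

    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{dst}" ({col_list})')
    # zip(*shuffled) puts the shuffled columns next to each other as rows.
    conn.executemany(
        f'INSERT INTO "{dst}" ({col_list}) VALUES ({placeholders})',
        zip(*shuffled),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (make TEXT, model TEXT)")
    conn.executemany(
        "INSERT INTO cars VALUES (?, ?)",
        [("Ford", "Focus"), ("Toyota", "Corolla"), ("Honda", "Civic")],
    )
    shuffle_columns(conn, "cars", "cars_test", ["make", "model"])
    for row in conn.execute("SELECT make, model FROM cars_test"):
        print(row)
```

Note that a random shuffle can leave some values in their original position, so a few rows may still end up with their real Make and Model pairing. Every value from the source also survives in the output, so this only breaks the link between columns and does not hide the values themselves.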


We had something like that at my previous employer, which we used when we asked customers for a database dump. You could specify tables, columns, and a cycle count. It would then randomly select two rows/columns from the table/column set and swap their contents. With a high enough cycle count, the data was sufficiently anonymized for developers to use as test data.

So basically:

  1. Specify a set of tables and associated columns you want to mix
  2. Select two random tables from tables as table1 and table2
  3. Select random columns from table1.columns and table2.columns
  4. Select a random index for each table (based on total count)
  5. Swap their contents
  6. Repeat n times

It's quite possible n was selected automatically based on the number of database entries. As far as I remember, the software didn't keep track of the things it had already swapped. I think the argument was that this actually improves randomness, because otherwise the amount of data to choose from shrinks over time.
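The steps above can be sketched roughly as follows, again assuming SQLite and Python's `sqlite3` module. This is a simplified variant and not the original tool: each cycle picks one configured table and column, then swaps that column's value between two random rows, so swapped values always stay type-compatible. The name `swap_cycles` and the `targets` mapping are hypothetical. As in the description, nothing records which rows were already swapped.

```python
import random
import sqlite3


def swap_cycles(conn, targets, cycles, seed=None):
    """Run `cycles` random swaps.

    `targets` maps a table name to the list of columns that may be mixed.
    Hypothetical helper. Names are interpolated into the SQL, so they must
    come from trusted configuration. Relies on SQLite's implicit rowid,
    so it does not work on WITHOUT ROWID tables.
    """
    rng = random.Random(seed)
    tables = list(targets)
    # Collect the row ids once; swaps never add or remove rows.
    rowids = {
        t: [r[0] for r in conn.execute(f'SELECT rowid FROM "{t}"')]
        for t in tables
    }
    for _ in range(cycles):
        table = rng.choice(tables)
        if len(rowids[table]) < 2:
            continue  # nothing to swap in this table
        column = rng.choice(targets[table])
        a, b = rng.sample(rowids[table], 2)
        select = f'SELECT "{column}" FROM "{table}" WHERE rowid = ?'
        update = f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?'
        val_a = conn.execute(select, (a,)).fetchone()[0]
        val_b = conn.execute(select, (b,)).fetchone()[0]
        # Swap the two values. No history of earlier swaps is kept.
        conn.execute(update, (val_b, a))
        conn.execute(update, (val_a, b))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (make TEXT, model TEXT)")
    conn.executemany(
        "INSERT INTO cars VALUES (?, ?)",
        [("Ford", "Focus"), ("Toyota", "Corolla"), ("Honda", "Civic")],
    )
    # Assumption: scale the cycle count with the number of rows.
    swap_cycles(conn, {"cars": ["make", "model"]}, cycles=30)
    for row in conn.execute("SELECT make, model FROM cars"):
        print(row)
```

Because the swaps happen in place, this should be run on the extracted copy, never on the production database.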
