Some questions on Data Generator

Discussion of open issues, suggestions and bugs regarding database management and development tools for SQL Server
Post Reply
shoecake
Posts: 4
Joined: Mon 09 Jan 2017 10:44

Some questions on Data Generator

Post by shoecake » Mon 09 Jan 2017 11:37

I am using the Data Generator on my first non-trivial database and I have a few questions.

1. I have the option to start from some live data, is there a way to perform updates and not inserts? it would make life for me much easier and also allow me to create more realistic dev/test data. (if I can do this, then the below questions should not be needed).

2. I have three bit fields, if either of the first two are true, then the third must be false. How would I go about doing this?

3. I have a related table to this which will have multiple records. The content of the records will be different based on these bit flags. How do I setup to insert multiple records based on the foreign key? (I could create separate datasets of these records which I could randomly pick one of.)

4. I have a column which is an address, but without the postcode. How do I assign multiple generator types to one field?

5. I have a date field in one table, ideally this should be populated with a date which is between 2 dates in a related table, can I create a SQL query which references a field in the record being created?

Any help on these would be greatly appreciated.

.jp
Devart Team
Posts: 345
Joined: Wed 09 Sep 2009 06:55
Location: devart

Re: Some questions on Data Generator

Post by .jp » Tue 10 Jan 2017 12:02

Hello,

Thank for the post.

1) Data Generator adds new data to the existing one. It is possible to truncate the table and generate a completely new data set. Please specify what did you mean by data refresh?

2) What condition do you use to generate data for the linked table? You can set the following values manually:
date_1 = from 1971 to 1973
date_2 = from 1982 to 1985
for date_3, you can specify a period (in the date generator) between date_1 and date_2 -- date_3 = from 1973 to 1982

3) You need to use the FK generator for this purpose. You can specify a linked table and a column there. Alternatively, you can use the Table or View generator. In the future, we are going to add the option that allows you to specify a data source - either parent table values or random values.

4) To combine several generators, you need the following:
- Select the RegExp generator
- Click the Generator button and add 2 or more generators to the RegExp editor window.

5) Here is the code for the Python generator (for 3rd column):

Code: Select all

import random
def main(config):
 
 # Check whether any field is null then return false.
 if column1 == DBNull.Value or column2 == DBNull.Value:
   return 0
 
 return not(eval(str(column1)) and eval(str(column2)))
Best Regards.

shoecake
Posts: 4
Joined: Mon 09 Jan 2017 10:44

Re: Some questions on Data Generator

Post by shoecake » Tue 10 Jan 2017 14:18

Thanks for the replies. That will certainly help me to get further on.

Just for clarity, on point 1, the live data I have is a combination of personal data and application specific data. I need to anonymise the columns with personal info in (e.g. names, email addresses, postal addresses etc) but the application specific data does not need to be changed. The ability to perform an UPDATE and not an INSERT sql query would make this task much easier as I would not have to re-generate the application specific data which is where the complexities with related tables comes in.

.jp
Devart Team
Posts: 345
Joined: Wed 09 Sep 2009 06:55
Location: devart

Re: Some questions on Data Generator

Post by .jp » Wed 11 Jan 2017 10:17

Hi!
Thanks for the reply.

If we understand you correctly, you need the data masking functionality. What a use case is applicable in your situation:

1. There is a copy of the production database. You run a tool that masks data in specified columns, with a specified algorithm in the same database.

--or--

2. There is an empty test database without data. You need to populate the test database with help of the data generator. While populating the test database, you need to grab data from the production database, obfuscate it, and pull to the test database.

dbForge Data Generator will not help you in the first case yet, but it helps you to implement the second scenario. To do it you need to use the python generator (that is built-in into dbForge Data Generator). You can use this python generator to create a custom script that will obfuscate data. Alternatively, you can use the SQL Query generator (that is built-in into dbForge Data Generator). You can use REPLACE SQL Server function (or other) to substitute symbols in required rows.

Could you please tell us what data-masking algorithm is suitable for you?

shoecake
Posts: 4
Joined: Mon 09 Jan 2017 10:44

Re: Some questions on Data Generator

Post by shoecake » Thu 12 Jan 2017 10:07

Thanks for your help on this. I have worked out how to solve my need.

I created new tables with just the columns I needed to change and then used these to generate the test data. For the primary key field I used a "Table or View" generator to create a one to one relationship between each row.

Then I used some SQL queries to update the tables I needed and to fix up a few issues.


The Data Generator is a pretty powerful tool, but I think the documentation on the capabilities of python generator could be improved. I really mean in how it can be used to implement logic between fields.

e.g. I wanted to return the contents of a file, the pseudo logic being "if a different boolean column is true then insert a random address into this field"

the python code I was trying is:

Code: Select all

def main(config):
   
	mygen = RegexGenerator( '$"UKStreets.txt"')

	# IsSale is another column in this record
	if IsSale: 
		return mygen.Generate()

I get the error "the file name is either not specified correctly or does not exist." I have tried a number of ways to specify the UKStreets.txt file including using the config["config_path"] variable but can't work this out.
Is this even possible? It seems that the script is processed per row, but returns an array of possible data to pick from.

I also wonder what order columns are processed in so this may not always work if the IsSale column is processed after the python column? Would it be better to be able to define a python script on a row basis which could then generate data for multiple columns at once? The idea being that the python script is run per row and manipulates data just for that row.

Just my suggestions.

.jp
Devart Team
Posts: 345
Joined: Wed 09 Sep 2009 06:55
Location: devart

Re: Some questions on Data Generator

Post by .jp » Thu 12 Jan 2017 14:43

Hello,

Thank you for the questions!

You may use the following script to implement the required logic (You need to replace bit_column_name to a required one.):

Code: Select all

import random

def getStreets(): 

  fileName = config["config_path"] +'\\'+  r"UKStreets.txt"  
  with open(fileName) as f:
    content = f.readlines()
    for row in content:
      yield row

streets = list(getStreets())

def main(config):
  if bit_column_name:
    return streets[random.randint(1, len(streets)-1)]
  else:
    return 'some else'
In case the Python script contains other columns, the dependency mechanism will take it into account and generate data in the correct order. So you can use other columns in the Python script, that appears before or after. So you can use other columns in the Python script, that appears before or after.

You can find some examples of Python generators in the following folder:
C:\Users\Public\Documents\Devart\dbForge Data Generator for SQL Server\Data Generators\
AGE.xml
Death Date.xml
Integrated Circuit Card ID (ICCID).xml
Python.DUNS.xml
Python.Gravatar.xml
Python.Language Code (ISO 639-1 alpha-2, UpperCase).xml
Python.Language Code (ISO 639-1 alpha-3, UpperCase).xml
Python.LoremPixel.xml
Python.MD5.xml
Also we described some cases of how to use Python generators in our blog:

http://blog.devart.com/how-to-generate- ... rator.html
http://blog.devart.com/getting-real-cur ... erver.html
http://blog.devart.com/generating-rrela ... erver.html

Feel free to post or email us if you have other questions, we'd like to help.

Best Regards,

shoecake
Posts: 4
Joined: Mon 09 Jan 2017 10:44

Re: Some questions on Data Generator

Post by shoecake » Fri 13 Jan 2017 09:47

Thanks for the info, that will help a lot.

Post Reply