Friday, January 2, 2009

CSV Data Mix: Readme

The comma-separated values (CSV) file format is a tabular data format in which fields are separated by the comma character and may be quoted with the double quote character. If a field's value contains a double quote character, that character is escaped by writing it as a pair of double quote characters.
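The quoting and escaping rules can be seen with a short round trip. This is a minimal sketch using Python's standard csv module, not code from csvdatamix.py; the sample row is made up for illustration.

```python
import csv
import io

# Illustrative row: one field contains a comma, one contains double quotes.
row = ["Smith, John", 'said "hello"', "42"]

# Write the row to an in-memory buffer using the default CSV dialect.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
encoded = buffer.getvalue()

# Fields containing a comma or a double quote are wrapped in double quotes,
# and each embedded double quote is written twice.
print(encoded, end="")  # "Smith, John","said ""hello""",42

# Reading the text back recovers the original field values.
decoded = next(csv.reader(io.StringIO(encoded)))
```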
 
In some cases, the values stored in CSV files are sensitive or private information that should be concealed from the public. If the public needs similar information in CSV form and the information is deemed sensitive, the private values can be randomly shuffled and then presented to the public for use.

For example, suppose a team needs an organization's private customer information to test an application that will manipulate data in the same format. That customer data could be shuffled to conceal it before it is presented to the team. In this case, the public is the application's development team or testing team.
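This README does not spell out how the data is shuffled. One plausible reading of "randomly disorganized" is that each column is shuffled independently, so every value still appears somewhere in its column but no longer lines up with the other values from its original row. The sketch below follows that assumption; the function name shuffle_columns and its parameters are hypothetical and are not taken from the CSVDataMix class.

```python
import random

def shuffle_columns(rows, has_header=True, passes=1, rng=None):
    """Shuffle each column independently (assumed behaviour, not CSVDataMix's).

    rows is a list of equal-length lists, such as the output of csv.reader.
    """
    rng = rng or random.Random()
    rows = [list(r) for r in rows]
    if has_header and rows:
        header, body = rows[0], rows[1:]
    else:
        header, body = None, rows
    if not body:
        return rows

    # Transpose to columns; this assumes every row has the same number of fields.
    columns = [list(col) for col in zip(*body)]
    for _ in range(passes):
        for col in columns:
            rng.shuffle(col)

    # Transpose back to rows and put the header line back on top.
    shuffled = [list(r) for r in zip(*columns)]
    return ([header] if header is not None else []) + shuffled

# Made-up sample data for illustration.
sample = [
    ["name", "city"],
    ["Ann", "Oslo"],
    ["Bob", "Rome"],
    ["Cy", "Lima"],
]
mixed = shuffle_columns(sample, has_header=True, passes=5, rng=random.Random(0))
```

Note that shuffling alone does not remove any value from the file; it only breaks the association between the values of a record, which may or may not be enough concealment for a given data set.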

Files
csvmain.py:
  • Sample use and command line handling for CSVDataMix and CSVMap classes.
  • Type csvmain.py -h for options
csvdatamix.py:
  • Class definitions for CSVDataMix and CSVMap

Sample Use

Shuffle the data in rows.csv, which has headings, and write the result to shuffled.csv. Show progress, reshuffle 5 times, and don't print the actual data to the prompt.

$ python csvmain.py -i 5 -pqt rows.csv -o shuffled.csv

Create a map between rows.csv and shuffled.csv. Show progress and don't print the actual data to the prompt. In map mode, the map is always saved to a file.

$ python csvmain.py -mqp rows.csv shuffled.csv 
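The format of the map is not described here. As an illustration only, the sketch below assumes a map records, for every cell of the shuffled file, the row of the original file that the value could have come from. The function name build_map is hypothetical and does not reflect the CSVMap class.

```python
from collections import defaultdict, deque

def build_map(original, shuffled):
    """Return, per shuffled cell, an original row index holding the same value.

    Assumes both inputs are lists of equal-length rows without a header line
    and that the shuffled data is a per-column permutation of the original.
    A value missing from the original column raises IndexError.
    """
    n_cols = len(original[0]) if original else 0

    # For each column, remember which original rows hold each value.
    index = [defaultdict(deque) for _ in range(n_cols)]
    for r, row in enumerate(original):
        for c, value in enumerate(row):
            index[c][value].append(r)

    # Duplicate values are matched in original row order, so the map is
    # one valid mapping rather than necessarily the one the shuffle used.
    return [
        [index[c][value].popleft() for c, value in enumerate(row)]
        for row in shuffled
    ]

# Made-up data for illustration.
original = [["Ann", "Oslo"], ["Bob", "Rome"]]
shuffled = [["Bob", "Oslo"], ["Ann", "Rome"]]
row_map = build_map(original, shuffled)
print(row_map)  # [[1, 0], [0, 1]]
```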

