Thursday, January 1, 2009

CSV Data Mix Project Page

Welcome to the CSV Data Mix project page. This page used to be hosted on SourceForge as a static page, but I would rather not maintain an informative project page as part of the project itself, so I decided to move the general "about" page to this blog.


The goal of this project is to provide a safe way to give application development teams more realistic test data. To do this, the project assumes that test applications take their input as comma-separated values (CSV). Real data used in a testing environment may be sensitive, so it needs to be concealed in some way to protect its privacy. This is where the CSV Data Mix application comes in.


So how does the CSV Data Mix application accomplish its goal? Easy. Given the actual data that must be concealed before it is distributed for whatever reason (e.g. testing), the application accepts the data as input, randomizes it, and then writes the result accordingly (to standard output or to a file). In the final output, each original column of data has been shuffled around.
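To make the basic idea concrete, here is a minimal sketch in Python. It is not the project's actual source code. It assumes that "shuffled around" means the values within each column are shuffled independently of the other columns, that the first row is a header that stays in place, and that every row has the same number of fields. The function names `mix_columns`, `read_rows`, and `write_rows` are hypothetical.

```python
import csv
import random
import sys


def mix_columns(rows, seed=None):
    """Hypothetical helper: return a copy of `rows` with each column shuffled independently.

    Assumes rows[0] is a header row and that all rows have the same width
    (zip() would silently drop extra fields from longer rows).
    """
    if not rows:
        return []
    header, body = list(rows[0]), rows[1:]
    rng = random.Random(seed)  # a fixed seed makes the shuffle repeatable
    # Transpose the data rows into columns so each column can be shuffled on its own.
    columns = [list(col) for col in zip(*body)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back into rows; values that shared an original row are now scattered.
    mixed = [list(row) for row in zip(*columns)]
    return [header] + mixed


def read_rows(path):
    """Hypothetical helper: read a CSV file into a list of rows (lists of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def write_rows(rows, path=None):
    """Hypothetical helper: write rows to `path`, or to standard output if no path is given."""
    if path is None:
        csv.writer(sys.stdout).writerows(rows)
    else:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
```

For example, `write_rows(mix_columns(read_rows("customers.csv")))` would print the mixed data to standard output, while `write_rows(mixed, "mixed.csv")` would write it to a file instead (both file names are placeholders). Shuffling each column separately keeps every real value, so the test data stays realistic, while breaking the link between values that originally appeared in the same row.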

The process requires a bit more detail and is best understood by reading the source code, but this is the basic idea. The project is written in Python and is open source software, published under the GNU General Public License. Please refer to the Python and GNU General Public License websites to learn more about each.

This approach to concealing data is not new; you may have heard of data masking or data transformation. Although these techniques differ in how they present the new data, they share the same goal of concealing sensitive data.

