Sunday, March 30, 2008

My Google Reader Feeds

If you are interested, the following is what I am reading via Google Reader.

Wednesday, March 26, 2008

Why are you in data?

I had a conversation the other day in which I explained what I do at work. After my explanation, I was asked, "Why are you in data?"

This question was asked because my focus at work is very data-oriented. I am on a team that worries about data quality, data governance, data reuse, data modeling, metadata, data visualization, the meaning of information, how people think about information, how information is stored, the lineage of information, redundant data, and so on. This is a lot, by the way. We really care about information: where it is, what it means, and how it is being used in comparison to its meaning, all in an effort to make better use of it for better decision making.

So given this list of topics, why would someone ask me why I am in data? Because I explained solving these data problems from a people-oriented perspective. I feel a lot of problems around information usage stem from people problems. For example, people are not provided the right information, or they misinterpret what is provided to them because it isn't easy to understand.

This isn't the user's fault. A data problem is never the user's fault. Misunderstanding of information is rooted in bad delivery of that information to the user. Whether through conversation or application screens, information needs to be presented in a way that makes sense to everyone so they use it properly. Misuse is what drives a lot of our team's interests listed above. The only reason we are interested in those areas is that people make bad decisions based on the information found in them. Now we want to examine that problem and make it better.

To make it better, writing new software, buying new software, or re-architecting data and software is an obvious need... sometimes. But we also need to do a better job of understanding individuals and how they think about information. With that understanding in hand, a developer has the opportunity to realize that their own presumptions about the problem may be incorrect.

How are people misunderstanding a given set of information? The only way to answer this question is through interviews and examination. The observations extracted from that exercise will hopefully be a step in truly solving information problems starting with people first.

Monday, March 24, 2008

tags

Just a junk post, these are my tags...

What "social" did, or can do, for "computer science"

I was in class today, and we were talking about determining evolutionary trees in phylogenetics. While I was supposed to be paying attention, I realized what "social" has done, or can do, for "computer science". From data structures (trees and graphs) to data manipulation (searching and sorting), the social graph on the internet has created a tremendous amount of test data for applying computer science concepts in the classroom. Think about how much more fun sorting would have been if it was done on your friends on Facebook or MySpace. I think if computer science professors would somehow work today's big social concepts into these exercises, students might take a bigger interest in class.
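To make that concrete, here is a rough sketch of the kind of classroom exercise I have in mind. The friend graph, the names in it, and both function names are made up for illustration; nothing here talks to a real social network. It sorts one person's friends by how many friends they have in common, then uses a breadth-first search to count the hops between two people.

```python
from collections import deque

# Hypothetical friend graph: each person maps to the set of their friends.
FRIENDS = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann"},
    "eve": {"cat"},
}


def friends_by_mutuals(graph, person):
    """Sort a person's friends by how many friends they share, most first."""
    def mutual_count(friend):
        return len(graph[person] & graph[friend])

    # Descending mutual count, then name, so ties come out in a stable order.
    return sorted(graph[person], key=lambda f: (-mutual_count(f), f))


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: fewest friendship hops from start to goal."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == goal:
            return hops
        for friend in sorted(graph[person]):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # the two people are not connected


if __name__ == "__main__":
    print(friends_by_mutuals(FRIENDS, "ann"))
    print(degrees_of_separation(FRIENDS, "dan", "eve"))
```

The same two ideas, a sort with a custom key and a graph traversal, are what a textbook would teach with abstract nodes and integers. Swapping in people you know is the only change.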

See also:

Monday, March 10, 2008

Digital Organisms

I was sitting in class the other day and had a thought.

Before I detail this thought, some background: the class I am taking is a graduate course at Cleveland State University titled "Bio-informatics", or "Computational Molecular Biology". To be brief, the purpose of the course is to apply computer science to the many data processing problems that arise when analyzing biological information. The objective is to determine the function and purpose of biological material and to understand the evolutionary history of that material. Take gene analysis, for example. Genes are studied to identify their function in an organism. Modification and manipulation of a gene over time, or in a lab, yields outcomes that let us understand the function of that gene. This is a very simple example, but the point is made. If you want to learn more, please read up on the topic; a starting point might be some links below.

Now for my thought... can't digital information be studied the same way?

Think about it... Let's use DNA as the comparison. DNA is made up of four elements, or nucleotides, which form the alphabet used to essentially write sentences defining the governing rules of an organism. Within an organism there are long strands of DNA, called chromosomes, that detail the entire genetic structure of that organism. We study the DNA sequence of the chromosomes to determine which genes do what (function). For example, eye color, hair color, height, and facial expressions are all defined there.

Now, let's compare that DNA to a data source. These sources can be anything: a file, a database table, a document, a photo, etc. All digital information is made up of the same fundamental alphabet. All digital information is created, in a sense, has a life cycle, and is destroyed. The creation and destruction can either be absolute or continue from another cycle. Assuming we know nothing about a piece of digital information, how can we come to understand its function, its purpose, and why it ever existed?

We can do so the same way we come to understand genetic material and function. Let's use an example of data transformation. Given three data sources, how can we determine how two of them were combined to form the third? To be direct, assume two data sources as inputs and a single data source as the outcome (similar to human reproduction, where a male and a female combine to form a child). The logic connecting the two sources to the target can be thought of as the combination process that turns the sources into the target.

We now have a map. If we can determine the mapping between the sources and the target, we understand how they combined to form that target, which is a part of its evolutionary history. This sheds light on what the functions and pieces of the sources are, given that they were broken down and reused in the target. The sources will always share a similarity with the target, since the target was the result of their combination. We've also done so without any prior examination of the transformation logic or the meaning of the originating data.
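A minimal sketch of how such a map might be recovered, under a big assumption: the transformation copied values from the sources into the target unchanged, so shared values are evidence of lineage. The function name `infer_lineage`, the table layout, and the sample data are all hypothetical. Each target column is compared against every source column using Jaccard similarity (shared distinct values divided by all distinct values), much like scoring the similarity of two sequences.

```python
def infer_lineage(sources, target):
    """For each target column, find the source column whose values overlap most.

    sources: {source_name: {column_name: [values]}}
    target:  {column_name: [values]}
    Returns {target_column: (source_name, column_name, score)}, with None
    for a target column that shares no values with any source column.
    """
    lineage = {}
    for t_col, t_values in target.items():
        t_set = set(t_values)
        best = None
        for s_name, columns in sources.items():
            for s_col, s_values in columns.items():
                s_set = set(s_values)
                union = t_set | s_set
                # Jaccard similarity: shared values over all distinct values.
                score = len(t_set & s_set) / len(union) if union else 0.0
                if score > 0 and (best is None or score > best[2]):
                    best = (s_name, s_col, score)
        lineage[t_col] = best
    return lineage


# Hypothetical "parents" and "child" data sources.
SOURCES = {
    "customers": {"cust_id": [1, 2, 3], "name": ["Ann", "Bob", "Cat"]},
    "orders": {"order_no": [101, 102, 103], "amount": [9.5, 20.0, 7.25]},
}
TARGET = {
    "customer": ["Ann", "Bob", "Cat"],
    "total": [9.5, 20.0, 7.25],
    "region": ["east", "west", "east"],
}

if __name__ == "__main__":
    for column, origin in infer_lineage(SOURCES, TARGET).items():
        print(column, "<-", origin)
```

A column like `region`, which matches nothing in either source, comes back as `None`: the target has material its known parents cannot explain. Values that were aggregated, reformatted, or computed would also come back unmatched under this exact-overlap assumption, so a real attempt would need fuzzier comparisons.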

And so begins the accumulation of sequence information about digital material, similar to biological material. Each examination builds toward the whole, eventually leading to a complete sequence of digital information, or a classified organism of digital information, such as a business unit, an identity, or any digital organism.

See Also:

The Objective

Hello, and welcome. If you are reading this post, it is likely you want to understand the purpose of this site and its reason to exist amongst so many others. Simply put, I have something to say, just like everyone else. The difference is, I hope to educate and open some different lines of thought that may not have been covered, or at least may not be so mainstream. If what I speak to is common, I simply haven't done my homework. Or I am reinforcing a great idea, which always makes the originator feel good.

What will I discuss? Mainly software, technology, and science-oriented... stuff. These are my interest areas, my profession, and my hobby. I'll do my best to be direct and to the point. If something is already detailed in a way I like, I will link to it; if not, I will re-word it and link to it.

I hope you enjoy what is to come. If not, please read anyway. It is likely that your negativity is backed by misunderstanding or by preconceived knowledge that is not complete, and you just don't know about it.
