Monday, September 6, 2010

The Model Is In The Data

I've seen a lot of table definitions. I've designed custom definitions. I've come to understand out-of-the-box definitions. I've written a metadata storage tool. I've worked to understand an enterprise and model it. I've written SQL. I've written ETL processes. I know how to go parallel with data.

The more I work with data, the more I am reminded that the model lives in the data itself rather than in the model defined for the storage. How do I know this? Because the model always changes. It evolves and grows. New ideas are spawned. New applications are applied. These "new" aspects are then integrated into the base model.

What is the best way to reorganize a model? Refactor it? Migrate it?

I think it should be left alone, if designed correctly. Why can't two models exist? Three? Four? Each model describes the same data, the same values that fulfill the many models.

Is this chaotic? I think it's real.

I am continuously reassured that the purest model for dynamic models is the map (keys and values). It accounts for mistakes and mis-design, and it allows the errors to remain and the corrections to move forward in the same storage.
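Here is a minimal sketch of what I mean by errors remaining while corrections move forward. Everything in it is an assumption for illustration: a plain Python dict stands in for the map, and the key layout ("customer:42:adress"), the misspelling, and the get_address helper are all made up.

```python
# A plain dict stands in for the key/value storage (illustrative only).
store = {}

# Version 1 of the model misspelled an attribute name.
store["customer:42:adress"] = "12 Elm St"

# Version 2 corrects the name. The old entry is left where it is.
store["customer:42:address"] = "12 Elm Street"


def get_address(customer_id):
    """Hypothetical reader: prefer the corrected key, fall back to the legacy one."""
    for attr in ("address", "adress"):
        value = store.get(f"customer:{customer_id}:{attr}")
        if value is not None:
            return value
    return None
```

Nothing was migrated or cleaned up. The mistake is still there to learn from, and new readers simply look at the corrected key first.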

Is it confusing to have a blend of models? Guess what: the blend is the new model, not what you designed on paper. The model is pulled from the reality of how the data evolves and is applied. If you want to learn from the data, don't force it in a direction of correction and cleanup.

Where are the relationships, the dependencies, the cardinality? It's all in the key design. The values can be keys that re-link to new values to give depth and relationships. Computationally hard? You're likely using the wrong tools and approach. Different fetching for different models in the same data? Sounds right.
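A rough sketch of relationships living in the key design, again with a plain Python dict as the map. The key naming scheme and both fetch functions are hypothetical; they only show a value acting as a key, and two different "models" reading the same data.

```python
# One map, two views of the same data (all keys are illustrative).
store = {
    "order:1001:customer": "customer:42",   # the value is itself a key prefix
    "order:1002:customer": "customer:42",
    "customer:42:name": "Ada",
    "customer:42:orders": ["order:1001", "order:1002"],
}


def customer_name_for_order(order_id):
    """Order-centric fetch: follow the value as a key to the customer."""
    customer_key = store[f"order:{order_id}:customer"]
    return store[f"{customer_key}:name"]


def order_count(customer_id):
    """Customer-centric fetch: cardinality is the length of the linked list."""
    return len(store.get(f"customer:{customer_id}:orders", []))
```

The order-to-customer link and the customer-to-orders link are both just entries in the map, so each model fetches its own way without the other getting in its path.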

I guess my point is: reconsider the map over a complex relational model the next time you build the model for an application. You likely need a set of attributes and not an exhaustive normalized model that you hope will not change, because production refactoring of the model is a bitch. You'll also avoid the "just in case" parts of your model, like "field1" on your entity design. You won't need it in the map, since your model can change on the fly as you need.
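As a small illustration of changing the model on the fly, assume each entity is just a set of attributes in a map. The product records, the "color" attribute, and the describe helper below are invented for the example.

```python
# Each entity is a set of attributes; there is no reserved "field1" column.
products = {
    "product:1": {"name": "Widget", "price": 9.99},
}

# A later requirement introduces a new attribute, on new records only.
products["product:2"] = {"name": "Gadget", "price": 19.99, "color": "blue"}


def describe(key):
    """Hypothetical reader that tolerates records written under the older model."""
    record = products[key]
    color = record.get("color", "unspecified")
    return f"{record['name']} ({color})"
```

The older record was never altered, and the newer one did not need a placeholder waiting for it.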
