The concept of data storage is as old as the world: the folklore and the storytelling in ancient societies were in a way a kind of a ‘data storage’. The stories were rolling down in time from generation to generation.
The difference between the folklore and the contemporary data mart is, however, the fact that in the folklore the data was selectively stored, depending on relevance and interests; in the modern data stores, we have laws which prohibit us from deleting any data, no matter how uninteresting it might be.
I remember this funny picture on the first pages of a data warehouse manual: there were 2 people standing next to a huge furry ball (representing the data) and one of them says “Let’s shake this thing and see what we come up with…”
So, here is how your data needs to be approached:
Edit: the graph above is called ‘The Steinar’s graph’, since it will be refered to in later concepts.
This classification could be a starting point for your decision making when it comes to: data partitioning, storage design, storage pricing, server farm design, SQL Server edition choice, preventing data waterfalls and data avalanches.
So, unless you want to be stuck in a data avalanche with little or no chances for escape – treat your data as you would like to be treated.
Be nice to your data.
Edit: recently I read this great article and after writing a few comments on the partitioning subject I realized, that there is a 5th type of data: the one that we use but we don’t need. The non-clustered indexes.
Since the non-clustered indexes are just collections of pointers and can be built at any time, then we most likely do not need them in the backups, even though they are great to have in the production database. (I am quite disappointed that SQL Server does not cover this yet)