Data waterfalls and data avalanches

The concept is not a new one, and one of its main enablers has always been the ever-decreasing price of storage.

Even in the 1980s, data waterfalls and data avalanches were quite predictable; however, constantly improving storage technology shifted the focus away from the problem. Not much has changed since.

Data waterfall – a situation in which data flows into a system on a regular basis, but only a small part of it is used to power reporting and business decision making. The data is not partitioned and is treated as a whole, regardless of its constant growth. The general rule is: “let’s store the data somewhere; we might need it later, and then we’ll see what to do with it”.

Data avalanche – a data waterfall that has gotten out of hand. For example, take a system created in the early 2000s in which the data has never been ‘massaged’. Ten years later, we end up with a VLDB (very large database) with a number of tables containing billions of rows.

The solution:

The best solution is to take preventive measures in a timely manner. At the end of each period (month, quarter or year, depending on the amount of incoming data), scheduled maintenance is performed, during which the data is distributed according to Steinar’s graph.

Of course, the methodology used can vary depending on the SQL Server version and edition in use; for example, table partitioning was an Enterprise Edition feature until SQL Server 2016 SP1.
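As a rough, language-neutral illustration of the periodic maintenance described above, the Python sketch below picks a maintenance period from the volume of incoming data and groups rows into per-period buckets, which are the same boundaries a partitioned or archived layout would use. The thresholds and the function names (`choose_period`, `period_start`, `split_by_period`, `closed_periods`) are hypothetical, and the sketch does not reproduce Steinar’s graph. In SQL Server itself, the actual data movement would be done with whatever features your version and edition offer.

```python
from datetime import date

# Hypothetical thresholds (rows arriving per month) used to choose how often
# the maintenance runs; these are illustrative assumptions, not recommendations.
MONTHLY_THRESHOLD = 10_000_000
QUARTERLY_THRESHOLD = 1_000_000


def choose_period(rows_per_month: int) -> str:
    """Pick the maintenance period based on the amount of incoming data."""
    if rows_per_month >= MONTHLY_THRESHOLD:
        return "month"
    if rows_per_month >= QUARTERLY_THRESHOLD:
        return "quarter"
    return "year"


def period_start(d: date, period: str) -> date:
    """Return the first day of the period that contains d."""
    if period == "month":
        return date(d.year, d.month, 1)
    if period == "quarter":
        # Months 1-3 -> 1, 4-6 -> 4, 7-9 -> 7, 10-12 -> 10
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    if period == "year":
        return date(d.year, 1, 1)
    raise ValueError(f"unknown period: {period}")


def split_by_period(row_dates, period):
    """Group row dates into buckets keyed by the start of their period."""
    buckets = {}
    for d in row_dates:
        buckets.setdefault(period_start(d, period), []).append(d)
    return buckets


def closed_periods(row_dates, period, today):
    """Return only the buckets whose period ended before the current one;
    these are the candidates for the scheduled maintenance to move."""
    current = period_start(today, period)
    return {
        start: rows
        for start, rows in split_by_period(row_dates, period).items()
        if start < current
    }
```

For example, with about 2.5 million rows per month, `choose_period(2_500_000)` returns `"quarter"`, and `closed_periods([date(2023, 1, 15), date(2023, 2, 3), date(2023, 4, 9)], "quarter", date(2023, 5, 1))` returns only the first-quarter bucket, leaving the current quarter’s rows in place.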

The complete opposite:

Excessive data segregation: splitting the data into more partitions, tables or databases than the workload actually needs.

As in most situations, a good balance is key.

