The four Vs of Big data

Volume, variety, velocity and value are the four key drivers of the Big data revolution.

The exponential growth in data volumes is placing increasing strain on the conventional storage infrastructure of major companies and organisations.
According to Fortune magazine, up to 2003 the human race had generated just 5 exabytes (5 billion gigabytes) of digital data. The same amount was produced in just two days in 2011, and by 2013 we were generating more than that every 10 minutes. With the total volume of data stored on the world's computers doubling every 12 to 18 months, we truly live in the age of Big data.
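
To put a doubling time of 12 to 18 months in perspective, the short back-of-the-envelope calculation below assumes idealised exponential growth at a constant doubling time, which real-world data growth does not follow exactly. The helper name growth_factor is purely illustrative. Over ten years, doubling every 12 months multiplies the volume by 2^10 = 1,024, while doubling every 18 months multiplies it by roughly 2^6.67, or about 102.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Illustrative helper: factor by which a quantity grows over `years`
    if it doubles every `doubling_months` months (idealised exponential growth)."""
    return 2 ** (years * 12 / doubling_months)


# Compare the two ends of the quoted 12-to-18-month range over a decade.
for months in (12, 18):
    print(f"doubling every {months} months -> x{growth_factor(10, months):,.0f} in 10 years")
```

Even the slower end of the range implies roughly a hundredfold increase per decade, which is why storage designed for yesterday's volumes comes under strain so quickly.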

One of the defining characteristics of Big data is the variety of sources and content involved, which opens up a whole range of new opportunities to create value from this torrent of bits and bytes. Some sources are internal to the enterprise, like the list of customer purchases generated by a transaction processing system. Other sources, like tweets, geolocation data and public records, are external. And the data comes in different formats: some of it is structured like conventional database entries, some is semi-structured like images with metadata, and the rest is completely unstructured, like text, graphics, raw imagery (e.g. satellite imagery), audio files or streaming video.

To get the most out of Big data, all this content must be processed dynamically to generate immediate results. In other words, velocity is of the essence. With modern advances in analytical algorithms (Big analytics) and data transmission infrastructure, it is becoming possible to feed data into business processes on the fly. Certain kinds of data are useful only if they can be analysed as soon as they are generated. Online fraud, for example, needs to be detected immediately, and streaming video from traffic monitoring cameras needs to be analysed continuously to determine road traffic patterns in real time.

Big data also changes the value of data, both in a monetary sense and in terms of its usefulness. In a given situation, the usefulness of data depends on two factors. The first is quality, in other words the integrity and veracity of the information: the data may be incomplete or incorrect, or structured in a way that makes it hard to analyse, in which case the credibility of the source and the quality of the content need to be verified. The second is relevance: organisations undoubtedly store vast quantities of data, but it is much less clear which types of data are worth analysing. Preliminary investigations may be needed to separate the weak signals from the noise and clutter, and to identify the types of data with the potential to become "business drivers". Defining the objectives as early as possible is the best way to avoid spending resources on analysing data with little operational value.

Data can also be sold in its own right. Data that has been properly pre-processed, anonymised and standardised, or data that is scarce or hard to collect and so has high intrinsic value, is much sought after and can be monetised directly. In such cases, the data has value even before it has been analysed.