V’s of Big Data #2

We have talked about the 4 V’s here – V’s of Big Data. This time we are looking at validity, variability, volatility, visualisation and value. Let’s get into it!

So far, we have read about the collection and processing of data. Once collection and processing are done, the next step is to ensure that whatever we have collected is of good quality – and if it is not, our job is to make it so! The three V’s that ensure the integrity of data are validity, variability and volatility.

Integrity of Data

Validity – When collecting data, one of the properties of big data is veracity, which means the data can come in different forms even within the same format. So I can have structured data where dates appear in several different formats. To maintain integrity while processing this data, we ensure that validity is maintained, which means the data being used is valid, consistent, and not corrupt.
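
To make that concrete, here is a minimal Python sketch of a validity check – the date formats and record values are made up for illustration. It accepts only dates that match a known format and flags everything else as invalid:

```python
from datetime import datetime

# Hypothetical set of date formats we expect to see in incoming records.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y"]

def parse_date(raw):
    """Return a datetime if the value matches a known format, else None."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None  # could not be parsed: flag as invalid / possibly corrupt

records = ["2023-01-15", "15/01/2023", "15-Jan-2023", "not-a-date"]
valid = [r for r in records if parse_date(r) is not None]
invalid = [r for r in records if parse_date(r) is None]
print("valid:", valid)
print("invalid:", invalid)
```

Anything that lands in the invalid bucket gets fixed or rejected before it pollutes the rest of the pipeline.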

Variability – This feature of big data links back to the velocity of data. We can have huge volumes of data, but the volume and influx of that data can vary wildly. This usually happens with unstructured data. Do you remember that time Jennifer Lopez joined Instagram, and Instagram broke? Well, if not broke, then strained! Unstructured data often comes with high variability: there can be different peak hours or peak months, and these can shift over time!
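
A quick sketch of what spotting that variability might look like – the event timestamps below are invented, but the idea is simply counting events per hour to see where the influx peaks:

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps, e.g. posts or page hits.
events = [
    "2023-06-01 09:05", "2023-06-01 09:40", "2023-06-01 09:55",
    "2023-06-01 13:10", "2023-06-01 21:02", "2023-06-01 21:48",
]

# Count events per hour to see where the influx peaks.
per_hour = Counter(datetime.strptime(e, "%Y-%m-%d %H:%M").hour for e in events)
peak_hour, peak_count = per_hour.most_common(1)[0]
print(f"peak hour: {peak_hour}:00 with {peak_count} events")
```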

Volatility – We all know that there is a huge amount of data generated by every organisation, but not all of it stays relevant; some of it does, some does not! For example, the sentiment of individuals changes every year, if not every two years! There is no point in storing older data on a topic that maybe no one is even talking about any more! When we refer to the volatility of data, we mean that the organisation decides how long its data is going to stay relevant, and how far back in time it wants to go to analyse it. Weather data could have low volatility, while data for a tweet could have high volatility!
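
In practice that decision often becomes a retention rule. Here is a tiny sketch, assuming made-up retention windows, where weather data is kept far longer than tweets:

```python
from datetime import datetime, timedelta

# Hypothetical retention windows: how long each kind of data stays relevant.
RETENTION_DAYS = {"weather": 3650, "tweet": 30}

def is_still_relevant(kind, created_on, today=None):
    """Keep a record only while it is inside its retention window."""
    today = today or datetime.now()
    return today - created_on <= timedelta(days=RETENTION_DAYS[kind])

print(is_still_relevant("tweet", datetime(2020, 1, 1)))    # False: long past 30 days
print(is_still_relevant("weather", datetime(2020, 1, 1)))  # True: weather keeps its value longer
```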

Visualisation of Data

What is data if I cannot visualise it, right? I don’t want to run a query every time just to see the top 3 countries for my product, or wait an hour to get answers to all the questions I have about my data! To solve this, data visualisation tools have come into the market – Power BI, Tableau, Cognos and others!

Visualisation – With this characteristic of data, we mean that all the complex numbers are read from graphs instead of raw reports! And these visuals are built in a manner that can be understood in one go!
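
Even without a BI tool, the idea is simple. Here is a small sketch mirroring the “top 3 countries for my product” question, with invented sales numbers standing in for the query result:

```python
import matplotlib.pyplot as plt

# Hypothetical sales-per-country numbers standing in for the query result.
sales = {"India": 120, "USA": 95, "Brazil": 80, "Germany": 60, "Japan": 45}

# Pick the top 3 countries and render them once as a chart
# instead of re-running the query every time.
top3 = sorted(sales.items(), key=lambda kv: kv[1], reverse=True)[:3]
countries, values = zip(*top3)

plt.bar(countries, values)
plt.title("Top 3 countries by product sales")
plt.ylabel("Units sold")
plt.show()
```

Tools like Power BI and Tableau do the same thing at scale, with refreshes and dashboards built in.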

Worth of Data

We have been saying this a lot: there is a huge amount of data being generated, and we need to store it and process it. But with this much getting generated, it is important to know what is relevant, and for how long! Also, not all data is the same!

Value – When we talk about the value of data, we mean that we need to extract useful information from a high stack of data. In addition, the data that sounds so useful now can become irrelevant, or slightly less relevant, in later years! We may still want to keep it, but not pay much for it! Think of it like this: you have 10 folders, and only 2 of them are needed for the next 6 months. What do we do? We zip the other 8, keep them somewhere cheap, and work on the 2. We keep the data, but we compress it so it takes less space. This compression and archival is one of the most important strategies in big data projects.
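
A minimal sketch of that folder example, assuming a hypothetical root path and a 6-month activity window – folders untouched for longer simply get zipped:

```python
import shutil
import time
from pathlib import Path

ACTIVE_DAYS = 180  # folders untouched for ~6 months get archived

def archive_stale_folders(root):
    """Zip folders that have not been modified recently; leave the active ones alone."""
    cutoff = time.time() - ACTIVE_DAYS * 24 * 3600
    for folder in Path(root).iterdir():
        if folder.is_dir() and folder.stat().st_mtime < cutoff:
            # Creates <folder>.zip next to the original; removing the original
            # afterwards is a separate, deliberate decision.
            shutil.make_archive(str(folder), "zip", root_dir=folder)

# archive_stale_folders("/data/projects")  # path is hypothetical
```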

For example, say you are paying for Oracle, and you also have Hadoop! Since you are paying for the storage in Oracle, you keep the everyday, frequently used data in Oracle, and move the data that may not be needed often into Hadoop! There is one more aspect to it: the importance of the data! If some data is essential to me and I cannot afford to lose it, it has to be stored in a more protected manner! If some data does not really impact my business, it can live somewhere cheaper, with less stringent protection!
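
The routing rule behind that can be tiny. This is only a sketch, with made-up thresholds, of how a hot/cold tiering decision might look:

```python
# Hypothetical routing rule: frequently used or critical data goes to the paid,
# well-protected store (Oracle here); colder data goes to cheaper storage (Hadoop here).
def choose_store(days_since_last_access, business_critical):
    if business_critical or days_since_last_access <= 30:
        return "oracle"   # paid, protected, fast to query
    return "hadoop"       # cheaper bulk storage for rarely used data

print(choose_store(days_since_last_access=2, business_critical=False))    # oracle
print(choose_store(days_since_last_access=400, business_critical=False))  # hadoop
```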

With this, we complete our understanding of the V’s of Big Data! Big data is anything that has variety and veracity, that comes in huge volumes and with high velocity, and that needs to be validated, with its variability and volatility taken care of! It is so huge that even after we process it, to understand it we need to visualise it, and more importantly, store it with strategic care!

Foolishly Yours,

Avantika Tanubhrt
