Big Data Analytics in a Volume-Obsessed World

A PwC analyst, a Gartner analyst, and a Deloitte analyst are talking about their latest big data projects over lunch.

The PwC analyst says, “We’re ingesting petabytes of real-world data from a thousand different sources now. It’s going to take years to really grasp the full significance of the data!”

The Gartner analyst says, “We’ve gone one step further: We created virtual data sources that are even better than the real thing. Within two years we won’t even need real users to generate data!”

The Deloitte analyst says, “Heck, I think I’ve got you both beat. We just tell the client that the data we’ve generated is so sensitive, they aren’t cleared to see it.”

“And they still pay you?”

“Of course they do! Where else are they going to find data that good?!”

The Business Intelligence Tools of Tomorrow

The same story plays out across countless big data analytics projects: massive data volume fuels a kind of data arms race. The goal is to model user behaviour or to make some sort of market prediction, with analysts acting as latter-day oracles of Delphi.

All jokes aside, nobody is saying that applying business intelligence tools to big data projects is futile or fruitless. The triumphs of properly run big data projects are a matter of record. Enterprise analytics is what drives digital transformation, and you simply can’t do that without looking at the broad picture. Scope becomes one of the biggest factors in such projects, and the discipline to prevent scope creep is often the difference between success and failure.

But what about more ‘general’ big data projects, the ones that look at nebulous criteria like customer experience and market trends? What if we’re focusing on the wrong things? When did we lose our taste for relevance in the pursuit of mass ingestion?

It’s possible that volume is simply the trick that everyone has mastered at some level. It’s an easy sell, and our human instinct to hoard plays right into the ‘bigger is better’ argument. Never mind that the analysis time may very well exceed the data’s useful shelf life.

The truth is, data volume is meaningless without the right business intelligence tools to back it up. Creating a bigger, more impressive haystack doesn’t help the client when they’re still looking for the needle inside. Without intelligent pre-processing, source validation, and fraud detection, more data can often lead to unreliable conclusions.

Machine Learning and AI in Big Data Analytics

More isn’t always better. In fact, the constant battle against garbage data is what’s holding the industry back from some of the next great breakthroughs in big data analysis techniques.

At least we know where to start. There is a natural progression happening in the world of big data analytics, and although some firms are less eager to embrace it than others, the results that we’ve seen so far have been rather impressive.

The first business intelligence tool that every big data analytics project needs to consider is machine learning. This form of intelligence automation serves several vital purposes, and it is most effective when it is in place on day one of a project. The first, and possibly most important, function of machine learning is to vet the quality and validity of a data source as it is being ingested. Without this pre-filtering, garbage data can infiltrate a project… which is particularly harmful if AI is also being used (see below). The second function is to aid early decision-making through time series analysis. Knowing why a certain sequence of events takes place again and again is often more important than the repetition itself. Finally, the stronger pattern recognition that machine learning provides can shed a different light on a big data analytics project. The human mind often biases itself towards the outcome it wishes to see. Machine learning has no emotional investment in the result; as long as its training data is sound, it will report the patterns it actually finds rather than the ones its operators hope for.
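To make the first of these functions a little more concrete, here is a minimal Python sketch of vetting records as they are ingested. It is a deliberately simple statistical stand-in (a z-score filter fitted on a trusted reference sample), not a trained machine learning model, and the names `Record`, `vet_batch`, and `flag_sources`, as well as the default thresholds, are hypothetical choices for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Iterable, List, Set, Tuple


@dataclass
class Record:
    # Hypothetical record shape: the feed it came from and one numeric reading.
    source: str
    value: float


def vet_batch(
    records: Iterable[Record], trusted: List[float], z_max: float = 3.0
) -> Tuple[List[Record], List[Record]]:
    """Split incoming records into (accepted, quarantined).

    The baseline (mean and sample standard deviation) is fitted only on a
    trusted reference sample, so garbage in the new batch cannot shift it.
    """
    mu = mean(trusted)
    sigma = stdev(trusted)  # requires at least two trusted values
    if sigma == 0:
        raise ValueError("trusted sample has no spread; cannot score records")
    accepted: List[Record] = []
    quarantined: List[Record] = []
    for rec in records:
        z = abs(rec.value - mu) / sigma
        # Anything too far from the trusted baseline is held back for review.
        (quarantined if z > z_max else accepted).append(rec)
    return accepted, quarantined


def flag_sources(
    accepted: List[Record], quarantined: List[Record], max_reject_rate: float = 0.2
) -> Set[str]:
    """Return sources whose share of quarantined records exceeds max_reject_rate."""
    rejected = Counter(r.source for r in quarantined)
    total = rejected + Counter(r.source for r in accepted)
    # Counter returns 0 for sources with no rejections.
    return {s for s, n in total.items() if rejected[s] / n > max_reject_rate}


if __name__ == "__main__":
    trusted = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
    batch = [
        Record("feed_a", 10.1),
        Record("feed_a", 9.9),
        Record("feed_b", 250.0),
        Record("feed_b", 10.3),
    ]
    ok, bad = vet_batch(batch, trusted)
    print([r.value for r in bad])  # [250.0]
    print(flag_sources(ok, bad))   # {'feed_b'}
```

Fitting the baseline only on trusted data reflects the point about pre-filtering: if the filter learned its yardstick from the incoming batch, the garbage would help define what counts as normal. A production pipeline would more likely swap the z-score for a learned anomaly detector that scores many features at once, and would score sources over time rather than per batch.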

The second, and possibly the most interesting, business intelligence tool emerging for big data applications is artificial intelligence (AI). This is a step beyond the typical suite of business intelligence tools used on most projects, although marketers love to mislabel their use of machine learning as ‘AI’. True AI uses deep learning, training a neural network from day one of a project. It is incredibly important to feed the AI only trusted data while it establishes its core patterns. This is the ultimate ‘garbage in, garbage out’ situation. But once your AI is up and running, you have a core advantage over traditional business intelligence tools: a source of logic that is not necessarily modelled on the human brain. The alien nature of this artificial logic engine is what sets it apart from traditional analysis. It can capture non-linear relationships that conventional methods miss, and it may draw conclusions that change the course of your big data project.


In a world of buzzwords, where inaccuracies are rarely punished and everyone is allowed to push tired industry tropes, somebody needs to step up.

Big data analytics has been allowed to get away with the ‘bare minimum’ for nearly a decade now. Pushing at the walls of pure volume is neither impressive nor useful if the effective result is the same… and in some cases, even worse.

By embracing first machine learning and then AI in every new big data project, we as an industry can make significant strides forward. We can move beyond the merely traditional and into the genuinely effective.

The next great business intelligence tool for big data analysis may well be quantum computing, with the new kinds of data processing it promises, and it is lying just around the corner. Big data projects, meanwhile, are still playing catch-up. Anyone who isn’t exploring machine learning and AI for their next project is a dinosaur, and will not be ready for the rapidly expanding future of this industry.

Have we missed something?

Contact a VitrX IT and Computer Specialist today