Working in the world of Big Data can be exhausting. As soon as you get your head around one complex technology, another three spring up behind you to replace it. While you were looking the other way, head buried in code, the state-of-the-art goalposts moved a little bit further into the future.
It’s easy to resent the next generation of tech when it renders all the strife you’ve been through to master the current generation worthless. Until, of course, you realise that this next wave solves so many of the problems you’ve been wrestling with and that, by embracing change, your life becomes much easier and more opportunities open up. I have lost track of the hours I have spent walking round and round the block, trying to grapple with the mechanics of early Hadoop MapReduce code, building up a library of routines and patterns that could be called upon to solve most problems. Then along came Apache Spark and suddenly it all became so much easier. Damn you, in-memory Resilient Distributed Datasets, making my life better and my code run faster!
I’m joking, of course. Well, mostly. Progress in Big Data technology has been staggering. A few years ago the challenge with Big Data projects was often defined by technical practicalities: how do we store this amount of data, and how do we even begin to process it? We don’t have to search too far back in time to remember how technically challenging, and thus costly, this was. However, as we reach the end of 2016, the toolsets now available, together with a significant focus by cloud platforms (in particular Amazon Web Services and Microsoft Azure) on automating much of the heavy lifting, mean that there is now a smorgasbord of cost-effective architectures that enable companies of all sizes to work with Big Data. Demonstrating this shift, a recent Gartner report stated that, “In 4 years 90% of all data will be on Next generation technology”. From start-ups to multinationals, if you want to work with Big Data, you can.
Many companies have now been through the pain (and cost) of experimenting with Big Data projects. Through these platform iterations we’ve reached a technological level where we can now deploy reasonably cost-effective solutions of staggering scale. But so what? We’ve collected this data, we’ve deployed scalable infrastructure, but what’s the benefit?
It’s a fair question. Why do this stuff? Perhaps due to the historic technical difficulties, wide scope and far-reaching possibilities, Big Data projects have tended to be led by Engineers or Data Scientists rather than by Management. Even the famous three Vs of Big Data (Velocity, Volume, Variety) describe the nature of the data, rather than what we are trying to do with it. The three Vs are great, but so what? They are not exactly something you can put in a board-level presentation and expect to leave your audience with a sense of direction. Over the years the three Vs have evolved … to the four Vs. We’ve added Veracity! I’ve even seen the slightly tongue-in-cheek Ten Vs of Big Data as well…
“Vast, Volumes of Vigorously, Vexingly,
Variable, Verbose, yet Valuable, Visualised
high Velocity data, from Silicon Valley”
This is not helping. The only sane response from anybody trying to run a business is “OK, we have lots of data. So what?”
We Big Data experts really need to start thinking of a better definition: something that others can engage with and that guides us towards what we are trying to achieve. We need to stop the ‘so what’ question from coming back to bite us on our behinds.
Big Data is the attempt to gain competitive advantage by exploiting new ways of storing and processing data.
“Ah ha!” our audience now says. “I’d quite like to gain a competitive advantage. That sounds advantageous!”
“It is,” we reply. “It can be most advantageous indeed.”
“But how, and what sort of advantage would I be looking to gain?”
“What a good-looking question,” we respond.
And indeed it is a good question. We can now start discussing the hopes, dreams and worries of the organisation, and how data solutions can be put to use to address each of them in a targeted, costed and, most importantly, effective way. Invariably in business, most hopes, dreams and worries come down to creating more revenue, improving operational efficiency, reducing risk or driving business change (in order to create more revenue, improve operational efficiency and reduce risk). The key is that, by starting from a new, better definition, all data projects, big or small, will now be driven by a focused need. So what? Well, there will be no more “So whats” for a start.
Gartner (again) predicts that by 2020, 75% of large and midsize organisations will compete using advanced analytics and proprietary algorithms. Deploying this type of tech throughout a whole organisation is no small feat; the challenge is both technical and one of business transformation. Tackling both these aspects of data projects is increasingly moving into the domain of Big Data projects. For data to drive the enterprise, the trend over the next few years has to be engagement with a more holistic approach, perhaps one that starts with a better definition of Big Data.
What do you think? Should we abandon the Three, Four or even Ten Vs of Big Data? Can you come up with something better?