You don't have Big Data...

…And you don’t need AI

At Market Dojo, we’ve seen around 25,000 auctions and 7,000 RFQs, received nearly 50,000 questionnaire answers, and sourced around £10B in total. You’d think that after eight years all this activity would add up to some significant data sets. In some ways you’d be right, but by the standards of the age, it’s paltry.

There’s a lot of noise around ‘Big Data’ these days. It’s been important for a few years, but thanks to the scandals that inevitably surround any technology, it’s coming to the attention of people who previously might not have been concerned – after all, if you’re not in the tech business, does it matter to you that someone worked out how to process exabytes of data in manageable timespans? There’s a rule of thumb that might come in handy: if your production dataset would fit on the hard drive of your desktop, you do not have Big Data. You don’t even have Medium Data. Truly Big Data is the province of financial institutions, governments, and the Googles, Amazons and Facebooks of the world, which rule vast empires and record every interaction their citizens have with them. YouTube, for example, which accounts for only a small portion of Google’s data storage, receives around 270TB of uploaded video per day, each video with its own set of interactions.
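The rule of thumb above is simple enough to write down directly. The sketch below is illustrative only: the function name `fits_on_desktop` is hypothetical, and it assumes the drive holding your home directory is a fair stand-in for "the hard drive of your desktop". It also encodes the YouTube figure quoted above; at 270TB a day, that comes to 270 × 365 = 98,550TB, or roughly 99PB, in a single year.

```python
import shutil
from pathlib import Path
from typing import Optional

TB = 10**12  # decimal terabyte, the unit drive manufacturers quote capacity in

# Figure quoted in the text: roughly 270TB of video uploaded to YouTube per day.
YOUTUBE_DAILY_UPLOAD_BYTES = 270 * TB


def fits_on_desktop(dataset_bytes: int, disk_bytes: Optional[int] = None) -> bool:
    """Rule of thumb: a dataset that fits on one desktop drive is not Big Data."""
    if disk_bytes is None:
        # Assumption: the drive holding the home directory stands in for "your desktop".
        disk_bytes = shutil.disk_usage(str(Path.home())).total
    return dataset_bytes <= disk_bytes
```

For example, `fits_on_desktop(500 * 10**9, disk_bytes=2 * TB)` is `True` for a 500GB dataset on a 2TB drive, while a single day of YouTube uploads would not fit.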

Although this unimaginable scale is, to some extent, a prerequisite, the buzz has (as often happens with a new technology) permeated enterprise corporations at a level where technical decisions generally shouldn’t be made. The result is another generation of ‘keeping up with the IBMs’, in which organisations with no need for something seek to adopt it purely for the cachet and buzzword compliance. Big Data is what the big companies are using, and the insights and flexibility they gain from using it well are arguably the source of their global power, so, in an all-too-human fashion, the cargo cult gathers steam. “If we use Big Data, if we use blockchain, if we use AI…”, the thought goes, “…surely the benefits those massive companies are seeing will accrue to us as well!”

This is, perhaps surprisingly, a little backwards. It is not that having data lakes and applying complex, bleeding-edge technologies to them grants power in the market or is certain to bring in clients. Quite the reverse: having a good product, sold or given away to many clients, and recording their every action is what eventually leads to a situation where any other approach to the collected data is simply inadequate. As fashionable as it might be to use the latest and greatest technologies (or the not-so-latest: some companies with only terabyte-scale datasets are still using Hadoop and Spark to run their reports), the vast majority of companies could have everything they need done across their entire corpus on their CTO’s laptop with a few hundred lines of Python and some SQL. Those who make the best use of machine learning are those who have reached the point where the data they store and the questions they need to ask simply cannot be dealt with (or cannot be answered in a reasonable time) using conventional, established technology. They are the ones who have been digging holes with shovels, then backhoes, and have finally given up and built themselves a bucket-wheel excavator because there’s no other way to move that much earth at once. You know when you need MapReduce and machine learning, because everything else has stopped working. Certainly, once you have reached that point you have unparalleled capacity to develop new strategies and create new forms of automation from the results of processing all that information, but success through scale must come before success through machine learning.
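To make "a few hundred lines of Python and some SQL" concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `auctions` table, its columns and the three rows are hypothetical toy data, not Market Dojo’s schema; the point is only that a typical reporting question reduces to one grouped query, with no cluster involved.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE auctions (
    id          INTEGER PRIMARY KEY,
    category    TEXT NOT NULL,
    baseline    REAL NOT NULL,   -- expected spend before the event
    final_price REAL NOT NULL    -- price achieved when the event closed
)
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database holding a few made-up auctions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO auctions (category, baseline, final_price) VALUES (?, ?, ?)",
        [
            ("Packaging", 100_000.0, 88_000.0),
            ("Logistics", 250_000.0, 230_000.0),
            ("Packaging", 50_000.0, 47_000.0),
        ],
    )
    return conn


def savings_by_category(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Return (category, auction count, total savings), largest savings first."""
    return conn.execute(
        """
        SELECT category,
               COUNT(*)                    AS auctions,
               SUM(baseline - final_price) AS total_savings
        FROM auctions
        GROUP BY category
        ORDER BY total_savings DESC
        """
    ).fetchall()
```

On the toy rows, `savings_by_category(make_demo_db())` returns `[('Logistics', 1, 20000.0), ('Packaging', 2, 15000.0)]`. Assuming the same schema, pointing it at a read-only replica instead of the in-memory database would be the only change a real report needs.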

I don’t say this lightly. As a developer, it is my temperament, not to mention my job, to take an interest in the latest tools and most effective new technologies, and to want to work with them, in order to get the most out of the resources available. However, one must be pragmatic. In the average, or slightly above average, company, data mining is in most cases best done by a human being equipped with a scripting language, SQL and patience. Even in larger companies, data scientists are commonly reported to spend around 80% of their time extracting, cleaning and tidying up data long before they can begin applying their more interesting tools to its analysis. Having enough of the right sort of information is the baseline, the source from which all else flows. People sometimes ask just how much data is needed to train a good AI. The answer, of course, is “More. Lots more.”

A personal example, then. The data within the Market Dojo database, including every document and image ever uploaded by clients, still wouldn’t justify the use of AI, nor would it qualify as ‘Big Data’ by today’s standards. As much as it would delight me, when asked for insights, to spend my time training a neural net, tweaking its dropout rate and looking for unexpected gold in the sea of information, we can usually answer the question with a few queries against one of our replica databases and perhaps a chart or two. So that’s what we do: it’s fast and simple, and as long as the answer is accurate and actionable, it doesn’t matter where it came from. For clients who want to get their hands dirty, we help by doing the collection and denormalisation steps automatically and providing access to their data through our API. Those who use this typically find that Excel or Power BI, neither of which can handle truly significant amounts of information, is in fact entirely adequate to the task at hand.
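The "collection and denormalisation" step mentioned above amounts to joining normalised tables into one flat table that a spreadsheet can open. The sketch below shows the idea with `sqlite3` and the standard `csv` module. The `events`, `suppliers` and `bids` tables, their contents and the function `export_flat_bids` are all hypothetical, and this is not a description of Market Dojo’s API.

```python
import csv
import io
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical normalised schema: events and suppliers, linked by bids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE bids (
            event_id    INTEGER REFERENCES events(id),
            supplier_id INTEGER REFERENCES suppliers(id),
            amount      REAL NOT NULL
        );
        INSERT INTO events    VALUES (1, 'Corrugated boxes 2019');
        INSERT INTO suppliers VALUES (1, 'Acme Ltd'), (2, 'Globex plc');
        INSERT INTO bids      VALUES (1, 1, 9500.0), (1, 2, 9200.0);
        """
    )
    return conn


def export_flat_bids(conn: sqlite3.Connection, out: io.TextIOBase) -> None:
    """Join events, suppliers and bids into one flat table and write it as CSV."""
    cursor = conn.execute(
        """
        SELECT e.name AS event, s.name AS supplier, b.amount AS bid
        FROM bids b
        JOIN events    e ON e.id = b.event_id
        JOIN suppliers s ON s.id = b.supplier_id
        ORDER BY e.name, b.amount
        """
    )
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header from column aliases
    writer.writerows(cursor)  # one flat row per bid
```

Passing a file opened with `open("bids.csv", "w", newline="")` instead of a `StringIO` produces a file that Excel or Power BI can import directly.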

Of course, finding actionable insights is not the only application of machine learning. What about responsiveness: the thousands of applications promising ‘intelligent’ behaviour that most people think of when they hear the term ‘AI’? Disappointingly for the technophile, that often doesn’t require true AI either. A suitable set of handwritten “if X then Y” statements can take care of the majority of use cases. That shouldn’t be a surprise; in a very real sense, a sequence of if/then statements is a simple definition of software. Moreover, in many cases the task itself is already framed that way. You don’t need AI to tell you that you ought to invite some people to an event that starts tomorrow; you just need the system to check the number of participants and send an email if it’s zero. Add a hundred little things like that and the average person will start to think of a system as ‘smart’, even if it isn’t particularly complex.
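The event reminder described above is exactly one "if X then Y" rule. A minimal sketch, assuming a hypothetical `Event` record and a `send_email` callable supplied by the surrounding system (no real mail API is implied):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable


@dataclass
class Event:
    """Hypothetical minimal record of a sourcing event."""
    name: str
    starts_at: datetime
    participant_count: int


def remind_empty_events(
    events: Iterable[Event],
    now: datetime,
    send_email: Callable[[str], None],
) -> list[str]:
    """IF an event starts tomorrow AND has no participants THEN send a reminder."""
    tomorrow = (now + timedelta(days=1)).date()
    reminded = []
    for event in events:
        if event.starts_at.date() == tomorrow and event.participant_count == 0:
            send_email(
                f"'{event.name}' starts on {event.starts_at:%Y-%m-%d} "
                "and has no participants yet."
            )
            reminded.append(event.name)
    return reminded
```

Run daily, a check like this is the whole of the "intelligence" involved; swapping `send_email` for a function that simply collects messages, as a list's `append` method does, makes it easy to test.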

Benefits like these can be gained far more easily and cheaply with rules than by training neural networks and keeping models up to date. Rules also have the distinct advantage that if anyone happens to ask how a decision was made and what factors affected it, you can give an answer. Sufficiently powerful AI, even today, is essentially opaque: given enough data, it produces answers that happen to be true and useful, but with no recourse or explanation available even to its designers, a fact which legislators have so far failed to comprehend.
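The explainability point can be made concrete: if each rule carries a human-readable reason, the answer to "why?" is simply the list of rules that fired. The `Rule` type, the `decide` function and the three review rules below are hypothetical illustrations, not part of any particular product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One hypothetical 'if X then flag' rule with a human-readable reason."""
    reason: str
    applies: Callable[[dict], bool]  # the "if X" part


def decide(facts: dict, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Flag if any rule fires, and return the reason for every rule that did."""
    reasons = [rule.reason for rule in rules if rule.applies(facts)]
    return bool(reasons), reasons


# Hypothetical rules for flagging a sourcing event for manual review.
REVIEW_RULES = [
    Rule("no participants invited", lambda f: f["participants"] == 0),
    Rule("starts within 24 hours", lambda f: f["hours_to_start"] <= 24),
    Rule("no documents attached", lambda f: f["documents"] == 0),
]
```

Because every decision traces back to named rules, auditing or changing the behaviour means reading or editing a short list rather than retraining a model.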

One might be tempted to take away from this that I’m somehow deluded enough to believe AI is useless, or that all its applications could just as easily be achieved in a less complex fashion. Nothing could be further from the truth. If you have dozens of terabytes of data which you need to mine for new insights and directions for your company, or you have unstructured data such as documents which you need to investigate and classify, or the X in your ‘if X then Y’ functions is actually half a million variables, then you do need AI. If you have 100,000 photographs and you want to know which of them contain birds, you do need AI (and an extra 100,000 photographs all of which are known to contain birds).

If you just have a few hundred gigabytes to process and you’re mostly after simple insights and automation then you definitely don’t have Big Data, you probably don’t need AI, and you should look very hard at who profits when someone suggests that you do.

If you’d like to find out more about how Market Dojo can help your business process its eSourcing data and create reports in Power BI, contact your account manager today or email us.

Market Dojo helps procurement professionals negotiate better with our on-demand eSourcing tools. If you’d like to find out more, get in touch or register for free and play around with our Sandpit software for yourself!