The AI Build Vs. Buy Knowledge Gap

To build or to buy: that is the question
Enterprises often struggle to decide whether to build a complex algorithm or purchase one that already exists, because there is little in the way of a framework to guide that decision.
The reality is that modern data science is a complex and still-emerging field, and the resulting knowledge gap often leads to a default decision to build, even when solutions already exist. Many acknowledge that it would be unwise to spend millions of dollars inventing a word processing application or building a new Internet search engine, for example, yet few apply the same reasoning to algorithms. Instead, they continue to spend significant time and resources pursuing capabilities and solutions that are available on the open market today for a fraction of the cost.
Next time, consider these three factors to determine whether it makes more sense to build or to buy.
Data Source
The first and most important consideration is where the data comes from, specifically whether that source is public or private. When the source material is publicly available, the odds that someone has already built the solution you want to create are much greater.
For example, the Internet holds countless images that contain text, and because that resource is publicly available, there are many optical character recognition (OCR) algorithms that can detect and extract text from images. As a result, organizations don’t need to invent their own solution for turning a picture into an editable document or extracting written information from a scanned document. The technology already exists because the source material is freely available.
If, however, an organization is seeking to build products, services or capabilities based on internal data sources such as customer data, it’s unlikely that any relevant, publicly available algorithms exist, because the source material is private.
There’s also some data that falls in between: data that is privately collected and sold by third-party organizations. These organizations typically charge significant sums for access, but the fact that the data is for sale means the solution you’re trying to build with it might already exist. Instead of paying the fee and dedicating internal data scientists to building a solution, it would be wise to first investigate whether someone who has already paid for access to the data can sell you the solution directly.
People
According to data from ZipRecruiter, the average salary of a data scientist in the United States (as of December 2019) is over $118,000, with some earning upwards of $175,000 annually, and that figure is likely to increase along with growing demand. Their time is expensive, yet organizations aren’t always as diligent as they could be in making the most of it.
For example, it typically takes between three and six months for a team of two or three data scientists to build a working algorithm. At the salaries above, two data scientists earning the average salary for three months cost roughly $59,000, while three data scientists at the upper end working for six months cost more than $260,000, in manpower alone. Despite the staggering cost, organizations still opt to build algorithms in-house, even when they could purchase one for a fraction of the price.
Some may argue that an internally built algorithm works better for their unique purposes than what’s already available, and that may be true, but it’s important to consider the true cost of that customization. If a custom-built algorithm can provide significantly greater value, then it is likely worth the effort. If, however, the difference is minor, it’s likely not worth the added expense, and not only because of the upfront costs.
Data scientists are very valuable and very expensive, so their tasks should be chosen carefully. Tasking them with reinventing something that already exists pulls them away from other data-related problems for significant periods of time, so include those opportunity costs in your decision as well.
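The manpower estimate above is simple arithmetic: team size × months × monthly salary. The sketch below reproduces it, assuming the ZipRecruiter figures quoted earlier ($118,000 average, $175,000 at the upper end) and counting salary only, with benefits, overhead and tooling ignored; the `manpower_cost` function is a hypothetical name for illustration.

```python
def manpower_cost(team_size: int, months: int, annual_salary: float) -> float:
    """Salary cost of a team building one algorithm over a number of months.

    Assumption: salary is the only cost; benefits, overhead and
    infrastructure are deliberately left out.
    """
    # Multiply first, then divide by 12 to convert annual salary to months.
    return team_size * months * annual_salary / 12


# Low end: two data scientists at the average salary for three months.
low = manpower_cost(team_size=2, months=3, annual_salary=118_000)

# High end: three data scientists at the upper salary figure for six months.
high = manpower_cost(team_size=3, months=6, annual_salary=175_000)

print(f"Low estimate:  ${low:,.0f}")
print(f"High estimate: ${high:,.0f}")
```

Run as-is, this prints $59,000 for the low estimate and $262,500 for the high one. Adding benefits or overhead as a multiplier on `annual_salary` would only push both numbers higher.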
Strategy
The third and final consideration is strategic: what are the business implications of the solution, algorithm or machine-learning model? Before determining whether to build or to buy, organizations need to ask themselves whether the end result will provide a truly proprietary solution, and how important that solution will be to the organization overall.
If the algorithm or solution is capable of producing original intellectual property, it’s probably best to build it internally; if it won’t provide a truly unique, strategic business advantage, it should probably be purchased.
Most executives today understand the importance of having an AI, big data and deep learning strategy, but the way they incorporate these tools into their existing business operations is often not especially unique. For example, an organization can use natural language processing to improve customer service conversations without hiring a team of data scientists to build that capability, because it already exists.
Just as they wouldn’t build a word processor from scratch or invent a new Internet search engine just for internal use, they needn’t reinvent algorithms that are already on the market. Before deciding whether to build or buy, consider whether the algorithm is based on publicly available data, whether the result would be a commodity capability rather than something truly proprietary, whether a purchased solution would deliver nearly the same value as a custom build, and whether you can get the same capabilities from an off-the-shelf solution. If your answer to any of these is “no,” consider building the solution in-house; if every answer is “yes,” you can save significant time and resources by acquiring a solution that’s already available today.
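The four questions above can be sketched as a simple decision rule. This is a minimal sketch, assuming each question can be reduced to a yes/no answer and that any single factor pointing toward building (private data, a truly proprietary result, significantly greater value from a custom build, or no off-the-shelf equivalent) is enough to consider building in-house. The `Assessment` class and `recommend` function are hypothetical names for illustration, not part of any established framework.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Yes/no answers to the four build-vs-buy questions (hypothetical structure)."""
    uses_public_data: bool          # Is the algorithm based on publicly available data?
    truly_proprietary: bool         # Would the result be original intellectual property?
    significantly_more_value: bool  # Would a custom build deliver significantly more value?
    available_off_the_shelf: bool   # Can the same capabilities be bought today?


def recommend(a: Assessment) -> str:
    """Return "build" if any factor favors building in-house, otherwise "buy"."""
    favors_build = [
        not a.uses_public_data,         # private data: a ready-made solution is unlikely
        a.truly_proprietary,            # strategic, original IP
        a.significantly_more_value,     # customization justifies the cost
        not a.available_off_the_shelf,  # nothing equivalent to purchase
    ]
    return "build" if any(favors_build) else "buy"


# OCR on scanned documents: public data, commodity capability, readily available.
ocr = Assessment(uses_public_data=True, truly_proprietary=False,
                 significantly_more_value=False, available_off_the_shelf=True)

# A model trained on private customer data that nobody else can sell.
customer_model = Assessment(uses_public_data=False, truly_proprietary=True,
                            significantly_more_value=True, available_off_the_shelf=False)

print(recommend(ocr))
print(recommend(customer_model))
```

The first call prints `buy` and the second prints `build`. Treating the factors as booleans is a deliberate simplification; in practice the People factor is a cost comparison, and a team might weight the questions rather than let any one of them decide.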