Custom chatbots and model gardens: putting Private AI to work


Part 2: More to think about

Deciding where to run your Private AI – or, in some cases, who will run it for you – is essential, but it is just the start. Assuming that you also want custom chatbots and AI assistants – and they are the primary reason for having your own AI, trained on your business data – there is even more to think about.

The choice of model – by which I mean the large language models (LLMs) that underpin GenAI – will be a key one for many organisations and their AI architects. Notable LLMs include OpenAI’s GPT versions (as in ChatGPT and Bing Chat), Google’s PaLM (used in Bard), Meta’s Llama family, and Anthropic’s Claude 2, but there are many, many more.

Getting into the model business

Each model is typically better suited to some tasks than others. Some are proprietary while many are open source, and there are models tailored to specific industries, professions or tasks, such as medicine, finance or cybersecurity.

This is why we see a growing number of vendors offering model choice, such as Google Cloud with its ‘model garden’, and Box which, as well as planning to offer a curated set of models, will also allow users to BYOM or ‘bring your own model’. In fact, any vendor or service provider worth its salt will have some options here, whether it’s a short menu of LLMs or a link-up with someone like Hugging Face, which describes itself as an AI/ML community and collaboration platform, and has over 300,000 models available.
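For a sense of that scale, the hub can be queried programmatically. Below is a minimal sketch using the huggingface_hub Python library; the search term and result limit are purely illustrative.

```python
# A minimal sketch of browsing the Hugging Face hub programmatically.
from huggingface_hub import HfApi

api = HfApi()

# Free-text search for models; the term and limit are arbitrary examples
for model in api.list_models(search="cybersecurity", limit=5):
    print(model.id)
```

A model garden or curated menu does much the same filtering for you, with the vendor taking on the job of vetting what makes the list.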

Cost vs capability: the size trade-off

One other thing worth paying attention to here is model size. While it’s LLMs that have received most of the attention, these are costly and time-consuming to train. For instance, most of those general-purpose LLMs mentioned above have hundreds of billions of parameters, and training GenAI at that scale requires many thousands of GPU hours and costs millions of dollars.
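To see how those headline figures arise, here’s a rough back-of-envelope sketch using the widely cited heuristic of roughly 6 FLOPs per parameter per token of training data. Every number in it – model size, token count, GPU throughput and price per GPU-hour – is an illustrative assumption, not a quoted figure.

```python
# Back-of-envelope LLM training cost. All figures are assumptions
# chosen purely for illustration.
params = 70e9                # a 70bn-parameter model
tokens = 1.5e12              # trained on ~1.5 trillion tokens
total_flops = 6 * params * tokens   # ~6 FLOPs per parameter per token

gpu_throughput = 300e12      # ~300 TFLOP/s sustained per GPU (assumed)
gpu_hours = total_flops / gpu_throughput / 3600
cost_usd = gpu_hours * 2.50  # assumed $2.50 per GPU-hour

print(f"~{gpu_hours:,.0f} GPU-hours, roughly ${cost_usd:,.0f}")
# ~583,333 GPU-hours, roughly $1,458,333
```

Even with fairly generous assumptions, the answer lands in the millions of dollars – and that’s for a single training run, before any experimentation or retraining.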

In addition, while inferencing – applying the resulting trained model to data – needs significantly less compute power than training does, it still isn’t free. That’s especially true if you want your AI to carry on training itself on new data as it goes along, a process known as continual or online learning.

So there’s quite a bit of research going into smaller ‘nimble’ models, with just a few billion parameters (under 15bn is a useful rule of thumb for ‘nimble’, while there’s also a middling group with maybe 50-70bn). Evidence thus far suggests that models such as IBM’s 13bn-parameter Granite, or the 1.3bn-parameter open-source Phi-1.5 from Microsoft Research, can be just as effective as an LLM on specific tasks or in constrained settings, and a whole lot cheaper to run.
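As a taste of how lightweight these nimble models are to work with, here’s a minimal sketch that loads Phi-1.5 from the Hugging Face hub using the transformers library (plus PyTorch). microsoft/phi-1_5 is the model’s hub ID; older transformers versions may also need trust_remote_code=True.

```python
# A minimal sketch: running a small 'nimble' model locally.
from transformers import pipeline

# Downloads ~1.3bn parameters of weights on first run - small enough
# for a single commodity GPU, or even a CPU if you are patient.
generator = pipeline("text-generation", model="microsoft/phi-1_5")

result = generator("Private AI means", max_new_tokens=40)
print(result[0]["generated_text"])
```

Compare that with a frontier-scale LLM, which typically needs multiple high-end GPUs just to hold its weights in memory, let alone serve users at speed.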

If you’ve not started planning for your organisation’s Private AI future, hopefully this blog and part 1 have given you enough of an outline to get started. Whether you prefer to use private cloud, the ‘GPT in a box’ concept favoured by the likes of Dell and VMware, or the SaaS approach offered by the public cloud and collaboration specialists, the concepts behind Private AI – at the very least – need to become the default for enterprise use, for all the usual reasons of security, governance, privacy, etc.


Bryan Betts is sadly no longer with us. He worked as an analyst at Freeform Dynamics between July 2016 and February 2024, when he tragically passed away following an unexpected illness. We are proud to continue to host Bryan’s work as a tribute to his great contribution to the IT industry.