29 Points to Consider for Private Enterprise LLMs

About

Artificial Intelligence has dominated the market over the last year, with grand promises of revolutionizing business, science, and societal interactions. Many of these lofty goals are holding true: streamlining operations, augmenting the workforce, improving content generation, and accelerating software development. While still in its early stages, the AI market continues to evolve in deployment methods, tool-sets, and organizational policies. The most visible application of AI has been the emergence of the Large Language Model, or LLM. This document outlines 29 points to consider when deciding between an LLM model-as-a-service (also known as a “Public LLM”) from a vendor and deploying a private enterprise LLM.

Successful private LLMs will have certain qualities:

Model Superiority

Underlying AI needs to be better, faster, or cheaper than existing general-purpose models for the service provided.

Niche/Addressable Market

If you want to build an AI that can do everything, you’ll never be able to compete with the tech giants. You should solve a widespread yet specific problem.

Sustainable Source of Data

This is key for continuously improving the underlying model and staying ahead of the competition.

Path to Independence

Being under the thumb of a model-as-a-service provider is not ideal in the long run. If AI is the core of your business, having full control over the data and model should be a major long-term goal.

Table of Contents
Impact
Security
Performance
Cost
Flexibility
Sustainability
Ideal Lifecycle of an AI Player
Impact

Where is the maximum positive impact for GenAI and an LLM in the enterprise? The answer is likely different for every organization, depending on product mix, shape of the labor force, financial position, and market segment. When assessing the potential impact, be mindful of the following:

01

An LLM trained specifically on the enterprise’s own use cases is better.

All enterprises have large proprietary data sets around their unique products, specific services, or a known loyal customer base.  Major branded companies have unique viewpoints on the world: the word “sustainability” may mean very different things to Coca-Cola vs. Exxon vs. Google.  A Public LLM would struggle to discern the unique characteristics of a given term within an organization.

02
Superfluous information lessens accuracy.

Public LLMs deal in billions of tokens and parameters. This breadth allows the LLM to address many diverse topics, but more often than not, these topics will have nothing to do with the business: a real estate provider doesn’t need to know about airplane engines; a pharmaceutical company doesn’t need to know about Gen Z fashion.

03

LLMs can become Intellectual Property.

A private enterprise LLM that is trained from proprietary data will itself become a proprietary asset of the organization or specific department.  This isn’t possible with the model-as-a-service providers: even if a company owns the training data, the model/AI engine that drives the LLM is the intellectual property of the provider, not the client enterprise.

Security

Security is a paramount concern for any successful enterprise. GenAI and LLMs consume enormous amounts of data from product information, camera vision, shared customer profiles, and ERP tracking information across the entire supply chain. The need to secure this data should be a primary concern when that data is being fed to an LLM for training purposes. The security requirement is a central driver for considering a private LLM instead of a public Model-as-a-Service LLM.

04

Source data must be secure.

The importance of securing customer, order, and financial data has always been apparent, but the temptation of submitting this data to an LLM for some analytical purpose has led some companies to compromise their data. Samsung suffered one such leak when employees submitted internal proprietary data to ChatGPT in April 2023 [1].

05
Data needs guardrails.

LLM data can be sourced from internal data, but equally important is how that data is trained into the model. Almost all LLMs are built from a common basic data set to understand grammar and common facts. Beyond those basics, wherever company-specific data is used for training, guardrails should be in place to prevent unwanted leaks of confidential financial or employee information.

06

External training data sets are discouraged.

With the rise of GenAI and the promise of LLMs’ effect on operational efficiencies, it’s tempting to purchase or copy external data sets for training. Such data sets are problematic: the data may be inaccurate, may have custodial ownership issues, or may run afoul of privacy laws in certain jurisdictions.

07

Model Security is equally important.

LLMs and GenAI all run on a specific mathematical model. Some of these models are proprietary (such as GPT-4 and Bard), while others are open source (such as Llama and Mistral). Just as with almost all applications, the ongoing proprietary vs. open-source security debate applies here. In this case, however, all LLM and GenAI models carry a degree of ‘black box’ in how responses are generated. A proprietary model will be far less available for scrutiny, audits, or traceability. Open-source models are improving quickly and arguably receive more security review, given the number of non-corporate developers in universities, government, and research institutions pushing their capabilities forward.

08

Adopt a platform, not just a model.

The OpenAI drama shows that relying on any single AI model or engine is risky. Enterprises should approach AI as a platform that combines data, models/engines, and integration points. The organization should have the ability to switch easily between multiple large language models, including ChatGPT, Bison, Llama 2, Mistral, and others.
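As a rough illustration, such a platform can wrap each vendor behind a common interface so models are swapped via configuration rather than code changes. All class and function names below are hypothetical, and the vendor calls are stubbed:

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Provider-agnostic interface: the application never calls a vendor SDK directly."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...

class OpenAIModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # A real implementation would call the OpenAI SDK here; stubbed for illustration.
        return f"[openai] {prompt}"

class Llama2Model(ChatModel):
    def complete(self, prompt: str) -> str:
        # A real implementation would call a self-hosted Llama 2 endpoint; stubbed.
        return f"[llama2] {prompt}"

def get_model(name: str) -> ChatModel:
    """Swap providers via configuration rather than code changes."""
    registry = {"openai": OpenAIModel, "llama2": Llama2Model}
    return registry[name]()
```

With this shape, adding or retiring a provider touches the registry, not the application code that consumes completions.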

09
Beware of corporate machinations.

As we’ve recently seen, proprietary model providers are subject to M&A turmoil and shifts in strategic direction. This poses a long-term risk to model stability and continuity. The recent turmoil within OpenAI leadership illustrates the point that companies are still led by fallible humans and subject to politics. OpenAI will need to work very hard to restore confidence with their clients. Technology strength is one thing, but stable relationships are key to contractual risk management and planning.

10

Hosting/IT/Network Security applies to LLMs.

The most secure deployment for an LLM is in a secure data center under the direct control of the enterprise. Next would be private clouds serviced by a reputable provider. Large models require a substantial amount of storage and computing power at deployment [2]. This is why the model-as-a-service LLM seems to make sense: it’s in the same core cloud as the model provider. However, this shared cloud model carries large exposures to overall enterprise security.

Performance

LLM performance is evolving daily. Benchmarks that seemed like distant goals become reality almost weekly. However, not all models are built equally. Some LLM engines are highly performant, while others are sufficiently performant with smaller footprints, allowing for edge-computing deployments and CPU-based workloads.

11
LLMs can be measured.

Several leaderboards track LLM performance using token counts and commonly understood benchmarks of accuracy for language, knowledge, recognition, and data synthesis.

12
Latency is crucial.

Latency, the time between when a question or prompt is submitted and when the LLM responds, is becoming a crucial measure of performance [3]. Requesting simple text in ChatGPT doesn’t require very fast response times, but speech recognition in a drive-thru, image recognition in a busy warehouse, or people-counting in a crowded airport all require low latency.
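A simple way to start tracking this metric is to time calls at the application boundary. The sketch below is illustrative and vendor-neutral; `llm` stands in for any callable that wraps a real model endpoint:

```python
import time

def measure_latency(llm, prompt, runs=5):
    """Average wall-clock latency of a model call over several runs.

    `llm` is any callable that takes a prompt string and returns a
    response; timing at the call boundary keeps the measurement
    independent of the vendor SDK behind it."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        llm(prompt)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```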

13

Model switching in a private LLM may improve performance vs retraining on a public LLM.

Some models are better than others at specific tasks. Smaller models such as Flan-T5 (400M parameters) are great at simple text-summarization tasks and, with the right tuning, can achieve conversational capabilities [4]. Larger models such as ChatGPT and Google Bison are capable of following complex instructions.
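One pattern this suggests is a simple task-based router in front of a private deployment, sending cheap tasks to a small model and complex instructions to a large one. The model names and routes below are purely illustrative:

```python
# Illustrative routing table: small models for simple tasks,
# large models for complex instruction-following.
ROUTES = {
    "summarize": "flan-t5",
    "classify": "flan-t5",
    "instruct": "gpt-4",
}

def route_model(task: str, default: str = "gpt-4") -> str:
    """Pick a model for a task; fall back to the large model for unknown tasks."""
    return ROUTES.get(task, default)
```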

14

Both public and private LLMs benefit from increased context window size.

The context window determines how much of the chat history the LLM retains. Knowledge of previous prompts and responses allows for deeper, more continuous conversations, and the context window is the limiting factor on conversation length.
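When the history outgrows the window, applications typically keep only the most recent messages that fit. A minimal sketch of that truncation, using whitespace splitting as a stand-in for the model's real tokenizer:

```python
def fit_history(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Keep the most recent messages that fit inside the context window.

    Real deployments count tokens with the model's own tokenizer;
    whitespace splitting here is only an illustrative stand-in."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                           # older messages no longer fit
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order
```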

15

Hallucination control is easier in a private LLM than in a public model.

Both public and private LLMs can achieve a degree of hallucination control through system prompts that apply custom instructions to user prompts, automating custom responses to overly general or out-of-scope prompts. Private model hallucinations can also be controlled with additional methods not readily available for public LLMs:

a. Distillation
b. Retrieval-Augmented Generation (RAG)
c. Quantization
d. Intent Analysis - AI can be employed to detect the intent behind a user’s prompt and respond or react accordingly. This is how ChatGPT prevents prompts relating to violence or illicit activities from being answered. A private model can apply custom business rules or logic.
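The intent-analysis idea in (d) can be sketched as a pre-filter in front of the model. The keyword list and function names below are purely illustrative; a production system would use a small dedicated classifier model rather than keyword matching:

```python
BLOCKED_TOPICS = {"violence", "weapons"}  # hypothetical business rules

def classify_intent(prompt: str) -> str:
    """Toy keyword-based intent classifier; illustrative only."""
    lowered = prompt.lower()
    return "blocked" if any(t in lowered for t in BLOCKED_TOPICS) else "allowed"

def guarded_answer(prompt: str, llm) -> str:
    """Only forward the prompt to the model if the intent is allowed."""
    if classify_intent(prompt) == "blocked":
        return "I can't help with that request."
    return llm(prompt)
```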

16

Public LLMs are limited in hallucination control.

Fine-tuning is the only out-of-the-box method of reducing hallucinations for public models: if a public model lacks the knowledge to answer an input, the main recourse is to fine-tune the model with additional data.

Cost

Costs for tapping into an LLM as a service are dropping rapidly, but will never drop below a certain level due to the enormous infrastructure required to maintain the parameter base, token flow, and low-latency responses.

17

Token Cost is a factor.

While rates for a public LLM model-as-a-service seem low at first (just pennies per query), token counts add up quickly. Here is a table of costs as of November 2023 (likely to shift monthly):

| Model | Context Length | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Total Cost (for 6,000K articles at 1K tokens/article) |
| --- | --- | --- | --- | --- |
| OpenAI GPT-4 (8K) | 8K | $30 | $60 | $360,000 |
| OpenAI GPT-4 (32K) | 32K | $60 | $120 | $720,000 |
| GPT-4 Turbo (8K) | 8K | $30 | $60 | Not specified |
| GPT-4 Turbo (128K) | 128K | $10 | $30 | Not specified |
| Anthropic Claude V1 | N/A | $11 | $32 | $162,000 |
| InstructGPT - DaVinci | N/A | $20 | $20 | $180,000 |
| Curie | N/A | $2 | $2 | $180,000 |
| Self-Hosted 7B Model | N/A | Machine: $10/hr | Machine: $10/hr | $360,000 |
| GPT-3.5 (Free Model) | N/A | Free | Free | Free |
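A quick sanity check on per-token pricing is straightforward arithmetic. The helper below is illustrative only, estimating spend from per-1M-token rates like those quoted above:

```python
def token_cost(input_tokens_m: float, output_tokens_m: float,
               in_rate: float, out_rate: float) -> float:
    """Estimated cost in dollars, given token volumes in millions
    and per-1M-token rates."""
    return input_tokens_m * in_rate + output_tokens_m * out_rate

# For example, 6,000M (6 billion) input tokens at GPT-4's $30 per 1M
# tokens comes to $180,000 before any output tokens are counted.
input_only = token_cost(6_000, 0, 30, 60)
```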
18

LLMs that can run on a CPU are vastly less expensive.

Almost all LLMs require a GPU-architected server farm to train the model. However, once trained, some LLMs can run on a CPU server without a GPU. One recent project [5] shows a difference of $3,560 for a CPU build vs. $12,260 for a GPU build:

| Description | CPU Path | GPU Path |
| --- | --- | --- |
| Base Board | $885 | $885 |
| CPU | $2,200 (Sapphire Rapids w7-2495X, 24 cores/48 threads) | $2,200 |
| Memory | $475 | $475 |
| GPU | N/A | $7,300 (with NVIDIA discount) / $8,700 (without) |
| Total | $3,560 | $12,260 (without discount) |
Flexibility

In any rapidly evolving market, flexibility is key. A given model may be very performant now, but it may be outstripped by another emerging model next month. An enterprise must have the ability to switch models easily to stay at the edge of progress.

19

Vendor lock-in is prevalent with model-as-a-service.

The leading providers are all closely related to the major conglomerates, with cloud service tie-ins. The cloud services become a walled garden for AI-engine dependency as well as a shared source for ongoing training data, which can become a dependency/addiction for keeping the model current.

20

Vendor stability may favor the mid-market.

Enterprise machinations among giants might steamroll any given LLM model-as-a-service out of favor. On the other end of the spectrum, many startups are embracing open-source models and servicing specific niche markets. However, these startups are likely still too small to have the depth of talent to address complex hallucinations and ongoing model evolutions. The Goldilocks zone may favor mid-level providers who have enough talent to conquer complex LLM deployments while simultaneously possessing the flexibility and speed of a startup.

21

Open Source models will be more flexible.

By design, open-source models incorporate new mathematical constructs and adapt to new data sets faster. The diversity of the developer pool breeds stronger models that can be applied to more use cases. Major open-source models such as Llama 2 will likely soon be superseded by Llama 3 and others, reinforcing the speed and flexibility of the overall open-source market approach.

22

Model output flexibility should be considered.

How much output flexibility is really needed? Different model architectures and separate instances should be applied for different tasks to optimize accuracy. Output token length can increase the depth and length of generated answers, but will eat up the context window and cost more.

23

Fine-tuning should be monitored just like any other application.

One use case can damage the performance of a different use case. Careful prompt engineering and prompt tuning are potential workarounds for such cross-use-case interference. Plugins can also enhance model capabilities (code interpretation, web browsing, calculators).

24

AGI is coming, but it may not matter as much as the press suggests.

Companies need LLMs and GenAI to solve specific problems. As those problems are simplified and automated, new problems will manifest to be conquered. This doesn’t necessarily require an overarching Artificial General Intelligence; it means companies will need AI that solves those problems as efficiently and effectively as possible.

Sustainability

While the term ‘Sustainability’ normally applies to physical environmental concerns, the term also applies to LLMs in some unique ways regarding data, legality, and keeping the model from deteriorating.

25

Data Source sustainability is essential.

Private LLMs will be superior in applicability to the enterprise, but to stay current with constant fine-tuning, the model needs to have a constant source of new data. This can be ongoing customer information, sales data, or external information such as how weather might affect logistics.

26

Synthetic data helps train models but is not a sustainable solution.

An LLM built solely on synthetic data will ultimately collapse. Fake data will produce fake results. Repeated at volume, this becomes a feedback loop in which model parameters degenerate to noise.

27
Legality risks will continue.

As seen with the writers’ and actors’ strikes, worker groups can become very hostile to the threat of being ‘replaced’ by AI. Governments are also weighing a plethora of new regulations and policy decisions. Public LLMs, trained on data from tens of thousands of disparate sources, will be much more susceptible to copyright-infringement claims arising from undisclosed sources.

28

Internal human training is key.

Developers can be shown how to incorporate LLMs to generate code and act as co-pilots for better quality. Business analysts and line employees should be trained on how to structure prompts for maximum benefit and accuracy.

29

Support capacity for a private LLM involves several factors.

Does the business have headcount/expertise for model development? Can it adapt existing open-source LLMs? Does the business have data scientists for data sourcing and model training? Does IT understand the risks of public vs private LLM deployments? What existing cloud security is in place? Has the business hit a scale where added staffing/infrastructure costs and added tech debt outweigh SaaS costs?

Ideal Lifecycle of an AI Player

An organization looking to fully internalize a private LLM into its core business can be seen as proceeding through four phases.

Phase 1: Initial Setup

The application is hooked up to an off-the-shelf base model and features are built out. Cloud services and pre-trained models allow for extremely fast implementation.

Phase 2: Model Tailoring

It’s critical that your application can outdo the base model for your use case.

Model performance and efficiency can be greatly improved through Fine Tuning, Prompt Engineering, and Retrieval Augmented Generation.
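Of the three, Retrieval-Augmented Generation is often the quickest win: instead of retraining, the application retrieves relevant documents at query time and grounds the prompt in them. A toy sketch follows; a production system would rank by embedding similarity over a vector store, not word overlap:

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def rag_prompt(query: str, documents: list[str]) -> str:
    """Ground the model's answer in retrieved context (the RAG pattern)."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The generated prompt is then sent to whatever model the application uses; because grounding happens at query time, the knowledge base can be updated without touching the model.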

Many companies will stop at this point.

Phase 3: Training an in-house model

State-of-the-art, general-purpose models are extremely powerful and flexible, but for many use cases they’re overkill.

As SaaS costs rise, it’s worth considering whether you could achieve the same or better results with an in-house, lightweight open-source model.

Once desired performance is met and supporting infrastructure is provisioned, the new model can be hot-swapped into production or deployed gradually.

Phase 4: Poised and Independent

Underlying model is faster, cheaper, and more performant than other existing models for your use case.

Model can be augmented and improved with a steady stream of in-house data.

Plugins can expand model capabilities.

Company is now positioned to offer the underlying model as a standalone B2B or B2C service.
