Deploy LLMs to Kubernetes in minutes.

1. Getting Started

Introducing the world's first Kubernetes LLM operator, built to deploy and manage LLMs on Kubernetes with ease. Getting started is easy. First, make sure that you have cert-manager installed on your cluster.
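If cert-manager is not yet installed, the cert-manager project's static manifest install is the quickest route (the version below is only an example; check the cert-manager releases page for the current one):

$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

Then, simply install the operator using the provided helm chart. Run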

$ helm repo add alpn-software https://helm.alpn-software.com

to add the helm repo, followed by

$ helm install lm-operator alpn-software/lm-operator --version 0.1.0
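to install it. Before moving on, check that the operator pod has started (assuming the release name lm-operator from the command above, installed into the current namespace):

$ kubectl get pods

and wait until it reports Running.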

Once the operator is running, create a LanguageModel custom resource, specifying the type of model you want to deploy and the resources it requires.

# language-model.yaml
apiVersion: ai.k8s.alpn-software.com/v1
kind: LanguageModel
metadata:
  name: llama3
spec:
  modelType: llama3.2
  modelVersion: latest
  cpuArchitecture: arm64
  compute:
    limits:
      cpu: "4"
      memory: "16Gi"

and apply the changes to the cluster.

$ kubectl apply -f language-model.yaml
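You can watch the operator reconcile the new resource (assuming the CRD registers languagemodels as the resource's plural name):

$ kubectl get languagemodels
$ kubectl describe languagemodel llama3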

The operator will take care of deploying the model according to the provided spec. Models are deployed with an OpenAI-compatible HTTP API exposed via a Kubernetes Service resource, and can be accessed from inside the cluster by any HTTP client. For example (the model value in the request body is assumed to match the resource name):

$ curl http://llama3.{namespace}/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "llama3", "prompt": "Hello"}'
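The Service DNS name above only resolves inside the cluster. For quick testing from a workstation, kubectl port-forward can expose the Service locally; a minimal sketch, assuming the Service is named after the resource and listens on port 80 (check kubectl get svc for the actual port):

$ kubectl port-forward svc/llama3 8080:80

Then, in a second terminal:

$ curl http://localhost:8080/v1/models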

You can use any OpenAI-compatible client to interact with the models. Note that most models ship in large container images, so it can take a while for your cluster to pull the images and start the pods.

2. Compute Requirements

The system requirements for running an LLM depend on the model you choose. The smallest models can run on a single CPU with 8GB of memory, while larger models may require multiple GPUs or TPUs.

The amount of resources required is usually proportional to the number of parameters in the model. As a rule of thumb, serving a model at 16-bit precision takes about two bytes of memory per parameter (less with quantization), so a 7B-parameter model needs roughly 14 GB for its weights alone. The smallest models, such as smollm2 at around 7B parameters, can be run on IoT devices; others, such as llama2, have upwards of 70B parameters and require multiple GPUs to run.

Larger models may require more resources, but they also tend to be more capable and able to handle more complex tasks. Choosing an appropriate model for your use case is critical to achieving the best performance at a reasonable cost, so take the time to consider which tasks you want to accomplish and how much compute you are willing to allocate.
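For models that need GPUs, the compute block would carry a GPU limit as well. A minimal sketch, assuming the operator passes resource limits through to the pod spec unchanged and the cluster runs the NVIDIA device plugin (nvidia.com/gpu is the standard extended resource name; whether the CRD accepts it here is an assumption):

# language-model-gpu.yaml
apiVersion: ai.k8s.alpn-software.com/v1
kind: LanguageModel
metadata:
  name: mistral
spec:
  modelType: mistral
  modelVersion: latest
  compute:
    limits:
      cpu: "8"
      memory: "32Gi"
      # assumes the NVIDIA device plugin is installed on the cluster
      nvidia.com/gpu: "1"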

3. Available Models

We have a wide range of models available for you to use, but some of the more popular ones are listed below. The number of parameters roughly determines how much compute resource is required to run the model, as well as the complexity of the tasks it can handle.

Model Type     Parameters   License Required
smollm2        7B           No
llama3.2       3B           No
gemma3         12B          Yes
mistral        13B          Yes
llama2-chat    7B           Yes
ms-phi-4       14B          Yes

4. Getting Access

Deploying and running the operator is free. License-free functionality is, however, limited to a set of small, basic models. Access to the full range of models requires a license key, which can be obtained by contacting the development team at

In addition to the license key, we also offer consulting services, as well as bespoke software and integrations, to help you get the most out of your LLMs.