List your models
Get a list of available models on your system, including both LLMs and embedding models.
GET /api/v1/models
This endpoint has no request parameters.
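The same request can be issued from Python with only the standard library. A minimal sketch, assuming the default `localhost:1234` address and that the API token is stored in the `LM_API_TOKEN` environment variable (the helper names here are illustrative, not part of any official client):

```python
import json
import os
import urllib.request


def build_models_request(base_url: str = "http://localhost:1234") -> urllib.request.Request:
    """Build the GET /api/v1/models request with a bearer-token header."""
    token = os.environ.get("LM_API_TOKEN", "")
    return urllib.request.Request(
        f"{base_url}/api/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_models(base_url: str = "http://localhost:1234") -> list[dict]:
    """Fetch the endpoint and return the `models` array from the response."""
    with urllib.request.urlopen(build_models_request(base_url)) as resp:
        return json.load(resp)["models"]
```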
curl http://localhost:1234/api/v1/models \
  -H "Authorization: Bearer $LM_API_TOKEN"
Response fields
models : array
List of available models (both LLMs and embedding models).
  type : "llm" | "embedding"
  Type of the model.
  publisher : string
  Model publisher name.
  key : string
  Unique identifier for the model.
  display_name : string
  Human-readable model name.
  architecture (optional) : string | null
  Model architecture (e.g., "llama", "mistral"). Absent for embedding models.
  quantization : object | null
  Quantization information for the model.
    name : string | null
    Quantization method name.
    bits_per_weight : number | null
    Bits per weight for the quantization.
  size_bytes : number
  Size of the model in bytes.
  params_string : string | null
  Human-readable parameter count (e.g., "7B", "13B").
  loaded_instances : array
  List of currently loaded instances of this model.
    id : string
    Unique identifier for the loaded model instance.
    config : object
    Configuration for the loaded instance.
      context_length : number
      The maximum context length for the model, in tokens.
      eval_batch_size (optional) : number
      Number of input tokens processed together in a single batch during evaluation. Absent for embedding models.
      parallel (optional) : number
      Maximum number of parallel predictions the instance can handle. Absent for embedding models.
      flash_attention (optional) : boolean
      Whether Flash Attention is enabled for optimized attention computation. Absent for embedding models.
      num_experts (optional) : number
      Number of experts for MoE (Mixture of Experts) models. Absent for embedding models.
      offload_kv_cache_to_gpu (optional) : boolean
      Whether the KV cache is offloaded to GPU memory. Absent for embedding models.
  max_context_length : number
  Maximum context length supported by the model, in tokens.
  format : "gguf" | "mlx" | null
  Model file format.
  capabilities (optional) : object
  Model capabilities. Absent for embedding models.
    vision : boolean
    Whether the model supports vision/image inputs.
    trained_for_tool_use : boolean
    Whether the model was trained for tool/function calling.
    reasoning (optional) : object
    Public reasoning configuration for the model. Absent when no reasoning config is exposed.
      allowed_options : ("off" | "on" | "low" | "medium" | "high")[]
      Allowed public reasoning settings for the model.
      default : "off" | "on" | "low" | "medium" | "high"
      Default public reasoning setting for the model.
  description (optional) : string | null
  Model description. Absent for embedding models.
  variants (optional) : array
  List of available quantization variant names for this model. Present for multi-variant models.
  selected_variant (optional) : string
  The currently selected variant name. Present when variants is present.
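For typed client code, the listing above can be mirrored with typing constructs. A sketch of the main fields as Python TypedDicts (illustrative only, not an official client type; `total=False` marks dicts whose keys may be absent):

```python
from typing import Literal, Optional, TypedDict


class Quantization(TypedDict):
    name: Optional[str]
    bits_per_weight: Optional[float]


class InstanceConfig(TypedDict, total=False):
    context_length: int              # always present
    eval_batch_size: int             # LLM-only
    parallel: int                    # LLM-only
    flash_attention: bool            # LLM-only
    num_experts: int                 # LLM-only
    offload_kv_cache_to_gpu: bool    # LLM-only


class LoadedInstance(TypedDict):
    id: str
    config: InstanceConfig


class Model(TypedDict, total=False):
    type: Literal["llm", "embedding"]
    publisher: str
    key: str
    display_name: str
    architecture: Optional[str]      # absent for embedding models
    quantization: Optional[Quantization]
    size_bytes: int
    params_string: Optional[str]
    loaded_instances: list[LoadedInstance]
    max_context_length: int
    format: Optional[Literal["gguf", "mlx"]]
```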
{
  "models": [
    {
      "type": "llm",
      "publisher": "google",
      "key": "google/gemma-4-26b-a4b",
      "display_name": "Gemma 4 26B A4B",
      "architecture": "gemma4",
      "quantization": {
        "name": "Q4_K_M",
        "bits_per_weight": 4
      },
      "size_bytes": 17990911801,
      "params_string": "26B-A4B",
      "loaded_instances": [
        {
          "id": "google/gemma-4-26b-a4b",
          "config": {
            "context_length": 4096,
            "eval_batch_size": 512,
            "parallel": 4,
            "flash_attention": true,
            "num_experts": 8,
            "offload_kv_cache_to_gpu": true
          }
        }
      ],
      "max_context_length": 262144,
      "format": "gguf",
      "capabilities": {
        "vision": true,
        "trained_for_tool_use": true,
        "reasoning": {
          "allowed_options": ["off", "on"],
          "default": "on"
        }
      },
      "description": null,
      "variants": ["google/gemma-4-26b-a4b@q4_k_m"],
      "selected_variant": "google/gemma-4-26b-a4b@q4_k_m"
    },
    {
      "type": "llm",
      "publisher": "deepseek",
      "key": "deepseek-r1",
      "display_name": "DeepSeek R1",
      "architecture": "deepseek",
      "quantization": {
        "name": "Q4_K_M",
        "bits_per_weight": 4
      },
      "size_bytes": 40492610355,
      "params_string": "671B",
      "loaded_instances": [],
      "max_context_length": 131072,
      "format": "gguf",
      "capabilities": {
        "vision": false,
        "trained_for_tool_use": true,
        "reasoning": {
          "allowed_options": ["on"],
          "default": "on"
        }
      },
      "description": null
    },
    {
      "type": "embedding",
      "publisher": "gaianet",
      "key": "text-embedding-nomic-embed-text-v1.5-embedding",
      "display_name": "Nomic Embed Text v1.5",
      "quantization": {
        "name": "F16",
        "bits_per_weight": 16
      },
      "size_bytes": 274290560,
      "params_string": null,
      "loaded_instances": [],
      "max_context_length": 2048,
      "format": "gguf"
    }
  ]
}
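As a usage sketch, a response of this shape can be filtered client-side, for example to find which LLMs currently have a loaded instance (the sample payload below is abbreviated from the example response above):

```python
def loaded_llm_keys(response: dict) -> list[str]:
    """Return the keys of LLMs with at least one loaded instance."""
    return [
        m["key"]
        for m in response["models"]
        if m["type"] == "llm" and m["loaded_instances"]
    ]


sample = {
    "models": [
        {"type": "llm", "key": "google/gemma-4-26b-a4b",
         "loaded_instances": [{"id": "google/gemma-4-26b-a4b"}]},
        {"type": "llm", "key": "deepseek-r1", "loaded_instances": []},
        {"type": "embedding",
         "key": "text-embedding-nomic-embed-text-v1.5-embedding",
         "loaded_instances": []},
    ]
}

print(loaded_llm_keys(sample))  # ['google/gemma-4-26b-a4b']
```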