List your models
Get a list of available models on your system, including both LLMs and embedding models.
GET /api/v1/models
This endpoint has no request parameters.
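The same request can be issued from Python with only the standard library. A minimal sketch, assuming the default `localhost:1234` address and that the API token is stored in the `LM_API_TOKEN` environment variable (the helper names here are illustrative, not part of any official client):

```python
import json
import os
import urllib.request


def build_models_request(base_url: str = "http://localhost:1234") -> urllib.request.Request:
    """Build the GET /api/v1/models request with a bearer-token header."""
    token = os.environ.get("LM_API_TOKEN", "")
    return urllib.request.Request(
        f"{base_url}/api/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_models(base_url: str = "http://localhost:1234") -> list[dict]:
    """Fetch the endpoint and return the `models` array from the response."""
    with urllib.request.urlopen(build_models_request(base_url)) as resp:
        return json.load(resp)["models"]
```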
curl http://localhost:1234/api/v1/models \
  -H "Authorization: Bearer $LM_API_TOKEN"
Response fields
models : array
List of available models (both LLMs and embedding models).
  type : "llm" | "embedding"
  Type of the model.
  publisher : string
  Model publisher name.
  key : string
  Unique identifier for the model.
  display_name : string
  Human-readable model name.
  architecture (optional) : string | null
  Model architecture (e.g., "llama", "mistral"). Absent for embedding models.
  quantization : object | null
  Quantization information for the model.
    name : string | null
    Quantization method name.
    bits_per_weight : number | null
    Bits per weight for the quantization.
  size_bytes : number
  Size of the model in bytes.
  params_string : string | null
  Human-readable parameter count (e.g., "7B", "13B").
  loaded_instances : array
  List of currently loaded instances of this model.
    id : string
    Unique identifier for the loaded model instance.
    config : object
    Configuration for the loaded instance.
      context_length : number
      The maximum context length for the model, in tokens.
      eval_batch_size (optional) : number
      Number of input tokens processed together in a single batch during evaluation. Absent for embedding models.
      parallel (optional) : number
      Maximum number of parallel predictions the instance can handle. Absent for embedding models.
      flash_attention (optional) : boolean
      Whether Flash Attention is enabled for optimized attention computation. Absent for embedding models.
      num_experts (optional) : number
      Number of experts for MoE (Mixture of Experts) models. Absent for embedding models.
      offload_kv_cache_to_gpu (optional) : boolean
      Whether the KV cache is offloaded to GPU memory. Absent for embedding models.
  max_context_length : number
  Maximum context length supported by the model, in tokens.
  format : "gguf" | "mlx" | null
  Model file format.
  capabilities (optional) : object
  Model capabilities. Absent for embedding models.
    vision : boolean
    Whether the model supports vision/image inputs.
    trained_for_tool_use : boolean
    Whether the model was trained for tool/function calling.
    reasoning (optional) : object
    Public reasoning configuration for the model. Absent when no reasoning config is exposed.
      allowed_options : ("off" | "on" | "low" | "medium" | "high")[]
      Allowed public reasoning settings for the model.
      default : "off" | "on" | "low" | "medium" | "high"
      Default public reasoning setting for the model.
  description (optional) : string | null
  Model description. Absent for embedding models.
  variants (optional) : array
  List of available quantization variant names for this model. Present for multi-variant models.
  selected_variant (optional) : string
  The currently selected variant name. Present when variants is present.
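For typed client code, the listing above can be mirrored with typing constructs. A sketch of the main fields as Python TypedDicts (illustrative only, not an official client type; `total=False` marks dicts whose keys may be absent):

```python
from typing import Literal, Optional, TypedDict


class Quantization(TypedDict):
    name: Optional[str]
    bits_per_weight: Optional[float]


class InstanceConfig(TypedDict, total=False):
    context_length: int              # always present
    eval_batch_size: int             # LLM-only
    parallel: int                    # LLM-only
    flash_attention: bool            # LLM-only
    num_experts: int                 # LLM-only
    offload_kv_cache_to_gpu: bool    # LLM-only


class LoadedInstance(TypedDict):
    id: str
    config: InstanceConfig


class Model(TypedDict, total=False):
    type: Literal["llm", "embedding"]
    publisher: str
    key: str
    display_name: str
    architecture: Optional[str]      # absent for embedding models
    quantization: Optional[Quantization]
    size_bytes: int
    params_string: Optional[str]
    loaded_instances: list[LoadedInstance]
    max_context_length: int
    format: Optional[Literal["gguf", "mlx"]]
```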
{
  "models": [
    {
      "type": "llm",
      "publisher": "google",
      "key": "google/gemma-4-26b-a4b",
      "display_name": "Gemma 4 26B A4B",
      "architecture": "gemma4",
      "quantization": {
        "name": "Q4_K_M",
        "bits_per_weight": 4
      },
      "size_bytes": 17990911801,
      "params_string": "26B-A4B",
      "loaded_instances": [
        {
          "id": "google/gemma-4-26b-a4b",
          "config": {
            "context_length": 4096,
            "eval_batch_size": 512,
            "parallel": 4,
            "flash_attention": true,
            "num_experts": 8,
            "offload_kv_cache_to_gpu": true
          }
        }
      ],
      "max_context_length": 262144,
      "format": "gguf",
      "capabilities": {
        "vision": true,
        "trained_for_tool_use": true,
        "reasoning": {
          "allowed_options": ["off", "on"],
          "default": "on"
        }
      },
      "description": null,
      "variants": ["google/gemma-4-26b-a4b@q4_k_m"],
      "selected_variant": "google/gemma-4-26b-a4b@q4_k_m"
    },
    {
      "type": "llm",
      "publisher": "deepseek",
      "key": "deepseek-r1",
      "display_name": "DeepSeek R1",
      "architecture": "deepseek",
      "quantization": {
        "name": "Q4_K_M",
        "bits_per_weight": 4
      },
      "size_bytes": 40492610355,
      "params_string": "671B",
      "loaded_instances": [],
      "max_context_length": 131072,
      "format": "gguf",
      "capabilities": {
        "vision": false,
        "trained_for_tool_use": true,
        "reasoning": {
          "allowed_options": ["on"],
          "default": "on"
        }
      },
      "description": null
    },
    {
      "type": "embedding",
      "publisher": "gaianet",
      "key": "text-embedding-nomic-embed-text-v1.5-embedding",
      "display_name": "Nomic Embed Text v1.5",
      "quantization": {
        "name": "F16",
        "bits_per_weight": 16
      },
      "size_bytes": 274290560,
      "params_string": null,
      "loaded_instances": [],
      "max_context_length": 2048,
      "format": "gguf"
    }
  ]
}
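As a usage sketch, a response of this shape can be filtered client-side, for example to find which LLMs currently have a loaded instance (the sample payload below is abbreviated from the example response above):

```python
def loaded_llm_keys(response: dict) -> list[str]:
    """Return the keys of LLMs with at least one loaded instance."""
    return [
        m["key"]
        for m in response["models"]
        if m["type"] == "llm" and m["loaded_instances"]
    ]


sample = {
    "models": [
        {"type": "llm", "key": "google/gemma-4-26b-a4b",
         "loaded_instances": [{"id": "google/gemma-4-26b-a4b"}]},
        {"type": "llm", "key": "deepseek-r1", "loaded_instances": []},
        {"type": "embedding",
         "key": "text-embedding-nomic-embed-text-v1.5-embedding",
         "loaded_instances": []},
    ]
}

print(loaded_llm_keys(sample))  # ['google/gemma-4-26b-a4b']
```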