API Changelog
Updates and changes to the LM Studio API.
LM Studio 0.4.1
Anthropic-compatible API
- New Anthropic-compatible endpoint: `POST /v1/messages`.
- Use Claude Code with LM Studio models.
- See docs for more details: /docs/developer/anthropic-compat.
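A minimal request body sketch for the new endpoint, assuming the standard Anthropic Messages shape (the model name is a placeholder):

```json
{
  "model": "your-local-model",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```

POST this to `http://localhost:1234/v1/messages` (1234 is LM Studio's default server port).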
LM Studio 0.4.0
LM Studio native v1 REST API
- Official release of LM Studio's native v1 REST API at `/api/v1/*` endpoints.
- MCP via API
- Stateful chats
- Authentication configuration with API tokens
- Model download, load and unload endpoints
- See overview page for more details and comparison with OpenAI-compatible endpoints.
LM Studio 0.3.29 • 2025‑10‑06
OpenAI /v1/responses and variant listing
- New OpenAI‑compatible endpoint: `POST /v1/responses`.
- Stateful interactions via `previous_response_id`.
- Custom tool calling and Remote MCP support (opt‑in).
- Reasoning support with `reasoning.effort` for `openai/gpt-oss-20b`.
- Streaming via SSE when `stream: true`.
- CLI: `lms ls --variants` lists all variants for multi‑variant models.
- Docs: /docs/developer/openai-compat. Full release notes: /blog/lmstudio-v0.3.29.
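A hedged sketch combining the features above into one `/v1/responses` request, following the standard OpenAI Responses shape (the input text and response id are placeholders):

```json
{
  "model": "openai/gpt-oss-20b",
  "input": "Summarize the previous answer in one sentence.",
  "previous_response_id": "resp_...",
  "reasoning": { "effort": "low" },
  "stream": true
}
```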
LM Studio 0.3.27 • 2025‑09‑24
CLI: model resource estimates, status, and interrupts
- New: `lms load --estimate-only <model>` prints estimated GPU and total memory before loading. Honors `--context-length` and `--gpu`, and uses an improved estimator that now accounts for flash attention and vision models.
- `lms chat`: press `Ctrl+C` to interrupt an ongoing prediction.
- `lms ps --json` now reports each model's generation status and the number of queued prediction requests.
- CLI color contrast improved for light mode.
- See docs: /docs/cli/local-models/load. Full release notes: /blog/lmstudio-v0.3.27.
LM Studio 0.3.26 • 2025‑09‑15
CLI log streaming: server + model
`lms log stream` now supports multiple sources and filters.
- `--source server` streams HTTP server logs (startup, endpoints, status)
- `--source model --filter input,output` streams formatted user input and model output
- Append `--json` for machine‑readable logs; `--stats` adds tokens/sec and related metrics (model source)
- See usage and examples: /docs/cli/serve/log-stream. Full release notes: /blog/lmstudio-v0.3.26.
LM Studio 0.3.25 • 2025‑09‑04
New model support (API)
- Added support for NVIDIA Nemotron‑Nano‑v2 with tool‑calling via the OpenAI‑compatible endpoints ‡.
- Added support for Google EmbeddingGemma for the `/v1/embeddings` endpoint ‡.
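A minimal `/v1/embeddings` request body sketch, assuming the standard OpenAI embeddings shape (the model identifier is illustrative and may differ from the actual EmbeddingGemma key in your install):

```json
{
  "model": "embeddinggemma-300m",
  "input": "The quick brown fox jumps over the lazy dog"
}
```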
LM Studio 0.3.24 • 2025‑08‑28
Seed‑OSS tool‑calling and template fixes
- Added support for ByteDance/Seed‑OSS including tool‑calling and prompt‑template compatibility fixes in the OpenAI‑compatible API ‡.
- Fixed cases where tool calls were not parsed for certain prompt templates ‡.
LM Studio 0.3.23 • 2025‑08‑12
Reasoning content and tool‑calling reliability
- For `gpt-oss` on `POST /v1/chat/completions`, reasoning content moves out of `message.content` and into `choices.message.reasoning` (non‑streaming) and `choices.delta.reasoning` (streaming), aligning with `o3-mini` ‡.
- Tool names are normalized (e.g., snake_case) before being provided to the model to improve tool‑calling reliability ‡.
- Fixed errors for certain tool‑containing requests to `POST /v1/chat/completions` (e.g., "reading 'properties'") and non‑streaming tool‑call failures ‡.
LM Studio 0.3.19 • 2025‑07‑21
Bug fixes for streaming and tool calls
- Corrected usage statistics returned by OpenAI‑compatible streaming responses ‡.
- Improved handling of parallel tool calls via the streaming API ‡.
- Fixed tool‑call parsing for certain Mistral models ‡.
LM Studio 0.3.18 • 2025‑07‑10
Streaming options and tool‑calling improvements
- Added support for the `stream_options` object on OpenAI‑compatible endpoints. Setting `stream_options.include_usage` to `true` returns prompt and completion token usage during streaming ‡.
- Errors returned from streaming endpoints now follow the correct format expected by OpenAI clients ‡.
- Tool‑calling support added for Mistral v13 tokenizer models, using proper chat templates ‡.
- The `response_format.type` field now accepts `"text"` in chat‑completion requests ‡.
- Fixed bugs where parallel tool calls split across multiple chunks were dropped and where root‑level `$defs` in tool definitions were stripped ‡.
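The `stream_options` behavior described above can be sketched as (the model name is a placeholder):

```json
{
  "model": "your-model",
  "messages": [ { "role": "user", "content": "Hi" } ],
  "stream": true,
  "stream_options": { "include_usage": true }
}
```

With this set, the stream's final payload includes a `usage` object with prompt and completion token counts.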
LM Studio 0.3.17 • 2025‑06‑25
Tool‑calling reliability and token‑count updates
- Token counts now include the system prompt and tool definitions ‡. This makes usage reporting more accurate for both the UI and the API.
- Tool‑call argument tokens are streamed as they are generated ‡, improving responsiveness when using streamed function calls.
- Various fixes improve MCP and tool‑calling reliability, including correct handling of tools that omit a `parameters` object and preventing hangs when an MCP server reloads ‡.
LM Studio 0.3.16 • 2025‑05‑23
Model capabilities in GET /models
- The OpenAI‑compatible REST API (`/api/v0`) now returns a `capabilities` array in the `GET /models` response. Each model lists its supported capabilities (e.g. `"tool_use"`) ‡ so clients can programmatically discover tool‑enabled models.
- Fixed a streaming bug where an empty function name string was appended after the first packet of streamed tool calls ‡.
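A sketch of what one entry in the `GET /api/v0/models` response might look like (the model id and field set are illustrative, not an exact response):

```json
{
  "id": "qwen2.5-7b-instruct",
  "object": "model",
  "capabilities": ["tool_use"]
}
```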
👾 LM Studio 0.3.15 • 2025-04-24
Release post: LM Studio 0.3.15
Improved Tool Use API Support
OpenAI-like REST API now supports the `tool_choice` parameter:

```json
{
  "tool_choice": "auto" // or "none", "required"
}
```

- `"tool_choice": "none"` — Model will not call tools
- `"tool_choice": "auto"` — Model decides
- `"tool_choice": "required"` — Model must call tools (llama.cpp only)
Chunked responses now set `"finish_reason": "tool_calls"` when appropriate.
👾 LM Studio 0.3.14 • 2025-03-27
Release post: LM Studio 0.3.14
[API/SDK] Preset Support
RESTful API and SDKs support specifying presets in requests.
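As a hypothetical sketch only: assuming presets are referenced by name through a top-level field in the request body (the field name `"preset"` and its value are assumptions, not confirmed API), a request might look like:

```json
{
  "model": "deepseek-r1-distill-qwen-7b",
  "preset": "my-preset-name",
  "messages": [ ... ]
}
```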
👾 LM Studio 0.3.10 • 2025-02-18
Release post: LM Studio 0.3.10
Speculative Decoding API
Enable speculative decoding in API requests with "draft_model":
```json
{
  "model": "deepseek-r1-distill-qwen-7b",
  "draft_model": "deepseek-r1-distill-qwen-0.5b",
  "messages": [ ... ]
}
```

Responses now include a `stats` object for speculative decoding:

```json
"stats": {
  "tokens_per_second": ...,
  "draft_model": "...",
  "total_draft_tokens_count": ...,
  "accepted_draft_tokens_count": ...,
  "rejected_draft_tokens_count": ...,
  "ignored_draft_tokens_count": ...
}
```

👾 LM Studio 0.3.9 • 2025-01-30
Release post: LM Studio 0.3.9
Idle TTL and Auto Evict
Set a TTL (in seconds) for models loaded via API requests (docs article: Idle TTL and Auto-Evict)
```shell
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [ ... ],
+   "ttl": 300
  }'
```

With `lms`:

```shell
lms load --ttl <seconds>
```

Separate `reasoning_content` in Chat Completion responses
For DeepSeek R1 models, get reasoning content in a separate field; see the docs for details.
Turn this on in App Settings > Developer.
👾 LM Studio 0.3.6 • 2025-01-06
Release post: LM Studio 0.3.6
Tool and Function Calling API
Use any LLM that supports Tool Use and Function Calling through the OpenAI-like API.
Docs: Tool Use and Function Calling.
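A minimal request sketch using the standard OpenAI function-calling format (the tool name and parameters are illustrative):

```json
{
  "model": "your-model",
  "messages": [ { "role": "user", "content": "What is the weather in Paris?" } ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}
```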
👾 LM Studio 0.3.5 • 2024-10-22
Release post: LM Studio 0.3.5
Introducing lms get: download models from the terminal
You can now download models directly from the terminal using a keyword:

```shell
lms get deepseek-r1
```

or a full Hugging Face URL:

```shell
lms get <hugging face url>
```

To filter for MLX models only, add `--mlx` to the command.

```shell
lms get deepseek-r1 --mlx
```