Streaming events
When you call POST /api/v1/chat with stream set to true, the response is sent as a stream of named events using Server-Sent Events (SSE), letting you render chat responses incrementally. Events arrive in order and may include multiple deltas (for reasoning and message content), tool call boundaries and payloads, and any errors encountered. The stream always begins with chat.start and concludes with chat.end, which contains the aggregated result equivalent to a non-streaming response.
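As an illustration, here's a minimal sketch of initiating such a stream with Python's standard library. Only the /api/v1/chat path and the stream flag come from this page; the base URL, model id, and message shape are assumptions for the example:

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build a streaming chat request for POST /api/v1/chat.

    The path and "stream": true come from the docs; base_url and the
    message format are assumptions for illustration.
    """
    payload = {"model": model, "messages": messages, "stream": True}
    return urllib.request.Request(
        base_url + "/api/v1/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:1234",
    "openai/gpt-oss-20b",
    [{"role": "user", "content": "What's trending?"}],
)
# req can then be opened with urllib.request.urlopen and its body read
# line by line as SSE events.
```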
List of event types that can be sent in an /api/v1/chat response stream:

chat.start
model_load.start
model_load.progress
model_load.end
prompt_processing.start
prompt_processing.progress
prompt_processing.end
reasoning.start
reasoning.delta
reasoning.end
tool_call.start
tool_call.arguments
tool_call.success
tool_call.failure
message.start
message.delta
message.end
error
chat.end
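Each of these event types can be dispatched from a client-side parser. Here's a minimal sketch of parsing the raw event:/data: framing documented below into typed events, assuming the stream is available as an iterable of text lines (the framing comes from this page; everything else is illustrative):

```python
import json

def parse_sse_events(lines):
    """Parse raw SSE lines into (event_type, data) pairs.

    Expects an "event: <type>" line followed by a "data: <JSON>" line,
    with a blank line terminating each event.
    """
    event_type, data_parts = None, []
    for line in lines:
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line.strip() == "" and event_type is not None:
            # A blank line ends the event; emit the parsed JSON payload.
            yield event_type, json.loads("".join(data_parts))
            event_type, data_parts = None, []

# Example: two events as they might appear on the wire.
raw = [
    "event: chat.start",
    'data: {"type": "chat.start", "model_instance_id": "openai/gpt-oss-20b"}',
    "",
    "event: message.delta",
    'data: {"type": "message.delta", "content": "The current"}',
    "",
]
events = list(parse_sse_events(raw))
```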
Events will be streamed out in the following raw format:
event: <event type>
data: <JSON event data>

chat.start
An event that is emitted at the start of a chat response stream.
model_instance_id : string
Unique identifier for the loaded model instance that will generate the response.
type : "chat.start"
The type of the event. Always chat.start.
{
"type": "chat.start",
"model_instance_id": "openai/gpt-oss-20b"
}

model_load.start
Signals the start of a model being loaded to fulfill the chat request. Will not be emitted if the requested model is already loaded.
model_instance_id : string
Unique identifier for the model instance being loaded.
type : "model_load.start"
The type of the event. Always model_load.start.
{
"type": "model_load.start",
"model_instance_id": "openai/gpt-oss-20b"
}

model_load.progress
Progress of the model load.
model_instance_id : string
Unique identifier for the model instance being loaded.
progress : number
Progress of the model load as a float between 0 and 1.
type : "model_load.progress"
The type of the event. Always model_load.progress.
{
"type": "model_load.progress",
"model_instance_id": "openai/gpt-oss-20b",
"progress": 0.65
}

model_load.end
Signals a successfully completed model load.
model_instance_id : string
Unique identifier for the model instance that was loaded.
load_time_seconds : number
Time taken to load the model in seconds.
type : "model_load.end"
The type of the event. Always model_load.end.
{
"type": "model_load.end",
"model_instance_id": "openai/gpt-oss-20b",
"load_time_seconds": 12.34
}

prompt_processing.start
Signals the start of the model processing a prompt.
type : "prompt_processing.start"
The type of the event. Always prompt_processing.start.
{
"type": "prompt_processing.start"
}

prompt_processing.progress
Progress of the model processing a prompt.
progress : number
Progress of the prompt processing as a float between 0 and 1.
type : "prompt_processing.progress"
The type of the event. Always prompt_processing.progress.
{
"type": "prompt_processing.progress",
"progress": 0.5
}

prompt_processing.end
Signals the end of the model processing a prompt.
type : "prompt_processing.end"
The type of the event. Always prompt_processing.end.
{
"type": "prompt_processing.end"
}

reasoning.start
Signals the model is starting to stream reasoning content.
type : "reasoning.start"
The type of the event. Always reasoning.start.
{
"type": "reasoning.start"
}

reasoning.delta
A chunk of reasoning content. Multiple deltas may arrive.
content : string
Reasoning text fragment.
type : "reasoning.delta"
The type of the event. Always reasoning.delta.
{
"type": "reasoning.delta",
"content": "Need to"
}

reasoning.end
Signals the end of the reasoning stream.
type : "reasoning.end"
The type of the event. Always reasoning.end.
{
"type": "reasoning.end"
}

tool_call.start
Emitted when the model starts a tool call.
tool : string
Name of the tool being called.
provider_info : object
Information about the tool provider. A discriminated union over the possible provider types.
Plugin provider info : object
Present when the tool is provided by a plugin.
type : "plugin"
Provider type.
plugin_id : string
Identifier of the plugin.
Ephemeral MCP provider info : object
Present when the tool is provided by an ephemeral MCP server.
type : "ephemeral_mcp"
Provider type.
server_label : string
Label of the MCP server.
type : "tool_call.start"
The type of the event. Always tool_call.start.
{
"type": "tool_call.start",
"tool": "model_search",
"provider_info": {
"type": "ephemeral_mcp",
"server_label": "huggingface"
}
}

tool_call.arguments
Arguments streamed for the current tool call.
tool : string
Name of the tool being called.
arguments : object
Arguments passed to the tool. Can have any keys/values depending on the tool definition.
provider_info : object
Information about the tool provider. A discriminated union over the possible provider types.
Plugin provider info : object
Present when the tool is provided by a plugin.
type : "plugin"
Provider type.
plugin_id : string
Identifier of the plugin.
Ephemeral MCP provider info : object
Present when the tool is provided by an ephemeral MCP server.
type : "ephemeral_mcp"
Provider type.
server_label : string
Label of the MCP server.
type : "tool_call.arguments"
The type of the event. Always tool_call.arguments.
{
"type": "tool_call.arguments",
"tool": "model_search",
"arguments": {
"sort": "trendingScore",
"limit": 1
},
"provider_info": {
"type": "ephemeral_mcp",
"server_label": "huggingface"
}
}

tool_call.success
Emitted when a tool call succeeds, carrying the result along with the arguments used.
tool : string
Name of the tool that was called.
arguments : object
Arguments that were passed to the tool.
output : string
Raw tool output string.
provider_info : object
Information about the tool provider. A discriminated union over the possible provider types.
Plugin provider info : object
Present when the tool is provided by a plugin.
type : "plugin"
Provider type.
plugin_id : string
Identifier of the plugin.
Ephemeral MCP provider info : object
Present when the tool is provided by an ephemeral MCP server.
type : "ephemeral_mcp"
Provider type.
server_label : string
Label of the MCP server.
type : "tool_call.success"
The type of the event. Always tool_call.success.
{
"type": "tool_call.success",
"tool": "model_search",
"arguments": {
"sort": "trendingScore",
"limit": 1
},
"output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
"provider_info": {
"type": "ephemeral_mcp",
"server_label": "huggingface"
}
}

tool_call.failure
Indicates that the tool call failed.
reason : string
Reason for the tool call failure.
metadata : object
Metadata about the invalid tool call.
type : "invalid_name" | "invalid_arguments"
Type of error that occurred.
tool_name : string
Name of the tool that was attempted to be called.
arguments (optional) : object
Arguments that were passed to the tool (only present for invalid_arguments errors).
provider_info (optional) : object
Information about the tool provider (only present for invalid_arguments errors).
type : "plugin" | "ephemeral_mcp"
Provider type.
plugin_id (optional) : string
Identifier of the plugin (when type is "plugin").
server_label (optional) : string
Label of the MCP server (when type is "ephemeral_mcp").
type : "tool_call.failure"
The type of the event. Always tool_call.failure.
{
"type": "tool_call.failure",
"reason": "Cannot find tool with name open_browser.",
"metadata": {
"type": "invalid_name",
"tool_name": "open_browser"
}
}

message.start
Signals the model is about to stream a message.
type : "message.start"
The type of the event. Always message.start.
{
"type": "message.start"
}

message.delta
A chunk of message content. Multiple deltas may arrive.
content : string
Message text fragment.
type : "message.delta"
The type of the event. Always message.delta.
{
"type": "message.delta",
"content": "The current"
}

message.end
Signals the end of the message stream.
type : "message.end"
The type of the event. Always message.end.
{
"type": "message.end"
}

error
Emitted when an error occurs during streaming. chat.end will still be sent with whatever was generated up to that point.
error : object
Error information.
type : "invalid_request" | "unknown" | "mcp_connection_error" | "plugin_connection_error" | "not_implemented" | "model_not_found" | "job_not_found" | "internal_error"
High-level error type.
message : string
Human-readable error message.
code (optional) : string
More detailed error code (e.g., validation issue code).
param (optional) : string
Parameter associated with the error, if applicable.
type : "error"
The type of the event. Always error.
{
"type": "error",
"error": {
"type": "invalid_request",
"message": "\"model\" is required",
"code": "missing_required_parameter",
"param": "model"
}
}

chat.end
Final event containing the full aggregated response, equivalent to the non-streaming POST /api/v1/chat response body.
result : object
Final response with model_instance_id, output, stats, and optional response_id. See non-streaming chat docs for more details.
type : "chat.end"
The type of the event. Always chat.end.
{
"type": "chat.end",
"result": {
"model_instance_id": "openai/gpt-oss-20b",
"output": [
{ "type": "reasoning", "content": "Need to call function." },
{
"type": "tool_call",
"tool": "model_search",
"arguments": { "sort": "trendingScore", "limit": 1 },
"output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
"provider_info": { "type": "ephemeral_mcp", "server_label": "huggingface" }
},
{ "type": "message", "content": "The current top‑trending model is..." }
],
"stats": {
"input_tokens": 329,
"total_output_tokens": 268,
"reasoning_output_tokens": 5,
"tokens_per_second": 43.73,
"time_to_first_token_seconds": 0.781
},
"response_id": "resp_02b2017dbc06c12bfc353a2ed6c2b802f8cc682884bb5716"
}
}
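As a closing sketch, a client might fold the parsed events back into reasoning text, message text, and the final aggregated result. The event shapes come from this page; the accumulator itself is a hypothetical illustration:

```python
def accumulate(events):
    """Fold parsed stream events into reasoning text, message text,
    and the final aggregated result carried by chat.end."""
    reasoning, message, result = [], [], None
    for ev in events:
        t = ev["type"]
        if t == "reasoning.delta":
            reasoning.append(ev["content"])
        elif t == "message.delta":
            message.append(ev["content"])
        elif t == "error":
            # chat.end still follows, carrying whatever was generated.
            print("stream error:", ev["error"]["message"])
        elif t == "chat.end":
            result = ev["result"]
    return "".join(reasoning), "".join(message), result

# Example with an abridged stream of parsed event payloads.
stream = [
    {"type": "chat.start", "model_instance_id": "openai/gpt-oss-20b"},
    {"type": "reasoning.delta", "content": "Need to"},
    {"type": "reasoning.delta", "content": " call function."},
    {"type": "message.delta", "content": "The current"},
    {"type": "message.delta", "content": " top-trending model is..."},
    {"type": "chat.end", "result": {"model_instance_id": "openai/gpt-oss-20b"}},
]
reasoning_text, message_text, final = accumulate(stream)
```

Because chat.end duplicates the aggregated output, a client can use the deltas purely for incremental rendering and treat the chat.end result as the source of truth.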