Overview
The Responses API provides a next-generation interface for complex AI interactions, supporting:
- Prompt-based execution: Execute versioned prompt templates with variable substitution
- MCP tool integration: Access Model Context Protocol tools for extended functionality
- Structured outputs: JSON schema-validated responses for reliable data extraction
- Array-based outputs: Multiple output types (messages, tool calls, reasoning, MCP tool lists)
- Multi-turn conversations with context preservation
- Parallel tool/function calling
- Multimodal inputs (text, image, audio)
- Reasoning model capabilities
- Streaming responses
Endpoints
POST /v1/responses
GET /v1/responses/{response_id}
DELETE /v1/responses/{response_id}
POST /v1/responses/{response_id}/cancel
GET /v1/responses/{response_id}/input_items
Authentication
Authorization: Bearer <API_KEY>
Create Response
Generate AI responses with advanced conversational features.
Endpoint: POST /v1/responses
Headers:
Authorization: Bearer YOUR_API_KEY (required)
Content-Type: application/json (required)
Request Body:
{
"model": "gpt-4o",
"input": "Explain quantum computing",
"previous_response_id": "resp_abc123",
"prompt": {
"id": "prompt_quantum_explanation",
"variables": {
"topic": "quantum computing",
"difficulty": "beginner"
},
"version": "1"
},
"instructions": "You are a helpful physics tutor",
"modalities": ["text"],
"reasoning": true,
"tools": [
{
"type": "function",
"function": {
"name": "calculate_quantum_state",
"description": "Calculate quantum state probabilities",
"parameters": {
"type": "object",
"properties": {
"qubits": {"type": "integer"},
"state": {"type": "string"}
},
"required": ["qubits", "state"]
}
}
}
],
"tool_choice": "auto",
"temperature": 0.7,
"max_tokens": 1500,
"stream": false,
"metadata": {
"user_id": "user123",
"session": "quantum_tutorial"
}
}
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | No | Model identifier |
| prompt | object | No | Prompt template parameters |
| input | string/array | No | Text or multimodal content |
| previous_response_id | string | No | ID for conversation continuity |
| instructions | string | No | System instructions |
| modalities | array | No | Output types: ["text"], ["text", "audio"] |
| reasoning | boolean | No | Enable reasoning/thinking mode |
| tools | array | No | Available functions/tools |
| tool_choice | string/object | No | Tool selection: auto, none, required |
| temperature | float | No | Sampling temperature (0.0 to 2.0) |
| max_tokens | integer | No | Maximum output tokens |
| stream | boolean | No | Enable streaming response |
| metadata | object | No | Custom metadata |
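The constraints listed in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name `validate_request` is hypothetical, and it only enforces what the table states (the temperature range, the three string values of `tool_choice`, and basic types). The server remains the source of truth.

```python
from typing import Any

# String values listed for tool_choice in the parameters table.
TOOL_CHOICE_VALUES = {"auto", "none", "required"}


def validate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body (empty if none).

    Hypothetical client-side helper based only on the parameters table.
    """
    problems: list[str] = []

    temperature = body.get("temperature")
    if temperature is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
            problems.append("temperature must be a number")
        elif not 0.0 <= temperature <= 2.0:
            problems.append("temperature must be between 0.0 and 2.0")

    tool_choice = body.get("tool_choice")
    if isinstance(tool_choice, str) and tool_choice not in TOOL_CHOICE_VALUES:
        problems.append("tool_choice must be auto, none, or required")

    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (
        isinstance(max_tokens, bool) or not isinstance(max_tokens, int)
    ):
        problems.append("max_tokens must be an integer")

    if "stream" in body and not isinstance(body["stream"], bool):
        problems.append("stream must be a boolean")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.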
Prompt-based request (a prompt object with optional unstructured input):
{
"prompt": {
"id": "prompt_name",
"version": "1",
"variables": {
"variable_1": "Value 1",
"variable_2": "Value 2"
}
},
"input": "Unstructured input text related to the prompt."
}
Multimodal input (an array of typed content parts):
{
"input": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,..."
}
}
]
}
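A multimodal `input` array like the one above can be assembled from raw image bytes. This is a minimal sketch under the assumption that images are passed as base64 data URLs, as the `data:image/jpeg;base64,...` example suggests; the helper names `text_part`, `image_part`, and `multimodal_input` are hypothetical.

```python
import base64


def text_part(text: str) -> dict:
    """Build a text content part."""
    return {"type": "text", "text": text}


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build an image_url content part holding a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def multimodal_input(text: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> list:
    """Combine one text part and one image part into an input array."""
    return [text_part(text), image_part(image_bytes, mime_type)]
```

The returned list can be passed as the `input` field of a request body.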
The response contains an array-based output field with multiple item types:
{
"id": "resp_abc123",
"object": "response",
"created": 1699123456,
"model": "gpt-4o",
"status": "completed",
"output": [
{
"id": "msg_xyz",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Quantum computing uses quantum mechanical phenomena...",
"annotations": []
}
]
}
],
"instructions": [
{
"type": "message",
"role": "system",
"status": "completed",
"content": [
{
"type": "input_text",
"text": "You are a helpful physics tutor"
}
]
},
{
"type": "message",
"role": "user",
"status": "completed",
"content": [
{
"type": "input_text",
"text": "Explain quantum computing"
}
]
}
],
"usage": {
"input_tokens": 25,
"output_tokens": 150,
"total_tokens": 175,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens_details": {
"reasoning_tokens": 0
}
},
"parallel_tool_calls": true,
"tool_choice": "auto",
"tools": [],
"temperature": 0.7,
"top_p": 0.9,
"max_output_tokens": 1500,
"background": false,
"reasoning": {},
"text": {
"format": {
"type": "text"
}
}
}
Output Item Types
The output array can contain multiple types of items:
Text Messages
{
"id": "msg_abc",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Response content...",
"annotations": [],
"logprobs": []
}
]
}
MCP Tool Lists
{
"id": "mcpl_def",
"type": "mcp_list_tools",
"server_label": "filesystem",
"tools": [
{
"name": "read_file",
"description": "Read file contents",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string"}
}
}
}
],
"error": null
}
MCP Tool Calls
{
"id": "call_123",
"type": "mcp_call",
"status": "completed",
"name": "read_file",
"server_label": "filesystem",
"arguments": "{\"path\":\"/data/file.txt\"}",
"output": "File contents here...",
"error": null
}
Function Calls
{
"type": "function_call",
"call_id": "call_456",
"name": "get_weather",
"arguments": "{\"location\":\"Paris\"}",
"id": "fc_789"
}
Reasoning Items
{
"id": "rs_abc",
"type": "reasoning",
"status": "completed",
"summary": [
{
"type": "summary_text",
"text": "Let me think through this step by step..."
}
]
}
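Because `output` is an array of mixed item types, client code usually filters it by `type` before reading fields. The sketch below relies only on the item shapes shown in this section; the helper names `output_text` and `items_of_type` are hypothetical.

```python
def items_of_type(response: dict, item_type: str) -> list:
    """Return the output items whose "type" equals item_type."""
    return [
        item for item in response.get("output", [])
        if item.get("type") == item_type
    ]


def output_text(response: dict) -> str:
    """Concatenate the text of every output_text part in message items."""
    chunks = []
    for item in items_of_type(response, "message"):
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part.get("text", ""))
    return "".join(chunks)
```

For example, `items_of_type(response, "mcp_call")` returns the MCP tool calls, and `output_text(response)` returns the assistant's text without the reasoning or tool items.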
When streaming is enabled, responses are returned as Server-Sent Events (SSE) with the following format:
event: {event_type}
data: {json_payload}
Event Lifecycle
1. Initial Events
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_abc","status":"in_progress","created_at":1699123456}}
event: response.in_progress
data: {"type":"response.in_progress","sequence_number":1,"response":{"id":"resp_abc","status":"in_progress"}}
2. MCP Tool List Events (if MCP tools are configured)
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":2,"output_index":0,"item":{"id":"mcpl_xyz","type":"mcp_list_tools","server_label":"filesystem","tools":[]}}
event: response.mcp_list_tools.in_progress
data: {"type":"response.mcp_list_tools.in_progress","sequence_number":3,"output_index":0,"item_id":"mcpl_xyz"}
event: response.mcp_list_tools.completed
data: {"type":"response.mcp_list_tools.completed","sequence_number":4,"output_index":0,"item_id":"mcpl_xyz"}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":5,"output_index":0,"item":{"id":"mcpl_xyz","type":"mcp_list_tools","server_label":"filesystem","tools":[{"name":"read_file","description":"Read file"}]}}
3. Text Output Events
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":6,"output_index":1,"item":{"id":"msg_abc","type":"message","status":"in_progress","role":"assistant","content":[]}}
event: response.content_part.added
data: {"type":"response.content_part.added","sequence_number":7,"item_id":"msg_abc","output_index":1,"content_index":0,"part":{"type":"output_text","text":"","annotations":[]}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":8,"item_id":"msg_abc","output_index":1,"content_index":0,"delta":"Quantum","logprobs":[]}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":9,"item_id":"msg_abc","output_index":1,"content_index":0,"delta":" computing","logprobs":[]}
event: response.output_text.done
data: {"type":"response.output_text.done","sequence_number":10,"item_id":"msg_abc","output_index":1,"content_index":0,"text":"Quantum computing uses...","logprobs":[]}
event: response.content_part.done
data: {"type":"response.content_part.done","sequence_number":11,"item_id":"msg_abc","output_index":1,"content_index":0,"part":{"type":"output_text","text":"Quantum computing uses...","annotations":[]}}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":12,"output_index":1,"item":{"id":"msg_abc","type":"message","status":"completed","role":"assistant","content":[{"type":"output_text","text":"Quantum computing uses..."}]}}
4. Reasoning Events (for thinking/reasoning models)
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":13,"output_index":0,"item":{"id":"rs_xyz","type":"reasoning","status":"in_progress","summary":[]}}
event: response.reasoning_summary_part.added
data: {"type":"response.reasoning_summary_part.added","sequence_number":14,"item_id":"rs_xyz","output_index":0,"summary_index":0,"part":{"type":"summary_text","text":""}}
event: response.reasoning_summary_text.delta
data: {"type":"response.reasoning_summary_text.delta","sequence_number":15,"item_id":"rs_xyz","output_index":0,"summary_index":0,"delta":"Let me think..."}
event: response.reasoning_summary_text.done
data: {"type":"response.reasoning_summary_text.done","sequence_number":16,"item_id":"rs_xyz","output_index":0,"summary_index":0,"text":"Let me think through this..."}
event: response.reasoning_summary_part.done
data: {"type":"response.reasoning_summary_part.done","sequence_number":17,"item_id":"rs_xyz","output_index":0,"summary_index":0,"part":{"type":"summary_text","text":"Let me think through this..."}}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":18,"output_index":0,"item":{"id":"rs_xyz","type":"reasoning","status":"completed","summary":[{"type":"summary_text","text":"Let me think through this..."}]}}
5. MCP Tool Call Events
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":19,"output_index":2,"item":{"id":"call_123","type":"mcp_call","status":"in_progress","name":"read_file","server_label":"filesystem","arguments":""}}
event: response.mcp_call.in_progress
data: {"type":"response.mcp_call.in_progress","sequence_number":20,"output_index":2,"item_id":"call_123"}
event: response.mcp_call_arguments.delta
data: {"type":"response.mcp_call_arguments.delta","sequence_number":21,"output_index":2,"item_id":"call_123","delta":"{\"path\"}"}
event: response.mcp_call_arguments.done
data: {"type":"response.mcp_call_arguments.done","sequence_number":22,"output_index":2,"item_id":"call_123","arguments":"{\"path\":\"/file.txt\"}"}
event: response.mcp_call.completed
data: {"type":"response.mcp_call.completed","sequence_number":23,"output_index":2,"item_id":"call_123"}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":24,"output_index":2,"item":{"id":"call_123","type":"mcp_call","status":"completed","name":"read_file","server_label":"filesystem","arguments":"{\"path\":\"/file.txt\"}","output":"file contents..."}}
6. Function Tool Call Events
event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":25,"output_index":3,"item":{"type":"function_call","call_id":"call_456","name":"get_weather","arguments":"","id":"fc_789"}}
event: response.function_call_arguments.done
data: {"type":"response.function_call_arguments.done","sequence_number":26,"output_index":3,"item_id":"fc_789","name":"get_weather","arguments":"{\"location\":\"Paris\"}"}
event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":27,"output_index":3,"item":{"type":"function_call","call_id":"call_456","name":"get_weather","arguments":"{\"location\":\"Paris\"}","id":"fc_789"}}
7. Completion Event
event: response.completed
data: {"type":"response.completed","sequence_number":28,"response":{"id":"resp_abc","object":"response","created_at":1699123456,"model":"gpt-4","status":"completed","output":[...],"instructions":[...],"usage":{"input_tokens":25,"output_tokens":150,"total_tokens":175}}}
8. Error Event (on failure)
event: response.failed
data: {"type":"response.failed","sequence_number":5,"response":{"id":"resp_abc","status":"failed","error":{"message":"Error description","type":"server_error","code":"execution_failed"}}}
Key Event Fields
sequence_number: Monotonically increasing counter for event ordering
output_index: Position in the output array (0-indexed)
item_id: Unique identifier for the specific item being streamed
content_index: Position within the content array (for messages)
summary_index: Position within the summary array (for reasoning)
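The event stream can be consumed by pairing each `event:` line with its `data:` payload and routing on the payload's `type`. The following is a minimal sketch, assuming the line format shown above and the `[DONE]` sentinel used by the client examples on this page; `parse_sse` and `accumulate_text` are hypothetical helper names. It collects text deltas per `item_id` and raises on `response.failed`.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event_name = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":  # assumed end-of-stream sentinel
                return
            payload = json.loads(data)
            # Fall back to the payload's own type if no event line was seen.
            yield event_name or payload.get("type", ""), payload
        elif not line:
            event_name = ""  # a blank line ends the current event


def accumulate_text(events: Iterable[tuple[str, dict]]) -> dict[str, str]:
    """Join output_text deltas per item_id, in arrival order."""
    texts: dict[str, str] = {}
    for _, payload in events:
        kind = payload.get("type")
        if kind == "response.output_text.delta":
            item_id = payload["item_id"]
            texts[item_id] = texts.get(item_id, "") + payload["delta"]
        elif kind == "response.failed":
            error = payload.get("response", {}).get("error", {})
            raise RuntimeError(error.get("message", "response failed"))
    return texts
```

This sketch processes events in arrival order. A stricter client could compare each `sequence_number` with the previous one to detect gaps or reordering.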
Prompt-Based Execution
Execute pre-configured prompt templates using the prompt parameter:
Request Example:
{
"prompt": {
"id": "prompt_template_id",
"variables": {"topic": "quantum computing"},
"version": "1"
},
"input": "Unstructured user input"
}
Fields:
prompt.id (required) - Template identifier
prompt.variables (optional) - Variable substitutions
prompt.version (optional) - Template version (if omitted, the template's default version is used)
input (optional) - Unstructured user input
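The field rules above translate directly into a small request builder. This is a sketch only: `build_prompt_request` is a hypothetical helper, and sending `version` as a string is an assumption taken from the `"version": "1"` example.

```python
from typing import Any, Mapping, Optional


def build_prompt_request(
    prompt_id: str,
    variables: Optional[Mapping[str, Any]] = None,
    version: Optional[Any] = None,
    input_text: Optional[str] = None,
) -> dict:
    """Build a request body for prompt-based execution.

    Only prompt.id is required; optional fields are omitted when not given.
    """
    if not prompt_id:
        raise ValueError("prompt.id is required")

    prompt: dict = {"id": prompt_id}
    if variables:
        prompt["variables"] = dict(variables)
    if version is not None:
        # The request example shows the version as a string ("1").
        prompt["version"] = str(version)

    body: dict = {"prompt": prompt}
    if input_text is not None:
        body["input"] = input_text
    return body
```

Omitting `version` leaves it out of the body entirely, so the server can fall back to the template's default version.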
Prompt Configuration (via UI or API):
Users can pre-configure prompts with:
- Model deployment and settings (temperature, max_tokens, top_p, etc.)
- System prompt with Jinja2 template support
- Conversation messages and context with Jinja2 template support
- MCP tools (filesystem, web access, custom tools)
- Input/output schemas for structured data
- Validation rules and retry limits
- Streaming configuration
Retrieve Response
Get details of a specific response.
Endpoint: GET /v1/responses/{response_id}
Returns the same format as the create response endpoint.
Delete Response
Remove a response from the system.
Endpoint: DELETE /v1/responses/{response_id}
{
"id": "resp_abc123",
"object": "response",
"deleted": true
}
Cancel Response
Cancel an in-progress response generation.
Endpoint: POST /v1/responses/{response_id}/cancel
{
"id": "resp_abc123",
"object": "response",
"status": "cancelled",
"cancelled_at": 1699123456
}
List Input Items
Retrieve the input conversation history for a response.
Endpoint: GET /v1/responses/{response_id}/input_items
{
"object": "list",
"data": [
{
"type": "message",
"role": "system",
"content": "You are a helpful physics tutor"
},
{
"type": "message",
"role": "user",
"content": "Explain quantum computing"
},
{
"type": "message",
"role": "assistant",
"content": "I'd be happy to explain quantum computing..."
}
]
}
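The returned list can be turned into a readable transcript for logging or debugging. A minimal sketch, assuming the `data` array shape shown above; `format_transcript` is a hypothetical helper, and its handling of list-valued `content` is a defensive assumption based on the part-based content used in the create-response format.

```python
def format_transcript(input_items: dict) -> str:
    """Render the message items of an input_items list as "role: text" lines."""
    lines = []
    for item in input_items.get("data", []):
        if item.get("type") != "message":
            continue
        content = item.get("content", "")
        if isinstance(content, list):
            # Tolerate part-based content, e.g. [{"type": "input_text", "text": ...}]
            content = "".join(part.get("text", "") for part in content)
        lines.append(f"{item.get('role', 'unknown')}: {content}")
    return "\n".join(lines)
```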
Usage Examples
Basic Response
curl -X POST http://localhost:3000/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "What is machine learning?"
}'
Prompt Execution
curl -X POST http://localhost:3000/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": {
"id": "prompt_neural_networks",
"variables": {
"question": "Explain neural networks"
}
}
}'
Multi-turn Conversation
# First response
RESPONSE_ID=$(curl -X POST http://localhost:3000/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "Explain neural networks"
}' | jq -r '.id')
# Follow-up response
curl -X POST http://localhost:3000/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "How do they differ from traditional algorithms?",
"previous_response_id": "'$RESPONSE_ID'"
}'
Tool Calling
curl -X POST http://localhost:3000/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "Calculate the fibonacci sequence up to 10",
"tools": [
{
"type": "function",
"function": {
"name": "calculate_fibonacci",
"description": "Calculate fibonacci numbers",
"parameters": {
"type": "object",
"properties": {
"n": {"type": "integer", "description": "Number of terms"}
},
"required": ["n"]
}
}
}
]
}'
Python Example
import json

import requests


class ResponsesAPI:
    def __init__(self, api_key, base_url="http://localhost:3000"):
        self.api_key = api_key
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def create_response(self, model, input_text, **kwargs):
        """Create a single (non-streaming) response."""
        data = {"model": model, "input": input_text, **kwargs}
        response = requests.post(
            f"{self.base_url}/v1/responses",
            headers=self.headers,
            json=data,
        )
        # Raise on 4xx/5xx instead of returning an error body as a response
        response.raise_for_status()
        return response.json()

    def create_conversation(self, model, messages):
        """Create a multi-turn conversation by chaining previous_response_id."""
        response_id = None
        responses = []
        for message in messages:
            # Send previous_response_id only once there is a response to chain from
            extra = {"previous_response_id": response_id} if response_id else {}
            response = self.create_response(model, message, **extra)
            responses.append(response)
            response_id = response["id"]
        return responses

    def stream_response(self, model, input_text, **kwargs):
        """Stream a response over SSE, yielding each event's JSON payload."""
        data = {"model": model, "input": input_text, "stream": True, **kwargs}
        response = requests.post(
            f"{self.base_url}/v1/responses",
            headers=self.headers,
            json=data,
            stream=True,
        )
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            line = line.decode("utf-8")
            if line.startswith("data: "):
                payload = line[6:]
                if payload == "[DONE]":
                    break
                yield json.loads(payload)
# Usage
api = ResponsesAPI("YOUR_API_KEY")

# Simple response
response = api.create_response(
    "gpt-4o",
    "Explain the theory of relativity",
    temperature=0.7,
)
# "output" is an array of items; print the text parts of each message item
for item in response["output"]:
    if item["type"] == "message":
        for part in item["content"]:
            if part["type"] == "output_text":
                print(part["text"])

# Multi-turn conversation
conversation = api.create_conversation(
    "gpt-4o",
    [
        "What is artificial intelligence?",
        "How does it relate to machine learning?",
        "What are some practical applications?",
    ],
)

# Streaming response
for chunk in api.stream_response("gpt-4o", "Write a short story"):
    # Text arrives in response.output_text.delta events; "delta" is a string
    if chunk.get("type") == "response.output_text.delta":
        print(chunk["delta"], end="", flush=True)

# Multimodal input
import base64

with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = api.create_response(
    "gpt-4o-vision",
    [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
    ],
)
JavaScript Example
class ResponsesAPI {
  constructor(apiKey, baseUrl = 'http://localhost:3000') {
    this.apiKey = apiKey;
    this.baseUrl = baseUrl;
  }

  async createResponse(model, input, options = {}) {
    const response = await fetch(`${this.baseUrl}/v1/responses`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model,
        input,
        ...options
      })
    });
    return await response.json();
  }

  async *streamResponse(model, input, options = {}) {
    const response = await fetch(`${this.baseUrl}/v1/responses`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model,
        input,
        stream: true,
        ...options
      })
    });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      // Keep the last (possibly incomplete) line in the buffer
      const lines = buffer.split('\n');
      buffer = lines.pop();
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') return;
          yield JSON.parse(data);
        }
      }
    }
  }

  async createConversation(model, messages) {
    let responseId = null;
    const responses = [];
    for (const message of messages) {
      const response = await this.createResponse(
        model,
        message,
        responseId ? { previous_response_id: responseId } : {}
      );
      responses.push(response);
      responseId = response.id;
    }
    return responses;
  }
}
// Usage
const api = new ResponsesAPI('YOUR_API_KEY');

// Simple response
const response = await api.createResponse(
  'gpt-4o',
  'What is the meaning of life?',
  { temperature: 0.9 }
);
// `output` is an array of items; print the text parts of each message item
for (const item of response.output) {
  if (item.type === 'message') {
    for (const part of item.content) {
      if (part.type === 'output_text') console.log(part.text);
    }
  }
}

// Streaming response
for await (const chunk of api.streamResponse('gpt-4o', 'Tell me a joke')) {
  // Text arrives in response.output_text.delta events; `delta` is a string
  if (chunk.type === 'response.output_text.delta') {
    process.stdout.write(chunk.delta);
  }
}

// Tool calling
const toolResponse = await api.createResponse(
  'gpt-4o',
  'What is the weather in Paris?',
  {
    tools: [{
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather information',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      }
    }]
  }
);

// Handle tool calls: each call is a function_call item in the output array
for (const item of toolResponse.output) {
  if (item.type === 'function_call') {
    console.log(`Calling ${item.name} with:`, JSON.parse(item.arguments));
  }
}
Advanced Features
Reasoning Models
Enable step-by-step reasoning:
{
"model": "o1-preview",
"input": "Solve this complex problem...",
"reasoning": true
}
Parallel Tool Calling
The API supports calling multiple tools in parallel. Each call is returned as a separate function_call item in the output array:
{
"output": [
{
"type": "function_call",
"call_id": "call_1",
"name": "get_weather",
"arguments": "{\"location\": \"Paris\"}"
},
{
"type": "function_call",
"call_id": "call_2",
"name": "get_weather",
"arguments": "{\"location\": \"London\"}"
}
]
}
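When a response contains several independent function calls, the client can run them concurrently. The following is a minimal sketch, assuming calls arrive as `function_call` items in the `output` array as described under Output Item Types; `run_function_calls` and the `registry` mapping of tool names to local Python callables are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Mapping


def run_function_calls(
    response: dict, registry: Mapping[str, Callable[..., Any]]
) -> dict:
    """Execute every function_call item concurrently.

    Returns a mapping of call_id to the local function's result.
    """
    calls = [
        item for item in response.get("output", [])
        if item.get("type") == "function_call"
    ]

    def run(call: dict) -> Any:
        func = registry[call["name"]]  # KeyError if the tool is unknown
        arguments = json.loads(call.get("arguments") or "{}")
        return func(**arguments)

    # pool.map preserves input order, so results line up with calls.
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        results = list(pool.map(run, calls))
    return {call["call_id"]: result for call, result in zip(calls, results)}
```

Threads suit I/O-bound tools such as HTTP lookups. Calls that depend on each other's results should still be run sequentially.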
Conversation Context
Maintain context across multiple interactions:
{
"model": "gpt-4o",
"input": "Continue our discussion",
"previous_response_id": "resp_previous",
"instructions": "You are a helpful tutor who remembers previous conversations"
}
Error Responses
400 Bad Request
{
"error": {
"message": "Invalid model specified",
"type": "invalid_request_error",
"code": "invalid_model"
}
}
401 Unauthorized
{
"error": {
"message": "Invalid API key",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
429 Rate Limit
{
"error": {
"message": "Rate limit exceeded",
"type": "rate_limit_error",
"code": "rate_limit_exceeded",
"retry_after": 60
}
}
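A client can use the `retry_after` field of a 429 response to pace its retries. This is a sketch under stated assumptions: `retry_after` is treated as a number of seconds (the document does not state the unit), and `send_with_retry` with its injected `send` callable, which returns a `(status_code, body)` pair, is a hypothetical helper rather than part of the API.

```python
import time
from typing import Callable, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() and retry while it returns HTTP 429.

    Waits error.retry_after (assumed to be seconds) between attempts,
    falling back to default_delay when the field is absent.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Return on success, on non-retryable errors, or when out of attempts.
        if status != 429 or attempt == max_attempts:
            return status, body
        error = body.get("error", {}) if isinstance(body, dict) else {}
        sleep(error.get("retry_after", default_delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Only 429 is retried here; 400 and 401 responses are returned to the caller immediately because repeating the same request would not help.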
Best Practices
- Conversation Management: Use previous_response_id for coherent multi-turn conversations
- Tool Design: Create focused, single-purpose tools for better reliability
- Streaming: Use streaming for long responses to improve user experience
- Error Handling: Implement robust retry logic for transient failures
- Metadata: Use metadata to track conversations and user sessions
- Context Window: Be mindful of token limits when building long conversations
- Parallel Tools: Leverage parallel tool calling for independent operations
- Prompt Templates: Design reusable prompt templates with clear variable names for maintainability
- Variable Management: Use descriptive variable names and provide defaults where appropriate
- Version Control: Use prompt versioning to iterate on prompts without breaking existing integrations
Limitations
- Some advanced retrieval features may not be fully implemented
- Response management endpoints have limited functionality
- Conversation history is maintained only through previous_response_id chaining
- Prompt templates must be pre-configured before use
- Maximum context window depends on the model used, including when configured in a prompt template.
Supported providers
OpenAI
Next-generation Responses API with full support for advanced conversational features, multi-turn interactions, and parallel tool calling.