Installation

pip install pandaprobe[openai]

Setup

from pandaprobe.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())
The wrapper works with both synchronous (OpenAI) and asynchronous (AsyncOpenAI) clients; use the same wrap_openai entry point for both.

Chat Completions API

Span name: "openai-chat", SpanKind: LLM
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."},
    ],
    temperature=0.7,
)
What gets traced
  • Input: messages array
  • Output: assistant message
  • Model name
  • Token usage: prompt_tokens, completion_tokens, total_tokens, plus detail fields (for example reasoning_tokens from completion_tokens_details)
  • Model parameters: temperature, top_p, max_tokens, and similar; only a safe allow-list of parameters is recorded
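The parameter allow-listing described above can be sketched as a simple filter over the request keyword arguments. This is an illustrative assumption about how such filtering works, not PandaProbe's actual internals; the names SAFE_PARAMS and extract_safe_params are hypothetical:

```python
# Illustrative sketch: attach only an allow-listed subset of model
# parameters to the span, dropping messages, credentials, etc.
SAFE_PARAMS = {
    "temperature", "top_p", "max_tokens",
    "frequency_penalty", "presence_penalty", "stop", "n", "seed",
}

def extract_safe_params(request_kwargs: dict) -> dict:
    """Keep only parameters that are safe to record on a span."""
    return {k: v for k, v in request_kwargs.items() if k in SAFE_PARAMS}
```

For example, extract_safe_params({"model": "gpt-4o", "temperature": 0.7, "messages": []}) keeps only the temperature.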

Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Streaming is fully supported. The wrapper records completion_start_time on the first chunk for time-to-first-token tracking. Chunks are reduced to a single response for the span output.
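The chunk-reduction behavior can be sketched as follows. The chunk shape is simplified to plain dicts (the real SDK yields ChatCompletionChunk objects), and the function name is hypothetical; this only illustrates the reduction idea, not PandaProbe's implementation:

```python
import time

def reduce_chunks(chunks):
    """Concatenate streamed delta contents into a single assistant
    message, recording the time of the first chunk (time-to-first-token)."""
    completion_start_time = None
    parts = []
    for chunk in chunks:
        if completion_start_time is None:
            # First chunk seen: mark completion_start_time for TTFT.
            completion_start_time = time.time()
        delta = chunk.get("delta") or {}
        if delta.get("content"):
            parts.append(delta["content"])
    return {
        "role": "assistant",
        "content": "".join(parts),
        "completion_start_time": completion_start_time,
    }
```

Feeding it deltas like {"delta": {"content": "Hel"}} and {"delta": {"content": "lo"}} yields a single message with content "Hello".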

Responses API

Span name: "openai-response", SpanKind: LLM
response = client.responses.create(
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    input="What is the capital of France?",
)
What gets traced
  • Input: instructions plus input, normalized to messages format
  • Output: response output items
  • Token usage: input_tokens mapped to prompt tokens, output_tokens mapped to completion tokens, plus detail fields
  • Reasoning summaries extracted from reasoning output items
  • Model parameters: max_output_tokens, temperature, top_p, reasoning, and related fields
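The normalization of instructions plus input into a messages list can be sketched like this. The function name and exact mapping are assumptions for illustration; the real wrapper may handle additional input item types:

```python
def normalize_responses_input(instructions, input_value):
    """Map Responses API inputs to a chat-style messages list (sketch):
    instructions become a system message, a string input becomes a
    user message, and a list input is assumed to already be message-like."""
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    if isinstance(input_value, str):
        messages.append({"role": "user", "content": input_value})
    else:
        messages.extend(input_value)
    return messages
```

With the example request above, this produces a two-message list: the instructions as a system message and the question as a user message.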

Tool calls (Responses API)

Built-in tools such as web_search, file_search, and code_interpreter are automatically traced as child spans with SpanKind TOOL:
response = client.responses.create(
    model="gpt-4o",
    input="Search the web for PandaProbe",
    tools=[{"type": "web_search"}],
)
Each tool invocation produces a child TOOL span with the tool type as the span name (for example "web_search_call", "function_call"). Function calls (function_call items) are also captured as TOOL child spans with arguments as input and results as output.
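The mapping from tool-related output items to child TOOL spans can be sketched as below. Items are simplified to dicts and the field names ("arguments", "results") are assumptions for illustration; the real response objects differ:

```python
def tool_child_spans(output_items):
    """Turn tool-call output items (types ending in "_call", e.g.
    web_search_call or function_call) into child TOOL span records."""
    spans = []
    for item in output_items:
        item_type = item.get("type", "")
        if item_type.endswith("_call"):
            spans.append({
                "name": item_type,   # span name is the tool type
                "kind": "TOOL",
                "input": item.get("arguments"),
                "output": item.get("results"),
            })
    return spans
```

Non-tool items (for example plain "message" output items) are left alone and stay on the parent LLM span.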

Token usage mapping

OpenAI Field                                       | PandaProbe Field
prompt_tokens                                      | prompt_tokens
completion_tokens                                  | completion_tokens
total_tokens                                       | total_tokens
completion_tokens_details.reasoning_tokens         | reasoning_tokens
(Responses) input_tokens                           | prompt_tokens
(Responses) output_tokens                          | completion_tokens
(Responses) input_tokens_details.cached_tokens     | cache_read_tokens
(Responses) output_tokens_details.reasoning_tokens | reasoning_tokens
Chat Completions and Responses return different usage object shapes from the SDK. The wrapper normalizes both into the PandaProbe fields in this table; do not assume raw OpenAI field names are identical across APIs when reading span payloads in custom exporters.
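The normalization in the table above can be sketched as one function over both usage shapes. Usage objects are simplified to dicts and the function name is hypothetical; it illustrates the mapping, not PandaProbe's code:

```python
def normalize_usage(usage: dict, api: str) -> dict:
    """Map either OpenAI usage shape to PandaProbe's field names (sketch)."""
    if api == "responses":
        # Responses API: input_tokens / output_tokens, with detail dicts.
        return {
            "prompt_tokens": usage.get("input_tokens"),
            "completion_tokens": usage.get("output_tokens"),
            "total_tokens": usage.get("total_tokens"),
            "cache_read_tokens": usage.get("input_tokens_details", {}).get("cached_tokens"),
            "reasoning_tokens": usage.get("output_tokens_details", {}).get("reasoning_tokens"),
        }
    # Chat Completions API: prompt_tokens / completion_tokens.
    return {
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "reasoning_tokens": usage.get("completion_tokens_details", {}).get("reasoning_tokens"),
    }
```

A custom exporter reading span payloads would then see the same PandaProbe field names regardless of which API produced the span.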