
Installation

pip install pandaprobe[gemini]

Setup

from pandaprobe.wrappers import wrap_gemini
from google import genai

client = wrap_gemini(genai.Client())

Generate content

Span name: "gemini-generate", SpanKind: LLM

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain quantum computing.",
    config={"temperature": 0.7},
)
print(response.text)

What gets traced

  • Input: contents plus system_instruction normalized to messages format (role "model" mapped to "assistant")
  • Output: answer text (non-thought parts)
  • Model name
  • Token usage
  • Model parameters: temperature, top_p, top_k, max_output_tokens, stop_sequences, and related fields
  • Thinking or reasoning parts stored in metadata as reasoning_summary
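
The role mapping in the first bullet can be sketched in plain Python. This is a hypothetical stand-in for the wrapper's internal normalization, not pandaprobe's actual code; the function name and dict shapes are assumptions.

```python
# Hypothetical sketch of normalizing Gemini contents to a messages-style
# format; not pandaprobe's actual implementation.

def normalize_contents(contents, system_instruction=None):
    """Map Gemini-style content entries to {"role", "content"} messages.

    Gemini uses the role "model" for assistant turns; the traced
    messages use "assistant" instead.
    """
    messages = []
    if system_instruction:
        messages.append({"role": "system", "content": system_instruction})
    if isinstance(contents, str):
        # A bare string is treated as a single user turn.
        contents = [{"role": "user", "parts": [{"text": contents}]}]
    for entry in contents:
        role = "assistant" if entry["role"] == "model" else entry["role"]
        text = "".join(p.get("text", "") for p in entry["parts"])
        messages.append({"role": role, "content": text})
    return messages
```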

Streaming and async

stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Hello!",
)
for chunk in stream:
    print(chunk.text, end="")

All four call patterns are traced: synchronous blocking, synchronous streaming, asynchronous blocking, and asynchronous streaming.
Use models.generate_content_stream for synchronous iteration, and aio.models.generate_content_stream with async for when the call site is already async. The wrapper emits the same span fields in both cases; only the execution model differs.
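
The two iteration patterns can be illustrated in pure Python with stub streams standing in for the real client (the real objects come from the generate_content_stream calls above; the stubs and helper names here are hypothetical):

```python
import asyncio

# Stub chunks standing in for streaming response text.
CHUNKS = ["Hel", "lo", "!"]

def sync_stream():
    # Sync streaming: a plain iterator consumed with a for loop.
    for text in CHUNKS:
        yield text

async def async_stream():
    # Async streaming: an async iterator consumed with `async for`.
    for text in CHUNKS:
        yield text

def consume_sync():
    return "".join(chunk for chunk in sync_stream())

async def consume_async():
    parts = []
    async for chunk in async_stream():
        parts.append(chunk)
    return "".join(parts)

print(consume_sync())                # Hello!
print(asyncio.run(consume_async()))  # Hello!
```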

Thinking mode

When using Gemini’s thinking mode, thought parts are automatically separated from answer parts. Thought content is stored in metadata as reasoning_summary, while the span output contains only the answer text.
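
The separation can be sketched as a small helper over part dicts with a thought flag, mimicking the boolean Gemini sets on thought parts. This is a hypothetical illustration, not pandaprobe's actual code:

```python
# Hypothetical sketch of splitting thought parts from answer parts;
# part dicts mimic the `thought` boolean on Gemini response parts.

def split_parts(parts):
    """Return (answer_text, reasoning_summary) from a list of parts."""
    answers = [p["text"] for p in parts if not p.get("thought")]
    thoughts = [p["text"] for p in parts if p.get("thought")]
    return "".join(answers), "".join(thoughts)
```

The span output would carry only the first element; the second would land in metadata as reasoning_summary.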

Token usage mapping

Gemini Field                  PandaProbe Field
prompt_token_count            prompt_tokens
candidates_token_count        completion_tokens
total_token_count             total_tokens
thoughts_token_count          reasoning_tokens
cached_content_token_count    cache_read_tokens
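
The table translates directly into a field-renaming dict. The Gemini field names come from the table above; the function and its None-dropping behavior are a sketch, not pandaprobe's actual code:

```python
# Mapping from Gemini usage_metadata fields to PandaProbe token fields,
# per the table above. The map_usage helper is a hypothetical sketch.

TOKEN_FIELD_MAP = {
    "prompt_token_count": "prompt_tokens",
    "candidates_token_count": "completion_tokens",
    "total_token_count": "total_tokens",
    "thoughts_token_count": "reasoning_tokens",
    "cached_content_token_count": "cache_read_tokens",
}

def map_usage(usage_metadata: dict) -> dict:
    """Rename Gemini usage fields, dropping unknown or None values."""
    return {
        TOKEN_FIELD_MAP[k]: v
        for k, v in usage_metadata.items()
        if k in TOKEN_FIELD_MAP and v is not None
    }
```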