
Installation

pip install pandaprobe[gemini]

Setup

from pandaprobe.wrappers import wrap_gemini
from google import genai

client = wrap_gemini(genai.Client())

Generate content

Span name: "gemini-generate", SpanKind: LLM

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain quantum computing.",
    config={"temperature": 0.7},
)
print(response.text)

What gets traced

  • Input: contents plus system_instruction normalized to messages format (role "model" mapped to "assistant")
  • Output: answer text (non-thought parts)
  • Model name
  • Token usage
  • Model parameters: temperature, top_p, top_k, max_output_tokens, stop_sequences, and related fields
  • Thinking or reasoning parts stored in metadata as reasoning_summary
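
The role mapping in the first bullet can be sketched in plain Python. This is a hypothetical stand-in for the wrapper's internal normalization, not pandaprobe's actual code; the function name and dict shapes are assumptions.

```python
# Hypothetical sketch of normalizing Gemini contents to a messages-style
# format; not pandaprobe's actual implementation.

def normalize_contents(contents, system_instruction=None):
    """Map Gemini-style content entries to {"role", "content"} messages.

    Gemini uses the role "model" for assistant turns; the traced
    messages use "assistant" instead.
    """
    messages = []
    if system_instruction:
        messages.append({"role": "system", "content": system_instruction})
    if isinstance(contents, str):
        # A bare string is treated as a single user turn.
        contents = [{"role": "user", "parts": [{"text": contents}]}]
    for entry in contents:
        role = "assistant" if entry["role"] == "model" else entry["role"]
        text = "".join(p.get("text", "") for p in entry["parts"])
        messages.append({"role": role, "content": text})
    return messages
```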

Streaming and async

stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Hello!",
)
for chunk in stream:
    print(chunk.text, end="")

All four call patterns are traced: synchronous blocking, synchronous streaming, asynchronous blocking, and asynchronous streaming.
Use models.generate_content_stream for synchronous iteration, and aio.models.generate_content_stream with async for when the call site is already async. The wrapper emits the same span fields in both cases; only the execution model differs.
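
The two iteration patterns can be illustrated in pure Python with stub streams standing in for the real client (the real objects come from the generate_content_stream calls above; the stubs and helper names here are hypothetical):

```python
import asyncio

# Stub chunks standing in for streaming response text.
CHUNKS = ["Hel", "lo", "!"]

def sync_stream():
    # Sync streaming: a plain iterator consumed with a for loop.
    for text in CHUNKS:
        yield text

async def async_stream():
    # Async streaming: an async iterator consumed with `async for`.
    for text in CHUNKS:
        yield text

def consume_sync():
    return "".join(chunk for chunk in sync_stream())

async def consume_async():
    parts = []
    async for chunk in async_stream():
        parts.append(chunk)
    return "".join(parts)

print(consume_sync())                # Hello!
print(asyncio.run(consume_async()))  # Hello!
```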

Thinking mode

When using Gemini’s thinking mode, thought parts are automatically separated from answer parts. Thought content is stored in metadata as reasoning_summary, while the span output contains only the answer text.
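
The separation can be sketched as a small helper over part dicts with a thought flag, mimicking the boolean Gemini sets on thought parts. This is a hypothetical illustration, not pandaprobe's actual code:

```python
# Hypothetical sketch of splitting thought parts from answer parts;
# part dicts mimic the `thought` boolean on Gemini response parts.

def split_parts(parts):
    """Return (answer_text, reasoning_summary) from a list of parts."""
    answers = [p["text"] for p in parts if not p.get("thought")]
    thoughts = [p["text"] for p in parts if p.get("thought")]
    return "".join(answers), "".join(thoughts)
```

The span output would carry only the first element; the second would land in metadata as reasoning_summary.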

Token usage mapping

Gemini Field                  PandaProbe Field
prompt_token_count            prompt_tokens
candidates_token_count        completion_tokens
total_token_count             total_tokens
thoughts_token_count          reasoning_tokens
cached_content_token_count    cache_read_tokens
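
The table translates directly into a field-renaming dict. The Gemini field names come from the table above; the function and its None-dropping behavior are a sketch, not pandaprobe's actual code:

```python
# Mapping from Gemini usage_metadata fields to PandaProbe token fields,
# per the table above. The map_usage helper is a hypothetical sketch.

TOKEN_FIELD_MAP = {
    "prompt_token_count": "prompt_tokens",
    "candidates_token_count": "completion_tokens",
    "total_token_count": "total_tokens",
    "thoughts_token_count": "reasoning_tokens",
    "cached_content_token_count": "cache_read_tokens",
}

def map_usage(usage_metadata: dict) -> dict:
    """Rename Gemini usage fields, dropping unknown or None values."""
    return {
        TOKEN_FIELD_MAP[k]: v
        for k, v in usage_metadata.items()
        if k in TOKEN_FIELD_MAP and v is not None
    }
```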