## Installation

## Setup

## Generate content
Span name: `"gemini-generate"`, SpanKind: `LLM`
- Input: `contents` plus `system_instruction`, normalized to messages format (role `"model"` mapped to `"assistant"`)
- Output: answer text (non-thought parts)
- Model name
- Token usage
- Model parameters: `temperature`, `top_p`, `top_k`, `max_output_tokens`, `stop_sequences`, and related fields
- Thinking or reasoning parts stored in metadata as `reasoning_summary`
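As an illustration of the input normalization described above, here is a minimal sketch. The function name, `ROLE_MAP` constant, and dict shapes are hypothetical, not the wrapper's actual internals; only the role mapping (`"model"` → `"assistant"`) and the leading system message come from the behavior documented here.

```python
ROLE_MAP = {"model": "assistant"}  # Gemini's "model" role becomes "assistant"

def normalize_input(contents, system_instruction=None):
    # Sketch: fold system_instruction and contents into one messages list.
    messages = []
    if system_instruction:
        messages.append({"role": "system", "content": system_instruction})
    for item in contents:
        role = item.get("role", "user")
        text = "".join(part.get("text", "") for part in item.get("parts", []))
        messages.append({"role": ROLE_MAP.get(role, role), "content": text})
    return messages
```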
## Streaming and async
All four call styles are traced: synchronous blocking, synchronous streaming, asynchronous blocking, and asynchronous streaming.
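The four call shapes can be sketched as follows. The `client.models` / `client.aio.models` surface follows the google-genai SDK; the helper function names and the text-joining logic are illustrative, not part of the wrapper.

```python
# Illustrative sketch of the four traced call shapes against a
# google-genai style client. The helpers here are ours, not the SDK's.

def generate_blocking(client, model, contents):
    # Sync blocking: one call, one response object.
    return client.models.generate_content(model=model, contents=contents).text

def generate_streaming(client, model, contents):
    # Sync streaming: iterate chunks as they arrive.
    return "".join(
        chunk.text or ""
        for chunk in client.models.generate_content_stream(model=model, contents=contents)
    )

async def agenerate_blocking(client, model, contents):
    # Async blocking: awaited once.
    response = await client.aio.models.generate_content(model=model, contents=contents)
    return response.text

async def agenerate_streaming(client, model, contents):
    # Async streaming: the stream itself is awaited, then iterated with async for.
    stream = await client.aio.models.generate_content_stream(model=model, contents=contents)
    return "".join([chunk.text or "" async for chunk in stream])
```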
### Sync vs async streaming
Use `models.generate_content_stream` for synchronous iterators, and `aio.models.generate_content_stream` with `async for` when the call site is already async. The wrapper emits the same span fields in both cases; only the execution model differs.

## Thinking mode
When using Gemini's thinking mode, thought parts are automatically separated from answer parts. Thought content is stored in metadata as `reasoning_summary`, while the span output contains only the answer text.
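A minimal sketch of that split, assuming part objects carry a boolean `thought` flag as in the google-genai SDK's response parts; the splitting helper itself is illustrative, not the wrapper's code.

```python
def split_thoughts(parts):
    # Thought parts go to metadata["reasoning_summary"];
    # non-thought parts become the span output.
    thoughts = [p.text for p in parts if getattr(p, "thought", False) and p.text]
    answers = [p.text for p in parts if not getattr(p, "thought", False) and p.text]
    metadata = {"reasoning_summary": "\n".join(thoughts)} if thoughts else {}
    return "".join(answers), metadata
```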
## Token usage mapping

| Gemini Field | PandaProbe Field |
|---|---|
| `prompt_token_count` | `prompt_tokens` |
| `candidates_token_count` | `completion_tokens` |
| `total_token_count` | `total_tokens` |
| `thoughts_token_count` | `reasoning_tokens` |
| `cached_content_token_count` | `cache_read_tokens` |
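The mapping above can be expressed as a small lookup. This is a sketch: the source names come from Gemini's `usage_metadata`, the target names are PandaProbe's, and the helper function is illustrative.

```python
TOKEN_FIELD_MAP = {
    "prompt_token_count": "prompt_tokens",
    "candidates_token_count": "completion_tokens",
    "total_token_count": "total_tokens",
    "thoughts_token_count": "reasoning_tokens",
    "cached_content_token_count": "cache_read_tokens",
}

def map_token_usage(usage):
    # Translate Gemini usage_metadata fields, dropping absent (None) counts.
    return {
        probe_name: usage.get(gemini_name)
        for gemini_name, probe_name in TOKEN_FIELD_MAP.items()
        if usage.get(gemini_name) is not None
    }
```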
