
Chat Sessions in Observability

Overview

Chat sessions bring conversation-level observability to Agenta. You can now group related traces from multi-turn conversations together, making it easy to analyze complete user interactions rather than individual requests.

This feature is essential for debugging chatbots, AI assistants, and any application with multi-turn conversations. You get visibility into the entire conversation flow, including costs, latency, and intermediate steps.

Key Capabilities

  • Automatic Grouping: All traces with the same ag.session.id attribute are automatically grouped together
  • Session Analytics: Track total cost, latency, and token usage per conversation
  • Session Browser: Dedicated UI showing all sessions with first input, last output, and key metrics
  • Session Drawer: Detailed view of all traces within a session with parent-child relationships
  • Real-time Monitoring: Auto-refresh mode for monitoring active conversations

How to Use Sessions

Using the Python SDK

Add session tracking to your application with one line of code:

import agenta as ag
from openai import OpenAI

# Initialize Agenta
ag.init()

# Create the OpenAI client used below
client = OpenAI()

# Store the session ID for all subsequent traces
ag.tracing.store_session(session_id="conversation_123")

# Your LLM calls are automatically tracked with this session
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
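
For longer conversations the pattern is the same: store the session ID once, then make one call per turn. Here is a minimal sketch of that pattern; the loop and the history list are illustrative, not part of the SDK:

import agenta as ag
from openai import OpenAI

ag.init()
client = OpenAI()

# One session ID for the entire conversation
ag.tracing.store_session(session_id="conversation_123")

history = []
for user_message in ["Hello!", "Can you say more?"]:
    history.append({"role": "user", "content": user_message})

    # Each call produces a trace carrying the same ag.session.id
    response = client.chat.completions.create(
        model="gpt-4",
        messages=history,
    )
    history.append(
        {"role": "assistant", "content": response.choices[0].message.content}
    )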

Using the Chat Run Endpoint

You can also attach a session ID when calling Agenta-managed prompts via the /chat/run endpoint:

import agenta as ag

# Initialize the Agenta client
agenta = ag.Agenta(api_key="your_api_key")

# Call the chat endpoint with session tracking
response = agenta.run(
    base_id="your_base_id",
    environment="production",
    inputs={
        "chat_history": [
            {"role": "user", "content": "What is the weather like?"}
        ]
    },
    # Add session metadata to group related conversations
    metadata={
        "ag.session.id": "user_456_conv_789"
    },
)

# Follow-up in the same session
follow_up = agenta.run(
    base_id="your_base_id",
    environment="production",
    inputs={
        "chat_history": [
            {"role": "user", "content": "What is the weather like?"},
            {"role": "assistant", "content": response["message"]},
            {"role": "user", "content": "What about tomorrow?"}
        ]
    },
    metadata={
        "ag.session.id": "user_456_conv_789"  # Same session ID
    },
)

Using OpenTelemetry

If you're using OpenTelemetry for instrumentation:

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('my-app');
const span = tracer.startSpan('chat-interaction');

// Add session ID as a span attribute
span.setAttribute('ag.session.id', 'conversation_123');

// Your code here
span.end();
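
The snippet above uses the JavaScript SDK, but the same attribute can be set from any OpenTelemetry SDK. Here is a sketch of the Python equivalent, assuming your tracer provider is already configured to export traces to Agenta:

from opentelemetry import trace

tracer = trace.get_tracer("my-app")

# Set the session ID as a span attribute so the trace is grouped
with tracer.start_as_current_span("chat-interaction") as span:
    span.set_attribute("ag.session.id", "conversation_123")
    # Your code here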

The UI automatically detects session IDs and groups traces together. You can use any format for session IDs: UUIDs, composite IDs like user_123_session_456, or custom formats.
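
For example, both of the following are valid session IDs; the user and conversation identifiers below are placeholders:

import uuid

import agenta as ag

ag.init()

# Option 1: a random UUID per conversation
session_id = str(uuid.uuid4())

# Option 2: a composite ID built from your own identifiers
user_id, conversation_id = "123", "456"
session_id = f"user_{user_id}_session_{conversation_id}"

ag.tracing.store_session(session_id=session_id)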

Use Cases

Debug Chatbots

See the complete conversation flow when users report issues. Instead of viewing isolated requests, you can analyze the entire conversation context and understand why a particular response was generated.

Monitor Multi-turn Agents

Track how your agent handles follow-up questions and maintains context across turns. See which turns are expensive, identify where latency spikes occur, and understand conversation patterns.

Analyze Conversation Costs

Understand which conversations are expensive and why. Session-level cost tracking helps you identify optimization opportunities and set appropriate pricing for your application.

Optimize Performance

Identify latency issues across entire conversations, not just single requests. See which conversational patterns lead to performance problems and optimize accordingly.

Getting Started

Learn more in our documentation.

What's Next

We're continuing to enhance session tracking with upcoming features like session-level annotations, session comparisons, and automated session analysis.

JSON Multi-Field Match Evaluator

The JSON Multi-Field Match evaluator lets you validate multiple fields in JSON outputs simultaneously. This makes it ideal for entity extraction tasks where you need to check if your model correctly extracted name, email, address, and other structured fields.

What is JSON Multi-Field Match?

This evaluator compares specific fields between your model's JSON output and the expected JSON values from your test set. Unlike the old JSON Field Match evaluator (which only checked one field), this evaluator handles any number of fields at once.

For each field you configure, the evaluator produces a separate score (either 1 for a match or 0 for no match). It also calculates an aggregate score showing the percentage of fields that matched correctly.
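
As a mental model, the scoring works roughly like the sketch below. This is an illustrative re-implementation, not Agenta's actual code, and it only walks dot-notation paths:

def field_scores(expected: dict, output: dict, fields: list[str]) -> dict:
    """Score each configured field: 1.0 on exact match, 0.0 otherwise."""
    scores = {}
    for field in fields:
        # Walk dot-notation paths like "address.city" into nested dicts
        exp, out = expected, output
        for key in field.split("."):
            exp = exp.get(key) if isinstance(exp, dict) else None
            out = out.get(key) if isinstance(out, dict) else None
        scores[field] = 1.0 if exp is not None and exp == out else 0.0

    # Aggregate score: the fraction of configured fields that matched
    scores["aggregate_score"] = sum(scores.values()) / len(fields)
    return scores

expected = {"name": "John Doe", "email": "john@example.com"}
output = {"name": "John Doe", "email": "jane@example.com"}
print(field_scores(expected, output, ["name", "email"]))
# {'name': 1.0, 'email': 0.0, 'aggregate_score': 0.5}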

Key Features

Multiple Field Comparison

Configure as many fields as you need to validate. The evaluator checks each field independently and reports results for all of them.

If you're extracting user information, you might configure fields like name, email, phone, and address.city. Each field gets its own score, so you can see exactly which extractions succeeded and which failed.

Three Path Format Options

The evaluator supports three different ways to specify field paths:

Dot notation (recommended for most cases):

  • Simple fields: name, email
  • Nested fields: user.address.city
  • Array indices: items.0.name

JSON Path (standard JSON Path syntax):

  • Simple fields: $.name, $.email
  • Nested fields: $.user.address.city
  • Array indices: $.items[0].name

JSON Pointer (RFC 6901):

  • Simple fields: /name, /email
  • Nested fields: /user/address/city
  • Array indices: /items/0/name

All three formats work the same way. Use whichever matches your existing tooling or personal preference.
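
To make the equivalence concrete, here is an illustrative resolver for the dot-notation and JSON Pointer forms (full JSONPath support would typically come from a library such as jsonpath-ng). This is a sketch, not the evaluator's implementation:

from typing import Any

def resolve_path(data: Any, path: str) -> Any:
    """Resolve "items.0.name" (dot notation) or "/items/0/name" (JSON Pointer)."""
    if path.startswith("/"):  # JSON Pointer (RFC 6901), ignoring ~0/~1 escapes
        keys = path[1:].split("/")
    else:                     # Dot notation
        keys = path.split(".")

    current = data
    for key in keys:
        if isinstance(current, list):
            current = current[int(key)]  # Numeric segment = array index
        else:
            current = current[key]
    return current

doc = {"items": [{"name": "widget"}]}
assert resolve_path(doc, "items.0.name") == "widget"
assert resolve_path(doc, "/items/0/name") == "widget"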

Nested Field and Array Support

Access deeply nested fields and array elements without restrictions. The evaluator handles any level of nesting.

Per-Field Scoring

See individual scores for each configured field in the evaluation results. This granular view helps you identify which specific extractions are working well and which need improvement.

Aggregate Score

The aggregate score shows the percentage of matching fields. If you configure five fields and three match, the aggregate score is 0.6 (or 60%).

Example

Suppose you're building an entity extraction model that pulls contact information from text. Your ground truth looks like this:

{
  "name": "John Doe",
  "email": "john@example.com",
  "phone": "555-1234",
  "address": {
    "city": "New York",
    "zip": "10001"
  }
}

Your model produces this output:

{
  "name": "John Doe",
  "email": "jane@example.com",
  "phone": "555-1234",
  "address": {
    "city": "New York",
    "zip": "10002"
  }
}

You configure these fields: ["name", "email", "phone", "address.city", "address.zip"]

The evaluator returns:

Field             Score
name              1.0
email             0.0
phone             1.0
address.city      1.0
address.zip       0.0
aggregate_score   0.6

You can see immediately that the model got the email and zip code wrong but correctly extracted the name, phone, and city.

Auto-Detection in the UI

When you configure the evaluator in the web interface, Agenta automatically detects available fields from your test set data. Click to add or remove fields using a tag-based interface. This makes setup fast and reduces configuration errors.

Migration from JSON Field Match

The old JSON Field Match evaluator only supported checking a single field. If you're using it, consider migrating to JSON Multi-Field Match to gain:

  • Support for multiple fields in one evaluator
  • Per-field scoring for detailed analysis
  • Aggregate scoring for overall performance tracking
  • Nested field and array support

Existing JSON Field Match configurations continue to work. We recommend migrating to JSON Multi-Field Match for new evaluations.

Next Steps

Learn more about configuring and using the JSON Multi-Field Match evaluator in the Classification and Entity Extraction Evaluators documentation.