016: Export Gemini CLI Metrics to GCP Cloud Monitoring

Problem

The agent workflows produce raw telemetry data (LLM API time, tool execution time, token usage, error rates) embedded in gemini-output artifacts, but this data is only accessible by downloading and parsing individual workflow run artifacts. There is no centralized dashboard for observing agent performance trends, detecting regressions, or alerting on anomalies.

Current telemetry is captured per-run but not exported:

| Metric | Source | Current State |
| --- | --- | --- |
| LLM API time | gemini-output artifact | Raw data in artifact JSON |
| Tool execution time (shell) | gemini-output artifact | Raw data in artifact JSON |
| Token usage (total, cached %) | gemini-output artifact | Raw data in artifact JSON |
| API request count / errors | gemini-output artifact | Raw data in artifact JSON |
| Loop detector time | gemini-output artifact | Raw data in artifact JSON |
| Workflow duration | GitHub Actions UI | Per-run only, no aggregation |

See 001 for example telemetry data from fix agent runs.

What Should Happen

Agent telemetry should be exported to GCP Cloud Monitoring so that:

  1. Dashboards show per-agent metrics over time (review agent mean duration, fix agent cycle time, token spend)
  2. Alerts fire when metrics exceed thresholds (e.g., fix agent > 30 min, error rate > 5%)
  3. Trend analysis is possible without downloading artifacts (e.g., "did the last workflow change improve review agent speed?")
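As a rough illustration of item 2, the sketch below builds the JSON body of a Cloud Monitoring alert policy for the "fix agent > 30 min" example. It is only a sketch under assumptions: the `custom.googleapis.com/` metric prefix, the `global` monitored resource, the 300 s alignment window, and the helper name `duration_alert_policy` are all hypothetical choices, not decisions made in this document.

```python
import json

# Assumption: the duration metric is written as a custom metric under this type.
DURATION_METRIC = "custom.googleapis.com/agent/workflow_duration_seconds"


def duration_alert_policy(agent, threshold_seconds):
    """Return an AlertPolicy body that fires when one agent's run duration
    exceeds threshold_seconds (hypothetical helper, for illustration only)."""
    metric_filter = (
        f'metric.type = "{DURATION_METRIC}" '
        'AND resource.type = "global" '  # assumption: "global" resource
        f'AND metric.labels.agent = "{agent}"'
    )
    return {
        "displayName": f"{agent} agent duration above {threshold_seconds}s",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"{agent} agent run too slow",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_seconds,
                    "duration": "0s",  # fire on the first violating point
                    "aggregations": [
                        {
                            # Assumption: take the slowest run per 5-minute window.
                            "alignmentPeriod": "300s",
                            "perSeriesAligner": "ALIGN_MAX",
                        }
                    ],
                },
            }
        ],
    }


# 30 minutes, matching the example threshold in the list above.
policy = duration_alert_policy("fix", 30 * 60)
print(json.dumps(policy, indent=2))
```

The resulting JSON could be stored in the repository and applied with whatever tooling manages alert policies; the error-rate alert would need a ratio condition and is not covered here.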

Scope

Metrics to Export

| Metric | Type | Labels |
| --- | --- | --- |
| agent/llm_api_time_seconds | Gauge | agent, model, repo, run_id |
| agent/tool_execution_time_seconds | Gauge | agent, repo, run_id |
| agent/total_tokens | Gauge | agent, model, repo, run_id |
| agent/cached_token_ratio | Gauge | agent, model, repo, run_id |
| agent/api_requests | Counter | agent, model, repo, run_id, status |
| agent/workflow_duration_seconds | Gauge | agent, repo, run_id |
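To make the table concrete, here is a minimal sketch of how one Gauge row might be turned into a time series entry in the shape accepted by the Cloud Monitoring API's `projects.timeSeries.create` method. The `custom.googleapis.com/` prefix, the `global` monitored resource, the `DOUBLE` value type, and all label values are assumptions for illustration. The Counter row would need a different metric kind (Cloud Monitoring offers GAUGE, DELTA, and CUMULATIVE) and is not covered.

```python
from datetime import datetime, timezone

# Assumption: metrics from the table are written as custom metrics.
METRIC_PREFIX = "custom.googleapis.com/"


def build_gauge_series(metric, value, labels, project_id, end_time=None):
    """Return one TimeSeries dict for a Gauge metric (hypothetical helper)."""
    end_time = (end_time or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "metric": {"type": METRIC_PREFIX + metric, "labels": dict(labels)},
        # Assumption: the generic "global" resource; another type may fit better.
        "resource": {"type": "global", "labels": {"project_id": project_id}},
        "metricKind": "GAUGE",
        "valueType": "DOUBLE",
        "points": [
            {
                # A gauge point only needs an end time (RFC 3339, UTC).
                "interval": {"endTime": end_time.strftime("%Y-%m-%dT%H:%M:%SZ")},
                "value": {"doubleValue": float(value)},
            }
        ],
    }


# Example values are made up; labels follow the first row of the table.
series = build_gauge_series(
    "agent/llm_api_time_seconds",
    12.5,
    {"agent": "fix", "model": "example-model", "repo": "example/repo", "run_id": "123"},
    "example-project",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Because `run_id` is a label, every run produces a new time series; that follows the table as written.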

Export Mechanism

Options:

  1. Post-step in each workflow — parse gemini-output artifact, write custom metrics via gcloud monitoring CLI or Cloud Monitoring API
  2. Dedicated metrics workflow — triggered by workflow_run.completed, fetches artifacts from the completed run, exports metrics
  3. OpenTelemetry Collector — emit OTLP from a workflow step, route to Cloud Monitoring via GCP's OTLP endpoint
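Option 1 could look roughly like the sketch below: read the artifact JSON, derive metric values, and prepare a POST to the Cloud Monitoring API. The artifact field names (`llm_api_time_ms`, `tool_time_ms`, `tokens.total`, `tokens.cached`) are hypothetical placeholders, since the real format is defined in 001, and treating the cached ratio as cached tokens divided by total tokens is likewise an assumption. The access token is assumed to come from the workflow's existing GCP auth step.

```python
import json
import urllib.request

API_URL = "https://monitoring.googleapis.com/v3/projects/{project}/timeSeries"


def extract_values(artifact_text):
    """Map a HYPOTHETICAL artifact layout to metric values (see 001 for the real one)."""
    data = json.loads(artifact_text)
    tokens = data.get("tokens", {})
    total = tokens.get("total", 0)
    cached = tokens.get("cached", 0)
    return {
        "agent/llm_api_time_seconds": data.get("llm_api_time_ms", 0) / 1000.0,
        "agent/tool_execution_time_seconds": data.get("tool_time_ms", 0) / 1000.0,
        "agent/total_tokens": float(total),
        # Guard against division by zero when a run reports no tokens.
        "agent/cached_token_ratio": cached / total if total else 0.0,
    }


def build_request(project_id, time_series, access_token):
    """Prepare (but do not send) the POST for projects.timeSeries.create."""
    body = json.dumps({"timeSeries": time_series}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(project=project_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Made-up artifact content, using the hypothetical field names above.
sample = json.dumps(
    {"llm_api_time_ms": 4200, "tool_time_ms": 1500, "tokens": {"total": 1000, "cached": 250}}
)
values = extract_values(sample)
request = build_request("example-project", [], "example-token")
```

Sending the request would be a call to `urllib.request.urlopen(request)` with a non-empty `time_series` list. Option 2 could reuse the same two functions after downloading the artifact from the completed run.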

GCP Requirements

The service account used for metrics export needs:

  • roles/monitoring.metricWriter — write custom metrics
  • roles/logging.logWriter — (optional) structured log export
  • roles/cloudtrace.agent — (optional) trace export

These roles are additive to the existing roles/aiplatform.user and roles/modelarmor.user on the CI service account.

Dependencies

  • 015 — WIF auth for the metrics export step
  • 001 — defines the telemetry data format being exported