Internet-Draft                                                  X. Liang
Intended status: Standards Track               Shaanxi Rui Ding Guan Yan
Expires: 7 December 2026                            Industrial Co., Ltd.
                                                               June 2026


                           Agent Health State
     An Observability Layer Between Agent Discovery and Governance
                   draft-liang-agent-health-state-00

Abstract

   AI agents deployed on the Web require three capabilities to
   interoperate safely: discovering each other, assessing each other's
   current operational state, and governing each other's behavior over
   time.  Discovery is addressed by A2A's /.well-known/agent.json
   endpoint.  Behavior governance is addressed by the SOOS Progressive
   Trust model and its Trust Decay specification.  Between these two
   layers lies a gap: no existing standard provides a lightweight,
   machine-readable signal for whether an agent is currently
   operational, responsive, and calibrated -- the operational health
   state that any consumer needs before deciding whether to interact
   with an agent, and that governance frameworks need as a freshness
   input for their calibration anchors.

   This document defines the Agent Health State specification: a
   /.well-known/agent-health endpoint and a structured health state
   response format that exposes an agent's operational status, response
   calibration metrics, and decay indicators.  The specification is
   designed to be independently deployable (requiring no governance
   infrastructure), composable with A2A discovery, and consumable by
   SOOS/PT as calibration anchor freshness input.

   This document positions agent-health as the missing observability
   layer between agent discovery (A2A /.well-known/agent.json) and
   agent behavior governance (SOOS/PT Trust Decay Model), serving both
   as operational state for interaction decisions and as calibration
   anchor freshness input to verification gates.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 December 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Problem Statement: The Observability Gap
     3.1.  Discovery Without Health
     3.2.  Governance Without Freshness
     3.3.  Empirical Evidence
   4.  Agent Health State Specification
     4.1.  Endpoint
     4.2.  Response Format
     4.3.  Health State Values
     4.4.  Calibration Metrics
     4.5.  Decay Indicators
   5.  Relationship to Existing Standards
     5.1.  A2A Protocol
     5.2.  SOOS Progressive Trust
     5.3.  Microsoft Agent SRE
     5.4.  Verification State (Krausz)
   6.  Well-Known URI Registration
   7.  Security Considerations
   8.  Privacy Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Health State Decision Flow
   Appendix B.  Empirical Data: Agent Health at Scale
   Author's Address

1.  Introduction

   The emerging Agent Web relies on three capabilities for safe
   interoperability:

   1.  Discovery: How does Agent A find Agent B?  A2A addresses this
       with /.well-known/agent.json [A2A], providing an agent card that
       describes capabilities and endpoints.

   2.  Operational Health: Is Agent B currently alive, responsive, and
       producing calibrated outputs?  No standard addresses this.

   3.  Behavior Governance: Should Agent A trust Agent B over time?
       SOOS/PT addresses this with Progressive Trust
       [I-D.sato-soos-pt], measuring behavioral properties across
       sessions.

   This document addresses capability (2): the operational health state
   that any agent consumer needs before initiating interaction.

   The gap is not theoretical.  Empirical observation of 181,666 AI
   agents with public URLs found that 98.8% return no valid response
   (Appendix B).  A consumer that discovers an agent via A2A has no
   standard mechanism to determine whether that agent is operational
   before committing resources to an interaction attempt.

   Agent Health State is designed as a composable middle layer:

   *  Downstream from A2A discovery: after finding an agent, check its
      health before interacting.

   *  Upstream from SOOS governance: provide calibration anchor
      freshness signals that governance frameworks can consume.
   *  Independently deployable: requires no governance infrastructure,
      no identity framework, and no credential issuance.  An agent
      operator can implement agent-health with a single HTTP endpoint.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Agent Health State:  A structured representation of an agent's
      current operational status, response calibration, and decay
      indicators, exposed at a well-known HTTP endpoint.

   Operational Status:  The liveness and responsiveness state of an
      agent: healthy, degraded, unhealthy, or unknown.

   Response Calibration:  The degree to which an agent's actual
      response behavior matches its declared capabilities.  It is
      measured as the ratio of successful interactions to total
      interaction attempts over a measurement window.

   Decay Indicator:  A signal that an agent's operational quality is
      trending toward degraded or unhealthy states, derived from
      changes in calibration metrics over time.

   Calibration Anchor:  A reference point (such as a verification
      credential, trust score, or capability declaration) whose
      validity depends on the agent's continued operational health.  A
      calibration anchor becomes stale when the agent's health state
      degrades.

3.  Problem Statement: The Observability Gap

   The current Agent Web standards landscape provides discovery (A2A)
   and governance (SOOS) but lacks an observability layer between them.
   This creates two concrete problems.

3.1.  Discovery Without Health

   A2A's /.well-known/agent.json provides an agent card with
   capabilities, endpoints, and authentication requirements.  It
   includes a /ping endpoint for basic liveness checks.  However, /ping
   only answers "is the server process running?"
   It does not answer:

   *  Is the agent producing calibrated responses consistent with its
      declared capabilities?

   *  Has the agent's response quality degraded since its capabilities
      were declared?

   *  Is the agent operating under reduced capacity (rate-limited,
      partially offline, model-downgraded)?

   A consumer that relies solely on /ping may interact with an agent
   that is technically "alive" but functionally degraded -- returning
   error responses, hallucinating, or operating with a downgraded
   model.  The cost of discovering this through failed interactions is
   borne entirely by the consumer.

3.2.  Governance Without Freshness

   SOOS/PT's Trust Decay Model (Section 6 of [I-D.sato-soos-pt])
   specifies that trust scores decay during inactivity periods.  This
   is a behavioral decay model: it tracks whether an agent has
   demonstrated good behavior recently.

   However, SOOS/PT operates within a governance framework that assumes
   a Governing Enforcement Component (GEC) and GEC-signed Event
   Streams.  For agents outside any governance framework -- the vast
   majority of agents deployed today -- there is no mechanism to signal
   operational freshness.

   Even within a governance framework, the GEC needs an input to
   determine whether its calibration anchors (verification credentials,
   trust scores) remain valid.  If the agent's underlying model has
   been swapped or its response quality has silently degraded, the
   trust score derived from historical behavior may be stale.  The
   governance framework needs an operational health signal to determine
   when calibration anchors require re-verification.

3.3.  Empirical Evidence

   Observational data from a continuous monitoring system tracking AI
   agents with public URLs provides evidence of the problem's scale:

   *  Of 181,666 agents with URL-accessible endpoints, only 2,201
      (1.2%) returned valid responses.
   *  Platform-specific patterns: GPTs on OpenAI's marketplace showed
      0% survival (49/49 returning no valid response), while managed
      platforms like Meyo showed 100% survival (8/8 healthy).

   *  The 98.8% failure rate includes agents that were discoverable
      (their agent cards or metadata were accessible) but
      non-operational.  Discovery succeeded; interaction failed.

   This data demonstrates that discovery without health observability
   leads to predictable, widespread interaction failures.

4.  Agent Health State Specification

4.1.  Endpoint

   Agents implementing this specification MUST expose a health endpoint
   at:

      /.well-known/agent-health

   The endpoint MUST be accessible via an HTTPS GET request and MUST
   return a JSON response with Content-Type "application/json".

   The endpoint SHOULD be co-located with the agent's primary service
   URL.  If the agent exposes an A2A agent card at
   /.well-known/agent.json, the health endpoint SHOULD be served from
   the same origin.

   The endpoint MUST NOT require authentication for basic health
   queries.  An agent MAY provide extended health information that
   requires authentication, but the base health state MUST be
   accessible without credentials.

4.2.  Response Format

   The response is a JSON object with the following structure:

      {
        "schema_version": "0.1.0",
        "agent_id": "string",
        "timestamp": "string",
        "health": {
          "status": "string",
          "last_healthy_at": "string",
          "uptime_seconds": number
        },
        "calibration": {
          "response_ratio": number,
          "error_ratio": number,
          "latency_p50_ms": number,
          "latency_p99_ms": number,
          "measurement_window_seconds": number,
          "sample_count": number
        },
        "decay": {
          "calibration_trend": "string",
          "days_since_model_change": number | null,
          "last_capability_update": "string"
        },
        "extensions": {}
      }

   Field descriptions:

   schema_version:  The version of this schema.  Implementations MUST
      use semantic versioning.  This document defines version "0.1.0".

   agent_id:  A stable identifier for the agent.
      It is RECOMMENDED to use a URI matching the agent's A2A agent
      card URL or did:wba identifier.

   timestamp:  The time at which this health state was computed, in ISO
      8601 format with timezone.

   health:  An object containing the agent's operational status.

   calibration:  An object containing the agent's response calibration
      metrics.

   decay:  An object containing indicators of operational quality
      trends.

   extensions:  An optional object for implementation-specific
      extensions.  Extensions MUST use a namespace prefix (e.g.,
      "com.example.feature") to avoid collisions.

4.3.  Health State Values

   The health.status field MUST be one of the following values:

   healthy:  The agent is operational, responding within expected
      parameters, and producing calibrated outputs.  Response ratio >=
      0.95 over the measurement window.

   degraded:  The agent is operational but exhibiting reduced quality.
      This may indicate rate limiting, partial service degradation, or
      model downgrading.  Response ratio >= 0.50 and < 0.95, or latency
      p99 exceeding 3x the agent's declared baseline.

   unhealthy:  The agent is responding but producing significantly
      degraded or erroneous outputs.  Response ratio < 0.50, or the
      agent is returning errors for the majority of requests.

   unknown:  The agent's health state cannot be determined.  This is
      the default state for agents that have not implemented health
      monitoring, or when the monitoring system has insufficient data
      (sample_count < 10 in the measurement window).

   The health.last_healthy_at field records the last timestamp at which
   the agent's status was "healthy".  This field enables consumers to
   distinguish between an agent that just became degraded and one that
   has been degraded for days.

   The health.uptime_seconds field records the number of seconds since
   the agent process was last restarted.  A low uptime value combined
   with a degraded status may indicate a crash loop.

4.4.  Calibration Metrics

   The calibration object provides quantitative metrics for the agent's
   response quality:

   response_ratio:  The fraction of requests that received a valid
      (non-error) response within the measurement window.  Range [0.0,
      1.0].

   error_ratio:  The fraction of requests that resulted in error
      responses (4xx, 5xx, timeouts, or malformed responses).  Range
      [0.0, 1.0].

   latency_p50_ms:  The 50th percentile response latency in
      milliseconds over the measurement window.

   latency_p99_ms:  The 99th percentile response latency in
      milliseconds over the measurement window.

   measurement_window_seconds:  The duration of the measurement window
      in seconds.  RECOMMENDED value: 86400 (24 hours).  Minimum value:
      300 (5 minutes).

   sample_count:  The number of requests sampled in the measurement
      window.  If sample_count < 10, the calibration metrics SHOULD be
      treated as low-confidence.

4.5.  Decay Indicators

   The decay object provides signals about the agent's operational
   quality trajectory:

   calibration_trend:  One of "stable", "declining", or "improving".
      Determined by comparing the current measurement window's
      response_ratio to the previous window's.  A change of more than
      0.10 in either direction constitutes a trend.

   days_since_model_change:  The number of days since the agent's
      underlying model was changed (e.g., model version update,
      downgrading to a smaller model, switching providers).  The value
      is null if this information is not available.  A recent model
      change combined with a declining calibration_trend is a strong
      decay signal.

   last_capability_update:  The timestamp of the last update to the
      agent's capability declaration (e.g., A2A agent card update).  If
      an agent's capabilities were declared weeks ago but its
      calibration has been declining, the capability declaration may be
      stale.

5.  Relationship to Existing Standards

5.1.  A2A Protocol

   Agent Health State is designed as a natural complement to A2A
   discovery.  The interaction pattern is:

   1.  Consumer retrieves /.well-known/agent.json to discover the
       agent's capabilities.

   2.  Consumer retrieves /.well-known/agent-health to assess the
       agent's current operational state.

   3.  If health.status is "healthy" or "degraded", consumer proceeds
       with interaction.  If "unhealthy", consumer may retry later,
       fall back to an alternative agent, or report the health state to
       a monitoring system.

   Appendix A describes this decision flow in more detail, including
   the "unknown" case.

   A2A's /ping endpoint provides a binary alive/dead signal.  Agent
   Health State provides a graduated signal (healthy/degraded/
   unhealthy) with quantitative calibration metrics.  Implementations
   SHOULD list the agent-health endpoint in their A2A agent card as a
   capability.

5.2.  SOOS Progressive Trust

   SOOS/PT [I-D.sato-soos-pt] defines a Trust Decay Model (Section 6)
   that measures behavioral trust across sessions.  Agent Health State
   complements PT by providing an operational layer signal:

   *  PT decay is behavioral: "has the agent demonstrated good behavior
      recently?"  Measured from GEC-signed Event Stream entries.

   *  Agent Health decay is operational: "is the agent's infrastructure
      functioning correctly?"  Measured from HTTP endpoint
      observations.

   These two decay signals address different failure modes.  An agent
   may have excellent behavioral trust (high PT scores) but be
   currently experiencing infrastructure degradation (degraded health
   status).  Conversely, an agent may be operationally healthy but have
   declining behavioral trust due to poor decision-making.

   SOOS/PT implementations SHOULD consume agent-health signals as a
   freshness input for calibration anchors.  If an agent's health
   transitions from healthy to degraded, the GEC SHOULD flag the
   agent's PT Score as potentially stale and trigger re-verification
   before the next authority elevation.

5.3.  Microsoft Agent SRE

   Microsoft's Agent SRE framework defines health.status with values
   including "degraded" and "unhealthy" [MS-SRE].
   These status names are consistent with those defined in Section 4.3,
   reflecting independent convergence on a similar operational model.

   Microsoft's framework is an internal Azure operational concept.
   This specification provides a standards-track definition that is
   vendor-neutral and independently deployable.

5.4.  Verification State (Krausz)

   draft-krausz-verification-state-00 [I-D.krausz-verification-state]
   defines verification.* fields for agent identity verification.
   Verification state answers "is this agent who it claims to be?"
   Agent Health State answers "is this agent currently operating
   correctly?"

   A verification credential that was valid when issued may become
   unreliable if the agent's health has since degraded.  Agent Health
   State provides the operational signal that verification consumers
   need to determine whether a verification credential's calibration
   anchor remains fresh.

6.  Well-Known URI Registration

   This document requests IANA to register the following Well-Known URI
   in the "Well-Known URIs" registry defined in [RFC8615]:

   URI suffix:  agent-health

   Change controller:  IETF

   Specification document:  This document

   Related information:  The agent-health endpoint provides operational
      health state information for AI agents.  It is designed to
      complement A2A's /.well-known/agent.json discovery endpoint.

7.  Security Considerations

   Information Disclosure:  The agent-health endpoint exposes
      operational metrics that could inform attack planning.
      Implementations MUST NOT expose internal infrastructure details
      (server versions, internal IPs, stack traces) in the health
      response.  The calibration metrics are intentionally coarse
      (ratios and percentiles) to limit information leakage.

   Denial of Service:  The health endpoint MUST be lightweight and
      non-blocking.  Implementations SHOULD cache health state for at
      least 60 seconds and MUST NOT perform expensive computations or
      external queries to generate the response.
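
   As one illustration of the caching recommendation above, the
   following Python sketch serves a stored health document and
   recomputes it at most once per 60-second interval.  It is a minimal
   sketch, not part of this specification: the class name
   CachedHealthState, the compute callback, and the injectable clock
   are hypothetical, and the callback is assumed to read only
   in-memory counters so that a refresh stays cheap.

```python
import time
from typing import Callable, Optional

# Section 7 recommends caching health state for at least 60 seconds.
CACHE_TTL_SECONDS = 60.0


class CachedHealthState:
    """Return a cached health document, refreshing at most once per TTL.

    Hypothetical helper for illustration; not defined by the specification.
    """

    def __init__(
        self,
        compute: Callable[[], dict],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute  # assumed cheap: reads in-memory counters only
        self._ttl = ttl
        self._clock = clock  # injectable so tests need not sleep
        self._cached: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        """Return the cached document, recomputing only after expiry."""
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            self._cached = self._compute()
            self._expires_at = now + self._ttl
        return self._cached
```

   A handler for GET /.well-known/agent-health would call get() and
   serialize the result.  The sketch is not thread-safe; a concurrent
   server would need a lock around the refresh, or a background task
   that refreshes the document outside the request path.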

   Health Endpoint Spoofing:  A compromised agent could report
      "healthy" status while operating maliciously.  Agent Health State
      is an operational signal, not a security guarantee.  Consumers
      MUST NOT treat a "healthy" status as proof of trustworthiness.
      Health state should be combined with identity verification
      [I-D.krausz-verification-state] and behavioral governance
      [I-D.sato-soos-pt] for a complete trust assessment.

   Rate Limiting:  Consumers SHOULD NOT poll the health endpoint at
      high frequency.  The RECOMMENDED polling interval is 300 seconds
      (5 minutes) or longer.  Implementations MAY rate-limit health
      endpoint requests.

8.  Privacy Considerations

   The agent-health endpoint exposes aggregate operational metrics, not
   individual interaction data.  Response ratios and latency
   percentiles cannot be reverse-engineered to identify specific users
   or requests.

   The agent_id field identifies the agent, not its users.
   Implementations MUST NOT include user-identifying information in the
   health response.

   If an agent is a personal AI assistant associated with an
   identifiable natural person, the agent-health endpoint could reveal
   operational patterns (e.g., daily uptime cycles) that indirectly
   indicate the person's activity patterns.  Implementations of such
   agents SHOULD report health.status at a coarse granularity
   (healthy/unhealthy only, omitting calibration and decay details) or
   require authentication for full health state access.

9.  IANA Considerations

   This document requests IANA to register the "agent-health"
   well-known URI as described in Section 6.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

   [RFC8615]  Nottingham, M. and E. Wilde, "Well-Known Uniform Resource
              Identifiers (URIs)", RFC 8615, May 2019.

10.2.  Informative References

   [A2A]      Google, "Agent-to-Agent Protocol", 2025,
              https://github.com/google/A2A/.

   [I-D.sato-soos-pt]
              Sato, T., "Progressive Trust (PT) for Agentic AI
              Governance Systems", draft-sato-soos-pt-01, May 2026.

   [I-D.krausz-verification-state]
              Krausz, J., "Agent Verification State",
              draft-krausz-verification-state-00, 2026.

   [MS-SRE]   Microsoft, "Applying Site Reliability Engineering to
              Autonomous AI Agents", Microsoft Tech Community, 2026.

Appendix A.  Health State Decision Flow

   Consumers SHOULD use the following decision flow when interpreting
   an agent's health state:

   1.  Retrieve /.well-known/agent-health.

   2.  If the request fails (network error, timeout, DNS failure):
       Treat as "unknown" status.  Do not assume the agent is
       unhealthy; network issues may be transient.

   3.  If the response is valid:

       *  health.status == "healthy": Proceed with interaction.
          Optionally check calibration.response_ratio for quality
          assurance.

       *  health.status == "degraded": Proceed with caution.  Consider
          setting lower timeouts, enabling fallback agents, or reducing
          request complexity.  Check decay.calibration_trend to
          determine if the degradation is worsening.

       *  health.status == "unhealthy": Avoid interaction.  Consider
          retrying after a delay (RECOMMENDED: 5 minutes) or using an
          alternative agent.  Check health.last_healthy_at to estimate
          the duration of the outage.

       *  health.status == "unknown": No operational information is
          available.  Proceed with default interaction behavior and
          monitor for errors.

   4.  If the agent is within a governance framework (e.g., SOOS): Feed
       the health state to the governance system as a calibration
       anchor freshness signal.  A transition from "healthy" to
       "degraded" SHOULD trigger calibration anchor re-verification
       before the next trust elevation.

Appendix B.  Empirical Data: Agent Health at Scale

   Continuous monitoring of AI agents with public URLs reveals the
   current state of agent operational health in the wild:

   Dataset:  181,666 agents with URL-accessible endpoints (Tier 1 in
      the AgentRisk classification).

   Monitoring period:  30 days ending 2026-06-07.

   Results:

   *  Healthy: 2,201 (1.2%)

   *  Unhealthy/Non-responsive: 179,465 (98.8%)

   Platform-specific survival rates:

   *  GPTs (OpenAI marketplace): 0% (49/49 non-responsive)

   *  Meyo: 100% (8/8 healthy)

   *  Other platforms: survival rates between 0.5% and 15%

   Interpretation:  The extreme disparity between discovery (agents are
      listed in directories and marketplaces) and operational health
      (nearly all are non-responsive) validates the need for an
      observability layer between discovery and interaction.

   Limitations:  This data covers only Tier 1 agents with public URLs.
      Enterprise agents behind firewalls, agents accessed via API keys,
      and agents on private networks are not represented.  The 98.8%
      failure rate may be partially attributable to URL decay
      (marketplace URLs that become invalid after listing removal)
      rather than agent malfunction.  Even accounting for this, the
      observability gap is significant.

Author's Address

   Xiao Liang
   Shaanxi Rui Ding Guan Yan Industrial Co., Ltd.
   Xi'an, Shaanxi
   China

   Email: yingzi-159@163.com