Internet-Draft                                              X. Liang
Intended status: Standards Track               Shaanxi Rui Ding Guan Yan
Expires: 7 December 2026                       Industrial Co., Ltd.
                                                            June 2026

                            Agent Health State
          An Observability Layer Between Agent Discovery and Governance
                     draft-liang-agent-health-state-00

Abstract

   AI agents deployed on the Web require three capabilities to
   interoperate safely: discovering each other, assessing each other's
   current operational state, and governing each other's behavior over
   time.  Discovery is addressed by A2A's /.well-known/agent.json
   endpoint.  Behavior governance is addressed by the SOOS Progressive
   Trust model and its Trust Decay specification.  Between these two
   layers lies a gap: no existing standard provides a lightweight,
   machine-readable signal for whether an agent is currently
   operational, responsive, and calibrated -- the operational health
   state that any consumer needs before deciding whether to interact
   with an agent, and that governance frameworks need as a freshness
   input for their calibration anchors.

   This document defines the Agent Health State specification: a
   /.well-known/agent-health endpoint and a structured health state
   response format that exposes an agent's operational status, response
   calibration metrics, and decay indicators.  The specification is
   designed to be independently deployable (requiring no governance
   infrastructure), composable with A2A discovery, and consumable by
   SOOS/PT as calibration anchor freshness input.

   This document positions agent-health as the missing observability
   layer between agent discovery (A2A /.well-known/agent.json) and agent
   behavior governance (SOOS/PT Trust Decay Model), serving both as
   operational state for interaction decisions and as calibration anchor
   freshness input to verification gates.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 December 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Problem Statement: The Observability Gap
     3.1.  Discovery Without Health
     3.2.  Governance Without Freshness
     3.3.  Empirical Evidence
   4.  Agent Health State Specification
     4.1.  Endpoint
     4.2.  Response Format
     4.3.  Health State Values
     4.4.  Calibration Metrics
     4.5.  Decay Indicators
   5.  Relationship to Existing Standards
     5.1.  A2A Protocol
     5.2.  SOOS Progressive Trust
     5.3.  Microsoft Agent SRE
     5.4.  Verification State (Krausz)
   6.  Well-Known URI Registration
   7.  Security Considerations
   8.  Privacy Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Health State Decision Flow
   Appendix B.  Empirical Data: Agent Health at Scale
   Author's Address

1.  Introduction

   The emerging Agent Web relies on three capabilities for safe
   interoperability:

   1.  Discovery: How does Agent A find Agent B?  A2A addresses this
       with /.well-known/agent.json [A2A], providing an agent card that
       describes capabilities and endpoints.

   2.  Operational Health: Is Agent B currently alive, responsive, and
       producing calibrated outputs?  No standard addresses this.

   3.  Behavior Governance: Should Agent A trust Agent B over time?
       SOOS/PT addresses this with Progressive Trust [I-D.sato-soos-pt],
       measuring behavioral properties across sessions.

   This document addresses capability (2): the operational health state
   that any agent consumer needs before initiating interaction.

   The gap is not theoretical.  Empirical observation of 181,666 AI
   agents with public URLs found that 98.8% return no valid response
   (Appendix B).  A consumer that discovers an agent via A2A has no
   standard mechanism to determine whether that agent is operational
   before committing resources to an interaction attempt.

   Agent Health State is designed as a composable middle layer:

   *  Downstream from A2A discovery: after finding an agent, check its
      health before interacting.

   *  Upstream from SOOS governance: provide calibration anchor
      freshness signals that governance frameworks can consume.

   *  Independently deployable: requires no governance infrastructure,
      no identity framework, no credential issuance.  An agent
      operator can implement agent-health with a single HTTP endpoint.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Agent Health State:
      A structured representation of an agent's current operational
      status, response calibration, and decay indicators, exposed at
      a well-known HTTP endpoint.

   Operational Status:
      The liveness and responsiveness state of an agent: healthy,
      degraded, unhealthy, or unknown.

   Response Calibration:
      The degree to which an agent's actual response behavior matches
      its declared capabilities.  Measured as a ratio of successful
      interactions to total interaction attempts over a measurement
      window.

   Decay Indicator:
      A signal that an agent's operational quality is trending toward
      degraded or unhealthy states, derived from changes in
      calibration metrics over time.

   Calibration Anchor:
      A reference point (such as a verification credential, trust
      score, or capability declaration) whose validity depends on the
      agent's continued operational health.  A calibration anchor
      becomes stale when the agent's health state degrades.

3.  Problem Statement: The Observability Gap

   The current Agent Web standards landscape provides discovery (A2A)
   and governance (SOOS) but lacks an observability layer between them.
   This creates two concrete problems.

3.1.  Discovery Without Health

   A2A's /.well-known/agent.json provides an agent card with
   capabilities, endpoints, and authentication requirements.  It
   includes a /ping endpoint for basic liveness checks.

   However, /ping only answers "is the server process running?"  It does
   not answer:

   *  Is the agent producing calibrated responses consistent with its
      declared capabilities?

   *  Has the agent's response quality degraded since its capabilities
      were declared?

   *  Is the agent operating under reduced capacity (rate-limited,
      partially offline, model-downgraded)?

   A consumer that relies solely on /ping may interact with an agent
   that is technically "alive" but functionally degraded -- returning
   error responses, hallucinating, or operating with a downgraded model.
   The cost of discovering this through failed interactions is borne
   entirely by the consumer.

3.2.  Governance Without Freshness

   The SOOS/PT Trust Decay Model (Section 6 of [I-D.sato-soos-pt])
   specifies that trust scores decay during inactivity periods.  This
   is a
   behavioral decay model: it tracks whether an agent has demonstrated
   good behavior recently.

   However, SOOS/PT operates within a governance framework that assumes
   a Governing Enforcement Component (GEC) and GEC-signed Event
   Streams.  For agents outside any governance framework -- the vast
   majority of agents deployed today -- there is no mechanism to signal
   operational freshness.

   Even within a governance framework, the GEC needs an input to
   determine whether its calibration anchors (verification credentials,
   trust scores) remain valid.  If the agent's underlying model has been
   swapped or its response quality has silently degraded, the trust
   score derived from historical behavior may be stale.  The governance
   framework needs an operational health signal to determine when
   calibration anchors require re-verification.

3.3.  Empirical Evidence

   Observational data from a continuous monitoring system tracking AI
   agents with public URLs provides evidence of the problem's scale:

   *  Of 181,666 agents with URL-accessible endpoints, only 2,201
      (1.2%) returned valid responses.

   *  Platform-specific patterns: GPTs on OpenAI's marketplace showed
      0% survival (49/49 returning no valid response), while managed
      platforms like Meyo showed 100% survival (8/8 healthy).

   *  The 98.8% failure rate includes agents that were discoverable
      (their agent cards or metadata were accessible) but
      non-operational.  Discovery succeeded; interaction failed.

   This data demonstrates that discovery without health observability
   leads to predictable, widespread interaction failures.

4.  Agent Health State Specification

4.1.  Endpoint

   Agents implementing this specification MUST expose a health endpoint
   at:

      /.well-known/agent-health

   The endpoint MUST be accessible via HTTPS GET request and return a
   JSON response with Content-Type "application/json".

   The endpoint SHOULD be co-located with the agent's primary service
   URL.  If the agent exposes an A2A agent card at
   /.well-known/agent.json, the health endpoint SHOULD be served from
   the same origin.

   The endpoint MUST NOT require authentication for basic health
   queries.  An agent MAY provide extended health information that
   requires authentication, but the base health state MUST be accessible
   without credentials.
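
   As an illustration only, the following Python sketch derives the
   health endpoint URL from any URL on the agent's origin and retrieves
   the base health state without credentials.  The function names, the
   Accept header, and the 5 second timeout are assumptions of this
   example, not requirements of this document.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit

HEALTH_PATH = "/.well-known/agent-health"


def health_url(agent_url: str) -> str:
    """Derive the health endpoint URL from any URL on the agent's origin."""
    parts = urlsplit(agent_url)
    if parts.scheme != "https":
        raise ValueError("agent-health must be retrieved over HTTPS")
    # Keep scheme and authority (the origin); replace the path and drop
    # any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, HEALTH_PATH, "", ""))


def fetch_health(agent_url: str, timeout: float = 5.0) -> dict:
    """GET the base health state (no credentials) and decode the JSON body."""
    request = urllib.request.Request(
        health_url(agent_url), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        if content_type != "application/json":
            raise ValueError(f"unexpected Content-Type: {content_type}")
        return json.load(response)
```

   Passing the A2A agent card URL to health_url() yields a health URL
   on the same origin, matching the co-location guidance above.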

4.2.  Response Format

   The response is a JSON object with the following structure:

   {
     "schema_version": "0.1.0",
     "agent_id": "string",
     "timestamp": "string",
     "health": {
       "status": "string",
       "last_healthy_at": "string",
       "uptime_seconds": number
     },
     "calibration": {
       "response_ratio": number,
       "error_ratio": number,
       "latency_p50_ms": number,
       "latency_p99_ms": number,
       "measurement_window_seconds": number,
       "sample_count": number
     },
     "decay": {
       "calibration_trend": "string",
       "days_since_model_change": number | null,
       "last_capability_update": "string"
     },
     "extensions": {}
   }
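
   A consumer may wish to sanity-check a retrieved document before
   acting on it.  The Python sketch below is one possible check,
   anticipating the field descriptions in this section and the status
   values of Section 4.3.  It assumes that every member except
   "extensions" is required and that a namespace prefix can be
   recognized by the presence of a dot; neither assumption is stated
   by this document.  The function name is hypothetical.

```python
from datetime import datetime

# Assumption: every member except "extensions" is treated as required.
REQUIRED_MEMBERS = ("schema_version", "agent_id", "timestamp",
                    "health", "calibration", "decay")
STATUS_VALUES = {"healthy", "degraded", "unhealthy", "unknown"}


def check_health_document(doc: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = [f"missing member: {name}"
                for name in REQUIRED_MEMBERS if name not in doc]

    health = doc.get("health")
    status = health.get("status") if isinstance(health, dict) else None
    if status not in STATUS_VALUES:
        problems.append(f"invalid health.status: {status!r}")

    timestamp = doc.get("timestamp")
    if isinstance(timestamp, str):
        # Older Python versions reject a trailing "Z", so normalize it.
        text = timestamp[:-1] + "+00:00" if timestamp.endswith("Z") else timestamp
        try:
            if datetime.fromisoformat(text).tzinfo is None:
                problems.append("timestamp has no timezone")
        except ValueError:
            problems.append("timestamp is not ISO 8601")

    extensions = doc.get("extensions", {})
    if isinstance(extensions, dict):
        for name in extensions:
            # Assumption: a namespaced key contains at least one dot.
            if "." not in name:
                problems.append(f"extension lacks namespace prefix: {name}")
    return problems
```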

   Field descriptions:

   schema_version: The version of this schema.  Implementations MUST
   use semantic versioning.  This document defines version "0.1.0".

   agent_id: A stable identifier for the agent.  It is RECOMMENDED that
   this be a URI matching the agent's A2A agent card URL or its did:wba
   identifier.

   timestamp: The time at which this health state was computed, in
   ISO 8601 format with timezone.

   health: An object containing the agent's operational status.

   calibration: An object containing the agent's response calibration
   metrics.

   decay: An object containing indicators of operational quality
   trends.

   extensions: An optional object for implementation-specific
   extensions.  Extensions MUST use a namespace prefix (e.g.,
   "com.example.feature") to avoid collisions.

4.3.  Health State Values

   The health.status field MUST be one of the following values:

   healthy: The agent is operational, responding within expected
   parameters, and producing calibrated outputs.  Response ratio >=
   0.95 over the measurement window.

   degraded: The agent is operational but exhibiting reduced quality.
   This may indicate rate limiting, partial service degradation, or
   model downgrading.  Response ratio >= 0.50 and < 0.95, or latency
   p99 exceeding 3x the agent's declared baseline.

   unhealthy: The agent is responding but producing significantly
   degraded or erroneous outputs.  Response ratio < 0.50, or the
   agent is returning errors for the majority of requests.

   unknown: The agent's health state cannot be determined.  This is
   the default state for agents that have not implemented health
   monitoring, or when the monitoring system has insufficient data
   (sample_count < 10 in the measurement window).

   The health.last_healthy_at field records the last timestamp at
   which the agent's status was "healthy".  This field enables
   consumers to distinguish between an agent that just became
   degraded and one that has been degraded for days.

   The health.uptime_seconds field records the number of seconds
   since the agent process was last restarted.  A low uptime value
   combined with a degraded status may indicate a crash loop.
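
   The thresholds above can be read as a small classification rule.
   The Python sketch below is one reading of them, not a normative
   algorithm.  It assumes a precedence of unknown, then unhealthy, then
   degraded, then healthy, which this document does not state, and it
   takes the agent's declared latency baseline as a hypothetical
   baseline_p99_ms parameter because the location of that declaration
   is not defined here.

```python
from typing import Optional


def classify_status(response_ratio: float,
                    latency_p99_ms: float,
                    sample_count: int,
                    baseline_p99_ms: Optional[float] = None,
                    error_ratio: float = 0.0) -> str:
    """Map calibration metrics to a health.status value."""
    # Too few samples in the window: no determination can be made.
    if sample_count < 10:
        return "unknown"
    # Response ratio below 0.50, or errors for the majority of requests.
    if response_ratio < 0.50 or error_ratio > 0.50:
        return "unhealthy"
    # Response ratio in [0.50, 0.95).
    if response_ratio < 0.95:
        return "degraded"
    # Latency p99 more than 3x the declared baseline (if one is known).
    if baseline_p99_ms is not None and latency_p99_ms > 3 * baseline_p99_ms:
        return "degraded"
    return "healthy"
```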

4.4.  Calibration Metrics

   The calibration object provides quantitative metrics for the
   agent's response quality:

   response_ratio: The fraction of requests that received a valid
   (non-error) response within the measurement window.  Range [0.0,
   1.0].

   error_ratio: The fraction of requests that resulted in error
   responses (4xx, 5xx, timeouts, or malformed responses).  Range
   [0.0, 1.0].

   latency_p50_ms: The 50th percentile response latency in
   milliseconds over the measurement window.

   latency_p99_ms: The 99th percentile response latency in
   milliseconds over the measurement window.

   measurement_window_seconds: The duration of the measurement
   window in seconds.  RECOMMENDED value: 86400 (24 hours).
   Minimum value: 300 (5 minutes).

   sample_count: The number of requests sampled in the measurement
   window.  If sample_count < 10, the calibration metrics SHOULD be
   treated as low-confidence.
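
   As a non-normative sketch, the calibration object could be computed
   from per-request samples as shown below.  The Sample type, the
   nearest-rank percentile method, and the inclusion of failed requests
   in the latency percentiles are assumptions of this example.  This
   document does not say what to report for an empty window, so the
   sketch raises an error rather than guess.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    ok: bool           # True if the request got a valid, non-error response
    latency_ms: float  # observed response latency in milliseconds


def nearest_rank(sorted_values: List[float], q: float) -> float:
    """Nearest-rank percentile (0 < q <= 100) of an ascending, non-empty list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def calibration(samples: List[Sample], window_seconds: int = 86400) -> dict:
    """Build the calibration object for one measurement window."""
    if window_seconds < 300:
        raise ValueError("measurement window is below the 300 second minimum")
    if not samples:
        raise ValueError("no samples in the measurement window")
    count = len(samples)
    successes = sum(1 for s in samples if s.ok)
    latencies = sorted(s.latency_ms for s in samples)
    return {
        "response_ratio": successes / count,
        # Every sample here is either a success or an error.
        "error_ratio": (count - successes) / count,
        "latency_p50_ms": nearest_rank(latencies, 50),
        "latency_p99_ms": nearest_rank(latencies, 99),
        "measurement_window_seconds": window_seconds,
        "sample_count": count,
    }
```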

4.5.  Decay Indicators

   The decay object provides signals about the agent's operational
   quality trajectory:

   calibration_trend: One of "stable", "declining", or "improving".
   Determined by comparing the current measurement window's
   response_ratio to the previous window's.  An increase of more than
   0.10 is reported as "improving", a decrease of more than 0.10 as
   "declining", and any smaller change as "stable".

   days_since_model_change: The number of days since the agent's
   underlying model was changed (e.g., model version update,
   downgrading to a smaller model, switching providers).  null if
   this information is not available.  A recent model change
   combined with a declining calibration_trend is a strong decay
   signal.

   last_capability_update: The timestamp of the last update to the
   agent's capability declaration (e.g., A2A agent card update).
   If an agent's capabilities were declared weeks ago but its
   calibration has been declining, the capability declaration may be
   stale.
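
   The trend rule and the model-change age can be sketched in Python
   as follows.  This is illustrative only: the function names are
   hypothetical, and counting whole elapsed days is an assumption,
   since this document does not define rounding.

```python
from datetime import datetime, timezone
from typing import Optional


def calibration_trend(previous_ratio: float, current_ratio: float,
                      threshold: float = 0.10) -> str:
    """Compare response_ratio across two consecutive measurement windows."""
    delta = current_ratio - previous_ratio
    if delta > threshold:
        return "improving"
    if delta < -threshold:
        return "declining"
    return "stable"


def days_since_model_change(changed_at: Optional[datetime],
                            now: datetime) -> Optional[int]:
    """Whole days since the model changed, or None if the time is unknown."""
    if changed_at is None:
        return None
    return (now - changed_at).days
```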

5.  Relationship to Existing Standards

5.1.  A2A Protocol

   Agent Health State is designed as a natural complement to A2A
   discovery.  The interaction pattern is:

   1.  Consumer retrieves /.well-known/agent.json to discover the
       agent's capabilities.

   2.  Consumer retrieves /.well-known/agent-health to assess the
       agent's current operational state.

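   3.  If health.status is "healthy" or "degraded", consumer proceeds
       with interaction.  If "unhealthy", consumer may retry later,
       fall back to an alternative agent, or report the health state
       to a monitoring system.  If "unknown", consumer proceeds with
       its default interaction behavior and monitors for errors
       (Appendix A).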

   A2A's /ping endpoint provides a binary alive/dead signal.  Agent
   Health State provides a graduated signal (healthy/degraded/unhealthy)
   with quantitative calibration metrics.  Implementations SHOULD list
   the agent-health endpoint in their A2A agent card as a capability.

5.2.  SOOS Progressive Trust

   SOOS/PT [I-D.sato-soos-pt] defines a Trust Decay Model (Section 6)
   that measures behavioral trust across sessions.  Agent Health State
   complements PT by providing an operational layer signal:

   *  PT decay is behavioral: "has the agent demonstrated good behavior
      recently?"  Measured from GEC-signed Event Stream entries.

   *  Agent Health decay is operational: "is the agent's infrastructure
      functioning correctly?"  Measured from HTTP endpoint observations.

   These two decay signals address different failure modes.  An agent
   may have excellent behavioral trust (high PT scores) but be
   currently experiencing infrastructure degradation (degraded health
   status).  Conversely, an agent may be operationally healthy but have
   declining behavioral trust due to poor decision-making.

   SOOS/PT implementations SHOULD consume agent-health signals as a
   freshness input for calibration anchors.  If an agent's health
   transitions from healthy to degraded, the GEC SHOULD flag the
   agent's PT Score as potentially stale and trigger re-verification
   before the next authority elevation.
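
   A governance component could track this transition with a small
   amount of state.  The Python sketch below is hypothetical and is not
   part of any SOOS/PT interface: the AnchorFreshness class and its
   method names are invented for illustration.  Flagging a direct
   transition from healthy to unhealthy in the same way is an
   assumption that goes beyond the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnchorFreshness:
    """Tracks whether a calibration anchor should be re-verified."""
    last_status: Optional[str] = None
    needs_reverification: bool = False

    def observe(self, status: str) -> bool:
        """Record a new health.status; return True while the anchor is stale."""
        # Assumption: "unhealthy" is handled like "degraded" when the
        # previous observation was "healthy".
        if self.last_status == "healthy" and status in ("degraded", "unhealthy"):
            self.needs_reverification = True
        self.last_status = status
        return self.needs_reverification

    def mark_reverified(self) -> None:
        """Clear the flag once re-verification has completed."""
        self.needs_reverification = False
```

   The flag stays set after the agent recovers, so that re-verification
   still occurs before the next authority elevation.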

5.3.  Microsoft Agent SRE

   Microsoft's Agent SRE framework defines health.status with values
   including "degraded" and "unhealthy" [MS-SRE].  These status names
   are consistent with those defined in Section 4.3, reflecting
   independent convergence on a similar operational model.

   Microsoft's framework is an internal Azure operational concept.  This
   specification provides a standards-track definition that is vendor-
   neutral and independently deployable.

5.4.  Verification State (Krausz)

   draft-krausz-verification-state-00 defines verification.* fields
   for agent identity verification.  Verification state answers "is
   this agent who it claims to be?"  Agent Health State answers "is
   this agent currently operating correctly?"

   A verification credential that was valid when issued may become
   unreliable if the agent's health has since degraded.  Agent Health
   State provides the operational signal that verification consumers
   need to determine whether a verification credential's calibration
   anchor remains fresh.

6.  Well-Known URI Registration

   This document requests IANA to register the following Well-Known URI
   in the "Well-Known URIs" registry defined in [RFC8615]:

   URI suffix: agent-health

   Change controller: IETF

   Specification document: This document

   Related information: The agent-health endpoint provides operational
   health state information for AI agents.  It is designed to complement
   A2A's /.well-known/agent.json discovery endpoint.

7.  Security Considerations

   Information Disclosure: The agent-health endpoint exposes operational
   metrics that could inform attack planning.  Implementations MUST NOT
   expose internal infrastructure details (server versions, internal
   IPs, stack traces) in the health response.  The calibration metrics
   are intentionally coarse (ratios and percentiles) to limit
   information leakage.

   Denial of Service: The health endpoint MUST be lightweight and
   non-blocking.  Implementations SHOULD cache health state for at
   least 60 seconds and MUST NOT perform expensive computations or
   external queries to generate the response.
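
   One way to meet the caching guidance is a time-based cache in front
   of whatever produces the health document.  The Python sketch below
   is an assumption-laden example: the class name and the injectable
   clock are hypothetical, and the compute callable is expected to read
   already-aggregated counters rather than perform expensive work.

```python
import time
from typing import Callable, Optional


class CachedHealthState:
    """Serve a cached health document, recomputing at most once per TTL."""

    def __init__(self, compute: Callable[[], dict],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._computed_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Recompute on first use or once the cached value is TTL seconds old.
        if self._value is None or now - self._computed_at >= self._ttl:
            self._value = self._compute()
            self._computed_at = now
        return self._value
```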

   Health Endpoint Spoofing: A compromised agent could report
   "healthy" status while operating maliciously.  Agent Health State
   is an operational signal, not a security guarantee.  Consumers MUST
   NOT treat a "healthy" status as proof of trustworthiness.  Health
   state should be combined with identity verification
   [I-D.krausz-verification-state] and behavioral governance
   [I-D.sato-soos-pt] for a complete trust assessment.

   Rate Limiting: Consumers SHOULD NOT poll the health endpoint at high
   frequency.  RECOMMENDED polling interval is 300 seconds (5 minutes)
   or longer.  Implementations MAY rate-limit health endpoint requests.

8.  Privacy Considerations

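   The agent-health endpoint exposes aggregate operational metrics, not
   individual interaction data.  Response ratios and latency
   percentiles computed over a reasonably large sample do not identify
   specific users or requests.  When sample_count is very small,
   aggregates can approach individual observations, and implementations
   should take this into account.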

   The agent_id field identifies the agent, not its users.
   Implementations MUST NOT include user-identifying information in the
   health response.

   If an agent is a personal AI assistant associated with an
   identifiable natural person, the agent-health endpoint could reveal
   operational patterns (e.g., daily uptime cycles) that indirectly
   indicate the person's activity patterns.  Implementations of such
   agents SHOULD either report health.status at a coarse granularity
   (healthy/unhealthy only, omitting calibration and decay details) or
   require authentication for full health state access.

9.  IANA Considerations

   This document requests IANA to register the "agent-health" well-known
   URI as described in Section 6.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

   [RFC8615]  Nottingham, M. and E. Wilde, "Well-Known Uniform
              Resource Identifiers (URIs)", RFC 8615, May 2019.

10.2.  Informative References

   [A2A]      Google, "Agent-to-Agent Protocol", 2025,
              <https://github.com/google/A2A/>.

   [I-D.sato-soos-pt]
              Sato, T., "Progressive Trust (PT) for Agentic AI
              Governance Systems", draft-sato-soos-pt-01, May 2026.

   [I-D.krausz-verification-state]
              Krausz, J., "Agent Verification State", draft-krausz-
              verification-state-00, 2026.

   [MS-SRE]   Microsoft, "Applying Site Reliability Engineering to
              Autonomous AI Agents", Microsoft Tech Community, 2026,
              <https://techcommunity.microsoft.com/blog/
              linuxandopensourceblog/applying-site-reliability-
              engineering-to-autonomous-ai-agents/4521357>.

Appendix A.  Health State Decision Flow

   Consumers SHOULD use the following decision flow when interpreting
   an agent's health state:

   1.  Retrieve /.well-known/agent-health.

   2.  If the request fails (network error, timeout, DNS failure):
       Treat as "unknown" status.  Do not assume the agent is
       unhealthy; network issues may be transient.

   3.  If the response is valid:

       *  health.status == "healthy": Proceed with interaction.
          Optionally check calibration.response_ratio for quality
          assurance.

       *  health.status == "degraded": Proceed with caution.  Consider
          setting lower timeouts, enabling fallback agents, or
          reducing request complexity.  Check decay.calibration_trend
          to determine if the degradation is worsening.

       *  health.status == "unhealthy": Avoid interaction.  Consider
          retrying after a delay (RECOMMENDED: 5 minutes) or using an
          alternative agent.  Check health.last_healthy_at to estimate
          the duration of the outage.

       *  health.status == "unknown": No operational information is
          available.  Proceed with default interaction behavior and
          monitor for errors.

   4.  If the agent is within a governance framework (e.g., SOOS):
       Feed the health state to the governance system as a calibration
       anchor freshness signal.  A transition from "healthy" to
       "degraded" SHOULD trigger calibration anchor re-verification
       before the next trust elevation.
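
   Steps 2 and 3 of this flow can be sketched in Python as follows.
   This is a minimal illustration: the function name and the action
   labels are hypothetical, and mapping an unrecognized status value to
   "unknown" is an assumption of the example.  A failed retrieval is
   represented by passing None.

```python
from typing import Optional

RETRY_DELAY_SECONDS = 300  # the 5 minute delay recommended above

ACTIONS = {
    "healthy": "proceed",
    "degraded": "proceed_with_caution",
    "unhealthy": "avoid",
    "unknown": "proceed_default",
}


def next_action(health_doc: Optional[dict]) -> dict:
    """Map a retrieved health document (or None on failure) to an action."""
    status = "unknown"
    if health_doc is not None:
        status = health_doc.get("health", {}).get("status", "unknown")
    if status not in ACTIONS:
        # Assumption: values outside Section 4.3 are handled as "unknown".
        status = "unknown"

    result = {"status": status, "action": ACTIONS[status]}
    if status == "degraded":
        # Lets the caller see whether the degradation is worsening.
        result["trend"] = health_doc.get("decay", {}).get("calibration_trend")
    elif status == "unhealthy":
        result["retry_after_seconds"] = RETRY_DELAY_SECONDS
        result["last_healthy_at"] = health_doc.get("health", {}).get("last_healthy_at")
    return result
```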

Appendix B.  Empirical Data: Agent Health at Scale

   Continuous monitoring of AI agents with public URLs reveals the
   current state of agent operational health in the wild:

   Dataset: 181,666 agents with URL-accessible endpoints (Tier 1 in
   the AgentRisk classification).  Monitoring period: 30 days ending
   2026-06-07.

   Results:

   *  Healthy: 2,201 (1.2%)
   *  Unhealthy/Non-responsive: 179,465 (98.8%)

   Platform-specific survival rates:

   *  GPTs (OpenAI marketplace): 0% (49/49 non-responsive)
   *  Meyo: 100% (8/8 healthy)
   *  Other platforms: survival rates between 0.5% and 15%

   Interpretation: The extreme disparity between discovery (agents are
   listed in directories and marketplaces) and operational health
   (nearly all are non-responsive) validates the need for an
   observability layer between discovery and interaction.

   Limitations: This data covers only Tier 1 agents with public URLs.
   Enterprise agents behind firewalls, agents accessed via API keys, and
   agents on private networks are not represented.  The 98.8% failure
   rate may be partially attributable to URL decay (marketplace URLs
   that become invalid after listing removal) rather than agent
   malfunction.  Even accounting for this, the observability gap is
   significant.

Author's Address

   Xiao Liang
   Shaanxi Rui Ding Guan Yan Industrial Co., Ltd.
   Xi'an, Shaanxi
   China
   Email: yingzi-159@163.com