Network Working Group                                          Z. Chang
Internet-Draft                                                    J. Li
Intended status: Informational                                   Z. Cao
Expires: 8 June 2026                                              Huawei
                                                         5 December 2025


        A Token-efficient Data Layer for Agentic Communication
                  draft-chang-agent-token-efficient-01

Abstract

   Agentic communication fundamentally differs from traditional machine
   communication in that its outputs are recursively consumed and
   interpreted by Large Language Models (LLMs).  This characteristic
   makes efficient use of the model's limited context window a critical
   requirement.  However, current agent communication protocols, such
   as the Model Context Protocol (MCP) and the Agent-to-Agent (A2A)
   protocol, often suffer from token or context bloat caused by
   redundant schema definitions, verbose message structures, and
   inefficient capability exchanges.  As a result, a substantial number
   of tokens are unnecessarily consumed within the model's context
   window.

   To address these issues, this draft defines the Agentic Data
   Optimization Layer (ADOL), a general and foundational data layer for
   agent communication.  ADOL introduces a set of backward-compatible
   optimizations, including schema deduplication through JSON
   references, adaptive inclusion of optional fields, controllable
   response verbosity, and retrieval-based selection mechanisms.
   Together, these mechanisms reduce token consumption, enhance context
   efficiency, and provide a scalable foundation for future agent
   communication frameworks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 June 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Token Bloat Problem in A2A (Agent-Agent Communication)
     1.2.  Token Bloat Problem in MCP (Agent-Tool Communication)
     1.3.  Requirements Language
   2.  Problem Statement
     2.1.  Repetitive Schema Across Tools
     2.2.  Excessive Optional Content
     2.3.  Overly Long Server Responses
     2.4.  Inefficient Tool Selection
   3.  Solution Overview
     3.1.  Schema Deduplication Using JSON References
     3.2.  Adaptive Optional Field Inclusion
     3.3.  Controllable Response Verbosity
     3.4.  Tool Selection Mechanism
   4.  Specification
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

1.1.  Token Bloat Problem in A2A (Agent-Agent Communication)

   The Agent-to-Agent (A2A) protocol defines structured message
   exchanges among multiple AI agents to support coordination,
   collaboration, and task delegation.  It provides a semantic
   framework through which agents can advertise their capabilities,
   exchange intermediate reasoning results, and invoke each other's
   functions in a standardized manner.  A2A communication typically
   uses JSON-based message structures to ensure interoperability across
   heterogeneous agents.

   However, as the number and complexity of agents increase, message
   size and redundancy also grow.  Agents often repeatedly exchange
   static or overlapping information (such as capability descriptions,
   metadata, or schema definitions) across multiple rounds of
   interaction.  This repetition leads to a phenomenon known as token
   bloat, where verbose or redundant message components consume
   excessive tokens in the LLM's context window.  Token bloat degrades
   reasoning efficiency, increases latency, and elevates operational
   cost, particularly in large-scale or multi-agent deployments.  These
   challenges highlight the need for a unified and optimized data layer
   that reduces redundancy while preserving interoperability and
   semantic integrity.

1.2.  Token Bloat Problem in MCP (Agent-Tool Communication)

   The Model Context Protocol (MCP) defines a standardized interface
   for communication between Large Language Models (LLMs) and external
   tools or data sources.  It allows agents to describe, list, and
   invoke tools in a structured and interoperable manner.  MCP follows
   a client-server architecture consisting of a client (which connects
   to the MCP server), a server (which hosts tool definitions and
   logic), and a host (which coordinates the overall interaction).

   Conceptually, MCP consists of two layers:

   1.  the data layer, which defines JSON-RPC-based primitives such as
       tools, resources, prompts, and notifications; and

   2.  the transport layer, which handles connection establishment,
       message framing, and authorization.

   As the number of MCP servers and tools increases, a token bloat
   problem also emerges.  Each tool's detailed schema, including
   optional fields and human-readable descriptions, is transmitted to
   the model during reasoning.  Overlapping schemas, unfiltered tool
   lists, and verbose outputs also contribute to redundant token usage.
   Although protocol correctness is unaffected, efficiency, latency,
   and cost are negatively impacted.  These limitations motivate the
   design of a more efficient data layer.
1.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Problem Statement

   Agentic communication introduces unique efficiency challenges
   because all exchanged messages are recursively processed by Large
   Language Models (LLMs).  Every transmitted token consumes part of
   the model's limited context window, making redundancy and verbosity
   directly detrimental to reasoning efficiency and operational cost.
   Across current agent protocols, such as [MCP] and [A2A], four common
   inefficiencies are observed in the data-layer design: repetitive
   schemas, excessive optional content, verbose responses, and
   inefficient capability selection.  Together, these issues cause
   token and context bloat, limiting scalability and responsiveness in
   multi-agent systems.

2.1.  Repetitive Schema Across Tools

   When the LLM requests the list of tools from the server, the server
   returns a list that includes the schema of every tool.  However,
   tools hosted on the same MCP server often share similar or identical
   schema components, such as parameter or output schemas.  These
   recurring elements not only introduce unnecessary repetition but
   also waste tokens when provided to the LLM.  We conducted an
   experiment to analyze the duplicate content in the schemas of 60
   tools within the official GitHub MCP server.  According to our
   results, the repetition rate of certain recurring fragments within
   the 60 tool schemas, such as the one shown in Figure 1, reached
   9.84%, indicating that it is necessary to improve the tool schemas
   to reduce such repetitive fields.

   {
     "repo": {
       "description": "Repository name",
       "type": "string"
     }
   }

                   Figure 1: Repetitive Schema Example

2.2.  Excessive Optional Content

   The MCP tool schema currently includes a variety of optional fields,
   such as output schemas and metadata.  In many cases, these fields
   are not essential for successful tool invocation.  For instance, the
   output schema is primarily intended to help the LLM interpret and
   structure the server's response.  However, when the server's output
   is already concise and semantically clear, the value of providing an
   additional output schema becomes marginal.  From the server's
   standpoint, these fields are optional; yet, once transmitted, they
   are fully processed and tokenized by the LLM, contributing directly
   to prompt length and token overhead.  At present, the client has no
   mechanism to indicate whether such optional content should be
   included, resulting in unnecessary data exchange and reduced token
   efficiency.

2.3.  Overly Long Server Responses

   Under the current MCP architecture, the server is responsible for
   generating outputs that conform to its declared output schema.
   However, in many scenarios, this design results in overly verbose
   responses that exceed the LLM's actual informational needs.  For
   example, when a weather-reporting tool provides detailed structured
   data (including temperature, humidity, wind speed, and visibility),
   an LLM may only require a subset of this information, such as
   temperature and humidity.  Since the server lacks awareness of the
   LLM's contextual requirements, it always returns the complete output
   defined by the schema.
   Consequently, the LLM must parse and tokenize all fields, even those
   irrelevant to the current prompt.  This behavior introduces
   unnecessary token consumption and increases computational cost,
   particularly in use cases where concise responses are sufficient.

2.4.  Inefficient Tool Selection

   In practice, a service provider typically exposes a single MCP
   server that aggregates multiple tools under one endpoint.  These
   tools may span heterogeneous functional domains (e.g., translation,
   data retrieval, summarization), all of which are simultaneously
   discoverable by the MCP client.  While this design simplifies
   server-side deployment and management, it introduces inefficiency on
   the client side.

   As the number of available tools grows, agents must evaluate and
   compare an increasingly large set of tool descriptions to determine
   the most relevant one.  This process not only increases
   computational overhead but also leads to selection ambiguity,
   especially when multiple tools exhibit overlapping functionalities
   or verbose metadata.  Empirical observations indicate that
   excessively detailed tool descriptions, while intended to enhance
   interpretability, can paradoxically increase token consumption
   during parsing and prompt formation.  Moreover, the expanded search
   space degrades the accuracy and efficiency of tool selection,
   particularly in real-time or resource-constrained scenarios.

3.  Solution Overview

 +------------------------------------------------------------+
 |           Agentic Data Optimization Layer (ADOL)            |
 +------------------------------------------------------------+

 +--------+                +---------------+                +-----------+
 |        |                |               |                |           |
 |  LLM   |                |    Client     | B&D.tools/list |  Server   |
 |        |                |               +--------------->|           |
 |        |                |               | --short --tag  |           |
 |        |                |               |                |           |
 |        |  Curated List  |               | A.Curated List |           |
 |        |<---------------+               |<---------------+           |
 |        |                |               |                |           |
 |        |   Tool Call    |               |   Tool Call    |           |
 |        +--------------->|               +--------------->|           |
 |        | C.requireOutput|               |                |           |
 |        |                |               |                |           |
 |        |   Response     |               |   Response     |           |
 |        |<---------------+ requireOutput |<---------------+           |
 |        |  (Filtered)    |               |                |           |
 +--------+                +---------------+                +-----------+

                    Figure 2: Typical workflow of ADOL

   To systematically address the inefficiencies identified above, this
   draft introduces the Agentic Data Optimization Layer (ADOL), a
   unified and backward-compatible enhancement to existing agentic data
   layers.  Rather than focusing on a single protocol, ADOL provides a
   generalized framework that improves the efficiency of data exchange
   across diverse agent communication settings, including both Model
   Context Protocol (MCP) and Agent-to-Agent (A2A) workflows.

   ADOL defines a set of lightweight mechanisms that preserve
   interoperability while significantly reducing redundant token usage
   in LLM-driven interactions.  It optimizes data compactness and
   adaptiveness without altering existing transport or semantic layers.
   As shown in Figure 2, ADOL enhances the agentic data layer along
   four complementary dimensions:

   A.  Schema Deduplication: eliminate repetitive definitions through
       JSON references;

   B.  Adaptive Optional Inclusion: allow clients to request concise or
       full schemas;

   C.  Controllable Response Verbosity: enable models to specify the
       required output scope;

   D.  Context-Aware Tool Selection: limit tool exposure to those
       relevant to the current task (see the request sketch after this
       list).
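   As a non-normative illustration of the first exchange in Figure 2
   (labels B and D), a `tools/list` request might combine the two
   mechanisms as follows.  The `short` and `tags` parameter names are
   assumptions made for this sketch; they are not defined by the
   current MCP specification.

   {
     "jsonrpc": "2.0",
     "id": 1,
     "method": "tools/list",
     "params": {
       "short": true,
       "tags": ["weather"]
     }
   }

   Here, `short` asks the server to omit optional schema content
   (Section 3.2), and `tags` narrows the returned tool set to a
   relevant subset (Section 3.4).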
   Together, these mechanisms establish a more efficient and extensible
   foundation for agentic communication, reducing token and context
   bloat while maintaining full compatibility with existing agent
   frameworks.

3.1.  Schema Deduplication Using JSON References

   To eliminate redundant schema definitions, ADOL allows MCP tool
   schemas to use the JSON Schema `$ref` mechanism specified in [JSON].
   The original MCP architecture does not support `$ref`, which leads
   to a significant amount of duplicate content.  The `$ref` mechanism
   supports both internal and external references.

   {
     "definitions": {
       "user": { "type": "object" }
     },
     "properties": {
       "data": { "$ref": "#/definitions/user" }
     }
   }

                  Figure 3: Example: Internal Reference

   The internal reference enables schema reuse within the same file,
   which applies to situations where the same fragment recurs multiple
   times within a single JSON schema.

   { "$ref": "schema/common/user.json" }

                  Figure 4: Example: External Reference

   The external reference allows a tool schema to reference a shared
   template stored either locally or remotely.  The MCP host or client
   may cache these schemas to avoid repeated token consumption.  This
   mechanism is better suited to situations where content overlaps
   among multiple tool schemas.

   We have conducted extensive experiments in which the tool schemas on
   the server were modified to use `$ref` references.  The results show
   that the LLM can still accurately recognize and invoke these tools,
   introducing no backward incompatibility.

3.2.  Adaptive Optional Field Inclusion

   When responding to `tools/list` requests, servers SHOULD determine
   whether to include optional schema content based on the client
   request.  Specifically, an additional field, `optional`, is
   introduced in the server's tool configuration file; it contains
   schema content that is not mandatory for invocation.  If the client
   requests a "short" list (for example, via `tools/list --short`), the
   server SHOULD omit the contents of the `optional` field and return a
   concise tool schema.  This conditional inclusion mechanism prevents
   unnecessary token usage.

3.3.  Controllable Response Verbosity

   The client SHOULD be capable of adjusting response verbosity based
   on the LLM's request.  To this end, ADOL introduces a new
   `requireOutput` field in the tool schema.

   {
     "requireOutput": {
       "description": "A list of output field names that the LLM
         expects to be returned from the tool's complete output.",
       "type": "array",
       "items": {
         "type": "string",
         "enum": ["toolOutcome_1", "toolOutcome_2", "toolOutcome_n"]
       },
       "uniqueItems": true
     }
   }

                     Figure 5: RequireOutput Schema

   When invoking a tool, the LLM first retrieves the tool list and the
   corresponding schemas through the client.  Each tool schema may
   include a `requireOutput` field, which defines an optional list of
   output fields that can be selectively requested.  This field enables
   the LLM to specify which parts of a tool's output are relevant to
   the current context.  Based on this field, the LLM infers which of
   the tool's enumerated outcomes are needed and passes the generated
   `requireOutput` value to the MCP client.  Upon receiving the tool's
   execution result, the MCP client filters the output according to
   `requireOutput` and returns the trimmed result to the LLM.
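   As a non-normative sketch, a tool call issued by the LLM via the
   client might carry this field as follows.  The tool name
   `get_weather` and its argument are hypothetical; whether
   `requireOutput` is forwarded to the server or consumed only by the
   client when filtering is an implementation choice.

   {
     "jsonrpc": "2.0",
     "id": 7,
     "method": "tools/call",
     "params": {
       "name": "get_weather",
       "arguments": { "city": "Beijing" },
       "requireOutput": ["temperature", "humidity"]
     }
   }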
   In such a weather-related tool, the output schema may contain
   multiple fields such as current conditions, temperature, humidity,
   and visibility.  The `requireOutput` field allows the client to
   specify which subset of these results should be included (e.g.,
   `"requireOutput": ["temperature", "humidity"]`).  After receiving
   the server's response, the client extracts and forwards only these
   fields to the LLM, filtering out unnecessary content while
   preserving semantic completeness.  This design ensures that the LLM
   receives only the data necessary for the current context, thereby
   reducing redundant information and minimizing token consumption.

3.4.  Tool Selection Mechanism

   To alleviate the token overhead caused by large tool lists, a
   retrieval-based mechanism is introduced.  The MCP client may include
   an additional informative field in the `tools/list` request to
   indicate its tool requirements.  The server then filters the tool
   list and returns only the tools that meet the need.  We propose two
   typical implementation approaches (a request sketch appears at the
   end of this section):

   1.  Semantic-based retrieval: The client provides a natural language
       description of its requirement.  The server computes an
       embedding for the requirement and compares it with precomputed
       embeddings of all available tools.  The server then ranks the
       tools by similarity and returns either the top-K most relevant
       tools or those whose similarity score exceeds a predefined
       threshold.

   2.  Tag-based retrieval: The client provides one or more predefined
       tool tags.  Each tool is assigned at least one tag.  The server
       returns the tools whose tags match the ones requested by the
       client.

   These two approaches accommodate servers with different
   capabilities.  Approach 1 requires a more sophisticated server
   implementation, but it is expected to provide better retrieval
   performance and higher flexibility.  In contrast, Approach 2 is
   simpler to implement, though it introduces some usage limitations.
   With this tool selection mechanism, the range of tools presented to
   the LLM is narrowed, reducing the LLM's token overhead and improving
   the accuracy of tool selection.
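   As a non-normative sketch of Approach 1, the client might embed its
   requirement as free text in the `tools/list` request.  The `query`
   parameter name is an assumption of this sketch; a server
   implementing Approach 2 would instead match a `tags` array, as in
   the earlier sketch in Section 3.

   {
     "jsonrpc": "2.0",
     "id": 2,
     "method": "tools/list",
     "params": {
       "query": "translate English text into French"
     }
   }

   Under this sketch, the server embeds the `query` string, compares it
   with the precomputed tool embeddings, and returns either the top-K
   most similar tools or those above a similarity threshold.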
4.  Specification

   The proposed extensions can be summarized as follows:

   +===================+============================+=============+
   | Mechanism         | Description                | Impact      |
   +===================+============================+=============+
   | JSON $ref support | Deduplicate schema content | Reduce      |
   |                   |                            | token usage |
   +-------------------+----------------------------+-------------+
   | Optional field    | Client-specified schema    | Reduce      |
   | control           | inclusion                  | token usage |
   +-------------------+----------------------------+-------------+
   | Response          | Adaptive detail level in   | Reduce      |
   | verbosity control | output                     | token usage |
   +-------------------+----------------------------+-------------+
   | Retrieval-based   | Vectorized tool selection  | Improve     |
   | tool selection    |                            | accuracy    |
   +-------------------+----------------------------+-------------+

                       Table 1: Mechanism Summary

   Backward compatibility is maintained, as none of the proposed
   changes modify existing MCP schema semantics.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  Security Considerations

   Embedding-based retrieval introduces potential information exposure
   risks if tool embeddings or client embeddings are transmitted in
   plaintext.  Implementations SHOULD ensure encryption and access
   control when exchanging embedding vectors.  Schema caching and
   external references should only reference trusted sources to avoid
   code injection or schema poisoning.

   Beyond this, ADOL does not introduce new security vulnerabilities
   beyond those already present in existing agent communication
   protocols such as MCP or A2A.  However, implementers should ensure
   that any schema references or external resources retrieved via
   `$ref` are obtained from trusted sources to prevent schema injection
   or data tampering.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2.  Informative References

   [A2A]      Google, "Agent2Agent (A2A) Protocol", 2025.

   [MCP]      Anthropic, "Model Context Protocol (MCP)", 2025.

   [JSON]     Wright, A., Andrews, H., Hutton, B., and G. Dennis, "JSON
              Schema: A Media Type for Describing JSON Documents",
              2022.

Acknowledgements

   The authors thank the MCP and A2A open-source communities for the
   ongoing discussion and the SEP authors for foundational design
   insights.  The SEP we proposed in the official MCP GitHub community
   is available at:
   https://github.com/modelcontextprotocol/modelcontextprotocol/
   issues/1576

Authors' Addresses

   Zeze Chang
   Huawei
   No. 3, Shangdi Information Road, Haidian District
   Beijing
   100085
   China
   Email: changzeze@huawei.com

   Jinyang Li
   Huawei
   No. 3, Shangdi Information Road, Haidian District
   Beijing
   100085
   China
   Email: lijinyang9@huawei.com

   Zhen Cao
   Huawei
   No. 3, Shangdi Information Road, Haidian District
   Beijing
   100085
   China
   Email: zhen.cao@huawei.com