Network Working Group                                          B. Laurie
Internet-Draft                                                T. Santoro
Intended status: Informational                            P. Anthonysamy
Expires: 23 April 2026                                        S. de Haas
                                                              Google LLC
                                                         20 October 2025


         A Standard for Claiming Transparency and Falsifiability
                          draft-laurie-tmif-00

Abstract

   This document specifies a transparency metadata interchange format
   that allows a system to make claims about its levels of transparency
   and falsifiability.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Source for this draft and an issue tracker can be found at
   https://github.com/sarahdeh/draft-TMIF.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 April 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Laurie, et al.           Expires 23 April 2026                  [Page 1]

Internet-Draft                    TMIF                      October 2025

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Revised BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.
Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Falsifiability of privacy claims
   4.  Examples of Claims
   5.  Usage of Transparency Metadata
   6.  Transparency Levels
   7.  Transparency Metadata Interchange Format
   8.  Security Considerations
   9.  IANA Considerations
   Acknowledgments
   Authors' Addresses

1.  Introduction

   As AI-powered, consumer-facing features proliferate, AI service
   providers and users are grappling with growing user discomfort over
   the amounts of sensitive data processed by these systems.  While
   this is not a new problem, AI accelerates the need to process
   sensitive information in order to provide significant advances in
   utility.  However, many consumers want to be reassured that there
   are strict limits on how their data is used, who can access it, and
   for what purpose.

   For that reason, we see an increasing trend to introduce
   transparency into privacy-preserving systems, giving service
   providers a way to make strong claims about how data is handled, and
   users a way to independently verify those claims.  Transparency
   approaches can, however, be radically different; some consistency in
   definitions and terminology is therefore required in order to allow
   end users (or their delegates) to examine these systems in detail
   and assess the overall falsifiability of the claims they make.
   This document therefore defines a Transparency Metadata Interchange
   Format (TMIF) for describing the levels of transparency achieved by
   such a system, in order to allow independent auditors to determine
   the overall level of transparency that the system can claim.

   A high degree of transparency means that end consumers, or their
   delegates, can assure themselves that a service provider's claims
   about the usage and handling of their data are very likely true,
   because of the high technical bar that would have to be met to
   undermine those claims without discovery.  It also becomes possible
   to incentivize third parties and independent researchers to focus
   their efforts on finding falsifying examples of privacy and security
   claims.

2.  Conventions and Definitions

   *Defining a target system for the transparency metadata interchange
   format*

   A system that is set up to provide transparency and falsifiability
   will generally include the following components, with
   characteristics as described, which may have transparency-related
   details specified in the TMIF.

   _Service Provider_

   The entity that develops, deploys, and operates the service.  This
   entity defines the data processing logic and is considered an
   untrusted party in the security model, as the goal is to constrain
   access to user data in plaintext.

   _TEE Manufacturer_

   The hardware vendor that designs and manufactures the processor
   containing the TEE.  This entity is a root of trust; it produces the
   TEE.

   _TEE_

   A hardware-isolated secure processing environment that protects the
   confidentiality and integrity of code and data executing within it,
   and can produce attestations about the state of the secured
   environment.  For the purposes of this standard, we define a TEE as
   a secure area that protects the code and data loaded inside it,
   usually backed by hardware as described above.
   Code and data in TEEs are protected for both confidentiality and
   integrity: data confidentiality prevents unauthorized entities
   outside the TEE from reading data, while code integrity prevents
   code in the TEE from being replaced or modified by unauthorized
   entities.  Crucially, an unauthorized entity can include the owner
   or maintainer of the code and data inside the TEE.

   _Client_

   The device (e.g., a smartphone, laptop, or web browser) used to
   interact with the service.  This includes software and hardware.

   _Server_

   The machine that serves user requests.  In a distributed system,
   this includes all machines that have access to the data.

   _Remote attestation_

   A cryptographic process that provides a remote client with
   verifiable proof of the TEE's state.  The service generates a signed
   report, or attestation, that proves key information:

   1.  The hardware is a genuine TEE from a specific manufacturer.

   2.  The correct, unmodified application binaries are running within
       it.

   3.  The TEE's software version.

   The attestation allows the client to verify the integrity of the
   trusted environment before provisioning it with any sensitive data.
   See the section below on transparency properties for who should read
   these attestations.

   _Root of Trust_

   In practice, most TEEs will be hardware based, with the
   cryptographic keys fused into the CPU silicon during hardware
   manufacturing.  This provides a high degree of physical tamper
   resistance.  However, a software root of trust is also possible;
   therefore, details of its assurance level should be specified as
   part of the TMIF format (e.g., has completed a vulnerability
   analysis at Evaluation Assurance Level (EAL) 5 or higher).

   _Accelerators_

   When a TEE offloads computation to a hardware accelerator (e.g., a
   GPU or NPU), in general the entire data pathway will be secured.
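   The three attestation checks listed above can be sketched non-
   normatively.  The Python fragment below is illustrative only: the
   report field names (hw_id, measurement, svn) and the vendor name are
   assumptions, not part of any specific TEE vendor's report format,
   and a real verifier would first validate a signature chain rooted in
   the TEE manufacturer's key rather than rely on the simplified
   structural checks shown here.

```python
import hashlib

# Illustrative sketch only.  Models the structural checks a client
# performs on an (already signature-verified) attestation report before
# provisioning a TEE with sensitive data.  All names are hypothetical.

EXPECTED_MANUFACTURERS = {"example_tee_vendor"}
EXPECTED_MEASUREMENT = hashlib.sha256(b"released-binary").hexdigest()
MINIMUM_SVN = 3  # minimum acceptable TEE software version number

def check_attestation(report: dict) -> bool:
    """Accept only genuine hardware, the expected binary measurement,
    and a sufficiently recent TEE software version."""
    if report.get("hw_id") not in EXPECTED_MANUFACTURERS:
        return False  # not a genuine TEE from a known manufacturer
    if report.get("measurement") != EXPECTED_MEASUREMENT:
        return False  # running binary differs from the attested one
    if report.get("svn", 0) < MINIMUM_SVN:
        return False  # TEE software version is too old
    return True

# A client refuses to send data when any check fails.
report = {"hw_id": "example_tee_vendor",
          "measurement": EXPECTED_MEASUREMENT,
          "svn": 4}
assert check_attestation(report)
```

   If the service provider ran a modified binary, the measurement in
   the report would change and the client would refuse to send data.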
   This implies a mutually authenticated and encrypted channel between
   the host CPU's TEE and a TEE within the accelerator itself.  The
   accelerator would also have its own capacity for secure processing
   and attestation.  The TMIF format should be able to describe all of
   these implementation details.

   _Isolation_

   TEE environments provide memory protection, even from software with
   higher privilege.

3.  Falsifiability of privacy claims

   The aim of specifying the transparency metadata for a given system
   is to allow third parties to assess the level of falsifiability of
   the system.  A fully transparent system is designed in such a way as
   to ensure that an accidental or malicious attempt to undermine
   privacy and security claims cannot be introduced secretly.  This is
   based on the principle that testing may show the presence of bugs or
   backdoors, but never their absence.

   The higher the falsifiability of a claim, the more likely it is that
   someone will find a counterexample to it in the event that the claim
   is in fact false (e.g., because a bug or backdoor was introduced,
   accidentally or intentionally).  In practice, a high degree of
   falsifiability also makes it difficult for an insider with highly
   privileged access to purposely subvert the claims of the system
   without risking discovery in doing so.  The chief criterion for
   inclusion of metadata in the algorithmic calculation of levels is
   its role in supporting claims that increase falsifiability.

4.  Examples of Claims

   _Claim: The service provider cannot access user data in plaintext._

   How can it be falsified?

   *  _Remote Attestation:_ Before a client sends any data, it can
      receive a cryptographic attestation (or signed report) from the
      TEE.  This attestation proves which software is running inside
      the TEE.  If the service provider were to run a modified version
      of the software to access plaintext data, the attestation would
      change, and a client could detect this and refuse to send data.
   *  _Binary Transparency:_ When a binary is publicly available,
      anyone can download it to perform reverse engineering and search
      for backdoors.  A higher level of assurance is achieved when the
      binary is open source and has a reproducible build.  This allows
      anyone to build the binary from the public source code and
      confirm that the resulting binary matches the one being attested
      to by the TEE, providing strong confidence that the running code
      corresponds to the public source code.

   _Claim: Data that leaves the TEE is limited to privacy-preserving,
   aggregated analytics, and it is not possible to link it to a
   specific user._

   How can it be falsified?

   *  _Non-targetability:_ One could attempt to falsify this claim by
      analyzing the egressed data to demonstrate a method for
      de-anonymization and re-linking to an individual.

5.  Usage of Transparency Metadata

   We expect that users of the TMIF would comprise the claimant (e.g.,
   a service provider) and a third party wishing to perform an
   evaluation of said claims.  The claimant will perform the following
   functions:

   *  Specification of the roots of trust, for example whether the RoT
      is hardware based, software based, or other, and if so, what kind
      of system it is

   *  Specification of the application-level claims, for example that
      an application keeps data private from the service provider

   *  Adding any additional fields that might help to increase the
      overall falsifiability level of the application-level claims

   The evaluator will assess the information provided by the TMIF and
   determine whether they trust the claimant based on the evidence
   provided.  We expect the evaluator to calculate an algorithmic
   assessment based on the provided values.  The evaluator can
   separately publish their own requirements for the claimant's system.

6.  Transparency Levels

   We expect that evaluators will calculate an algorithmic assessment
   that classes the service provider's claims in different levels or
   buckets of transparency and falsifiability.  The following levels
   are proposed:

   1.  Binary is publicly available

   2.  L1 + is executable

   3.  L2 + is reproducibly buildable

   4.  Source is available

   5.  Formal proof of security properties of the source/binary is
       available

7.  Transparency Metadata Interchange Format

   This metadata interchange format allows for a way of computing an
   overall transparency level for a distributed system.  Here is an
   example TMIF instance, as a JSON object:

   {
     "transparency_level": {
       "remote_attestation_roots_of_trust":
         ["amd_sev_snp", "nvidia_h100"],
       "application_level_claims": [
         "https://github.com/some-claim-index/oak/blob/main/docs/tr/
          claim/18136.md",
         "https://github.com/another-claim-index/
          claim/292382.md"
       ],
       "transparency_level_lower_bound": 4
     }
   }

   Further fields may be added to this object.

   NOTE: This is currently specified in JSON format for the purposes of
   providing an example, but the preferred format will depend on
   emerging use cases, and we invite further discussion on this
   particular point.  Individual values will be defined elsewhere in a
   decentralized way (e.g., on separate websites, GitHub READMEs,
   etc.).  Fields are defined below.

   _remote_attestation_roots_of_trust_

   For a single node, this is the root of trust that provides the
   attestation to which the encrypted and attested communication
   channel is bound, e.g., a hardware provider.  If the node is an
   accelerator, this is the manufacturer of the accelerator.  For the
   overall system, this field is the union of the roots of trust of the
   transitive closure of the connectivity graph of the system, starting
   from the entry point node.
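   One possible algorithmic reading of the proposed levels can be
   sketched as follows.  This is a non-normative illustration under
   stated assumptions: the list above leaves open whether levels 4 and
   5 presuppose levels 1-3, and this sketch simply treats levels 1-3 as
   a cumulative ladder and takes the highest rung satisfied overall.

```python
# Non-normative sketch of an evaluator's level calculation for one
# published binary.  Assumption: levels 1-3 are cumulative, while
# source availability and formal proofs grant levels 4 and 5 directly.

def transparency_level(binary_public: bool,
                       executable: bool,
                       reproducible: bool,
                       source_available: bool,
                       formal_proof: bool) -> int:
    level = 0
    if binary_public:
        level = 1              # L1: binary is publicly available
        if executable:
            level = 2          # L2: L1 + is executable
            if reproducible:
                level = 3      # L3: L2 + is reproducibly buildable
    if source_available:
        level = max(level, 4)  # L4: source is available
    if formal_proof:
        level = max(level, 5)  # L5: formal proof available
    return level

assert transparency_level(True, True, True, False, False) == 3
```

   An evaluator publishing its own requirements (per Section 5) might
   demand, for example, a minimum level of 3 before trusting a claim.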
   In general, it is desirable for this set to be as small as possible,
   and not to include the service provider.  Future versions of this
   document may introduce more sophisticated semantics for the roots of
   trust, such as a boolean expression that combines entities with
   logical operators (AND, OR, k-of-n, etc.).

   TODO: describe key provisioning practices

   _application_level_claims_

   For a single node, this is the list of falsifiable claims attached
   to the application.

   _transparency_level_lower_bound_

   For a single node, this is the lower bound of the transparency level
   of all the published binaries.  For a graph, this is the lower bound
   of the transparency levels of all the nodes.

   *Optional Fields*

   The following transparency attributes are considered useful
   additional information about system transparency, and may be added
   as fields with corresponding values in the transparency message
   above.

   _Publication of binaries in a verifiable audit log_

   published_binaries

   Expected value: URI of the log

   Binaries or binary hashes should be published in a verifiable audit
   log, such as Trillian.  This provides a record of releases and
   ensures that malicious code cannot be pushed to the server and later
   changed to avoid discovery.

   _Open sourcing privacy-critical system components (policy
   enforcement components)_

   open_sourced_policy_enforcement

   Expected value: location of policy enforcement OSS

   It is possible to prove certain attributes of a system without fully
   open sourcing the entire workload, by instead open sourcing key
   policy-enforcing software.  A straightforward example would be open
   sourcing a sandboxing system, which enforces the policy that data
   cannot egress.
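   The graph-level semantics described above (union of roots of trust
   over the transitive closure from the entry point, and the minimum of
   the per-node transparency levels) can be sketched non-normatively.
   The node and edge representations below are assumptions made for
   illustration, not part of the TMIF format.

```python
# Non-normative sketch: aggregate per-node TMIF values for a
# distributed system.  "edges" is a hypothetical adjacency map
# modelling the connectivity graph from the entry-point node.

def aggregate(nodes: dict, edges: dict, entry: str) -> dict:
    # Transitive closure of the connectivity graph from the entry point.
    reachable, stack = set(), [entry]
    while stack:
        n = stack.pop()
        if n in reachable:
            continue
        reachable.add(n)
        stack.extend(edges.get(n, []))
    # Union of roots of trust; lower bound is the minimum node level.
    roots = set()
    for n in reachable:
        roots |= set(nodes[n]["remote_attestation_roots_of_trust"])
    bound = min(nodes[n]["transparency_level_lower_bound"]
                for n in reachable)
    return {"remote_attestation_roots_of_trust": sorted(roots),
            "transparency_level_lower_bound": bound}

# Example: a CPU entry node that offloads to an accelerator node.
nodes = {
    "entry": {"remote_attestation_roots_of_trust": ["amd_sev_snp"],
              "transparency_level_lower_bound": 4},
    "accel": {"remote_attestation_roots_of_trust": ["nvidia_h100"],
              "transparency_level_lower_bound": 3},
}
system = aggregate(nodes, {"entry": ["accel"]}, "entry")
```

   The weakest node bounds the system: a single accelerator at level 3
   pulls the overall lower bound down to 3 even if every other node is
   at level 4.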
   _Open sourcing key system components (evaluation tools)_

   open_sourced_eval_tools

   Expected value: location of evaluation tools

   It is also possible to make meaningful claims about a system by
   running an open sourced evaluation script against a closed source
   model or data set inside a TEE.

   _Full OSS including workload_

   fully_open_sourced

   Expected value: location of system OSS

   It is possible for the entire workload within the TEE to be fully
   open sourced.  This may be desirable to prove adherence to policies
   that cannot be proved by a separate policy enforcement layer and are
   instead part of the workload itself.

   _Reproducible builds_

   reproducible_builds

   Expected value: boolean

   In order to give meaning to binaries published in a verifiable audit
   log, it is necessary to match them to associated source code.
   Therefore, published source code that is intended as part of a
   'proof' of system behavior must be reproducibly buildable to match
   the published binaries.

   _Publicly available remote attestation evidence_

   public_remote_attestation

   Expected value: path to attestation endpoint

   In order to confirm that the binaries published in the verifiable
   audit log are the same binaries that run on the server, it is
   necessary to inspect the remote attestation from the TEE.  At higher
   transparency levels this inspection should, in principle, be
   possible for anyone to complete; therefore, it must be possible for
   an attestation request to be made from any machine.

   _Auditability_

   data_egress_audit_logs

   Expected value: location of log

   If highly privileged 'break glass' type access is used, this field
   can be used to declare an audit log that ensures that this type of
   access cannot be used secretly.

8.  Security Considerations

   This section will be filled in as the discussion progresses.

9.  IANA Considerations

   This document has no IANA actions.

Acknowledgments

   TODO acknowledge.
Authors' Addresses

   Ben Laurie
   Google LLC
   Email: benl@google.com

   Tiziano Santoro
   Google LLC
   Email: tzn@google.com

   Pauline Anthonysamy
   Google LLC
   Email: anthonysp@google.com

   Sarah de Haas
   Google LLC
   Email: dehaass@google.com