| Internet-Draft | Concise Diagnostic Notation (CDN) | June 2026 |
| Bormann | Expires 17 December 2026 | [Page] |
This document formalizes and consolidates the definition of the Concise Diagnostic Notation (CDN) of the Concise Binary Object Representation (CBOR), addressing implementer experience.¶
Replacing CDN's previous informal descriptions, it updates RFC 8949, obsoleting its Section 8, and RFC 8610, obsoleting its Appendix G.¶
It also specifies registry-based extension points and uses them to support text representations such as of epoch-based dates/times and of IP addresses and prefixes.¶
(This cref will be removed by the RFC editor:)
-26 is intended to address the May/June 2026
Working Group Last Call comments on -25 and the ensuing WG discussions.
Specifically, this update:
• is going further with the idea to entirely replace the non-backwards
compatible update considered for the RFC 8610/G.4 concatenation by two new
application extensions (temporarily named b1/t1), and to add
related application-oriented extensions
that deprecate the original streamstring syntax.
• includes the float'' application-extension so that the entire
CBOR format can be covered.
• now uses rules closer to those of markdown for handling data
transparency in raw strings, simplifying their implementation.
• adds security considerations.
• proactively reserves the application-extension identifier
"pragma" for potential future standardization.
• This update does not address certain comments that propose some
editorial restructuring requiring moving text around; this is best
done in a next revision after the technical comments are addressed.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://cbor-wg.github.io/edn-literal/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-cbor-edn-literals/.¶
Discussion of this document takes place on the cbor Working Group mailing list (mailto:cbor@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/cbor/. Subscribe at https://www.ietf.org/mailman/listinfo/cbor/.¶
Source for this draft and an issue tracker can be found at https://github.com/cbor-wg/edn-literal.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 17 December 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The Concise Binary Object Representation (CBOR) (RFC8949) [STD94] is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation. In addition to the binary interchange format, the original CBOR specification described a text-based "diagnostic notation" (Section 6 of [RFC7049], now Section 8 of RFC 8949 [STD94]), in order to be able to converse about CBOR data items without having to resort to binary data. Appendix G of [RFC8610] extended this into what also became known as Extended Diagnostic Notation (EDN), often including Section 4.2 of [RFC8742] and draft revisions of the present document. Diagnostic notation is now specified by this document, obsoleting all these previous descriptions, and is known as Concise Diagnostic Notation (CDN).¶
Diagnostic notation syntax is based on JSON, with extensions for representing CBOR constructs such as binary data and tags.¶
The interchange format created by standardizing CDN is not intended to compete with the actual binary interchange format CBOR, but enables the use of a shared diagnostic notation in tools for and in documents about CBOR. Still, between tools for CBOR development and diagnosis, document generation systems, continuous integration (CI) environments, configuration files, and user interfaces for viewing and editing for all these, CDN is often "interchanged" and therefore merits a specification that facilitates interoperability within this domain as well as reliable translation to and from CBOR. CDN is not designed or intended for general-purpose use in protocol elements exchanged between systems engaged in processes outside those listed here.¶
This document consolidates and formalizes the definition of CDN, providing a formal grammar (see Section 5.1 and Section 5.2), and incorporating small changes based on implementation experience. It updates RFC 8949 [STD94] by obsoleting its Section 8, and [RFC8610] by obsoleting its Appendix G. It is intended to serve as the single reference target that can be used in specifications that use CDN.¶
It also specifies two registry-based extension points for the diagnostic notation: one for additional encoding indicators, and one for adding application-oriented literal forms. It uses these registries to add encoding indicators for a more complete coverage of encoding variation, and to add application-oriented literal forms that enhance CDN with text representations of epoch-based date/times, of IP addresses and prefixes [RFC9164], and of Concise Resource Identifiers (CRI [I-D.ietf-core-href]), as well as an application-oriented literal that represents cryptographic hash values computed from byte strings.¶
In addition, this document registers a media type identifier and a content-format for CDN. This does not elevate its status as an interchange format, but recognizes that interaction between tools is often smoother if media types can be used.¶
Note that CDN is not meant to be the only text-based representation of CBOR data items. For instance, [YAML] [RFC9512] is able to represent most CBOR data items, possibly requiring use of YAML's extension points. YAML does not provide certain features that can be useful with tools and documents needing text-based representations of CBOR data items (such as embedded CBOR or encoding indicators), but it does provide a host of other features that CDN does not provide such as anchor/alias data sharing, at a cost of higher implementation and learning complexity.¶
Section 2 of this document has been built from Section 8 of RFC 8949 [STD94] and Appendix G of [RFC8610]. The latter provided a number of useful extensions to the initial diagnostic notation that was originally defined in Section 6 of [RFC7049]. Section 8 of RFC 8949 [STD94] and Appendix G of [RFC8610] have collectively been called "Extended Diagnostic Notation" (EDN), now simplified as "Concise Diagnostic Notation" (CDN) giving the present document its name.¶
After introductory material, Section 3 illustrates the concept of application-oriented extension literals by defining a number of application-extensions. Section 4 defines mechanisms for dealing with unknown application-oriented literals and deliberately elided information. Section 5 gives the formal syntax of CDN in ABNF, with explanations for some features of and additions to this syntax, as an overall grammar (Section 5.1) and specific grammars for the content of app-string and byte-string literals (Section 5.2). This is followed by the conventional sections for IANA Considerations (6), Security considerations (7), and References (8.1, 8.2). An informational comparison of CDN with CDDL follows in Appendix A.¶
Section 8 of RFC 8949 [STD94] defines the original CBOR diagnostic notation, and Appendix G of [RFC8610] supplies a number of extensions to the diagnostic notation that form the basis for what is now the Concise Diagnostic Notation (CDN). The diagnostic notation extensions include popular features such as embedded CBOR (encoded CBOR data items in byte strings) and comments. A simple diagnostic notation extension that enables representing CBOR sequences was added in Section 4.2 of [RFC8742]. As at least some elements of the extended form are now near-universally used, the terms "diagnostic notation" and "extended diagnostic notation" have become synonyms in the context of CBOR, with "concise diagnostic notation" (CDN) now the preferred synonym, hinting at knowledge of this updated specification.¶
In a similar vein, the term "ABNF" in this document refers to the
language defined in [STD68] as extended in [RFC7405], where the
"characters" of Section 2.3 of RFC 5234 [STD68] are Unicode scalar
values.
Where names for ABNF rules are used in the text, they are shown in
typewriter font (not distinguishable in the plaintext rendition of
this document).
Brief snippets of grammar may also be given in the text as I-Regexp regular
expressions [RFC9485].¶
The term "CDDL" (Concise Data Definition Language) refers to the data definition language defined in [RFC8610] and its registered extensions (such as those documented in [RFC9165], [RFC9741], and [RFC9682]). Additional information about the relationship between the two languages CDN and CDDL is captured in Appendix A.¶
Examples sometimes need to be quoted in the text, in particular in
cases where the typewriter font used for example text cannot be
distinguished in the plaintext rendition of this document.
ASCII quotes, however, are already taken: true, "true", 'true',
and `true` are all different literals in CDN and should not be
confused.
Therefore, a different quoting convention as in »true« or »"true"«
is used for examples in the text where this is needed to remain
unambiguous.¶
Superscript notation denotes exponentiation. For example, 2 to the power of 64+1 is notated: 2^(64+1). In the plain-text rendition of this specification, superscript notation is not available and exponentiation is therefore rendered by the surrogate notation seen here in the plain-text rendition.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP14] (RFC2119) (RFC8174) when, and only when, they appear in all capitals, as shown here.¶
Section 8 of RFC 8949 [STD94] states the objective of defining a common human-readable diagnostic notation with CBOR. In particular, it states:¶
All actual interchange always happens in the binary format.¶
One important application of CDN is the notation of CBOR data for humans: in specifications, on whiteboards, and for entering test data. A number of features, such as comments inside prefixed string literals, are mainly useful for people-to-people communication via CDN. Programs also often output CDN for diagnostic purposes, such as in error messages or to enable comparison (including generation of diffs via tools) with test data.¶
For comparison with test data, it is often useful if different
implementations generate the same (or similar) output for the same
CBOR data items.
This is comparable to the objectives of deterministic serialization
for CBOR data items themselves (Section 4.2 of RFC 8949 [STD94]).
However, there are even more representation variants in CDN than in
binary CBOR, and there is little point in specifically endorsing a
single variant as "deterministic" when other variants may be more
useful for human understanding, e.g., the << >> notation as
opposed to, say, hexadecimal h'' notation;
a CDN generator may have quite a few options
that control what presentation variant is most desirable for the
application that it is being used for.¶
Because of this, a deterministic representation is not defined for CDN. More generally speaking, there is no expectation for "roundtripping": Converting CDN to binary CBOR and back to CDN will generally not achieve exactly the same result as the original input CDN. That input may have been created by humans or by a different CDN generator and may contain presentation information that is not represented in the binary CBOR.¶
However, there is a certain expectation that CDN generators can be configured to some basic output format, which:¶
looks like JSON where that is possible;¶
inserts encoding indicators, if any, only where the binary form differs from Preferred Serialization (Section 4.1 of RFC 8949 [STD94]);¶
uses hexadecimal representation (h'') for byte strings, not
b64'' or embedded CBOR (<<>>);¶
does not generate elaborate blank space (newlines, indentation) for
pretty-printing, but does use common blank spaces such as after ,
and :.¶
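The basic output format listed above can be illustrated with a small generator. This is only a sketch under stated assumptions: the function name `to_basic_cdn` and the mapping from Python types to CDN literals are hypothetical choices made for this example, and encoding indicators are never emitted because plain Python values carry no serialization information.

```python
import json
import math


def to_basic_cdn(value):
    """Render a Python value as CDN text in a JSON-like basic style (sketch)."""
    # bool is tested via identity first, because bool is a subclass of int.
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return repr(value)  # repr() keeps a "." or exponent, so it stays a float
    if isinstance(value, str):
        # JSON string syntax is reused for double-quoted text strings.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, bytes):
        return "h'" + value.hex() + "'"  # hexadecimal form, not b64'' or << >>
    if isinstance(value, list):
        return "[" + ", ".join(to_basic_cdn(v) for v in value) + "]"
    if isinstance(value, dict):
        # Blank space only after "," and ":"; no newlines or indentation.
        pairs = (to_basic_cdn(k) + ": " + to_basic_cdn(v) for k, v in value.items())
        return "{" + ", ".join(pairs) + "}"
    raise TypeError(f"no CDN rendering sketched for {type(value).__name__}")
```

For instance, `to_basic_cdn({1: b"\x01\x02"})` yields the text `{1: h'0102'}`, which uses a non-string map key that JSON itself would not allow.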
See Section 1.3.5 for more considerations about the character repertoire used for CDN source text.¶
Additional features such as ensuring
deterministic map ordering (Section 4.2 of RFC 8949 [STD94]) on output,
or even deviating from the basic
configuration in some systematic way, can further assist in comparing
test data.
Information obtained from a CDDL model can help in choosing
application-oriented literals or specific string representations such
as embedded CBOR or b64'' in the appropriate places.¶
Diagnostic notation is often used in interchange situations where backward compatibility is much less of a concern than in the kinds of interchanges enabled by binary CBOR. This meant that extensions to diagnostic notation could be introduced relatively freely in Appendix G of [RFC8610] and in Section 4.2 of [RFC8742]. There was little point in not using these extensions, for instance in the examples contained in specifications. With the landscape of CBOR-related tools becoming more populated, this kind of evolution is now less desirable.¶
For the CBOR representation format, Section 7.1 of RFC 8949 [STD94] introduced a limited number of specific extension points, in particular the concept of tags, to enable the introduction of new constructs such as data types without a need to update the base specification.¶
The present specification follows suit by adding extension points to CDN, one very general one (Section 2.1) and one specific to diagnostic processing of encoding variants (Section 2.3).¶
Building on the relatively unconstrained way in which extensions were added in [RFC8610], the present specification takes the liberty of making two changes to these extensions that are not entirely backwards compatible; Section 2.2.1 and Section 2.5.1 have more details. Also, some syntax that was part of the original diagnostic notation has been deprecated (Section 2.5.5) and replaced (Section 3.5). Changes of this kind would be unacceptable for the binary CBOR format itself, but are acceptable as a one-time measure here, considering the more permissive conditions under which the affected features were originally introduced. With CDN now featuring the new extension points, the need for changes of this kind should arise much less often.¶
Similar to JSON, CDN is designed to enable representing all CBOR data
items using a source character repertoire just containing printable
ASCII characters (%x20-7e in ABNF) and newlines.
However, if appropriate, CDN can also make full use of larger Unicode
repertoires.¶
CDN generators may provide configuration to consistently select either the unescaped (directly readable) or an escaped (ASCII equivalent) form of characters in string literals; the latter allows CDN to be used when the diagnostic value of fully escaped characters may be desired or in environments where non-ASCII characters may not enjoy full data transparency. Similar to JSON, CDN is designed to allow a simple tool to convert any CDN (including CDN with application extensions unknown to the tool) into a fully escaped (printable ASCII and newlines only) form, as well as to inversely recover unescaped characters for all escapes where this is possible or for certain subsets of the characters (such as Unicode categories L, M, N, P, S, plus Zs or just ASCII space).¶
Special considerations apply to newlines in the source. On some platforms, a CARRIAGE RETURN character (U+000D or CR, often seen escaped as "\r" in many programming languages) is always added in front of a LINE FEED (U+000A or LF) to represent a newline (the pair is then referred to as CRLF). On other platforms, carriage returns are not used at line breaks at all, so a newline is just an LF. (Platforms that use just a CARRIAGE RETURN by itself to signify an end of line are no longer relevant and the files they produce are out of scope for this document.)¶
Files are often freely converted between these two newline representations, including by source code revision control systems. To ensure that platforms will generate the same bytes in the CBOR data items created from input in either conversion state, CDN MUST create the same processing result independent of which newline representation is used by its input.¶
To deal with this variability in platform presentation of newlines,
Unicode CARRIAGE RETURN characters that exist in the input unescaped are
ignored as if they were not in the input wherever they appear.
Specifically, any carriage return characters that may be present in a
CDN (text or byte) string literal are not copied into the resulting string.
If a carriage return is needed in a CBOR string data item, it can be
added explicitly, for instance by using the escaped form \r in
single-quoted or double-quoted strings.¶
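The newline rule can be sketched as a preprocessing step applied to the source text before parsing. The function name `normalize_newlines` is hypothetical; the sketch assumes the source is already available as a decoded Python string.

```python
def normalize_newlines(cdn_source: str) -> str:
    """Drop every unescaped CARRIAGE RETURN (U+000D) from CDN source text.

    The escaped form (a backslash followed by the letter r) consists of two
    ordinary characters and is therefore left untouched.
    """
    return cdn_source.replace("\r", "")
```

With this step in place, a file stored with CRLF line endings and the same file stored with LF line endings reach the parser as identical text, so they produce the same CBOR bytes.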
CBOR is a binary interchange format. To facilitate documentation and debugging, and in particular to facilitate communication between entities cooperating in debugging, this document defines a simple human-readable diagnostic notation. All actual interchange always happens in the binary format.¶
Note that diagnostic notation truly was designed as a diagnostic format; it originally was not meant to be parsed. Therefore, no formal definition (as in ABNF) was given in the original documents. Recognizing that formal grammars can aid interoperation of tools and usability of documents that employ CDN, Section 5 now provides ABNF definitions.¶
CDN is a true superset of JSON as it is defined in [STD90] in conjunction with [RFC7493] (that is, any interoperable [RFC7493] JSON text also is a CDN text), extending it both to cover the greater expressiveness of CBOR and to increase its usability.¶
CDN borrows the JSON syntax for numbers (integer and floating-point, Section 2.4), certain simple values (Section 2.8), UTF-8 [STD63] text strings, arrays, and maps (maps are called objects in JSON; the diagnostic notation extends JSON here by allowing any data item in the map key position).¶
As CDN is used for truly diagnostic purposes, its implementations MAY support generation and possibly ingestion of CDN for CBOR data items that are well-formed but not valid. It is RECOMMENDED that an implementation enables such usage only explicitly by configuration (such as an API or CLI flag). Validity of CBOR data items is discussed in Section 5.3 of RFC 8949 [STD94], with basic validity discussed in Section 5.3.1 of RFC 8949 [STD94], and tag validity discussed in Section 5.3.2 of RFC 8949 [STD94]. Tag validity is more likely a subject for individual application-oriented extensions, while the two cases of basic validity (for text strings and for maps) are addressed in Sections 2.5.8 and 2.6.3 under the heading of validity.¶
The rest of this section provides an overview of specific features of CDN, starting with certain common syntactical features and then going through kinds of CBOR data items roughly in the order of CBOR major types. Any additional detailed syntax discussion needed has been deferred to Section 5.1.¶
Additional information about implementation and use of CDN is continuously being collected by the community in [CDN-WIKI].¶
CDN provides literals that represent CBOR data items textually. Many of the forms of literals provided are predefined by this document, but it also defines an extension point that enables defining additional application-oriented extension literals, or extension literals for short.¶
Extension literals start with a prefix that identifies the
application-oriented extension, immediately followed by a sequence
literal (Section 2.5.7) or a single-quoted or raw string literal (Section 2.5).
The string-based forms use their string literal as a shorthand
form for a sequence literal representing a sequence with exactly that
one text string data item, e.g., b64`Zm9v` is a shorthand for
b64<<"Zm9v">> or b64<<`Zm9v`>>.¶
More precisely, an application-extension identifier is a registered name consisting of a
lowercase ASCII letter ([a-z]) and zero or more additional ASCII
characters that are either lowercase letters, digits, or hyphens ([a-z0-9-]).
»false«, »true«, »null«, and »undefined« cannot be used as such
identifiers and are reserved.¶
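As a sketch of the lexical rule just given, the following check accepts a string only if it has the shape of an application-extension identifier and is not one of the four reserved words. The function name is hypothetical, and the check covers syntax only; it does not consult the registry.

```python
import re

# One lowercase ASCII letter, then any number of lowercase letters,
# digits, or hyphens.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9-]*")
_RESERVED = {"false", "true", "null", "undefined"}


def is_app_extension_identifier(name: str) -> bool:
    """Return True if name is syntactically usable as an identifier."""
    return _IDENTIFIER.fullmatch(name) is not None and name not in _RESERVED
```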
Application-extension identifiers are registered in the "Application-Extension Identifiers" registry (Section 6.1).¶
An application-extension (such as dt) MAY also define the meaning of
one additional prefix derived from its application-extension identifier by
replacing each lowercase character by its uppercase counterpart (such
as DT).
As a convention, using the all-uppercase variant implies making use of
a CBOR tag appropriate for this application-oriented extension (such
as tag number 1 for DT, where in contrast the prefix dt stands for
the unwrapped tag content).¶
In summary, an application-extension identifier gives rise to one or two application-extension prefixes, one that is lexically identical to the identifier (i.e., all lowercase), and potentially another one that is an all-uppercase variation of it. In addition to specifying which of these two variations exhibits which specific semantics, the application extension specifies what input the extension takes.¶
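The relationship between identifiers and prefixes might be sketched as a lookup. The `registry` argument is a hypothetical stand-in that maps each known identifier to a flag saying whether its extension also defines the all-uppercase prefix; the entries used in the usage note are illustrative data, not registry contents.

```python
def resolve_prefix(prefix, registry):
    """Map a prefix to (identifier, uses_uppercase_variant), or None.

    registry: dict from identifier to True if the extension also defines
    the all-uppercase prefix (hypothetical structure for this sketch).
    """
    if prefix in registry:
        return (prefix, False)
    lowered = prefix.lower()
    # Only a fully uppercased spelling counts, and only if the extension
    # actually defines a meaning for it.
    if prefix == lowered.upper() and prefix != lowered and registry.get(lowered):
        return (lowered, True)
    return None
```

With `registry = {"dt": True, "h": False}`, the prefix `DT` resolves to the `dt` extension in its uppercase (tag-wrapping) variant, while mixed-case `Dt` and an undefined uppercase `H` are rejected.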
When the prefix is used immediately in front of a single-quoted or a raw string, the input takes the form of a single text string CBOR data item. When used immediately in front of a sequence literal, the input is the CBOR sequence formed by the elements of that sequence literal. (For a single parameter, this is equivalent to receiving a single CBOR data item as the argument.) The application extension can provide behavior that depends on the number of items supplied as input to it and their data types; it cannot distinguish between its prefix being used with a single-quoted string, a raw string, or a CBOR sequence composed of a single text string data item (as illustrated for instance in Tables 4, 5, and 6).¶
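The input model described here can be sketched as follows: every extension handler receives a list of data items, and the string-based forms are funneled into the same call with a one-element list. All names (`apply_extension`, `from_string_literal`, `b64_extension`) are hypothetical, and the `b64` handler is deliberately simplified: it assumes the classic base64 alphabet with exact padding and does not model the full rules of the `b64` extension.

```python
import base64


def b64_extension(items):
    """Simplified stand-in for b64: one text string in, one byte string out."""
    if len(items) != 1 or not isinstance(items[0], str):
        raise ValueError("b64 expects exactly one text string")
    # Assumption: classic alphabet, correctly padded, no blank space.
    return base64.b64decode(items[0], validate=True)


def apply_extension(prefix, items, handlers):
    """Call the handler for prefix with the items of a sequence literal."""
    return handlers[prefix](items)


def from_string_literal(prefix, text, handlers):
    """A single-quoted or raw string is shorthand for a one-item sequence."""
    return apply_extension(prefix, [text], handlers)
```

Because both entry points end in the same call, a handler has no way to tell which surface syntax was used, which mirrors the statement above.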
This specification defines a number of generally applicable application-oriented extensions (Section 3), both to motivate making these extensions generally available, and to illustrate the concept.¶
Of these, the application-oriented extensions h, b64, t1, b1, dt and ip are
mandatory to implement.
(As mentioned, for simplicity we use the term "application-oriented
extensions" for the mechanism discussed in this section even if it is
used to describe a part of base CDN.)¶
Sometimes it is useful to indicate in the diagnostic notation which of several alternative CBOR representations are actually used; for example, a data item written »1.5« by a diagnostic decoder might have been encoded in CBOR as a half-, single-, or double-precision float.¶
Encoding indicators are always optional: CDN is usually used to describe CBOR data items at the data model level. For some diagnostic purposes, it is useful to represent the choice of a serialization variation by including encoding indicators. Implementations of CDN generally do not need to provide this functionality in full; if they do, they can be called "diagnostic implementations". To be able to process CDN that contains encoding indicators, a CDN-consuming implementation MUST accept them (i.e., process or ignore the presence or absence of each encoding indicator). It is RECOMMENDED to provide a warning for each encoding indicator value that is encountered but not further processed.¶
When creating CDN as input for a diagnostic CBOR encoder in order to obtain specific encoding choices, encoding indicators may be placed manually or by the software generating the CDN. Where no encoding indicator is placed, a diagnostic CBOR encoder is expected to generate Preferred Serialization (Section 4.1 of RFC 8949 [STD94]) with definite-length encoding only. Similarly, when using CDN as output for a diagnostic CBOR decoder, a basic diagnostic configuration of the tool is expected to provide encoding indicators only in places where the CBOR input did not use Preferred Serialization with definite-length encoding (see also Section 1.3.3). Diagnostic implementations of CDN that process encoding indicators as discussed here are expected to document their diagnostic behavior and the processing options that can be selected.¶
Encoding indicators start with
an underscore and comprise all immediately following characters that are alphanumeric or
underscore.
For example, _ or _3.
Encoding indicators can be ignored by anyone not
interested in this information.¶
Encoding indicators are placed immediately to the right of the data item or of a syntactic feature that can stand for the data item the encoding of which the encoding indicator is controlling. Table 1 provides examples for data items with definite length encoding indicators used with various kinds of data items ("mt" = major type, "ignoring e.i." = example encoding when ignoring the encoding indicators). Examples for encoding indicators controlling indefinite length encoding can be found in the context of explanations in Section 2.5.5 and Section 2.6.2.¶
| mt | examples | encoding (in hex) | ignoring e.i. |
|---|---|---|---|
| 0 | 1_1, 0x4711_3 | 190001, 1b0000000000004711 | 01, 194711 |
| 1 | -1_1 | 390000 | 20 |
| 2 | 'A'_1 | 59000141 | 4141 |
| 3 | "A"_1 | 79000161 | 6161 |
| 4 | [_1 "bar"] | 99000163626172 | 8163626172 |
| 5 | {_1 "bar": 1} | b900016362617201 | a16362617201 |
| 6 | 1_1(4711) | d90001191267 | c1191267 |
| 7 | 1.5_2, 0x4711p+03_3 | fa3fc00000, fb4101c44000000000 | f93e00, fa480e2200 |
(In the following, an abbreviation of the form ai=nn gives nn as
the numeric value of the field additional information, the low-order 5
bits of the initial byte: see Section 3 of RFC 8949 [STD94].
This field is used in encoding the "argument", i.e., the value, tag, or
length; ai=0 to ai=23 mean that the value of the ai field
immediately is the argument, ai=24 to ai=27 mean that the
argument is carried in 2^(ai-24) (1, 2, 4, or 8)
additional bytes, and ai=31 means that indefinite-length
encoding is used.)¶
An underscore followed by a decimal digit n indicates that the
item was or is to be encoded with an additional information
value of ai=24+n.
(The item associated to the encoding indicator may be the preceding
item, or, for arrays and maps, the item starting with the
preceding bracket or brace.)
For an example involving floating point values (Section 3.3 of RFC 8949 [STD94]),
1.5_1 is a half-precision floating-point
number (2^1 = 2 additional bytes or 16 bits), while 1.5_3 is encoded as
double precision (2^3 = 8 additional bytes or 64 bits).
For a tool consuming CDN in a diagnostic mode, encountering an
encoding indicator that does not provide enough space to correctly
encode the unchanged data item given is an error; there is no
truncation or rounding that would change the data item encoded.¶
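The rule ai=24+n, together with the error case for an argument that does not fit, can be sketched as a head encoder for major types 0 to 6. The function name `encode_head` and its signature are hypothetical; passing no indicator selects the shortest form, as in Preferred Serialization. Floating-point items (major type 7) are outside the scope of this sketch.

```python
def encode_head(major_type, argument, indicator=None):
    """Encode a CBOR head: the initial byte plus any argument bytes.

    indicator is None (shortest form) or one of "_0", "_1", "_2", "_3".
    """
    if not 0 <= major_type <= 6 or not 0 <= argument < 2**64:
        raise ValueError("major type or argument out of range")
    if indicator is None:
        if argument < 24:
            return bytes([(major_type << 5) | argument])  # ai is the argument
        # Smallest n whose 2^n bytes can hold the argument.
        n = next(n for n in range(4) if argument < 2 ** (8 * 2**n))
    elif indicator in ("_0", "_1", "_2", "_3"):
        n = int(indicator[1])
    else:
        raise ValueError(f"indicator {indicator!r} not handled in this sketch")
    size = 2**n  # 2^(ai-24) additional bytes
    if argument >= 2 ** (8 * size):
        # No truncation: an argument that does not fit is an error.
        raise ValueError("argument does not fit the indicated size")
    return bytes([(major_type << 5) | (24 + n)]) + argument.to_bytes(size, "big")
```

For example, `encode_head(0, 1, "_1")` gives the bytes `190001` from the first row of Table 1, while `encode_head(0, 1)` gives `01`.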
The encoding indicator _ (an underscore on its own) is used to
indicate indefinite-length encoding.
Indefinite-length encoding uses ai=31, which could have been
indicated by _7, which is therefore not used and marked as reserved
(as are _4, _5, and _6, which would stand for ai=28 to
ai=30, values currently not in use in CBOR; these encoding
indicators will be available if and when CBOR is extended to make use
of them).¶
Note that the encoding indicator _ is only available behind the opening
brace/bracket for map and array (Section 2.6.2): strings
originally had a now deprecated special syntax
streamstring for indefinite-length encoding except for the special
cases ''_ and ""_ (Section 2.5.5).¶
The encoding indicators _0 to _3 indicate ai=24
to ai=27, respectively; they therefore stand for 1, 2, 4, and 8
bytes of additional information (ai) following the initial byte in the
head of the data item.¶
Surprisingly, Section 8.1 of RFC 8949 [STD94] does not address ai=0 to
ai=23 — the assumption seems to have been that Preferred Serialization
(Section 4.1 of RFC 8949 [STD94]) will be used when converting CBOR
diagnostic notation to an encoded CBOR data item, so leaving out the
encoding indicator for a data item with a Preferred Serialization
will implicitly use ai=0 to ai=23 if that is possible.
The present specification allows making this explicit:¶
_i ("immediate") stands for encoding with ai=0 to ai=23, i.e.,
it indicates that the argument is encoded directly in the initial byte
of the CBOR item.¶
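The encoding indicators discussed in this section can be summarized in a small classifier. This is a sketch only: the function name and the returned labels are hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits.

```python
import re

# An underscore followed by any number of alphanumerics or underscores.
_INDICATOR = re.compile(r"_[A-Za-z0-9_]*")


def classify_indicator(indicator):
    """Return (kind, ai) for an encoding indicator; ai is None if not fixed."""
    if _INDICATOR.fullmatch(indicator) is None:
        raise ValueError("not an encoding indicator")
    if indicator == "_":
        return ("indefinite", 31)
    if indicator == "_i":
        return ("immediate", None)  # ai=0 to ai=23, taken from the argument
    if indicator in ("_0", "_1", "_2", "_3"):
        return ("sized", 24 + int(indicator[1]))
    if indicator in ("_4", "_5", "_6", "_7"):
        return ("reserved", None)
    # Anything else would have to come from the registry of additional values.
    return ("not-defined-here", None)
```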
Encoding indicators are an extension point for CDN; Section 6.2 defines a registry for additional values.¶
Specific forms of encoding indicators are discussed in further detail in Section 2.5.5 for the deprecated syntax for indefinite-length strings and in Section 2.6.2 for arrays and maps.¶
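In addition to JSON's decimal number literals, CDN provides hexadecimal, octal,
and binary number literals in the usual C-language notation (with the prefixes 0x, 0o, and 0b,
respectively; octal is available only with the 0o prefix, so a leading zero on its own does not select octal).¶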
Numbers composed only of digits (of the respective base) are
interpreted as CBOR integers (major type 0/1, or where the number
cannot be represented in this way, major type 6 with tag 2/3).
A leading "+" sign is a no-op, and a leading "-" sign inverts the
sign of the number.
So 0, 000, +0 all represent the same integer zero, as does -0.
Similarly,
1, 001, +1 and +0001 all stand for the same positive integer one, and
-1 and -0001 both designate the same negative integer minus one.¶
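The integer forms described above can be sketched as a small parser. The function name is hypothetical, and the sketch makes assumptions that the formal grammar in Section 5.1 may refine: at most one leading sign, lowercase base prefixes, and hexadecimal digits in either case. Python's own `int(text, 0)` is not used because it rejects decimal literals with leading zeros such as 0001.

```python
import re

_INT = re.compile(r"([+-]?)(?:0x([0-9a-fA-F]+)|0o([0-7]+)|0b([01]+)|([0-9]+))")


def parse_integer_literal(text):
    """Parse a decimal, 0x, 0o, or 0b integer literal into a Python int."""
    m = _INT.fullmatch(text)
    if m is None:
        raise ValueError(f"not an integer literal: {text!r}")
    sign, hex_digits, oct_digits, bin_digits, dec_digits = m.groups()
    if hex_digits is not None:
        value = int(hex_digits, 16)
    elif oct_digits is not None:
        value = int(oct_digits, 8)
    elif bin_digits is not None:
        value = int(bin_digits, 2)
    else:
        value = int(dec_digits, 10)  # leading zeros stay decimal, unlike in C
    return -value if sign == "-" else value
```

A literal containing a decimal point or an exponent, such as 1.5, is rejected by this function, since such literals denote floating-point numbers rather than integers.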
Using a decimal point (.) and/or an exponent (e for decimal, p
for hexadecimal) turns the number into a floating point number (major
type 7) instead, irrespective of whether it is an integral number
mathematically.
Note that, in floating point numbers, 0.0 is not the same number as
-0.0, even if they are mathematically equal.¶
In Table 2, all the items on a row are the same number (also shown in CBOR, hexadecimally), but they are distinct from items in a different row.¶
| CDN | CBOR hex |
|---|---|
| 4711, 0x1267, 0o11147, 0b1001001100111 | 19 1267 # uint |
| 1.5, 0.15e1, 15e-1, 0x1.8p0, 0x18p-4 | F9 3E00 # float16 |
| 0, +0, -0 | 00 # uint |
| 0.0, +0.0 | F9 0000 # float16 |
| -0.0 | F9 8000 # float16 |
| Infinity | F9 7C00 # float16 |
| -Infinity | F9 FC00 # float16 |
| NaN | F9 7E00 # float16 |
The non-finite floating-point values Infinity, -Infinity, and NaN are
written exactly as in this sentence (this is also a way they can be
written in JavaScript, although JSON does not allow them).
NaN in CDN stands for the NaN value with a zero sign bit and an all-zero
significand except for a set quiet bit; this is represented as
F9 7E 00 in CBOR Preferred Serialization.
Table 3 shows how the floating point numbers 1.1, 1.5 and
these three values are encoded in preferred serialization and when
encoding indicators are given.¶
| CDN | CBOR hex |
|---|---|
| 1.1 | fb 3ff199999999999a |
| 1.1_1, 1.1_2 | (error) |
| 1.1_3 | fb 3ff199999999999a |
| 1.5, 1.5_1 | f9 3e00 |
| 1.5_2 | fa 3fc00000 |
| 1.5_3 | fb 3ff8000000000000 |
| Infinity, Infinity_1 | f9 7c00 |
| Infinity_2 | fa 7f800000 |
| Infinity_3 | fb 7ff0000000000000 |
| -Infinity, -Infinity_1 | f9 fc00 |
| -Infinity_2 | fa ff800000 |
| -Infinity_3 | fb fff0000000000000 |
| NaN, NaN_1 | f9 7e00 |
| NaN_2 | fa 7fc00000 |
| NaN_3 | fb 7ff8000000000000 |
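The way preferred serialization and the encoding indicators _1, _2 and _3 interact for floating-point numbers can be sketched in Python. The helper names `pack_exact` and `encode_float` are hypothetical; the sketch assumes that "(error)" in Table 3 means that the requested width cannot hold the value without loss. NaN is passed through unchecked, and how a NaN is narrowed depends on the platform, so NaN is not part of the usage shown.

```python
import math
import struct

# encoding indicator -> (initial byte, struct format for that width)
WIDTHS = {1: (0xF9, ">e"), 2: (0xFA, ">f"), 3: (0xFB, ">d")}

def pack_exact(value, fmt):
    """Pack value in the given width; return None if information would be lost."""
    try:
        packed = struct.pack(fmt, value)
    except OverflowError:
        return None                      # finite value too large for this width
    if math.isnan(value) or struct.unpack(fmt, packed)[0] == value:
        return packed
    return None

def encode_float(value, indicator=None):
    """Encode a float; without an indicator, pick the shortest lossless width."""
    candidates = [1, 2, 3] if indicator is None else [indicator]
    for width in candidates:
        initial, fmt = WIDTHS[width]
        packed = pack_exact(value, fmt)
        if packed is not None:
            return bytes([initial]) + packed
    raise ValueError("%r does not fit encoding indicator _%s" % (value, indicator))
```

For instance, `encode_float(1.5, 2).hex()` gives `fa3fc00000`, and `encode_float(1.1, 1)` raises `ValueError`, matching the corresponding rows of Table 3.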
See Section 5.1, Paragraph 8, Item 3 for additional details of the CDN number syntax.¶
(Note that literals for further number formats, e.g., for representing
rational numbers as fractions, or for other NaN values than the one called NaN, can
be added as application-oriented literals.
Background information beyond that in [STD94] about the representation
of numbers in CBOR can be found in the informational document
[I-D.bormann-cbor-numbers].)¶
CBOR distinguishes two kinds of strings: text strings (the bytes in the string constitute UTF-8 [STD63] text, major type 3), and byte strings (CBOR does not further characterize the bytes that constitute the string, major type 2).¶
(UTF-8) text strings can be directly represented (unprefixed) in CDN either as double-quoted (Section 2.5.2) or as raw strings (Section 2.5.4), while byte strings can be represented as single-quoted strings (Section 2.5.3). The latter is useful for byte strings carrying bytes that can be meaningfully notated as UTF-8 text.¶
Many strings are best notated as extension literals, which may provide detailed access to the bits within those bytes (see Section 2.5.6). Using an application-extension prefix, extension literals can be constructed out of single-quoted strings and raw strings, as well as sequence literals (cf. Section 2.1).¶
Before extension literals were added to diagnostic notation, Appendix G.4 of [RFC8610] added a syntax for concatenating strings by just juxtaposing them. This syntax was not widely implemented and is problematic in the presence of optional commas; it is now entirely removed from CDN. (Previous revisions of the present document proposed yet another alternative syntax; this is now entirely withdrawn and replaced by application-extensions such as Section 3.4.)¶
CDN enables notating text strings in a form compatible with that of notating text
strings in JSON (i.e., as a double-quoted string literal), with a
number of usability enhancements.
JSON allows no control characters in text-string literals;
if needed, they can be specified using escapes such as \t or \r.
This also applies to CDN, and all escaping rules apply as in JSON,
with a single exception:
In CDN, string literals additionally can contain newlines (LINEFEED
U+000A), which are copied into the resulting string like other
characters in the string literal.
To deal with variability in platform presentation of newlines, any
carriage return characters (U+000D) that may be present in the CDN
string literal are not copied into the resulting string (see Section 1.3.5).¶
JSON's escape scheme for characters that are not on Unicode's basic
multilingual plane (BMP) is cumbersome (see Section 7 of RFC 8259 [STD90]).
CDN keeps it, but also adds the syntax \u{NNN} where NNN is the
Unicode scalar value as a hexadecimal number.
This means the following are equivalent (the first o is escaped as
\u{6f} for no particular reason):¶
"D\u{6f}mino's \u{1F073} + \u{2318}" # \u{}-escape 3 chars
"D\u006Fmino's \uD83C\uDC73 + \u2318" # escape JSON-like
"Domino's 🁳 + ⌘" # unescaped
¶
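The equivalence of the three notations above can be checked with a small Python sketch of escape processing for the content of a double-quoted string. The names `unescape_dq` and `scalar` are hypothetical, and the sketch is an approximation of the rules described here rather than an implementation of the ABNF: it resolves \u{...}, \uXXXX (including surrogate pairs) and the single-character escapes, and drops carriage returns.

```python
import re

SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
          "\\": "\\", '"': '"', "/": "/"}

ESCAPE = re.compile(
    r"\\(?:u\{([0-9A-Fa-f]+)\}"                                        # \u{NNN}
    r"|u([Dd][89ABab][0-9A-Fa-f]{2})\\u([Dd][C-Fc-f][0-9A-Fa-f]{2})"   # surrogate pair
    r"|u([0-9A-Fa-f]{4})"                                              # \uXXXX
    r"|(.?))",                                                         # \n, \t, ...
    re.DOTALL)

def scalar(code_point):
    """Return the character for a Unicode scalar value, rejecting surrogates."""
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("U+%04X is not a Unicode scalar value" % code_point)
    return chr(code_point)

def unescape_dq(content):
    """Resolve escapes in the text between the double quotes."""
    def replace(match):
        braced, high, low, bmp, simple = match.groups()
        if braced is not None:
            return scalar(int(braced, 16))
        if high is not None:
            pair = ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
            return scalar(0x10000 + pair)
        if bmp is not None:
            return scalar(int(bmp, 16))
        if simple not in SIMPLE:
            raise ValueError("unknown escape \\" + simple)
        return SIMPLE[simple]
    # Carriage returns in the literal are not copied into the result.
    return ESCAPE.sub(replace, content.replace("\r", ""))
```

Applied to the content of each of the three literals above, `unescape_dq` returns the same Python string.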
Analogously to text-string literals delimited by double quotes, CDN allows the use of single quotes (without a prefix) to express byte-string literals with UTF-8 text; for instance, the following are equivalent:¶
'hello world'
h'68656c6c6f20776f726c64'¶
The escaping rules of JSON strings are applied equivalently for
text-based byte-string literals, e.g., \\ stands for a single
backslash and \' stands for a single quote.
However, to facilitate parsing, in single-quoted strings CDN excludes
certain escaping mechanisms available for double-quoted strings:¶
• \/ is an escape in JSON that is available for double-quoted CDN
text strings as
well to ensure all JSON texts are CDN literals.
Since CDN's single-quoted strings do not occur in JSON, this legacy
compatibility feature is not available for them.¶
• \u-based escapes are not available for characters in the range
from U+0020 through U+007E (essentially, printable ASCII).¶
All other escaping mechanisms that are available in double-quoted string literals are available in single-quoted string literals.¶
Single-quoted string literals can occur unprefixed, in which case they stand for the byte string that encodes their text string value (the "content"), or they can be prefixed by what looks like an application-extension prefix (see Section 2.1).¶
In a prefixed string literal, the text content of the single-quoted string literal is not used directly as a byte string, but is further processed in a way that is defined by the meaning given to the prefix. Depending on the prefix, the result of that processing can, but need not be, a byte string value.¶
Prefixed string literals (whether single-quoted after the
prefix or a raw string (Section 2.5.4)) are used both for base-encoded byte string literals (see Section 2.5.6) and for
application-oriented extension literals (see Section 2.1, called app-string).
(Additional kinds of base-encoded string literals can be defined as
application-oriented extension literals by registering their prefixes;
there is no fundamental difference between the two predefined
base-encoded string literal prefixes (h, b64) and any such potential
future extension literal prefixes; for simplicity of expression, both
cases are referred to as "extension literals".)¶
Both double-quoted and single-quoted string literals handle backslashes in a special way. For string data items that employ backslashes themselves, possibly with additional layers of processing giving this "escaping" mechanism specific application semantics, this can lead to an exponential duplication of backslashes that has informally been described as "quoting hell".¶
CDN therefore also allows text strings to be notated as raw string
literals, which do not perform any special processing on backslashes,
i.e., treat
them as raw string content like any other characters.
Instead, data transparency is provided by enclosing the entire string content in starting
and ending delimiters built as a sequence of one or more backquote
(»`«, U+0060 GRAVE ACCENT) characters.¶
For example, the string content »[^ \t\n\r"'`]«, an I-Regexp character class
that excludes blank space and quoting characters, can be notated as:¶
``[^ \t\n\r"'`]``¶
instead of¶
"[^ \\t\\n\\r\"'`]"¶
By using more backquotes for each of the outer delimiters than the longest sequence of backquotes that can be found in the string, internal backquotes do not prematurely end the string literal. An example for a raw string that contains a double backquote and therefore is notated starting and ending with a triple backquote:¶
```To emulate typographic quotes, sometimes duplicate backward and forward single quotes are used, as in ``text.'' ```¶
This mechanism is easy to use for the large majority of cases. However, without additional rules:¶
• raw strings could not be used for empty string data items, which therefore need to be notated using double- or single-quoted strings. (Obviously, there is no need to escape the content of empty strings, so this should not be a problem.)¶
• raw strings could not be used for string data items that start or end with backquotes, as these would amalgamate with the start and end delimiters.¶
To address these cases (predominantly the latter), two additional rules are applied after the backquotes used as delimiters have been processed:¶
any single newline (LF or CRLF, see Section 1.3.5) at the start of the inner string is removed to yield the string content. As a result:¶
```a```¶
can also be expressed as¶
```
a```¶
In addition to enabling leading backquotes in raw strings, this can be very useful for documentation strings etc.¶
This rule also allows notating »``text''« as:¶
```
``text''```¶
if the first rule does not apply, but the inner string starts with a space character as well as ends with one, exactly one single space character starting the inner string together with exactly one single space character ending the inner string are removed to yield the string content.¶
This allows notating »a = ``foo``« as:¶
``` a = ``foo`` ```¶
If neither of these rules applies, the inner string between the raw delimiters is used as the raw string unchanged.¶
(The examples given here are minimal in that they show how the additional rules work; more complex examples would be necessary to provide additional motivation why this is a good way to handle the various cases.)¶
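The delimiter handling and the two additional rules can be sketched in Python as follows. The function name `raw_string_content` is hypothetical, and the sketch makes one assumption the text does not spell out: an inner string consisting of a single space is left unchanged by the second rule, since it has no separate leading and trailing space.

```python
import re

def raw_string_content(literal):
    """Return the string content of one complete raw string literal."""
    width = len(literal) - len(literal.lstrip("`"))   # size of the start delimiter
    delim = "`" * width
    if width == 0 or len(literal) <= 2 * width or not literal.endswith(delim):
        raise ValueError("not a raw string literal")
    inner = literal[width:len(literal) - width]
    longest = max((len(run) for run in re.findall("`+", inner)), default=0)
    if longest >= width or inner.endswith("`"):
        raise ValueError("backquote run collides with the delimiters")
    # Rule 1: a single newline (LF or CRLF) at the start is removed.
    if inner.startswith("\r\n"):
        return inner[2:]
    if inner.startswith("\n"):
        return inner[1:]
    # Rule 2: otherwise, one leading plus one trailing space are removed together.
    if len(inner) >= 2 and inner[0] == " " and inner[-1] == " ":
        return inner[1:-1]
    return inner
```

Feeding the examples of this section to the sketch yields the string contents given in the text, for instance the content »a = ``foo``« for the literal with the added leading and trailing space.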
This section will move to 2.3.2; it is left here at the moment for easier comparison.¶
In CBOR, indefinite-length encoded (byte or text) strings are composed of "chunks" (Section 3.2.3 of RFC 8949 [STD94]).¶
The original diagnostic notation (Section 6.1 of [RFC7049]) provided
a special syntax streamstring for them, which was retained and
further clarified in Section 8.1 of RFC 8949 [STD94].
This syntax represents the individual chunks in
sequence within parentheses, each optionally followed by a comma, with
an encoding indicator _ immediately after the opening parenthesis:
e.g., (_ h'0123', h'4567') or (_ "foo", "bar").
The overall type (byte string or text string) of the string is
provided by the types of the individual chunks, which all need to be
of the same type (Section 3.2.3 of RFC 8949 [STD94]).¶
In this syntax, an indefinite-length string with no chunks inside, (_ )
would be ambiguous as to whether a byte string (encoded 5f ff) or a text string
(encoded 7f ff) is meant and is therefore not used.
The basic forms ''_ and ""_ can be used instead and are reserved for
the case of no chunks only — not as short forms for the (permitted,
but not really useful) encodings with only empty chunks, which
need to be notated as (_ ''), (_ ""), etc.,
when it is desired to preserve the chunk structure.¶
With this document, the streamstring syntax is now deprecated; new
CDN documents should instead use the ilbs/ilts application
extensions (Section 3.5) to build indefinite-length encoded strings.¶
This section will move to new subsections of Section 3.¶
Besides the unprefixed byte string literals that are analogous to JSON text
string literals, CDN provides extension literals that can represent
byte strings by base-encoding them, typically notated as prefixed
string literals.
The application-extension identifier selects one of the base encodings
[RFC4648], without padding.
Most often, the base encoding is
enclosed in a single-quoted or raw string literal, prefixed by »h« for base16 or
»b64« for base64 or base64url (the actual encodings of the latter two
have the same meaning where they overlap, so the string remains unambiguous).
For example, the byte string consisting of the four bytes 12 34 56 78
(given in hexadecimal here) could be written h'12345678' or
b64'EjRWeA' when using single-quoted string literals, or
h`12345678` or b64`EjRWeA` when using raw string literals.¶
Examples often benefit from some blank space (spaces, line breaks) in byte string literals. In certain CDN prefixed byte string literals, blank space is ignored; for instance, the following are equivalent:¶
h'48656c6c6f20776f726c64'
h'48 65 6c 6c 6f 20 77 6f 72 6c 64'
h'4 86 56c 6c6f
20776 f726c64'
¶
The internal syntax of prefixed single-quote literals such
as h'' and b64'' might also allow comments as blank space (see Section 2.2).¶
h'68656c6c6f20776f726c64'
h'68 65 6c /doubled l!/ 6c 6f # hello
20 /space/
77 6f 72 6c 64' /world/
¶
Slash characters are part of the base64 classic alphabet (see
Table 1 in Section 4 of [RFC4648]), and they therefore need to be in the
b64'' set of characters that contribute to the byte string.
Therefore, only end-of-line comments starting with # are available inside
b64 byte string literals.¶
b64'/base64 not a comment/ but one follows # comment'
h'FDB6AC 7BAE27A2D69CA2699E9EDFDBBADA2779FA25 968C2C'¶
These two byte string literals stand for the same byte string; the
deliberately confusing base64 content starts with
b64'/bas' which is the same as h'FDB6AC' and ends with b64'lows'
which is the same as h'968C2C'.¶
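How blank space and comments drop out of the content of h'' and b64'' literals can be sketched in Python. The names `decode_h` and `decode_b64` are hypothetical; the sketch only covers the /.../ and # comment forms shown in the examples above and takes the already unescaped text between the quotes as input. The normative internal syntax is the ABNF referenced in Section 5.2.1 and Section 5.2.2.

```python
import base64
import re

def decode_h(content):
    """Decode the content of an h'' literal, ignoring blank space and comments."""
    without_comments = re.sub(r"/[^/]*/|#[^\n]*", "", content)
    digits = re.sub(r"\s+", "", without_comments)
    return bytes.fromhex(digits)          # raises ValueError on an odd digit count

def decode_b64(content):
    """Decode the content of a b64'' literal (base64 or base64url, no padding)."""
    # A slash is a base64 character here, so only # comments are removed.
    without_comments = re.sub(r"#[^\n]*", "", content)
    chars = re.sub(r"\s+", "", without_comments)
    chars = chars.replace("-", "+").replace("_", "/")   # fold base64url into base64
    padding = "=" * (-len(chars) % 4)                   # restore the omitted padding
    return base64.b64decode(chars + padding, validate=True)
```

With these helpers, the commented h'' example above decodes to the bytes of "hello world", and the deliberately confusing b64'' example decodes to the same bytes as its h'' counterpart.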
In diagnostic notation, a sequence of zero or more CBOR data item literals can
be enclosed in << and >> and separated by comma or blank space, optionally prefixed by an
application-extension prefix; this specification speaks of sequence literals.
CDN mainly deals with individual data items, not with CBOR sequences
[RFC8742], so the CBOR sequence represented by the sequence literal needs
to be further processed to obtain the value of the literal.¶
Prefixed sequence literals refer to the application extension (see Section 2.1) identified by the prefix and apply the extension to its sequence content, resulting in a single data item. Depending on the definition of the application extension, this data item is not necessarily a string.¶
An unprefixed sequence literal applies CBOR encoding to the data items in its content, taken as a CBOR sequence. The value of the literal thus is a byte string with the encoded content; this is commonly referred to as embedded CBOR. For instance, each pair of columns in the following are equivalent:¶
<<1>>              h'01'
<<1, 2>>           h'0102'
<<"hello", null>>  h'65 68656c6c6f f6'
<<>>               h''¶
A diagnostic implementation is expected to honor encoding indicators on the individual items in the supplied sequence before assembling them into an encoded CBOR sequence. For instance, each pair of columns in the following are equivalent:¶
<<1_1>>              h'190001'
<<1_0, 2_2>>         h'1801 1a00000002'
<<"hello"_0, null>>  h'7805 68656c6c6f f6'¶
For prefixed sequence literals, the processing of encoding indicators on the arguments can be defined by the application extension being used. See Section 3.5 for an example of where this is done. Encoding indicators on the arguments are ignored if the application extension does not define their handling.¶
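The effect of encoding indicators on the items inside an unprefixed sequence literal can be sketched in Python. The helper names `head`, `uint`, `tstr` and `embedded` are hypothetical, and only the integer indicators _i and _0 to _3 on definite-length heads are modelled; this is an illustration of the examples above, not a full encoder.

```python
import struct

# encoding indicator -> (additional information, struct format of the argument)
ARGUMENT_FORMATS = {0: (24, ">B"), 1: (25, ">H"), 2: (26, ">I"), 3: (27, ">Q")}

def head(major, argument, indicator=None):
    """Encode a CBOR head; indicator is None (preferred), "i", or 0 to 3."""
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument out of range")
    if indicator is None:
        # Preferred serialization: the shortest head that holds the argument.
        indicator = "i" if argument < 24 else next(
            i for i in range(4) if argument < 1 << (8 << i))
    if indicator == "i":
        if argument >= 24:
            raise ValueError("argument does not fit an immediate head")
        return bytes([major << 5 | argument])
    additional, fmt = ARGUMENT_FORMATS[indicator]
    try:
        return bytes([major << 5 | additional]) + struct.pack(fmt, argument)
    except struct.error:
        raise ValueError("argument does not fit the requested head size")

def uint(value, indicator=None):
    return head(0, value, indicator)

def tstr(text, indicator=None):
    data = text.encode("utf-8")
    return head(3, len(data), indicator) + data

def embedded(*encoded_items):
    """Value of an unprefixed sequence literal: a byte string wrapping the items."""
    payload = b"".join(encoded_items)
    return head(2, len(payload)) + payload
```

Here `uint(1, 1)` produces the bytes 19 0001 from the first row above, and `embedded(uint(1), uint(2))` produces the encoded byte string 42 01 02, i.e., h'0102' as a data item.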
To be valid CBOR, Section 5.3.1 of RFC 8949 [STD94] requires that text strings are byte sequences in UTF-8 [STD63] form. CDN provides several ways to construct such byte strings (in particular, see also Section 3.4). These mechanisms might operate on subsequences that do not themselves constitute UTF-8, e.g., by building larger sequences out of concatenating the subsequences; for validity of a text string resulting from these mechanisms it is only of importance that the result is UTF-8. Double-quoted, single-quoted, and raw string literals have been defined such that they lead to byte sequences that are UTF-8: the source language of CDN is UTF-8, and all escaping mechanisms lead only to adding further UTF-8 characters. Only application-extensions (invoked in prefixed literals) can generate non-UTF-8 byte sequences.¶
As discussed at the start of Section 2, CDN implementations MAY support generation and possibly ingestion of CDN for CBOR data items that are well-formed but not valid; when this is enabled, such implementations MAY relax the requirement on text strings to be valid UTF-8.¶
CBOR has no requirements for its text strings except for conformance to [STD63]. The same applies to CDN and its source language. No additional Unicode processing or validation such as normalization or checking whether a scalar value is actually assigned is foreseen by CDN, particularly not any processing that is dependent on a specific Unicode version. Such processing, if offered, MUST NOT get in the way of processing the data item represented in CDN (i.e., it may be appropriate to issue warnings but not to error out or to generate output that does not match the input at the UTF-8 level).¶
CDN borrows the JSON syntax for arrays and maps. (Maps are called objects in JSON.)¶
For maps, CDN extends the JSON syntax by allowing any data item in the map key position (before the colon).¶
JSON requires the use of a comma as a separator character between the elements of an array as well as between the members (key/value pairs) of a map. (These commas also were required in the original diagnostic notation defined in [STD94] and [RFC8610].) The separator commas are now optional in the places where CDN syntax allows commas; however, where no comma is used in a separator position, there must be blank space (composed of at least one space, newline, and/or comment) instead. (Stylistically, leaving out the commas is more idiomatic when they occur at line breaks, which provide the blank space.)¶
In addition, CDN also allows, but does not require, a trailing comma before the closing bracket/brace, enabling an easier to maintain "terminator" style of their use.¶
In summary, the following eight examples are all equivalent:¶
[1, 2, 3]
[1, 2, 3,]
[1 2 3]
[1 2 3,]
[1 2, 3]
[1 2, 3,]
[1, 2 3]
[1, 2 3,]¶
as are¶
{1: "n", "x": "a"}
{1: "n", "x": "a",}
{1: "n" "x": "a"}
# etc.
¶
As a comma and/or blank/comment is mandatory in a separator position,
»[11]« is unambiguously an array with a single element (the
integer 11), different from »[1 1]« or »[1,1]«.
As this is a general rule, »[[] []]« or »[[],[]]« are well-formed
CDN, while »[[][]]« is not.¶
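The separator rule (a comma and/or blank space between elements, plus an optional trailing comma) can be sketched with a tiny recursive-descent parser in Python. The name `parse_uint_array` is hypothetical, and the sketch is restricted to nested arrays of decimal unsigned integers; comments, encoding indicators and all other item types are left out.

```python
def parse_uint_array(text):
    """Parse nested arrays of decimal unsigned integers, e.g. "[1 2, [3],]"."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def skip_blank():
        nonlocal pos
        start = pos
        while peek() in (" ", "\t", "\r", "\n"):
            pos += 1
        return pos > start                # True if some blank space was consumed

    def item():
        nonlocal pos
        if peek() == "[":
            pos += 1
            skip_blank()
            elements = []
            while peek() != "]":
                elements.append(item())
                separated = skip_blank()
                if peek() == ",":
                    pos += 1
                    skip_blank()
                    separated = True
                if peek() != "]" and not separated:
                    raise ValueError("separator expected at offset %d" % pos)
            pos += 1
            return elements
        start = pos
        while "0" <= peek() <= "9":
            pos += 1
        if pos == start:
            raise ValueError("item expected at offset %d" % pos)
        return int(text[start:pos])

    skip_blank()
    value = item()
    skip_blank()
    if pos != len(text):
        raise ValueError("unexpected text at offset %d" % pos)
    return value
```

All eight spellings of the three-element array parse to the same list, while an input such as »[[][]]« is rejected because nothing separates the two inner arrays.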
A single underscore can be written after the opening brace of a map or
the opening bracket of an array to indicate that the data item was
represented in indefinite-length format. For example, [_ 1, 2]
contains an indicator that an indefinite-length representation was
used to represent the data item [1, 2].¶
At the same position, encoding indicators for specifying the size of
the array or map head for definite-length format can be used instead,
specifically _i or _0 to _3. For example, [_0 false, true] can be
used to specify the encoding of the array [false, true] as 98 02 f4 f5.¶
As discussed at the start of Section 2, CDN implementations MAY support generation and possibly ingestion of CDN for CBOR data items that are well-formed but not valid (Section 5.3 of RFC 8949 [STD94]).¶
For maps, this is relevant for map keys that occur more than once, as in this CDN that is not representing a valid CBOR data item:¶
{1: "to", 1: "from"}
¶
CDN uses JSON syntax for the simple values True (»true«), False
(»false«), and Null (»null«).
Undefined is written »undefined« as in JavaScript.¶
These and all other simple values can be given as "simple()" with the
appropriate decimal unsigned integer (0|[1-9][0-9]*) in the parentheses.
For example, »simple(42)«
indicates major type 7, value 42, and »simple(20)« indicates
»false«.¶
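The relationship between the named simple values and the simple() form can be sketched in Python. The names `NAMED` and `encode_simple` are hypothetical; the numbers 20 to 23 for false, true, null and undefined and the exclusion of 24 to 31 follow RFC 8949.

```python
# Named simple values and their simple() numbers.
NAMED = {"false": 20, "true": 21, "null": 22, "undefined": 23}

def encode_simple(number):
    """Encode simple(number) as a CBOR data item (major type 7)."""
    if 0 <= number <= 23:
        return bytes([0xE0 | number])     # value carried in the initial byte
    if 32 <= number <= 255:
        return bytes([0xF8, number])      # value carried in one following byte
    raise ValueError("there is no simple value %d" % number)
```

For example, `encode_simple(NAMED["false"])` and `encode_simple(20)` both give the single byte f4, and `encode_simple(42)` gives f8 2a.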
This document extends the syntax used in diagnostic notation to also enable application-oriented extensions (Section 2.1). This section defines a number of application-oriented extensions.¶
The application-extension identifier "dt" is used to notate a date/time literal that can be used as an Epoch-Based Date/Time as per Section 3.4.2 of RFC 8949 [STD94].¶
The content of the literal is a single Standard Date/Time String as per Section 3.4.1 of RFC 8949 [STD94], as a text or byte string.¶
The value of the literal is a number representing the result of a
conversion of the given Standard Date/Time String to an Epoch-Based
Date/Time.
If fractional seconds are given in the text (production
time-secfrac in Figure 5), the value is a
floating-point number; the value is an integer number otherwise.
In the all-uppercase variant of the app-prefix, the value is enclosed
in a tag number 1.¶
Each row of Table 4 shows an example of "dt" notation and equivalent notation not using an application-extension identifier.¶
| dt literal | plain CDN |
|---|---|
| dt'1969-07-21T02:56:16Z' | -14159024 |
| dt'1969-07-21T02:56:16.0Z' | -14159024.0 |
| dt'1969-07-21T02:56:16.5Z' | -14159023.5 |
| dt`1969-07-21T02:56:16.5Z` | -14159023.5 |
| dt<<'1969-07-21T02:56:16.5Z'>> | -14159023.5 |
| dt<<"1969-07-21T02:56:16.5Z">> | -14159023.5 |
| dt<<`1969-07-21T02:56:16.5Z`>> | -14159023.5 |
| DT'1969-07-21T02:56:16Z' | 1(-14159024) |
See Section 5.2.3 for an ABNF definition for the text string input of dt literals.¶
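The conversion performed by the "dt" extension can be sketched in Python. The name `dt_value` and the regular expression are hypothetical simplifications of the date/time syntax (the normative ABNF is the one referenced above); the sketch does not range-check every field and does not model the tagged uppercase variant.

```python
import calendar
import re

DATE_TIME = re.compile(
    r"(\d{4})-(\d\d)-(\d\d)[Tt](\d\d):(\d\d):(\d\d)(\.\d+)?([Zz]|[+-]\d\d:\d\d)",
    re.ASCII)

def dt_value(text):
    """Convert a date/time string to an epoch-based number (int or float)."""
    match = DATE_TIME.fullmatch(text)
    if match is None:
        raise ValueError("not a date/time string: %r" % text)
    fields = tuple(int(group) for group in match.groups()[:6])
    fraction, zone = match.group(7), match.group(8)
    seconds = calendar.timegm(fields)     # integer arithmetic, also before 1970
    if zone not in ("Z", "z"):
        offset = int(zone[1:3]) * 3600 + int(zone[4:6]) * 60
        seconds += offset if zone[0] == "-" else -offset
    # Fractional seconds in the text make the result a float, even for ".0".
    return seconds + float(fraction) if fraction else seconds
```

`dt_value("1969-07-21T02:56:16Z")` returns the integer -14159024, while the same instant written with ".0" returns the float -14159024.0, as in the first two rows of Table 4.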
The application-extension identifier "ip" is used to notate an IP address literal that can be used as an IP address as per Section 3 of [RFC9164].¶
The input of the literal is a single text string representing an IPv4address or IPv6address as per Section 3.2.2 of [RFC3986].¶
With the lowercase app-string prefix ip, the value of the literal is a
byte string representing the binary IP address.
With the uppercase app-string prefix IP, the literal is such a byte string
tagged with tag number 54, if an IPv6address is used, or tag number
52, if an IPv4address is used.¶
As an additional case, the uppercase app-string prefix IP can be used
with an IP address prefix such as 2001:db8::/56 or 192.0.2.0/24, with the equivalent tag as its value.
(Note that [RFC9164] representations of address prefixes need to
implement the truncation of the address byte string as described in
Section 4.2 of [RFC9164]; see example below.)
For completeness, the lowercase variant ip'2001:db8::/56' or ip'192.0.2.0/24' stands for
an unwrapped [56,h'20010db8'] or [24,h'c00002']; however, in this case the information
on whether an address is IPv4 or IPv6 often needs to come from the context.¶
Note that this application-extension provides no direct representation
of the "Interface format"
defined in Section 3.1.3 of [RFC9164], an address combined with an
optional prefix length and an optional zone identifier, and therefore
no way to reference a zone identifier at all.
(If needed, this format can be put together by building their
structures explicitly, e.g., an interface format without a zone
identifier can be represented as in 52([ip'192.0.2.42',24]), or an
interface format with zone identifier 42 as in
54([ip'fe80::0202:02ff:ffff:fe03:0303',64,42]).)¶
Each row of Table 5 shows an example of "ip" notation and equivalent notation not using an application-extension identifier.¶
| ip literal | plain CDN |
|---|---|
| ip'192.0.2.42' | h'c000022a' |
| ip<<'192.0.2.42'>> | h'c000022a' |
| IP'192.0.2.42' | 52(h'c000022a') |
| IP'192.0.2.0/24' | 52([24,h'c00002']) |
| ip'2001:db8::42' | h'20010db8000000000000000000000042' |
| IP'2001:db8::42' | 54(h'20010db8000000000000000000000042') |
| IP'2001:db8::/64' | 54([64,h'20010db8']) |
See Section 5.2.4 for an ABNF definition for the content of ip literals.¶
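The mapping performed by the "ip" extension, including the truncation of trailing zero bytes in prefixes, can be sketched with Python's standard `ipaddress` module. The function name `ip_literal` and the use of a `(tag, value)` tuple to stand for a tagged item are assumptions made for this illustration; note also that `ipaddress` accepts some spellings that the ABNF referenced above does not.

```python
import ipaddress

def ip_literal(text, tagged=False):
    """Return the value of ip'text' (or of IP'text' if tagged is True)."""
    if "/" in text:
        # strict=True rejects prefixes with non-zero bits after the prefix length
        network = ipaddress.ip_network(text, strict=True)
        truncated = network.network_address.packed.rstrip(b"\x00")
        value = [network.prefixlen, truncated]
        version = network.version
    else:
        address = ipaddress.ip_address(text)
        value = address.packed
        version = address.version
    if not tagged:
        return value
    return (54 if version == 6 else 52, value)
```

For example, `ip_literal("192.0.2.0/24", tagged=True)` gives `(52, [24, b"\xc0\x00\x02"])`, corresponding to 52([24,h'c00002']) in Table 5.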
The application-extension identifier "hash" is used to notate the input to a cryptographic hash function as well as to identify such a hash function. Its value is a byte string that represents the output of that hash function.¶
The input of the literal is a (text or byte) string, optionally followed by either an integer or a text string that identifies the hash function in the COSE Algorithms registry of the CBOR Object Signing and Encryption (COSE) registry group [IANA.cose], either by the identifier (value: integer or string), or, if no algorithm is registered with this value, by its name used in the registry. If the second item is not given, the default algorithm used is -16 ("SHA-256").¶
No uppercase variant prefix is defined for the application-extension identifier "hash".¶
| hash literal | plain CDN |
|---|---|
| hash<<'foo'>> | h'2C26B46B68FFC68FF99B453C1D304134 13422D706483BFA0F98A5E886266E7AE' |
| hash'foo' | h'2C26B46B68FFC68FF99B453C1D304134 13422D706483BFA0F98A5E886266E7AE' |
| hash<<'foo', -16>> | h'2C26B46B68FFC68FF99B453C1D304134 13422D706483BFA0F98A5E886266E7AE' |
| hash<<'foo', "SHA-256">> | h'2C26B46B68FFC68FF99B453C1D304134 13422D706483BFA0F98A5E886266E7AE' |
| hash<<'foo', -44>> | h'F7FBBA6E0636F890E56FBBF3283E524C 6FA3204AE298382D624741D0DC663832 6E282C41BE5E4254D8820772C5518A2C 5A8C0C7F7EDA19594A7EB539453E1ED7' |
| hash<<'foo', "SHA-512">> | h'F7FBBA6E0636F890E56FBBF3283E524C 6FA3204AE298382D624741D0DC663832 6E282C41BE5E4254D8820772C5518A2C 5A8C0C7F7EDA19594A7EB539453E1ED7' |
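The rows of the table above can be reproduced with Python's standard `hashlib` module. The names `COSE_HASHES` and `hash_literal` are hypothetical, and the lookup table only covers the two algorithms used in the examples, keyed both by their COSE identifier and by their registry name.

```python
import hashlib

# COSE algorithm identifier or registry name -> hashlib algorithm name
COSE_HASHES = {-16: "sha256", "SHA-256": "sha256",
               -44: "sha512", "SHA-512": "sha512"}

def hash_literal(data, algorithm=-16):
    """Return the byte string value of hash<<data, algorithm>>."""
    if isinstance(data, str):
        data = data.encode("utf-8")       # text string input: hash its UTF-8 bytes
    if algorithm not in COSE_HASHES:
        raise ValueError("hash function %r is not covered by this sketch" % (algorithm,))
    return hashlib.new(COSE_HASHES[algorithm], data).digest()
```

`hash_literal(b"foo")`, `hash_literal("foo", -16)` and `hash_literal("foo", "SHA-256")` all return the same 32-byte value.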
This section uses the placeholders t1 and b1 as provisional application extension names, allowing the text to stabilize while the actual names are still being decided by the WG.¶
The "b1" and "t1" Extensions allow a (byte or text) string to be built up from multiple (byte or text) string literals; these are then concatenated into a single string.¶
The following four text string values (adapted from Appendix G.4 of [RFC8610]) are equivalent:¶
"Hello world"
t1<<"Hello ", "world">>
t1<<"Hello", h'20', "world">>
t1<<h'48656c6c6f20776f726c64'>>¶
Similarly, the following byte string values are equivalent:¶
'Hello world'
b1<<"Hello world">>
b1<<'Hello ', 'world'>>
b1<<'Hello ', h'776f726c64'>>
b1<<'Hello', h'20', 'world'>>
b1<<h'48656c6c6f20776f726c64', '', b64''>>
b1<<h'4 86 56c 6c6f', h' 20776 f726c64'>>¶
As the examples show, text strings and byte strings can mix within such a concatenation, so that, for instance, byte string literal notation can be used inside a sequence of concatenated text string notation literals, to encode characters that may be better represented in an encoded way.¶
This is realized by simply joining together the bytes in the sequence of string arguments to the b1/t1 application extension, proceeding from left to right.¶
For "b1", the joining operation results in a byte string. For "t1", the joining operation results in a text string, and the result therefore needs to be valid UTF-8 except for "diagnostic" implementations that support and are enabled for generation/ingestion of CDN for CBOR data items that are well-formed but not valid; see also Section 2.5.8.¶
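The joining operation can be sketched in Python as follows, leaving ellipses aside. The function names `b1` and `t1` simply mirror the provisional extension names; Python `bytes` and `str` stand in for byte and text strings, and UTF-8 validation is done only on the joined result, as described above.

```python
def b1(*parts):
    """Join text and byte string arguments into one byte string."""
    return b"".join(
        part.encode("utf-8") if isinstance(part, str) else part
        for part in parts)

def t1(*parts):
    """Join the arguments into one text string; the result must be UTF-8."""
    # decode() raises UnicodeDecodeError if the joined bytes are not valid UTF-8
    return b1(*parts).decode("utf-8")
```

Because only the result is checked, an argument may end in the middle of a UTF-8 sequence: `t1(b"\xe2\x8c", b"\x98")` yields the one-character string U+2318, although neither argument is valid UTF-8 on its own.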
Besides strings, arguments to t1/b1 may include ellipses, in which case the result will be an ellipsis data item in turn (see Section 4.2). The semantic processing of these is governed by the following rules:¶
• A single ... is a general ellipsis, which by itself can stand for
any data item, but when used as argument to t1/b1 must stand in for
a string value.¶
• Multiple adjacent ellipses are equivalent to a single ellipsis.¶
• When an ellipsis is concatenated (on one or both sides) with
strings, the result is a CBOR tag with tag number CPA888 whose content is an
array of the joined-together spans of such strings, with each ellipsis
represented by /CPA/888(null).¶
Arguments with nested ellipses are flattened and the above equivalences applied, so that, for instance, these values are equivalent:¶
h'48656c6c6f...776f726c64'
b1<<h'48656c6c6f...', ..., h'...776f726c64'>>
b1<<'Hello', ..., 'world'>>¶
If there is no ellipsis in the concatenated list, the result of processing the list will always be a single string data item.¶
The ilbs and ilts application extensions are semantically
identical to b1 and t1, respectively, at the data model level, but instead of
concatenating the arguments to a single (byte/text) string data item,
they build an indefinite-length string out of the arguments, with one
chunk of the correct major type (byte string/text string for
ilbs/ilts, respectively) created per argument.¶
A diagnostic implementation would honor encoding indicators on each of the arguments, creating a chunk with the same encoding. As the application-extension is implying indefinite length encoding, there is no point in applying an encoding indicator to the entire application-extension literal.¶
'Hello world'                4b 48656c6c6f20776f726c64
ilbs<<>>                     5f ff
ilbs<<"Hello world">>        5f 4b 48656c6c6f20776f726c64 ff
ilbs<<'Hello ', "world">>    5f 46 48656c6c6f20 45 776f726c64 ff
ilbs<<'Hello '_0, 'world'>>  5f 5806 48656c6c6f20 45 776f726c64 ff¶
There is no way to include ellipses in an indefinite length string.¶
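The chunk-per-argument behavior of ilbs can be sketched in Python. The helper names `bstr_chunk` and `ilbs` are hypothetical; an encoding indicator is modelled as an optional integer 0 to 3 on a chunk, and text string arguments are turned into byte string chunks as in the examples above.

```python
import struct

SIZES = {0: (24, ">B"), 1: (25, ">H"), 2: (26, ">I"), 3: (27, ">Q")}

def bstr_chunk(data, indicator=None):
    """Encode one definite-length byte string chunk (major type 2)."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    if indicator is None:
        if len(data) < 24:
            return bytes([0x40 | len(data)]) + data   # length in the initial byte
        indicator = next(i for i in range(4) if len(data) < 1 << (8 << i))
    additional, fmt = SIZES[indicator]
    return bytes([0x40 | additional]) + struct.pack(fmt, len(data)) + data

def ilbs(*chunks):
    """Wrap already encoded chunks into an indefinite-length byte string."""
    return b"\x5f" + b"".join(chunks) + b"\xff"
```

`ilbs(bstr_chunk(b"Hello ", 0), bstr_chunk(b"world"))` reproduces the last example row, where the first chunk carries a one-byte length because of its _0 indicator.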
The
application-extension identifier "cri" is used to notate
a CDN literal for a CRI reference as defined in [I-D.ietf-core-href].¶
The input of the literal is a URI Reference as per [RFC3986] or an IRI Reference as per [RFC3987].¶
The value of the literal is a CRI reference that can be converted to the text of the literal using the procedure of Section 6.1 of [I-D.ietf-core-href]. Note that there may be more than one CRI reference that can be converted to the URI/IRI reference given; implementations are expected to favor the simplest variant available and make non-surprising choices otherwise. In the all-uppercase variant of the app-prefix, the value is enclosed in a tag number 99.¶
As an example, the CDN¶
cri'https://example.com/bottarga/shaved'
CRI'https://example.com/bottarga/shaved'¶
is equivalent to¶
[-4, ["example", "com"], ["bottarga", "shaved"]]
99([-4, ["example", "com"], ["bottarga", "shaved"]])¶
See Section 5.2.5 for an ABNF definition for the content of cri literals.¶
The "float" application extension enables the notation of 2-byte,
4-byte, and 8-byte byte strings to express floating point values
(mt=7, ai=25/26/27 respectively) by giving their IEEE 754
representation.
A text string used as an argument is interpreted exactly as a hex
literal (like the h application prefix); the result is used as the
byte string.¶
The application-oriented literal is interpreted as an encoded data item would be that prefixes the byte string by a single byte 0xF9 (2 bytes, i.e., binary16), 0xFA (4 bytes, i.e., binary32), and 0xFB (8 bytes, i.e., binary64), respectively. Byte strings of a different length than 2, 4, or 8 raise an error. Note that the interpretation as an encoded data item does not create or imply an encoding indicator; that can be added separately.¶
Example (tool used: edn-abnf -afloat -e):¶
🔧 "[float'fe00', float'fe00'_2, float'47110815']" -tpretty
➔ 83 # array(3)
     F9 FE00 # primitive(65024)
     FA FFC00000 # primitive(4290772992)
     FA 47110815 # primitive(1192298517)
🔧 "[float'fe00', float'fe00'_2, float'47110815', 0x1.22102ap+15]"
➔ [float'fe00', float'fe00'_2, 37128.08203125, 37128.08203125]¶
The purpose of this application extension is to close a gap in CDN's [IEEE754] binary64 support: Without this (or a similar) extension there is no way to represent NaN values different from the one called out at the end of Section 4.1 of RFC 8949 [STD94]: "(for many applications, the single NaN encoding 0xf97e00 will suffice)". For finite floating point numbers, the decimal or hex floating point representations are preferred.¶
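The interpretation of a "float" literal as an encoded data item can be sketched in Python. The name `float_literal` is hypothetical, and the sketch does not model encoding indicators (such as the _2 in the example above); it returns both the encoded item and the Python float that results from decoding it.

```python
import struct

# byte string length -> (initial byte, struct format)
BY_LENGTH = {2: (0xF9, ">e"), 4: (0xFA, ">f"), 8: (0xFB, ">d")}

def float_literal(hex_text):
    """Return (encoded item, decoded float) for float'hex_text'."""
    data = bytes.fromhex(hex_text)        # interpreted like the content of h''
    if len(data) not in BY_LENGTH:
        raise ValueError("float needs 2, 4, or 8 bytes, got %d" % len(data))
    initial, fmt = BY_LENGTH[len(data)]
    return bytes([initial]) + data, struct.unpack(fmt, data)[0]
```

`float_literal("47110815")` gives the encoded item fa 47110815 and the value 37128.08203125, in line with the tool output shown above; `float_literal("fe00")` gives f9 fe00, whose decoded value is a NaN.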
This section collects grammars in ABNF form ([STD68] as extended in [RFC7405]) that serve to define the syntax of CDN and some application-oriented literals.¶
This subsection provides an overall ABNF definition for the syntax of concise diagnostic notation.¶
For simplicity, the internal parsing for the built-in CDN prefixes is
specified in the same way.
ABNF definitions for h''/h`` and b64''/b64`` are
provided in Section 5.2.1 and Section 5.2.2.¶
seq = S [item *(MSC item) SOC]
one-item = S item S
item = map / array / tagged
/ number / simple
/ string / streamstring
string1 = (tstr / bstr) spec
string = string1 / ellipsis
ellipsis = 3*"." ; "..." or more dots
number = (hexfloat / hexint / octint / binint
/ decnumber / nonfin) spec
sign = "+" / "-"
decnumber = [sign] (1*DIGIT ["." *DIGIT] / "." 1*DIGIT)
["e" [sign] 1*DIGIT]
hexfloat = [sign] "0x" (1*HEXDIG ["." *HEXDIG] / "." 1*HEXDIG)
"p" [sign] 1*DIGIT
hexint = [sign] "0x" 1*HEXDIG
octint = [sign] "0o" 1*ODIGIT
binint = [sign] "0b" 1*BDIGIT
nonfin = %s"Infinity"
/ %s"-Infinity"
/ %s"NaN"
simple = %s"false"
/ %s"true"
/ %s"null"
/ %s"undefined"
/ %s"simple(" S simple-number S ")"
simple-number = "25" %x30-35 ; 250-255
/ "2" %x30-34 DIGIT ; 200-249
/ "1" 2DIGIT ; 100-199
/ %x34-39 DIGIT ; 40-99
/ "3" %x32-39 ; 32-39
;; there are no simple values between 24-31
/ "2" %x30-33 ; 20-23
/ "1" DIGIT ; 10-19
/ DIGIT ; 0-9
uint = "0" / DIGIT1 *DIGIT
tagged = uint spec "(" S item S ")"
app-prefix = lcalpha *lcldh ; including h and b64
/ ucalpha *ucldh ; tagged variant, if defined
app-string = app-prefix sqstr
app-sequence = app-prefix "<<" seq ">>"
app-rstring = app-prefix rawstring
rawstring = startrawdelim
raw-inner
alikerawdelim
rawdelim = 1*"`"
startrawdelim = rawdelim
; width (number of backquotes) distinguishes
; between following alikerawdelim and shortrawdelim
alikerawdelim = rawdelim ; width == previous startrawdelim
shortrawdelim = rawdelim ; width < previous startrawdelim
rawchars = 1*(%x0a/%x0d / %x20-5f / %x61-7e / NONASCII)
raw-inner = 1*(rawchars / shortrawdelim)
sqstr = SQUOTE *single-quoted SQUOTE
bstr = app-string / sqstr / app-rstring / rawstring
/ app-sequence / embedded
; note: rawstring is text; app-... can be any type
tstr = DQUOTE *double-quoted DQUOTE
embedded = "<<" seq ">>"
array = "[" (specms S item *(MSC item) SOC / spec S) "]"
map = "{" (specms S keyp *(MSC keyp) SOC / spec S) "}"
keyp = item S ":" S item
; We allow %x09 HT in prose, but not in string literals
blank = %x09 / %x0A / %x0D / %x20
lblank = %x0A / %x20 ; Not HT or CR (gone)
non-slash = blank / %x21-2e / %x30-7F / NONASCII
non-slash-star = blank / %x21-29 / %x2b-2e / %x30-7F / NONASCII
non-star = blank / %x21-29 / %x2b-7F / NONASCII
ends-in-star = *non-star 1*"*"
non-lf = %x09 / %x0D / %x20-7F / NONASCII
eol-comment = "#" / "//"
comment = "/" non-slash-star *non-slash "/"
/ "/*" ends-in-star
*(non-slash-star ends-in-star) "/"
/ eol-comment *non-lf %x0A
; optional space
S = *blank *(comment *blank)
; mandatory space
MS = (blank/comment) S
; mandatory comma and/or space
MSC = ("," S) / (MS ["," S])
; optional comma and/or space
SOC = S ["," S]
; check semantically that strings are either all text or all bytes
; note that there must be at least one string to distinguish
streamstring = "(_" MS string *(MSC string) SOC ")"
spec = ["_" *wordchar]
specms = ["_" *wordchar MS]
double-quoted = unescaped
/ SQUOTE
/ "\" escapable-d
single-quoted = unescaped
/ DQUOTE
/ "\" escapable-s
escapable1 = %s"b" ; BS backspace U+0008
/ %s"f" ; FF form feed U+000C
/ %s"n" ; LF line feed U+000A
/ %s"r" ; CR carriage return U+000D
/ %s"t" ; HT horizontal tab U+0009
/ "\" ; \ backslash (reverse solidus) U+005C
escapable-d = escapable1
/ DQUOTE
/ "/" ; / slash (solidus) U+002F (JSON!)
/ (%s"u" hexchar) ; uXXXX U+XXXX
escapable-s = escapable1
/ SQUOTE
/ (%s"u" hexchar-s) ; uXXXX U+XXXX
hexchar = "{" (1*"0" [ hexscalar ] / hexscalar) "}"
/ non-surrogate
/ two-surrogate
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG)
/ ("D" ODIGIT 2HEXDIG )
two-surrogate = high-surrogate "\" %s"u" low-surrogate
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
hexscalar = "10" 4HEXDIG / HEXDIG1 4HEXDIG
/ non-surrogate / 1*3HEXDIG
; single-quote hexchar-s: don't allow 0020..007e
hexchar-s = "{" (1*"0" [ hexscalar-s ] / hexscalar-s) "}"
/ non-surrogate-s
/ two-surrogate
non-surrogate-s = "007F" ; rubout
/ "00" ("0"/"1"/"8"/"9"/HEXDIGA) HEXDIG
/ "0" HEXDIG1 2HEXDIG
/ non-surrogate-1
non-surrogate-1 = ((DIGIT1 / "A"/"B"/"C" / "E"/"F") 3HEXDIG)
/ ("D" ODIGIT 2HEXDIG )
hexscalar-s = "10" 4HEXDIG / HEXDIG1 4HEXDIG
/ non-surrogate-1 / HEXDIG1 2HEXDIG
/ ("1"/"8"/"9"/HEXDIGA) HEXDIG
/ "7F"
/ HEXDIG1
; Note that no other C0 characters are allowed, including %x09 HT
unescaped = %x0A ; new line
/ %x0D ; carriage return -- ignored on input
/ %x20-21
; omit 0x22 "
/ %x23-26
; omit 0x27 '
/ %x28-5B
; omit 0x5C \
/ %x5D-7F
/ NONASCII
newline = [%x0D] %x0A
DQUOTE = %x22 ; " double quote
SQUOTE = "'" ; ' single quote
DIGIT = %x30-39 ; 0-9
DIGIT1 = %x31-39 ; 1-9
ODIGIT = %x30-37 ; 0-7
BDIGIT = %x30-31 ; 0-1
HEXDIGA = "A" / "B" / "C" / "D" / "E" / "F"
; Note: double-quoted strings as in "A" are case-insensitive in ABNF
HEXDIG = DIGIT / HEXDIGA
HEXDIG1 = DIGIT1 / HEXDIGA
lcalpha = %x61-7A ; a-z
lcldh = lcalpha / DIGIT / "-"
ucalpha = %x41-5A ; A-Z
ucldh = ucalpha / DIGIT / "-"
ALPHA = lcalpha / ucalpha
wordchar = "_" / ALPHA / DIGIT ; [_a-z0-9A-Z]
NONASCII = %x80-D7FF / %xE000-10FFFF
While an ABNF grammar defines the set of character strings that are considered to be valid CDN by this ABNF, the mapping of these character strings into the generic data model of CBOR is not always obvious.¶
Further information can be moved up to Section 2 by splitting it up into information specific to the ABNF grammar and general information.¶
The following additional items should help in the interpretation:¶
As mentioned in the terminology (Section 1.2), the ABNF terminal values in this document define Unicode scalar values (characters) rather than their UTF-8 encoding. For example, the Unicode PLACE OF INTEREST SIGN (U+2318) would be defined in ABNF as %x2318.¶
See Section 1.3.5 for more considerations about the character repertoire used for CDN source text and, in particular, the special handling of newline characters in the source.¶
decnumber stands for an integer in the usual decimal notation, unless at
least one of the optional parts starting with "." and "e" is
present, in which case it stands for a floating point value in the
usual decimal notation. Note that the grammar allows 3. for
3.0 and .3 for 0.3 (also for hexadecimal floating point
below); implementers are advised that some platform numeric parsers
accept only a subset of the floating point syntax in this document
and may require some preprocessing to use here.¶
hexint, octint, and binint stand for an integer in the usual base 16/hexadecimal
("0x"), base 8/octal ("0o"), or base 2/binary ("0b") notation.
hexfloat stands
for a floating point number in the usual hexadecimal notation (which
uses a mantissa in hexadecimal and an exponent in decimal notation,
see Section 5.12.3 of [IEEE754], Section 6.4.4.3 of [C], or Section
5.13.4 of [Cplusplus]; floating-suffix/floating-point-suffix from
the latter two is not used here).¶
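As an illustration of the number syntax described above, the following minimal Python sketch classifies a number literal (without any encoding indicator) as integer or floating point. The function name `parse_number` and the regular expressions are assumptions made for this example; they are transcribed by hand from the decnumber, hexint, octint, binint, and hexfloat rules and do not cover nonfin.

```python
import re

# Hand-written transcriptions of the ABNF rules; ABNF literals such as "e",
# "p", and "0x" are case-insensitive, hence re.IGNORECASE.
_DEC = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:e[+-]?[0-9]+)?", re.IGNORECASE)
_BASED_INT = re.compile(
    r"[+-]?(?:0x[0-9a-f]+|0o[0-7]+|0b[01]+)", re.IGNORECASE)
_HEXFLOAT = re.compile(
    r"[+-]?0x(?:[0-9a-f]+(?:\.[0-9a-f]*)?|\.[0-9a-f]+)p[+-]?[0-9]+",
    re.IGNORECASE)


def parse_number(text):
    """Return an int or a float for a CDN number literal (hypothetical helper)."""
    if _BASED_INT.fullmatch(text):
        return int(text, 0)          # base 0 honors the 0x/0o/0b prefix
    if _HEXFLOAT.fullmatch(text):
        return float.fromhex(text)   # mantissa in hex, exponent in decimal
    if _DEC.fullmatch(text):
        # An integer unless a "." or "e" part is present.
        if "." in text or "e" in text.lower():
            return float(text)
        return int(text)
    raise ValueError(f"not a CDN number: {text!r}")
```

Python's own `float()` happens to accept forms such as 3. and .3 directly; on platforms where that is not the case, the regular expression match is the place to add the preprocessing mentioned above.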
For hexint, octint, binint, and when decnumber stands for an integer, the
corresponding CBOR data item is represented using major type 0 or 1
if possible, or using tag 2 or 3 if not.
In the latter case, this specification does not define any encoding
indicators that apply.
If fine control over encoding is desired, this can be expressed by
being explicit about the representation as a tag:
E.g., 987654321098765432310, which is equivalent to 2(h'35 8a 75
04 38 f3 80 f5 f6') in its Preferred Serialization, might be
written as 2_3(h'00 00 00 35 8a 75 04 38 f3 80 f5 f6'_1) if
leading zeros need to be added during serialization to obtain
specific sizes for tag head, byte string head, and the overall byte
string.¶
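The choice between major types 0/1 and tags 2/3 can be sketched as follows. This is a minimal Python illustration of Preferred Serialization for integers, not a complete encoder; the helper names `encode_head` and `encode_int` are assumptions of this example, and encoding indicators are not handled.

```python
def encode_head(major, argument):
    """Encode a CBOR head with the shortest possible argument."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit into 64 bits")


def encode_int(value):
    """Encode an integer as major type 0/1 if possible, else as tag 2/3."""
    major, n = (0, value) if value >= 0 else (1, -1 - value)
    if n < 1 << 64:
        return encode_head(major, n)
    # Bignum: tag 2 (unsigned) or tag 3 (negative) on a byte string
    # without leading zero bytes.
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return encode_head(6, 2 + major) + encode_head(2, len(payload)) + payload
```

With this sketch, `encode_int(987654321098765432310)` yields the bytes c2 49 35 8a 75 04 38 f3 80 f5 f6, i.e., the encoding of the tag 2 form given above.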
When decnumber stands for a floating point value, and for
hexfloat and nonfin, a floating point data item with major
type 7 is used; diagnostic implementations employ Preferred
Serialization unless the item was modified by an
encoding indicator, which then needs to be _1, _2, or _3.
For this, the number range needs to fit into an [IEEE754] binary64 (or the size
corresponding to the encoding indicator), and the precision will be
adjusted to binary64 before further applying Preferred Serialization
(or to the size corresponding to the encoding indicator).
Tag 4/5 representations are not generated in these cases.
Future app-prefixes could be defined to allow more control for
obtaining a tag 4/5 representation directly from a hex or decimal
floating point literal.¶
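A minimal Python sketch of Preferred Serialization for a floating point value follows: the shortest of the binary16, binary32, and binary64 encodings that preserves the value is chosen. The function name `encode_float` is an assumption of this example; encoding indicators and NaN payloads are not handled (any NaN is encoded in the shortest form).

```python
import math
import struct


def encode_float(value):
    """Encode a float as major type 7 in the shortest value-preserving size."""
    for fmt, initial_byte in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue  # magnitude too large for this size
        # Keep this size only if the round trip reproduces the value exactly.
        if math.isnan(value) or struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)
```

For example, 1.5 fits into binary16, 100000.0 needs binary32, and 1.1 can only be represented exactly as given in binary64.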
spec stands for an encoding indicator.
See Section 2.3 for details.¶
The ABNF grammar for raw strings is lenient; a parser needs to
implement the ABNF comments on alikerawdelim and shortrawdelim as
well.
shortrawdelim only matches sequences of backquotes that are
shorter than startrawdelim.
alikerawdelim only matches sequences of backquotes that are
exactly as long as startrawdelim.¶
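The width comparison that the ABNF comments call for might be implemented as in the following Python sketch. It assumes that runs of backquotes are taken as maximal and, as a simplification of this example only, rejects an inner run that is longer than the opening delimiter; the function name `split_raw_string` is hypothetical, and any further processing of the inner text is out of scope.

```python
def split_raw_string(text, start=0):
    """Return (inner_text, end_index) for a raw string starting at text[start]."""
    width = 0
    while start + width < len(text) and text[start + width] == "`":
        width += 1
    if width == 0:
        raise ValueError("raw string must start with a backquote")
    pos = start + width
    while pos < len(text):
        if text[pos] != "`":
            pos += 1
            continue
        run_end = pos
        while run_end < len(text) and text[run_end] == "`":
            run_end += 1
        run_width = run_end - pos
        if run_width == width:      # alikerawdelim: closes the raw string
            return text[start + width:pos], run_end
        if run_width > width:       # assumption of this sketch: not allowed
            raise ValueError("backquote run longer than opening delimiter")
        pos = run_end               # shortrawdelim: part of the content
    raise ValueError("unterminated raw string")
```

For instance, in ``` ``a`b`` ``` the single backquote is shorter than the two-backquote opening delimiter and therefore remains part of the content.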
This subsection provides ABNF definitions for the content of application-oriented extension literals defined in [STD94] and in this specification, where applicable. These grammars describe the decoded content of the single-quoted or raw string components that combine with the application-extension identifiers used as prefixes to form application-oriented extension literals. Each of these may integrate ABNF rules defined in Figure 1, which are not always repeated here.¶
Table 7 summarizes the app-prefix values defined in this document.¶
| app-prefix | content of single-quoted or raw string | result type |
|---|---|---|
| h | hexadecimal form of binary data | byte string |
| H | (not used) | |
| b64 | base64 forms (classic or base64url) of binary data | byte string |
| B64 | (not used) | |
| dt | RFC 3339 date/time | number (int or float) |
| DT | " | Tag 1 on the above |
| ip | IP address or prefix | byte string, array of length and byte string |
| IP | " | Tag 54 (IPv6) or 52 (IPv4) on the above |
| hash | string (usually used with sequences) | byte string |
| HASH | (not used) | |
| t1 | strings (usually used with sequences) | text string |
| T1 | (not used) | |
| b1 | strings (usually used with sequences) | byte string |
| B1 | (not used) | |
| cri | RFC 3986 URI or URI reference | CBOR structure representing equivalent CRI |
| CRI | " | Tag 99 on the above |
| float | floating point value from input bytes | floating point value (mt=7) |
| FLOAT | (not used) | |
Note that implementation platforms may already provide implementations
of grammars used in application-extensions, such as of RFC 3339 for
dt'' and of IP address syntax for ip''.
CDN-based tools may want to use these implementation libraries instead
of using the grammars that are provided here as a reference.¶
For convenience, the common definitions in Figure 2 are not repeated in the ABNF grammars below.¶
ALPHA = %x41-5a / %x61-7a
DIGIT = %x30-39 ; 0-9
HEXDIG = DIGIT / HEXDIGA
HEXDIGA = "A" / "B" / "C" / "D" / "E" / "F"
; Note: double-quoted strings as in "A" are case-insensitive in ABNF
lblank = %x0A / %x20 ; Not HT or CR (gone)
non-lf = %x20-7f / NONASCII
NONASCII = %x80-D7FF / %xE000-10FFFF
The syntax of the content of byte strings represented in hex,
such as h'', h'0815', or h'/head/ 63 /contents/ 66 6f 6f'
(another representation of << "foo" >>), is described by the ABNF in Figure 3.
This syntax accommodates both lowercase and uppercase hex digits, as
well as blank space (including comments) around each hex digit.¶
app-string-h = S *(HEXDIG S HEXDIG S / ellipsis S)
[eol-comment *non-lf]
ellipsis = 3*"."
non-slash = lblank / %x21-2e / %x30-7f / NONASCII
non-slash-star = lblank / %x21-29 / %x2b-2e / %x30-7f / NONASCII
non-star = lblank / %x21-29 / %x2b-7f / NONASCII
ends-in-star = *non-star 1*"*"
non-lf = %x20-7f / NONASCII
eol-comment = "#" / "//"
S = *lblank *(comment *lblank)
comment = "/" non-slash-star *non-slash "/"
/ "/*" ends-in-star
*(non-slash-star ends-in-star) "/"
/ eol-comment *non-lf %x0A
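As a rough illustration of the grammar above, the following Python sketch decodes the (already unescaped) content of an h literal by removing comments and blank space and then reading pairs of hex digits. The function name `decode_hex_content` and the regular expression are assumptions of this example; ellipses are rejected rather than represented.

```python
import re

# C-style comments, end-of-line comments ("//" or "#"), and slash-delimited
# inline comments, tried in this order at each position.
_COMMENT = re.compile(
    r"/\*.*?\*/|(?://|#)[^\n]*(?:\n|$)|/[^/*][^/]*/", re.DOTALL)


def decode_hex_content(content):
    """Return the bytes denoted by the content of an h'' literal."""
    without_comments = _COMMENT.sub(" ", content)
    digits = re.sub(r"[\n ]", "", without_comments)   # lblank only
    if not re.fullmatch(r"(?:[0-9A-Fa-f]{2})*", digits):
        raise ValueError("content is not a sequence of hex digit pairs")
    return bytes.fromhex(digits)
```

Applied to the content `/head/ 63 /contents/ 66 6f 6f`, the sketch returns the four bytes 63 66 6f 6f.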
The syntax of the content of byte strings represented in base64 is described by the ABNF in Figure 4.¶
This syntax allows both the classic (Section 4 of [RFC4648]) and the
URL-safe (Section 5 of [RFC4648]) alphabet to be used.
It accommodates, but does not require, base64 padding.
Note that inclusion of classic base64 makes it impossible to have
comments based on slash characters in b64, as "/" is valid base64-classic.¶
app-string-b64 = B *(4(b64dig B))
[b64dig B b64dig B ["=" B "=" / b64dig B ["="]] B]
["#" *non-lf]
b64dig = ALPHA / DIGIT / "-" / "_" / "+" / "/"
B = *lblank *(comment *lblank)
comment = "#" *non-lf %x0A
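A possible way to decode the content of a b64 literal with a platform base64 library is sketched below in Python. The function name `decode_b64_content` is an assumption of this example. The sketch maps the URL-safe alphabet onto the classic one and supplies missing padding; it is more lenient than the ABNF about the amount of trailing padding.

```python
import base64
import re


def decode_b64_content(content):
    """Return the bytes denoted by the content of a b64'' literal."""
    without_comments = re.sub(r"#[^\n]*(?:\n|$)", "", content)
    compact = re.sub(r"[\n ]", "", without_comments).rstrip("=")
    if not re.fullmatch(r"[A-Za-z0-9+/_-]*", compact) or len(compact) % 4 == 1:
        raise ValueError("not valid base64 content")
    # Map the URL-safe alphabet to the classic one, then restore padding.
    classic = compact.translate(str.maketrans("-_", "+/"))
    padded = classic + "=" * (-len(classic) % 4)
    return base64.b64decode(padded, validate=True)
```

Because "/" is a base64 digit, only "#" comments are removed here, in line with the grammar above.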
The syntax of the content of dt literals can be described by the
ABNF for date-time in Figure 5.
This is derived from [RFC3339] as summarized in Section 3 of [RFC9165].¶
app-string-dt = date-time
date-fullyear = 4DIGIT
date-month = 2DIGIT ; 01-12
date-mday = 2DIGIT ; 01-28, 01-29, 01-30, 01-31 based on
; month/year
time-hour = 2DIGIT ; 00-23
time-minute = 2DIGIT ; 00-59
time-second = 2DIGIT ; 00-58, 00-59, 00-60 based on leap sec
; rules
time-secfrac = "." 1*DIGIT
time-numoffset = ("+" / "-") time-hour ":" time-minute
time-offset = "Z" / time-numoffset
partial-time = time-hour ":" time-minute ":" time-second
[time-secfrac]
full-date = date-fullyear "-" date-month "-" date-mday
full-time = partial-time time-offset
date-time = full-date "T" full-time
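The conversion of dt content into an epoch-based number can be sketched in Python as follows. The function name `dt_to_epoch` is an assumption of this example. The sketch returns an integer when no fractional seconds are given and a floating point value otherwise; it relies on the platform calendar and therefore rejects a seconds value of 60.

```python
import re
from datetime import datetime, timedelta, timezone

_DT = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]"
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?"
    r"([Zz]|[+-][0-9]{2}:[0-9]{2})")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dt_to_epoch(content):
    """Return seconds since 1970-01-01T00:00:00Z as int or float."""
    match = _DT.fullmatch(content)
    if not match:
        raise ValueError(f"not an RFC 3339 date-time: {content!r}")
    fields = [int(group) for group in match.groups()[:6]]
    fraction, offset = match.group(7), match.group(8)
    if offset in ("Z", "z"):
        delta = timedelta(0)
    else:
        sign = -1 if offset[0] == "-" else 1
        delta = sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
    moment = datetime(*fields, tzinfo=timezone(delta))
    seconds = (moment - _EPOCH) // timedelta(seconds=1)
    return seconds + float(fraction) if fraction else seconds
```

For example, the content `1969-07-21T02:56:16Z` maps to the integer -14159024.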
The syntax of the content of ip literals can be described by the
ABNF for IPv4address and IPv6address in Section 3.2.2 of [RFC3986],
as included in slightly updated form in Figure 6.¶
app-string-ip = IPaddress ["/" uint]
IPaddress = IPv4address
/ IPv6address
; ABNF from RFC 3986, re-arranged for PEG compatibility:
IPv6address = 6( h16 ":" ) ls32
/ "::" 5( h16 ":" ) ls32
/ [ h16 ] "::" 4( h16 ":" ) ls32
/ [ h16 *1( ":" h16 ) ] "::" 3( h16 ":" ) ls32
/ [ h16 *2( ":" h16 ) ] "::" 2( h16 ":" ) ls32
/ [ h16 *3( ":" h16 ) ] "::" h16 ":" ls32
/ [ h16 *4( ":" h16 ) ] "::" ls32
/ [ h16 *5( ":" h16 ) ] "::" h16
/ [ h16 *6( ":" h16 ) ] "::"
h16 = 1*4HEXDIG
ls32 = ( h16 ":" h16 ) / IPv4address
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet = "25" %x30-35 ; 250-255
/ "2" %x30-34 DIGIT ; 200-249
/ "1" 2DIGIT ; 100-199
/ %x31-39 DIGIT ; 10-99
/ DIGIT ; 0-9
DIGIT1 = %x31-39 ; 1-9
uint = "0" / DIGIT1 *DIGIT
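Where a platform library is used instead of this grammar, the content of an ip literal might be processed as in the following Python sketch. The function name `ip_content_to_data` is hypothetical. For prefixes, the sketch clears the bits beyond the prefix length and drops trailing zero bytes; this follows the author's understanding of the prefix format of RFC 9164 and is an assumption, not something stated in this section. The set of address strings accepted by the `ipaddress` module may also differ slightly from the ABNF.

```python
import ipaddress
import re


def ip_content_to_data(content):
    """Return bytes for an address, or [length, bytes] for a prefix."""
    address_text, slash, length_text = content.partition("/")
    address = ipaddress.ip_address(address_text)
    if not slash:
        return address.packed
    if not re.fullmatch(r"0|[1-9][0-9]*", length_text):   # ABNF rule uint
        raise ValueError("prefix length is not a uint")
    length = int(length_text)
    total_bits = address.max_prefixlen                    # 32 or 128
    if length > total_bits:
        raise ValueError("prefix length too large")
    # Clear the host bits, then strip trailing zero bytes (assumption).
    shift = total_bits - length
    masked = (int(address) >> shift) << shift
    return [length, masked.to_bytes(total_bits // 8, "big").rstrip(b"\x00")]
```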
It can be expected that implementations of the application-extension
identifier "cri" will make use of platform-provided URI
implementations, which will include a URI parser.¶
In case such a URI parser is not available or inconvenient to
integrate,
a grammar of the content of cri literals is provided by the
ABNF for URI-reference in Section 4.1 of RFC 3986 [RFC3986] with certain
re-arrangements taken from Section 5.2.4;
these are reproduced in Figure 7.
If the content is not ASCII only (i.e., for IRIs), first apply
Section 3.1 of [RFC3987] and apply this grammar to the result.¶
app-string-cri = URI-reference
; ABNF from RFC 3986:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
URI-reference = URI / relative-ref
absolute-URI = scheme ":" hier-part [ "?" query ]
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
authority = [ userinfo "@" ] host [ ":" port ]
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
host = IP-literal / IPv4address / reg-name
port = *DIGIT
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
; Use IPv6address, h16, ls32, IPv4address, dec-octet as re-arranged
; for PEG Compatibility in Figure 6 of [RFC XXXX]:
IPv6address = 6( h16 ":" ) ls32
/ "::" 5( h16 ":" ) ls32
/ [ h16 ] "::" 4( h16 ":" ) ls32
/ [ h16 *1( ":" h16 ) ] "::" 3( h16 ":" ) ls32
/ [ h16 *2( ":" h16 ) ] "::" 2( h16 ":" ) ls32
/ [ h16 *3( ":" h16 ) ] "::" h16 ":" ls32
/ [ h16 *4( ":" h16 ) ] "::" ls32
/ [ h16 *5( ":" h16 ) ] "::" h16
/ [ h16 *6( ":" h16 ) ] "::"
h16 = 1*4HEXDIG
ls32 = ( h16 ":" h16 ) / IPv4address
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet = "25" %x30-35 ; 250-255
/ "2" %x30-34 DIGIT ; 200-249
/ "1" 2DIGIT ; 100-199
/ %x31-39 DIGIT ; 10-99
/ DIGIT ; 0-9
reg-name = *( unreserved / pct-encoded / sub-delims )
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
segment = *pchar
segment-nz = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
; non-zero-length segment without any colon ":"
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
query = *( pchar / "/" / "?" )
fragment = *( pchar / "/" / "?" )
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
For some applications of CDN, it is an optimization to integrate
parsers for the content of some prefixed string literals into
the main parser, handling both the string literal syntax (e.g., escapes such
as \' and \\) and the syntax of the extension content in one go.¶
For application-extensions that only use printable ASCII characters
(from U+0020 to U+007E) minus single-quote ' and backslash \, the ABNF
such as that given in Section 5.2 can be directly used as an
integrated parser, after adding some glue ABNF.
For instance, for app-string-dt, add an alternative to bstr that
points to a rule for prefixed single-quoted string literals (Figure 8).¶
bstr = sq-app-string-dt /
app-string / sqstr / app-sequence / embedded
sq-app-string-dt = (%s"dt'"/%s"DT'") app-string-dt "'"
To facilitate writing integrated ABNF for more complex prefixed string literals, the ABNF definitions in Figure 9 may be useful and are used in the rest of this section.¶
i-HT = %s"\t" / %s"\u" ("0009" / "{" *("0") "9}")
i-LF = %x0a / %s"\n" / %s"\u" ("000A" / "{" *("0") "A}")
i-CR = %x0d / %s"\r" / %s"\u" ("000D" / "{" *("0") "D}")
i-blank = i-LF / i-CR / " "
i-non-lf = i-HT / i-CR / %x20-26 / "\'" / %x28-5b
/ "\\" / %x5d-7f / i-NONASCII
i-NONASCII = NONASCII / %s"\u" ESCGE7F
; hex escaping for U+007F or greater
ESCGE7F = "D" ("8"/"9"/"A"/"B") 2HEXDIG
%s"\u" "D" ("C"/"D"/"E"/"F") 2HEXDIG
/ FOURHEX1 / "0" HEXDIG1 2HEXDIG / "00" TWOHEX1
/ "{" *("0")
("10" 4HEXDIG / HEXDIG1 4HEXDIG
/ FOURHEX1 / HEXDIG1 2HEXDIG / TWOHEX1)
"}"
; xxxx - 0xxx - Dhigh\uDlow
FOURHEX1 = (DIGIT1 / "A"/"B"/"C" / "E"/"F") 3HEXDIG
/ "D" ODIGIT 2HEXDIG
; 00xx - ASCII + 007F
TWOHEX1 = ("8"/"9" / HEXDIGA) HEXDIG / "7F"
Similarly, for integrated parsers for extension literals built from raw strings, the ABNF
definitions in Figure 10 can be useful.
alikerawdelim only matches sequences of backquotes that are exactly as
long as a previous startrawdelim.¶
r-non-lf = %x0D / %x20-5f / %x61-7f / NONASCII / shortrawdelim
Four subsections with ABNF for integrated parsers follow, a pair for
h'' and b64'', and a pair for h`` and b64``.
There is no expectation for a new application-extension to supply ABNF
for an integrated parser (or any ABNF at all!), in particular if the
parsing function is likely to be fulfilled by a platform library.
If ABNF for the content of a single-quoted string is available in an
application-extension specification, ABNF for an integrated parser can
be written as a separate activity or also automatically derived (see
also [CDN-WIKI], where more information about implementing integrated
parsers is being collected).¶
With glue ABNF similar to that in Figure 8 and common
definitions in Figures 2 and 9, ABNF such as
that shown in Figure 11 can be used as an integrated parser
for h prefixed single-quote strings.¶
sq-app-string-h = %s"h'" s-app-string-h "'"
s-app-string-h = h-S *(HEXDIG h-S HEXDIG h-S / ellipsis h-S)
[eol-comment *i-non-lf]
h-S = *(i-blank) *(h-comment *(i-blank))
h-non-slash = i-blank / %x21-26 / "\'" / %x28-2e
/ %x30-5b / "\\" / %x5d-7f / i-NONASCII
h-non-slash-star = i-blank / %x21-26 / "\'" / %x28-29 / %x2b-2e
/ %x30-5b / "\\" / %x5d-7f / i-NONASCII
h-non-star = i-blank / %x21-26 / "\'" / %x28-29 / %x2b-5b
/ "\\" / %x5d-7f / i-NONASCII
h-ends-in-star = *h-non-star 1*"*"
h-comment = "/" h-non-slash-star *h-non-slash "/"
/ "/*" h-ends-in-star
*(h-non-slash-star h-ends-in-star) "/"
/ eol-comment *i-non-lf i-LF
With glue ABNF similar to that in Figure 8 and common
definitions in Figures 2 and 9, ABNF such as
that shown in Figure 12 can be used as an integrated parser
for b64 prefixed single-quote strings.¶
sq-app-string-b64 = %s"b64'" s-app-string-b64 "'"
s-app-string-b64 = b64-S *(4(b64dig b64-S))
[b64dig b64-S b64dig b64-S
["=" b64-S "=" / b64dig b64-S ["="]] b64-S]
["#" *i-non-lf]
b64dig = ALPHA / DIGIT / "-" / "_" / "+" / "/"
b64-S = *i-blank *(b64-comment *i-blank)
b64-comment = "#" *i-non-lf %x0A
With glue ABNF similar to that in Figure 8 and common
definitions in Figures 2, 9, and 10, ABNF such as that shown in
Figure 13 can be used as an integrated parser for h prefixed raw strings.¶
raw-app-string-h = %s"h" startrawdelim r-app-string-h
r-app-string-h = rh-S *(HEXDIG rh-S HEXDIG rh-S / ellipsis rh-S)
(eol-comment *r-non-lf alikerawdelim / alikerawdelim)
rh-S = *(lblank) *(rh-comment *(lblank))
rh-2 = %x61-7f / NONASCII / shortrawdelim
rh-non-slash = lblank / %x21-2e / %x30-5f / rh-2
rh-non-slash-star = lblank / %x21-29 / %x2b-2e / %x30-5f / rh-2
rh-non-star = lblank / %x21-29 / %x2b-5f / rh-2
rh-ends-in-star = *rh-non-star 1*"*"
rh-comment = "/" rh-non-slash-star *rh-non-slash "/"
/ "/*" rh-ends-in-star
*(rh-non-slash-star rh-ends-in-star) "/"
/ eol-comment *r-non-lf %x0A
With glue ABNF similar to that in Figure 8, common
definitions in Figures 2, 9, and 10, as well as the rule
b64dig from Figure 12, ABNF such as
that shown in Figure 14 can be used as an integrated parser
for b64 prefixed raw strings.¶
raw-app-string-b64 = %s"b64" startrawdelim r-app-string-b64
r-app-string-b64 = rb64-S *(4(b64dig rb64-S))
[b64dig rb64-S b64dig rb64-S
["=" rb64-S "=" / b64dig rb64-S ["="]] rb64-S]
("#" *r-non-lf alikerawdelim / alikerawdelim)
rb64-S = *lblank *(rb64-comment *lblank)
rb64-comment = "#" *r-non-lf %x0A
RFC Editor: please replace RFC-XXXX with the RFC number of this RFC, [IANA.concise-diagnostic-notation] with a reference to the new registry group, and remove this note.¶
IANA is requested to create an "Application-Extension Identifiers" registry in a new "Concise Diagnostic Notation" registry group [IANA.concise-diagnostic-notation], with the policy "expert review" (Section 4.5 of RFC 8126 [BCP26]).¶
The experts are instructed to be frugal in the allocation of application-extension identifiers that are suggestive of generally applicable semantics, keeping them in reserve for application-extensions that are likely to enjoy wide use and can make good use of their conciseness. The experts are also instructed to direct the registrant to provide a specification (Section 4.6 of RFC 8126 [BCP26]), but can make exceptions, for instance when a specification is not available at the time of registration but is likely forthcoming. If the experts become aware of application-extension identifiers that are deployed and in use, they may also initiate a registration on their own if they deem such a registration can avert potential future collisions.¶
Each entry in the registry must include:¶
a lowercase ASCII [STD80] string that starts with a letter and can
contain letters, digits, and hyphens after that ([a-z][a-z0-9-]*).
No other entry in the registry can have the same
application-extension identifier.¶
a brief description¶
a reference document that provides a description of the application-extension identifier¶
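The syntax and uniqueness requirements for an entry can be checked mechanically. The following Python sketch uses a plain dictionary as a stand-in for the registry; the function name `register` and the data layout are assumptions of this example and do not describe IANA tooling.

```python
import re

_IDENTIFIER = re.compile(r"[a-z][a-z0-9-]*")


def register(registry, identifier, description, reference):
    """Add an entry after checking identifier syntax and uniqueness."""
    if not _IDENTIFIER.fullmatch(identifier):
        raise ValueError(f"invalid application-extension identifier: {identifier!r}")
    if identifier in registry:
        raise ValueError(f"identifier already registered: {identifier!r}")
    registry[identifier] = (description, reference)
```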
The initial content of the registry is shown in Table 8; all initial entries have the Change Controller "IETF".¶
| Application-extension Identifier | Description | Reference |
|---|---|---|
| h | Reserved | RFC8949 |
| b32 | Reserved | RFC8949 |
| h32 | Reserved | RFC8949 |
| b64 | Reserved | RFC8949 |
| false | Reserved | RFC-XXXX |
| true | Reserved | RFC-XXXX |
| null | Reserved | RFC-XXXX |
| undefined | Reserved | RFC-XXXX |
| pragma | Reserved for future use | RFC-XXXX |
| dt | Date/Time | RFC-XXXX |
| ip | IP Address/Prefix | RFC-XXXX |
| hash | Cryptographic Hash | RFC-XXXX |
| b1 | Byte String Concatenation | RFC-XXXX |
| t1 | Text String Concatenation | RFC-XXXX |
| cri | Constrained Resource Identifier | RFC-XXXX, [I-D.ietf-core-href] |
| float | Floating-Point Value | RFC-XXXX |
IANA is requested to create an "Encoding Indicators" registry in the newly created "Concise Diagnostic Notation" registry group [IANA.concise-diagnostic-notation], with the policy "specification required" (Section 4.6 of RFC 8126 [BCP26]).¶
The experts are instructed to be frugal in the allocation of encoding indicators that are suggestive of generally applicable semantics, keeping them in reserve for encoding indicator registrations that are likely to enjoy wide use and can make good use of their conciseness. If the experts become aware of encoding indicators that are deployed and in use, they may also solicit a specification and initiate a registration on their own if they deem such a registration can avert potential future collisions.¶
Each entry in the registry must include:¶
an ASCII [STD80] string that starts with an underscore and
can contain zero or more underscores, letters and digits after that
(_[_A-Za-z0-9]*). No other entry in the registry can have the same
Encoding Indicator.¶
a brief description.
This description may employ an abbreviation of the form ai=nn,
where nn is the numeric value of the field additional information, the
low-order 5 bits of the initial byte (see Section 3 of RFC 8949 [STD94]).¶
a reference document that provides a description of the encoding indicator¶
The initial content of the registry is shown in Table 9; all initial entries have the Change Controller "IETF".¶
| Encoding Indicator | Description | Reference |
|---|---|---|
| _ | Indefinite-Length Encoding (ai=31) | RFC8949, RFC-XXXX |
| _i | ai=0 to ai=23 | RFC-XXXX |
| _0 | ai=24 | RFC8949, RFC-XXXX |
| _1 | ai=25 | RFC8949, RFC-XXXX |
| _2 | ai=26 | RFC8949, RFC-XXXX |
| _3 | ai=27 | RFC8949, RFC-XXXX |
| _4 | Reserved (for ai=28) | RFC-XXXX |
| _5 | Reserved (for ai=29) | RFC-XXXX |
| _6 | Reserved (for ai=30) | RFC-XXXX |
| _7 | Reserved (see _) | RFC8949, RFC-XXXX |
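To illustrate how the initial encoding indicators map to the additional information field, the following Python sketch builds a CBOR head under the control of an indicator. The function name `encode_head_with_indicator` is an assumption of this example; the indefinite-length indicator "_" and the reserved indicators are not handled.

```python
# Indicator -> (additional information value, argument size in bytes)
_SIZED = {"_0": (24, 1), "_1": (25, 2), "_2": (26, 4), "_3": (27, 8)}


def encode_head_with_indicator(major, argument, indicator):
    """Encode a CBOR head using the argument size the indicator asks for."""
    if indicator == "_i":
        if not 0 <= argument < 24:
            raise ValueError("_i requires an argument in the range 0..23")
        return bytes([major << 5 | argument])
    if indicator not in _SIZED:
        raise ValueError(f"unsupported encoding indicator: {indicator!r}")
    ai, size = _SIZED[indicator]
    # int.to_bytes raises OverflowError if the argument does not fit.
    return bytes([major << 5 | ai]) + argument.to_bytes(size, "big")
```

For instance, a tag number 2 written with the indicator _3 results in the head db 00 00 00 00 00 00 00 02.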
IANA is requested to add the following Media-Type to the "Media Types" registry [IANA.media-types].¶
| Name | Template | Reference |
|---|---|---|
| cdn | application/cdn | RFC-XXXX, Section 6.3 |
application¶
cdn¶
N/A¶
N/A¶
binary (UTF-8)¶
none¶
Section 6.3 of RFC XXXX¶
Tools interchanging a human-readable form of CBOR¶
The syntax and semantics of fragment identifiers is as specified for "application/cbor". (At publication of RFC XXXX, there is no fragment identification syntax defined for "application/cbor".)¶
CBOR WG mailing list (cbor@ietf.org), or IETF Applications and Real-Time Area (art@ietf.org)¶
LIMITED USE¶
Concise diagnostic notation represents CBOR data items, which are the format intended for actual interchange. The media type application/cdn is intended to be used within documents about CBOR data items, in diagnostics for human consumption, and in other representations of CBOR data items that are necessarily text-based such as in configuration files or other data edited by humans, often under source-code control.¶
IETF¶
no¶
IANA is requested to register a Content-Format number in the "CoAP Content-Formats" sub-registry, within the "Constrained RESTful Environments (CoRE) Parameters" Registry [IANA.core-parameters], as follows:¶
| Content-Type | Content Coding | ID | Reference |
|---|---|---|---|
| application/cdn | - | TBD1 | RFC-XXXX |
TBD1 is to be assigned from the space 256..9999, according to the procedure "IETF Review or IESG Approval", preferably a number less than 1000.¶
The security considerations of [STD94] apply, including by applying the considerations about the CBOR format to the CDN format in an analogous sense. Security considerations documented in [RFC8610] for the CDDL language often are also applicable to the CDN language in an analogous sense.¶
The CDN specification defines two explicit extension points: application-extension identifiers (Section 6.1) and encoding indicators (Section 6.2). Extensions introduced through these can have their own security considerations, which need to be considered in the specification for the extension (see, e.g., Section 5 of [I-D.ietf-cbor-edn-e-ref]).¶
Implementers of tools that support the use of CDN extensions need to avoid inadvertently introducing a vector that allows attackers to invoke extensions not planned for by the tool operator, who might not have considered security considerations of specific extensions such as those posed by their use of dereferenceable identifiers (Section 6 of [I-D.bormann-t2trg-deref-id]).¶
Tools might require explicitly enabling the use of each extension that is not on an allowlist. (This task can possibly be made less onerous by combining it with a mechanism for supplying any parameters that control such an extension.)¶
Tools that process application extensions — directly from their use in CDN or later via Tag CPA999 (Section 4.1) — need to be configured out of band to enable processing each specific application extension only if that is desired. An allowlist built out of the mandatory-to-implement application extensions may be an exception.¶
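One way a tool could implement such explicit enabling is sketched below in Python. The class name `ExtensionDispatcher`, its interface, and the identifier "x-demo" are hypothetical and serve only to illustrate the allowlist check; they are not part of this specification.

```python
class ExtensionDispatcher:
    """Apply application-extension handlers only if explicitly enabled."""

    def __init__(self, handlers, enabled=()):
        self._handlers = dict(handlers)   # identifier -> callable(content)
        self._enabled = set(enabled)      # allowlist set by the tool operator

    def apply(self, identifier, content):
        # Refuse before looking at the handler, so that an extension that is
        # merely installed can never be triggered by the input alone.
        if identifier not in self._enabled:
            raise PermissionError(f"extension {identifier!r} is not enabled")
        return self._handlers[identifier](content)


# Usage: "x-demo" is installed but not on the allowlist.
dispatcher = ExtensionDispatcher(
    {"h": bytes.fromhex, "x-demo": str.upper}, enabled={"h"})
```

Keeping the allowlist separate from the set of installed handlers means that adding a handler to a tool does not by itself widen what untrusted input can invoke.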
Similarly, inputs to validators may be prepared with partially specified subtrees by representing ellipses via Tag CPA888 (Section 4.2). Validators that want to accept such partially specified CBOR data items need to require explicit configuration to do so.¶
This appendix is for information.¶
CDN was designed as a language to provide a human-readable representation of an instance, i.e., a single CBOR data item or CBOR sequence. CDDL was designed as a language to describe an (often large) set of such instances (which itself constitutes a language), in the form of a data definition or grammar (or sometimes called schema).¶
The two languages share some similarities, not the least because they have mutually inspired each other. But they have very different roots:¶
CDN syntax is an extension to JSON syntax [STD90].
(Any (interoperable) JSON text is also valid CDN.)¶
For engineers that are using both CDN and CDDL, it is easy to write "CDDLisms" or "CDNisms" into their drafts that are meant to be in the other language. (This is one more of the many motivations to always validate formal language instances with tools.)¶
Important differences include:¶
Comment syntax. CDDL inherits ABNF's semicolon-delimited end-of-line
comments, while CDN finds nothing in JSON that could be inherited here.
Inspired by JavaScript, CDN simplifies JavaScript's copy of the
original C comment syntax to be delimited by single slashes (where
line breaks are not of interest); it also adds traditional C-style
inline comments (/* ... */) and end-of-line comments
that start with # or //.¶
Syntax for tags. CDDL's tag syntax is part of the system for referring to CBOR's fundamentals (the major type 6, in this case) and (with [RFC9682]) allows specifying the actual tag number separately, while CDN's tag syntax is a simple decimal number and a pair of parentheses.¶
Embedded CBOR. CDN has a special syntax to describe the content of byte strings that are encoded CBOR data items. CDDL can specify these with a control operator, which looks very different.¶
The concept of application-oriented extensions to diagnostic notation, as well as the definition for the "dt" extension, were inspired by the CoRAL work by Klaus Hartke.¶
(TBD)¶
2.2. Comments
For presentation to humans, CDN text may benefit from comments. JSON famously does not provide for comments, and the original diagnostic notation in Section 6 of [RFC7049] inherited this property.¶
CDN provides two comment syntaxes, which can be used where the syntax allows blank space (outside of constructs such as numbers, string literals, etc.):¶
inline comments, delimited by slashes ("/") or by C-style "/*" and "*/":¶
In a position that allows blank space, each of the following is considered blank space (and thus effectively a comment):¶
any text that starts with a slash followed by a character that is not a star or a slash, up to another slash, or¶
any text that starts with "/*" up to and including the next following "*/"¶
end-of-line comments, delimited by "#" or "//" and an end of line (LINE FEED, U+000A):¶
In a position that allows blank space, any text starting with "#" or "//" and ending with and including the end of the line is considered blank space (and thus effectively a comment).¶
Comments can be used to annotate a CBOR structure as in:¶
/grasp-message/ [/M_DISCOVERY/ 1, /session-id/ 10584416,
                 /objective/ [/objective-name/ "opsonize",
                              /D, N, S/ 7, /loop-count/ 105]]¶
This reduces to [1, 10584416, ["opsonize", 7, 105]].¶
Another example, combining the use of inline and end-of-line comments:¶
{
  /kty/ 1 : 4, # Symmetric
  /alg/ 3 : 5, # HMAC 256-256
  /k/ -1 : h'6684523ab17337f173500e5728c628547cb37df
             e68449c65f885d1b73b49eae1'
}¶
This reduces to {1: 4, 3: 5, -1: h'6684523AB17337F173500E5728C628547CB37DFE68449C65F885D1B73B49EAE1'}.¶
A CDN file used for configuration might look like this (employing '//' end of line comments throughout and an ornamental C-Style comment at the start):¶
/* ### MyApp Configuration
 * John Example, 2026-06-09
 */
{
  // Top-level config for the app
  "appName": "MyApp", // short name shown in UI
  "version": "1.2.0",
  ...: ...
}¶
2.2.1. Discussion
Appendix G.6 of [RFC8610] introduced comments into the diagnostic notation syntax, limited to inline comments using a bare "/" as the comment delimiter. It however also hinted at the potential desire to add end-of-line comments, mentioning both "//" and "#" as start delimiters.¶
The present specification adds both, as well as C-style inline comments ("/*" and "*/" delimiters).¶
This introduces a backwards-incompatible change, restricting slash-delimited comments that were allowed by Appendix G.6 of [RFC8610] in two ways:¶
Inline comments no longer can be empty: The construct "//" that was an empty comment in Appendix G.6 of [RFC8610] is now used instead to introduce an end-of-line comment. (Note that "//" still can be used in what is visually "within" a slash-delimited comment like in the second example below; its first slash actually ends the current comment and the second slash starts a new one.)¶
Enabling the use of C-style inline comments can extend the scope of what previously were parsed as slash-delimited comments: for instance, "/*foo/" was a complete comment in Appendix G.6 of [RFC8610] and now is the beginning of a C-style comment that goes on up to a "*/".¶
As an example for what is enabled by this change, the introduction of C-style inline comments enables a comment explaining a COSE algorithm identifier, as in¶
instead of the previously conventional, but often less familiar¶