| Internet-Draft | CSV++ | January 2026 |
| Caldas | Expires 11 July 2026 | [Page] |
This document specifies CSV++ (CSV Plus Plus), an extension to the Comma-Separated Values (CSV) format defined in RFC 4180. CSV++ adds support for repeating fields (one-to-many relationships) and hierarchical component structures while maintaining backward compatibility with standard CSV parsers. The extension uses declarative syntax in column headers to define array fields and nested structures, enabling representation of complex real-world data while preserving the simplicity and human-readability of CSV.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 11 July 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
CSV++ extends the CSV format defined in [RFC4180] to support repeating fields (one-to-many relationships) and hierarchical component structures while maintaining backward compatibility with standard CSV parsers.¶
Traditional CSV files represent flat, tabular data. However, real-world data often contains:¶
CSV++ addresses these needs with a straightforward syntax that keeps the simplicity and human readability of CSV.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
CSV++ files MUST conform to [RFC4180] with these specifications:¶
The field separator character is detected using the same rules as [RFC4180]. Parsers SHOULD auto-detect the field separator by:¶
The comma (,) is the conventional field separator for CSV++ files.¶
A field containing repeated values is declared in the header using square brackets:¶
column_name[delimiter]
column_name[]
Where:¶
Delimiter Resolution:¶
The tilde (~) is recommended as the default array delimiter to avoid conflicts with common data characters and the field separator.¶
id,name,phone[|],email[;]
1,John,555-1234|555-5678|555-9012,john@work.com;john@home.com
2,Jane,555-4444,jane@company.com
id,name,phone[],email[]
1,John,555-1234~555-5678~555-9012,john@work.com~john@home.com
2,Jane,555-4444,jane@company.com
Empty values in repetitions are represented by consecutive delimiters:¶
id,tags[|]
1,urgent||priority
This represents three tags: "urgent", "" (empty), and "priority".¶
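The array declarations above can be read with a short Python sketch. It is illustrative only: the names `read_arrays`, `ARRAY_HEADER`, and `DEFAULT_ARRAY_DELIM` are hypothetical, the default delimiter is taken from the recommended tilde, and the sketch assumes comma-separated input and handles array columns only, not structures.

```python
import csv
import io
import re

DEFAULT_ARRAY_DELIM = "~"  # assumption: the recommended default is used for name[]

# Matches name[d] or name[]; the name characters follow the ABNF field-char rule.
ARRAY_HEADER = re.compile(r"^(?P<name>[A-Za-z0-9_-]+)\[(?P<delim>.?)\]$")


def read_arrays(text):
    """Return one dict per data row; array columns become lists of strings."""
    reader = csv.reader(io.StringIO(text))
    columns = []
    for col in next(reader):
        m = ARRAY_HEADER.match(col)
        if m:
            columns.append((m["name"], m["delim"] or DEFAULT_ARRAY_DELIM))
        else:
            columns.append((col, None))  # plain column, kept as a string
    records = []
    for row in reader:
        record = {}
        for (name, delim), cell in zip(columns, row):
            # str.split keeps empty strings, so consecutive delimiters
            # yield empty elements as described above.
            record[name] = cell.split(delim) if delim else cell
        records.append(record)
    return records


sample = "id,tags[|]\n1,urgent||priority\n"
print(read_arrays(sample))
```

Because `str.split` preserves empty strings, the empty middle tag survives as an empty element rather than being dropped.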
A field containing structured components is declared using parentheses:¶
column_name[repetition_delim]component_delim(comp1 component_delim comp2 ...)
column_name[]component_delim(comp1 component_delim comp2 ...)
column_name[](comp1 component_delim comp2 ...)
column_name(comp1 component_delim comp2 ...)
Component Delimiter Resolution:¶
The caret (^) is recommended as the default component delimiter to avoid conflicts with common data characters.¶
id,name,geo^(lat^lon)
1,Location A,34.0522^-118.2437
2,Location B,40.7128^-74.0060
id,name,address[~]^(street^city^state^zip)
1,John,123 Main St^Los Angeles^CA^90210~456 Oak Ave^New York^NY^10001
2,Jane,789 Pine St^Boston^MA^02101
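For the two non-nested examples above, a cell can be mapped to named components with a single regular expression. The following Python sketch is one possible approach, not a reference implementation: `STRUCT_HEADER` and `parse_struct_cell` are hypothetical names, the defaults "~" and "^" are assumed from the recommended delimiters, and nested component lists are deliberately not handled.

```python
import re

# name, optional [rep], optional component delimiter, then a flat (a^b^c) list.
STRUCT_HEADER = re.compile(
    r"^(?P<name>[A-Za-z0-9_-]+)"
    r"(?:\[(?P<rep>.?)\])?"
    r"(?P<comp>[^(]?)\((?P<body>[^()]*)\)$"
)


def parse_struct_cell(header, cell):
    """Map one cell to a dict (or a list of dicts for repeated structures)."""
    m = STRUCT_HEADER.match(header)
    if m is None:
        raise ValueError(f"not a flat structure declaration: {header!r}")
    comp = m["comp"] or "^"           # assumed default component delimiter
    names = m["body"].split(comp)

    def one(item):
        return dict(zip(names, item.split(comp)))

    if m["rep"] is None:              # no [...] part: a single structure
        return one(cell)
    rep = m["rep"] or "~"             # assumed default repetition delimiter
    return [one(item) for item in cell.split(rep)]


print(parse_struct_cell("geo^(lat^lon)", "34.0522^-118.2437"))
print(parse_struct_cell(
    "address[~]^(street^city^state^zip)",
    "123 Main St^Los Angeles^CA^90210~456 Oak Ave^New York^NY^10001",
))
```

All values stay strings; converting `lat` or `zip` to another type is left to the application.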
Structures can nest to arbitrary depth. A component can itself be declared as an array or a structure: inside the parentheses (...), the array and structure syntax applies recursively.¶
id,name,address[~]^(type^lines[;]^city^state^zip)
1,John,home^123 Main;Apt 4^LA^CA^90210~work^456 Oak^NY^NY^10001
id,location^(name^coords:(lat:lon))
1,Office^34.05:-118.24
2,Home^40.71:-74.00
To maintain readability and parseability:¶
CSV++ parsers process files in two phases:¶
The ABNF grammar in Appendix A provides a formal specification. Implementations MUST handle nesting up to their documented depth limit.¶
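A recursive parser with an explicit depth limit might look like the Python sketch below. It assumes the two phases are header parsing followed by data parsing, that omitted delimiters fall back to "~" and "^", and that each nesting level uses a delimiter distinct from the enclosing levels. `Field`, `parse_field`, `parse_value`, and `MAX_DEPTH` are hypothetical names, and the limit of 8 is an arbitrary illustrative value.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_DEPTH = 8  # hypothetical documented limit
NAME_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")


@dataclass
class Field:
    name: str
    array_delim: Optional[str] = None  # None means "not an array"
    comp_delim: Optional[str] = None   # None means "not a structure"
    components: List["Field"] = field(default_factory=list)


def parse_field(text, pos=0, depth=0):
    """Phase 1: parse one declaration at text[pos]; return (Field, next_pos)."""
    if depth > MAX_DEPTH:
        raise ValueError("nesting exceeds MAX_DEPTH")
    start = pos
    while pos < len(text) and text[pos] in NAME_CHARS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected a name at offset {start}")
    node = Field(text[start:pos])
    if text.startswith("[", pos):                    # array: [] or [d]
        close = pos + 1 if text.startswith("]", pos + 1) else pos + 2
        if not text.startswith("]", close):
            raise ValueError("unterminated '['")
        node.array_delim = text[pos + 1:close] or "~"
        pos = close + 1
    if text.startswith("(", pos):                    # structure, default delimiter
        node.comp_delim, pos = "^", pos + 1
    elif text.startswith("(", pos + 1):              # structure, explicit delimiter
        node.comp_delim, pos = text[pos], pos + 2
    else:
        return node, pos                             # simple or plain array field
    while True:
        child, pos = parse_field(text, pos, depth + 1)
        node.components.append(child)
        if text.startswith(")", pos):
            return node, pos + 1
        if not text.startswith(node.comp_delim, pos):
            raise ValueError(f"expected {node.comp_delim!r} or ')' at offset {pos}")
        pos += 1


def parse_value(node, raw):
    """Phase 2: interpret one raw cell according to its parsed declaration."""
    if node.array_delim is not None:
        item = Field(node.name, None, node.comp_delim, node.components)
        return [parse_value(item, part) for part in raw.split(node.array_delim)]
    if node.comp_delim is not None:
        parts = raw.split(node.comp_delim)
        return {c.name: parse_value(c, p) for c, p in zip(node.components, parts)}
    return raw


decl, _ = parse_field("location^(name^coords:(lat:lon))")
print(parse_value(decl, "Office^34.05:-118.24"))
```

Checking the depth while the header is parsed means an over-deep declaration is rejected before any data row is read.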
Implementations SHOULD validate:¶
Malicious data could attempt to inject delimiters to break parsing. Implementations MUST respect [RFC4180] quoting. Quoted fields MUST be parsed as literal values. Delimiters inside quotes MUST NOT be interpreted as separators.¶
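One way to honor this requirement is to record, for each field, whether it was quoted, and to skip array splitting for quoted fields. The Python sketch below takes that approach; treating a quoted field as a single literal element is one reading of the text above, not something the specification spells out. `split_record` and `split_array` are hypothetical names, and the tokenizer assumes one record per input line (no line breaks inside quotes).

```python
def split_record(line, sep=","):
    """Split one record into (value, was_quoted) pairs, honoring RFC 4180 quoting."""
    fields, buf = [], []
    quoted = in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"' and line[i + 1:i + 2] == '"':
                buf.append('"')          # escaped quote ("") inside a quoted field
                i += 1
            elif ch == '"':
                in_quotes = False        # closing quote
            else:
                buf.append(ch)           # separators are literal while in quotes
        elif ch == '"' and not buf and not quoted:
            in_quotes = quoted = True    # opening quote at the start of a field
        elif ch == sep:
            fields.append(("".join(buf), quoted))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(("".join(buf), quoted))
    return fields


def split_array(value, was_quoted, delim="~"):
    """Assumption: a quoted field is a single literal element and is not split."""
    return [value] if was_quoted else value.split(delim)


for value, was_quoted in split_record('1,"a|b",c|d'):
    print(split_array(value, was_quoted, "|"))
```

The standard `csv` module would also keep the quoted comma intact, but it does not report whether a field was quoted, which is why this sketch tokenizes the record itself.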
Deeply nested or highly repetitive structures could cause excessive memory consumption or CPU exhaustion during parsing.¶
Mitigations:¶
Files SHOULD use UTF-8 encoding. Implementations SHOULD detect and handle encoding issues. A byte order mark (BOM) MAY be present.¶
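In Python, the standard "utf-8-sig" codec accepts UTF-8 input with or without a leading BOM and raises an error on invalid byte sequences, which covers both points in a minimal way. The function name `decode_csvpp` is hypothetical, and rejecting invalid input outright is only one possible way to handle encoding issues.

```python
import codecs


def decode_csvpp(data: bytes) -> str:
    """Decode file bytes as UTF-8, dropping a leading BOM if one is present."""
    # "utf-8-sig" strips the BOM when decoding; invalid UTF-8 raises
    # UnicodeDecodeError, which the caller can report or recover from.
    return data.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + b"id,tags[]\r\n1,a~b\r\n"
without_bom = b"id,tags[]\r\n1,a~b\r\n"
print(decode_csvpp(with_bom) == decode_csvpp(without_bom))
```

Stripping the BOM before header parsing matters because it would otherwise become part of the first column name.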
This document has no IANA actions.¶
CSV++ files use the text/csv media type defined in [RFC4180]. The format is fully backward compatible with standard CSV parsers.¶
csvpp-file = header-row data-rows
header-row = field *(field-sep field) CRLF
data-rows = *(data-row CRLF)
data-row = value *(field-sep value)
field = simple-field / array-field /
        struct-field / array-struct-field
simple-field = name
array-field = name "[" [delimiter] "]"
struct-field = name [component-delim] "(" component-list ")"
array-struct-field = name "[" [delimiter] "]"
                     [component-delim] "(" component-list ")"
component-list = component *(component-delim component)
component = simple-field / array-field /
            struct-field / array-struct-field
name = 1*field-char
field-char = ALPHA / DIGIT / "_" / "-"
delimiter = CHAR
component-delim = CHAR
value = quoted-value / unquoted-value
quoted-value = DQUOTE *(textdata / escaped-quote) DQUOTE
unquoted-value = *textdata
escaped-quote = DQUOTE DQUOTE
textdata = <any character except DQUOTE, CRLF, or field-sep>
id,cust,items[~]^(sku^name^qty^price^opts[;]:(k:v))
1,Alice,S1^Shirt^2^20^sz:M;col:blu~S2^Pant^1^50^sz:32
This specification was inspired by the HL7 Version 2.x delimiter hierarchy and the need for a simple, human-readable format for hierarchical data that maintains compatibility with existing CSV tools.¶