Yet Another Markup Language (YAML) 1.0

Working Draft 10 Dec 2001

Latest version:
https://yaml.org/spec/
Editors:

Brian Ingerson (mailto:ingy@ttul.org)
Clark C. Evans
Oren Ben-Kiki (mailto:oren@ben-kiki.org)

Status of this Document

This specification is a working draft and reflects consensus reached by the members of the yaml-core mailing list. Any questions regarding this draft should be raised on this list. This is a draft and changes are expected; therefore, implementers should closely follow this mailing list to stay up-to-date on trends and announcements.


Abstract

YAML (rhymes with "camel") is a straightforward, machine-parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. This specification describes the YAML information model and serialization format.

Table of Contents

1 Introduction
    1.1 Goals
    1.2 Origin
    1.3 Relation to XML
    1.4 Terminology

2 Preview
    2.1 Collections
    2.2 Structures
    2.3 Styles
    2.4 Type Family
    2.5 Full Length Examples

3 Key Concepts
     3.1 General Concepts
         3.1.1 Type Family
         3.1.2 String Format
     3.2 Graph Model
         3.2.1 Graph Node
         3.2.2 Scalar
         3.2.3 Identity
         3.2.4 Node set
         3.2.5 Collection
         3.2.6 Equality
         3.2.7 Documents
     3.3 Tree Model
         3.3.1 Tree Node
         3.3.2 Leaf
         3.3.3 Alias
         3.3.4 Pair
         3.3.5 Branch
         3.3.6 Ordering
     3.4 Syntax Model
         3.4.1 Style
         3.4.2 Format
         3.4.3 Comment
         3.4.4 Directive

4 Serialization Syntax
     4.1 Characters
         4.1.1 Character Set
         4.1.2 Encoding
         4.1.3 Indicators
         4.1.4 Escape Codes
         4.1.5 Miscellaneous Characters
     4.2 White Space Processing
         4.2.1 Indentation
         4.2.2 End-of-Line Normalization
         4.2.3 Throwaway comments
         4.2.4 Line Folding
     4.3 YAML Stream
         4.3.1 Directive
         4.3.2 Node
         4.3.3 Property
         4.3.4 Transfer Method
         4.3.5 Anchor
     4.4 Alias
     4.5 Collection
         4.5.1 Sequence
         4.5.2 Map
     4.6 Scalar
         4.6.1 Block Scalar
         4.6.2 Folded Scalar
         4.6.3 Escaped Scalar
         4.6.4 Plain Scalar

5 Transfer Methods
     5.1 Explicit Typing
     5.2 Implicit Typing
     5.3 Common Type Families
         5.3.1 Sequence
         5.3.2 Map
         5.3.3 String
         5.3.4 Null
         5.3.5 Pointer
         5.3.6 Integer
         5.3.7 Float
         5.3.8 Binary
         5.3.9 Special Keys
     5.4 Unsupported Transfer Methods

6 Changes From Other Versions

1 Introduction

Yet Another Markup Language, abbreviated YAML, is a human-readable data serialization format and processing model. This text describes the class of data objects called YAML documents and partially describes the behavior of computer programs that process them.

YAML documents encode into a serialized form the native data constructs of modern scripting languages. Strings, arrays, hashes, and other user-defined data types are supported. A YAML document stream consists of a sequence of characters, some of which are considered part of the document's content, and others that are used to indicate structure within the information stream.

A software module called a YAML parser is used to read YAML documents and provide access to their content and structure. In a similar way, a YAML emitter is used to write YAML documents, serializing their content and structure. A YAML processor is a module that provides parser or emitter functionality or both. It is assumed that a YAML processor does its work on behalf of another module, called an application. This specification describes the interface and required behavior of a YAML processor in terms of how it must read or write YAML document streams and the information it must provide to or obtain from the application.

1.1 Goals

The design goals for YAML are:

  1. YAML documents are very readable by humans.

  2. YAML interacts well with scripting languages.

  3. YAML uses host languages' native data structures.

  4. YAML has a consistent information model.

  5. YAML enables stream based processing.

  6. YAML is expressive and extensible.

  7. YAML is easy to implement.

YAML was designed with experience gained from the construction and deployment of Data::Denter. YAML has also enjoyed much markup language critique from SML-DEV list participants, including experience with the Minimal XML and Common XML specifications.

1.2 Origin

YAML integrates and builds upon structures and concepts described by Perl, XML, SOAP, Python, HTML, C, RFC0822, RFC2045 and SAX.

YAML's core type system is based on serialization requirements of the Perl language. YAML directly supports both scalar values (string, integer) and collections (array, hash). Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).

Like XML's SOAP, the YAML serialization supports native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application defined types. This allows YAML to serialize rich data structures required for modern distributed computing.

YAML's block scoping is similar to Python's. In YAML, the extent of a node is indicated by its column. YAML's block scalar leverages this by enabling formatted text to be cleanly mixed within an aggregate structure without troublesome escaping. Further, YAML's block indenting provides for easy inspection of the document's structure.

Motivated by HTML's end of line normalization, YAML's folded scalars introduce a unique method of handling whitespace. In YAML, single line breaks may be folded into a single space. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.

YAML's escaped scalars use familiar C-style escape sequences. These enable an ASCII representation of non-printable or 8-bit (ISO 8859-1) characters using '\x3B'-style escapes, of 16-bit (Unicode) characters using '\u003B'-style escapes, and of 32-bit (ISO/IEC 10646) characters using '\U0000003B'-style escapes.
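As an illustration of the three numeric escape forms named above, here is a minimal Python sketch that replaces '\xHH', '\uHHHH' and '\UHHHHHHHH' sequences with the characters they denote. The function name unescape is hypothetical, and the sketch deliberately ignores every other escape (such as '\n' or '\t') that a full escaped-scalar parser would have to handle.

```python
import re

# Matches the three numeric escape forms: \xHH, \uHHHH and \UHHHHHHHH.
_NUMERIC_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))"
)


def unescape(text: str) -> str:
    """Replace numeric escapes with the characters they denote (hypothetical helper)."""

    def to_char(match):
        # Exactly one of the three groups matched; use its hex digits.
        digits = next(group for group in match.groups() if group is not None)
        return chr(int(digits, 16))

    return _NUMERIC_ESCAPE.sub(to_char, text)


print(unescape(r"\x3B \u003B \U0000003B"))  # prints "; ; ;"
```

All three escapes in the usage line name the same character, ';', which is the point of the example: the escape width only limits which code points can be reached.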

The syntax of YAML was motivated by Internet Mail (RFC0822) and can be used for HTTP headers. Further, YAML borrows the document separator from MIME (RFC2045). With this insight, YAML's top-level production is a stream of independent documents, which is ideal for distributed processing systems.

YAML was designed to have an incremental interface which includes both a pull-style input stream and a push-style (SAX-like) output stream interface. Together, these enable YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.

1.3 Relation to XML

There are many differences between YAML and the eXtensible Markup Language ("XML"). XML was designed to be backwards compatible with the Standard Generalized Markup Language ("SGML") and thus had many design constraints placed on it that YAML does not share. Also, XML, inheriting SGML's legacy, is designed to support structured documents, whereas YAML is more closely targeted at messaging and native data structures. While XML is a pioneer in many domains, YAML has grown out of the lessons learned by the XML community.

The YAML and XML information models are starkly different. In XML, the primary construct is an attributed tree, where each element has an ordered, named list of children and an unordered mapping of names to strings. In YAML, the primary graph constructs are keyed collections (natively stored as a hash or array) and scalar values (string, integer, float). This difference is critical since YAML's model is directly supported by native data structures in most modern programming languages, whereas XML's model requires mapping conventions, or an alternative programming component (e.g. a document object model).

1.4 Terminology

The terminology used to describe YAML is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a YAML processor:

may

Conformant YAML streams and processors are permitted, but not required, to behave as described.

should

Conformant YAML texts and processors are encouraged to behave as described, but may do otherwise if a warning message is provided to the user and any deviant behavior requires conscious effort (a non-default setting) to enable.

must

Conformant YAML texts and processors are required to behave as described, otherwise they are in error.

error

A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.

This specification, together with the Unicode standard for characters, provides all the information necessary to understand YAML Version 1.0 and construct computer programs to process it.

2 Preview

This section provides a quick glimpse into the expressive power of YAML (and its clean syntax) without going into too much detail. The first-time reader is not expected to grok all of the examples. Instead, these selections are used to motivate the information model and to serve as guideposts for the serialization productions.

2.1 Collections

YAML collections allow for aggregation of data. YAML supports two primary types of collections: sequences and mappings. Most tree structures can be constructed by nesting collections.

- Mark McGwire
- Sammy Sosa
- Ken Griffey

A1

Sequence of scalars
(ball players)

hr: 65
avg: .278
rbi: 147

A2

Mapping of scalars to scalars
(player statistics)

american:
 - Boston Red Sox
 - Detroit Tigers
 - New York Yankees
 - Texas Rangers
national:
 - New York Mets
 - Chicago Cubs
 - Atlanta Braves
 - Montreal Expos

A3

Mapping to sequences of scalars
(ball clubs in each league)

-
 name: Mark McGwire
 hr: 65
 avg: .278
 rbi: 147
-
 name: Sammy Sosa
 hr: 63
 avg: .288
 rbi: 141

A4

Sequence of mappings
(player statistics)

?
 - New York Yankees
 - Atlanta Braves
:
 - 2001-07-02
 - 2001-08-12
 - 2001-08-14
?
 - Detroit Tigers
 - Chicago Cubs
:
 - 2001-07-23

A5

Mapping from sequences to sequences
(team pair to play dates)

invoice: 34843
date   : 2001-01-23
bill-to:
 given  : Chris
 family : Dumars
product:
 -
  quantity: 4
  desc    : Basketball
 -
  quantity: 1
  desc    : Super Hoop

A6

Nesting of mappings and sequences
(a simple invoice)

2.2 Structures

YAML streams can be commented and separated into multiple documents. To allow for graph serialization, YAML has a built-in alias mechanism.

---
name: Mark McGwire
hr: 65
avg: .278
rbi: 147
---
name: Sammy Sosa
hr: 63
avg: .288
rbi: 141

B1

Two documents within a stream
(player statistics)
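The document separator shown in B1 lets a reader cut a stream into independent documents without parsing their contents. The following Python sketch is a simplification, not the full syntax: it assumes a separator is a line that is exactly '---' or begins with '--- ', keeps any text after the separator (such as a transfer method or leaf indicator) as the document's first line, drops blank and comment lines that precede the first document (as in B2), and treats content before any separator as an implicit first document (as in A1). The names split_documents and is_separator are hypothetical.

```python
from typing import Iterable, Iterator, List


def is_separator(line: str) -> bool:
    """A line that is exactly '---' or starts with '--- ' begins a new document."""
    return line == "---" or line.startswith("--- ")


def split_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Lazily yield each document of a stream as a list of its lines (hypothetical helper)."""
    current = None  # None until the first document has started
    for raw in lines:
        line = raw.rstrip("\n")
        if is_separator(line):
            if current is not None:
                yield current
            rest = line[4:]
            # Keep anything after the separator, e.g. '!seq' or '\', as the first line.
            current = [rest] if rest.strip() else []
        elif current is None:
            # Leading comments and blank lines belong to no document;
            # other content starts an implicit first document without a separator.
            if line.strip() and not line.lstrip().startswith("#"):
                current = [line]
        else:
            current.append(line)
    if current is not None:
        yield current


stream = ["---", "name: Mark McGwire", "hr: 65", "---", "name: Sammy Sosa", "hr: 63"]
for document in split_documents(stream):
    print(document)
```

Because split_documents is a generator, a consumer can start on the first document before the rest of the stream has been read, which fits the stream-oriented processing described in the Origin section.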

# Ranking of players by
# season home runs.
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey





B2

Single document with leading comment

# Home runs
hr:
 # 1998 record
 - Mark McGwire
 - Sammy Sosa
# Runs batted in
rbi:
 - Sammy Sosa
 - Ken Griffey

B3

Single document with nested comments

# Home runs
hr:
 # 1998 record
 - Mark McGwire
 - &001 Sammy Sosa
# Runs batted in
rbi:
 - *001
 - Ken Griffey

B4

Alias used for second occurrence of Sammy Sosa.

2.3 Styles

Besides the in-line scalars used above, YAML has support for several multi-line and quoted scalar styles. Furthermore, for small sequences and mappings, an in-line style helps make YAML easy to author.

--- \
Mark McGwire's
year was crippled
by a knee injury.

C1

Word-wrapping helps readability

--- |
    \/|\/|
    / |  |_


C2

Word-wrapping is not desired

--- \\
Sosa completed
another fine
season. \u263A



C3

Unicode smiley using ASCII

name: Mark McGwire
occupation: baseball player
comments: \
 Mark set a major
 league home run
 record in 1998.

C4

Scalars within a collection

years: "1998\t1999\t2000\n"
msg:  "Sosa did fine. \u263A"

C5

Double quoted (escaped in-line)

- ' \/|\/|  '
- ' / |  |_ '

C6

Single quoted (unescaped in-line)

- [ name        , hr,  avg ]
- [ Mark McGwire, 65, .278 ]
- [ Sammy Sosa  , 63, .288 ]

C7

Sequence of sequences (in-line)

Mark McGwire: {hr: 65 , avg: .278}
Sammy Sosa:   {hr: 63 , avg: .288}


C8

Mapping of mappings (in-line)

2.4 Type Family

To encode data type and other application semantics in a YAML serialization, every node has a type family and leaf nodes have a syntax format.

invoice: 34843
date   : 2001-01-23
buyer:
 given  : Chris
 family : Dumars
product:
 - 4 Basketballs
 - 1 Superhoop

D1

Implicit family;format

invoice: !int;decimal 34843
date   : !date;iso8601 2001-01-23
buyer: !map
 given  : !str Chris
 family : !str Dumars
product: !seq
 - !str 4 Basketballs
 - !str 1 Superhoop

D2

Explicit family;format

--- !binary;base64 \
 R0lGODlhDAAMAIQAAP/
 9/X17unp5WZmZgAAAOf
 n515eXvPz7Y6OjuDg4J
 +fn5OTk6enp56enmlpa
 NjY6Ojo4SEhP/++f/++
 f/++f/++f/++f/++f/+
 EeECcgggoBADs=

D3

Binary type family and Base64 string format

--- !seq
0: Mark McGwire
1: Sammy Sosa
2: Ken Griffey
---
empty: !map
invoice: !str 34843


D4

Override implicit family

--- !org.clarkevans.timesheet
who: Clark C. Evans
when: 2001-11-18
hours: !.hours 3
description: \
 Wrote up these examples
 and learned a lot about
 baseball statistics.



D5

Application specific family

--- !com.clarkevans.graph
- !.circle
 center: &ORIG {x: 73 , y: 129}
 radius: 7
- !.line [23,32,200,300]
- !.line [23,32,300,200]
- !.text
 center: *ORIG
 color: 0x02FDBA
 value: Center of circle

D6

Application specific family

2.5 Full Length Examples

Following are two full-length examples. The first (E1) is a sample invoice; the second (E2) is a sample log file.

--- !com.clarkevans.invoice
invoice: 34843
date   : 2001-01-23
bill-to: &001
 given  : Chris
 family : Dumars
 address:
  line one: '458 Walkman Dr.'
  line two: Suite #292
  city    : Royal Oak
  state   : MI
  postal  : 48046
ship-to: *001
product:
 -
  quantity: 4
  id      : BL394D
  desc    : Basketball
  price   : $450.00
 -
  quantity: 1
  id      : BL4438H
  desc    : Super Hoop
  price   : $2,392.00
tax  : $251.42
total: $4443.52
comments: \
 Late afternoon is best.
 Backup contact is Nancy
 Billsmer @ 338-4338.

E1

Invoice

---
Date: 2001-11-23
Time: 13:02+5:00
User: ed
Warning: \
 This is an error message
 for the log file
---
Date: 2001-11-23
Time: 15:02+5:00
User: ed
Warning: \
 A slightly different error
 message.
---
Date: 2001-11-23
Time: 15:03+5:00
User: ed
Fatal: \
 Unknown variable "bar"
Stack:
 -
  file: TopClass.py
  line: 23
  code: x = MoreObject('345')
 -
  file: MoreClass.py
  line: 58
  code: foo = bar


E2

Log file

3 Key Concepts

Conceptually, a YAML system may be visualized as three interacting states: a serialization format, an event stream, and a native binding. Four processing components translate YAML information between these states: a parser, a loader, a dumper, and an emitter. The parser extracts structured information from the input stream, and the loader converts this information into the appropriate native structures. In the other direction, the dumper converts native structures into an event stream, and the emitter writes that event stream out in the serialization format.

[serialization format] --(parser)--> [event stream] --(loader)--> [native binding]
[serialization format] <--(emitter)-- [event stream] <--(dumper)-- [native binding]

For each one of the states above, there is a corresponding information model. The graph model covers the native binding, the tree model covers the event stream, and the syntax model covers the serialization format. Type information is moved between these states with the type family and string format constructs.

graph model  The graph model abstracts data structures of common programming languages. Each node in the graph is either a collection or a scalar. A collection is modeled as a function from one set of nodes to another. Scalars are nodes having a string representation. Both node kinds have a type family.
tree model  The tree model flattens the graph structure into a hierarchy of branches, leaves and alias nodes. A branch represents the first occurrence of a collection, a leaf represents the first occurrence of a given scalar, and an alias is a surrogate used for subsequent occurrences of either kind of graph node. The branch is modeled as an ordered set of tree node pairs.
syntax model  The syntax model enhances the tree model with comments, leaf styles and string formats, and other serialization-specific details. Character serializations must also comply with the syntax productions given in section 4, Serialization Syntax.

A processor need not expose the event stream (or the tree model) and may directly translate between a serialization and its native binding. However, such a direct translation should take place so that the native binding is constructed only from information available in the graph model. In particular, information specific to the tree model (alias anchors and pair ordering) and syntax-specific information (comments and styles) should not be used in the construction of a native binding. Exceptions to this guideline include editors, which must operate on a direct image of the serialization format.
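To make the four components concrete, the following Python sketch implements each of them for a deliberately tiny subset of YAML: a single flat mapping written as 'key: value' lines. The event shape (a tuple of kind and value) and the function names parse, load, dump and emit are assumptions for illustration only; no implicit typing is done, so every value stays a string. Note that the parser drops comments, so, as recommended above, nothing from the syntax model reaches the native binding.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (kind, value), where kind is "map-start", "scalar" or "map-end".
Event = Tuple[str, str]


def parse(text: str) -> Iterator[Event]:
    """Parser: serialization format -> event stream (flat 'key: value' maps only)."""
    yield ("map-start", "")
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # comments are syntax-model detail and produce no events
        key, _, value = line.partition(":")
        yield ("scalar", key.strip())
        yield ("scalar", value.strip())
    yield ("map-end", "")


def load(events: Iterable[Event]) -> Dict[str, str]:
    """Loader: event stream -> native binding (a Python dict)."""
    scalars = [value for kind, value in events if kind == "scalar"]
    # Scalars alternate key, value, key, value, ...
    return dict(zip(scalars[0::2], scalars[1::2]))


def dump(native: Dict[str, str]) -> Iterator[Event]:
    """Dumper: native binding -> event stream."""
    yield ("map-start", "")
    for key, value in native.items():
        yield ("scalar", str(key))
        yield ("scalar", str(value))
    yield ("map-end", "")


def emit(events: Iterable[Event]) -> str:
    """Emitter: event stream -> serialization format."""
    scalars = [value for kind, value in events if kind == "scalar"]
    pairs = zip(scalars[0::2], scalars[1::2])
    return "".join(f"{key}: {value}\n" for key, value in pairs)


stats = load(parse("# player statistics\nhr: 65\navg: .278\nrbi: 147\n"))
print(stats)  # {'hr': '65', 'avg': '.278', 'rbi': '147'}
print(emit(dump(stats)), end="")
```

Keeping the event stream as the only interface between the halves means a different loader (for another language binding) could reuse the same parser unchanged.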

3.1 General Concepts

There are several core concepts shared by each information model, primarily relating to type information and how it is communicated between the serialization format and a native binding.

3.1.1 Type Family

The type family mechanism provides an abstraction of data types which is portable across various languages and platforms. Each native binding may have zero or more native concrete types or class constructs which correspond to a given type family.

name
Each type family has a name used for explicit typing and for general identification. This name must comply with the type family production.
definition
A description of the particular category of information independent of language and platform.
format
Each type family used for scalar nodes has an optional default string format.
implicit
A set of zero or more string formats used for implicit typing. Each format may only be used in a single type family for this purpose.

In general, there may be more than one native type which corresponds to the type family. In the Python language, for example, the integer family may be bound to either a plain integer capable of holding 32 bits or a long integer with unlimited size. In situations like this, the loader makes the choice.

In other cases, a binding may not have an appropriate native construct for a given type family. This may be addressed with a generic YAML construct to act as a place-holder so that the data value and the type family may round-trip. Alternatively, with warning to the user, a value may be cast to a different, perhaps less specific family. Otherwise, a processor must raise an exception when a native binding for a particular value is not possible.

3.1.2 String Format

It may be possible to write the string value of a leaf in more than one way. For example, the integer value 255 can also be written in hexadecimal as 0xFF. This distinction is covered by the concept of a string format.

name
Each string format has a name used for explicit typing and for general identification. This name must comply with the string format production.
definition
A description of the format as it applies to particular data values.
regex
Regular expressions may be provided to allow implicit identification of the string format, or to enable the parser to validate that a given value is indeed compliant with the string format.

As noted above, each type family has at most one default string format, although more than one string format may apply. For example, the decimal format is the default for integers and the base64 format is the default for the binary type family.
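The integer example can be sketched in Python as a small table of string formats, each carrying a regular expression for implicit identification and a conversion function. The format names 'decimal' and 'hex', the regular expressions and the helper names are assumptions for illustration; only the idea that decimal is the default, and therefore canonical, integer format comes from the text above.

```python
import re

# Hypothetical string formats for the integer type family:
# name -> (regex used for implicit identification, conversion to a native int)
INT_FORMATS = {
    "decimal": (re.compile(r"[-+]?(?:0|[1-9][0-9]*)\Z"), lambda s: int(s, 10)),
    "hex": (re.compile(r"0x[0-9A-Fa-f]+\Z"), lambda s: int(s, 16)),
}
DEFAULT_FORMAT = "decimal"


def identify_format(text: str):
    """Return the name of the first format whose regex matches, or None."""
    for name, (regex, _) in INT_FORMATS.items():
        if regex.match(text):
            return name
    return None


def parse_int(text: str) -> int:
    name = identify_format(text)
    if name is None:
        raise ValueError(f"{text!r} matches no integer string format")
    _, convert = INT_FORMATS[name]
    return convert(text)


def canonical(value: int) -> str:
    """Write the value in the default (decimal) format."""
    return str(value)


print(parse_int("255"), parse_int("0xFF"))  # 255 255
print(canonical(parse_int("0xFF")))         # 255
```

Both spellings load to the same value, and writing that value back in the default format yields the canonical string '255'.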

3.2 Graph Model

The graph model abstracts data structures of common programming languages. The model is a graph of collection and scalar values, where each node in the graph is provided with type information. The model provides an intermediate interface between the parser/emitter which can be shared by multiple native languages, and the loader/dumper which is specific to a particular binding. The model also provides a concrete representation for language independent storage, simple structural queries, and graph transformations.

In the graph model, YAML is viewed as a directed graph of typed nodes. Nodes that can reference other nodes are collections and nodes with a string representation are scalars. The graph model also requires node identity and a mechanism to determine if two different nodes have the same content.

3.2.1 Graph Node

A graph node is the building block of YAML structures. In the serialization, nodes appear as indented blocks. Within a native binding they represent application-specific objects. In the graph model, a node is tagged with a type family and can be either a collection or a scalar.

kind
A node may be one of two kinds, a collection or a scalar.
family
Each node is associated with a type family. For scalar nodes, the family is required to have a default string format. For collections, the family need not have a default format.

3.2.2 Scalar

A scalar is a graph node with a string representation.

string
Each scalar must have a canonical string representation. This is a series of zero or more printable unicode characters according to the type family's default string format.

The default type family for scalar nodes is org.yaml.str. The string representation of the scalar together with its type family should be sufficient to encode most native data types not having a composite structure. Other scalar type families include integer, float, and binary.

3.2.3 Identity

In most programming languages, there are two ways in which variables can be equivalent. The first is by reference, where the two variables refer to the same memory address. We call this equivalence identity.

The second form of equivalence occurs when two nodes are different (have a different memory address), but share the same content or have the same binary layout. We call this second form of equivalence equality. It follows that when two nodes are identical they are also equal.
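In Python, for example, the two forms of equivalence correspond to the 'is' operator (identity) and the '==' operator (equality, for built-in containers such as lists). The short sketch below shows two lists that are equal but not identical, and a third name bound to the same list as the first, which is both.

```python
first = ["Mark McGwire", "Sammy Sosa"]
second = ["Mark McGwire", "Sammy Sosa"]  # same content, separate object
alias = first                            # another name for the same object

print(first == second, first is second)  # True False: equal, not identical
print(first == alias, first is alias)    # True True: identical, hence also equal
```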

3.2.4 Node set

A node set is an unordered association of zero or more graph nodes. A node may participate in many node sets without restriction, allowing for a graph structure. However, node sets may not contain duplicates, that is, a node with a particular identity may only appear once. The primary purpose of the node set is to provide a basis for the definition of a collection. A native binding usually exposes node sets through a mechanism to enumerate the keys of a hash or dictionary.

3.2.5 Collection

A collection is a graph node which represents sequences such as lists or arrays, or mappings such as hashes or dictionaries. In the graph model, sequences are treated uniformly as mappings with integer keys. There are two collection rules. First, a set of keys may not contain two nodes that are equal. Second, each key is associated with exactly one value. Note that this does not prevent a value from being associated with more than one key.

domain
A domain is a node set restricted such that no two nodes in the set may be equal. Nodes which are members of the domain are often called keys.
range
A range is node set without restrictions. Nodes which are members of the range are often called values.
function
A function is a rule of correspondence from the domain onto the range such that there is a unique value in the range assigned to every key in the domain.

The default type family for collection nodes is org.yaml.map, which covers associative containers such as the Perl hash or Python dictionary. When the domain is a contiguous series of non-negative integers starting with zero, the preferred type family is org.yaml.seq, which includes the Perl array or Python list.
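The uniform treatment of sequences as mappings, and the rule for preferring org.yaml.seq, can be sketched in Python as follows. The helper names are hypothetical, and the sketch makes one judgment call the text leaves open: an empty collection is given org.yaml.map, since its domain does not start with zero.

```python
from typing import Any, Dict, Sequence


def as_mapping(items: Sequence[Any]) -> Dict[int, Any]:
    """View a sequence as a mapping whose keys are the indexes 0, 1, 2, ..."""
    return dict(enumerate(items))


def preferred_family(mapping: Dict[Any, Any]) -> str:
    """Return org.yaml.seq when the keys are exactly 0..n-1 (n > 0), else org.yaml.map."""
    keys = list(mapping)
    is_index_domain = (
        len(keys) > 0
        # bool is a subclass of int in Python, so exclude True/False explicitly
        and all(isinstance(k, int) and not isinstance(k, bool) for k in keys)
        and sorted(keys) == list(range(len(keys)))
    )
    return "org.yaml.seq" if is_index_domain else "org.yaml.map"


players = as_mapping(["Mark McGwire", "Sammy Sosa", "Ken Griffey"])
print(players)                    # {0: 'Mark McGwire', 1: 'Sammy Sosa', 2: 'Ken Griffey'}
print(preferred_family(players))  # org.yaml.seq
print(preferred_family({"hr": 65, "avg": 0.278}))  # org.yaml.map
```

The same test decides whether the sequence style may be used for a branch, since that style also requires a domain of consecutive indexes starting at zero.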

3.2.6 Equality

Node equality determines when two given nodes have the same content. Technically, equality is an equivalence relation (like identity above). When two nodes are equivalent under this relation, they are said to be equal. Equality is defined between scalar nodes and between collection nodes, as described below.

scalar equality
Two scalars are equal if they have the same type family and their canonical string representations consist of exactly the same series of unicode characters.
collection equality
Equality of collections is defined recursively. Two collections are equal if they have the same type family and, for each key in the domain of one, there is a corresponding key in the domain of the other such that both keys are equal and their corresponding values are equal; here value refers to the unique node in the range of the collection assigned to the key by the collection's function.
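A minimal Python sketch of both definitions follows. The Scalar and Collection classes are a hypothetical stand-in for graph nodes: Python's default '==' on them is left as identity, while equal() implements the recursive equality described above. Because a domain never holds two equal keys, matching every pair of one collection against some pair of the other, together with equal pair counts, is enough to cover both directions. The family name org.yaml.int is an assumed name for the integer family, and the sketch assumes an acyclic graph; a recursive structure would need extra bookkeeping to terminate.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(eq=False)  # eq=False keeps Python's '==' as identity for these nodes
class Scalar:
    family: str
    string: str  # canonical string representation


@dataclass(eq=False)
class Collection:
    family: str
    pairs: List[Tuple["Node", "Node"]] = field(default_factory=list)  # (key, value)


Node = Union[Scalar, Collection]


def equal(x: Node, y: Node) -> bool:
    """Node equality as described above (assumes an acyclic graph)."""
    if x is y:
        return True  # identical nodes are always equal
    if type(x) is not type(y) or x.family != y.family:
        return False
    if isinstance(x, Scalar):
        return x.string == y.string
    if len(x.pairs) != len(y.pairs):
        return False
    # Each key of x must have an equal key in y whose value is equal as well.
    return all(
        any(equal(kx, ky) and equal(vx, vy) for ky, vy in y.pairs)
        for kx, vx in x.pairs
    )


a = Collection("org.yaml.map", [(Scalar("org.yaml.str", "hr"), Scalar("org.yaml.int", "65"))])
b = Collection("org.yaml.map", [(Scalar("org.yaml.str", "hr"), Scalar("org.yaml.int", "65"))])
print(equal(a, b), a is b)  # True False
print(equal(Scalar("org.yaml.int", "65"), Scalar("org.yaml.str", "65")))  # False
```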

3.2.7 Documents

A YAML text (file or stream) is a series of disjoint graphs, each with a root node.

root
A series of zero or more document nodes.
document
A top level graph node that is disjoint from all other document nodes.

The term disjoint means that for any two document nodes x and y, there does not exist a node z that is reachable from both x and y. For any nodes x and y, x is reachable from y means that either x and y are identical, or y is a collection and there exists a node z in the domain or the range of y such that x is reachable from z.
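The reachability and disjointness definitions translate directly into a graph search. In the Python sketch below, a graph is assumed to be a dictionary mapping each node label to the nodes in its domain and range (scalars have none); the node labels and the function names are hypothetical. Comparing full reachable sets also catches the case where one root is itself reachable from the other.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Iterable[Hashable]]  # node -> members of its domain and range


def reachable(graph: Graph, start: Hashable) -> Set[Hashable]:
    """All nodes reachable from start, including start itself (iterative search)."""
    seen: Set[Hashable] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; this also stops cycles
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def disjoint(graph: Graph, x: Hashable, y: Hashable) -> bool:
    """True when no node is reachable from both x and y."""
    return reachable(graph, x).isdisjoint(reachable(graph, y))


stream = {
    "doc1": ["name", "mark"],
    "doc2": ["hr", "sixty-five"],
    "doc3": ["rbi", "sixty-five"],  # shares a node with doc2
}
print(disjoint(stream, "doc1", "doc2"))  # True
print(disjoint(stream, "doc2", "doc3"))  # False
```

Under the definition above, doc2 and doc3 could not be two documents of the same stream, because the shared node would have to be serialized in one document and aliased from the other.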

3.3 Tree Model

To allow for YAML to be communicated as a series of events, an ordered tree structure must be used instead of a graph. This section describes an extension to the graph model where the graph is flattened and ordered to provide a tree interface. The resulting tree-structured model uses several constructs and imposes a linear ordering which is not part of the graph model. Applications constructing a native binding from an implementation of the tree model should not use these additional constructs and the imposed ordering to preserve important data.

3.3.1 Tree node

To lay out graph nodes as a tree structure, a mechanism is needed to manage duplicates. This is solved with a three-node system: branch, leaf, and alias. The first occurrence of a scalar is represented by a leaf, the first occurrence of a collection is represented by a branch, and subsequent occurrences of either a collection or a scalar are represented by an alias. All tree nodes in the tree model have the following properties:

kind
A tree node may be one of three kinds: a branch, a leaf, or an alias.
parent
The parent property gives access to the branch which holds the current tree node.
anchor
The anchor is a unicode string which complies with the alias production. The anchor is used to associate the first occurrence of a graph node with subsequent occurrences, via the alias tree node. This property is optional for leaf or branch nodes, provided that the scalar or collection represented does not occur more than once.
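The following Python sketch flattens a native structure into a list of tree-node events in the order defined in 3.3.6, giving an anchor only to collections that occur more than once and emitting an alias for every later occurrence. Identity is taken from Python's id(). The event tuples, the three-digit anchor names (modeled on the &001 anchors in the examples) and the helper names are assumptions. The sketch also treats every scalar as a fresh leaf, because whether two equal Python strings or numbers are the same object is an interpreter detail.

```python
from collections import Counter
from typing import Any, List, Tuple


def _children(node: Any) -> List[Any]:
    """Tree-node children in pair order: key, value, key, value, ..."""
    if isinstance(node, dict):
        return [part for pair in node.items() for part in pair]
    return list(node)


def _count(node: Any, counts: Counter) -> None:
    """Count how many times each collection (by identity) is reached."""
    if not isinstance(node, (list, dict)):
        return
    counts[id(node)] += 1
    if counts[id(node)] == 1:  # only descend on the first occurrence
        for child in _children(node):
            _count(child, counts)


def to_tree(root: Any) -> List[Tuple]:
    """Flatten a native graph into ("branch", kind, anchor), ("leaf", value),
    ("alias", anchor) and ("end",) events."""
    counts: Counter = Counter()
    _count(root, counts)
    anchors = {}  # id(collection) -> anchor name
    events: List[Tuple] = []

    def walk(node: Any) -> None:
        if not isinstance(node, (list, dict)):
            events.append(("leaf", node))
            return
        if id(node) in anchors:
            events.append(("alias", anchors[id(node)]))  # subsequent occurrence
            return
        anchor = None
        if counts[id(node)] > 1:  # first of several occurrences: needs an anchor
            anchor = f"{len(anchors) + 1:03d}"
            anchors[id(node)] = anchor
        events.append(("branch", "map" if isinstance(node, dict) else "seq", anchor))
        for child in _children(node):
            walk(child)
        events.append(("end",))

    walk(root)
    return events


address = {"given": "Chris", "family": "Dumars"}
invoice = {"bill-to": address, "ship-to": address}
for event in to_tree(invoice):
    print(event)
```

Registering the anchor before walking a branch's children is what lets a self-referencing collection serialize as a branch that contains an alias to itself.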

3.3.2 Leaf

Leaf tree nodes represent the first occurrence of a scalar in a given serialization.

family
Like a scalar, each leaf is associated with a type family having a default string format.
string
Also like a scalar, each leaf has a canonical string representation.

When a leaf is converted into a graph node it becomes a scalar with the same type family and string representation. Note that the anchor, if any, is not converted.

3.3.3 Alias

The alias tree node represents subsequent occurrences of a scalar or collection in the serialization.

referent
The referent of an alias is the closest preceding branch or leaf having an identical anchor.

When an alias is converted into a graph node, it becomes a subsequent occurrence of its referent's graph node.
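Resolving an alias to the closest preceding node with the same anchor only requires remembering the most recent node for each anchor while reading tree nodes in order. In the Python sketch below, the event tuples and the name resolve are hypothetical; a redefined anchor simply replaces the earlier entry, and an alias with no preceding anchor is reported as an error.

```python
from typing import Any, Iterable, List, Tuple


def resolve(events: Iterable[Tuple]) -> List[Any]:
    """Turn ("node", anchor_or_None, value) and ("alias", anchor) events into values.

    Each alias becomes the very same object as its referent: the closest
    preceding node that carried the same anchor.
    """
    latest = {}  # anchor -> most recent node carrying that anchor
    values: List[Any] = []
    for event in events:
        if event[0] == "alias":
            anchor = event[1]
            if anchor not in latest:
                raise ValueError(f"alias *{anchor} has no preceding anchor")
            values.append(latest[anchor])
        else:
            _, anchor, value = event
            if anchor is not None:
                latest[anchor] = value  # a later anchor shadows an earlier one
            values.append(value)
    return values


sosa = ["Sammy Sosa"]
griffey = ["Ken Griffey"]
result = resolve([("node", "001", sosa), ("alias", "001"),
                  ("node", "001", griffey), ("alias", "001")])
print(result[1] is sosa, result[3] is griffey)  # True True
```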

3.3.4 Pair

A pair is an ordered set of two tree nodes. The first member of the set is called the key and the second member of the set is called the value.

3.3.5 Branch

Branch tree nodes represent the first occurrence of a collection in a given serialization.

family
Like a collection, each branch is associated with a type family. This type family need not have a default string format.
pairs
A branch has zero or more pairs.

When a branch is converted to a graph node, three operations occur. The domain is constructed with the graph node for each key in its set of pairs. Likewise, the range is constructed with the graph node for each value in its set of pairs. Last, the function is constructed via association of key graph nodes to value graph nodes, as provided by the set of pairs. Note that the ordering of the pairs is explicitly not converted.

3.3.6 Ordering

When serializing a YAML graph, every tree node is put into a single linear sequence within a given document through the branch ordering. Through the composition of branches, this ordering becomes total, so that for any two distinct tree nodes in a serialization, one can be said to precede another.

For any two tree nodes x and y, we say that x precedes y when any of the following holds:

  • the parent of y is x
  • x is the key and y is the value of a given pair
  • x and y are both keys of two pairs within a branch, and the pair containing x comes before the pair containing y
  • the parent of x precedes y
  • there exists a node z such that x precedes z and z precedes y

3.4 Syntax Model

To enhance readability, a YAML serialization extends the tree model with syntax styles, string formats, comments, and directives. Although the parser may provide this information, applications should take care not to use these features to encode data which must be preserved.

3.4.1 Style

The tree node is extended with a style property, which can have different values depending upon its kind.

leaf style
Leaf styles include plain, folded, escaped, and block. All but the escaped style are limited to scalars having only printable characters.
branch style
Branch styles are sequence and mapping. The sequence style may only be used if the domain of the collection's function is a contiguous series of non-negative integers starting at zero.

3.4.2 Format

Each leaf node is given a particular format to represent the actual format used by its string representation. Note that once this property is added, the string representation stops being canonical since it overrides the default format for the leaf's family.

format
The format used by the leaf's string representation.

3.4.3 Comment

Before each pair in the serialization is an optional comment.

comment
A comment is a series of zero or more unicode characters complying with the throwaway comment productions.

3.4.4 Directive

Attached to each document is a document directive section.

directive section
A map collection where each member of the domain and range is a scalar value.

4 Serialization Syntax

Following are the syntax productions for the YAML serialization.

4.1 Characters

Characters are the basis for a serialized version of a YAML document. Below is a general definition of a character followed by several characters which have specific meaning in particular contexts.

4.1.1 Character Set

Serialized YAML uses a subset of the Unicode character set. A YAML parser must accept all printable ASCII characters and all non-ASCII Unicode characters. However, a YAML emitter should attempt to emit only printable characters (including space, tab and line break characters). Characters known to be non-printable may be escaped.

[001] printable_char ::= #x9 | #xA | #xD | (printable Unicode characters starting at #x20 and upwards) /* printable characters, as defined by the Unicode standard */

As is standard practice, the surrogate block (#xD800 to #xDFFF), #xFFFE, and #xFFFF are excluded.
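The character set rule can be checked one character at a time. The Python sketch below is an approximation: it assumes that "printable" means any code point from #x20 upwards whose Unicode general category, as reported by the standard unicodedata module, is neither Cc (control) nor Cs (surrogate), plus tab, line feed and carriage return, with #xFFFE and #xFFFF excluded as stated above. The function names are hypothetical.

```python
import unicodedata

_EXCLUDED = {"\ufffe", "\uffff"}


def is_printable_char(ch: str) -> bool:
    """Approximate test for printable_char [001] (assumption: excludes Cc and Cs)."""
    if ch in "\t\n\r":
        return True
    if ord(ch) < 0x20 or ch in _EXCLUDED:
        return False
    return unicodedata.category(ch) not in ("Cc", "Cs")


def first_non_printable(text: str) -> int:
    """Index of the first character an emitter would need to escape, or -1."""
    for index, ch in enumerate(text):
        if not is_printable_char(ch):
            return index
    return -1


print(first_non_printable("hr: 65\n"))      # -1
print(first_non_printable("bell\x07here"))  # 4
```

An emitter could use such a check to decide when a leaf must fall back to the escaped style, the only style that admits non-printable characters.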

4.1.2 Encoding

A YAML processor is required to support the UTF-32, UTF-16 and UTF-8 character encodings. If an input stream does not begin with a byte order mark, the initial encoding shall be UTF-8. Otherwise the initial encoding shall be UTF-32 (LE or BE), UTF-16 (LE or BE) or UTF-8, as deduced from the byte order mark. Note that as YAML files may only contain printable characters, this does not raise any ambiguities. For more information on the byte order mark and the Unicode character encoding schemes see the Unicode FAQ.

[002] byte_order_mark ::= #xFEFF /* the Unicode ZERO WIDTH NON-BREAKING SPACE character used to mark a UTF-32 or UTF-16 stream and determine byte ordering */

4.1.3 Indicators

Indicators are special characters which are used to describe the structure of a YAML document.

[003] series_entry_indicator ::= '-' /* indicates a series entry */
[004] keyed_entry_separator ::= ':' /* separates a key from its value */
[005] series_in_line_start ::= '[' /* starts an in-line series branch */
[006] series_in_line_end ::= ']' /* ends an in-line series branch */
[007] keyed_in_line_start ::= '{' /* starts an in-line keyed branch */
[008] keyed_in_line_end ::= '}' /* ends an in-line keyed branch */
[009] branch_in_line_separator ::= ',' /* separates in-line branch entries */
[010] nested_key_indicator ::= '?' /* indicates a nested key */
[011] alias_indicator ::= '*' /* indicates an alias node */
[012] anchor_indicator ::= '&' /* indicates an anchor property */
[013] transfer_indicator ::= '!' /* indicates a transfer method property */
[014] block_indicator ::= '|' /* indicates a block leaf */
[015] plain_indicator ::= '\' /* indicates a plain leaf */
[016] single_quote ::= ''' /* indicates a single quoted leaf */
[017] double_quote ::= '"' /* indicates a double quoted leaf */
[018] throwaway_indicator ::= '#' /* indicates a throwaway comment */
[019] reserved_indicators ::= '^' | '@' | '%' /* reserved */

Indicators can be grouped into three categories. The '-' and ':' space indicators are always followed by a white space character (space, tab or line break); if followed by any other character, they are treated as content text characters. The '[', ']', '{', '}' and ',' in-line indicators are used to denote in-line branch structure and therefore must not be used as content text characters unless protected in some way. The remaining indicators are used to denote the start of various YAML elements and hence may be used as internal content text characters in most cases. The exact restrictions on the use of indicators as content text characters depend on the particular leaf style used.

[020] space_indicators ::= series_entry_indicator
  | keyed_entry_separator
/* indicators which are always followed by white space */
[021] in_line_indicators ::= series_in_line_start
  | series_in_line_end
  | keyed_in_line_start
  | keyed_in_line_end
  | branch_in_line_separator
/* indicators for in-line structure */
[022] non_space_indicators ::= nested_key_indicator
  | alias_indicator
  | anchor_indicator
  | transfer_indicator
  | block_indicator
  | plain_indicator
  | single_quote
  | double_quote
  | throwaway_indicator
  | reserved_indicators
/* additional indicators, which don't require a following white space */

4.1.4 Escape Sequences

Escape codes are used in escaped and double quoted leaves to denote common non-printable characters, specify characters by a hexadecimal value, and produce the literal escape and double quote characters.

[023] escape ::= '\' /* indicates an escape code */
[024] escaped_escape ::= escape escape /* escape literal */
[025] escaped_double_quote ::= escape double_quote /* Escaped double quote character */
[026] escaped_bel ::= escape 'a' /* ASCII alert (BEL) */
[027] escaped_backspace ::= escape 'b' /* ASCII backspace (BS) */
[028] escaped_esc ::= escape 'e' /* ASCII escape (ESC) */
[029] escaped_form_feed ::= escape 'f' /* ASCII formfeed (FF) */
[030] escaped_line_feed ::= escape 'n' /* ASCII linefeed (LF) */
[031] escaped_return ::= escape 'r' /* ASCII carriage return (CR) */
[032] escaped_tab ::= escape 't' /* ASCII horizontal tab (TAB) */
[033] escaped_vertical ::= escape 'v' /* ASCII vertical tab (VTAB) */
[034] escaped_null ::= escape 'z' /* ASCII zero (NUL) */
[035] escaped_8_bit ::= escape 'x'
hexadecimal_digit
hexadecimal_digit
/* 8-bit character */
[036] escaped_16_bit ::= escape 'u'
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
/* 16-bit character */
[037] escaped_32_bit ::= escape 'U'
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
hexadecimal_digit
/* 32-bit character */
[038] escape_sequence ::= escaped_escape
  | escaped_double_quote
  | escaped_bel
  | escaped_backspace
  | escaped_esc
  | escaped_form_feed
  | escaped_line_feed
  | escaped_return
  | escaped_tab
  | escaped_vertical
  | escaped_null
  | escaped_8_bit
  | escaped_16_bit
  | escaped_32_bit
/* escape codes in escaped leaves */

In single quoted leaves, a single quote character needs to be escaped. This is done by repeating the character.

[039] escaped_single_quote ::= single_quote
single_quote
/* indicates a single quote */
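A decoder for the escape codes of production [038] can be sketched in Python. It is a sketch, not a reference implementation: the name decode_escapes is hypothetical, and it treats an unknown escape code, a short hexadecimal run or a trailing lone '\' as an error, in line with the rule that escaped and double quoted leaves must not contain invalid escape sequences.

```python
# Single-character escape codes and the characters they denote.
_SIMPLE = {
    "\\": "\\", '"': '"', "a": "\a", "b": "\b", "e": "\x1b",
    "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "z": "\0",
}
# Number of hexadecimal digits required after 'x', 'u' and 'U'.
_HEX_WIDTH = {"x": 2, "u": 4, "U": 8}
_HEX_DIGITS = set("0123456789abcdefABCDEF")

def decode_escapes(text: str) -> str:
    """Expand the escape codes in `text`; invalid codes raise ValueError."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling escape at end of text")
        code = text[i + 1]
        if code in _SIMPLE:
            out.append(_SIMPLE[code])
            i += 2
        elif code in _HEX_WIDTH:
            width = _HEX_WIDTH[code]
            digits = text[i + 2:i + 2 + width]
            if len(digits) != width or not set(digits) <= _HEX_DIGITS:
                raise ValueError(f"bad \\{code} escape at offset {i}")
            out.append(chr(int(digits, 16)))  # chr() rejects values above U+10FFFF
            i += 2 + width
        else:
            raise ValueError(f"invalid escape \\{code} at offset {i}")
    return "".join(out)
```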

4.1.5 Line Breaks

Unicode defines the following line break characters.

[040] line_feed ::= #xA /* ASCII line feed (LF) */
[041] carriage_return ::= #xD /* ASCII carriage return (CR) */
[042] next_line ::= #x85 /* Unicode next line (NEL) */
[043] line_separator ::= #x2028 /* Unicode line separator (LS) */
[044] paragraph_separator ::= #x2029 /* Unicode paragraph separator (PS) */
[045] line_break ::= line_feed
  | carriage_return
  | next_line
  | line_separator
  | paragraph_separator
/* Any line break */

4.1.6 Miscellaneous Characters

This section includes several common character range definitions.

[046] line_char ::= printable_char - line_break /* characters valid in a line */
[047] line_space ::= #x20 | #x9 /* whitespace valid in a line */
[048] line_non_space ::= line_char - line_space /* non space characters valid in a line */
[049] ascii_letter ::= [#x41-#x5A] | [#x61-#x7A] /* ASCII letters, A-Z or a-z */
[050] decimal_digit ::= [#x30-#x39] /* 0-9 */
[051] hexadecimal_digit ::= decimal_digit | [#x41-#x46] | [#x61-#x66] /* 0-9, A-F or a-f */
[052] word_char ::= ascii_letter | decimal_digit | '-' /* characters valid in a word */
[053] non_word_char ::= line_non_space - word_char /* characters invalid in a word */

4.2 Line Processing

Serialized YAML uses text lines to convey structure. This requires special processing rules for white space (space, tab and line break) characters. These rules are compatible with Unicode's newline guidelines.

4.2.1 Indentation

In a YAML serialization, structure is determined from indentation, where indentation is defined as an end of line marker followed by zero or more space characters. Indentation level is defined recursively.

[054] indent(0) ::= /* the first level of indentation is zero spaces */
[055] indent(n) ::= indent(n-1) #x20 /* the previous indentation setting plus one space character */

Since the YAML serialization depends upon indentation level to delineate blocks, many of the following productions are parameterized by an integer indentation level n, based on the indent(n) production above.

The indentation level is used exclusively to delineate blocks. Indentation characters are otherwise ignored. In particular, they are never taken to be a part of the value of serialized text.

4.2.2 End-of-Line Normalization

On input and before parsing, a compliant YAML parser must translate the two-character combination CR LF, any CR which is not followed by an LF, and any NEL into a single LF (this does not apply to escaped characters). LS and PS characters are preserved. This functionality is indicated by the use of the normalized_line_break production defined below.

[056] line_feed_line_break ::= ( carriage_return line_feed ) /* greedy */
  | carriage_return
  | line_feed
  | next_line
/* line breaks converted to a line feed */
[057] normalized_line_break ::= line_feed_line_break
  | line_separator
  | paragraph_separator
/* a normalized end of line marker */

On output, a YAML emitter is free to serialize end of line markers using whatever convention is most appropriate, though again LS and PS must be preserved.
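The input normalization rule can be sketched with a single regular expression substitution. The name normalize_line_breaks is hypothetical. Because it runs before parsing, an escape such as '\r' inside a double quoted leaf is still just two ordinary characters at this stage and is not affected.

```python
import re

# Alternatives are tried left to right, so CR LF is consumed as one unit
# before a lone CR can match. NEL is U+0085. LS (U+2028) and PS (U+2029)
# are not listed and therefore pass through unchanged.
_TO_LF = re.compile("\r\n|\r|\x85")

def normalize_line_breaks(text: str) -> str:
    """Translate CR LF, lone CR and NEL into LF, preserving LS and PS."""
    return _TO_LF.sub("\n", text)
```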

4.2.3 Line Folding

To increase readability, YAML serialization allows for breaking long text lines. Therefore in many cases the parser replaces a single normalized line feed with a single space (#x20). LS and PS characters are preserved, so it is safe to use them to indicate line/paragraph text structure even when line folding is done.

When encountering two or more consecutive (possibly indented) normalized line feeds, the parser does not convert them into spaces. However, if the series of line feeds is surrounded by other text characters, the parser ignores the first line feed, requiring a single line feed to be serialized as two, two line feeds to be serialized as three, etc. Thus each "empty line" in a folded text represents a single line feed character, be it at the start, middle or end of the value.

When this functionality is implied, the folded_line_breaks(n) production below will be used.

[058] space_line_feed ::= line_feed_line_break /* single line feed converted to a space */
[059] empty_line_feeds(n) ::= line_feed_line_break
( indent(n)?
  line_feed_line_break )+
/* empty lines with line feeds */
[060] folded_line_breaks(n) ::= empty_line_feeds(n) /* greedy */
  | space_line_feed
  | line_separator
  | paragraph_separator
/* folded line breaks */
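The folding rules can be made concrete with a small sketch. It assumes the value's lines have already been split on normalized line feeds and stripped of their indentation, with an empty string standing for each empty line; fold_lines is a hypothetical name. LS and PS characters stay inside the line strings, so they survive folding unchanged.

```python
def fold_lines(lines: list[str]) -> str:
    """Fold content lines (indentation removed); '' marks an empty line."""
    out = []
    pending_empty = 0          # empty lines seen since the last text line
    seen_text = False
    for line in lines:
        if line == "":
            pending_empty += 1
            continue
        if seen_text:
            # A single break between text lines becomes a space; otherwise
            # the first break is dropped and each empty line is one LF.
            out.append(" " if pending_empty == 0 else "\n" * pending_empty)
        else:
            out.append("\n" * pending_empty)   # leading empty lines
        out.append(line)
        seen_text = True
        pending_empty = 0
    out.append("\n" * pending_empty)           # trailing empty lines
    return "".join(out)
```

For instance, a plain leaf consisting only of two empty lines folds to two line feeds, as in the fourth entry of the 'fifth' series in the plain leaf example.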

4.2.4 Throwaway comments

In some applications there is a strong requirement to allow the inclusion of throwaway comments into a YAML document. Such comments have no effect whatsoever on the abstract information model represented in the file. Their usual purpose is to communicate between the human maintainers of the file. A typical example is comments in a configuration file.

To support this requirement, YAML defines a throwaway comment line construct. Such comments must be indented at the same level as the line following the comment. A throwaway comment begins with a '#' and spans the whole line, including its terminating line break. On input, the parser must ignore such throwaway comments and proceed as if they were not present in the document.

[061] throwaway_comment(n) ::= indent(n)
throwaway_indicator
line_char*
normalized_line_break
/* throwaway comment line */

Since throwaway comments are complete lines, they may appear only where a line break is valid. Where throwaway comments are allowed after a line break, the throwaway_line_break(n) production below is used. Note that throwaway comments are not allowed inside leaf nodes, making the '#' character safe to use there.

[062] throwaway_line_break(n) ::= normalized_line_break
throwaway_comment(n)*
/* line break including throwaway comment lines */
# This comment is ignored
# by the YAML parser
this: contains two
 # lines of text

4.3 YAML Stream

A series of bytes is a YAML stream if, taken as a whole, it complies with the following production. Note that an empty stream is a valid YAML stream containing no documents, while a stream containing a single line break is an error.

[063] yaml_stream ::= byte_order_mark?
( single_document
| multi_document* )
/* a YAML document stream */

A YAML stream may contain several independent YAML documents. A document header line is used to separate documents. This line must start with a document separator: '--' followed by a series of non space characters. The same separator must be used in all the document headers throughout the stream. If the stream contains more than one document, it must start with such a header line. If it contains a single branch document, the header line may be omitted.

[064] document_header ::= document_separator
( line_space+ directive )*
( line_space+ property )*
/* a YAML document header */
[065] document_separator ::= '-' '-' line_non_space+ /* a YAML document separator */
--- \
This YAML stream contains a single text value.
The next stream is a log file - a series of log
entries. Adding an entry to the log is a simple
matter of appending it at the end.
---
at: 2001-08-12 09:25:00.00
type: GET
HTTP: 1.0
url: /index.html
---
at: 2001-08-12 09:25:10.00
type: GET
HTTP: 1.0
url: /toc.html
examples: 3
first: is a text value.
second: is a stream of log entries.
last: is a simple map.
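The separator rules suggest a simple way to split a stream into documents. The sketch assumes the stream has already been decoded and line-break normalized; split_documents is a hypothetical name, and it does not validate document contents (for example, it does not reject a stream consisting of a single line break).

```python
import re

# A header line starts at column 0 with '--' plus at least one more non-space
# character; that first token is the separator for the whole stream.
_HEADER = re.compile(r"(--[^ \t]+)(?:[ \t]|$)")

def split_documents(text: str) -> list[tuple[str, list[str]]]:
    """Split an LF-normalized stream into (header line, body lines) pairs."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()                  # drop the piece after the final line break
    if not lines:
        return []                    # an empty stream contains no documents
    first = _HEADER.match(lines[0])
    if first is None:
        return [("", lines)]         # single branch document, header omitted
    separator = first.group(1)
    docs: list[tuple[str, list[str]]] = []
    for line in lines:
        m = _HEADER.match(line)
        if m is not None and m.group(1) == separator:
            docs.append((line, []))  # a new document starts here
        else:
            docs[-1][1].append(line)
    return docs
```

Since the separator is taken from the first header line, a stream whose top level leaves contain lines starting with '---' can use a different separator, such as '--x', and those lines are then left inside the document body.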

4.3.1 Directive

Directives are instructions to the YAML parser specifying how to parse the document. Like throwaway comments, directives are not reflected in the abstract information model represented in the document. Directives apply to a single document. It is an error for the same directive to be specified more than once for the same document.

Additional directives may be added in future versions of YAML. A parser should ignore unknown directives with an appropriate warning. There is no provision for specifying private directives. This is intentional.

The only directive defined in this version is 'YAML'. This directive specifies the version of YAML the document adheres to. This specification defines version '1.0'.

A version 1.0 parser should accept documents with an explicit 'YAML:1.0' directive, as well as documents lacking a 'YAML' directive. Documents with a directive specifying a higher minor version (e.g. 'YAML:1.1') should be processed with an appropriate warning. Documents with a directive specifying a higher major version (e.g. 'YAML:2.0') should be rejected with an appropriate error message.

[066] directive ::= directive_name
keyed_entry_separator
directive_value
/* a document directive */
[067] directive_name ::= word_char+ /* a document directive name */
[068] directive_value ::= line_non_space+ /* a document directive value */
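The directive rules can be sketched as follows. This assumes the header line has already been isolated and that directives are whitespace-separated name:value tokens placed right after the separator, as in production [064]; parse_directives, check_directives and the SUPPORTED constant are hypothetical names, and the regular expressions only approximate word_char and line_non_space. Unknown directives produce a warning and are otherwise ignored; a repeated directive is an error.

```python
import re
import warnings

SUPPORTED = (1, 0)                               # this parser implements YAML 1.0
_DIRECTIVE = re.compile(r"([A-Za-z0-9-]+):(\S+)")
_VERSION = re.compile(r"([0-9]+)\.([0-9]+)")

def parse_directives(header: str) -> dict[str, str]:
    """Collect name:value directives that follow the separator on a header line."""
    directives: dict[str, str] = {}
    for token in header.split()[1:]:             # token 0 is the separator
        m = _DIRECTIVE.fullmatch(token)
        if m is None:
            break                                # properties follow the directives
        name, value = m.groups()
        if name in directives:
            raise ValueError(f"directive {name!r} specified twice")
        directives[name] = value
    return directives

def check_directives(directives: dict[str, str]) -> None:
    """Apply the 'YAML' version rules and warn about unknown directives."""
    for name in directives:
        if name != "YAML":
            warnings.warn(f"ignoring unknown directive {name!r}")
    value = directives.get("YAML")
    if value is None:
        return                                   # no 'YAML' directive: accept
    m = _VERSION.fullmatch(value)
    if m is None:
        raise ValueError(f"malformed YAML version {value!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    if major > SUPPORTED[0]:
        raise ValueError(f"YAML {value} is not supported")
    if major == SUPPORTED[0] and minor > SUPPORTED[1]:
        warnings.warn(f"document uses newer YAML version {value}")
```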

4.3.2 Serialization Node

A serialization node begins at a particular level of indentation, n, and its content is indented at a level n+1. A serialization node can be either a branch (keyed or series), a leaf (block, plain, escaped, single quoted, double quoted, or implicit), or an alias.

A YAML document is a normal node at indentation level 0, which starts with a document header. This header is optional for the first document if it is a branch and there are no directives or properties specified.

[069] single_document ::= top_branch
normalized_line_break
/* single document top level node */
[070] multi_document ::= document_header
( nested_branch(0)
| nested_leaf(0)
| in_line_document )
normalized_line_break
/* multi document top level node */
[071] in_line_document ::= line_space+
type_family
/* in-line (empty, typed) document top level node */
[072] value_node(n) ::= line_space+ alias_node
  | ( line_space+ property )* ( branch(n) | leaf(n) )
/* node used as a value */
[073] key_node(n) ::= in_line_node
  | nested_key_indicator nested_node(n)
/* node used as a key */
[074] nested_node(n) ::= ( line_space+ property )*
( nested_branch(n)
| nested_leaf(n) )
/* node nested in following lines */
[075] in_line_node ::= alias_node
  | ( property line_space+ )* ( in_line_branch | in_line_leaf )
/* node embedded in-line */
[076] alias_node ::= ( anchor_property
  line_space+ )?
alias
/* alias node may not have transfer type */

4.3.3 Property

Each serialization node may have anchor and transfer method properties. These properties are specified in a properties list appearing before the node value itself. For a top level node (a document), the properties appear in the document header. It is an error for the same property to be specified more than once for the same node.

[077] property ::= trans_string | anchor_property /* a node property */

4.3.4 Transfer Method

Explicit type information can be provided via a transfer string, which contains the type family and, for leaf nodes, an optional string format; the family and format are separated by a semi-colon.

Providing an explicit transfer string for a node prevents implicit typing. However, an explicit empty transfer string can be used to force implicit typing to be applied to a non-implicit leaf value.

Type families beginning with a reverse DNS string are reserved for that domain's owners. Type families beginning with an IANA MIME type are reserved for that MIME type's owners.

Type families consisting of a single word are a shorthand for transfer strings defined in the yaml.org domain. Thus, the transfer string 'map' must be deserialized by the parser as if it were written using the full 'org.yaml.map' notation.

Type families consisting of a '.' character followed by a single word are taken to be in the same DNS domain used for the nearest ancestor node with an explicit absolute DNS type family. It is an error if there is no such ancestor node. The type family for the node is the result of replacing the final word in the ancestor's absolute DNS transfer string with the relative transfer string. Thus, if a node with the transfer string '.customer' is contained within a branch with the transfer string 'tld.company.invoice', the parser must deserialize the node as if it was written using the full 'tld.company.customer' notation.

Type families beginning with '!' are used for private transfer strings. Such transfer strings should not be expected to have consistent semantics across different documents.

All other formats of transfer strings are reserved. TODO: Update the productions to reflect a transfer string as a composition of a type family, followed by an optional colon and string format.

[078] trans_string ::= transfer_indicator
transfer_type?
/* associates a transfer string with a given node */
[079] transfer_type ::= yaml_transfer
  | dns_transfer
  | relative_transfer
  | iana_transfer
  | private_transfer
  | reserved_transfer
/* possible transfer types */
[080] non_word_tail ::= non_word_char
line_non_space*
/* non word trailer of a transfer type */
[081] yaml_transfer ::= word_char+
non_word_tail?
/* org.yaml transfer type */
[082] dns_transfer ::= word_char+
( '.' word_char+ )+
non_word_tail?
/* absolute DNS based transfer type */
[083] relative_transfer ::= '.' word_char+
non_word_tail?
/* relative DNS based transfer type */
[084] iana_transfer ::= word_char+
( '/' word_char+ )+
non_word_tail?
/* IANA based transfer type */
[085] private_transfer ::= transfer_indicator
line_non_space+
/* Private transfer type */
[086] reserved_transfer ::= line_non_space+
  - yaml_transfer
  - dns_transfer
  - relative_transfer
  - iana_transfer
  - private_transfer
/* Reserved transfer types */
# '!point' is a private transfer
# type. The three coordinates all
# have the same transfer string.
center : !!point
 x : !float 12
 y : !org.yaml.float 3
 z : ! \
  7.5
# 'tld.company.invoice' is an absolute
# DNS based transfer string name.
invoice: !tld.company.invoice
 # 'seq' is a shorthand for 'org.yaml.seq'.
 # This does not affect '.customer' below
 # because it is not an absolute DNS based
 # transfer string name.
 customers: !seq
  # '.customer' is a relative DNS transfer
  # type name serving as a shorthand for
  # the longer absolute DNS based notation
  # 'tld.company.customer'.
  - !.customer
   given : Chris
   family : Dumars
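The shorthand and relative type family rules can be sketched as a resolution function. The name resolve_family is hypothetical, and the sketch handles the type family alone, assuming any string format has already been split off. The ancestors list holds the explicit families of the enclosing nodes exactly as written, outermost first, so that a shorthand such as 'seq' is not mistaken for an absolute DNS family, matching the comments in the example above.

```python
import re

_WORD = r"[A-Za-z0-9-]+"                                  # word_char+
_SHORTHAND = re.compile(_WORD)                            # e.g. 'map'
_ABSOLUTE_DNS = re.compile(rf"{_WORD}(?:\.{_WORD})+")     # e.g. 'tld.company.invoice'
_RELATIVE = re.compile(rf"\.{_WORD}")                     # e.g. '.customer'

def resolve_family(family: str, ancestors: list[str]) -> str:
    """Expand a type family as written on a node into its full form."""
    if _SHORTHAND.fullmatch(family):
        return "org.yaml." + family
    if _RELATIVE.fullmatch(family):
        for ancestor in reversed(ancestors):               # nearest ancestor first
            if _ABSOLUTE_DNS.fullmatch(ancestor):
                # Replace the ancestor's final word with the relative word.
                return ancestor.rsplit(".", 1)[0] + family
        raise ValueError(f"{family!r} has no absolute DNS ancestor")
    return family    # absolute DNS, IANA and private families are kept as-is
```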

4.3.5 Anchor

An anchor is a property which can be used to mark a serialization node for future reference. An alias node can then be used to indicate additional inclusions of an anchored node by specifying the node's anchor.

[087] anchor_property ::= anchor_indicator
anchor
/* associates an anchor with a given node */
[088] anchor ::= line_non_space+ /* a unique anchor */

4.4 Alias

Once an anchor is used to mark a node, an alias should be used to indicate additional inclusions of that node in the graph. An alias refers to the most recent node having the same anchor.

It is an error to have an alias use an anchor which does not occur previously in the serialization of the document.

Note that an alias is just that: another name for the node it refers to. Thus, the deserialization of the anchored node and the deserialization of the alias node are the same object instance.

[089] alias ::= alias_indicator
anchor
/* alias of a preceding anchored node */
anchor : &A001 This leaf has an anchor.
override : &A001 \
 The alias node below
 is a repeated use of this
 folded leaf value.
alias : *A001
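The anchor and alias rules can be modelled with a table that maps each anchor to the most recently anchored node. In this sketch the class name AnchorTable is hypothetical and nodes are arbitrary Python objects; returning the stored object itself, rather than a copy, reflects the rule that an alias and its anchored node deserialize to the same object instance.

```python
class AnchorTable:
    """Track anchors while reading one document; aliases see the latest binding."""

    def __init__(self) -> None:
        self._nodes: dict[str, object] = {}

    def anchor(self, name: str, node: object) -> object:
        self._nodes[name] = node          # a later anchor replaces an earlier one
        return node

    def alias(self, name: str) -> object:
        try:
            return self._nodes[name]      # the very same object, not a copy
        except KeyError:
            raise ValueError(f"alias *{name} has no preceding anchor") from None
```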

4.5 Branch

There are two styles of serialized branch nodes, series and keyed.

[090] top_branch ::= top_series | top_keyed /* branch document (top level node) */
[091] branch(n) ::= nested_branch(n) | in_line_branch /* branch node styles */
[092] nested_branch(n) ::= nested_series(n) | nested_keyed(n) /* nested branch node styles */
[093] in_line_branch ::= in_line_series | in_line_keyed /* in-line branch node styles */

4.5.1 Series

A series is the simplest kind of branch: a sequence of nodes at a higher indentation level. An in-line style is available for short, simple series.

[094] top_series ::= series_entry(0)
nested_series(0)?
/* series document (top level node) */
[095] series(n) ::= nested_series(n) | in_line_series /* series node styles */
[096] nested_series(n) ::= ( throwaway_line_break(n)
  series_entry(n) )+
/* nested series node */
[097] series_entry(n) ::= indent(n)
series_entry_indicator
( value_node(n+1)
| single_keyed(n+1) )
/* a nested series node entry */
[098] in_line_series ::= series_in_line_start
( in_series_entry
  branch_in_line_separator )*
in_series_entry?
series_in_line_end
/* in-line series node */
[099] in_series_entry ::= line_space*
in_line_node
line_space*
/* an in_line series node entry */
empty: []
inline: [ one, two, three ]
nested:
 - First item in top series
 -
  - Subordinate series entry
 - \
  A multi-line
  series entry
 - Fourth item in top series

4.5.2 Keyed

A keyed node is an association of unique keys with values. It is an error for two equal key values to appear in the same keyed node. In such a case the parser may continue processing, ignoring the second key and issuing an appropriate warning. This strategy preserves a consistent information model for streaming and random access applications.

An in-line form is available for short, simple keyed nodes. Also, if a keyed node has no properties and consists of a single key:value pair, it may be specified in-line in a series entry.
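The recovery strategy for duplicate keys can be sketched as keeping the first occurrence and warning about later ones. In this sketch build_keyed is a hypothetical name, keys must be hashable, and Python equality stands in for YAML node equality (section 3.2.6), which is only an approximation.

```python
import warnings

def build_keyed(pairs):
    """Build a dict from (key, value) pairs in document order."""
    result = {}
    for key, value in pairs:
        if key in result:
            # Keep the first pair so that a streaming reader, which has
            # already handed it on, and a random-access reader agree.
            warnings.warn(f"duplicate key {key!r} ignored")
            continue
        result[key] = value
    return result
```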

[100] top_keyed ::= keyed_entry(0)
nested_keyed(0)?
/* keyed document (top level node) */
[101] keyed(n) ::= nested_keyed(n) | in_line_keyed /* keyed node styles */
[102] nested_keyed(n) ::= ( throwaway_line_break(n)
  indent(n) single_keyed(n) )+
/* nested keyed node */
[103] single_keyed(n) ::= key_node(n+1)
line_space*
keyed_entry_separator
value_node(n+1)
/* a single key:value pair */
[104] in_line_keyed ::= keyed_in_line_start
( in_keyed_entry
  branch_in_line_separator )*
in_keyed_entry?
keyed_in_line_end
/* in-line keyed node */
[105] in_keyed_entry ::= line_space*
in_line_node
line_space*
keyed_entry_separator
line_space+
in_line_node
line_space*
/* an in-line key:value pair */
empty: {}
inline: { one: 1, two: 2 }
keyed:
 first : First entry
 second:
  key: Subordinate keyed!
 third:
  - Subordinate series
  - !keyed
  - Previous keyed is empty.
  - Single pair: keyed in series.
  - The previous entry is equal to the following one.
  -
   Single pair: keyed in series.
 !float 12 : This key is a float.
 ? \
  :
 : This key had to be folded.
 ? \\
  \a
 : This key had to be escaped.
 "\b": Another way to escape
 ? \
  This is a
  multi-line
  folded key
 : \
  Whose value is in the next line.
 ?
  - this key
  - is a series
 :
  - with a series value
 ?
  this: key
  is a: keyed
 :
  with a: keyed value

4.6 Leaf

While most of the document productions are fairly strict, the leaf production is generous. It offers six styles of expressing leaf values, depending upon the readability requirements. The table below describes the various styles.

Style          Indicator   Line Breaks   Escaped?   Transfer Type
Block          | or ||     preserved     No         org.yaml.str
Plain          \           folded        No         org.yaml.str
Escaped        \\          folded        Yes        org.yaml.str
Single Quoted  '           forbidden     No         org.yaml.str
Double Quoted  "           forbidden     Yes        org.yaml.str
Implicit       (none)      forbidden     No         implicit

Note that throwaway comments are not allowed inside any of the leaf styles, making the '#' character safe to use in leaves.

A top level (document) leaf must be nested, much like a nested key leaf. However, YAML allows an in-line top level empty leaf if an explicit transfer string property is given. This provides a natural syntax for typed empty documents.

[106] leaf(n) ::= line_space+
( nested_leaf(n)
| in_line_leaf )
/* leaf node styles */
[107] nested_leaf(n) ::= block(n) | plain(n) | escaped(n) /* nested leaf styles */
[108] in_line_leaf ::= single_quoted | double_quoted | implicit /* in-line leaf styles */

4.6.1 Block

A block leaf is the simplest leaf form. No processing is performed on block leaf characters aside from end of line normalization and stripping away the indentation. This restricts block leaves to printable characters only. Also, long lines can't be broken.

In exchange for these severe restrictions, a block leaf may freely use any printable character, including line breaks. This makes block leaves the most readable format for source code or other text values with significant use of indicators, quotes, escape sequences, and line break characters.

The value of a block leaf contains, by default, the trailing line break character. To prevent this trailing line break from being included, the block indicator should be repeated.

[109] block(n) ::= block_indicator
block_indicator?
normalized_line_break
block_value(n)
/* an indented character block */
[110] block_value(n) ::= ( block_line?
  normalized_line_break )*
block_line?
/* value of character block */
[111] block_line ::= indent(n) line_char* /* line data in a character block */
empty: ||
second: |
 This is a block leaf,    with significant
 whitespace, and the use of " @, etc.
      All whitespace    is    significant.
third: ||
 No leading nor trailing new line.
fourth:
 - |

  First series item which has a
  leading and a trailing new line.
 - ||
  Second series item. Has neither
  leading nor trailing new line.
  Has two new lines altogether.
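Extracting a block leaf value amounts to removing exactly n spaces of indentation from each line and deciding whether to keep the final line break. The sketch assumes the caller has already collected the lines that belong to the block (by indentation) and knows n; block_value and its keep_final_break flag (false for '||') are hypothetical names. Spaces beyond the n indentation spaces are kept as content.

```python
def block_value(lines: list[str], indent: int, keep_final_break: bool = True) -> str:
    """Join block leaf lines, removing exactly `indent` leading spaces from each."""
    prefix = " " * indent
    content = []
    for line in lines:
        if line == "":
            content.append("")                 # empty line inside the block
        elif line.startswith(prefix):
            content.append(line[indent:])      # extra spaces are significant
        else:
            raise ValueError(f"line not indented by {indent} spaces: {line!r}")
    text = "\n".join(content)
    return text + "\n" if keep_final_break else text
```

Applied to the two entries of 'fourth' above with indent 2, the first ('|') yields a value starting and ending with a line feed, and the second ('||') yields exactly two line feeds in total.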

4.6.2 Plain

A plain leaf is similar to a block leaf. However, unlike a block leaf, a plain leaf is subject to line folding. This allows long lines to be broken anywhere a space character (#x20) appears, at the cost of requiring an empty line to represent each line feed character. Also, unlike a block leaf, the value does not include the trailing line break character. If such a character is desired it must be explicitly specified using a trailing empty line.

[112] plain(n) ::= plain_indicator
normalized_line_break
plain_value(n)
/* plain leaf */
[113] plain_value(n) ::= ( plain_line(n)?
  folded_line_breaks(n) )*
plain_line(n)?
/* value of a plain leaf */
[114] plain_line(n) ::= indent(n) line_char* /* line data in a plain leaf */
empty: \
second: \
 The value of the previous key is the empty string.
 The last character in this value is a period.
third: \
  This value has just one leading white space,
 and is terminated by a hard newline (LF).

fourth: \
 Indicators like ! & : are allowed.  Further,
 whitespace     is   preserved.
fifth:
 - \
  A single line entry.
 -  \
   A second, multi-line,
  entry of the series.
 - \
  A third, multi-line series
  entry, without any leading
  or trailing white space.
 - \


 - \
  The value of the previous
  entry is two hard newlines
  (LF LF).
sixth: \
 This is not a key : value pair.
seventh: \
 - This is not a series entry.
eighth: \
 This isn't a newline: \n

4.6.3 Escaped

An escaped leaf is similar to a plain leaf. However, after line folding is done, escape sequences are processed within an escaped leaf. This allows every Unicode character to be represented, and allows lines to be broken anywhere (rather than just on a space character) by escaping the (folded) line break. This comes at the cost of some verbosity: starting at a separate line, escaping the printable '\' character, and specifying line feed characters using a blank line or a '\n' escape sequence.

It is an error for an escaped leaf to contain invalid escape sequences.

[115] escaped(n) ::= escape
plain_indicator
normalized_line_break
escaped_value(n)
/* escaped leaf */
[116] escaped_value(n) ::= ( escaped_line(n)?
  folded_line_breaks(n) )*
escaped_line(n)?
/* value of an escaped leaf */
[117] escaped_line(n) ::= indent(n)
escaped_char*
escape?
/* line data in an escaped leaf */
[118] escaped_char ::= escape_sequence | ( line_char - escape ) /* characters valid in an escaped leaf */
empty: \\
second: \\
 Escaped leaf.\nWith a new line.
"third":
 - \\
  Line breaks are folded so this ->
  <- new line is the same as a space. A
  new line may be inserted by using a
  blank line:

  Or an escape: \n. A line may be brok\
  en anywhere by escaping the newline.
  A new line may also be inserted at
  the very beginning or end:

 - \\
  There is no need to escape " and '.
  However \" is allowed.
 - \\
  Furthermore indicators such
  as ! & : can be used.
 - \\
  Escape sequences can be used
  to specify unprintable
  characters: \a, \x01.
 - \\
    Leading and trailing
     spaces are significant
     in all lines
 - \\
  This is a series of six escaped leaves.

4.6.4 Single Quoted

A single quoted leaf is indicated by surrounding ''' characters. Therefore, within a single quoted leaf such characters need to be escaped. No other form of escaping is done, limiting single quoted leaves to printable characters. Also, single quoted leaves may not contain any line break characters.

Single quoted leaves are the most readable form for short, printable single-line text values which contain indicators or the escape character.

[119] single_quoted ::= single_quote
single_quoted_char*
single_quote
/* single quoted leaf value */
[120] single_quoted_char ::= ( line_char - single_quote ) | escaped_single_quote
/* characters valid in a single quoted leaf */
empty: ''
second: '! : \ etc. can be used freely.'
third: 'a single quote '' must be escaped.'
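A single quoted leaf can be decoded by stripping the surrounding quotes and collapsing each doubled quote. The sketch below assumes the caller passes the complete leaf, including both quotes; parse_single_quoted is a hypothetical name. It also rejects line breaks and lone quotes, which production [120] does not allow.

```python
def parse_single_quoted(token: str) -> str:
    """Return the value of a single quoted leaf such as 'it''s'."""
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError("not a single quoted leaf")
    body = token[1:-1]
    if any(c in body for c in "\n\r\x85\u2028\u2029"):
        raise ValueError("line breaks are forbidden in single quoted leaves")
    # After removing doubled pairs, any remaining quote is a lone one.
    if "'" in body.replace("''", ""):
        raise ValueError("unescaped single quote inside leaf")
    return body.replace("''", "'")
```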

4.6.5 Double Quoted

A double quoted leaf is similar to a single quoted leaf: it is indicated by surrounding '"' characters and may not contain non-printable or line break characters. However, unlike a single quoted leaf, escape sequences are processed within a double quoted leaf. This allows arbitrary Unicode characters to be specified, at the cost of some verbosity: the printable '\' and '"' characters must be escaped.

Double quoted leaves are the most readable form for short text values containing non-printable characters and/or indicators but not the escape character.

It is an error for a double quoted leaf to contain invalid escape sequences.

[121] double_quoted ::= double_quote
double_quoted_char*
double_quote
/* double quoted leaf value */
[122] double_quoted_char ::= escaped_char - double_quote /* characters valid in a double quoted leaf */
empty: ""
second: "! : etc. can be used freely."
third: "a double quote \" must be escaped."
fourth: "this value ends with a line feed.\n"

4.6.6 Implicit

An implicit leaf is similar to a single quoted leaf. However, unlike a single quoted leaf, an implicit leaf has no identifying markers. Therefore an implicit leaf may not start or end with white space characters, may not start with most indicators, and may not contain certain indicators. Also, an implicit leaf is subject to implicit transfer. This can be avoided by providing an explicit transfer string annotation.

In exchange for these restrictions implicit leaves are the most readable of all leaf styles for keys, numerical values, or other short, printable leaf values of any type.

[123] implicit ::= implicit_1st
( implicit_char*
  implicit_last )?
/* implicit leaf value */
[124] implicit_char ::= ( line_char - space_indicators - in_line_indicators )
  | ( space_indicators line_non_space )
/* non space characters valid in an implicit leaf */
[125] implicit_last ::= implicit_char - line_space /* characters valid at end of an implicit leaf */
[126] implicit_1st ::= implicit_last - non_space_indicators /* characters valid at start of an implicit leaf */
empty:
second: The value of the previous key is the empty string.
third: 12
fourth: The above entry is an integer.

5 Transfer Method

In YAML, each graph or tree node is given a type family and each tree leaf is additionally provided a string format. This section provides explicit information on these two properties. The word transfer method refers to a combination of a type family and a string format. If the string format is not clearly indicated, then it refers to the family's default format.

5.1 Explicit Typing

YAML allows every node to be given an explicit transfer method property. This property is a two-part instruction to the YAML parser: (a) if the node is a leaf, the given string format describes how the leaf's value is encoded in its string value, and (b) the resulting node (after any format normalization) has the given type family and should be treated accordingly.

There are some cases where an explicit transfer method is useful even for the three primary type families, org.yaml.map, org.yaml.seq and org.yaml.str. This happens whenever YAML's default assignment from a node style to a native data type does not match the intent. Specific cases include handling in-line sequences and maps, serializing sparse sequences, and turning implicit transfer on or off.

5.2 Implicit Typing

A plain leaf which does not start with an alphabetic character and which lacks an explicit transfer method is subject to implicit transfer. For each type family, there is a set of implicit string formats, and each of these formats has a regular expression. The parser compares the leaf value with a list of these regular expressions. If the value matches one of these expressions, it is deserialized as if it were explicitly annotated with the given family/format pair.

The set of implicit transfer methods compared against depends on the situation. Regular expressions for implicit string formats that are neither defined in this specification nor accepted into the YAML type registry must start with ^`. Values matching such private implicit transfer methods therefore always begin with the '`' character. This prevents private implicit transfer methods from interfering with public ones.
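A minimal Python sketch of the implicit transfer mechanism described above, assuming a simple in-memory registry of (type family, format, regular expression) triples. The registry contents, the "point" family, and the helper names are hypothetical; only the alphabetic-start rule and the ^` rule for private formats come from the text.

```python
import re

# Hypothetical public registry: (type family, string format, regex).
PUBLIC_IMPLICIT = [
    ("org.yaml.null", "tilde", r"~"),
    ("org.yaml.int", "decimal", r"[-]?[1-9][0-9]*|0"),
]


def register_private(registry, family, fmt, pattern):
    """Add a private implicit format; its regex must start with ^`."""
    if not pattern.startswith("^`"):
        raise ValueError("private implicit regex must start with ^`")
    registry.append((family, fmt, pattern))


def resolve_implicit(value, registry):
    """Return the (family, format) pair of the first regex that matches the
    whole value, or the string family's default format otherwise."""
    if not value or value[0].isalpha():
        return "org.yaml.str", "any"  # not subject to implicit transfer
    for family, fmt, pattern in registry:
        if re.fullmatch(pattern, value):
            return family, fmt
    return "org.yaml.str", "any"
```

Because every private pattern is anchored on a leading backquote, a private format can never claim a value such as 12 that a public format already handles.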

5.3 Common Type Families

The following is a list of common type families and their associated string formats defined under the yaml.org domain. YAML requires a native data type for the sequence, map and string type families. While the other type families are not mandatory, they map to native data types in most programming languages, so using them promotes interoperability with other YAML systems.

Additional type families are defined under the domain yaml.org, which serves as the central repository for common YAML type families and string formats. An application may also use private transfer methods, or public transfer methods defined on the basis of a DNS domain name. The exact set of transfer methods used is a part of the document's schema, and is tied to the expected document graph structure, the set of valid map keys, etc.

5.3.1 Sequence

This type family is the default for sequence nodes unless they are given an explicit transfer method property. Example bindings include the Perl array, Python's list or tuple, and Java's array or vector.

name org.yaml.seq
styles sequence, mapping, all leaf styles
definition Collections indexed by non-negative integers (zero or greater).
format none
implicit none

Applying the seq type family to leafs also provides a natural syntax for representing an empty sequence.

# The following is an empty
# top level sequence.
--- !seq
---
# An empty sequence.
empty: !seq

In some applications, large sequences may contain only a small number of non-null entries. While it is possible to serialize such sparse sequences using the null implicit type family, this is awkward. YAML allows such sequences to be serialized using the mapping style with an explicit sequence type family. The only supported keys are integers, serving as zero-based sequence entry indices.

# The following map style node is
# deserialized to a sequence, with
# unspecified entries containing
# a null value.
sparse sequence: !seq
 2: Third entry
 4: ~
# The following sequence node is
# deserialized into an identical
# in-memory sequence, which has
# a separate identity.
identical sequence:
 - ~
 - ~
 - Third entry
 - ~
 - ~
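Deserializing a sparse sequence can be sketched in Python as follows, assuming the mapping-style node has already been parsed into a dict whose keys are Python ints. sparse_to_list is a hypothetical helper, and None stands in for the null value.

```python
def sparse_to_list(entries):
    """Build a sequence from a !seq map of zero-based index -> value.

    Indices that are not given become None (the null value).
    """
    if not entries:
        return []
    for key in entries:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(key, int) or isinstance(key, bool) or key < 0:
            raise ValueError("sparse sequence keys must be non-negative integers")
    result = [None] * (max(entries) + 1)
    for index, value in entries.items():
        result[index] = value
    return result
```

For the sparse sequence above (keys 2 and 4), this produces a five-entry list with "Third entry" at index 2, the same contents as the identical sequence.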

5.3.2 Map

This type family is the default for map nodes unless they are given an explicit transfer method property. Example bindings include the Perl hash, Python's dictionary, and Java's dictionary.

name org.yaml.map
styles mapping, all leaf styles
definition Associative container, where each key is unique in the association and mapped to exactly one value.
format none
implicit none

Applying the map type family to leafs also provides a natural syntax for representing an empty map.

# The following is an empty top level map.
--- !map
---
# An empty map.
empty map: !map

5.3.3 String

This type family is the default for leaf nodes unless they are given an explicit transfer method property. This type is usually bound to the native language's string or character array construct.

name org.yaml.str
styles all leaf styles
definition Unicode strings, a series of zero or more Unicode characters.
format any
implicit any
...
name regex definition
any * Matches any sequence of characters

Specifying this type family explicitly turns off implicit transfer for a plain leaf. The same effect can also be achieved by writing the value in another leaf style.

# The following leafs are
# deserialized to the string
# value '12'.
string: !str 12
also: ||
 12
and: \\
 12

This type family can also be specified using the implicit transfer mechanism by surrounding a plain leaf value with quotes. In this case the type family strips the surrounding quotes from the value of the leaf. If double quotes are used then the type family also expands escape sequences within the leaf. In this case it is an error for the leaf to contain invalid escape sequences.

# The following leafs
# all have the same value.
block: ||
 "line1"
 'line2'
 line 3
folded: \
 "line1"

 'line2'

 line 3
implicit (single): '"line1"

 'line2'

 line 3'
escaped: \\
 "line\ 1"\n'line\
 2'

 line
 3
implicit (double): ""line\ 1"\n'line\
 2'

 line
 3"
forced implicit (double) in block: ! ||
 ""line\ 1"\n'line\
 2'

 line
 3"

5.3.4 Null

The null type family accepts leafs with the value '~' and converts them into a native null-like value (e.g., undef in Perl, None in Python). A null value indicates the lack of a value. Note that in most programming languages a map entry with a key and a null value is valid and different from not having that key in the map.

name org.yaml.null or null
styles only plain leaf style
definition devoid of value
format tilde
implicit tilde
...
name regex definition
tilde ~ Accepts only the tilde (~)
first: ~
second:
 - ~
 - Second entry.
 - ~
 - This sequence has 4 entries, two with values.
three:
 This map has three keys,
 only two with values.
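The difference between a null value and a missing key can be shown with a small Python sketch. load_null is a hypothetical helper that accepts only the tilde format and maps it to Python's None.

```python
def load_null(text: str):
    """Deserialize a plain leaf in the 'tilde' format to Python's None."""
    if text != "~":
        raise ValueError("the null family accepts only '~', got %r" % text)
    return None


# A key whose value is null is still present in the map ...
record = {"first": load_null("~")}
assert "first" in record and record["first"] is None
# ... which differs from a key that is absent altogether.
assert "second" not in record
```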

5.3.5 Pointer

The pointer type family accepts a map with a single key, '=', and is deserialized into any native pointer-like data type, pointing to the value given for that key (e.g., a hard reference in Perl). Note that this is not necessarily the native data type used to implement alias nodes. For example, in Java aliases are directly supported, but pointers must be emulated using a special class.

name org.yaml.ptr
styles mapping style
definition a hard reference, explicit memory address
format none
implicit none
Perl: |
 $map{YAML} = \"content";
# The following map is deserialized
# into a pointer to a text string.
YAML: !ptr
 = : content

5.3.6 Integer

The integer type family handles bases as in C: a leading '0x' indicates base 16, and a leading '0' indicates base 8. It should be deserialized to some native integer data type. The parser may choose from a range of such native data types according to the value of the integer. The valid range depends on the parser, though 32 bit integers should be safe.

name org.yaml.int
styles plain leaf style
definition a mathematical integer
format decimal
implicit decimal, hex, oct
...
name regex definition
decimal [-]?[1-9][0-9]* or 0 A unique decimal based integer
hex [+-]?(0x[0-9a-fA-F]+) A base 16 integer
oct [+-]?0[0-7]* A base 8 integer
decimal: 12
hexadecimal: 0x1f
octal: 011
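The Python sketch below tries the three integer regular expressions from the table above, in the order listed, against the whole leaf. load_int is a hypothetical helper; it relies on Python's int() accepting an optional sign and, for base 16, an optional 0x prefix.

```python
import re

# Regular expressions taken from the table above, tried in this order.
INT_FORMATS = [
    ("decimal", re.compile(r"[-]?[1-9][0-9]*|0"), 10),
    ("hex", re.compile(r"[+-]?(0x[0-9a-fA-F]+)"), 16),
    ("oct", re.compile(r"[+-]?0[0-7]*"), 8),
]


def load_int(text):
    """Return (format name, value) for an implicit integer, or None."""
    for name, pattern, base in INT_FORMATS:
        if pattern.fullmatch(text):
            return name, int(text, base)
    return None
```

Trying decimal first means a lone 0 is read as decimal, while 011 falls through to the octal format and yields 9.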

5.3.7 Float

The floating point type family handles approximations to real numbers as in C (both with and without a scientific notation). This should be deserialized to some native float data type. The parser may choose from a range of such native data types according to the size and accuracy of the floating point value. The valid range and accuracy depend on the parser, though 32 bit IEEE floats should be safe.

name org.yaml.float
styles plain leaf style
definition a floating point approximation to a real number
format scientific
implicit scientific, otherfloat
...
name regex definition
scientific [-]?[0-9]\.[0-9]*([eE][+-]?[0-9]+) A unique scientific format
otherfloat [+-]?[0-9]+\.[0-9]* Other fixed floating point format
float: 12.
scientific: 1.2e-3
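A matching Python sketch for the float formats: each regular expression from the table is tested against the whole leaf, and a match is converted with Python's float (typically a 64 bit IEEE double). load_float is a hypothetical helper.

```python
import re

# Regular expressions taken from the table above.
FLOAT_FORMATS = [
    ("scientific", re.compile(r"[-]?[0-9]\.[0-9]*([eE][+-]?[0-9]+)")),
    ("otherfloat", re.compile(r"[+-]?[0-9]+\.[0-9]*")),
]


def load_float(text):
    """Return (format name, value) for an implicit float, or None."""
    for name, pattern in FLOAT_FORMATS:
        if pattern.fullmatch(text):
            return name, float(text)
    return None
```

Note that the scientific regex requires an exponent, so a value such as 12. is matched by otherfloat.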

5.3.8 Binary

The binary type family accepts the base64 format as defined in RFC 2045 and deserializes it into some native binary data type (e.g., byte[] in Java). This is the recommended way to store such data in YAML files. Note, however, that many forms of binary data have internal structure that may benefit from being represented as YAML nodes (e.g., the Java serialization format).

name org.yaml.binary
styles flow leaf style
definition binary: arbitrary sequence of octets (8 bit values)
format base64
implicit none
...
name regex definition
base64 (see base64 spec) The base 64 binary encoding
arrow: !binary;base64 \
 R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOf
 n515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlpaW
 NjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++
 f/++f/++f/++f/++f/++f/++f/++SH+Dk1hZGUg
 d2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjoEwnuN
 AFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84Bww
 EeECcgggoBADs=
description: \
 The binary value above is a tiny arrow
 encoded as a gif image.
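A minimal Python sketch of decoding a binary leaf with the standard base64 module. It assumes the leaf's string value may still contain the spaces or line breaks left by the folded leaf style, and removes all white space before decoding (RFC 2045 decoders likewise ignore line breaks). load_binary is a hypothetical helper.

```python
import base64


def load_binary(text: str) -> bytes:
    """Decode the base64 string value of a !binary leaf."""
    # Drop every white space character left over from line folding.
    compact = "".join(text.split())
    # validate=True rejects any remaining non-base64 characters.
    return base64.b64decode(compact, validate=True)
```

For example, the first eight characters of the arrow image, R0lGODlh, decode to the GIF89a signature.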

5.3.9 Special Keys

The special key type family is used for special YAML defined items, which are used as map keys to denote structural information.

name org.yaml.special
styles plain leaf style
definition special mapping keys
format any
implicit subst, trans, comm, alias
...
name regex definition
subst = The node substitutability key
trans ! The unknown transfer method key
comm // The round-trippable comment key
alias [&*] Transfer keys for handling aliases when the language does not support graph structures.

The '&', '*' and '!' keys must not be used in serialized YAML documents. They are only used internally (in-memory) to encode anchors and type family information in applications which do not support them directly. On output these should be converted back to the appropriate YAML syntax. The '=' and '//' keys, however, may appear in serialized YAML documents.

The '=' key is used to denote the "default value" of a map. In cases where a value needs to be annotated by additional attributes, this avoids the need to come up with a key for the value itself.

The '//' key is used to attach a persistent comment to a map. A simple filter can remove these comments before reaching the application, while allowing such comments to survive round-trips and to be manipulated as normal data when necessary.

annotated text:
 - This text contains
 -
  = : colored
  color : red
 - characters.
"!": These three keys
"&": had to be quoted
"=": and are normal strings.
# NOTE: the following encoded node
# is NOT in valid serialized YAML
# format. It directly represents an
# in-memory structure.
encoded node :
 ! : transfer
 & : 12
 = : value
# The YAML way to serialize the
# above structure is as follows:
node : !transfer &12 value
commented: !!point
 x : 12
 y : 3
 // : This is the center point.
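The "simple filter" for '//' comments and the use of the '=' default value can be sketched in Python, assuming maps have already been loaded as dicts and sequences as lists. strip_comments and default_value are hypothetical helper names.

```python
def strip_comments(node):
    """Recursively remove persistent '//' comment keys from maps."""
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items() if k != "//"}
    if isinstance(node, list):
        return [strip_comments(item) for item in node]
    return node


def default_value(node):
    """Return the '=' entry of an annotated map, or the node itself."""
    if isinstance(node, dict) and "=" in node:
        return node["="]
    return node


annotated = ["This text contains", {"=": "colored", "color": "red"}, "characters."]
plain = " ".join(default_value(item) for item in annotated)
```

Here plain reads "This text contains colored characters.", ignoring the color attribute.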

5.4 Unsupported Transfer Methods

A YAML parser may encounter a valid YAML value of an unknown explicit or implicit type family.

For a schema-specific application, this isn't different from encountering any other valid YAML document which does not satisfy the schema. Such an application may safely use a parser which rejects any value of any unknown type family, or discards the type family information with an appropriate warning and parses the value as if it wasn't present.

For a schema-independent application (for example, a hypothetical YAML pretty print application), this isn't an option. Parsers used by such applications must encode the value instead.

Encoding is done by wrapping the value in a map using special keys. The '!' key is used for the type family string and the '=' key is used for the value being wrapped. This value is parsed as if no type family were specified (a plain leaf is parsed as if it were given the explicit org.yaml.str transfer method). In some cases, it may be necessary to encode anchors and alias nodes. The '&' and '*' special keys are used for this purpose.

This encoding must be reversed on output, allowing the application to safely round-trip any valid YAML document.
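A minimal Python sketch of this encoding and its reversal for a leaf, assuming dicts are used for in-memory maps. wrap_unknown and unwrap_to_leaf are hypothetical helpers; alias handling with the '*' key is omitted for brevity.

```python
def wrap_unknown(family, text, anchor=None):
    """Encode a leaf of an unsupported type family as a special-key map.

    The value is kept as a plain string, as if it had the explicit
    org.yaml.str transfer method.
    """
    node = {"!": family, "=": text}
    if anchor is not None:
        node["&"] = anchor
    return node


def unwrap_to_leaf(node):
    """Reverse the encoding on output, producing the serialized leaf text."""
    if not (isinstance(node, dict) and "!" in node):
        raise ValueError("not an encoded node")
    parts = ["!" + node["!"]]
    if "&" in node:
        parts.append("&" + node["&"])
    parts.append(node["="])
    return " ".join(parts)
```

Wrapping the value from the "encoded node" example in section 5.3.9 and unwrapping it again yields !transfer &12 value, the serialized form shown there.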

6 Changes From Other Versions

Probable Future Changes
Changes from the 09 Dec 2001 Draft
Changes from the 10 Dec 2001 Draft
Changes from the 11 Nov 2001 Draft
Changes from the 04 Nov 2001 Draft
Changes from the 12 Aug 2001 Draft
Changes from the 01 Aug 2001 Draft
Changes from the 31 Jul 2001 Draft
Changes from the 22 Jul 2001 Draft
Changes from the 23 Jun 2001 Draft
Changes from the 16 Jun 2001 Draft
Changes from the 09 Jun 2001 Draft
Changes from the 26 May 2001 Draft

6.1 Probable Future Changes

Updating the Transfer Method production

The transfer method production needs updating, as the transfer method was split into a type family and string format pair. The entire transfer method section needs review.

Reviewing Examples

Ensure there are enough examples. In particular, special syntax forms should be demonstrated to remove doubt in the interpretation of the productions. Providing equivalent constructs in Perl/Python/Java may help clarify the examples further.

Verifying Productions

There are a lot of new productions. Bugs are a real possibility.

Polish

Spell and grammar checking, formatting, etc.

6.3 Changes From The 10 Dec 2001 Draft

Updating the Transfer Method production

The transfer method production needs updating, as the transfer method was split into a type family and string format pair. The entire transfer method section needs review.

6.4 Changes From The 11 Nov 2001 Draft

Quoted Strings

Are now implemented as a text implicit transfer format. This slightly changed the definition of an escaped leaf so that the two are equivalent.

Relative DNS type families

Are now supported using the simplest form only.

Syntax/Grammar/Formatting

Were fixed according to Brian's inputs (done - Oren).

Block

Is now chomped using '||' rather than '|-' for consistency (done - Oren).

Document Header

Can now be anything starting with '--'; therefore, it is required before the first document in a multi-document stream (done - Oren).

Transfer Methods

Can now accept any printable characters, not just words. '!' now means 'force implicit typing' (done - Oren). Split into type family and string format (clark)

New Scalar Styles

We now have the following: | || \ \\ plain (done - Oren).

Treat sequence/map as Collection Styles

In productions (done - Oren). Information model work still needs to be done (ongoing - Clark).

Add Structured Keys

Using a key indicator (done - Oren).

Add Examples

Added a detailed examples section to the introduction to acquaint the reader with YAML, so that the rest of the spec can assume some basic knowledge (done - cce).

In-line maps/sequences

Are now supported. Empty maps/sequences are a natural special case (done - Oren).

Minor Changes

Made list of prior versions shorter. (cce)

Moved the list of changes down; it was cluttering the top of the spec (cce).

Information model and Preview

Completely rewritten.

6.5 Changes From The 04 Nov 2001 Draft

Polish

Minor wording fixes, added internal links, etc.

List

Was renamed to "sequence".

Separator

Was changed to "---" instead of "----".

Indentation

Was changed to one space instead of one tab.

Base64

Is no longer an implicit type. The surrounding '[=...=]' are kept, however, in case we change our mind later (e.g., if we introduce pipelining). The type was renamed to "binary" to stress its class rather than the encoding used.

Float

Was renamed to "real" to decouple it from specific in-memory representation. Mathematicians may object :-)

Length

Was removed from the sequence map.

Type vs. Class

Added some wording to clarify the difference. Most likely this will need to be changed once we settle the pipelining issue.

Productions

Were completely overhauled, again, to accommodate the new semantics.

Next Line Scalars

Now have two separate indicators, one for quoted and one for unquoted values.

Duplicate keys

Are now an error. The parser may ignore the second occurrence with a warning.

6.6 Changes From The 12 Aug 2001 Draft

Indentation

Has been changed to use tabs instead of spaces.

Throwaway comments

Were added. The persistent comment key was changed to '//'.

Indicators

Were changed. '-' now signifies a list entry and '\' signifies a next-line leaf value. '@' and '%' are no longer necessary (they may be if we ever support map/list keys). As a result no lookahead is ever required.

Multiple documents

Are now possible in a single file (again), using '----' as a separator.

Wording

Has been changed in numerous locations, hopefully to make it clearer. There was also some shuffling of the text sections to remove redundancy.

Productions

Were thoroughly overhauled and therefore undoubtedly contain new bugs. Also, all the shorthand production names were replaced by long ones to improve readability.

Indicator keys

Are no longer allowed. Structure keys are used instead, where some have only an in-memory representation.

Map/List keys

Are no longer allowed. This may have to be revisited when Perl 6 comes out.

Deep References

Are still supported but as an explicit type rather than as a hack.

Types List

Has been shrunk to only the common types, with a reference to yaml.org for a fuller list of types. The three core types were added as required types.

Type vs. Kind

This distinction was inserted explicitly into the text, with several examples to drive the point home.

6.7 Changes From The 01 Aug 2001 Draft

Character Set

Is now defined as simply printable Unicode characters without explicit ranges. This makes the spec resistant to the evolution of the Unicode spec.

Reserved Indicators

The set of such indicators has been minimized. There is now a conflict between reserving them for future use and allowing people to use them as markers for implicit leaf types.

Simple Scalar

Has been renamed to unquoted leaf.

Generic Model

Has been generalized to allow for typed nodes.

Implicit Typing

Has been added with an assortment of suggested types.

General Keys

Keys can now be any nodes to allow for Java serialization.

Multi-level references

Are now supported for Perl serialization.

6.8 Changes From The 31 Jul 2001 Draft

Simple Scalar and End Of Lines

Moved eol productions to the end, rather than the start, of most productions. The wording and productions for the simple leaf were fixed to match each other and the intended semantics. The simple leaf example set was enhanced to clarify the proper interpretation.

Empty Document

Both empty top level maps and no top level maps are now allowed, and hence so are empty documents.

6.9 Changes From The 22 Jul 2001 Draft

Thanks to Joe Lapp for reviewing the 22 Jul 2001 draft and recommending these changes.

Phrasing fixes

Fixed phrasing in the abstract, and sections 1.3, 2.1, 2.3.1, 2.4.3, 2.4.4, 2.4.5, 2.4.6 and 2.5.3.

Production fixes

Fixed productions: added productions 47 and 59, fixed productions 57, 58, 60 and 64 (production numbers in the 22 Jul 2001 draft are off by one in some cases). Most are bug fixes. Actual changes include allowing for empty lines surrounding a top level map, allowing an optional trailing separator line, and forbidding annotations which have no sensible semantics (anchor to null, anchor to a reference, shorthand for a reference).

6.10 Changes From The 23 Jun 2001 Draft

Merge Spec

Due to the decision to leave all API related issues outside the core spec, the spec has been re-merged into a single file, covering just what used to be the introduction and serialization sections of the previous specs.

Character Encodings

The spec now refers only to the Unicode standard. Due to the efforts by the Unicode and ISO/IEC 10646 groups, both standards are in almost complete agreement. The additional features provided by the ISO/IEC standard are rarely used in practice, while Unicode is simpler and is more widely supported by existing languages and systems.

Strict Indentation

Indentation is now a strict 4 spaces per level. This allows for the new whitespace policy and the new block notation.

Shorthand Notation

The spec introduces a shorthand notation for attaching special keys to any node kind (converting it to a map if necessary). This will need more work.

Null Nodes

Null nodes have finally been added, after somehow eluding all previous versions.

Bullet Lists

Change the optional * prefix for leaf list entries to a mandatory : and therefore remove the special name "bulleted list entries".

Simplify Keys

Multi-line simple keys are now out. The door is open for re-introducing them, however.

Change Whitespace Policy

White space folding has been replaced by line break folding. White space is now always significant, except for indentation and for separation of structure tokens.

Block Scalar Syntax

The syntax for block leafs has been replaced by a more elegant one.

6.11 Changes From The 16 Jun 2001 Draft

Split Spec

The spec is now separated into several files. This allows different versions of the spec to share the same version of unchanged sections, and makes it easier to refer to a particular version of important pieces of the spec such as serialization and interfaces. All the HTML files use the same shared CSS file. Cross references between the separate parts of the spec are now relative, though references to older versions are absolute and refer to the main site.

Cyclical Graph

Change the wording on the information model to allow for graphs with cycles. The alternative is to define the anchor semantics in such a way that would preclude cycles.

Null Character Escape

The escape sequence \z was added to allow convenient escaping of the ASCII zero (null) character.

Remove Binary Scalars

The information model now contains just one type of leaf. The special syntax for binary leafs has been removed. This functionality will be re-added in the form of a color.

Remove Class Shorthand

The syntax no longer supports the !class syntax. This functionality will be re-added in the form of a color.

Bullet Lists

Change the optional prefix for leaf list entries to * and rename such entries to "bulleted list entries".

Make Keys More Scalars-Compatible

Allow for multi-line simple keys and unify the description of leaf keys and values where it makes sense.

HTML Tidying

All the HTML pages have gone through Tidy. Also, all the HTML files have been run through an HTML validation service and a CSS validation service. Broken links and spelling were checked using another online HTML validator. This needs to be repeated for all future drafts.

6.12 Changes From The 09 Jun 2001 Draft

Relationship with MIME

Beyond using base64 for binary leafs, no additional special relationship with MIME is expected. Hence references to the MIME and mail RFCs were moved from section 1.1 ("required reading") to section 1.2 ("background material").

Strict Indentation

Indentation is now completely strict for all leaf styles. Also, the productions were changed to use consistent semantics for the indentation level parameter.

List Scalar Prefixes

A list leaf entry may be prefixed by an optional : indicator to improve readability of multi-line simple leaf values.

Anchor Semantics

Leading zeros are now ignored for comparing anchor strings.

No Empty Line At Start

The document production was fixed so as not to require an empty line at the start of a document.

Character Escapes

The set of character escapes is now maximal (including the rare \e escape for the useful ASCII ESC character). Also, it is now possible to "escape" a line break in a quoted string (the previous drafts were inconsistent at this point).

32 Bit Characters

The current draft allows such characters, and includes a specialized escaping format ('\Uxxxxxxxx') to support them.

6.13 Changes From The 26 May 2001 Draft

Changes Section

The changes section was added for easier comparison of different versions. The final draft will not contain this section.

Class Indicator

The indicator was changed from # to ! to allow for # to be used for comments.

No Empty Line At End

The document production was fixed so as not to require an empty line at the end of a document.

Strict Indentation

Indentation in quoted strings and binary blocks is now strict to ensure readability.

Productions

Problems in the productions were fixed, especially where related to white space issues and formatting of the result.

BOM Comment

The link to the Unicode FAQ was moved to section 2.2.2.

Binary Scalars

The information model now distinguishes between text and binary leafs.