Specification for JEP 355: Text Blocks (Preview)

Copyright © 2019 Oracle America, Inc.

Last updated 2019-06-03

This document proposes changes to the Java Language Specification to support text blocks. See JEP 355 for an overview.

3.10 Literals

Literal:
    IntegerLiteral
    FloatingPointLiteral
    BooleanLiteral
    CharacterLiteral
    StringLiteral
    TextBlock
    NullLiteral

3.10.6 Text Blocks

The pre-existing section 3.10.6, “Escape Sequences for Character and String Literals”, will become 3.10.7, “Escape Sequences”.

The pre-existing section 3.10.7, “The Null Literal”, will be renumbered to 3.10.8.

A text block consists of zero or more characters enclosed by opening and closing delimiters. Characters may be represented by escape sequences (3.10.7), but the newline and double quote characters that must be represented with escape sequences in a string literal may be represented directly in a text block.

TextBlock:
    " " " { TextBlockWhiteSpace } LineTerminator { TextBlockCharacter } " " "

TextBlockWhiteSpace:
    WhiteSpace but not LineTerminator

TextBlockCharacter:
    InputCharacter but not \
    EscapeSequence
    LineTerminator

The following productions from 3.3, 3.4, and 3.6 are shown here for convenience:

WhiteSpace:
    the ASCII SP character, also known as “space”
    the ASCII HT character, also known as “horizontal tab”
    the ASCII FF character, also known as “form feed”
    LineTerminator

LineTerminator:
    the ASCII LF character, also known as “newline”
    the ASCII CR character, also known as “return”
    the ASCII CR character followed by the ASCII LF character

InputCharacter:
    UnicodeInputCharacter but not CR or LF

UnicodeInputCharacter:
    UnicodeEscape
    RawInputCharacter

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

RawInputCharacter:
    any Unicode character

The opening delimiter is a sequence that starts with three double quote characters ("""), continues with zero or more space, tab, and form feed characters, and concludes with a line terminator.

The closing delimiter is a sequence of three double quote characters.

The content of a text block is the sequence of characters that begins immediately after the line terminator of the opening delimiter, and ends immediately before the first double quote of the closing delimiter.

Unlike in a string literal, it is not a compile-time error for a line terminator to appear in the content of a text block.

A text block is always of type String (4.3.3).

Example 3.10.6-1. Text Blocks

When multi-line strings are desired, a text block is usually more readable than a concatenation of string literals. For example, compare these alternative representations of a snippet of HTML:
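As an illustrative sketch of such a comparison, the two forms might be written as follows. The class name HtmlSnippet, its method names, and the HTML itself are assumptions made for this sketch. Both methods are intended to produce the same string.

```java
class HtmlSnippet {
    // Concatenation of string literals: every line needs quotes, an
    // explicit \n escape, and a + operator.
    static String concatenated() {
        return "<html>\n" +
               "    <body>\n" +
               "        <p>Hello, world</p>\n" +
               "    </body>\n" +
               "</html>\n";
    }

    // Text block: the line terminators appear directly in the source, and
    // the indentation shared by all lines is treated as incidental.
    static String textBlock() {
        return """
            <html>
                <body>
                    <p>Hello, world</p>
                </body>
            </html>
            """;
    }

    public static void main(String[] args) {
        System.out.println(concatenated().equals(textBlock()));
    }
}
```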

Here are some examples of text blocks:
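A few small text blocks, sketched under the assumption of an illustrative class named TextBlockExamples, show how the position of the closing delimiter and the relative indentation of lines affect the represented string:

```java
class TextBlockExamples {
    // The five characters h e l l o (no trailing line terminator)
    static final String WORD = """
        hello""";

    // The six characters h e l l o LF
    static final String LINE = """
        hello
        """;

    // Two lines; the second line keeps two spaces of indentation because
    // it is indented further than the other lines
    static final String INDENTED = """
        Hi,
          "Bob"
        """;

    // The empty string (zero length)
    static final String EMPTY = """
        """;

    public static void main(String[] args) {
        System.out.println(WORD.length());
        System.out.println(LINE.length());
        System.out.println(INDENTED.length());
        System.out.println(EMPTY.length());
    }
}
```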

The use of the escape sequences \" and \n is permitted in a text block, but not necessary or recommended. However, representing the sequence """ in a text block requires the escaping of at least one " character, to avoid mimicking the closing delimiter.

Example 3.10.6-2. Escape sequences in text blocks

The following snippet of text would be less readable if the " characters were escaped:
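A minimal sketch of the point, using invented dialogue and an illustrative class name, places the unescaped and escaped forms side by side. Both constants are intended to denote the same string; only the first is easy to read.

```java
class QuotedDialogue {
    // Double quotes written directly, as a text block permits
    static final String UNESCAPED = """
        She said, "Use a text block."
        He replied, "Gladly."
        """;

    // The same text with every double quote escaped: legal, but noisier
    static final String ESCAPED = """
        She said, \"Use a text block.\"
        He replied, \"Gladly.\"
        """;

    public static void main(String[] args) {
        System.out.println(UNESCAPED.equals(ESCAPED));
    }
}
```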

If a text block is to denote another text block, then it is recommended to escape the first " of the embedded opening and closing delimiters:
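One way to write such a nested text block is sketched below. The class name and the embedded source text are assumptions chosen for illustration. Escaping only the first double quote of each embedded delimiter is enough to keep it from being read as the closing delimiter of the outer text block.

```java
class EmbeddedTextBlock {
    // The outer text block denotes Java source that itself contains a
    // text block; only the first quote of each inner delimiter is escaped.
    static final String CODE = """
        String text = \"""
            A text block inside a text block
        \""";
        """;

    public static void main(String[] args) {
        System.out.print(CODE);
    }
}
```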

The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:

  1. Line terminators are normalized to the ASCII LF character, as follows:

    1. An ASCII CR character followed by an ASCII LF character is translated to an ASCII LF character.

    2. An ASCII CR character is translated to an ASCII LF character.

  2. Incidental white space is removed, as if by execution of String::stripIndent on the characters in the content.

  3. Escape sequences are interpreted, as in a string literal.
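The three transformations above can be approximated at run time with methods of class String. This is only a sketch: the helper name interpret is hypothetical, and the use of String::translateEscapes for the final step is an assumption of this example rather than something this specification prescribes. Both stripIndent and translateEscapes require a release of the Java SE Platform that provides them.

```java
class TextBlockTransformations {
    // Hypothetical helper: applies the three steps, in order, to the raw
    // content of a text block supplied as an ordinary string.
    static String interpret(String content) {
        // Step 1: normalize line terminators, CR LF first, then lone CR
        String normalized = content.replace("\r\n", "\n").replace("\r", "\n");
        // Step 2: remove incidental white space
        String stripped = normalized.stripIndent();
        // Step 3: interpret escape sequences (assumed equivalent here)
        return stripped.translateEscapes();
    }

    public static void main(String[] args) {
        // Raw content with CR LF terminators and a backslash-t escape that
        // has not yet been interpreted
        String content = "    line one\r\n      line two\\t!\r\n    ";
        System.out.print(interpret(content));
    }
}
```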

Example 3.10.6-3. Order of transformations on text block content

Interpreting escape sequences last allows developers to use \n, \f, and \r for vertical formatting of a string without affecting the normalization of line terminators, and to use \b and \t for horizontal formatting of a string without affecting the removal of incidental white space. For example, consider this text block that mentions the escape sequence \r (CR):
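A text block of this kind might look like the following sketch, in which the class name and the HTML are illustrative assumptions. Each \r escape survives line terminator normalization and white space removal as the two characters \ and r, and becomes a CR character only in the final step, so every line of the resulting string ends in CR LF.

```java
class CarriageReturnEscapes {
    // Each line ends with an escaped CR followed by a real line terminator
    static final String HTML = """
        <html>\r
            <body>\r
            </body>\r
        </html>\r
        """;

    public static void main(String[] args) {
        // Every line of the represented string ends in CR LF
        System.out.println(HTML.endsWith("\r\n"));
    }
}
```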

The \r escapes are not interpreted until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), and using | to visualize the left margin, the final result is:
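Such a visualization can be produced mechanically. The sketch below assumes a hypothetical helper named visualize and a short two-line text block of its own. It marks the left margin of each line with | and spells out CR and LF in the Unicode escape notation used above.

```java
class VisualizeTextBlock {
    // Hypothetical helper: renders CR and LF as visible Unicode escape
    // text and marks the start of each line with a vertical bar.
    static String visualize(String s) {
        StringBuilder out = new StringBuilder("|");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                out.append("\\u000D");
            } else if (c == '\n') {
                out.append("\\u000A\n");
                // Open a new margin only if more characters follow
                if (i + 1 < s.length()) {
                    out.append('|');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = """
            first\r
              second\r
            """;
        System.out.print(visualize(text));
    }
}
```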

When this specification says that a text block contains a particular character or sequence of characters, or that a particular character or sequence of characters is in a text block, it means that the string represented by the text block (as opposed to the content of the text block) contains the character or sequence of characters.

A text block is a reference to an instance of class String that denotes the string represented by the text block.

A text block always refers to the same instance of class String. This is because the strings represented by text blocks, like all strings that are the values of constant expressions (15.28), are “interned” so as to share unique instances (12.5).
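The consequence for reference equality can be sketched as follows; the class and member names are illustrative assumptions. A text block and a string literal that represent the same string refer to the same instance, whereas a string created at run time does not unless it is explicitly interned.

```java
class InternedTextBlocks {
    static final String FROM_TEXT_BLOCK = """
        hello""";
    static final String FROM_LITERAL = "hello";

    // Both constants are values of constant expressions, so they are
    // interned and share a single instance.
    static boolean sameInstance() {
        return FROM_TEXT_BLOCK == FROM_LITERAL;
    }

    // A string created at run time is a distinct instance until interned.
    static boolean runtimeCopyIsDistinct() {
        String copy = new String(FROM_LITERAL);
        return copy != FROM_TEXT_BLOCK && copy.intern() == FROM_TEXT_BLOCK;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());
        System.out.println(runtimeCopyIsDistinct());
    }
}
```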

Example 3.10.6-4. Text blocks evaluate to strings

Text blocks can be used wherever an expression of type String is allowed, such as in string concatenation (15.18.1), in method invocation on class String, and in annotations with String elements:
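The three uses mentioned above might be sketched as follows. The class TextBlockExpressions, the annotation type Note, and all method names are assumptions introduced for this example only.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class TextBlockExpressions {
    // Illustrative annotation type with a String element
    @Retention(RetentionPolicy.RUNTIME)
    @interface Note {
        String value();
    }

    // String concatenation with a text block as the right-hand operand
    static String letter(String name) {
        return "Dear " + name + ",\n" + """
            Thank you for your order.
            """;
    }

    // Method invocation directly on a text block
    static long lineCount() {
        return """
            one
            two
            three
            """.lines().count();
    }

    // A text block as the value of an annotation element
    @Note("""
        Annotated with a text block
        """)
    static void annotated() {
    }

    // Reads the annotation value back reflectively
    static String noteValue() {
        try {
            return TextBlockExpressions.class
                    .getDeclaredMethod("annotated")
                    .getAnnotation(Note.class)
                    .value();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(letter("Ada"));
        System.out.println(lineCount());
        System.out.print(noteValue());
    }
}
```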

Miscellaneous changes

Some clarification of terminology around “escapes” is desirable: