Specification for JEP 355: Text Blocks (Preview)

Copyright © 2019 Oracle America, Inc. Legal Notice

Last updated 2019-05-20

This document proposes changes to the Java Language Specification to support text blocks. See JEP 355 for an overview.

3.10 Literals

Literal:
IntegerLiteral
FloatingPointLiteral
BooleanLiteral
CharacterLiteral
StringLiteral
TextBlock
NullLiteral

3.10.6 Text Blocks

The pre-existing section 3.10.6, “Escape Sequences for Character and String Literals”, will become 3.10.7, “Escape Sequences”.

The pre-existing section 3.10.7, “The Null Literal”, will be renumbered to 3.10.8.

A text block consists of zero or more characters enclosed by opening and closing delimiters. Characters may be represented by escape sequences (3.10.7), but the newline and double quote characters that must be represented with escape sequences in a string literal may be represented directly in a text block.

TextBlock:
" " " { the ASCII SP character } LineTerminator { TextBlockCharacter } " " "
TextBlockCharacter:
InputCharacter but not \
EscapeSequence
LineTerminator

The following productions from 3.4 and 3.3 are shown here for convenience:

LineTerminator:
the ASCII LF character, also known as “newline”
the ASCII CR character, also known as “return”
the ASCII CR character followed by the ASCII LF character
InputCharacter:
UnicodeInputCharacter but not CR or LF
UnicodeInputCharacter:
UnicodeEscape
RawInputCharacter
UnicodeEscape:
\ UnicodeMarker HexDigit HexDigit HexDigit HexDigit
RawInputCharacter:
any Unicode character

The opening delimiter is a sequence of three double quote characters (""") followed by zero or more white space characters followed by a line terminator.

The closing delimiter is a sequence of three double quote characters.

The content of a text block is the sequence of characters that begins immediately after the line terminator of the opening delimiter, and ends immediately before the first double quote of the closing delimiter.

Unlike in a string literal, it is not a compile-time error for a line terminator to appear in the content of a text block.

Example 3.10.6-1. Text Blocks

When multi-line strings are desired, a text block is usually more readable than a concatenation of string literals. A snippet of HTML, for example, can be written as a text block without the \n escape, the enclosing quotes, and the + operator that each of its lines needs in a concatenation.
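A minimal sketch of that comparison follows. It assumes JDK 15 or later, where text blocks are standard (JDK 13 and 14 need --enable-preview), and the class name HtmlSnippet and its methods are illustrative only. Both methods should return the same string, because the indentation shared by every line of the text block, including the line of the closing delimiter, is removed as incidental white space.

```java
// Sketch (assumes JDK 15+); HtmlSnippet and its methods are hypothetical names.
class HtmlSnippet {

    // Concatenation of string literals: each line needs quotes, an explicit \n,
    // and a + operator.
    static String concatenated() {
        return "<html>\n" +
               "    <body>\n" +
               "        <p>Hello, world</p>\n" +
               "    </body>\n" +
               "</html>\n";
    }

    // Text block: line terminators are written directly, and the indentation
    // common to all lines (incidental white space) is removed.
    static String textBlock() {
        return """
               <html>
                   <body>
                       <p>Hello, world</p>
                   </body>
               </html>
               """;
    }

    public static void main(String[] args) {
        System.out.println(concatenated().equals(textBlock())); // true
    }
}
```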

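A text block may end its content with a line terminator or place the closing delimiter directly after the last character of content, may contain double quote characters without escapes, and may denote the empty string.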
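The following sketch, again assuming JDK 15 or later and using hypothetical class and field names, shows a few text blocks together with the characters each one denotes. Whether the string ends with LF depends on whether the closing delimiter sits on a line of its own.

```java
// Sketch (assumes JDK 15+); class and field names are hypothetical.
class TextBlockExamples {

    // Closing delimiter on the last line of content: no trailing LF.
    // The six characters w i n t e r
    static final String SEASON = """
        winter""";

    // Closing delimiter on its own line: the string ends with LF.
    // The seven characters w i n t e r LF
    static final String PERIOD = """
        winter
        """;

    // Double quote characters need no escaping.
    // The ten characters H i , SP " B o b " LF
    static final String GREETING = """
        Hi, "Bob"
        """;

    // Indentation beyond the common minimum is kept.
    // The eleven characters H i , LF SP " B o b " LF
    static final String SALUTATION = """
        Hi,
         "Bob"
        """;

    // Only white space before the closing delimiter: the empty string.
    static final String EMPTY = """
        """;

    public static void main(String[] args) {
        System.out.println(SEASON.length());     // 6
        System.out.println(PERIOD.length());     // 7
        System.out.println(GREETING.length());   // 10
        System.out.println(SALUTATION.length()); // 11
        System.out.println(EMPTY.isEmpty());     // true
    }
}
```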

The use of the escape sequences \" and \n is permitted in a text block, but not necessary or recommended. However, representing the sequence """ in a text block requires the escaping of at least one " character, to avoid mimicking the closing delimiter.

Example 3.10.6-2. Escape sequences in text blocks

Dialogue, for example, would be less readable if its " characters were escaped.
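As an illustrative sketch (JDK 15 or later; class and field names hypothetical), the first text block below contains dialogue with its " characters written directly. The two other fields denote the same string: one uses the permitted but unnecessary \" escapes inside a text block, the other is the equivalent string literal, where the escapes are mandatory.

```java
// Sketch (assumes JDK 15+); class and field names are hypothetical.
class QuotedDialogue {

    // " characters written directly in a text block.
    static final String DIRECT = """
        "Is it ready?" she asked.
        "Almost," he said.
        """;

    // The same string using \" escapes: permitted in a text block, but noisier.
    static final String ESCAPED = """
        \"Is it ready?\" she asked.
        \"Almost,\" he said.
        """;

    // The same string as a string literal, where \" and \n are required.
    static final String LITERAL =
            "\"Is it ready?\" she asked.\n\"Almost,\" he said.\n";

    public static void main(String[] args) {
        System.out.println(DIRECT.equals(ESCAPED)); // true
        System.out.println(DIRECT.equals(LITERAL)); // true
        System.out.print(DIRECT);
    }
}
```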

If a text block is to denote another text block, then it is recommended to escape the first " of the embedded opening and closing delimiters.
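A minimal sketch (JDK 15 or later; names hypothetical) of a text block whose string is Java source containing another text block. Only the first " of each embedded delimiter is escaped, so the outer text block never sees three consecutive unescaped " characters until its real closing delimiter.

```java
// Sketch (assumes JDK 15+); names are hypothetical.
class NestedTextBlock {

    // The string denoted here is Java source that itself contains a text block.
    // Escaping the first " of each embedded delimiter prevents the outer text
    // block from treating it as its closing delimiter.
    static final String SOURCE = """
        String greeting = \"""
            Hello
            \""";
        """;

    public static void main(String[] args) {
        System.out.print(SOURCE);
        // Prints:
        // String greeting = """
        //     Hello
        //     """;
    }
}
```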

The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of processing the content, as follows:

  1. Line terminators are normalized to the ASCII LF character, as follows:

    1. An ASCII CR character followed by an ASCII LF character is translated to an ASCII LF character.

    2. An ASCII CR character is translated to an ASCII LF character.

  2. Incidental white space is removed, as if by execution of String::stripIndent on the characters in the content.

  3. Escape sequences are interpreted, as in a string literal.
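The three steps can be approximated at run time with methods of class String, as in the sketch below. The helper process is hypothetical; it assumes JDK 15 or later, where String::stripIndent and String::translateEscapes are standard, and uses translateEscapes as a stand-in for step 3 (it also accepts the \s and \<line-terminator> escapes that a later JEP added, which this preview does not define). Because the sample content is written as a string literal, a \t that appears in the source of a text block is written there as "\\t".

```java
// Sketch (assumes JDK 15+); process is a hypothetical helper, not a JDK API.
class TextBlockProcessing {

    static String process(String content) {
        // Step 1: normalize line terminators. CR LF is handled before a lone CR
        // so that CR LF becomes one LF, not two.
        String normalized = content.replace("\r\n", "\n").replace('\r', '\n');
        // Step 2: remove incidental white space.
        String stripped = normalized.stripIndent();
        // Step 3: interpret escape sequences.
        return stripped.translateEscapes();
    }

    public static void main(String[] args) {
        // Raw content as the compiler would see it between the delimiters:
        // a CR LF terminator, a lone CR terminator, the two characters \ and t,
        // and a closing-delimiter line indented by four spaces.
        String content = "    Hello\\t\"World\"\r\n    second line\r    ";
        String viaTextBlock = """
            Hello\t"World"
            second line
            """;
        System.out.println(process(content).equals(viaTextBlock)); // true
    }
}
```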

Example 3.10.6-3. Order of content processing

Interpreting escape sequences last allows developers to use \n, \f, and \r for vertical formatting of a string without affecting the normalization of line terminators, and to use \b and \t for horizontal formatting of a string without affecting the removal of incidental white space. For example, consider a text block in which each line of content ends with the escape sequence \r (CR).

The \r escapes are not interpreted until after the line terminators have been normalized to LF, so each line of content that ends with \r ends, in the resulting string, with CR followed by LF (\u000D\u000A) rather than with a lone LF.
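A sketch of this ordering (JDK 15 or later; names hypothetical): the first text block ends each line with \r, and the second shows a \t escape at the end of a line surviving the removal of trailing white space, because it is still the two characters \ and t when white space is stripped.

```java
// Sketch (assumes JDK 15+); names are hypothetical.
class EscapeOrder {

    // Each line of content ends with the escape \r. Line terminators are
    // normalized to LF first, and only then is \r turned into CR, so each
    // line of the resulting string ends with CR LF.
    static final String CRLF_LINES = """
        <html>\r
            <body>\r
            </body>\r
        </html>\r
        """;

    // Trailing white space is removed before escapes are interpreted, and at
    // that point \t is still the two characters \ and t, so the tab survives.
    static final String TRAILING_TAB = """
        name\t
        """;

    public static void main(String[] args) {
        System.out.println(CRLF_LINES.equals(
                "<html>\r\n    <body>\r\n    </body>\r\n</html>\r\n")); // true
        System.out.println(TRAILING_TAB.equals("name\t\n"));         // true
    }
}
```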

When this specification says that a text block contains a particular character or sequence of characters, or that a particular character or sequence of characters is in a text block, it means that the string represented by the text block (as opposed to the content of the text block) contains the character or sequence of characters.

A text block is always of type String (4.3.3).

At run time, a text block is evaluated to a reference to an instance of type String that corresponds to the string represented by the text block. The string represented by a text block is interned in the same manner as the string represented by a string literal.
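The following sketch (JDK 15 or later; names hypothetical) checks both properties: a text block and a string literal that denote the same characters are equal, and, because both are interned, they are also the same instance.

```java
// Sketch (assumes JDK 15+); class and method names are hypothetical.
class TextBlockIdentity {

    static String fromLiteral() {
        return "hello\n";
    }

    static String fromTextBlock() {
        // Denotes the same six characters h e l l o LF.
        return """
            hello
            """;
    }

    public static void main(String[] args) {
        String a = fromLiteral();
        String b = fromTextBlock();
        System.out.println(a.equals(b));     // true: same characters
        System.out.println(a == b);          // true: the same interned instance
        System.out.println(b == b.intern()); // true
    }
}
```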

Example 3.10.6-4. Text blocks represent strings

Text blocks can be used wherever an expression of type String is allowed, such as in string concatenation (15.18.1), in the invocation of methods of class String, and in annotations with String elements.
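A sketch of those three contexts, assuming JDK 15 or later; the nested annotation type Description and the other names are hypothetical, introduced only for illustration. A text block is a constant expression, which is what allows it to appear as an annotation element value.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Sketch (assumes JDK 15+). Description is a hypothetical annotation type
// with a String element, declared here only for illustration.
class TextBlockUsage {

    @Retention(RetentionPolicy.RUNTIME)
    @interface Description {
        String value();
    }

    // A text block as an annotation element value.
    @Description("""
        Greets the user.
        """)
    static String greet(String name) {
        // A text block as an operand of string concatenation.
        return """
            Hello""" + ", " + name + "!";
    }

    // A method of class String invoked directly on a text block.
    static long lineCount() {
        return """
            line one
            line two
            """.lines().count();
    }

    // Reads the annotation value back through reflection.
    static String description() {
        try {
            return TextBlockUsage.class
                    .getDeclaredMethod("greet", String.class)
                    .getAnnotation(Description.class)
                    .value();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("Duke")); // Hello, Duke!
        System.out.println(lineCount());   // 2
        System.out.print(description());   // Greets the user.
    }
}
```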

Miscellaneous changes

Some clarification of terminology around “escapes” is desirable: