--- old/src/java.base/share/classes/java/util/regex/Pattern.java 2016-02-09 21:53:06.532409902 -0800
+++ new/src/java.base/share/classes/java/util/regex/Pattern.java 2016-02-09 21:53:06.307410882 -0800
@@ -109,6 +109,8 @@
  * ({@link java.lang.Character#MIN_CODE_POINT Character.MIN_CODE_POINT}
  *  <= {@code 0x}h...h <= 
  * {@link java.lang.Character#MAX_CODE_POINT Character.MAX_CODE_POINT})
+ * \N{name}
+ * The character with Unicode character name 'name'
  * {@code \t}
  * The tab character ('\u0009')
  * {@code \n}
@@ -243,6 +245,8 @@
  * The end of a line
  * {@code \b}
  * A word boundary
+ * {@code \b{g}}
+ * A Unicode extended grapheme cluster boundary
  * {@code \B}
  * A non-word boundary
  * {@code \A}
@@ -263,6 +267,11 @@
  *
  *
  *  
+ * Unicode Extended Grapheme matcher
+ * {@code \X}
+ * Any Unicode extended grapheme cluster
+ *
+ *  
  * Greedy quantifiers
  *
  * X{@code ?}
@@ -546,12 +555,21 @@
  * {@code "\\u2014"}, while not equal, compile into the same pattern, which
  * matches the character with hexadecimal value {@code 0x2014}.
  *

- * A Unicode character can also be represented in a regular-expression by
- * using its Hex notation(hexadecimal code point value) directly as described in construct
- * \x{...}, for example a supplementary character U+2011F
- * can be specified as \x{2011F}, instead of two consecutive
- * Unicode escape sequences of the surrogate pair
- * \uD840\uDD1F.
+ * A Unicode character can also be represented by using its Hex notation
+ * (hexadecimal code point value) directly as described in construct
+ * \x{...}, for example a supplementary character U+2011F can be
+ * specified as \x{2011F}, instead of two consecutive Unicode escape
+ * sequences of the surrogate pair \uD840\uDD1F.
+ *

+ * Unicode character names are supported by the named character construct
+ * \N{...}, for example, \N{WHITE SMILING FACE}
+ * specifies character \u263A. The character names supported
+ * by this class are the valid Unicode character names matched by
+ * {@link java.lang.Character#codePointOf(String) Character.codePointOf(name)}.
+ *
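Reviewer note: a minimal sketch of how the named character construct added above might be exercised, assuming a JDK build that includes this change and `Character.codePointOf(String)`. The class name `NamedCharDemo` and the helper `matchesSmiley` are hypothetical and not part of the patch.

```java
import java.util.regex.Pattern;

public class NamedCharDemo {
    // Hypothetical helper: true if the whole input is the single character
    // named WHITE SMILING FACE, matched via the \N{...} construct.
    static boolean matchesSmiley(String input) {
        return Pattern.compile("\\N{WHITE SMILING FACE}").matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+263A is the code point the Javadoc text gives for this name.
        System.out.println(matchesSmiley("\u263A"));
        // The same name resolved through Character.codePointOf, which the
        // Javadoc cites as the source of supported names.
        System.out.println(Character.codePointOf("WHITE SMILING FACE") == 0x263A);
    }
}
```

Both lines should print `true` if the construct behaves as the Javadoc describes.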

+ *
+ * Unicode extended grapheme clusters are supported by the grapheme
+ * cluster matcher {@code \X} and the corresponding boundary matcher {@code \b{g}}.
  *
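Reviewer note: a rough sketch of the grapheme cluster matcher and boundary matcher described above, again assuming a JDK build that includes this change. `GraphemeDemo`, `countClusters`, and `splitClusters` are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GraphemeDemo {
    // Counts extended grapheme clusters by repeatedly matching \X.
    static int countClusters(String input) {
        Matcher m = Pattern.compile("\\X").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Splits the input at extended grapheme cluster boundaries using \b{g}.
    static String[] splitClusters(String input) {
        return Pattern.compile("\\b{g}").split(input);
    }

    public static void main(String[] args) {
        // 'e' followed by COMBINING ACUTE ACCENT (U+0301), then 'x':
        // three chars, but the first two form one user-perceived character.
        String s = "e\u0301x";
        System.out.println(s.length());
        System.out.println(countClusters(s));
        System.out.println(splitClusters(s).length);
    }
}
```

For this input, the `char` length is 3 while both the `\X` count and the `\b{g}` split should report 2 clusters, which is the distinction the new constructs are meant to expose.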

  * Unicode scripts, blocks, categories and binary properties are written with
  * the {@code \p} and {@code \P} constructs as in Perl.
@@ -679,22 +697,12 @@
  *

Perl constructs not supported by this class:

* *