--- old/src/java.base/share/classes/java/util/regex/Pattern.java 2017-08-18 15:03:17.459089224 -0700
+++ new/src/java.base/share/classes/java/util/regex/Pattern.java 2017-08-18 15:03:17.243079776 -0700
@@ -81,309 +81,296 @@
Regular expression constructs, and what they match
Construct    Matches

Characters
x    The character x
{@code \\}    The backslash character
{@code \0}n    The character with octal value {@code 0}n (0 {@code <=} n {@code <=} 7)
{@code \0}nn    The character with octal value {@code 0}nn (0 {@code <=} n {@code <=} 7)
{@code \0}mnn    The character with octal value {@code 0}mnn (0 {@code <=} m {@code <=} 3, 0 {@code <=} n {@code <=} 7)
{@code \x}hh    The character with hexadecimal value {@code 0x}hh
\uhhhh    The character with hexadecimal value {@code 0x}hhhh
\x{h...h}    The character with hexadecimal value {@code 0x}h...h ({@link java.lang.Character#MIN_CODE_POINT Character.MIN_CODE_POINT} <= {@code 0x}h...h <= {@link java.lang.Character#MAX_CODE_POINT Character.MAX_CODE_POINT})
\N{name}    The character with Unicode character name 'name'
{@code \t}    The tab character ('\u0009')
{@code \n}    The newline (line feed) character ('\u000A')
{@code \r}    The carriage-return character ('\u000D')
{@code \f}    The form-feed character ('\u000C')
{@code \a}    The alert (bell) character ('\u0007')
{@code \e}    The escape character ('\u001B')
{@code \c}x    The control character corresponding to x

Character classes
{@code [abc]}    {@code a}, {@code b}, or {@code c} (simple class)
{@code [^abc]}    Any character except {@code a}, {@code b}, or {@code c} (negation)
{@code [a-zA-Z]}    {@code a} through {@code z} or {@code A} through {@code Z}, inclusive (range)
{@code [a-d[m-p]]}    {@code a} through {@code d}, or {@code m} through {@code p}: {@code [a-dm-p]} (union)
{@code [a-z&&[def]]}    {@code d}, {@code e}, or {@code f} (intersection)
{@code [a-z&&[^bc]]}    {@code a} through {@code z}, except for {@code b} and {@code c}: {@code [ad-z]} (subtraction)
{@code [a-z&&[^m-p]]}    {@code a} through {@code z}, and not {@code m} through {@code p}: {@code [a-lq-z]} (subtraction)

Predefined character classes
{@code .}    Any character (may or may not match line terminators)
{@code \d}    A digit: {@code [0-9]}
{@code \D}    A non-digit: {@code [^0-9]}
{@code \h}    A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
{@code \H}    A non-horizontal whitespace character: {@code [^\h]}
{@code \s}    A whitespace character: {@code [ \t\n\x0B\f\r]}
{@code \S}    A non-whitespace character: {@code [^\s]}
{@code \v}    A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
{@code \V}    A non-vertical whitespace character: {@code [^\v]}
{@code \w}    A word character: {@code [a-zA-Z_0-9]}
{@code \W}    A non-word character: {@code [^\w]}

POSIX character classes (US-ASCII only)
{@code \p{Lower}}    A lower-case alphabetic character: {@code [a-z]}
{@code \p{Upper}}    An upper-case alphabetic character: {@code [A-Z]}
{@code \p{ASCII}}    All ASCII: {@code [\x00-\x7F]}
{@code \p{Alpha}}    An alphabetic character: {@code [\p{Lower}\p{Upper}]}
{@code \p{Digit}}    A decimal digit: {@code [0-9]}
{@code \p{Alnum}}    An alphanumeric character: {@code [\p{Alpha}\p{Digit}]}
{@code \p{Punct}}    Punctuation: One of {@code !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~}
{@code \p{Graph}}    A visible character: {@code [\p{Alnum}\p{Punct}]}
{@code \p{Print}}    A printable character: {@code [\p{Graph}\x20]}
{@code \p{Blank}}    A space or a tab: {@code [ \t]}
{@code \p{Cntrl}}    A control character: {@code [\x00-\x1F\x7F]}
{@code \p{XDigit}}    A hexadecimal digit: {@code [0-9a-fA-F]}
{@code \p{Space}}    A whitespace character: {@code [ \t\n\x0B\f\r]}

java.lang.Character classes (simple java character type)
{@code \p{javaLowerCase}}    Equivalent to java.lang.Character.isLowerCase()
{@code \p{javaUpperCase}}    Equivalent to java.lang.Character.isUpperCase()
{@code \p{javaWhitespace}}    Equivalent to java.lang.Character.isWhitespace()
{@code \p{javaMirrored}}    Equivalent to java.lang.Character.isMirrored()

Classes for Unicode scripts, blocks, categories and binary properties
{@code \p{IsLatin}}    A Latin script character (script)
{@code \p{InGreek}}    A character in the Greek block (block)
{@code \p{Lu}}    An uppercase letter (category)
{@code \p{IsAlphabetic}}    An alphabetic character (binary property)
{@code \p{Sc}}    A currency symbol
{@code \P{InGreek}}    Any character except one in the Greek block (negation)
{@code [\p{L}&&[^\p{Lu}]]}    Any letter except an uppercase letter (subtraction)

Boundary matchers
{@code ^}    The beginning of a line
{@code $}    The end of a line
{@code \b}    A word boundary
{@code \b{g}}    A Unicode extended grapheme cluster boundary
{@code \B}    A non-word boundary
{@code \A}    The beginning of the input
{@code \G}    The end of the previous match
{@code \Z}    The end of the input but for the final terminator, if any
{@code \z}    The end of the input

Linebreak matcher
{@code \R}    Any Unicode linebreak sequence; equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

Unicode Extended Grapheme matcher
{@code \X}    Any Unicode extended grapheme cluster

Greedy quantifiers
X{@code ?}    X, once or not at all
X{@code *}    X, zero or more times
X{@code +}    X, one or more times
X{n}    X, exactly n times
X{n,}    X, at least n times
X{n,m}    X, at least n but not more than m times

Reluctant quantifiers
X{@code ??}    X, once or not at all
X{@code *?}    X, zero or more times
X{@code +?}    X, one or more times
X{n}?    X, exactly n times
X{n,}?    X, at least n times
X{n,m}?    X, at least n but not more than m times

Possessive quantifiers
X{@code ?+}    X, once or not at all
X{@code *+}    X, zero or more times
X{@code ++}    X, one or more times
X{n}+    X, exactly n times
X{n,}+    X, at least n times
X{n,m}+    X, at least n but not more than m times

Logical operators
XY    X followed by Y
X{@code |}Y    Either X or Y
{@code (}X{@code )}    X, as a capturing group

Back references
{@code \}n    Whatever the nth capturing group matched
{@code \}k<name>    Whatever the named-capturing group "name" matched

Quotation
{@code \}    Nothing, but quotes the following character
{@code \Q}    Nothing, but quotes all characters until {@code \E}
{@code \E}    Nothing, but ends quoting started by {@code \Q}

Special constructs (named-capturing and non-capturing)
(?<name>X{@code )}    X, as a named-capturing group
{@code (?:}X{@code )}    X, as a non-capturing group
(?idmsuxU-idmsuxU)    Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X{@code )}    X, as a non-capturing group with the given flags i d m s u x on - off
{@code (?=}X{@code )}    X, via zero-width positive lookahead
{@code (?!}X{@code )}    X, via zero-width negative lookahead
{@code (?<=}X{@code )}    X, via zero-width positive lookbehind
{@code (?<!}X{@code )}    X, via zero-width negative lookbehind
{@code (?>}X{@code )}    X, as an independent, non-capturing group
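A few of the constructs in the summary table are easier to follow with a concrete run. The sketch below is illustrative only and is not part of the patch; the class name `ConstructDemo`, its helper methods, and the sample inputs are assumptions made for this example. It exercises a named-capturing group with a `\k<name>` back reference, a zero-width negative lookbehind, and the difference between a greedy and a possessive quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java.
public class ConstructDemo {
    // Named-capturing group plus a \k<name> back reference: finds a doubled word.
    static final Pattern DOUBLED = Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b");

    // Zero-width negative lookbehind: a run of digits not preceded by '$'.
    static final Pattern NOT_PRICE = Pattern.compile("(?<!\\$)\\b\\d+");

    static String firstDoubledWord(String input) {
        Matcher m = DOUBLED.matcher(input);
        return m.find() ? m.group("word") : null;
    }

    static String firstNonPrice(String input) {
        Matcher m = NOT_PRICE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("it was the the best")); // the
        System.out.println(firstNonPrice("$15 for 3 items"));        // 3
        // Greedy a* gives back one 'a' so the trailing 'a' can match.
        System.out.println("aaa".matches("a*a"));                    // true
        // Possessive a*+ keeps every 'a', so the trailing 'a' cannot match.
        System.out.println("aaa".matches("a*+a"));                   // false
    }
}
```

The possessive form never backtracks into what it has consumed, which is why the last call reports no match.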
@@ -432,26 +419,29 @@ *

The precedence of character-class operators is as follows, from highest to lowest:

Precedence of character class operators.
Precedence    Name    Example
1    Literal escape    {@code \x}
2    Grouping    {@code [...]}
3    Range    {@code a-z}
4    Union    {@code [a-e][i-u]}
5    Intersection    {@code [a-z&&[aeiou]]}
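The union, intersection, and subtraction forms of a character class can be checked directly. The following is a minimal sketch, not part of the patch; the class `CharClassOps` and its `keep` helper are hypothetical names introduced here. The helper tests each character of the input against a class on its own and keeps the ones that match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java.
public class CharClassOps {
    // Returns the characters of input that match the given character class.
    static String keep(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder kept = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz";
        System.out.println(keep("[a-d[m-p]]", alphabet));    // union: abcdmnop
        System.out.println(keep("[a-z&&[def]]", alphabet));  // intersection: def
        System.out.println(keep("[a-z&&[^bc]]", alphabet));  // subtraction: everything but b and c
        System.out.println(keep("[a-z&&[^m-p]]", alphabet)); // subtraction: everything but m through p
    }
}
```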

Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the
@@ -467,18 +457,18 @@

If {@link #UNIX_LINES} mode is activated, then the only line terminators
@@ -501,19 +491,12 @@
left to right. In the expression {@code ((A)(B(C)))}, for example, there are four such groups:

Capturing group numberings
  1. {@code ((A)(B(C)))}
  2. {@code (A)}
  3. {@code (B(C))}
  4. {@code (C)}
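The numbering above can be confirmed by matching the expression against a short input and printing each group. This is a small sketch rather than anything taken from the patch; the class `GroupNumberingDemo`, its `groups` helper, and the input `"ABC"` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java.
public class GroupNumberingDemo {
    // Returns group 0 through groupCount() for a full match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses, left to right.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // [ABC, ABC, A, BC, C]
    }
}
```

Index 0 in the result is the whole match, which is why the array has five entries for four capturing groups.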

Group zero always stands for the entire expression.
@@ -649,52 +632,52 @@
of Unicode Regular Expression, when {@link #UNICODE_CHARACTER_CLASS} flag is specified.
predefined and posix character classes in Unicode mode
Classes    Matches
{@code \p{Lower}}    A lowercase character: {@code \p{IsLowercase}}
{@code \p{Upper}}    An uppercase character: {@code \p{IsUppercase}}
{@code \p{ASCII}}    All ASCII: {@code [\x00-\x7F]}
{@code \p{Alpha}}    An alphabetic character: {@code \p{IsAlphabetic}}
{@code \p{Digit}}    A decimal digit character: {@code \p{IsDigit}}
{@code \p{Alnum}}    An alphanumeric character: {@code [\p{IsAlphabetic}\p{IsDigit}]}
{@code \p{Punct}}    A punctuation character: {@code \p{IsPunctuation}}
{@code \p{Graph}}    A visible character: {@code [^\p{IsWhite_Space}\p{gc=Cc}\p{gc=Cs}\p{gc=Cn}]}
{@code \p{Print}}    A printable character: {@code [\p{Graph}\p{Blank}&&[^\p{Cntrl}]]}
{@code \p{Blank}}    A space or a tab: {@code [\p{IsWhite_Space}&&[^\p{gc=Zl}\p{gc=Zp}\x0a\x0b\x0c\x0d\x85]]}
{@code \p{Cntrl}}    A control character: {@code \p{gc=Cc}}
{@code \p{XDigit}}    A hexadecimal digit: {@code [\p{gc=Nd}\p{IsHex_Digit}]}
{@code \p{Space}}    A whitespace character: {@code \p{IsWhite_Space}}
{@code \d}    A digit: {@code \p{IsDigit}}
{@code \D}    A non-digit: {@code [^\d]}
{@code \s}    A whitespace character: {@code \p{IsWhite_Space}}
{@code \S}    A non-whitespace character: {@code [^\s]}
{@code \w}    A word character: {@code [\p{Alpha}\p{gc=Mn}\p{gc=Me}\p{gc=Mc}\p{Digit}\p{gc=Pc}\p{IsJoin_Control}]}
{@code \W}    A non-word character: {@code [^\w]}
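The effect of the flag is easiest to see by compiling the same expression with and without it. The sketch below is illustrative and not part of the patch; the class `UnicodeClassDemo`, the `matches` helper, and the sample strings are assumptions. Non-ASCII characters are written as Unicode escapes so the source does not depend on file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java.
public class UnicodeClassDemo {
    // Compiles regex with the given flags and tests it against the whole input.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";       // "cafe" with an acute accent on the e
        String digits = "\u0663\u0664";  // two Arabic-Indic digits
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        // Default mode: \w is [a-zA-Z_0-9] and \d is [0-9].
        System.out.println(matches("\\w+", 0, word));        // false
        System.out.println(matches("\\d+", 0, digits));      // false

        // Unicode mode: \w and \d follow the Unicode definitions in the table.
        System.out.println(matches("\\w+", unicode, word));   // true
        System.out.println(matches("\\d+", unicode, digits)); // true

        // The embedded flag (?U) enables the same mode from inside the expression.
        System.out.println(matches("(?U)\\p{Upper}", 0, "\u00c9")); // true
    }
}
```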
@@ -1219,34 +1202,36 @@ *

The input {@code "boo:and:foo"}, for example, yields the following results with these parameters:

Split example showing regex, limit, and result
Regex    Limit    Result
:    2    {@code { "boo", "and:foo" }}
:    5    {@code { "boo", "and", "foo" }}
:    -2    {@code { "boo", "and", "foo" }}
o    5    {@code { "b", "", ":and:f", "", "" }}
o    -2    {@code { "b", "", ":and:f", "", "" }}
o    0    {@code { "b", "", ":and:f" }}
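The rows of the table can be reproduced with `Pattern.split(CharSequence, int)`. The snippet below is a sketch for illustration and not part of the patch; the class `SplitLimitDemo` and its `split` helper are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java.
public class SplitLimitDemo {
    static final String INPUT = "boo:and:foo";

    // Splits the fixed sample input around matches of regex, honoring limit.
    static String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(INPUT, limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(":", 2)));  // [boo, and:foo]
        System.out.println(Arrays.toString(split(":", 5)));  // [boo, and, foo]
        System.out.println(Arrays.toString(split(":", -2))); // [boo, and, foo]
        System.out.println(Arrays.toString(split("o", 5)));  // [b, , :and:f, , ]
        System.out.println(Arrays.toString(split("o", -2))); // [b, , :and:f, , ]
        System.out.println(Arrays.toString(split("o", 0)));  // [b, , :and:f]
    }
}
```

Only a limit of zero drops the trailing empty strings; positive and negative limits keep them.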
@param input The character sequence to be split
@@ -1310,19 +1295,21 @@

The input {@code "boo:and:foo"}, for example, yields the following results with these expressions:

Split examples showing regex and result
Regex    Result
:    {@code { "boo", "and", "foo" }}
o    {@code { "b", "", ":and:f" }}
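The single-argument form can be checked the same way. This is a minimal sketch, not part of the patch; the class `SplitDemo` and its helper are assumptions. It also compares the result with an explicit limit of zero, which produces the same arrays for this input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java.
public class SplitDemo {
    // Splits input around matches of regex using the single-argument split.
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        String input = "boo:and:foo";
        System.out.println(Arrays.toString(split(":", input))); // [boo, and, foo]
        System.out.println(Arrays.toString(split("o", input))); // [b, , :and:f]

        // Same result as passing a limit of zero: trailing empty strings are dropped.
        String[] withZeroLimit = Pattern.compile("o").split(input, 0);
        System.out.println(Arrays.equals(split("o", input), withZeroLimit)); // true
    }
}
```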
@param input