
src/java.base/share/classes/sun/text/normalizer/UCharacter.java





 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * </p>
 * <p>
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: There is no "istitle()" class for titlecase characters.
 * </p>
 * <p>
 * ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * </p>
 * <p>
 * API access for C/POSIX character classes is as follows:
 * - alpha:     isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
 * - lower:     isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
 * - upper:     isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
 * - punct:     ((1<<getType(c)) & ((1<<DASH_PUNCTUATION)|(1<<START_PUNCTUATION)|(1<<END_PUNCTUATION)|(1<<CONNECTOR_PUNCTUATION)|(1<<OTHER_PUNCTUATION)|(1<<INITIAL_PUNCTUATION)|(1<<FINAL_PUNCTUATION)))!=0
 * - digit:     isDigit(c) or getType(c)==DECIMAL_DIGIT_NUMBER
 * - xdigit:    hasBinaryProperty(c, UProperty.POSIX_XDIGIT)
 * - alnum:     hasBinaryProperty(c, UProperty.POSIX_ALNUM)
 * - space:     isUWhiteSpace(c) or hasBinaryProperty(c, UProperty.WHITE_SPACE)
 * - blank:     hasBinaryProperty(c, UProperty.POSIX_BLANK)
 * - cntrl:     getType(c)==CONTROL
 * - graph:     hasBinaryProperty(c, UProperty.POSIX_GRAPH)
 * - print:     hasBinaryProperty(c, UProperty.POSIX_PRINT)
 * </p>
 * <p>
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * </p>
 * <p>
 * Note: There are several ICU (and Java) whitespace functions.
 * Comparison:
 * - isUWhiteSpace = UProperty.WHITE_SPACE: Unicode White_Space property;
 *       most of general category "Z" (separators) + most whitespace ISO controls
 *       (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * - isSpaceChar: just Z (including no-break spaces)
 * </p>
 * <p>
 * This class is not subclassable.
 * </p>
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.ibm.icu.lang.UCharacterEnums
 */
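The opening paragraphs warn that C/POSIX classes have no agreed meaning beyond ASCII. The JDK's own `java.util.regex` makes the point concrete: its POSIX classes such as `\p{Alpha}` are US-ASCII only by default, while the `Pattern.UNICODE_CHARACTER_CLASS` flag switches them to the UTS #18 Annex C definitions that ICU also follows. The sketch below uses the JDK regex engine as a stand-in; it is not the ICU `UCharacter` API, and the class and method names (`PosixClassDemo`, `asciiMatches`, `unicodeMatches`) are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares java.util.regex POSIX classes in their
// default US-ASCII mode with Pattern.UNICODE_CHARACTER_CLASS (UTS #18 Annex C).
// This uses the JDK regex engine, not the ICU UCharacter API.
public final class PosixClassDemo {

    private PosixClassDemo() {}

    // Does the single code point match the class with default (ASCII-only) semantics?
    public static boolean asciiMatches(String posixClass, int codePoint) {
        return matches(Pattern.compile(posixClass), codePoint);
    }

    // Same test with Unicode semantics enabled for the POSIX classes.
    public static boolean unicodeMatches(String posixClass, int codePoint) {
        return matches(Pattern.compile(posixClass, Pattern.UNICODE_CHARACTER_CLASS), codePoint);
    }

    private static boolean matches(Pattern p, int codePoint) {
        // Character.toChars yields a surrogate pair for supplementary code points.
        return p.matcher(new String(Character.toChars(codePoint))).matches();
    }

    public static void main(String[] args) {
        String[] classes = {"\\p{Alpha}", "\\p{Digit}", "\\p{Punct}", "\\p{Punct}"};
        // U+00E9 LATIN SMALL LETTER E WITH ACUTE, U+0663 ARABIC-INDIC DIGIT THREE,
        // U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, U+0024 DOLLAR SIGN
        int[] samples = {0x00E9, 0x0663, 0x00AB, 0x0024};
        for (int i = 0; i < samples.length; i++) {
            System.out.printf("%-10s U+%04X  ascii=%-5b  unicode=%b%n",
                    classes[i], samples[i],
                    asciiMatches(classes[i], samples[i]),
                    unicodeMatches(classes[i], samples[i]));
        }
    }
}
```

The first three samples lie outside ASCII, so they should match only in Unicode mode. The dollar sign is a member of the ASCII `\p{Punct}` list; the `Pattern` documentation maps Unicode-mode `\p{Punct}` to `\p{IsPunctuation}`, so it is expected to stop matching there, which is exactly the kind of cross-implementation divergence the Javadoc describes.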




 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * </p>
 * <p>
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: There is no "istitle()" class for titlecase characters.
 * </p>
 * <p>
 * ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * </p>
 * <pre>{@code
 * API access for C/POSIX character classes is as follows:
 * - alpha:     isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
 * - lower:     isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
 * - upper:     isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
 * - punct:     ((1<<getType(c)) & ((1<<DASH_PUNCTUATION)|(1<<START_PUNCTUATION)|(1<<END_PUNCTUATION)|(1<<CONNECTOR_PUNCTUATION)|(1<<OTHER_PUNCTUATION)|(1<<INITIAL_PUNCTUATION)|(1<<FINAL_PUNCTUATION)))!=0
 * - digit:     isDigit(c) or getType(c)==DECIMAL_DIGIT_NUMBER
 * - xdigit:    hasBinaryProperty(c, UProperty.POSIX_XDIGIT)
 * - alnum:     hasBinaryProperty(c, UProperty.POSIX_ALNUM)
 * - space:     isUWhiteSpace(c) or hasBinaryProperty(c, UProperty.WHITE_SPACE)
 * - blank:     hasBinaryProperty(c, UProperty.POSIX_BLANK)
 * - cntrl:     getType(c)==CONTROL
 * - graph:     hasBinaryProperty(c, UProperty.POSIX_GRAPH)
 * - print:     hasBinaryProperty(c, UProperty.POSIX_PRINT)
 * }</pre>
 * <p>
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * </p>
 * <p>
 * Note: There are several ICU (and Java) whitespace functions.
 * Comparison:
 * - isUWhiteSpace = UProperty.WHITE_SPACE: Unicode White_Space property;
 *       most of general category "Z" (separators) + most whitespace ISO controls
 *       (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * - isSpaceChar: just Z (including no-break spaces)
 * </p>
 * <p>
 * This class is not subclassable.
 * </p>
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.ibm.icu.lang.UCharacterEnums
 */
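The `punct` entry in the API table tests a code point's general category against a bit mask of the seven punctuation categories. A minimal sketch of the same technique, assuming `java.lang.Character` as a stand-in for ICU (the class name `PosixPunct` is hypothetical). Java names the quote categories `INITIAL_QUOTE_PUNCTUATION` and `FINAL_QUOTE_PUNCTUATION`, and its numeric category codes are not the same as ICU's, so a mask should be built from one API's constants only.

```java
// Hypothetical sketch of the "punct" bit-mask test, rebuilt on
// java.lang.Character general categories (not the ICU UCharacter API).
public final class PosixPunct {

    private PosixPunct() {}

    // One bit per punctuation category: Pd, Ps, Pe, Pc, Po, Pi, Pf.
    // Every java.lang.Character category code is below 32, so an int is wide enough.
    private static final int PUNCT_MASK =
              (1 << Character.DASH_PUNCTUATION)
            | (1 << Character.START_PUNCTUATION)
            | (1 << Character.END_PUNCTUATION)
            | (1 << Character.CONNECTOR_PUNCTUATION)
            | (1 << Character.OTHER_PUNCTUATION)
            | (1 << Character.INITIAL_QUOTE_PUNCTUATION)
            | (1 << Character.FINAL_QUOTE_PUNCTUATION);

    // True if the code point's general category is one of the P* categories.
    public static boolean isPunct(int codePoint) {
        return ((1 << Character.getType(codePoint)) & PUNCT_MASK) != 0;
    }

    public static void main(String[] args) {
        int[] samples = {'!', '-', '(', '_', 0x00AB, '$', '+', 'A'};
        for (int c : samples) {
            System.out.printf("U+%04X %s%n", c, isPunct(c));
        }
    }
}
```

Note that `$` (currency symbol) and `+` (math symbol) are not punctuation under this general-category definition, even though the traditional ASCII `ispunct()` accepts them.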

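The whitespace comparison in the Javadoc can be checked directly. `Character.isWhitespace` and `Character.isSpaceChar` are standard JDK methods; for the Unicode White_Space property, this sketch assumes the `java.util.regex` binary property `\p{IsWhite_Space}` as a stand-in for ICU's `isUWhiteSpace`. The names `WhitespaceComparison` and `isUnicodeWhiteSpace` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch comparing three whitespace predicates on a few code points.
public final class WhitespaceComparison {

    private WhitespaceComparison() {}

    // Unicode White_Space binary property via java.util.regex (stand-in for isUWhiteSpace).
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    public static boolean isUnicodeWhiteSpace(int codePoint) {
        return WHITE_SPACE.matcher(new String(Character.toChars(codePoint))).matches();
    }

    public static void main(String[] args) {
        // SPACE, CHARACTER TABULATION, NO-BREAK SPACE, INFORMATION SEPARATOR FOUR (IS4)
        int[] samples = {0x0020, 0x0009, 0x00A0, 0x001C};
        for (int c : samples) {
            System.out.printf("U+%04X  White_Space=%-5b  isWhitespace=%-5b  isSpaceChar=%b%n",
                    c,
                    isUnicodeWhiteSpace(c),
                    Character.isWhitespace(c),
                    Character.isSpaceChar(c));
        }
    }
}
```

Each sample separates the predicates differently: the tab is White_Space and Java whitespace but not a "Z" separator, U+00A0 is rejected only by `isWhitespace`, and U+001C (one of IS1..IS4) is accepted only by `isWhitespace`.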
