 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * </p>
 * <p>
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: There is no "istitle()" class for titlecase characters.
 * </p>
 * <p>
 * ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * </p>
 * <p>
 * API access for C/POSIX character classes is as follows:
 * - alpha:     isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
 * - lower:     isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
 * - upper:     isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
 * - punct:     ((1<<getType(c)) & ((1<<DASH_PUNCTUATION)|(1<<START_PUNCTUATION)|(1<<END_PUNCTUATION)|(1<<CONNECTOR_PUNCTUATION)|(1<<OTHER_PUNCTUATION)|(1<<INITIAL_PUNCTUATION)|(1<<FINAL_PUNCTUATION)))!=0
 * - digit:     isDigit(c) or getType(c)==DECIMAL_DIGIT_NUMBER
 * - xdigit:    hasBinaryProperty(c, UProperty.POSIX_XDIGIT)
 * - alnum:     hasBinaryProperty(c, UProperty.POSIX_ALNUM)
 * - space:     isUWhiteSpace(c) or hasBinaryProperty(c, UProperty.WHITE_SPACE)
 * - blank:     hasBinaryProperty(c, UProperty.POSIX_BLANK)
 * - cntrl:     getType(c)==CONTROL
 * - graph:     hasBinaryProperty(c, UProperty.POSIX_GRAPH)
 * - print:     hasBinaryProperty(c, UProperty.POSIX_PRINT)
 * </p>
 * <p>
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * </p>
 * <p>
 * Note: There are several ICU (and Java) whitespace functions.
 * Comparison:
 * - isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
 *       most of general categories "Z" (separators) + most whitespace ISO controls
 *       (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * - isSpaceChar: just Z (including no-break spaces)
 * </p>
 * <p>
 * This class is not subclassable
 * </p>
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.ibm.icu.lang.UCharacterEnums
 */
|
 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * </p>
 * <p>
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: There is no "istitle()" class for titlecase characters.
 * </p>
 * <p>
 * ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * </p>
 * <pre>{@code
 * API access for C/POSIX character classes is as follows:
 * - alpha:     isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
 * - lower:     isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
 * - upper:     isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
 * - punct:     ((1<<getType(c)) & ((1<<DASH_PUNCTUATION)|(1<<START_PUNCTUATION)|(1<<END_PUNCTUATION)|(1<<CONNECTOR_PUNCTUATION)|(1<<OTHER_PUNCTUATION)|(1<<INITIAL_PUNCTUATION)|(1<<FINAL_PUNCTUATION)))!=0
 * - digit:     isDigit(c) or getType(c)==DECIMAL_DIGIT_NUMBER
 * - xdigit:    hasBinaryProperty(c, UProperty.POSIX_XDIGIT)
 * - alnum:     hasBinaryProperty(c, UProperty.POSIX_ALNUM)
 * - space:     isUWhiteSpace(c) or hasBinaryProperty(c, UProperty.WHITE_SPACE)
 * - blank:     hasBinaryProperty(c, UProperty.POSIX_BLANK)
 * - cntrl:     getType(c)==CONTROL
 * - graph:     hasBinaryProperty(c, UProperty.POSIX_GRAPH)
 * - print:     hasBinaryProperty(c, UProperty.POSIX_PRINT)
 * }</pre>
 * <p>
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * </p>
 * <p>
 * Note: There are several ICU (and Java) whitespace functions.
 * Comparison:
 * - isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
 *       most of general categories "Z" (separators) + most whitespace ISO controls
 *       (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * - isSpaceChar: just Z (including no-break spaces)
 * </p>
 * <p>
 * This class is not subclassable
 * </p>
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.ibm.icu.lang.UCharacterEnums
 */
|
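The general-category bitmask listed for "punct", the getType comparisons for "digit" and "cntrl", and the whitespace comparison in the comment can be tried out with a small sketch. To keep it runnable without ICU4J on the classpath, this sketch deliberately swaps in java.lang.Character for ICU's UCharacter; that substitution is an assumption of the example, not what the ICU class does internally. The class name PosixClassSketch and its helper methods are hypothetical. Note that java.lang.Character names the last two categories INITIAL_QUOTE_PUNCTUATION and FINAL_QUOTE_PUNCTUATION, and that the JDK's Unicode data version may differ from ICU's.

```java
// Sketch only: java.lang.Character stands in for ICU's UCharacter here,
// so results follow the JDK's Unicode tables, not ICU's.
final class PosixClassSketch {

    // One bit per general category that the "punct" entry above accepts.
    private static final int PUNCT_MASK =
            (1 << Character.DASH_PUNCTUATION)
            | (1 << Character.START_PUNCTUATION)
            | (1 << Character.END_PUNCTUATION)
            | (1 << Character.CONNECTOR_PUNCTUATION)
            | (1 << Character.OTHER_PUNCTUATION)
            | (1 << Character.INITIAL_QUOTE_PUNCTUATION)
            | (1 << Character.FINAL_QUOTE_PUNCTUATION);

    // punct: the code point's general category is one of the punctuation categories.
    static boolean isPunct(int c) {
        return ((1 << Character.getType(c)) & PUNCT_MASK) != 0;
    }

    // digit: general category Nd (decimal digit number), not only ASCII 0-9.
    static boolean isDigit(int c) {
        return Character.getType(c) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // cntrl: general category Cc (control).
    static boolean isCntrl(int c) {
        return Character.getType(c) == Character.CONTROL;
    }

    public static void main(String[] args) {
        // '!' and '_' are punctuation, '+' is a math symbol, U+00AB is an
        // initial quote, U+0665 is an Arabic-Indic digit, U+0007 is a control,
        // U+00A0 is the no-break space, U+001F is IS1 (unit separator).
        int[] samples = {'!', '_', '+', 0x00AB, '5', 0x0665, 0x0007, '\t', 0x00A0, 0x001F};
        for (int c : samples) {
            System.out.printf(
                    "U+%04X punct=%b digit=%b cntrl=%b isWhitespace=%b isSpaceChar=%b%n",
                    c, isPunct(c), isDigit(c), isCntrl(c),
                    Character.isWhitespace(c), Character.isSpaceChar(c));
        }
    }
}
```

The last two columns illustrate the comparison in the note: the Java isWhitespace test rejects the no-break space U+00A0 but accepts tab and IS1, while isSpaceChar accepts only "Z" category characters such as U+00A0. With ICU4J available, the same expressions would presumably be written against the getType and category constants named in the list above.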