/*
 * Copyright (c) 2002, 2019, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

package java.lang;

import java.util.Arrays;
import java.util.Map;
import java.util.HashMap;
import java.util.Locale;

/**
 * The {@code Character} class wraps a value of the primitive
 * type {@code char} in an object. An object of class
 * {@code Character} contains a single field whose type is
 * {@code char}.
 * <p>
 * In addition, this class provides a large number of static methods for
 * determining a character's category (lowercase letter, digit, etc.)
 * and for converting characters from uppercase to lowercase and vice
 * versa.
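 * <p>
 * As a minimal sketch of the category queries and case conversions
 * mentioned above (illustrative only; the local variable names are
 * hypothetical and not part of this class):
 * <pre>{@code
 * boolean digit = Character.isDigit('7');      // true
 * boolean lower = Character.isLowerCase('a');  // true
 * char upper = Character.toUpperCase('a');     // 'A'
 * int type = Character.getType('a');           // Character.LOWERCASE_LETTER
 * }</pre>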
 *
 * <h3><a id="conformance">Unicode Conformance</a></h3>
 * <p>
 * The fields and methods of class {@code Character} are defined in terms
 * of character information from the Unicode Standard, specifically the
 * <i>UnicodeData</i> file that is part of the Unicode Character Database.
 * This file specifies properties including name and category for every
 * assigned Unicode code point or character range. The file is available
 * from the Unicode Consortium at
 * <a href="http://www.unicode.org">http://www.unicode.org</a>.
 * <p>
 * The Java SE 8 Platform uses character information from version 6.2
 * of the Unicode Standard, with two extensions. First, the Java SE 8 Platform
 * allows an implementation of class {@code Character} to use the Japanese Era
 * code point, {@code U+32FF}, from the first version of the Unicode Standard
 * after 6.2 that assigns the code point. Second, in recognition of the fact
 * that new currencies appear frequently, the Java SE 8 Platform allows an
 * implementation of class {@code Character} to use the Currency Symbols
 * block from version 10.0 of the Unicode Standard. Consequently, the
 * behavior of fields and methods of class {@code Character} may vary across
 * implementations of the Java SE 8 Platform when processing the aforementioned
 * code points (outside of version 6.2), except for the following methods
 * that define Java identifiers:
 * {@link #isJavaIdentifierStart(int)}, {@link #isJavaIdentifierStart(char)},
 * {@link #isJavaIdentifierPart(int)}, and {@link #isJavaIdentifierPart(char)}.
 * Code points in Java identifiers must be drawn from version 6.2 of
 * the Unicode Standard.
 *
 * <h3><a name="unicode">Unicode Character Representations</a></h3>
 *
 * <p>The {@code char} data type (and therefore the value that a
 * {@code Character} object encapsulates) is based on the
 * original Unicode specification, which defined characters as
 * fixed-width 16-bit entities. The Unicode Standard has since been
 * changed to allow for characters whose representation requires more
 * than 16 bits. The range of legal <em>code point</em>s is now
 * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
 * (Refer to the <a
 * href="http://www.unicode.org/reports/tr27/#notation"><i>
 * definition</i></a> of the U+<i>n</i> notation in the Unicode
 * Standard.)
 *
 * <p><a name="BMP">The set of characters from U+0000 to U+FFFF</a> is
 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
 * <a name="supplementary">Characters</a> whose code points are greater
 * than U+FFFF are called <em>supplementary character</em>s. The Java
 * platform uses the UTF-16 representation in {@code char} arrays and
 * in the {@code String} and {@code StringBuffer} classes. In
 * this representation, supplementary characters are represented as a pair
 * of {@code char} values, the first from the <em>high-surrogates</em>
 * range (\uD800-\uDBFF), the second from the
 * <em>low-surrogates</em> range (\uDC00-\uDFFF).
 *
 * <p>A {@code char} value, therefore, represents Basic
 * Multilingual Plane (BMP) code points, including the surrogate
 * code points, or code units of the UTF-16 encoding. An
 * {@code int} value represents all Unicode code points,
 * including supplementary code points. The lower (least significant)
 * 21 bits of {@code int} are used to represent Unicode code
 * points and the upper (most significant) 11 bits must be zero.
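 * <p>
 * As a minimal sketch of this representation (illustrative only; the
 * local variable names are hypothetical), a supplementary code point
 * held in an {@code int} maps to a surrogate pair of {@code char}
 * values and back:
 * <pre>{@code
 * int cp = 0x2F81A;                                      // a supplementary code point
 * char[] units = Character.toChars(cp);                  // two UTF-16 code units
 * boolean hi = Character.isHighSurrogate(units[0]);      // true
 * boolean lo = Character.isLowSurrogate(units[1]);       // true
 * int back = Character.toCodePoint(units[0], units[1]);  // 0x2F81A
 * }</pre>
 * <p>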
 * Unless otherwise specified, the behavior with respect to
 * supplementary characters and surrogate {@code char} values is
 * as follows:
 *
 * <ul>
 * <li>The methods that only accept a {@code char} value cannot support
 * supplementary characters. They treat {@code char} values from the
 * surrogate ranges as undefined characters. For example,
 * {@code Character.isLetter('\u005CuD840')} returns {@code false}, even though
 * this specific value, if followed by any low-surrogate value in a string,
 * would represent a letter.
 *
 * <li>The methods that accept an {@code int} value support all
 * Unicode characters, including supplementary characters. For
 * example, {@code Character.isLetter(0x2F81A)} returns
 * {@code true} because the code point value represents a letter
 * (a CJK ideograph).
 * </ul>
 *
 * <p>In the Java SE API documentation, <em>Unicode code point</em> is
 * used for character values in the range between U+0000 and U+10FFFF,
 * and <em>Unicode code unit</em> is used for 16-bit
 * {@code char} values that are code units of the <em>UTF-16</em>
 * encoding. For more information on Unicode terminology, refer to the
 * <a href="http://www.unicode.org/glossary/">Unicode Glossary</a>.
 *
 * @author Lee Boynton
 * @author Guy Steele
 * @author Akira Tanaka
 * @author Martin Buchholz
 * @author Ulf Zibis
 * @since 1.0
 */
public final
class Character implements java.io.Serializable, Comparable<Character> {
    /**
     * The minimum radix available for conversion to and from strings.
     * The constant value of this field is the smallest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
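     * <p>
     * For illustration only, a sketch of how this bound surfaces in the
     * radix-conversion methods named above (the local variable names
     * are hypothetical):
     * <pre>{@code
     * int one = Character.digit('1', Character.MIN_RADIX);     // 1
     * int none = Character.digit('2', Character.MIN_RADIX);    // -1, '2' is not a binary digit
     * char c = Character.forDigit(1, Character.MIN_RADIX);     // '1'
     * String bits = Integer.toString(5, Character.MIN_RADIX);  // "101"
     * }</pre>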
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MIN_RADIX = 2;

    /**
     * The maximum radix available for conversion to and from strings.
     * The constant value of this field is the largest value permitted
     * for the radix argument in radix-conversion methods such as the
     * {@code digit} method, the {@code forDigit} method, and the
     * {@code toString} method of class {@code Integer}.
     *
     * @see Character#digit(char, int)
     * @see Character#forDigit(int, int)
     * @see Integer#toString(int, int)
     * @see Integer#valueOf(String)
     */
    public static final int MAX_RADIX = 36;

    /**
     * The constant value of this field is the smallest value of type
     * {@code char}, {@code '\u005Cu0000'}.
     *
     * @since 1.0.2
     */
    public static final char MIN_VALUE = '\u0000';

    /**
     * The constant value of this field is the largest value of type
     * {@code char}, {@code '\u005CuFFFF'}.
     *
     * @since 1.0.2
     */
    public static final char MAX_VALUE = '\uFFFF';

    /**
     * The {@code Class} instance representing the primitive type
     * {@code char}.
     *
     * @since 1.1
     */
    @SuppressWarnings("unchecked")
    public static final Class<Character> TYPE = (Class<Character>) Class.getPrimitiveClass("char");

    /*
     * Normative general types
     */

    /*
     * General character types
     */

    /**
     * General category "Cn" in the Unicode specification.
     * @since 1.1
     */
    public static final byte UNASSIGNED = 0;

    /**
     * General category "Lu" in the Unicode specification.
     * @since 1.1
     */
    public static final byte UPPERCASE_LETTER = 1;

    /**
     * General category "Ll" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LOWERCASE_LETTER = 2;

    /**
     * General category "Lt" in the Unicode specification.
     * @since 1.1
     */
    public static final byte TITLECASE_LETTER = 3;

    /**
     * General category "Lm" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MODIFIER_LETTER = 4;

    /**
     * General category "Lo" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_LETTER = 5;

    /**
     * General category "Mn" in the Unicode specification.
     * @since 1.1
     */
    public static final byte NON_SPACING_MARK = 6;

    /**
     * General category "Me" in the Unicode specification.
     * @since 1.1
     */
    public static final byte ENCLOSING_MARK = 7;

    /**
     * General category "Mc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte COMBINING_SPACING_MARK = 8;

    /**
     * General category "Nd" in the Unicode specification.
     * @since 1.1
     */
    public static final byte DECIMAL_DIGIT_NUMBER = 9;

    /**
     * General category "Nl" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LETTER_NUMBER = 10;

    /**
     * General category "No" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_NUMBER = 11;

    /**
     * General category "Zs" in the Unicode specification.
     * @since 1.1
     */
    public static final byte SPACE_SEPARATOR = 12;

    /**
     * General category "Zl" in the Unicode specification.
     * @since 1.1
     */
    public static final byte LINE_SEPARATOR = 13;

    /**
     * General category "Zp" in the Unicode specification.
     * @since 1.1
     */
    public static final byte PARAGRAPH_SEPARATOR = 14;

    /**
     * General category "Cc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CONTROL = 15;

    /**
     * General category "Cf" in the Unicode specification.
     * @since 1.1
     */
    public static final byte FORMAT = 16;

    /**
     * General category "Co" in the Unicode specification.
     * @since 1.1
     */
    public static final byte PRIVATE_USE = 18;

    /**
     * General category "Cs" in the Unicode specification.
     * @since 1.1
     */
    public static final byte SURROGATE = 19;

    /**
     * General category "Pd" in the Unicode specification.
     * @since 1.1
     */
    public static final byte DASH_PUNCTUATION = 20;

    /**
     * General category "Ps" in the Unicode specification.
     * @since 1.1
     */
    public static final byte START_PUNCTUATION = 21;

    /**
     * General category "Pe" in the Unicode specification.
     * @since 1.1
     */
    public static final byte END_PUNCTUATION = 22;

    /**
     * General category "Pc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CONNECTOR_PUNCTUATION = 23;

    /**
     * General category "Po" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_PUNCTUATION = 24;

    /**
     * General category "Sm" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MATH_SYMBOL = 25;

    /**
     * General category "Sc" in the Unicode specification.
     * @since 1.1
     */
    public static final byte CURRENCY_SYMBOL = 26;

    /**
     * General category "Sk" in the Unicode specification.
     * @since 1.1
     */
    public static final byte MODIFIER_SYMBOL = 27;

    /**
     * General category "So" in the Unicode specification.
     * @since 1.1
     */
    public static final byte OTHER_SYMBOL = 28;

    /**
     * General category "Pi" in the Unicode specification.
     * @since 1.4
     */
    public static final byte INITIAL_QUOTE_PUNCTUATION = 29;

    /**
     * General category "Pf" in the Unicode specification.
     * @since 1.4
     */
    public static final byte FINAL_QUOTE_PUNCTUATION = 30;

    /**
     * Error flag. Use int (code point) to avoid confusion with U+FFFF.
     */
    static final int ERROR = 0xFFFFFFFF;


    /**
     * Undefined bidirectional character type. Undefined {@code char}
     * values have undefined directionality in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_UNDEFINED = -1;

    /**
     * Strong bidirectional character type "L" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = 0;

    /**
     * Strong bidirectional character type "R" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = 1;

    /**
     * Strong bidirectional character type "AL" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = 2;

    /**
     * Weak bidirectional character type "EN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = 3;

    /**
     * Weak bidirectional character type "ES" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = 4;

    /**
     * Weak bidirectional character type "ET" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = 5;

    /**
     * Weak bidirectional character type "AN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_ARABIC_NUMBER = 6;

    /**
     * Weak bidirectional character type "CS" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = 7;

    /**
     * Weak bidirectional character type "NSM" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_NONSPACING_MARK = 8;

    /**
     * Weak bidirectional character type "BN" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = 9;

    /**
     * Neutral bidirectional character type "B" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = 10;

    /**
     * Neutral bidirectional character type "S" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = 11;

    /**
     * Neutral bidirectional character type "WS" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_WHITESPACE = 12;

    /**
     * Neutral bidirectional character type "ON" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_OTHER_NEUTRALS = 13;

    /**
     * Strong bidirectional character type "LRE" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = 14;

    /**
     * Strong bidirectional character type "LRO" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = 15;

    /**
     * Strong bidirectional character type "RLE" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = 16;

    /**
     * Strong bidirectional character type "RLO" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = 17;

    /**
     * Weak bidirectional character type "PDF" in the Unicode specification.
     * @since 1.4
     */
    public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = 18;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuD800'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MIN_HIGH_SURROGATE = '\uD800';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDBFF'}.
     * A high-surrogate is also known as a <i>leading-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MAX_HIGH_SURROGATE = '\uDBFF';

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDC00'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MIN_LOW_SURROGATE = '\uDC00';

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * in the UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     * A low-surrogate is also known as a <i>trailing-surrogate</i>.
     *
     * @since 1.5
     */
    public static final char MAX_LOW_SURROGATE = '\uDFFF';

    /**
     * The minimum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuD800'}.
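     * <p>
     * As an illustrative sketch only (the local variable names are
     * hypothetical), this constant and {@code MAX_SURROGATE} together
     * bound the range of UTF-16 surrogate code units:
     * <pre>{@code
     * char c = (char) 0xD840;  // a high-surrogate code unit
     * boolean inRange = c >= Character.MIN_SURROGATE
     *         && c <= Character.MAX_SURROGATE;                 // true
     * boolean agrees = (Character.isSurrogate(c) == inRange);  // true
     * }</pre>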
     *
     * @since 1.5
     */
    public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;

    /**
     * The maximum value of a Unicode surrogate code unit in the
     * UTF-16 encoding, constant {@code '\u005CuDFFF'}.
     *
     * @since 1.5
     */
    public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#supplementary_code_point">
     * Unicode supplementary code point</a>, constant {@code U+10000}.
     *
     * @since 1.5
     */
    public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

    /**
     * The minimum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>, constant {@code U+0000}.
     *
     * @since 1.5
     */
    public static final int MIN_CODE_POINT = 0x000000;

    /**
     * The maximum value of a
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point</a>, constant {@code U+10FFFF}.
     *
     * @since 1.5
     */
    public static final int MAX_CODE_POINT = 0X10FFFF;


    /**
     * Instances of this class represent particular subsets of the Unicode
     * character set. The only family of subsets defined in the
     * {@code Character} class is {@link Character.UnicodeBlock}.
     * Other portions of the Java API may define other subsets for their
     * own purposes.
     *
     * @since 1.2
     */
    public static class Subset {

        private String name;

        /**
         * Constructs a new {@code Subset} instance.
         *
         * @param name The name of this subset
         * @exception NullPointerException if name is {@code null}
         */
        protected Subset(String name) {
            if (name == null) {
                throw new NullPointerException("name");
            }
            this.name = name;
        }

        /**
         * Compares two {@code Subset} objects for equality.
         * This method returns {@code true} if and only if
         * {@code this} and the argument refer to the same
         * object; since this method is {@code final}, this
         * guarantee holds for all subclasses.
         */
        public final boolean equals(Object obj) {
            return (this == obj);
        }

        /**
         * Returns the standard hash code as defined by the
         * {@link Object#hashCode} method. This method
         * is {@code final} in order to ensure that the
         * {@code equals} and {@code hashCode} methods will
         * be consistent in all subclasses.
         */
        public final int hashCode() {
            return super.hashCode();
        }

        /**
         * Returns the name of this subset.
         */
        public final String toString() {
            return name;
        }
    }

    // See http://www.unicode.org/Public/UNIDATA/Blocks.txt
    // for the latest specification of Unicode Blocks.

    /**
     * A family of character subsets representing the character blocks in the
     * Unicode specification. Character blocks generally define characters
     * used for a specific script or purpose. A character is contained by
     * at most one Unicode block.
     *
     * @since 1.2
     */
    public static final class UnicodeBlock extends Subset {

        private static Map<String, UnicodeBlock> map = new HashMap<>(256);

        /**
         * Creates a UnicodeBlock with the given identifier name.
         * This name must be the same as the block identifier.
         */
        private UnicodeBlock(String idName) {
            super(idName);
            map.put(idName, this);
        }

        /**
         * Creates a UnicodeBlock with the given identifier name and
         * alias name.
         */
        private UnicodeBlock(String idName, String alias) {
            this(idName);
            map.put(alias, this);
        }

        /**
         * Creates a UnicodeBlock with the given identifier name and
         * alias names.
         */
        private UnicodeBlock(String idName, String... aliases) {
            this(idName);
            for (String alias : aliases)
                map.put(alias, this);
        }

        /**
         * Constant for the "Basic Latin" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BASIC_LATIN =
            new UnicodeBlock("BASIC_LATIN",
                             "BASIC LATIN",
                             "BASICLATIN");

        /**
         * Constant for the "Latin-1 Supplement" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_1_SUPPLEMENT =
            new UnicodeBlock("LATIN_1_SUPPLEMENT",
                             "LATIN-1 SUPPLEMENT",
                             "LATIN-1SUPPLEMENT");

        /**
         * Constant for the "Latin Extended-A" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_A =
            new UnicodeBlock("LATIN_EXTENDED_A",
                             "LATIN EXTENDED-A",
                             "LATINEXTENDED-A");

        /**
         * Constant for the "Latin Extended-B" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_B =
            new UnicodeBlock("LATIN_EXTENDED_B",
                             "LATIN EXTENDED-B",
                             "LATINEXTENDED-B");

        /**
         * Constant for the "IPA Extensions" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock IPA_EXTENSIONS =
            new UnicodeBlock("IPA_EXTENSIONS",
                             "IPA EXTENSIONS",
                             "IPAEXTENSIONS");

        /**
         * Constant for the "Spacing Modifier Letters" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SPACING_MODIFIER_LETTERS =
            new UnicodeBlock("SPACING_MODIFIER_LETTERS",
                             "SPACING MODIFIER LETTERS",
                             "SPACINGMODIFIERLETTERS");

        /**
         * Constant for the "Combining Diacritical Marks" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS =
            new UnicodeBlock("COMBINING_DIACRITICAL_MARKS",
                             "COMBINING DIACRITICAL MARKS",
                             "COMBININGDIACRITICALMARKS");

        /**
         * Constant for the "Greek and Coptic" Unicode character block.
         * <p>
         * This block was previously known as the "Greek" block.
         *
         * @since 1.2
         */
        public static final UnicodeBlock GREEK =
            new UnicodeBlock("GREEK",
                             "GREEK AND COPTIC",
                             "GREEKANDCOPTIC");

        /**
         * Constant for the "Cyrillic" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CYRILLIC =
            new UnicodeBlock("CYRILLIC");

        /**
         * Constant for the "Armenian" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARMENIAN =
            new UnicodeBlock("ARMENIAN");

        /**
         * Constant for the "Hebrew" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HEBREW =
            new UnicodeBlock("HEBREW");

        /**
         * Constant for the "Arabic" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC =
            new UnicodeBlock("ARABIC");

        /**
         * Constant for the "Devanagari" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock DEVANAGARI =
            new UnicodeBlock("DEVANAGARI");

        /**
         * Constant for the "Bengali" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BENGALI =
            new UnicodeBlock("BENGALI");

        /**
         * Constant for the "Gurmukhi" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GURMUKHI =
            new UnicodeBlock("GURMUKHI");

        /**
         * Constant for the "Gujarati" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GUJARATI =
            new UnicodeBlock("GUJARATI");

        /**
         * Constant for the "Oriya" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ORIYA =
            new UnicodeBlock("ORIYA");

        /**
         * Constant for the "Tamil" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TAMIL =
            new UnicodeBlock("TAMIL");

        /**
         * Constant for the "Telugu" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TELUGU =
            new UnicodeBlock("TELUGU");

        /**
         * Constant for the "Kannada" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KANNADA =
            new UnicodeBlock("KANNADA");

        /**
         * Constant for the "Malayalam" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MALAYALAM =
            new UnicodeBlock("MALAYALAM");

        /**
         * Constant for the "Thai" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock THAI =
            new UnicodeBlock("THAI");

        /**
         * Constant for the "Lao" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LAO =
            new UnicodeBlock("LAO");

        /**
         * Constant for the "Tibetan" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock TIBETAN =
            new UnicodeBlock("TIBETAN");

        /**
         * Constant for the "Georgian" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GEORGIAN =
            new UnicodeBlock("GEORGIAN");

        /**
         * Constant for the "Hangul Jamo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_JAMO =
            new UnicodeBlock("HANGUL_JAMO",
                             "HANGUL JAMO",
                             "HANGULJAMO");

        /**
         * Constant for the "Latin Extended Additional" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LATIN_EXTENDED_ADDITIONAL =
            new UnicodeBlock("LATIN_EXTENDED_ADDITIONAL",
                             "LATIN EXTENDED ADDITIONAL",
                             "LATINEXTENDEDADDITIONAL");

        /**
         * Constant for the "Greek Extended" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GREEK_EXTENDED =
            new UnicodeBlock("GREEK_EXTENDED",
                             "GREEK EXTENDED",
                             "GREEKEXTENDED");

        /**
         * Constant for the "General Punctuation" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GENERAL_PUNCTUATION =
            new UnicodeBlock("GENERAL_PUNCTUATION",
                             "GENERAL PUNCTUATION",
                             "GENERALPUNCTUATION");

        /**
         * Constant for the "Superscripts and Subscripts" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock SUPERSCRIPTS_AND_SUBSCRIPTS =
            new UnicodeBlock("SUPERSCRIPTS_AND_SUBSCRIPTS",
                             "SUPERSCRIPTS AND SUBSCRIPTS",
                             "SUPERSCRIPTSANDSUBSCRIPTS");

        /**
         * Constant for the "Currency Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CURRENCY_SYMBOLS =
            new UnicodeBlock("CURRENCY_SYMBOLS",
                             "CURRENCY SYMBOLS",
                             "CURRENCYSYMBOLS");

        /**
         * Constant for the "Combining Diacritical Marks for Symbols" Unicode
         * character block.
         * <p>
         * This block was previously known as "Combining Marks for Symbols".
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_MARKS_FOR_SYMBOLS =
            new UnicodeBlock("COMBINING_MARKS_FOR_SYMBOLS",
                             "COMBINING DIACRITICAL MARKS FOR SYMBOLS",
                             "COMBININGDIACRITICALMARKSFORSYMBOLS",
                             "COMBINING MARKS FOR SYMBOLS",
                             "COMBININGMARKSFORSYMBOLS");

        /**
         * Constant for the "Letterlike Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock LETTERLIKE_SYMBOLS =
            new UnicodeBlock("LETTERLIKE_SYMBOLS",
                             "LETTERLIKE SYMBOLS",
                             "LETTERLIKESYMBOLS");

        /**
         * Constant for the "Number Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock NUMBER_FORMS =
            new UnicodeBlock("NUMBER_FORMS",
                             "NUMBER FORMS",
                             "NUMBERFORMS");

        /**
         * Constant for the "Arrows" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARROWS =
            new UnicodeBlock("ARROWS");

        /**
         * Constant for the "Mathematical Operators" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MATHEMATICAL_OPERATORS =
            new UnicodeBlock("MATHEMATICAL_OPERATORS",
                             "MATHEMATICAL OPERATORS",
                             "MATHEMATICALOPERATORS");

        /**
         * Constant for the "Miscellaneous Technical" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MISCELLANEOUS_TECHNICAL =
            new UnicodeBlock("MISCELLANEOUS_TECHNICAL",
                             "MISCELLANEOUS TECHNICAL",
                             "MISCELLANEOUSTECHNICAL");

        /**
         * Constant for the "Control Pictures" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CONTROL_PICTURES =
            new UnicodeBlock("CONTROL_PICTURES",
                             "CONTROL PICTURES",
                             "CONTROLPICTURES");

        /**
         * Constant for the "Optical Character Recognition" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock OPTICAL_CHARACTER_RECOGNITION =
            new UnicodeBlock("OPTICAL_CHARACTER_RECOGNITION",
                             "OPTICAL CHARACTER RECOGNITION",
                             "OPTICALCHARACTERRECOGNITION");

        /**
         * Constant for the "Enclosed Alphanumerics" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ENCLOSED_ALPHANUMERICS =
            new UnicodeBlock("ENCLOSED_ALPHANUMERICS",
                             "ENCLOSED ALPHANUMERICS",
                             "ENCLOSEDALPHANUMERICS");

        /**
         * Constant for the "Box Drawing" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BOX_DRAWING =
            new UnicodeBlock("BOX_DRAWING",
                             "BOX DRAWING",
                             "BOXDRAWING");

        /**
         * Constant for the "Block Elements" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BLOCK_ELEMENTS =
            new UnicodeBlock("BLOCK_ELEMENTS",
                             "BLOCK ELEMENTS",
                             "BLOCKELEMENTS");

        /**
         * Constant for the "Geometric Shapes" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock GEOMETRIC_SHAPES =
            new UnicodeBlock("GEOMETRIC_SHAPES",
                             "GEOMETRIC SHAPES",
                             "GEOMETRICSHAPES");

        /**
         * Constant for the "Miscellaneous Symbols" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS",
                             "MISCELLANEOUS SYMBOLS",
                             "MISCELLANEOUSSYMBOLS");

        /**
         * Constant for the "Dingbats" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock DINGBATS =
            new UnicodeBlock("DINGBATS");

        /**
         * Constant for the "CJK Symbols and Punctuation" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_SYMBOLS_AND_PUNCTUATION =
            new UnicodeBlock("CJK_SYMBOLS_AND_PUNCTUATION",
                             "CJK SYMBOLS AND PUNCTUATION",
                             "CJKSYMBOLSANDPUNCTUATION");

        /**
         * Constant for the "Hiragana" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HIRAGANA =
            new UnicodeBlock("HIRAGANA");

        /**
         * Constant for the "Katakana" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KATAKANA =
            new UnicodeBlock("KATAKANA");

        /**
         * Constant for the "Bopomofo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock BOPOMOFO =
            new UnicodeBlock("BOPOMOFO");

        /**
         * Constant for the "Hangul Compatibility Jamo" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_COMPATIBILITY_JAMO =
            new UnicodeBlock("HANGUL_COMPATIBILITY_JAMO",
                             "HANGUL COMPATIBILITY JAMO",
                             "HANGULCOMPATIBILITYJAMO");

        /**
         * Constant for the "Kanbun" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock KANBUN =
            new UnicodeBlock("KANBUN");

        /**
         * Constant for the "Enclosed CJK Letters and Months" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ENCLOSED_CJK_LETTERS_AND_MONTHS =
            new UnicodeBlock("ENCLOSED_CJK_LETTERS_AND_MONTHS",
                             "ENCLOSED CJK LETTERS AND MONTHS",
                             "ENCLOSEDCJKLETTERSANDMONTHS");

        /**
         * Constant for the "CJK Compatibility" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY =
            new UnicodeBlock("CJK_COMPATIBILITY",
                             "CJK COMPATIBILITY",
                             "CJKCOMPATIBILITY");

        /**
         * Constant for the "CJK Unified Ideographs" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS",
                             "CJK UNIFIED IDEOGRAPHS",
                             "CJKUNIFIEDIDEOGRAPHS");

        /**
         * Constant for the "Hangul Syllables" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock HANGUL_SYLLABLES =
            new UnicodeBlock("HANGUL_SYLLABLES",
                             "HANGUL SYLLABLES",
                             "HANGULSYLLABLES");

        /**
         * Constant for the "Private Use Area" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock PRIVATE_USE_AREA =
            new UnicodeBlock("PRIVATE_USE_AREA",
                             "PRIVATE USE AREA",
                             "PRIVATEUSEAREA");

        /**
         * Constant for the "CJK Compatibility Ideographs" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS =
            new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS",
                             "CJK COMPATIBILITY IDEOGRAPHS",
                             "CJKCOMPATIBILITYIDEOGRAPHS");

        /**
         * Constant for the "Alphabetic Presentation Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ALPHABETIC_PRESENTATION_FORMS =
            new UnicodeBlock("ALPHABETIC_PRESENTATION_FORMS",
                             "ALPHABETIC PRESENTATION FORMS",
                             "ALPHABETICPRESENTATIONFORMS");

        /**
         * Constant for the "Arabic Presentation Forms-A" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_A =
            new UnicodeBlock("ARABIC_PRESENTATION_FORMS_A",
                             "ARABIC PRESENTATION FORMS-A",
                             "ARABICPRESENTATIONFORMS-A");

        /**
         * Constant for the "Combining Half Marks" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock COMBINING_HALF_MARKS =
            new UnicodeBlock("COMBINING_HALF_MARKS",
                             "COMBINING HALF MARKS",
                             "COMBININGHALFMARKS");

        /**
         * Constant for the "CJK Compatibility Forms" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_FORMS =
            new UnicodeBlock("CJK_COMPATIBILITY_FORMS",
                             "CJK COMPATIBILITY FORMS",
                             "CJKCOMPATIBILITYFORMS");

        /**
         * Constant for the "Small Form Variants" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SMALL_FORM_VARIANTS =
            new UnicodeBlock("SMALL_FORM_VARIANTS",
                             "SMALL FORM VARIANTS",
                             "SMALLFORMVARIANTS");

        /**
         * Constant for the "Arabic Presentation Forms-B" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock ARABIC_PRESENTATION_FORMS_B =
            new UnicodeBlock("ARABIC_PRESENTATION_FORMS_B",
                             "ARABIC PRESENTATION FORMS-B",
                             "ARABICPRESENTATIONFORMS-B");

        /**
         * Constant for the "Halfwidth and Fullwidth Forms" Unicode character
         * block.
         * @since 1.2
         */
        public static final UnicodeBlock HALFWIDTH_AND_FULLWIDTH_FORMS =
            new UnicodeBlock("HALFWIDTH_AND_FULLWIDTH_FORMS",
                             "HALFWIDTH AND FULLWIDTH FORMS",
                             "HALFWIDTHANDFULLWIDTHFORMS");

        /**
         * Constant for the "Specials" Unicode character block.
         * @since 1.2
         */
        public static final UnicodeBlock SPECIALS =
            new UnicodeBlock("SPECIALS");

        /**
         * @deprecated As of J2SE 5, use {@link #HIGH_SURROGATES},
         *             {@link #HIGH_PRIVATE_USE_SURROGATES}, and
         *             {@link #LOW_SURROGATES}. These new constants match
         *             the block definitions of the Unicode Standard.
         *             The {@link #of(char)} and {@link #of(int)} methods
         *             return the new constants, not SURROGATES_AREA.
         */
        @Deprecated
        public static final UnicodeBlock SURROGATES_AREA =
            new UnicodeBlock("SURROGATES_AREA");

        /**
         * Constant for the "Syriac" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock SYRIAC =
            new UnicodeBlock("SYRIAC");

        /**
         * Constant for the "Thaana" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock THAANA =
            new UnicodeBlock("THAANA");

        /**
         * Constant for the "Sinhala" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock SINHALA =
            new UnicodeBlock("SINHALA");

        /**
         * Constant for the "Myanmar" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock MYANMAR =
            new UnicodeBlock("MYANMAR");

        /**
         * Constant for the "Ethiopic" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock ETHIOPIC =
            new UnicodeBlock("ETHIOPIC");

        /**
         * Constant for the "Cherokee" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CHEROKEE =
            new UnicodeBlock("CHEROKEE");

        /**
         * Constant for the "Unified Canadian Aboriginal Syllabics" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS =
            new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS",
                             "UNIFIED CANADIAN ABORIGINAL SYLLABICS",
                             "UNIFIEDCANADIANABORIGINALSYLLABICS");

        /**
         * Constant for the "Ogham" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock OGHAM =
            new UnicodeBlock("OGHAM");

        /**
         * Constant for the "Runic" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock RUNIC =
            new UnicodeBlock("RUNIC");

        /**
         * Constant for the "Khmer" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock KHMER =
            new UnicodeBlock("KHMER");

        /**
         * Constant for the "Mongolian" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock MONGOLIAN =
            new UnicodeBlock("MONGOLIAN");

        /**
         * Constant for the "Braille Patterns" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock BRAILLE_PATTERNS =
            new UnicodeBlock("BRAILLE_PATTERNS",
                             "BRAILLE PATTERNS",
                             "BRAILLEPATTERNS");

        /**
         * Constant for the "CJK Radicals Supplement" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CJK_RADICALS_SUPPLEMENT =
            new UnicodeBlock("CJK_RADICALS_SUPPLEMENT",
                             "CJK RADICALS SUPPLEMENT",
                             "CJKRADICALSSUPPLEMENT");

        /**
         * Constant for the "Kangxi Radicals" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock KANGXI_RADICALS =
            new UnicodeBlock("KANGXI_RADICALS",
                             "KANGXI RADICALS",
                             "KANGXIRADICALS");

        /**
         * Constant for the "Ideographic Description Characters" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock IDEOGRAPHIC_DESCRIPTION_CHARACTERS =
            new UnicodeBlock("IDEOGRAPHIC_DESCRIPTION_CHARACTERS",
                             "IDEOGRAPHIC DESCRIPTION CHARACTERS",
                             "IDEOGRAPHICDESCRIPTIONCHARACTERS");

        /**
         * Constant for the "Bopomofo Extended" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock BOPOMOFO_EXTENDED =
            new UnicodeBlock("BOPOMOFO_EXTENDED",
                             "BOPOMOFO EXTENDED",
                             "BOPOMOFOEXTENDED");

        /**
         * Constant for the "CJK Unified Ideographs Extension A" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION A",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONA");

        /**
         * Constant for the "Yi Syllables" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock YI_SYLLABLES =
            new UnicodeBlock("YI_SYLLABLES",
                             "YI SYLLABLES",
                             "YISYLLABLES");

        /**
         * Constant for the "Yi Radicals" Unicode character block.
         * @since 1.4
         */
        public static final UnicodeBlock YI_RADICALS =
            new UnicodeBlock("YI_RADICALS",
                             "YI RADICALS",
                             "YIRADICALS");

        /**
         * Constant for the "Cyrillic Supplementary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CYRILLIC_SUPPLEMENTARY =
            new UnicodeBlock("CYRILLIC_SUPPLEMENTARY",
                             "CYRILLIC SUPPLEMENTARY",
                             "CYRILLICSUPPLEMENTARY",
                             "CYRILLIC SUPPLEMENT",
                             "CYRILLICSUPPLEMENT");

        /**
         * Constant for the "Tagalog" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGALOG =
            new UnicodeBlock("TAGALOG");

        /**
         * Constant for the "Hanunoo" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock HANUNOO =
            new UnicodeBlock("HANUNOO");

        /**
         * Constant for the "Buhid" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock BUHID =
            new UnicodeBlock("BUHID");

        /**
         * Constant for the "Tagbanwa" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGBANWA =
            new UnicodeBlock("TAGBANWA");

        /**
         * Constant for the "Limbu" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LIMBU =
            new UnicodeBlock("LIMBU");

        /**
         * Constant for the "Tai Le" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAI_LE =
            new UnicodeBlock("TAI_LE",
                             "TAI LE",
                             "TAILE");

        /**
         * Constant for the "Khmer Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock KHMER_SYMBOLS =
            new UnicodeBlock("KHMER_SYMBOLS",
                             "KHMER SYMBOLS",
                             "KHMERSYMBOLS");

        /**
         * Constant for the "Phonetic Extensions" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock PHONETIC_EXTENSIONS =
            new UnicodeBlock("PHONETIC_EXTENSIONS",
                             "PHONETIC EXTENSIONS",
                             "PHONETICEXTENSIONS");

        /**
         * Constant for the "Miscellaneous Mathematical Symbols-A" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A =
            new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A",
                             "MISCELLANEOUS MATHEMATICAL SYMBOLS-A",
                             "MISCELLANEOUSMATHEMATICALSYMBOLS-A");

        /**
         * Constant for the "Supplemental Arrows-A" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_ARROWS_A =
            new UnicodeBlock("SUPPLEMENTAL_ARROWS_A",
                             "SUPPLEMENTAL ARROWS-A",
                             "SUPPLEMENTALARROWS-A");

        /**
         * Constant for the "Supplemental Arrows-B" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_ARROWS_B =
            new UnicodeBlock("SUPPLEMENTAL_ARROWS_B",
                             "SUPPLEMENTAL ARROWS-B",
                             "SUPPLEMENTALARROWS-B");

        /**
         * Constant for the "Miscellaneous Mathematical Symbols-B" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B =
            new UnicodeBlock("MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B",
                             "MISCELLANEOUS MATHEMATICAL SYMBOLS-B",
                             "MISCELLANEOUSMATHEMATICALSYMBOLS-B");

        /**
         * Constant for the "Supplemental Mathematical Operators" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTAL_MATHEMATICAL_OPERATORS =
            new UnicodeBlock("SUPPLEMENTAL_MATHEMATICAL_OPERATORS",
                             "SUPPLEMENTAL MATHEMATICAL OPERATORS",
                             "SUPPLEMENTALMATHEMATICALOPERATORS");

        /**
         * Constant for the "Miscellaneous Symbols and Arrows" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_ARROWS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_ARROWS",
                             "MISCELLANEOUS SYMBOLS AND ARROWS",
                             "MISCELLANEOUSSYMBOLSANDARROWS");

        /**
         * Constant for the "Katakana Phonetic Extensions" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock KATAKANA_PHONETIC_EXTENSIONS =
            new UnicodeBlock("KATAKANA_PHONETIC_EXTENSIONS",
                             "KATAKANA PHONETIC EXTENSIONS",
                             "KATAKANAPHONETICEXTENSIONS");

        /**
         * Constant for the "Yijing Hexagram Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock YIJING_HEXAGRAM_SYMBOLS =
            new UnicodeBlock("YIJING_HEXAGRAM_SYMBOLS",
                             "YIJING HEXAGRAM SYMBOLS",
                             "YIJINGHEXAGRAMSYMBOLS");

        /**
         * Constant for the "Variation Selectors" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock VARIATION_SELECTORS =
            new UnicodeBlock("VARIATION_SELECTORS",
                             "VARIATION SELECTORS",
                             "VARIATIONSELECTORS");

        /**
         * Constant for the "Linear B Syllabary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LINEAR_B_SYLLABARY =
            new UnicodeBlock("LINEAR_B_SYLLABARY",
                             "LINEAR B SYLLABARY",
                             "LINEARBSYLLABARY");

        /**
         * Constant for the "Linear B Ideograms" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock LINEAR_B_IDEOGRAMS =
            new UnicodeBlock("LINEAR_B_IDEOGRAMS",
                             "LINEAR B IDEOGRAMS",
                             "LINEARBIDEOGRAMS");

        /**
         * Constant for the "Aegean Numbers" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock AEGEAN_NUMBERS =
            new UnicodeBlock("AEGEAN_NUMBERS",
                             "AEGEAN NUMBERS",
                             "AEGEANNUMBERS");

        /**
         * Constant for the "Old Italic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock OLD_ITALIC =
            new UnicodeBlock("OLD_ITALIC",
                             "OLD ITALIC",
                             "OLDITALIC");

        /**
         * Constant for the "Gothic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock GOTHIC =
            new UnicodeBlock("GOTHIC");

        /**
         * Constant for the "Ugaritic" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock UGARITIC =
            new UnicodeBlock("UGARITIC");

        /**
         * Constant for the "Deseret" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock DESERET =
            new UnicodeBlock("DESERET");

        /**
         * Constant for the "Shavian" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock SHAVIAN =
            new UnicodeBlock("SHAVIAN");

        /**
         * Constant for the "Osmanya" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock OSMANYA =
            new UnicodeBlock("OSMANYA");

        /**
         * Constant for the "Cypriot Syllabary" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CYPRIOT_SYLLABARY =
            new UnicodeBlock("CYPRIOT_SYLLABARY",
                             "CYPRIOT SYLLABARY",
                             "CYPRIOTSYLLABARY");

        /**
         * Constant for the "Byzantine Musical Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock BYZANTINE_MUSICAL_SYMBOLS =
            new UnicodeBlock("BYZANTINE_MUSICAL_SYMBOLS",
                             "BYZANTINE MUSICAL SYMBOLS",
                             "BYZANTINEMUSICALSYMBOLS");

        /**
         * Constant for the "Musical Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock MUSICAL_SYMBOLS =
            new UnicodeBlock("MUSICAL_SYMBOLS",
                             "MUSICAL SYMBOLS",
                             "MUSICALSYMBOLS");

        /**
         * Constant for the "Tai Xuan Jing Symbols" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAI_XUAN_JING_SYMBOLS =
            new UnicodeBlock("TAI_XUAN_JING_SYMBOLS",
                             "TAI XUAN JING SYMBOLS",
                             "TAIXUANJINGSYMBOLS");

        /**
         * Constant for the "Mathematical Alphanumeric Symbols" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock MATHEMATICAL_ALPHANUMERIC_SYMBOLS =
            new UnicodeBlock("MATHEMATICAL_ALPHANUMERIC_SYMBOLS",
                             "MATHEMATICAL ALPHANUMERIC SYMBOLS",
                             "MATHEMATICALALPHANUMERICSYMBOLS");

        /**
         * Constant for the "CJK Unified Ideographs Extension B" Unicode
         * character block.
         * @since 1.5
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION B",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONB");

        /**
         * Constant for the "CJK Compatibility Ideographs Supplement" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT =
            new UnicodeBlock("CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT",
                             "CJK COMPATIBILITY IDEOGRAPHS SUPPLEMENT",
                             "CJKCOMPATIBILITYIDEOGRAPHSSUPPLEMENT");

        /**
         * Constant for the "Tags" Unicode character block.
         * @since 1.5
         */
        public static final UnicodeBlock TAGS =
            new UnicodeBlock("TAGS");

        /**
         * Constant for the "Variation Selectors Supplement" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock VARIATION_SELECTORS_SUPPLEMENT =
            new UnicodeBlock("VARIATION_SELECTORS_SUPPLEMENT",
                             "VARIATION SELECTORS SUPPLEMENT",
                             "VARIATIONSELECTORSSUPPLEMENT");

        /**
         * Constant for the "Supplementary Private Use Area-A" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_A =
            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_A",
                             "SUPPLEMENTARY PRIVATE USE AREA-A",
                             "SUPPLEMENTARYPRIVATEUSEAREA-A");

        /**
         * Constant for the "Supplementary Private Use Area-B" Unicode character
         * block.
         * @since 1.5
         */
        public static final UnicodeBlock SUPPLEMENTARY_PRIVATE_USE_AREA_B =
            new UnicodeBlock("SUPPLEMENTARY_PRIVATE_USE_AREA_B",
                             "SUPPLEMENTARY PRIVATE USE AREA-B",
                             "SUPPLEMENTARYPRIVATEUSEAREA-B");

        /**
         * Constant for the "High Surrogates" Unicode character block.
         * This block represents code point values in the high surrogate
         * range: U+D800 through U+DB7F.
         *
         * @since 1.5
         */
        public static final UnicodeBlock HIGH_SURROGATES =
            new UnicodeBlock("HIGH_SURROGATES",
                             "HIGH SURROGATES",
                             "HIGHSURROGATES");

        /**
         * Constant for the "High Private Use Surrogates" Unicode character
         * block.
         * This block represents code point values in the private use high
         * surrogate range: U+DB80 through U+DBFF.
         *
         * @since 1.5
         */
        public static final UnicodeBlock HIGH_PRIVATE_USE_SURROGATES =
            new UnicodeBlock("HIGH_PRIVATE_USE_SURROGATES",
                             "HIGH PRIVATE USE SURROGATES",
                             "HIGHPRIVATEUSESURROGATES");

        /**
         * Constant for the "Low Surrogates" Unicode character block.
         * This block represents code point values in the low surrogate
         * range: U+DC00 through U+DFFF.
         *
         * @since 1.5
         */
        public static final UnicodeBlock LOW_SURROGATES =
            new UnicodeBlock("LOW_SURROGATES",
                             "LOW SURROGATES",
                             "LOWSURROGATES");

        /**
         * Constant for the "Arabic Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ARABIC_SUPPLEMENT =
            new UnicodeBlock("ARABIC_SUPPLEMENT",
                             "ARABIC SUPPLEMENT",
                             "ARABICSUPPLEMENT");

        /**
         * Constant for the "NKo" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock NKO =
            new UnicodeBlock("NKO");

        /**
         * Constant for the "Samaritan" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SAMARITAN =
            new UnicodeBlock("SAMARITAN");

        /**
         * Constant for the "Mandaic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MANDAIC =
            new UnicodeBlock("MANDAIC");

        /**
         * Constant for the "Ethiopic Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_SUPPLEMENT =
            new UnicodeBlock("ETHIOPIC_SUPPLEMENT",
                             "ETHIOPIC SUPPLEMENT",
                             "ETHIOPICSUPPLEMENT");

        /**
         * Constant for the "Unified Canadian Aboriginal Syllabics Extended"
         * Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED =
            new UnicodeBlock("UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED",
                             "UNIFIED CANADIAN ABORIGINAL SYLLABICS EXTENDED",
                             "UNIFIEDCANADIANABORIGINALSYLLABICSEXTENDED");

        /**
         * Constant for the "New Tai Lue" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock NEW_TAI_LUE =
            new UnicodeBlock("NEW_TAI_LUE",
                             "NEW TAI LUE",
                             "NEWTAILUE");

        /**
         * Constant for the "Buginese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BUGINESE =
            new UnicodeBlock("BUGINESE");

        /**
         * Constant for the "Tai Tham" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TAI_THAM =
            new UnicodeBlock("TAI_THAM",
                             "TAI THAM",
                             "TAITHAM");

        /**
         * Constant for the "Balinese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BALINESE =
            new UnicodeBlock("BALINESE");

        /**
         * Constant for the "Sundanese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SUNDANESE =
            new UnicodeBlock("SUNDANESE");

        /**
         * Constant for the "Batak" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BATAK =
            new UnicodeBlock("BATAK");

        /**
         * Constant for the "Lepcha" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LEPCHA =
            new UnicodeBlock("LEPCHA");

        /**
         * Constant for the "Ol Chiki" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OL_CHIKI =
            new UnicodeBlock("OL_CHIKI",
                             "OL CHIKI",
                             "OLCHIKI");

        /**
         * Constant for the "Vedic Extensions" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VEDIC_EXTENSIONS =
            new UnicodeBlock("VEDIC_EXTENSIONS",
                             "VEDIC EXTENSIONS",
                             "VEDICEXTENSIONS");

        /**
         * Constant for the "Phonetic Extensions Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock PHONETIC_EXTENSIONS_SUPPLEMENT =
            new UnicodeBlock("PHONETIC_EXTENSIONS_SUPPLEMENT",
                             "PHONETIC EXTENSIONS SUPPLEMENT",
                             "PHONETICEXTENSIONSSUPPLEMENT");

        /**
         * Constant for the "Combining Diacritical Marks Supplement" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock COMBINING_DIACRITICAL_MARKS_SUPPLEMENT =
            new UnicodeBlock("COMBINING_DIACRITICAL_MARKS_SUPPLEMENT",
                             "COMBINING DIACRITICAL MARKS SUPPLEMENT",
                             "COMBININGDIACRITICALMARKSSUPPLEMENT");

        /**
         * Constant for the "Glagolitic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock GLAGOLITIC =
            new UnicodeBlock("GLAGOLITIC");

        /**
         * Constant for the "Latin Extended-C" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LATIN_EXTENDED_C =
            new UnicodeBlock("LATIN_EXTENDED_C",
                             "LATIN EXTENDED-C",
                             "LATINEXTENDED-C");

        /**
         * Constant for the "Coptic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COPTIC =
            new UnicodeBlock("COPTIC");

        /**
         * Constant for the "Georgian Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock GEORGIAN_SUPPLEMENT =
            new UnicodeBlock("GEORGIAN_SUPPLEMENT",
                             "GEORGIAN SUPPLEMENT",
                             "GEORGIANSUPPLEMENT");

        /**
         * Constant for the "Tifinagh" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TIFINAGH =
            new UnicodeBlock("TIFINAGH");

        /**
         * Constant for the "Ethiopic Extended" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_EXTENDED =
            new UnicodeBlock("ETHIOPIC_EXTENDED",
                             "ETHIOPIC EXTENDED",
                             "ETHIOPICEXTENDED");

        /**
         * Constant for the "Cyrillic Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CYRILLIC_EXTENDED_A =
            new UnicodeBlock("CYRILLIC_EXTENDED_A",
                             "CYRILLIC EXTENDED-A",
                             "CYRILLICEXTENDED-A");

        /**
         * Constant for the "Supplemental Punctuation" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SUPPLEMENTAL_PUNCTUATION =
            new UnicodeBlock("SUPPLEMENTAL_PUNCTUATION",
                             "SUPPLEMENTAL PUNCTUATION",
                             "SUPPLEMENTALPUNCTUATION");

        /**
         * Constant for the "CJK Strokes" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_STROKES =
            new UnicodeBlock("CJK_STROKES",
                             "CJK STROKES",
                             "CJKSTROKES");

        /**
         * Constant for the "Lisu" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LISU =
            new UnicodeBlock("LISU");

        /**
         * Constant for the "Vai" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VAI =
            new UnicodeBlock("VAI");

        /**
         * Constant for the "Cyrillic Extended-B" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CYRILLIC_EXTENDED_B =
            new UnicodeBlock("CYRILLIC_EXTENDED_B",
                             "CYRILLIC EXTENDED-B",
                             "CYRILLICEXTENDED-B");

        /**
         * Constant for the "Bamum" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BAMUM =
            new UnicodeBlock("BAMUM");

        /**
         * Constant for the "Modifier Tone Letters" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MODIFIER_TONE_LETTERS =
            new UnicodeBlock("MODIFIER_TONE_LETTERS",
                             "MODIFIER TONE LETTERS",
                             "MODIFIERTONELETTERS");

        /**
         * Constant for the "Latin Extended-D" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LATIN_EXTENDED_D =
            new UnicodeBlock("LATIN_EXTENDED_D",
                             "LATIN EXTENDED-D",
                             "LATINEXTENDED-D");

        /**
         * Constant for the "Syloti Nagri" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SYLOTI_NAGRI =
            new UnicodeBlock("SYLOTI_NAGRI",
                             "SYLOTI NAGRI",
                             "SYLOTINAGRI");

        /**
         * Constant for the "Common Indic Number Forms" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COMMON_INDIC_NUMBER_FORMS =
            new UnicodeBlock("COMMON_INDIC_NUMBER_FORMS",
                             "COMMON INDIC NUMBER FORMS",
                             "COMMONINDICNUMBERFORMS");

        /**
         * Constant for the "Phags-pa" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHAGS_PA =
            new UnicodeBlock("PHAGS_PA",
                             "PHAGS-PA");

        /**
         * Constant for the "Saurashtra" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock SAURASHTRA =
            new UnicodeBlock("SAURASHTRA");

        /**
         * Constant for the "Devanagari Extended" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock DEVANAGARI_EXTENDED =
            new UnicodeBlock("DEVANAGARI_EXTENDED",
                             "DEVANAGARI EXTENDED",
                             "DEVANAGARIEXTENDED");

        /**
         * Constant for the "Kayah Li" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KAYAH_LI =
            new UnicodeBlock("KAYAH_LI",
                             "KAYAH LI",
                             "KAYAHLI");

        /**
         * Constant for the "Rejang" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock REJANG =
            new UnicodeBlock("REJANG");

        /**
         * Constant for the "Hangul Jamo Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock HANGUL_JAMO_EXTENDED_A =
            new UnicodeBlock("HANGUL_JAMO_EXTENDED_A",
                             "HANGUL JAMO EXTENDED-A",
                             "HANGULJAMOEXTENDED-A");

        /**
         * Constant for the "Javanese" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock JAVANESE =
            new UnicodeBlock("JAVANESE");

        /**
         * Constant for the "Cham" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CHAM =
            new UnicodeBlock("CHAM");

        /**
         * Constant for the "Myanmar Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MYANMAR_EXTENDED_A =
            new UnicodeBlock("MYANMAR_EXTENDED_A",
                             "MYANMAR EXTENDED-A",
                             "MYANMAREXTENDED-A");

        /**
         * Constant for the "Tai Viet" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TAI_VIET =
            new UnicodeBlock("TAI_VIET",
                             "TAI VIET",
                             "TAIVIET");

        /**
         * Constant for the "Ethiopic Extended-A" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ETHIOPIC_EXTENDED_A =
            new UnicodeBlock("ETHIOPIC_EXTENDED_A",
                             "ETHIOPIC EXTENDED-A",
                             "ETHIOPICEXTENDED-A");

        /**
         * Constant for the "Meetei Mayek" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MEETEI_MAYEK =
            new UnicodeBlock("MEETEI_MAYEK",
                             "MEETEI MAYEK",
                             "MEETEIMAYEK");

        /**
         * Constant for the "Hangul Jamo Extended-B" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock HANGUL_JAMO_EXTENDED_B =
            new UnicodeBlock("HANGUL_JAMO_EXTENDED_B",
                             "HANGUL JAMO EXTENDED-B",
                             "HANGULJAMOEXTENDED-B");

        /**
         * Constant for the "Vertical Forms" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock VERTICAL_FORMS =
            new UnicodeBlock("VERTICAL_FORMS",
                             "VERTICAL FORMS",
                             "VERTICALFORMS");

        /**
         * Constant for the "Ancient Greek Numbers" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ANCIENT_GREEK_NUMBERS =
            new UnicodeBlock("ANCIENT_GREEK_NUMBERS",
                             "ANCIENT GREEK NUMBERS",
                             "ANCIENTGREEKNUMBERS");

        /**
         * Constant for the "Ancient Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ANCIENT_SYMBOLS =
            new UnicodeBlock("ANCIENT_SYMBOLS",
                             "ANCIENT SYMBOLS",
                             "ANCIENTSYMBOLS");

        /**
         * Constant for the "Phaistos Disc" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHAISTOS_DISC =
            new UnicodeBlock("PHAISTOS_DISC",
                             "PHAISTOS DISC",
                             "PHAISTOSDISC");

        /**
         * Constant for the "Lycian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LYCIAN =
            new UnicodeBlock("LYCIAN");

        /**
         * Constant for the "Carian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CARIAN =
            new UnicodeBlock("CARIAN");

        /**
         * Constant for the "Old Persian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_PERSIAN =
            new UnicodeBlock("OLD_PERSIAN",
                             "OLD PERSIAN",
                             "OLDPERSIAN");

        /**
         * Constant for the "Imperial Aramaic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock IMPERIAL_ARAMAIC =
            new UnicodeBlock("IMPERIAL_ARAMAIC",
                             "IMPERIAL ARAMAIC",
                             "IMPERIALARAMAIC");

        /**
         * Constant for the "Phoenician" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PHOENICIAN =
            new UnicodeBlock("PHOENICIAN");

        /**
         * Constant for the "Lydian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock LYDIAN =
            new UnicodeBlock("LYDIAN");

        /**
         * Constant for the "Kharoshthi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KHAROSHTHI =
            new UnicodeBlock("KHAROSHTHI");

        /**
         * Constant for the "Old South Arabian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_SOUTH_ARABIAN =
            new UnicodeBlock("OLD_SOUTH_ARABIAN",
                             "OLD SOUTH ARABIAN",
                             "OLDSOUTHARABIAN");

        /**
         * Constant for the "Avestan" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock AVESTAN =
            new UnicodeBlock("AVESTAN");

        /**
         * Constant for the "Inscriptional Parthian" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock INSCRIPTIONAL_PARTHIAN =
            new UnicodeBlock("INSCRIPTIONAL_PARTHIAN",
                             "INSCRIPTIONAL PARTHIAN",
                             "INSCRIPTIONALPARTHIAN");

        /**
         * Constant for the "Inscriptional Pahlavi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock INSCRIPTIONAL_PAHLAVI =
            new UnicodeBlock("INSCRIPTIONAL_PAHLAVI",
                             "INSCRIPTIONAL PAHLAVI",
                             "INSCRIPTIONALPAHLAVI");

        /**
         * Constant for the "Old Turkic" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock OLD_TURKIC =
            new UnicodeBlock("OLD_TURKIC",
                             "OLD TURKIC",
                             "OLDTURKIC");

        /**
         * Constant for the "Rumi Numeral Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock RUMI_NUMERAL_SYMBOLS =
            new UnicodeBlock("RUMI_NUMERAL_SYMBOLS",
                             "RUMI NUMERAL SYMBOLS",
                             "RUMINUMERALSYMBOLS");

        /**
         * Constant for the "Brahmi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BRAHMI =
            new UnicodeBlock("BRAHMI");

        /**
         * Constant for the "Kaithi" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KAITHI =
            new UnicodeBlock("KAITHI");

        /**
         * Constant for the "Cuneiform" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock CUNEIFORM =
            new UnicodeBlock("CUNEIFORM");

        /**
         * Constant for the "Cuneiform Numbers and Punctuation" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CUNEIFORM_NUMBERS_AND_PUNCTUATION =
            new UnicodeBlock("CUNEIFORM_NUMBERS_AND_PUNCTUATION",
                             "CUNEIFORM NUMBERS AND PUNCTUATION",
                             "CUNEIFORMNUMBERSANDPUNCTUATION");

        /**
         * Constant for the "Egyptian Hieroglyphs" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock EGYPTIAN_HIEROGLYPHS =
            new UnicodeBlock("EGYPTIAN_HIEROGLYPHS",
                             "EGYPTIAN HIEROGLYPHS",
                             "EGYPTIANHIEROGLYPHS");

        /**
         * Constant for the "Bamum Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock BAMUM_SUPPLEMENT =
            new UnicodeBlock("BAMUM_SUPPLEMENT",
                             "BAMUM SUPPLEMENT",
                             "BAMUMSUPPLEMENT");

        /**
         * Constant for the "Kana Supplement" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock KANA_SUPPLEMENT =
            new UnicodeBlock("KANA_SUPPLEMENT",
                             "KANA SUPPLEMENT",
                             "KANASUPPLEMENT");

        /**
         * Constant for the "Ancient Greek Musical Notation" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock ANCIENT_GREEK_MUSICAL_NOTATION =
            new UnicodeBlock("ANCIENT_GREEK_MUSICAL_NOTATION",
                             "ANCIENT GREEK MUSICAL NOTATION",
                             "ANCIENTGREEKMUSICALNOTATION");

        /**
         * Constant for the "Counting Rod Numerals" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock COUNTING_ROD_NUMERALS =
            new UnicodeBlock("COUNTING_ROD_NUMERALS",
                             "COUNTING ROD NUMERALS",
                             "COUNTINGRODNUMERALS");

        /**
         * Constant for the "Mahjong Tiles" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock MAHJONG_TILES =
            new UnicodeBlock("MAHJONG_TILES",
                             "MAHJONG TILES",
                             "MAHJONGTILES");

        /**
         * Constant for the "Domino Tiles" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock DOMINO_TILES =
            new UnicodeBlock("DOMINO_TILES",
                             "DOMINO TILES",
                             "DOMINOTILES");

        /**
         * Constant for the "Playing Cards" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock PLAYING_CARDS =
            new UnicodeBlock("PLAYING_CARDS",
                             "PLAYING CARDS",
                             "PLAYINGCARDS");

        /**
         * Constant for the "Enclosed Alphanumeric Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock ENCLOSED_ALPHANUMERIC_SUPPLEMENT =
            new UnicodeBlock("ENCLOSED_ALPHANUMERIC_SUPPLEMENT",
                             "ENCLOSED ALPHANUMERIC SUPPLEMENT",
                             "ENCLOSEDALPHANUMERICSUPPLEMENT");

        /**
         * Constant for the "Enclosed Ideographic Supplement" Unicode character
         * block.
         * @since 1.7
         */
        public static final UnicodeBlock ENCLOSED_IDEOGRAPHIC_SUPPLEMENT =
            new UnicodeBlock("ENCLOSED_IDEOGRAPHIC_SUPPLEMENT",
                             "ENCLOSED IDEOGRAPHIC SUPPLEMENT",
                             "ENCLOSEDIDEOGRAPHICSUPPLEMENT");

        /**
         * Constant for the "Miscellaneous Symbols And Pictographs" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS =
            new UnicodeBlock("MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS",
                             "MISCELLANEOUS SYMBOLS AND PICTOGRAPHS",
                             "MISCELLANEOUSSYMBOLSANDPICTOGRAPHS");

        /**
         * Constant for the "Emoticons" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock EMOTICONS =
            new UnicodeBlock("EMOTICONS");

        /**
         * Constant for the "Transport And Map Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock TRANSPORT_AND_MAP_SYMBOLS =
            new UnicodeBlock("TRANSPORT_AND_MAP_SYMBOLS",
                             "TRANSPORT AND MAP SYMBOLS",
                             "TRANSPORTANDMAPSYMBOLS");

        /**
         * Constant for the "Alchemical Symbols" Unicode character block.
         * @since 1.7
         */
        public static final UnicodeBlock ALCHEMICAL_SYMBOLS =
            new UnicodeBlock("ALCHEMICAL_SYMBOLS",
                             "ALCHEMICAL SYMBOLS",
                             "ALCHEMICALSYMBOLS");

        /**
         * Constant for the "CJK Unified Ideographs Extension C" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION C",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIONC");

        /**
         * Constant for the "CJK Unified Ideographs Extension D" Unicode
         * character block.
         * @since 1.7
         */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D =
            new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D",
                             "CJK UNIFIED IDEOGRAPHS EXTENSION D",
                             "CJKUNIFIEDIDEOGRAPHSEXTENSIOND");

        /**
         * Constant for the "Arabic Extended-A" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock ARABIC_EXTENDED_A =
            new UnicodeBlock("ARABIC_EXTENDED_A",
                             "ARABIC EXTENDED-A",
                             "ARABICEXTENDED-A");

        /**
         * Constant for the "Sundanese Supplement" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock SUNDANESE_SUPPLEMENT =
            new UnicodeBlock("SUNDANESE_SUPPLEMENT",
                             "SUNDANESE SUPPLEMENT",
                             "SUNDANESESUPPLEMENT");

        /**
         * Constant for the "Meetei Mayek Extensions" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MEETEI_MAYEK_EXTENSIONS =
            new UnicodeBlock("MEETEI_MAYEK_EXTENSIONS",
                             "MEETEI MAYEK EXTENSIONS",
                             "MEETEIMAYEKEXTENSIONS");

        /**
         * Constant for the "Meroitic Hieroglyphs" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MEROITIC_HIEROGLYPHS =
            new UnicodeBlock("MEROITIC_HIEROGLYPHS",
                             "MEROITIC HIEROGLYPHS",
                             "MEROITICHIEROGLYPHS");

        /**
         * Constant for the "Meroitic Cursive" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MEROITIC_CURSIVE =
            new UnicodeBlock("MEROITIC_CURSIVE",
                             "MEROITIC CURSIVE",
                             "MEROITICCURSIVE");

        /**
         * Constant for the "Sora Sompeng" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock SORA_SOMPENG =
            new UnicodeBlock("SORA_SOMPENG",
                             "SORA SOMPENG",
                             "SORASOMPENG");

        /**
         * Constant for the "Chakma" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock CHAKMA =
            new UnicodeBlock("CHAKMA");

        /**
         * Constant for the "Sharada" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock SHARADA =
            new UnicodeBlock("SHARADA");

        /**
         * Constant for the "Takri" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock TAKRI =
            new UnicodeBlock("TAKRI");

        /**
         * Constant for the "Miao" Unicode character block.
         * @since 1.8
         */
        public static final UnicodeBlock MIAO =
            new UnicodeBlock("MIAO");

        /**
         * Constant for the "Arabic Mathematical Alphabetic Symbols" Unicode
         * character block.
         * @since 1.8
         */
        public static final UnicodeBlock ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS =
            new UnicodeBlock("ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS",
                             "ARABIC MATHEMATICAL ALPHABETIC SYMBOLS",
                             "ARABICMATHEMATICALALPHABETICSYMBOLS");

        private static final int[] blockStarts = {
            0x0000,   // 0000..007F; Basic Latin
            0x0080,   // 0080..00FF; Latin-1 Supplement
            0x0100,   // 0100..017F; Latin Extended-A
            0x0180,   // 0180..024F; Latin Extended-B
            0x0250,   // 0250..02AF; IPA Extensions
            0x02B0,   // 02B0..02FF; Spacing Modifier Letters
            0x0300,   // 0300..036F; Combining Diacritical Marks
            0x0370,   // 0370..03FF; Greek and Coptic
            0x0400,   // 0400..04FF; Cyrillic
            0x0500,   // 0500..052F; Cyrillic Supplement
            0x0530,   // 0530..058F; Armenian
            0x0590,   // 0590..05FF; Hebrew
            0x0600,   // 0600..06FF; Arabic
            0x0700,   // 0700..074F; Syriac
            0x0750,   // 0750..077F; Arabic Supplement
            0x0780,   // 0780..07BF; Thaana
            0x07C0,   // 07C0..07FF; NKo
            0x0800,   // 0800..083F; Samaritan
            0x0840,   // 0840..085F; Mandaic
            0x0860,   // unassigned
            0x08A0,   // 08A0..08FF; Arabic Extended-A
            0x0900,   // 0900..097F; Devanagari
            0x0980,   // 0980..09FF; Bengali
            0x0A00,   // 0A00..0A7F; Gurmukhi
            0x0A80,   // 0A80..0AFF; Gujarati
            0x0B00,   // 0B00..0B7F; Oriya
            0x0B80,   // 0B80..0BFF; Tamil
            0x0C00,   // 0C00..0C7F; Telugu
            0x0C80,   // 0C80..0CFF; Kannada
            0x0D00,   // 0D00..0D7F; Malayalam
            0x0D80,   // 0D80..0DFF; Sinhala
            0x0E00,   // 0E00..0E7F; Thai
            0x0E80,   // 0E80..0EFF; Lao
            0x0F00,   // 0F00..0FFF; Tibetan
            0x1000,   // 1000..109F; Myanmar
            0x10A0,   // 10A0..10FF; Georgian
            0x1100,   // 1100..11FF; Hangul Jamo
            0x1200,   // 1200..137F; Ethiopic
            0x1380,   // 1380..139F; Ethiopic Supplement
            0x13A0,   // 13A0..13FF; Cherokee
            0x1400,   // 1400..167F; Unified Canadian Aboriginal Syllabics
            0x1680,   // 1680..169F; Ogham
            0x16A0,   // 16A0..16FF; Runic
            0x1700,   // 1700..171F; Tagalog
            0x1720,   // 1720..173F; Hanunoo
            0x1740,   // 1740..175F; Buhid
            0x1760,   // 1760..177F; Tagbanwa
            0x1780,   // 1780..17FF; Khmer
            0x1800,   // 1800..18AF; Mongolian
            0x18B0,   // 18B0..18FF; Unified Canadian Aboriginal Syllabics Extended
            0x1900,   // 1900..194F; Limbu
            0x1950,   // 1950..197F; Tai Le
            0x1980,   // 1980..19DF; New Tai Lue
            0x19E0,   // 19E0..19FF; Khmer Symbols
            0x1A00,   // 1A00..1A1F; Buginese
            0x1A20,   // 1A20..1AAF; Tai Tham
            0x1AB0,   // unassigned
            0x1B00,   // 1B00..1B7F; Balinese
            0x1B80,   // 1B80..1BBF; Sundanese
            0x1BC0,   // 1BC0..1BFF; Batak
            0x1C00,   // 1C00..1C4F; Lepcha
            0x1C50,   // 1C50..1C7F; Ol Chiki
            0x1C80,   // unassigned
            0x1CC0,   // 1CC0..1CCF; Sundanese Supplement
            0x1CD0,   // 1CD0..1CFF; Vedic Extensions
            0x1D00,   // 1D00..1D7F; Phonetic Extensions
            0x1D80,   // 1D80..1DBF; Phonetic Extensions Supplement
            0x1DC0,   // 1DC0..1DFF; Combining Diacritical Marks Supplement
            0x1E00,   // 1E00..1EFF; Latin Extended Additional
            0x1F00,   // 1F00..1FFF; Greek Extended
            0x2000,   // 2000..206F; General Punctuation
            0x2070,   // 2070..209F; Superscripts and Subscripts
            0x20A0,   // 20A0..20CF; Currency Symbols
            0x20D0,   // 20D0..20FF; Combining Diacritical Marks for Symbols
            0x2100,   // 2100..214F; Letterlike Symbols
            0x2150,   // 2150..218F; Number Forms
            0x2190,   // 2190..21FF; Arrows
            0x2200,   // 2200..22FF; Mathematical Operators
            0x2300,   // 2300..23FF; Miscellaneous Technical
            0x2400,   // 2400..243F; Control Pictures
            0x2440,   // 2440..245F; Optical Character Recognition
            0x2460,   // 2460..24FF; Enclosed Alphanumerics
            0x2500,   // 2500..257F; Box Drawing
            0x2580,   // 2580..259F; Block Elements
            0x25A0,   // 25A0..25FF; Geometric Shapes
            0x2600,   // 2600..26FF; Miscellaneous Symbols
            0x2700,   // 2700..27BF; Dingbats
            0x27C0,   // 27C0..27EF; Miscellaneous Mathematical Symbols-A
            0x27F0,   // 27F0..27FF; Supplemental Arrows-A
            0x2800,   // 2800..28FF; Braille Patterns
            0x2900,   // 2900..297F; Supplemental Arrows-B
            0x2980,   // 2980..29FF; Miscellaneous Mathematical Symbols-B
            0x2A00,   // 2A00..2AFF; Supplemental Mathematical Operators
            0x2B00,   // 2B00..2BFF; Miscellaneous Symbols and Arrows
            0x2C00,   // 2C00..2C5F; Glagolitic
            0x2C60,   // 2C60..2C7F; Latin Extended-C
            0x2C80,   // 2C80..2CFF; Coptic
            0x2D00,   // 2D00..2D2F; Georgian Supplement
            0x2D30,   // 2D30..2D7F; Tifinagh
            0x2D80,   // 2D80..2DDF; Ethiopic Extended
            0x2DE0,   // 2DE0..2DFF; Cyrillic Extended-A
            0x2E00,   // 2E00..2E7F; Supplemental Punctuation
            0x2E80,   // 2E80..2EFF; CJK Radicals Supplement
            0x2F00,   // 2F00..2FDF; Kangxi Radicals
            0x2FE0,   // unassigned
            0x2FF0,   // 2FF0..2FFF; Ideographic Description Characters
            0x3000,   // 3000..303F; CJK Symbols and Punctuation
            0x3040,   // 3040..309F; Hiragana
            0x30A0,   // 30A0..30FF; Katakana
            0x3100,   // 3100..312F; Bopomofo
            0x3130,   // 3130..318F; Hangul Compatibility Jamo
            0x3190,   // 3190..319F; Kanbun
            0x31A0,   // 31A0..31BF; Bopomofo Extended
            0x31C0,   // 31C0..31EF; CJK Strokes
            0x31F0,   // 31F0..31FF; Katakana Phonetic Extensions
            0x3200,   // 3200..32FF; Enclosed CJK Letters and Months
            0x3300,   // 3300..33FF; CJK Compatibility
            0x3400,   // 3400..4DBF; CJK Unified Ideographs Extension A
            0x4DC0,   // 4DC0..4DFF; Yijing Hexagram Symbols
            0x4E00,   // 4E00..9FFF; CJK Unified Ideographs
            0xA000,   // A000..A48F; Yi Syllables
            0xA490,   // A490..A4CF; Yi Radicals
            0xA4D0,   // A4D0..A4FF; Lisu
            0xA500,   // A500..A63F; Vai
            0xA640,   // A640..A69F; Cyrillic Extended-B
            0xA6A0,   // A6A0..A6FF; Bamum
            0xA700,   // A700..A71F; Modifier Tone Letters
            0xA720,   // A720..A7FF; Latin Extended-D
            0xA800,   // A800..A82F; Syloti Nagri
            0xA830,   // A830..A83F; Common Indic Number Forms
            0xA840,   // A840..A87F; Phags-pa
            0xA880,   // A880..A8DF; Saurashtra
            0xA8E0,   // A8E0..A8FF; Devanagari Extended
            0xA900,   // A900..A92F; Kayah Li
            0xA930,   // A930..A95F; Rejang
            0xA960,   // A960..A97F; Hangul Jamo Extended-A
            0xA980,   // A980..A9DF; Javanese
            0xA9E0,   // unassigned
            0xAA00,   // AA00..AA5F; Cham
            0xAA60,   // AA60..AA7F; Myanmar Extended-A
            0xAA80,   // AA80..AADF; Tai Viet
            0xAAE0,   // AAE0..AAFF; Meetei Mayek Extensions
            0xAB00,   // AB00..AB2F; Ethiopic Extended-A
            0xAB30,   // unassigned
            0xABC0,   // ABC0..ABFF; Meetei Mayek
            0xAC00,   // AC00..D7AF; Hangul Syllables
            0xD7B0,   // D7B0..D7FF; Hangul Jamo Extended-B
            0xD800,   // D800..DB7F; High Surrogates
            0xDB80,   // DB80..DBFF; High Private Use Surrogates
            0xDC00,   // DC00..DFFF; Low Surrogates
            0xE000,   // E000..F8FF; Private Use Area
            0xF900,   // F900..FAFF; CJK Compatibility Ideographs
            0xFB00,   // FB00..FB4F; Alphabetic Presentation Forms
            0xFB50,   // FB50..FDFF; Arabic Presentation Forms-A
            0xFE00,   // FE00..FE0F; Variation Selectors
            0xFE10,   // FE10..FE1F; Vertical Forms
            0xFE20,   // FE20..FE2F; Combining Half Marks
            0xFE30,   // FE30..FE4F; CJK Compatibility Forms
            0xFE50,   // FE50..FE6F; Small Form Variants
            0xFE70,   // FE70..FEFF; Arabic Presentation Forms-B
            0xFF00,   // FF00..FFEF; Halfwidth and Fullwidth Forms
            0xFFF0,   // FFF0..FFFF; Specials
            0x10000,  // 10000..1007F; Linear B Syllabary
            0x10080,  // 10080..100FF; Linear B Ideograms
            0x10100,  // 10100..1013F; Aegean Numbers
            0x10140,  // 10140..1018F; Ancient Greek Numbers
            0x10190,  // 10190..101CF; Ancient Symbols
            0x101D0,  // 101D0..101FF; Phaistos Disc
            0x10200,  // unassigned
            0x10280,  // 10280..1029F; Lycian
            0x102A0,  // 102A0..102DF; Carian
            0x102E0,  // unassigned
            0x10300,  // 10300..1032F; Old Italic
            0x10330,  // 10330..1034F; Gothic
            0x10350,  // unassigned
            0x10380,  // 10380..1039F; Ugaritic
            0x103A0,  // 103A0..103DF; Old Persian
            0x103E0,  // unassigned
            0x10400,  // 10400..1044F; Deseret
            0x10450,  // 10450..1047F; Shavian
            0x10480,  // 10480..104AF; Osmanya
            0x104B0,  // unassigned
            0x10800,  // 10800..1083F; Cypriot Syllabary
            0x10840,  // 10840..1085F; Imperial Aramaic
            0x10860,  // unassigned
            0x10900,  // 10900..1091F; Phoenician
            0x10920,  // 10920..1093F; Lydian
            0x10940,  // unassigned
            0x10980,  // 10980..1099F; Meroitic Hieroglyphs
            0x109A0,  // 109A0..109FF; Meroitic Cursive
            0x10A00,  // 10A00..10A5F; Kharoshthi
            0x10A60,  // 10A60..10A7F; Old South Arabian
            0x10A80,  // unassigned
            0x10B00,  // 10B00..10B3F; Avestan
            0x10B40,  // 10B40..10B5F; Inscriptional Parthian
            0x10B60,  // 10B60..10B7F; Inscriptional Pahlavi
            0x10B80,  // unassigned
            0x10C00,  // 10C00..10C4F; Old Turkic
            0x10C50,  // unassigned
            0x10E60,  // 10E60..10E7F; Rumi Numeral Symbols
            0x10E80,  // unassigned
            0x11000,  // 11000..1107F; Brahmi
            0x11080,  // 11080..110CF; Kaithi
            0x110D0,  // 110D0..110FF; Sora Sompeng
            0x11100,  // 11100..1114F; Chakma
            0x11150,  // unassigned
            0x11180,  // 11180..111DF; Sharada
            0x111E0,  // unassigned
            0x11680,  // 11680..116CF; Takri
            0x116D0,  // unassigned
            0x12000,  // 12000..123FF; Cuneiform
            0x12400,  // 12400..1247F; Cuneiform Numbers and Punctuation
            0x12480,  // unassigned
            0x13000,  // 13000..1342F; Egyptian Hieroglyphs
            0x13430,  // unassigned
            0x16800,  // 16800..16A3F; Bamum Supplement
            0x16A40,  // unassigned
            0x16F00,  // 16F00..16F9F; Miao
            0x16FA0,  // unassigned
            0x1B000,  // 1B000..1B0FF; Kana Supplement
            0x1B100,  // unassigned
            0x1D000,  // 1D000..1D0FF; Byzantine Musical Symbols
            0x1D100,  // 1D100..1D1FF; Musical Symbols
            0x1D200,  // 1D200..1D24F; Ancient Greek Musical Notation
            0x1D250,  // unassigned
            0x1D300,  // 1D300..1D35F; Tai Xuan Jing Symbols
            0x1D360,  // 1D360..1D37F; Counting Rod Numerals
            0x1D380,  // unassigned
            0x1D400,  // 1D400..1D7FF; Mathematical Alphanumeric Symbols
            0x1D800,  // unassigned
            0x1EE00,  // 1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
            0x1EF00,  // unassigned
            0x1F000,  // 1F000..1F02F; Mahjong Tiles
            0x1F030,  // 1F030..1F09F; Domino Tiles
            0x1F0A0,  // 1F0A0..1F0FF; Playing Cards
            0x1F100,  // 1F100..1F1FF; Enclosed Alphanumeric Supplement
            0x1F200,  // 1F200..1F2FF; Enclosed Ideographic Supplement
            0x1F300,  // 1F300..1F5FF; Miscellaneous Symbols And Pictographs
            0x1F600,  // 1F600..1F64F; Emoticons
            0x1F650,  // unassigned
            0x1F680,  // 1F680..1F6FF; Transport And Map Symbols
            0x1F700,  // 1F700..1F77F; Alchemical Symbols
            0x1F780,  // unassigned
            0x20000,  // 20000..2A6DF; CJK Unified Ideographs Extension B
            0x2A6E0,  // unassigned
            0x2A700,  // 2A700..2B73F; CJK Unified Ideographs Extension C
            0x2B740,  // 2B740..2B81F; CJK Unified Ideographs Extension D
            0x2B820,  // unassigned
            0x2F800,  // 2F800..2FA1F; CJK Compatibility Ideographs Supplement
            0x2FA20,  // unassigned
            0xE0000,  // E0000..E007F; Tags
            0xE0080,  // unassigned
            0xE0100,  // E0100..E01EF; Variation Selectors Supplement
            0xE01F0,  // unassigned
            0xF0000,  // F0000..FFFFF; Supplementary Private Use Area-A
            0x100000  // 100000..10FFFF; Supplementary Private Use Area-B
        };

        private static final UnicodeBlock[] blocks = {
            BASIC_LATIN,
            LATIN_1_SUPPLEMENT,
            LATIN_EXTENDED_A,
            LATIN_EXTENDED_B,
            IPA_EXTENSIONS,
            SPACING_MODIFIER_LETTERS,
            COMBINING_DIACRITICAL_MARKS,
            GREEK,
            CYRILLIC,
            CYRILLIC_SUPPLEMENTARY,
            ARMENIAN,
            HEBREW,
            ARABIC,
            SYRIAC,
            ARABIC_SUPPLEMENT,
            THAANA,
            NKO,
            SAMARITAN,
            MANDAIC,
            null,
            ARABIC_EXTENDED_A,
            DEVANAGARI,
            BENGALI,
            GURMUKHI,
            GUJARATI,
            ORIYA,
            TAMIL,
            TELUGU,
            KANNADA,
            MALAYALAM,
            SINHALA,
            THAI,
            LAO,
            TIBETAN,
            MYANMAR,
            GEORGIAN,
            HANGUL_JAMO,
            ETHIOPIC,
            ETHIOPIC_SUPPLEMENT,
            CHEROKEE,
            UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS,
            OGHAM,
            RUNIC,
            TAGALOG,
            HANUNOO,
            BUHID,
            TAGBANWA,
            KHMER,
            MONGOLIAN,
            UNIFIED_CANADIAN_ABORIGINAL_SYLLABICS_EXTENDED,
            LIMBU,
            TAI_LE,
            NEW_TAI_LUE,
            KHMER_SYMBOLS,
            BUGINESE,
            TAI_THAM,
            null,
            BALINESE,
            SUNDANESE,
            BATAK,
            LEPCHA,
            OL_CHIKI,
            null,
            SUNDANESE_SUPPLEMENT,
            VEDIC_EXTENSIONS,
            PHONETIC_EXTENSIONS,
            PHONETIC_EXTENSIONS_SUPPLEMENT,
            COMBINING_DIACRITICAL_MARKS_SUPPLEMENT,
            LATIN_EXTENDED_ADDITIONAL,
            GREEK_EXTENDED,
            GENERAL_PUNCTUATION,
            SUPERSCRIPTS_AND_SUBSCRIPTS,
            CURRENCY_SYMBOLS,
            COMBINING_MARKS_FOR_SYMBOLS,
            LETTERLIKE_SYMBOLS,
            NUMBER_FORMS,
            ARROWS,
            MATHEMATICAL_OPERATORS,
            MISCELLANEOUS_TECHNICAL,
            CONTROL_PICTURES,
            OPTICAL_CHARACTER_RECOGNITION,
            ENCLOSED_ALPHANUMERICS,
            BOX_DRAWING,
            BLOCK_ELEMENTS,
            GEOMETRIC_SHAPES,
            MISCELLANEOUS_SYMBOLS,
            DINGBATS,
            MISCELLANEOUS_MATHEMATICAL_SYMBOLS_A,
            SUPPLEMENTAL_ARROWS_A,
            BRAILLE_PATTERNS,
            SUPPLEMENTAL_ARROWS_B,
            MISCELLANEOUS_MATHEMATICAL_SYMBOLS_B,
            SUPPLEMENTAL_MATHEMATICAL_OPERATORS,
            MISCELLANEOUS_SYMBOLS_AND_ARROWS,
            GLAGOLITIC,
            LATIN_EXTENDED_C,
            COPTIC,
            GEORGIAN_SUPPLEMENT,
            TIFINAGH,
            ETHIOPIC_EXTENDED,
            CYRILLIC_EXTENDED_A,
            SUPPLEMENTAL_PUNCTUATION,
            CJK_RADICALS_SUPPLEMENT,
            KANGXI_RADICALS,
            null,
            IDEOGRAPHIC_DESCRIPTION_CHARACTERS,
            CJK_SYMBOLS_AND_PUNCTUATION,
            HIRAGANA,
            KATAKANA,
            BOPOMOFO,
            HANGUL_COMPATIBILITY_JAMO,
            KANBUN,
            BOPOMOFO_EXTENDED,
            CJK_STROKES,
            KATAKANA_PHONETIC_EXTENSIONS,
            ENCLOSED_CJK_LETTERS_AND_MONTHS,
            CJK_COMPATIBILITY,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
            YIJING_HEXAGRAM_SYMBOLS,
            CJK_UNIFIED_IDEOGRAPHS,
            YI_SYLLABLES,
            YI_RADICALS,
            LISU,
            VAI,
            CYRILLIC_EXTENDED_B,
            BAMUM,
            MODIFIER_TONE_LETTERS,
            LATIN_EXTENDED_D,
            SYLOTI_NAGRI,
            COMMON_INDIC_NUMBER_FORMS,
            PHAGS_PA,
            SAURASHTRA,
            DEVANAGARI_EXTENDED,
            KAYAH_LI,
            REJANG,
            HANGUL_JAMO_EXTENDED_A,
            JAVANESE,
            null,
            CHAM,
            MYANMAR_EXTENDED_A,
            TAI_VIET,
            MEETEI_MAYEK_EXTENSIONS,
            ETHIOPIC_EXTENDED_A,
            null,
            MEETEI_MAYEK,
            HANGUL_SYLLABLES,
            HANGUL_JAMO_EXTENDED_B,
            HIGH_SURROGATES,
            HIGH_PRIVATE_USE_SURROGATES,
            LOW_SURROGATES,
            PRIVATE_USE_AREA,
            CJK_COMPATIBILITY_IDEOGRAPHS,
            ALPHABETIC_PRESENTATION_FORMS,
            ARABIC_PRESENTATION_FORMS_A,
            VARIATION_SELECTORS,
            VERTICAL_FORMS,
            COMBINING_HALF_MARKS,
            CJK_COMPATIBILITY_FORMS,
            SMALL_FORM_VARIANTS,
            ARABIC_PRESENTATION_FORMS_B,
            HALFWIDTH_AND_FULLWIDTH_FORMS,
            SPECIALS,
            LINEAR_B_SYLLABARY,
            LINEAR_B_IDEOGRAMS,
            AEGEAN_NUMBERS,
            ANCIENT_GREEK_NUMBERS,
            ANCIENT_SYMBOLS,
            PHAISTOS_DISC,
            null,
            LYCIAN,
            CARIAN,
            null,
            OLD_ITALIC,
            GOTHIC,
            null,
            UGARITIC,
            OLD_PERSIAN,
            null,
            DESERET,
            SHAVIAN,
            OSMANYA,
            null,
            CYPRIOT_SYLLABARY,
            IMPERIAL_ARAMAIC,
            null,
            PHOENICIAN,
            LYDIAN,
            null,
            MEROITIC_HIEROGLYPHS,
            MEROITIC_CURSIVE,
            KHAROSHTHI,
            OLD_SOUTH_ARABIAN,
            null,
            AVESTAN,
            INSCRIPTIONAL_PARTHIAN,
            INSCRIPTIONAL_PAHLAVI,
            null,
            OLD_TURKIC,
            null,
            RUMI_NUMERAL_SYMBOLS,
            null,
            BRAHMI,
            KAITHI,
            SORA_SOMPENG,
            CHAKMA,
            null,
            SHARADA,
            null,
            TAKRI,
            null,
            CUNEIFORM,
            CUNEIFORM_NUMBERS_AND_PUNCTUATION,
            null,
            EGYPTIAN_HIEROGLYPHS,
            null,
            BAMUM_SUPPLEMENT,
            null,
            MIAO,
            null,
            KANA_SUPPLEMENT,
            null,
            BYZANTINE_MUSICAL_SYMBOLS,
            MUSICAL_SYMBOLS,
            ANCIENT_GREEK_MUSICAL_NOTATION,
            null,
            TAI_XUAN_JING_SYMBOLS,
            COUNTING_ROD_NUMERALS,
            null,
            MATHEMATICAL_ALPHANUMERIC_SYMBOLS,
            null,
            ARABIC_MATHEMATICAL_ALPHABETIC_SYMBOLS,
            null,
            MAHJONG_TILES,
            DOMINO_TILES,
            PLAYING_CARDS,
            ENCLOSED_ALPHANUMERIC_SUPPLEMENT,
            ENCLOSED_IDEOGRAPHIC_SUPPLEMENT,
            MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS,
            EMOTICONS,
            null,
            TRANSPORT_AND_MAP_SYMBOLS,
            ALCHEMICAL_SYMBOLS,
            null,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B,
            null,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C,
            CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D,
            null,
            CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT,
            null,
            TAGS,
            null,
            VARIATION_SELECTORS_SUPPLEMENT,
            null,
            SUPPLEMENTARY_PRIVATE_USE_AREA_A,
            SUPPLEMENTARY_PRIVATE_USE_AREA_B
        };


        /**
         * Returns the object representing the Unicode block containing the
         * given character, or {@code null} if the character is not a
         * member of a defined block.
         *
         * <p><b>Note:</b> This method cannot handle
         * <a href="Character.html#supplementary"> supplementary
         * characters</a>. To support all Unicode characters, including
         * supplementary characters, use the {@link #of(int)} method.
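         *
         * <p>As an illustration only (a sketch, not part of the
         * specification; the variable names are hypothetical), a
         * supplementary code point such as {@code U+1F600} has to be
         * passed as an {@code int}, because a single {@code char} can
         * carry only one half of its surrogate pair:
         * <pre>{@code
         * int cp = 0x1F600;                           // supplementary code point
         * UnicodeBlock whole = UnicodeBlock.of(cp);   // block containing the code point itself
         * char high = Character.highSurrogate(cp);    // first half of the surrogate pair
         * UnicodeBlock half = UnicodeBlock.of(high);  // block of the surrogate, not of cp
         * }</pre>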
         *
         * @param   c  The character in question
         * @return  The {@code UnicodeBlock} instance representing the
         *          Unicode block of which this character is a member, or
         *          {@code null} if the character is not a member of any
         *          Unicode block
         */
        public static UnicodeBlock of(char c) {
            return of((int)c);
        }

        /**
         * Returns the object representing the Unicode block
         * containing the given character (Unicode code point), or
         * {@code null} if the character is not a member of a
         * defined block.
         *
         * @param   codePoint the character (Unicode code point) in question.
         * @return  The {@code UnicodeBlock} instance representing the
         *          Unicode block of which this character is a member, or
         *          {@code null} if the character is not a member of any
         *          Unicode block
         * @throws  IllegalArgumentException if the specified
         *          {@code codePoint} is an invalid Unicode code point.
         * @see Character#isValidCodePoint(int)
         * @since   1.5
         */
        public static UnicodeBlock of(int codePoint) {
            if (!isValidCodePoint(codePoint)) {
                throw new IllegalArgumentException();
            }

            int top, bottom, current;
            bottom = 0;
            top = blockStarts.length;
            current = top/2;

            // invariant: top > current >= bottom && codePoint >= blockStarts[bottom]
            while (top - bottom > 1) {
                if (codePoint >= blockStarts[current]) {
                    bottom = current;
                } else {
                    top = current;
                }
                current = (top + bottom) / 2;
            }
            return blocks[current];
        }

        /**
         * Returns the UnicodeBlock with the given name. Block
         * names are determined by The Unicode Standard. The file
         * Blocks-&lt;version&gt;.txt defines blocks for a particular
         * version of the standard. The {@link Character} class specifies
         * the version of the standard that it supports.
         * <p>
         * This method accepts block names in the following forms:
         * <ol>
         * <li>Canonical block names as defined by the Unicode Standard.
         * For example, the standard defines a "Basic Latin" block. Therefore, this
         * method accepts "Basic Latin" as a valid block name. The documentation of
         * each UnicodeBlock provides the canonical name.
         * <li>Canonical block names with all spaces removed. For example, "BasicLatin"
         * is a valid block name for the "Basic Latin" block.
         * <li>The text representation of each constant UnicodeBlock identifier.
         * For example, this method will return the {@link #BASIC_LATIN} block if
         * provided with the "BASIC_LATIN" name. This form replaces all spaces and
         * hyphens in the canonical name with underscores.
         * </ol>
         * Finally, character case is ignored for all of the valid block name forms.
         * For example, "BASIC_LATIN" and "basic_latin" are both valid block names.
         * The en_US locale's case mapping rules are used to provide case-insensitive
         * string comparisons for block name validation.
         * <p>
         * If the Unicode Standard changes block names, both the previous and
         * current names will be accepted.
         *
         * @param blockName A {@code UnicodeBlock} name.
         * @return The {@code UnicodeBlock} instance identified
         *         by {@code blockName}
         * @throws IllegalArgumentException if {@code blockName} is an
         *         invalid name
         * @throws NullPointerException if {@code blockName} is null
         * @since 1.5
         */
        public static final UnicodeBlock forName(String blockName) {
            UnicodeBlock block = map.get(blockName.toUpperCase(Locale.US));
            if (block == null) {
                throw new IllegalArgumentException();
            }
            return block;
        }
    }


    /**
     * A family of character subsets representing the character scripts
     * defined in the <a href="http://www.unicode.org/reports/tr24/">
     * <i>Unicode Standard Annex #24: Script Names</i></a>. Every Unicode
     * character is assigned to a single Unicode script, either a specific
     * script, such as {@link Character.UnicodeScript#LATIN Latin}, or
     * one of the following three special values,
     * {@link Character.UnicodeScript#INHERITED Inherited},
     * {@link Character.UnicodeScript#COMMON Common} or
     * {@link Character.UnicodeScript#UNKNOWN Unknown}.
     *
     * @since 1.7
     */
    public static enum UnicodeScript {
        /**
         * Unicode script "Common".
         */
        COMMON,

        /**
         * Unicode script "Latin".
         */
        LATIN,

        /**
         * Unicode script "Greek".
         */
        GREEK,

        /**
         * Unicode script "Cyrillic".
         */
        CYRILLIC,

        /**
         * Unicode script "Armenian".
         */
        ARMENIAN,

        /**
         * Unicode script "Hebrew".
         */
        HEBREW,

        /**
         * Unicode script "Arabic".
         */
        ARABIC,

        /**
         * Unicode script "Syriac".
         */
        SYRIAC,

        /**
         * Unicode script "Thaana".
         */
        THAANA,

        /**
         * Unicode script "Devanagari".
         */
        DEVANAGARI,

        /**
         * Unicode script "Bengali".
         */
        BENGALI,

        /**
         * Unicode script "Gurmukhi".
         */
        GURMUKHI,

        /**
         * Unicode script "Gujarati".
         */
        GUJARATI,

        /**
         * Unicode script "Oriya".
         */
        ORIYA,

        /**
         * Unicode script "Tamil".
         */
        TAMIL,

        /**
         * Unicode script "Telugu".
         */
        TELUGU,

        /**
         * Unicode script "Kannada".
         */
        KANNADA,

        /**
         * Unicode script "Malayalam".
         */
        MALAYALAM,

        /**
         * Unicode script "Sinhala".
         */
        SINHALA,

        /**
         * Unicode script "Thai".
         */
        THAI,

        /**
         * Unicode script "Lao".
         */
        LAO,

        /**
         * Unicode script "Tibetan".
         */
        TIBETAN,

        /**
         * Unicode script "Myanmar".
         */
        MYANMAR,

        /**
         * Unicode script "Georgian".
         */
        GEORGIAN,

        /**
         * Unicode script "Hangul".
         */
        HANGUL,

        /**
         * Unicode script "Ethiopic".
         */
        ETHIOPIC,

        /**
         * Unicode script "Cherokee".
         */
        CHEROKEE,

        /**
         * Unicode script "Canadian_Aboriginal".
         */
        CANADIAN_ABORIGINAL,

        /**
         * Unicode script "Ogham".
         */
        OGHAM,

        /**
         * Unicode script "Runic".
         */
        RUNIC,

        /**
         * Unicode script "Khmer".
         */
        KHMER,

        /**
         * Unicode script "Mongolian".
         */
        MONGOLIAN,

        /**
         * Unicode script "Hiragana".
         */
        HIRAGANA,

        /**
         * Unicode script "Katakana".
         */
        KATAKANA,

        /**
         * Unicode script "Bopomofo".
         */
        BOPOMOFO,

        /**
         * Unicode script "Han".
         */
        HAN,

        /**
         * Unicode script "Yi".
         */
        YI,

        /**
         * Unicode script "Old_Italic".
         */
        OLD_ITALIC,

        /**
         * Unicode script "Gothic".
         */
        GOTHIC,

        /**
         * Unicode script "Deseret".
         */
        DESERET,

        /**
         * Unicode script "Inherited".
         */
        INHERITED,

        /**
         * Unicode script "Tagalog".
         */
        TAGALOG,

        /**
         * Unicode script "Hanunoo".
         */
        HANUNOO,

        /**
         * Unicode script "Buhid".
         */
        BUHID,

        /**
         * Unicode script "Tagbanwa".
         */
        TAGBANWA,

        /**
         * Unicode script "Limbu".
         */
        LIMBU,

        /**
         * Unicode script "Tai_Le".
         */
        TAI_LE,

        /**
         * Unicode script "Linear_B".
         */
        LINEAR_B,

        /**
         * Unicode script "Ugaritic".
         */
        UGARITIC,

        /**
         * Unicode script "Shavian".
         */
        SHAVIAN,

        /**
         * Unicode script "Osmanya".
         */
        OSMANYA,

        /**
         * Unicode script "Cypriot".
         */
        CYPRIOT,

        /**
         * Unicode script "Braille".
         */
        BRAILLE,

        /**
         * Unicode script "Buginese".
         */
        BUGINESE,

        /**
         * Unicode script "Coptic".
         */
        COPTIC,

        /**
         * Unicode script "New_Tai_Lue".
         */
        NEW_TAI_LUE,

        /**
         * Unicode script "Glagolitic".
         */
        GLAGOLITIC,

        /**
         * Unicode script "Tifinagh".
         */
        TIFINAGH,

        /**
         * Unicode script "Syloti_Nagri".
         */
        SYLOTI_NAGRI,

        /**
         * Unicode script "Old_Persian".
         */
        OLD_PERSIAN,

        /**
         * Unicode script "Kharoshthi".
         */
        KHAROSHTHI,

        /**
         * Unicode script "Balinese".
         */
        BALINESE,

        /**
         * Unicode script "Cuneiform".
         */
        CUNEIFORM,

        /**
         * Unicode script "Phoenician".
         */
        PHOENICIAN,

        /**
         * Unicode script "Phags_Pa".
         */
        PHAGS_PA,

        /**
         * Unicode script "Nko".
         */
        NKO,

        /**
         * Unicode script "Sundanese".
         */
        SUNDANESE,

        /**
         * Unicode script "Batak".
         */
        BATAK,

        /**
         * Unicode script "Lepcha".
         */
        LEPCHA,

        /**
         * Unicode script "Ol_Chiki".
         */
        OL_CHIKI,

        /**
         * Unicode script "Vai".
         */
        VAI,

        /**
         * Unicode script "Saurashtra".
         */
        SAURASHTRA,

        /**
         * Unicode script "Kayah_Li".
         */
        KAYAH_LI,

        /**
         * Unicode script "Rejang".
         */
        REJANG,

        /**
         * Unicode script "Lycian".
         */
        LYCIAN,

        /**
         * Unicode script "Carian".
         */
        CARIAN,

        /**
         * Unicode script "Lydian".
         */
        LYDIAN,

        /**
         * Unicode script "Cham".
         */
        CHAM,

        /**
         * Unicode script "Tai_Tham".
         */
        TAI_THAM,

        /**
         * Unicode script "Tai_Viet".
         */
        TAI_VIET,

        /**
         * Unicode script "Avestan".
         */
        AVESTAN,

        /**
         * Unicode script "Egyptian_Hieroglyphs".
         */
        EGYPTIAN_HIEROGLYPHS,

        /**
         * Unicode script "Samaritan".
         */
        SAMARITAN,

        /**
         * Unicode script "Mandaic".
         */
        MANDAIC,

        /**
         * Unicode script "Lisu".
         */
        LISU,

        /**
         * Unicode script "Bamum".
         */
        BAMUM,

        /**
         * Unicode script "Javanese".
         */
        JAVANESE,

        /**
         * Unicode script "Meetei_Mayek".
         */
        MEETEI_MAYEK,

        /**
         * Unicode script "Imperial_Aramaic".
         */
        IMPERIAL_ARAMAIC,

        /**
         * Unicode script "Old_South_Arabian".
         */
        OLD_SOUTH_ARABIAN,

        /**
         * Unicode script "Inscriptional_Parthian".
         */
        INSCRIPTIONAL_PARTHIAN,

        /**
         * Unicode script "Inscriptional_Pahlavi".
         */
        INSCRIPTIONAL_PAHLAVI,

        /**
         * Unicode script "Old_Turkic".
         */
        OLD_TURKIC,

        /**
         * Unicode script "Brahmi".
         */
        BRAHMI,

        /**
         * Unicode script "Kaithi".
         */
        KAITHI,

        /**
         * Unicode script "Meroitic Hieroglyphs".
         */
        MEROITIC_HIEROGLYPHS,

        /**
         * Unicode script "Meroitic Cursive".
         */
        MEROITIC_CURSIVE,

        /**
         * Unicode script "Sora Sompeng".
         */
        SORA_SOMPENG,

        /**
         * Unicode script "Chakma".
         */
        CHAKMA,

        /**
         * Unicode script "Sharada".
         */
        SHARADA,

        /**
         * Unicode script "Takri".
         */
        TAKRI,

        /**
         * Unicode script "Miao".
         */
        MIAO,

        /**
         * Unicode script "Unknown".
         */
        UNKNOWN;

        private static final int[] scriptStarts = {
            0x0000,   // 0000..0040; COMMON
            0x0041,   // 0041..005A; LATIN
            0x005B,   // 005B..0060; COMMON
            0x0061,   // 0061..007A; LATIN
            0x007B,   // 007B..00A9; COMMON
            0x00AA,   // 00AA..00AA; LATIN
            0x00AB,   // 00AB..00B9; COMMON
            0x00BA,   // 00BA..00BA; LATIN
            0x00BB,   // 00BB..00BF; COMMON
            0x00C0,   // 00C0..00D6; LATIN
            0x00D7,   // 00D7..00D7; COMMON
            0x00D8,   // 00D8..00F6; LATIN
            0x00F7,   // 00F7..00F7; COMMON
            0x00F8,   // 00F8..02B8; LATIN
            0x02B9,   // 02B9..02DF; COMMON
            0x02E0,   // 02E0..02E4; LATIN
            0x02E5,   // 02E5..02E9; COMMON
            0x02EA,   // 02EA..02EB; BOPOMOFO
            0x02EC,   // 02EC..02FF; COMMON
            0x0300,   // 0300..036F; INHERITED
            0x0370,   // 0370..0373; GREEK
            0x0374,   // 0374..0374; COMMON
            0x0375,   // 0375..037D; GREEK
            0x037E,   // 037E..0383; COMMON
            0x0384,   // 0384..0384; GREEK
            0x0385,   // 0385..0385; COMMON
            0x0386,   // 0386..0386; GREEK
            0x0387,   // 0387..0387; COMMON
            0x0388,   // 0388..03E1; GREEK
            0x03E2,   // 03E2..03EF; COPTIC
            0x03F0,   // 03F0..03FF; GREEK
            0x0400,   // 0400..0484; CYRILLIC
            0x0485,   // 0485..0486; INHERITED
            0x0487,   // 0487..0530; CYRILLIC
            0x0531,   // 0531..0588; ARMENIAN
            0x0589,   // 0589..0589; COMMON
            0x058A,   // 058A..0590; ARMENIAN
            0x0591,   // 0591..05FF; HEBREW
            0x0600,   // 0600..060B; ARABIC
            0x060C,   // 060C..060C; COMMON
            0x060D,   // 060D..061A; ARABIC
            0x061B,   // 061B..061D; COMMON
            0x061E,   // 061E..061E; ARABIC
            0x061F,   // 061F..061F; COMMON
            0x0620,   // 0620..063F; ARABIC
            0x0640,   // 0640..0640; COMMON
            0x0641,   // 0641..064A; ARABIC
            0x064B,   // 064B..0655; INHERITED
            0x0656,   // 0656..065F; ARABIC
            0x0660,   // 0660..0669; COMMON
            0x066A,   // 066A..066F; ARABIC
            0x0670,   // 0670..0670; INHERITED
            0x0671,   // 0671..06DC; ARABIC
            0x06DD,   // 06DD..06DD; COMMON
            0x06DE,   // 06DE..06FF; ARABIC
            0x0700,   // 0700..074F; SYRIAC
            0x0750,   // 0750..077F; ARABIC
            0x0780,   // 0780..07BF; THAANA
            0x07C0,   // 07C0..07FF; NKO
            0x0800,   // 0800..083F; SAMARITAN
            0x0840,   // 0840..089F; MANDAIC
            0x08A0,   // 08A0..08FF; ARABIC
            0x0900,   // 0900..0950; DEVANAGARI
            0x0951,   // 0951..0952; INHERITED
            0x0953,   // 0953..0963; DEVANAGARI
            0x0964,   // 0964..0965; COMMON
            0x0966,   // 0966..0980; DEVANAGARI
            0x0981,   // 0981..0A00; BENGALI
            0x0A01,   // 0A01..0A80; GURMUKHI
            0x0A81,   // 0A81..0B00; GUJARATI
            0x0B01,   // 0B01..0B81; ORIYA
            0x0B82,   // 0B82..0C00; TAMIL
            0x0C01,   // 0C01..0C81; TELUGU
            0x0C82,   // 0C82..0CF0; KANNADA
            0x0D02,   // 0D02..0D81; MALAYALAM
            0x0D82,   // 0D82..0E00; SINHALA
            0x0E01,   // 0E01..0E3E; THAI
            0x0E3F,   // 0E3F..0E3F; COMMON
            0x0E40,   // 0E40..0E80; THAI
            0x0E81,   // 0E81..0EFF; LAO
            0x0F00,   // 0F00..0FD4; TIBETAN
            0x0FD5,   // 0FD5..0FD8; COMMON
            0x0FD9,   // 0FD9..0FFF; TIBETAN
            0x1000,   // 1000..109F; MYANMAR
            0x10A0,   // 10A0..10FA; GEORGIAN
            0x10FB,   // 10FB..10FB; COMMON
            0x10FC,   // 10FC..10FF; GEORGIAN
            0x1100,   // 1100..11FF; HANGUL
            0x1200,   // 1200..139F; ETHIOPIC
            0x13A0,   // 13A0..13FF; CHEROKEE
            0x1400,   // 1400..167F; CANADIAN_ABORIGINAL
            0x1680,   // 1680..169F; OGHAM
            0x16A0,   // 16A0..16EA; RUNIC
            0x16EB,   // 16EB..16ED; COMMON
            0x16EE,   // 16EE..16FF; RUNIC
            0x1700,   // 1700..171F; TAGALOG
            0x1720,   // 1720..1734; HANUNOO
            0x1735,   // 1735..173F; COMMON
            0x1740,   // 1740..175F; BUHID
            0x1760,   // 1760..177F; TAGBANWA
            0x1780,   // 1780..17FF; KHMER
            0x1800,   // 1800..1801; MONGOLIAN
            0x1802,   // 1802..1803; COMMON
            0x1804,   // 1804..1804; MONGOLIAN
            0x1805,   // 1805..1805; COMMON
            0x1806,   // 1806..18AF; MONGOLIAN
            0x18B0,   // 18B0..18FF; CANADIAN_ABORIGINAL
            0x1900,   // 1900..194F; LIMBU
            0x1950,   // 1950..197F; TAI_LE
            0x1980,   // 1980..19DF; NEW_TAI_LUE
            0x19E0,   // 19E0..19FF; KHMER
            0x1A00,   // 1A00..1A1F; BUGINESE
            0x1A20,   // 1A20..1AFF; TAI_THAM
            0x1B00,   // 1B00..1B7F; BALINESE
            0x1B80,   // 1B80..1BBF; SUNDANESE
            0x1BC0,   // 1BC0..1BFF; BATAK
            0x1C00,   // 1C00..1C4F; LEPCHA
            0x1C50,   // 1C50..1CBF; OL_CHIKI
            0x1CC0,   // 1CC0..1CCF; SUNDANESE
            0x1CD0,   // 1CD0..1CD2; INHERITED
            0x1CD3,   // 1CD3..1CD3; COMMON
            0x1CD4,   // 1CD4..1CE0; INHERITED
            0x1CE1,   // 1CE1..1CE1; COMMON
            0x1CE2,   // 1CE2..1CE8; INHERITED
            0x1CE9,   // 1CE9..1CEC; COMMON
            0x1CED,   // 1CED..1CED; INHERITED
            0x1CEE,   // 1CEE..1CF3; COMMON
            0x1CF4,   // 1CF4..1CF4; INHERITED
            0x1CF5,   // 1CF5..1CFF; COMMON
            0x1D00,   // 1D00..1D25; LATIN
            0x1D26,   // 1D26..1D2A; GREEK
            0x1D2B,   // 1D2B..1D2B; CYRILLIC
            0x1D2C,   // 1D2C..1D5C; LATIN
            0x1D5D,   // 1D5D..1D61; GREEK
            0x1D62,   // 1D62..1D65; LATIN
            0x1D66,   // 1D66..1D6A; GREEK
            0x1D6B,   // 1D6B..1D77; LATIN
            0x1D78,   // 1D78..1D78; CYRILLIC
            0x1D79,   // 1D79..1DBE; LATIN
            0x1DBF,   // 1DBF..1DBF; GREEK
            0x1DC0,   // 1DC0..1DFF; INHERITED
            0x1E00,   // 1E00..1EFF; LATIN
            0x1F00,   // 1F00..1FFF; GREEK
            0x2000,   // 2000..200B; COMMON
            0x200C,   // 200C..200D; INHERITED
            0x200E,   // 200E..2070; COMMON
            0x2071,   // 2071..2073; LATIN
            0x2074,   // 2074..207E; COMMON
            0x207F,   // 207F..207F; LATIN
            0x2080,   // 2080..208F; COMMON
            0x2090,   // 2090..209F; LATIN
            0x20A0,   // 20A0..20CF; COMMON
            0x20D0,   // 20D0..20FF; INHERITED
            0x2100,   // 2100..2125; COMMON
            0x2126,   // 2126..2126; GREEK
            0x2127,   // 2127..2129; COMMON
            0x212A,   // 212A..212B; LATIN
            0x212C,   // 212C..2131; COMMON
            0x2132,   // 2132..2132; LATIN
            0x2133,   // 2133..214D; COMMON
            0x214E,   // 214E..214E; LATIN
            0x214F,   // 214F..215F; COMMON
            0x2160,   // 2160..2188; LATIN
            0x2189,   // 2189..27FF; COMMON
            0x2800,   // 2800..28FF; BRAILLE
            0x2900,   // 2900..2BFF; COMMON
            0x2C00,   // 2C00..2C5F; GLAGOLITIC
            0x2C60,   // 2C60..2C7F; LATIN
            0x2C80,   // 2C80..2CFF; COPTIC
            0x2D00,   // 2D00..2D2F; GEORGIAN
            0x2D30,   // 2D30..2D7F; TIFINAGH
            0x2D80,   // 2D80..2DDF; ETHIOPIC
            0x2DE0,   // 2DE0..2DFF; CYRILLIC
            0x2E00,   // 2E00..2E7F; COMMON
            0x2E80,   // 2E80..2FEF; HAN
            0x2FF0,   // 2FF0..3004; COMMON
            0x3005,   // 3005..3005; HAN
            0x3006,   // 3006..3006; COMMON
            0x3007,   // 3007..3007; HAN
            0x3008,   // 3008..3020; COMMON
            0x3021,   // 3021..3029; HAN
            0x302A,   // 302A..302D; INHERITED
            0x302E,   // 302E..302F; HANGUL
            0x3030,   // 3030..3037; COMMON
            0x3038,   // 3038..303B; HAN
            0x303C,   // 303C..3040; COMMON
            0x3041,   // 3041..3098; HIRAGANA
            0x3099,   // 3099..309A; INHERITED
            0x309B,   // 309B..309C; COMMON
            0x309D,   // 309D..309F; HIRAGANA
            0x30A0,   // 30A0..30A0; COMMON
            0x30A1,   // 30A1..30FA; KATAKANA
            0x30FB,   // 30FB..30FC; COMMON
            0x30FD,   // 30FD..3104; KATAKANA
            0x3105,   // 3105..3130; BOPOMOFO
            0x3131,   // 3131..318F; HANGUL
            0x3190,   // 3190..319F; COMMON
            0x31A0,   // 31A0..31BF; BOPOMOFO
            0x31C0,   // 31C0..31EF; COMMON
            0x31F0,   // 31F0..31FF; KATAKANA
            0x3200,   // 3200..321F; HANGUL
            0x3220,   // 3220..325F; COMMON
            0x3260,   // 3260..327E; HANGUL
            0x327F,   // 327F..32CF; COMMON
            0x32D0,   // 32D0..3357; KATAKANA
            0x3358,   // 3358..33FF; COMMON
            0x3400,   // 3400..4DBF; HAN
            0x4DC0,   // 4DC0..4DFF; COMMON
            0x4E00,   // 4E00..9FFF; HAN
            0xA000,   // A000..A4CF; YI
            0xA4D0,   // A4D0..A4FF; LISU
            0xA500,   // A500..A63F; VAI
            0xA640,   // A640..A69F; CYRILLIC
            0xA6A0,   // A6A0..A6FF; BAMUM
            0xA700,   // A700..A721; COMMON
            0xA722,   // A722..A787; LATIN
            0xA788,   // A788..A78A; COMMON
            0xA78B,   // A78B..A7FF; LATIN
            0xA800,   // A800..A82F; SYLOTI_NAGRI
            0xA830,   // A830..A83F; COMMON
            0xA840,   // A840..A87F; PHAGS_PA
            0xA880,   // A880..A8DF; SAURASHTRA
            0xA8E0,   // A8E0..A8FF; DEVANAGARI
            0xA900,   // A900..A92F; KAYAH_LI
            0xA930,   // A930..A95F; REJANG
            0xA960,   // A960..A97F; HANGUL
            0xA980,   // A980..A9FF; JAVANESE
            0xAA00,   // AA00..AA5F; CHAM
            0xAA60,   // AA60..AA7F; MYANMAR
            0xAA80,   // AA80..AADF; TAI_VIET
            0xAAE0,   // AAE0..AB00; MEETEI_MAYEK
            0xAB01,   // AB01..ABBF; ETHIOPIC
            0xABC0,   // ABC0..ABFF; MEETEI_MAYEK
            0xAC00,   // AC00..D7FB; HANGUL
            0xD7FC,   // D7FC..F8FF; UNKNOWN
            0xF900,   // F900..FAFF; HAN
            0xFB00,   // FB00..FB12; LATIN
            0xFB13,   // FB13..FB1C; ARMENIAN
            0xFB1D,   // FB1D..FB4F; HEBREW
            0xFB50,   // FB50..FD3D; ARABIC
            0xFD3E,   // FD3E..FD4F; COMMON
            0xFD50,   // FD50..FDFC; ARABIC
            0xFDFD,   // FDFD..FDFF; COMMON
            0xFE00,   // FE00..FE0F; INHERITED
            0xFE10,   // FE10..FE1F; COMMON
            0xFE20,   // FE20..FE2F; INHERITED
            0xFE30,   // FE30..FE6F; COMMON
            0xFE70,   // FE70..FEFE; ARABIC
            0xFEFF,   // FEFF..FF20; COMMON
            0xFF21,   // FF21..FF3A; LATIN
            0xFF3B,   // FF3B..FF40; COMMON
            0xFF41,   // FF41..FF5A; LATIN
            0xFF5B,   // FF5B..FF65; COMMON
            0xFF66,   // FF66..FF6F; KATAKANA
            0xFF70,   // FF70..FF70; COMMON
            0xFF71,   // FF71..FF9D; KATAKANA
            0xFF9E,   // FF9E..FF9F; COMMON
            0xFFA0,   // FFA0..FFDF; HANGUL
            0xFFE0,   // FFE0..FFFF; COMMON
            0x10000,  // 10000..100FF; LINEAR_B
            0x10100,  // 10100..1013F; COMMON
            0x10140,  // 10140..1018F; GREEK
            0x10190,  // 10190..101FC; COMMON
            0x101FD,  // 101FD..1027F; INHERITED
            0x10280,  // 10280..1029F; LYCIAN
            0x102A0,  // 102A0..102FF; CARIAN
            0x10300,  // 10300..1032F; OLD_ITALIC
            0x10330,  // 10330..1037F; GOTHIC
            0x10380,  // 10380..1039F; UGARITIC
            0x103A0,  // 103A0..103FF; OLD_PERSIAN
            0x10400,  // 10400..1044F; DESERET
            0x10450,  // 10450..1047F; SHAVIAN
            0x10480,  // 10480..107FF; OSMANYA
            0x10800,  // 10800..1083F; CYPRIOT
            0x10840,  // 10840..108FF; IMPERIAL_ARAMAIC
            0x10900,  // 10900..1091F; PHOENICIAN
            0x10920,  // 10920..1097F; LYDIAN
            0x10980,  // 10980..1099F; MEROITIC_HIEROGLYPHS
            0x109A0,  // 109A0..109FF; MEROITIC_CURSIVE
            0x10A00,  // 10A00..10A5F; KHAROSHTHI
            0x10A60,  // 10A60..10AFF; OLD_SOUTH_ARABIAN
            0x10B00,  // 10B00..10B3F; AVESTAN
            0x10B40,  // 10B40..10B5F; INSCRIPTIONAL_PARTHIAN
            0x10B60,  // 10B60..10BFF; INSCRIPTIONAL_PAHLAVI
            0x10C00,  // 10C00..10E5F; OLD_TURKIC
            0x10E60,  // 10E60..10FFF; ARABIC
            0x11000,  // 11000..1107F; BRAHMI
            0x11080,  // 11080..110CF; KAITHI
            0x110D0,  // 110D0..110FF; SORA_SOMPENG
            0x11100,  // 11100..1117F; CHAKMA
            0x11180,  // 11180..1167F; SHARADA
            0x11680,  // 11680..116CF; TAKRI
            0x12000,  // 12000..12FFF; CUNEIFORM
            0x13000,  // 13000..167FF; EGYPTIAN_HIEROGLYPHS
            0x16800,  // 16800..16A38; BAMUM
            0x16F00,  // 16F00..16F9F; MIAO
            0x1B000,  // 1B000..1B000; KATAKANA
            0x1B001,  // 1B001..1CFFF; HIRAGANA
            0x1D000,  // 1D000..1D166; COMMON
            0x1D167,  // 1D167..1D169; INHERITED
            0x1D16A,  // 1D16A..1D17A; COMMON
            0x1D17B,  // 1D17B..1D182; INHERITED
            0x1D183,  // 1D183..1D184; COMMON
            0x1D185,  // 1D185..1D18B; INHERITED
            0x1D18C,  // 1D18C..1D1A9; COMMON
            0x1D1AA,  // 1D1AA..1D1AD; INHERITED
            0x1D1AE,  // 1D1AE..1D1FF; COMMON
            0x1D200,  // 1D200..1D2FF; GREEK
            0x1D300,  // 1D300..1EDFF; COMMON
            0x1EE00,  // 1EE00..1EFFF; ARABIC
            0x1F000,  // 1F000..1F1FF; COMMON
            0x1F200,  // 1F200..1F200; HIRAGANA
            0x1F201,  // 1F201..1FFFF; COMMON
            0x20000,  // 20000..E0000; HAN
            0xE0001,  // E0001..E00FF; COMMON
            0xE0100,  // E0100..E01EF; INHERITED
            0xE01F0   // E01F0..10FFFF; UNKNOWN

        };

        private static final UnicodeScript[] scripts = {
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            BOPOMOFO,
            COMMON,
            INHERITED,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COMMON,
            GREEK,
            COPTIC,
            GREEK,
            CYRILLIC,
            INHERITED,
            CYRILLIC,
            ARMENIAN,
            COMMON,
            ARMENIAN,
            HEBREW,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            ARABIC,
            INHERITED,
            ARABIC,
            COMMON,
            ARABIC,
            INHERITED,
            ARABIC,
            COMMON,
            ARABIC,
            SYRIAC,
            ARABIC,
            THAANA,
            NKO,
            SAMARITAN,
            MANDAIC,
            ARABIC,
            DEVANAGARI,
            INHERITED,
            DEVANAGARI,
            COMMON,
            DEVANAGARI,
            BENGALI,
            GURMUKHI,
            GUJARATI,
            ORIYA,
            TAMIL,
            TELUGU,
            KANNADA,
            MALAYALAM,
            SINHALA,
            THAI,
            COMMON,
            THAI,
            LAO,
            TIBETAN,
            COMMON,
            TIBETAN,
            MYANMAR,
            GEORGIAN,
            COMMON,
            GEORGIAN,
            HANGUL,
            ETHIOPIC,
            CHEROKEE,
            CANADIAN_ABORIGINAL,
            OGHAM,
            RUNIC,
            COMMON,
            RUNIC,
            TAGALOG,
            HANUNOO,
            COMMON,
            BUHID,
            TAGBANWA,
            KHMER,
            MONGOLIAN,
            COMMON,
            MONGOLIAN,
            COMMON,
            MONGOLIAN,
            CANADIAN_ABORIGINAL,
            LIMBU,
            TAI_LE,
            NEW_TAI_LUE,
            KHMER,
            BUGINESE,
            TAI_THAM,
            BALINESE,
            SUNDANESE,
            BATAK,
            LEPCHA,
            OL_CHIKI,
            SUNDANESE,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            LATIN,
            GREEK,
            CYRILLIC,
            LATIN,
            GREEK,
            LATIN,
            GREEK,
            LATIN,
            CYRILLIC,
            LATIN,
            GREEK,
            INHERITED,
            LATIN,
            GREEK,
            COMMON,
            INHERITED,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            INHERITED,
            COMMON,
            GREEK,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            BRAILLE,
            COMMON,
            GLAGOLITIC,
            LATIN,
            COPTIC,
            GEORGIAN,
            TIFINAGH,
            ETHIOPIC,
            CYRILLIC,
            COMMON,
            HAN,
            COMMON,
            HAN,
            COMMON,
            HAN,
            COMMON,
            HAN,
            INHERITED,
            HANGUL,
            COMMON,
            HAN,
            COMMON,
            HIRAGANA,
            INHERITED,
            COMMON,
            HIRAGANA,
            COMMON,
            KATAKANA,
            COMMON,
            KATAKANA,
            BOPOMOFO,
            HANGUL,
            COMMON,
            BOPOMOFO,
            COMMON,
            KATAKANA,
            HANGUL,
            COMMON,
            HANGUL,
            COMMON,
            KATAKANA,
            COMMON,
            HAN,
            COMMON,
            HAN,
            YI,
            LISU,
            VAI,
            CYRILLIC,
            BAMUM,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            SYLOTI_NAGRI,
            COMMON,
            PHAGS_PA,
            SAURASHTRA,
            DEVANAGARI,
            KAYAH_LI,
            REJANG,
            HANGUL,
            JAVANESE,
            CHAM,
            MYANMAR,
            TAI_VIET,
            MEETEI_MAYEK,
            ETHIOPIC,
            MEETEI_MAYEK,
            HANGUL,
            UNKNOWN,
            HAN,
            LATIN,
            ARMENIAN,
            HEBREW,
            ARABIC,
            COMMON,
            ARABIC,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            ARABIC,
            COMMON,
            LATIN,
            COMMON,
            LATIN,
            COMMON,
            KATAKANA,
            COMMON,
            KATAKANA,
            COMMON,
            HANGUL,
            COMMON,
            LINEAR_B,
            COMMON,
            GREEK,
            COMMON,
            INHERITED,
            LYCIAN,
            CARIAN,
            OLD_ITALIC,
            GOTHIC,
            UGARITIC,
            OLD_PERSIAN,
            DESERET,
            SHAVIAN,
            OSMANYA,
            CYPRIOT,
            IMPERIAL_ARAMAIC,
            PHOENICIAN,
            LYDIAN,
            MEROITIC_HIEROGLYPHS,
            MEROITIC_CURSIVE,
            KHAROSHTHI,
            OLD_SOUTH_ARABIAN,
            AVESTAN,
            INSCRIPTIONAL_PARTHIAN,
            INSCRIPTIONAL_PAHLAVI,
            OLD_TURKIC,
            ARABIC,
            BRAHMI,
            KAITHI,
            SORA_SOMPENG,
            CHAKMA,
            SHARADA,
            TAKRI,
            CUNEIFORM,
            EGYPTIAN_HIEROGLYPHS,
            BAMUM,
            MIAO,
            KATAKANA,
            HIRAGANA,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            INHERITED,
            COMMON,
            GREEK,
            COMMON,
            ARABIC,
            COMMON,
            HIRAGANA,
            COMMON,
            HAN,
            COMMON,
            INHERITED,
            UNKNOWN
        };

        private static HashMap<String, Character.UnicodeScript> aliases;
        static {
            aliases = new HashMap<>(128);
            aliases.put("ARAB", ARABIC);
            aliases.put("ARMI", IMPERIAL_ARAMAIC);
            aliases.put("ARMN", ARMENIAN);
            aliases.put("AVST", AVESTAN);
            aliases.put("BALI", BALINESE);
            aliases.put("BAMU", BAMUM);
            aliases.put("BATK", BATAK);
            aliases.put("BENG", BENGALI);
            aliases.put("BOPO", BOPOMOFO);
            aliases.put("BRAI", BRAILLE);
            aliases.put("BRAH", BRAHMI);
            aliases.put("BUGI", BUGINESE);
            aliases.put("BUHD", BUHID);
            aliases.put("CAKM", CHAKMA);
            aliases.put("CANS", CANADIAN_ABORIGINAL);
            aliases.put("CARI", CARIAN);
            aliases.put("CHAM", CHAM);
            aliases.put("CHER", CHEROKEE);
            aliases.put("COPT", COPTIC);
            aliases.put("CPRT", CYPRIOT);
            aliases.put("CYRL", CYRILLIC);
            aliases.put("DEVA", DEVANAGARI);
            aliases.put("DSRT", DESERET);
            aliases.put("EGYP", EGYPTIAN_HIEROGLYPHS);
            aliases.put("ETHI", ETHIOPIC);
            aliases.put("GEOR", GEORGIAN);
            aliases.put("GLAG", GLAGOLITIC);
            aliases.put("GOTH", GOTHIC);
            aliases.put("GREK", GREEK);
            aliases.put("GUJR", GUJARATI);
            aliases.put("GURU", GURMUKHI);
            aliases.put("HANG", HANGUL);
            aliases.put("HANI", HAN);
            aliases.put("HANO", HANUNOO);
            aliases.put("HEBR", HEBREW);
            aliases.put("HIRA", HIRAGANA);
            // it appears we don't have the KATAKANA_OR_HIRAGANA
            //aliases.put("HRKT", KATAKANA_OR_HIRAGANA);
            aliases.put("ITAL", OLD_ITALIC);
            aliases.put("JAVA", JAVANESE);
            aliases.put("KALI", KAYAH_LI);
            aliases.put("KANA", KATAKANA);
            aliases.put("KHAR", KHAROSHTHI);
            aliases.put("KHMR", KHMER);
            aliases.put("KNDA", KANNADA);
            aliases.put("KTHI", KAITHI);
            aliases.put("LANA", TAI_THAM);
            aliases.put("LAOO", LAO);
            aliases.put("LATN", LATIN);
            aliases.put("LEPC", LEPCHA);
            aliases.put("LIMB", LIMBU);
            aliases.put("LINB", LINEAR_B);
            aliases.put("LISU", LISU);
            aliases.put("LYCI", LYCIAN);
            aliases.put("LYDI", LYDIAN);
            aliases.put("MAND", MANDAIC);
            aliases.put("MERC", MEROITIC_CURSIVE);
            aliases.put("MERO", MEROITIC_HIEROGLYPHS);
            aliases.put("MLYM", MALAYALAM);
            aliases.put("MONG", MONGOLIAN);
            aliases.put("MTEI", MEETEI_MAYEK);
            aliases.put("MYMR", MYANMAR);
            aliases.put("NKOO", NKO);
            aliases.put("OGAM", OGHAM);
            aliases.put("OLCK", OL_CHIKI);
            aliases.put("ORKH", OLD_TURKIC);
            aliases.put("ORYA", ORIYA);
            aliases.put("OSMA", OSMANYA);
            aliases.put("PHAG", PHAGS_PA);
            aliases.put("PLRD", MIAO);
            aliases.put("PHLI", INSCRIPTIONAL_PAHLAVI);
            aliases.put("PHNX", PHOENICIAN);
            aliases.put("PRTI", INSCRIPTIONAL_PARTHIAN);
            aliases.put("RJNG", REJANG);
            aliases.put("RUNR", RUNIC);
            aliases.put("SAMR", SAMARITAN);
            aliases.put("SARB", OLD_SOUTH_ARABIAN);
            aliases.put("SAUR", SAURASHTRA);
            aliases.put("SHAW", SHAVIAN);
            aliases.put("SHRD", SHARADA);
            aliases.put("SINH", SINHALA);
            aliases.put("SORA", SORA_SOMPENG);
            aliases.put("SUND", SUNDANESE);
            aliases.put("SYLO", SYLOTI_NAGRI);
            aliases.put("SYRC", SYRIAC);
            aliases.put("TAGB", TAGBANWA);
            aliases.put("TALE", TAI_LE);
            aliases.put("TAKR", TAKRI);
            aliases.put("TALU", NEW_TAI_LUE);
            aliases.put("TAML", TAMIL);
            aliases.put("TAVT", TAI_VIET);
            aliases.put("TELU", TELUGU);
            aliases.put("TFNG", TIFINAGH);
            aliases.put("TGLG", TAGALOG);
            aliases.put("THAA", THAANA);
            aliases.put("THAI", THAI);
            aliases.put("TIBT", TIBETAN);
            aliases.put("UGAR", UGARITIC);
            aliases.put("VAII", VAI);
            aliases.put("XPEO", OLD_PERSIAN);
            aliases.put("XSUX", CUNEIFORM);
            aliases.put("YIII", YI);
            aliases.put("ZINH", INHERITED);
            aliases.put("ZYYY", COMMON);
            aliases.put("ZZZZ", UNKNOWN);
        }

        /**
         * Returns the enum constant representing the Unicode script to which
         * the given character (Unicode code point) is assigned.
         *
         * @param   codePoint the character (Unicode code point) in question.
         * @return  The {@code UnicodeScript} constant representing the
         *          Unicode script to which this character is assigned.
         *
         * @exception IllegalArgumentException if the specified
         * {@code codePoint} is an invalid Unicode code point.
         * @see Character#isValidCodePoint(int)
         *
         */
        public static UnicodeScript of(int codePoint) {
            if (!isValidCodePoint(codePoint))
                throw new IllegalArgumentException();
            int type = getType(codePoint);
            // leave SURROGATE and PRIVATE_USE for table lookup
            if (type == UNASSIGNED)
                return UNKNOWN;
            int index = Arrays.binarySearch(scriptStarts, codePoint);
            if (index < 0)
                index = -index - 2;
            return scripts[index];
        }

        /**
         * Returns the UnicodeScript constant with the given Unicode script
         * name or the script name alias. Script names and their aliases are
         * determined by The Unicode Standard. The files Scripts&lt;version&gt;.txt
         * and PropertyValueAliases&lt;version&gt;.txt define script names
         * and the script name aliases for a particular version of the
         * standard. The {@link Character} class specifies the version of
         * the standard that it supports.
         * <p>
         * Character case is ignored for all of the valid script names.
         * The en_US locale's case mapping rules are used to provide
         * case-insensitive string comparisons for script name validation.
         *
         * @param scriptName A {@code UnicodeScript} name.
         * @return The {@code UnicodeScript} constant identified
         *         by {@code scriptName}
         * @throws IllegalArgumentException if {@code scriptName} is an
         *         invalid name
         * @throws NullPointerException if {@code scriptName} is null
         */
        public static final UnicodeScript forName(String scriptName) {
            scriptName = scriptName.toUpperCase(Locale.ENGLISH);
                                 //.replace(' ', '_'));
            UnicodeScript sc = aliases.get(scriptName);
            if (sc != null)
                return sc;
            return valueOf(scriptName);
        }
    }

    /**
     * The value of the {@code Character}.
     *
     * @serial
     */
    private final char value;

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 3786198910865385080L;

    /**
     * Constructs a newly allocated {@code Character} object that
     * represents the specified {@code char} value.
     *
     * @param  value   the value to be represented by the
     *                  {@code Character} object.
     */
    public Character(char value) {
        this.value = value;
    }

    private static class CharacterCache {
        private CharacterCache(){}

        static final Character cache[] = new Character[127 + 1];

        static {
            for (int i = 0; i < cache.length; i++)
                cache[i] = new Character((char)i);
        }
    }

    /**
     * Returns a <tt>Character</tt> instance representing the specified
     * <tt>char</tt> value.
     * If a new <tt>Character</tt> instance is not required, this method
     * should generally be used in preference to the constructor
     * {@link #Character(char)}, as this method is likely to yield
     * significantly better space and time performance by caching
     * frequently requested values.
     *
     * This method will always cache values in the range {@code
     * '\u005Cu0000'} to {@code '\u005Cu007F'}, inclusive, and may
     * cache other values outside of this range.
     *
     * @param  c a char value.
     * @return a <tt>Character</tt> instance representing <tt>c</tt>.
     * @since  1.5
     */
    public static Character valueOf(char c) {
        if (c <= 127) { // must cache
            return CharacterCache.cache[(int)c];
        }
        return new Character(c);
    }

    /**
     * Returns the value of this {@code Character} object.
     * @return  the primitive {@code char} value represented by
     *          this object.
     */
    public char charValue() {
        return value;
    }

    /**
     * Returns a hash code for this {@code Character}; equal to the result
     * of invoking {@code charValue()}.
     *
     * @return a hash code value for this {@code Character}
     */
    @Override
    public int hashCode() {
        return Character.hashCode(value);
    }

    /**
     * Returns a hash code for a {@code char} value; compatible with
     * {@code Character.hashCode()}.
     *
     * @since 1.8
     *
     * @param value The {@code char} for which to return a hash code.
     * @return a hash code value for a {@code char} value.
     */
    public static int hashCode(char value) {
        return (int)value;
    }

    /**
     * Compares this object against the specified object.
     * The result is {@code true} if and only if the argument is not
     * {@code null} and is a {@code Character} object that
     * represents the same {@code char} value as this object.
     *
     * @param   obj   the object to compare with.
     * @return  {@code true} if the objects are the same;
     *          {@code false} otherwise.
     */
    public boolean equals(Object obj) {
        if (obj instanceof Character) {
            return value == ((Character)obj).charValue();
        }
        return false;
    }

    /**
     * Returns a {@code String} object representing this
     * {@code Character}'s value. The result is a string of
     * length 1 whose sole component is the primitive
     * {@code char} value represented by this
     * {@code Character} object.
     *
     * @return  a string representation of this object.
     */
    public String toString() {
        char buf[] = {value};
        return String.valueOf(buf);
    }

    /**
     * Returns a {@code String} object representing the
     * specified {@code char}. The result is a string of length
     * 1 consisting solely of the specified {@code char}.
     *
     * @param c the {@code char} to be converted
     * @return the string representation of the specified {@code char}
     * @since 1.4
     */
    public static String toString(char c) {
        return String.valueOf(c);
    }

    /**
     * Determines whether the specified code point is a valid
     * <a href="http://www.unicode.org/glossary/#code_point">
     * Unicode code point value</a>.
     *
     * @param  codePoint the Unicode code point to be tested
     * @return {@code true} if the specified code point value is between
     *         {@link #MIN_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isValidCodePoint(int codePoint) {
        // Optimized form of:
        //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
        int plane = codePoint >>> 16;
        return plane < ((MAX_CODE_POINT + 1) >>> 16);
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.
     * Such code points can be represented using a single {@code char}.
     *
     * @param  codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is between
     *         {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;
     *         {@code false} otherwise.
     * @since  1.7
     */
    public static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
        // Optimized form of:
        //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
        // We consistently use logical shift (>>>) to facilitate
        // additional runtime optimizations.
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is in the <a href="#supplementary">supplementary character</a> range.
     *
     * @param  codePoint the character (Unicode code point) to be tested
     * @return {@code true} if the specified code point is between
     *         {@link #MIN_SUPPLEMENTARY_CODE_POINT} and
     *         {@link #MAX_CODE_POINT} inclusive;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isSupplementaryCodePoint(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <  MAX_CODE_POINT + 1;
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * Unicode high-surrogate code unit</a>
     * (also known as <i>leading-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_HIGH_SURROGATE} and
     *         {@link #MAX_HIGH_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see    Character#isLowSurrogate(char)
     * @see    Character.UnicodeBlock#of(int)
     * @since  1.5
     */
    public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * Unicode low-surrogate code unit</a>
     * (also known as <i>trailing-surrogate code unit</i>).
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_LOW_SURROGATE} and
     *         {@link #MAX_LOW_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @see    Character#isHighSurrogate(char)
     * @since  1.5
     */
    public static boolean isLowSurrogate(char ch) {
        return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
    }

    /**
     * Determines if the given {@code char} value is a Unicode
     * <i>surrogate code unit</i>.
     *
     * <p>Such values do not represent characters by themselves,
     * but are used in the representation of
     * <a href="#supplementary">supplementary characters</a>
     * in the UTF-16 encoding.
     *
     * <p>A char value is a surrogate code unit if and only if it is either
     * a {@linkplain #isLowSurrogate(char) low-surrogate code unit} or
     * a {@linkplain #isHighSurrogate(char) high-surrogate code unit}.
     *
     * @param  ch the {@code char} value to be tested.
     * @return {@code true} if the {@code char} value is between
     *         {@link #MIN_SURROGATE} and
     *         {@link #MAX_SURROGATE} inclusive;
     *         {@code false} otherwise.
     * @since  1.7
     */
    public static boolean isSurrogate(char ch) {
        return ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1);
    }

    /**
     * Determines whether the specified pair of {@code char}
     * values is a valid
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * Unicode surrogate pair</a>.
     *
     * <p>This method is equivalent to the expression:
     * <blockquote><pre>{@code
     * isHighSurrogate(high) && isLowSurrogate(low)
     * }</pre></blockquote>
     *
     * @param  high the high-surrogate code value to be tested
     * @param  low the low-surrogate code value to be tested
     * @return {@code true} if the specified high and
     *         low-surrogate code values represent a valid surrogate pair;
     *         {@code false} otherwise.
     * @since  1.5
     */
    public static boolean isSurrogatePair(char high, char low) {
        return isHighSurrogate(high) && isLowSurrogate(low);
    }

    /**
     * Determines the number of {@code char} values needed to
     * represent the specified character (Unicode code point). If the
     * specified character is equal to or greater than 0x10000, then
     * the method returns 2. Otherwise, the method returns 1.
     *
     * <p>This method does not check that the specified character is a
     * valid Unicode code point. The caller must validate the
     * character value using {@link #isValidCodePoint(int) isValidCodePoint}
     * if necessary.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  2 if the character is a valid supplementary character; 1 otherwise.
     * @see     Character#isSupplementaryCodePoint(int)
     * @since   1.5
     */
    public static int charCount(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
    }

    /**
     * Converts the specified surrogate pair to its supplementary code
     * point value.
     * This method does not validate the specified
     * surrogate pair. The caller must validate it using {@link
     * #isSurrogatePair(char, char) isSurrogatePair} if necessary.
     *
     * @param  high the high-surrogate code unit
     * @param  low the low-surrogate code unit
     * @return the supplementary code point composed from the
     *         specified surrogate pair.
     * @since  1.5
     */
    public static int toCodePoint(char high, char low) {
        // Optimized form of:
        // return ((high - MIN_HIGH_SURROGATE) << 10)
        //         + (low - MIN_LOW_SURROGATE)
        //         + MIN_SUPPLEMENTARY_CODE_POINT;
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }

    /**
     * Returns the code point at the given index of the
     * {@code CharSequence}. If the {@code char} value at
     * the given index in the {@code CharSequence} is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code CharSequence}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param seq a sequence of {@code char} values (Unicode code
     * units)
     * @param index the index to the {@code char} values (Unicode
     * code units) in {@code seq} to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the value
     * {@code index} is negative or not less than
     * {@link CharSequence#length() seq.length()}.
     * @since  1.5
     */
    public static int codePointAt(CharSequence seq, int index) {
        char c1 = seq.charAt(index);
        if (isHighSurrogate(c1) && ++index < seq.length()) {
            char c2 = seq.charAt(index);
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array. If the {@code char} value at
     * the given index in the {@code char} array is in the
     * high-surrogate range, the following index is less than the
     * length of the {@code char} array, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     * code units) in the {@code char} array to be converted
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the value
     * {@code index} is negative or not less than
     * the length of the {@code char} array.
     * @since  1.5
     */
    public static int codePointAt(char[] a, int index) {
        return codePointAtImpl(a, index, a.length);
    }

    /**
     * Returns the code point at the given index of the
     * {@code char} array, where only array elements with
     * {@code index} less than {@code limit} can be used.
     * If the {@code char} value at the given index in the
     * {@code char} array is in the high-surrogate range, the
     * following index is less than the {@code limit}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param a the {@code char} array
     * @param index the index to the {@code char} values (Unicode
     * code units) in the {@code char} array to be converted
     * @param limit the index after the last array element that
     * can be used in the {@code char} array
     * @return the Unicode code point at the given index
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is negative or not less than the {@code limit}
     * argument, or if the {@code limit} argument is negative or
     * greater than the length of the {@code char} array.
     * @since  1.5
     */
    public static int codePointAt(char[] a, int index, int limit) {
        if (index >= limit || limit < 0 || limit > a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointAtImpl(a, index, limit);
    }

    // throws ArrayIndexOutOfBoundsException if index out of bounds
    static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index];
        if (isHighSurrogate(c1) && ++index < limit) {
            char c2 = a[index];
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code CharSequence}.
     * If the {@code char} value at
     * {@code (index - 1)} in the {@code CharSequence} is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code CharSequence} is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param seq the {@code CharSequence} instance
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is less than 1 or greater than {@link
     * CharSequence#length() seq.length()}.
     * @since  1.5
     */
    public static int codePointBefore(CharSequence seq, int index) {
        char c2 = seq.charAt(--index);
        if (isLowSurrogate(c2) && index > 0) {
            char c1 = seq.charAt(--index);
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array. If the {@code char} value at
     * {@code (index - 1)} in the {@code char} array is in
     * the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index - 2)}
     * in the {@code char} array is in the
     * high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is less than 1 or greater than the length of the
     * {@code char} array
     * @since  1.5
     */
    public static int codePointBefore(char[] a, int index) {
        return codePointBeforeImpl(a, index, 0);
    }

    /**
     * Returns the code point preceding the given index of the
     * {@code char} array, where only array elements with
     * {@code index} greater than or equal to {@code start}
     * can be used. If the {@code char} value at {@code (index - 1)}
     * in the {@code char} array is in the
     * low-surrogate range, {@code (index - 2)} is not less than
     * {@code start}, and the {@code char} value at
     * {@code (index - 2)} in the {@code char} array is in
     * the high-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at {@code (index - 1)} is
     * returned.
     *
     * @param a the {@code char} array
     * @param index the index following the code point that should be returned
     * @param start the index of the first array element in the
     * {@code char} array
     * @return the Unicode code point value before the given index.
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if the {@code index}
     * argument is not greater than the {@code start} argument or
     * is greater than the length of the {@code char} array, or
     * if the {@code start} argument is negative or not less than
     * the length of the {@code char} array.
     * @since  1.5
     */
    public static int codePointBefore(char[] a, int index, int start) {
        if (index <= start || start < 0 || start >= a.length) {
            throw new IndexOutOfBoundsException();
        }
        return codePointBeforeImpl(a, index, start);
    }

    // throws ArrayIndexOutOfBoundsException if index-1 out of bounds
    static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2) && index > start) {
            char c1 = a[--index];
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
    }

    /**
     * Returns the leading surrogate (a
     * <a href="http://www.unicode.org/glossary/#high_surrogate_code_unit">
     * high surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding. If the specified character
     * is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isHighSurrogate isHighSurrogate}{@code (highSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (highSurrogate(x), }{@link #lowSurrogate lowSurrogate}{@code (x)) == x}
     * are also always {@code true}.
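     *
     * <p>For illustration only, a minimal sketch of this round trip;
     * the sample code point {@code 0x1F600} is an arbitrary choice made
     * for this example and is not part of the specification:
     * <blockquote><pre>{@code
     * int x = 0x1F600;                           // a supplementary code point
     * char high = Character.highSurrogate(x);    // 0xD83D
     * char low = Character.lowSurrogate(x);      // 0xDE00
     * boolean leading = Character.isHighSurrogate(high);      // true
     * boolean same = Character.toCodePoint(high, low) == x;   // true
     * }</pre></blockquote>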
     *
     * @param   codePoint a supplementary character (Unicode code point)
     * @return  the leading surrogate code unit used to represent the
     *          character in the UTF-16 encoding
     * @since   1.7
     */
    public static char highSurrogate(int codePoint) {
        return (char) ((codePoint >>> 10)
            + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
    }

    /**
     * Returns the trailing surrogate (a
     * <a href="http://www.unicode.org/glossary/#low_surrogate_code_unit">
     * low surrogate code unit</a>) of the
     * <a href="http://www.unicode.org/glossary/#surrogate_pair">
     * surrogate pair</a>
     * representing the specified supplementary character (Unicode
     * code point) in the UTF-16 encoding. If the specified character
     * is not a
     * <a href="Character.html#supplementary">supplementary character</a>,
     * an unspecified {@code char} is returned.
     *
     * <p>If
     * {@link #isSupplementaryCodePoint isSupplementaryCodePoint(x)}
     * is {@code true}, then
     * {@link #isLowSurrogate isLowSurrogate}{@code (lowSurrogate(x))} and
     * {@link #toCodePoint toCodePoint}{@code (}{@link #highSurrogate highSurrogate}{@code (x), lowSurrogate(x)) == x}
     * are also always {@code true}.
     *
     * @param   codePoint a supplementary character (Unicode code point)
     * @return  the trailing surrogate code unit used to represent the
     *          character in the UTF-16 encoding
     * @since   1.7
     */
    public static char lowSurrogate(int codePoint) {
        return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation. If the specified code point is a BMP
     * (Basic Multilingual Plane or Plane 0) value, the same value is
     * stored in {@code dst[dstIndex]}, and 1 is returned.
     * If the
     * specified code point is a supplementary character, its
     * surrogate values are stored in {@code dst[dstIndex]}
     * (high-surrogate) and {@code dst[dstIndex+1]}
     * (low-surrogate), and 2 is returned.
     *
     * @param  codePoint the character (Unicode code point) to be converted.
     * @param  dst an array of {@code char} in which the
     * {@code codePoint}'s UTF-16 value is stored.
     * @param dstIndex the start index into the {@code dst}
     * array where the converted value is stored.
     * @return 1 if the code point is a BMP code point, 2 if the
     * code point is a supplementary code point.
     * @exception IllegalArgumentException if the specified
     * {@code codePoint} is not a valid Unicode code point.
     * @exception NullPointerException if the specified {@code dst} is null.
     * @exception IndexOutOfBoundsException if {@code dstIndex}
     * is negative or not less than {@code dst.length}, or if
     * {@code dst} at {@code dstIndex} doesn't have enough
     * array element(s) to store the resulting {@code char}
     * value(s). (If {@code dstIndex} is equal to
     * {@code dst.length-1} and the specified
     * {@code codePoint} is a supplementary character, the
     * high-surrogate value is not stored in
     * {@code dst[dstIndex]}.)
     * @since  1.5
     */
    public static int toChars(int codePoint, char[] dst, int dstIndex) {
        if (isBmpCodePoint(codePoint)) {
            dst[dstIndex] = (char) codePoint;
            return 1;
        } else if (isValidCodePoint(codePoint)) {
            toSurrogates(codePoint, dst, dstIndex);
            return 2;
        } else {
            throw new IllegalArgumentException();
        }
    }

    /**
     * Converts the specified character (Unicode code point) to its
     * UTF-16 representation stored in a {@code char} array.
     * If the specified code point is a BMP (Basic Multilingual Plane or
     * Plane 0) value, the resulting {@code char} array has
     * the same value as {@code codePoint}. If the specified code
     * point is a supplementary code point, the resulting
     * {@code char} array has the corresponding surrogate pair.
     *
     * @param  codePoint a Unicode code point
     * @return a {@code char} array having
     *         {@code codePoint}'s UTF-16 representation.
     * @exception IllegalArgumentException if the specified
     * {@code codePoint} is not a valid Unicode code point.
     * @since  1.5
     */
    public static char[] toChars(int codePoint) {
        if (isBmpCodePoint(codePoint)) {
            return new char[] { (char) codePoint };
        } else if (isValidCodePoint(codePoint)) {
            char[] result = new char[2];
            toSurrogates(codePoint, result, 0);
            return result;
        } else {
            throw new IllegalArgumentException();
        }
    }

    static void toSurrogates(int codePoint, char[] dst, int index) {
        // We write elements "backwards" to guarantee all-or-nothing
        dst[index+1] = lowSurrogate(codePoint);
        dst[index] = highSurrogate(codePoint);
    }

    /**
     * Returns the number of Unicode code points in the text range of
     * the specified char sequence. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}. Unpaired surrogates within
     * the text range count as one code point each.
     *
     * @param seq the char sequence
     * @param beginIndex the index to the first {@code char} of
     * the text range.
     * @param endIndex the index after the last {@code char} of
     * the text range.
     * @return the number of Unicode code points in the specified text
     * range
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if the
     * {@code beginIndex} is negative, or {@code endIndex}
     * is larger than the length of the given sequence, or
     * {@code beginIndex} is larger than {@code endIndex}.
     * @since  1.5
     */
    public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) {
        int length = seq.length();
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        int n = endIndex - beginIndex;
        for (int i = beginIndex; i < endIndex; ) {
            if (isHighSurrogate(seq.charAt(i++)) && i < endIndex &&
                isLowSurrogate(seq.charAt(i))) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the number of Unicode code points in a subarray of the
     * {@code char} array argument. The {@code offset}
     * argument is the index of the first {@code char} of the
     * subarray and the {@code count} argument specifies the
     * length of the subarray in {@code char}s. Unpaired
     * surrogates within the subarray count as one code point each.
     *
     * @param a the {@code char} array
     * @param offset the index of the first {@code char} in the
     * given {@code char} array
     * @param count the length of the subarray in {@code char}s
     * @return the number of Unicode code points in the specified subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException if {@code offset} or
     * {@code count} is negative, or if {@code offset +
     * count} is larger than the length of the given array.
     * @since  1.5
     */
    public static int codePointCount(char[] a, int offset, int count) {
        if (count > a.length - offset || offset < 0 || count < 0) {
            throw new IndexOutOfBoundsException();
        }
        return codePointCountImpl(a, offset, count);
    }

    static int codePointCountImpl(char[] a, int offset, int count) {
        int endIndex = offset + count;
        int n = count;
        for (int i = offset; i < endIndex; ) {
            if (isHighSurrogate(a[i++]) && i < endIndex &&
                isLowSurrogate(a[i])) {
                n--;
                i++;
            }
        }
        return n;
    }

    /**
     * Returns the index within the given char sequence that is offset
     * from the given {@code index} by {@code codePointOffset}
     * code points. Unpaired surrogates within the text range given by
     * {@code index} and {@code codePointOffset} count as
     * one code point each.
     *
     * @param seq the char sequence
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the char sequence
     * @exception NullPointerException if {@code seq} is null.
     * @exception IndexOutOfBoundsException if {@code index}
     *   is negative or larger than the length of the char sequence,
     *   or if {@code codePointOffset} is positive and the
     *   subsequence starting with {@code index} has fewer than
     *   {@code codePointOffset} code points, or if
     *   {@code codePointOffset} is negative and the subsequence
     *   before {@code index} has fewer than the absolute value
     *   of {@code codePointOffset} code points.
     * @since 1.5
     */
    public static int offsetByCodePoints(CharSequence seq, int index,
                                         int codePointOffset) {
        int length = seq.length();
        if (index < 0 || index > length) {
            throw new IndexOutOfBoundsException();
        }

        int x = index;
        if (codePointOffset >= 0) {
            int i;
            for (i = 0; x < length && i < codePointOffset; i++) {
                if (isHighSurrogate(seq.charAt(x++)) && x < length &&
                    isLowSurrogate(seq.charAt(x))) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else {
            int i;
            for (i = codePointOffset; x > 0 && i < 0; i++) {
                if (isLowSurrogate(seq.charAt(--x)) && x > 0 &&
                    isHighSurrogate(seq.charAt(x-1))) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Returns the index within the given {@code char} subarray
     * that is offset from the given {@code index} by
     * {@code codePointOffset} code points. The
     * {@code start} and {@code count} arguments specify a
     * subarray of the {@code char} array. Unpaired surrogates
     * within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param a the {@code char} array
     * @param start the index of the first {@code char} of the
     * subarray
     * @param count the length of the subarray in {@code char}s
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within the subarray
     * @exception NullPointerException if {@code a} is null.
     * @exception IndexOutOfBoundsException
     *   if {@code start} or {@code count} is negative,
     *   or if {@code start + count} is larger than the length of
     *   the given array,
     *   or if {@code index} is less than {@code start} or
     *   larger than {@code start + count},
     *   or if {@code codePointOffset} is positive and the text range
     *   starting with {@code index} and ending with {@code start + count - 1}
     *   has fewer than {@code codePointOffset} code
     *   points,
     *   or if {@code codePointOffset} is negative and the text range
     *   starting with {@code start} and ending with {@code index - 1}
     *   has fewer than the absolute value of
     *   {@code codePointOffset} code points.
     * @since 1.5
     */
    public static int offsetByCodePoints(char[] a, int start, int count,
                                         int index, int codePointOffset) {
        if (count > a.length-start || start < 0 || count < 0
            || index < start || index > start+count) {
            throw new IndexOutOfBoundsException();
        }
        return offsetByCodePointsImpl(a, start, count, index, codePointOffset);
    }

    static int offsetByCodePointsImpl(char[]a, int start, int count,
                                      int index, int codePointOffset) {
        int x = index;
        if (codePointOffset >= 0) {
            int limit = start + count;
            int i;
            for (i = 0; x < limit && i < codePointOffset; i++) {
                if (isHighSurrogate(a[x++]) && x < limit &&
                    isLowSurrogate(a[x])) {
                    x++;
                }
            }
            if (i < codePointOffset) {
                throw new IndexOutOfBoundsException();
            }
        } else {
            int i;
            for (i = codePointOffset; x > start && i < 0; i++) {
                if (isLowSurrogate(a[--x]) && x > start &&
                    isHighSurrogate(a[x-1])) {
                    x--;
                }
            }
            if (i < 0) {
                throw new IndexOutOfBoundsException();
            }
        }
        return x;
    }

    /**
     * Determines if the specified character is a lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of lowercase characters:
     * <blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLowerCase(int)} method.
     *
     * @param   ch   the character to be tested.
     * @return  {@code true} if the character is lowercase;
     *          {@code false} otherwise.
     * @see     Character#isLowerCase(char)
     * @see     Character#isTitleCase(char)
     * @see     Character#toLowerCase(char)
     * @see     Character#getType(char)
     */
    public static boolean isLowerCase(char ch) {
        return isLowerCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a
     * lowercase character.
     * <p>
     * A character is lowercase if its general category type, provided
     * by {@link Character#getType getType(codePoint)}, is
     * {@code LOWERCASE_LETTER}, or it has contributory property
     * Other_Lowercase as defined by the Unicode Standard.
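     * <p>
     * For illustration only, a minimal sketch; the sample values are
     * arbitrary choices made for this example:
     * <blockquote><pre>{@code
     * Character.isLowerCase((int) 'a');   // true: general category LOWERCASE_LETTER
     * Character.isLowerCase((int) 'A');   // false: an uppercase letter
     * Character.isLowerCase(0x10428);     // true: a supplementary lowercase letter
     * }</pre></blockquote>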
     * <p>
     * The following are examples of lowercase characters:
     * <blockquote><pre>
     * a b c d e f g h i j k l m n o p q r s t u v w x y z
     * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
     * '\u00E7' '\u00E8' '\u00E9' '\u00EA' '\u00EB' '\u00EC' '\u00ED' '\u00EE'
     * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
     * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
     * </pre></blockquote>
     * <p> Many other Unicode characters are lowercase too.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is lowercase;
     *          {@code false} otherwise.
     * @see     Character#isLowerCase(int)
     * @see     Character#isTitleCase(int)
     * @see     Character#toLowerCase(int)
     * @see     Character#getType(int)
     * @since   1.5
     */
    public static boolean isLowerCase(int codePoint) {
        return getType(codePoint) == Character.LOWERCASE_LETTER ||
               CharacterData.of(codePoint).isOtherLowercase(codePoint);
    }

    /**
     * Determines if the specified character is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@code Character.getType(ch)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUpperCase(int)} method.
     *
     * @param   ch   the character to be tested.
     * @return  {@code true} if the character is uppercase;
     *          {@code false} otherwise.
     * @see     Character#isLowerCase(char)
     * @see     Character#isTitleCase(char)
     * @see     Character#toUpperCase(char)
     * @see     Character#getType(char)
     * @since   1.0
     */
    public static boolean isUpperCase(char ch) {
        return isUpperCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is an uppercase character.
     * <p>
     * A character is uppercase if its general category type, provided by
     * {@link Character#getType(int) getType(codePoint)}, is {@code UPPERCASE_LETTER},
     * or it has contributory property Other_Uppercase as defined by the Unicode Standard.
     * <p>
     * The following are examples of uppercase characters:
     * <blockquote><pre>
     * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
     * '\u00C8' '\u00C9' '\u00CA' '\u00CB' '\u00CC' '\u00CD' '\u00CE' '\u00CF'
     * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
     * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
     * </pre></blockquote>
     * <p> Many other Unicode characters are uppercase too.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is uppercase;
     *          {@code false} otherwise.
     * @see     Character#isLowerCase(int)
     * @see     Character#isTitleCase(int)
     * @see     Character#toUpperCase(int)
     * @see     Character#getType(int)
     * @since   1.5
     */
    public static boolean isUpperCase(int codePoint) {
        return getType(codePoint) == Character.UPPERCASE_LETTER ||
               CharacterData.of(codePoint).isOtherUppercase(codePoint);
    }

    /**
     * Determines if the specified character is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@code Character.getType(ch)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isTitleCase(int)} method.
     *
     * @param   ch   the character to be tested.
     * @return  {@code true} if the character is titlecase;
     *          {@code false} otherwise.
     * @see     Character#isLowerCase(char)
     * @see     Character#isUpperCase(char)
     * @see     Character#toTitleCase(char)
     * @see     Character#getType(char)
     * @since   1.0.2
     */
    public static boolean isTitleCase(char ch) {
        return isTitleCase((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a titlecase character.
     * <p>
     * A character is a titlecase character if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is {@code TITLECASE_LETTER}.
     * <p>
     * Some characters look like pairs of Latin letters. For example, there
     * is an uppercase letter that looks like "LJ" and has a corresponding
     * lowercase letter that looks like "lj". A third form, which looks like "Lj",
     * is the appropriate form to use when rendering a word in lowercase
     * with initial capitals, as for a book title.
     * <p>
     * These are some of the Unicode characters for which this method returns
     * {@code true}:
     * <ul>
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}
     * <li>{@code LATIN CAPITAL LETTER L WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER N WITH SMALL LETTER J}
     * <li>{@code LATIN CAPITAL LETTER D WITH SMALL LETTER Z}
     * </ul>
     * <p> Many other Unicode characters are titlecase too.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is titlecase;
     *          {@code false} otherwise.
     * @see     Character#isLowerCase(int)
     * @see     Character#isUpperCase(int)
     * @see     Character#toTitleCase(int)
     * @see     Character#getType(int)
     * @since   1.5
     */
    public static boolean isTitleCase(int codePoint) {
        return getType(codePoint) == Character.TITLECASE_LETTER;
    }

    /**
     * Determines if the specified character is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@code Character.getType(ch)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     *     ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     *     Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     *     Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     *     Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     *     Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isDigit(int)} method.
     *
     * @param   ch   the character to be tested.
     * @return  {@code true} if the character is a digit;
     *          {@code false} otherwise.
     * @see     Character#digit(char, int)
     * @see     Character#forDigit(int, int)
     * @see     Character#getType(char)
     */
    public static boolean isDigit(char ch) {
        return isDigit((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a digit.
     * <p>
     * A character is a digit if its general category type, provided
     * by {@link Character#getType(int) getType(codePoint)}, is
     * {@code DECIMAL_DIGIT_NUMBER}.
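     * <p>
     * A minimal usage sketch of the rule above, not part of this class. The class
     * {@code DigitDemo} and its helper {@code decimalValue} are hypothetical names
     * introduced for illustration. The sketch shows that only characters of
     * category {@code DECIMAL_DIGIT_NUMBER} count as digits; other numeric
     * characters do not:

```java
final class DigitDemo {
    // Hypothetical helper: the decimal value of a digit in any script, or -1
    // when the code point is not a DECIMAL_DIGIT_NUMBER.
    static int decimalValue(int codePoint) {
        return Character.isDigit(codePoint) ? Character.digit(codePoint, 10) : -1;
    }

    public static void main(String[] args) {
        System.out.println(decimalValue('7'));    // ISO-LATIN-1 digit
        System.out.println(decimalValue(0x0665)); // ARABIC-INDIC DIGIT FIVE
        System.out.println(decimalValue(0xFF19)); // FULLWIDTH DIGIT NINE
        System.out.println(decimalValue(0x2167)); // ROMAN NUMERAL EIGHT, category LETTER_NUMBER
        System.out.println(decimalValue(0x00B2)); // SUPERSCRIPT TWO, category OTHER_NUMBER
        System.out.println(decimalValue('x'));    // a letter
    }
}
```

     * The explicit {@code isDigit} guard is a choice made for this sketch, to keep
     * the category test visible next to the value lookup.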
     * <p>
     * Some Unicode character ranges that contain digits:
     * <ul>
     * <li>{@code '\u005Cu0030'} through {@code '\u005Cu0039'},
     *     ISO-LATIN-1 digits ({@code '0'} through {@code '9'})
     * <li>{@code '\u005Cu0660'} through {@code '\u005Cu0669'},
     *     Arabic-Indic digits
     * <li>{@code '\u005Cu06F0'} through {@code '\u005Cu06F9'},
     *     Extended Arabic-Indic digits
     * <li>{@code '\u005Cu0966'} through {@code '\u005Cu096F'},
     *     Devanagari digits
     * <li>{@code '\u005CuFF10'} through {@code '\u005CuFF19'},
     *     Fullwidth digits
     * </ul>
     *
     * Many other character ranges contain digits as well.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a digit;
     *          {@code false} otherwise.
     * @see     Character#forDigit(int, int)
     * @see     Character#getType(int)
     * @since   1.5
     */
    public static boolean isDigit(int codePoint) {
        return getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    /**
     * Determines if a character is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isDefined(int)} method.
     *
     * @param   ch   the character to be tested
     * @return  {@code true} if the character has a defined meaning
     *          in Unicode; {@code false} otherwise.
     * @see     Character#isDigit(char)
     * @see     Character#isLetter(char)
     * @see     Character#isLetterOrDigit(char)
     * @see     Character#isLowerCase(char)
     * @see     Character#isTitleCase(char)
     * @see     Character#isUpperCase(char)
     * @since   1.0.2
     */
    public static boolean isDefined(char ch) {
        return isDefined((int)ch);
    }

    /**
     * Determines if a character (Unicode code point) is defined in Unicode.
     * <p>
     * A character is defined if at least one of the following is true:
     * <ul>
     * <li>It has an entry in the UnicodeData file.
     * <li>It has a value in a range defined by the UnicodeData file.
     * </ul>
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character has a defined meaning
     *          in Unicode; {@code false} otherwise.
     * @see     Character#isDigit(int)
     * @see     Character#isLetter(int)
     * @see     Character#isLetterOrDigit(int)
     * @see     Character#isLowerCase(int)
     * @see     Character#isTitleCase(int)
     * @see     Character#isUpperCase(int)
     * @since   1.5
     */
    public static boolean isDefined(int codePoint) {
        return getType(codePoint) != Character.UNASSIGNED;
    }

    /**
     * Determines if the specified character is a letter.
     * <p>
     * A character is considered to be a letter if its general
     * category type, provided by {@code Character.getType(ch)},
     * is any of the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * </ul>
     *
     * Not all letters have case. Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetter(int)} method.
     *
     * @param   ch   the character to be tested.
     * @return  {@code true} if the character is a letter;
     *          {@code false} otherwise.
     * @see     Character#isDigit(char)
     * @see     Character#isJavaIdentifierStart(char)
     * @see     Character#isJavaLetter(char)
     * @see     Character#isJavaLetterOrDigit(char)
     * @see     Character#isLetterOrDigit(char)
     * @see     Character#isLowerCase(char)
     * @see     Character#isTitleCase(char)
     * @see     Character#isUnicodeIdentifierStart(char)
     * @see     Character#isUpperCase(char)
     */
    public static boolean isLetter(char ch) {
        return isLetter((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter.
     * <p>
     * A character is considered to be a letter if its general
     * category type, provided by {@link Character#getType(int) getType(codePoint)},
     * is any of the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * </ul>
     *
     * Not all letters have case. Many characters are
     * letters but are neither uppercase nor lowercase nor titlecase.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a letter;
     *          {@code false} otherwise.
     * @see     Character#isDigit(int)
     * @see     Character#isJavaIdentifierStart(int)
     * @see     Character#isLetterOrDigit(int)
     * @see     Character#isLowerCase(int)
     * @see     Character#isTitleCase(int)
     * @see     Character#isUnicodeIdentifierStart(int)
     * @see     Character#isUpperCase(int)
     * @since   1.5
     */
    public static boolean isLetter(int codePoint) {
        // Build a bit set with one bit per letter category, then test the bit
        // that corresponds to this code point's general category.
        return ((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is a letter or digit.
     * <p>
     * A character is considered to be a letter or digit if either
     * {@code Character.isLetter(char ch)} or
     * {@code Character.isDigit(char ch)} returns
     * {@code true} for the character.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isLetterOrDigit(int)} method.
     *
     * @param   ch   the character to be tested.
     * @return  {@code true} if the character is a letter or digit;
     *          {@code false} otherwise.
     * @see     Character#isDigit(char)
     * @see     Character#isJavaIdentifierPart(char)
     * @see     Character#isJavaLetter(char)
     * @see     Character#isJavaLetterOrDigit(char)
     * @see     Character#isLetter(char)
     * @see     Character#isUnicodeIdentifierPart(char)
     * @since   1.0.2
     */
    public static boolean isLetterOrDigit(char ch) {
        return isLetterOrDigit((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a letter or digit.
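     * <p>
     * The letter test and the letter-or-digit test both check membership in a
     * set of general categories with a single shift and mask. The sketch below
     * restates that technique outside this class. {@code CategoryMaskDemo},
     * {@code LETTER_MASK} and {@code inCategories} are hypothetical names
     * introduced for illustration:

```java
final class CategoryMaskDemo {
    // One bit per general category. Every value returned by getType is
    // below 32, so the whole set fits in an int.
    static final int LETTER_MASK =
            (1 << Character.UPPERCASE_LETTER)
          | (1 << Character.LOWERCASE_LETTER)
          | (1 << Character.TITLECASE_LETTER)
          | (1 << Character.MODIFIER_LETTER)
          | (1 << Character.OTHER_LETTER);

    // Hypothetical helper: true when the bit for the code point's general
    // category is set in the mask.
    static boolean inCategories(int mask, int codePoint) {
        return ((mask >> Character.getType(codePoint)) & 1) != 0;
    }

    public static void main(String[] args) {
        // Adding one more bit widens the set to letters and decimal digits.
        int letterOrDigitMask = LETTER_MASK | (1 << Character.DECIMAL_DIGIT_NUMBER);

        System.out.println(inCategories(LETTER_MASK, 'q'));
        System.out.println(inCategories(LETTER_MASK, '7'));
        System.out.println(inCategories(letterOrDigitMask, '7'));
    }
}
```

     * The appeal of the mask is that the cost stays constant however many
     * categories the set contains.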
     * <p>
     * A character is considered to be a letter or digit if either
     * {@link #isLetter(int) isLetter(codePoint)} or
     * {@link #isDigit(int) isDigit(codePoint)} returns
     * {@code true} for the character.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a letter or digit;
     *          {@code false} otherwise.
     * @see     Character#isDigit(int)
     * @see     Character#isJavaIdentifierPart(int)
     * @see     Character#isLetter(int)
     * @see     Character#isUnicodeIdentifierPart(int)
     * @since   1.5
     */
    public static boolean isLetterOrDigit(int codePoint) {
        return ((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER) |
            (1 << Character.DECIMAL_DIGIT_NUMBER)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is permissible as the first
     * character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param   ch the character to be tested.
     * @return  {@code true} if the character may start a Java
     *          identifier; {@code false} otherwise.
     * @see     Character#isJavaLetterOrDigit(char)
     * @see     Character#isJavaIdentifierStart(char)
     * @see     Character#isJavaIdentifierPart(char)
     * @see     Character#isLetter(char)
     * @see     Character#isLetterOrDigit(char)
     * @see     Character#isUnicodeIdentifierStart(char)
     * @since   1.0.2
     * @deprecated Replaced by isJavaIdentifierStart(char).
     */
    @Deprecated
    public static boolean isJavaLetter(char ch) {
        return isJavaIdentifierStart(ch);
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if and only if any
     * of the following conditions are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for the character.
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param   ch the character to be tested.
     * @return  {@code true} if the character may be part of a
     *          Java identifier; {@code false} otherwise.
     * @see     Character#isJavaLetter(char)
     * @see     Character#isJavaIdentifierStart(char)
     * @see     Character#isJavaIdentifierPart(char)
     * @see     Character#isLetter(char)
     * @see     Character#isLetterOrDigit(char)
     * @see     Character#isUnicodeIdentifierPart(char)
     * @see     Character#isIdentifierIgnorable(char)
     * @since   1.0.2
     * @deprecated Replaced by isJavaIdentifierPart(char).
     */
    @Deprecated
    public static boolean isJavaLetterOrDigit(char ch) {
        return isJavaIdentifierPart(ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is alphabetic.
     * <p>
     * A character is considered to be alphabetic if its general category type,
     * provided by {@link Character#getType(int) getType(codePoint)}, is any of
     * the following:
     * <ul>
     * <li> {@code UPPERCASE_LETTER}
     * <li> {@code LOWERCASE_LETTER}
     * <li> {@code TITLECASE_LETTER}
     * <li> {@code MODIFIER_LETTER}
     * <li> {@code OTHER_LETTER}
     * <li> {@code LETTER_NUMBER}
     * </ul>
     * or it has contributory property Other_Alphabetic as defined by the
     * Unicode Standard.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a Unicode alphabetic
     *          character, {@code false} otherwise.
     * @since   1.7
     */
    public static boolean isAlphabetic(int codePoint) {
        return (((((1 << Character.UPPERCASE_LETTER) |
            (1 << Character.LOWERCASE_LETTER) |
            (1 << Character.TITLECASE_LETTER) |
            (1 << Character.MODIFIER_LETTER) |
            (1 << Character.OTHER_LETTER) |
            (1 << Character.LETTER_NUMBER)) >> getType(codePoint)) & 1) != 0) ||
            CharacterData.of(codePoint).isOtherAlphabetic(codePoint);
    }

    /**
     * Determines if the specified character (Unicode code point) is a CJKV
     * (Chinese, Japanese, Korean and Vietnamese) ideograph, as defined by
     * the Unicode Standard.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a Unicode ideograph
     *          character, {@code false} otherwise.
     * @since   1.7
     */
    public static boolean isIdeographic(int codePoint) {
        return CharacterData.of(codePoint).isIdeographic(codePoint);
    }

    /**
     * Determines if the specified character is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
     * <li> {@code ch} is a currency symbol (such as {@code '$'})
     * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierStart(int)} method.
     *
     * @param   ch the character to be tested.
     * @return  {@code true} if the character may start a Java identifier;
     *          {@code false} otherwise.
     * @see     Character#isJavaIdentifierPart(char)
     * @see     Character#isLetter(char)
     * @see     Character#isUnicodeIdentifierStart(char)
     * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since   1.1
     */
    public static boolean isJavaIdentifierStart(char ch) {
        return isJavaIdentifierStart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) is
     * permissible as the first character in a Java identifier.
     * <p>
     * A character may start a Java identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}
     * <li> the referenced character is a currency symbol (such as {@code '$'})
     * <li> the referenced character is a connecting punctuation character
     *      (such as {@code '_'}).
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character may start a Java identifier;
     *          {@code false} otherwise.
     * @see     Character#isJavaIdentifierPart(int)
     * @see     Character#isLetter(int)
     * @see     Character#isUnicodeIdentifierStart(int)
     * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since   1.5
     */
    public static boolean isJavaIdentifierStart(int codePoint) {
        return CharacterData.of(codePoint).isJavaIdentifierStart(codePoint);
    }

    /**
     * Determines if the specified character may be part of a Java
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for the character
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isJavaIdentifierPart(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  {@code true} if the character may be part of a
     *          Java identifier; {@code false} otherwise.
     * @see     Character#isIdentifierIgnorable(char)
     * @see     Character#isJavaIdentifierStart(char)
     * @see     Character#isLetterOrDigit(char)
     * @see     Character#isUnicodeIdentifierPart(char)
     * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since   1.1
     */
    public static boolean isJavaIdentifierPart(char ch) {
        return isJavaIdentifierPart((int)ch);
    }

    /**
     * Determines if the character (Unicode code point) may be part of a Java
     * identifier as other than the first character.
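     * <p>
     * The start and part tests are typically combined to check a whole string.
     * The sketch below is one way to do that, not part of this class. The class
     * {@code IdentifierDemo} and its helper {@code hasIdentifierShape} are
     * hypothetical names introduced for illustration, and the helper checks
     * character rules only:

```java
final class IdentifierDemo {
    // Hypothetical helper: true when every code point is allowed at its
    // position in a Java identifier. Reserved words are not rejected.
    static boolean hasIdentifierShape(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Advance by code point so a surrogate pair is tested as one character.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasIdentifierShape("total_1")); // letter, then letters, '_' and a digit
        System.out.println(hasIdentifierShape("$value"));  // currency symbol may start
        System.out.println(hasIdentifierShape("1total"));  // digit may not start
        System.out.println(hasIdentifierShape("a-b"));     // '-' is not an identifier part
    }
}
```

     * A string such as {@code "class"} passes this check because reserved
     * words are a language rule rather than a character rule.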
     * <p>
     * A character may be part of a Java identifier if any of the following
     * conditions are true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a currency symbol (such as {@code '$'})
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@link #isIdentifierIgnorable(int)
     * isIdentifierIgnorable(codePoint)} returns {@code true} for
     * the code point
     * </ul>
     *
     * These conditions are tested against the character information from version
     * 6.2 of the Unicode Standard.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character may be part of a
     *          Java identifier; {@code false} otherwise.
     * @see     Character#isIdentifierIgnorable(int)
     * @see     Character#isJavaIdentifierStart(int)
     * @see     Character#isLetterOrDigit(int)
     * @see     Character#isUnicodeIdentifierPart(int)
     * @see     javax.lang.model.SourceVersion#isIdentifier(CharSequence)
     * @since   1.5
     */
    public static boolean isJavaIdentifierPart(int codePoint) {
        return CharacterData.of(codePoint).isJavaIdentifierPart(codePoint);
    }

    /**
     * Determines if the specified character is permissible as the
     * first character in a Unicode identifier.
     * <p>
     * A character may start a Unicode identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
     * <li> {@link #getType(char) getType(ch)} returns
     *      {@code LETTER_NUMBER}.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUnicodeIdentifierStart(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  {@code true} if the character may start a Unicode
     *          identifier; {@code false} otherwise.
     * @see     Character#isJavaIdentifierStart(char)
     * @see     Character#isLetter(char)
     * @see     Character#isUnicodeIdentifierPart(char)
     * @since   1.1
     */
    public static boolean isUnicodeIdentifierStart(char ch) {
        return isUnicodeIdentifierStart((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is permissible as the
     * first character in a Unicode identifier.
     * <p>
     * A character may start a Unicode identifier if and only if
     * one of the following conditions is true:
     * <ul>
     * <li> {@link #isLetter(int) isLetter(codePoint)}
     *      returns {@code true}
     * <li> {@link #getType(int) getType(codePoint)}
     *      returns {@code LETTER_NUMBER}.
     * </ul>
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character may start a Unicode
     *          identifier; {@code false} otherwise.
     * @see     Character#isJavaIdentifierStart(int)
     * @see     Character#isLetter(int)
     * @see     Character#isUnicodeIdentifierPart(int)
     * @since   1.5
     */
    public static boolean isUnicodeIdentifierStart(int codePoint) {
        return CharacterData.of(codePoint).isUnicodeIdentifierStart(codePoint);
    }

    /**
     * Determines if the specified character may be part of a Unicode
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Unicode identifier if and only if
     * one of the following statements is true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for this character.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isUnicodeIdentifierPart(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  {@code true} if the character may be part of a
     *          Unicode identifier; {@code false} otherwise.
     * @see     Character#isIdentifierIgnorable(char)
     * @see     Character#isJavaIdentifierPart(char)
     * @see     Character#isLetterOrDigit(char)
     * @see     Character#isUnicodeIdentifierStart(char)
     * @since   1.1
     */
    public static boolean isUnicodeIdentifierPart(char ch) {
        return isUnicodeIdentifierPart((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) may be part of a Unicode
     * identifier as other than the first character.
     * <p>
     * A character may be part of a Unicode identifier if and only if
     * one of the following statements is true:
     * <ul>
     * <li>  it is a letter
     * <li>  it is a connecting punctuation character (such as {@code '_'})
     * <li>  it is a digit
     * <li>  it is a numeric letter (such as a Roman numeral character)
     * <li>  it is a combining mark
     * <li>  it is a non-spacing mark
     * <li> {@code isIdentifierIgnorable} returns
     * {@code true} for this character.
     * </ul>
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character may be part of a
     *          Unicode identifier; {@code false} otherwise.
     * @see     Character#isIdentifierIgnorable(int)
     * @see     Character#isJavaIdentifierPart(int)
     * @see     Character#isLetterOrDigit(int)
     * @see     Character#isUnicodeIdentifierStart(int)
     * @since   1.5
     */
    public static boolean isUnicodeIdentifierPart(int codePoint) {
        return CharacterData.of(codePoint).isUnicodeIdentifierPart(codePoint);
    }

    /**
     * Determines if the specified character should be regarded as
     * an ignorable character in a Java identifier or a Unicode identifier.
     * <p>
     * The following Unicode characters are ignorable in a Java identifier
     * or a Unicode identifier:
     * <ul>
     * <li>ISO control characters that are not whitespace
     * <ul>
     * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'}
     * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'}
     * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'}
     * </ul>
     *
     * <li>all characters that have the {@code FORMAT} general
     * category value
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isIdentifierIgnorable(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  {@code true} if the character is an ignorable control
     *          character that may be part of a Java or Unicode identifier;
     *          {@code false} otherwise.
     * @see     Character#isJavaIdentifierPart(char)
     * @see     Character#isUnicodeIdentifierPart(char)
     * @since   1.1
     */
    public static boolean isIdentifierIgnorable(char ch) {
        return isIdentifierIgnorable((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) should be regarded as
     * an ignorable character in a Java identifier or a Unicode identifier.
     * <p>
     * The following Unicode characters are ignorable in a Java identifier
     * or a Unicode identifier:
     * <ul>
     * <li>ISO control characters that are not whitespace
     * <ul>
     * <li>{@code '\u005Cu0000'} through {@code '\u005Cu0008'}
     * <li>{@code '\u005Cu000E'} through {@code '\u005Cu001B'}
     * <li>{@code '\u005Cu007F'} through {@code '\u005Cu009F'}
     * </ul>
     *
     * <li>all characters that have the {@code FORMAT} general
     * category value
     * </ul>
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is an ignorable control
     *          character that may be part of a Java or Unicode identifier;
     *          {@code false} otherwise.
     * @see     Character#isJavaIdentifierPart(int)
     * @see     Character#isUnicodeIdentifierPart(int)
     * @since   1.5
     */
    public static boolean isIdentifierIgnorable(int codePoint) {
        return CharacterData.of(codePoint).isIdentifierIgnorable(codePoint);
    }

    /**
     * Converts the character argument to lowercase using case
     * mapping information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isLowerCase(Character.toLowerCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase.
     * {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toLowerCase(int)} method.
     *
     * @param   ch   the character to be converted.
     * @return  the lowercase equivalent of the character, if any;
     *          otherwise, the character itself.
     * @see     Character#isLowerCase(char)
     * @see     String#toLowerCase()
     */
    public static char toLowerCase(char ch) {
        return (char)toLowerCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to
     * lowercase using case mapping information from the UnicodeData
     * file.
     *
     * <p>Note that
     * {@code Character.isLowerCase(Character.toLowerCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toLowerCase()} should be used to map
     * characters to lowercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * @param   codePoint   the character (Unicode code point) to be converted.
     * @return  the lowercase equivalent of the character (Unicode code
     *          point), if any; otherwise, the character itself.
     * @see     Character#isLowerCase(int)
     * @see     String#toLowerCase()
     *
     * @since   1.5
     */
    public static int toLowerCase(int codePoint) {
        return CharacterData.of(codePoint).toLowerCase(codePoint);
    }

    /**
     * Converts the character argument to uppercase using case mapping
     * information from the UnicodeData file.
     * <p>
     * Note that
     * {@code Character.isUpperCase(Character.toUpperCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toUpperCase(int)} method.
     *
     * @param   ch   the character to be converted.
     * @return  the uppercase equivalent of the character, if any;
     *          otherwise, the character itself.
     * @see     Character#isUpperCase(char)
     * @see     String#toUpperCase()
     */
    public static char toUpperCase(char ch) {
        return (char)toUpperCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to
     * uppercase using case mapping information from the UnicodeData
     * file.
     *
     * <p>Note that
     * {@code Character.isUpperCase(Character.toUpperCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters, particularly those that are symbols or ideographs.
     *
     * <p>In general, {@link String#toUpperCase()} should be used to map
     * characters to uppercase. {@code String} case mapping methods
     * have several benefits over {@code Character} case mapping methods.
     * {@code String} case mapping methods can perform locale-sensitive
     * mappings, context-sensitive mappings, and 1:M character mappings, whereas
     * the {@code Character} case mapping methods cannot.
     *
     * @param   codePoint   the character (Unicode code point) to be converted.
     * @return  the uppercase equivalent of the character, if any;
     *          otherwise, the character itself.
     * @see     Character#isUpperCase(int)
     * @see     String#toUpperCase()
     *
     * @since   1.5
     */
    public static int toUpperCase(int codePoint) {
        return CharacterData.of(codePoint).toUpperCase(codePoint);
    }

    /**
     * Converts the character argument to titlecase using case mapping
     * information from the UnicodeData file. If a character has no
     * explicit titlecase mapping and is not itself a titlecase char
     * according to UnicodeData, then the uppercase mapping is
     * returned as an equivalent titlecase mapping. If the
     * {@code char} argument is already a titlecase
     * {@code char}, the same {@code char} value will be
     * returned.
     * <p>
     * Note that
     * {@code Character.isTitleCase(Character.toTitleCase(ch))}
     * does not always return {@code true} for some ranges of
     * characters.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #toTitleCase(int)} method.
     *
     * @param   ch   the character to be converted.
     * @return  the titlecase equivalent of the character, if any;
     *          otherwise, the character itself.
     * @see     Character#isTitleCase(char)
     * @see     Character#toLowerCase(char)
     * @see     Character#toUpperCase(char)
     * @since   1.0.2
     */
    public static char toTitleCase(char ch) {
        return (char)toTitleCase((int)ch);
    }

    /**
     * Converts the character (Unicode code point) argument to titlecase using case mapping
     * information from the UnicodeData file. If a character has no
     * explicit titlecase mapping and is not itself a titlecase char
     * according to UnicodeData, then the uppercase mapping is
     * returned as an equivalent titlecase mapping. If the
     * character argument is already a titlecase
     * character, the same character value will be
     * returned.
     *
     * <p>Note that
     * {@code Character.isTitleCase(Character.toTitleCase(codePoint))}
     * does not always return {@code true} for some ranges of
     * characters.
     *
     * @param   codePoint   the character (Unicode code point) to be converted.
     * @return  the titlecase equivalent of the character, if any;
     *          otherwise, the character itself.
     * @see     Character#isTitleCase(int)
     * @see     Character#toLowerCase(int)
     * @see     Character#toUpperCase(int)
     * @since   1.5
     */
    public static int toTitleCase(int codePoint) {
        return CharacterData.of(codePoint).toTitleCase(codePoint);
    }

    /**
     * Returns the numeric value of the character {@code ch} in the
     * specified radix.
     * <p>
     * If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * value of {@code ch} is not a valid digit in the specified
     * radix, {@code -1} is returned. A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@code isDigit} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code ch - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code ch - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case, {@code ch - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case, {@code ch - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #digit(int, int)} method.
     *
     * @param   ch      the character to be converted.
     * @param   radix   the radix.
     * @return  the numeric value represented by the character in the
     *          specified radix.
     * @see     Character#forDigit(int, int)
     * @see     Character#isDigit(char)
     */
    public static int digit(char ch, int radix) {
        return digit((int)ch, radix);
    }

    /**
     * Returns the numeric value of the specified character (Unicode
     * code point) in the specified radix.
     *
     * <p>If the radix is not in the range {@code MIN_RADIX} ≤
     * {@code radix} ≤ {@code MAX_RADIX} or if the
     * character is not a valid digit in the specified
     * radix, {@code -1} is returned. A character is a valid digit
     * if at least one of the following is true:
     * <ul>
     * <li>The method {@link #isDigit(int) isDigit(codePoint)} is {@code true} of the character
     *     and the Unicode decimal digit value of the character (or its
     *     single-character decomposition) is less than the specified radix.
     *     In this case the decimal digit value is returned.
     * <li>The character is one of the uppercase Latin letters
     *     {@code 'A'} through {@code 'Z'} and its code is less than
     *     {@code radix + 'A' - 10}.
     *     In this case, {@code codePoint - 'A' + 10}
     *     is returned.
     * <li>The character is one of the lowercase Latin letters
     *     {@code 'a'} through {@code 'z'} and its code is less than
     *     {@code radix + 'a' - 10}.
     *     In this case, {@code codePoint - 'a' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth uppercase Latin letters A
     *     ({@code '\u005CuFF21'}) through Z ({@code '\u005CuFF3A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF21' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF21' + 10}
     *     is returned.
     * <li>The character is one of the fullwidth lowercase Latin letters a
     *     ({@code '\u005CuFF41'}) through z ({@code '\u005CuFF5A'})
     *     and its code is less than
     *     {@code radix + '\u005CuFF41' - 10}.
     *     In this case,
     *     {@code codePoint - '\u005CuFF41' + 10}
     *     is returned.
     * </ul>
     *
     * @param   codePoint the character (Unicode code point) to be converted.
     * @param   radix   the radix.
     * @return  the numeric value represented by the character in the
     *          specified radix.
     * @see     Character#forDigit(int, int)
     * @see     Character#isDigit(int)
     * @since   1.5
     */
    public static int digit(int codePoint, int radix) {
        return CharacterData.of(codePoint).digit(codePoint, radix);
    }

    /**
     * Returns the {@code int} value that the specified Unicode
     * character represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35. This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getNumericValue(int)} method.
     *
     * @param   ch      the character to be converted.
     * @return  the numeric value of the character, as a nonnegative {@code int}
     *          value; -2 if the character has a numeric value that is not a
     *          nonnegative integer; -1 if the character has no numeric value.
     * @see     Character#forDigit(int, int)
     * @see     Character#isDigit(char)
     * @since   1.1
     */
    public static int getNumericValue(char ch) {
        return getNumericValue((int)ch);
    }

    /**
     * Returns the {@code int} value that the specified
     * character (Unicode code point) represents. For example, the character
     * {@code '\u005Cu216C'} (the Roman numeral fifty) will return
     * an {@code int} with a value of 50.
     * <p>
     * The letters A-Z in their uppercase ({@code '\u005Cu0041'} through
     * {@code '\u005Cu005A'}), lowercase
     * ({@code '\u005Cu0061'} through {@code '\u005Cu007A'}), and
     * full width variant ({@code '\u005CuFF21'} through
     * {@code '\u005CuFF3A'} and {@code '\u005CuFF41'} through
     * {@code '\u005CuFF5A'}) forms have numeric values from 10
     * through 35. This is independent of the Unicode specification,
     * which does not assign numeric values to these {@code char}
     * values.
     * <p>
     * If the character does not have a numeric value, then -1 is returned.
     * If the character has a numeric value that cannot be represented as a
     * nonnegative integer (for example, a fractional value), then -2
     * is returned.
     *
     * @param   codePoint the character (Unicode code point) to be converted.
     * @return  the numeric value of the character, as a nonnegative {@code int}
     *          value; -2 if the character has a numeric value that is not a
     *          nonnegative integer; -1 if the character has no numeric value.
     * @see     Character#forDigit(int, int)
     * @see     Character#isDigit(int)
     * @since   1.5
     */
    public static int getNumericValue(int codePoint) {
        return CharacterData.of(codePoint).getNumericValue(codePoint);
    }

    /**
     * Determines if the specified character is ISO-LATIN-1 white space.
     * This method returns {@code true} for the following five
     * characters only:
     * <table summary="truechars">
     * <tr><td>{@code '\t'}</td>            <td>{@code U+0009}</td>
     *     <td>{@code HORIZONTAL TABULATION}</td></tr>
     * <tr><td>{@code '\n'}</td>            <td>{@code U+000A}</td>
     *     <td>{@code NEW LINE}</td></tr>
     * <tr><td>{@code '\f'}</td>            <td>{@code U+000C}</td>
     *     <td>{@code FORM FEED}</td></tr>
     * <tr><td>{@code '\r'}</td>            <td>{@code U+000D}</td>
     *     <td>{@code CARRIAGE RETURN}</td></tr>
     * <tr><td>{@code ' '}</td>             <td>{@code U+0020}</td>
     *     <td>{@code SPACE}</td></tr>
     * </table>
     *
     * @param      ch   the character to be tested.
     * @return     {@code true} if the character is ISO-LATIN-1 white
     *             space; {@code false} otherwise.
     * @see        Character#isSpaceChar(char)
     * @see        Character#isWhitespace(char)
     * @deprecated Replaced by {@link #isWhitespace(char)}.
     */
    @Deprecated
    public static boolean isSpace(char ch) {
        // Bit n of the 64-bit mask is set for each white space character
        // U+n; shifting right by ch moves the bit for ch into position 0.
        return (ch <= 0x0020) &&
            (((((1L << 0x0009) |
            (1L << 0x000A) |
            (1L << 0x000C) |
            (1L << 0x000D) |
            (1L << 0x0020)) >> ch) & 1L) != 0);
    }


    /**
     * Determines if the specified character is a Unicode space character.
     * A character is considered to be a space character if and only if
     * it is specified to be a space character by the Unicode Standard. This
     * method returns {@code true} if the character's general category type is any of
     * the following:
     * <ul>
     * <li> {@code SPACE_SEPARATOR}
     * <li> {@code LINE_SEPARATOR}
     * <li> {@code PARAGRAPH_SEPARATOR}
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isSpaceChar(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  {@code true} if the character is a space character;
     *          {@code false} otherwise.
     * @see     Character#isWhitespace(char)
     * @since   1.1
     */
    public static boolean isSpaceChar(char ch) {
        return isSpaceChar((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is a
     * Unicode space character. A character is considered to be a
     * space character if and only if it is specified to be a space
     * character by the Unicode Standard. This method returns {@code true} if
     * the character's general category type is any of the following:
     *
     * <ul>
     * <li> {@link #SPACE_SEPARATOR}
     * <li> {@link #LINE_SEPARATOR}
     * <li> {@link #PARAGRAPH_SEPARATOR}
     * </ul>
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a space character;
     *          {@code false} otherwise.
     * @see     Character#isWhitespace(int)
     * @since   1.5
     */
    public static boolean isSpaceChar(int codePoint) {
        // The mask has one bit set per separator category; shifting right
        // by the general category moves that category's bit into position 0.
        return ((((1 << Character.SPACE_SEPARATOR) |
                  (1 << Character.LINE_SEPARATOR) |
                  (1 << Character.PARAGRAPH_SEPARATOR)) >> getType(codePoint)) & 1)
            != 0;
    }

    /**
     * Determines if the specified character is white space according to Java.
     * A character is a Java whitespace character if and only if it satisfies
     * one of the following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@code SPACE_SEPARATOR},
     *      {@code LINE_SEPARATOR}, or {@code PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isWhitespace(int)} method.
     *
     * @param   ch the character to be tested.
     * @return  {@code true} if the character is a Java whitespace
     *          character; {@code false} otherwise.
     * @see     Character#isSpaceChar(char)
     * @since   1.1
     */
    public static boolean isWhitespace(char ch) {
        return isWhitespace((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is
     * white space according to Java. A character is a Java
     * whitespace character if and only if it satisfies one of the
     * following criteria:
     * <ul>
     * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},
     *      {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})
     *      but is not also a non-breaking space ({@code '\u005Cu00A0'},
     *      {@code '\u005Cu2007'}, {@code '\u005Cu202F'}).
     * <li> It is {@code '\u005Ct'}, U+0009 HORIZONTAL TABULATION.
     * <li> It is {@code '\u005Cn'}, U+000A LINE FEED.
     * <li> It is {@code '\u005Cu000B'}, U+000B VERTICAL TABULATION.
     * <li> It is {@code '\u005Cf'}, U+000C FORM FEED.
     * <li> It is {@code '\u005Cr'}, U+000D CARRIAGE RETURN.
     * <li> It is {@code '\u005Cu001C'}, U+001C FILE SEPARATOR.
     * <li> It is {@code '\u005Cu001D'}, U+001D GROUP SEPARATOR.
     * <li> It is {@code '\u005Cu001E'}, U+001E RECORD SEPARATOR.
     * <li> It is {@code '\u005Cu001F'}, U+001F UNIT SEPARATOR.
     * </ul>
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is a Java whitespace
     *          character; {@code false} otherwise.
     * @see     Character#isSpaceChar(int)
     * @since   1.5
     */
    public static boolean isWhitespace(int codePoint) {
        return CharacterData.of(codePoint).isWhitespace(codePoint);
    }

    /**
     * Determines if the specified character is an ISO control
     * character. A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isISOControl(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  {@code true} if the character is an ISO control character;
     *          {@code false} otherwise.
     *
     * @see     Character#isSpaceChar(char)
     * @see     Character#isWhitespace(char)
     * @since   1.1
     */
    public static boolean isISOControl(char ch) {
        return isISOControl((int)ch);
    }

    /**
     * Determines if the specified character (Unicode code point) is an ISO control
     * character. A character is considered to be an ISO control
     * character if its code is in the range {@code '\u005Cu0000'}
     * through {@code '\u005Cu001F'} or in the range
     * {@code '\u005Cu007F'} through {@code '\u005Cu009F'}.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is an ISO control character;
     *          {@code false} otherwise.
     * @see     Character#isSpaceChar(int)
     * @see     Character#isWhitespace(int)
     * @since   1.5
     */
    public static boolean isISOControl(int codePoint) {
        // Optimized form of:
        //     (codePoint >= 0x00 && codePoint <= 0x1F) ||
        //     (codePoint >= 0x7F && codePoint <= 0x9F);
        return codePoint <= 0x9F &&
            (codePoint >= 0x7F || (codePoint >>> 5 == 0));
    }

    /**
     * Returns a value indicating a character's general category.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getType(int)} method.
     *
     * @param   ch      the character to be tested.
     * @return  a value of type {@code int} representing the
     *          character's general category.
     * @see     Character#COMBINING_SPACING_MARK
     * @see     Character#CONNECTOR_PUNCTUATION
     * @see     Character#CONTROL
     * @see     Character#CURRENCY_SYMBOL
     * @see     Character#DASH_PUNCTUATION
     * @see     Character#DECIMAL_DIGIT_NUMBER
     * @see     Character#ENCLOSING_MARK
     * @see     Character#END_PUNCTUATION
     * @see     Character#FINAL_QUOTE_PUNCTUATION
     * @see     Character#FORMAT
     * @see     Character#INITIAL_QUOTE_PUNCTUATION
     * @see     Character#LETTER_NUMBER
     * @see     Character#LINE_SEPARATOR
     * @see     Character#LOWERCASE_LETTER
     * @see     Character#MATH_SYMBOL
     * @see     Character#MODIFIER_LETTER
     * @see     Character#MODIFIER_SYMBOL
     * @see     Character#NON_SPACING_MARK
     * @see     Character#OTHER_LETTER
     * @see     Character#OTHER_NUMBER
     * @see     Character#OTHER_PUNCTUATION
     * @see     Character#OTHER_SYMBOL
     * @see     Character#PARAGRAPH_SEPARATOR
     * @see     Character#PRIVATE_USE
     * @see     Character#SPACE_SEPARATOR
     * @see     Character#START_PUNCTUATION
     * @see     Character#SURROGATE
     * @see     Character#TITLECASE_LETTER
     * @see     Character#UNASSIGNED
     * @see     Character#UPPERCASE_LETTER
     * @since   1.1
     */
    public static int getType(char ch) {
        return getType((int)ch);
    }

    /**
     * Returns a value indicating a character's general category.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  a value of type {@code int} representing the
     *          character's general category.
     * @see Character#COMBINING_SPACING_MARK COMBINING_SPACING_MARK
     * @see Character#CONNECTOR_PUNCTUATION CONNECTOR_PUNCTUATION
     * @see Character#CONTROL CONTROL
     * @see Character#CURRENCY_SYMBOL CURRENCY_SYMBOL
     * @see Character#DASH_PUNCTUATION DASH_PUNCTUATION
     * @see Character#DECIMAL_DIGIT_NUMBER DECIMAL_DIGIT_NUMBER
     * @see Character#ENCLOSING_MARK ENCLOSING_MARK
     * @see Character#END_PUNCTUATION END_PUNCTUATION
     * @see Character#FINAL_QUOTE_PUNCTUATION FINAL_QUOTE_PUNCTUATION
     * @see Character#FORMAT FORMAT
     * @see Character#INITIAL_QUOTE_PUNCTUATION INITIAL_QUOTE_PUNCTUATION
     * @see Character#LETTER_NUMBER LETTER_NUMBER
     * @see Character#LINE_SEPARATOR LINE_SEPARATOR
     * @see Character#LOWERCASE_LETTER LOWERCASE_LETTER
     * @see Character#MATH_SYMBOL MATH_SYMBOL
     * @see Character#MODIFIER_LETTER MODIFIER_LETTER
     * @see Character#MODIFIER_SYMBOL MODIFIER_SYMBOL
     * @see Character#NON_SPACING_MARK NON_SPACING_MARK
     * @see Character#OTHER_LETTER OTHER_LETTER
     * @see Character#OTHER_NUMBER OTHER_NUMBER
     * @see Character#OTHER_PUNCTUATION OTHER_PUNCTUATION
     * @see Character#OTHER_SYMBOL OTHER_SYMBOL
     * @see Character#PARAGRAPH_SEPARATOR PARAGRAPH_SEPARATOR
     * @see Character#PRIVATE_USE PRIVATE_USE
     * @see Character#SPACE_SEPARATOR SPACE_SEPARATOR
     * @see Character#START_PUNCTUATION START_PUNCTUATION
     * @see Character#SURROGATE SURROGATE
     * @see Character#TITLECASE_LETTER TITLECASE_LETTER
     * @see Character#UNASSIGNED UNASSIGNED
     * @see Character#UPPERCASE_LETTER UPPERCASE_LETTER
     * @since   1.5
     */
    public static int getType(int codePoint) {
        return CharacterData.of(codePoint).getType(codePoint);
    }

    /**
     * Determines the character representation for a specific digit in
     * the specified radix. If the value of {@code radix} is not a
     * valid radix, or the value of {@code digit} is not a valid
     * digit in the specified radix, the null character
     * ({@code '\u005Cu0000'}) is returned.
     * <p>
     * The {@code radix} argument is valid if it is greater than or
     * equal to {@code MIN_RADIX} and less than or equal to
     * {@code MAX_RADIX}. The {@code digit} argument is valid if
     * {@code 0 <= digit < radix}.
     * <p>
     * If the digit is less than 10, then
     * {@code '0' + digit} is returned. Otherwise, the value
     * {@code 'a' + digit - 10} is returned.
     *
     * @param   digit   the number to convert to a character.
     * @param   radix   the radix.
     * @return  the {@code char} representation of the specified digit
     *          in the specified radix.
     * @see     Character#MIN_RADIX
     * @see     Character#MAX_RADIX
     * @see     Character#digit(char, int)
     */
    public static char forDigit(int digit, int radix) {
        // Reject digits outside the range 0 <= digit < radix.
        if ((digit >= radix) || (digit < 0)) {
            return '\0';
        }
        // Reject radixes outside the range MIN_RADIX through MAX_RADIX.
        if ((radix < Character.MIN_RADIX) || (radix > Character.MAX_RADIX)) {
            return '\0';
        }
        // Digits 0 through 9 map to '0' through '9'; larger digits map to
        // lowercase letters starting at 'a'.
        if (digit < 10) {
            return (char)('0' + digit);
        }
        return (char)('a' - 10 + digit);
    }

    /**
     * Returns the Unicode directionality property for the given
     * character. Character directionality is used to calculate the
     * visual ordering of text. The directionality value of undefined
     * {@code char} values is {@code DIRECTIONALITY_UNDEFINED}.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary">supplementary characters</a>.
     * To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #getDirectionality(int)} method.
     *
     * @param  ch {@code char} for which the directionality property
     *            is requested.
     * @return the directionality property of the {@code char} value.
     *
     * @see Character#DIRECTIONALITY_UNDEFINED
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR
     * @see Character#DIRECTIONALITY_ARABIC_NUMBER
     * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_NONSPACING_MARK
     * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL
     * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR
     * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR
     * @see Character#DIRECTIONALITY_WHITESPACE
     * @see Character#DIRECTIONALITY_OTHER_NEUTRALS
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
     * @see Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT
     * @since 1.4
     */
    public static byte getDirectionality(char ch) {
        return getDirectionality((int)ch);
    }

    /**
     * Returns the Unicode directionality property for the given
     * character (Unicode code point). Character directionality is
     * used to calculate the visual ordering of text. The
     * directionality value of an undefined character is {@link
     * #DIRECTIONALITY_UNDEFINED}.
     *
     * @param   codePoint the character (Unicode code point) for which
     *          the directionality property is requested.
     * @return the directionality property of the character.
     *
     * @see Character#DIRECTIONALITY_UNDEFINED DIRECTIONALITY_UNDEFINED
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT DIRECTIONALITY_LEFT_TO_RIGHT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT DIRECTIONALITY_RIGHT_TO_LEFT
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER DIRECTIONALITY_EUROPEAN_NUMBER
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR
     * @see Character#DIRECTIONALITY_ARABIC_NUMBER DIRECTIONALITY_ARABIC_NUMBER
     * @see Character#DIRECTIONALITY_COMMON_NUMBER_SEPARATOR DIRECTIONALITY_COMMON_NUMBER_SEPARATOR
     * @see Character#DIRECTIONALITY_NONSPACING_MARK DIRECTIONALITY_NONSPACING_MARK
     * @see Character#DIRECTIONALITY_BOUNDARY_NEUTRAL DIRECTIONALITY_BOUNDARY_NEUTRAL
     * @see Character#DIRECTIONALITY_PARAGRAPH_SEPARATOR DIRECTIONALITY_PARAGRAPH_SEPARATOR
     * @see Character#DIRECTIONALITY_SEGMENT_SEPARATOR DIRECTIONALITY_SEGMENT_SEPARATOR
     * @see Character#DIRECTIONALITY_WHITESPACE DIRECTIONALITY_WHITESPACE
     * @see Character#DIRECTIONALITY_OTHER_NEUTRALS DIRECTIONALITY_OTHER_NEUTRALS
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING
     * @see Character#DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
     * @see Character#DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
     * @see Character#DIRECTIONALITY_POP_DIRECTIONAL_FORMAT DIRECTIONALITY_POP_DIRECTIONAL_FORMAT
     * @since    1.5
     */
    public static byte getDirectionality(int codePoint) {
        return CharacterData.of(codePoint).getDirectionality(codePoint);
    }

    /**
     * Determines whether the character is mirrored according to the
     * Unicode specification. Mirrored characters should have their
     * glyphs horizontally mirrored when displayed in text that is
     * right-to-left. For example, {@code '\u005Cu0028'} LEFT
     * PARENTHESIS is semantically defined to be an <i>opening
     * parenthesis</i>. This will appear as a "(" in text that is
     * left-to-right but as a ")" in text that is right-to-left.
     *
     * <p><b>Note:</b> This method cannot handle <a
     * href="#supplementary"> supplementary characters</a>. To support
     * all Unicode characters, including supplementary characters, use
     * the {@link #isMirrored(int)} method.
     *
     * @param  ch {@code char} for which the mirrored property is requested
     * @return {@code true} if the char is mirrored, {@code false}
     *         if the {@code char} is not mirrored or is not defined.
     * @since 1.4
     */
    public static boolean isMirrored(char ch) {
        return isMirrored((int)ch);
    }

    /**
     * Determines whether the specified character (Unicode code point)
     * is mirrored according to the Unicode specification. Mirrored
     * characters should have their glyphs horizontally mirrored when
     * displayed in text that is right-to-left. For example,
     * {@code '\u005Cu0028'} LEFT PARENTHESIS is semantically
     * defined to be an <i>opening parenthesis</i>. This will appear
     * as a "(" in text that is left-to-right but as a ")" in text
     * that is right-to-left.
     *
     * @param   codePoint the character (Unicode code point) to be tested.
     * @return  {@code true} if the character is mirrored, {@code false}
     *          if the character is not mirrored or is not defined.
     * @since   1.5
     */
    public static boolean isMirrored(int codePoint) {
        return CharacterData.of(codePoint).isMirrored(codePoint);
    }

    /**
     * Compares two {@code Character} objects numerically.
     *
     * @param   anotherCharacter   the {@code Character} to be compared.
     *
     * @return  the value {@code 0} if the argument {@code Character}
     *          is equal to this {@code Character}; a value less than
     *          {@code 0} if this {@code Character} is numerically less
     *          than the {@code Character} argument; and a value greater than
     *          {@code 0} if this {@code Character} is numerically greater
     *          than the {@code Character} argument (unsigned comparison).
     *          Note that this is strictly a numerical comparison; it is not
     *          locale-dependent.
     * @since   1.2
     */
    public int compareTo(Character anotherCharacter) {
        return compare(this.value, anotherCharacter.value);
    }

    /**
     * Compares two {@code char} values numerically.
     * The value returned is identical to what would be returned by:
     * <pre>
     *    Character.valueOf(x).compareTo(Character.valueOf(y))
     * </pre>
     *
     * @param  x the first {@code char} to compare
     * @param  y the second {@code char} to compare
     * @return the value {@code 0} if {@code x == y};
     *         a value less than {@code 0} if {@code x < y}; and
     *         a value greater than {@code 0} if {@code x > y}
     * @since 1.7
     */
    public static int compare(char x, char y) {
        return x - y;
    }

    /**
     * Converts the character (Unicode code point) argument to uppercase using
     * information from the UnicodeData file.
     *
     * @param   codePoint   the character (Unicode code point) to be converted.
     * @return  either the uppercase equivalent of the character, if
     *          any, or an error flag ({@code Character.ERROR})
     *          that indicates that a 1:M {@code char} mapping exists.
     * @see     Character#isLowerCase(char)
     * @see     Character#isUpperCase(char)
     * @see     Character#toLowerCase(char)
     * @see     Character#toTitleCase(char)
     * @since 1.4
     */
    static int toUpperCaseEx(int codePoint) {
        assert isValidCodePoint(codePoint);
        return CharacterData.of(codePoint).toUpperCaseEx(codePoint);
    }

    /**
     * Converts the character (Unicode code point) argument to uppercase using case
     * mapping information from the SpecialCasing file in the Unicode
     * specification. If a character has no explicit uppercase
     * mapping, then the {@code char} itself is returned in the
     * {@code char[]}.
     *
     * @param   codePoint   the character (Unicode code point) to be converted.
     * @return a {@code char[]} with the uppercased character.
     * @since 1.4
     */
    static char[] toUpperCaseCharArray(int codePoint) {
        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
        assert isBmpCodePoint(codePoint);
        return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint);
    }

    /**
     * The number of bits used to represent a {@code char} value in unsigned
     * binary form, constant {@code 16}.
     *
     * @since 1.5
     */
    public static final int SIZE = 16;

    /**
     * The number of bytes used to represent a {@code char} value in unsigned
     * binary form.
     *
     * @since 1.8
     */
    public static final int BYTES = SIZE / Byte.SIZE;

    /**
     * Returns the value obtained by reversing the order of the bytes in the
     * specified {@code char} value.
     *
     * @param ch The {@code char} of which to reverse the byte order.
     * @return the value obtained by reversing (or, equivalently, swapping)
     *     the bytes in the specified {@code char} value.
     * @since 1.5
     */
    public static char reverseBytes(char ch) {
        return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
    }

    /**
     * Returns the Unicode name of the specified character
     * {@code codePoint}, or null if the code point is
     * {@link #UNASSIGNED unassigned}.
     * <p>
     * Note: if the specified character is not assigned a name by
     * the <i>UnicodeData</i> file (part of the Unicode Character
     * Database maintained by the Unicode Consortium), the returned
     * name is the same as the result of the expression:
     *
     * <blockquote>{@code
     *     Character.UnicodeBlock.of(codePoint).toString().replace('_', ' ')
     *     + " "
     *     + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
     *
     * }</blockquote>
     *
     * @param  codePoint the character (Unicode code point)
     *
     * @return the Unicode name of the specified character, or null if
     *         the code point is unassigned.
     *
     * @exception IllegalArgumentException if the specified
     *            {@code codePoint} is not a valid Unicode
     *            code point.
     *
     * @since 1.7
     */
    public static String getName(int codePoint) {
        if (!isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException();
        }
        String name = CharacterName.get(codePoint);
        if (name != null)
            return name;
        if (getType(codePoint) == UNASSIGNED)
            return null;
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        if (block != null)
            return block.toString().replace('_', ' ') + " "
                   + Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
        // should never come here
        return Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
    }
}