--- old/src/jdk.scripting.nashorn/share/classes/jdk/nashorn/internal/runtime/linker/NameCodec.java 2020-04-15 18:51:07.000000000 +0530 +++ /dev/null 2020-04-15 18:51:07.000000000 +0530 @@ -1,432 +0,0 @@ -/* - * Copyright (c) 2010, 2013, Oracle and/or its affiliates. All rights reserved. - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. - * - * This code is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License version 2 only, as - * published by the Free Software Foundation. Oracle designates this - * particular file as subject to the "Classpath" exception as provided - * by Oracle in the LICENSE file that accompanied this code. - * - * This code is distributed in the hope that it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - * version 2 for more details (a copy is included in the LICENSE file that - * accompanied this code). - * - * You should have received a copy of the GNU General Public License version - * 2 along with this work; if not, write to the Free Software Foundation, - * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. - * - * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA - * or visit www.oracle.com if you need additional information or have any - * questions. - */ - -package jdk.nashorn.internal.runtime.linker; - -/** - *
- * Implements the name mangling and demangling as specified by John Rose's - * "Symbolic Freedom in the VM" article. Normally, you would - * mangle the names in the call sites as you're generating bytecode, and then - * demangle them when you receive them in bootstrap methods. - *
- *- * This code is derived from sun.invoke.util.BytecodeName. Apart from subsetting that - * class, we don't want to create dependency between non-exported package from java.base - * to nashorn module. - *
- * - *- * The JVM defines a very small set of characters which are illegal - * in name spellings. We will slightly extend and regularize this set - * into a group of dangerous characters. - * These characters will then be replaced, in mangled names, by escape sequences. - * In addition, accidental escape sequences must be further escaped. - * Finally, a special prefix will be applied if and only if - * the mangling would otherwise fail to begin with the escape character. - * This happens to cover the corner case of the null string, - * and also clearly marks symbols which need demangling. - *
- *
- * Dangerous characters are the union of all characters forbidden
- * or otherwise restricted by the JVM specification,
- * plus their mates, if they are brackets
- * ([
and ]
,
- * <
and >
),
- * plus, arbitrarily, the colon character :
.
- * There is no distinction between type, method, and field names.
- * This makes it easier to convert between mangled names of different
- * types, since they do not need to be decoded (demangled).
- *
- * The escape character is backslash \
- * (also known as reverse solidus).
- * This character is, until now, unheard of in bytecode names,
- * but traditional in the proposed role.
- *
- *
- * Every escape sequence is two characters - * (in fact, two UTF8 bytes) beginning with - * the escape character and followed by a - * replacement character. - * (Since the replacement character is never a backslash, - * iterated manglings do not double in size.) - *
- *- * Each dangerous character has some rough visual similarity - * to its corresponding replacement character. - * This makes mangled symbols easier to recognize by sight. - *
- *
- * The dangerous characters are
- * /
(forward slash, used to delimit package components),
- * .
(dot, also a package delimiter),
- * ;
(semicolon, used in signatures),
- * $
(dollar, used in inner classes and synthetic members),
- * <
(left angle),
- * >
(right angle),
- * [
(left square bracket, used in array types),
- * ]
(right square bracket, reserved in this scheme for language use),
- * and :
(colon, reserved in this scheme for language use).
- * Their replacements are, respectively,
- * |
(vertical bar),
- * ,
(comma),
- * ?
(question mark),
- * %
(percent),
- * ^
(caret),
- * _
(underscore), and
- * {
(left curly bracket),
- * }
(right curly bracket),
- * !
(exclamation mark).
- * In addition, the replacement character for the escape character itself is
- * -
(hyphen),
- * and the replacement character for the null prefix is
- * =
(equal sign).
- *
- * An escape character \
- * followed by any of these replacement characters
- * is an escape sequence, and there are no other escape sequences.
- * An equal sign is only part of an escape sequence
- * if it is the second character in the whole string, following a backslash.
- * Two consecutive backslashes do not form an escape sequence.
- *
- * Each escape sequence replaces a so-called original character - * which is either one of the dangerous characters or the escape character. - * A null prefix replaces an initial null string, not a character. - *
- *
- * All this implies that escape sequences cannot overlap and may be
- * determined all at once for a whole string. Note that a spelling
- * string can contain accidental escapes, apparent escape
- * sequences which must not be interpreted as manglings.
- * These are disabled by replacing their leading backslash with an
- * escape sequence (\-
). To mangle a string, three logical steps
- * are required, though they may be carried out in one pass:
- *
\-
).\|
for /
, etc.).\=
).Spelling strings which contain accidental - * escapes must have them replaced, even if those - * strings do not contain dangerous characters. - * This restriction means that mangling a string always - * requires a scan of the string for escapes. - * But then, a scan would be required anyway, - * to check for dangerous characters. - * - *
- *- * If a bytecode name does not contain any escape sequence, - * demangling is a no-op: The string demangles to itself. - * Such a string is called self-mangling. - * Almost all strings are self-mangling. - * In practice, to demangle almost any name “found in nature”, - * simply verify that it does not begin with a backslash. - *
- *
- * Mangling is a one-to-one function, while demangling
- * is a many-to-one function.
- * A mangled string is defined as validly mangled if
- * it is in fact the unique mangling of its spelling string.
- * Three examples of invalidly mangled strings are \=foo
,
- * \-bar
, and baz\!
, which demangle to foo
, \bar
, and
- * baz\!
, but then remangle to foo
, \bar
, and \=baz\-!
.
- * If a language back-end or runtime is using mangled names,
- * it should never present an invalidly mangled bytecode
- * name to the JVM. If the runtime encounters one,
- * it should also report an error, since such an occurrence
- * probably indicates a bug in name encoding which
- * will lead to errors in linkage.
- * However, this note does not propose that the JVM verifier
- * detect invalidly mangled names.
- *
- * As a result of these rules, it is a simple matter to - * compute validly mangled substrings and concatenations - * of validly mangled strings, and (with a little care) - * these correspond to corresponding operations on their - * spelling strings. - *
- *If languages that include non-Java symbol spellings use this - * mangling convention, they will enjoy the following advantages: - *
- *- * For human readable displays of symbols, - * it will be better to present a string-like quoted - * representation of the spelling, because JVM users - * are generally familiar with such tokens. - * We suggest using single or double quotes before and after - * mangled symbols which are not valid Java identifiers, - * with quotes, backslashes, and non-printing characters - * escaped as if for literals in the Java language. - *
- *
- * For example, an HTML-like spelling
- * <pre>
mangles to
- * \^pre\_
and could
- * display more cleanly as
- * '<pre>'
,
- * with the quotes included.
- * Such string-like conventions are not suitable
- * for mangled bytecode names, in part because
- * dangerous characters must be eliminated, rather
- * than just quoted. Otherwise internally structured
- * strings like package prefixes and method signatures
- * could not be reliably parsed.
- *
- * In such human-readable displays, invalidly mangled
- * names should not be demangled and quoted,
- * for this would be misleading. Likewise, JVM symbols
- * which contain dangerous characters (like dots in field
- * names or brackets in method names) should not be
- * simply quoted. The bytecode names
- * \=phase\,1
and
- * phase.1
are distinct,
- * and in demangled displays they should be presented as
- * 'phase.1'
and something like
- * 'phase'.1
, respectively.
- *