Relocate the CDS archive if it cannot be mapped to the requested address
-------------------------------------------------------------------------------

1. About

This is a design document for the RFE JDK-8231610 [1], whose name is the
title of this document.

2. Motivation

The CDS archive is mmapped to a fixed address range (starting at
SharedBaseAddress, usually 0x800000000). Previously, if this requested
address range was not available (usually due to Address Space Layout
Randomization (ASLR) [2]), the JVM would give up and load classes
dynamically from class files.

[a] This slows down JVM start-up.

[b] Handling of mapping failures causes unnecessary complication in the
    CDS tests.

Here are some preliminary benchmarking results (using the default CDS
archive, running helloworld):

    (a) 47.1ms (CDS enabled, mapped at requested addr) "before"
    (b) 53.8ms (CDS enabled, mapped at alternate addr) "this RFE, with relocation"
    (c) 86.2ms (CDS disabled)

The small degradation in (b) is caused by the relocation of absolute
pointers embedded in the CDS archive. However, it is still a big
improvement over case (c).

3. Overview

Existing design of CDS:

[a] The MC, RW, RO, and MD regions are laid out consecutively in memory,
    with the bottom of the MC region starting at SharedBaseAddress
    (0x800000000).

[b] Objects in these regions use direct pointers to reference each other.
    For example, the first object in the RW region could be an
    InstanceKlass K, whose address is 0x800006000. The first word in K is
    its C++ vptr, which points to a vtable in the MC region (e.g.,
    0x800000000). Therefore, we have

        *(intptr_t*)0x800006000 == 0x800000000

New design changes by this RFE:

If the requested address space is not available (i.e., the four regions
cannot be mapped at 0x800000000):

[c] We map the regions consecutively, at an arbitrary space allocated by
    the OS, e.g., 0x900000000.

[d] We relocate all internal pointers.
    With the example pointer in [b] above, after mapping, we perform this
    operation:

        intptr_t delta = 0x900000000 - 0x800000000;
        *(intptr_t*)0x900006000 += delta; // becomes 0x900000000

[e] To facilitate [d], we compute a bitmap of all the pointers in the
    archive. This turns out to be easy, because we already traverse all
    pointers of all archived objects during CDS dump time (see [3]).

[f] The above describes how the base CDS archive is handled. The JVM can
    also map a second archive (the "dynamic" archive), which is handled
    similarly.

4. Implementation

Space Reservation (run-time)

[a] The existing code for mapping the base and dynamic archives is quite
    incoherent. I took this chance to refactor it. Please start from the
    following function in the webrev:

        MetaspaceShared::initialize_runtime_shared_and_meta_spaces()

[b] Also, the existing code for reserving the compressed class metaspace
    with CDS is messy (see [4]). I consolidated the code for reserving
    both the CDS space and the compressed class metaspace:

    We first attempt to reserve a space at the requested address, like
    this:

        |<-- cds size->|<----- CompressedClassSpaceSize ---->|
        [.... cds .....|..... compressed class metaspace ....]
        ^
        +- 0x800000000 = _narrow_oop._base (requested by archive)

    If this fails, we simply reserve a space at a location picked by the
    OS, whose size is big enough for both CDS and the compressed class
    metaspace:

        |<-- cds size->|<----- CompressedClassSpaceSize ---->|
        [.... cds .....|..... compressed class metaspace ....]
        ^
        +- 0x900000000 = _narrow_oop._base (picked by OS)

    (By reserving all these spaces at once, we avoid a problem with the
    previous code, where we were able to map CDS but then could not
    reserve the compressed class metaspace, which meant we could not use
    CDS after all.)

Pointer Bitmap Creation

[c] As mentioned in section 3.e, most of the pointers are discovered
    during the dump-time metaspace object traversal.
    For static archive dumping, see ShallowCopyEmbeddedRefRelocator and
    RefRelocator in metaspaceShared.cpp. For dynamic archive dumping,
    existing code already maintained a bitmap (the dynamic archive has to
    be relocated at dump time for a different reason), so we basically
    reused that code.

    In addition, a small number of pointers need to be manually marked,
    using ArchivePtrMarker::mark_pointer().

    The size of the bitmap is about 1.5% of the total archive size (188KB
    out of 12MB).

Impact on shared heap objects

[d] Shared heap objects are mostly unaffected, because they use compressed
    klass pointers. Even with relocation, the offset of each archived
    InstanceKlass from _narrow_oop._base is unchanged, which means we
    don't need to patch the header words of shared heap objects.

[e] The only direct metaspace object pointers stored in the heap objects
    are Klass pointers in mirror objects. These can be relocated as each
    mirror object is restored at run time.

Changes of FileMapHeader

[f] Previously, FileMapHeader contained direct pointers into the mapped
    regions. These could be messy to keep track of during relocation, so
    I changed all of them into offsets from SharedBaseAddress.

Windows Messiness

[g] Windows has the unique problem where you can't mmap() a file into a
    ReservedSpace. Please see the comments inside
    MetaspaceShared::reserve_address_space_for_archives() in the webrev.

Relocation at dump time

[h] Previously, at dump time, we could also fail to reserve the address
    space at 0x800000000. In this case, we would reserve an address space
    at an arbitrary location and dump the CDS archive there. The problem
    is that this arbitrary location may not be as "mmap friendly" as
    0x800000000, which means we may have more mmap failures at run time.

[i] Now that we have the relocation bitmap (see 4.c), we simply relocate
    the archive back to 0x800000000 before dumping it out to file. As a
    result, the requested base address of the CDS archive is now always
    0x800000000.
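
The bitmap-driven relocation used at run time (3.d, 3.e) and at dump time
(4.i) can be sketched as below. This is only an illustration under stated
assumptions, not the HotSpot code: ToyArchive, relocate_to() and
make_toy_archive() are hypothetical names, the archive is modelled as a
vector of 64-bit words, and the bitmap as one bool per word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a mapped archive (hypothetical type, not a HotSpot class).
struct ToyArchive {
    std::uint64_t base;                // address the words are laid out for
    std::vector<std::uint64_t> words;  // archive contents, one entry per word
    std::vector<bool> is_pointer;      // relocation bitmap: true = word is a pointer
};

// Shift every word marked in the bitmap so the archive is valid at new_base.
// Words that are not marked are plain data and are left untouched.
void relocate_to(ToyArchive& a, std::uint64_t new_base) {
    assert(a.words.size() == a.is_pointer.size());
    // Unsigned wrap-around makes the same code work in both directions.
    const std::uint64_t delta = new_base - a.base;
    for (std::size_t i = 0; i < a.words.size(); ++i) {
        if (a.is_pointer[i]) {
            a.words[i] += delta;
        }
    }
    a.base = new_base;
}

// Build a three-word archive laid out for the given base:
//   word 0: pointer to the start of the archive (like K's vptr in 3.b)
//   word 1: non-pointer data that merely looks like an address
//   word 2: pointer to offset 0x6000 inside the archive
ToyArchive make_toy_archive(std::uint64_t base) {
    return ToyArchive{base,
                      {base, 0x800000000ULL, base + 0x6000},
                      {true, false, true}};
}
```

Relocating make_toy_archive(0x800000000) to 0x900000000 turns word 0 into
0x900000000 and word 2 into 0x900006000, while word 1 keeps its value
because the bitmap does not mark it. Calling relocate_to() again with
0x800000000 restores the original layout, which mirrors the "relocate back
before dumping" step in [i].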
5. Testing

[a] (For debug builds only) -XX:SharedBaseAddress=0 means that the archive
    should always be mapped at an alternative address. This way we can
    reliably test all the relocation code.

[b] I wrote new test cases specific to relocation
    (ArchiveRelocationTest.java, DynamicArchiveRelocationTest.java). All
    combinations of relocation are tested:

    - relocation happens only at dump time.
    - relocation happens only at run time.
    - relocation happens at both dump time and run time.

[c] I added a new tier-4 test group (tier4-rt-cds-relocation) to run all
    CDS tests with -XX:SharedBaseAddress=0.

6. Trade-offs

[a] Previously, if we failed to map the CDS archive, we would revert to
    regular class loading.

[b] With this RFE, the CDS archive will still be mapped, but most of its
    contents will be overwritten (due to pointer relocation).

Start-up speed:

[c] In extreme cases, where the archive contains many classes but the app
    uses only a small number of them, the cost of relocation may outweigh
    the speed-up gained from faster class loading.

    However, with the helloworld test (see section 2), which loads 402 of
    the 1173 archived classes, we clearly see that the speed gain
    outweighs the relocation cost.

    (1) 47.1ms (CDS enabled, mapped at requested addr) "before"
    (2) 53.8ms (CDS enabled, mapped at alternate addr) "this RFE, with relocation"
    (3) 86.2ms (CDS disabled)

    So for helloworld, we probably won't see degradation over "CDS
    disabled" until the archive is increased to more than 6000 classes.

Memory Footprint:

[d] In the same extreme cases as in [c], relocation of the archive may
    cause much higher memory usage. FYI, here is the memory usage for
    helloworld:

    (1) 36.18MB (CDS enabled, mapped at requested addr) "before"
    (2) 38.22MB (CDS enabled, mapped at alternate addr) "this RFE, with relocation"
    (3) 32.48MB (CDS disabled)

[e] We might consider adding a flag to turn off relocation (i.e., revert
    to the old behavior) if this turns out to be a real-world problem.
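
The "more than 6000 classes" estimate in 6.c is not derived in this
document. One way to arrive at a figure of that magnitude is sketched
below, under two assumptions that are mine and not measured here: the
relocation cost grows linearly with the number of archived classes, and the
start-up saving from CDS stays constant. break_even_classes() is a
hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Estimated number of archived classes at which the relocation cost equals
// the start-up time saved by CDS.
// Assumptions (not measured): relocation cost scales linearly with the
// number of archived classes; the CDS saving does not change.
double break_even_classes(double t_requested_ms,   // CDS mapped at requested addr
                          double t_relocated_ms,   // CDS mapped elsewhere, relocated
                          double t_disabled_ms,    // CDS disabled
                          double archived_classes) // classes in the measured archive
{
    const double relocation_cost_ms = t_relocated_ms - t_requested_ms;
    const double cds_saving_ms = t_disabled_ms - t_requested_ms;
    return archived_classes * cds_saving_ms / relocation_cost_ms;
}
```

With the helloworld numbers (47.1ms, 53.8ms, 86.2ms, 1173 classes), the
relocation cost is 6.7ms and the CDS saving is 39.1ms, so the function
returns about 6845 classes, which is consistent with the estimate in [c].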
References:

[1] https://bugs.openjdk.java.net/browse/JDK-8231610

[2] https://en.wikipedia.org/wiki/Address_space_layout_randomization

[3] Traversal of metaspace pointers during CDS dump time
    http://hg.openjdk.java.net/jdk/jdk/file/94fe833a244b/src/hotspot/share/memory/metaspaceShared.cpp#l1282

[4] http://hg.openjdk.java.net/jdk/jdk/file/94fe833a244b/src/hotspot/share/memory/metaspace.cpp#l1036