package org.openjdk;

import org.openjdk.jmh.annotations.*;
import java.io.File;
import java.net.URI;
import java.util.concurrent.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(4)
public class URIParserBench {

    /*

  URI$Parser optimization experiment, with the goal to improve
  startup (and footprint) while not impacting peak performance.

  Baseline includes recent improvements from removing redundant
  volatiles in URI, see JDK-8145862 and JDK-8145680

  Experiment description:

  - After improving URI w.r.t. volatile fields, I noticed interpreted execution of 
    URI constructors was suspiciously slow (>10us)

  - Initial pass at profiling showed time being spent in charAt/substring
    methods wrapping URI$Parser.input. Removing these improved speed in the 
    interpreter by ~20% without affecting peak performance.

  - Next profiling showed there was time being spent in synthetic methods that
    were being generated for the inner Parser class to be able to access
    private fields and methods in the parent, URI. Removing Parser and moving
    all methods to URI meant a further improvement to interpreted code but regressed
    throughput in JITted code due to having to access the volatile string field...

  - Explored restructuring code to make string final instead of volatile.

    This led to simplification/improvements of several operations: since string, scheme 
    and fragment are always set in constructor, getRawSchemeSpecificPart can simply 
    substring from string without any loss of performance (or any worse thread safety).

    All constructors set the string field except an internal constructor used when
    relativizing, normalizing and resolving URIs from other URI objects, so added some
    micros to measure this. While there might be a minimal footprint hit, it seems that
    the throughput hit is negligible. We're also still in definitely better shape than
    before w.r.t. throughput.

    Now, always creating string for these operations *could* still be a problem
    (minimally increased footprint), so I scanned through some commonly used open source
    projects (netty, grizzly). While I found plenty of uses of the public URI constructors,
    I've not come across any use of these secondary operations. Rather, these tools have
    gone to lengths to reimplement normalization/relativization directly on Strings to
    avoid the overhead of creating and transforming URIs...

  - Drive-by optimization: Parser creates schemeSpecificPart eagerly, which is unnecessary
    for many uses. For many types of URIs the SSP == path, so checking for this improves
    both footprint and throughput.
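
  A rough illustration of the accessor overhead described in the list above. This is a
  hypothetical miniature (MiniUri and its inner Parser are made-up names, not the JDK
  sources). On JDKs without nest-based access control (before JDK 11), javac routes the
  inner class's write to the private field through a synthetic accessor method, which is
  one more call for the interpreter to execute. Newer JDKs are expected to report zero
  such methods.

  ```java
  import java.lang.reflect.Method;

  // Hypothetical miniature of the URI / URI$Parser relationship; not the JDK code.
  final class MiniUri {
      private String scheme;  // private state written by the inner class

      MiniUri(String input) {
          new Parser(input).parse();
      }

      String scheme() {
          return scheme;
      }

      // Non-static inner class, like URI$Parser: it touches MiniUri's private field.
      private class Parser {
          private final String input;

          Parser(String input) {
              this.input = input;
          }

          void parse() {
              int colon = input.indexOf(':');
              // Before JDK 11, javac compiled this write into a call to a
              // synthetic static accessor generated in MiniUri.
              scheme = colon > 0 ? input.substring(0, colon) : null;
          }
      }

      // Counts compiler-generated (synthetic) methods declared on MiniUri.
      static int syntheticMethodCount() {
          int n = 0;
          for (Method m : MiniUri.class.getDeclaredMethods()) {
              if (m.isSynthetic()) {
                  n++;
              }
          }
          return n;
      }

      public static void main(String[] args) {
          System.out.println("scheme = " + new MiniUri("jrt:/java.base").scheme());
          System.out.println("synthetic methods = " + syntheticMethodCount());
      }
  }
  ```

  Moving the parsing methods into the outer class, as the experiment did, removes the
  need for such accessors regardless of JDK version.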


  Test setup: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

  Baseline:

  Benchmark                                                    Mode  Cnt    Score    Error  Units
  URIParserBench.createJRT                                     avgt   20   92.821 ±  6.742  ns/op
  URIParserBench.createMixed                                   avgt   20  156.030 ± 11.319  ns/op
  URIParserBench.createModule                                  avgt   20  109.423 ±  8.717  ns/op
  URIParserBench.createWithQuery                               avgt   20  149.273 ± 13.337  ns/op
  URIParserBench.getRelativeURI                                avgt   20  417.349 ± 29.871  ns/op
  URIParserBench.getRelativeURIFromExisting                    avgt   20   96.893 ±  4.218  ns/op
  URIParserBench.getSchemeSpecificPartWithQuery                avgt   20  143.445 ±  7.081  ns/op
  URIParserBench.getSchemeSpecificPartWithoutQuery             avgt   20   95.484 ±  5.312  ns/op
  URIParserBench.getSchemeSpecificPartWithoutSchemeOrFragment  avgt   20  126.345 ±  5.550  ns/op

  Experiment:

  Benchmark                                                    Mode  Cnt    Score    Error  Units
  URIParserBench.createJRT                                     avgt   20   57.524 ±  2.639  ns/op
  URIParserBench.createMixed                                   avgt   20  130.450 ±  8.163  ns/op
  URIParserBench.createModule                                  avgt   20   85.525 ±  3.969  ns/op
  URIParserBench.createWithQuery                               avgt   20  127.554 ±  8.652  ns/op
  URIParserBench.getRelativeURI                                avgt   20  398.856 ± 21.873  ns/op
  URIParserBench.getRelativeURIFromExisting                    avgt   20  171.144 ±  7.601  ns/op
  URIParserBench.getSchemeSpecificPartWithQuery                avgt   20  143.126 ±  7.495  ns/op
  URIParserBench.getSchemeSpecificPartWithoutQuery             avgt   20   81.317 ±  4.413  ns/op
  URIParserBench.getSchemeSpecificPartWithoutSchemeOrFragment  avgt   20  109.398 ±  5.803  ns/op

  (Before removing volatiles and volatile inits from URI:

  Benchmark                                                    Mode  Cnt    Score    Error  Units
  URIParserBench.createJRT                                     avgt   10  125.016 ± 14.243  ns/op
  URIParserBench.createMixed                                   avgt   10  191.271 ± 15.204  ns/op
  URIParserBench.createModule                                  avgt   10  135.584 ± 17.641  ns/op
  URIParserBench.createWithQuery                               avgt   10  182.828 ± 17.690  ns/op
  URIParserBench.getRelativeURI                                avgt   10  482.073 ± 30.876  ns/op
  URIParserBench.getRelativeURIFromExisting                    avgt   10  120.606 ± 10.212  ns/op
  URIParserBench.getSchemeSpecificPartWithQuery                avgt   10  144.177 ±  7.413  ns/op
  URIParserBench.getSchemeSpecificPartWithoutQuery             avgt   10  101.323 ± 10.765  ns/op)

  All URI shapes are faster to construct with all the experiment changes in place, and footprint
  improves in the normal cases by not creating schemeSpecificPart eagerly.
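
  A minimal sketch of the lazy schemeSpecificPart idea, under stated assumptions:
  LazySspUri is a hypothetical class, and its scheme detection is far cruder than the
  real parser. It only shows that, once string, scheme and fragment are fixed in the
  constructor, the raw SSP can be derived on demand with a single substring.

  ```java
  // Hypothetical, heavily simplified stand-in for java.net.URI; not the JDK code.
  final class LazySspUri {
      private final String string;        // full input, always set by the constructor
      private final String scheme;        // null when the input has no scheme
      private final String fragment;      // null when the input has no fragment
      private String schemeSpecificPart;  // derived on first request, then cached

      LazySspUri(String input) {
          this.string = input;
          int hash = input.indexOf('#');
          this.fragment = hash < 0 ? null : input.substring(hash + 1);
          // Crude scheme detection (an assumption of this sketch): the first ':'
          // counts only if it comes before any '/', '?' or '#'.
          int end = hash < 0 ? input.length() : hash;
          int colon = -1;
          for (int i = 0; i < end; i++) {
              char c = input.charAt(i);
              if (c == ':') {
                  colon = i;
                  break;
              }
              if (c == '/' || c == '?') {
                  break;
              }
          }
          this.scheme = colon > 0 ? input.substring(0, colon) : null;
      }

      String getRawSchemeSpecificPart() {
          String ssp = schemeSpecificPart;
          if (ssp == null) {
              // Everything between "scheme:" and "#fragment" is a plain substring.
              int start = scheme == null ? 0 : scheme.length() + 1;
              int end = fragment == null
                      ? string.length()
                      : string.length() - fragment.length() - 1;
              ssp = string.substring(start, end);
              schemeSpecificPart = ssp;  // benign race: recomputing yields an equal value
          }
          return ssp;
      }
  }
  ```

  Callers that never ask for the SSP never pay for the extra String, which is where the
  footprint saving in the normal cases comes from.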

  The getRelativeURI micros exercise the penalty of having to create the string field eagerly, 
  which is markedly more expensive when isolated (getRelativeURIFromExisting) but disappears in
  the noise when combined with URI creation (getRelativeURI). All other cases improve or stay 
  neutral.

  Compared to before, when all fields were volatile, throughput is better or neutral in all
  cases but getRelativeURIFromExisting, where there's still a noticeable regression.
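
  To put rough numbers on these comparisons, the relative changes can be computed from
  the tables above. ScoreDelta is a hypothetical helper, not part of the benchmark, and
  the interval-overlap check is only a crude stand-in for a proper significance test.

  ```java
  // Hypothetical helper for reading the tables; not part of the benchmark.
  final class ScoreDelta {
      // Relative change of the experiment versus the baseline, in percent.
      static double percentChange(double baseline, double experiment) {
          return (experiment - baseline) / baseline * 100.0;
      }

      // True when the two "score +/- error" intervals overlap, i.e. this crude
      // check cannot tell the difference apart from noise.
      static boolean withinNoise(double s1, double e1, double s2, double e2) {
          return s1 - e1 <= s2 + e2 && s2 - e2 <= s1 + e1;
      }

      public static void main(String[] args) {
          // ns/op values copied from the Baseline and Experiment tables above.
          System.out.printf("createJRT                  %+.1f%%%n",
                  percentChange(92.821, 57.524));
          System.out.printf("getRelativeURIFromExisting %+.1f%%%n",
                  percentChange(96.893, 171.144));
          System.out.println("getRelativeURI within noise: "
                  + withinNoise(417.349, 29.871, 398.856, 21.873));
      }
  }
  ```

  With these inputs createJRT comes out roughly 38% faster and getRelativeURIFromExisting
  roughly 77% slower, while the getRelativeURI intervals overlap.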

  -Xint

  Baseline:

  Benchmark                                                    Mode  Cnt   Score   Error  Units
  URIParserBench.createJRT                                     avgt   20  13.971 ± 0.130  us/op
  URIParserBench.createMixed                                   avgt   20  39.749 ± 0.768  us/op
  URIParserBench.createModule                                  avgt   20  29.047 ± 0.597  us/op
  URIParserBench.createWithQuery                               avgt   20  39.753 ± 0.919  us/op
  URIParserBench.getRelativeURI                                avgt   20  96.385 ± 1.998  us/op
  URIParserBench.getRelativeURIFromExisting                    avgt   20  21.066 ± 0.423  us/op
  URIParserBench.getSchemeSpecificPartWithQuery                avgt   20  40.650 ± 1.044  us/op
  URIParserBench.getSchemeSpecificPartWithoutQuery             avgt   20  30.110 ± 0.627  us/op
  URIParserBench.getSchemeSpecificPartWithoutSchemeOrFragment  avgt   20  51.043 ± 1.382  us/op

  Experiment:

  Benchmark                                                    Mode  Cnt   Score   Error  Units
  URIParserBench.createJRT                                     avgt   20  10.701 ± 0.151  us/op
  URIParserBench.createMixed                                   avgt   20  26.480 ± 0.503  us/op
  URIParserBench.createModule                                  avgt   20  19.547 ± 0.345  us/op
  URIParserBench.createWithQuery                               avgt   20  26.298 ± 0.458  us/op
  URIParserBench.getRelativeURI                                avgt   20  71.617 ± 1.005  us/op
  URIParserBench.getRelativeURIFromExisting                    avgt   20  25.007 ± 1.029  us/op
  URIParserBench.getSchemeSpecificPartWithQuery                avgt   20  28.534 ± 0.643  us/op
  URIParserBench.getSchemeSpecificPartWithoutQuery             avgt   20  21.027 ± 0.596  us/op
  URIParserBench.getSchemeSpecificPartWithoutSchemeOrFragment  avgt   20  32.636 ± 0.912  us/op

  The experiment greatly benefits interpreted code across the board, except for the .*FromExisting
  case, which sees a small regression.

  URI.class is now 4.5 KB smaller than URI.class and URI$Parser.class combined were before,
  which adds to the startup improvement.

     */

    @Setup
    public void setup() throws Exception {
        // Roughly half of the 1024 inputs carry a query; roughly 10% also carry a fragment.
        urls = new String[1024];
        for (int i = 0; i < 1024; i++) {
            if (ThreadLocalRandom.current().nextDouble(1.0) < 0.5) {
                urls[i] = "jrt:/jdk.location";
            } else {
                urls[i] = "jrt:/jdk.location?query";
            }
            if (ThreadLocalRandom.current().nextDouble(1.0) < 0.1) {
                urls[i] = urls[i] + "#fragment";
            }
        }
    }

    @Benchmark
    public URI createJRT() throws Exception {
        return URI.create("jrt:/");
    }

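    // Cursor into urls, advanced on every createMixed() invocation.
    public int index = 0;

    // Mixed URI inputs, populated in setup().
    public String[] urls;

    // Pre-built URIs for getRelativeURIFromExisting, so that only relativize() is measured.
    public URI baseURI = URI.create("jrt:/java.base");

    public URI childURI = URI.create("jrt:/java.base/some/thing/else?query");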


    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public URI createMixed() throws Exception {
        index++;
        return URI.create(urls[index % 1024]);
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public URI createModule() throws Exception {
        return URI.create("jrt:/java.base");
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public URI createWithQuery() throws Exception {
        return URI.create("jrt:/java.base?query");
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public String getSchemeSpecificPartWithQuery() throws Exception {
        return URI.create("jrt:/java.base?query").getSchemeSpecificPart();
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public String getSchemeSpecificPartWithoutQuery() throws Exception {
        return URI.create("jrt:/java.base").getSchemeSpecificPart();
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public String getSchemeSpecificPartWithoutSchemeOrFragment() throws Exception {
        return URI.create("/java.base?module=accessible").getSchemeSpecificPart();
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public URI getRelativeURIFromExisting() throws Exception {
        return baseURI.relativize(childURI);
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public URI getRelativeURI() throws Exception {
        return URI.create("jrt:/java.base").relativize(URI.create("jrt:/java.base/test?query"));
    }
}