Sunday, April 7, 2013

Understanding CMS GC Logs

In this article, we will examine the following topics:
  • What's Mark and Sweep algorithm?
  • What's CMS Collector in HotSpot?
  • What's the format of CMS logs?

Mark and Sweep (MS) Algorithm


The mark-and-sweep algorithm consists of two phases[3]: In the first phase, it finds and marks all accessible objects. The first phase is called the mark phase. In the second phase, the garbage collection algorithm scans through the heap and reclaims all the unmarked objects. The second phase is called the sweep phase. The algorithm can be expressed as follows:

for each root variable r
    mark (r);
sweep ();

The computational complexity of mark and sweep is both a function of the amount of live data on the heap (for mark) and the actual heap size (for sweep).
  • Mark
    • Add each object in the root set to a queue
      • Typically, the root set contains all objects that are available without having to trace any references[5], which includes all Java objects on local frames in whatever methods the program is executing when it is halted for GC. This includes:
        • Everything we can obtain from the user stack and registers in the thread contexts of the halted program. 
        • Global data, such as static fields.
    • For each object X in the queue Mark X reachable
      • mark bit is typically associated with each reachable object. 
    • Add all objects referenced from X to the queue
    • Marking is the most critical of the stages, as it usually takes up around 90 percent of the total garbage collection time. 
      • Fortunately, marking is very parallelizable and large parts of a mark phase can also be run concurrently with executing Java code. 
  • Sweep
    • For each object X on the heap, 
      • If the X not marked, garbage collect it
    • Sweeping (or compaction which is not included in MS algorithm), however, tend to be more troublesome to be made to run concurrently with the executing Java program.

Concurrent Mark Sweep (CMS) Collector


CMS (Concurrent Mark Sweep) is one of garbage collectors implemented in HotSpot JVM.  It is enabled using:
  • -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Note that there are two different implementations of parallel collectors for the young generation:
  • UseParNewGC
  • UseParallelGC 

UseParNewGC should be used with CMS, and UseParallelGC should be used with the throughput collector.  Also that there is another collector named iCMS,[4] which is a variant of CMS.  You should not use iCMS anymore because it is (or will be) deprecated.

CMS is designed to be mostly concurrent, requiring just two quick stop-the-world pauses per old space garbage collection cycle.   These two phases are the initial mark phase (single-threaded) and the remark phase (multithreaded).  It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads.

CMS does not perform compaction in normal CMS cycle.  However, an old generation overflow will trigger a stop-the-world compacting garbage collection.
  • Par new generation (or Young Gen)
    • Young generation space is further split into 
      • Eden 
      • Survivor spaces
    • On CMS, it would have defaulted to a maximum size 665 MB if you didn't explicitly set it.
    • Survivors from the young generation are evacuated to 
      • Other survivor space, or
      • Old generation space
  • Concurrent mark-sweep generation (or Tenured Gen)
    • Does in-place de-allocation (which can lead to heap fragmentation)
    • Is managed by free lists, which need synchronization
    • Concurrent marking phase
      • Two stop-the-world pauses
        • Initial mark
          • Marks reachable (live) objects
        • Remark
          • Unmarked objects are deducted to be unreachable (dead)
    • Concurrent sweeping phase
      • Sweeps over the heap
      • In-place de-allocated unmarked (dead) objects
      • End of concurrent sweeping--all unmarked objects have been de-allocated

Enabling Logging


We have used the following HotSpot VM options to generate the log file:
  • -Xloggc:/gc_0.log 
  • -verbose:gc 
  • -XX:+PrintGCDetails 
  • -XX:+PrintGCTimeStamps 
  • -XX:+PrintReferenceGC 

CMS GC Logs


Let's take a look at some of the CMS logs generated with 1.7.0_10:

2715.210: [GC 2715.210: [ParNew2715.282: [SoftReference, 0 refs, 0.0000120 secs]2715.282: [WeakReference, 2433 refs, 0.0004150 secs]2715.283: [FinalReference, 1947 refs, 0.0074410 secs]2715.290: [PhantomReference, 0 refs, 0.0000080 secs]2715.290: [JNI Weak Reference, 0.0000040 secs]: 310460K->27980K(318912K), 0.0804180 secs] 3828421K->3561073K(4158912K), 0.0805140 secs] [Times: user=0.36 sys=0.00, real=0.08 secs]
Young generation (ParNew) collection. Young generation capacity is 318912K and after the collection its occupancy drops down from 310460K to 27980K. This collection took 0.08 secs.

2715.294: [GC [1 CMS-initial-mark: 3533092K(3840000K)] 3564157K(4158912K), 0.0418950 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]

Beginning of tenured generation collection with CMS collector. This is initial Marking phase of CMS where all the objects directly reachable from roots are marked and this is done with all the mutator threads stopped.

Capacity of tenured generation space is 3840000K and CMS was triggered at the occupancy of 3533092K

2715.337: [CMS-concurrent-mark-start]
Start of concurrent marking phase.

In Concurrent Marking phase, threads stopped in the first phase are started again and all the objects transitively reachable from the objects marked in first phase are marked here.

2717.428: [CMS-concurrent-mark: 2.091/2.091 secs] [Times: user=6.34 sys=0.08, real=2.09 secs]
Concurrent marking took total 2.091 econds cpu time and 2.091 seconds wall time that includes the yield to other threads also.

2717.428: [CMS-concurrent-preclean-start]
Start of precleaning.

Precleaning is also a concurrent phase. Here in this phase we look at the objects in CMS heap which got updated by promotions from young generation or new allocations or got updated by mutators while we were doing the concurrent marking in the previous concurrent marking phase. By rescanning those objects concurrently, the precleaning phase helps reduce the work in the next stop-the-world “remark” phase.

2717.428: [Preclean SoftReferences, 0.0000020 secs]2717.428: [Preclean WeakReferences, 0.0066370 secs]2717.434: [Preclean FinalReferences, 0.0011100 secs]2717.436: [Preclean PhantomReferences, 0.0003480 secs]2717.475: [CMS-concurrent-preclean: 0.045/0.048 secs] [Times: user=0.19 sys=0.00, real=0.04 secs]

Concurrent precleaning took 0.045 secs total cpu time and 0.048 wall time.


2717.475: [CMS-concurrent-abortable-preclean-start]

2717.876: [GC 2717.876: [ParNew2717.958: [SoftReference, 0 refs, 0.0000140 secs]2717.958: [WeakReference, 2683 refs, 0.0004340 secs]2717.959: [FinalReference, 1972 refs, 0.0069420 secs]2717.966: [PhantomReference, 0 refs, 0.0000080 secs]2717.966: [JNI Weak Reference, 0.0000050 secs]: 311500K->28150K(318912K), 0.0900310 secs] 3844593K->3577211K(4158912K), 0.0901370 secs] [Times: user=0.41 sys=0.01, real=0.09 secs]

2719.157: [CMS-concurrent-abortable-preclean: 1.582/1.681 secs] [Times: user=4.16 sys=0.07, real=1.68 secs]

2719.157: [GC[YG occupancy: 171027 K (318912 K)]2719.157: [Rescan (parallel) , 0.0680250 secs]2719.225: [weak refs processing2719.226: [SoftReference, 0 refs, 0.0000070 secs]2719.226: [WeakReference, 9940 refs, 0.0010950 secs]2719.227: [FinalReference, 10877 refs, 0.0226430 secs]2719.249: [PhantomReference, 0 refs, 0.0000050 secs]2719.249: [JNI Weak Reference, 0.0000110 secs], 0.0238550 secs]2719.249: [scrub string table, 0.0057930 secs] [1 CMS-remark: 3549060K(3840000K)] 3720087K(4158912K), 0.0989920 secs] [Times: user=0.40 sys=0.01, real=0.10 secs]
In the above log, after a preclean, 'abortable preclean' starts. After the young generation collection, the young gen occupancy drops down from 328912K to 28150K When young gen occupancy reaches 171027 K which is 52% of the total capacity, precleaning is aborted and 'remark' phase is started.
Note that in young generation occupancy also gets printed in the final remark phase.
Stop-the-world phase. This phase rescans any residual updated objects in CMS heap, retraces from the roots and also processes Reference objects. Here the rescanning work took 0.0680250 secs and weak reference objects processing took 0.0010950 secs. This phase took total 0.0238550 secs to complete.

2719.257: [CMS-concurrent-sweep-start]
Start of sweeping of dead/non-marked objects. Sweeping is concurrent phase performed with all other threads running.

2722.945: [CMS-concurrent-sweep: 3.606/3.688 secs] [Times: user=8.79 sys=0.12, real=3.69 secs]
Sweeping took 3.69 secs.

2722.945: [CMS-concurrent-reset-start]
Start of reset.

2722.973: [CMS-concurrent-reset: 0.028/0.028 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
In this phase, the CMS data structures are reinitialized so that a new cycle may begin at a later time. In this case, it took 0.03 secs.

This was how a normal CMS cycle runs.  Note that, in our experiment, we didn't see the "concurrent mode failure".  So, we didn't discuss it here.  For more information of it, read [1].

References

  1. Understanding CMS GC Logs
  2. Java GC, HotSpot's CMS and heap fragmentation
  3. Mark-and-Sweep Garbage Collection
  4. Really? iCMS? Really?
  5. Oracle JRockit--The Definitive Guide
  6. JRockit: Parallel vs Concurrent Collectors (Xml and More)
  7. The Unspoken - CMS and PrintGCDetails (Jon Masamitsu's Weblog)
  8. The Unspoken - Phases of CMS (Jon Masamitsu's Weblog)

2 comments:

Unknown said...

For free gc log analysis, send the log file to performancetestexpert@gmail.com

Jim Taylor said...

This is Pretty Good information. Universal Garbage Collection log analyzer that parses any format of Garbage collection logs and generates WOW graphs & AHA metrics. Inbuilt intelligence has ability to discover any sort of memory problems.Excellence & Simplicity Devops tools for cloud. Gceasy Why CMS is deprecated in JDK 9