Object Locking

Practically all server applications require some form of synchronization between multiple threads. Most of the synchronization work is done for us at the framework level, such as by our web server, DB client or messaging framework. Java and Scala provide a multitude of components for writing solid multi-threaded applications. These include object pools, concurrent collections, advanced locks, execution contexts, etc.

To better understand these, let's explore the most basic synchronization idiom – the object lock. This mechanism powers the synchronized keyword, making it one of the most popular (if not the most popular) multi-threading idioms in Java. It is also at the base of many of the more complex patterns we use, such as thread and connection pools, concurrent collections and more.

The synchronized keyword is used in two primary contexts:

  1. as a method modifier, marking a method so that it can only be executed by one thread at a time.
  2. by declaring a code block as a critical section – one that’s only available to a single thread at any given point in time.
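Both forms can be sketched in a few lines (a minimal illustration — the Counter class here is hypothetical):

```java
public class Counter {
    private int value;

    // 1. Method modifier: the entire method is a critical section,
    //    implicitly locked on "this".
    public synchronized void increment() {
        value++;
    }

    // 2. Synchronized block: only the marked section is critical,
    //    locked on an explicitly chosen object (here, "this").
    public int get() {
        synchronized (this) {
            return value;
        }
    }
}
```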

Locking Instructions

Fact #1. Synchronized code blocks are implemented using two dedicated bytecode instructions, which are part of the official specification – MonitorEnter and MonitorExit. This differs from other locking mechanisms, such as those found in the java.util.concurrent package, which are implemented (in the case of HotSpot) using a combination of Java code and native calls made through sun.misc.Unsafe.

These instructions operate on an object specified explicitly by the developer in the context of the synchronized block. For synchronized methods the lock is automatically selected to be the “this” variable. For static methods the lock will be placed on the Class object.
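To see this equivalence from Java code (a sketch — the Example class and its methods are hypothetical), a synchronized instance method locks the same monitor as a synchronized(this) block, and a static synchronized method locks the Class object:

```java
public class Example {
    // Both acquire the monitor of "this".
    public synchronized boolean a() {
        return Thread.holdsLock(this);
    }

    public boolean b() {
        synchronized (this) {
            return Thread.holdsLock(this);
        }
    }

    // Both acquire the monitor of the Example.class object.
    public static synchronized boolean c() {
        return Thread.holdsLock(Example.class);
    }

    public static boolean d() {
        synchronized (Example.class) {
            return Thread.holdsLock(Example.class);
        }
    }
}
```

Thread.holdsLock lets us confirm which object's monitor is actually held in each case.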

Synchronized methods can sometimes cause bad behavior. One example is creating implicit dependencies between different synchronized methods of the same object, as they share the same lock. A worse scenario is declaring synchronized methods in a base class (which might even be a 3rd party class) and then adding new synchronized methods to a derived class. This creates implicit synchronization dependencies across the hierarchy and has the potential of creating throughput issues or even deadlocks. To avoid these, it’s recommended to use a privately held object as a lock to prevent accidental sharing or escapement of locks.
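The recommended pattern looks like this (a sketch — the SafeCounter class is hypothetical):

```java
public class SafeCounter {
    // A privately held lock object: no outside code can synchronize on it,
    // so no implicit dependencies or accidental lock sharing can occur.
    private final Object lock = new Object();
    private int count;

    public void increment() {
        synchronized (lock) {
            count++;
        }
    }

    public int get() {
        synchronized (lock) {
            return count;
        }
    }
}
```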

The Compiler and Synchronization

There are two bytecode instructions responsible for synchronization. This is unusual, as most bytecode instructions are independent of each other, usually “communicating” with one another by placing values on the thread’s operand stack. The object to lock is also loaded from the operand stack, previously placed there by dereferencing a variable or field, or by invoking a method that returns an object.

Fact #2. So what happens if one of the two instructions is called without a respective call to the other? The Java compiler will not produce code that calls MonitorExit without calling MonitorEnter. Even so, from the JVM’s perspective such code is totally valid. The result in such a case would be that the MonitorExit instruction will throw an IllegalMonitorStateException.
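While javac won't emit an unbalanced MonitorExit, the same IllegalMonitorStateException can be observed from plain Java code by performing a monitor operation, such as wait(), without owning the lock. A small sketch (class and method names are hypothetical):

```java
public class MonitorStateDemo {
    // Returns true if the JVM rejects a monitor operation
    // performed without owning the object's lock.
    public static boolean throwsWithoutLock() {
        Object lock = new Object();
        try {
            lock.wait(1); // wait() requires owning lock's monitor - we don't
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;  // the JVM detects the illegal monitor state
        } catch (InterruptedException e) {
            return false; // declared by wait(), not reachable here
        }
    }
}
```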

A more dangerous case is what would happen if a lock is acquired via MonitorEnter but isn’t released via a corresponding MonitorExit. In this case the thread owning the lock can cause other threads trying to obtain the lock to block indefinitely. It’s worth noting that since the lock is reentrant, the thread owning it may continue to happily execute even if it were to reach the same lock and enter it again.
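Reentrancy is easy to demonstrate from Java code — a thread that already owns a monitor can enter it again without blocking itself (a hypothetical sketch):

```java
public class ReentrancyDemo {
    public static int nestedEntry() {
        Object lock = new Object();
        synchronized (lock) {
            // The same thread re-enters the same monitor: no deadlock,
            // the JVM simply tracks the recursion count.
            synchronized (lock) {
                return 2;
            }
        }
    }
}
```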

And here’s the catch. To prevent this from happening, the Java compiler generates matching enter and exit instructions in such a way that once execution has entered a synchronized block or method, it must pass through a matching MonitorExit instruction for the same object. One thing that can throw a wrench into this is an exception being thrown within the critical section.

public void hello() {
    synchronized (this) {
        System.out.println("Hi!, I'm alone here");
    }
}

Let’s analyze the bytecode –

aload_0 //load this into the operand stack
dup //load it again
astore_1 //backup this into an implicit variable stored at register 1
monitorenter //pop the value of this from the stack to enter the monitor

//the actual critical section
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hi!, I'm alone here"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

aload_1 //load the backup of this
monitorexit //pop the var and exit the monitor
goto 14 // completed - jump to the end

// the added catch clause - we got here if an exception was thrown -
aload_1 // load the backup var.
monitorexit //exit the monitor
athrow // rethrow the exception object, loaded into the operand stack

The mechanism used by the compiler to prevent the stack from unwinding without going through the MonitorExit instruction is pretty straightforward – the compiler adds an implicit try…catch clause that releases the lock and rethrows the exception.
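At the Java source level, this transformation is roughly analogous to the explicit try…finally pattern used with java.util.concurrent.locks — shown here only as an illustration (synchronized itself requires no such boilerplate):

```java
import java.util.concurrent.locks.ReentrantLock;

public class ExplicitLockDemo {
    private final ReentrantLock lock = new ReentrantLock();

    public void hello() {
        lock.lock();       // analogous to MonitorEnter
        try {
            System.out.println("Hi!, I'm alone here");
        } finally {
            lock.unlock(); // analogous to MonitorExit; runs even if the
                           // critical section throws an exception
        }
    }
}
```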

Fact #3. Another question is where the reference to the locked object is stored between the corresponding enter and exit calls. Keep in mind that multiple threads could be executing the same synchronized block concurrently, using different lock objects. If the locked object is the result of a method invocation, it’s highly unlikely the JVM will execute it again, as doing so may change the object’s state, or may not even return the same object. The same can be true for a variable or field, which might have changed since the monitor was entered.

The monitor variable. To counter this, the compiler adds an implicit local variable to the method to hold the value of the locked object. This is a smart solution, as it imposes fairly minimal overhead for maintaining a reference to the locked object, as opposed to using a concurrent heap structure to map locked objects to threads (a structure which would itself need synchronization). I first observed this new variable when building OverOps’s stack analysis algorithm and saw unexpected variables popping up in the code.

Notice that all this work is done at the Java compiler level. The JVM is perfectly happy to enter a critical section through a MonitorEnter instruction without exiting it (or vice versa), or use different objects for what should be corresponding enter and exit methods.

Locking at the JVM Level

Let’s take a deeper look now into how locks are actually implemented at the JVM level. For this we’ll be examining the HotSpot SE 7 implementation, as this is VM specific. Since locking can have some pretty adverse implications on code throughput, the JVM has put in place some very strong optimizations to make acquiring and releasing locks as efficient as possible.

Fact #4. One of the strongest mechanisms put in place by the JVM is thread lock biasing. Locking is an intrinsic capability of each Java object, much like having a system hashcode or a reference to its defining class. This is true regardless of the object’s type (you can even use a primitive array as a lock if you’d like).
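For example, a zero-length primitive array makes a perfectly valid (and compact) lock object — a hypothetical sketch:

```java
public class ArrayLockDemo {
    // Every Java object, arrays included, carries a monitor in its header;
    // a zero-length array is a classic minimal lock object.
    private final int[] lock = new int[0];
    private int counter;

    public int incrementAndGet() {
        synchronized (lock) {
            return ++counter;
        }
    }
}
```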

This data is stored in each object’s header (also known as the object’s mark). Some of the data placed in the object’s header is reserved for describing the object’s locking state. This includes bit flags describing the object’s locking state (i.e. locked / unlocked) and a reference to the thread which currently owns the lock – the thread towards which the object is biased.

In order to conserve space within the object header, Java thread objects are allocated in a lower segment of the VM’s heap, reducing the address size and saving bits within each object’s header (54 or 23 bits for 64- and 32-bit JVMs respectively).

[Figure: normal vs. biased object headers (64-bit)]

The Locking Algorithm

When the JVM attempts to acquire a lock on an object, it goes through a series of steps ranging from the optimistic to the pessimistic.

Fact #5. A lock is acquired by a thread if it succeeds in establishing itself as the object lock’s owner. This is determined by whether the thread is able to install a reference to itself (a pointer to the internal JavaThread object) in the object’s header.

Acquiring the lock. A first attempt to do this is made using a simple compare-and-swap (CAS) operation. This is very efficient, as it can usually translate into a direct CPU instruction (e.g. cmpxchg). CAS operations, along with OS-specific thread parking routines, serve as the building blocks of the object synchronization idiom.
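The ownership-by-CAS idea can be modeled in plain Java with an AtomicReference standing in for the header word — a simplified illustration of the concept, not the JVM's actual implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasLockSketch {
    // Stand-in for the owner field in the object header:
    // null means the lock is free.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public boolean tryLock() {
        // One CAS attempt: install a reference to the current thread
        // as the lock's owner. Succeeds only if the lock was free.
        return owner.compareAndSet(null, Thread.currentThread());
    }

    public void unlock() {
        // Only the owning thread can release the lock.
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

Unlike the JVM's built-in monitors, this toy lock is not reentrant and never blocks; it only demonstrates how a single CAS establishes ownership.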

If the lock is either free or has previously been biased toward this thread, the lock on the object is obtained for the thread and execution can continue immediately. If the CAS fails, the JVM will perform a short round of optimistic spinning, retrying the CAS in the hope that the lock frees up quickly. If these initial attempts fail (signaling a fairly high level of contention for the lock), the thread will enqueue itself in the list of threads vying for the lock and move itself to a blocked state, to be woken once the lock is released.

Releasing the lock. When exiting the critical section through a MonitorExit instruction, the owner thread will try to see if it can wake any of the parked threads which may be waiting for the lock to be released. This process is known as choosing an “heir”. It is meant to increase liveness, and to prevent a scenario where threads remain parked even though the lock has already been released (also known as stranding).

Debugging multi-threading problems on a server is hard, as they tend to depend on very specific timing and OS heuristics. That was one of the reasons that got us working on OverOps in the first place. Also: see the great input from Gil in the comments below.

Additional Reading –

  1. If you’re interested in diving even deeper to learn how JVM locking is implemented, the code is available here and is well documented.
  2. A full guide to all the different Java synchronization APIs and techniques available can be found here.
  3. And an additional one on concurrency in Scala here.


Tal is the CTO of OverOps. Tal has been designing scalable, real-time Java and C++ applications for the past 15 years. He still enjoys analyzing a good bug though, and instrumenting code. In his free time Tal plays Jazz drums.
  • Gil Tene

    Dude, this is so not how locking really works in the JVM… I highly recommend you study up on it some more and do major edits to fix this article. Would be happy to help.

    • http://www.takipi.com/ Tal Weiss

      Hi Gil,

      Happy to see you on the blog, and we’re big fans of the Azul VM.

      Re locking, the monitor enter algorithm is implemented and documented in the link above to the JVM code, and it’s also what we’ve seen when debugging it.

      I’m very happy to hear any updates and corrections you might have, and to reflect them in the post to provide the most accurate description to anyone interested.

      I look forward to hearing your comments and feedback.


      • Gil Tene

        The actual monitor locking state machine is quite complex, and the source file linked to above (in its totality) is only part of its implementation. The description above is wrong in interpreting how biasing, spinning, blocking, and GC interactions in that code all work, and misses the fact that in most locking states (all except unlocked and thin-biased), the lock state is referred to by (and not included in) the object header.

        In the code, monitors can have “thin” (no separate monitor object) representations and “thick” (inflated monitor object pointed to by locked object) representations. The thin modes can be (among other things) unlocked, locked in a displaced-header mode (header pointing into thread stack), or biased (and not actually locked) to a specific thread. Inflation and deflation are cool, complex and very intricately racey things.

        In other details: blocking never involves spinning (spinning is just a quick optimization attempt to avoid blocking), no checks against GC are needed while blocking or holding a lock, holding a lock during GC is not a problem, and there are no CAS instructions involved in biased locking (that’s the whole and pretty much only point of biased locking).

        I’ll be happy to help fix these various mistakes to make the article provide good information to readers.

        • http://www.takipi.com/ Tal Weiss

          Hi Gil,

          Thanks for the deep input!

          You’re right in that there are a lot of optimizations happening at the javac, JIT and VM run-time levels, meant to reduce the loss of performance that results from having to perform coordination at the CPU level.

          The locking algorithm described here covers the “harder” cases, where the JVM wasn’t able to bypass locking through JIT optimizations and falls back to the core algorithm. Within this algorithm atomics are inevitable and are performed in multiple locations (e.g. ObjectMonitor::enter: 3195).

          Faster optimizations / paths are generated by the JIT in order to avoid atomics by trying to determine where they are required, and are not covered here (e.g. x86_32.ad contains more on those). These merit a post of their own showing the actual disassembly generated for a synchronized method / block. I’d love to get your feedback on this post.

          Re parking: the thread parks using a combination of assembly nop instructions and OS-native parking calls (e.g. pthread_cond_wait, WaitForSingleObject).

          Re STW – you’re totally right. I’ve removed it.

          I’m really happy to get more comments / input and to apply.


          • Gil Tene

            Additional things to fix:

            Your fact #4: “One of the strongest mechanisms put in place by the JVM is thread lock biasing.” needs to be changed for the rest of the context and descriptions to make any sense.

            While thread lock biasing is a cool runtime optimization, its only purpose (and its only benefit) is avoiding CAS operations in the fast paths. What you probably meant to say there is something along the lines of “… is adaptive runtime optimization for uncontended locking situations”. The runtime mechanisms work hard to avoid inflating locks (creating actual monitor objects that the header points to) as long as no contention exists, and allow faster fast-path operations (e.g. no CAS when biased, and only a CAS when using stack-based displaced header locking) to be used when a lock is attempted in uncontended situations. This is a big win because dynamically, the vast-vast-vast majority of monitor enter/exit pairs are executed on uncontended objects.
            Your Fact #5. “A lock is acquired by a thread if it succeeds in establishing itself as the object lock’s owner. This is determined by whether the thread is able to install a reference to itself (a pointer to the internal JavaThread object) in the object’s header.” :

            Only the first sentence is correct; the rest is wrong (as in, not a fact). This is where the complexity of how the machine actually works makes it hard to make statements like you do about how successful lock ownership establishment is determined.

            Here are a few ways it is wrong:

            1. Installing a reference to the internal JavaThread object in the object header only happens in one case: the state transition into the biased locking state. Biased lock ownership is not the same as lock ownership. Unlike the normal lock ownership case, which follows monitor enter/exit behavior, once bias is established and the thread becomes the biased lock owner, it remains its owner even when the monitor is exited. The bias and bias ownership do not change when subsequent monitor enter/exit pairs occur in the future, and only change back (to the unbiased state) when some other thread attempts to enter the same monitor and has to yank that ownership away via a JVM-global safepoint (that’s what the HotSpot JVM does right now; others do it differently). [This is how/why the specific biased locking state is able to avoid use of atomics, BTW].

            2. It doesn’t describe the main ways of establishing actual lock ownership:

            2.1 In the uninflated, uncontended, non-biased case (the one all objects will go through when synchronized): a thread establishes its lock ownership by CAS-replacing the entire mark word with a pointer to a displaced mark word copied to and stored inside the thread’s stack. The thread is identified by knowing that the pointer points into a range of addresses in its own stack (making the “am I the owner” test easy, since all threads know where their stack is).

            2.2 In the uninflated & contended case (the one which all objects will go through upon first actual contention): A thread trying to lock an object that is currently locked, or is currently biased to another thread (regardless of whether or not its monitor is actually acquired by that thread), will force the inflation of that object’s monitor in one way or another (doing this for biased lock states is very intricate, and doing it safely in the non-biased case is no walk in the park either). Monitor inflation completely replaces the object header with a pointer to the inflated monitor, and the attempting thread then needs to establish its ownership for this inflated case (below).

            2.3 It ignores the inflated monitor cases (where an inflated object monitor exists, and which is the only way blocking ever happens):

            2.3.1 In the inflated & uncontended cases: A thread establishes its lock ownership by following the pointer in the object’s header to the inflated monitor & CAS-replacing the owner field in the inflated monitor with a pointer to itself (that’s what the ObjectMonitor::enter: 3195 reference you point to does).

            2.3.2 In the inflated & contended cases: After a very short amount of spinning-in-hope-of-the-contention-magically-going-away heuristics, a thread will block by enqueueing itself on the monitor’s list of threads vying for the lock. It will be woken up when the owner thread releases the lock (the safe coordination around this to avoid starvation, stranding, thundering herds, and live locking is quite detailed).


            Your comment about atomics being inevitable is wrong: Biased locking is all (and only) about avoiding CAS even when the JIT cannot prove the lock away. Avoiding atomics is its only reason for existence, and it works only in the uninflated states, and only as long as no contention occurs. The line you refer to (ObjectMonitor::enter: 3195) is dealing with locking in an inflated monitor state (which, as noted, you should describe in the post, as it’s core to the parking and blocking parts you refer to). An object will move to this inflated monitor state if it experiences contention or if it is intentionally blocked on (via a wait()).


            The statement “…If these initial attempts fail (signaling a fairly higher level of contention for the lock) the thread will move itself to a blocked state and enqueue itself in the list of threads vying for the lock and begin a series of spin locks.” is wrong.

            Blocking never involves spinning.

            While many blocking paths may include some amount of very short optimistic “maybe the reason for blocking has gone away” spinning loops, once you actually block, you block. Period. A blocked thread will only wake up when the OS wakes it up.

            No spin locks are involved in either case. (spinning for a short while in an optimistic fashion ahead of giving up and blocking is very different from a “spin lock”).

  • Prashant

    It was really very enlightening.

  • Ashish Thakran

    Threads can be implemented by extending Thread class, implementing Runnable interface and Callable interface.

    If you want to return a value or throw an exception, use Callable; otherwise use Runnable, as extending the Thread class limits class inheritance and also makes the process heavy.

    Below link can be useful if you want to find more:

    Different ways to implement Threads in Java