
Scala – Best Practices – Concurrency

So we should preserve it. I don’t think that digital storage is necessarily a good thing, but I definitely think that digital manipulation is interesting.

-Sean Booth

This Scala best-practices post is inspired by the Databricks Scala Guide, which covers most day-to-day programming best practices. I will keep updating this space from time to time. Happy coding.


  1. Concurrency

Concurrency

Scala concurrent.Map

Prefer java.util.concurrent.ConcurrentHashMap over scala.collection.concurrent.Map. In particular the getOrElseUpdate method in scala.collection.concurrent.Map is not atomic (fixed in Scala 2.11.6, SI-7943). Since all the projects we work on require cross-building for both Scala 2.10 and Scala 2.11, scala.collection.concurrent.Map should be avoided.
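
As a minimal sketch of the atomic alternative, the snippet below builds a get-or-create helper on ConcurrentHashMap.putIfAbsent, which works on every Scala version mentioned above. The SessionCache object, the getOrCreate method, and the String value type are hypothetical names chosen for illustration; they do not come from the guide.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical cache used only for illustration.
object SessionCache {
  private[this] val cache = new ConcurrentHashMap[String, String]

  /** Returns the value stored for `key`, publishing the result of `create` if none exists yet. */
  def getOrCreate(key: String)(create: => String): String = {
    val existing = cache.get(key)
    if (existing != null) {
      existing
    } else {
      val candidate = create
      // putIfAbsent is atomic: it returns null if our candidate was stored,
      // otherwise the value that another thread stored first.
      val previous = cache.putIfAbsent(key, candidate)
      if (previous == null) candidate else previous
    }
  }
}
```

Under contention `create` may be evaluated by more than one thread, but only one result is ever published and every caller sees that same value. If the computation is expensive or has side effects, a different design is needed.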

Explicit Synchronization vs Concurrent Collections

There are 3 recommended ways to make concurrent accesses to shared states safe. Do NOT mix them because that could make the program very hard to reason about and lead to deadlocks.

  1. java.util.concurrent.ConcurrentHashMap: Use when all state is captured in a map and a high degree of contention is expected.
private[this] val map = new java.util.concurrent.ConcurrentHashMap[String, String]
  2. java.util.Collections.synchronizedMap: Use when all state is captured in a map and contention is not expected, but you still want to make the code safe. When there is no contention, the JVM JIT compiler is able to remove the synchronization overhead via biased locking (note that HotSpot has disabled biased locking by default since JDK 15).
private[this] val map = java.util.Collections.synchronizedMap(new java.util.HashMap[String, String])
  3. Explicit synchronization by synchronizing all critical sections: can be used to guard multiple variables. As with option 2, the JVM JIT compiler can remove the synchronization overhead via biased locking where it is available.
class Manager {
  private[this] var count = 0
  private[this] val map = new java.util.HashMap[String, String]
  def update(key: String, value: String): Unit = synchronized {
    map.put(key, value)
    count += 1
  }
  def getCount: Int = synchronized { count }
}

Note that for case 1 and case 2, you must not let views or iterators of the collections escape the protected area. This can happen in non-obvious ways, e.g. when returning Map.keySet or Map.values. If views or values need to be passed around, make a copy of the data.

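import scala.collection.JavaConverters._

val map = java.util.Collections.synchronizedMap(new java.util.HashMap[String, String])

// This is broken! The result is a live view of the map that callers iterate outside the lock.
def values: Iterable[String] = map.values.asScala

// Instead, copy the elements while holding the map's own lock
def values: Iterable[String] = map.synchronized { map.values.asScala.toList }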

Explicit Synchronization vs Atomic Variables vs @volatile

The java.util.concurrent.atomic package provides primitives for lock-free access to primitive types, such as AtomicBoolean, AtomicInteger, and AtomicReference.

Always prefer Atomic variables over @volatile. They have a strict superset of the functionality and are more visible in code. Atomic variables are implemented using @volatile under the hood.
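
To illustrate the "superset" point: a @volatile var guarantees visibility, but `count += 1` on it is still a separate read and write, so concurrent increments can be lost. An atomic variable adds a single-step read-modify-write. The sketch below assumes a hypothetical HitCounter object; the names are not from the guide.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical counter used only for illustration.
object HitCounter {
  private[this] val hits = new AtomicInteger(0)

  /** Atomically adds one and returns the updated total. */
  def record(): Int = hits.incrementAndGet()

  /** Reads the latest value written by any thread. */
  def current: Int = hits.get()
}
```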

Prefer Atomic variables over explicit synchronization when: (1) all critical updates for an object are confined to a single variable and contention is expected. Atomic variables are lock-free and permit more efficient contention. Or (2) synchronization is clearly expressed as a getAndSet operation. For example:

import java.util.concurrent.atomic.AtomicBoolean

// good: clearly and efficiently express only-once execution of concurrent code
val initialized = new AtomicBoolean(false)
...
if (!initialized.getAndSet(true)) {
  ...
}

// poor: less clear what is guarded by synchronization, may unnecessarily synchronize
var initialized = false  // must be a var, since it is reassigned below
...
var wasInitialized = false
synchronized {
  wasInitialized = initialized
  initialized = true
}
if (!wasInitialized) {
  ...
}
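
The getAndSet pattern can also be packaged as a small reusable helper. This is only a sketch: the OnceOnly class and its apply method are hypothetical and not part of the guide.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical helper used only for illustration.
class OnceOnly {
  private[this] val initialized = new AtomicBoolean(false)

  /** Runs `body` for the first caller only; returns true if this call ran it. */
  def apply(body: => Unit): Boolean = {
    // getAndSet returns the previous value, so exactly one caller observes `false`.
    if (!initialized.getAndSet(true)) {
      body
      true
    } else {
      false
    }
  }
}
```

A caller would write `val once = new OnceOnly` and then `once { setup() }`, where `setup` stands for whatever must run a single time. Note that later callers return immediately and do not wait for the first caller's body to finish; if they must wait, a lock or a lazy val is a better fit.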

Private Fields

Note that private fields are still accessible by other instances of the same class, so protecting them with this.synchronized (or just synchronized) is not technically sufficient. Make the field private[this] instead.

// The following is still unsafe.
class Foo {
  private var count: Int = 0
  def inc(): Unit = synchronized { count += 1 }
}

// The following is safe.
class Foo {
  private[this] var count: Int = 0
  def inc(): Unit = synchronized { count += 1 }
}
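
A short sketch of how the leak can happen in practice, using a hypothetical Account class that is not from the guide: a method that takes another instance can read that instance's private field while holding only its own lock.

```scala
// Hypothetical class used only for illustration.
class Account {
  private var balance: Int = 0

  def deposit(amount: Int): Unit = synchronized { balance += amount }

  // Compiles because `private` is class-scoped. `other.balance` is read while
  // holding this instance's lock, not `other`'s lock, so the read is unguarded.
  def hasMoreThan(other: Account): Boolean = synchronized { balance > other.balance }
}
```

If `balance` is declared private[this], the expression `other.balance` no longer compiles, so this kind of unguarded access is rejected at compile time instead of surfacing as a race.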

Isolation

In general, concurrency and synchronization logic should be isolated and contained as much as possible. This effectively means:

  • Avoid surfacing the internals of synchronization primitives in APIs, user-facing methods, and callbacks.
  • For complex modules, create a small, inner module that captures the concurrency primitives.
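
One way to read the two points above is sketched below, assuming a hypothetical RequestStats class: all concurrency primitives live in a small private inner object, and the public methods expose only plain values. The names and the choice of ConcurrentHashMap plus AtomicLong are illustrative assumptions.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical module used only for illustration.
class RequestStats {
  // Small inner module: the only code that touches concurrency primitives.
  private[this] object Counters {
    private[this] val byEndpoint = new ConcurrentHashMap[String, AtomicLong]

    def increment(endpoint: String): Long = {
      val fresh = new AtomicLong(0L)
      // putIfAbsent returns null if `fresh` was stored, else the existing counter.
      val existing = byEndpoint.putIfAbsent(endpoint, fresh)
      (if (existing == null) fresh else existing).incrementAndGet()
    }

    def read(endpoint: String): Long = {
      val counter = byEndpoint.get(endpoint)
      if (counter == null) 0L else counter.get()
    }
  }

  // The public API returns plain Longs; no map, lock, or atomic escapes.
  def record(endpoint: String): Long = Counters.increment(endpoint)
  def count(endpoint: String): Long = Counters.read(endpoint)
}
```

Because callers never see the map or the AtomicLong instances, the synchronization strategy can later be changed without touching the API.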