This guide to Scala best practices is inspired by the Databricks Scala Guide. It covers most day-to-day programming best practices. I will keep updating this space from time to time. Happy coding.
Concurrency
Scala concurrent.Map
Prefer java.util.concurrent.ConcurrentHashMap over scala.collection.concurrent.Map. In particular, the getOrElseUpdate method in scala.collection.concurrent.Map is not atomic (fixed in Scala 2.11.6, SI-7943). Since all the projects we work on require cross-building for both Scala 2.10 and Scala 2.11, scala.collection.concurrent.Map should be avoided.
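As a rough illustration of the alternative, an atomic "get or create" can be built on ConcurrentHashMap.putIfAbsent. This is only a sketch: the Registry object and getOrCreate method are hypothetical names, not part of the original guide. It avoids Java 8 lambdas so that it also compiles on Scala 2.10 and 2.11.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical example object; the name and API are illustrative only.
object Registry {
  private[this] val cache = new ConcurrentHashMap[String, String]

  // Returns the value stored for `key`, inserting `default` if none exists.
  // `default` may be evaluated by several racing threads, but putIfAbsent
  // guarantees that exactly one value is stored and returned to all callers.
  def getOrCreate(key: String, default: => String): String = {
    val existing = cache.get(key)
    if (existing != null) existing
    else {
      val candidate = default
      val previous = cache.putIfAbsent(key, candidate)
      if (previous != null) previous else candidate
    }
  }
}
```

If the default value is expensive to compute or has side effects, this approach may not be appropriate, because losing threads still evaluate it.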
Explicit Synchronization vs Concurrent Collections
There are 3 recommended ways to make concurrent accesses to shared state safe. Do NOT mix them, because that could make the program very hard to reason about and lead to deadlocks.
1. java.util.concurrent.ConcurrentHashMap: Use when all states are captured in a map, and a high degree of contention is expected.

    private[this] val map = new java.util.concurrent.ConcurrentHashMap[String, String]

2. java.util.Collections.synchronizedMap: Use when all states are captured in a map, and contention is not expected but you still want to make the code safe. In case of no contention, the JVM JIT compiler is able to remove the synchronization overhead via biased locking.

    private[this] val map = java.util.Collections.synchronizedMap(new java.util.HashMap[String, String])

3. Explicit synchronization by synchronizing all critical sections: can be used to guard multiple variables. Similar to 2, the JVM JIT compiler can remove the synchronization overhead via biased locking.

    class Manager {
      private[this] var count = 0
      private[this] val map = new java.util.HashMap[String, String]

      def update(key: String, value: String): Unit = synchronized {
        map.put(key, value)
        count += 1
      }

      def getCount: Int = synchronized { count }
    }
Note that for case 1 and case 2, do not let views or iterators of the collections escape the protected area. This can happen in non-obvious ways, e.g. when returning Map.keySet or Map.values. If views or values need to be passed around, make a copy of the data.
    val map = java.util.Collections.synchronizedMap(new java.util.HashMap[String, String])

    // This is broken! It hands out a live view backed by the map.
    def valuesView: java.util.Collection[String] = map.values

    // Instead, copy the elements while holding the map's lock.
    def values: Seq[String] = map.synchronized {
      map.values.toArray(new Array[String](0)).toSeq
    }
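To make the difference concrete, here is a small sketch that contrasts a leaked view with a copy. The ViewVsCopy object and its method names are hypothetical and exist only for illustration. The point is that the view keeps changing as the map is mutated, while the copy does not.

```scala
import java.util.{Collections, HashMap => JHashMap}

// Hypothetical example object; names are illustrative only.
object ViewVsCopy {
  private[this] val map = Collections.synchronizedMap(new JHashMap[String, String])

  def put(key: String, value: String): Unit = map.put(key, value)

  // Leaks a live view: callers observe later mutations and may iterate
  // over it while another thread is modifying the map.
  def liveView: java.util.Collection[String] = map.values

  // Copies the values into a fresh array while holding the map's own lock,
  // so the result is unaffected by later writes to the map.
  def snapshot: Seq[String] = map.synchronized {
    map.values.toArray(new Array[String](0)).toSeq
  }
}
```

Calling put after obtaining both results changes the size of liveView but not of snapshot, which is why only the copy is safe to pass around.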
Explicit Synchronization vs Atomic Variables vs @volatile
The java.util.concurrent.atomic package provides primitives for lock-free access to primitive types, such as AtomicBoolean, AtomicInteger, and AtomicReference.
Always prefer Atomic variables over @volatile. They have a strict superset of the functionality and are more visible in code. Atomic variables are implemented using @volatile under the hood.
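One thing Atomic variables offer beyond @volatile is an atomic read-modify-write via compareAndSet. The sketch below uses a hypothetical HighWaterMark class (not from the original guide) that tracks a running maximum without locks. A plain @volatile field could not do this safely, because the read and the write would be two separate steps.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

// Hypothetical example class; the name and API are illustrative only.
class HighWaterMark {
  private[this] val max = new AtomicInteger(Int.MinValue)

  // Records `value` and returns true if it became the new maximum.
  @tailrec
  final def offer(value: Int): Boolean = {
    val current = max.get()
    if (value <= current) false
    else if (max.compareAndSet(current, value)) true
    else offer(value) // another thread updated `max` first; re-read and retry
  }

  def get: Int = max.get()
}
```

The retry loop only repeats when another thread changed the value between the read and the compareAndSet, so no update is lost.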
Prefer Atomic variables over explicit synchronization when: (1) all critical updates for an object are confined to a single variable and contention is expected. Atomic variables are lock-free and handle contention more efficiently. Or (2) synchronization is clearly expressed as a getAndSet operation. For example:
    import java.util.concurrent.atomic.AtomicBoolean

    // good: clearly and efficiently express only-once execution of concurrent code
    val initialized = new AtomicBoolean(false)
    ...
    if (!initialized.getAndSet(true)) {
      ...
    }

    // poor: less clear what is guarded by synchronization, may unnecessarily synchronize
    var initialized = false // must be a var, since it is reassigned below
    ...
    var wasInitialized = false
    synchronized {
      wasInitialized = initialized
      initialized = true
    }
    if (!wasInitialized) {
      ...
    }
Private Fields
Note that private fields are still accessible by other instances of the same class, so protecting them with this.synchronized (or just synchronized) is not technically sufficient. Make the field private[this] instead.
    // The following is still unsafe.
    class Foo {
      private var count: Int = 0
      def inc(): Unit = synchronized { count += 1 }
    }

    // The following is safe.
    class Foo {
      private[this] var count: Int = 0
      def inc(): Unit = synchronized { count += 1 }
    }
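To see why plain private is not enough, consider the following sketch. The Tally class and its stealFrom method are hypothetical, written only to show how one instance can reach into another instance's field without taking that instance's lock.

```scala
// Hypothetical example class; the name and API are illustrative only.
class Tally {
  private var count: Int = 0

  def inc(): Unit = synchronized { count += 1 }
  def get: Int = synchronized { count }

  // This compiles because `private` is scoped to the class, not the instance.
  // It reads and writes other.count without holding the lock on `other`,
  // so it can race with other.inc() running on another thread.
  def stealFrom(other: Tally): Int = {
    val taken = other.count
    other.count = 0
    taken
  }
}
```

If count were declared private[this], the references to other.count would be rejected by the compiler, which is what makes the synchronized guard sufficient.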
Isolation
In general, concurrency and synchronization logic should be isolated and contained as much as possible. This effectively means:
- Avoid surfacing the internals of synchronization primitives in APIs, user-facing methods, and callbacks.
- For complex modules, create a small, inner module that captures the concurrency primitives.
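One possible shape for such an inner module is sketched below. The SessionStore class, its State object, and the method names are all hypothetical. The idea is that the lock and the mutable map live in one small private object, and the public API exposes only plain values.

```scala
import scala.collection.mutable

// Hypothetical example class; the name and API are illustrative only.
class SessionStore {
  // Small inner module: the only place that knows about locking.
  private[this] object State {
    private[this] val lastSeen = mutable.HashMap.empty[String, Long]

    def touch(id: String, now: Long): Unit = synchronized { lastSeen(id) = now }

    // Returns an immutable copy, so no mutable state escapes the lock.
    def snapshot: Map[String, Long] = synchronized { lastSeen.toMap }
  }

  // Public API: no locks, monitors, or mutable collections are exposed.
  def touch(id: String, now: Long): Unit = State.touch(id, now)
  def sessions: Map[String, Long] = State.snapshot
}
```

Callers never synchronize on anything themselves, so the locking strategy can later change inside State without affecting them.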