So we should preserve it. I don’t think that digital storage is necessarily a good thing, but I definitely think that digital manipulation is interesting.
-Sean Booth
This Scala best practice is inspired from Databricks Scala Guide. Most of day to day programming best practices are covered in this Scala best practice guide. I will keep updating this space from time to time. Happy coding.
Scala Language Features
Case Classes and Immutability
Case classes are regular classes but extended by the compiler to automatically support:
- Public getters for constructor parameters
- Copy constructor
- Pattern matching on constructor parameters
- Automatic toString/hash/equals implementation
Constructor parameters should NOT be mutable for case classes. Instead, use copy constructor. Having mutable case classes can be error prone, e.g. hash maps might place the object in the wrong bucket using the old hash code.
// This is OK case class Person(name: String, age: Int) // This is NOT OK case class Person(name: String, var age: Int) // To change values, use the copy constructor to create a new instance val p1 = Person("Peter", 15) val p2 = p1.copy(age = 16)
apply Method
Avoid defining apply methods on classes. These methods tend to make the code less readable, especially for people less familiar with Scala. It is also harder for IDEs (or grep) to trace. In the worst case, it can also affect correctness of the code in surprising ways, as demonstrated in Parentheses.
It is acceptable to define apply methods on companion objects as factory methods. In these cases, the apply method should return the companion class type.
object TreeNode { // This is OK def apply(name: String): TreeNode = ... // This is bad because it does not return a TreeNode def apply(name: String): String = ... }
override Modifier
Always add override modifier for methods, both for overriding concrete methods and implementing abstract methods. The Scala compiler does not require override
for implementing abstract methods. However, we should always add override
to make the override obvious, and to avoid accidental non-overrides due to non-matching signatures.
trait Parent { def hello(data: Map[String, String]): Unit = { print(data) } } class Child extends Parent { import scala.collection.Map // The following method does NOT override Parent.hello, // because the two Maps have different types. // If we added "override" modifier, the compiler would've caught it. def hello(data: Map[String, String]): Unit = { print("This is supposed to override the parent method, but it is actually not!") } }
Destructuring Binds
Destructuring bind (sometimes called tuple extraction) is a convenient way to assign two variables in one expression.
val (a, b) = (1, 2)
However, do NOT use them in constructors, especially when a
and b
need to be marked transient. The Scala compiler generates an extra Tuple2 field that will not be transient for the above example.
class MyClass { // This will NOT work because the compiler generates a non-transient Tuple2 // that points to both a and b. @transient private val (a, b) = someFuncThatReturnsTuple2() }
Call by Name
Avoid using call by name. Use () => T
explicitly.
Background: Scala allows method parameters to be defined by-name, e.g. the following would work:
def print(value: => Int): Unit = { println(value) println(value + 1) } var a = 0 def inc(): Int = { a += 1 a } print(inc())
in the above code, inc()
is passed into print
as a closure and is executed (twice) in the print method, rather than being passed in as a value 1
. The main problem with call-by-name is that the caller cannot differentiate between call-by-name and call-by-value, and thus cannot know for sure whether the expression will be executed or not (or maybe worse, multiple times). This is especially dangerous for expressions that have side-effect.
Multiple Parameter Lists
Avoid using multiple parameter lists. They complicate operator overloading, and can confuse programmers less familiar with Scala. For example:
// Avoid this! case class Person(name: String, age: Int)(secret: String)
One notable exception is the use of a 2nd parameter list for implicits when defining low-level libraries. That said, implicits should be avoided!
Symbolic Methods (Operator Overloading)
Do NOT use symbolic method names, unless you are defining them for natural arithmetic operations (e.g. +
, -
, *
, /
). Under no other circumstances should they be used. Symbolic method names make it very hard to understand the intent of the methods. Consider the following two examples:
// symbolic method names are hard to understand channel ! msg stream1 >>= stream2 // self-evident what is going on channel.send(msg) stream1.join(stream2)
Type Inference
Scala type inference, especially left-side type inference and closure inference, can make code more concise. That said, there are a few cases where explicit typing should be used:
- Public methods should be explicitly typed, otherwise the compiler’s inferred type can often surprise you.
- Implicit methods should be explicitly typed, otherwise it can crash the Scala compiler with incremental compilation.
- Variables or closures with non-obvious types should be explicitly typed. A good litmus test is that explicit types should be used if a code reviewer cannot determine the type in 3 seconds.
Return Statements
Avoid using return in closures. return
is turned into try/catch
of scala.runtime.NonLocalReturnControl
by the compiler. This can lead to unexpected behaviors. Consider the following example:
def receive(rpc: WebSocketRPC): Option[Response] = { tableFut.onComplete { table => if (table.isFailure) { return None // Do not do that! } else { ... } } }
the .onComplete
method takes the anonymous closure { table => ... }
and passes it to a different thread. This closure eventually throws the NonLocalReturnControl
exception that is captured in a different thread . It has no effect on the poor method being executed here.
However, there are a few cases where return
is preferred.
- Use
return
as a guard to simplify control flow without adding a level of indentationdef doSomething(obj: Any): Any = { if (obj eq null) { return null } // do something … } - Use
return
to terminate a loop early, rather than constructing status flagswhile (true) { if (cond) { return } }
Recursion and Tail Recursion
Avoid using recursion, unless the problem can be naturally framed recursively (e.g. graph traversal, tree traversal).
For methods that are meant to be tail recursive, apply @tailrec
annotation to make sure the compiler can check it is tail recursive. (You will be surprised how often seemingly tail recursive code is actually not tail recursive due to the use of closures and functional transformations.)
Most code is easier to reason about with a simple loop and explicit state machines. Expressing it with tail recursions (and accumulators) can make it more verbose and harder to understand. For example, the following imperative code is more readable than the tail recursive version:
// Tail recursive version. def max(data: Array[Int]): Int = { @tailrec def max0(data: Array[Int], pos: Int, max: Int): Int = { if (pos == data.length) { max } else { max0(data, pos + 1, if (data(pos) > max) data(pos) else max) } } max0(data, 0, Int.MinValue) } // Explicit loop version def max(data: Array[Int]): Int = { var max = Int.MinValue for (v <- data) { if (v > max) { max = v } } max }
Implicits
Avoid using implicits, unless:
- you are building a domain-specific language
- you are using it for implicit type parameters (e.g.
ClassTag
,TypeTag
) - you are using it private to your own class to reduce verbosity of converting from one type to another (e.g. Scala closure to Java closure)
When implicits are used, we must ensure that another engineer who did not author the code can understand the semantics of the usage without reading the implicit definition itself. Implicits have very complicated resolution rules and make the code base extremely difficult to understand. From Twitter’s Effective Scala guide: “If you do find yourself using implicits, always ask yourself if there is a way to achieve the same thing without their help.”
If you must use them (e.g. enriching some DSL), do not overload implicit methods, i.e. make sure each implicit method has distinct names, so users can selectively import them.
// Don't do the following, as users cannot selectively import only one of the methods. object ImplicitHolder { def toRdd(seq: Seq[Int]): RDD[Int] = ... def toRdd(seq: Seq[Long]): RDD[Long] = ... } // Do the following: object ImplicitHolder { def intSeqToRdd(seq: Seq[Int]): RDD[Int] = ... def longSeqToRdd(seq: Seq[Long]): RDD[Long] = ... }
Symbol Literals
Avoid using symbol literals. Symbol literals (e.g. 'column
) were deprecated as of Scala 2.13 by Proposal to deprecate and remove symbol literals. Apache Spark used to leverage this syntax to provide DSL; however, now it started to remove this deprecated usage away. See also SPARK-29392.
Exception Handling (Try vs try)
- Do NOT catch Throwable or Exception. Use
scala.util.control.NonFatal
:try { … } catch { case NonFatal(e) => // handle exception; note that NonFatal does not match InterruptedException case e: InterruptedException => // handle InterruptedException }This ensures that we do not catchNonLocalReturnControl
(as explained in Return Statements). - Do NOT use
Try
in APIs, that is, do NOT return Try in any methods. Instead, prefer explicitly throwing exceptions for abnormal execution and Java style try/catch for exception handling.Background information: Scala provides monadic error handling (throughTry
,Success
, andFailure
) that facilitates chaining of actions. However, we found from our experience that the use of it often leads to more levels of nesting that are harder to read. In addition, it is often unclear what the semantics are for expected errors vs exceptions because those are not encoded inTry
. As a result, we discourage the use ofTry
for error handling. In particular:As a contrived example:class UserService { /** Look up a user’s profile in the user database. */ def get(userId: Int): Try[User] }is better written asclass UserService { /** * Look up a user’s profile in the user database. * @return None if the user is not found. * @throws DatabaseConnectionException when we have trouble connecting to the database/ */ @throws(DatabaseConnectionException) def get(userId: Int): Option[User] }The 2nd one makes it very obvious error cases the caller needs to handle.
Options
- Use
Option
when the value can be empty. Compared withnull
, anOption
explicitly states in the API contract that the value can beNone
. - When constructing an
Option
, useOption
rather thanSome
to guard againstnull
values.def myMethod1(input: String): Option[String] = Option(transform(input)) // This is not as robust because transform can return null, and then // myMethod2 will return Some(null). def myMethod2(input: String): Option[String] = Some(transform(input)) - Do not use None to represent exceptions. Instead, throw exceptions explicitly.
- Do not call
get
directly on anOption
, unless you know absolutely for sure theOption
has some value.
Monadic Chaining
One of Scala’s powerful features is monadic chaining. Almost everything (e.g. collections, Option, Future, Try) is a monad and operations on them can be chained together. This is an incredibly powerful concept, but chaining should be used sparingly. In particular:
- Avoid chaining (and/or nesting) more than 3 operations.
- If it takes more than 5 seconds to figure out what the logic is, try hard to think about how you can express the same functionality without using monadic chaining. As a general rule, watch out for flatMaps and folds.
- A chain should almost always be broken after a flatMap (because of the type change).
A chain can often be made more understandable by giving the intermediate result a variable name, by explicitly typing the variable, and by breaking it down into more procedural style. As a contrived example:
class Person(val data: Map[String, String]) val database = Map[String, Person] // Sometimes the client can store "null" value in the store "address" // A monadic chaining approach def getAddress(name: String): Option[String] = { database.get(name).flatMap { elem => elem.data.get("address") .flatMap(Option.apply) // handle null value } } // A more readable approach, despite much longer def getAddress(name: String): Option[String] = { if (!database.contains(name)) { return None } database(name).data.get("address") match { case Some(null) => None // handle null value case Some(addr) => Option(addr) case None => None } }