So we should preserve it. I don’t think that digital storage is necessarily a good thing, but I definitely think that digital manipulation is interesting.
-Sean Booth
This Scala best-practice guide is inspired by the Databricks Scala Guide. It covers most day-to-day programming best practices. I will keep updating this space from time to time. Happy coding.
Syntactic Style
Naming Convention
We mostly follow Java’s and Scala’s standard naming conventions.
- Classes, traits, objects should follow Java class convention, i.e. PascalCase style.
    class ClusterManager
    trait Expression
- Packages should follow Java package naming conventions, i.e. all-lowercase ASCII letters.
    package com.databricks.resourcemanager
- Methods/functions should be named in camelCase style.
- Constants should be all uppercase letters and be put in a companion object.
    object Configuration {
      val DEFAULT_PORT = 10000
    }
- An enumeration class or object which extends the Enumeration class shall follow the convention for classes and objects, i.e. its name should be in PascalCase style. Enumeration values shall be in upper case with words separated by the underscore character (_). For example:
    private object ParseState extends Enumeration {
      type ParseState = Value

      val PREFIX,
          TRIM_BEFORE_SIGN,
          SIGN,
          TRIM_BEFORE_VALUE,
          VALUE,
          VALUE_FRACTIONAL_PART,
          TRIM_BEFORE_UNIT,
          UNIT_BEGIN,
          UNIT_SUFFIX,
          UNIT_END = Value
    }
- Annotations should also follow Java convention, i.e. PascalCase. Note that this differs from Scala’s official guide.
    final class MyAnnotation extends StaticAnnotation
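The naming rules above can be seen side by side in one small sketch. Every identifier here (NamingDemo, Expression.prettyPrint, IntLiteral, LogLevel) is hypothetical and chosen only for illustration; Configuration and DEFAULT_PORT mirror the constant example in the list.

```scala
object NamingDemo {
  // Traits and classes: PascalCase.
  trait Expression {
    // Methods: camelCase.
    def prettyPrint: String
  }

  final case class IntLiteral(value: Int) extends Expression {
    def prettyPrint: String = value.toString
  }

  // Constants: all uppercase, placed in the companion object.
  object Configuration {
    val DEFAULT_PORT = 10000
  }

  class Configuration(val port: Int = Configuration.DEFAULT_PORT)

  // Enumeration object: PascalCase name, UPPER_CASE values.
  object LogLevel extends Enumeration {
    type LogLevel = Value

    val INFO, WARN_ONLY = Value
  }
}
```

Keeping the constant in the companion object lets the class refer to it as Configuration.DEFAULT_PORT without needing an instance.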
Variable Naming Convention
- Variables should be named in camelCase style, and should have self-evident names.
    val serverPort = 1000
    val clientPort = 2000
- It is OK to use one-character variable names in small, localized scope. For example, “i” is commonly used as the loop index for a small loop body (e.g. 10 lines of code). However, do NOT use “l” (as in Larry) as the identifier, because it is difficult to differentiate “l” from “1”, “|”, and “I”.
Line Length
- Limit lines to 100 characters.
- The only exceptions are import statements and URLs (although even for those, try to keep them under 100 chars).
Rule of 30
“If an element consists of more than 30 subelements, it is highly probable that there is a serious problem” – Refactoring in Large Software Projects.
In general:
- A method should contain fewer than 30 lines of code.
- A class should contain fewer than 30 methods.
Spacing and Indentation
- Put one space before and after operators, including the assignment operator.
    def add(int1: Int, int2: Int): Int = int1 + int2
- Put one space after commas.
    Seq("a", "b", "c") // do this

    Seq("a","b","c") // don't omit spaces after commas
- Put one space after colons.
    // do this
    def getConf(key: String, defaultValue: String): String = {
      // some code
    }

    // don't put spaces before colons
    def calculateHeaderPortionInBytes(count: Int) : Int = {
      // some code
    }

    // don't omit spaces after colons
    def multiply(int1:Int, int2:Int): Int = int1 * int2
- Use 2-space indentation in general.
    if (true) {
      println("Wow!")
    }
- For method declarations, use 4 space indentation for their parameters and put each parameter on its own line when the parameters don’t fit in two lines. Return types can be either on the same line as the last parameter, or start a new line with 2 space indent.
    def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
        path: String,
        fClass: Class[F],
        kClass: Class[K],
        vClass: Class[V],
        conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
      // method body
    }

    def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
        path: String,
        fClass: Class[F],
        kClass: Class[K],
        vClass: Class[V],
        conf: Configuration = hadoopConfiguration)
      : RDD[(K, V)] = {
      // method body
    }
- For classes whose header doesn’t fit in two lines, use 4 space indentation for their parameters, put each parameter on its own line, put the extends on the next line with 2 space indent, and add a blank line after the class header.
    class Foo(
        val param1: String,  // 4 space indent for parameters
        val param2: String,
        val param3: Array[Byte])
      extends FooInterface  // 2 space indent here
      with Logging {

      def firstMethod(): Unit = { ... }  // blank line above
    }
- For method and class constructor invocations, use 2 space indentation for their parameters and put each parameter on its own line when the parameters don’t fit in two lines.
    foo(
      someVeryLongFieldName,  // 2 space indent here
      andAnotherVeryLongFieldName,
      "this is a string",
      3.1415)

    new Bar(
      someVeryLongFieldName,  // 2 space indent here
      andAnotherVeryLongFieldName,
      "this is a string",
      3.1415)
- Do NOT use vertical alignment. It draws attention to the wrong parts of the code and makes the aligned code harder to change in the future.
    // Don't align vertically
    val plus     = "+"
    val minus    = "-"
    val multiply = "*"

    // Do the following
    val plus = "+"
    val minus = "-"
    val multiply = "*"
Blank Lines (Vertical Whitespace)
- A single blank line appears:
  - Between consecutive members (or initializers) of a class: fields, constructors, methods, nested classes, static initializers, instance initializers.
    - Exception: A blank line between two consecutive fields (having no other code between them) is optional. Such blank lines are used as needed to create logical groupings of fields.
  - Within method bodies, as needed to create logical groupings of statements.
  - Optionally before the first member or after the last member of the class (neither encouraged nor discouraged).
- Use one or two blank line(s) to separate class or object definitions.
- An excessive number of blank lines is discouraged.
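As a rough illustration of the blank-line rules above, the following sketch uses a hypothetical Counter class (not taken from the guide): two related fields are grouped without a blank line, members are separated by a single blank line, and a blank line inside a method body separates validation from the state update.

```scala
object BlankLineDemo {
  class Counter {
    // Related fields grouped together; the blank line between them is optional.
    private var count = 0
    private var lastDelta = 0

    def increment(delta: Int): Int = {
      require(delta >= 0, "delta must be non-negative")

      // Blank line above separates validation from the state update.
      lastDelta = delta
      count += delta
      count
    }

    def last: Int = lastDelta

    def reset(): Unit = {
      count = 0
      lastDelta = 0
    }
  }
}
```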
Parentheses
- Methods should be declared with parentheses, unless they are accessors that have no side-effect (state mutation, I/O operations are considered side-effects).
    class Job {
      // Wrong: killJob changes state. Should have ().
      def killJob: Unit

      // Correct:
      def killJob(): Unit
    }
- Callsite should follow method declaration, i.e. if a method is declared with parentheses, call with parentheses. Note that this is not just syntactic. It can affect correctness when apply is defined in the return object.
    class Foo {
      def apply(args: String*): Int
    }

    class Bar {
      def foo: Foo
    }

    new Bar().foo  // This returns a Foo
    new Bar().foo()  // This returns an Int!
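To try the apply pitfall directly, here is a minimal runnable sketch of the same Foo and Bar. The method bodies are assumptions added for illustration (apply simply returns the number of arguments it receives); the guide only gives abstract declarations.

```scala
object ParenthesesDemo {
  class Foo {
    // Assumed body: returns how many arguments were passed.
    def apply(args: String*): Int = args.length
  }

  class Bar {
    private val underlying = new Foo

    // Side-effect-free accessor, so it is declared without parentheses.
    def foo: Foo = underlying
  }

  def main(args: Array[String]): Unit = {
    val bar = new Bar

    val asFoo: Foo = bar.foo    // the Foo itself
    val asInt: Int = bar.foo()  // Foo.apply() called with zero arguments

    println(asFoo.isInstanceOf[Foo])
    println(asInt)
  }
}
```

Because foo takes no parameter list, the trailing () is not passed to foo; it is applied to the Foo that foo returns, which is why the two call sites have different types.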
Curly Braces
Put curly braces even around one-line conditional or loop statements. The only exception is if you are using if/else as a one-line ternary operator that is also side-effect free.
    // Correct:
    if (true) {
      println("Wow!")
    }

    // Correct:
    if (true) statement1 else statement2

    // Correct:
    try {
      foo()
    } catch {
      ...
    }

    // Wrong:
    if (true)
      println("Wow!")

    // Wrong:
    try foo() catch {
      ...
    }
Long Literals
Suffix long literal values with uppercase L. It is often hard to differentiate lowercase l from 1.
    val longValue = 5432L  // Do this

    val longValue = 5432l  // Do NOT do this
Documentation Style
Use Java docs style instead of Scala docs style.
    /** This is a correct one-liner, short description. */

    /**
     * This is correct multi-line JavaDoc comment. And
     * this is my second line, and if I keep typing, this would be
     * my third line.
     */

    /** In Spark, we don't use the ScalaDoc style so this
      * is not correct.
      */
Ordering within a Class
If a class is long and has many methods, group them logically into different sections, and use comment headers to organize them.
    class DataFrame {

      ///////////////////////////////////////////////////////////////////////////
      // DataFrame operations
      ///////////////////////////////////////////////////////////////////////////

      ...

      ///////////////////////////////////////////////////////////////////////////
      // RDD operations
      ///////////////////////////////////////////////////////////////////////////

      ...
    }
Of course, the situation in which a class grows this long is strongly discouraged, and is generally reserved only for building certain public APIs.
Imports
- Avoid using wildcard imports, unless you are importing more than 6 entities, or implicit methods. Wildcard imports make the code less robust to external changes.
- Always import packages using absolute paths (e.g. scala.util.Random) instead of relative ones (e.g. util.Random).
- In addition, sort imports in the following order:
  - java.* and javax.*
  - scala.*
  - Third-party libraries (org.*, com.*, etc.)
  - Project classes (com.databricks.* or org.apache.spark if you are working on Spark)
- Within each group, imports should be sorted in alphabetic ordering.
- You can use IntelliJ’s import organizer to handle this automatically, using the following config:
    java
    javax
    _______ blank line _______
    scala
    _______ blank line _______
    all other imports
    _______ blank line _______
    com.databricks  // or org.apache.spark if you are working on Spark
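A small sketch of what the resulting import block might look like in a file. Only the java and scala groups are shown as real imports so the example compiles without extra dependencies; the third-party and project groups are indicated in a comment. ImportOrderDemo and its two helper methods are hypothetical and exist only so that every import is used.

```scala
import java.nio.charset.StandardCharsets
import java.util.concurrent.TimeUnit

import scala.collection.mutable
import scala.util.Random

// Third-party imports (org.*, com.*, etc.) would form the next group and
// project imports (e.g. com.databricks.*) the last one, each separated by
// a blank line and sorted alphabetically within the group.

object ImportOrderDemo {
  // Converts each timeout from seconds to milliseconds.
  def timeoutsInMillis(seconds: Seq[Long]): Seq[Long] = {
    val result = mutable.ArrayBuffer.empty[Long]
    seconds.foreach { s =>
      result += TimeUnit.SECONDS.toMillis(s)
    }
    result.toSeq
  }

  // Builds an 8-character alphanumeric token and returns its UTF-8 bytes.
  def randomToken(seed: Long): Array[Byte] = {
    new Random(seed).alphanumeric.take(8).mkString.getBytes(StandardCharsets.UTF_8)
  }
}
```

Note that every import uses an absolute path and none uses a wildcard, in line with the first two rules of this section.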
Pattern Matching
- For a method whose entire body is a pattern match expression, put the match on the same line as the method declaration if possible to reduce one level of indentation.
    def test(msg: Message): Unit = msg match {
      case ...
    }
- When calling a function with a closure (or partial function), if there is only one case, put the case on the same line as the function invocation.
    list.zipWithIndex.map { case (elem, i) =>
      // ...
    }
  If there are multiple cases, indent and wrap them.
    list.map {
      case a: Foo => ...
      case b: Bar => ...
    }
- If the only goal is to match on the type of the object, do NOT expand fully all the arguments, as it makes refactoring more difficult and the code more error-prone.
    case class Pokemon(name: String, weight: Int, hp: Int, attack: Int, defense: Int)
    case class Human(name: String, hp: Int)

    // Do NOT do the following, because
    // 1. When a new field is added to Pokemon, we need to change this pattern matching as well
    // 2. It is easy to mismatch the arguments, especially for the ones that have the same data types
    targets.foreach {
      case target @ Pokemon(_, _, hp, _, defense) =>
        val loss = math.max(0, myAttack - defense)
        target.copy(hp = hp - loss)
      case target @ Human(_, hp) =>
        target.copy(hp = hp - myAttack)
    }

    // Do this:
    targets.foreach {
      case target: Pokemon =>
        val loss = math.max(0, myAttack - target.defense)
        target.copy(hp = target.hp - loss)
      case target: Human =>
        target.copy(hp = target.hp - myAttack)
    }
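A self-contained variant of the type-only match, for readers who want to run it. The sealed trait Target and the hit helper are hypothetical additions, and the damage formula (attack minus defense, floored at zero with math.max) is an assumption about the intent of the example above.

```scala
object TypeMatchDemo {
  sealed trait Target
  case class Pokemon(name: String, weight: Int, hp: Int, attack: Int, defense: Int) extends Target
  case class Human(name: String, hp: Int) extends Target

  // Hypothetical helper: returns a copy of the target after an attack.
  // Matching on the type only means new fields do not break this code.
  def hit(target: Target, myAttack: Int): Target = target match {
    case p: Pokemon =>
      val loss = math.max(0, myAttack - p.defense)
      p.copy(hp = p.hp - loss)
    case h: Human =>
      h.copy(hp = h.hp - myAttack)
  }

  def main(args: Array[String]): Unit = {
    val targets = Seq(Pokemon("Pikachu", 6, 35, 55, 40), Human("Ash", 100))
    targets.map { t => hit(t, 50) }.foreach(println)
  }
}
```

Returning the copied value from hit, rather than calling copy inside foreach, keeps the updated targets instead of discarding them.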
Infix Methods
Avoid infix notation for methods that aren’t symbolic methods (i.e. operator overloading).
    // Correct
    list.map(func)
    string.contains("foo")

    // Wrong
    list map (func)
    string contains "foo"

    // But overloaded operators should be invoked in infix style
    arrayBuffer += elem
Anonymous Methods
Avoid excessive parentheses and curly braces for anonymous methods.
    // Correct
    list.map { item =>
      ...
    }

    // Correct
    list.map(item => ...)

    // Wrong
    list.map(item => {
      ...
    })

    // Wrong
    list.map { item => {
      ...
    }}

    // Wrong
    list.map({ item => ... })