Scala – Best Practices – Miscellaneous

So we should preserve it. I don’t think that digital storage is necessarily a good thing, but I definitely think that digital manipulation is interesting.
-Sean Booth

This Scala best practice is inspired from Databricks Scala Guide. Most of day to day programming best practices are covered in this Scala best practice guide. I will keep updating this space from time to time. Happy coding.

Miscellaneous

Miscellaneous

Prefer nanoTime over currentTimeMillis

When computing a duration or checking for a timeout, avoid using System.currentTimeMillis(). Use System.nanoTime() instead, even if you are not interested in sub-millisecond precision.

System.currentTimeMillis() returns current wallclock time and will follow changes to the system clock. Thus, negative wallclock adjustments can cause timeouts to “hang” for a long time (until wallclock time has caught up to its previous value again). This can happen when ntpd does a “step” after the network has been disconnected for some time. The most canonical example is during system bootup when DHCP takes longer than usual. This can lead to failures that are really hard to understand/reproduce. System.nanoTime() is guaranteed to be monotonically increasing irrespective of wallclock changes.

Caveats:

Never serialize an absolute nanoTime() value or pass it to another system. The absolute value is meaningless and system-specific and resets when the system reboots.
The absolute nanoTime() value is not guaranteed to be positive (but t2 - t1 is guaranteed to yield the right result)
nanoTime() rolls over every 292 years. So if your Spark job is going to take a really long time, you may need something else 🙂

Prefer URI over URL

When storing the URL of a service, you should use the URI representation.

The equality check of URL actually performs a (blocking) network call to resolve the IP address. The URI class performs field equality and is a superset of URL as to what it can represent.

Prefer existing well-tested methods over reinventing the wheel

When there is an existing well-tesed method and it doesn’t cause any performance issue, prefer to use it. Reimplementing such method may introduce bugs and requires spending time testing it (maybe we don’t even remember to test it!).

val beginNs = System.nanoTime()
// Do something
Thread.sleep(1000)
val elapsedNs = System.nanoTime() - beginNs

// This is WRONG. It uses magic numbers and is pretty easy to make mistakes
val elapsedMs = elapsedNs / 1000 / 1000

// Use the Java TimeUnit API. This is CORRECT
import java.util.concurrent.TimeUnit
val elapsedMs2 = TimeUnit.NANOSECONDS.toMillis(elapsedNs)

// Use the Scala Duration API. This is CORRECT
import scala.concurrent.duration._
val elapsedMs3 = elapsedNs.nanos.toMillis

Exceptions:

Using an existing well-tesed method requires adding a new dependency. If such method is pretty simple, reimplementing it is better than adding a dependency. But remember to test it.
The existing method is not optimized for our usage and is too slow. But benchmark it first, avoid premature optimization.