Set up Spark 3.2.0 on macOS (Catalina)


Installation package requirements:

Official requirements from Spark:

  • Spark runs on both Windows and UNIX-like systems (e.g. Linux, macOS).
  • Spark runs on Java 8/11.
  • Java 8 (JDK 1.8)
  • Spark version 3.2.0 => spark-3.2.0-bin-hadoop3.2
  • Scala version 2.12+ => scala-2.13.0
  • Python version 3.6+ => Python 3.9
  • Python 3.6 support is deprecated as of Spark 3.2.0.
  • Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0.

Step 1 : Install Homebrew from the brew.sh website.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
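
To confirm Homebrew is available on the PATH, run a quick check:

>> brew --version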

Step 2 : Set up the system to run multiple versions of Java on macOS with jenv.

For more details, you can refer to the links below.

https://akrabat.com/using-jenv-to-select-java-version-on-macos/
https://mungingdata.com/java/jenv-multiple-versions-java/

Install jenv with a Homebrew command: 

>> brew install jenv

Install Java 8

>> brew install --cask adoptopenjdk8

List all the Java environments

>> ls /Library/Java/JavaVirtualMachines/

Check if Java 8 is installed

>> ls /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home

Add Java to Environment

>> jenv add /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home 
Check the Java environments and confirm the new environment has been added.

>> jenv versions

Then create and update the zsh (or bash) system environment.

>> touch ~/.zprofile
>> open ~/.zprofile

Update it with the lines below to add the jEnv configuration:

# jEnv Configurations
export PATH="$HOME/.jenv/bin:$PATH"
eval "$(jenv init -)"
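
Once the JDK is registered, you can make Java 8 the default and verify it. The version alias (1.8 here) is an assumption; use whatever name `jenv versions` actually lists on your machine.

>> jenv global 1.8
>> java -version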

Step 3 : Set up Scala 2.13.0

Normally I put all my big data development related software in a specific directory.

The usual location is: /Users/surarajpradhan/bigdata_projects/softwares/

So download the Scala binaries for macOS from this link and extract the archive so that Scala 2.13.0 ends up in /Users/surarajpradhan/bigdata_projects/softwares/scala-2.13.0.

>> sudo curl -O https://downloads.lightbend.com/scala/2.13.0/scala-2.13.0.tgz && sudo tar -xzf scala-2.13.0.tgz -C /Users/surarajpradhan/bigdata_projects/softwares/

Finally, update the PATH environment variable with SCALA_HOME in .zprofile.

>> open ~/.zprofile

Update it with the lines below:

export SCALA_HOME=/Users/surarajpradhan/bigdata_projects/softwares/scala-2.13.0
export PATH=$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
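
After reloading the profile, a quick check confirms Scala is on the PATH:

>> source ~/.zprofile
>> scala -version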

Step 4 : Set up Python 3.9

Normally, Python 3.9 is installed automatically while installing Homebrew.

To check whether Python 3.9 is installed, execute the command below:

>> brew list

Then update ~/.zprofile with the lines below:

# Setting PATH for Python 3.9
# The original version is saved in .profile.pysave
PATH="/usr/local/Frameworks/Python.framework/Versions/3.9/bin:${PATH}"
export PATH
export PYSPARK_PYTHON=python3.9
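
A quick check confirms that the interpreter referenced by PYSPARK_PYTHON resolves correctly:

>> which python3.9
>> python3.9 --version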

Step 5 : Set up Spark 3.2.0
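
This step assumes spark-3.2.0-bin-hadoop3.2 has already been downloaded and extracted into the softwares directory. If not, a download step similar to the Scala one should work; the Apache archive URL below follows the standard release naming and is given as an assumption:

>> sudo curl -O https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz && sudo tar -xzf spark-3.2.0-bin-hadoop3.2.tgz -C /Users/surarajpradhan/bigdata_projects/softwares/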

Finally, update the PATH environment variable with SPARK_HOME and the other variables in .zprofile.

>> open ~/.zprofile

Update it with the lines below:

export SPARK_HOME=/Users/surarajpradhan/bigdata_projects/softwares/spark-3.2.0-bin-hadoop3.2
export PATH=$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PYSPARK_PYTHON=python3.9

Set the local IP addresses in the environment:

export SPARK_LOCAL_IP=localhost
export SPARK_MASTER_HOST=localhost
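
After reloading the profile, a version check confirms SPARK_HOME and the PATH entries are picked up:

>> source ~/.zprofile
>> spark-submit --version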

After all this configuration, Spark 3.2.0 should be ready to access and execute code.

>> pyspark
>> spark-shell
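
As a final smoke test, the SparkPi example bundled with the distribution can be run via the run-example script in $SPARK_HOME/bin, which is already on the PATH:

>> run-example SparkPi 10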

If you have any difficulties, please let me know and I will try to help you set up the environment.
