How to Deploy Spark Standalone in Oracle Cloud (OCI)

1 Introduction

The following walk-through guides you through the steps needed to set up your environment to run Spark and Hadoop in Oracle Cloud Infrastructure.

2 Prerequisites

You have deployed a VM (shape 2.1 or larger) running Oracle Linux 7.9 (OEL7) in Oracle Cloud Infrastructure (OCI).

  • You have access to root, either directly or via sudo. By default in OCI, you connect as the “opc” user, which has sudo privileges.
  • Oracle Linux 7.9 comes with a JVM installed by default, which you can check as follows:
    [opc@xxx ~]$ java -version
    java version "1.8.0_281"
    Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
    Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)

3 Java Installation

The installation is simple: set up Java, then install the Spark and Hadoop components and libraries. Let's start with Java.

Download the latest JDK 1.8 release, because Hadoop 2.x runs on this Java version, then install the RPM:

sudo rpm -ivh /home/opc/jdk-8u271-linux-x64.rpm

Check the Java version:

java -version
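
The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name java_major_minor is hypothetical, and it assumes the first line of the output has the form shown in the Prerequisites section (a quoted version string after the word "version").

```shell
# Hypothetical helper: extract the major.minor part (e.g. "1.8") from
# the first line of `java -version` output, passed in as $1.
java_major_minor() {
    # Capture the first two dot-separated numbers of the quoted version string.
    printf '%s\n' "$1" | sed -n 's/.*version "\([0-9]*\.[0-9]*\).*/\1/p'
}

# `java -version` writes to stderr, so redirect it before capturing.
# The guard keeps the snippet harmless on a machine without Java.
if command -v java >/dev/null 2>&1; then
    first_line=$(java -version 2>&1 | head -n 1)
    if [ "$(java_major_minor "$first_line")" = "1.8" ]; then
        echo "Java 1.8 detected"
    else
        echo "Unexpected Java version: $first_line"
    fi
fi
```

If the script reports an unexpected version, a different JVM is probably first on the PATH.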

4 Spark and Hadoop Setup

The next step is to install the Spark and Hadoop environment.

First, choose the version of Spark and Hadoop you want to install, then download it:

Download Spark 2.4.5 for Hadoop 2.7

cd /home/opc

Download Spark 2.4.7 for Hadoop 2.7


Download Spark 3.1.1 for Hadoop 3.2
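
As a sketch of how the download location can be derived from the chosen versions: the helper below (the name spark_url is hypothetical) builds a URL on the Apache archive server, which is an assumption on our part since the original download links are not shown here. It only prints the URLs and downloads nothing.

```shell
# Hypothetical helper: print the Apache archive URL of a Spark binary package.
# Arguments: Spark version (e.g. 2.4.7) and Hadoop profile (e.g. 2.7).
spark_url() {
    spark_version=$1
    hadoop_profile=$2
    echo "https://archive.apache.org/dist/spark/spark-${spark_version}/spark-${spark_version}-bin-hadoop${hadoop_profile}.tgz"
}

# Print the URLs for the three combinations listed above.
spark_url 2.4.5 2.7
spark_url 2.4.7 2.7
spark_url 3.1.1 3.2
```

Assuming wget is available on the VM, one of these URLs could then be fetched from /home/opc with a command such as wget "$(spark_url 2.4.7 2.7)".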


Install the Spark and Hadoop Version

Install the chosen Spark and Hadoop version in the “/opt” directory, extracting only the archive you downloaded.

sudo -i
cd /opt
tar -zxvf /home/opc/spark-2.4.5-bin-hadoop2.7.tgz
tar -zxvf /home/opc/spark-2.4.7-bin-hadoop2.7.tgz
tar -zxvf /home/opc/spark-3.1.1-bin-hadoop3.2.tgz
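
To extract a single archive and confirm the result, something along these lines may help. It is a sketch only: the function name install_spark is hypothetical, and it assumes the archive unpacks into a directory named after the file without the ".tgz" suffix, as the Spark binary packages do.

```shell
# Hypothetical helper: extract one Spark tarball into a target directory
# and confirm that bin/spark-submit exists afterwards.
# Arguments: path to the .tgz file, and the target directory (e.g. /opt).
install_spark() {
    tarball=$1
    target=$2
    [ -f "$tarball" ] || { echo "missing: $tarball" >&2; return 1; }
    tar -zxf "$tarball" -C "$target" || return 1
    # Assumption: the top-level directory is the file name minus ".tgz".
    spark_dir="$target/$(basename "$tarball" .tgz)"
    [ -x "$spark_dir/bin/spark-submit" ] || {
        echo "spark-submit not found in $spark_dir" >&2
        return 1
    }
    # Print the resulting directory, which is a candidate for SPARK_HOME.
    echo "$spark_dir"
}

# Example (run as root, not executed here):
#   install_spark /home/opc/spark-2.4.7-bin-hadoop2.7.tgz /opt
```

The printed path is the value you would later assign to SPARK_HOME.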

5 Install PySpark in a Python 3 Environment

/opt/Python-3.7.6/bin/pip3 install 'pyspark==2.4.7'
/opt/Python-3.7.6/bin/pip3 install findspark

Next we shall create a virtual environment and enable it.

Modify your environment to use this Spark and Hadoop Version

Add the following lines to the “.bashrc” file of the “opc” user, setting SPARK_HOME to the directory of the version you installed:

# Added by %OP%
export PYTHONHOME=/opt/anaconda3

#export JAVA_HOME=$(/usr/libexec/java_home)
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
export PYSPARK_PYTHON=python3
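
If you prefer to script this step, the lines can be appended without creating duplicates when the script is run more than once. This is a minimal sketch under our own assumptions: the helper name append_once is hypothetical and is not part of Spark or Oracle Linux.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not create duplicates.
# Arguments: the line to add, and the file to add it to.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (not executed here):
#   append_once 'export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7' /home/opc/.bashrc
#   append_once 'export PYSPARK_PYTHON=python3' /home/opc/.bashrc
```

After editing “.bashrc”, open a new shell or run source ~/.bashrc so the variables take effect.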


6 Test your Spark and Hadoop Environment

If you are working directly on the virtual machine and have a browser installed, it should take you directly into the Jupyter environment. Connect to your “”.

Then upload the following notebooks: