Using Apache Spark
Apache Spark is a distributed data processing engine for big data workloads.
The Spark Thrift JDBC/ODBC server corresponds to HiveServer2 in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or Hive. To connect to the Spark Thrift Server from any node in a Big Data Service cluster, use the spark-beeline command.
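For example, a minimal Beeline session might look like the following sketch. The host name is a placeholder and port 10015 is illustrative; check hive.server2.thrift.port in the cluster's Spark configuration for the actual Thrift port.

```
# Launch Beeline preconfigured for the cluster's Spark Thrift Server
spark-beeline

# Or pass the JDBC URL explicitly; host and port are placeholders
spark-beeline -u "jdbc:hive2://<thrift-server-host>:10015/default"
```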
Spark Configuration Properties
The following Spark configuration properties are included in Big Data Service 3.1.1 or later.
| Configuration | Property | Description |
|---|---|---|
| spark3-env | spark_history_secure_opts | Spark History Server Java options if security is enabled |
| spark3-env | spark_history_log_opts | Spark History Server logging Java options |
| spark3-env | spark_thrift_log_opts | Spark Thrift Server logging Java options |
| spark3-env | spark_library_path | Paths containing shared libraries for Spark |
| spark3-env | spark_dist_classpath | Paths containing Hadoop libraries for Spark |
| spark3-env | spark_thrift_remotejmx_opts | Spark Thrift Server Java options if remote JMX is enabled |
| spark3-env | spark_history_remotejmx_opts | Spark History Server Java options if remote JMX is enabled |
| spark3-defaults | spark_history_store_path | Location of the Spark History Server cache. To access this property, go to the Ambari home page, select Spark3, select Configs, then select Advanced spark3-defaults. The default value is |
| livy2-env | livy_server_opts | Livy Server Java options |
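As a sketch of what the remote-JMX properties typically contain, spark_history_remotejmx_opts might hold standard JVM flags like the following; the port and security settings are illustrative, not Big Data Service defaults.

```
# Illustrative value for spark_history_remotejmx_opts (spark3-env);
# standard JVM remote-JMX flags, with example port and auth settings
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=18081
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
```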
Group Permission to Download Policies
You can grant users permission to download Ranger policies through a user group, allowing them to run SQL queries through Spark jobs.
In a Big Data Service HA cluster with the Ranger-Spark plugin enabled, users must have permission to download Ranger policies before they can run SQL queries through Spark jobs. To grant this permission, include the user in the policy.download.auth.users and tag.download.auth.users lists. For more information, see Spark Job Might Fail With a 401 Error While Trying to Download the Ranger-Spark Policies.
Instead of listing individual users, you can set the policy.download.auth.groups parameter to a user group in the Spark-Ranger repository in the Ranger UI. All users in that group can then download Ranger policies. This capability is supported in ODH version 2.0.10 or later.
Example:
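The following is a minimal sketch of the download-authorization properties in the Spark service repository configuration in the Ranger UI; the user and group names are illustrative.

```
# Ranger UI -> Spark service repository -> Config Properties
# (user and group names are examples only)
policy.download.auth.users  = spark,hive
tag.download.auth.users     = spark,hive
policy.download.auth.groups = spark_sql_users
```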
Setting user-level permissions for Spark in Ranger
To manage which users can access Spark resources, set user-level permissions in the Ranger UI. A scripted alternative using the Ranger REST API is sketched after these steps.
- Access the Ranger Admin UI.
- From the list of repositories, select the Spark service.
- Select Add New Policy or select an existing policy to edit.
- Select the resource (database, sparkservice, or other) you want to set permissions for.
- In the Allow Conditions section, under Select User, select a user's name from the list. Then, under Permissions, select the permissions you want to grant that user.
- Select Save.
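If you prefer to script policy creation instead of using the UI, Ranger's public REST API can create an equivalent policy. The following sketch assumes illustrative values for the Ranger host, port, repository name, database, user, and access type; verify the resource and access names against your cluster's Spark service definition before use.

```
# Create a policy granting one user SELECT on a database (values illustrative)
curl -u admin:<password> -H "Content-Type: application/json" \
  -X POST "https://<ranger-host>:6182/service/public/v2/api/policy" \
  -d '{
        "service": "<spark-repo-name>",
        "name": "spark-user-db-access",
        "resources": { "database": { "values": ["sales_db"] } },
        "policyItems": [
          { "users": ["alice"],
            "accesses": [ { "type": "select", "isAllowed": true } ] }
        ]
      }'
```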
Spark-Ranger Plugin Extension
The Spark-Ranger plugin extension can't be overridden at runtime in ODH version 2.0.10 or later.
The Spark-Ranger plugin can't fully enforce fine-grained access control in use cases outside the Spark Thrift Server. The Ranger administrator is expected to grant the required access to data files in HDFS through HDFS Ranger policies, as sketched below.
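As a hedged illustration of such an HDFS policy, the same REST endpoint can target the HDFS repository; the repository name, path, and user below are assumptions to adapt to your cluster.

```
# Grant read/execute on a data path through an HDFS Ranger policy
# (host, repository name, path, and user are illustrative)
curl -u admin:<password> -H "Content-Type: application/json" \
  -X POST "https://<ranger-host>:6182/service/public/v2/api/policy" \
  -d '{
        "service": "<hdfs-repo-name>",
        "name": "spark-data-read",
        "resources": { "path": { "values": ["/warehouse/tablespace"], "isRecursive": true } },
        "policyItems": [
          { "users": ["alice"],
            "accesses": [ { "type": "read", "isAllowed": true },
                          { "type": "execute", "isAllowed": true } ] }
        ]
      }'
```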