Submitting Spark Tasks
1.spark-submit
https://spark.apache.org/docs/latest/submitting-applications.html
The `spark-submit` script in Spark's `bin` directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface so you don't have to configure your application especially for each one.
```bash
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```
Some of the commonly used options are:

- `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
- `--master`: The master URL for the cluster (e.g. `spark://23.195.26.187:7077`)
- `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`) †
- `--conf`: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown). Multiple configurations should be passed as separate arguments. (e.g. `--conf <key>=<value> --conf <key2>=<value2>`)
- `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
- `application-arguments`: Arguments passed to the main method of your main class, if any
Here we are on the client side; where the driver actually runs depends on the deploy mode.
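As a concrete end-to-end illustration, below is a minimal PySpark script in the spirit of Spark's bundled pi example; the file name `pi.py`, the master URL, and the exact launch command in the comment are assumptions for this sketch:

```python
# pi.py -- a minimal sketch of a submittable PySpark app (names assumed).
# Example launch (hypothetical host):
#   ./bin/spark-submit --master spark://<host>:7077 --deploy-mode client pi.py 10
# With client mode the driver runs inside this spark-submit process;
# with cluster mode it would be shipped to a worker node instead.
import sys
from operator import add
from random import random

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()
    # application-arguments: number of sample partitions, defaulting to 2
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def inside(_):
        # Sample a point in the unit square; count it if it falls in the circle.
        x, y = random() * 2 - 1, random() * 2 - 1
        return 1 if x * x + y * y <= 1 else 0

    count = spark.sparkContext.parallelize(range(n), partitions).map(inside).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    spark.stop()
```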
2.python file.py
This should only work in local and client mode. If the code specifies cluster mode in this situation, it throws an error:
```python
config("spark.submit.deployMode", "cluster")
```
```
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is not applicable to Spark shells.
```
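For contrast, a minimal sketch of a session that does work when the script is started with plain `python file.py`; the app name and master URL here are assumptions:

```python
from pyspark.sql import SparkSession

# Started as "python file.py": this process itself hosts the driver, so only
# local masters or the (default) client deploy mode are possible. "cluster"
# would require spark-submit to ship the driver elsewhere, hence the
# exception above.
spark = (SparkSession.builder
         .appName("plain-python-demo")   # assumed app name
         .master("local[2]")             # or spark://<host>:7077 with client mode
         .getOrCreate())

print(spark.range(100).count())
spark.stop()
```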
3.jupyter notebook
This should also only work in local and client mode.
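The situation in a notebook is the same as with `python file.py`: the kernel process hosts the driver, so only local or client mode applies. A minimal sketch, assuming `pyspark` is importable from the notebook's environment:

```python
from pyspark.sql import SparkSession

# Inside Jupyter the notebook kernel is the driver process, so the session
# must use a local master or a cluster master with client deploy mode.
spark = (SparkSession.builder
         .appName("notebook-demo")   # assumed app name
         .master("local[*]")
         .getOrCreate())

spark.range(5).show()
spark.stop()
```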