Common Spark Errors

Python in worker has different version 2.7 than that in driver 3.7, PySpark cannot run with different minor versions

Core idea: explicitly set the Python version for both the driver and the executors so that they match.

Method 1: modify environment variables

1. Add the desired Python interpreter for PySpark to the environment file /etc/profile:

export PYSPARK_PYTHON=<path to the chosen python>
export PYSPARK_DRIVER_PYTHON=<path to the chosen python>

After saving, source /etc/profile to make the changes take effect.
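
As a quick sanity check (a minimal sketch, assuming the shell that launches the job has sourced /etc/profile), both variables should resolve to the same interpreter:

import os

# Both variables should print the same Python 3 path;
# an empty result means /etc/profile was not sourced in this shell.
print(os.environ.get("PYSPARK_PYTHON"))
print(os.environ.get("PYSPARK_DRIVER_PYTHON"))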

2. Specify inside the code

import os

os.environ["PYSPARK_DRIVER_PYTHON"] = ""  # driver
os.environ["PYSPARK_PYTHON"] = ""         # workers / executors

Method 2: specify via spark-submit

When running spark-submit, add the parameters --conf spark.pyspark.python and --conf spark.pyspark.driver.python:

spark-submit \
--driver-memory 5g --num-executors 5 --executor-cores 1 --executor-memory 1G \
--conf spark.pyspark.python=./.../bin/python --conf spark.pyspark.driver.python=./.../bin/python xx.py
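
Whichever method is used, one way to confirm the mismatch is gone is to compare the driver's version with what actually runs on the executors (a small sketch; run it inside the submitted script):

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

driver_ver = sys.version_info[:2]
# Run a trivial job so the version is read on the executors themselves.
worker_vers = set(
    sc.parallelize(range(sc.defaultParallelism))
      .map(lambda _: tuple(sys.version_info[:2]))
      .collect()
)
print("driver:", driver_ver, "workers:", worker_vers)  # should match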

spark.sql cannot see the Hive databases; it only sees the default database

This means Spark is not connected to Hive; see the fix described at:

https://www.cnblogs.com/yjt1993/p/13963144.html
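
In short, Spark must be pointed at the Hive metastore (typically by copying hive-site.xml into $SPARK_HOME/conf) and the session must be built with Hive support enabled; otherwise it falls back to its built-in catalog, which only contains default. A minimal sketch, assuming hive-site.xml is already in place:

from pyspark.sql import SparkSession

# Without enableHiveSupport() Spark uses its built-in catalog
# instead of the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-check")
    .enableHiveSupport()
    .getOrCreate()
)
spark.sql("show databases").show()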
