DAG
https://blog.csdn.net/qq_16669583/article/details/106026722
https://blog.csdn.net/u011564172/article/details/70172060
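Purpose
Spark builds a DAG from the dependencies between RDDs, then splits that DAG into stages (stage boundaries fall at shuffle dependencies).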
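To make the stage split concrete, here is a minimal plain-Python sketch. It does not use Spark. It assumes a simplified linear lineage in which each step is tagged by hand as needing a shuffle or not. The names `split_into_stages` and `lineage` are hypothetical, and the real scheduler works on a full dependency graph rather than a flat list.

```python
from typing import List, Tuple

# Hypothetical toy lineage: (operation name, needs_shuffle).
# needs_shuffle=True marks a wide (shuffle) dependency on the previous step.
Lineage = List[Tuple[str, bool]]


def split_into_stages(lineage: Lineage) -> List[List[str]]:
    """Cut a linear lineage into stages at every shuffle dependency."""
    stages: List[List[str]] = []
    current: List[str] = []
    for name, needs_shuffle in lineage:
        # A shuffle dependency closes the current stage and opens a new one.
        if needs_shuffle and current:
            stages.append(current)
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages


lineage = [
    ("textFile", False),
    ("flatMap", False),
    ("map", False),
    ("reduceByKey", True),  # wide dependency, so a new stage starts here
    ("map", False),
]
stages = split_into_stages(lineage)
print(stages)
```

In this toy model the narrow steps before `reduceByKey` land in one stage, and `reduceByKey` plus the following `map` land in the next.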
https://www.syntelli.com/eight-performance-optimization-techniques-using-spark#
https://tech.meituan.com/2016/04/29/spark-tuning-basic.html
https://tech.meituan.com/2016/05/12/spark-tuning-pro.html
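Analyze where the time is spent.
rdd1.map().map() -> rdd1.map()
Put simply: where there would be several actions, the transformation logic can be written together, with the action at the end.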
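The rewrite of two chained maps into one can be illustrated without Spark. In the sketch below a plain Python list stands in for an RDD, and the functions `add_one` and `double` are hypothetical placeholders for real transformation logic. It only shows that the fused form produces the same result. It says nothing about actual Spark timings.

```python
data = [1, 2, 3, 4]  # stand-in for the contents of rdd1


def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


# Two chained maps, analogous to rdd1.map(add_one).map(double).
two_maps = list(map(double, map(add_one, data)))

# One fused map, analogous to rdd1.map(lambda x: double(add_one(x))).
one_map = list(map(lambda x: double(add_one(x)), data))

print(two_maps == one_map)
```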
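The partition count refers to the number of partitions in an RDD.
Parallelism refers to the number of executors × the number of cores per executor.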
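The relationship between partition count and parallelism can be worked through as arithmetic. The sketch below assumes each task occupies one core and that all tasks take about the same time. The helper names `task_slots` and `waves` and the cluster sizes are made up for illustration.

```python
import math


def task_slots(num_executors: int, cores_per_executor: int) -> int:
    """Tasks that can run at the same time, assuming one core per task."""
    return num_executors * cores_per_executor


def waves(num_partitions: int, slots: int) -> int:
    """Rounds needed to process every partition, one task per partition."""
    return math.ceil(num_partitions / slots)


slots = task_slots(num_executors=10, cores_per_executor=4)
n_waves = waves(num_partitions=200, slots=slots)
print(slots, n_waves)
```

With 10 executors of 4 cores each there are 40 slots, so 200 partitions run in 5 rounds. One extra partition (201) would add a sixth round that is almost entirely idle.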
https://zhuanlan.zhihu.com/p/70424613
https://zhuanlan.zhihu.com/p/348024116
https://spark.apache.org/docs/latest/cluster-overview.html
https://www.zhihu.com/question/437293024
https://blog.csdn.net/mzqadl/article/details/104217828
https://www.cnblogs.com/ExMan/p/14358363.html
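The client is the machine that submits the job.
A worker can host multiple executors, though by default only one executor is started. Each executor is a process.
An executor contains multiple threads, and each thread runs one task.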
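The "one process, several threads, one task per thread" picture can be modeled with Python's standard thread pool. This is only an analogy. `ThreadPoolExecutor` is a Python class unrelated to Spark executors, and `run_task`, `partitions`, and `cores` are hypothetical names chosen for the sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_task(partition: List[int]) -> Tuple[int, str]:
    """One task: process one partition on whichever thread picks it up."""
    return sum(partition), threading.current_thread().name


partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]  # four tasks' worth of data
cores = 2  # threads available inside this one "executor" process

# The pool plays the role of a single executor with `cores` task threads.
with ThreadPoolExecutor(max_workers=cores, thread_name_prefix="task-thread") as pool:
    results = list(pool.map(run_task, partitions))

sums = [total for total, _ in results]
threads_used = {name for _, name in results}
print(sums, len(threads_used))
```

Four tasks are handled by at most two threads, so each thread runs its tasks one after another.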