Use reduceByKey instead of groupByKey
groupByKey creates a lot of shuffling which hampers the performance, while reduceByKey does not shuffle the data as much
Use reduceByKey instead of groupByKey
groupByKey creates a lot of shuffling which hampers the performance, while reduceByKey does not shuffle the data as much
Use reduceByKey instead of groupByKey