Use reduceByKey instead of groupByKey
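When the goal is to aggregate values per key, groupByKey sends every key-value pair across the network before any aggregation happens, and that heavy shuffle hurts performance. reduceByKey first combines the values for each key within each partition and shuffles only those partial results, so far less data is moved.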
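The difference in shuffle volume can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark itself: the helper names (partition_of, shuffle_group_by_key, shuffle_reduce_by_key), the toy partitioner, and the sample data are all hypothetical, and the code only counts how many records would cross the shuffle boundary under each strategy. The PySpark calls named in the comments are the real operations being imitated.

```python
from collections import defaultdict


def partition_of(key, num_partitions):
    # Deterministic stand-in for a hash partitioner (an assumption of this sketch).
    return sum(key.encode()) % num_partitions


def shuffle_group_by_key(partitions, num_reducers):
    """Imitates rdd.groupByKey().mapValues(sum): every record is shuffled."""
    shuffled = 0
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for part in partitions:
        for key, value in part:
            reducers[partition_of(key, num_reducers)][key].append(value)
            shuffled += 1  # one shuffled record per input record
    # Aggregation happens only after all values have been moved.
    result = {k: sum(vs) for r in reducers for k, vs in r.items()}
    return result, shuffled


def shuffle_reduce_by_key(partitions, num_reducers, func):
    """Imitates rdd.reduceByKey(func): combine per partition, then shuffle."""
    shuffled = 0
    reducers = [dict() for _ in range(num_reducers)]
    for part in partitions:
        # Map-side combine: at most one partial result per key per partition.
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        for key, value in local.items():
            target = reducers[partition_of(key, num_reducers)]
            target[key] = func(target[key], value) if key in target else value
            shuffled += 1  # one shuffled record per (partition, key)
    result = {k: v for r in reducers for k, v in r.items()}
    return result, shuffled


# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

grouped, grouped_records = shuffle_group_by_key(partitions, 2)
reduced, reduced_records = shuffle_reduce_by_key(partitions, 2, lambda x, y: x + y)

print(grouped_records, reduced_records)
```

Both strategies produce the same per-key totals, but in this toy input the groupByKey path moves 8 records while the reduceByKey path moves 5. The gap widens as the number of values per key in each partition grows.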