Maybe some of you have heard about 'MapReduce' but did not have time to look it up, or did not get the main idea after reading its definition in the original paper published by Google. And when you search for a 'MapReduce' tutorial, the first page of results is full of complicated tutorials, nothing you can read in 5 minutes. So let's see, in a simple way, what MapReduce is!
MapReduce is a distributed, parallel approach to processing large amounts of data (datasets). This approach has 2 components - Map and Reduce (as the name suggests). Instead of a deep and boring technical explanation, let's use an analogy - we can compare MapReduce with making popcorn. How do we make delicious popcorn? If you don't remember, take a look at the image below:
Easy, isn't it? So, when we make popcorn, we take an ear of corn and turn it into popcorn. The corn has different forms at the beginning and at the end of the process, but we deal with corn and its intermediate states the whole time. Map and Reduce are similar to this:
Technically speaking, Map takes a set of input data (input data == ear of corn) and generates a set of key-value pairs (key-value pairs == corn grains). Reduce applies a new function to these key-value pairs (adding the corn grains to the pan is this function) and transforms them into data with a new format (data with new format == popcorn).
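If you prefer code to corn, here is a minimal sketch of the same idea in plain Python, using the classic word-count example. It runs on a single machine, so it only shows the shape of Map and Reduce, not the distributed part. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own for illustration and do not come from Hadoop or any other library. The grouping step between Map and Reduce is something a real framework does for you; I wrote it out by hand here as an assumption about the simplest possible version.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input record (a line of text) into (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values that share the same key.

    A real framework does this between Map and Reduce; it is written
    by hand here only to keep the sketch self-contained.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Run Map over every input record and collect all the pairs.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Group the pairs by key, then run Reduce once per key.
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["corn pops", "corn is corn"]))
    # prints {'corn': 3, 'pops': 1, 'is': 1}
```

In the popcorn analogy, each line of text is an ear of corn, every `(word, 1)` pair is a grain, and the final dictionary of counts is the popcorn.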
That's it! In future posts I'll show what we can do with MapReduce and when we can use this approach.
If you are interested, you can take a look at this tutorial or at the Apache Hadoop MapReduce tutorial.
Recently, Fabiane Nardon gave a nice talk at QConSP about Big Data and published nice slides about MapReduce. Slides here.
I hope you enjoyed the explanation.